Every once in a while, maybe every 4 months, we get the following error (it arrives in an email because run.php could not run):
Fatal error: Uncaught exception 'Exception' with message 'Could not create database connection: DBError!:SQLSTATE[08006] [7] FATAL: connection limit exceeded for non-superusers' in /home/websites/mysource_matrix/core/include/mysource.inc:3267
Stack trace:
#0 /home/websites/mysource_matrix/core/include/mysource.inc(220): MySource->changeDatabaseConnection('db')
#1 /home/websites/mysource_matrix/core/include/init.inc(243): MySource->init()
#2 /home/websites/mysource_matrix/core/cron/run.php(36): require_once('/home/websites/...')
#3 {main}
thrown in /home/websites/mysource_matrix/core/include/mysource.inc on line 3267
Obviously this is a bad error, and it takes the site down until I restart Apache.
Luckily, each time this has happened, I got the error email right away and was able to restart Apache. But I know there will be a time when I might not realize it for quite a few hours.
Is there a way to prevent this error? And, what is the cause?
That basically means you're out of available Postgres connections (each one is a backend process), which usually comes down to one of two things:
1. Your DB server is too small. Check standard performance metrics and try to rule this in or out.
2. You're deadlocking on something and connections are slowly piling up.
#2 is more fun. I usually turn on stats_command_string in postgresql.conf (requires a restart) and then look at what appears in the pg_stat_activity table.
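If you'd rather poll that table from a script than from psql, here's a minimal sketch. It assumes the third-party psycopg2 driver and a DSN you'd adjust for your server; the procpid and current_query column names match PostgreSQL of this era (on 9.2 and later they were renamed pid and query).

```python
# Sketch: dump pg_stat_activity to see what each backend is doing.
# Assumptions: psycopg2 is installed, and the DSN below is adjusted
# for your server. stats_command_string must be on for current_query
# to be populated (newer releases enable the equivalent by default).

ACTIVITY_QUERY = """
    SELECT procpid, datname, usename, query_start, current_query
    FROM pg_stat_activity
    ORDER BY query_start;
"""

def show_activity(dsn="dbname=postgres user=postgres"):
    import psycopg2  # third-party PostgreSQL driver, assumed installed
    conn = psycopg2.connect(dsn)
    try:
        cur = conn.cursor()
        cur.execute(ACTIVITY_QUERY)
        for row in cur.fetchall():
            print(row)
    finally:
        conn.close()
```

Backends that sit "idle in transaction" or run the same query for a long time are the ones to chase.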
[quote]
Too small in what way? RAM? Processor speed?
[/quote]
We used to get this a lot, and it was a pain to track down.
This is probably what happens:
Client (or clients) request a page.
Matrix has to establish two connections for each - one for the page and one for the cache db
If there are any pages that are slow to build, these will hold two connections for the duration of the build.
You have some slow-to-build pages (the cache has expired on them), and a sudden peak in requests.
Due to the delay in page builds (connections held open) and the number of new requests (each waiting request holds two connections), you run out.
Things that can contribute to this are:
complex pages (many nests/asset lists).
many pages expiring at the same time.
Have a look at the load average on the server first. Is the load average indicating queueing processes?
When I say peak, it does not have to be much. If you have 50 connections max, you only need 25 queued requests to max that out.
If you have some pages that are slow to build - say 2 seconds or more - then even with quite low traffic it is easy to get to 25 waiting processes.
Say you have a 4-core box. Your home page expires. It takes two seconds to build. You get 2 requests for it. Request one takes two connections; request two takes two more and has to wait for request one to build the page.
In the meantime (after 200 ms) some folks request the /news/calendar/this-month page.
These cannot be served immediately, so grab a db connection each and wait.
In our case, at a rate of only 2 requests a second, we ran out of DB connections (we had 100) in about 20 seconds due to some slow-to-build pages.
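As a back-of-envelope check, the pile-up above can be sketched like this (hypothetical numbers; adjust the pool size and arrival rate to match your site). The worst-case assumption is that every incoming request queues behind the slow build while holding its two connections:

```python
# Back-of-envelope model: each queued request holds 2 connections
# (one for the page, one for the cache db) while waiting for a slow
# page to finish building. Numbers are illustrative, not measured.

CONNECTIONS_PER_REQUEST = 2

def seconds_until_exhausted(max_connections, requests_per_second):
    """Time until queued requests alone use up the pool, assuming
    none of them complete (all stuck behind a slow build)."""
    requests_needed = max_connections / CONNECTIONS_PER_REQUEST
    return requests_needed / requests_per_second

# 50-connection pool at 2 requests/second: 25 queued requests max it out.
print(seconds_until_exhausted(50, 2))   # 12.5 seconds

# The case described above: 100 connections at 2 requests/second.
print(seconds_until_exhausted(100, 2))  # 25.0 seconds
```

That 25 seconds lines up roughly with the ~20 seconds we saw, since some connections were already in use before the queue started growing.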
How many DB connections do you have, and is the DB on a different server? And are you running Squid?
Looking at the site, there are a few general tweaks you could make to speed up page loads. Email direct if you want more info.
[quote]
Too small in what way? RAM? Processor speed?
[/quote]
Sorry, it's pretty general, so it's hard to answer. Basically anything that can make requests run slowly can induce this error. That could be physical hardware that is too slow or too small (e.g. not enough RAM, or disks that are too slow), or it could be a misconfiguration of some sort.
If your load averages, swap usage, IO wait, etc. are all reasonable, more than likely you have a misconfiguration. Even if one of those isn't reasonable, you might still have a misconfiguration.
Start by looking at pg_stat_activity and see if there's anything obvious.