Extremely slow SQL requests from Matrix

Hi all. We've noticed that Matrix periodically runs some very slow SQLs, taking between 15 and 120 minutes to execute on our (dedicated) Postgres database server. Sometimes this actually manages to bring down the DB server, requiring a restart, especially when more than one of these queries is running in parallel.


Could anyone give us a clue as to what is likely to be causing these queries, and if there's any way we can speed them up?



Here's a sample:


    
    SELECT ai.assetid, SUM(ai.score) as search_score
    FROM sq_sch_idx ai 
    WHERE (ai.assetid IN (
      SELECT a.assetid FROM sq_ast a INNER JOIN sq_ast_lnk l ON l.minorid = a.assetid INNER JOIN sq_ast_lnk_tree t 
      ON t.linkid = l.linkid WHERE 
      ((t.treeid LIKE '000t0006%') OR (t.treeid LIKE '000t000B%') OR (t.treeid LIKE '000t000D%') OR 
      (t.treeid LIKE '000t0008%') OR (t.treeid LIKE '000t000I%') OR (t.treeid LIKE '000t%') OR 
      (t.treeid LIKE '000t000F%') OR (t.treeid LIKE '000t0007%') OR (t.treeid LIKE '000t000=%') OR 
      (t.treeid LIKE '000t000A%') OR (t.treeid LIKE '000t0005%') OR (t.treeid LIKE '000t0002%') OR 
      (t.treeid LIKE '000t0004%') OR (t.treeid LIKE '000t0000%') OR (t.treeid LIKE '000t000H%')) 
      AND a.status IN ('2', '16') AND a.type_code IN ('page_standard', 'page_custom_form', 'news_item') 
      AND  a.assetid IN (SELECT p.assetid FROM sq_ast_perm p  WHERE (
    p.userid IN ('7'))  GROUP BY p.assetid HAVING MIN(p.granted) <> '0')))
    AND (ai.value LIKE 'witbe%')
    GROUP BY ai.assetid


although sometimes the "ai.value LIKE" clause matches on a different term.

We have over 11 million rows in sq_sch_idx, so it's not surprising that a query like this is straining the machine.

Any suggestions?

Ouch, that's nasty SQL load!


The query looks like its a search query to me. You appear to be allowing 15 different search locations, 14 of which are children of one of a higher level allowed location. Reducing that to just the one and thereby getting rid of 14 LIKE statements will likely help considerably.

We've seen this a bit before. There's a number of things that can cause it.


If possible, could you post the following details:



  • Your Matrix version
    [*]Your PostgreSQL version. You can get this by invoking postgres with -V. Eg.
        /usr/bin/postgres -V

    [*]PostgreSQL cluster locale. Find this by typing the following at the psql prompt:
        show LC_CTYPE;

    [*]The value of "default_statistics_target" in your postgresql.conf (if it's uncommented)
    [*]The output of EXPLAIN for the above search query (append "EXPLAIN" to the front of the query and post the results)


Dan.

[quote]
Hi all. We've noticed that Matrix periodically runs some very slow SQLs, taking between 15 and 120 minutes to execute on our (dedicated) Postgres database server. Sometimes this actually manages to bring down the DB server, requiring a restart, especially when more than one of these queries is running in parallel.



Could anyone give us a clue as to what is likely to be causing these queries, and if there's any way we can speed them up?



Here's a sample:


    
    SELECT ai.assetid, SUM(ai.score) as search_score
    FROM sq_sch_idx ai 
    WHERE (ai.assetid IN (
      SELECT a.assetid FROM sq_ast a INNER JOIN sq_ast_lnk l ON l.minorid = a.assetid INNER JOIN sq_ast_lnk_tree t 
      ON t.linkid = l.linkid WHERE 
      ((t.treeid LIKE '000t0006%') OR (t.treeid LIKE '000t000B%') OR (t.treeid LIKE '000t000D%') OR 
      (t.treeid LIKE '000t0008%') OR (t.treeid LIKE '000t000I%') OR (t.treeid LIKE '000t%') OR 
      (t.treeid LIKE '000t000F%') OR (t.treeid LIKE '000t0007%') OR (t.treeid LIKE '000t000=%') OR 
      (t.treeid LIKE '000t000A%') OR (t.treeid LIKE '000t0005%') OR (t.treeid LIKE '000t0002%') OR 
      (t.treeid LIKE '000t0004%') OR (t.treeid LIKE '000t0000%') OR (t.treeid LIKE '000t000H%')) 
      AND a.status IN ('2', '16') AND a.type_code IN ('page_standard', 'page_custom_form', 'news_item') 
      AND  a.assetid IN (SELECT p.assetid FROM sq_ast_perm p  WHERE (
    p.userid IN ('7'))  GROUP BY p.assetid HAVING MIN(p.granted) <> '0')))
    AND (ai.value LIKE 'witbe%')
    GROUP BY ai.assetid


although sometimes the "ai.value LIKE" clause matches on a different term.

We have over 11 million rows in sq_sch_idx, so it's not surprising that a query like this is straining the machine.

Any suggestions?
[/quote]

Looks like each of those treeid's represents a Root Node that has been configured for the Search Page asset.

For performance reasons, you will need to reduce it to a single Root Node.

[quote]
We've seen this a bit before. There's a number of things that can cause it.



If possible, could you post the following details:



  • Your Matrix version
    [*]Your PostgreSQL version. You can get this by invoking postgres with -V. Eg.
        /usr/bin/postgres -V

    [*]PostgreSQL cluster locale. Find this by typing the following at the psql prompt:
        show LC_CTYPE;

    [*]The value of "default_statistics_target" in your postgresql.conf (if it's uncommented)
    [*]The output of EXPLAIN for the above search query (append "EXPLAIN" to the front of the query and post the results)


Dan.
[/quote]

Thanks Dan, here are the relevant details:

Matrix version: 3.20.3
Postgres: 8.1.19
Postgres LC_ctype: C
Postgres default_statistics_target: is commented out in postgresql.conf
EXPLAIN syntax (apologies, this looks pretty horrible):

    
    QUERY 
    PLAN                                                                                                                                        
--------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=19146.62..19146.63 rows=1 width=13)
   ->  Nested Loop IN Join  (cost=16916.43..19146.62 rows=1 width=13)
     Join Filter: (("outer".assetid)::text = ("inner".assetid)::text)
     ->  Index Scan using sch_idx_pk on sq_sch_idx ai  (cost=0.00..4.01 rows=1 width=13)
           Index Cond: (((value)::text >= 'witbe'::character varying) AND ((value)::text < 'witbf'::character varying))
           Filter: ((value)::text ~~ 'witbe%'::text)
     ->  Nested Loop  (cost=16916.43..19142.58 rows=2 width=163)
           ->  Hash Join  (cost=16916.43..17343.66 rows=454 width=154)
                 Hash Cond: (("outer".assetid)::text = ("inner".minorid)::text)
                 ->  HashAggregate  (cost=14083.18..14294.53 rows=14090 width=14)
                       Filter: (min("granted") <> '0'::bpchar)
                       ->  Bitmap Heap Scan on sq_ast_perm p  (cost=1593.86..13017.66 rows=213104 width=14)
                             Recheck Cond: ((userid)::text = '7'::text)
                             ->  Bitmap Index Scan on sq_ast_perm_userid  (cost=0.00..1593.86 rows=213104 width=0)
                                   Index Cond: ((userid)::text = '7'::text)
                 ->  Hash  (cost=2831.97..2831.97 rows=512 width=9)
                       ->  Nested Loop  (cost=51.45..2831.97 rows=512 width=9)
                             ->  Bitmap Heap Scan on sq_ast_lnk_tree t  (cost=51.45..1057.75 rows=561 width=4)
                                   Recheck Cond: ((treeid ~~ '000t0006%'::bytea) OR (treeid ~~ '000t000B%'::bytea) OR (treeid ~~ '000t00
0D%'::bytea) OR (treeid ~~ '000t0008%'::bytea) OR (treeid ~~ '000t000I%'::bytea) OR (treeid ~~ '000t%'::bytea) OR (treeid ~~ '000t000F%'::by
tea) OR (treeid ~~ '000t0007%'::bytea) OR (treeid ~~ '000t000=%'::bytea) OR (treeid ~~ '000t000A%'::bytea) OR (treeid ~~ '000t0005%'::bytea)
 OR (treeid ~~ '000t0002%'::bytea) OR (treeid ~~ '000t0004%'::bytea) OR (treeid ~~ '000t0000%'::bytea) OR (treeid ~~ '000t000H%'::bytea))
                                   Filter: ((treeid ~~ '000t0006%'::bytea) OR (treeid ~~ '000t000B%'::bytea) OR (treeid ~~ '000t000D%'::
bytea) OR (treeid ~~ '000t0008%'::bytea) OR (treeid ~~ '000t000I%'::bytea) OR (treeid ~~ '000t%'::bytea) OR (treeid ~~ '000t000F%'::bytea) O
R (treeid ~~ '000t0007%'::bytea) OR (treeid ~~ '000t000=%'::bytea) OR (treeid ~~ '000t000A%'::bytea) OR (treeid ~~ '000t0005%'::bytea) OR (t
reeid ~~ '000t0002%'::bytea) OR (treeid ~~ '000t0004%'::bytea) OR (treeid ~~ '000t0000%'::bytea) OR (treeid ~~ '000t000H%'::bytea))
                                   ->  BitmapOr  (cost=51.45..51.45 rows=561 width=0)
                                         ->  Bitmap Index Scan on ast_lnk_tree_pk  (cost=0.00..2.01 rows=1 width=0)
                                               Index Cond: ((treeid >= '000t0006'::bytea) AND (treeid < '000t0007'::bytea))
                                         ->  Bitmap Index Scan on ast_lnk_tree_pk  (cost=0.00..2.01 rows=1 width=0)
                                               Index Cond: ((treeid >= '000t000B'::bytea) AND (treeid < '000t000C'::bytea))
                                         ->  Bitmap Index Scan on ast_lnk_tree_pk  (cost=0.00..2.01 rows=1 width=0)
                                               Index Cond: ((treeid >= '000t000D'::bytea) AND (treeid < '000t000E'::bytea))
                                         ->  Bitmap Index Scan on ast_lnk_tree_pk  (cost=0.00..2.01 rows=1 width=0)
                                               Index Cond: ((treeid >= '000t0008'::bytea) AND (treeid < '000t0009'::bytea))
                                         ->  Bitmap Index Scan on ast_lnk_tree_pk  (cost=0.00..2.01 rows=1 width=0)
                                               Index Cond: ((treeid >= '000t000I'::bytea) AND (treeid < '000t000J'::bytea))
                                         ->  Bitmap Index Scan on ast_lnk_tree_pk  (cost=0.00..23.37 rows=561 width=0)
                                               Index Cond: ((treeid >= '000t'::bytea) AND (treeid < '000u'::bytea))
                                         ->  Bitmap Index Scan on ast_lnk_tree_pk  (cost=0.00..2.01 rows=1 width=0)
                                               Index Cond: ((treeid >= '000t000F'::bytea) AND (treeid < '000t000G'::bytea))
                                         ->  Bitmap Index Scan on ast_lnk_tree_pk  (cost=0.00..2.01 rows=1 width=0)
                                               Index Cond: ((treeid >= '000t0007'::bytea) AND (treeid < '000t0008'::bytea))
                                         ->  Bitmap Index Scan on ast_lnk_tree_pk  (cost=0.00..2.01 rows=1 width=0)
                                               Index Cond: ((treeid >= '000t000='::bytea) AND (treeid < '000t000>'::bytea))
                                         ->  Bitmap Index Scan on ast_lnk_tree_pk  (cost=0.00..2.01 rows=1 width=0)
                                               Index Cond: ((treeid >= '000t000A'::bytea) AND (treeid < '000t000B'::bytea))
                                         ->  Bitmap Index Scan on ast_lnk_tree_pk  (cost=0.00..2.01 rows=1 width=0)
                                               Index Cond: ((treeid >= '000t0005'::bytea) AND (treeid < '000t0006'::bytea))
                                         ->  Bitmap Index Scan on ast_lnk_tree_pk  (cost=0.00..2.01 rows=1 width=0)
                                               Index Cond: ((treeid >= '000t0002'::bytea) AND (treeid < '000t0003'::bytea))
                                         ->  Bitmap Index Scan on ast_lnk_tree_pk  (cost=0.00..2.01 rows=1 width=0)
                                               Index Cond: ((treeid >= '000t0004'::bytea) AND (treeid < '000t0005'::bytea))
                                         ->  Bitmap Index Scan on ast_lnk_tree_pk  (cost=0.00..2.01 rows=1 width=0)
                                               Index Cond: ((treeid >= '000t0000'::bytea) AND (treeid < '000t0001'::bytea))
                                         ->  Bitmap Index Scan on ast_lnk_tree_pk  (cost=0.00..2.01 rows=1 width=0)
                                               Index Cond: ((treeid >= '000t000H'::bytea) AND (treeid < '000t000I'::bytea))
                             ->  Index Scan using ast_lnk_pk on sq_ast_lnk l  (cost=0.00..3.15 rows=1 width=13)
                                   Index Cond: ("outer".linkid = l.linkid)
           ->  Index Scan using ast_pk on sq_ast a  (cost=0.00..3.95 rows=1 width=9)
                 Index Cond: (("outer".minorid)::text = (a.assetid)::text)
                 Filter: (((status = 2::smallint) OR (status = 16::smallint)) AND (((type_code)::text = 'page_standard'::text) OR ((type
_code)::text = 'page_custom_form'::text) OR ((type_code)::text = 'news_item'::text)))
(56 rows)




Any advice gratefully received!

Hi TimJW,


Thanks for the information.



Can you try adding this to your postgresql.conf and restarting:

    default_statistics_target = 100


Then doing a vacuum analyse on your database:
    vacuumdb -ezd 


The default value for default_statistics_target param has been raised from 10 to 100 in later versions of PostgreSQL because people saw strange behaviour with the query planner. See:
http://archives.postgresql.org/pgsql-patches/2007-11/msg00131.php
http://archives.postgresql.org/pgsql-hackers/2009-05/msg00827.php

If that doesn't help, you might also want to download the latest version of Matrix 3.20 branch - there were changes to packages/search/search_manager/search_manager.inc around your version to change the actual search query itself; I can't remember whether those changes are in your version or not, so might be worth checking.

Thanks Dan - we've tried these changes and hopefully it will solve the problem!