Inefficient query causing general slowness


(Peter Sheppard) #1

I've been trying to implement a "related content" section on our new website. This is done with a nested related content asset listing, with the root node set to the root of the site, and the related node replaced with the current asset. Unfortunately, this was exceptionally slow, taking up to 3 minutes to render any page it was applied to!


Having had a look through the code, the procedure that appears to be happening is:

  1. Find all pages under the root node
  2. Ditch any that the user doesn't have permission to access (incl via roles)
  3. See what's left that's related to the original page



    Steps 1 and 2 were being undertaken by a rather large database query from generateGetLinksQuery(). This was doing a LEFT JOIN between the permissions table and the roles view. This join is what was taking the time, as it was creating a table join with millions of tuples.



    After a bit of experimentation, I have come up with what may be a better way of doing this query, which basically moves the roles part of the WHERE clause from the joined tables into a sub query pre-join. This has brought the related content generation down to 6 seconds, but also made a massive improvement to front-end performance system-wide. However, whilst this hasn't caused any noticable side-effects on out setup, I'm not yet convinced that it proves correct for all instances. Is there anybody with greater knowledge of the matrix core, and database integrity, who could have a look at this?



    The patch is as follows:
    --- asset_manager.inc.3-16-0	2007-09-13 14:08:06.000000000 +0100
    +++ asset_manager.inc   2007-09-14 10:26:56.000000000 +0100
    @@ -7417,10 +7417,6 @@
    						if (!$GLOBALS['SQ_SYSTEM']->userRoot() && !$GLOBALS['SQ_SYSTEM']->userSystemAdmin()) {
    								$from .= ' INNER JOIN '.SQ_TABLE_RUNNING_PREFIX.'ast_perm p ON p.assetid = l.minorid';
    
    -							   // join to roles table
    -							   $from .= '
    -									   LEFT JOIN '.SQ_TABLE_RUNNING_PREFIX.'vw_ast_role r on p.userid = r.roleid';
    -
    								// get user and group ids
    								$userids = array_keys($this->getParents($GLOBALS['SQ_SYSTEM']->user->id, 'user_group', FALSE));
    								$userids[] = (String)$GLOBALS['SQ_SYSTEM']->user->id;
    @@ -7431,7 +7427,12 @@
    								}
    
    								$userids_str = implode(',', $userids);
    -							   $userid_cond = ' AND (p.userid IN ('.$userids_str.') OR r.userid IN ('.$userids_str.'))';
    +							   $userid_cond = ' AND (p.userid IN ('.$userids_str.'))';
    +
    +							   // join to roles table
    +							   $from .= '
    +									   LEFT JOIN (SELECT * FROM '.SQ_TABLE_RUNNING_PREFIX.'vw_ast_role WHERE userid IN ('.$userids_str.')) r on p.userid = r.roleid';
    +
    								$where .= $userid_cond.'
    										AND (
    												(p.permission = '.$db->quote($access).' AND (


As I say, that change has really improved the performance of our system across the board, but related content still takes 6 seconds. Does anyone think it would be better looking for related stuff first, then to see if it can be accessed? (However, that would probably mean re-writing half of the listings engine, or not having related content based on the listings engine)

(Greg Sherwood) #2

Thanks for posting these changes. We'll do some testing and incorporate them into MySource Matrix if there are no failing tests.


(Peter Sheppard) #3

Oops, just noticed that diff is the wrong way around! Will re-post properly tomorrow…


(Avi Miller) #4

We're always interested in testing new/improved SQL queries. :)

(Luke Wright) #5

No huge need to from the developers' point of view; we're having a look at it right now and we eventually worked out that it was meant to be in the opposite direction. :)

(Peter Sheppard) #6

Edited the original post so the diff is now the right way around :slight_smile:


Also note the change to the LEFT JOIN line in the query - the alias 'r' is now in the right place - I deleted it from the wrong place in the original diff, having deleted the right one from my actual code file after doing the diff :rolleyes:


(Huan Nguyen) #7

Hi, I have tested out this query, and the result is

  1. the FROM clause,

    The increase in perfomance using the new From clause is very dependant on the sq_ast_role table as you say, though this is a very good suggestion.
  2. the WHERE clause,

    the change in the WHERE clause is not so good, the WHERE clause needs to stay the same, as in

    $userid_cond = ' AND (p.userid IN ('.$userids_str.') OR r.userid IN

    $where = $userid_cond . , etc…

    The reason is that when you have a user who have write permission to Asset X, but the user is denied with a Role on Asset X, then the user should be denied access to the asset. By removing the "OR r.userid IN" condition, matrix will not check if the asset has a denied role or not. Having the "OR r.userid IN" condition while left joining sq_vw_ast_role does not facilitate this behaviour. (regarding the FROM clause :

    $from .= '

    LEFT JOIN (SELECT * FROM '.SQ_TABLE_RUNNING_PREFIX.'vw_ast_role WHERE userid IN ('.$userids_str.')) r on p.userid = r.roleid';

    ), simply because the WHERE clause will eliminate whichever entries doesn't satisfy the AND (p.userid IN ('.$userids_str.')) condition ONLY, compared to two conditions before AND (p.userid IN ('.$userids_str.') OR r.userid IN ('.$userids_str.')).



    I have committed the change below to Matrix Dev version, so you should be expecting this change in 3.18 ,



    $userids_str = implode(',', $userids);

    $userid_cond = ' AND (p.userid IN ('.$userids_str.') OR r.userid IN ('.$userids_str.'))';

    +
  •                           // join to roles table<br />
    
  •                           &#036;from .= &#39;<br />
    
  •                                   LEFT JOIN (SELECT * FROM &#39;.SQ_TABLE_RUNNING_PREFIX.&#39;vw_ast_role WHERE userid IN (&#39;.&#036;userids_str.&#39;)) r on p.userid = r.roleid&#39;;<br />
    

+



$where .= $userid_cond.'



Thank you for your contribution.


(Peter Sheppard) #8

The same table join appears to have also been giving us performance issues site-wide. I have now applied the subquery to the joins in the following files, which has made a significant performance improvement:


/core/assets/designs/design_areas/menu/design_area_menu_type/menu_get_assets.inc

/lib/asset_map/asset_map.inc (Two queries to change in this file)



There's another instance in /include/asset_manager.inc in the function generateGetParentsQuery() but I've not got around to editing that one yet, as it's not been hitting my database logs as a query taking > 2 seconds yet…



Can I suggest that the changes to the other files are also committed to the core?



Now, where did I put those search queries…


(Greg Sherwood) #9

Thanks Peter. We'll look into making these changes too.


We are currently rewriting all our queries for the PHP5 version (new DB abstraction layer with bind vars) so it might take us longer to test these changes.