Database bottleneck: sort_order on TYPE_3 links


(Peter Sheppard) #1

I've been monitoring our DB servers and looking at some queries which are causing deadlocks and/or taking ridiculously long times to execute.


One of the things we do is have a trigger to create a future status to put a page up for review 12 months after it was made live. As such, we now have over 1500 cron jobs linked under the cron manager with TYPE_3 links. Whenever any of these runs (or is deleted manually), we get a query along the lines of:

UPDATE sq_ast_lnk SET sort_order = sort_order - 1 WHERE majorid = '12' AND sort_order > 17



However, due to the 1500 or so rows which need to be updated, this can take up to 4 minutes to execute! If say 20 or so pages were made live at the same time, then they will come up for review under the same cron run, and 20 times 4 minutes is a very long cron run, and is now leading to cron deadlocks.



Basically, is there a functional requirement for TYPE_3 (and NOTICE) links to have a sort_order, or in some future release could they be set to have a sort_order of -1, and not be updated when deleted etc?



…or am I missing something else obvious here?


(Avi Miller) #2

Can you log this as a bug so that the developers can investigate?


(Greg Sherwood) #3

Don't log a bug. This will be considered a feature and will not be developed. There is no need to remove the sort orders from TYPE_3 and NOTICE links. That is not your problem here.



Instead, look at the query plan to find out why this query is taking so long and check the table schema to make sure you have indexes on majorid and sort_order. If you are using postgres, do the following:

    explain analyze UPDATE sq_ast_lnk SET sort_order = sort_order - 1 WHERE majorid = '12' AND sort_order > 17;
    \d sq_ast_lnk


and post the results.

Also, is rollback enabled? If so, we would need to ensure you have all the required indexes on the rollback tables as well. If you are using postgres:
    \d sq_rb_ast_lnk


There is no reason any query directly on the sq_ast_lnk table should take anywhere near 4 minutes to run. This should be a sub-second query.

(Daniel Nitsche) #4

Also, in 3.12.x functionality was added to triggers to allow them to delete any existing Future Status jobs on a particular asset. If you're not using that functionality already, it should reduce the total number of Future Status jobs.


(Peter Sheppard) #5
    rcn=# BEGIN;
    BEGIN
    rcn=# explain analyze UPDATE sq_ast_lnk SET sort_order = sort_order - 1 WHERE majorid = '12' AND sort_order > 2;
    																  QUERY PLAN											
    ----------------------------------------------------------------------------------------------------------------------------------------------
     Bitmap Heap Scan on sq_ast_lnk  (cost=200.10..1694.27 rows=531 width=68) (actual time=7.342..25.599 rows=1648 loops=1)
       Recheck Cond: (((majorid)::text = '12'::text) AND (sort_order > 2))
       ->  BitmapAnd  (cost=200.10..200.10 rows=531 width=0) (actual time=7.262..7.262 rows=0 loops=1)
    		 ->  Bitmap Index Scan on sq_ast_lnk_majorid  (cost=0.00..92.85 rows=1386 width=0) (actual time=2.809..2.809 rows=8246 loops=1)
    			   Index Cond: ((majorid)::text = '12'::text)
    		 ->  Bitmap Index Scan on sq_ast_lnk_sort_order  (cost=0.00..107.00 rows=17713 width=0) (actual time=4.399..4.399 rows=20644 loops=1)
    			   Index Cond: (sort_order > 2)
     Trigger _rcn_repl_logtrigger_6: time=310.513 calls=1648
     Trigger sq_ast_lnk_rb_trg: time=341451.045 calls=1648
     Total runtime: 344266.099 ms
    (10 rows)
    
    rcn=# rollback;
    ROLLBACK


So clearly the problem is with the rollback.

I've added a sort_order index on sq_ast_lnk and sq_rb_ast_lnk and also indexes on the rollback table for sq_eff_from and sq_eff_to, but that's made very little difference. Those results are after adding the indexes.

Table definitions are:
    						  Table "public.sq_ast_lnk"
    	 Column	 |			Type			 |		   Modifiers
    ----------------+-----------------------------+-------------------------------
     linkid		 | integer					 | not null
     majorid		| character varying(15)	   | not null
     minorid		| character varying(15)	   | not null
     link_type	  | integer					 | not null
     value		  | character varying(255)	  | default ''::character varying
     sort_order	 | integer					 | not null default 0
     is_dependant   | character(1)				| not null default '0'::bpchar
     is_exclusive   | character(1)				| not null default '0'::bpchar
     updated		| timestamp without time zone | not null
     updated_userid | character varying(255)	  | not null
    Indexes:
    	"sq_ast_lnk_pkey" PRIMARY KEY, btree (linkid)
    	"sq_ast_lnk_minorid_key" UNIQUE, btree (minorid, majorid, link_type, value)
    	"sq_ast_lnk_link_type" btree (link_type)
    	"sq_ast_lnk_majorid" btree (majorid)
    	"sq_ast_lnk_minorid" btree (minorid)
    	"sq_ast_lnk_sort_order" btree (sort_order)
    Triggers:
    	_rcn_repl_logtrigger_6 AFTER INSERT OR DELETE OR UPDATE ON sq_ast_lnk FOR EACH ROW EXECUTE PROCEDURE _rcn_repl.logtrigger('_rcn_repl', '6', 'kvvvvvvvvv')
    	sq_ast_lnk_rb_trg BEFORE INSERT OR DELETE OR UPDATE ON sq_ast_lnk FOR EACH ROW EXECUTE PROCEDURE sq_ast_lnk_rb_trg_fn()

    						 Table "public.sq_rb_ast_lnk"
    	 Column	 |			Type			 |		   Modifiers
    ----------------+-----------------------------+-------------------------------
     sq_eff_from	| timestamp without time zone | not null
     sq_eff_to	  | timestamp without time zone |
     linkid		 | integer					 | not null
     majorid		| character varying(15)	   | not null
     minorid		| character varying(15)	   | not null
     link_type	  | integer					 | not null
     value		  | character varying(255)	  | default ''::character varying
     sort_order	 | integer					 | not null default 0
     is_dependant   | character(1)				| not null default '0'::bpchar
     is_exclusive   | character(1)				| not null default '0'::bpchar
     updated		| timestamp without time zone | not null
     updated_userid | character varying(255)	  | not null
    Indexes:
    	"sq_rb_ast_lnk_pkey" PRIMARY KEY, btree (sq_eff_from, linkid)
    	"sq_rb_ast_lnk_sq_eff_from_key" UNIQUE, btree (sq_eff_from, minorid, majorid, link_type, value)
    	"sq_rb_ast_lnk_link_type" btree (link_type)
    	"sq_rb_ast_lnk_linkid" btree (linkid)
    	"sq_rb_ast_lnk_majorid" btree (majorid)
    	"sq_rb_ast_lnk_minorid" btree (minorid)
    	"sq_rb_ast_lnk_sort_order" btree (sort_order)
    	"sq_rb_ast_lnk_sq_eff_from" btree (sq_eff_from)
    	"sq_rb_ast_lnk_sq_eff_to" btree (sq_eff_to)
    	"sq_rb_ast_lnk_value" btree (value)
    Triggers:
    	_rcn_repl_logtrigger_37 AFTER INSERT OR DELETE OR UPDATE ON sq_rb_ast_lnk FOR EACH ROW EXECUTE PROCEDURE _rcn_repl.logtrigger('_rcn_repl', '37', 'kvkvvvvvvvvv')


And yes, we have the trigger set up to delete existing review statuses.

(Avi Miller) #6

You should do a VACUUM FULL ANALYZE after adding the indexes so that PostgreSQL can update the query planner. Also, can you post your PostgreSQL server specifications and postgresql.conf file? Your database may not be tuned appropriately (the default PostgreSQL config is awful).


(Greg Sherwood) #7

I'm also interested to know how many rows you have in your rollback table. If your table grows too large, the DB may not have enough resources to write to efficiently. You should consider only keeping a small amount of rollback data for rollback view mode and then implementing a backup strategy for long term archiving.


(Peter Sheppard) #8

Given that it was rollback causing the problem, we took a look at a few other things, and noticed that performance in a few places had deteriorated when we enabled rollback about 3 months ago. However, we have now realised that what we wanted rollback for (recovering accidental content edits) can be done from the system.log files anyway, so we have now turned off rollback.


However, for info:

I didn't do a full vacuum after adding the indexes, so that may have made a difference :unsure:



The db is running on a dedicated postgres server, though it does serve both our web and intranet installs of matrix. It's apparantly a quad processor 3GHz Xeon-based system with 3 gigs of ram. The storage is a SCSI RAID5 array. All running RHEL4. I'd need to speak to our IT dept to get the full specs.



Here's postgresql.conf

    # -----------------------------
    # PostgreSQL configuration file
    # -----------------------------
    #
    # This file consists of lines of the form:
    #
    #   name = value
    #
    # (The '=' is optional.) White space may be used. Comments are introduced
    # with '#' anywhere on a line. The complete list of option names and
    # allowed values can be found in the PostgreSQL documentation. The
    # commented-out settings shown in this file represent the default values.
    #
    # Please note that re-commenting a setting is NOT sufficient to revert it
    # to the default value, unless you restart the postmaster.
    #
    # Any option can also be given as a command line switch to the
    # postmaster, e.g. 'postmaster -c log_connections=on'. Some options
    # can be changed at run-time with the 'SET' SQL command.
    #
    # This file is read on postmaster startup and when the postmaster
    # receives a SIGHUP. If you edit the file on a running system, you have 
    # to SIGHUP the postmaster for the changes to take effect, or use 
    # "pg_ctl reload". Some settings, such as listen_addresses, require
    # a postmaster shutdown and restart to take effect.
    
    
    #---------------------------------------------------------------------------
    # FILE LOCATIONS
    #---------------------------------------------------------------------------
    
    # The default values of these variables are driven from the -D command line
    # switch or PGDATA environment variable, represented here as ConfigDir.
    
    #data_directory = 'ConfigDir'		# use data in another directory
    #hba_file = 'ConfigDir/pg_hba.conf'	# host-based authentication file
    #ident_file = 'ConfigDir/pg_ident.conf'	# IDENT configuration file
    
    # If external_pid_file is not explicitly set, no extra pid file is written.
    #external_pid_file = '(none)'		# write an extra pid file
    
    
    #---------------------------------------------------------------------------
    # CONNECTIONS AND AUTHENTICATION
    #---------------------------------------------------------------------------
    
    # - Connection Settings -
    
    listen_addresses = '*'		# what IP address(es) to listen on; 
    					# comma-separated list of addresses;
    					# defaults to 'localhost', '*' = all
    port = 5432
    max_connections = 200
    # note: increasing max_connections costs ~400 bytes of shared memory per 
    # connection slot, plus lock space (see max_locks_per_transaction).  You
    # might also need to raise shared_buffers to support more connections.
    superuser_reserved_connections = 5
    #unix_socket_directory = ''
    #unix_socket_group = ''
    #unix_socket_permissions = 0777		# octal
    #bonjour_name = ''			# defaults to the computer name
    
    # - Security & Authentication -
    
    #authentication_timeout = 60		# 1-600, in seconds
    #ssl = off
    #password_encryption = on
    #db_user_namespace = off
    
    # Kerberos
    #krb_server_keyfile = ''
    #krb_srvname = 'postgres'
    #krb_server_hostname = ''		# empty string matches any keytab entry
    #krb_caseins_users = off
    
    # - TCP Keepalives -
    # see 'man 7 tcp' for details
    
    #tcp_keepalives_idle = 0		# TCP_KEEPIDLE, in seconds;
    					# 0 selects the system default
    #tcp_keepalives_interval = 0		# TCP_KEEPINTVL, in seconds;
    					# 0 selects the system default
    #tcp_keepalives_count = 0		# TCP_KEEPCNT;
    					# 0 selects the system default
    
    
    #---------------------------------------------------------------------------
    # RESOURCE USAGE (except WAL)
    #---------------------------------------------------------------------------
    
    # - Memory -
    
    shared_buffers = 1000			# min 16 or max_connections*2, 8KB each
    #temp_buffers = 1000			# min 100, 8KB each
    #max_prepared_transactions = 5		# can be 0 or more
    # note: increasing max_prepared_transactions costs ~600 bytes of shared memory
    # per transaction slot, plus lock space (see max_locks_per_transaction).
    #work_mem = 1024			# min 64, size in KB
    #maintenance_work_mem = 16384		# min 1024, size in KB
    #max_stack_depth = 2048			# min 100, size in KB
    
    # - Free Space Map -
    
    max_fsm_pages = 120000			# min max_fsm_relations*16, 6 bytes each
    #max_fsm_relations = 1000		# min 100, ~70 bytes each
    
    # - Kernel Resource Usage -
    
    #max_files_per_process = 1000		# min 25
    #preload_libraries = ''
    
    # - Cost-Based Vacuum Delay -
    
    #vacuum_cost_delay = 0			# 0-1000 milliseconds
    #vacuum_cost_page_hit = 1		# 0-10000 credits
    #vacuum_cost_page_miss = 10		# 0-10000 credits
    #vacuum_cost_page_dirty = 20		# 0-10000 credits
    #vacuum_cost_limit = 200		# 0-10000 credits
    
    # - Background writer -
    
    #bgwriter_delay = 200			# 10-10000 milliseconds between rounds
    #bgwriter_lru_percent = 1.0		# 0-100% of LRU buffers scanned/round
    #bgwriter_lru_maxpages = 5		# 0-1000 buffers max written/round
    #bgwriter_all_percent = 0.333		# 0-100% of all buffers scanned/round
    #bgwriter_all_maxpages = 5		# 0-1000 buffers max written/round
    
    
    #---------------------------------------------------------------------------
    # WRITE AHEAD LOG
    #---------------------------------------------------------------------------
    
    # - Settings -
    
    #fsync = on				# turns forced synchronization on or off
    #wal_sync_method = fsync		# the default is the first option 
    					# supported by the operating system:
    					#   open_datasync
    					#   fdatasync
    					#   fsync
    					#   fsync_writethrough
    					#   open_sync
    #full_page_writes = on			# recover from partial page writes
    #wal_buffers = 8			# min 4, 8KB each
    #commit_delay = 0			# range 0-100000, in microseconds
    #commit_siblings = 5			# range 1-1000
    
    # - Checkpoints -
    
    #checkpoint_segments = 3		# in logfile segments, min 1, 16MB each
    #checkpoint_timeout = 300		# range 30-3600, in seconds
    #checkpoint_warning = 30		# in seconds, 0 is off
    
    # - Archiving -
    
    #archive_command = ''			# command to use to archive a logfile 
    					# segment
    
    
    #---------------------------------------------------------------------------
    # QUERY TUNING
    #---------------------------------------------------------------------------
    
    # - Planner Method Configuration -
    
    #enable_bitmapscan = on
    #enable_hashagg = on
    #enable_hashjoin = on
    #enable_indexscan = on
    #enable_mergejoin = on
    #enable_nestloop = on
    #enable_seqscan = on
    #enable_sort = on
    #enable_tidscan = on
    
    # - Planner Cost Constants -
    
    effective_cache_size = 300000		# typically 8KB each
    #random_page_cost = 4			# units are one sequential page fetch 
    					# cost
    #cpu_tuple_cost = 0.01			# (same)
    #cpu_index_tuple_cost = 0.001		# (same)
    #cpu_operator_cost = 0.0025		# (same)
    
    # - Genetic Query Optimizer -
    
    #geqo = on
    #geqo_threshold = 12
    #geqo_effort = 5			# range 1-10
    #geqo_pool_size = 0			# selects default based on effort
    #geqo_generations = 0			# selects default based on effort
    #geqo_selection_bias = 2.0		# range 1.5-2.0
    
    # - Other Planner Options -
    
    #default_statistics_target = 10		# range 1-1000
    #constraint_exclusion = off
    #from_collapse_limit = 8
    #join_collapse_limit = 8		# 1 disables collapsing of explicit 
    					# JOINs
    
    
    #---------------------------------------------------------------------------
    # ERROR REPORTING AND LOGGING
    #---------------------------------------------------------------------------
    
    # - Where to Log -
    
    #log_destination = 'stderr'		# Valid values are combinations of 
    					# stderr, syslog and eventlog, 
    					# depending on platform.
    
    # This is used when logging to stderr:
    redirect_stderr = on			# Enable capturing of stderr into log 
    					# files
    
    # These are only used if redirect_stderr is on:
    log_directory = 'pg_log'		# Directory where log files are written
    					# Can be absolute or relative to PGDATA
    #log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # Log file name pattern.
    					# Can include strftime() escapes
    log_truncate_on_rotation = on		   # If on, any existing log file of the same 
    					# name as the new log file will be
    					# truncated rather than appended to. But
    					# such truncation only occurs on
    					# time-driven rotation, not on restarts
    					# or size-driven rotation. Default is
    					# off, meaning append to existing files
    					# in all cases.
    log_rotation_age = 1440			# Automatic rotation of logfiles will 
    					# happen after so many minutes.  0 to 
    					# disable.
    log_rotation_size = 0			# Automatic rotation of logfiles will 
    					# happen after so many kilobytes of log
    					# output.  0 to disable.
    
    # These are relevant when logging to syslog:
    #syslog_facility = 'LOCAL0'
    #syslog_ident = 'postgres'
    
    
    # - When to Log -
    
    #client_min_messages = notice		# Values, in order of decreasing detail:
    					#   debug5
    					#   debug4
    					#   debug3
    					#   debug2
    					#   debug1
    					#   log
    					#   notice
    					#   warning
    					#   error
    
    #log_min_messages = notice		# Values, in order of decreasing detail:
    					#   debug5
    					#   debug4
    					#   debug3
    					#   debug2
    					#   debug1
    					#   info
    					#   notice
    					#   warning
    					#   error
    					#   log
    					#   fatal
    					#   panic
    
    #log_error_verbosity = default		# terse, default, or verbose messages
    
    #log_min_error_statement = panic	# Values in order of increasing severity:
    				 	#   debug5
    					#   debug4
    					#   debug3
    					#   debug2
    					#   debug1
    				 	#   info
    					#   notice
    					#   warning
    					#   error
    					#   panic(off)
    				 
    #log_min_duration_statement = -1	# -1 is disabled, 0 logs all statements
    					# and their durations, in milliseconds.
    
    #silent_mode = off			# DO NOT USE without syslog or 
    					# redirect_stderr
    
    # - What to Log -
    
    #debug_print_parse = off
    #debug_print_rewritten = off
    #debug_print_plan = off
    #debug_pretty_print = off
    #log_connections = off
    #log_disconnections = off
    #log_duration = off
    #log_line_prefix = ''			# Special values:
    					#   %u = user name
    					#   %d = database name
    					#   %r = remote host and port
    					#   %h = remote host
    					#   %p = PID
    					#   %t = timestamp (no milliseconds)
    					#   %m = timestamp with milliseconds
    					#   %i = command tag
    					#   %c = session id
    					#   %l = session line number
    					#   %s = session start timestamp
    					#   %x = transaction id
    					#   %q = stop here in non-session 
    					#		processes
    					#   %% = '%'
    					# e.g. '<%u%%%d> '
    #log_statement = 'none'			# none, mod, ddl, all
    #log_hostname = off
    
    
    #---------------------------------------------------------------------------
    # RUNTIME STATISTICS
    #---------------------------------------------------------------------------
    
    # - Statistics Monitoring -
    
    #log_parser_stats = off
    #log_planner_stats = off
    #log_executor_stats = off
    #log_statement_stats = off
    
    # - Query/Index Statistics Collector -
    
    stats_start_collector = on
    stats_command_string = on
    #stats_block_level = off
    stats_row_level = on
    #stats_reset_on_server_start = off
    
    
    #---------------------------------------------------------------------------
    # AUTOVACUUM PARAMETERS
    #---------------------------------------------------------------------------
    
    autovacuum = on				# enable autovacuum subprocess?
    autovacuum_naptime = 600		# time between autovacuum runs, in secs
    #autovacuum_vacuum_threshold = 1000	# min # of tuple updates before
    					# vacuum
    #autovacuum_analyze_threshold = 500	# min # of tuple updates before 
    					# analyze
    #autovacuum_vacuum_scale_factor = 0.4	# fraction of rel size before 
    					# vacuum
    #autovacuum_analyze_scale_factor = 0.2	# fraction of rel size before 
    					# analyze
    #autovacuum_vacuum_cost_delay = -1	# default vacuum cost delay for 
    					# autovac, -1 means use 
    					# vacuum_cost_delay
    #autovacuum_vacuum_cost_limit = -1	# default vacuum cost limit for 
    					# autovac, -1 means use
    					# vacuum_cost_limit
    
    
    #---------------------------------------------------------------------------
    # CLIENT CONNECTION DEFAULTS
    #---------------------------------------------------------------------------
    
    # - Statement Behavior -
    
    #search_path = '$user,public'		# schema names
    #default_tablespace = ''		# a tablespace name, '' uses
    					# the default
    #check_function_bodies = on
    #default_transaction_isolation = 'read committed'
    #default_transaction_read_only = off
    #statement_timeout = 0			# 0 is disabled, in milliseconds
    
    # - Locale and Formatting -
    
    #datestyle = 'iso, mdy'
    #timezone = unknown			# actually, defaults to TZ 
    					# environment setting
    #australian_timezones = off
    #extra_float_digits = 0			# min -15, max 2
    #client_encoding = sql_ascii		# actually, defaults to database
    					# encoding
    
    # These settings are initialized by initdb -- they might be changed
    lc_messages = 'C'			# locale for system error message 
    					# strings
    lc_monetary = 'C'			# locale for monetary formatting
    lc_numeric = 'C'			# locale for number formatting
    lc_time = 'C'				# locale for time formatting
    
    # - Other Defaults -
    
    #explain_pretty_print = on
    #dynamic_library_path = '$libdir'
    
    
    #---------------------------------------------------------------------------
    # LOCK MANAGEMENT
    #---------------------------------------------------------------------------
    
    #deadlock_timeout = 1000		# in milliseconds
    #max_locks_per_transaction = 64		# min 10
    # note: each lock table slot uses ~220 bytes of shared memory, and there are
    # max_locks_per_transaction * (max_connections + max_prepared_transactions)
    # lock table slots.
    
    
    #---------------------------------------------------------------------------
    # VERSION/PLATFORM COMPATIBILITY
    #---------------------------------------------------------------------------
    
    # - Previous Postgres Versions -
    
    #add_missing_from = off
    #backslash_quote = safe_encoding	# on, off, or safe_encoding
    #default_with_oids = off
    #escape_string_warning = off
    #regex_flavor = advanced		# advanced, extended, or basic
    #sql_inheritance = on
    
    # - Other Platforms & Clients -
    
    #transform_null_equals = off
    
    
    #---------------------------------------------------------------------------
    # CUSTOMIZED OPTIONS
    #---------------------------------------------------------------------------
    
    #custom_variable_classes = ''		# list of custom variable class names


There was three months worth of rollback data in the database. There were 1.5 million tuples in sq_rb_ast_lnk.

As I say, we've fixed it now by turning rollback off, because we don't actually need that level of functionality. However, there are a couple of other potentially inefficient queries I've noticed, especially around the logs screen in the backend, which I'll post about tomorrow if I get some time.

(Avi Miller) #9

That postgresql.conf looks like it's been almost tuned. There are non-default settings, but they don't look like they've been setup to match your hardware -- or at least, they don't look like the postgresql.conf files on similar spec'd boxes I've been looking at recently. If Squiz is involved in your PostgreSQL configuration, I would log a support request and ask the sysadmins to run our PostgreSQL auto-tuning process on the box.