When you have a large number of valid cache entries and you try to run the cache cleanup script, the time it takes balloons with the number of valid entries. This is down to the way the PHP function in_array() works: it does a linear scan of the array, so checking every cache file against the valid list costs roughly (number of files) x (number of valid entries) comparisons.
We had some 50,000 valid cache entries, but hadn't run the script for a few months, so had somewhere in the region of 10 GB of cache files. When we tried to run it, it took three weeks! Over that time it managed to delete numerous valid entries, as its "valid" list had been generated long before all the new entries were created.
I've rewritten the script (see below). The first run took 2 hours to clear up the whole lot, and subsequent runs take about 10 minutes, quick enough to now run it from a daily cron.
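To illustrate the in_array() point, here is a minimal sketch comparing a linear scan with a keyed lookup. The data and the function names (isValidScan, isValidLookup) are made up for the example and are not part of the original script; the keyed lookup is the same idea the rewritten script uses with isset().

```php
<?php
// Hypothetical data: $valid stands in for the list of valid cache paths.
$valid = array();
for ($i = 0; $i < 5000; $i++) {
    $valid[] = sprintf('%04d/%s', $i % 100, md5((string) $i));
}

// Slow approach: in_array() walks the list element by element
// until it finds a match (or reaches the end).
function isValidScan($path, $valid_list)
{
    return in_array($path, $valid_list);
}

// Fast approach: flip the list so the paths become array keys.
// isset() on a key is then a single hash lookup, however long the list is.
$valid_lookup = array_flip($valid);

function isValidLookup($path, $valid_lookup)
{
    return isset($valid_lookup[$path]);
}

$present = $valid[1234];              // a path that is in the list
$missing = '9999/not-a-real-entry';   // a path that is not

var_dump(isValidScan($present, $valid));          // bool(true)
var_dump(isValidLookup($present, $valid_lookup)); // bool(true)
var_dump(isValidScan($missing, $valid));          // bool(false)
var_dump(isValidLookup($missing, $valid_lookup)); // bool(false)
```

Both functions give the same answers; the difference is only in how much work each call does once the list gets long.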
The script relies on a little bit of Postgres trickery, so you will need to create a couple of indexes (one of them a bit unusual) for it to work.
Index creation statements:
CREATE INDEX sq_cache_path ON sq_cache USING btree (path);
CREATE INDEX sq_cache_dir ON sq_cache USING btree (substring(path, 1, 4));
For any readers who've not seen the latter syntax before: it tells Postgres to index the [b]result[/b] of the function rather than the raw column value (an expression index). WHERE clauses that use that function with exactly the same parameters can then use the index, which makes them practically instant.
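As a rough illustration of what that indexed expression picks out, the sketch below does the same split in PHP. It assumes the path layout implied by the script's queries (a 4-character bucket directory, a slash, then the file name); splitCachePath() and the sample path are made up for the example. Note that SQL substring() counts from 1 while PHP substr() counts from 0.

```php
<?php
// Assumed layout of sq_cache.path, inferred from the script's queries:
// "<4-char bucket>/<file name>".
function splitCachePath($path)
{
    return array(
        'bucket'   => substr($path, 0, 4), // same as SQL substring(path, 1, 4)
        'filename' => substr($path, 5),    // same as SQL substring(path, 6)
    );
}

$parts = splitCachePath('0042/3f2a9c'); // made-up example path
echo $parts['bucket'], "\n";   // 0042
echo $parts['filename'], "\n"; // 3f2a9c
```

The script filters on substring(path, 1, 4) with exactly the arguments used in the sq_cache_dir index, so the per-bucket SELECT can use that index.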
The code...
error_reporting(E_ALL);
if ((php_sapi_name() != 'cli')) {
    trigger_error("You can only run this script from the command line\n", E_USER_ERROR);
}

$SYSTEM_ROOT = (isset($_SERVER['argv'][1])) ? $_SERVER['argv'][1] : '';
if (empty($SYSTEM_ROOT) || !is_dir($SYSTEM_ROOT)) {
    echo "ERROR: You need to supply the path to the System Root as the first argument\n";
    exit();
}

require_once $SYSTEM_ROOT.'/core/include/init.inc';

echo "\nWarning: Please make sure you have the correct permission to remove cache files.\n";
echo 'SQ_CACHE_PATH is \''.SQ_CACHE_PATH."'\n\n";

// ask for the root password for the system
echo 'Enter the root password for "'.SQ_CONF_SYSTEM_NAME.'": ';
$root_password = rtrim(fgets(STDIN, 4094));

// check that the correct root password was entered
$root_user = & $GLOBALS['SQ_SYSTEM']->am->getSystemAsset('root_user');
if (!$root_user->comparePassword($root_password)) {
    echo "ERROR: The root password entered was incorrect\n";
    exit();
}

// log in as root
if (!$GLOBALS['SQ_SYSTEM']->setCurrentUser($root_user)) {
    trigger_error("Failed login in as root user\n", E_USER_ERROR);
}

$cache_path_len = strlen(SQ_CACHE_PATH) + 1;

// Firstly clear all expired entries from sq_cache table
$GLOBALS['SQ_SYSTEM']->changeDatabaseConnection('dbcache');
$GLOBALS['SQ_SYSTEM']->doTransaction('BEGIN');
$db =& $GLOBALS['SQ_SYSTEM']->db;

$str = "Clearing expired entries from sq_cache table";
printf ('%s%'.(60 - strlen($str)).'s', $str,'');

$sql = 'DELETE FROM sq_cache WHERE expires < NOW()';
$result = $db->query($sql);
assert_valid_db_result($result);

$GLOBALS['SQ_SYSTEM']->doTransaction('COMMIT');
$GLOBALS['SQ_SYSTEM']->restoreDatabaseConnection();
printStatus('OK');

$str = "Finding cache buckets";
printf ('%s%'.(60 - strlen($str)).'s', $str,'');

// get all the cache directory names
exec('find '.SQ_CACHE_PATH." -type d -maxdepth 1 -name '[0-9]*' | sort", $current_dirs);
printStatus('OK');

$count = 0;
$total = 0;

// loop through each directory, to make it less memory intensive
foreach ($current_dirs as $dir) {
    $bucket = substr($dir, -4);

    $str = "Getting valid entries from sq_cache table for bucket $bucket";
    printf ('%s%'.(60 - strlen($str)).'s', $str,'');

    // get valid entries from the database
    $GLOBALS['SQ_SYSTEM']->changeDatabaseConnection('dbcache');
    $db =& $GLOBALS['SQ_SYSTEM']->db;
    $sql = 'SELECT substring(path,6) AS filename FROM sq_cache WHERE expires > NOW() AND substring(path, 1, 4) = '.$db->quote($bucket).' ';
    $result = $db->getCol($sql);
    assert_valid_db_result($result);
    $GLOBALS['SQ_SYSTEM']->restoreDatabaseConnection();

    // Convert the result into an associative array
    foreach ($result as $valid_file) {
        $valid_files[$bucket.'/'.$valid_file] = true;
    }
    printStatus('OK');

    $current_files = Array();
    $str = "\tFinding files in bucket $bucket";
    printf ('%s%'.(50 - strlen($str)).'s', $str,'');

    // remove the file if there isnt a corresponding entry in the sq_cache table
    exec("find $dir -type f -name '[a-z0-9]*' | sort", $current_files);
    printStatus('OK');

    foreach ($current_files as $file) {
        $file_name = substr($file, $cache_path_len);
        if (!isset($valid_files[$file_name])) {
            $total++;
            printFileName($file_name);
            $status = @unlink(SQ_CACHE_PATH.'/'.$file_name);
            $ok = ($status) ? 'OK' : 'FAILED';
            printStatus($ok);
            if ($status) $count++;
        }
    }
}

echo "\nSummary: $count/$total cache file(s) removed.\n";
if ($count != $total) {
    $problematic = $total - $count;
    trigger_error("$problematic file(s) cannot be removed, please check file permission.", E_USER_WARNING);
}


/**
* Prints the file path to be removed
*
* @param string $file_name the name of the cache file
*
* @return void
* @access public
*/
function printFileName($file_name)
{
    $str = "\tRemoving ".$file_name;
    printf ('%s%'.(50 - strlen($str)).'s', $str,'');

}//end printFileName()


/**
* Prints the status of the container integrity check
*
* @param string $status the status of the check
*
* @return void
* @access public
*/
function printStatus($status)
{
    echo "[ $status ]\n";

}//end printStatus()</pre>