Potential cron manager dead lock issue

Hi,


In my inbox I keep getting this message:



POTENTIAL CRON MANAGER DEAD LOCK ISSUE


    The cron manager for the system "The System" has attempted to run 3 times, but has not been able to for one of two reasons:
    1) An existing cron manager is currently running an extremely long running job.
    2) Something has caused the cron manager to exit uncleanly and the lock wasn't able to be removed.


I have no clue why this is happening.

Is this something I should be worried about?

thanks.

Go to the Running Status screen of your Cron Manager. There you should see the Dead Lock Status section. You should be able to clear the deadlock from here.

Thanks Nic,


Do you know why this happens? It only recently started happening.

We tend to get this message after performing otherwise successful upgrades. (We're now on Matrix 3.22.3) Most recently the "Manage Pending Accounts Cron Job" was the one that caused the deadlock. I've examined the log files, but I can't work out why this happens.


While it's possible to remove the deadlock using the "Dead Lock Status" section of the Cron Manager, the fact that you have to enter a 25-character CAPTCHA string makes it look as if you're undertaking a very scary and serious operation (somewhat akin to launching a nuclear missile maybe). Does anyone know if it's really such a big deal to remove the dead lock? Is there the potential to cause serious problems to the system?

The CAPTCHA is large as there is potential risk when removing a deadlock as related processing may still be running. If this is happening frequently and/or without explanation then I would recommend contacting Squiz Support to track down possible contributing factors.

Are you turning off the cron job during your upgrade I think this is in the upgrade steps somewhere
crontab -u apache -e

Comment out the cron job

*/15 * * * * php /home/websites/mysource_matrix/core/cron/run.php

We're having exactly the same problem here, about 2000 emails I've received so far. :rolleyes: I've looked at the Cron Manager and there is nothing showing a deadlock. Any other suggestions? I'm not sure how to proceed. We are a bit behind on patch levels, at 3.20.1.


Thanks for any suggestions.



Ivan

I'm getting this problem again, but I don't see any dead lock status in the Running Options for Cron Manager.




How do you go about diagnosing this sort of issue?

[quote]
I'm getting this problem again, but I don't see any dead lock status in the Running Options for Cron Manager.





How do you go about diagnosing this sort of issue?

[/quote]



What version of matrix?

Do you have rollback enabled?



If you do have rollback enabled and you're running something before 4.4.0 that could be the reason - cron jobs are overlapping and as the message says, it tries to run again and sees that one is already running (and the 2nd attempt aborts). The upgrade to something more recent would be worth it for this reason and a lot of other improvements and new features.

I'm running v4.0.0 and rollback is disabled under roll back management.




Maybe it is time to upgrade.





Thanks.

[quote]
I'm running v4.0.0 and rollback is disabled under roll back management.





Maybe it is time to upgrade.

[/quote]



I highly doubt an upgrade is going to fix this. There is still something that is causing it to be locked. Go to the running status screen and see if it says what it thinks the deadlock is being caused by. Most likely it is something that is trying to set a cron job to a time in the past.



These are always annoying as you need to unlock and clear them a few times before it will be working again. Just try that and see what happens.

[quote]
I'm running v4.0.0 and rollback is disabled under roll back management.





Maybe it is time to upgrade.





Thanks.

[/quote]



As Nic mentioned, an upgrade won't necessarily fix this.



What cron jobs are enabled in matrix?



Can you check matrix logs (/path/to/matrix/data/private/logs/) in particular cron_errors.log and errors.log and database logs too. Any query errors showing in the database logs ?

This is what it shows on the cron running status.








I don't have shell access at the moment but this is the message from Log Manager.


    
    [2011-10-28 01:00:02][12:Root User][1024:mysource notice][ ] [cron.job.success - ] (Generate Log Reports Cron Job Success) - Generate Log Reports Cron Job
    Scheduled : Every time the cron manager runs
    Completed Successfully
    
    
    
    Error Messages :
    Nil
    [2011-10-28 01:00:02][12:Root User][1024:mysource notice][ ] [cron.job.success - ] (Cron Job Metadata Triggers Success) - Metadata Triggers Cron Job
    Scheduled : Every time the cron manager runs
    Completed Successfully
    
    
    
    Error Messages :
    Nil
    [2011-10-28 01:00:02][12:Root User][1024:mysource notice][ ] [cron.job.success - ] (Expired Matrix Cache Cron Job Success) - Expired Matrix Cache Cron Job
    Scheduled : Every time the cron manager runs
    Completed Successfully
    
    
    
    Error Messages :
    Nil
    [2011-10-28 01:00:02][12:Root User][1024:mysource notice][ ] [cron.job.success - ] (Expired Matrix Locks Cron Job Success) - Expired Matrix Locks Cron Job
    Scheduled : Every time the cron manager runs
    Completed Successfully
    
    
    
    Error Messages :
    Nil
    [2011-10-28 01:00:02][12:Root User][1024:mysource notice][ ] [cron.job.success - ] (Manage Pending Accounts Cron Job Success) - Manage Pending Accounts Cron Job
    Scheduled : Every time the cron manager runs
    Completed Successfully
    
    
    
    Error Messages :
    Nil
    [2011-10-28 01:00:02][12:Root User][1024:mysource notice][ ] [cron.job.success - ] (Manage Incomplete Submissions Cron Job Success) - Manage Incomplete Submissions Cron Job
    Scheduled : Every time the cron manager runs
    Completed Successfully
    
    
    
    Error Messages :
    Nil
    [2011-10-28 01:00:02][12:Root User][1024:mysource notice][ ] [asset.attributes.fulllog.scalar - assetid:16;] (Asset Attribute Changed) - The value of attribute "last_run" for asset "Cron Manager" has been changed from "1319758200.000000" to "1319758200"
    [2011-10-28 01:00:02][12:Root User][1024:mysource notice][ ] [asset.attributes.scalar - assetid:16;] (Asset Attribute Changed) - The value of attribute "last_run" for asset "Cron Manager" has been changed from "1319758200.000000" to ""
    [2011-10-28 01:00:02][12:Root User][1024:mysource notice][ ] [asset.attributes.fulllog.scalar - assetid:16;] (Asset Attribute Changed) - The value of attribute "running" for asset "Cron Manager" has been changed from "1" to "0"
    [2011-10-28 01:00:02][12:Root User][1024:mysource notice][ ] [asset.attributes.scalar - assetid:16;] (Asset Attribute Changed) - The value of attribute "running" for asset "Cron Manager" has been changed from "" to ""
    [2011-10-28 01:15:01][12:Root User][1024:mysource notice][ ] [asset.attributes.fulllog.scalar - assetid:16;] (Asset Attribute Changed) - The value of attribute "running" for asset "Cron Manager" has been changed from "0" to "1"
    [2011-10-28 01:15:01][12:Root User][1024:mysource notice][ ] [asset.attributes.scalar - assetid:16;] (Asset Attribute Changed) - The value of attribute "running" for asset "Cron Manager" has been changed from "0" to ""
    [2011-10-28 01:15:01][12:Root User][1024:mysource notice][ ] [asset.attributes.fulllog.scalar - assetid:16;] (Asset Attribute Changed) - The value of attribute "run_check" for asset "Cron Manager" has been changed from "0" to "0"
    [2011-10-28 01:15:01][12:Root User][1024:mysource notice][ ] [asset.attributes.scalar - assetid:16;] (Asset Attribute Changed) - The value of attribute "run_check" for asset "Cron Manager" has been changed from "0" to ""
    [2011-10-28 01:15:02][12:Root User][1024:mysource notice][ ] [asset.attributes.fulllog.scalar - assetid:16;] (Asset Attribute Changed) - The value of attribute "current_job" for asset "Cron Manager" has been changed from "" to "42"
    [2011-10-28 01:15:02][12:Root User][1024:mysource notice][ ] [asset.attributes.scalar - assetid:16;] (Asset Attribute Changed) - The value of attribute "current_job" for asset "Cron Manager" has been changed from "" to "42"
    [2011-10-28 01:15:02][12:Root User][1024:mysource notice][ ] [asset.attributes.fulllog.scalar - assetid:16;] (Asset Attribute Changed) - The value of attribute "current_job" for asset "Cron Manager" has been changed from "42" to ""
    [2011-10-28 01:15:02][12:Root User][1024:mysource notice][ ] [asset.attributes.scalar - assetid:16;] (Asset Attribute Changed) - The value of attribute "current_job" for asset "Cron Manager" has been changed from "42" to ""
    [2011-10-28 01:15:02][12:Root User][1024:mysource notice][ ] [asset.attributes.fulllog.scalar - assetid:16;] (Asset Attribute Changed) - The value of attribute "current_job" for asset "Cron Manager" has been changed from "" to "43"
    [2011-10-28 01:15:02][12:Root User][1024:mysource notice][ ] [asset.attributes.scalar - assetid:16;] (Asset Attribute Changed) - The value of attribute "current_job" for asset "Cron Manager" has been changed from "" to "43"
    [2011-10-28 01:15:02][12:Root User][1024:mysource notice][ ] [asset.attributes.fulllog.scalar - assetid:16;] (Asset Attribute Changed) - The value of attribute "current_job" for asset "Cron Manager" has been changed from "43" to ""
    [2011-10-28 01:15:02][12:Root User][1024:mysource notice][ ] [asset.attributes.scalar - assetid:16;] (Asset Attribute Changed) - The value of attribute "current_job" for asset "Cron Manager" has been changed from "43" to ""
    

[quote]

    
    [2011-10-28 01:15:01][12:Root User][1024:mysource notice][ ] [asset.attributes.fulllog.scalar - assetid:16;] (Asset Attribute Changed) - The value of attribute "running" for asset "Cron Manager" has been changed from "0" to "1"
    [2011-10-28 01:15:01][12:Root User][1024:mysource notice][ ] [asset.attributes.scalar - assetid:16;] (Asset Attribute Changed) - The value of attribute "running" for asset "Cron Manager" has been changed from "0" to ""
    [2011-10-28 01:15:01][12:Root User][1024:mysource notice][ ] [asset.attributes.fulllog.scalar - assetid:16;] (Asset Attribute Changed) - The value of attribute "run_check" for asset "Cron Manager" has been changed from "0" to "0"
    [2011-10-28 01:15:01][12:Root User][1024:mysource notice][ ] [asset.attributes.scalar - assetid:16;] (Asset Attribute Changed) - The value of attribute "run_check" for asset "Cron Manager" has been changed from "0" to ""
    [2011-10-28 01:15:02][12:Root User][1024:mysource notice][ ] [asset.attributes.fulllog.scalar - assetid:16;] (Asset Attribute Changed) - The value of attribute "current_job" for asset "Cron Manager" has been changed from "" to "42"
    [2011-10-28 01:15:02][12:Root User][1024:mysource notice][ ] [asset.attributes.scalar - assetid:16;] (Asset Attribute Changed) - The value of attribute "current_job" for asset "Cron Manager" has been changed from "" to "42"
    [2011-10-28 01:15:02][12:Root User][1024:mysource notice][ ] [asset.attributes.fulllog.scalar - assetid:16;] (Asset Attribute Changed) - The value of attribute "current_job" for asset "Cron Manager" has been changed from "42" to ""
    [2011-10-28 01:15:02][12:Root User][1024:mysource notice][ ] [asset.attributes.scalar - assetid:16;] (Asset Attribute Changed) - The value of attribute "current_job" for asset "Cron Manager" has been changed from "42" to ""
    [2011-10-28 01:15:02][12:Root User][1024:mysource notice][ ] [asset.attributes.fulllog.scalar - assetid:16;] (Asset Attribute Changed) - The value of attribute "current_job" for asset "Cron Manager" has been changed from "" to "43"
    [2011-10-28 01:15:02][12:Root User][1024:mysource notice][ ] [asset.attributes.scalar - assetid:16;] (Asset Attribute Changed) - The value of attribute "current_job" for asset "Cron Manager" has been changed from "" to "43"
    [2011-10-28 01:15:02][12:Root User][1024:mysource notice][ ] [asset.attributes.fulllog.scalar - assetid:16;] (Asset Attribute Changed) - The value of attribute "current_job" for asset "Cron Manager" has been changed from "43" to ""
    [2011-10-28 01:15:02][12:Root User][1024:mysource notice][ ] [asset.attributes.scalar - assetid:16;] (Asset Attribute Changed) - The value of attribute "current_job" for asset "Cron Manager" has been changed from "43" to ""

[/quote]



That seems a little strange. Do you have matrix hooked in to any squiz-suite products? I wonder if you're running into this bug: http://bugs.matrix.squiz.net/view_bug.php?bug_id=5035 (if you don't have matrix connected to any squiz-suite products, then this won't be the problem).

No it wasn't hooked to any squiz suite, although i was trying to hook it to test sage pay.


I decided to delete the whole server and start again, but I still getting these emails!!



Haunted server!! :ph34r:

[quote]
No it wasn't hooked to any squiz suite, although i was trying to hook it to test sage pay.



I decided to delete the whole server and start again, but I still getting these emails!!



Haunted server!! :ph34r:

[/quote]



Which job is actually deadlocked? you can tell from the running status then follow the path to investigate. We had deadlocks from "Expired Matrix Cache".