Removing URLs from massive sets of assets - best practice?


(Douglas (@finnatic at @waikato)) #1

We need to remove a number of site URLs from some of our assets. As per my other thread, there’s an excessive number of assets, and when I do it via _admin the HIPO job begins but I expect it to fail given the number of assets it appears to be working on.

So.

What is the best practice for removing URLs from massive sets of assets?

In the past we’ve had to resort to using system_update_lookups.php after the HIPO jobs stall. I’ve looked at https://matrix.squiz.net/manuals/server-administrator/chapters/system-management-scripts but don’t see any scripts there that look helpful for removing a site URL.

That suggests the best tactic might be to update the URLs on the site asset, kill the HIPO job manually after a few minutes, and then run the system_update_lookups.php script from the command line?

Any tactics known to work, or options to consider, would be appreciated.


(Douglas (@finnatic at @waikato)) #2

Alternatively, how safe / effective is the Add/Remove URL Script?


(Marcus Fong) #3

That script is what I’ve always used for bulk URL removal. It’s considerably faster than removing URLs via the Matrix admin interface, since it operates directly on the database.

I don’t recall encountering any issues after removal, although obviously you do have to be very careful not to mistakenly remove the wrong URL. Normally we would take a backup before making large-scale changes, just in case.
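
As a rough illustration of the backup step, here is a minimal Python sketch that builds a timestamped pg_dump command. It assumes the Matrix install uses PostgreSQL and that pg_dump is on the PATH; the database name and output directory are placeholders, and the function names are made up for this example rather than anything shipped with Matrix.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def build_backup_command(db_name, out_dir, when=None):
    """Return a pg_dump argument list that writes a timestamped dump file."""
    when = when or datetime.now()
    stamp = when.strftime("%Y%m%d_%H%M%S")
    # Hypothetical naming scheme so the dump is easy to find afterwards.
    out_file = Path(out_dir) / f"{db_name}_pre_url_removal_{stamp}.dump"
    # -Fc selects pg_dump's custom archive format, restorable with pg_restore.
    return ["pg_dump", "-Fc", "-f", str(out_file), db_name]


def run_backup(db_name, out_dir):
    """Run the dump; raises CalledProcessError if pg_dump exits non-zero."""
    cmd = build_backup_command(db_name, out_dir)
    subprocess.run(cmd, check=True)
    return cmd[3]  # path of the dump file that was written
```

Calling run_backup("matrix_db", "/var/backups") (both values are assumptions) before running the URL script would leave a dump that can be restored with pg_restore if the wrong URL gets removed.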