Orphaned files - how do I find them all?


(Cathy Wells) #1

I was wondering if there is any way to find out which assets (specifically file assets) don't have any notice links. I know you can see that information for each asset on the linking screen, but I want to do it in bulk.


Basically, I want to know what files are orphaned on the system, and if I can see a list of assets that don't have any notice links then they could pretty much be removed - especially if they are sitting under the media folder.



If a list of files without notice links can't be generated in Matrix, how do other people bulk check for orphan files on their sites?


(Greg Sherwood) #2

There is no script to list which assets have no NOTICE links, but a DB query could be used to find them if required, or a script written.


However, just looking for assets without NOTICE links is not always good enough. Some File assets are used in searches, listings, or even in menus - and those links are either non-existent or not NOTICE links.



Obviously, you know your system better than we do and know where content is used, but those are a few things to consider for everyone.
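
As a rough illustration of the "script written" option, here is a minimal Python sketch. It assumes the asset and link data has already been exported from the database into plain Python values; the `Link` record, its field names and the link type labels are hypothetical and are not the real Matrix schema. As noted above, a file that comes back from this may still be used in a listing, search or menu, so the result is a list of candidates to review rather than a list of files that are safe to delete.

```python
from typing import Iterable, List, NamedTuple


class Link(NamedTuple):
    """Hypothetical exported link record (not the real Matrix schema)."""
    major_id: str   # asset that holds the link, e.g. a page
    minor_id: str   # asset being linked to, e.g. a file
    link_type: str  # assumed labels such as "TYPE_1", "TYPE_2", "NOTICE"


def files_without_notice_links(file_ids: Iterable[str],
                               links: Iterable[Link]) -> List[str]:
    """Return the file asset IDs that are not the target of any NOTICE link."""
    noticed = {link.minor_id for link in links if link.link_type == "NOTICE"}
    return sorted(fid for fid in file_ids if fid not in noticed)
```

Calling `files_without_notice_links(["101", "102"], [Link("10", "101", "NOTICE")])` would leave only "102" as a candidate.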


(Ben) #3

Is there any update to this issue? We have a media folder which has a fair old whack of stale content, but there doesn't seem to be any easy way of producing a report that lists content that might be stale.


And further, when you are looking at a media asset, is there any way of determining whether it is in fact used in listings or menus if you don't already know? This would be important when you have a large number of editors, and a bit of turnover in who is looking after which content.



I guess the issue for the future is training people to archive, delete or otherwise mark stale media assets, but it would be just super if there was a reasonably quick way of creating a list of assets that people at least needed to look at…




(Daniel Nitsche) #4

It depends what you define as stale content. Is it content that hasn’t been updated or accessed recently? Or is it content that is no longer relevant, or maybe content that is no longer used? Some of those Matrix can help you with; others, like content usage, could be determined by an external web stats application.


If you’re talking about content that is no longer linked to, I wouldn’t rely entirely on Matrix’s list of notice links. You could crawl your site and produce an external report on where each asset is being linked from. Google even has a crack at this for you, e.g.:

http://www.google.com/search?q=link%3Amatrix.squiz.net%2Fresources%2Finstallation
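
If you want to build the crawl report yourself, the reporting half could look something like the Python sketch below. It assumes a crawler has already fetched the pages into a dict of page URL to HTML source; the function and variable names are made up for illustration. It only sees links that appear in `href` or `src` attributes, so files referenced from scripts or CSS would be missed.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect href and src attribute values from one HTML document."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.targets.append(value)


def inbound_link_report(pages, media_urls):
    """Map each media URL to the sorted list of page URLs that reference it.

    pages: dict of page URL -> HTML source (already fetched by a crawler).
    media_urls: iterable of absolute URLs of the files to check.
    """
    report = {url: set() for url in media_urls}
    for page_url, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        for target in collector.targets:
            # Resolve relative links and drop any #fragment before comparing.
            absolute, _fragment = urldefrag(urljoin(page_url, target))
            if absolute in report:
                report[absolute].add(page_url)
    return {url: sorted(refs) for url, refs in report.items()}
```

Any media URL that maps to an empty list is not referenced from the crawled pages and is worth a closer look.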


(Ben) #5

Thanks Daniel, the Google link hint is a good one.


What we're trying to define as "stale" is "assets in the Media folder which have no links to other, currently live, assets".
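
To make that definition concrete, here is a small Python sketch of the check. It reads the definition as "no currently live asset links to the file", and it assumes the links and asset statuses have been exported into plain Python values; the `"live"` status label and all names here are hypothetical.

```python
def stale_media(media_ids, links, status_of):
    """Return media asset IDs that no live asset links to.

    media_ids: iterable of asset IDs found under the Media folder.
    links: iterable of (linking_asset_id, linked_asset_id) pairs, any link type.
    status_of: dict of asset ID -> status string ("live" is an assumed label).
    """
    linked_from_live = {
        linked for linking, linked in links
        if status_of.get(linking) == "live"
    }
    return sorted(m for m in media_ids if m not in linked_from_live)
```

A file linked only from an archived page counts as stale here, which matches the superseded PDF case.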



Mostly the issue is with PDF files that have been superseded by updated documents, where the user has unlinked the file from the live page and replaced it with the updated file (good), but has not removed, refiled or archived the old file (bad!). Because of the nature of our setup (several business units managing their own content and editors), there's not really central control of content. Content managers are often moved from one BU to another, and the incoming editor will typically throw up their hands at the scale of the job of sifting out redundant content, precisely because there appears to be no simple way of telling whether a file really is redundant without opening each one. That's OK for twenty or fifty files, not so much for 500 or so.



The problem is compounded because the media content is always available on the web, so people who bookmark those files can find themselves getting incorrect or out-of-date information - a big problem for things like tenders and other date-sensitive commercial documents.



Anyway, I guess we will keep looking around for a potential solution.



Cheers,



Ben




(Dw Andrew) #6

Did you have any luck with this, beeroll?


(Amurray) #7

We're keen on this one too!