Files across multiple sites


(Ian Hirst) #1

Matrix Version:5.5.3.3
We run Squiz Matrix with multiple system root URLs, serving multiple websites.

A URL is only applied to its own site.

When we publish a PDF file, the URL resolves to:

https://website.url/__data/assets/pdf_file/0011/10505/docname.pdf

This is totally normal.

However, if we take the __data/assets/pdf_file/0011/10505/docname.pdf part of the URL and append it to one of our other website domains, e.g.

https://otherwebsite.url/__data/assets/pdf_file/0011/10505/docname.pdf

Then the file will display, even though the otherwebsite URL is not applied to the asset.

This is causing us problems with a custom Google search engine, because it is showing files from a completely separate website in the search results.

Can anyone replicate this, or has anyone else experienced it?


(Aleks Bochniak) #2

Google can only display this URL in its search results if it has found the URL somewhere on your site.

__data is served directly by your web server. Check how your web server’s configuration handles that path for each of these sites.


(Bart Banda) #3

Hi Ian, this is expected behaviour: the file exists on the file system and the web server serves it directly from there, without the request going through the Matrix application.

As Aleks said, as long as you don’t have links anywhere on your site pointing to that version of the URL, it should eventually drop out of the search results, unless of course other people on the internet are linking to it.


(Ian Hirst) #4

That’d be fine, except a link to that URL has never been published, and the alternative domain URL has never even been applied to the asset.


(David Schoen) #5

Google has to find the URL from somewhere. If the __data path on the site is configured to list files out (not recommended), it might have just crawled the directory structure - otherwise the file had to be linked on that domain from somewhere (not necessarily your site).

Can you review access logs to see if a referrer is listed when the crawler downloads the file?
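
If it helps, here is a rough Python sketch for pulling the referrer out of the access log for crawler requests to that file. It assumes the log uses the common "combined" format (as written by default Apache and nginx setups); the function name and the sample line are made up for illustration, so adjust the pattern if your log format differs.

```python
import re

# Combined Log Format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def find_referrers(lines, path_fragment, agent_fragment="Googlebot"):
    """Return (time, path, referer) for crawler requests matching path_fragment."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that are not in the assumed format
        if path_fragment in m["path"] and agent_fragment.lower() in m["agent"].lower():
            hits.append((m["time"], m["path"], m["referer"]))
    return hits


# Made-up sample line, only to show the expected shape of the input.
sample = [
    '192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] '
    '"GET /__data/assets/pdf_file/0011/10505/docname.pdf HTTP/1.1" 200 2326 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

for time, path, referer in find_referrers(sample, "/__data/assets/pdf_file/0011/10505/"):
    print(time, path, referer)
```

To run it against a real log, pass an open file instead of the sample list, e.g. find_referrers(open("access.log"), "/__data/assets/pdf_file/0011/10505/"), where the log path is whatever your server uses. A referrer of "-" means the client did not send one.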


(Ian Hirst) #6

Thanks David - how would we know if the __data path on the site is configured to list files out?


(David Schoen) #7

Navigate to the directories (https://otherwebsite.url/__data/assets/pdf_file/0011/10505 , https://otherwebsite.url/__data/assets/pdf_file/0011 , etc.) and confirm that you get either a 404 or a 403 for each one and don’t see a listing with links to the file itself.
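
If you would rather script that check than click through each directory, something like the following Python sketch could do it. The helper names are hypothetical, and the "Index of" test is only a heuristic based on the title that Apache and nginx auto-generated listings normally use, so treat a "review" result as a prompt to look at the page yourself.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def parent_directories(file_url):
    """Return every directory URL above the file, deepest first."""
    parts = urlsplit(file_url)
    segments = parts.path.strip("/").split("/")[:-1]  # drop the file name
    urls = []
    while segments:
        path = "/" + "/".join(segments) + "/"
        urls.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
        segments.pop()
    return urls


def classify(status, body, file_name):
    """Classify the response to a directory request."""
    if status in (403, 404):
        return "ok"  # directory is not browsable
    if status == 200 and ("Index of" in body or file_name in body):
        return "listing"  # looks like a directory index (heuristic)
    return "review"  # anything else needs a manual look


def fetch(url):
    """Return (status, first 64 KB of the body) for a URL."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.read(65536).decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        return err.code, ""


def check_listing(file_url):
    """Print one result line per parent directory of file_url."""
    file_name = urlsplit(file_url).path.rsplit("/", 1)[-1]
    for url in parent_directories(file_url):
        status, body = fetch(url)
        print(status, classify(status, body, file_name), url)
```

Calling check_listing("https://otherwebsite.url/__data/assets/pdf_file/0011/10505/docname.pdf") prints the status and a verdict for each directory from /__data/assets/pdf_file/0011/10505/ up to /__data/. Every line should say "ok" if listings are switched off.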