I've been looking into HTTrack as a way of creating a fast, static copy of a Squiz site every hour or so.
The problem I have is that it doesn't seem to play nice with redirect pages. I have one page that redirects to an external site, and this isn't being picked up.
Has anyone else come across this, and how did you resolve it?
One other question. I have images that are embedded within the content of some pages, and the URL that appears in the HTML for that image in the HTTrack copy is something like "./?a=121". This seems to be causing problems for HTTrack. Is there any way to have Squiz display the full URL to the image rather than the asset ID link?
Thanks, Tom
Firstly, I'm not sure how HTTrack can properly index Redirect pages unless you give them a Redirect Timeout (which you can do in Matrix v3.8 or higher), so that the redirect is done in the HTML that is output instead of by the server itself.
If there is no timeout, the redirect is done by the server, so HTTrack would only see it as an HTTP 302 response.
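To make the difference concrete, here is a rough Python sketch of how a crawler might tell the two cases apart. It assumes the timed redirect is written into the page as a meta refresh tag, which I have not confirmed for Matrix; `redirect_kind` and its arguments are made-up names for illustration, not part of HTTrack or Matrix.

```python
import re
from typing import Optional, Tuple

# Assumption: a timed redirect shows up as
# <meta http-equiv="refresh" content="5; url=http://...">,
# with http-equiv appearing before content.
META_REFRESH = re.compile(
    r'<meta[^>]+http-equiv=["\']?refresh["\']?[^>]*'
    r'content=["\']?\s*\d+\s*;\s*url=([^"\'>\s]+)',
    re.IGNORECASE,
)


def redirect_kind(status: int, headers: dict, body: str) -> Optional[Tuple[str, Optional[str]]]:
    """Classify a response as a server redirect, an HTML redirect, or neither."""
    # Server-side redirect: the target is only in the Location header,
    # so there is no page content for a mirroring tool to save.
    if status in (301, 302, 303, 307, 308):
        return ("server", headers.get("Location"))
    # HTML redirect: the page itself is a normal 200 response that
    # carries the target in its markup, so it can be saved and followed.
    match = META_REFRESH.search(body)
    if match:
        return ("html", match.group(1))
    return None
```

With a 302 the only thing a crawler gets is the header, whereas the timeout version is an ordinary page it can store in the mirror.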
[quote]One other question. I have images that are embedded within the content of some pages, and the URL that appears in the HTML for that image in the HTTrack copy is something like "./?a=121". This seems to be causing problems for HTTrack. Is there any way to have Squiz display the full URL to the image rather than the asset ID link?[/quote]
Matrix should always print the "proper" URL for an image in its content: either a restricted Matrix URL (if the image is not Live, or doesn't have Public read permission), or an unrestricted URL (once the image is both Live and has Public read permission).
If you’re getting images that have an ./?a=121 URL, then something is wrong with the way they are linked into the content.
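If you want to check a whole mirrored copy for this, a small Python sketch like the one below could list the asset IDs that were never swapped for a real URL. It only assumes the links look like the `./?a=121` example mentioned above; `find_asset_id_links` is a made-up helper name, not a Matrix or HTTrack function.

```python
import re
from typing import List

# Matches src/href attributes whose value is an unresolved asset ID
# link of the form ./?a=<number>, as in the example from this thread.
ASSET_ID_URL = re.compile(r'(?:src|href)=["\']\./\?a=(\d+)["\']', re.IGNORECASE)


def find_asset_id_links(html: str) -> List[str]:
    """Return the asset IDs of every ./?a=N link found in the HTML."""
    return [match.group(1) for match in ASSET_ID_URL.finditer(html)]


page = '<img src="./?a=121" alt=""><a href="/media/logo.png">logo</a>'
print(find_asset_id_links(page))  # ['121']
```

Any ID it reports points at an asset whose link is worth checking in the content it was inserted into.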
This is an image that is embedded within the content div. I clicked on the link image icon and selected the image from my Document Library/Images folder. The image itself is Live and has unrestricted access set to yes. The Document Library folder is a top-level folder.
Can you let me know what is wrong with this and why this is showing the asset id in the href instead of the correct url for the image?
Thanks, Tom
Try changing the content of the content div that image is in (add a space to the end or something) and commit. View the page source again and see if the URL has now been replaced.
I tried adding a space and committing, and adding some text and committing, but in both cases it's still showing the asset ID URL instead of the fully qualified image URL.
Your image doesn't have a proper URL because it's not under a site. You should use the Media folder that Matrix creates automatically, as this can have a URL applied. If you have multiple sites, each with their own URL, you should ensure that the Media folder has a URL that corresponds to each Site.
Great, thanks, Avi. Works a treat.