Hey guys,
Just wanted to ask regarding Word HTML and how 'easy' it is to house inside Matrix.
As you no doubt know, when a page with screenshots is saved as Word HTML, its images are stored in a folder with the same name as the HTML page itself.
Now, without having to load each image individually (including arrowheads and clip art arrows, approx. 685 items), how can we get this to work?
Should I put the page and folders in the root so that I don't have to edit the HTML to look for a folder within Matrix? Is there a bulk file upload feature that I can use? I remember Angela Bennett referring to such a feature in a phone call a few months ago.
Any advice appreciated muchly 
Hi Gav,
This is a feature that I have tackled recently, and I know that it can cause a few problems. Before I look at some solutions: there is a new package, Import Tools, that has been added to MySource Matrix. It is a premium package, so you will need to discuss options for obtaining it with Angela or your account manager. It allows you to import HTML files into Matrix pages, including importing and remapping images and their URLs in the content. It also has a Word conversion process that will convert Word documents to HTML, but it does require an additional Windows server and some setup. This option would be desirable if you wish to import many of these files, as it greatly reduces the time needed to import large and complex HTML documents. It also allows the file to be split up and imported as multiple pages depending on the structure of the document.
It is quite a convoluted process to map images correctly from their static representations to Matrix assets. This is often due to the publicly accessible URLs that are desirable when serving files in Matrix. When a file is set to public read access and Allow Unrestricted Access is set, the file will receive a ROOT/__data/public/asset_type/asset_id URL. This doesn't allow easy referencing of the file with a path-based solution like the one you suggested. This is where the ./?a=ASSET_ID form of reference becomes important, as what it points to will change depending on the current state of the file asset. In this case, the only solution is to import all the images first (using the import_files script or something similar) and keep a mapping of the imported image filenames to their new asset ids. Then, in the content of the page to be inserted, any 'page_name_files/imagename.jpg' references would need to be replaced with the asset id that imagename.jpg was assigned when imported.
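The replacement step described above could be scripted outside Matrix before the content is pasted in. Below is a minimal Python sketch of that idea, not a Matrix tool: the function name `remap_image_refs` and the asset id 1234 are invented for illustration, and the filename-to-asset-id mapping is assumed to be something you recorded by hand or from the output of whatever import script you used.

```python
import re


def remap_image_refs(html, folder, asset_ids):
    """Swap 'folder/filename' references for './?a=ASSET_ID' links.

    asset_ids maps each imported filename to the asset id it was given.
    References with no entry in the mapping are left untouched.
    """
    # Match the folder name, a slash, then the filename up to a quote,
    # whitespace or closing angle bracket.
    pattern = re.compile(re.escape(folder) + r"/([^\s\"'>]+)")

    def swap(match):
        asset_id = asset_ids.get(match.group(1))
        if asset_id is None:
            return match.group(0)  # unmapped image: keep the original path
        return f"./?a={asset_id}"

    return pattern.sub(swap, html)


# Hypothetical sample data: the asset id 1234 is invented for illustration.
page = '<img src="page_name_files/imagename.jpg"> <img src="page_name_files/other.gif">'
print(remap_image_refs(page, "page_name_files", {"imagename.jpg": 1234}))
```

In this sample the first image is rewritten to `./?a=1234`, while `other.gif` keeps its original path because it has no entry in the mapping, which makes missed images easy to spot.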
Of course, the method you proposed, importing all the images in the root of the page, is an option and would work in theory. However, it requires Matrix to serve all images in the document (no __data URLs), which could affect the performance of your system, especially if there are as many images as you described. In this case, you could import all the images under the same asset as the new page you are going to put your content into. Then, in the content of the page, replace all references to page_name_files/image_name.jpg with just image_name.jpg, and import this content to a page in the same position in the tree as the images. If you get a lot of broken images, the images may have their Allow Unrestricted Access option set to true.
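With around 685 images, stripping the folder prefix by hand would be tedious. As a rough sketch only (the function name `flatten_image_refs` is made up, and it assumes every image reference has the plain folder/filename form), something like this Python snippet could do the replacement and also list the filenames the page expects, so they can be checked against what was uploaded:

```python
import re


def flatten_image_refs(html, folder):
    """Strip 'folder/' from image references and list the filenames seen."""
    pattern = re.compile(re.escape(folder) + r"/([^\s\"'>]+)")
    # Unique filenames the page refers to, sorted for easy comparison.
    names = sorted(set(pattern.findall(html)))
    # Keep only the filename part of each reference.
    return pattern.sub(r"\1", html), names


sample = '<img src="page_name_files/image_name.jpg">'
new_html, filenames = flatten_image_refs(sample, "page_name_files")
print(new_html)
print(filenames)
```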
I apologise for the complicated explanations, but as I said, it is quite a difficult process to grasp correctly. Please throw any questions back at me on here and I'll be glad to help out.
Regards,
Darren McKee
Ok, thanks for that comprehensive reply
At least I know it's being tackled.
One thing that I forgot to say before was that I also tried uploading the desired content to personal webspace and then building a Remote Content page to pull from that. I only got a blank page. Is there anything you'd suggest changing that would make this work? Seems doable, I think.
Thanks again
I can't tell you offhand what could be causing that. I haven't had a lot of experience with Remote Content. I will try and get the developer of the Remote Content asset to shed some light on it.
It's hard to debug Remote Content in the forum. You might want to take a look at the error log to get a pointer to the problem. But before you do that, make sure that the Matrix server is allowed to access the page you're trying to bring in. Also, make sure you understand all the options on the Remote Content's Details screen. When the RC asset is first created, all you need to do is add the URL of the target page. If that works, you can increase security by enabling other options, etc.
If you cannot get any output, try pointing RC to www.google.com. If that doesn't work with all the standard settings, talk to your system administrator about the firewall. If the firewall rules cannot be relaxed, you might want to host your document on the same server as the Matrix install, but outside of Matrix, and then use RC to bring it in.
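One way to separate a firewall problem from a Remote Content problem is to fetch the same URL directly from the Matrix server, outside Matrix. The following is only a sketch under assumptions: it supposes Python 3 is available on that server, and the function name `check_url` and its `opener` parameter (there so the fetch can be swapped out) are invented here.

```python
import urllib.error
import urllib.request


def check_url(url, timeout=10, opener=urllib.request.urlopen):
    """Try to fetch url and return (ok, detail) describing what happened."""
    try:
        with opener(url, timeout=timeout) as response:
            body = response.read()
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (404, 403, ...).
        return False, f"HTTP error {exc.code}"
    except OSError as exc:
        # Covers URLError, timeouts and refused or blocked connections.
        return False, f"could not connect: {exc}"
    return True, f"fetched {len(body)} bytes"


# Example call, to be run on the Matrix server itself:
# print(check_url("http://www.google.com"))
```

If this cannot connect from the server while the same URL opens fine in a browser on your desktop, the firewall is the likely culprit. If it fetches the page without trouble, the problem is more likely on the Matrix side.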
I get ‘Content cannot be accessed’ with www.google.com as its URL…
And when I try the other URL, http://users.bigpond.net.au/gav etc etc, I get nothing but a blank page, not even a blank page surrounded by our intranet.
Firewall then?
Although now I think about it, I don't think the firewall would be an issue.
I have another Remote Content page hooking up with dictionary.com, so I'm not sure what's up.
That URL gives me a Directory Listing Error from the webserver, so it's invalid.
You may want to fix that and see how it goes.
Yeah, bad example, I know, but there is an HTML file on that site that should be viewable. I don't want to give the full link for privacy reasons, that's all.
www.google.com doesn’t work
www.smh.com.au doesn’t work
yet www.dictionary.com works fine (with start and end tags)
Do you get any entries in your {mysource_root}/data/private/logs/error.log file when you access the Remote Content asset?
[27-Feb-2006 16:16:29] PHP Fatal error: Allowed memory size of 16777216 bytes exhausted (tried to allocate 35 bytes) in /home/websites/mysource_matrix/packages/cms/page_templates/page_remote_content/page_remote_content.inc on line 225
Not that I can decipher that.
Matrix is running into the PHP memory limit when trying to retrieve and render the remote content. Edit mysource_matrix/core/web/index.php and, where it says 16M in the ini_set line, change that to 32M and see how you go.
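For anyone trying to read that log line: the 16777216 bytes in it is the same 16M limit, since PHP's M shorthand means 1024 × 1024 bytes. A small Python sketch of the arithmetic (the function name `memory_limit_mb` is made up for illustration):

```python
import re


def memory_limit_mb(log_line):
    """Return the PHP memory limit named in a fatal error line, in megabytes."""
    match = re.search(r"Allowed memory size of (\d+) bytes exhausted", log_line)
    if match is None:
        return None  # not a memory-limit error line
    return int(match.group(1)) / (1024 * 1024)


line = ("PHP Fatal error: Allowed memory size of 16777216 bytes exhausted "
        "(tried to allocate 35 bytes)")
print(memory_limit_mb(line))  # prints 16.0
```

After the change to 32M, a new error of the same kind would report 33554432 bytes instead, which is a quick way to confirm whether the edited setting is actually being picked up.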
Sorry, can you tell me where I'd edit that?
Cheers
You need to edit the following file:
/home/websites/mysource_matrix/core/web/index.php
There will be the following line:
ini_set('memory_limit', '16M');
Change that to:
ini_set('memory_limit', '32M');
And see how you go.
And that's a file accessed in the _admin backend, yeah?
No, this is on the webserver itself. You cannot edit this file from the Administration Interface.
Ah, OK. My network admin, who oversees this, has changed the file, but the problem still remains.
What are you getting in the error.log file now?
It seems to be specific to the site it's pointing to.
www.smh.com.au doesn't work
www.dictionary.com works fine
html file on personal webspace = Content cannot be accessed
www.news.com.au works (albeit a little cramped)
You may want to contact Squiz Support to take a closer look at this for you.