Hi Squiz people,
Just wondering if there is a simple guide on how to format the XML documents for mass-importing new assets into the system, with parent/child links, attributes, etc.?
We have a migration to do on 3 sites, and it would be nice to write a script that spiders the pages and builds an XML document to import placeholder assets for the whole asset structure within MySource.
Or, if anyone else has done something similar, I'd be interested to hear how you achieved it.
Cheers
Have a chat with your account manager -- I'm pretty sure the guys in Canberra have been working on something like this. :)
We have created a PHP class library to encapsulate the XML format. It's not complete; we keep adding functionality by reverse-engineering the trigger actions :ph34r:
So we can do stuff like:
$mypage = new Page('page1');
$mypage->setName('My new page');
$mypage->setHTML('<h1>My page</h1>');
echo $mypage->getXML();
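Since the library itself hasn't been released, here is a minimal sketch of how a wrapper like this could be structured, using PHP's built-in DOMDocument so that escaping is handled for you. The element and attribute names (asset, type_code, page_standard, attribute, content) are hypothetical placeholders, not the actual MySource Matrix import schema; check them against the real format before relying on any of this.

```php
<?php
// Hypothetical sketch of a Page wrapper class. The element and attribute
// names below are placeholders, NOT the real MySource Matrix import schema.
class Page
{
    private string $id;
    private string $name = '';
    private string $html = '';

    public function __construct(string $id)
    {
        $this->id = $id;
    }

    public function setName(string $name): void
    {
        $this->name = $name;
    }

    public function setHTML(string $html): void
    {
        $this->html = $html;
    }

    public function getXML(): string
    {
        $doc = new DOMDocument('1.0', 'UTF-8');
        $doc->formatOutput = true;

        // Root element describing the asset to create (placeholder names).
        $asset = $doc->createElement('asset');
        $asset->setAttribute('id', $this->id);
        $asset->setAttribute('type_code', 'page_standard');
        $doc->appendChild($asset);

        // Text nodes are escaped by DOM, so "&" or "<" in the name is safe.
        $name = $doc->createElement('attribute');
        $name->setAttribute('name', 'name');
        $name->appendChild($doc->createTextNode($this->name));
        $asset->appendChild($name);

        // The body HTML is stored as escaped text rather than parsed markup.
        $content = $doc->createElement('content');
        $content->appendChild($doc->createTextNode($this->html));
        $asset->appendChild($content);

        return $doc->saveXML();
    }
}
```

With this class defined, the four lines above run as-is and print a small XML document. Building the XML through DOM rather than string concatenation means stray ampersands and angle brackets in scraped page titles can't produce invalid XML.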
So far, it handles:
- Asset creation
- Asset attributes
- Links
- Metadata and schemas
- Asset status
- Future status
- Web paths
- Permissions
We have completely abstracted classes for creating:
- Standard pages
- Asset listings (very crude)
- Events
That lot was originally used for importing news articles and the like from a MySQL database into Matrix.
But we also had somewhere around 2,000 to 3,000 static web pages that needed to go in. So we wrote some very crude code that works through a set of pages by opening the local static PHP/HTML files, extracting links to files, and writing them to a CSV file and to a shell script for copying the files. We can then run scripts/import/import_files.php, piping its output to another file. A third bit of code then works through that piped log and the original CSV file, matches them up, and writes each old URL and its new asset ID to a database.
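A minimal sketch of the first step (pulling file links out of the static pages into a CSV), assuming PHP's DOMDocument is acceptable for parsing old, messy markup. The extension list, the function names, and the two-column CSV layout (source page, link) are all hypothetical; the copy script and the import_files.php run are not covered here.

```php
<?php
// Hypothetical sketch of the "extract file links to CSV" step.
// The extension list and CSV columns (source page, link) are assumptions.
const FILE_EXTENSIONS = ['pdf', 'doc', 'xls', 'jpg', 'gif', 'png'];

function extractFileLinks(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input.
    }
    $doc = new DOMDocument();
    // Swallow parser warnings from the malformed markup common on old sites.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $links = [];
    foreach (['a' => 'href', 'img' => 'src'] as $tag => $attr) {
        foreach ($doc->getElementsByTagName($tag) as $el) {
            $url = $el->getAttribute($attr);
            $path = (string) parse_url($url, PHP_URL_PATH);
            $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
            if (in_array($ext, FILE_EXTENSIONS, true)) {
                $links[] = $url;
            }
        }
    }
    // Drop duplicates but keep first-seen order.
    return array_values(array_unique($links));
}

function writeLinksCsv(string $csvPath, array $pages): void
{
    $fh = fopen($csvPath, 'w');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $csvPath");
    }
    foreach ($pages as $pagePath) {
        $html = (string) file_get_contents($pagePath);
        foreach (extractFileLinks($html) as $link) {
            fputcsv($fh, [$pagePath, $link], ',', '"', '');
        }
    }
    fclose($fh);
}
```

Using a real HTML parser instead of regexes here copes better with unquoted attributes and odd whitespace, at the cost of being slower on thousands of pages.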
We then have yet more code that walks through a set of pages recursively and calls the class to build Matrix pages from them. It runs in two stages. The first creates the page structure and finishes by creating a dummy page populated with CSV data mapping each old URL to its new asset ID (a little hack using [[output://]]). That CSV is then imported into the database along with the file URLs. The second stage extracts the page contents, replaces any links with the ./?a= format based on the database of links, and then writes out the XML that populates the new assets with their content.
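The link-rewriting part of the second stage could look roughly like this, assuming the old-URL-to-asset-ID table has already been loaded from the database into a PHP array. rewriteLinks() is a hypothetical name, and the regex only handles quoted href/src attributes, which is usually enough for hand-written static pages but is not a full HTML parser.

```php
<?php
// Hypothetical sketch: rewrite href/src values found in $urlMap
// (old URL => new asset ID) into Matrix's "./?a=<assetid>" form.
function rewriteLinks(string $html, array $urlMap): string
{
    return preg_replace_callback(
        // Group 1: attribute name, group 2: quote char, group 3: URL.
        '/\b(href|src)\s*=\s*(["\'])(.*?)\2/i',
        function (array $m) use ($urlMap): string {
            $url = $m[3];
            if (!isset($urlMap[$url])) {
                return $m[0]; // Leave links we have no mapping for untouched.
            }
            return $m[1] . '=' . $m[2] . './?a=' . $urlMap[$url] . $m[2];
        },
        $html
    );
}
```

Relative links would need resolving to the same absolute form used as keys in the map before the lookup will match.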
The biggest problem in all of that? Reliably finding where the page content starts! It's fine if you have <!-- START CONTENT --> and <!-- END CONTENT --> in all your files, but we didn't. We also had about 30 microsites, each with its own template.
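One way to cope with pages that lack the markers is to try them first and fall back to a per-template rule. In this sketch the marker strings come from the post, while the fallback (taking the inner HTML of an element with a given id, defaulting to the hypothetical id "content") is an assumption; with 30 different templates you would probably keep a table of fallback ids per microsite.

```php
<?php
// Sketch: pull the main content out of a static page.
// The START/END CONTENT markers come from the post; the fallback
// element id is a hypothetical per-template setting.
function extractContent(string $html, string $fallbackId = 'content'): ?string
{
    $start = '<!-- START CONTENT -->';
    $end = '<!-- END CONTENT -->';
    $s = strpos($html, $start);
    if ($s !== false) {
        $s += strlen($start);
        $e = strpos($html, $end, $s);
        if ($e !== false) {
            return trim(substr($html, $s, $e - $s));
        }
    }

    // Fallback: inner HTML of the first element with the given id.
    if (trim($html) === '') {
        return null;
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    // Assumes $fallbackId contains no double quotes.
    $nodes = (new DOMXPath($doc))->query(sprintf('//*[@id="%s"]', $fallbackId));
    if ($nodes === false || $nodes->length === 0) {
        return null; // Caller should log these pages for manual review.
    }
    $inner = '';
    foreach ($nodes->item(0)->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return trim($inner);
}
```

Returning null rather than guessing keeps pages with unrecognised templates out of the import so they can be handled by hand.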
I'll have a chat with management here about releasing the code, but it would come with absolutely no support and a big warning that it was written as throw-away code, so it has little or no error checking along the way. Or any documentation.
Hey there Peter, that does sound very interesting, to say the least. If you are able to release the code, I would be interested in having a look. I'm mainly interested in producing the XML structure that MySource requires for importing.
If you are able to assist, please send me a PM.
Cheers