Export to XML script with self-contained tar file?


(Tbaatar) #1

Hi,

I need to move around 300 pages with child images, and I was wondering how to create a self-contained tar file with all the images, like the export tool does, but using the export_to_xml.php script instead?

Thanks.


(David Schoen) #2

Something like:

# mkdir /tmp/my_export
# chown -R apache /tmp/my_export
# cd /tmp/my_export
# sudo -u apache php /var/www/matrix/scripts/import/export_to_xml.php /var/www/matrix 57:1 1 > export.xml
--------------------------------------
CREATING ASSET: text.txt
--------------------------------------
# tar cvzf ../my_export.tgz .
./
./export.xml
./export/
./export/Text_File/
./export/Text_File/57/
./export/Text_File/57/text.txt


(Tbaatar) #3

Hi David,

Thanks for the feedback.

I did pretty much everything the same, but the export only gives me an .xml file rather than a .tgz.
Not sure if this makes any difference, but I’m exporting from Matrix 4.18.9.

In addition, when I import the .xml file with all the text content into Matrix 5.4.X as News Item assets, it fails to create the Body content and only creates the Summary content. Is this because the referenced images were missing?

In the end I just ran the export/import tool 30 times to move everything over.

The migration side of Matrix still seems very old and has not improved over the past 10-15 years. Are there any plans to improve this?


(David Schoen) #4

Hi tbaatar, the tar is not generated by Matrix. In my example it’s an explicit command, and it’s worth noting that, with the way the commands are arranged, both the export and the tar depend on running inside a directory that can be tarred up in its entirety. Nothing relevant to the tar generation above has changed since well before 3.16. The images would likely have been missing if you hadn’t created a directory and used cd to move into it as per the example.
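
As a rough sketch of the tar step on its own: the helper below archives a directory that already holds export.xml and the export/ folder. The name bundle_export and the example paths are made up for illustration and are not part of Matrix.

```shell
#!/bin/sh
# bundle_export is a hypothetical helper, not a Matrix script.
# It tars up a directory that already contains export.xml and export/.
bundle_export() {
    src_dir=$1     # directory the export was run from, e.g. /tmp/my_export
    out_file=$2    # archive to write; keep it outside src_dir

    # Refuse to build an archive that would be missing the XML.
    if [ ! -f "$src_dir/export.xml" ]; then
        echo "no export.xml in $src_dir" >&2
        return 1
    fi

    # -C switches into src_dir first, so the paths inside the archive are
    # relative to it, as with "cd src_dir" followed by "tar cvzf ../x.tgz ."
    tar -czf "$out_file" -C "$src_dir" .
}
```

For the example above this would be called as bundle_export /tmp/my_export /tmp/my_export.tgz after the export has finished.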

Internally Squiz uses Mirror, and I think some Squiz clients have direct access as well. It’s the same kind of idea, but instead of having to generate a tar to wrap up files and XML, the files are base64 encoded and inlined directly inside the XML. There’s also a lot of other reliability work and better feature support (preserving dates, both safe edit versions, tracking IDs between multiple installs for dev->uat->prod workflows, etc). I expect at some point we’ll be integrating that with core, but that’s not actively planned yet.
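
To illustrate the general idea of inlining a file as base64 inside XML, here is a minimal sketch. It is not Mirror's actual format, which is not shown in this thread; the function name and the element and attribute names are invented for the example.

```shell
#!/bin/sh
# inline_file_as_xml is a hypothetical helper. The <file> element and its
# attributes are made up here and do not reflect any real Mirror schema.
inline_file_as_xml() {
    path=$1
    name=$(basename "$path")   # note: the name is not XML-escaped in this sketch

    printf '<file name="%s" encoding="base64">' "$name"
    # Encode the file contents and strip the line wrapping base64 may add.
    base64 < "$path" | tr -d '\n'
    printf '</file>\n'
}
```

Because the file contents travel inside the XML itself, nothing has to sit next to the XML on disk and no tar step is needed.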


(Tbaatar) #5

Hi David,

I’ve been playing around with the export_to_xml.php script and there are a couple of things I can’t get my head around.

  • The imported asset’s date is not preserved.
  • The exported asset does not preserve its images.

Is there any reason why the (child) images are not preserved/contained? And why does the created date change on import into the new system?

Import/export through the Admin interface preserves the images, but there seems to be a size limit and it does not preserve the metadata or date.

This Mirror tool that Squiz currently uses, is it just another PHP script? And what is holding it back from being released in core?

Thanks


(David Schoen) #6

The date is changing because at import you are creating a new asset - the XML scripts were never designed to preserve everything.

When using the scripts on disk, images will end up under $PWD/export/... assuming that’s writeable by the apache user. That’s why I create a directory under /tmp and chown it in the example. If images are not being created, try doing that.
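
A quick way to check the writeable part before exporting could look like the sketch below. check_export_dir is a made-up helper name, and it only tests the directory for whichever user runs it, so it would need to be run as the web server user to be meaningful.

```shell
#!/bin/sh
# check_export_dir is a hypothetical helper, not part of Matrix.
# It reports whether a directory exists and is writeable by the current user.
check_export_dir() {
    dir=${1:-$PWD}   # default to the current working directory

    if [ -d "$dir" ] && [ -w "$dir" ]; then
        echo "ok: $dir is writeable"
    else
        echo "not writeable: $dir" >&2
        return 1
    fi
}
```

The one-line equivalent for the example directory would be: sudo -u apache test -w /tmp/my_export && echo ok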

Metadata should be preserved. Is it showing up in the XML at all?


(Tbaatar) #7

Hi David,

Yes, I followed the instructions and created an export of the .xml file along with a folder containing all the media assets.

The problem is that when I import these assets using the following command, it does not pull in the media assets inside the export directory.

sudo -u www-data php import_from_xml.php /var/www/squiz_matrix /var/www/backup/export.xml 64:1 1

As for preserving metadata when exporting/importing from the admin screen, I can see the metadata, but it requires mapping the metadata IDs to the newer system. Because the tool can’t handle larger exports, it takes dozens of exports, and doing this mapping for each one is really painful.

The export and import PHP scripts are the ideal approach, but I’m not sure why the images are not getting uploaded. Am I missing something?


(David Schoen) #8

The import would be expecting the “export” dir to be relative to $PWD when the script is run, so assuming it’s at /var/www/backup/export you would need to run:

cd /var/www/backup
sudo -u www-data php /var/www/squiz_matrix/scripts/import/import_from_xml.php /var/www/squiz_matrix /var/www/backup/export.xml 64:1 1
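
A small pre-flight check along these lines can catch a wrong working directory before the import runs. This is only a sketch: ready_for_import is a made-up helper name, and it assumes the XML file is called export.xml and sits beside the export/ folder.

```shell
#!/bin/sh
# ready_for_import is a hypothetical helper, not part of Matrix.
# It checks that a directory holds both export.xml and the export/ folder,
# and prints the directory on success so the caller can cd into it.
ready_for_import() {
    base=$1

    [ -f "$base/export.xml" ] || { echo "missing $base/export.xml" >&2; return 1; }
    [ -d "$base/export" ]     || { echo "missing $base/export/" >&2; return 1; }

    echo "$base"
}

# Example use (the directory path is a placeholder):
#   base=$(ready_for_import /path/to/dir) && cd "$base" && \
#     sudo -u www-data php /var/www/squiz_matrix/scripts/import/import_from_xml.php \
#       /var/www/squiz_matrix export.xml 64:1 1
```
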

(Tbaatar) #9

Hi David,

Running it from the $PWD worked. I didn’t realise it had to be run from the same directory.

Final question.
Will running the database backup script preserve the created/published dates when restored on another system? And does restoring to a later version of Postgres cause any issues, e.g. going from 9.4 to 9.6 or 10 with the latest Matrix build?

I had a quick go at it a couple of months back and ran into some issues.

Thanks,
Tuguldur


(David Schoen) #10

The backup script will definitely preserve everything when restored. Changing versions should generally work (potentially with some massaging) as long as you’re going to a newer version of PostgreSQL - but it’s probably worth a new post.


(Tbaatar) #11

Hi David,

Thanks for the feedback. It gives me great hope :)
You are right, that is one for a new post when the time comes.

Thanks,
Tuguldur