Using XML Import script


(Willh) #1

This is a tip which I've found very useful. I hope it's useful to you too.

 

If you have external systems which you need to present data from there are four ways which I'm aware of in Matrix:

  • Remote Content asset, which passes through data
  • Data Source assets (database, csv etc), which request on demand
  • Import scripts, which update assets in batches
  • and Web Services, of which I know nothing

I've found the import XML script very useful.

http://manuals.matrix.squizsuite.net/server-administrator/chapters/import-scripts#import_from_xml.php

 

Here are two ways using it:

  1. wget or curl an API feed from source system. Transform it using XSLT into the import script syntax. Run import script from command line or from within Matrix. Source data can be any structure markup: XML, XHTML, RSS etc. HTML can be used but consider running it through Tidy first.
  2. get some tabular for import. Get list of assets from Matrix using Asset Listing asset. Munge both tabular lists together and wrap in XML tags. I prefer using VLookups in Excel/LibreOffice Calc. If you have the AssetId from Matrix in one column and the new value in the next then you have nearly all you need.

I've used the above methods for:

  • a one-off for updating metadata on 200 profiles
  • as a daily update from a central store of person profiles
  • a one-off for updating usernames of 100 people to follow new login system

 

Don't be daunted by XSLT. It's a really useful language. I found oXygen XML editing software, Saxon transform engine and the O'Reilly book XSLT by Tidwell were a winning combination.

http://www.oxygenxml.com/

http://sourceforge.net/projects/saxon/

http://shop.oreilly.com/product/9780596527211.do

 

My only complaint about this script is that the syntax is not elaborated in the documentation. A bit more documentation would be helpful.

 

How that's useful,

Will

 


(Tmlalazi) #2

Thx for the tips you offer, I am interested in updating my metadata with information I have from the database. I have created an XML doc with the ID of the asset and the asset name and then the metadata that I'd like to update the assets with for my pictures. This is part of a migration process.

 

I've looked at your pointers but I can't get it started. any help will be appreciated.


(Willh) #3

Hi Thando,

 

The process below lists XML documents from a source system. It retrieves them then transforms so that they can be used by the import script.

 

The bash script looks like this

#!/bin/bash

set -e

URL=http://example.com/matrix_import_profiles.xml

MATRIX_ROOT=/home/www/example/matrix
XSL=/home/www/example/scripts/data/matrix_import_example_update.xsl

TMPDIR=$(mktemp -d /tmp/example-cron.XXXXXXXXXX)

cd ${TMPDIR}
wget -O matrix_import_profiles.xml ${URL}
xsltproc -o matrix_update_profiles_run.xml
${XSL} matrix_import_profiles.xml

php /home/www/example/matrix/scripts/import/import_from_xml.php
${MATRIX_ROOT} matrix_update_profiles_run.xml

cd /tmp
\rm -rf ${TMPDIR}

My XML master file is generated by Matrix. It's very simple, so producing Well Formed XML isn't a problem.

<?xml version="1.0" encoding="UTF-8"?>
<profile>
  <data filename="http://example.com/pi.php?id=1953" matrixid="8987" />
</profile>

The XSLT starts like this

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" />
    <xsl:template match="/">
        <actions>
            <xsl:for-each select="/profile/data">
                <xsl:variable name="assetid" select="@matrixid"/>
                <xsl:for-each select="document(@filename)">
                    <action>
                        <action_id><xsl:value-of select="pi/pi_key"/>_path</action_id>
                        <action_type>add_web_path</action_type>
                        <asset><xsl:value-of select="$assetid"/></asset>
                        <path><xsl:value-of select="pi/pi_key"/></path>
                    </action>
                    <action>
                        <action_id><xsl:value-of select="pi/pi_key"/>_name</action_id>
                        <action_type>set_attribute_value</action_type>
                        <asset><xsl:value-of select="$assetid"/></asset>
                        <attribute>name</attribute>
                        <value><xsl:value-of select="pi/pi_surname"/><xsl:text>, </xsl:text><xsl:value-of select="substring(pi/forenames,1,1)"/></value>
                    </action>

Later parts cope with empty fields, non-compliant characters etc - deeper XSLT stuff which are outside the scope of this post. The most important bit is the command in XSLT1 <xsl:for-each select="document(@filename)"> which retrieves the documents from the web.

 

The retrieved input XML starts like this:

<?xml version="1.0" encoding="UTF-8"?>
    <pi id="1953">
        <pi_key>person_person</pi_key>
        <title>Dr</title>
        <forenames>Person</forenames>
        <middle_initial/>
        <pi_surname>Person</pi_surname>
        <name_suffixes/>
        <display_name>Dr Person Person</display_name>

Check the documentation for the desired output. You can't exactly cut and paste what I've shown here but I hope it will get you started.

 

 

hth,

Will


(Tmlalazi) #4

Thx, I had already gone as far as preparing the XML. The issue for me now is with the actions as I already have the assets created and just want to update their metadata. But great help anyway you've opened my eyes to something else I can do with other parts of the site.