Parsing an XML Feed


(Andrew Harris) #1

I need to improve on an earlier implementation of an XML data source, so I'm seeking a bit of advice.

The data source looks like this (simplified to show only the troublesome bits):

 

<myfeed type="array">
  <event-item>
    <presenters type="array">
      <presenter>
        <link></link>
      </presenter>
    </presenters>
    <link></link>
  </event-item>
</myfeed>

 

As you can see, each event item includes two <link> nodes, and this is causing me issues.

Is there a way in Matrix to distinguish between the two?

When I pull in the feed, can I specify tag names like event-item:title and presenter:title so that I can deal with the two separately? I've tried a few variations without success. There may be some syntax for this; I just can't find a reference to it anywhere.

I realise that Matrix will handle nodes of the same name by joining them with a data delimiter, but this isn't very useful, as I'd have to process them again somehow to separate them. It would be much nicer if I could uniquely identify them in Matrix instead of having to do it with XSLT or JavaScript.
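For comparison outside Matrix, a path-aware XML parser can tell the two apart by their parent element. Below is a minimal Python sketch using the standard library's xml.etree.ElementTree, assuming the feed structure described in this thread. The sample values, the example.com URLs and the extract_items helper are all hypothetical. It illustrates parent-qualified selection in general; it is not Matrix syntax, and nothing in this thread confirms that Matrix offers an equivalent.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the feed in this thread; values are invented.
FEED = """<myfeed type="array">
  <event-item>
    <title>Event title</title>
    <presenters type="array">
      <presenter>
        <title>Presenter title</title>
        <link>https://example.com/presenter</link>
      </presenter>
    </presenters>
    <link>https://example.com/event</link>
  </event-item>
</myfeed>"""


def extract_items(xml_text):
    """Return one dict per event-item, keeping event and presenter fields apart."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.findall("event-item"):
        items.append({
            # "title" and "link" relative to event-item match direct children
            # only, so the presenter's nested <title> and <link> are skipped.
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            # Presenter fields are reached through their own parent path.
            "presenters": [
                {"title": p.findtext("title"), "link": p.findtext("link")}
                for p in item.findall("presenters/presenter")
            ],
        })
    return items


print(extract_items(FEED))
```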

 

Am I making sense?


(Bart Banda) #2

Not sure whether you can extract each one separately using the parser config; I might get a dev to check on that. But if you can get them both as comma-separated values, you could always separate them with a regex keyword modifier, right?
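Outside Matrix, the general idea of splitting a joined value with a regular expression looks like this in Python. The split_values name is hypothetical, and Matrix's regex keyword modifier has its own syntax, which isn't reproduced here. The pattern also absorbs any spaces around the delimiter.

```python
import re


def split_values(value, delimiter=","):
    """Split a joined value on a delimiter, ignoring whitespace around it."""
    # re.escape keeps delimiters that are regex metacharacters literal.
    pattern = r"\s*" + re.escape(delimiter) + r"\s*"
    return re.split(pattern, value.strip())


print(split_values("alpha, beta,gamma"))
```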


(Ashish Karelia) #3

Hi,

Unfortunately, Matrix doesn't currently handle tags with the same name well. As Bart mentioned above, the best approach would be to apply a keyword modifier to the content, which will have been joined using the data delimiter.

There is also a SquizMap idea filed for a similar issue that you can read: https://squizmap.squiz.net/matrix/5374 (access limited to Squiz clients).

 

Ash


(Andrew Harris) #4

OK, thanks for the input. Glad to see it's on the radar.

 

Unfortunately, the keyword modifier approach won't help in this case, as the content is displayed with straight XSLT, with no keywords. I might have to see if there's another way of doing this, using an asset listing or something similar.

 

I'll keep plugging away at this one ;-)


(Andrew Harris) #5

Also, Bart, what sort of regex keyword modifier did you think might be useful here? I'm no expert on regex, and I'm struggling to see how it could help me.


(Bart Banda) #6

What keyword are you using to print the current comma-separated value? And what does it produce?


(Andrew Harris) #7

Bart, as I said, we're not using keywords; we're using XSLT, which doesn't support Matrix keywords as far as I know.

So it's just <xsl:value-of select="title" />, but there's no real reason why I couldn't switch to a method that supports keywords.

 

So, in addition to the <link> example I showed above, each item in the feed has a <title>, and each <presenter> also has a <title>. When I tell Matrix to capture the <title>, I get them bundled together in %ds__title%, e.g.: Single molecule, structural and biochemical insights into actin regulation in the malaria parasite‡Ms

I'm using the double dagger (‡) as the delimiter because the titles themselves contain commas. So, what regex keyword modifier could I use to separate these, especially given that there may be more than one presenter? It possibly gets even worse with some other tags: as you can see in the original example code, I get a <link> tag for each presenter, followed by the main <link> tag for the article, so I can't even reliably count on it being at a certain position in the array.
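If Matrix keeps the joined values in document order (an assumption, though the %ds__title% example above does put the event title first), position can still separate them. The event title would be the first segment, and because presenter links come before the article link, the article link would always be the last segment, even though its index from the start changes with the number of presenters. Below is a minimal Python sketch of that logic; split_titles and split_links are hypothetical names, not Matrix features.

```python
# Hypothetical helpers illustrating position-based separation outside Matrix.
DELIM = "\u2021"  # double dagger, the data delimiter used in this thread


def split_titles(ds_title):
    """Assumes the event title is first and presenter titles follow."""
    parts = ds_title.split(DELIM)
    return parts[0], parts[1:]


def split_links(ds_link):
    """Assumes presenter links come first and the article link is last."""
    parts = ds_link.split(DELIM)
    return parts[-1], parts[:-1]


print(split_links("p1" + DELIM + "p2" + DELIM + "article"))
```

With the example value from this thread, split_titles returns the full event title with its commas intact, and ["Ms"] as the presenter list.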

 

Regex is not my strong suit, so I'd be grateful for any tricks that help.


(Bart Banda) #8

Ah, OK. I was thinking that if the format was consistent, you could use a replace keyword modifier to handle it: replace everything after the double dagger with nothing for the first value, and do the reverse for the second value. But if it's not going to be consistent, that won't work either.
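The replace idea above can be expressed as Python regular expressions. This is a sketch only: Matrix's replace keyword modifier syntax is not shown, and the function names are hypothetical. It works for exactly two values, but with two presenters the "second value" still contains a delimiter, and with a single value the second replace finds nothing to remove. That is the consistency problem described here.

```python
import re

DELIM = "\u2021"  # double dagger data delimiter


def first_value(value):
    # Replace everything from the first delimiter onwards with nothing.
    return re.sub(DELIM + ".*", "", value, flags=re.DOTALL)


def second_value(value):
    # The reverse: replace everything up to and including the first delimiter.
    return re.sub("^[^" + DELIM + "]*" + DELIM, "", value)


# Works for exactly two values...
print(first_value("A" + DELIM + "B"), second_value("A" + DELIM + "B"))
# ...but with three, the "second value" still holds a delimiter.
print(second_value("A" + DELIM + "B" + DELIM + "C"))
```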