Generating RSS/Atom feeds with asset listing

evanwills · August 17, 2014, 11:15pm

I've created an asset listing that generates an RSS feed based on new content for a section of my site. I've got everything working, but I'm having trouble with HTML character entities and special characters. It seems that the special characters and HTML entities break the XML, so entries with bad characters or HTML entities just get omitted when being parsed (or, worse still, break the parser altogether when using PHP's SimpleXMLElement). The problem is that I can't remove or replace the character entities.
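For context, XML predefines only five named entities (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`), so an HTML entity such as `&nbsp;` or `&ndash;` is undefined in a feed without a DTD that declares it, and a conforming parser rejects the document. One general approach, sketched here in Python rather than Matrix or PHP, is to decode the HTML entities to real characters, drop characters XML 1.0 does not allow, and then re-escape only XML's reserved characters. The helper name `to_xml_text` is hypothetical, and the sketch assumes the summary arrives as HTML-escaped text.

```python
import html
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Characters permitted by XML 1.0: tab, LF, CR and the ranges below.
# Anything else (e.g. stray control characters pasted from a word processor)
# makes the document ill-formed, so it is removed.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def to_xml_text(fragment: str) -> str:
    """Hypothetical helper: make an HTML-escaped summary safe as XML element text."""
    # 1. Decode every HTML entity (&nbsp;, &ndash;, &amp;, &#233; ...) to a real character.
    text = html.unescape(fragment)
    # 2. Remove characters that XML 1.0 does not permit at all.
    text = _INVALID_XML_CHARS.sub("", text)
    # 3. Re-escape only the characters XML reserves in element content (&, <, >).
    return escape(text)


raw = "Fees &amp; charges&nbsp;&ndash; 2014"

# The raw HTML-escaped text is rejected: &nbsp; and &ndash; are not XML entities.
try:
    ET.fromstring(f"<description>{raw}</description>")
    raw_ok = True
except ET.ParseError as err:
    raw_ok = False
    print("raw text rejected:", err)

# After conversion the same content parses cleanly.
safe = to_xml_text(raw)
parsed = ET.fromstring(f"<description>{safe}</description>").text
print(repr(parsed))
```

Decoding first and re-escaping last avoids the double-escaping (`&amp;nbsp;`) that a plain escape of the raw text would produce. The result contains real non-ASCII characters, so the feed then has to be served and declared as UTF-8.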

I've tried a number of things, including using keyword modifiers:

^striphtml, which is helpful but doesn't remove the entities or fix the special characters.
^urlencode, which does make the HTML entities and special characters safe for use in XML but makes the feed useless, because even the white space is URL-encoded.
^as_xml, which is meant for arrays (not strings) and doesn't do anything in my use case.
^escapehtml, which only replaces one set of HTML character entities with another.
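The `^urlencode` result is easy to see in isolation: URL-encoding is designed for query strings, so it rewrites spaces and punctuation, whereas XML escaping touches only the characters XML reserves. A small Python comparison, using `urllib.parse.quote` as a stand-in (Matrix's own encoding may differ in details such as `+` versus `%20` for spaces):

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

summary = "Fees & charges <2014>"

# URL-encoding: every space and most punctuation become %XX sequences.
url_encoded = quote(summary)
# XML escaping: only &, < and > are replaced; the text stays readable.
xml_escaped = escape(summary)

print(url_encoded)   # Fees%20%26%20charges%20%3C2014%3E
print(xml_escaped)   # Fees &amp; charges &lt;2014&gt;
```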

Part of the problem is that we're serving pages with the header charset set to ISO-8859-1 but the HTML charset set to UTF-8. (I know... WTF!!!)
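A quick way to see why mismatched charsets hurt: text encoded as UTF-8 but decoded as ISO-8859-1 turns each multi-byte character into junk, and ISO-8859-1 bytes in a document that an XML parser treats as UTF-8 (the default when no encoding is declared) are rejected outright. A Python sketch, with an example title made up for illustration:

```python
import xml.etree.ElementTree as ET

title = "Café opening hours"

# UTF-8 bytes read through an ISO-8859-1 lens: "é" (two bytes) becomes two characters.
mojibake = title.encode("utf-8").decode("iso-8859-1")
print(mojibake)  # CafÃ© opening hours

# ISO-8859-1 bytes with no encoding declaration: the parser assumes UTF-8 and fails.
latin1_doc = f"<title>{title}</title>".encode("iso-8859-1")
try:
    ET.fromstring(latin1_doc)
    rejected = False
except ET.ParseError as err:
    rejected = True
    print("parse error:", err)

# Declaring the encoding the bytes really use lets the parser decode them correctly.
declared = b'<?xml version="1.0" encoding="iso-8859-1"?>' + latin1_doc
recovered = ET.fromstring(declared).text
print(recovered)
```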

The simple solution is to get editors to be more careful about what they enter. However... that's easier said than done.

evanwills · August 18, 2014, 1:04am

Another issue is specifying the Content-Type header (this probably needs to be a feature request). Is there a way of specifying the Content-Type header on a per-page basis, or on a per design customisation/parse file basis?

evanwills · August 18, 2014, 1:12am

Scratch that last comment. My colleague, Robin Shi, showed me how: use "<MySource_PRINT id_name="__global__" var="content_type" content_type="application/xml; charset=utf-8" />" in the parse file.
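Once the header can be set, it is worth checking that the charset it announces matches the encoding declared in the feed itself, since a header/document mismatch like the one described earlier in this thread is exactly what trips parsers up. A hypothetical Python check (the function names are illustrative, not part of Matrix; it compares charset names literally, so aliases such as `utf8` versus `utf-8` would count as a mismatch, and it ignores the UTF-16 byte-order-mark case):

```python
import re
from email.message import Message
from typing import Optional

# Matches the encoding pseudo-attribute of an XML declaration at the very start of the body.
_XML_DECL = re.compile(rb"""^<\?xml[^>]*encoding=["']([A-Za-z0-9._-]+)["']""")


def header_charset(content_type: str) -> Optional[str]:
    """Charset parameter of a Content-Type header value, lower-cased, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def declared_encoding(body: bytes) -> str:
    """Encoding named in the XML declaration; treated as UTF-8 when none is given."""
    match = _XML_DECL.match(body)
    return match.group(1).decode("ascii").lower() if match else "utf-8"


def charsets_agree(content_type: str, body: bytes) -> bool:
    """True when the header has no charset or names the same one as the document."""
    header = header_charset(content_type)
    return header is None or header == declared_encoding(body)


feed = b'<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"></rss>'
print(charsets_agree("application/xml; charset=utf-8", feed))
print(charsets_agree("text/html; charset=iso-8859-1", feed))
```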

nnhubbard · August 18, 2014, 6:19pm

Have you tried just wrapping them in CDATA tags?

<description><![CDATA[%asset_attribute_summary^trim%]]></description>
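A CDATA section tells the parser to treat its contents as literal text, so `&nbsp;` or `<b>` inside it no longer has to be a defined entity or well-formed markup. The one sequence a CDATA section cannot contain is `]]>`, so a robust wrapper splits it across two sections. A Python sketch (the `cdata` helper is hypothetical). Note that CDATA does not help with characters XML 1.0 forbids outright or with a charset mismatch, and the entities reach the feed reader undecoded, so this relies on the reader rendering `<description>` as HTML, which many do:

```python
import xml.etree.ElementTree as ET


def cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any literal ']]>' so it cannot end the section early."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


summary = "Fees&nbsp;&amp; <b>charges</b> ]]> apply"
description = ET.fromstring(f"<description>{cdata(summary)}</description>")

# The parser hands back the original string unchanged, entities and tags included.
print(description.text)
```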

evanwills · August 19, 2014, 1:07am

Hi Nick

That's not a bad idea. I'll have a go.

The conclusion I came to was to get editors to view the feeds in Firefox so they could see where they needed to fix them, but CDATA might be a better option.
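Firefox's XML error page points at the line and column where the feed stops being well-formed; the same information is available from a script, which could be run against the feed output before editors are asked to look at it. A Python sketch using the standard library parser (the function name is hypothetical, and the column is whatever offset expat reports):

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def first_xml_error(feed: bytes) -> Optional[Tuple[int, int, str]]:
    """Return (line, column, message) for the first well-formedness error, or None."""
    try:
        ET.fromstring(feed)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


good = b"<rss><channel><item><description>Cafe news</description></item></channel></rss>"
bad = b"<rss><channel>\n<item><description>Caf&eacute; news</description></item>\n</channel></rss>"

print(first_xml_error(good))
print(first_xml_error(bad))
```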

PS: Congratulations on the 5k+ posts. Very impressive!