Google Sitemap XML no longer being parsed by Google


(paul.hollands@gmail.com) #1

We were using an asset listing to produce an XML feed in Google Sitemap format. This previously worked despite the fact that the MSM asset listing threw in extra divs. Google has apparently switched to a stricter XML parser, because these extra divs now stop Google from parsing our feed.
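
For illustration, here is roughly what the feed looked like, assuming the nesting implied by the match pattern in the stylesheet below (two div wrappers inside urlset). The URL and date are placeholders, not taken from our real feed, and the actual divs may have carried attributes.

[xml]
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <!-- wrapper divs added by the asset listing; not part of the sitemap schema -->
  <div>
    <div>
      <url>
        <loc>http://www.example.com/</loc>
        <lastmod>2007-01-01</lastmod>
      </url>
    </div>
  </div>
</urlset>
[/xml]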


The following XSLT applied to your asset listing feed removes the extra divs.



[xml]
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.google.com/schemas/sitemap/0.84"
    xmlns:gsm="http://www.google.com/schemas/sitemap/0.84"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/gsm:urlset/gsm:div/gsm:div">
    <urlset>
      <xsl:for-each select="gsm:url">
        <url>
          <loc><xsl:value-of select="gsm:loc"/></loc>
          <lastmod><xsl:value-of select="gsm:lastmod"/></lastmod>
        </url>
      </xsl:for-each>
    </urlset>
  </xsl:template>
</xsl:stylesheet>
[/xml]
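
A more general variant, offered only as a sketch: it assumes the same 0.84 namespace and that the div wrappers are the only unwanted elements. It uses an identity transform that copies everything and unwraps any div, wherever it is nested. Unlike the stylesheet above, it would also keep other children of url (for example changefreq or priority) if your listing outputs them.

[xml]
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:gsm="http://www.google.com/schemas/sitemap/0.84">
  <xsl:output method="xml" indent="yes"/>
  <!-- Drop whitespace-only text nodes left behind by the removed wrappers -->
  <xsl:strip-space elements="*"/>

  <!-- Identity rule: copy every node and attribute unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Unwrap any div in the sitemap namespace: the element and its
       attributes are dropped, its children are still processed -->
  <xsl:template match="gsm:div">
    <xsl:apply-templates select="node()"/>
  </xsl:template>
</xsl:stylesheet>
[/xml]

I have not tested this against a live MSM feed, so check the output before resubmitting it to Google.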


(Justin Cormack) #2

You can just remove the divs. Find the asset where they are and set it to "raw html" not "block level".


(Duncan Robertson) #3

Yes, just remove the block level on the DIVs.


(paul.hollands@gmail.com) #4

First thing I tried. It didn't work. Don't know whether it's specific to an asset listing or a bug but it made no difference.


(Nic Hubbard) #5

[quote]
First thing I tried. It didn't work. Don't know whether it's specific to an asset listing or a bug but it made no difference.

[/quote]



How are you checking it? Are you checking the non-cached version, or trying to get Google to verify it again without clearing the cache?