Our government site is about to implement the whole-of-government search provided by Funnelback. However, many of our results are duplicated. Funnelback advises that the duplicate results actually have slightly different base hrefs:
<BASE HREF="http://www.daff.gov.au/agriculture-food/drought">
<BASE HREF="http://www.daff.gov.au/agriculture-food/drought/">
The difference is the / at the end.
Because of this, the search doesn't treat them as the same page. Funnelback are looking into what they can do, but have asked if we can stop the system generating two base hrefs for the same page.
Any ideas?
There is very little that can be done. You obviously want both URLs to resolve, so it's really up to the search engine to know they are the same page/content.
Presumably Funnelback is picking the URLs up from your content, so you could check whether the URL with the trailing slash has been hard-coded into your content somewhere, but there is no automated way to do this site-wide.
FYI, just about any combination of slashes will resolve (even on a non-Matrix system), for example:
http://www.daff.gov.au/agriculture-food/drought//
http://www.daff.gov.au/agriculture-food//drought//
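To illustrate what "knowing they are the same page" could look like on the indexing side, here is a rough Python sketch that collapses repeated slashes and drops the trailing slash before comparing URLs. This is not Funnelback's code or configuration; normalize_url is a made-up helper name, and it assumes the slash and no-slash forms always serve identical content on your site.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Hypothetical helper: collapse repeated slashes in the path
    and drop a trailing slash, so slash variants compare equal."""
    parts = urlsplit(url)
    # Collapse runs of slashes inside the path only (not the "//" after the scheme).
    path = re.sub(r"/{2,}", "/", parts.path)
    # Drop trailing slashes, but keep the bare root path "/".
    if path != "/":
        path = path.rstrip("/")
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), path, parts.query, parts.fragment)
    )


urls = [
    "http://www.daff.gov.au/agriculture-food/drought",
    "http://www.daff.gov.au/agriculture-food/drought/",
    "http://www.daff.gov.au/agriculture-food/drought//",
    "http://www.daff.gov.au/agriculture-food//drought//",
]

# All four variants reduce to a single key.
unique = {normalize_url(u) for u in urls}
print(unique)
```

Something along these lines would have to run wherever the duplicates are detected (the crawler or indexer), since the web server will keep answering on every variant.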
Do you have a Perl script merge_redirex.pl in the blah/panoptic/bin/ folder?