Funnelback sitemap xml/cache

Hi,


Firstly, the Funnelback integration seems really awesome, and playing around with it has been fun so far.



Is there a way to exclude some URLs from being crawled/cached (in sq_fnb_idx)? We have a bunch of URLs for maintenance and backend stuff that never need to be crawled. I realise the searchable URLs are limited when you create collections, but these extra URLs really bloat sq_fnb_idx, and they would also make the rebuild_cache script take far longer.



Thanks




Good to see people are playing with Funnelback and enjoying it :wink:



Currently, the only way to keep extra URLs out of sq_fnb_idx is to remove all permissions from them (which is probably no use for what you want). It is something that could be looked at in future; filing a bug report or talking with your Squiz rep could get the ball rolling.



I should point out that after running the rebuild_cache script the first time, you should not need to run it frequently, because changes to an asset trigger a process that keeps this table up to date (similar to standard Matrix searching). It is more of a maintenance step after the initial run. You can also speed up the rebuild_cache script by breaking it into smaller sections (by selecting a root node, i.e. the site each collection is set on) and spreading those runs out for optimal performance.



Hope that helps

Thanks Ben, useful info. Patched in 3.26.3 and got it all running better now.


I'm also trying the integration with the 'Funnelback search server'. Is it compatible with Funnelback 8.3.0? I can see results being returned in processSearch() (using log_dump to debug), but then it crashes at loadXML($xml_results):


    [22-Apr-2010 14:35:53] 
    1: funnelback_xml_error_handler (/www/mysource_matrix/packages/funnelback/page_templates/funnelback_search_page/funnelback_search_page.inc line 2721): 
    XML ERROR: DOMDocument::loadXML() [function.DOMDocument-loadXML]: EntityRef: expecting ';' in Entity, line: 117. Error Number:2. Error File:/www/mysource_matrix/packages/funnelback/page_templates/funnelback_search_page/funnelback_search_page.inc. Error Line:642
    [2010-04-22 14:35:53][91217:andrew][512:mysource warning][R] (/core/include/locale_manager.inc:504) - Invalid response from the Funnelback server [FNB0002]




As far as I know we have only tested with Funnelback 9.0.



In your case specifically, the problem looks like Funnelback is returning invalid XML, which Matrix cannot process.
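For what it's worth, the `EntityRef: expecting ';'` message from DOMDocument::loadXML() usually means the response contains a bare `&` (for example in a URL query string like `?a=1&b=2`) that was not escaped as `&amp;`. Below is a minimal Python sketch for checking that theory against a saved copy of the raw response (e.g. what log_dump shows). It is not Matrix's PHP code and not part of Funnelback; the function names `report_parse_error` and `escape_bare_ampersands` are hypothetical helpers for illustration only.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Matches an '&' that does NOT begin a well-formed reference such as
# &amp; &lt; &#38; or &#x26; (hypothetical helper pattern).
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#\d+|#x[0-9A-Fa-f]+);)")


def report_parse_error(xml_text: str) -> Optional[str]:
    """Return 'line N: <source line>' for the first parse error, or None if it parses."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as err:
        line_no, _column = err.position  # line numbers are 1-based
        return f"line {line_no}: {xml_text.splitlines()[line_no - 1]}"


def escape_bare_ampersands(xml_text: str) -> str:
    """Stopgap: replace stray '&' characters with '&amp;' so the text parses."""
    return BARE_AMP.sub("&amp;", xml_text)


if __name__ == "__main__":
    raw = "<results>\n<url>http://example.com/?a=1&b=2</url>\n</results>"
    print(report_parse_error(raw))  # shows the offending line
    print(report_parse_error(escape_bare_ampersands(raw)))  # None once escaped
```

Escaping on the receiving side is only a workaround; the proper fix is for the search server to emit well-formed XML. Note also that named HTML entities such as `&nbsp;` are left untouched by the pattern and will still fail, since XML only predefines `&amp;`, `&lt;`, `&gt;`, `&quot;` and `&apos;`.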



Hope that helps,

Ben