Indexing of asset listings


(Rhulse) #1

I thought I'd start a discussion here on this topic first, before thinking about other ways of solving a problem.


The problem is that we put a lot of static content on asset listing pages to supplement the listings. The downside is that this content is not indexed.



What is the feasibility of indexing asset listings, excluding any %tags% in the text and obviously not resolving the contents of the list into the index?



I have considered nesting the asset listings in standard pages, but I am looking to extend the current archive, and this solution is not really viable from a management point of view over a longer period and as the asset count increases (which it will, quickly).



Richard


(Greg Sherwood) #2

Firstly, the reason listings have never been indexed is that they are designed to be dynamic. That's the same for a few other assets (like asset builders) where the content you see depends on who you are and what state the asset is in. If we indexed content on the No Results bodycopy (or even the type formats) for an asset listing, that page would appear in search results for those words, but they are only seen by select users, if ever.


That is what makes turning indexing on hard. What do we index? We can't index everything because then you get problems like the one above. So do you just index the main bodycopy for the listing? We can certainly do that, but will the content make sense? If you have some text like "You selected news items under %asset_name%. Use the sort drop-down to change the sort order" will it make sense for it to appear in your searches? Is this text actually content, or is it just instructional?
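To make the "main bodycopy only, minus %tags%" option concrete, here is a minimal Python sketch of what the indexable text might look like. It is not how Matrix's indexer works; the `strip_keywords` helper and the keyword pattern (letters, digits and underscores between two percent signs) are assumptions for illustration only.

```python
import re

# Assumed keyword shape: %name% with letters, digits and underscores only.
# Real keyword replacements may allow more characters than this.
KEYWORD_PATTERN = re.compile(r"%[A-Za-z0-9_]+%")


def strip_keywords(text):
    """Remove %keyword% placeholders and collapse leftover whitespace."""
    without_keywords = KEYWORD_PATTERN.sub(" ", text)
    return " ".join(without_keywords.split())


sample = (
    "You selected news items under %asset_name%. "
    "Use the sort drop-down to change the sort order"
)
print(strip_keywords(sample))
```

Even with the keyword removed, what is left is still instructional text rather than content, which is the point being made above.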



The problem is that the asset listings you use contain a great deal of valuable content. The listing component is only half the content in your case. Turning indexing on for listings where this is not the case will likely result in new, unrelated search results appearing for some clients, and that is what is stopping us from just turning it on. If our existing clients now have to reevaluate their search results and potentially add new search indexing rules based on specific asset IDs and tree positions, we could be in for one hard upgrade, including a full system reindex.


(Rhulse) #3

Thanks Greg,


I'll ask these questions here, as you devs can probably give me a definitive response.



If we were to change all these pages to standard pages with nested asset listings, what would be the performance impact?



I'd also be worried about the Matrix caches for the two not clearing at the same time, so potentially meaning that pages would (worst case) not update for cache_time_for_page + cache_time_for_asset_list due to overlapping cache cycle times.


(Greg Sherwood) #4

An additional asset will be loaded, but the listing is where all the page generation time is going, so you wouldn't notice any drop in performance, although there would be a few more queries for uncached pages (around 5-10).


[quote]I'd also be worried about the Matrix caches for the two not clearing at the same time, so potentially meaning that pages would (worst case) not update for cache_time_for_page + cache_time_for_asset_list due to overlapping cache cycle times.[/quote]

If your standard pages and asset listings have the same cache timeout, both assets would have the same expiry time, as they are generated at the same time. Unless, of course, you view the asset listing both nested and stand-alone, in which case you could get those pages cached at different times. The real problem is when you clear the cache manually. You need to remember to clear the cache for both assets, or clear the cache for the standard page and its children, assuming the listing is a child of the standard page.
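
The timing argument can be checked with a small model. This is a simplified sketch, not Matrix's actual cache manager: it assumes each asset's output is cached once for a fixed timeout, and that a page rendered while the listing cache is still valid reuses the cached listing output. The function name and parameters are hypothetical.

```python
def max_listing_age(listing_cached_at, page_cached_at, listing_ttl, page_ttl):
    """Return the oldest the listing output can be while the page cache lives.

    All values are in seconds. Assumes page_cached_at >= listing_cached_at.
    """
    listing_expires_at = listing_cached_at + listing_ttl
    if page_cached_at < listing_expires_at:
        # The page render reused the listing output already in the cache.
        generated_at = listing_cached_at
    else:
        # The listing cache had expired, so it was regenerated with the page.
        generated_at = page_cached_at
    page_expires_at = page_cached_at + page_ttl
    return page_expires_at - generated_at


# Both cached at the same moment with the same timeout: one cache period.
print(max_listing_age(0, 0, 3600, 3600))

# Listing viewed stand-alone first, page cached just before it expires:
# the age approaches the sum of the two timeouts.
print(max_listing_age(0, 3599, 3600, 3600))
```

Under these assumptions, the combined delay only shows up when the listing has been cached separately from the page, which matches the stand-alone viewing case described above.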