Caching problems on our website


#1

So we have a URL like this

http://www.xyyx.com.au/my-page/?utm_source=Marketo&utm_medium=EDM&utm_campaign=generic&mkt_tok=xxxx-A-VERY-LONG-UNIQUE-TOKEN-xxxxxx

The issue is that the uncached version of this page takes about 20 seconds to load. I know it’s slow … we are trying to speed it up.

My question is: is it possible to still serve this URL from cache? The parameters are just tracking parameters and don’t play any role in rendering the page.

We are planning a campaign to drive traffic to our website, which may bring up to 20-25K visitors at the same time. The concern is that if everyone requests a non-cached version, the server will crash immediately.

I know there is a solution - I just don’t know what it is …

Any help is very appreciated.

Thanks


(Marcus Fong) #2

There are a couple of levels of caching we have to distinguish between here.

Matrix has its own application-level cache, which should still work regardless of your query parameters. The problem there is that Matrix caching on its own probably won’t be enough to handle thousands of visitors hitting your site in a very short period of time.

For really high-traffic sites you need a caching reverse proxy in front of Matrix. However, these proxies store cache objects by URL, including all query parameters, so changing the query parameters results in a cache miss and an uncached request to Matrix. Is that the setup you have? If so, you’ll need to find a way to drop the query parameters at the proxy level - the best approach will vary depending on the capabilities of your reverse proxy.
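
To illustrate the idea, here’s a rough Python sketch of the normalisation you’d want the proxy to perform - collapsing every campaign variant of a URL onto a single cache key by dropping the tracking parameters. (This assumes utm_* and mkt_tok are the only parameters you need to strip; real proxies express this in their own configuration language, but the logic is the same.)

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameters that only matter for analytics, not for rendering the page.
# Adjust these to match whatever your campaigns actually append.
TRACKING_PARAM_PREFIXES = ("utm_",)
TRACKING_PARAMS = {"mkt_tok"}

def normalise_cache_key(url: str) -> str:
    """Return the URL with tracking parameters removed, so all campaign
    variants of a page share a single cache entry."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(query, keep_blank_values=True)
        if name not in TRACKING_PARAMS
        and not name.lower().startswith(TRACKING_PARAM_PREFIXES)
    ]
    return urlunsplit((scheme, netloc, path, urlencode(kept), fragment))

# Example:
# normalise_cache_key("http://www.xyyx.com.au/my-page/?utm_source=Marketo&mkt_tok=abc")
# -> "http://www.xyyx.com.au/my-page/"
```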

If you’re a Squiz client, I’d definitely recommend giving your account manager a call so we can look at your system. Caching is all about getting the details right and testing things properly, which is much easier to do directly than over a forum post.


#3

Thanks, Fong. We use Squid for caching, but it doesn’t help here because the URL is unique each time due to the unique token in the query string. I’m just wondering how others are managing this scenario. There is generally a heavy load of traffic after an email campaign, and we need the tracking to measure the campaign’s effectiveness. I was just looking for some best practices for cache management in situations like this.


(Marcus Fong) #4

It’s a bit tricky with Squid, since it doesn’t have built-in URL rewriting capabilities.

The three ways I know of to handle this are:

  • Create your own URL rewriter program/script, then use Squid’s url_rewrite_program and url_rewrite_children directives with it to strip the relevant query parameters (see the sketch below).
  • Put something with better URL rewriting abilities in front of Squid and use it to strip the relevant query parameters.
  • Switch from Squid to something with better URL rewriting abilities and use it to strip the relevant query parameters.

All three options have different pros and cons, so the best choice will depend on your circumstances.
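
To make the first option concrete, here’s a minimal, untested sketch of the sort of helper you could wire up with url_rewrite_program. Note that the reply format depends on your Squid version - older helpers simply echo the rewritten URL (or a blank line for “no change”), while Squid 3.4+ expects an “OK rewrite-url=…” style response - so check the helper protocol documentation for the version you’re running.

```python
#!/usr/bin/env python3
"""Minimal Squid URL-rewrite helper sketch (untested).

Assumes url_rewrite_children is configured without concurrency, so each
input line starts with the request URL. Older Squid helpers reply with
the rewritten URL (or a blank line for "no change"); Squid 3.4+ expects
"OK rewrite-url=..." / "ERR" instead - adjust the reply for your version.
"""
import sys
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

TRACKING = {"mkt_tok"}          # exact parameter names to drop
TRACKING_PREFIXES = ("utm_",)   # parameter name prefixes to drop

def strip_tracking(url: str) -> str:
    """Remove tracking parameters so campaign URLs share one cache key."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(query, keep_blank_values=True)
            if k not in TRACKING and not k.lower().startswith(TRACKING_PREFIXES)]
    return urlunsplit((scheme, netloc, path, urlencode(kept), fragment))

def main() -> None:
    for line in sys.stdin:
        fields = line.split()
        if not fields:
            continue
        url = fields[0]
        rewritten = strip_tracking(url)
        # Blank reply means "leave the URL alone" on older Squid versions.
        sys.stdout.write((rewritten if rewritten != url else "") + "\n")
        sys.stdout.flush()

if __name__ == "__main__":
    main()
```

You’d point Squid at the script with url_rewrite_program, give it a sensible number of url_rewrite_children, and then test carefully on a staging system to make sure only the tracking parameters are being dropped before the campaign goes out.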