Matrix Cache - Getting it Right


(Scott Hall) #1

Hey people, I personally know how bad it can be when the C word is dropped in a technical conversation: 99% of the eyes in the room glaze over. I know, I was fully glazed myself until about 6-12 months ago, but then things just started to make sense.


So many clients struggle with caching, and many end up pointing the finger at Matrix as the likely scapegoat, which in some respects it is… but the truth is, no one has been kind enough to tell the poor little highland goaty oat what to do!



Getting the basics set up in admin mode (I mean basics) is so easy (so easy) it hurts:





>>> Cache Manager:

  • Caching Status: On
  • Default Expiry: 1 hour
  • Browser Expiry: 1 hour
  • Public Level Caching: On



>>> System Configuration:

  • Send Cacheable Headers: Yes



    That's it! Suddenly Matrix is talking the talk and walking the walk: not only is it caching at an asset level wherever possible, it also tells anything else that may be requesting a page to cache as well, e.g. the browser, or a proxy if one exists. If it still isn't working after those changes then it's time to inspect the headers using something like YSlow or the Firebug Net tab for Firefox, which usually ends in a chat with your friendly sys admin to correct web server settings such as the server date/time.



    Just remember Matrix is the 'switch' in between all of this. Turn that little turkey on and the power will flow to the web server and in turn be passed to whatever knocks on its door for a chat. Flick the wrong switches and some rooms will be left in the dark :frowning: and those who stepped in for a chat will be left fumbling about trying to figure out exactly what to do.



    99% of the cases I have corrected have come down to this: Cache Manager is turned on but System Configuration still says 'No' for sending cacheable headers. If that setting says 'No' then your browser and proxy servers will hold on to little or no cache at all and Matrix becomes the 'little engine that couldn't'. If you don't have a proxy then it's all good, the browser can still do a big chunk of the hard yards, giving Matrix and the web server more time to be friendly.
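To make the header check a bit more concrete, here is a minimal Python sketch that takes a set of response headers (copied from whatever tool you use to inspect them) and gives a rough verdict on whether a browser or proxy is being told to cache. The function names and the verdict wording are my own, not anything Matrix provides, and the logic only looks at `Cache-Control` and `Expires`.

```python
def parse_cache_control(value):
    """Split a Cache-Control header into a dict of directive -> value (or None)."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def describe_cacheability(headers):
    """Return a short, rough verdict for a dict of response headers."""
    # Header names are case-insensitive, so normalise them first.
    lowered = {k.lower(): v for k, v in headers.items()}
    cc = parse_cache_control(lowered.get("cache-control", ""))

    if "no-store" in cc or "no-cache" in cc or "private" in cc:
        return "restricted: browser/proxy caching is limited or disabled"
    if cc.get("max-age") is not None and "public" in cc:
        return "cacheable by browser and proxy for %s seconds" % cc["max-age"]
    if cc.get("max-age") is not None:
        return "cacheable for %s seconds" % cc["max-age"]
    if "expires" in lowered:
        return "cacheable until %s" % lowered["expires"]
    return "no explicit caching headers"


if __name__ == "__main__":
    print(describe_cacheability({"Cache-Control": "max-age=3600, public"}))
```

If the verdict comes back as "no explicit caching headers" for a public page, the 'Send Cacheable Headers' setting is the first thing I would look at.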



    You might have noticed I set the expiry to 1 hour. 1-4 hours is a very good idea for a new roll out. Why? Because at least it's working, and it's not at an expiry level which will totally confuse people who are new to the product. Scale it up or down as necessary.



    What's even better than this? More settings and options of course! Once you have the above in place, proven, and working, start turning some of the other System Configuration options on as required. In many cases you probably won't even need to.



    What's even better than that!!! OMG right click on the Cache Manager in the asset map: you can go to town on 'hot spots' in your websites and tell those pages (and any assets they use) to expire every 10 mins if you wanted to. That saves your media/marketing team following steps ABC, XYZ, 123 to clear cache, which = small wins for the web team. How good would your peeps look if they 'suddenly' saved marketing 60 mins a day in CMS overheads!



    Of course there are question marks, but guess what… ask and you shall receive (get it, S.Hall, pretty funny yer?). Let's dispel the myths together.



    Who wants to go first? I know you all have your hands up.

(Nic Hubbard) #2

Currently we have our cache set to expire after 12 hours. Can you let me know what the pros and cons of this are? My assumption was that making it longer would always allow for faster page loads. Then I would just have to manually recache certain things (or have triggers doing that).


Thoughts on longer cache times?


(Dan Simmons) #3

[quote]
Thoughts on longer cache times?

[/quote]



I think as long as you understand caching well enough and know how to clear it easily (eg. /_recache) then you can get benefits from caching pages for longer.



How much you benefit does depend on the site though. If your pages are simple and load quickly anyway, then you get less benefit. But if your pages have many complex listings on them and you've got a longer page generation time (eg. 2 seconds), then increasing the cache time is going to decrease the number of cache misses people get.
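As a rough back-of-the-envelope illustration of that trade-off, the sketch below assumes a very simple model: steady traffic, one cache entry per page, and no manual recaching, so a page is regenerated at most once per expiry window. The 2 second generation time is the figure from the post; everything else is an assumption for illustration.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def max_regenerations_per_day(expiry_seconds):
    """Upper bound on cache misses per page per day under the simple model."""
    return math.ceil(SECONDS_PER_DAY / expiry_seconds)


def worst_case_generation_seconds(expiry_seconds, generation_seconds):
    """Total page generation time per page per day, worst case."""
    return max_regenerations_per_day(expiry_seconds) * generation_seconds


if __name__ == "__main__":
    for label, expiry in (("1 hour", 3600), ("12 hours", 12 * 3600)):
        misses = max_regenerations_per_day(expiry)
        cost = worst_case_generation_seconds(expiry, 2)
        print("%s expiry: up to %d misses, %d seconds generating" % (label, misses, cost))
```

Under those assumptions a 1 hour expiry means up to 24 slow loads per page per day, while 12 hours means up to 2, which is why the benefit grows with the page generation time.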


(Scott Hall) #4

I could see Nic's hand eagerly bouncing up and down from the front row…


Any uni site will benefit from longer cache times. In fact, 12 hours as the system-wide default is what I suggest for my uni clients: changes made during the day are visible by the time you come into work the next morning, or even earlier for those who hit the sites to study early in the morning.



In my books, Nic, that's spot on. Considering how advanced Matrix is with caching now, I would go out of my way to avoid triggers (of course they have their place); strategic customisation of assets in hot spots will save the day, e.g. home and landing pages (and any assets they rely upon) set to anywhere from 30 mins to 4 hours depending on traffic.



We have /_recache at our disposal now. After using it, remove the /_recache suffix from the URL and hard refresh in the browser so that fresh headers are requested for the proxy/browser.



Testing this at low traffic periods would be a good thing considering it will go and request a recache for applied design, any paint layouts, and any nested assets.


(Justin Avery) #5

It should also be noted that any static files that are being served from /__data/ will not be affected by the cache settings in Matrix.


You will need to set these either directly through Apache or via a .htaccess file.



Also, if you find that the response headers are showing no cache, make sure that you're not viewing the page in a browser you're logged into Matrix with.



Some tools to check the cache…


  • Firefox add-on - HTTP Headers
  • Firebug - Net tab - All - Headers (Expires Thu, 31 Dec 2037 23:55:55 GMT Cache-Control max-age=315360000)
  • http://www.webpagetest.org

(Scott Hall) #6

[quote]
Also, if you find that the response headers are showing no cache, make sure that you're not viewing the page in a browser you're logged into Matrix with.



Some tools to check the cache…


  • Firefox add-on - HTTP Headers
  • Firebug - Net tab - All - Headers (Expires Thu, 31 Dec 2037 23:55:55 GMT Cache-Control max-age=315360000)
  • http://www.webpagetest.org

    [/quote]



    Very good point. Everyone gets tripped up by trying to test their caching and not realising they are doing it from a logged-in session… when you do cache testing, have admin mode open in one browser and use another browser (not logged in) to test with.



    Another groovy web diagnostics tool is http://www.webpagetest.org, which helps translate some of what you might see in the Net tab/YSlow components view into human-readable chunks of info.

(Scott Hall) #7

[quote]
It should also be noted that any static files that are being served from /__data/ will not be affected by the cache settings in Matrix.



You will need to set these either directly through Apache or via a .htaccess file.



[/quote]



Justin, I sometimes struggle to find a good path to go down for non-Matrix-served URLs… are you talking about web server tweaks to get things like ETags sent for file assets?


(Duncan Robertson) #8

Good thread. This might sound really dumb but in the spirit of dispelling Myths:

  1. What's 'Squiz server' in regards to caching?
  2. What's the need of Squid or any other proxy server if MySource does this?

(Dan Simmons) #9

[quote]

  1. What's 'Squiz server' in regards to caching?

    [/quote]



    Squiz Server is a tool that runs HIPO jobs in the background on the server. It has nothing to do with caching and doesn't affect caching in any way.




[quote]

2) What's the need of Squid or any other proxy server if MySource does this?

[/quote]



Squid and Matrix caches are slightly different - Squid caches objects by URL, whereas Matrix caches in much more detail (eg. per URL, per asset, etc).



But probably the most important difference is that Squid is built as a web application accelerator, and hence is very very fast. It's written in C, caches in memory and on disk and can handle a huge amount of traffic to cached objects.



Matrix's cache storage is in the database (by default) so each cached request still requires a database connection to be opened, and PHP processing time.



Therefore, Matrix = good, Squid = faster, Matrix + Squid = a great team. :slight_smile:


(Duncan Robertson) #10

Dan,


Is there any configuration guide you could share with the community for setting up Squid and Matrix? I've looked a lot at the config examples (http://wiki.squid-cache.org/ConfigExamples/#General) and, to be honest, I just don't get it…


(Justin Avery) #11

[quote]
Justin, I sometimes struggle to find a good path to go down for non-Matrix-served URLs… are you talking about web server tweaks to get things like ETags sent for file assets?

[/quote]



If you take a look at a page served from Matrix with the cache set to 86400 (1 day) you receive the following headers (taken from RedBot)


    HTTP/1.1 200 OK
    Date: Thu, 10 Feb 2011 10:04:35 GMT
    Server: Apache/2.2.3 (CentOS)
    X-Powered-By: PHP/5.1.6
    Set-Cookie: SQ_SYSTEM_SESSION=3el8v4alnltkfnki9cp40aams1;
        domain=surfthedream.com.au; path=/;
    Expires: Fri, 11 Feb 2011 10:04:35 GMT
    Cache-Control: max-age=86400, public
    Pragma: cache
    Last-Modified: Mon, 07 Feb 2011 20:56:06 GMT
    Keep-Alive: timeout=7, max=10
    Connection: Keep-Alive
    Transfer-Encoding: chunked
    Content-Type: text/html; charset=iso-8859-1


If we then take a look at a static file originating from that design, the RedBot output looks like this:

    HTTP/1.1 200 OK
    Date: Thu, 10 Feb 2011 10:06:27 GMT
    Server: Apache/2.2.3 (CentOS)
    Last-Modified: Mon, 31 Jan 2011 14:10:51 GMT
    ETag: "382f6-2ae8-49b24fbb650c0"
    Accept-Ranges: bytes
    Content-Length: 10984
    Keep-Alive: timeout=7, max=10
    Connection: Keep-Alive
    Content-Type: text/css


In this case we have files being served on the /__data/ url which means that Matrix is not invoked and the file is served directly by the Apache server. This means that the file is not aware of the Matrix configuration of the cache policies and could be served fresh each time.
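To see the difference between the two responses in numbers, here is a small Python sketch that estimates how long a cache may treat each response as fresh. It follows the usual order of precedence (explicit `max-age`, then `Expires` minus `Date`, then a heuristic based on `Last-Modified`). The 10% heuristic is the typical value mentioned in the HTTP caching spec; real browsers and proxies may use something different, so treat the heuristic figure as an estimate only. The function name is my own.

```python
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers):
    """Return (seconds, source) for how long a response may be treated as fresh."""
    lowered = {k.lower(): v for k, v in headers.items()}

    # 1. An explicit max-age wins.
    for part in lowered.get("cache-control", "").split(","):
        name, _, arg = part.strip().partition("=")
        if name.lower() == "max-age" and arg.isdigit():
            return int(arg), "max-age"

    # 2. Otherwise use Expires minus Date.
    if "expires" in lowered and "date" in lowered:
        delta = parsedate_to_datetime(lowered["expires"]) - parsedate_to_datetime(lowered["date"])
        return max(int(delta.total_seconds()), 0), "expires"

    # 3. No explicit lifetime: assume the common 10% of (Date - Last-Modified).
    if "last-modified" in lowered and "date" in lowered:
        age = parsedate_to_datetime(lowered["date"]) - parsedate_to_datetime(lowered["last-modified"])
        return max(int(age.total_seconds()), 0) // 10, "heuristic"

    return 0, "none"


if __name__ == "__main__":
    page = {
        "Date": "Thu, 10 Feb 2011 10:04:35 GMT",
        "Expires": "Fri, 11 Feb 2011 10:04:35 GMT",
        "Cache-Control": "max-age=86400, public",
    }
    css = {
        "Date": "Thu, 10 Feb 2011 10:06:27 GMT",
        "Last-Modified": "Mon, 31 Jan 2011 14:10:51 GMT",
    }
    print(freshness_lifetime(page))
    print(freshness_lifetime(css))
```

The Matrix page gets an explicit lifetime, while the CSS file only gets whatever the cache decides to guess, which is why setting explicit expiry for static files is worth the effort.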

To avoid this we can either serve the file through Matrix (not recommended as there is no point in serving static files from a dynamic CMS), or configure Apache to cache the static file types.

To make it easier to see what is happening, and also to stay out of the server configuration if you're not comfortable there, you can create a .htaccess file with the following contents and save it to the /mysource_matrix/ directory within the mysource matrix install (please read this before deciding to use .htaccess http://httpd.apache.org/docs/1.3/howto/htaccess.html#when)

    
    ExpiresActive On 
    ExpiresByType text/css A31536000 
    ExpiresByType application/x-javascript A31536000 
    ExpiresByType text/html A7200
    ExpiresByType text/richtext A7200 
    ExpiresByType image/svg+xml A7200 
    ExpiresByType text/plain A7200 
    ExpiresByType text/xsd A7200
    ExpiresByType text/xsl A7200 
    ExpiresByType text/xml A7200 
    ExpiresByType video/asf A31536000 
    ExpiresByType video/avi A31536000 
    ExpiresByType image/bmp A31536000 
    ExpiresByType application/java A31536000 
    ExpiresByType video/divx A31536000 
    ExpiresByType application/msword A31536000
    ExpiresByType application/x-msdownload A31536000 
    ExpiresByType image/gif A31536000 
    ExpiresByType application/x-gzip A31536000 
    ExpiresByType image/x-icon A31536000 
    ExpiresByType image/jpeg A31536000 
    ExpiresByType application/vnd.ms-access A31536000 
    ExpiresByType audio/midi A31536000 
    ExpiresByType video/quicktime A31536000 
    ExpiresByType audio/mpeg A31536000 
    ExpiresByType video/mp4 A31536000 
    ExpiresByType video/mpeg A31536000 
    ExpiresByType audio/ogg A31536000 
    ExpiresByType application/pdf A31536000 
    ExpiresByType image/png A31536000 
    ExpiresByType application/vnd.ms-powerpoint A31536000 
    ExpiresByType audio/x-realaudio A31536000 
    ExpiresByType application/x-shockwave-flash A31536000
    ExpiresByType application/x-tar A31536000 
    ExpiresByType image/tiff A31536000 
    ExpiresByType audio/wav A31536000 
    ExpiresByType audio/wma A31536000 
    ExpiresByType application/zip A31536000 


Please note that you should be setting this in Apache. If you make a mistake with the .htaccess file you could stop your site from appearing - which kind of defeats the purpose of cache :) Work out your cache policies on static files and then send them to your System Administrator to apply.
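If the `A31536000` style values in the rules above look cryptic: in mod_expires the letter is the base time (`A` for the client's access time, `M` for the file's modification time) and the number is seconds. A small Python sketch to translate them into plain words when you are working out your policies (the helper name is mine):

```python
def describe_expiry(code):
    """Translate a mod_expires shorthand such as 'A7200' into plain words."""
    base = {
        "A": "after the client's access",
        "M": "after the file was last modified",
    }[code[0]]
    seconds = int(code[1:])
    # Pick the largest unit that divides the value exactly.
    for unit, size in (("day", 86400), ("hour", 3600), ("minute", 60)):
        if seconds >= size and seconds % size == 0:
            count = seconds // size
            return "%d %s%s %s" % (count, unit, "" if count == 1 else "s", base)
    return "%d seconds %s" % (seconds, base)


if __name__ == "__main__":
    for code in ("A31536000", "A7200"):
        print(code, "=", describe_expiry(code))
```

So the rules above give CSS, JavaScript, images and other media a year, and HTML, XML and plain text two hours.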

(Dan Simmons) #12

Sure. Attached is a Squid configuration to get you started. At the bare minimum, it only requires 3-4 changes to get going.


You need to move your Apache VirtualHosts onto port 81 (ie. change “Listen 80” to “Listen 81” in Apache and change your NameVirtualHost and VirtualHost directives from 80 to 81).



From there:



%PUBLIC_NAME% - change to the hostname of your server (eg. “www.example.com”)

%DISK_CACHE_MB% - the size of disk space Squid will consume. Only needs to be as big as your fresh cache, which is typically a few hundred MBs. I usually just set this to 1024 (1GB) to get started.

%TARGET_WEBSERVER% - the interface on which Apache is listening for requests to Matrix. This is probably going to be the IP address of your server, or “127.0.0.1”.

%TARGET_WEBSERVER_PORT% - “81” once you’ve moved your Apache vhosts to listen on port 81.



Switching Squid on to a live system requires extreme care and some sysadmin skill, so I’d recommend generally that you test this somewhere first.
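For anyone who wants to script the placeholder substitution, here is a minimal Python sketch. Note that the three template lines are NOT the contents of the attached squid.conf.txt; they are just typical Squid reverse-proxy directives that I am assuming for illustration, and the cache directory path is also an assumption. Only the four placeholder names come from the post above.

```python
import re

# Hypothetical template fragment for illustration only; use the attached
# squid.conf.txt as your real starting point.
TEMPLATE = """\
http_port 80 accel defaultsite=%PUBLIC_NAME%
cache_peer %TARGET_WEBSERVER% parent %TARGET_WEBSERVER_PORT% 0 no-query originserver
cache_dir ufs /var/spool/squid %DISK_CACHE_MB% 16 256
"""

PLACEHOLDER = re.compile(r"%([A-Z_]+)%")


def fill_placeholders(template, values):
    """Replace each %NAME% placeholder, failing loudly if a value is missing."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError("no value supplied for %" + name + "%")
        return str(values[name])

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    print(fill_placeholders(TEMPLATE, {
        "PUBLIC_NAME": "www.example.com",
        "DISK_CACHE_MB": 1024,
        "TARGET_WEBSERVER": "127.0.0.1",
        "TARGET_WEBSERVER_PORT": 81,
    }))
```

Failing on a missing value is deliberate: a half-filled config is the last thing you want to load on a live proxy.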



Sorry I can’t spend more time creating a full-blown guide right now, but I hope this points you in the right direction :slight_smile:
squid.conf.txt (983 Bytes)


(Duncan Robertson) #13

OMFW. Thanks.


(Nic Hubbard) #14

Can you explain the best options for the rest of the HTTP Headers Settings, such as last-modified and no-cache for files?


(Vcurd) #15

A question I have is about the caching of design areas. It seems most design areas can and should be cached but I have found it difficult figuring out when one should not be cached. For example it seems appropriate to cache a nested content area which nests an asset listing with a static root node, but what about if the asset listing has a dynamic root node of current asset? What about the caching of the body design area on asset builders and account manager pages? Is there a generally applicable rule or principle to working this out?


(Scott Hall) #16

[quote]
A question I have is about the caching of design areas. It seems most design areas can and should be cached but I have found it difficult figuring out when one should not be cached. For example it seems appropriate to cache a nested content area which nests an asset listing with a static root node, but what about if the asset listing has a dynamic root node of current asset? What about the caching of the body design area on asset builders and account manager pages? Is there a generally applicable rule or principle to working this out?

[/quote]



It's good practice to apply cache=1 to design area tags within your parse file before you start using it and customising it. My rule of thumb is to apply cache=1 to everything except the BODY design area and MENU/MENU SUBS.



Caching the body design area would = small win for a brochure (static) site that changes once in a blue moon, but for anything else it is a bad idea; otherwise you will get headaches when trying to view asset builders, account managers, or forms at their URLs. I believe it caches the body copies in these instances (someone please correct me if I am wrong), which causes dramas.



The menu design area and its subs do not cache, so there is no use applying cache=1 to these as a tag attribute or in their customisations.



If you apply Cache Manager customisations to an asset, just remember to apply the same level of customisation to any other assets it nests, whether that is a direct nest or via a paint layout or design.



Example:



I want page X to expire cache in 1 hour; the system-wide default is currently 12 hours. I create a Cache Manager 'root node' customisation for page X and set it to apply to everything under that page as well. I go to view changes as a public user after an hour and only half the changes have shown up!?!? Whoops, I just realised that half the content on page X is sourced from nested page Y, and page Y is not a child of page X! I go back to the 'root node' customisations and add another 1 hour expiry for page Y. Now my page updates every hour :).
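Here is the same example as a toy Python model. It is only a sketch of the reasoning, not how Matrix resolves expiry internally: I am assuming a root node rule applies to an asset and everything beneath it, and that a page is only fully up to date once its slowest nested component has expired. The asset names and the tree are hypothetical.

```python
DEFAULT_EXPIRY = 12 * 3600  # system-wide default from the example: 12 hours

# Hypothetical asset tree, child -> parent. Page Y is NOT under page X.
PARENT = {
    "site": None,
    "page_x": "site",
    "page_x_child": "page_x",
    "page_y": "site",
}


def effective_expiry(asset, root_node_rules):
    """Walk up the lineage and use the nearest root node rule, else the default."""
    current = asset
    while current is not None:
        if current in root_node_rules:
            return root_node_rules[current]
        current = PARENT.get(current)
    return DEFAULT_EXPIRY


def page_refresh_interval(page, nested_assets, root_node_rules):
    """The page is only fully fresh once its slowest component has expired."""
    return max(effective_expiry(a, root_node_rules) for a in [page] + nested_assets)


if __name__ == "__main__":
    rules = {"page_x": 3600}
    print(page_refresh_interval("page_x", ["page_y"], rules))  # page Y still on the default
    rules["page_y"] = 3600
    print(page_refresh_interval("page_x", ["page_y"], rules))  # both now on 1 hour
```

The first call comes out at the 12 hour default because page Y drags the whole page back; adding the second rule brings it down to the hour.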



If you have an asset listing being nested in using dynamic root node, you have to think about: the page it is nested into, the asset listing itself, and the assets it sources content from. Matrix will create cache for each asset setting expiry times based on system defaults or cache manager overrides if they exist, and if they are viewed at their individual URLs you have to then think about browser/proxy.


(Dan Simmons) #17

I can't think of a real world case where you ever want to cache the body design area.


AFAIK, pages/bodycopies do their own caching anyway, so that the asset contents of a standard page shown through a body design area WILL be cached (even though the body design area is not cached).



(On the other hand, caching the body design area will break stuff that is meant to be dynamic. For example, Custom Forms will easily break when displayed using a cached body area.)



It gets quite complicated quite fast, so without delving into the code and going into an insane amount of detail, rule of thumb is never cache the body design area.


(Vcurd) #18

Okay, so just to be clear, I can set cache="yes" on the following design areas without things going awry?


  • Accesshistory
  • Assetlineage
  • Calendar
  • Constant
  • Datetime
  • Declaredvars
  • Head
  • Login
  • Metadata
  • Nestcontent
  • Requestvars
  • Searchbox
  • Showif
  • Exit


But I shouldn't cache the following design areas?

  • Body
  • Menunormal
  • Menustalks

(Scott Hall) #19

[quote]
Okay, so just to be clear, I can set cache="yes" on the following design areas without things going awry?

  • Accesshistory
  • Assetlineage
  • Calendar
  • Constant
  • Datetime
  • Declaredvars
  • Head
  • Login
  • Metadata
  • Nestcontent
  • Requestvars
  • Searchbox
  • Showif
  • Exit


But I shouldn't cache the following design areas?

  • Body
  • Menunormal
  • Menustalks
[/quote]

Excellent questions, let's get a dev to tick them off... loading *.*

(Edison Wang) #20

I have to correct the above list a bit.
The following design areas can't be cached (even if you specify cache=yes, it won't work anyway):



design_area_login_form.inc

design_area_menu_type.inc

design_area_body.inc

design_area_exit.inc

design_area_show_if.inc



The rest of them can be cached if you choose to.
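If you want to sanity-check a parse file's cache settings against that list, a tiny Python sketch like this would do. The set of never-cached types is taken from the file names above (with the `design_area_` prefix and `.inc` suffix dropped); the other type names in the usage example are assumed for illustration, and the function is my own, not part of Matrix.

```python
# Design area types that ignore cache=yes, from the file names listed above.
NEVER_CACHED = {"login_form", "menu_type", "body", "exit", "show_if"}


def split_by_cacheability(design_area_types):
    """Return (types where cache=yes can take effect, types where it is ignored)."""
    cacheable = sorted(t for t in design_area_types if t not in NEVER_CACHED)
    ignored = sorted(t for t in design_area_types if t in NEVER_CACHED)
    return cacheable, ignored


if __name__ == "__main__":
    # Hypothetical list of types marked cache=yes in a parse file.
    marked = ["head", "metadata", "nest_content", "body", "show_if"]
    cacheable, ignored = split_by_cacheability(marked)
    print("cache=yes can take effect on:", cacheable)
    print("cache=yes is ignored on:", ignored)
```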

Actually, Labs is working on a series of performance tests. One of them will compare cache on/off for various design areas. Hopefully it will help implementers decide which design areas they really should turn caching on for.