SEO-friendly URLs under a custom asset


(Martin) #1

I have a custom asset which currently works off URLs of this form:


www.site.com/customasset?dbid=1234



This grabs data and formats it from an external database.



I'd much rather be using URLs like this:



www.site.com/customasset/1234



which means "grabbing" the URL at the customasset level so I can parse appropriately.



Not knowing quite how MSM parses down the URL path, have I any hope of coding up such a customasset? I'm quite willing to poke around the source code, but would rather not if the journey is quixotic :slight_smile:



Using 3.18.3 GPL.



thanks, martin


(Avi Miller) #2

This should be fairly straightforward. Take a look at the Multiple Page asset, as it does something similar to generate URLs for its Multiple Page Page children.

(Greg Sherwood) #3

Actually, this is not going to be easy. Multiple page works because it registers URLs for each of its children. So each child has a valid URL stored in the Matrix DB. I am assuming you have too many IDs to create valid URLs for each one.


When Matrix prints a page, it looks in a DB table to find the asset with that specific URL. If nothing is found, it prints the 404 page of the site.



The only way I can see this working is if your custom asset extends the Site asset and overrides the part where the 404 is printed. Instead of printing a not found page, it would first have to look in your external DB to see if there is actually some content to print. If not, it could then revert to the 404 page.
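
To make that order of checks concrete, here is a rough sketch in plain PHP. It is not Matrix code: resolve_path(), the $registered_urls array (standing in for the Matrix URL table) and the $external_lookup callback (standing in for the external DB query) are all made-up names for illustration.

```php
<?php
// Hypothetical stand-ins only: none of these names come from the Matrix source.

/**
 * Resolve a web path to something printable.
 *
 * @param string   $path            requested web path, e.g. "/customasset/1234"
 * @param array    $registered_urls map of registered path => content
 * @param callable $external_lookup takes a path, returns a string or null
 *
 * @return array with 'status' and 'body' keys
 */
function resolve_path($path, array $registered_urls, $external_lookup)
{
    // 1. Normal case: the URL is registered, so print that asset.
    if (isset($registered_urls[$path])) {
        return array('status' => 200, 'body' => $registered_urls[$path]);
    }

    // 2. Not registered: ask the external database before giving up.
    $body = call_user_func($external_lookup, $path);
    if ($body !== null) {
        return array('status' => 200, 'body' => $body);
    }

    // 3. Still nothing: revert to the site's 404 page.
    return array('status' => 404, 'body' => 'Not Found');
}

// Usage: the external "database" here only knows about ID 1234.
$registered = array('/customasset' => 'listing page');
$lookup = function ($path) {
    return $path === '/customasset/1234' ? 'record 1234' : null;
};
$result = resolve_path('/customasset/1234', $registered, $lookup);
```

The point is only the ordering: the registered URL wins, the external lookup comes second, and the 404 stays as the last resort.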


(Martin) #4

Thanks both! I'll look into the two suggestions; I may actually be able to use both concepts (as ever, the reality is a little more complicated than my example). Picking up on the 404 makes a lot of sense.


cheers, martin


(Martin) #5

Hmmm. I've followed through the code in include/mysource.inc, and it seems that if the URL is not found then the 404 page for the site is output using _paintNotFoundAsset. That unconditionally outputs a 404 header, then the 404 asset itself, which makes it a bit tricky to sneak in a normal-looking page via a custom 404 handler.


I've looked at the Site asset and can't immediately see where I could subclass it to catch 404s and hence handle the special cases I have in mind. Site asset doesn't appear to do 404 handling at all. It does handle returning the 404 special page, but as above, this is then printed in mysource.inc.



I may have to consider registering a lot of URLs, which leads to issues of maintaining integrity between the database (updated outside of MSM) and MSM. Life, eh?



cheers, martin


(Greg Sherwood) #6

You should be able to send a 200 OK header after mysource.inc has sent the 404 Not Found header. Just call header() again and pass the second parameter to replace the previous header: http://au.php.net/header
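
Something along these lines, as a minimal sketch in plain PHP rather than actual Matrix code. It assumes nothing has been printed yet, since headers cannot be changed once output has started.

```php
<?php
// Earlier in the request (in Matrix this is done by mysource.inc):
header('HTTP/1.1 404 Not Found');

// Later, before any output has been sent, replace the status line.
// The second argument (replace) defaults to true anyway; the third
// argument sets the response code explicitly.
if (!headers_sent()) {
    header('HTTP/1.1 200 OK', true, 200);
}
```

The headers_sent() guard is just a precaution: calling header() after output has begun raises a warning and leaves the 404 in place.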


(Martin) #7

It works! :smiley: Thanks for pointing me in the right direction.


I knocked up a simple custom page for use as a Not Found special page. It can be configured to watch for specific web path roots and hand these off to other nominated custom pages that can dig into the database and print whatever is needed (based on the rest of the web path).



Example paths when watching a web path root of "/doldex", where the database has a table of entries:



www.site.com/doldex/

  • takes you to a custom asset that prints all the names of entries in the database



www.site.com/doldex/entries

  • takes you to the custom asset that prints all rows in the entries table (doesn't invoke the 404)



www.site.com/doldex/entry/fred-bloggs

  • goes via the custom 404 handler to a custom asset that prints the row of that name



Actually, one custom asset page handles all the variations to avoid an explosion of types and code.



Nothing very pretty, but will happily share if PMed.



cheers, martin

(Greg Sherwood) #8

Good work. I hadn't ever thought of doing that before, so it's nice to know that a site-based asset can do that.


(Martin) #9

Apologies if I wasn't clear (brain fried by this point). There is no subclass of Site involved. I subclassed Page_Standard for the 404 handler. In fact, the code is sufficiently short, here's the guts of it:

    <?php
    
    	// include the parent type, the standard page asset
    	require_once SQ_CORE_PACKAGE_PATH.'/page_templates/page_standard/page_standard.inc';
    
    	class DOL_404 extends Page_Standard
    	{
    		var $_traphandler_page;
    		var $_trap_path;
    
    			/**
    			* Constructor
    			*
    			* @param int	$assetid		the asset id to be loaded
    			*
    			*/
    			function __construct($assetid=0)
    			{
    				return parent::__construct($assetid);
    
    			}//end constructor
    
    		public function load($assetid)
    		{
    				parent::load($assetid);
    
    				$trapconfig = $this->attr('trapconfig');
    				if (strlen($trapconfig) == 0) {
    						return;
    				}
    
    				$am = $GLOBALS['SQ_SYSTEM']->am;
    				$url = strip_url(current_url(false, false), true);
    				$current_site = $am->getRootUrl(current_url(FALSE, TRUE));
    				$site_url = $current_site['url'];
    				$root = substr($url, strlen($site_url));
    
    				$traps = explode(';', $trapconfig);
    				array_pop($traps);	  // last element is empty
    				foreach ($traps as $trap) {
    						list($traproot, $trapassetid) = explode('=', $trap);
    						// quote the root so any regex metacharacters in it are matched literally
    						if (preg_match(':(^'.preg_quote($traproot, ':').'/)(.*):', $root, $matches) > 0) {
    								$page = $am->getAsset($trapassetid);
    								$this->_trap_path = $matches[2];
    								$this->name = $page->trapPageName($this->_trap_path);
    								$this->_traphandler_page = $page;
    						}
    				}
    				// Otherwise, no trap handler matches, so 404 as usual.
    		}
    
    		/**
    		* Prints out the frontend for this asset
    		*
    		* @return void
    		* @access public
    		*/
    		public function printFrontend()
    		{
    				if (!$this->readAccess()) {
    						return;
    				}
    				parent::printFrontend();
    
    		}//end printFrontend()
    
    
    		/**
    		* Called by the design to print the body of this asset
    		*
    		* @param array  $keyword_replacements   some replacements for keywords in the content
    		*
    		* @return void
    		* @access public
    		*/
    		public function printBody()
    		{
    				if (!$this->readAccess()) {
    						return;
    				}
    				if (!empty($this->_traphandler_page)) {
    						// mysource.inc will have set a 404 header already, override this.
    						header('HTTP/1.1 200 OK');
    						$this->_traphandler_page->printTrapBody($this->_trap_path);
    				}
    				else {
    						// No handler, print 404 body as usual.
    						parent::printBody();
    				}
    
    		}//end printBody()
    
    	}//end class
    
    ?>

There's a single, quick'n'dirty custom attribute "trapconfig" which holds a sequence of "traproot=assetid;" (I was too lazy to work out how to have an array of attribute values :slight_smile: ). Example value would be "/doldex=799;".
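
For anyone who wants to play with the format outside Matrix, here is a standalone sketch of parsing that string and matching a web path against it. parse_trapconfig() and match_trap() are hypothetical helper names, not part of the class above or of Matrix, and unlike the regex in load() this one uses a plain prefix comparison and stops at the first match.

```php
<?php
/**
 * Parse a trapconfig string such as "/doldex=799;/other=812;"
 * into an array of web path root => asset ID.
 */
function parse_trapconfig($trapconfig)
{
    $traps = array();
    foreach (explode(';', $trapconfig) as $entry) {
        $entry = trim($entry);
        // Skip the empty trailing element and anything without an "=".
        if ($entry === '' || strpos($entry, '=') === false) {
            continue;
        }
        list($root, $assetid) = explode('=', $entry, 2);
        $traps[$root] = $assetid;
    }
    return $traps;
}

/**
 * Return array(asset ID, remaining path) for the first trap root
 * that prefixes $path, or null when no trap applies.
 */
function match_trap($path, array $traps)
{
    foreach ($traps as $root => $assetid) {
        $prefix = $root.'/';
        if (strncmp($path, $prefix, strlen($prefix)) === 0) {
            return array($assetid, substr($path, strlen($prefix)));
        }
    }
    return null;
}

// Usage with the example value from this post.
$traps = parse_trapconfig('/doldex=799;');
$match = match_trap('/doldex/entry/fred-bloggs', $traps);
```

Here $match holds the asset ID '799' and the remaining path 'entry/fred-bloggs', which is what gets handed on to the nominated page.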



Part of the complexity is allowing the asset that actually prints the page to provide a page name early enough in the overall printing process. The custom asset registered in "trapconfig" needs two public methods:

    trapPageName($trappath) -- returns string to be used as page name
    printTrapBody($trappath) -- prints the page body

where $trappath is what remains of the URI after traproot is removed. I use a separate print method rather than rely on printBody because in my case the custom asset doing the printing is also a page in its own right. More laziness :slight_smile:
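
To show the shape of those two methods, here is a bare-bones sketch. It is not my real handler: Example_Trap_Handler is a made-up plain class (the real one would be a Matrix asset type), and the in-memory $entries array stands in for the external database.

```php
<?php
// Hypothetical handler for illustration only.
class Example_Trap_Handler
{
    // slug => row, standing in for the external database table
    private $entries;

    public function __construct(array $entries)
    {
        $this->entries = $entries;
    }

    // Returns the string to be used as the page name for this trap path.
    public function trapPageName($trappath)
    {
        $slug = basename($trappath);
        return isset($this->entries[$slug]) ? $this->entries[$slug]['name'] : 'Not Found';
    }

    // Prints the page body for this trap path.
    public function printTrapBody($trappath)
    {
        $slug = basename($trappath);
        if (!isset($this->entries[$slug])) {
            echo '<p>No such entry.</p>';
            return;
        }
        echo '<h1>'.htmlspecialchars($this->entries[$slug]['name']).'</h1>';
    }
}

// Usage: the trap path is whatever followed "/doldex/" in the URL.
$handler = new Example_Trap_Handler(array(
    'fred-bloggs' => array('name' => 'Fred Bloggs'),
));
$page_name = $handler->trapPageName('entry/fred-bloggs');
```

A real handler would also need to decide what to do when the row is missing; in this sketch it just prints a short message, but leaving the 404 header alone in that case would be more honest.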



cheers, martin