[Beginner] Making a robots.txt file


(Tavernerj) #1

This tutorial will teach you how to put together a robots.txt file that you can use both to show all search engine spiders where you put your sitemap.xml file (thanks Duncan :wink: ) and to stop spiders from indexing those directories that you don’t want them to.


[b]1)[/b] Begin by making a new Design. Call it something like ‘Blank Design’ (this can be useful elsewhere too).

Now add the following code to the parse file and save:


    


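For what it’s worth, a blank design’s parse file needs little more than a body design area, so the page contents are printed with no surrounding markup. This is just a sketch assuming the usual Squiz Matrix MySource_AREA design tags:

    <!-- print the asset's body content and nothing else -->
    <MySource_AREA id_name="body" design_area="body" />
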
[b]2)[/b] Then create a standard page at the highest level (top) of your site (name it something sensible; it doesn’t have to be called robots.txt, but be sure to change the web path to “robots.txt”). Spiders need to see the robots.txt file in the root of the site.

[b]3)[/b] Edit the ‘Page Contents’ screen and make sure it is set to present [b]RAW HTML[/b]. Then add (as much as you want of) the following code (here I’m disallowing Google Images, Yahoo Media & psbot from indexing our site):

    Sitemap: http://www.yourdomain.com/sitemap.xml

    User-agent: Googlebot-Image
    Disallow: /

    User-agent: Yahoo-MMCrawler
    Disallow: /

    User-agent: Yahoo-MMAudVid
    Disallow: /

    User-agent: psbot
    Disallow: /

    User-agent: *
    Disallow: /_admin

See http://www.google.com/support/webmasters/bin/answer.py?answer=40360 for more info.
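If there are other areas you want to keep out of the index, just add extra Disallow lines to the catch-all block. Disallow: / blocks the named crawler from the whole site, while a path like /_admin only blocks that part of it. The extra path below is only a placeholder:

    User-agent: *
    Disallow: /_admin
    # placeholder: substitute any other path you don't want indexed
    Disallow: /private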

[b]4)[/b] Make it all go live & then sit back and relax.
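
Once it’s live, a quick way to check is to request the file and look at the response headers (the domain here is the same placeholder as above); you want a 200 status and, ideally, a text/plain content type:

    curl -I http://www.yourdomain.com/robots.txt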

(Avi Miller) #2

An alternative method is to create a Text File asset in the root of your site called "robots.txt", set "Allow Unrestricted" to NO and make it a TYPE_2 link. This will tell Matrix to retain the friendly URL even when it is Live with Public Read permission. You can edit Text File assets in the Administration Interface.


(Duncan Robertson) #3

Nice one!


I put:


    
    


Just to make sure that I'm actually publishing a TXT file.