Using robots.txt to block spiders from crawling your web site


'robots.txt' is a plain text file whose name gives it special meaning to most well-behaved robots on the web. By defining a few rules in this text file, you can instruct robots not to crawl and index certain files or directories within your site.

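For example, if you do not want Google to crawl your site's /pictures folder, you can add a rule that tells Google's crawler to stay out of that folder. Keep in mind that this is a request that decent robots honor, not access control: the files remain reachable by anyone who asks for them directly.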
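To make the /pictures case concrete, here is a minimal Python sketch of how a well-behaved robot might evaluate such a rule, using the standard library's urllib.robotparser. The rule text, the helper name is_allowed and the sample paths are assumptions made up for illustration, and other crawlers may interpret edge cases differently from Python's parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: ask Googlebot to stay out of /pictures/ only.
RULES = """\
User-agent: Googlebot
Disallow: /pictures/
"""


def is_allowed(rules: str, user_agent: str, path: str) -> bool:
    """Return True if user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())  # parse rule text without any network access
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    print(is_allowed(RULES, "Googlebot", "/pictures/holiday.jpg"))
    print(is_allowed(RULES, "Googlebot", "/index.html"))
```

Only the named robot is affected here; a robot with a different name finds no matching rule and may crawl everything.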

A robots.txt file has to be placed in the www root directory of your server, so that robots can fetch it as /robots.txt. On Linux boxes, this directory is typically /var/www/html.
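The reason for the root directory is that robots look for the file at one fixed location per host. As a rough sketch, a robot could derive that location from any page URL like this; the function name robots_url and the example.com addresses are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Build the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("http://www.example.com/pictures/2007/cat.jpg"))
```

A robots.txt stored in a subdirectory, for instance /pictures/robots.txt, would never be requested under this scheme.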

The following shows several example robots.txt files, separated by a line.

# block Google's image crawler completely
User-agent: Googlebot-Image
Disallow: /
----------------------------------------
# block all spiders and bots from those 2 directories
User-agent: *
Disallow: /cgi-bin/
Disallow: /pictures/
----------------------------------------
# allow Googlebot to access everything except /cgi-bin
# and all other bots can access nothing
# finally allow ia_archiver to access everything!
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /cgi-bin/

User-agent: ia_archiver
Allow: /
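Before uploading a file like the ones above, it can help to check that the rules do what you intend. The sketch below feeds the three example files to Python's urllib.robotparser and asks which robots may fetch which paths. The constant names, the helper allowed, the robot name SomeBot and the sample paths are assumptions for illustration; real crawlers may resolve edge cases differently from Python's parser.

```python
from urllib.robotparser import RobotFileParser

BLOCK_IMAGE_BOT = """\
User-agent: Googlebot-Image
Disallow: /
"""

BLOCK_TWO_DIRS = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /pictures/
"""

MIXED = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /cgi-bin/

User-agent: ia_archiver
Allow: /
"""


def allowed(rules: str, agent: str, path: str) -> bool:
    """Return True if the robot named agent may fetch path under rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # (rules, robot name, path) combinations to inspect
    checks = [
        (BLOCK_IMAGE_BOT, "Googlebot-Image", "/pictures/a.jpg"),
        (BLOCK_TWO_DIRS, "SomeBot", "/cgi-bin/form.cgi"),
        (MIXED, "SomeBot", "/index.html"),
        (MIXED, "Googlebot", "/index.html"),
        (MIXED, "ia_archiver", "/cgi-bin/form.cgi"),
    ]
    for rules, agent, path in checks:
        print(agent, path, allowed(rules, agent, path))
```

In the third file, a robot uses the group that names it and falls back to the "*" group only when no named group matches, which is why Googlebot is not bound by the blanket "Disallow: /".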

2007-05-12, 20:14:37
anonymous from United States  
this is good stuff
2007-12-12, 07:45:50
anonymous from United Kingdom  
Helpful examples - thanks
2009-04-11, 20:28:43
anonymous from United States  
Clear, succinct. Thanks!



