I thought I'd give you guys an idea about this file!
As we all know, the Google crawler (the engine that pulls pages into Google's index so you can search them) sometimes ends up holding pages that site owners never meant for you to see. For example, if an /admin/ folder gets .htaccess'd (password protected) after Google has already crawled it, a Google search plus the Google cache can still show you its old contents!
Now, website owners don't want people knowing what's inside these directories, so on top of password-protecting them, they need to stop Google from crawling them in the first place. How? They put a file called robots.txt in the site's root.
for example
http://www.********.com/robots.txt
the robots.txt file looks like this:
Code:
# everything after a "#" is ignored
# you can write anything here!
# ammouna
User-agent: * <<< specifies which crawlers the rules below apply to (* means all of them)
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /cache/
Disallow: /class/
Disallow: /images/
Disallow: /include/
Disallow: /install/
Disallow: /kernel/
Disallow: /language/
Disallow: /templates_c/
Disallow: /themes/
Disallow: /uploads/
so these directories won't show up in Google search results. But careful: that does not make them "secure"! robots.txt is just a polite request to crawlers, and anyone can read the file and see exactly which paths the owner wants kept out of sight.
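If you want to poke at this programmatically, Python's standard library ships a robots.txt parser in urllib.robotparser. Here's a minimal sketch (example.com is just a placeholder domain, swap in whatever site you're curious about):
Code:
# a quick sketch using Python's stdlib robots.txt parser
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")  # placeholder domain
rp.read()  # fetches and parses the file

# can_fetch(user_agent, url) tells you whether the rules let
# that user agent crawl that URL
print(rp.can_fetch("*", "https://example.com/cgi-bin/test"))  # False if /cgi-bin/ is disallowed
print(rp.can_fetch("*", "https://example.com/index.html"))    # True if nothing blocks it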
Now, how can you actually use this info? It depends:

it can be quite useful, and sometimes meaningless! But since the file is basically a list of the directories the owner would rather you didn't look at, it's a handy first stop when checking out a site, as the sketch below shows.
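Here's a tiny sketch that grabs a site's robots.txt and prints every Disallow'd path, i.e. the stuff the owner is trying to keep out of search results (again, example.com is only a placeholder):
Code:
# fetch a robots.txt and list the disallowed paths
import urllib.request

url = "https://example.com/robots.txt"  # placeholder domain
with urllib.request.urlopen(url) as resp:
    body = resp.read().decode("utf-8", errors="replace")

for line in body.splitlines():
    line = line.strip()
    # skip blank lines and comments (everything after a "#" is a comment)
    if not line or line.startswith("#"):
        continue
    if line.lower().startswith("disallow:"):
        print(line.split(":", 1)[1].strip())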
the most famous robots.txt file on the net is... uh, I forgot it lol, I'll update this later..
ok, I found it!

here it is:
www.whitehouse.gov/robots.txt
check it out, it's fine to open it :P