08-06-2007   #1
god
The robots.txt file!

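I thought I'd give you guys an idea about this file!
As we all know, the Google crawler (the program that fetches pages into the Google database so you can search them) follows every link it can find, so it often ends up indexing pages the site owner never meant to be public, like an /admin/ folder. A folder that is properly .htaccess'd (password protected) is off limits to the crawler too, since it gets the same 401 as anyone else. But a folder that was still open when it got crawled, or that is only "hidden" by not being linked from the main pages, can end up in the index.
Now website owners don't want people to know the contents of these directories. Even if they .htaccess them later, anything that was already indexed can still be seen with a Google search plus the Google cache! So they should stop Google from accessing these directories. How? In the site's root directory, they create a file called robots.txt,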
for example http://www.********.com/robots.txt
The robots.txt file looks like this:
Code:
# everything after a "#" is not taken into consideration
# you can write anything here ! 
# ammouna
# "User-agent" says which crawler the rules below apply to ("*" means all of them)
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /cache/
Disallow: /class/
Disallow: /images/
Disallow: /include/
Disallow: /install/
Disallow: /kernel/
Disallow: /language/
Disallow: /templates_c/
Disallow: /themes/
Disallow: /uploads/
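So well-behaved crawlers like Google's won't crawl these directories, and their contents won't be pulled into the Google search results. That doesn't make them secure, though! robots.txt is only a request, and anyone can read the file, so the real protection still has to come from .htaccess.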
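To see how a crawler reads these rules, here is a minimal sketch using Python's standard urllib.robotparser module. The sample rules are a shortened copy of the file above, and the helper name is_allowed and the test paths are made up for illustration. This only shows how a polite crawler decides; nothing forces a crawler to obey it.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the example rules (assumption: same meaning as the full file)
ROBOTS_TXT = """\
# everything after a "#" is ignored
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /uploads/
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if the rules let this user agent fetch the path."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, no network access needed
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Hypothetical paths, just to show one allowed and two disallowed lookups
    for path in ("/index.html", "/tmp/session.dat", "/uploads/photo.jpg"):
        print(path, is_allowed(ROBOTS_TXT, "Googlebot", path))
```

Since the rules are under "User-agent: *", the answer is the same for any crawler name you pass in.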
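Now, how can you actually use this info? It depends. Since anyone can read a site's robots.txt, it shows you which directories the owner wanted kept out of the search engines. That can be quite useful, and sometimes meaningless!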
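As a rough sketch of reading that info out of a file, the snippet below pulls the Disallow paths from robots.txt text you already have. The function name disallowed_paths is made up for this example, and the parsing is deliberately simple: it ignores which User-agent group a rule belongs to, which a real parser would keep track of.

```python
from typing import List


def disallowed_paths(robots_txt: str) -> List[str]:
    """Collect every non-empty Disallow value, in file order."""
    paths = []
    for raw in robots_txt.splitlines():
        # Drop anything after "#" (comments), then surrounding whitespace
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        field, value = line.split(":", 1)
        # Field names are case-insensitive; an empty Disallow blocks nothing
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow: /cgi-bin/\nDisallow: /tmp/\n"
    for path in disallowed_paths(sample):
        print(path)
```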

The most famous robots.txt file on the net is... uh... I forgot, lol. I'll update it later.

OK, I found it. Here it is: www.whitehouse.gov/robots.txt
Check it out, it's OK to open it :P

Last edited by god; 08-06-2007 at 01:16 PM.