Vcoderz Community
Old 08-06-2007   #1
god
The robots.txt file!

I thought I'd give you guys an idea about this file!
As we all know, the Google crawler (the program that fetches pages into Google's index so you can search them) follows every link it can find, so it can end up indexing pages you would never stumble on yourself, like an /admin/ folder that nobody links to from the menu. (A folder that is properly password protected with .htaccess is off limits to the crawler too; it gets the same "401 Unauthorized" as everyone else.)
Website owners don't want people to see the contents of these directories. If a directory isn't actually protected, a Google search plus Google's cache can still show what's in it, so owners tell Google to stay out of these directories. How? In the site's root, they put a file called robots.txt,
for example http://www.********.com/robots.txt
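Since the file always sits at the site's root, you can work out its address from any page on the site. Here is a minimal Python sketch of that idea; the function name robots_url and the example.com address are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Build the robots.txt address for the site that serves page_url.

    Keeps the scheme and host, and swaps the path for /robots.txt.
    (robots_url is a hypothetical helper name, not a library function.)
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # www.example.com is a placeholder host.
    print(robots_url("http://www.example.com/forum/showthread.php?t=123"))
```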
The robots.txt file looks like this:
Code:
# everything after a "#" is ignored by crawlers
# you can write anything here !
# ammouna
# User-agent says which crawlers the rules below apply to; * means all of them
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /cache/
Disallow: /class/
Disallow: /images/
Disallow: /include/
Disallow: /install/
Disallow: /kernel/
Disallow: /language/
Disallow: /templates_c/
Disallow: /themes/
Disallow: /uploads/
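So well-behaved crawlers like Google's won't crawl these directories. That does not make them secure, though: robots.txt is only a request, and anyone can still open those URLs directly unless the server itself blocks access.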
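To see how a crawler that respects the file would read those rules, here is a rough Python sketch using the standard library's urllib.robotparser. The rules are a shortened copy of the list above; the host www.example.com, the helper name is_allowed and the test paths are placeholders.

```python
from urllib.robotparser import RobotFileParser

# A shortened copy of the rules from the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /uploads/
"""


def is_allowed(path, user_agent="*"):
    """Return True if a rule-abiding crawler may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    # www.example.com is a placeholder host (assumption).
    return parser.can_fetch(user_agent, "http://www.example.com" + path)


if __name__ == "__main__":
    for path in ("/index.html", "/uploads/photo.jpg", "/tmp/"):
        print(path, "allowed" if is_allowed(path) else "disallowed")
```

Note that this only tells you what a polite crawler would do. Nothing in the file stops a browser or a badly behaved bot from requesting the same paths.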
Now, how can you actually use this info? It depends: it can be quite useful, and sometimes meaningless!

The most famous robots.txt file on the net is ..... uh ... I forgot it lol, I'll update it later..

Ya, I found it, here it is: www.whitehouse.gov/robots.txt
Check it out, it's OK to open it :P


__________________
--Capitalisation is the only difference between "I helped my uncle Jack off a horse" and "I helped my uncle jack off a horse" !!
http://img482.imageshack.us/img482/4889/hell7ta.jpg

Last edited by god; 08-06-2007 at 01:16 PM.
Old 08-06-2007   #2
OS7
Re: The robots.txt file!

Man, what do you mean by this? I didn't understand anything.
Can you please tell us directly what you mean? Thank you.
__________________
Lebanese Army...Atyab Gech.
May GOD protect Our Officers and soldiers...
Old 08-06-2007   #3
SysTaMatIcS
Re: The robots.txt file!

So what's the use of the robots.txt? We won't have access to the files, I don't get it.
__________________
problems of performance appraisal is that it sucks to memorize them

Last edited by SysTaMatIcS; 08-06-2007 at 02:30 PM.
Old 08-06-2007   #4
Krazy
Re: The robots.txt file!

As I understood it...
If you add it to your website, Google search won't access the files in the folders listed.

But I think we can still access the pages if we go to the folder directly.
So you'll know which folders they don't want Google search to access.
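Because robots.txt is a public text file, reading that list of folders is easy. Below is a small Python sketch of one way to do it; the helper name disallowed_paths and the sample text are invented for illustration, and it ignores which User-agent group each line belongs to.

```python
def disallowed_paths(robots_txt):
    """Collect the paths named in Disallow lines, in file order.

    disallowed_paths is a hypothetical helper; it does not track
    User-agent groups, it just lists every Disallow value it sees.
    """
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            value = value.strip()
            if value:  # an empty Disallow blocks nothing, so skip it
                paths.append(value)
    return paths


if __name__ == "__main__":
    sample = "# demo\nUser-agent: *\nDisallow: /cgi-bin/\nDisallow: /tmp/\n"
    print(disallowed_paths(sample))
```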
__________________
Let us be modest, delicate and obedient in words, but firm, strict and daring in actions. (Simon Zavarian)
Old 08-06-2007   #5
Justin
Re: The robots.txt file!

Systa... it's not an .exe, wtf?!

It's a .txt file... a Notepad file. Let's call it... gaining useful info. And a big thank you to God for these amazing posts and ideas; without him we wouldn't learn them...
__________________
Music is what feelings sound like
Old 08-06-2007   #6
god
Re: The robots.txt file!

Quote:
Originally Posted by systamatics
so what's the use of the robot.exe? We won't have access to the files, I don't get it
In reply to that:
Quote:
Originally Posted by god
Now, how can you actually use this info? It depends: it can be quite useful, and sometimes meaningless!
Figure that out yourself :P As I said, it depends: if you're trying to find something on a website, this file might give you a better understanding of the site's structure, and maybe tell you which CMS the website is using... Look, I can't tell you what to do letter by letter, that would be illegal :P Plus I won't do it hehe