Using robots.txt as web security
Monday, February 11, 2008, 0:05
- Use a robots.txt file, placed in your site’s root directory, if possible.
- Use the ROBOTS META tag if you can’t create a robots.txt file (for example, if you don’t control the site’s root directory). It’s fine to use both methods.
- If you know which robots you want to keep away from your pages, such as a particular search engine’s crawler, go to the source of that robot and ask for your page to be removed. Many search engines provide ways to remove your URLs from their indexes without using either of the methods above.
- Make your page stand-alone if possible, meaning you remove all links to the page you’re trying to keep away from robots. The more links there are to your page, the easier it is for a search engine robot to find it. If your page is already in a search engine’s index, it’s too late for this preventive step.
- If you must have absolute protection from robots, password-protect the pages in question. All the other methods are “agreements” that only work if both parties honor them, so a robot that ignores them can still fetch your pages. Preventing the page from being served is the only way to guarantee that robots can’t touch it.
- Robots files are useful for restricting access to certain files, but a simple mistake can cost you a lot. If you run a business site and wrongly disallow certain directories, search engines won’t crawl those areas of your site.
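To make the robots.txt tip concrete, here is a minimal sketch of a robots.txt that keeps every robot out of a hypothetical /private/ directory and keeps one hypothetical robot, ExampleBot, out of the whole site (the case where you know which robot you want to stop). It uses Python’s standard urllib.robotparser module to show how a well-behaved robot would read the rules; the site, directory, and robot names are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served at https://www.example.com/robots.txt
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text the way a compliant crawler would."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rules = load_rules(ROBOTS_TXT)
    base = "https://www.example.com"
    for agent, path in [
        ("ExampleBot", "/index.html"),
        ("OtherBot", "/index.html"),
        ("OtherBot", "/private/plans.html"),
    ]:
        allowed = rules.can_fetch(agent, base + path)
        print(agent, path, "allowed" if allowed else "blocked")
```

ExampleBot is refused every page, while other robots are refused only paths under /private/. Keep in mind this only describes what a cooperative robot does; nothing here stops a robot that never reads the file.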
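The ROBOTS META tag goes in a page’s head, for example `<meta name="robots" content="noindex, nofollow">`. As a rough sketch of how a compliant robot might honor it, the snippet below uses Python’s standard html.parser to collect the directives from a page. The sample page and the names RobotsMetaParser and robots_directives are hypothetical, and real search engines apply their own, more detailed rules.

```python
from html.parser import HTMLParser

# Hypothetical page that asks robots not to index it or follow its links.
PAGE = """<html>
<head>
  <title>Internal notes</title>
  <meta name="robots" content="noindex, nofollow">
</head>
<body><a href="/drafts/">Drafts</a></body>
</html>"""


class RobotsMetaParser(HTMLParser):
    """Collects the comma-separated values of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for token in attributes.get("content", "").split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())


def robots_directives(html: str) -> set:
    """Return the lowercase robots directives declared in an HTML page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    directives = robots_directives(PAGE)
    print("may index:", "noindex" not in directives)
    print("may follow links:", "nofollow" not in directives)
```

Like robots.txt, the tag is only a request: the page is still served to anyone who asks for it.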
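The password-protection tip can be sketched as a tiny Python WSGI application that answers every request lacking valid HTTP Basic credentials with 401 Unauthorized, no matter which robot or browser is asking. The username, password, and function names are hypothetical placeholders. On a real site you would more likely use your web server’s built-in authentication, store hashed passwords, and serve the pages over HTTPS, since Basic authentication only base64-encodes the credentials.

```python
import base64
import hmac

# Hypothetical placeholder credentials; never hard-code real ones like this.
USERNAME = "editor"
PASSWORD = "change-me"


def is_authorized(auth_header: str) -> bool:
    """Check an HTTP Basic 'Authorization' header against the placeholder credentials."""
    scheme, _, encoded = auth_header.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return False
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True)
    except ValueError:  # covers binascii.Error and non-ASCII input
        return False
    user, sep, password = decoded.partition(b":")
    # compare_digest compares in constant time to avoid leaking hints through timing
    user_ok = hmac.compare_digest(user, USERNAME.encode("utf-8"))
    password_ok = hmac.compare_digest(password, PASSWORD.encode("utf-8"))
    return bool(sep) and user_ok and password_ok


def protected_app(environ, start_response):
    """Serve the page only to authenticated clients, whatever their User-Agent."""
    if not is_authorized(environ.get("HTTP_AUTHORIZATION", "")):
        start_response("401 Unauthorized", [
            ("WWW-Authenticate", 'Basic realm="private"'),
            ("Content-Type", "text/plain; charset=utf-8"),
        ])
        return [b"Authentication required.\n"]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Private content.\n"]


if __name__ == "__main__":
    def print_status(status, headers):
        print(status)

    # A robot that sends no credentials is turned away.
    protected_app({}, print_status)
    # A client with the right password gets the page.
    token = base64.b64encode(b"editor:change-me").decode("ascii")
    print(protected_app({"HTTP_AUTHORIZATION": "Basic " + token}, print_status))
```

To try it in a browser, the standard library’s `wsgiref.simple_server.make_server("127.0.0.1", 8000, protected_app).serve_forever()` can serve the app locally. Unlike the other methods, this does not depend on the robot’s cooperation: without the password, the content is simply never sent.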
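One easy way to wrongly disallow part of a business site is forgetting that Disallow values are matched as path prefixes. In the hypothetical example below, the owner only wants to hide printer-friendly copies under /print/, but writes `Disallow: /print`, which also matches a /printers/ product catalog. The sketch uses Python’s urllib.robotparser to check a draft robots.txt against pages that must stay crawlable before it goes live; all paths and the helper name blocked_pages are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft: meant to hide /print/, but the rule is a bare prefix.
DRAFT_ROBOTS_TXT = """\
User-agent: *
Disallow: /print
"""

# Hypothetical fix: the trailing slash limits the rule to the /print/ directory.
FIXED_ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""

# Pages the business wants search engines to keep crawling.
MUST_STAY_CRAWLABLE = ["/printers/", "/printers/laser-x100.html", "/printing-services.html"]


def blocked_pages(robots_txt: str, paths, agent: str = "ExampleBot"):
    """Return the paths that a compliant robot would refuse to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, "https://www.example.com" + p)]


if __name__ == "__main__":
    print("draft blocks:", blocked_pages(DRAFT_ROBOTS_TXT, MUST_STAY_CRAWLABLE))
    print("fixed blocks:", blocked_pages(FIXED_ROBOTS_TXT, MUST_STAY_CRAWLABLE))
```

With the draft, all three catalog pages come back as blocked; the fixed version leaves them crawlable while still covering /print/. Individual search engines may treat edge cases differently, so this is a quick sanity check rather than a guarantee.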