Basic Robots.txt – An Introduction:
This page gives a basic introduction to robots.txt, its usefulness for SEO and Google rankings, and how to construct one.
#SEOGuides #SEOStrategy #Robotstxt #Googlebot
Basic Robots.txt Outline:
- Robots.txt gives crawlers exclusion instructions specifying which parts of a site they must not crawl.
- Google’s version of the robots.txt protocol also gives instructions about where Googlebot can crawl. Other search engines may or may not follow these instructions.
- Be aware that these Google instructions are advanced, and incorrect application may prevent crawling of pages you want crawled.
- It is a small text file which resides in the root of the domain.
- Robots.txt does not prevent indexing of a URL, only the crawling of its content.
- It also provides a place to specify URLs of XML Sitemaps.
- If you don’t want to set up a robots.txt, you don’t need to, but it is advised.
- Follow the brief instructions at robotstxt.org to set up an initial robots.txt.
- The basic setup is to not disallow any crawlers from crawling any of the site. (Yes, plenty of double-negatives).
- Robots.txt can block specific crawlers from accessing all or certain URLs, or:
- Block all crawlers from accessing all or certain URLs.
- Note that blocking a URL from being crawled by Googlebot and others will not prevent the URL from being indexed.
- Also note that crawlers comply on an honor-system basis – many of them will ignore instructions in robots.txt and crawl away to their heart’s content.
- The following examples are simple versions of robots.txt instructions. The most commonly used ones will be either allowing all bots free rein, or blocking all bots from certain sections:
Do not exclude any robots from visiting any part of the site:

User-agent: *
Disallow:

Exclude "somebot" from visiting any part of the site:

User-agent: somebot
Disallow: /

Exclude all robots from visiting the /keep-out/ section of the site:

User-agent: *
Disallow: /keep-out/

Exclude "somebot" from visiting the /keep-out/ section of the site:

User-agent: somebot
Disallow: /keep-out/

Declare the XML Sitemap (you can declare more than one):

Sitemap: https://www.yourdomain.com/sitemap.xml
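If you want to check how a well-behaved crawler would interpret rules like these, here is a minimal sketch using Python's standard-library robots.txt parser. The user-agent name and URLs below are illustrative placeholders, not from a real site:

```python
from urllib import robotparser

# Illustrative rules: block everyone from the /keep-out/ section.
rules = """
User-agent: *
Disallow: /keep-out/
""".splitlines()

parser = robotparser.RobotFileParser()
parser.parse(rules)

# Pages outside /keep-out/ may be crawled; pages inside may not.
print(parser.can_fetch("somebot", "https://www.yourdomain.com/page.html"))        # True
print(parser.can_fetch("somebot", "https://www.yourdomain.com/keep-out/x.html"))  # False
```

This is exactly the honor system mentioned above: the parser only reports what the rules ask; nothing physically stops a crawler that chooses to ignore them.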