Basic Robots.txt – An Introduction:
This page gives a basic introduction to robots.txt, its usefulness for SEO and Google rankings, and how to construct one.
#SEOGuides #SEOStrategy #Robotstxt #Googlebot
Basic Robots.txt Outline:
Purpose:
- Robots.txt provides exclusion instructions to crawlers, telling them which parts of a site they may not crawl.
- Google’s version of the robots.txt protocol also supports an Allow directive that specifies where Googlebot can crawl (see the sketch after this list). Other search engines may or may not follow these instructions.
- Be aware that these Google-specific instructions are advanced, and incorrect application may prevent crawling of pages you want crawled.
- It is a small text file that resides in the root of the domain (e.g. https://www.yourdomain.com/robots.txt).
- Robots.txt does not prevent indexing of a URL, only the crawling of its content.
- It also provides a place to specify URLs of XML Sitemaps.
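As a hedged sketch of the Allow directive mentioned above (the directory and file names are hypothetical), Allow can re-open a specific URL inside an otherwise blocked section:

  User-agent: Googlebot
  Disallow: /private/
  Allow: /private/annual-report.html

Here Googlebot would skip everything under /private/ except the explicitly allowed page.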
Guidelines:
- If you don’t want to set up a robots.txt, you don’t need to, but it is advised.
- Follow the brief instructions at robotstxt.org to set up an initial robots.txt.
- The basic setup is to not disallow any crawlers from crawling any of the site. (Yes, plenty of double-negatives).
- Robots.txt can block specific crawlers, or all crawlers, from accessing all or certain URLs.
- Note that blocking a URL from being crawled by Googlebot and others will not prevent the URL from being indexed. To keep a page out of the index, use a noindex robots meta tag instead, and leave the page crawlable so the tag can be seen.
- Also note that crawlers work on an honesty basis – many of them will ignore instructions in robots.txt and crawl away to their heart’s content.
- Don’t go crazy with blocking crawlers from accessing URLs. Google will complain in Google Search Console if you block access to JavaScript, CSS or other resources needed to render pages correctly, and this may impact your rankings (see the example after this list).
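As a hedged illustration of that last point (the /assets/ path is hypothetical), a rule like this would block everything stored under /assets/, including any CSS and JavaScript needed to render pages, which Google would flag in Search Console:

  User-agent: *
  Disallow: /assets/

If a directory like this holds rendering resources, either leave it crawlable or block only the subpaths that contain none.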
Examples:
- The following examples are simple versions of robots.txt instructions. The most commonly used ones will be either allowing all bots free rein, or blocking all bots from certain sections:
- Do not exclude any robots from visiting any part of the site:

  User-agent: *
  Disallow:

- Exclude "somebot" from visiting any part of the site:

  User-agent: somebot
  Disallow: /

- Exclude all robots from visiting the /keep-out/ section of the site:

  User-agent: *
  Disallow: /keep-out/

- Exclude "somebot" from visiting the /keep-out/ section of the site:

  User-agent: somebot
  Disallow: /keep-out/

- Declare the XML Sitemap (you can declare more than one):

  Sitemap: https://www.yourdomain.com/sitemap.xml
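Putting the pieces together, a minimal combined robots.txt (the paths and domain are placeholders) might look like this:

  User-agent: somebot
  Disallow: /

  User-agent: *
  Disallow: /keep-out/

  Sitemap: https://www.yourdomain.com/sitemap.xml

Note that a crawler obeys only the most specific User-agent group that matches it, so "somebot" follows its own group here and ignores the * group; the Sitemap line applies globally regardless of group.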