Basic Robots.txt – An Introduction:

This page gives a basic introduction to robots.txt, its usefulness for SEO and Google rankings, and how to construct one.

#SEOGuides #SEOStrategy #Robotstxt #Googlebot


Basic Robots.txt Outline:

Purpose:

Guidelines:

  1. If you don’t want to set up a robots.txt, you don’t need to, but it is advised.
  2. Follow the brief instructions at robotstxt.org to set up an initial robots.txt.
  3. The basic setup does not disallow any crawler from crawling any part of the site (yes, plenty of double negatives).
  4. Robots.txt can block specific crawlers from accessing all or certain URLs, or:
  5. Block all crawlers from accessing all or certain URLs.
  6. Note that blocking a URL from being crawled by Googlebot and others will not prevent the URL from being indexed: a search engine can still index a blocked URL that it discovers through links from other pages.
  7. Also note that crawlers follow robots.txt on a voluntary basis: many of them will ignore its instructions and crawl away to their heart’s content.
  8. Don’t go crazy with blocking crawlers from accessing URLs. Google will complain in Google Search Console if you block access to JavaScript, CSS or other resources needed to load pages correctly, and this may impact your rankings.
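
To see how a well-behaved crawler would read a set of rules, the guidelines above can be sketched with Python's standard-library urllib.robotparser module. This is only an illustration: the rules, the "otherbot" name and the is_allowed helper are assumptions made up for the example, and a crawler that ignores robots.txt will never run a check like this.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration: "somebot" is blocked from the whole
# site, and every other crawler is blocked from /keep-out/ only.
RULES = """\
User-agent: somebot
Disallow: /

User-agent: *
Disallow: /keep-out/
"""


def is_allowed(rules: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that obeys `rules` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    base = "https://www.yourdomain.com"
    checks = [
        ("somebot", "/"),
        ("otherbot", "/keep-out/page.html"),
        ("otherbot", "/blog/"),
    ]
    for agent, path in checks:
        verdict = "allowed" if is_allowed(RULES, agent, base + path) else "blocked"
        print(f"{agent} {path}: {verdict}")
```

The check only reports what the file asks for. It says nothing about whether a given crawler will comply, or whether a blocked URL ends up indexed anyway.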

Examples:

  • The following examples are simple versions of robots.txt instructions. The most commonly used ones will be either allowing all bots free rein, or blocking all bots from certain sections:
Do not exclude any robots from visiting any part of the site:
User-agent: * 
Disallow: 

Exclude "somebot" from visiting any part of the site:
User-agent: somebot
Disallow: /

Exclude all robots from visiting the /keep-out/ section of the site:
User-agent: *
Disallow: /keep-out/

Exclude "somebot" from visiting the /keep-out/ section of the site:
User-agent: somebot
Disallow: /keep-out/

Declare the XML Sitemap (you can declare more than one):
Sitemap: https://www.yourdomain.com/sitemap.xml
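
The snippets above can be combined into one file and checked before uploading. The sketch below is a rough example rather than a definitive test: it assumes Python 3.8 or later (for site_maps()), and the combined file, the "otherbot" name and the load_rules helper are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical combined file: "somebot" is kept out of /keep-out/, all other
# robots may visit everything, and one XML sitemap is declared.
EXAMPLE_FILE = """\
User-agent: somebot
Disallow: /keep-out/

User-agent: *
Disallow:

Sitemap: https://www.yourdomain.com/sitemap.xml
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    parser = load_rules(EXAMPLE_FILE)
    base = "https://www.yourdomain.com"
    print(parser.can_fetch("somebot", base + "/keep-out/page.html"))
    print(parser.can_fetch("otherbot", base + "/keep-out/page.html"))
    print(parser.site_maps())
```

In this parser, a crawler named in its own User-agent group follows that group rather than the * group, which is why "somebot" and "otherbot" get different answers for the same URL.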

More info:
