Basic XML Sitemaps – An Introduction
Basic XML Sitemaps
- Basic XML Sitemaps give search engines and other crawlers an easy route to identify all content to crawl on a website.
- Use them to alert search engines to new or modified content, and to give them an indication of the relative priority of pages.
- Use them in conjunction with Google Search Console (and other webmaster tools) to help identify crawling and indexation issues, such as 404s and 301s.
- Using XML Sitemaps gives Google a better chance of finding, crawling and indexing your content.
- Create an XML Sitemap, using the XML Sitemaps protocol.
- Many CMSs can do this “out of the box”, but make sure you check.
- Many CMSs (like WordPress) have plugins which will create sitemaps for you.
- You can also use online software or desktop software, such as Screaming Frog, to crawl your website.
- Note that these tools will only include URLs which are reachable via links on the website; the free versions are usually limited to a small number of URLs; and they may not be as flexible as you would like.
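If a CMS or crawler doesn't fit, a sitemap is simple enough to generate yourself. A minimal sketch using Python's standard library, assuming a plain list of (URL, last-modified) pairs (the URLs below are placeholders):

```python
# Sketch: build a basic XML Sitemap from a list of pages.
import xml.etree.ElementTree as ET

def build_sitemap(pages):
    """pages: iterable of (loc, lastmod) tuples; lastmod may be None."""
    urlset = ET.Element("urlset",
                        xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc          # required tag
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod  # optional, YYYY-MM-DD
    # Prepend the XML declaration the protocol expects.
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            + ET.tostring(urlset, encoding="unicode"))

sitemap = build_sitemap([
    ("https://www.example.com/", "2024-01-15"),
    ("https://www.example.com/about", None),
])
print(sitemap)
```

The output is a complete, valid sitemap that can be written to `sitemap.xml` in the site root.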
- The required tags are:
- <urlset> – enclosing tags for the whole XML Sitemap. This is standard and should not be changed.
- <url> – enclosing tags for each URL
- <loc> – the tag which actually contains the URL.
- The optional tags are:
- <lastmod> – the last modified date of the URL, in W3C Datetime format (YYYY-MM-DD at minimum)
- <changefreq> – how frequently the URL is updated
- <priority> – the priority of the URL relative to other URLs on the site
- Note that Google and others pay very limited, if any, attention to the <changefreq> and <priority> tags.
- Include all URLs you would like crawled and indexed, including rich media such as image, PDF and video files. Google’s XML Sitemap guide has information on additional tags which can be used.
- The maximum number of URLs per sitemap is 50,000, with an uncompressed file size of 50MB. If you exceed either limit you will need to create multiple sitemaps.
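When you do split across multiple files, the protocol provides a Sitemap Index format that lists each child sitemap (filenames here are illustrative):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap-1.xml</loc>
    <lastmod>2024-01-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-2.xml</loc>
  </sitemap>
</sitemapindex>
```

You then submit the index file itself; engines discover the child sitemaps from it.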
- Update it each time a page is modified or added.
- Host your Sitemap in the root folder of your domain, e.g. https://www.example.com/sitemap.xml.
- Declare the address of the XML Sitemap in your properly set up robots.txt file.
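The declaration is a single `Sitemap:` line, which can appear anywhere in robots.txt (example.com is a placeholder):

```
User-agent: *
Disallow:

Sitemap: https://www.example.com/sitemap.xml
```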
- Submit it to Google Search Console.
- Wait for Google to process and crawl the URLs – usually 1-3 days – and check GSC for errors and issues.
- Google will re-check the sitemap for updates, but many CMSs and plugins will automatically ping Google to alert it to an update.
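Before submitting, a quick local check can catch obvious problems. A sketch using Python's standard library, checking the protocol's 50,000-URL limit and that every <url> entry carries a <loc>:

```python
# Sketch: basic local validation of a sitemap before submission.
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def check_sitemap(xml_text):
    """Return the URL count, raising if basic protocol rules are broken."""
    root = ET.fromstring(xml_text)
    urls = root.findall("sm:url", NS)
    assert len(urls) <= 50000, "too many URLs; split into multiple sitemaps"
    for url in urls:
        loc = url.find("sm:loc", NS)
        assert loc is not None and loc.text, "every <url> needs a <loc>"
    return len(urls)

sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
</urlset>"""
print(check_sitemap(sample))
```

This only checks structure, not whether the URLs themselves resolve; GSC remains the place to spot 404s and redirects after submission.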
- It is possible to maintain many XML Sitemaps. These can be identified separately, or be enclosed in a Sitemap Index file. XML Sitemaps you might consider include:
- Key Pages – the most important pages on the site.
- New Pages – the newest pages on the site. This depends on a site’s update frequency and volume of content.
- Key Sections – these are a useful way to check important sections are wholly indexed.
- Redirects – very useful to signify to engines that these URLs have been 301 redirected.
An example XML Sitemap would look like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>