Sitemaps are a way for SEOs and webmasters to tell Google about their site's structure and the important URLs on their website.
In this article we will cover the two types of sitemaps, XML and HTML, and how each one works. We will also cover common pitfalls to avoid with sitemaps.
XML sitemaps are built using eXtensible Markup Language (XML). For lessons about XML and to learn how to use this well-formatted and structured language, check out the W3Schools.com tutorials.
XML sitemaps should be thought of as a guide to your website for search engines. They let webmasters organize the site's URLs and indicate their relative priority to the search engines. Search crawlers also often use sitemaps to discover fresh content.
Every XML sitemap begins with the following two lines, which declare the sitemap as being in XML format to the search crawlers:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
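Every entry in an XML sitemap is a <url> element that wraps a <loc> element holding the page's full URL. The optional <lastmod>, <changefreq>, and <priority> elements may follow <loc> inside the same <url> element.
The sitemap ends with the closing </urlset> tag.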
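Putting the opening lines, the entries, and the closing tag together, a minimal Python sketch could generate a sitemap as follows. The function name `build_sitemap` and the example.com URLs are placeholders chosen for illustration and do not come from any tool discussed in this article.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the root <urlset> element.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_DECLARATION = b'<?xml version="1.0" encoding="UTF-8"?>\n'


def build_sitemap(urls):
    """Return a sitemap document (as bytes) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        # Each entry is a <url> element wrapping a <loc> element.
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    # Serialising the tree closes the root with </urlset>.
    return XML_DECLARATION + ET.tostring(urlset, encoding="utf-8")


if __name__ == "__main__":
    # Placeholder URLs, for illustration only.
    pages = ["https://www.example.com/", "https://www.example.com/about"]
    print(build_sitemap(pages).decode("utf-8"))
```

Building the document with an XML library rather than string concatenation means special characters in URLs, such as ampersands, are escaped automatically.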
Many tools, including web-based generators, exist to help create XML sitemaps for your site.
XML sitemaps can also be created manually, but this is tedious for large sites and even small sites that publish semi-frequently. Therefore, it is recommended to use a content management system or platform that automatically generates your sitemap and pings the search engines.
For more information about correct syntax, check out sitemaps.org.
Declaring to Search Engines
After the XML sitemap has been created, it should be submitted to Google and Bing Webmaster Tools. The process for this is simple. Here is how to do it with Google:
First, log into Google Webmaster Tools.
Navigate to the Sitemaps section.
You can then submit an XML sitemap using the button at the top right of the page. Once the sitemap has been submitted for a short time, graphs will appear showing how many of your submitted URLs are indexed.
The sitemap location should also be added to the robots.txt file, using a line that consists of the Sitemap: directive followed by the sitemap's full URL.
Note that a sitemap does not have to be named “sitemap.xml”; a site will sometimes have multiple sitemaps, as discussed below.
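One way to confirm that a robots.txt file declares its sitemap is to parse it with Python's standard `urllib.robotparser` module. The sketch below is only an illustration: the helper name `sitemaps_in_robots` and the example.com URL are assumptions, and the `site_maps()` method requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# A sample robots.txt body; the URL is a placeholder.
ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: https://www.example.com/sitemap.xml
"""


def sitemaps_in_robots(robots_txt):
    """Return the sitemap URLs declared in a robots.txt body, or []."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


if __name__ == "__main__":
    print(sitemaps_in_robots(ROBOTS_TXT))
```

A robots.txt file may carry more than one Sitemap line, which is why the helper returns a list.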
Large sites must take into account that the search engines, and Google in particular, have a maximum size for sitemaps. According to this WebmasterWorld Forum post, the maximum size is:
- 50,000 URLs, and
- 10MB for the uncompressed file (the current sitemaps.org protocol allows 50MB).
Because of this, large sites will often contain a sitemap index file that contains links to all of the sitemaps. Others will have separate sitemaps for separate sections of the site.
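As a rough sketch of that approach, the Python below splits a long URL list into groups of at most 50,000 and builds a sitemap index that points at one sitemap file per group. The helper names and the example.com file naming scheme are hypothetical, and the sketch checks only the URL count, not the file size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_DECLARATION = b'<?xml version="1.0" encoding="UTF-8"?>\n'
MAX_URLS_PER_SITEMAP = 50_000  # URL limit quoted above


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into lists of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index (as bytes) linking to each sitemap file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        # Each child sitemap is a <sitemap> element wrapping a <loc>.
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = url
    return XML_DECLARATION + ET.tostring(index, encoding="utf-8")


if __name__ == "__main__":
    # Placeholder data: 120,000 page URLs need three sitemap files.
    pages = [f"https://www.example.com/page/{n}" for n in range(120_000)]
    chunks = chunk_urls(pages)
    files = [f"https://www.example.com/sitemap-{n}.xml"
             for n in range(1, len(chunks) + 1)]
    print(build_sitemap_index(files).decode("utf-8"))
```

Each group would then be written out as its own sitemap file, and only the index needs to be submitted or listed in robots.txt.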
HTML sitemaps are public-facing pages, usually linked from the footer of the website, that give users an alternative way to find the content on your website. HTML sitemaps are also another way to help ensure that the search crawlers crawl and index as many of your URLs as possible.
Considerations To Avoid
According to an interview with Duane Forrester of Bing from September 2011, Bing can lose trust in sitemaps that have over 1% of “dirt.” Duane said:
“Your Sitemaps need to be clean. We have a 1% allowance for dirt in a Sitemap. Examples of dirt are if we click on a URL and we see a redirect, a 404 or a 500 code. If we see more than a 1% level of dirt, we begin losing trust in the Sitemap”.
A “dirty” URL in a sitemap can be any of the following:
- A URL that 301 redirects to another URL
- A URL that returns a 404 error
- A URL that returns a 500 “Internal Server Error” response
Each URL listed in the sitemap should be the final destination URL, one that responds directly without redirecting or returning an error.
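To make the 1% allowance concrete, here is a small Python sketch that measures the share of dirty URLs in a sitemap. It assumes the HTTP status code of each URL has already been collected without following redirects. The function names, the rule that any status of 300 or above counts as dirt, and the sample data are all assumptions for illustration, not Bing's actual method.

```python
def dirt_ratio(status_by_url):
    """Return the fraction of URLs whose status is a redirect or an error."""
    if not status_by_url:
        return 0.0
    # Assumption: 3xx, 4xx and 5xx responses all count as dirt.
    dirty = [url for url, status in status_by_url.items() if status >= 300]
    return len(dirty) / len(status_by_url)


def is_clean(status_by_url, allowance=0.01):
    """True when the dirt stays within the allowance (1% by default)."""
    return dirt_ratio(status_by_url) <= allowance


if __name__ == "__main__":
    # Hypothetical crawl results: 98 good URLs, one redirect, one 404.
    statuses = {f"https://www.example.com/page/{n}": 200 for n in range(98)}
    statuses["https://www.example.com/old"] = 301
    statuses["https://www.example.com/missing"] = 404
    print(f"dirt: {dirt_ratio(statuses):.1%}, clean: {is_clean(statuses)}")
```

Running a check like this before each submission makes it easier to remove redirected or broken URLs before a search engine encounters them.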