I have wanted to talk about this for a while, and the time has come, since this file mostly goes unnoticed: it is generated automatically by a plugin or module and never touched again. The truth is that it has many uses, which I want to tell you about throughout this article.
In this article I want to cover some of the sitemap issues we run into in our day-to-day work, and the solutions we choose depending on the project, its volume, and its current state.
What is a sitemap?
A sitemap is a file used to tell a search engine (in this case, Google) the URLs of a web project, so that bots can crawl that project more effectively. The sitemap also helps ensure that bots discover this information sooner; what use they make of it depends on other factors, which I will cover throughout this article.
Formats in which we can generate a sitemap:
- XML: This is the most widely used format and the one I recommend. Most plugins, modules and extensions used by content managers such as WordPress, PrestaShop or Magento use this format.
- RSS: If you have a generated feed that automatically publishes new content, you can submit it as a sitemap, but be careful, because most feeds leave out a lot of older pages that were not auto-generated.
- Text file: You can also use a .txt file as your sitemap. Of course, you must include one URL per line.
- Google Sites: Another way to create your sitemap, which I do not recommend, but which Google allows. Here is all the information: https://support.google.com/webmasters/answer/183668?hl=es&ref_topic=4581190#sitemapformat
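If you want to see how simple the XML format really is, here is a minimal sketch in Python (standard library only; the URLs and function name are just placeholders, not any of the tools mentioned in this article):

```python
# Minimal sketch: build an XML sitemap from a list of URLs
# using only the Python standard library.
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        loc = ET.SubElement(url_el, "loc")
        loc.text = url
    return ET.tostring(urlset, encoding="unicode")

if __name__ == "__main__":
    print(build_sitemap([
        "https://example.com/",
        "https://example.com/blog/",
    ]))
```

The .txt variant is even simpler: one URL per line, nothing else.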
You have to keep in mind the following aspects, which in most cases are not respected:
- Do not include URLs with noindex in the sitemap.
- Do not include URLs that do not return a 200 status code in the sitemap.
- Do not include non-canonical URLs in the sitemap.
THESE ARE THE 3 GOLDEN RULES OF SITEMAPS 😉
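If you want to automate these three checks, here is a minimal Python sketch. The function name is mine, and fetching the status code, the robots meta content and the canonical URL is left up to you (with your HTTP client of choice or a crawler export):

```python
# Sketch: check one sitemap entry against the three "golden rules".
# You supply the data fetched for the URL; the function only judges it.

def golden_rule_violations(url, status_code, meta_robots, canonical):
    """Return the reasons why `url` should NOT be in the sitemap."""
    problems = []
    # Rule 1: no noindexed URLs in the sitemap.
    if meta_robots and "noindex" in meta_robots.lower():
        problems.append("noindex")
    # Rule 2: only URLs that respond with 200.
    if status_code != 200:
        problems.append(f"status {status_code}")
    # Rule 3: only canonical URLs (canonical must point to itself).
    if canonical and canonical != url:
        problems.append(f"canonical points to {canonical}")
    return problems
```

For example, `golden_rule_violations("https://example.com/a", 301, None, None)` returns `["status 301"]`, so that URL breaks rule 2.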
Here is a success story, so you can use the insights and see how good use of sitemaps improves crawling and indexing and increases traffic: https://moz.com/blog/multiple-xml-sitemaps-increased-indexation-and-traffic
Errors when generating sitemaps
Over the past few years, auditing websites and working on different projects, I have come across all sorts of things when it comes to sitemaps, but what stands out above all is this:
- Including URLs that return a 301 code.
- Including URLs that return a 404.
- Including URLs whose canonical points to another URL.
- Including URLs blocked by robots.txt (this one is the best xD).
Verifying that none of this is happening in your project is very simple: you only need the project's sitemap and Screaming Frog (if you are not using this tool yet, here is a Screaming Frog guide). I will explain the process in several steps:
STEP 1: Download the sitemap file so you can work with the document.
STEP 2: Open Screaming Frog >> Mode >> List >> Upload >> From a File >> select the XML sitemap. With this, you can load your sitemap, analyze it in depth, and spot any errors it may contain.
STEP 3: Identify the errors and generate a correct sitemap. With this you will achieve a significant improvement in crawling, and depending on the state of your sitemap, this improved crawling can help your project start climbing positions.
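If you prefer to script part of this instead of using Screaming Frog, a small Python sketch (standard library only) can extract the URLs declared in a downloaded sitemap, so you can then check their response codes with whatever tool you like:

```python
# Sketch: pull the <loc> entries out of a downloaded sitemap file,
# so each URL can be tested for status code, noindex, canonical, etc.
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(xml_text):
    """Return the list of URLs declared in a sitemap XML document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", NS)]
```

Feed the resulting list to any HTTP checker and you have replicated the core of the Screaming Frog list-mode audit.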
When should you perform this check?
In general, here are some situations where this check is very important, as is generating a new sitemap that lets the Google crawler move more intelligently around your site:
- If you have implemented the famous HTTPS on your site, this is a crucial moment to check your sitemap; you will see how many 3xx responses you find.
- If you have recently migrated or made URL changes. There you will find surprises like 301s and 404s xD.
- If you really like playing with noindex, or if you use a plugin to generate your sitemap, you will surely find some noindexed URLs included in the sitemap.
- If you really like using canonicals, you will surely find some nasty surprises in your sitemap.
2 Advanced uses of the sitemap
The sitemap has different uses. Here I will explain the situations in which I use it and the reason behind each of these actions:
1. Speed up the deindexing of a large number of URLs thanks to the Sitemap
Let's start with the first common scenario! We have a number of unnecessary URLs that we want to deindex for some reason (I don't want to go into detail, since that would take forever; in future articles we will explain why we often need to deindex URLs). Imagine there are hundreds or thousands of them. You can't wait for Google to revisit each and every one of them at its usual crawl frequency.
- To speed up the deindexing of a large number of URLs, we just need to generate a sitemap containing all the URLs we have already marked noindex and upload it to Search Console. For this, I asked my colleague Julio for a free sitemap-generation tool, which you can find here, since Screaming Frog and other tools have problems with this type of URL.
- After a reasonable amount of time, we take all of those URLs and verify that they have been deindexed using URL Profiler (I will explain this tool later). All you need to do is paste in the URLs and select the “Google Indexing” option.
- Once they have been deindexed, we remove the sitemap from Search Console.
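The first step above can also be sketched in a few lines of Python. This is just an illustration, not the tool mentioned earlier; it assumes as input a crawl export of URL / meta-robots pairs:

```python
# Sketch: build the temporary "deindex" sitemap by keeping only the
# URLs that already carry noindex in a crawl export.
import xml.etree.ElementTree as ET

def deindex_sitemap(crawl_rows):
    """crawl_rows: iterable of (url, meta_robots) tuples.
    Returns sitemap XML containing only the noindexed URLs."""
    urlset = ET.Element(
        "urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    )
    for url, robots in crawl_rows:
        if robots and "noindex" in robots.lower():
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")
```

Upload the resulting file to Search Console as a separate sitemap and delete it once the URLs have dropped out of the index.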
Here is a tweet where Gary Illyes comments on the question:
@nishanthstephen usually anything you put in a sitemap will be fetched sooner
– Gary Illyes (@methode) October 13, 2015
2. Create a sitemap to remove URLs faster
This scenario comes up in many e-commerce projects! Imagine you have an ecommerce site, you work with seasonal products, and suddenly you have to remove various categories and products for different reasons. Be careful when this happens; there are several things to check:
- Check that none of the URLs receive authoritative external links.
- Check the organic traffic of these URLs, because if any URL has traffic, I would not delete it under any circumstances.
- Check whether there are similar products, because if there are and these URLs have traffic, we could set up a 301.
- If you have already decided to remove these URLs because they have no traffic and no external links providing authority, those URLs will no longer exist. You just need to make them return a 410 code.
- Create a sitemap with all the URLs that return 410. Remember you can create it with this free tool.
- After a reasonable amount of time, we take all of these URLs and verify that they have been deindexed using URL Profiler. All you need to do is paste in the URLs and select the “Google Indexing” option.
- Once they have been deindexed, we remove the sitemap from Search Console.
Opinions and impressions of sitemaps – [VIDEO]
The Sitemap file – My results
As you have seen throughout the article, the sitemap has multiple uses and you can get a lot out of it. To this day, I am convinced that most people do not take full advantage of it, and that is why I wanted to show how my colleagues and I use it in our everyday work.
If you check your sitemap every now and then, pay attention to it, and use the advanced techniques I have shown you, it will surely become a very productive tool for you and, depending on the case, it will help improve the crawling of your site.
Now I would like to know what uses you give the sitemap, so I can gather more ideas to use this file more efficiently and get the most out of it. Do you dare to share your uses and impressions in the comments?