Crawl budget: why are your pages not indexed by Google?

The crawl budget is the limit on the number of pages that Google's robot (Googlebot) will crawl and index on your site. John Mueller, Senior Webmaster Trends Analyst at Google, explains the causes of a poor crawl budget.

During a Google SEO office-hours session (via Hangout), we asked John Mueller, Senior Webmaster Trends Analyst at Google, why Google indexed only a very small number of pages on certain websites. In other words, why some websites have a poor crawl budget.

In this article, we review Mueller's explanations of the factors that influence the number of pages indexed on a site, and why certain pages are not indexed by Google.

What is the crawl budget?

The crawl budget (sometimes called the "exploration budget") corresponds to the level of attention that Google's crawler (Googlebot) gives your website. It translates into the resources Googlebot allocates to crawling the pages of your site and the frequency of those crawls. Your crawl budget thus determines the limit on the number of pages that the Google robot will crawl on your site.

But because the Web is vast, Google's strategy is to index only the highest-quality web pages and to leave poor-quality pages out of the index.

According to Google's developer documentation for very large websites (with millions of web pages): "Not everything crawled on your site will necessarily be indexed; each page must be evaluated, consolidated, and assessed to determine whether it will be indexed after it has been crawled."

What determines the crawl budget of a website?

According to John Mueller, two main factors determine a website's crawl budget: the response time of the server that hosts the site, and the quality of the site's content.

1- The response time of the server

According to Mueller, server response time is one of the main factors that influence a site's crawl budget. If the server hosting your site is very slow, the number of pages Google indexes will inevitably suffer. You can see your server response time in the Crawl Stats report in Google Search Console.

Response time is different from a page's loading speed: a fast response allows Google to crawl as many pages as possible on your site. Mueller recommends an average server response time below 300 to 400 milliseconds.

A website hosted on a shared server may have trouble delivering pages to Google quickly enough, particularly when other sites on the same server consume excessive resources and slow the machine down for everyone it hosts. Hosting your site on a dedicated server is therefore a good way to optimize your crawl budget.
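As a quick sanity check on Mueller's 300–400 ms guideline, you can measure your server's time to first byte yourself. A minimal sketch (the URL and the 400 ms threshold here are illustrative, not part of Google's tooling):

```python
import http.client
import time
from urllib.parse import urlparse

def measure_response_time(url: str) -> float:
    """Return the time to first byte (TTFB) of a GET request, in milliseconds."""
    parsed = urlparse(url)
    conn_cls = (http.client.HTTPSConnection if parsed.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parsed.netloc, timeout=10)
    start = time.perf_counter()
    conn.request("GET", parsed.path or "/")
    resp = conn.getresponse()
    resp.read(1)  # stop the clock as soon as the first byte arrives
    elapsed_ms = (time.perf_counter() - start) * 1000
    conn.close()
    return elapsed_ms

# Example (hypothetical URL):
# ms = measure_response_time("https://www.example.com/")
# print("OK" if ms < 400 else "slow for crawl budget")
```

A single measurement is noisy; averaging several requests at different times of day gives a more honest picture of what Googlebot experiences.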

2- The quality of the site’s content

According to John Mueller, poor content quality can also prevent the Googlebot crawler from crawling a website.

"The other main reason why a website is not crawled much is that we are not convinced about its overall quality," he said.

It is not enough to create a website with a million pages and put it online to get good SEO right away. Google will not index your pages until it is sure of the quality of their content.

"We'll be a little more careful about crawling and indexing them until we're sure the quality is really good," says Mueller.

So make sure to optimize the content of each page of your website to increase its chances of being indexed by Google.

Other factors that affect the number of pages crawled by Google

In addition to server response time and site quality, there are other factors that can also affect your website’s crawl budget:

  • The depth of the page: Google also takes the depth of your pages into account when determining your crawl budget. The depth of a page is the number of clicks required to reach it from the site's home page. The more "distant" a page is, the less likely it is to be crawled by Google.
  • The frequency of updates: Google’s robot will crawl your website more often if you regularly feed it with new content.
  • Malicious robots: Another reason for a bad crawl budget is that your server is overloaded with malicious bots, which slows down the website.

In summary…

As John Mueller recommends, make sure the server hosting your site serves web pages at a good pace (response time under 300 to 400 milliseconds). Be sure to run this check outside normal daytime hours as well: many crawlers, including Google's, crawl sites early in the morning, when there are generally fewer visitors.

Also make sure that each of your web pages is of good quality and optimized for SEO. Finally, if you have the means, prefer a dedicated server to a shared server for hosting your website.

Want to improve your website's content to rank better on Google? Check out our article on a proven SEO technique: content pruning!
