Google Search Console 8. Sitemaps

Robert Crowther Sep 2022
Last Modified: Mar 2023

A sitemap gives a list of URLs, and some directions to search-engines about what should be crawled on a website. My experience is that a sitemap is very effective at getting pages discovered. I will not talk here about how to create a sitemap, but will explain how sitemaps work with the Console.

Google console and sitemaps

Add a sitemap

The Console will tell you whether the sitemap read was successful. That happens almost immediately. If anything is wrong, it is almost always a problem with the XML layout or with site access, both of which are easy to cross-check.
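
Both of those failures can be cross-checked from a script before (or after) submission. A minimal sketch in Python; the sitemap URL is a placeholder:

  # Cross-check the two usual failures: site access and XML layout.
  # A sketch; the sitemap URL is a placeholder for your own.
  import urllib.request
  import xml.etree.ElementTree as ET

  SITEMAP_URL = 'https://example.com/sitemap.xml'

  # Any HTTP failure (404, 403, timeout...) raises here: the access check
  with urllib.request.urlopen(SITEMAP_URL) as response:
      data = response.read()

  # fromstring() raises ParseError on malformed XML: the layout check
  root = ET.fromstring(data)
  print('Fetched and parsed OK, root element:', root.tag)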

Remove a sitemap

In the Console, this option is hidden. You need to click on the sitemap to view its detail, then look for the ‘three dots’ menu button at the top right, where you will find ‘remove sitemap’.

To make sure a sitemap is not crawled again, remove it from the server. Removing it from the Console will only clean up the display. Reputedly, Google will store the address, then try again.

On use of the sitemaps tool, delayed updates

When adding a sitemap, I find the Console… not misleading… but not obvious. The not-obvious issue is that, after a successful sitemap registration, the Console often shows that no URLs have been discovered. I’ve not tested enough to know if this happens for single sitemaps, but I use sitemaps that point to other sitemaps, and on registering these never seem to display found URLs: the ‘discovered URLs’ column shows ‘0’. At that point, the Console has only checked that the map can be visited and is in the correct format. I’ve found that reading the information can take four days or so. When that happens, you’ll see the ‘discovered URLs’ column jump to the number of URLs listed in the sitemap.
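
While waiting, you can count for yourself how many URLs the Console should eventually report as discovered. A sketch in Python, assuming a sitemap index that points to child sitemaps; the index URL is a placeholder:

  # Count the page URLs reachable from a sitemap index, which is the
  # figure the 'discovered URLs' column should eventually show.
  # A sketch; the index URL is a placeholder.
  import urllib.request
  import xml.etree.ElementTree as ET

  NS = '{http://www.sitemaps.org/schemas/sitemap/0.9}'
  INDEX_URL = 'https://example.com/sitemap-index.xml'

  def fetch(url):
      with urllib.request.urlopen(url) as response:
          return ET.fromstring(response.read())

  total = 0
  # A sitemap index lists its child sitemaps in <sitemap><loc> entries
  for loc in fetch(INDEX_URL).iter(NS + 'loc'):
      # Each child sitemap lists its pages in <url> entries
      total += sum(1 for _ in fetch(loc.text.strip()).iter(NS + 'url'))
  print('URLs listed across all child sitemaps:', total)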

There is a second issue. A successful sitemap read, and the following recognition of URLs, does not mean those URLs will be indexed. That takes more time again. In my experience, it can take six days between sitemap submission and Google’s attempt to index. Inexplicably, the Console ‘Pages’ view will then show that URLs have already been indexed, backdated to the day the sitemap was read: the displays show the sitemap read date as the indexing date. From the dates, the suggestion is that Google indexes URLs in chunks, but the Console only displays when the job is complete. Sorry, but I don’t know if that is true or not. I seem to have evidence both ways.

Google’s handling of sitemaps

Googlebots like sitemaps

Google do not say they favour sitemaps, but my experience is that discovery and indexing can be vastly better with them.

Google will not revisit sitemaps often

Google says they will revisit a sitemap periodically. My experience (only rough) is that the time period may be every three months. Not often at all.

Despite sitemap visit gaps, new material will appear

Without visiting a sitemap, Googlebots may crawl for pages every week. If you create new material, it may seem like a step forward to request indexing, or to ping for a sitemap read (see below). However, I’ve found that, sitemap or not, Google crawls for new material.

Update sitemaps, not replace

If you are adding new sitemap data, update the existing sitemap rather than replacing it with another. It’s easier for Google.
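
As a sketch of what updating in place can look like, the following appends one entry to an existing sitemap file using Python’s standard library; the file name and new URL are placeholders:

  # Append a new <url> entry to an existing sitemap, rather than
  # publishing a replacement file. File name and URL are placeholders.
  import xml.etree.ElementTree as ET

  NS_URI = 'http://www.sitemaps.org/schemas/sitemap/0.9'
  # Keep the default (unprefixed) sitemap namespace on write
  ET.register_namespace('', NS_URI)

  tree = ET.parse('sitemap.xml')
  url = ET.SubElement(tree.getroot(), f'{{{NS_URI}}}url')
  loc = ET.SubElement(url, f'{{{NS_URI}}}loc')
  loc.text = 'https://example.com/new-page/'
  tree.write('sitemap.xml', encoding='utf-8', xml_declaration=True)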

Google skips known information

Old material in sitemaps will be skipped. However, every so often, Googlebots will check older pages, perhaps to confirm they are still accessible? So I’ve never concerned myself about it.

The ‘last‐modified’ attribute

Google say they disregard most of the ‘attributes’ in the sitemap specification; their coders say the attributes have returned poor-quality information to them. However, the coders make an exception for the ‘last-modified’ attribute (the ‘lastmod’ element), which they say their process will consider.

My own experience is inconclusive. Regardless of sitemaps, Google seems good at discovering new material. So much so that I suspect Google has a preference for sites that update and/or post new material, i.e. appear to be active. However, I seem to get better results when ‘last modified’ attributes are used. With these attributes, Googlebots will not necessarily visit modified or new pages, and neither will they disregard older pages. However, the Google process seems to show a keener interest in the site, and perhaps visits more actively. I have no substantial evidence, but nowadays I often add the attribute.
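
For illustration, a sketch in Python that adds a ‘lastmod’ element to every entry in an existing sitemap. Here every entry gets today’s date; a real generator would use each page’s own modification date:

  # Add a <lastmod> element to each <url> entry of a sitemap.
  # A sketch: every entry gets today's date, where a real generator
  # would use the modification date of the page itself.
  import xml.etree.ElementTree as ET
  from datetime import date

  NS_URI = 'http://www.sitemaps.org/schemas/sitemap/0.9'
  ET.register_namespace('', NS_URI)

  tree = ET.parse('sitemap.xml')
  for url in tree.getroot():
      lastmod = ET.SubElement(url, f'{{{NS_URI}}}lastmod')
      # The spec takes W3C Datetime; a plain YYYY-MM-DD date is valid
      lastmod.text = date.today().isoformat()
  tree.write('sitemap.xml', encoding='utf-8', xml_declaration=True)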

Sitemap pings

As noted above, when a sitemap is added or updated, it can take a long time before any search-engine visits. This time can be shortened by telling the engines you have a new sitemap or new data. Pinging was once supported by all the search engines, though at the time of writing (2022) there is a new initiative (see ‘Index Now’ below). At the time of writing, these engines could be pinged:

Google

https://www.google.com/ping?sitemap=fullURLToSitemap

Yahoo

http://siteexplorer.search.yahoo.com/submit

Bing

(no ping)

Yandex

(no ping)

There are also sites that claim to ping everything for you. With Google, I’ve found Googlebots visit quickly after a ping, sometimes within a day or so, for a sitemap read. That doesn’t mean crawling or indexing will be updated, though. That can take much more time.
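
A ping is just an HTTP GET on the engine’s endpoint. A minimal sketch in Python for the Google endpoint listed above; the sitemap URL is a placeholder:

  # Ping Google with the full URL of a new or updated sitemap.
  # A sketch; the sitemap URL is a placeholder.
  import urllib.parse
  import urllib.request

  sitemap = 'https://example.com/sitemap.xml'
  ping = ('https://www.google.com/ping?sitemap='
          + urllib.parse.quote(sitemap, safe=''))
  with urllib.request.urlopen(ping) as response:
      # 200 means the ping was received, not that a crawl is scheduled
      print(response.status)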

Index Now

Index Now is a Yandex/Bing initiative to deliver index notification across all engines. At the time of writing (2022), it is brand new. Like the Google Console, it requires validation by a file on the site.
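
As I understand the Index Now documentation, the shape is: generate a key, host a text file named after the key at the site root (the on-site validation just mentioned), then GET the endpoint with each changed URL. A sketch in Python; the key and URLs are placeholders:

  # Notify Index Now of a changed URL. A sketch; key and URLs are
  # placeholders. The same key must also be served by the site itself,
  # e.g. at https://example.com/<key>.txt, containing the key. That is
  # the 'validation by a file on site' mentioned above.
  import urllib.parse
  import urllib.request

  KEY = 'replace-with-your-generated-key'
  changed = 'https://example.com/new-page/'
  ping = ('https://api.indexnow.org/indexnow?url='
          + urllib.parse.quote(changed, safe='')
          + '&key=' + KEY)
  with urllib.request.urlopen(ping) as response:
      print(response.status)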

Refs

Google, build a sitemap,

https://developers.google.com/search/docs/advanced/sitemaps/build-sitemap

Sitemap specification,

https://www.sitemaps.org/protocol.html