Google Search Console 6. Unindexed Links
Failure to index
The issues with failure to index can be split into two kinds,
Pages have not been discovered at all (I own sites where less than one‐fifth of pages are listed)
Pages have been discovered, but not indexed (and the affected pages will seem to be a selection with no common issue)
If links have not been discovered at all, the site probably needs a sitemap. But if they have been listed as ‘unindexed’, read on.
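For reference, a sitemap can be very small. A minimal sketch, assuming a site at example.com (a placeholder) with the file served from the site root as ‘sitemap.xml’,

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <!-- one <url> entry per page you want Google to discover -->
      <url>
        <loc>https://example.com/some-page/</loc>
        <lastmod>2024-01-15</lastmod>
      </url>
    </urlset>

Submit the file through the console’s ‘Sitemaps’ report, or point to it from ‘robots.txt’ with a ‘Sitemap:’ line.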
What is indexing?
Indexed means Google has seen a link, and thinks it is fit to display in a Google search.
Bear in mind, when Google indexes a link, it means they regard it as in a useful format and a lead to good material. It does not mean the link will be listed; that depends on which search terms Google thinks the link may be relevant for, which other links yours may be mixed in with, and so on.
Unindexed links cannot be fixed with sitemaps
It’s worth noting that indexing problems, as far as I have found, cannot be fixed using sitemaps. Google persists the information that a link has failed. For performance that makes sense: it reduces the load on both the Google robots and the website.
The inspection report
A link may be indexed, but that does not mean Google is satisfied with the link, only that it has enough information to classify the link. For example, Google has indexed pages for me, but reported errors in structured data.
For some reason, these errors are not highlighted. Perhaps because Google feels the assessment of ‘to index or not’ is the key information? Anyway, it is worth reading the inspection reports fully, meaning reading to the bottom of the report and clicking on any warning or error links. There may be information worth knowing.
How much should this worry you?
If you are an SEO person, you may want to see every site page indexed. But many sites do not need mass coverage. Some sites are resources, where users arrive and browse; others are portals, where only the main links need to be listed.
Although the console doesn’t highlight this at all, there’s a split between ‘Pages Google has discovered/crawled but is having difficulties with’ and ‘Pages Google don’t think worthy of indexing’. On the whole, I’m ok with Googlebots deciding if pages are worth indexing, and how. And I also don’t worry about sometimes seeing ‘loose link’ or 404 errors. I’m not happy with pages that are blocked because they are ‘Duplicates without user‐selected canonical’. That can cause masses of otherwise good pages to be unindexed.
How to fix link issues
First: Google is braced against multiple submissions/requests for link inspection. Console users are allowed, I think, about twelve URL index requests a day. Also, Google say running the URL inspection tool multiple times for the same URL will have no effect. You can understand why, but I suppose some people may try, or do it by accident.
There are two ways to fix indexing issues,
URL inspection tool
The obvious way to fix link issues is with the URL inspection tool. If you want a link to be re‐assessed, click on the affected link. That leads to links that let you have the URL inspected and then, if you ask, submitted for indexing.
Especially if you do a ‘live test’ on an inspect, Google will visit that page. However, this is not the same as a crawl. At the time of an ‘inspect’, Google will gather or refresh data. That said, if the page is mobile‐usable, you can see it appear as mobile‐usable often within a day. What seems to happen after an ‘inspect’ is that the status of the inspected page is noted as ‘discovered’. But the URL is not immediately recategorised, I presume because Google is waiting on a crawl to decide ‘error or index’. Still, if something has been changed, for example a fix applied to a page, the inspect tool is a way to inform Google.
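As an aside, if you would rather script inspections than click through the console, Google exposes a URL Inspection API as part of the Search Console API. A minimal sketch in Python, assuming OAuth credentials already authorised for the verified property (the property and page URLs below are placeholders). Note that, as far as I know, the API only reports index status; it cannot submit a URL for indexing,

    # pip install google-api-python-client google-auth
    from googleapiclient.discovery import build

    def inspect_url(credentials, site_url, page_url):
        """Fetch the index status Search Console holds for one URL (read-only)."""
        service = build("searchconsole", "v1", credentials=credentials)
        body = {"inspectionUrl": page_url, "siteUrl": site_url}
        response = service.urlInspection().index().inspect(body=body).execute()
        return response["inspectionResult"]["indexStatusResult"]

    # Usage, with placeholder URLs:
    # status = inspect_url(creds, "https://example.com/", "https://example.com/some-page/")
    # print(status.get("coverageState"), status.get("lastCrawlTime"))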
The behaviour of inspects
The ‘inspection’ tool seems to work like the sitemap tool. Submit some URLs for inspection, then wait. If indexing succeeds, the ‘Pages’ display then seems to backdate the indexing to when the requests were submitted! I talk a little more about this on the sitemaps page. After an ‘inspect’ submission, Google will take 3–4 days to display results.
‘Validate Fix’ buttons
Another way to get links reassessed is to press these buttons. Here is the only help I can find about them. I assume they run a specific test on a specific area of issues, e.g. for ‘Not Found (404)’ errors, the bots test whether the gathered URLs can now be found. I also assume that if the tests pass, the links will then be submitted for indexing.
Note that, as far as I know, this effect may be different to the ‘inspect’ tool—the pages are not necessarily recrawled. Maybe only the specific error is checked.
The behaviour of validations
On the main ‘Pages’ display, under the ‘Validation’ column, these trigger a ‘Started’ status. The validation then runs its course, reporting nothing and not shifting URLs progressively. Error or not, and so far as I know regardless of URL count, I’ve found a validation will take about a month (that said, I did one fix on mobile usability and the validation cleared in a day. Was I lucky? Do Google favour mobile validation fixes? I don’t know).
When a validation is run, the bots will not do a tolerant sweep of the URLs: if they detect an error they will stop. This is not clear in the console, which will continue to display ‘pending’ and ‘failed’ lists. I’ve had validations block on URLs which displayed no errors at all. Sometimes Google has congratulated me on my site’s click rates while deprecating the page indexing. It’s frustrating and unhelpful. Enough to push you back to the inspection tool; I have no idea why Google is stalling.
Inspect, validate or both?
There are times you may want to try validation, and times you prefer inspection. If you have a site with many ‘Not Found (404)’ errors, and the errors are for different reasons, validation is not as interactive as the ‘URL Inspection Tool’. On the other hand, for example, if you had a problem with many ‘Duplicates without user‐selected canonical’ issues, so installed meta‐data or a sitemap, then a validation of affected URLs will address all the new data. Also, a validation will get you past the multiple submissions cap, and likely helps Google too.
As for running both… if your site had a communication error, and you fixed it, then run both. The ‘inspect’ is an update to Google’s information, though whether that goes beyond establishing mobile usability, I do not know. And the validation will check the error block generally. They have different actions so, if applicable, run both.
However, running both is not always the answer. In particular, if you have rogue URLs, or a ‘Crawled ‐ currently not indexed’ error block, a validation risks a fail at the affected URLs. Maybe best to update data using inspects. On the other hand, if the pages are known to be good, and there is no new information, it may be better to let a validation have its way.
Issues reported with links, with notes
These are the most common issue reports (there are more, see the reference for Google Help, which lists them all),
Not Found (404)
The first question, which somewhere Google gently suggests… is the site ok? Did a bot visit and find the site down? I would add: is the URL ok? I often find entries here are for URLs that are incorrect anyway: loose links, mangled sitemap entries, deprecated URLs and so on. So check.
I find Google is usually correct about unvisitable/unavailable URLs. But say you identified this issue, and fixed it. Click on the URL; that leads to an inspection and the opportunity to request indexing.
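Before spending inspection quota, it can be worth checking the reported URLs yourself. A small sketch in Python, assuming the suspect URLs have been copied into a list (the URLs shown are placeholders),

    # pip install requests
    import requests

    suspect_urls = [
        "https://example.com/old-page/",       # placeholder URLs, e.g. copied
        "https://example.com/misspelt-page/",  # from the console 404 report
    ]

    for url in suspect_urls:
        try:
            # HEAD is usually enough to read the status code;
            # switch to requests.get() if a server rejects HEAD requests
            response = requests.head(url, allow_redirects=True, timeout=10)
            print(url, "->", response.status_code)
        except requests.RequestException as err:
            print(url, "-> request failed:", err)

Anything that does not come back as 200 (directly or after a redirect) is worth a closer look before asking Google to re‐inspect.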
In‐site loose links
A note on these: these are links in general text that go nowhere. They may not be broken explicitly; they may be wrongly formed, e.g. misspelled, or auto‐generated for material not yet created. I find Googlebots amazingly quick at finding these. I’ve had error reports back within a week of a new posting. This kind of error is easy to fix, and will not in general affect ranking. But I mention it because you could be surprised.
Duplicates without user‐selected canonical
Google makes a palaver of this error. I can understand why: as a search engine, Google wants the cleanest URL in its search results. Preferably with no redirections, a decided (‘canonical’) form, no duplication, and so forth.
I’ve found that if I add a sitemap, this pleases Google. Google Help says the bots will treat sitemap URLs as the suggested canonicals. Google Help also recommends adding a ‘canonical’ link tag to the webpage. I find this not‐DRY, but nearly every major site uses these tags.
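For reference, the tag Google Help describes is a link element in the page head, pointing from each variant page at the preferred URL. A sketch, with example.com as a placeholder,

    <!-- in the <head> of the duplicate or variant page -->
    <link rel="canonical" href="https://example.com/some-page/">

The sitemap side of the same idea is simply to list only the preferred form of each URL in the sitemap.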
A sitemap will establish canonical URLs for new material, but for existing material, the issues are deeper.
Google’s choices of canonical
In the case of the ‘duplicate’ error, efforts to fix can result in Google choosing a canonical URL, but one you do not want: the ‘Duplicate, Google chose different canonical than user’ error. These issues fall into the ‘Googlebots and site not talking’ category. They are difficult to fix, and usually require server access.
Anyway, there are posts on the web about this. Those posts usually replay Google‐derived advice, which may be all you need. Or you could read the article on this site about Google canonical links and fixes, which goes into detail.
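If you do have server access, one documented option is to send the canonical as a ‘Link’ HTTP header instead of in the page, which also covers non‐HTML files such as PDFs. A sketch for Apache with mod_headers enabled (the file name and domain are placeholders),

    # in .htaccess or the vhost config, requires mod_headers
    <Files "whitepaper.pdf">
      Header add Link "<https://example.com/downloads/whitepaper.pdf>; rel=\"canonical\""
    </Files>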
Page with redirect
Comes from Google attempting to find the best of equivalent URLs. Probably no need to worry about any URL listed here.
Blocked by robots.txt
This error report is obvious, yes? Though it does highlight that Google will honour ‘robots.txt’ files when they block a URL. However, one point: to remove a ‘robots.txt’ block, the file must be altered, then you either wait for Googlebots to find the change, or issue a ping. There is no way to fix this through the console.
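For reference, the block and its removal look like this in ‘robots.txt’ (the path is a placeholder),

    # blocking: anything under /drafts/ will be reported as 'Blocked by robots.txt'
    User-agent: *
    Disallow: /drafts/

    # unblocking: remove or empty the Disallow line, then wait for a recrawl
    User-agent: *
    Disallow: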
Also, if you want to block a URL, it will take weeks to do that through a ‘robots.txt’ file. Better to use, as Google recommend, a ‘noindex’ meta tag. If you need an immediate block, use the console ‘Remove URL’ tool.
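A sketch of the ‘noindex’ approach. Note it only works if the page is not also blocked by ‘robots.txt’, because Googlebot has to fetch the page to see the tag,

    <!-- in the page <head>: the page stays crawlable, but is kept out of the index -->
    <meta name="robots" content="noindex">

For non‐HTML responses, the same instruction can be sent as an ‘X-Robots-Tag: noindex’ HTTP header.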
Discovered, currently not indexed
This is different to the previous listing, because it’s not an error. Google say you can wait until they return to these URLs. For example, the Googlebot may have retreated to avoid overloading a website. But if a URL hangs here and the validation status is ‘Not Started’, you could try running the ‘inspect’ tool. I have no convincing results for this.
Crawled, not indexed
Much the same as ‘Discovered, currently not indexed’. However, this time the Googlebot gathered data, so not much to worry about. Wait.
Refs
Google’s intro for developers to crawling and indexing. Unhelpfully, a brief overview,
https://developers.google.com/search/docs/fundamentals/how-search-works
Google help on reported link status,
https://support.google.com/webmasters/answer/7440203#valid-status