Google Search Console 7. The robots file

Robert Crowther Sep 2022
Last Modified: Feb 2023


Robots.txt

A ‘robots.txt’ file is the older way to tell a search‐engine where to crawl: it works by saying where not to crawl. Nowadays ‘robots.txt’ is less important than a sitemap, which tells bots where they can crawl. But that difference in direction means a ‘robots.txt’ file still has a use. Worth knowing…
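For example, a minimal ‘robots.txt’, served from the site root (e.g. ‘https://example.com/robots.txt’), looks like this. The ‘/private/’ path is only a stand‑in for whatever you want crawlers to skip,

    User-agent: *
    Disallow: /private/

‘User-agent: *’ means the rules apply to all crawlers; each ‘Disallow’ line names a path prefix they should not fetch.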

(for Google) Robots.txt overrides a sitemap

This works for Google, so it is not guaranteed for other search engines: ‘robots.txt’ overrides sitemaps. That means you can keep sitemaps generic, covering every page on the site, then make specific exclusions using ‘robots.txt’. This is a nice way to work.
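As a sketch of that way of working, the sitemap covers everything and ‘robots.txt’ carves out the exceptions. The sitemap URL and excluded paths here are invented for illustration,

    Sitemap: https://example.com/sitemap.xml

    User-agent: *
    Disallow: /drafts/
    Disallow: /search/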

Note that, if you do not want a page listed, the best way is to add the ‘noindex’ meta‐tag, though that may be difficult in an abstracted site build. The ‘robots.txt’ approach can be thought of as a ‘soft’ exclusion i.e. ‘we prefer not to list these pages’.
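For reference, the ‘noindex’ meta‐tag goes in the page’s head,

    <meta name="robots" content="noindex">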

Robots.txt updates are not instant

If you change ‘robots.txt’, you’ll need to request a re‐index, then wait for the crawler to pick up the change.

Next

Sitemaps

Refs

Google Help on robots files, https://developers.google.com/search/docs/crawling-indexing/robots/create-robots-txt

Google Robots testing tool (requires console registration), https://www.google.com/webmasters/tools/robots-testing-tool