Google Search Console 7. The robots file
Robots.txt
A ‘robots.txt’ file is the older way to tell a search engine where to crawl. It does this by telling bots where not to crawl. Nowadays, ‘robots.txt’ is less important than a sitemap. Sitemaps tell bots where they can crawl. However, that difference in direction means a ‘robots.txt’ file still has a use. Worth knowing…
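As a minimal sketch (the user‐agent wildcard and the blocked folders here are illustrative assumptions, not from this article), a ‘robots.txt’ file sits at the site root and lists what should not be crawled:

  # Rules for all crawlers
  User-agent: *
  # Folders we prefer not to have crawled (hypothetical paths)
  Disallow: /private/
  Disallow: /tmp/

Anything not disallowed is crawlable by default.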
(for Google) Robots.txt overrides a sitemap
This works for Google, so is not guaranteed for other search engines: ‘robots.txt’ overrides sitemaps. This means you can keep sitemaps generic, covering every page on the site, then make specific exclusions using ‘robots.txt’ (sketched below). This is a nice way to work.
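A sketch of that pattern, assuming a hypothetical sitemap URL and paths: the sitemap stays generic and covers everything, while ‘robots.txt’ carries the specific exclusions.

  # Point crawlers at the generic, site-wide sitemap
  Sitemap: https://www.example.com/sitemap.xml

  # Specific exclusions live here instead
  User-agent: *
  Disallow: /drafts/
  Disallow: /internal-search/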
Note that, if you do not want a page listed, the best way is to add the ‘noindex’ meta‐tag to the page. But that may be difficult in an abstracted site construction. Blocking the page in ‘robots.txt’ can instead be thought of as a ‘soft’ exclusion, i.e. ‘We prefer not to list these pages’.
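For reference, the ‘noindex’ meta‐tag is a single line in the page’s <head>:

  <!-- Tells compliant search engines not to index this page -->
  <meta name="robots" content="noindex">

Where editing the HTML is awkward, Google also accepts the same directive as an ‘X-Robots-Tag: noindex’ HTTP response header.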
Robots.txt updates are not picked up immediately
If you change ‘robots.txt’, you’ll need to request a full index, then wait.
Refs
Google Help on robots files,
https://developers.google.com/search/docs/crawling-indexing/robots/create-robots-txt
Google Robots testing tool (requires console registration),