robots.txt

robots.txt is a plain-text file placed at the root of a domain (e.g., example.com/robots.txt) that tells compliant web crawlers which parts of the site they may access. It uses Robots Exclusion Protocol directives such as User-agent, Disallow, and Allow. Blocking low-value sections via robots.txt conserves crawl budget for important pages, but the file controls crawling, not indexing: a blocked URL can still be indexed (without its content) if it has [[backlinks|backlinks]], and a crawler can never see a noindex tag on a page it is forbidden to fetch — making over-blocking a common SEO mistake. The file should also declare the location of the [[sitemap-domain|XML sitemap]] for crawler discovery.
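A minimal robots.txt combining these directives might look as follows (the paths and sitemap URL are illustrative placeholders):

```
User-agent: *
Disallow: /admin/
Allow: /admin/public/
Sitemap: https://example.com/sitemap.xml
```

Each User-agent line opens a group of rules for the named crawler ("*" matches any), and major crawlers such as Googlebot resolve Allow/Disallow conflicts by preferring the most specific (longest) matching path.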

Example

Adding 'Disallow: /admin/' under a 'User-agent: *' group prevents compliant crawlers such as Googlebot from crawling internal admin pages, while 'Sitemap: https://example.com/sitemap.xml' advertises the sitemap location to all crawlers.
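Whether a given URL is blocked can be checked programmatically. A sketch using Python's standard urllib.robotparser, with an inlined rules string and example.com URLs as assumptions:

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content; in practice this would be
# fetched from https://example.com/robots.txt, not inlined.
rules = """\
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Blocked: the path matches the Disallow: /admin/ rule.
print(rp.can_fetch("Googlebot", "https://example.com/admin/settings"))  # False
# Allowed: no rule matches, so crawling defaults to permitted.
print(rp.can_fetch("Googlebot", "https://example.com/blog/post"))       # True
# Sitemap URLs declared in the file (Python 3.8+).
print(rp.site_maps())  # ['https://example.com/sitemap.xml']
```

In a real crawler one would call set_url() and read() to fetch the live file instead of parse() on a hardcoded string.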