Definition
Robots.txt is a plain text file located at the root of a domain (example.com/robots.txt) that communicates directives to search engine crawlers. It implements the Robots Exclusion Protocol (REP), allowing or disallowing access to parts of the site via Allow and Disallow directives. The file is essential for managing crawl budget, preventing crawlers from exploring unnecessary pages (admin pages, faceted filters, duplicates), and keeping sensitive areas out of crawl paths. Important: robots.txt blocks crawling, not necessarily indexing. A page blocked by robots.txt can still appear in search results if other pages link to it. To prevent indexing, a noindex meta tag is the appropriate tool.
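A minimal robots.txt combining these directives might look like this (the paths and sitemap URL are illustrative, not prescriptive):

```text
User-agent: *
Disallow: /admin/
Disallow: /search
Allow: /blog/

Sitemap: https://www.example.com/sitemap.xml
```

Directives apply to the user-agent group above them; `*` targets all crawlers that do not have a more specific group of their own.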
Key Points
- Robots.txt blocks crawling but not indexing: use noindex to prevent a page from appearing in results
- It must be placed at the domain root and be publicly accessible
- An error in robots.txt can block the entire site: always test before deployment
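The "test before deployment" point can be checked locally with Python's standard-library parser before the file ever goes live (the rules and URLs below are illustrative):

```python
import urllib.robotparser

# Parse a candidate robots.txt body directly, without fetching it over the network.
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /wp-admin/",
])

# Blocked path: a compliant crawler must not fetch it.
print(rp.can_fetch("Googlebot", "https://example.com/wp-admin/edit.php"))  # False
# Public path: crawling is allowed.
print(rp.can_fetch("Googlebot", "https://example.com/blog/"))  # True
```

Note that Python's parser follows the original first-match REP semantics, while Google uses longest-match rules, so results can differ for files that mix overlapping Allow and Disallow lines.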
Practical Examples
Blocking the admin area
A WordPress site adds 'Disallow: /wp-admin/' in its robots.txt to prevent Googlebot from crawling back-office pages, saving crawl budget for public pages.
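In practice, WordPress's own default (virtual) robots.txt pairs this Disallow with an Allow exception, because front-end features rely on admin-ajax.php:

```text
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
```

Blocking /wp-admin/ wholesale without this exception can prevent crawlers from fetching AJAX responses that some themes and plugins expose publicly.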
Sitemap reference
By adding 'Sitemap: https://www.mysite.com/sitemap.xml' to the robots.txt, a webmaster makes the sitemap discoverable by any crawler that reads the file, without submitting it to each search engine individually.
Frequently Asked Questions
How do you create a robots.txt file?
Create a text file named 'robots.txt' at the root of your site. The basic syntax uses 'User-agent:' to target a specific bot (or * for all), 'Disallow:' to block access to a path, and 'Allow:' to authorize access. Also add a 'Sitemap:' line with your sitemap URL. Always test via the Search Console robots.txt testing tool before going live.
Does robots.txt prevent a page from being indexed?
No, robots.txt prevents crawling but not necessarily indexing. If other sites link to a blocked page, Google may still index it with a generic title and description. To truly prevent indexing, use the 'noindex' meta tag or the X-Robots-Tag HTTP header.
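Both mechanisms look like this; note that the page must remain crawlable (not blocked in robots.txt) for either directive to be seen and honored:

```html
<!-- In the page's <head> section -->
<meta name="robots" content="noindex">
```

The equivalent HTTP response header, useful for non-HTML resources such as PDFs, is `X-Robots-Tag: noindex`.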
Go Further with LemmiLink
Discover how LemmiLink can help you put these SEO concepts into practice.
Last updated: 2026-02-07