Robots.txt File - Fundamentals and Best Practices 2025
What is a robots.txt file?
The robots.txt file is an important technical SEO element that allows website operators to tell search engine crawlers which areas of a website may be crawled and which may not. It serves as a set of "house rules" for web crawlers and is a central component of technical SEO.
Basic Functions
The robots.txt file fulfills several important functions:
- Crawl Control: Determines which directories and files may be crawled
- Crawl Budget Optimization: Directs crawlers to important content
- Server Load Reduction: Prevents unnecessary crawl requests
- Sitemap Reference: Shows crawlers the location of the XML sitemap
Robots.txt Syntax and Structure
Basic Syntax
The robots.txt file follows a simple but precise syntax:
User-agent: [Crawler-Name]
Disallow: [Forbidden Path]
Allow: [Allowed Path]
Crawl-delay: [Seconds]
Sitemap: [Sitemap-URL]
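To illustrate the directive/value structure, here is a minimal Python sketch (not official tooling, only an illustration) that splits the lines of a hypothetical robots.txt into directive/value pairs:
# Minimal sketch: split robots.txt lines into (directive, value) pairs.
# The example content below is hypothetical.
example = """\
User-agent: *
Disallow: /admin/
Allow: /admin/public/
Crawl-delay: 1
Sitemap: https://example.com/sitemap.xml
"""

for line in example.splitlines():
    line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
    if not line:
        continue
    directive, _, value = line.partition(":")
    print(directive.strip().lower(), "->", value.strip())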
User-Agent Directives
The User-agent directive specifies which crawler the following rules apply to:
- User-agent: * - Applies to all crawlers
- User-agent: Googlebot - Applies only to Google's crawler
- User-agent: Bingbot - Applies only to Bing's crawler
Disallow Directives
Disallow directives define which paths should not be crawled:
- Disallow: / - Blocks the entire website
- Disallow: /admin/ - Blocks the admin directory
- Disallow: /*.pdf$ - Blocks all PDF files
- Disallow: /private/ - Blocks the private folder
Allow Directives
Allow directives override Disallow rules:
- Allow: /public/ - Allows crawling of the public folder
- Allow: /important-page.html - Allows a specific page
Best Practices for Robots.txt
1. File Placement
The robots.txt file must be placed in the root directory of the domain:
- ✅ https://example.com/robots.txt
- ❌ https://example.com/subfolder/robots.txt
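A quick way to verify correct placement is to request the file directly from the domain root. The sketch below uses only Python's standard library; example.com is a placeholder domain:
import urllib.error
import urllib.request

url = "https://example.com/robots.txt"  # placeholder domain
try:
    with urllib.request.urlopen(url) as response:
        # Expect HTTP 200 and a plain-text content type
        print(response.status, response.headers.get("Content-Type"))
except urllib.error.HTTPError as err:
    print(f"robots.txt not reachable at the domain root: HTTP {err.code}")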
2. File Size and Format
Keep the file small and technically clean:
- Plain text file, UTF-8 encoded, named exactly robots.txt
- Google processes a maximum of 500 KiB; content beyond that limit is ignored
3. Crawl-Delay Optimization
The Crawl-delay directive helps reduce server load by asking crawlers to wait between requests (note that Googlebot ignores this directive, while Bingbot respects it):
User-agent: *
Crawl-delay: 1
Recommended Values:
- Small websites: 0-1 seconds
- Large websites: 1-2 seconds
- E-Commerce: 2-5 seconds
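If needed, the configured delay can also be read programmatically. The following sketch feeds the Crawl-delay example above into Python's urllib.robotparser:
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse("""\
User-agent: *
Crawl-delay: 1
""".splitlines())

# crawl_delay() returns the delay for the given agent, or None if unset
print(rp.crawl_delay("*"))          # 1
print(rp.crawl_delay("Googlebot"))  # 1 (falls back to the * group)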
4. Sitemap Integration
Always reference the XML sitemap in robots.txt:
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-images.xml
Sitemap: https://example.com/sitemap-news.xml
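The listed sitemaps can be read back programmatically as well, for example with Python's urllib.robotparser (site_maps() requires Python 3.8 or newer):
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse("""\
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-images.xml
Sitemap: https://example.com/sitemap-news.xml
""".splitlines())

# site_maps() returns all Sitemap URLs from the file, or None if none are listed
print(rp.site_maps())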
Common Robots.txt Weaknesses
1. Syntax Errors
Typical problems include missing colons between directive and value, misspelled directive names, and Allow/Disallow rules that are not preceded by a User-agent line. Most crawlers silently ignore lines they cannot parse, so such errors often go unnoticed.
2. Logical Errors
Problem: Contradictory Rules
User-agent: *
Disallow: /admin/
Allow: /admin/public/
Solution: More specific rules first
User-agent: *
Allow: /admin/public/
Disallow: /admin/
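Note that Google and other parsers following RFC 9309 apply the most specific (longest) matching rule regardless of order, so the Allow rule above wins either way; listing the more specific rule first mainly helps simpler first-match parsers and human readers. The following Python sketch illustrates the longest-match logic for plain path prefixes (wildcards are deliberately not handled):
def is_allowed(path, rules):
    # rules: list of ("allow" | "disallow", path_prefix) tuples.
    # Longest matching prefix wins; on a tie, Allow wins.
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if path.startswith(prefix):
            match_len = len(prefix)
            if match_len > best_len or (match_len == best_len and directive == "allow"):
                best_len, allowed = match_len, (directive == "allow")
    return allowed

rules = [("disallow", "/admin/"), ("allow", "/admin/public/")]
print(is_allowed("/admin/public/page.html", rules))  # True  (longer Allow wins)
print(is_allowed("/admin/settings.html", rules))     # False (Disallow applies)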
3. Excessive Restrictions
Avoid:
User-agent: *
Disallow: /
Better:
User-agent: *
Disallow: /private/
Disallow: /admin/
Disallow: /temp/
Robots.txt for Different Website Types
E-Commerce Websites
User-agent: *
Allow: /products/
Allow: /categories/
Disallow: /checkout/
Disallow: /cart/
Disallow: /user/
Disallow: /admin/
Disallow: /search?*
Disallow: /filter?*
Sitemap: https://shop.example.com/sitemap.xml
Blog Websites
User-agent: *
Allow: /posts/
Allow: /categories/
Allow: /tags/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /?s=
Disallow: /search/
Sitemap: https://blog.example.com/sitemap.xml
Corporate Websites
User-agent: *
Allow: /about/
Allow: /services/
Allow: /contact/
Disallow: /internal/
Disallow: /drafts/
Disallow: /test/
Sitemap: https://company.example.com/sitemap.xml
Testing and Validation
1. Google Search Console
Google Search Console offers an integrated testing tool:
- Open the robots.txt Tester
- Enter the URL to test
- Check the crawling status
- Identify and fix errors
2. Online Validation Tools
Recommended Tools:
- Google Search Console Robots.txt Tester
- Screaming Frog SEO Spider
- Ryte Website Checker
- SEMrush Site Audit
3. Manual Tests
Test Checklist (a scripted version of some of these checks is sketched below the list):
- [ ] File is accessible under /robots.txt
- [ ] Syntax is correct
- [ ] No 404 errors
- [ ] Sitemap URLs work
- [ ] Crawl-delay is appropriate
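Parts of this checklist can be scripted. The sketch below fetches the live file with Python's urllib.robotparser and evaluates a few URLs against it; example.com and the test paths are placeholders:
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")  # placeholder domain
rp.read()  # fetches and parses the live file

# can_fetch() evaluates a URL against the parsed rules for a given user agent
for agent, url in [
    ("*", "https://example.com/private/report.html"),
    ("Googlebot", "https://example.com/products/"),
]:
    print(agent, url, "->", "allowed" if rp.can_fetch(agent, url) else "blocked")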
Advanced Robots.txt Techniques
1. Wildcard Usage
User-agent: *
Disallow: /private*
Disallow: /*.pdf$
Disallow: /temp/
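Here, * matches any sequence of characters and $ anchors the end of the URL. Python's built-in robotparser does not expand these wildcards, so the sketch below translates the patterns into regular expressions by hand to show how they match:
import re

def pattern_to_regex(pattern):
    # '*' matches any sequence of characters; a trailing '$' anchors the end
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.compile("^" + body + ("$" if anchored else ""))

pdf_rule = pattern_to_regex("/*.pdf$")
private_rule = pattern_to_regex("/private*")

print(bool(pdf_rule.match("/downloads/report.pdf")))      # True  -> blocked
print(bool(pdf_rule.match("/downloads/report.pdf?v=2")))  # False -> URL does not end in .pdf
print(bool(private_rule.match("/private-files/a.html")))  # True  -> blocked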
2. Specific Crawler Rules
User-agent: Googlebot
Allow: /important-content/
Disallow: /admin/
User-agent: Bingbot
Crawl-delay: 2
Disallow: /admin/
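To see how such per-crawler groups are evaluated, the rules above can be fed into Python's urllib.robotparser:
from urllib import robotparser

rules = """\
User-agent: Googlebot
Allow: /important-content/
Disallow: /admin/

User-agent: Bingbot
Crawl-delay: 2
Disallow: /admin/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "/important-content/page.html"))  # True
print(rp.can_fetch("Googlebot", "/admin/"))                       # False
print(rp.crawl_delay("Bingbot"))                                  # 2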
3. Sitemap Index Integration
Sitemap: https://example.com/sitemap-index.xml
Sitemap: https://example.com/sitemap-pages.xml
Sitemap: https://example.com/sitemap-images.xml
Sitemap: https://example.com/sitemap-news.xml
Monitoring and Maintenance
1. Regular Review
Weekly Tasks:
- Check crawling errors in GSC
- Evaluate new directories for blocking needs
- Validate sitemap URLs
Monthly Reviews:
- Complete robots.txt analysis
- Crawl budget optimization
- Measure performance impact
2. Change Management
When making website changes:
- Evaluate new directories
- Update robots.txt
- Perform testing
- Request a recrawl of the robots.txt file in Google Search Console
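A small change-detection script can support both the regular reviews and the change-management steps above. The sketch below compares the live file with a locally stored baseline; example.com and the baseline filename are placeholders:
import difflib
import urllib.request
from pathlib import Path

LIVE_URL = "https://example.com/robots.txt"  # placeholder domain
BASELINE = Path("robots_baseline.txt")       # placeholder baseline file

with urllib.request.urlopen(LIVE_URL) as response:
    live = response.read().decode("utf-8").splitlines()

baseline = BASELINE.read_text(encoding="utf-8").splitlines() if BASELINE.exists() else []

diff = list(difflib.unified_diff(baseline, live, "baseline", "live", lineterm=""))
print("\n".join(diff) if diff else "No changes since the last review.")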