Meta Robots Tags
Meta robots tags are HTML meta elements that let website owners give search engine crawlers page-specific instructions for indexing and link handling. They serve as a direct communication channel between a website and search engines and are an essential part of technical SEO.
How Meta Robots Tags Work
Meta robots tags are placed in the <head> section of an HTML page and give precise instructions to crawlers such as Googlebot or Bingbot (a minimal placement sketch follows below):
- Indexing control: determines whether a page is included in the search index
- Link following: determines whether links on the page should be followed
- Snippet control: determines whether and how a snippet is shown in search results
- Cache control: determines whether a cached copy of the page may be offered
Note that the tag cannot prevent crawling itself: a crawler must fetch the page to read the tag, so crawl control remains the job of robots.txt.
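A minimal sketch of the placement, with placeholder title and content:

<!DOCTYPE html>
<html>
<head>
  <title>Example Page</title>
  <!-- Applies to all crawlers -->
  <meta name="robots" content="noindex, nofollow">
  <!-- Optional bot-specific variant: addresses only Google's crawler -->
  <meta name="googlebot" content="noindex">
</head>
<body>...</body>
</html>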
The Most Important Meta Robots Directives
Indexing Directives
Directive | Function | Application
index | Page should be indexed | Default behavior, explicit confirmation
noindex | Page should NOT be indexed | Private pages, duplicate content, test pages
follow | Links on the page should be followed | Default behavior for internal linking
nofollow | Links should NOT be followed | User-generated content, paid links
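Because index and follow are the defaults, omitting the tag entirely has the same effect as declaring them explicitly:

<!-- Explicit confirmation of the defaults; a page without any robots meta tag is treated identically -->
<meta name="robots" content="index, follow">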
Advanced Directives
Directive | Function | SEO Impact
noarchive | Prevents a cached copy of the page | Protection from outdated content in SERPs
nosnippet | Prevents snippet display | Control over SERP presentation
noodp | Ignores ODP descriptions (obsolete since the DMOZ shutdown in 2017) | Control over meta description sources
notranslate | Prevents automatic translation | Linguistic consistency
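Several directives can be combined in one comma-separated tag, for example to keep a page indexed while suppressing cached copies and snippets:

<meta name="robots" content="noarchive, nosnippet">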
Practical Use Cases
1. Avoid Duplicate Content
Problem: Multiple URLs show identical content
Solution:
<meta name="robots" content="noindex, follow">
Application Examples:
- URL parameter variants
- Print versions of pages
- Sorted product lists
- Session-based URLs
2. Protect Private Areas
Use Cases:
- Login-protected areas
- Admin panels
- Development/test environments
- Internal documentation
Implementation:
<meta name="robots" content="noindex, nofollow">
3. Control User-Generated Content
Scenario: Comments, forums, user profiles
Strategy:
<meta name="robots" content="index, nofollow">
Advantages:
- Page is indexed
- User links are not followed
- Protection from spam backlinks
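As a more granular alternative to the page-level tag, individual user-submitted links can carry link-level rel attributes (the URL is a placeholder):

<!-- ugc labels user-generated links; nofollow withholds endorsement -->
<a href="https://example.com/user-link" rel="ugc nofollow">User link</a>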
X-Robots-Tag: Server-Level Control
The X-Robots-Tag is an HTTP response header that provides the same directives at server level, extending control beyond HTML pages:
HTTP Header Implementation
X-Robots-Tag: noindex, nofollow
X-Robots-Tag: noindex
X-Robots-Tag: nosnippet, noarchive
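In a complete HTTP response, the header sits alongside the other response headers, e.g. for a PDF:

HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noindex, nofollow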
Advantages of X-Robots-Tag
- File-wide: also works for non-HTML files such as PDFs, images, and videos
- Server-level: no HTML changes needed
- Dynamic: can be set conditionally per request
- Performance: less HTML overhead
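As a sketch for Apache (requires mod_headers; other servers offer equivalent mechanisms), the header can be attached to all PDFs without touching any HTML:

<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>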
Practical Applications
Content Type | X-Robots-Tag | Reason
PDF documents | noindex | Internal documents
Images (thumbnails) | noindex | Avoid duplicate content
API endpoints | noindex, nofollow | Technical URLs
Maintenance pages | noindex, nofollow | Temporary content
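The API endpoint case from the table could look like this on Apache, again assuming mod_headers (the /api/ path is a placeholder):

<LocationMatch "^/api/">
  Header set X-Robots-Tag "noindex, nofollow"
</LocationMatch>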
Common Mistakes and Best Practices
❌ Common Mistakes
1. Contradictory Directives:
<!-- WRONG -->
<meta name="robots" content="index, noindex">
2. Conflicting noindex and Canonical Signals:
<!-- PROBLEMATIC: noindex asks for removal, while the canonical asks to consolidate signals to another URL -->
<meta name="robots" content="noindex">
<link rel="canonical" href="https://example.com/canonical-page">
3. Robots.txt vs. Meta Robots Conflict:
- Robots.txt: "Disallow: /private/"
- Meta Robots: "index, follow"
- Result: the page is never crawled, so the meta robots tag is never read; the URL can still appear in the index without content if it is linked externally
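The practical consequence: to reliably deindex a page, crawling must stay allowed so the directive can be seen at all. A sketch with placeholder paths:

# robots.txt — do NOT disallow a URL you want deindexed;
# the crawler must fetch the page to see its noindex
User-agent: *
Allow: /old-page/

<!-- in the <head> of /old-page/ -->
<meta name="robots" content="noindex">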
✅ Best Practices
1. Consistent Strategy:
- Robots.txt for directory-level control
- Meta Robots for page-level control
- X-Robots-Tag for file-level control
2. Testing and Monitoring:
- Use Google Search Console
- Regular indexing checks
- Analyze crawling logs
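Header-based directives can be spot-checked from the command line; curl -I sends a HEAD request and prints the response headers (the URL is a placeholder):

curl -I https://example.com/document.pdf
# expected output includes a line like: X-Robots-Tag: noindex, nofollow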
3. Documentation:
- Document all noindex pages
- Record reasons for decisions
- Conduct regular reviews
Monitoring and Analysis
Google Search Console
Important Reports:
- Index Coverage: Monitor indexed pages
- URL Inspection: Check individual pages
- Sitemaps: Monitor crawling status
Crawling Monitoring
Metric | Target | Tool
Indexing rate | 95%+ for important pages | GSC, Screaming Frog
Crawl budget | Efficient usage | Server logs, GSC
Duplicate content | Minimization | Screaming Frog, Sistrix