This panel fetches the site's `robots.txt`, tests whether major bots can reach the page, checks for firewall blocking, and reports on LLM bot access.
## What it shows
### Robots.txt summary
The panel fetches and parses the site's `robots.txt` file, then tests it against major search engine bots:
| Bot | Search engine |
|---|---|
| Googlebot | Google |
| Bingbot | Bing |
| Slurp | Yahoo |
| DuckDuckBot | DuckDuckGo |
| Baiduspider | Baidu |
| YandexBot | Yandex |
For each bot, the panel reports:

- Allowed or Blocked status
- The specific `robots.txt` rules that apply
- Which user-agent group matched
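A minimal version of this rules check can be sketched with Python's standard `urllib.robotparser`; the rules below are hypothetical, chosen only to show per-bot matching:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration: Bingbot is blocked entirely,
# every other bot is blocked only from /private/.
RULES = """\
User-agent: Bingbot
Disallow: /

User-agent: *
Disallow: /private/
"""

def check_bot(rules: str, bot: str, path: str) -> bool:
    """Return True if `bot` may fetch `path` under `rules`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(bot, path)

for bot in ["Googlebot", "Bingbot", "Slurp"]:
    status = "Allowed" if check_bot(RULES, bot, "/page.html") else "Blocked"
    print(f"{bot}: {status}")
```

Here Bingbot matches its dedicated user-agent group and is Blocked, while Googlebot and Slurp fall through to the `*` group and are Allowed for `/page.html`.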
### Firewall status
The panel tests whether bots are blocked at the server/firewall level — before `robots.txt` even applies. This catches cases where a WAF (Web Application Firewall) or CDN blocks crawler traffic entirely.
For each bot, you’ll see:
- HTTP status code returned
- Whether the request was allowed or blocked
- The blocking reason, if applicable
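A firewall-level check like this amounts to sending a plain HTTP request with a bot User-Agent and inspecting the status code. The sketch below uses only the standard library; the blocked-status set and helper names are assumptions, not the panel's actual logic:

```python
import urllib.error
import urllib.request

# Status codes commonly returned when a WAF/CDN rejects bot traffic.
# This set is an assumption for this sketch.
BLOCKED_CODES = {401, 403, 429, 503}

def classify_status(code: int) -> str:
    """Map an HTTP status code to an allowed/blocked verdict."""
    return "blocked" if code in BLOCKED_CODES else "allowed"

def probe(url: str, user_agent: str) -> tuple[int, str]:
    """Fetch `url` with a bot User-Agent and classify the response."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            code = resp.status
    except urllib.error.HTTPError as exc:
        # A WAF block typically surfaces here as 403/429/503.
        code = exc.code
    return code, classify_status(code)

# Example call (network access required):
# probe("https://example.com/", "Mozilla/5.0 (compatible; Googlebot/2.1)")
```

A real tester would compare the response against one made with a regular browser User-Agent, since a site that returns 403 to everyone is down, not bot-blocking.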
### LLM bot access

A dedicated section tests whether AI and LLM crawlers can access the site:

- GPTBot (OpenAI)
- Claude-Web (Anthropic)
- Other configured AI crawlers
Sites can publish `robots.txt` rules to control AI training and retrieval crawlers.
LLM bot rules are separate from search engine bot rules. A site can allow Googlebot while blocking GPTBot, or vice versa.
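That split can be expressed directly in `robots.txt`. A sketch that allows Googlebot while blocking the AI crawlers listed above (bot tokens as published by their operators):

```
# Allow a traditional search crawler
User-agent: Googlebot
Allow: /

# Block AI crawlers
User-agent: GPTBot
Disallow: /

User-agent: Claude-Web
Disallow: /
```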
## Export
Export the full crawling report — including all bot statuses, matched rules, and firewall results — as a CSV file.

## Common issues to look for
- Accidentally blocked bots — `Disallow: /` applying to important crawlers
- Overly broad rules — blocking entire directories that contain indexable content
- Firewall blocking crawlers — security rules that reject requests with bot user-agents
- Missing robots.txt — no file at all, which means all bots are allowed by default
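The last point — missing `robots.txt` means default-allow — can be confirmed with the standard library by parsing zero lines of rules:

```python
from urllib.robotparser import RobotFileParser

# Simulate a site with no robots.txt at all by parsing an empty rule set.
parser = RobotFileParser()
parser.parse([])

# With no rules, can_fetch() permits every bot on every path.
print(parser.can_fetch("Googlebot", "/any/path"))
print(parser.can_fetch("GPTBot", "/any/path"))
```

Both calls return `True`: an absent `robots.txt` is equivalent to allowing everything, so "no file" is only a problem if you intended to restrict crawlers.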