robots.txt Tester
Test robots.txt allow/block decisions for a given user-agent and URL. Build a new robots.txt with an AI crawler preset (GPTBot, CCBot, anthropic-ai, Google-Extended).
robots.txt
Check a URL
Parsed groups
- *: allow /; disallow /admin/, /api/, /private/
- AhrefsBot: disallow /admin/
Sitemaps
- https://example.com/sitemap.xml
About this tool
robots.txt is the plain-text file a site places at its root to tell crawlers what they can and can't fetch. It's a polite request: well-behaved bots honor it and abusive ones don't. But for the bots that matter (Googlebot, Bingbot, Applebot, the major AI training crawlers), robots.txt is the official way to opt in or out.
Test mode takes an existing robots.txt body, a user-agent, and a URL path, and tells you whether the URL would be allowed or blocked. The decision follows Google's algorithm: pick the most specific User-agent group, then apply the longest matching rule (Allow wins on ties). The parsed groups and rules are shown so you can see exactly what robots.txt is doing.
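The longest-match step described above can be sketched as follows. This is a minimal illustration, not the tool's actual code; `rules` is a hypothetical list of (verb, pattern) pairs taken from the already-selected User-agent group:

```python
import re

def pattern_to_regex(pattern: str):
    """Translate a robots.txt path pattern into an anchored regex:
    '*' matches any sequence; a trailing '$' anchors to end-of-URL."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "^" + "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile(regex + ("$" if anchored else ""))

def is_allowed(rules, path: str) -> bool:
    """Apply the longest matching pattern; Allow wins on ties.
    If no rule matches, the path is allowed."""
    best = None  # (pattern length, is_allow) -- max() favors Allow on ties
    for verb, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), verb == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]
```

With a group like `allow /` plus `disallow /admin/`, the path /admin/users is blocked because the seven-character /admin/ pattern outranks the one-character / pattern.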
Build mode generates a new robots.txt from form input — one or more groups of User-agent + Allow/Disallow rules, a Sitemap reference, and an optional AI crawler block preset that adds Disallow rules for the well-known AI training crawlers. The list is current as of 2026 and covers OpenAI, Anthropic, Common Crawl, Google (via its Google-Extended training opt-out token), Perplexity, Apple (via Applebot-Extended), ByteDance, and Meta.
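For illustration, a file built with a few of the preset's user-agents plus a sitemap might look like this (the exact list depends on which preset entries are enabled):

```
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

Sitemap: https://example.com/sitemap.xml
```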
One important caveat: blocking AI crawlers via robots.txt only stops crawlers that respect robots.txt. The major commercial AI companies publicly commit to honoring their declared user-agents, but enforcement is on the honor system. For stricter control, combine robots.txt with Cloudflare's "AI Audit" or "Block AI Bots" rules, which enforce at the network edge.
Frequently asked questions
How does the tester decide allow vs block?
It implements Google’s robots.txt parsing algorithm: find the most specific User-agent group whose token appears in the requested user-agent string, then within that group apply the longest matching pattern (Allow wins on tie). This matches Googlebot’s behaviour for the cases that matter.
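The group-selection step can be sketched like this — a minimal sketch under the assumption that `groups` maps each User-agent token to its list of rules:

```python
def select_group(groups: dict, user_agent: str) -> list:
    """Pick the most specific group: the longest User-agent token that
    occurs (case-insensitively) in the requested user-agent string,
    falling back to the '*' group, then to an empty rule list."""
    ua = user_agent.lower()
    matching = [token for token in groups if token != "*" and token.lower() in ua]
    if matching:
        return groups[max(matching, key=len)]
    return groups.get("*", [])
```

So a request from "Mozilla/5.0 (compatible; AhrefsBot/7.0)" picks the AhrefsBot group, while any other agent falls back to the * group.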
What does the AI crawler preset block?
GPTBot (OpenAI), ChatGPT-User, OAI-SearchBot, CCBot (Common Crawl, used by many AI training datasets), anthropic-ai and ClaudeBot (Anthropic), Claude-Web, Google-Extended (Google’s flag for AI training opt-out, separate from Googlebot), PerplexityBot, Applebot-Extended, meta-externalagent (Meta), Bytespider (ByteDance), Diffbot, and Omgilibot. This is the current set of well-known AI training crawlers.
Will blocking AI crawlers actually stop AI training?
For crawlers that respect robots.txt — yes. For crawlers that don’t — no. robots.txt is a polite request, not enforcement. Most reputable AI companies do honor it (OpenAI, Anthropic, Google all publicly committed to respecting their declared user-agents). For stricter control, combine robots.txt blocks with server-side user-agent rejection and a Cloudflare AI bot block rule.
What pattern syntax is supported?
`*` matches any sequence of characters. `$` at the end anchors to end-of-URL. So `Disallow: /*.pdf$` blocks all PDFs but not PDF-named directories. Other characters are matched literally. Patterns are implicitly anchored to the start of the path.
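A quick way to see the `$` anchor in action — a sketch with a hypothetical `matches` helper that applies the translation rules above, not an official API:

```python
import re

def matches(pattern: str, path: str) -> bool:
    """robots.txt pattern match: '*' becomes '.*', a trailing '$' anchors
    the end, everything else is literal, matching starts at the path start."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "^" + "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

print(matches("/*.pdf$", "/docs/report.pdf"))  # True: path ends in .pdf
print(matches("/*.pdf$", "/files.pdf/index"))  # False: .pdf is not at the end
```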
Why does `Disallow: /admin/` not block `/admin` (without trailing slash)?
Because robots.txt patterns are literal prefix matches: `/admin/` does not match the URL `/admin`. To block both, use either `Disallow: /admin` (no trailing slash, which matches `/admin` and `/admin/...`) or write two rules.
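You can check this behavior with Python's standard-library parser, which implements the same literal prefix matching (though it does not support `*` wildcards in paths):

```python
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /admin/",
])

print(rp.can_fetch("*", "/admin/users"))  # False: /admin/ is a prefix of the path
print(rp.can_fetch("*", "/admin"))        # True: /admin/ is not a prefix of /admin
```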