AI Crawler Allowlist (GPTBot, ClaudeBot, PerplexityBot)

If your robots.txt blocks GPTBot, ClaudeBot, or PerplexityBot, you are invisible to the AI search engines those crawlers feed. It is the first thing to check.

AskRanker research · published 2026-05-10 · updated 2026-05-10

The first thing to check on any AEO project is whether your robots.txt blocks the AI crawlers. A surprising share of sites have inadvertently disallowed GPTBot, ClaudeBot, or PerplexityBot during a privacy or compliance update, and the result is total invisibility in AI search. Before any content optimization, confirm that the crawlers can reach you.

The crawlers that matter in 2026

OpenAI's GPTBot crawls for ChatGPT's web search and the index that backs ChatGPT browsing. Anthropic's ClaudeBot does the same job for Claude. Perplexity uses two: PerplexityBot for indexing and Perplexity-User for live, on-demand fetches when a user runs a search that needs fresh content. Google's Google-Extended is a separate flag that controls whether Google can use your content to train AI features (distinct from the Googlebot that builds the search index). Common Crawl's CCBot feeds many open-source LLM training pipelines.

How to check your current state

Open https://yourdomain.com/robots.txt in any browser. Look for User-agent: lines naming GPTBot, ClaudeBot, PerplexityBot, or the * wildcard, and check the Disallow: rules under each. A blanket Disallow: / for any of them takes you out of that engine entirely. Watch for the less obvious failures too: a Disallow: /blog/ that hides your highest-citation content, or a User-agent: * Disallow: / that the team forgot to revert after a staging deployment.
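
To run the same check programmatically, Python's built-in robots.txt parser can evaluate the rules for each user agent. A minimal sketch, assuming placeholder values for your domain and a page you want cited:

```python
# Minimal sketch: test which AI crawlers your robots.txt allows.
# SITE and PAGE are placeholders; swap in your own domain and a
# page you want AI engines to cite.
from urllib.robotparser import RobotFileParser

SITE = "https://yourdomain.com"
PAGE = SITE + "/blog/some-post"

AI_BOTS = [
    "GPTBot",            # ChatGPT search / browsing index
    "ClaudeBot",         # Claude
    "PerplexityBot",     # Perplexity index
    "Perplexity-User",   # Perplexity live, on-demand fetches
    "Google-Extended",   # Google AI training flag
    "CCBot",             # Common Crawl
]

parser = RobotFileParser(SITE + "/robots.txt")
parser.read()

for bot in AI_BOTS:
    verdict = "allowed" if parser.can_fetch(bot, PAGE) else "BLOCKED"
    print(f"{bot:17} {verdict}")
```

If a bot prints BLOCKED and you never wrote a rule for it, the culprit is almost always the User-agent: * group.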

Recommended allow stance

For most B2B SaaS, the right move is to allow all major AI crawlers explicitly and disallow only specific surfaces (admin, internal tools, API endpoints). Add an explicit block for each of GPTBot, ClaudeBot, PerplexityBot, Perplexity-User, Google-Extended, and CCBot, with Allow: / and a sensible Disallow set. Explicit blocks are safer than relying on the wildcard, because they survive a careless edit to the User-agent: * section.
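
A sketch of that stance, assuming /admin/, /internal/, and /api/ stand in for your own private surfaces. The Robots Exclusion Protocol lets several User-agent lines share one rule group, which keeps the file short:

```
# robots.txt — explicit group for AI crawlers
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: PerplexityBot
User-agent: Perplexity-User
User-agent: Google-Extended
User-agent: CCBot
Allow: /
Disallow: /admin/
Disallow: /internal/
Disallow: /api/

# Everyone else. Kept separate so an edit here cannot silently
# block the AI crawlers named above.
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /internal/
Disallow: /api/
```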

WAF and CDN gotchas

Robots.txt is advisory. The crawlers also have to actually receive a 200 from your origin. Cloudflare's bot-fight mode, AWS WAF rules, and a few CDN security templates have shipped with default rules that block AI crawlers' user-agent strings or rate-limit them aggressively. Check your edge logs for GPTBot, ClaudeBot, PerplexityBot, and Perplexity-User over the last 30 days. If you see lots of 403s or 429s, your WAF is rejecting them and your robots.txt is irrelevant.
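
One rough way to tally those statuses, as a sketch assuming a combined-format access log at a placeholder path (adjust the path and parsing to your own setup):

```python
# Sketch: count response statuses per AI crawler in an access log.
# LOG_PATH is a placeholder; the regex assumes the common combined
# log format, where the status code follows the quoted request.
import re
from collections import Counter

LOG_PATH = "/var/log/nginx/access.log"
AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Perplexity-User")

status_re = re.compile(r'" (\d{3}) ')  # e.g. ...HTTP/1.1" 403 162...

counts = Counter()
with open(LOG_PATH) as log:
    for line in log:
        for bot in AI_BOTS:
            if bot in line:
                match = status_re.search(line)
                if match:
                    counts[(bot, match.group(1))] += 1

for (bot, status), n in sorted(counts.items()):
    print(f"{bot:17} {status}  {n}")
```

Mostly 200s is healthy; a cluster of 403s or 429s on one bot usually traces back to a single WAF rule.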

How to verify after a change

After updating robots.txt and any WAF rules, ask each of the major engines a known buyer question that should reach a specific page on your site. Watch the server access log for fetches from the relevant user agents within minutes. Then re-run the question 24 hours later and check whether your URL has appeared in the citation list (Perplexity, Google AI Overviews) or your brand has appeared in the answer text (ChatGPT). If neither, the issue is content or chunking, not crawler access.
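
You can also approximate a crawler fetch yourself before waiting on the engines, by requesting the page with each bot's user-agent string. A sketch with a placeholder URL and illustrative UA strings (check each vendor's documentation for the exact current tokens); note that a WAF keyed on IP reputation rather than the UA header can pass this test and still block the real crawler:

```python
# Sketch: fetch a page while presenting AI crawler user-agent
# strings, to see whether the edge returns 200 or a block.
# URL is a placeholder; the UA strings are illustrative, not the
# vendors' exact current tokens.
import urllib.error
import urllib.request

URL = "https://yourdomain.com/blog/some-post"
AI_UAS = {
    "GPTBot": "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
    "ClaudeBot": "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
    "PerplexityBot": "Mozilla/5.0 (compatible; PerplexityBot/1.0)",
}

for name, ua in AI_UAS.items():
    req = urllib.request.Request(URL, headers={"User-Agent": ua})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            print(f"{name:14} {resp.status}")
    except urllib.error.HTTPError as err:
        print(f"{name:14} {err.code}  <- likely a WAF/edge rule")
```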

See what AI says about you, today.

Send your domain. We run 50 buyer questions in your category through ChatGPT, Claude, Gemini, and Perplexity, and email back the answer set, your mention rate, and the page edit that moves the needle.

4 models · 50 questions · 24-hour turnaround · no credit card