The robots.txt Audit: Are You Accidentally Blocking AI?

You've published great content. Maybe you've even set up llms.txt and per-page .md files. But none of that matters if your robots.txt is blocking the AI agents that want to consume it.

Many websites are invisible to AI search - not because their content is poor, but because their robots.txt rules inadvertently shut the door. This article walks you through auditing and fixing it.

The problem you don't know you have

robots.txt has been around since 1994. It tells web crawlers what they can and can't access. Most robots.txt files were written with Google in mind - and they work fine for traditional search.

But AI crawlers use different user-agent strings than search engine crawlers. A setup that allows Googlebot can still shut out ClaudeBot, GPTBot, and PerplexityBot through broad disallow rules, default-deny configurations, or security plugins that block unrecognised bots at the server level.

Check your robots.txt right now: go to yoursite.com/robots.txt in your browser. If you see any of these patterns, you may have a problem:

User-agent: *
Disallow: /

That blocks everything - including all AI crawlers.

User-agent: *
Disallow: /
User-agent: Googlebot
Allow: /

That only allows Googlebot. Every other crawler, including ClaudeBot, GPTBot, and PerplexityBot, is blocked.
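You can confirm what these two patterns do without waiting for a crawler to visit. The sketch below uses Python's standard-library urllib.robotparser; the agent list and the example.com URL are placeholders. This parser predates RFC 9309 and doesn't implement all of it (it applies rules in file order and ignores * wildcards inside paths), so treat it as a quick sanity check rather than a reference implementation.

```python
from urllib.robotparser import RobotFileParser

# Placeholder list of user agents to test; extend as needed.
AGENTS = ["Googlebot", "ClaudeBot", "GPTBot", "PerplexityBot"]

BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

GOOGLE_ONLY = """\
User-agent: *
Disallow: /
User-agent: Googlebot
Allow: /
"""


def allowed(robots_txt: str, url: str = "https://example.com/") -> dict[str, bool]:
    """Map each agent in AGENTS to whether robots_txt lets it fetch url."""
    parser = RobotFileParser()
    # parse() also records a load time, which can_fetch() requires.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AGENTS}


print(allowed(BLOCK_ALL))
# {'Googlebot': False, 'ClaudeBot': False, 'GPTBot': False, 'PerplexityBot': False}
print(allowed(GOOGLE_ONLY))
# {'Googlebot': True, 'ClaudeBot': False, 'GPTBot': False, 'PerplexityBot': False}
```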

Some CMS platforms and security plugins add these rules automatically. You might not even know they're there.

The AI crawlers you need to know

Here are twelve major crawlers and user-agent tokens that affect AI visibility. Each serves a different purpose:

Crawler	Company	What it does
ClaudeBot	Anthropic	Powers Claude AI assistant
GPTBot	OpenAI	ChatGPT training and search
ChatGPT-User	OpenAI	ChatGPT browsing mode (when users ask it to browse)
PerplexityBot	Perplexity	Perplexity AI search engine
Google-Extended	Google	Controls Gemini AI training use (a robots.txt token, not a separate crawler)
Googlebot	Google	Google Search (also feeds AI Overviews)
BingBot	Microsoft	Bing Search and Copilot
Applebot	Apple	Siri and Apple Intelligence
Meta-ExternalAgent	Meta	Meta AI assistant
Bytespider	ByteDance	TikTok and Douyin AI
CCBot	Common Crawl	Open dataset for AI training
Amazonbot	Amazon	Alexa and Amazon search

Not all of these crawlers are equal. Some drive AI search visibility (GPTBot, PerplexityBot, ClaudeBot). Some are primarily for training (CCBot, Google-Extended). Some do both (Googlebot feeds both traditional search and AI Overviews).

How to audit your robots.txt

Step 1: Read your current file. Navigate to yoursite.com/robots.txt and read what's there. Look for blanket disallow rules, and check whether AI-specific user agents are mentioned.
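Step 1 can be partly automated. The sketch below is a hypothetical helper (parse_groups and audit are names invented for illustration) that splits a robots.txt into groups of User-agent lines plus their rules, then reports whether the wildcard group blocks everything and which AI user agents are named. It only looks at User-agent, Allow and Disallow lines, and the AI_AGENTS set is an assumption you should adjust to your own list.

```python
# Hypothetical audit helper; the AI_AGENTS set is an example, not an official list.
AI_AGENTS = {"claudebot", "gptbot", "chatgpt-user", "perplexitybot",
             "google-extended", "applebot-extended"}

Group = tuple[list[str], list[tuple[str, str]]]  # (user-agents, [(field, path), ...])


def parse_groups(robots_txt: str) -> list[Group]:
    """Split robots.txt into groups: consecutive User-agent lines plus their rules."""
    groups: list[Group] = []
    agents: list[str] = []
    rules: list[tuple[str, str]] = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue  # blank or malformed line
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # a User-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value)
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups


def audit(robots_txt: str) -> dict[str, object]:
    """Flag a blanket wildcard Disallow: / and list which AI agents are (not) named."""
    groups = parse_groups(robots_txt)
    named = {agent.lower() for agents, _ in groups for agent in agents}
    wildcard_blocks_all = any(
        "*" in agents and ("disallow", "/") in rules for agents, rules in groups
    )
    return {
        "wildcard_blocks_all": wildcard_blocks_all,
        "ai_agents_named": sorted(named & AI_AGENTS),
        "ai_agents_missing": sorted(AI_AGENTS - named),
    }
```

For example, running audit on the Google-only pattern shown earlier reports wildcard_blocks_all as True and an empty ai_agents_named list.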

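Step 2: Test specific agents. Google Search Console's old robots.txt Tester only covered Google's own crawlers and has since been retired, so use a parser that accepts any user-agent string, such as Google's open-source robots.txt library or Python's built-in urllib.robotparser. Test with ClaudeBot, GPTBot, and PerplexityBot.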
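A minimal sketch of Step 2 with Python's standard-library urllib.robotparser, assuming your file is served at https://yoursite.com/robots.txt. The helper names load_live and access_matrix are hypothetical, and the agent list is an example. Because this parser applies rules in file order rather than by longest match, confirm surprising results against the RFC 9309 rules.

```python
from urllib.robotparser import RobotFileParser

# Example agents to test; adjust to the crawlers you care about.
AGENTS = ["ClaudeBot", "GPTBot", "PerplexityBot"]


def load_live(site: str) -> RobotFileParser:
    """Fetch and parse https://<site>/robots.txt (needs network access)."""
    parser = RobotFileParser(f"https://{site}/robots.txt")
    # read() treats a 401/403 on robots.txt as "disallow all" and other 4xx as "allow all".
    parser.read()
    return parser


def access_matrix(parser: RobotFileParser, urls: list[str]) -> dict[tuple[str, str], bool]:
    """Return {(agent, url): allowed?} for every agent/URL pair."""
    return {(agent, url): parser.can_fetch(agent, url) for agent in AGENTS for url in urls}
```

For example, access_matrix(load_live("yoursite.com"), ["https://yoursite.com/", "https://yoursite.com/blog/"]) gives one True/False entry per agent and URL, which makes a single blocked path easy to spot.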

Step 3: Check server logs. If you have access to your server or CDN logs, search for AI crawler user agents. A 403 (forbidden) status code means your server, firewall, or CDN is rejecting them; robots.txt can't cause that, because it is advisory. A robots.txt block looks different: a compliant bot fetches /robots.txt and then requests nothing else.
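A rough log check can be sketched in Python, assuming your logs use the "combined" format (nginx's default, and a common Apache setting). The tally helper and the AI_AGENTS list are hypothetical. It counts status codes per AI crawler and lists bots that fetched /robots.txt but no pages, which is what a robots.txt block tends to look like. User-agent strings can be spoofed, so this counts what requests claim to be, not verified crawler traffic.

```python
import re
from collections import Counter
from collections.abc import Iterable

# User-agent substrings to look for (matched case-insensitively). Google-Extended and
# Applebot-Extended are left out: they are robots.txt tokens, not request user agents.
AI_AGENTS = ["ClaudeBot", "GPTBot", "ChatGPT-User", "PerplexityBot",
             "Meta-ExternalAgent", "Bytespider", "CCBot", "Amazonbot"]

# Assumed combined log format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def tally(lines: Iterable[str]) -> tuple[Counter, list[str]]:
    """Count (bot, status) pairs and list bots that only ever fetched /robots.txt."""
    statuses: Counter = Counter()
    fetched_robots: set[str] = set()
    fetched_pages: set[str] = set()
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # not a combined-format line
        ua = m.group("agent").lower()
        bot = next((b for b in AI_AGENTS if b.lower() in ua), None)
        if bot is None:
            continue  # not an AI crawler we are tracking
        statuses[(bot, m.group("status"))] += 1
        parts = m.group("request").split()
        path = parts[1] if len(parts) > 1 else ""
        if path == "/robots.txt":
            fetched_robots.add(bot)
        else:
            fetched_pages.add(bot)
    # Bots that read robots.txt but requested no pages may be disallowed there
    # (or may simply not have crawled yet in this log window).
    return statuses, sorted(fetched_robots - fetched_pages)
```

Usage: open your access log and pass the file object to tally; any ("ClaudeBot", "403")-style entries point at server-side blocking rather than robots.txt.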

Step 4: Check CMS settings. Some platforms manage robots.txt through their own interface:

HubSpot: Settings → Website → Pages → SEO → robots.txt
WordPress: Often managed by SEO plugins (Yoast, Rank Math)
Webflow: Site Settings → SEO → Robots.txt
Shopify: Managed via the robots.txt.liquid template

The recommended configuration

Add these rules to your robots.txt to explicitly allow AI crawlers:

# AI crawlers
User-agent: ClaudeBot
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Applebot-Extended
Allow: /

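If your robots.txt has a blanket User-agent: * / Disallow: / rule, you'll need to either remove it or add an explicit group for each AI crawler you want to let in. Where that group sits in the file doesn't matter: under the Robots Exclusion Protocol (RFC 9309), a crawler obeys the group that names it and only falls back to the wildcard group when none does. The flip side is that a crawler with its own group ignores the wildcard rules entirely, so repeat any path restrictions you still want (such as Disallow: /admin/) inside each AI crawler's group.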
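The precedence behaviour can be checked with Python's standard-library urllib.robotparser. In the sketch below (is_allowed is a hypothetical helper, example.com a placeholder), ClaudeBot gets through even though its group comes after a wildcard Disallow: /, and GPTBot, once it has its own group, is no longer bound by the wildcard's Disallow: /admin/.

```python
from urllib.robotparser import RobotFileParser

# Wildcard group first, specific group second: the specific group still wins.
WILDCARD_FIRST = """\
User-agent: *
Disallow: /

User-agent: ClaudeBot
Allow: /
"""

# GPTBot has its own group, so the wildcard's /admin/ rule no longer applies to it.
ADMIN_GOTCHA = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Allow: /
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots_txt and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(is_allowed(WILDCARD_FIRST, "ClaudeBot", "https://example.com/"))          # True
print(is_allowed(WILDCARD_FIRST, "GPTBot", "https://example.com/"))             # False
print(is_allowed(ADMIN_GOTCHA, "GPTBot", "https://example.com/admin/"))         # True
print(is_allowed(ADMIN_GOTCHA, "PerplexityBot", "https://example.com/admin/"))  # False
```

To keep /admin/ off-limits for GPTBot, put Disallow: /admin/ inside the GPTBot group, before its Allow: / line. That ordering gives the same answer under Python's first-match parser and RFC 9309's longest-match rule.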

Selective access: it's a legitimate choice

You don't have to allow everything. You might want:

Search bots allowed, training bots blocked - Allow GPTBot, PerplexityBot, ClaudeBot (they drive search citations) but block CCBot and Bytespider (primarily training crawlers)
All major AI bots allowed, except specific ones - Allow the major twelve but block crawlers from companies you don't want accessing your content
Specific paths blocked - Allow AI crawlers site-wide but block them from specific directories (e.g., /internal/ or /members/)
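If you combine these options, generating the file can be less error-prone than hand-editing it. The sketch below is hypothetical: ALLOWED, BLOCKED and PRIVATE_PATHS are example values, and build_robots_txt is an invented helper. It writes one group per bot and places Disallow lines before Allow: / so that Python's first-match parser and RFC 9309's longest-match rule agree, then checks the result with urllib.robotparser before you publish it.

```python
from urllib.robotparser import RobotFileParser

# Example policy: search-oriented bots allowed except private paths; training bots blocked.
ALLOWED = ["ClaudeBot", "GPTBot", "ChatGPT-User", "PerplexityBot"]
BLOCKED = ["CCBot", "Bytespider"]
PRIVATE_PATHS = ["/internal/", "/members/"]


def build_robots_txt(allowed: list[str], blocked: list[str], private: list[str]) -> str:
    """Emit one group per bot; Disallow lines precede Allow: / in allowed groups."""
    groups = []
    for bot in allowed:
        rules = [f"Disallow: {path}" for path in private] + ["Allow: /"]
        groups.append("\n".join([f"User-agent: {bot}", *rules]))
    for bot in blocked:
        groups.append(f"User-agent: {bot}\nDisallow: /")
    return "\n\n".join(groups) + "\n"


robots = build_robots_txt(ALLOWED, BLOCKED, PRIVATE_PATHS)
print(robots)

# Sanity-check the generated rules before publishing them.
parser = RobotFileParser()
parser.parse(robots.splitlines())
assert parser.can_fetch("ClaudeBot", "https://example.com/blog/")
assert not parser.can_fetch("ClaudeBot", "https://example.com/members/login")
assert not parser.can_fetch("CCBot", "https://example.com/blog/")
```

Crawlers named in neither list fall back to your User-agent: * group, or are allowed if there is none.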

This is what robots.txt does well: per-bot, per-path control. But each rule is still binary, allow or block. For more nuanced governance, like allowing search citations but blocking training use on the same content, you need Content Signals.

The CMS complication

Some CMS platforms make robots.txt management harder than it should be:

HubSpot manages robots.txt through its own interface but doesn't always support custom user-agent rules in the way you'd expect. You may need to use the "Additional rules" field or edit the raw file.

WordPress sites with security plugins like Wordfence or Sucuri may have firewall rules that block bots with unrecognised user agents, regardless of what robots.txt says. Check your security plugin settings alongside your robots.txt.

Webflow allows custom robots.txt editing in site settings, but changes only take effect on publish. Make sure to publish after editing.

Shopify uses a template-based approach (robots.txt.liquid) that requires some Liquid syntax knowledge to customise.

In all cases: after editing, verify by visiting yoursite.com/robots.txt in a browser and confirming your changes are live.
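A CDN or the CMS itself may keep serving an older copy for a while, so it can help to compare the live file with the one you meant to publish rather than eyeballing it. A sketch, assuming the file is served at https://<site>/robots.txt; fetch_robots_txt, normalise and diff_against are hypothetical helper names. The comparison ignores comments, blank lines and trailing whitespace.

```python
import difflib
import urllib.request


def fetch_robots_txt(site: str) -> str:
    """Download https://<site>/robots.txt as text (needs network access)."""
    with urllib.request.urlopen(f"https://{site}/robots.txt", timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def normalise(text: str) -> list[str]:
    """Drop comments, trailing whitespace and blank lines before comparing."""
    lines = (line.split("#", 1)[0].rstrip() for line in text.splitlines())
    return [line for line in lines if line]


def diff_against(intended: str, live: str) -> list[str]:
    """Unified diff of intended vs live rules; an empty list means they match."""
    return list(difflib.unified_diff(normalise(intended), normalise(live),
                                     "intended", "live", lineterm=""))
```

Usage: diff_against(your_local_robots_text, fetch_robots_txt("yoursite.com")) returns an empty list once your changes are live.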

The quick win

This is the highest-impact, lowest-effort step in making your website AI-readable. Five minutes of editing your robots.txt can be the difference between being visible and being invisible to AI search.

If you do nothing else from this series, do this:

Go to yoursite.com/robots.txt
Check for blanket disallow rules
Add explicit Allow rules for the AI crawlers listed above
Verify the changes are live

Then move on to the bigger wins: llms.txt, per-page .md files, and AI crawl analytics to confirm bots are actually getting through.

This article is part of our series on making your website AI-readable. Next: Content structure for AI · Also in this series: What is markdown? · What is llms.txt? · Per-page .md files · Content Signals · How to track LLM indexing
