Cloudflare Just Split AI Bots Into Search, Agent, and Training. Check One Setting Before September 15.
On July 1 Cloudflare shipped its biggest AI bot update since it started blocking AI crawlers by default. Every AI bot on its network now sorts into three categories - Search, Agent, and Training - with a separate allow/block control for each. Free plan included.
This is the third time I'm writing about Cloudflare settings that decide whether AI engines can see your store. The default-blocking problem and the Bot Fight Mode / AI Crawl Control / Labyrinth configuration guide still apply. This update changes the layer above them: how Cloudflare classifies AI traffic, and what happens if you touch nothing.
The short version: the new controls are good. One new default is dangerous for stores, and it activates September 15, 2026.
Cloudflare's three AI bot categories: Search, Agent, Training
Cloudflare's announcement defines the categories by behavior, not by company:
| Category | What it covers | Example bots | Meaning for e-commerce |
|---|---|---|---|
| Search | Crawlers that index your content to answer questions about it later. Referral traffic expected in return. | OAI-SearchBot, Claude-SearchBot, PerplexityBot | Allow. This is where citations come from. |
| Agent | Automated activity acting in real time on a person's behalf - chat fetch bots, browser-use agents. | ChatGPT-User, Claude-User, agents driving Chrome | Allow. A human with intent is waiting on the other end. |
| Training | Crawlers that take your content to train or fine-tune a model. | GPTBot, ClaudeBot, Bytespider | Your call. Blocking does not touch live citations. |
If this split looks familiar, it should. Search crawler vs training crawler vs live user fetch is exactly the mental model I pushed in the configuration guide. Cloudflare just turned it into a product with three switches.
Each switch has three positions: block on all pages, block only on pages that display ads, or do not block (changelog). And unlike most Cloudflare bot tooling, this lands on every plan, including Free.
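To see which category your own log traffic would fall into, a rough triage helper can map User-Agent strings to the three categories. This is a minimal sketch: the bot list is only the example bots from the table above, the function name `classify` is mine, and matching on User-Agent text is an assumption for log triage. It is not how Cloudflare itself classifies or verifies bots.

```python
from typing import Optional

# Example bots from the table above. Not Cloudflare's full list.
EXAMPLE_BOTS = {
    "OAI-SearchBot": "search",
    "Claude-SearchBot": "search",
    "PerplexityBot": "search",
    "ChatGPT-User": "agent",
    "Claude-User": "agent",
    "GPTBot": "training",
    "ClaudeBot": "training",
    "Bytespider": "training",
}


def classify(user_agent: str) -> Optional[str]:
    """Return 'search', 'agent' or 'training' for a User-Agent string.

    Returns None when none of the example tokens appears in the string.
    Matching is a case-insensitive substring check, which is an assumption
    made for log triage only.
    """
    ua = user_agent.lower()
    for token, category in EXAMPLE_BOTS.items():
        if token.lower() in ua:
            return category
    return None
```

Run it over the User-Agent column of an access log export and count the results per category. Anything that comes back as `None` is either a human browser or a bot outside this short example list.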
The September 15 default blocks AI agents on pages with ads
Here's the part to act on. Starting September 15, 2026, Cloudflare blocks the Training and Agent categories by default on pages that display ads. Search stays allowed.
The new defaults apply to new Cloudflare customers, new sites added by existing customers, and - this is the one everybody will miss - all existing Free plan customers. Most small stores sit on the Free plan. Your existing zone gets the new defaults, not just somebody's future zone.
Cloudflare's logic is coherent: an ad on the page means the page is monetized by human attention, so bots that consume the page without delivering a human get stopped (The Register). For a publisher that logic works. For a store it needs one correction.
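The default can be written down as a small decision function. This is a sketch of the rule as described in this post, not Cloudflare's implementation: the names `Position`, `SEPT_15_DEFAULTS` and `is_blocked` are hypothetical, and "page has ads" is passed in as a plain boolean because how Cloudflare detects ads is not covered here.

```python
from enum import Enum


class Position(Enum):
    """The three positions each category switch can take."""
    BLOCK_ALL = "block on all pages"
    BLOCK_ADS = "block only on pages that display ads"
    ALLOW = "do not block"


# Defaults from September 15, 2026, as described in this post.
SEPT_15_DEFAULTS = {
    "search": Position.ALLOW,
    "agent": Position.BLOCK_ADS,
    "training": Position.BLOCK_ADS,
}


def is_blocked(category: str, page_has_ads: bool, settings=SEPT_15_DEFAULTS) -> bool:
    """Return True if a request in this category is blocked on this page."""
    position = settings[category]
    if position is Position.BLOCK_ALL:
        return True
    if position is Position.BLOCK_ADS:
        return page_has_ads
    return False
```

With the defaults, `is_blocked("agent", page_has_ads=True)` is `True` and `is_blocked("search", page_has_ads=True)` is `False`. Passing `{**SEPT_15_DEFAULTS, "agent": Position.ALLOW}` as `settings` models the override recommended later in this post.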
"Cloudflare has just issued the AI industry a new deadline to separate the web crawlers used for traditional search purposes, like Google Search, from those used for AI agents and training."
Agent traffic is a customer, not a scraper
When a shopper asks ChatGPT "does this store ship to Munich?" and ChatGPT fetches your shipping page to answer, that fetch is ChatGPT-User. Agent category. There is a human with buying intent waiting on the other end of that request.
Block it and the answer becomes "I couldn't access the site" - and the engine routes the shopper to a competitor it can read. Same mechanics for Claude-User, Perplexity-User, and the browser-use agents that power agentic checkout.
Yes, the default only fires on pages that display ads. Clean product pages without ad units keep serving agents. But do you actually know which of your pages carry ad code? AdSense on the blog, a partner widget in the footer, a sponsored slot in the size guide - every one of those pages goes dark to agents on September 15 if you leave the default alone. And blog posts are exactly where your how-to citations come from.
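If you cannot answer that question from memory, a scan of your theme files gives a first list. The sketch below walks a local copy of the theme and reports files that contain known ad markers. The marker list and the file suffixes are assumptions, so add your own ad partner's tags. It only sees ad code that lives in templates, so ads injected at runtime (for example through a tag manager) will not show up.

```python
from pathlib import Path

# Assumed markers: the AdSense class name and script host.
# Extend this with the tags your ad partners use.
AD_MARKERS = ("adsbygoogle", "googlesyndication.com")

# Assumed template file types. Adjust for your platform.
TEMPLATE_SUFFIXES = {".html", ".liquid", ".php", ".twig", ".js"}


def find_ad_markers(root, markers=AD_MARKERS):
    """Map each template file under root to the ad markers found in it."""
    hits = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in TEMPLATE_SUFFIXES:
            continue
        # Ignore undecodable bytes so one odd file does not stop the scan.
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        found = [m for m in markers if m.lower() in text]
        if found:
            hits[str(path.relative_to(root))] = found
    return hits
```

Call `find_ad_markers("path/to/theme")` and map each reported template back to the pages that render it. A hit in a shared footer or layout file means every page using that layout carries ad code.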
How to configure the new AI traffic controls: recommended settings
- Open AI Crawl Control in the Cloudflare dashboard (Security > Bots). The three category presets live there on every plan, Free included.
- Search: allow. Cloudflare keeps this allowed by default - leave it. This is where OAI-SearchBot, Claude-SearchBot, and PerplexityBot build the indexes your citations come from.
- Agent: do not block. Override the incoming default. Live fetches and agentic checkout are revenue traffic, not scraping. This is the one setting this post exists for.
- Training: your call. Blocking training crawlers does not touch live citations - a BuzzStream study found 88.2% of sites blocking GPTBot still appear in AI answers. My robots.txt recommendation from the configuration guide stands.
- Audit which pages serve ads. If you use the "block only on pages that display ads" position anywhere, list your ad-carrying pages first. Grep your theme for adsbygoogle and your ad partner tags. Most merchants I ask cannot name these pages in one try.
- Re-check robots.txt. The category switches sit on top of the per-crawler rules from my configuration guide. Keep both layers aligned so you don't allow a bot in one and block it in the other.
- Verify after September 15. Ask ChatGPT a question that forces a live fetch of your store, then confirm the request shows as allowed in the AI Crawl Control activity log.
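The robots.txt step in the list above can be checked mechanically. This sketch parses a robots.txt body with Python's standard `urllib.robotparser` and flags bots whose robots.txt rule disagrees with the category switch you intend to set. The bot-to-category map uses three example bots from this post, and `find_conflicts` and `category_allowed` are hypothetical names. Python's parser is also only an approximation of how each crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# Three example bots from this post, one per category.
BOT_CATEGORY = {
    "OAI-SearchBot": "search",
    "ChatGPT-User": "agent",
    "GPTBot": "training",
}


def find_conflicts(robots_txt: str, category_allowed: dict, url: str = "/"):
    """Return (bot, robots_allows, switch_allows) where the two layers disagree.

    category_allowed maps 'search' / 'agent' / 'training' to True when the
    Cloudflare switch is meant to be set to "do not block".
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    conflicts = []
    for bot, category in BOT_CATEGORY.items():
        robots_allows = parser.can_fetch(bot, url)
        switch_allows = category_allowed[category]
        if robots_allows != switch_allows:
            conflicts.append((bot, robots_allows, switch_allows))
    return conflicts
```

A robots.txt that disallows both `GPTBot` and `ChatGPT-User`, checked against switches of Search allowed, Agent allowed, Training blocked, reports `ChatGPT-User` as the mismatch: the switch lets it in and robots.txt tells it to stay out.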
Mixed-purpose AI crawlers now inherit their strictest treatment
Also effective September 15: a crawler that combines Search with Training behavior gets allowed or blocked according to all of its behaviors, not its friendliest label. Block Training, and a combined search-and-training crawler is blocked too.
Watch your crawler list that week. A bot you allowed as "search" can start getting blocked because it also trains, and your citation traffic quietly drops with it. Check BotBase or your AI Crawl Control log for reclassifications before you blame your content.
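The rule reduces to a set intersection: a crawler is blocked if any one of its behaviors falls in a category you block. A minimal sketch, with a hypothetical function name and with behavior sets you would fill in from BotBase or your own logs:

```python
def is_crawler_blocked(behaviors, blocked_categories) -> bool:
    """Return True if any of the crawler's behaviors is in a blocked category."""
    return bool(set(behaviors) & set(blocked_categories))
```

So `is_crawler_blocked({"search", "training"}, {"training"})` is `True`, while a pure search crawler, `is_crawler_blocked({"search"}, {"training"})`, stays allowed.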
BotBase and Attribution Business Insights: measuring what AI crawlers give back
Bot Management customers get two new views (changelog): BotBase, a searchable directory of every bot Cloudflare tracks with its place in the new taxonomy, and Attribution Business Insights, a dashboard with a site-wide crawl-to-referral ratio and the split of AI bot vs organic traffic.
Crawl-to-referral ratio is the number I keep saying nobody at the infrastructure layer was measuring - the AI attribution problem. Cloudflare measuring it is a real step. The catch: it sits in the paid Bot Management tier, which most merchants will never buy.
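If you are not on that tier, you can approximate the number from your own logs and analytics. The sketch below assumes the ratio is simply crawl requests divided by referred visits over the same period. How Cloudflare computes its dashboard figure is not described here, so treat this as an approximation, and the function name as hypothetical.

```python
from typing import Optional


def crawl_to_referral(crawl_requests: int, referred_visits: int) -> Optional[float]:
    """Crawl requests per referred visit, or None when there were no referrals."""
    if referred_visits == 0:
        return None
    return crawl_requests / referred_visits
```

For example, 1,500 crawler requests against 10 referred visits gives `crawl_to_referral(1500, 10) == 150.0`, meaning 150 crawls for every visitor sent back. Both counts are made up for illustration.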
Pay Per Crawl becomes Pay Per Use
One year after launching Pay Per Crawl, Cloudflare now says charging per fetch was the wrong unit. The model is evolving into Pay Per Use: publishers get compensated when their content actually appears inside an AI answer, not every time a crawler touches a page (PPC Land, TechCrunch).
A store is not a publisher. Do not put your catalog behind a toll. For e-commerce the citation IS the compensation - it walks a buyer to your checkout. Charging engines to read your product pages is charging them to recommend you.
Your Cloudflare AI bot checklist before September 15
- Find the three new category presets in AI Crawl Control - they are live now, ahead of the defaults
- Set Agent to "do not block" so live AI answers and agentic checkout keep working
- Keep Search allowed; decide Training deliberately instead of inheriting a default
- Inventory which pages serve ad units - the new defaults are scoped to exactly those pages
- Re-test your store in ChatGPT and Perplexity in the week after September 15
GEOlikeaPro's Crawler View shows which AI crawlers reach your pages and which get stopped at the CDN layer - before and after the defaults flip. Sign up free and see where you stand.
FAQ
What are Cloudflare's new Search, Agent, and Training bot categories?
Under an update announced July 1, 2026, Cloudflare classifies every AI bot by behavior: Search (indexes content to answer questions later, e.g. OAI-SearchBot), Agent (real-time fetches on a person's behalf, e.g. ChatGPT-User and browser-use agents), and Training (takes content to train models, e.g. GPTBot). Each category gets its own allow/block control on every plan, including Free.
Will Cloudflare's September 15 default block ChatGPT from my site?
Partially, if you do nothing. From September 15, 2026 the Agent and Training categories are blocked by default on pages that display ads, while Search stays allowed. ChatGPT-User live fetches fall in the Agent category, so any ad-carrying page becomes unreadable for real-time AI answers. Set Agent to 'do not block' in AI Crawl Control to keep live fetches working.
Do the new Cloudflare AI bot defaults apply to existing sites?
Yes, for Free plan zones. The September 15 defaults apply to new Cloudflare customers, new sites added by existing customers, and all existing Free plan customers. Paid-plan zones keep their current settings, but most small stores are on the Free plan - check yours.
Should I block the Training category in Cloudflare's new AI controls?
It's a defensible choice. Training crawlers like GPTBot don't power live citations - a BuzzStream study found 88.2% of sites blocking GPTBot still appear in AI answers. The tradeoff is possible underrepresentation in future models. What you should not block is the Agent category: those are real-time fetches with a shopper waiting on the other end.
What are BotBase and Attribution Business Insights?
Two new Cloudflare Bot Management views shipped July 1, 2026. BotBase is a searchable directory of every bot Cloudflare tracks, with each bot's classification in the Search/Agent/Training taxonomy. Attribution Business Insights is a dashboard showing a site-wide crawl-to-referral ratio and the distribution of AI bot vs organic traffic - infrastructure-level AI attribution data.
What happened to Cloudflare Pay Per Crawl?
It's evolving into Pay Per Use. One year after launching per-fetch charging, Cloudflare says the fetch is the wrong unit and is moving to compensate publishers when content actually appears inside an AI answer. For e-commerce stores my advice is unchanged: don't put your catalog behind a toll - the citation itself is the compensation because it delivers a buyer.