How GPTBot Actually Crawls E-commerce Sites — Traffic Data, Server Logs, and What It Means for Your Store

April 1, 2026

GPTBot's crawl traffic grew 305% from May 2024 to May 2025, jumping from the #9 crawler to #3 (Cloudflare, 2025). It now accounts for 11.7% of all AI crawler traffic. But here's the part that trips people up: GPTBot is one of three OpenAI crawlers, and each one behaves differently. Knowing which does what is the line between showing up in ChatGPT and being completely absent from it.

Three crawlers, three purposes

OpenAI runs three separate user agents, each with its own robots.txt behavior (official docs):

GPTBot - crawls content for model training. Respects robots.txt. Block it and your content stays out of future training data.
OAI-SearchBot - powers ChatGPT Search and Shopping results. Respects robots.txt. This is the one you actually want open for product visibility.
ChatGPT-User - fires when a user asks ChatGPT to browse a specific URL. Behaves like a browser, not a traditional crawler. Because the fetch is user-initiated, robots.txt rules may not apply to it the way they do to the two automated crawlers.

Each setting is independent, and that's the leverage point. You can block GPTBot (no training) while allowing OAI-SearchBot (yes to search visibility). Get this wrong and you pay for it: Amazon blocked all three OpenAI crawlers and made 600 million product listings invisible to ChatGPT Shopping (Roketto, 2026). That is not a setting you want to fat-finger.

The robots.txt I'd ship for e-commerce:

User-agent: GPTBot
Disallow: /       # Block training

User-agent: OAI-SearchBot
Allow: /          # Allow search/shopping

User-agent: ChatGPT-User
Allow: /          # Allow user-triggered browsing
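
Before shipping a file like that, it's worth a quick check that the rules parse the way you expect. Here's a minimal sketch using Python's standard-library robots.txt parser. The example.com URL and the `allowed_agents` helper are placeholders of mine, and Python's parser is only one interpretation of the rules, so treat a pass here as a sanity check rather than proof of how OpenAI's crawlers will read the file.

```python
from urllib.robotparser import RobotFileParser

# The same rules as above, inline so the check runs without a network call.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /       # Block training

User-agent: OAI-SearchBot
Allow: /          # Allow search/shopping

User-agent: ChatGPT-User
Allow: /          # Allow user-triggered browsing
"""


def allowed_agents(robots_txt: str, url: str, agents: list) -> dict:
    """Return {agent: may_fetch} for each user agent under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    # Hypothetical product URL; swap in one of your own.
    result = allowed_agents(
        ROBOTS_TXT,
        "https://example.com/products/blue-widget",
        ["GPTBot", "OAI-SearchBot", "ChatGPT-User"],
    )
    for agent, ok in result.items():
        print(f"{agent}: {'allowed' if ok else 'blocked'}")
```

If GPTBot comes back allowed or OAI-SearchBot comes back blocked, fix the file before it goes live.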

And verify bots by IP, not just the user-agent string - that string is trivial to spoof. OpenAI publishes IP lists at openai.com/gptbot.json, openai.com/searchbot.json, and openai.com/chatgpt-user.json.
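
A rough sketch of that IP check, using only the standard library. One loud assumption: I'm treating the published files as JSON with a `prefixes` array of `ipv4Prefix` / `ipv6Prefix` entries. Confirm the layout against the real files before relying on this. The sample data below uses reserved documentation ranges, not OpenAI's actual addresses, and the function names are mine.

```python
import ipaddress
import json


def parse_prefixes(ip_list_json: str) -> list:
    """Extract CIDR networks from a published IP list.

    ASSUMPTION: the file looks like
    {"prefixes": [{"ipv4Prefix": "..."}, {"ipv6Prefix": "..."}]}.
    Check the real file's layout before using this in production.
    """
    data = json.loads(ip_list_json)
    networks = []
    for entry in data.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr, strict=False))
    return networks


def is_from_published_range(client_ip: str, networks: list) -> bool:
    """True when the request IP falls inside any published network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)


# Placeholder data built from documentation-only ranges (not real OpenAI IPs).
SAMPLE_LIST = json.dumps(
    {"prefixes": [{"ipv4Prefix": "192.0.2.0/24"}, {"ipv6Prefix": "2001:db8::/32"}]}
)

if __name__ == "__main__":
    nets = parse_prefixes(SAMPLE_LIST)
    for ip in ("192.0.2.10", "198.51.100.7"):
        verdict = "in range" if is_from_published_range(ip, nets) else "NOT in range"
        print(ip, verdict)
```

A request that claims to be GPTBot in its user-agent string but arrives from an address outside the published ranges is a spoof candidate.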

GPTBot does not execute JavaScript - and that changes everything

Prerender.io analyzed over 500 million GPTBot requests and found zero evidence of JavaScript execution (Prerender.io). GPTBot sends an HTTP request, downloads the raw HTML, and moves on. It does not wait for React components to mount, API calls to resolve, or lazy-loaded content to appear. It is not patient. It does not come back later for the part that loaded slowly.

This is a different animal from Googlebot, which runs a headless Chrome engine that executes JavaScript. A React storefront that renders its product content client-side can rank fine on Google while being a blank page to GPTBot, ClaudeBot, and PerplexityBot.

Test it yourself, right now: disable JavaScript in your browser and load your key product pages. If descriptions, pricing, reviews, or FAQ content vanishes - that's what GPTBot sees too. An empty page.
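
If you'd rather script that check than click through pages by hand, here's a small sketch. It pulls the text out of raw HTML without running any JavaScript and reports which phrases you expected are missing. The helper names, sample pages, and phrases are all made up for illustration, and it approximates "what a non-rendering crawler can read", not GPTBot's actual parser.

```python
import urllib.request
from html.parser import HTMLParser


class RawTextExtractor(HTMLParser):
    """Collects text that sits in the HTML itself, skipping <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def missing_from_raw_html(html: str, expected_phrases: list) -> list:
    """Return the expected phrases that do NOT appear in the raw HTML text."""
    parser = RawTextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return [p for p in expected_phrases if p.lower() not in text]


def fetch_raw_html(url: str) -> str:
    """Plain HTTP GET, no JavaScript execution. The user-agent value is a placeholder."""
    req = urllib.request.Request(url, headers={"User-Agent": "raw-html-check/0.1"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


# Two made-up pages: a client-rendered shell and a server-rendered product page.
SPA_SHELL = (
    '<html><body><div id="root"></div>'
    '<script>renderApp({"name": "Blue Widget", "price": "$49.00"})</script>'
    "</body></html>"
)
SERVER_RENDERED = "<html><body><h1>Blue Widget</h1><p>Price: $49.00</p></body></html>"

if __name__ == "__main__":
    phrases = ["Blue Widget", "$49.00"]
    print("SPA shell missing:", missing_from_raw_html(SPA_SHELL, phrases))
    print("Server-rendered missing:", missing_from_raw_html(SERVER_RENDERED, phrases))
```

To run it against a live page, pass `fetch_raw_html("https://your-store.example/products/...")` into `missing_from_raw_html` with the name, price, and a review snippet you expect to see.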

What GPTBot reads from your raw HTML:

Semantic HTML: headings (h1-h6), paragraphs, lists, tables
JSON-LD structured data in the <head> - Product, FAQPage, Organization schemas
Meta tags: title, description, canonical
Plain text in the document body

What it ignores completely:

JavaScript-rendered content (React, Vue, Angular SPAs)
Content inside iframes
Images (no OCR during crawling)
Content behind login walls or cookie banners that block the DOM
Lazy-loaded sections, infinite scroll, AJAX-loaded content

Critical, and I'll keep repeating it: your JSON-LD Product schema must be server-rendered, not JavaScript-injected. If the schema only appears after JS runs, GPTBot never sees it - you built it for nobody. Microsoft confirmed Bing uses schema.org markup for Copilot integration. Perplexity does the same. And mismatches between schema and visible HTML trip deception flags, so don't try to be clever there.
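
One way to check this on your own pages: parse the raw HTML, pull out every JSON-LD script block, and list the schema types it declares. The sketch below does that with the standard library. The sample pages are invented, and `schema_types` is my own helper, not part of any schema.org tooling.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Malformed JSON-LD is skipped here.


def schema_types(html: str) -> set:
    """Return every @type found in JSON-LD present in the raw HTML."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    found = set()
    stack = list(parser.blocks)
    while stack:
        node = stack.pop()
        if isinstance(node, list):
            stack.extend(node)
        elif isinstance(node, dict):
            declared = node.get("@type")
            if isinstance(declared, str):
                found.add(declared)
            elif isinstance(declared, list):
                found.update(t for t in declared if isinstance(t, str))
            stack.extend(node.values())  # Walk nested objects such as offers.
    return found


# Invented sample pages: schema in the HTML vs. schema created by JavaScript.
SERVER_RENDERED = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Blue Widget",
 "offers": {"@type": "Offer", "price": "49.00", "priceCurrency": "USD"}}
</script></head><body><h1>Blue Widget</h1></body></html>"""

JS_INJECTED = """<html><head><script>
const tag = document.createElement("script");
tag.type = "application/ld+json";
tag.textContent = JSON.stringify({"@type": "Product", "name": "Blue Widget"});
document.head.appendChild(tag);
</script></head><body><div id="root"></div></body></html>"""

if __name__ == "__main__":
    print("Server-rendered:", sorted(schema_types(SERVER_RENDERED)))
    print("JS-injected:", sorted(schema_types(JS_INJECTED)))
```

If the second case describes your store, with nothing found until a browser runs the script, that's the gap to close.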

Crawl patterns from real server logs

A 48-day server log study (Feb-Mar 2026) showed how GPTBot actually behaves in the wild:

Burst crawling: GPTBot was completely absent for weeks, then fired 187 requests in a single week - 152 of them in a 3-minute burst. It does not crawl continuously the way Googlebot does.
Activation pattern: GPTBot seems to switch on for a site once that site's content gains traction inside OpenAI's ecosystem.
Sitemap consumption: GPTBot and ClaudeBot both started consuming sitemaps in March 2026 for the first time.
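robots.txt not fetched: in this site's logs, GPTBot never requested robots.txt before crawling, and neither did Meta-WebIndexer. That is one site's observation and it sits at odds with OpenAI's documented behavior, so confirm it against your own logs before drawing conclusions.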

For scale, over the same 48-day window the study logged ChatGPT-User at 923 requests (user-triggered), OAI-SearchBot at 330 (search), and GPTBot at 187 (training). The user-triggered crawler is the busy one - keep that in mind when you decide what to block.
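
You can pull the same kind of numbers from your own access logs. The sketch below counts hits per OpenAI user agent and finds the busiest 3-minute window for each. It assumes the common combined log format, with the timestamp in square brackets and the user agent as the last quoted field. The sample lines are fabricated for illustration, with documentation-range IPs and simplified user-agent strings.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

OPENAI_AGENTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User")

# ASSUMPTION: combined log format, "[timestamp]" plus user agent as the final quoted field.
LOG_PATTERN = re.compile(r'\[(?P<ts>[^\]]+)\].*"(?P<ua>[^"]*)"\s*$')


def parse_hits(lines):
    """Yield (agent, timestamp) for each line whose user agent names an OpenAI crawler."""
    for line in lines:
        match = LOG_PATTERN.search(line)
        if not match:
            continue
        for agent in OPENAI_AGENTS:
            if agent in match["ua"]:
                yield agent, datetime.strptime(match["ts"], "%d/%b/%Y:%H:%M:%S %z")
                break


def summarize(lines, window=timedelta(minutes=3)):
    """Return (total hits per agent, peak hits per agent within any one window)."""
    counts = Counter()
    stamps_by_agent = {}
    for agent, ts in parse_hits(lines):
        counts[agent] += 1
        stamps_by_agent.setdefault(agent, []).append(ts)

    peaks = {}
    for agent, stamps in stamps_by_agent.items():
        stamps.sort()
        start = 0
        best = 0
        for end, ts in enumerate(stamps):
            # Slide the window start forward until it spans at most `window`.
            while ts - stamps[start] > window:
                start += 1
            best = max(best, end - start + 1)
        peaks[agent] = best
    return counts, peaks


# Fabricated sample lines, for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Mar/2026:14:00:01 +0000] "GET /products/a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.1)"',
    '203.0.113.5 - - [10/Mar/2026:14:00:40 +0000] "GET /products/b HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.1)"',
    '203.0.113.5 - - [10/Mar/2026:14:01:55 +0000] "GET /products/c HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.1)"',
    '203.0.113.5 - - [10/Mar/2026:15:30:00 +0000] "GET /products/d HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.1)"',
    '203.0.113.9 - - [10/Mar/2026:14:10:00 +0000] "GET /products/a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; ChatGPT-User/1.0)"',
    '198.51.100.2 - - [10/Mar/2026:14:11:00 +0000] "GET / HTTP/1.1" 200 900 "-" "Mozilla/5.0 (Windows NT 10.0) Firefox/124.0"',
]

if __name__ == "__main__":
    totals, peaks = summarize(SAMPLE_LOG)
    for agent in OPENAI_AGENTS:
        print(f"{agent}: {totals[agent]} hits, peak {peaks.get(agent, 0)} in 3 min")
```

The user-agent match is a plain substring test, so pair it with the IP verification mentioned earlier if spoofed traffic is a concern.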

The crawl-to-referral problem

Here's the uncomfortable number. OpenAI's crawl-to-referral ratio is 1,700:1 - they crawl 1,700 pages for every 1 click they send back to publishers (Cloudflare, June 2025). Anthropic's is 73,000:1. Google's, for context, is 14:1. This is not a balanced trade by default.

It gets better in your vertical, though. In Computer & Electronics specifically the ratios tighten: OpenAI at 401:1 and Perplexity at 88:1 (Cloudflare industry breakdown). E-commerce sites get more back from AI crawling than the web average - which is exactly why blanket-blocking is usually the wrong call here.
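
The ratio itself is simple division, and you can compute your own version once you have crawler hits and referred visits counted. A small sketch follows. How you count each side (which user agents, which referrers, which time window) is your call and may not match Cloudflare's methodology. The function names are mine, and the example counts are invented so that they reproduce the 1,700:1 figure.

```python
def crawl_to_referral(crawled_pages: int, referred_visits: int):
    """Pages crawled per referred visit, or None when nothing was referred."""
    if referred_visits == 0:
        return None
    return crawled_pages / referred_visits


def format_ratio(ratio) -> str:
    """Render a ratio in the N:1 form used above."""
    return "no referrals" if ratio is None else f"{ratio:,.0f}:1"


if __name__ == "__main__":
    # Invented counts: 3,400 crawler requests and 2 referred visits.
    print(format_ratio(crawl_to_referral(3400, 2)))
    # A month with crawling but zero referrals has no finite ratio.
    print(format_ratio(crawl_to_referral(500, 0)))
```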

One data point on the cost side: the Read the Docs project found blocking AI crawlers cut their traffic 75% - 800GB down to 200GB daily - saving roughly $1,500/month in bandwidth. For a smaller store the bandwidth hit is far lower, but it's worth watching rather than guessing.

ChatGPT Shopping: why OAI-SearchBot is the crawler that matters

ChatGPT Shopping processes 50 million shopping queries daily - roughly 2% of ChatGPT's 2.5 billion daily prompts (DataSlayer, 2026). The recommendations include images, prices, reviews, and direct purchase links. There are no ads: OpenAI states the results are based on relevance, not paid placement. As long as that holds, this is a channel you win on data quality rather than ad spend.

For your products to show up in ChatGPT Shopping:

Allow OAI-SearchBot in robots.txt - this is the front door, everything else is moot if it's shut
Ship complete AI-ready product pages with full Product schema (JSON-LD) - name, description, image, brand, SKU, price, availability, GTIN/MPN identifiers
Add Review schema - aggregate ratings, review count, review snippets
Add Offer schema - price, currency, availability, merchant info
Server-render all of it - JSON-LD in raw HTML, not JS-injected (yes, again)
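
To make the checklist concrete, here's a sketch of building that JSON-LD on the server so it lands in the raw HTML. The property names (`sku`, `gtin13`, `offers`, `aggregateRating`, and so on) are standard schema.org vocabulary. The product values, the input dict keys, and the `product_jsonld` helper are placeholders of mine. How you wire the output into your templates depends on your platform.

```python
import json


def product_jsonld(product: dict) -> str:
    """Build a server-rendered <script> tag with Product, Offer, and rating data."""
    availability = "InStock" if product["in_stock"] else "OutOfStock"
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": product["name"],
        "description": product["description"],
        "image": product["image"],
        "sku": product["sku"],
        "gtin13": product["gtin13"],
        "brand": {"@type": "Brand", "name": product["brand"]},
        "offers": {
            "@type": "Offer",
            "price": f'{product["price"]:.2f}',
            "priceCurrency": product["currency"],
            "availability": f"https://schema.org/{availability}",
            "seller": {"@type": "Organization", "name": product["merchant"]},
        },
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": product["rating"],
            "reviewCount": product["review_count"],
        },
    }
    # Escape "</" so a value can never close the script tag early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Placeholder product; every value here is invented for the example.
EXAMPLE_PRODUCT = {
    "name": "Blue Widget",
    "description": "A sturdy blue widget for everyday use.",
    "image": "https://example.com/images/blue-widget.jpg",
    "sku": "BW-001",
    "gtin13": "0012345678905",
    "brand": "Example Brand",
    "price": 49.0,
    "currency": "USD",
    "in_stock": True,
    "merchant": "Example Store",
    "rating": 4.6,
    "review_count": 128,
}

if __name__ == "__main__":
    print(product_jsonld(EXAMPLE_PRODUCT))
```

Emit the returned tag from the same server-side template that renders the visible price and rating, so the schema and the page can't drift apart.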

OpenAI also accepts product feeds - structured files (CSV, TSV, XML, or JSON) following their Product Feed Specification. You can submit catalogs with pricing, availability, media assets, and flags like enable_search and enable_checkout.

Shopify merchant onboarding is underway, with early partners including Glossier, SKIMS, Spanx, and Vuori. OpenAI charges a 4% transaction fee on Instant Checkout purchases, on top of standard Stripe processing - price that in before you decide it's free money.

What Cloudflare's data says about blocking trends

GPTBot is the most blocked AI crawler - 312 domains block it outright - but also the most explicitly allowed, with 61 domains granting access (Cloudflare). Between July 2025 and January 2026, sites actively blocking AI crawlers outnumbered those blocking Googlebot by 7:1. The web is making a decision about this, fast.

Important, and most people miss this one: Cloudflare defaulted to blocking AI bots in July 2025. Plenty of sites now block GPTBot, ClaudeBot, and PerplexityBot without the owner ever knowing. Go check your own settings under Security → Bots before you assume you're visible. I've found this exact misconfiguration on stores that swore they were open.

GEOlikeaPro's Crawler View simulates exactly what GPTBot, PerplexityBot, and ClaudeBot see when they hit your pages - and what they miss. See where you stand.

FAQ

What's the difference between GPTBot and OAI-SearchBot?

GPTBot crawls content for AI model training. OAI-SearchBot powers ChatGPT Search and Shopping results. You can block GPTBot (prevent training) while allowing OAI-SearchBot (keep search visibility). Each respects robots.txt independently.

Does GPTBot execute JavaScript?

No. Prerender.io analyzed 500 million+ GPTBot requests and found zero evidence of JavaScript execution. GPTBot downloads raw HTML only. If your product content is rendered client-side (React, Vue, Angular), GPTBot sees an empty page.

How often does GPTBot crawl my site?

Not continuously. A 48-day server log study found GPTBot was absent for weeks, then fired 187 requests in a single week, 152 of them in a 3-minute burst. It appears to activate when a site's content gains traction in OpenAI's ecosystem, unlike Googlebot, which crawls continuously.

Should I block or allow GPTBot?

Block GPTBot (training) but allow OAI-SearchBot (search/shopping) and ChatGPT-User (user browsing). This keeps your products visible in ChatGPT without contributing content to model training. Amazon blocked all three and made 600 million product listings invisible to ChatGPT Shopping.

How do I get my products into ChatGPT Shopping?

Allow OAI-SearchBot in robots.txt, implement complete Product + Review + Offer schema in server-rendered JSON-LD, and optionally submit a product feed to OpenAI. ChatGPT Shopping processes 50 million queries daily with no paid placement - visibility is based on relevance and data quality.

Is Cloudflare blocking AI bots by default?

Since July 2025, Cloudflare defaults to blocking AI bots. Your site may be blocking GPTBot, OAI-SearchBot, and others without your knowledge. Check Cloudflare dashboard → Security → Bots to verify your settings.