OAI-SearchBot vs GPTBot vs ChatGPT-User — What Each OpenAI Crawler Does
OpenAI runs three separate crawlers: GPTBot, OAI-SearchBot, and ChatGPT-User. Here's the mistake I see constantly: someone configuring robots.txt blocks all three in one line and locks themselves out of ChatGPT Shopping without ever knowing it happened. So let me break down what each crawler actually does, how they're verified, and the config I'd run for e-commerce.
The three crawlers at a glance
| Crawler | Purpose | Frequency | Drives traffic? |
|---|---|---|---|
| GPTBot | Training foundation models | Async, continuous | No |
| OAI-SearchBot | Index for ChatGPT search & citations | Async, regular | Yes - citations in ChatGPT search |
| ChatGPT-User | Live user-triggered fetches | Real-time per query | Yes - live answers |
Source: OpenAI Crawlers documentation.
GPTBot - the training crawler
GPTBot is OpenAI's crawler for model training. It collects publicly available data to improve future versions of GPT models. User agent: GPTBot/1.1.
Key characteristics:
- Operates offline and asynchronously - not tied to live user activity
- Does not drive referral traffic to your site
- Content scraped by GPTBot may show up in future model responses with no attribution back to you
- Respects robots.txt - blocking it prevents training use
GPTBot's crawl-to-referral ratio is effectively infinite: it fetches your pages but never sends a referral. Plenty of publishers block it on bandwidth and content-rights grounds. For e-commerce the call is even simpler: blocking GPTBot doesn't cost you a single sale, because it was never sending you traffic in the first place.
OAI-SearchBot - the search indexer
OAI-SearchBot powers ChatGPT's live search - inline citations, product recommendations, real-time answers. User agent: OAI-SearchBot/1.0.
Key characteristics:
- Asynchronous crawler that builds the ChatGPT search index
- Augments data from Bing and other sources
- Blocking it removes your pages from ChatGPT search results
- Respects robots.txt
When a ChatGPT user asks "best running shoes under $150," OAI-SearchBot's index decides which products are even candidates. Block it and you're not ranked low in ChatGPT Shopping - you're not in the room at all.
ChatGPT-User - the live fetcher
ChatGPT-User fires when a user asks ChatGPT (or a Custom GPT) to fetch a specific URL or browse a site in real time. User agent: ChatGPT-User/1.0.
Key characteristics:
- Not a traditional crawler - behaves more like a browser agent
- Triggered per user query, not on a schedule
- Used when the user explicitly asks for web content
- Respects robots.txt directives targeting ChatGPT-User
Block ChatGPT-User and the user who asks ChatGPT to "check availability on yourstore.com" gets an error instead of your live data. If you want to support agentic shopping workflows at all, keep this one open - that's the whole point of it.
robots.txt configuration for e-commerce
The setup I'd run: block training, allow search and user fetches.
# Block model training
User-agent: GPTBot
Disallow: /
# Allow search indexing (drives ChatGPT Shopping citations)
User-agent: OAI-SearchBot
Allow: /
# Allow live user fetches (supports agentic shopping)
User-agent: ChatGPT-User
Allow: /
That gives you ChatGPT Shopping visibility and supports agent-driven queries, while keeping your content out of future model training, where it would be reused without attribution. Best of both, deliberately chosen.
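To sanity-check a config like this before deploying it, you can run it through a parser. Here is a minimal sketch using Python's standard-library `urllib.robotparser`. The helper name `access_by_agent` and the product URL are made up for illustration, and Python's matching rules only approximate how OpenAI's crawlers interpret the file.

```python
from urllib.robotparser import RobotFileParser

# The "block training, allow search and user fetches" config from above.
RECOMMENDED = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /
"""

OPENAI_AGENTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User")


def access_by_agent(robots_txt, url, agents=OPENAI_AGENTS):
    """Return {agent: True/False} for whether robots_txt lets each agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Hypothetical product URL, used only as an example.
result = access_by_agent(RECOMMENDED, "https://yourstore.com/products/running-shoes")
print(result)
# {'GPTBot': False, 'OAI-SearchBot': True, 'ChatGPT-User': True}
```

Paste your live robots.txt in place of `RECOMMENDED` to see which of the three agents it actually lets through.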
If you want maximum visibility and don't care about the training question (allow all):
User-agent: GPTBot
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: ChatGPT-User
Allow: /
Verification via IP ranges
OpenAI publishes each crawler's IP ranges as JSON. Use these to verify legitimate OpenAI traffic in your server logs and to build WAF allow-lists - the user-agent string alone is trivially spoofable, so don't trust it on its own:
- GPTBot IPs: openai.com/gptbot.json
- OAI-SearchBot IPs: openai.com/searchbot.json
- ChatGPT-User IPs: openai.com/chatgpt-user.json
The ranges in these files change regularly. Automate the fetch and keep your WAF rules current - set it once and forget it, and you'll end up blocking legitimate OpenAI IPs.
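The cross-check itself is simple. Here is a rough sketch, assuming each file is a JSON object with a `prefixes` list whose entries carry an `ipv4Prefix` or `ipv6Prefix` CIDR string. That layout is an assumption, so confirm it against the live files. The function names are illustrative, and the sample ranges are reserved documentation addresses, not real OpenAI ranges.

```python
import ipaddress
import json
import urllib.request

# URLs as listed above. The JSON layout handled below is an assumption.
RANGE_URLS = {
    "GPTBot": "https://openai.com/gptbot.json",
    "OAI-SearchBot": "https://openai.com/searchbot.json",
    "ChatGPT-User": "https://openai.com/chatgpt-user.json",
}


def parse_networks(payload):
    """Convert a decoded range file into a list of ip_network objects."""
    networks = []
    for entry in payload.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr, strict=False))
    return networks


def fetch_networks(url, timeout=10.0):
    """Download one range file and parse it (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_networks(json.load(response))


def ip_in_networks(ip, networks):
    """True if ip falls inside any of the given networks."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)


# Placeholder data in the assumed layout (documentation ranges, not OpenAI's).
SAMPLE = {"prefixes": [{"ipv4Prefix": "192.0.2.0/24"}, {"ipv6Prefix": "2001:db8::/32"}]}
sample_networks = parse_networks(SAMPLE)
print(ip_in_networks("192.0.2.44", sample_networks))   # True
print(ip_in_networks("203.0.113.9", sample_networks))  # False
```

In practice you would call `fetch_networks(RANGE_URLS["OAI-SearchBot"])` on a schedule, cache the result, and treat any request that claims an OpenAI user agent from an address outside the cached ranges as an impersonator.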
Common configuration mistakes
- Blocking all three via "User-agent: *" with "Disallow: /" - kills ChatGPT Shopping visibility in one line. Use explicit per-agent directives.
- Cloudflare default blocking (July 2025+) - new Cloudflare domains block AI crawlers by default. Check your Bots dashboard and explicitly allow OAI-SearchBot, because the robots.txt you carefully wrote doesn't matter if Cloudflare returns 403 first.
- Confusing ChatGPT-User with ChatGPT traffic - ChatGPT-User only handles explicit web fetches. Most ChatGPT citations come from OAI-SearchBot's pre-built index, so that's the one to optimize for.
- WAF blocking the user-agent string - robots.txt allows the crawler, your WAF returns 403 anyway. Two layers, check both.
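The first mistake in that list is easy to reproduce. The sketch below uses Python's standard-library `urllib.robotparser` to show a blanket wildcard block shutting out all three agents, and explicit per-agent groups taking precedence over the wildcard. The `allowed` helper and the example URL are illustrative, and OpenAI's own parser may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

AGENTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User")


def allowed(robots_txt, url="https://yourstore.com/"):
    """Map each OpenAI agent to whether robots_txt permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AGENTS}


# A blanket block: every agent without its own group falls under the wildcard.
BLANKET = "User-agent: *\nDisallow: /\n"

# Same wildcard, plus explicit groups for the two agents that drive traffic.
EXPLICIT = BLANKET + (
    "\nUser-agent: OAI-SearchBot\nAllow: /\n"
    "\nUser-agent: ChatGPT-User\nAllow: /\n"
)

print(allowed(BLANKET))
# {'GPTBot': False, 'OAI-SearchBot': False, 'ChatGPT-User': False}
print(allowed(EXPLICIT))
# {'GPTBot': False, 'OAI-SearchBot': True, 'ChatGPT-User': True}
```

A group that names an agent wins over the wildcard group for that agent, which is why the explicit version keeps GPTBot blocked while reopening the other two.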
How to check if you're blocked
Quick test: ask ChatGPT "Does [yourstore.com] sell [specific product]?" If it says "I can't access that site" or hands back generic info without visiting, you're probably blocked at one of these layers:
- robots.txt disallowing OAI-SearchBot or ChatGPT-User
- Cloudflare Bot Fight Mode or AI Crawl Control blocking the request
- WAF rule returning 403 to the OpenAI user agent
- Server-side rendering failing for the crawler's request headers
Run that test on your own store before you assume you're fine. I've watched store owners swear they were open and fail it on the first try.
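For the WAF and Cloudflare layers, a rough first check is to request a page while sending each OpenAI user agent and compare status codes. The sketch below uses only the standard library. The function names are made up, the short user-agent tokens quoted in this article stand in for the full headers real crawlers send, and the result is a hint rather than proof: a WAF that validates source IPs may refuse your spoofed probe while still admitting real OpenAI traffic.

```python
import urllib.error
import urllib.request

# Short tokens from this article; real crawler headers may be longer.
AGENT_HEADERS = {
    "GPTBot": "GPTBot/1.1",
    "OAI-SearchBot": "OAI-SearchBot/1.0",
    "ChatGPT-User": "ChatGPT-User/1.0",
}


def status_for(url, user_agent, timeout=10.0):
    """Return the HTTP status code the server sends for this User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 4xx/5xx responses are raised as exceptions


def probe(url):
    """Fetch url once per OpenAI user agent and collect the status codes."""
    return {name: status_for(url, header) for name, header in AGENT_HEADERS.items()}


def refused(statuses):
    """Names of agents whose probe came back 401 or 403."""
    return sorted(name for name, code in statuses.items() if code in (401, 403))


# Offline illustration with made-up statuses; no request is sent here.
example = {"GPTBot": 403, "OAI-SearchBot": 403, "ChatGPT-User": 200}
print(refused(example))  # ['GPTBot', 'OAI-SearchBot']
```

Against a real store you would run `refused(probe("https://yourstore.com/"))`. If OAI-SearchBot shows up in the result while a normal browser user agent gets a 200, look at your WAF or Cloudflare rules before touching robots.txt.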
GEOlikeaPro's Crawler View shows exactly what each OpenAI crawler sees when it visits your pages - including whether you're blocked at the robots.txt, WAF, or rendering layer. Sign up free to audit your crawler configuration.
FAQ
What's the difference between GPTBot and OAI-SearchBot?
GPTBot trains foundation models — it doesn't send you traffic. OAI-SearchBot indexes content for ChatGPT's live search and citations — this is what makes your products show up in ChatGPT Shopping. You want OAI-SearchBot allowed at minimum.
Does blocking GPTBot hurt my ChatGPT visibility?
No. GPTBot is for training future models, not for search. Blocking GPTBot keeps your content out of training data but does NOT affect ChatGPT Shopping or citations — those come from OAI-SearchBot, which is a separate crawler.
What is ChatGPT-User and when does it fetch my site?
ChatGPT-User fetches pages in real time when a user explicitly asks ChatGPT to check a URL or when a Custom GPT uses browsing. It's not a crawler — it acts like a browser agent per user query. Blocking it prevents live user lookups.
Should I allow or block OpenAI crawlers for e-commerce?
For e-commerce: block GPTBot (training, no traffic), allow OAI-SearchBot (ChatGPT Shopping citations), allow ChatGPT-User (live user fetches). This gives maximum visibility for products without feeding training data.
How do I verify OpenAI crawler traffic in my logs?
OpenAI publishes IP ranges for each crawler at openai.com/gptbot.json, openai.com/searchbot.json, and openai.com/chatgpt-user.json. Cross-reference the source IP with these lists to verify legitimate OpenAI traffic. A request that claims an OpenAI user agent but comes from an IP outside these ranges is an impersonator.
Why isn't my store showing in ChatGPT search results?
Most common causes: (1) robots.txt blocks OAI-SearchBot, (2) Cloudflare default blocking is enabled (introduced July 2025 for new domains), (3) WAF returns 403 to the OpenAI user agent, (4) the page requires JavaScript rendering that the crawler can't execute. Check all four layers.