Structured Data Testing for AI Search — Beyond Google's Rich Results Test

May 24, 2026

Google's Rich Results Test answers exactly one question: does this JSON-LD qualify for a Google SERP feature? That was a useful question in 2019. In 2026, when a growing share of product research happens inside ChatGPT, Perplexity, and Claude - none of which render SERP features - it's the wrong question, and testing against it gives you false confidence.

The right question is: can a retrieval system parse my schema, resolve every required field, and match the entities to the visible page? No single tool answers that today. You have to chain five checks together, and I'll show you the order that saves the most time.

This is the testing stack we run on every audit: syntax and spec validation, server-render verification, visible-text alignment, server-vs-client parity, and live LLM extraction. The order is the point - most of the time saved comes from catching the failure at the cheapest layer before you ever touch the expensive one.

  • 5 testing layers: syntax, server-render, visible-text, parity, live extraction
  • 6 AI user agents to test: OAI-SearchBot, GPTBot, ChatGPT-User, PerplexityBot, ClaudeBot, Amazonbot
  • #1 cause of "valid schema, no citations": visible-text alignment failure

Why the Rich Results Test is not enough

Rich Results Test blind spots

Adding schema produced no major uplift in citations on any platform.

The test is built around Google's SERP feature gallery - not around what AI retrieval systems extract. Three blind spots break in production:

  • It hides non-feature schema types. Organization, Person, WebSite, BreadcrumbList nested in unusual ways - none of these qualify for a rich result, but all of them feed LLM entity resolution. The test ignores them, so you do too, and that's the gap.
  • It tolerates display-only failures. The test will pass a Product with a missing brand field if other fields satisfy a minimal SERP card. AI extractors reading the same Product entity fail to resolve the brand and skip the page entirely.
  • It accepts client-rendered schema. The test evaluates the rendered DOM, after JavaScript has executed. Most AI crawlers (notably OAI-SearchBot through mid-2025) do not execute JavaScript and read only the raw HTML response. Your test passes; your real-world extraction fails. That's the most expensive false positive in this whole space.

The dominant practical implication is that traditional organic rank position remains the primary lever for AI visibility, and that GEO-specific optimization efforts are most productive when directed at content quality and authority rather than generic structured data implementation.

Kurt Fischman, Founder of Growth Marshal (SSRN, 2026)

The five-layer testing stack

Layer 1 - syntax and Google compliance

Run two tools in parallel:

  • Google Rich Results Test - confirms SERP feature eligibility. Still worth running, because AI Overviews share Google's index pipeline.
  • Schema.org Validator - flags types and properties that exist in the Schema.org vocabulary but aren't on Google's feature list. This is where you catch the Organization.foundingDate typo or the invalid Offer.availability enum value that Google silently ignores but an LLM chokes on.

If either tool flags an error, fix it before moving on. Every layer below assumes Layer 1 passes - don't skip ahead.
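As a cheap local pre-check before opening either tool, the enum problem mentioned above can be caught with a few lines of Python. This is only a sketch: the function name is hypothetical, and the set of allowed values is my reading of Schema.org's ItemAvailability enumeration, which you should confirm against schema.org before relying on it.

```python
import json

# Assumption: these are the ItemAvailability members in the Schema.org
# vocabulary. Verify against schema.org; the vocabulary changes over time.
VALID_AVAILABILITY = {
    "BackOrder", "Discontinued", "InStock", "InStoreOnly",
    "LimitedAvailability", "MadeToOrder", "OnlineOnly", "OutOfStock",
    "PreOrder", "PreSale", "Reserved", "SoldOut",
}


def invalid_availability(jsonld_text):
    """Return every 'availability' value that is not a known enum member."""
    bad = []

    def walk(node):
        if isinstance(node, dict):
            value = node.get("availability")
            if isinstance(value, str):
                # Accept both "InStock" and "https://schema.org/InStock".
                short = value.rsplit("/", 1)[-1]
                if short not in VALID_AVAILABILITY:
                    bad.append(value)
            for child in node.values():
                walk(child)
        elif isinstance(node, list):
            for child in node:
                walk(child)

    walk(json.loads(jsonld_text))
    return bad
```

The check is case-sensitive on purpose, so a value like "Instock" is reported rather than silently accepted.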

Layer 2 - server-render verification

Fetch the URL with curl, which never executes JavaScript, so what you see is the raw HTML response:

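# Fetch the raw server response while identifying as an AI crawler.
# The user-agent value here is abbreviated; real crawlers may send a longer string.
curl -sL https://yourstore.com/products/widget \
  -H "User-Agent: OAI-SearchBot/1.0; +https://openai.com/searchbot" \
  | grep -A 200 'application/ld+json'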

If the JSON-LD block isn't in the response body, your schema is being injected by client-side JavaScript and AI crawlers without JS execution will never see it. Fix it by moving the schema into the server-rendered template - the way Shopify's Liquid layouts and Next.js getStaticProps already do. This is the single most common real failure I see.
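If you would rather assert on parsed data than eyeball grep output, the same check can be sketched in Python with only the standard library. The class and function names below are hypothetical, and the input is assumed to be the raw response body you already fetched (for example, the curl output saved to a string).

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_jsonld(raw_html):
    """Return parsed JSON-LD documents found in the raw (un-rendered) HTML.

    A block that is present but not valid JSON is returned as None.
    """
    parser = JsonLdExtractor()
    parser.feed(raw_html)
    parsed = []
    for block in parser.blocks:
        try:
            parsed.append(json.loads(block))
        except json.JSONDecodeError:
            parsed.append(None)
    return parsed
```

An empty list means the server response carries no JSON-LD at all, which is the client-side injection failure described above.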

Repeat with each major AI user agent:

  • OAI-SearchBot/1.0
  • GPTBot/1.1
  • ChatGPT-User/1.0
  • PerplexityBot/1.0
  • ClaudeBot/1.0
  • Amazonbot/0.1

Some CDNs and bot-management products (Cloudflare's AI Labyrinth and Bot Fight Mode especially) serve different bodies to different bots. If your schema vanishes under one of these user agents, you're accidentally blocking the exact crawler you were trying to feed - and your browser will never show you that.
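The per-agent comparison can be automated. The sketch below, in Python, fetches one URL once per user agent and reports where the JSON-LD is missing or differs from the first response that contained it. The function names are hypothetical, the user-agent values are the abbreviated tokens from the list above rather than the full strings real crawlers send, and the regex is a rough extractor, not an HTML parser.

```python
import re
import urllib.request

# Abbreviated tokens taken from the list above (an assumption: real crawlers
# may send longer user-agent strings, and some CDNs match on the full value).
AI_USER_AGENTS = [
    "OAI-SearchBot/1.0",
    "GPTBot/1.1",
    "ChatGPT-User/1.0",
    "PerplexityBot/1.0",
    "ClaudeBot/1.0",
    "Amazonbot/0.1",
]

JSONLD_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)


def fetch_raw_html(url, user_agent):
    """Fetch the raw response body (no JavaScript execution) as one user agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=15) as response:
        return response.read().decode("utf-8", errors="replace")


def audit_user_agents(url, user_agents=AI_USER_AGENTS, fetch=fetch_raw_html):
    """Map each user agent to 'ok', 'missing', or 'differs'.

    The baseline is the first user agent whose response contained JSON-LD.
    """
    results = {}
    baseline = None
    for agent in user_agents:
        blocks = [b.strip() for b in JSONLD_RE.findall(fetch(url, agent))]
        if not blocks:
            results[agent] = "missing"
        elif baseline is None:
            baseline = blocks
            results[agent] = "ok"
        else:
            results[agent] = "ok" if blocks == baseline else "differs"
    return results
```

The fetch function is injectable so the comparison logic can be exercised offline. Note that urllib raises an HTTPError on a 403 or 429, which is itself a useful signal that a bot rule is blocking that agent.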

Layer 3 - visible-text alignment

This is the layer most teams skip, and it's the one that breaks the most production extractions. Every value in your JSON-LD must have a visible counterpart in the rendered HTML: names and answer text verbatim, prices and enum values in their human-readable form. AI crawlers cross-check schema against page text, and entities that exist only in the JSON-LD block get downweighted or dropped. No warning, just gone.

For each Product / Offer / FAQPage entity, confirm:

  • Price: the Offer.price value appears as visible text in the form a shopper reads ("$49.99"), not only as a bare "49.99" inside the markup.
  • Availability: the human-readable equivalent of InStock / OutOfStock appears in the buy-box copy.
  • Brand: the brand name appears in visible text near the product title, not just in Product.brand.name.
  • Aggregate rating: the star count and review count are both visible. Schema-only ratings get stripped.
  • FAQ questions: every Question.name appears as a heading, summary, or list item on the page.

Quick automated check: extract every "name" and "text" string from the JSON-LD, then grep the rendered HTML for each one. Anything missing is a downweight risk - treat it as a bug, not a nicety.
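A minimal Python sketch of that automated check follows. It assumes you already have the rendered HTML as a string; the function names are hypothetical, and the tag stripping is deliberately crude (it does not account for CSS-hidden elements), so treat hits as leads to verify rather than a verdict.

```python
import json
import re
from html import unescape

JSONLD_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)


def collect_strings(node, keys=("name", "text")):
    """Recursively gather string values stored under the given keys."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key in keys and isinstance(value, str):
                found.append(value)
            else:
                found.extend(collect_strings(value, keys))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_strings(item, keys))
    return found


def visible_text(html):
    """Rough visible text: drop script/style blocks and tags, collapse whitespace."""
    html = re.sub(r"<(script|style)\b.*?</\1>", " ", html, flags=re.I | re.S)
    text = unescape(re.sub(r"<[^>]+>", " ", html))
    return re.sub(r"\s+", " ", text).strip().lower()


def missing_from_page(html):
    """Return JSON-LD 'name' and 'text' strings that never appear in page text."""
    page = visible_text(html)
    missing = []
    for block in JSONLD_RE.findall(html):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed blocks belong to Layer 1, not this check
        for value in collect_strings(data):
            needle = re.sub(r"\s+", " ", value).strip().lower()
            if needle and needle not in page:
                missing.append(value)
    return missing
```

Matching is case-insensitive and whitespace-normalized, which is looser than strict verbatim matching; tighten it if you want exact-string parity.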

Layer 4 - server vs client schema parity

If you generate schema both server-side (template) and client-side (analytics tag, app block), they have to agree. The classic bug: a Shopify theme renders Product schema with the variant the merchandiser selected, and a third-party reviews app renders a second Product schema with the default variant. AI crawlers concatenate both and get two conflicting prices - then trust neither.

Count the application/ld+json blocks in the rendered HTML. If more than one block declares a Product entity, audit each block's @id and confirm the blocks either describe distinct things (Product vs Offer vs Review) or, where they describe the same product, agree on every value. Duplicate Product entities that share one @id but disagree on price are a citation killer, full stop.
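The parity audit can be sketched in Python as well. This version counts the JSON-LD blocks and groups every Product node by its @id (falling back to its name), then reports any product that carries more than one distinct price. The function names are hypothetical, and only Offer prices are compared; extending it to availability or brand follows the same pattern.

```python
import json
import re
from collections import defaultdict

JSONLD_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)


def iter_nodes(node):
    """Yield every dict in a JSON-LD document, including @graph and nested nodes."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from iter_nodes(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_nodes(item)


def is_type(node, type_name):
    """True if the node's @type (string or list) includes type_name."""
    types = node.get("@type")
    return type_name in (types if isinstance(types, list) else [types])


def product_price_conflicts(html):
    """Return (block_count, conflicts).

    conflicts maps a Product key (its @id, else its name) to the set of
    distinct prices seen for it across all JSON-LD blocks on the page.
    """
    blocks = JSONLD_RE.findall(html)
    prices = defaultdict(set)
    for block in blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue
        for node in iter_nodes(data):
            if not is_type(node, "Product"):
                continue
            key = node.get("@id") or node.get("name") or "<anonymous>"
            for child in iter_nodes(node.get("offers", [])):
                if "price" in child:
                    prices[key].add(str(child["price"]))
    conflicts = {k: v for k, v in prices.items() if len(v) > 1}
    return len(blocks), conflicts
```

An empty conflicts dictionary with a block count above one is the healthy case: several blocks, no disagreement.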

Layer 5 - live LLM extraction

This is the only layer that confirms the schema is actually producing the citations you want. Simple, but tedious:

  1. Open ChatGPT with web search enabled (e.g. GPT-4.1 with the search tool, or ChatGPT Shopping if you sell to consumers).
  2. Ask a buyer-intent question about your product: "What's the return policy on the Acme Widget Pro?"
  3. Inspect the answer. If ChatGPT quotes your FAQ answer verbatim, your FAQPage schema is reaching extraction. If it paraphrases or sources a competitor, something earlier in the stack is failing - go back, don't patch here.
  4. Repeat in Perplexity and Claude. Citation behavior varies by engine - see our Perplexity citation triggers writeup for the per-engine breakdown.

You can also script this with the OpenAI API and the web_search tool - useful for tracking citation stability week-over-week without doing the manual checks by hand every time.
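Whatever client you use to fetch the answer, the scoring step is the part worth standardizing so that weekly runs are comparable. The Python sketch below deliberately leaves out the API call: it assumes you already have the answer text and the list of cited URLs from the engine's response, and classifies the result using the verbatim-quote criterion from step 3. The function name, parameters, and verdict labels are all hypothetical.

```python
import re
from urllib.parse import urlparse


def normalize(text):
    """Lowercase and collapse whitespace so line wrapping does not hide a quote."""
    return re.sub(r"\s+", " ", text).strip().lower()


def score_extraction(answer_text, cited_urls, expected_answer, own_domain):
    """Classify one engine response for one buyer-intent question.

    answer_text     - the engine's answer, as plain text
    cited_urls      - URLs the engine listed as sources
    expected_answer - the acceptedAnswer text from your FAQPage schema
    own_domain      - your registrable domain, e.g. "acme.example"
    """
    quoted = normalize(expected_answer) in normalize(answer_text)
    hosts = [(urlparse(url).hostname or "").lower() for url in cited_urls]
    cited = any(h == own_domain or h.endswith("." + own_domain) for h in hosts)

    if quoted and cited:
        verdict = "verbatim quote with citation"
    elif cited:
        verdict = "cited but paraphrased"
    elif quoted:
        verdict = "quoted without citation"
    else:
        verdict = "not extracted"
    return {"quoted": quoted, "cited": cited, "verdict": verdict}
```

Logging the verdict per question, per engine, per week gives you the stability trend without rereading every answer by hand.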

Common failures and what they look like

Symptom | Layer | Fix
Rich Results Test passes, AI never cites | 3 or 5 | Audit visible-text alignment; test live extraction in three engines.
Schema visible in browser, missing in curl | 2 | Move schema to server-rendered template.
Schema present for Googlebot, missing for OAI-SearchBot | 2 | Check bot-management rules in Cloudflare / Akamai / Fastly.
Two Product entities with conflicting prices | 4 | Deduplicate; assign distinct @id per entity.
FAQ schema accepted, no LLM citations | 3 | Confirm every Question is visible in the rendered HTML.
Schema passes everything, still no citations | - | Schema is one of many signals. Audit content depth, freshness, and outbound sources.

What to test on each page type

  • Product pages: Product + Offer + AggregateRating + FAQPage. All five layers.
  • Blog posts: BlogPosting + FAQPage + (optionally) HowTo. Skip Layer 4; Layers 1-3 and 5 matter.
  • Category / collection pages: CollectionPage + ItemList + BreadcrumbList. Layers 1-3.
  • About / brand pages: Organization + Person (founder). See our About page guide.
  • Support / policy pages: FAQPage + WebPage. Layers 1-3 are critical here, because these answer the most LLM-extractable queries you have.

What we are not testing for (yet)

There's no standard validator today for AI-specific schema extensions - Schema.org's Action hierarchy for agent affordances, llms.txt indexing hints, or the proposed aiContext fields some platforms are floating. Adopt any of these and you're testing in production, full stop. My advice: ship them only where the cost of being wrong is low (blog posts, marketing pages) until reference validators actually exist. Don't put unvalidated experiments on your top product page.

Test order, not test menu

Run the layers top-to-bottom. Most teams find their problem in Layer 2 (server-render) or Layer 3 (visible-text alignment) and never need to script Layer 5. Skipping straight to live LLM extraction is how you spend a day debugging the wrong thing.
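The ordering rule is simple enough to encode directly. In this Python sketch, each layer is a hypothetical check function that takes the page and returns a list of problems; the runner stops at the first layer that reports any, so the expensive checks never execute while a cheaper one is still failing.

```python
def run_in_order(layers, page):
    """Run (name, check) pairs cheapest-first and stop at the first failure.

    Each check takes the page and returns a list of problem descriptions;
    an empty list means the layer passed.
    Returns (failing_layer_name, problems), or (None, []) if all layers pass.
    """
    for name, check in layers:
        problems = check(page)
        if problems:
            return name, problems
    return None, []


# Usage with stand-in checks (real ones would wrap your Layer 1-5 tooling):
example_layers = [
    ("syntax", lambda page: []),
    ("server-render", lambda page: [] if "ld+json" in page else ["no JSON-LD in raw HTML"]),
    ("visible-text", lambda page: []),
]
failing_layer, problems = run_in_order(example_layers, "<html></html>")
```

Here the run stops at the server-render layer and the visible-text check is never called.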


GEOlikeaPro's AI Readiness audit runs Layers 1-4 automatically and flags Layer 5 candidates worth manual review. Try it free on any URL.

FAQ

Is Google's Rich Results Test enough for AI search?

No. It validates SERP feature eligibility, not LLM extraction. It accepts client-rendered schema (which most AI crawlers do not execute), ignores Schema.org types that do not produce SERP features (Organization, Person, BreadcrumbList nested unusually), and tolerates missing fields that AI extractors require. Use it as Layer 1 of a five-layer stack, not as the whole test.

How do I test whether my JSON-LD is server-rendered?

Run curl -sL against your URL (for example, curl -sL https://yoursite.com/page) with an AI crawler user agent (OAI-SearchBot/1.0, PerplexityBot/1.0, ClaudeBot/1.0) and grep for the application/ld+json block. If the schema is missing from the raw HTML response, it is being injected client-side and AI crawlers without JavaScript execution will never see it.

Why does my schema pass validation but never produce AI citations?

The most common cause is visible-text alignment failure. AI crawlers cross-check schema against the rendered page: values in your JSON-LD that do not appear in the visible HTML get downweighted or dropped. Audit every name and text string in your markup against the page text. Other causes: missing required fields the Rich Results Test ignores, conflicting duplicate Product entities, or content depth too thin to anchor a citation.

Should I test schema separately for each AI crawler?

Yes. CDNs and bot-management products (Cloudflare's AI Labyrinth and Bot Fight Mode, Akamai Bot Manager, Fastly) routinely serve different bodies to different user agents. Curl with each of the major AI bots and confirm the schema block is present and identical in every response. A bot you forgot to allowlist is a bot that never cites you.

How do I confirm my schema is actually producing AI citations?

Layer 5: live LLM extraction. Ask a buyer-intent question that matches one of your schema entries in ChatGPT, Perplexity, and Claude. If the AI quotes your acceptedAnswer.text verbatim or cites the specific Product field, extraction is working. Paraphrasing, generic answers, or competitor citations indicate a failure somewhere in Layers 1–4. You can script this with the OpenAI API's web_search tool for week-over-week stability tracking.
