The mention-density model: how AI search actually cites mid-market e-commerce brands
Five of the eight mid-market audio brands we tested - Skullcandy, Teufel, Master & Dynamic, AIAIAI, Grado Labs - were cited by none of the four leading LLMs when asked "best wireless headphones under $300." The remaining three were each cited by exactly one model, with no overlap among the mid-market picks across ChatGPT, Claude, Gemini, and Perplexity. Sony, Bose, and Sennheiser appeared in all four shortlists.
This is not a story about audio. It's what happens to mid-market brands in every category once the query is broad enough that the LLM falls back on memory.
We audited 50+ mid-market e-commerce brands across the four leading LLMs over two weeks. Three findings cut against common Generative Engine Optimization (GEO) advice. The most contrarian one: AI-readiness schema does not predict citation rate - and the audit-tool category, our own Visibility Vitals checker included, has been pointing customers at the wrong intervention.
This post walks the three findings with the per-brand data, names the latent variable that connects them, and closes with the three interventions that actually correlated with citation lift in our sample.
One disclosure up front, because it matters: GEOlikeaPro is itself a GEO audit tool. Our own Visibility Vitals checker scores brands on the very schema/robots/sitemap signals this post argues are over-weighted. The contrarian finding lands on our own product, not some faceless industry - and that's exactly why we ran the audit instead of looking away.
How GEOlikeaPro audits AI citation: 50+ brands × 4 LLMs across three sprints
Each audit sends one matched query - for example, "best sustainable flats for women" or "best Italian espresso brand 2026" - to ChatGPT, Claude, Gemini, and Perplexity in parallel. We parse each response for brand mentions, citation rank position, source authority, and sentiment, then compute a Share of Voice (SOV) score: the percentage of models that cited the brand, weighted by rank.
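A minimal sketch of that scoring step, for readers who want to reproduce it. The exact rank weighting is not spelled out above, so the `1/rank` weight here is an assumption of this sketch, as are the function names, the placeholder brand names, and the invented response strings. Ranking by order of first mention is also a simplification.

```python
import re
from typing import Optional


def mention_rank(response: str, brand: str, candidates: list[str]) -> Optional[int]:
    """1-based position of `brand` among `candidates`, ordered by first mention.

    Returns None when the response never names the brand.
    `brand` is expected to be one of `candidates`.
    """
    first_seen = {}
    for name in candidates:
        # Lookarounds keep a brand name from matching inside a longer word.
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        match = re.search(pattern, response)
        if match:
            first_seen[name] = match.start()
    if brand not in first_seen:
        return None
    ordered = sorted(first_seen, key=first_seen.get)
    return ordered.index(brand) + 1


def share_of_voice(ranks: dict[str, Optional[int]]) -> float:
    """Rank-weighted SOV in percent. The 1/rank weight is an assumption."""
    if not ranks:
        raise ValueError("need at least one model")
    weights = [1.0 / rank if rank else 0.0 for rank in ranks.values()]
    return 100.0 * sum(weights) / len(ranks)


# Hypothetical brands and invented response text, not real model output.
candidates = ["Acme Audio", "Globex", "Initech"]
responses = {
    "chatgpt": "Start with Globex, then Acme Audio.",
    "claude": "Globex and Initech lead this range.",
    "gemini": "Initech is a safe pick.",
    "perplexity": "Acme Audio, then Globex.",
}
ranks = {
    model: mention_rank(text, "Acme Audio", candidates)
    for model, text in responses.items()
}
print(ranks, share_of_voice(ranks))
```

Under this weighting, a brand named first by all four models scores 100 and a brand named first by one model scores 25. Brand names that double as ordinary words ("Nothing") need extra disambiguation that this sketch does not attempt.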
The dataset spans:
- 50+ mid-market e-commerce brands across Cosmetics & Beauty, Fashion & Apparel, Sports & Fitness, Food & Nutrition, Home & Garden, and E-commerce & Retail.
- 9 e-commerce platforms (Shopify, Shopify Plus, BigCommerce, Salesforce Commerce Cloud, Shopware, PrestaShop, and three more).
- 11 countries across North America, Europe, and Asia-Pacific.
- Three sprints: a cross-sectional baseline, a native-language A/B reinforcement, and a schema-variance + cultural-authority extension.
- ~200 audit rows, ~30 visibility-vitals reports, 20 agent-standards scorecards, and 5 brand-verification gap analyses across four Supabase tables.
Finding 1: Native-language GEO audits lift share of voice by 36 percentage points for brands invisible in English, but reduce it by 7 points for brands already cited
We A/B tested 14 mid-market non-English-primary brands. Each was audited twice on the same matched category query - once in English, once translated into the brand's primary market language.
The hypothesis going in was the standard one: brands targeting non-English-speaking customers should benefit from native-language audits because LLMs tokenize local brands more accurately and surface local press, reviews, and forums. Most GEO advice repeats this without ever testing it.
Our data does not support a universal lift. It supports a conditional one.
| English SOV bucket | n | Mean lift in native language |
|---|---|---|
| Low EN (≤ 50) | 7 | +36 pp |
| High EN (≥ 75) | 7 | −7 pp |
The lift is conditional on English-baseline visibility. For brands invisible in English, native-language audits unlocked an average of 36 additional percentage points of SOV - sometimes 75 points. For brands already cited in English, native-language audits actually reduced SOV by an average of 7 points. The same intervention helps one group and hurts the other. That's the part the generic advice never tells you.
The mechanism: native-language queries surface local competitors who don't show up in the English shortlist. PatBO displaces Farm Rio in Portuguese results. WMF and Fissler displace Manufactum in German results. The native-language query changes the competitive set - and whether that change helps or hurts depends entirely on which set you started in.
The diagnostic that falls out of this is one line:
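If your English SOV is ≥ 75, native-language audits are unlikely to help. If it is ≤ 50, they may be your single biggest lever. No brand in our 14-brand A/B sample fell between 50 and 75, so that range is untested.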
Run the English audit first. Use that result to decide whether the native-language test is even worth running. Don't run it on faith.
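The diagnostic can be written down as a small decision rule. This is a sketch, not our production logic: the function name and the three return labels are hypothetical, and the thresholds are simply the two bucket edges from the table above (each bucket n=7).

```python
def native_audit_recommendation(english_sov: float) -> str:
    """Map an English-baseline SOV (0-100) to a next step for the native-language test."""
    if not 0 <= english_sov <= 100:
        raise ValueError("SOV must be between 0 and 100")
    if english_sov <= 50:
        return "run"       # low-EN bucket: mean lift was +36 pp
    if english_sov >= 75:
        return "skip"      # high-EN bucket: mean change was -7 pp
    return "untested"      # neither bucket in the sample covers this range


for sov in (0, 50, 60, 100):
    print(sov, native_audit_recommendation(sov))
```

The explicit "untested" branch is deliberate: the two buckets say nothing about brands sitting between 50 and 75, and a rule that quietly picks a side there would claim more than the sample supports.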
Finding 2: Schema affects retrieval-time visibility, not training-corpus presence
Most GEO audit tools - ours included - score brands on six auto-verifiable signals: page accessibility, JSON-LD Organization schema, AI-bot robots.txt access (GPTBot / PerplexityBot / anthropic-ai / Google-Extended), sitemap presence, FAQPage schema, and aggregateRating schema. The implicit theory baked into those scorers: hit the signals, get cited. We baked that theory in too.
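To make one of those six signals concrete, here is a sketch of the AI-bot robots.txt check using Python's standard-library `urllib.robotparser`. It is an illustration of the idea, not the Visibility Vitals implementation. The function name, the sample robots.txt, and the `example.com` URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The four user-agent tokens named in the text above.
AI_BOTS = ("GPTBot", "PerplexityBot", "anthropic-ai", "Google-Extended")


def ai_bot_access(robots_txt: str, url: str = "https://example.com/") -> dict[str, bool]:
    """Report, per AI bot token, whether robots.txt allows fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_BOTS}


# Hypothetical robots.txt: blocks GPTBot, allows everyone else.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
print(ai_bot_access(sample))
```

For this sample the check reports GPTBot as blocked and the other three tokens as allowed. Passing the file contents as a string keeps the check testable offline; fetching the live file is a separate step.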
We tested it on 21 brands. The correlation between schema completeness and SOV was weak. Brands with near-perfect schema (4–5 of 6 auto-verifiable signals) ranged from 0% SOV (Skullcandy) to 100% (Warby Parker, Rothy's, Aritzia). Brands with minimal schema (1–2 of 6) ranged from 25% (Brava Fabrics) to 100% (Allbirds, Bombas, Rapha). If schema were the lever, that scatter wouldn't exist.
Schema completeness explained roughly 9% of the variance in SOV in our sample. The other 90%+ sat somewhere else entirely.
But "schema doesn't matter" is the wrong takeaway, and I want to be precise about why. The sharper claim is a two-corpus model.
LLMs cite brands from two distinct corpora - bodies of text the model can draw on:
- The training corpus - the static dataset of web pages, books, archived content, and forum posts the LLM was trained on, frozen at the model's training cutoff. This is what the model "knows" without looking anything up.
- The retrieval corpus - the live web content the model fetches during a query. Perplexity does this heavily; Gemini does it via Google Search; ChatGPT and Claude do it via web-browsing tools.
Schema cannot touch the training corpus. Training is frozen. No amount of JSON-LD you add today rewrites what the model learned in pre-training - that's not how any of this works.
Schema can help retrieval-time citation by web-searching models, because retrievers use schema to parse and extract content during the query. That's consistent with our model-by-model data: Perplexity, the heaviest web-searcher of the four, was the most likely to cite mid-market brands on broad queries.
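What "parse and extract" means in practice can be sketched with the standard library alone. We do not know how any particular model's retriever handles structured data, so treat this as an illustration of why JSON-LD is easy for a machine to read, not as a description of a real pipeline. The class name, function name, and sample page are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def schema_types(html: str) -> set[str]:
    """Return every @type declared anywhere in the page's JSON-LD."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    found: set[str] = set()

    def walk(node):
        if isinstance(node, dict):
            declared = node.get("@type")
            if isinstance(declared, str):
                found.add(declared)
            elif isinstance(declared, list):
                found.update(t for t in declared if isinstance(t, str))
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    for block in collector.blocks:
        try:
            walk(json.loads(block))
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is skipped rather than guessed at
    return found


# Hypothetical page with Organization markup and a nested rating.
page = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization", "name": "Example Brand",
 "aggregateRating": {"@type": "AggregateRating", "ratingValue": "4.6", "reviewCount": "120"}}
</script>
</head><body></body></html>
"""
print(sorted(schema_types(page)))
```

The point of the sketch is the asymmetry: structured data like this only matters if something fetches the page at query time. Nothing here reaches back into what a model learned during training.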
So the precise finding is:
Schema is a retrieval-time feature, not a training-time feature. Brands with dense training-corpus presence (Allbirds, Bombas, Rapha) get cited regardless of schema. Brands with low training-corpus density (Skullcandy, Cuts Clothing) are not rescued by schema - schema can only help if a web-searching model finds and parses your page during the live query.
Treating schema as a citation lever assumes schema can rewrite the training corpus. It can't. Its actual role is narrower and more retrieval-specific than most GEO scorecards - Visibility Vitals included - admit. We're saying that about our own product on purpose.
The contrarian cases
The cleanest illustrations of the two-corpus model in our data:
| Brand | Vitals/6 | English SOV | Reading |
|---|---|---|---|
| Skullcandy | 4/6 | 0% | Near-perfect schema, zero citations. Training-corpus density too low; schema can't compensate. |
| Cuts Clothing | 4/6 | 25% | High schema, low SOV. Same pattern. |
| Allbirds | 2/6 | 100% | Low schema, full citation. Cited by all four LLMs regardless of schema completeness. |
| Bombas | 2/6 | 100% | Same pattern. |
| Rapha | 2/6 | 100% | Same pattern. |
In every case, schema completeness and citation rate are decoupled by something more fundamental: the brand's mention density in the training corpus.
Finding 3: Cultural authority of country-in-category sets your English baseline
Finding 1 raised an obvious follow-up: is the lift really about language, or about the broader cultural context a country carries in a given category?
We tested it directly. We added 5 mid-market brands from countries with strong categorical cultural authority: Italy/coffee (Caffè Borbone), Japan/audio (Audio-Technica), Switzerland/watches (Mido), France/fragrance (Diptyque), Germany/tools (Wera). Same matched query template as the Sprint 1 control cohort: "best [country] [category] brand 2026," in English.
| Cohort | n | Mean English SOV |
|---|---|---|
| Cultural-authority countries (IT/coffee, JP/audio, CH/watches, FR/fragrance, DE/tools) | 5 | 90% |
| Non-authority countries (DE/clothing, FR/underwear, ES/menswear, NO/outdoor) | 4 | 12.5% |
The delta is 77.5 percentage points of free SOV lift, just from being in a country-category pair the global imagination already links together. Free, in the sense that no individual brand earned it - and that's also the trap, because you can't quickly earn it either.
The mechanism is cumulative editorial repetition. Decades of "Italians know coffee," "Japan makes the best headphones," "Swiss watches are precision craftsmanship" got encoded into the training corpus. LLMs surface brands from these country-category pairs even when the specific brand is mid-market - because the cultural-authority association is dense enough to act as a topical anchor.
This is the upstream variable. Cultural authority drives training-corpus mention density, which drives citation rate. Brands in authority pairs start the GEO race already cited. Brands in non-authority pairs start at zero. Same effort, different starting line.
Synthesis: mention density is the master variable
The three findings are facets of one mechanism. AI citation rate is governed by your brand's mention density at both layers the LLM uses - what it was trained on and what it retrieves live. Each finding is a different vector of that one variable:
- Cultural authority sets the baseline density of your country-category pair in the training corpus. Built over decades of editorial repetition. Cannot be moved quickly - accept that and plan around it.
- Native-language audits access language-localized density - local press, reviews, and forums that English queries don't surface. Rescues brands invisible in English by tapping a different slice of the same training data.
- Schema adds retrieval-time parsing density - helps web-searching models extract and cite content during the query. Does not change the training corpus.
Most GEO tooling - our own Visibility Vitals scorer included - has been optimizing the wrong layer. Schema is necessary-but-not-sufficient for retrieval-time visibility, and irrelevant for training-corpus presence. The mid-market brand obsessing over its JSON-LD audit score is polishing brass on the Titanic if its underlying training-corpus density is low. I'd rather say that plainly than keep selling the comfortable version.
The actual levers, ranked by long-term impact:
1. Earn mentions in sources LLMs preferentially train on - Wikipedia, established press, industry publications (e.g., TechRadar in consumer electronics), expert roundups, high-authority review sites (Trustpilot, niche category publications).
2. Build cultural authority in your country-category combination - partner with editorial sources that reinforce the country-as-authority association, contribute to standards bodies, sponsor research that gets cited.
3. Audit and publish in your customers' native language when the English baseline is low - to access language-localized mention density.
4. Implement schema as a hygiene baseline - it helps retrieval-time parsing, but don't treat it as a citation lever.
In that order. Most GEO tools, ours historically included, have the order inverted.
Three secondary findings
LLMs do not agree on which mid-market brand to cite
We asked four LLMs the same query - "best wireless headphones under $300" - and got four different shortlists. ChatGPT cited House of Marley. Gemini cited Marshall. Perplexity cited Nothing. Claude cited none. Five of the eight brands tested got zero citations. The three that did were each cited by exactly one model, zero overlap.
Implication: a GEO audit run against a single model captures 25% of the citation landscape. Single-LLM audits are a methodology error, not a budget shortcut.
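The overlap check behind that claim is simple to reproduce. The sketch below uses only the citations reported above for the eight mid-market brands in the headphones test; the variable names are ours, and the incumbents (Sony, Bose, Sennheiser) are deliberately left out because the question here is mid-market overlap.

```python
from itertools import combinations

# The eight mid-market brands tested on "best wireless headphones under $300".
tested = [
    "Skullcandy", "Teufel", "Master & Dynamic", "AIAIAI", "Grado Labs",
    "House of Marley", "Marshall", "Nothing",
]

# Mid-market brands each model cited, as reported above.
cited_by = {
    "ChatGPT": {"House of Marley"},
    "Claude": set(),
    "Gemini": {"Marshall"},
    "Perplexity": {"Nothing"},
}

# How many of the four models cited each brand.
citations = {
    brand: sum(brand in cited for cited in cited_by.values())
    for brand in tested
}
uncited = [brand for brand, count in citations.items() if count == 0]

# Brands shared by each pair of models; an empty set means no overlap.
shared = {
    (a, b): cited_by[a] & cited_by[b]
    for a, b in combinations(cited_by, 2)
}

print("uncited:", uncited)
print("max models citing any one brand:", max(citations.values()))
print("pairs with overlap:", [pair for pair, brands in shared.items() if brands])
```

Running the same three lines of set arithmetic on your own audit rows is a quick way to see whether a single-model result would have told you anything about the other three.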
Sentiment drops 10–15 percentage points on trust queries even when SOV is 100 percent
Trust queries - "is [brand] legit" or "is [brand] worth the price" - score 100% SOV because the query names the brand and all four LLMs echo it back. But sentiment on trust queries ran 10–15 percentage points lower than sentiment on discovery queries for the same brand. Olipop: 86 on discovery sentiment, 71 on trust. Rothy's: 88 and 76. Princess Polly: 79 and 68.
Trust-query SOV is a vanity metric. Sentiment-weighted SOV is the signal worth tracking - the headline number is lying to you here.
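One way to compute a sentiment-weighted SOV is to scale SOV by the sentiment score. The multiplicative form below is an assumption of this sketch, not a formula stated above, and the function name is hypothetical. The sentiment figures are the three brand pairs quoted above, with trust-query SOV taken as 100%.

```python
def sentiment_weighted_sov(sov_pct: float, sentiment: float) -> float:
    """Scale SOV (0-100) by sentiment (0-100). The multiplicative form is an assumption."""
    return sov_pct * sentiment / 100.0


# Sentiment scores quoted above: discovery queries vs trust queries.
discovery = {"Olipop": 86, "Rothy's": 88, "Princess Polly": 79}
trust = {"Olipop": 71, "Rothy's": 76, "Princess Polly": 68}

gaps = {brand: discovery[brand] - trust[brand] for brand in discovery}

for brand in discovery:
    weighted = sentiment_weighted_sov(100, trust[brand])  # trust-query SOV is 100%
    print(f"{brand}: trust SOV 100 -> weighted {weighted:.0f}, sentiment gap {gaps[brand]}")
```

With this weighting, a brand's headline 100% on a trust query collapses to its trust sentiment score, which is the number that actually differs between brands.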
Mid-market brands get cited 7× more on niche category queries than on broad ones
The same 11 mid-market fashion and sports brands averaged 3.1 citations out of 4 LLMs on niche matched queries ("best sustainable flats for women," "best premium cycling kits," "best Spanish menswear shirts") and 0.4 out of 4 on broad generic queries ("best everyday clothing brand 2026"). Query breadth, not brand quality, dominates citation rate. Read that twice - it's not your product, it's the question.
Mid-market brands chasing generic head-terms lose to global incumbents regardless of schema, authority, or press. The citation win is in the long tail - queries specific enough that only 3–5 brands credibly compete. That's where the game is winnable.
What to do about it
Three interventions correlated with citation lift in our data:
- Audit across all four leading LLMs. Single-LLM audits miss 75% of the signal. The model that does or doesn't cite you varies by query and by week.
- Stop chasing broad head-terms. If you're mid-market, broad queries are a Sony/Bose/Sennheiser trap. Target niche category queries where 3–5 brands credibly compete.
- Build training-corpus mention density by earning placements in the publications LLMs preferentially train on. This is the slow, real GEO. Schema is the fast, decorative one - don't confuse the two.
If your brand is non-English-primary and your English baseline SOV is below 50%, also run a native-language audit. It's the most cost-effective short-term lever in our dataset for that segment.
Limitations
- Sample sizes vary by finding: Finding 1 is n=14, Finding 2 is n=21, Finding 3 is n=9 (5 vs 4). These are directional, not settled empirical laws - I'd rather under-claim than oversell our own study.
- The Visibility Vitals score reflects only the 6 auto-verifiable signals out of 15 in the full GVI framework. Manual signals (backlinks, expert bylines, comparison pages) were not measured. Schema's role at retrieval time may be larger or smaller than our auto-portion suggests.
- 5 brands had bot-blocked autochecks; their schema scores are floored.
- We tested mid-market e-commerce brands. Findings may not generalize to enterprise SaaS, B2B services, or non-commerce categories.
Continuing research
GEOlikeaPro runs cross-sectional GEO audits like this one every 4–6 weeks. The next sprint extends the cultural-authority finding to n=15+ across more country-category pairs, tests the schema retrieval-time hypothesis directly by isolating Perplexity, and adds a manual-GVI cohort so we can score the full 15-signal framework.
If you'd like your brand included in the next research sprint, email hello@geolikeapro.com.
By Alex Birman, founder of GEOlikeaPro - a generative engine optimization audit platform for mid-market e-commerce brands. This research was conducted between April 17 and May 4, 2026.
The full methodology, sample notes, and limitations are deposited as a citable preprint on Zenodo: Birman, A. (2026). The Mention-Density Model: How AI Search Cites Mid-Market E-Commerce Brands. Zenodo. https://doi.org/10.5281/zenodo.20379032
FAQ
Does AI-readiness schema predict citation rate for mid-market brands?
Not in our 21-brand sample. Schema completeness explained roughly 9% of the variance in share of voice - the remaining 90%+ lies elsewhere. The sharper framing is a two-corpus model: schema affects retrieval-time visibility for web-searching models, not training-corpus presence. Brands with dense training-corpus presence (Allbirds, Bombas, Rapha) get cited regardless of schema. Brands with low training-corpus density (Skullcandy, Cuts Clothing) are not rescued by schema.
When does running a GEO audit in the brand's native language help?
Only when the brand's English baseline SOV is at or below 50%. Brands in that range gained an average of +36 percentage points in our 14-brand A/B test. Brands with English SOV at or above 75% saw an average decrease of 7 points when switched to native language, because native-language queries surface local competitors that displace the brand (PatBO displaces Farm Rio in Portuguese; WMF displaces Manufactum in German).
Why do mid-market brands from authority countries get more AI citations?
Cultural authority of country-in-category - Italy/coffee, Japan/audio, Switzerland/watches, France/fragrance, Germany/tools - is encoded into the LLM training corpus through decades of editorial repetition. Mid-market brands in those pairs averaged 90% English SOV in our test (Audio-Technica, Diptyque, Wera, Caffè Borbone, Mido) versus 12.5% for non-authority pairs (Snocks/DE/clothing, Le Slip Français/FR/underwear, Brava Fabrics/ES/menswear, Stormberg/NO/outdoor) - a 77.5 percentage point delta.
Should mid-market brands chase broad-category queries like “best clothing brand”?
No. The same 11 mid-market brands averaged 3.1 citations out of 4 LLMs on niche matched queries and 0.4 out of 4 on broad ones - a drop of more than 7× (3.1 / 0.4 ≈ 7.75) just from widening the query. Broad queries are dominated by global incumbents (Sony, Bose, Sennheiser, Uniqlo, Zara) regardless of GEO investment. Mid-market wins live in long-tail queries where only 3–5 brands credibly compete.
Is auditing a single LLM sufficient for GEO research?
No. In our broad-query test, ChatGPT, Claude, Gemini, and Perplexity produced four different shortlists with zero overlap among mid-market brands. ChatGPT cited House of Marley; Gemini cited Marshall; Perplexity cited Nothing; Claude cited none. Single-LLM audits capture roughly 25% of the citation landscape. Audit all four to see the full picture.
What is the master variable behind all three findings?
Mention density at both layers the LLM uses - what it was trained on and what it retrieves live. Cultural authority sets the baseline density of your country-category pair in the training corpus. Native-language audits access language-localized density that English queries don’t see. Schema adds retrieval-time parsing density. The three findings are three vectors of the same underlying variable. Most GEO tooling - including ours - has been optimizing the wrong layer by treating schema as if it can rewrite the training corpus. It cannot.