The mention-density model: how AI search actually cites mid-market e-commerce brands
Five of the eight mid-market audio brands we tested - Skullcandy, Teufel, Master & Dynamic, AIAIAI, Grado Labs - got cited zero times by any of the four leading LLMs when asked “best wireless headphones under $300.” The remaining three were each cited by exactly one model, with no overlap between ChatGPT, Claude, Gemini, and Perplexity. Sony, Bose, and Sennheiser appeared in all four shortlists.
This is not a story about audio. It is what happens to mid-market brands across every category when the query is broad enough that LLMs default to memory.
We audited 50+ mid-market e-commerce brands across the four leading LLMs over two weeks. Three findings challenge common Generative Engine Optimization (GEO) advice. The most contrarian: AI-readiness schema does not predict citation rate, and the audit-tool category - our own Visibility Vitals checker included - has been pointing at the wrong intervention.
This post walks through the three findings with the per-brand data, explains the latent variable that connects them, and closes with the three interventions that actually correlated with citation lift in our sample.
One disclosure up front: GEOlikeaPro is itself a GEO audit tool. Our own Visibility Vitals checker scores brands on the very schema/robots/sitemap signals this post argues are over-weighted. The contrarian finding applies to our own product, not just the rest of the industry - and it's why we ran the audit in the first place.
How GEOlikeaPro audits AI citation: 50+ brands × 4 LLMs across three sprints
Each audit sends one matched query - for example, “best sustainable flats for women” or “best Italian espresso brand 2026” - to ChatGPT, Claude, Gemini, and Perplexity in parallel. We parse each response for brand mentions, citation rank position, source authority, and sentiment, then compute a Share of Voice (SOV) score: the percentage of models that cited the brand, weighted by rank.
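For concreteness, here is a minimal sketch of that scoring step. The rank weights and function name are illustrative assumptions, not our production scorer - the point is the shape of the computation: each model contributes a rank-weighted vote.

```python
MODELS = ["chatgpt", "claude", "gemini", "perplexity"]

# Hypothetical decay: a #1 citation counts fully, lower ranks count less.
RANK_WEIGHT = {1: 1.0, 2: 0.8, 3: 0.6, 4: 0.5, 5: 0.4}

def share_of_voice(ranks_by_model: dict[str, int | None]) -> float:
    """Percentage of models citing the brand, weighted by rank position.

    ranks_by_model maps each model to the brand's citation rank in that
    model's answer, or None if the brand was not mentioned at all.
    """
    score = sum(
        RANK_WEIGHT.get(rank, 0.3)  # ranks past #5 get a floor weight
        for rank in ranks_by_model.values()
        if rank is not None
    )
    return 100.0 * score / len(MODELS)

# Example: cited #1 by Perplexity, #3 by Gemini, absent from the other two.
print(share_of_voice(
    {"perplexity": 1, "gemini": 3, "chatgpt": None, "claude": None}
))  # -> 40.0 under these illustrative weights
```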
The dataset spans:
- 50+ mid-market e-commerce brands across Cosmetics & Beauty, Fashion & Apparel, Sports & Fitness, Food & Nutrition, Home & Garden, and E-commerce & Retail.
- 9 e-commerce platforms (Shopify, Shopify Plus, BigCommerce, Salesforce Commerce Cloud, Shopware, PrestaShop, and three more).
- 11 countries across North America, Europe, and Asia-Pacific.
- Three sprints: a cross-sectional baseline, a native-language A/B reinforcement, and a schema-variance + cultural-authority extension.
- ~200 audit rows, ~30 visibility-vitals reports, 20 agent-standards scorecards, and 5 brand-verification gap analyses across four Supabase tables.
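For readers who want to reproduce the setup, here is a hypothetical shape for one audit row. The field names are our own invention for this sketch; the fields themselves match what each audit parses - mention, rank, source authority, sentiment.

```python
from dataclasses import dataclass

@dataclass
class AuditRow:
    """Hypothetical structure of one audit row - illustrative field names."""
    brand: str                # e.g. "Allbirds"
    category: str             # e.g. "Fashion & Apparel"
    query: str                # the matched query sent to all four models
    language: str             # "en" or the brand's primary market language
    model: str                # "chatgpt" | "claude" | "gemini" | "perplexity"
    mentioned: bool           # did this model cite the brand at all?
    rank: int | None          # citation rank position, None if not mentioned
    source_authority: float   # authority score of the cited source, 0-100
    sentiment: float          # sentiment of the mention, 0-100
```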
Finding 1: Native-language GEO audits lift share of voice by 36 percentage points for brands invisible in English, but reduce it by 7 points for brands already cited
We A/B tested 14 mid-market non-English-primary brands. Each brand was audited twice on the same matched category query - once in English, once translated into the brand's primary market language.
The original hypothesis was straightforward: brands targeting non-English-speaking customers should benefit from native-language audits because LLMs will tokenize local brands more accurately and surface local press, reviews, and forums. Most GEO advice repeats this without testing it.
The data does not support a universal lift.
| English SOV bucket | n | Mean native-language SOV lift |
|---|---|---|
| Low (≤ 50%) | 7 | +36 pp |
| High (≥ 75%) | 7 | −7 pp |
The lift is conditional on English-baseline visibility. For brands invisible in English, native-language audits unlocked an average of 36 additional percentage points of SOV - sometimes 75 points. For brands already cited in English, native-language audits actually reduced SOV by an average of 7 points.
The mechanism: native-language queries surface local competitors who don’t appear in the English shortlist. PatBO displaces Farm Rio in Portuguese-language results. WMF and Fissler displace Manufactum in German-language results. The native-language query changes the competitive set; whether that change helps or hurts depends entirely on which set you started in.
The diagnostic that follows from this is one line:
If your English SOV is ≥ 75, native-language audits will not help. If ≤ 50, they may be your single biggest lever.
Run the English audit first. Use the result to decide whether the native-language test is worth running.
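Encoded as a decision rule - thresholds taken straight from the table above; the 50–75 range fell between our two buckets, so it stays a judgment call:

```python
def native_language_recommendation(english_sov: float) -> str:
    """Apply the Finding 1 thresholds to an English-baseline SOV score (0-100)."""
    if english_sov >= 75:
        return "skip: native-language audits cut SOV by ~7 pp in this bucket"
    if english_sov <= 50:
        return "run: native-language audits lifted SOV by ~36 pp in this bucket"
    return "untested: between buckets - run the A/B yourself"
```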
Finding 2: Schema affects retrieval-time visibility, not training-corpus presence
Most GEO audit tools - including our own Visibility Vitals checker - score brands on six auto-verifiable signals: page accessibility, JSON-LD Organization schema, AI-bot robots.txt access (GPTBot / PerplexityBot / anthropic-ai / Google-Extended), sitemap presence, FAQPage schema, and aggregateRating schema. The implicit theory baked into those scorers: hit those signals, get cited.
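To make "auto-verifiable" concrete, here is a sketch of two of the six checks - AI-bot robots.txt access and JSON-LD schema presence. It assumes the `requests` library is installed; it is illustrative code, not the Visibility Vitals checker itself.

```python
import json
import re
import urllib.robotparser

import requests  # assumed installed: pip install requests

AI_BOTS = ["GPTBot", "PerplexityBot", "anthropic-ai", "Google-Extended"]
WANTED_TYPES = {"Organization", "FAQPage", "AggregateRating"}

def ai_bot_access(domain: str) -> dict[str, bool]:
    """True per bot if robots.txt permits fetching the homepage."""
    rp = urllib.robotparser.RobotFileParser(f"https://{domain}/robots.txt")
    rp.read()
    return {bot: rp.can_fetch(bot, f"https://{domain}/") for bot in AI_BOTS}

def jsonld_types(url: str) -> set[str]:
    """Collect schema.org @type values from JSON-LD blocks on a page."""
    html = requests.get(url, timeout=10).text
    found: set[str] = set()
    blocks = re.findall(
        r'<script[^>]*application/ld\+json[^>]*>(.*?)</script>',
        html, re.DOTALL | re.IGNORECASE,
    )
    for block in blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue
        for item in data if isinstance(data, list) else [data]:
            if not isinstance(item, dict):
                continue
            t = item.get("@type")
            if isinstance(t, str):
                found.add(t)
            elif isinstance(t, list):
                found.update(t)
    return found & WANTED_TYPES
```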
We tested it on 21 brands. The correlation between schema completeness and SOV was weak. Brands with near-perfect schema (4–5 of 6 auto-verifiable signals) ranged from 0 percent SOV (Skullcandy) to 100 percent (Warby Parker, Rothy’s, Aritzia). Brands with minimal schema (1–2 of 6) ranged from 25 percent (Brava Fabrics) to 100 percent (Allbirds, Bombas, Rapha).
Schema completeness explained roughly 9 percent of SOV variance in our sample. The remaining 90+ percent sat elsewhere.
But “schema doesn’t matter” is the wrong takeaway. The sharper claim is a two-corpus model.
LLMs cite brands from two distinct corpora - bodies of text the model has access to:
- The training corpus - the static dataset of web pages, books, archived content, and forum posts the LLM was trained on, frozen at the model’s training cutoff. This determines what the model “knows” without external lookup.
- The retrieval corpus - the live web content the model fetches during a query. Perplexity does this heavily; Gemini does it via Google Search; ChatGPT and Claude do it via web-browsing tools.
Schema cannot influence the training corpus. Training is frozen. No amount of JSON-LD added today rewrites what the model learned during pre-training.
Schema can help retrieval-time citation by web-searching models, because retrievers use schema to parse and extract content during the query. This is consistent with what our model-by-model data shows: Perplexity (the heaviest web-searcher among the four) was the most likely to cite mid-market brands on broad queries.
So the precise finding is:
Schema is a retrieval-time feature, not a training-time feature. Brands already in the training corpus (Allbirds, Bombas, Rapha) get cited regardless of schema. Brands not in the training corpus (Skullcandy, Cuts Clothing) are not rescued by schema - schema can only help if a web-searching model finds and parses your page during the live query.
Treating schema as a citation lever assumes schema can rewrite the training corpus. It cannot. Schema's actual role is narrower and more retrieval-specific than most GEO scorecards - Visibility Vitals included - acknowledge.
The contrarian cases
The cleanest illustrations of the two-corpus model in our data:
| Brand | Vitals/6 | English SOV | Reading |
|---|---|---|---|
| Skullcandy | 4/6 | 0% | Near-perfect schema, zero citations. Training corpus density too low; schema can’t compensate. |
| Cuts Clothing | 4/6 | 25% | High schema, low SOV. Same pattern. |
| Allbirds | 2/6 | 100% | Low schema, full citation. Cited by all four LLMs regardless of schema completeness. |
| Bombas | 2/6 | 100% | Same pattern. |
| Rapha | 2/6 | 100% | Same pattern. |
In each case, schema completeness and citation rate are decoupled by something more fundamental: the brand’s mention density in the training corpus.
Finding 3: Cultural authority of country-in-category sets your English baseline
Finding 1 raised a follow-up question: is the lift really about language, or about the broader cultural context a country has in a given category?
We tested it directly. We added 5 mid-market brands from countries with strong categorical cultural authority: Italy/coffee (Caffè Borbone), Japan/audio (Audio-Technica), Switzerland/watches (Mido), France/fragrance (Diptyque), Germany/tools (Wera). Same matched query template as the Sprint 1 control cohort: “best [country] [category] brand 2026,” in English.
| Cohort | n | Mean English SOV |
|---|---|---|
| Cultural-authority countries (IT/coffee, JP/audio, CH/watches, FR/fragrance, DE/tools) | 5 | 90% |
| Non-authority countries (DE/clothing, FR/underwear, ES/menswear, NO/outdoor) | 4 | 12.5% |
The delta is 77.5 percentage points of free SOV lift, just from being in a country-category pair the global imagination already links together.
The mechanism is cumulative editorial repetition. Decades of “Italians know coffee” content, “Japan makes the best headphones,” and “Swiss watches are precision craftsmanship” got encoded into the training corpus. LLMs surface brands from these country-category pairs even when the specific brand is mid-market - because the cultural-authority association is so dense it acts as a topical anchor.
This is the upstream variable. Cultural authority drives training-corpus mention density, which drives citation rate. Brands in authority pairs start the GEO race already cited. Brands in non-authority pairs start at zero.
Synthesis: mention density is the master variable
The three findings are different facets of one underlying mechanism: AI citation rate is governed by your brand’s mention density at both layers the LLM uses - what it was trained on and what it retrieves live. Each finding is a different vector of that density.
- Cultural authority sets the baseline density of your country-category pair in the training corpus. Built over decades of editorial repetition. Cannot be moved quickly.
- Native-language audits access language-localized density - local press, reviews, and forums the English-language training data doesn’t index. Rescues brands invisible in English by tapping a different slice of the training data.
- Schema adds retrieval-time parsing density - helping web-searching models extract and cite content during the query. Does not change the training corpus.
Most GEO tooling - our own Visibility Vitals scorer included - has been optimizing the wrong layer. Schema is necessary-but-not-sufficient for retrieval-time visibility, and irrelevant for training-corpus presence. The mid-market brand obsessed with their JSON-LD audit score is polishing brass on the Titanic if their underlying training-corpus density is low.
The actual levers, ranked by long-term impact:
- Earn mentions in sources LLMs preferentially train on - Wikipedia, established press, industry publications (e.g., TechRadar in consumer electronics), expert roundups, and high-authority review sites (Trustpilot, niche category publications).
- Build cultural authority in your country-category combination - partner with editorial sources that reinforce the country-as-authority association, contribute to industry standards bodies, sponsor research that gets cited.
- Audit and publish in your customers’ native language when the English baseline is low - to access language-localized mention density.
- Implement schema as a hygiene baseline - it helps retrieval-time parsing, but don’t treat it as a citation lever.
In that order. Most GEO tools have the order inverted.
Three secondary findings
LLMs do not agree on which mid-market brand to cite
We asked four LLMs the same query - “best wireless headphones under $300” - and got four different shortlists. ChatGPT cited House of Marley. Gemini cited Marshall. Perplexity cited Nothing. Claude cited none. Five of the eight brands tested got zero citations. The three that did get cited were each cited by exactly one model, with zero overlap.
Implication: a GEO audit run against a single model captures 25 percent of the citation landscape. Single-LLM audits are a methodology error.
Sentiment drops 10–15 percentage points on trust queries even when SOV is 100 percent
Trust queries - “is [brand] legit” or “is [brand] worth the price” - score 100 percent SOV because the query names the brand and all four LLMs echo it back. But sentiment scores on trust queries ran 10–15 percentage points lower than sentiment on discovery queries for the same brand. Olipop scored 86 on discovery sentiment and 71 on trust. Rothy’s: 88 and 76. Princess Polly: 79 and 68.
Trust-query SOV is a vanity metric. Sentiment-weighted SOV is the signal worth tracking.
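One minimal way to compute it, assuming both SOV and sentiment are on 0–100 scales - an illustrative formula, not our production scorer:

```python
def sentiment_weighted_sov(sov_pct: float, sentiments: list[float]) -> float:
    """Discount raw SOV by mean sentiment across the models that cited
    the brand (both on 0-100 scales). Illustrative formula only."""
    if not sentiments:
        return 0.0
    return sov_pct * (sum(sentiments) / len(sentiments)) / 100.0

# Olipop's trust query from above: 100% SOV, sentiment 71 -> effective 71,
# below its discovery-query sentiment of 86.
print(sentiment_weighted_sov(100.0, [71.0]))  # -> 71.0
```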
Mid-market brands get cited 7× more on niche category queries than on broad ones
The same 11 mid-market fashion and sports brands averaged 3.1 citations out of 4 LLMs on niche matched queries (“best sustainable flats for women,” “best premium cycling kits,” “best Spanish menswear shirts”) and 0.4 out of 4 on broad generic queries (“best everyday clothing brand 2026”). Query breadth, not brand quality, dominates citation rate.
Mid-market brands chasing generic head-terms lose to global incumbents regardless of schema, authority, or press. The citation win is in the long tail - queries specific enough that only 3–5 brands credibly compete.
What to do about it
Three interventions correlated with citation lift in our data:
- Audit across all four leading LLMs. Single-LLM audits miss 75 percent of the signal. The model that does or doesn’t cite you varies by query and by week.
- Stop chasing broad head-terms. If you are mid-market, broad queries are a Sony/Bose/Sennheiser trap. Target niche category queries where 3–5 brands credibly compete.
- Build training-corpus mention density by earning placements in publications LLMs preferentially train on. This is the slow, real GEO. Schema is the fast, decorative one.
If your brand is non-English-primary and your English baseline SOV is below 50 percent, also run a native-language audit. It is the most cost-effective short-term lever in our dataset for that segment.
Limitations
- Sample sizes vary by finding: Finding 1 is n=14, Finding 2 is n=21, Finding 3 is n=9 (5 vs 4). Findings are directional, not settled empirical laws.
- The Visibility Vitals score reflects only the 6 auto-verifiable signals out of the 15 in the full GVI framework. Manual signals (backlinks, expert bylines, comparison pages) were not measured. Schema’s role at retrieval time may be larger or smaller than our auto-portion suggests.
- 5 brands had bot-blocked autochecks; their schema scores are lower-bound estimates, since signals we could not verify were counted as absent.
- We tested mid-market e-commerce brands. Findings may not generalize to enterprise SaaS, B2B services, or non-commerce categories.
Continuing research
GEOlikeaPro runs cross-sectional GEO audits like this one every 4–6 weeks. The next sprint will extend the cultural-authority finding to n=15+ across more country-category pairs, test schema’s retrieval-time hypothesis directly by isolating Perplexity, and add a manual-GVI cohort so we can score the full 15-signal framework.
Get the next research sprint in your inbox
We publish cross-sectional GEO audits like this every 4–6 weeks. Subscribe and we'll send the next one as soon as it goes live. No marketing emails, no newsletter spam - just the next research sprint when it publishes.
If you'd like your brand included in the next research sprint, email hello@geolikeapro.com.
By Alex Birman, founder of GEOlikeaPro - a generative engine optimization audit platform for mid-market e-commerce brands. This research was conducted between April 17 and May 4, 2026.
FAQ
Does AI-readiness schema predict citation rate for mid-market brands?
Not in our 21-brand sample. Schema completeness explained roughly 9 percent of share-of-voice variance - the remaining 90+ percent lives elsewhere. The sharper framing is a two-corpus model: schema affects retrieval-time visibility for web-searching models, not training-corpus presence. Brands already in the training corpus (Allbirds, Bombas, Rapha) get cited regardless of schema. Brands not in the training corpus (Skullcandy, Cuts Clothing) are not rescued by schema.
When does running a GEO audit in the brand's native language help?
Only when the brand's English baseline SOV is at or below 50%. Brands in that range gained an average of **+36 percentage points** in our 14-brand A/B test. Brands with English SOV at or above 75% saw an average decrease of 7 points when switched to native language, because native-language queries surface local competitors that displace the brand (PatBO displaces Farm Rio in Portuguese; WMF displaces Manufactum in German).
Why do mid-market brands from authority countries get more AI citations?
Cultural authority of country-in-category - Italy/coffee, Japan/audio, Switzerland/watches, France/fragrance, Germany/tools - is encoded into the LLM training corpus through decades of editorial repetition. Mid-market brands in those pairs averaged **90% English SOV** in our test (Audio-Technica, Diptyque, Wera, Caffè Borbone, Mido) versus **12.5%** for non-authority pairs (Snocks/DE/clothing, Le Slip Français/FR/underwear, Brava Fabrics/ES/menswear, Stormberg/NO/outdoor) - a 77.5 percentage point delta.
Should mid-market brands chase broad-category queries like “best clothing brand”?
No. The same 11 mid-market brands averaged 3.1 citations out of 4 LLMs on niche matched queries and 0.4 out of 4 on broad ones - a 7× drop just from widening the query. Broad queries are dominated by global incumbents (Sony, Bose, Sennheiser, Uniqlo, Zara) regardless of GEO investment. Mid-market wins live in long-tail queries where only 3–5 brands credibly compete.
Is auditing a single LLM sufficient for GEO research?
No. In our broad-query test, ChatGPT, Claude, Gemini, and Perplexity produced four different shortlists with zero overlap among mid-market brands. ChatGPT cited House of Marley; Gemini cited Marshall; Perplexity cited Nothing; Claude cited none. Single-LLM audits capture roughly 25% of the citation landscape. Audit all four to see the full picture.
What is the master variable behind all three findings?
Mention density at both layers the LLM uses - what it was trained on and what it retrieves live. Cultural authority sets the *baseline density* of your country-category pair in the training corpus. Native-language audits access *language-localized density* that English queries don’t see. Schema adds *retrieval-time parsing density*. The three findings are three vectors of the same underlying variable. Most GEO tooling - including ours - has been optimizing the wrong layer by treating schema as if it can rewrite the training corpus. It cannot.