The State of AI Search Visibility 2026: 300 Commercial Queries, 30 Verticals

14 min read | LumenGEO Research
Tags: GEO, AI search, original research, citation data, data study, AI search visibility

We analyzed the organic search landscape behind 300 commercial-intent buyer queries across 30 verticals — 2,261 classified results in total — to map the candidate pool AI search engines draw from. The finding that should reorder every GEO budget: 67% of the retrievable web for buyer queries is third-party content the brand does not own. Review and comparison sites alone account for 50.2% of all results, and a ranked roundup ("best X for Y") is the dominant top-3 format for 53% of queries. If your GEO strategy is built entirely around your own domain, you are competing for the minority of the field.

Most GEO advice tells you to optimize your own pages. That advice is not wrong — but it is incomplete, and this study shows how incomplete. To understand where AI search visibility actually comes from for commercial queries, we mapped the retrievable web itself: the pool of pages that AI search engines pull candidates from when someone asks which product to buy.

This article breaks down what that landscape looks like — the source-type mix, the verticals where brands still hold ground versus the ones where third parties have taken over, and what the structure of the candidate pool means for how you should spend your GEO effort.

Last updated: May 2026

Across 300 commercial-intent queries in 30 verticals, two-thirds (67%) of the retrievable web is third-party content — review sites, comparison platforms, and editorial coverage the brand does not control. Review and comparison aggregators alone are 50.2% of all results. The strategic implication: GEO for commercial queries is mostly an off-domain game. Winning means getting onto the listicles and review pages AI search engines retrieve, not just polishing your own site.

What we measured and why

This study maps the organic web-search retrieval landscape for commercial queries — the candidate pool AI search engines draw from — not a direct log of AI citations. That distinction is the foundation of everything below, so we are stating it up front.

AI search engines like ChatGPT, Perplexity, and Google's AI features answer buyer questions by retrieving a set of web pages, reading them, and synthesizing an answer with citations. The pages they retrieve come overwhelmingly from the organic search index. So if you want to understand what AI search engines can cite for a query, you study what the organic web returns for that query. That is what we did.

We selected 30 commercial verticals — a deliberate mix of physical products (robot vacuums, mattresses, running shoes), B2B software (CRM, project management, accounting), and considered-purchase services (life insurance, online therapy, solar installation). For each vertical we ran 10 high-intent buyer queries — "best [category] 2026," "[Brand A] vs [Brand B]," "how to choose a [category]," "cheapest [category]," and segment variants like "best [category] for small business." That is 300 queries. For each query we classified the top organic results by source type and recorded the dominant format of the top three results. In total, 2,261 results were classified.

We sorted every result into six source types:

  • Review / comparison aggregators — sites whose business is ranking and reviewing products (NerdWallet, RTINGS, Wirecutter-style outlets, G2, Consumer Reports).
  • Brand-owned — the vendor's or manufacturer's own domain.
  • Editorial — general news and magazine coverage (CNBC, CNN, news outlets) that is not primarily a review business.
  • Community — forums, Q&A sites, Reddit, Quora.
  • Reference — encyclopedic and institutional sources (Wikipedia, .gov, .org standards bodies).
  • Marketplace — Amazon, Walmart, and other retail listing pages.

This gives us a clean, reproducible picture of the commercial retrieval landscape. What it does not give us — and we want to be precise here — is a direct measurement of which pages a specific LLM actually cited. We cover that limitation in full in the methodology section below. Read it before you quote any number from this study.
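A domain-identity classifier along these lines can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual rubric: the lookup table holds only a handful of domains named in this article, and the `classify` and `registered_domain` helpers, the `unclassified` fallback, and the naive two-label domain rule are all assumptions.

```python
from urllib.parse import urlparse

# Hypothetical lookup table: a few domains named in the article, mapped to
# source types. A real rubric would need a far longer, reviewed list.
DOMAIN_TYPES = {
    "nerdwallet.com": "review",
    "rtings.com": "review",
    "g2.com": "review",
    "consumerreports.org": "review",
    "cnbc.com": "editorial",
    "cnn.com": "editorial",
    "reddit.com": "community",
    "quora.com": "community",
    "wikipedia.org": "reference",
    "amazon.com": "marketplace",
    "walmart.com": "marketplace",
}


def registered_domain(url: str) -> str:
    """Return the last two hostname labels (naive: ignores suffixes like .co.uk)."""
    host = (urlparse(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def classify(url: str, brand_domains: set) -> str:
    """Assign a source type by domain identity, as the study describes."""
    domain = registered_domain(url)
    if domain in brand_domains:
        return "brand"
    if domain.endswith(".gov"):
        return "reference"
    # Unknown domains are flagged for manual review rather than guessed.
    return DOMAIN_TYPES.get(domain, "unclassified")


print(classify("https://www.rtings.com/vacuum/reviews/best", {"irobot.com"}))
print(classify("https://www.irobot.com/roomba", {"irobot.com"}))
```

Keeping an explicit `unclassified` bucket makes the judgment calls visible: every domain that lands there is a decision someone has to make by hand.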

The retrievable web is a third-party world

Across all 2,261 classified results, review and comparison aggregators were 50.2% of the field, brand-owned sites 31.1%, and editorial 12.8%. In total, 67% of the candidate pool for commercial queries is content the brand does not own.

Here is the overall source-type mix:

| Source type | Share of all results | What it is |
| --- | --- | --- |
| Review / comparison aggregator | 50.2% | Sites whose product is ranked roundups and reviews |
| Brand-owned | 31.1% | The vendor's own domain |
| Editorial | 12.8% | News and magazine coverage |
| Community | 2.2% | Forums, Reddit, Quora |
| Reference | 1.8% | Wikipedia, .gov, standards bodies |
| Marketplace | 1.8% | Amazon, Walmart, retail listings |

The single most important number here is that brand-owned content is less than a third of the field. When a buyer asks an AI search engine which product to choose, the pool of pages it has to work with is dominated by independent reviewers and comparison sites. Your own site is a minority voice — present, but outnumbered roughly two to one.
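The arithmetic behind these shares can be checked directly from the table. The article does not state which source types are summed into the 67% figure, so the grouping below is an inference: adding review, editorial, community, and reference reproduces 67.0% exactly, while also adding marketplace gives 68.8%. Since reference and marketplace are both 1.8%, leaving out either one fits equally well; excluding marketplace is the assumption made here.

```python
# Shares from the source-type table, stored in tenths of a percent so the
# sums stay exact (no floating-point noise).
MIX = {
    "review": 502,
    "brand": 311,
    "editorial": 128,
    "community": 22,
    "reference": 18,
    "marketplace": 18,
}


def pct(tenths: int) -> float:
    """Convert tenths of a percent back to a percentage."""
    return tenths / 10


total = sum(MIX.values())
non_brand = total - MIX["brand"]
# Assumption: the headline third-party figure leaves out marketplace listings.
without_marketplace = non_brand - MIX["marketplace"]

print(pct(total))                # 99.9 (the table rows are rounded)
print(pct(non_brand))            # 68.8
print(pct(without_marketplace))  # 67.0
print(round(without_marketplace / MIX["brand"], 2))  # 2.15, "roughly two to one"
```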

This matters because of how AI answers get built. When an AI search engine synthesizes "the best CRM for a small team," it is not reading one page — it is reading a handful, weighting them, and producing a recommendation. If five of the six pages in its candidate set are third-party roundups and one is your product page, the shape of the answer is set by the third-party pages. You can have the most extractable, best-structured product page on the internet and still lose the answer because the consensus around you was written by other people.

This is the gap between citation absorption and citation selection: whether your content informs the answer versus whether it gets picked as a named source. A third-party-dominated field means the selection contest is largely happening on pages you do not control. We explore that mechanism in depth in our article on citation selection vs. absorption.

Brand-owned content is only 31.1% of the commercial retrieval landscape. A GEO program that only optimizes the brand's own domain is optimizing one-third of the field and ignoring the other two-thirds. The third-party content — review sites, comparison roundups, editorial coverage — is where the buyer's decision is being framed.

The 67% figure and the "84-94% earned" figure are not the same measurement

You may have seen the widely-cited statistic that 84-94% of AI citations are "earned" — that is, point to third-party rather than brand-owned sources. Our 67% third-party figure points in the same direction, but it is not the same measurement, and we will not pretend otherwise.

The 84-94% figures come from studies that examined citation links inside actual AI answers. Our 67% comes from classifying the organic retrieval landscape — the candidate pool, not the final citation set. The two methodologies measure adjacent things: ours is "what AI search engines have available to cite," theirs is "what AI search engines chose to cite." They corroborate each other directionally — both say third-party content dominates AI search visibility — but they are different lenses on the problem, and the gap between 67% and 84-94% is itself informative: AI answers appear to lean even more third-party than the raw retrieval pool, likely because brand pages get retrieved but lose the selection contest. Treat our number as independent corroboration of the direction, not as a restatement of the same finding.

The ranked roundup is the default format

For 53% of the 300 queries, the top three organic results were led by a listicle — a ranked "best X" roundup — making it the single most common content format in the commercial retrieval landscape.

Format matters as much as source type. We recorded the dominant format of the top three results for every query: listicle (ranked roundup), comparison (head-to-head), guide (how-to-choose), product (a vendor's own page), or forum/community.

The listicle won. For more than half of all commercial queries, the AI search engine's most prominent candidates are ranked roundups. The pattern is even stronger for the highest-volume query type — "best [category] 2026" — which was listicle-led in nearly every vertical we tested.

This has a direct, slightly uncomfortable consequence for brands. The format that dominates AI's view of your category is one you fundamentally cannot publish credibly about yourself. A ranked roundup of "the 7 best project management tools" is only persuasive when a third party writes it; your own "why we're #1" page does not occupy the same slot in the AI's candidate set. So the highest-leverage GEO move in a listicle-dominated category is not writing a better page — it is getting placed, accurately and favorably, inside the roundups that already rank.

The remaining formats split roughly between comparison pages (especially for two-horse-race categories like Salesforce vs. HubSpot or Roomba vs. Roborock), how-to-choose guides, and — in a minority of categories — vendor product pages appearing directly in the top three. Forum-led top-threes were rare.
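The study does not say how the "dominant format" was decided when the top three results disagree. One plausible rule, assumed here purely for illustration, is to take the most common format among the three and break ties in favor of the higher-ranked result. The query names and format labels in the sample are hypothetical.

```python
from collections import Counter


def dominant_format(top_three: list) -> str:
    """Most common format among the top results; ties go to the higher rank."""
    if not 1 <= len(top_three) <= 3:
        raise ValueError("expected between one and three format labels")
    counts = Counter(top_three)
    best = max(counts.values())
    # top_three is in rank order, so the first label reaching the max wins ties.
    return next(fmt for fmt in top_three if counts[fmt] == best)


def listicle_led_share(queries: dict) -> float:
    """Fraction of queries whose top three are listicle-led."""
    led = sum(1 for fmts in queries.values() if dominant_format(fmts) == "listicle")
    return led / len(queries)


# Hypothetical sample: query -> formats of the top three results, in rank order.
sample = {
    "best crm 2026": ["listicle", "listicle", "product"],
    "salesforce vs hubspot": ["comparison", "comparison", "listicle"],
    "how to choose a crm": ["guide", "listicle", "listicle"],
    "cheapest crm": ["product", "listicle", "guide"],
}

print(listicle_led_share(sample))  # 0.5
```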

The ranked roundup ("best X for Y") is the dominant format in 53% of commercial queries — and it is a format brands cannot author about themselves. The practical takeaway: a category-by-category audit of which listicles rank for your buyer queries, and a deliberate effort to be included and accurately represented in them, is core GEO work, not PR garnish.

Some verticals are review-dominated. Others are still brand-held.

The third-party share is not uniform. It ranges from 84-85% in consumer categories like robot vacuums, air purifiers, and VPN services down to 47-51% in B2B categories like business checking accounts and HR/payroll software. Where your vertical sits on that spectrum should change your GEO strategy.

Here is the per-vertical breakdown for a representative set of 15 verticals, sorted by review-site dominance:

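| Vertical | Review % | Brand % | Third-party % | Listicle-led top-3 |
| --- | --- | --- | --- | --- |
| Robot vacuums | 70% | 12% | 84% | 60% |
| Credit cards | 70% | 19% | 81% | 60% |
| VPN services | 68% | 13% | 85% | 60% |
| Air purifiers | 67% | 8% | 84% | 60% |
| Running shoes | 66% | 19% | 81% | 60% |
| Mattresses | 63% | 18% | 81% | 60% |
| Website builders | 57% | 30% | 69% | 60% |
| Online banks | 57% | 23% | 77% | 50% |
| Password managers | 54% | 30% | 70% | 60% |
| Email marketing platforms | 51% | 39% | 61% | 60% |
| Business checking accounts | 38% | 53% | 47% | 60% |
| CRM software | 35% | 45% | 55% | 40% |
| AI writing tools | 35% | 48% | 52% | 60% |
| Accounting software | 33% | 45% | 55% | 50% |
| HR / payroll software | 33% | 49% | 51% | 40% |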

Two clear patterns emerge.

Physical products and consumer financial products are review-dominated. In robot vacuums (70% review), credit cards (70%), VPNs (68%), and air purifiers (67%), independent reviewers own the conversation. RTINGS, Consumer Reports, NerdWallet, and category-specialist sites like HouseFresh and VacuumWars are the candidate pool. Brand-owned pages account for only 12% of robot-vacuum results. For these verticals, on-site GEO is necessary hygiene but nowhere near sufficient: your visibility is decided on reviewer pages.

B2B categories, most of them software, are the last redoubt of brand-owned content. Business checking accounts (53% brand-owned), HR/payroll software (49%), AI writing tools (48%), and CRM and accounting software (45% each) are the only categories in our dataset where brand-owned content rivals or exceeds review content. Buyers in these categories still land on vendor domains directly, and AI search engines still retrieve them. That is a genuine, time-limited advantage: B2B SaaS brands can still win meaningful AI search visibility with excellent on-site content. But the trend line is unfriendly. Review platforms like G2 and Capterra appeared steadily across our software queries, and the consumer categories show where this ends up once aggregators fully mature in a vertical.

The strategic read is simple. Find your vertical on this spectrum. If you are review-dominated, your GEO budget belongs disproportionately off-site. If you are still brand-held, you have a window — use it to build on-site authority now, before the aggregators arrive. Either way, the right GEO content strategy depends on which side of this line you are on.
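As a rough way to place a vertical on that spectrum, the sketch below applies two cut-offs to figures copied from the per-vertical table. Both thresholds are assumptions rather than rules from the study: 80% third-party echoes the article's own "80%+" shorthand for review-dominated verticals, and 45% brand share is simply the lowest value among the verticals described here as brand-held. The posture labels are illustrative.

```python
# (review %, brand %, third-party %) for a few verticals from the table.
VERTICALS = {
    "Robot vacuums": (70, 12, 84),
    "VPN services": (68, 13, 85),
    "Email marketing platforms": (51, 39, 61),
    "CRM software": (35, 45, 55),
    "Business checking accounts": (38, 53, 47),
}

OFF_SITE_THRESHOLD = 80    # assumption: "80%+ third-party" means review-dominated
BRAND_HELD_THRESHOLD = 45  # assumption: lowest brand share among brand-held verticals


def posture(third_party_pct: int, brand_pct: int) -> str:
    """Suggest where the bulk of GEO effort should go for a vertical."""
    if third_party_pct >= OFF_SITE_THRESHOLD:
        return "off-site heavy"
    if brand_pct >= BRAND_HELD_THRESHOLD:
        return "on-site window"
    return "mixed"


for name, (_review, brand, third_party) in VERTICALS.items():
    print(f"{name}: {posture(third_party, brand)}")
```

Verticals that fall between the two cut-offs, such as email marketing platforms, come out as "mixed", which is a reminder that the spectrum is continuous and the split into two camps is a simplification.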

Third-party dominance ranges from 84-85% (robot vacuums, VPNs, air purifiers) to 47-51% (business checking, HR software). Review-dominated verticals require an off-site-heavy GEO program. Brand-held verticals — mostly B2B SaaS — still reward on-site investment, but that window is closing as review aggregators expand into software.

A handful of domains own the commercial web

A small set of high-authority domains appears again and again across the commercial retrieval landscape. NerdWallet alone showed up in 58 of 300 queries; CNBC, Zapier, Consumer Reports, U.S. News, RTINGS, TechRadar, and Bankrate follow, each at less than half that count. This concentration tells you exactly which third-party properties matter for AI search visibility.

The most frequently recurring domains across all 300 queries:

| Domain | Queries it appeared in | Type |
| --- | --- | --- |
| nerdwallet.com | 58 | Review / comparison |
| cnbc.com | 27 | Editorial |
| zapier.com | 25 | Brand-owned (with strong editorial) |
| consumerreports.org | 25 | Review |
| usnews.com | 20 | Review / editorial |
| rtings.com | 20 | Review |
| techradar.com | 19 | Review / editorial |
| bankrate.com | 19 | Review |

A few things worth noting. First, the concentration is real but not absolute — NerdWallet's 58 appearances out of 300 queries means it shows up in roughly one in five, dominant within finance verticals but absent elsewhere. The landscape has category-specific kingmakers, not one universal authority.

Second, Zapier is the instructive outlier. It is a software brand, not a review site — but it has built a content library so broad and so well-structured that its own domain functions like an editorial publication and gets retrieved across dozens of software queries. Zapier is the existence proof for the brand-owned strategy: a vendor can earn AI search visibility at scale on its own domain. But notice what it took — years of comprehensive, genuinely useful, answer-first content, not a handful of optimized product pages. That is the bar.

Third, the recurring domains are your off-site target list. If you sell a financial product, NerdWallet, Bankrate, and U.S. News are not optional — they are the candidate pool. If you sell hardware, RTINGS and Consumer Reports are. Building entity authority for AI search means, in large part, being present and accurately represented on the specific domains that AI search engines retrieve most often for your category.
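To build the same kind of target list for your own category, you can count how many of your buyer queries each domain appears in. The sketch below assumes you have already recorded the result URLs per query by hand or from a search tool of your choice; the sample URLs and the `query_appearances` helper are hypothetical. A domain counts once per query, however many of its pages rank for that query.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Lower-cased hostname with a leading 'www.' removed."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def query_appearances(results: dict) -> Counter:
    """Count the number of queries in which each domain appears at least once."""
    counts = Counter()
    for urls in results.values():
        # A set collapses repeat pages from one domain within a single query.
        counts.update({domain_of(u) for u in urls})
    return counts


# Hypothetical recorded results: query -> result URLs.
results = {
    "best credit card 2026": [
        "https://www.nerdwallet.com/a",
        "https://www.nerdwallet.com/b",
        "https://www.cnbc.com/select/x",
    ],
    "best online bank": [
        "https://www.nerdwallet.com/c",
        "https://www.bankrate.com/d",
    ],
}

for domain, n in query_appearances(results).most_common():
    print(domain, n)
```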

What this means for your GEO strategy

The structure of the retrievable web points to a three-part GEO strategy: win the listicles, earn the recurring domains, and treat on-site content as your foundation rather than your whole program. Here is how to apply the findings.

Audit the candidate pool, not just your own page

Before optimizing anything, run your top buyer queries through AI search and through plain organic search, and write down what comes back. Which review sites rank? Which listicles? Which competitor pages? That candidate pool is the field you are actually competing on. A GEO program that starts and ends with "improve our pages" is, per this data, addressing one-third of the problem.

In review-dominated verticals, shift budget off-site

If your vertical looks like robot vacuums or VPNs — 80%+ third-party — the highest-ROI GEO work is getting onto and accurately represented within the review and comparison pages that already rank. That means review-site relationships, accurate product data on aggregators, getting your product into hands-on roundups, and monitoring how those pages describe you. This is closer to digital PR than to traditional SEO, and that is the point.

In brand-held verticals, build the moat now

If you sell B2B software, you still have a window where your own domain can win AI search visibility. Use it. Build the comprehensive, answer-first, structured content library that makes your domain a retrieval target — the Zapier model. Do it before review aggregators saturate your category, because once they do, the cost of visibility goes up.

Get into the roundups, accurately

Because the listicle is the dominant format, being included in the ranked roundups for your category is non-negotiable. But inclusion is not enough — accuracy of inclusion is the real prize. An AI search engine reading a roundup that lists you with a wrong price, a missing feature, or a stale description will synthesize that error into its answer. Audit how the ranking listicles describe you and get the errors fixed.
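One way to make that audit repeatable is to keep a canonical fact sheet for your product and compare it against what each roundup says. The sketch below is a minimal illustration: the field names, the product facts, and the `find_discrepancies` helper are hypothetical, and extracting the listed facts from a roundup page is assumed to be done by hand.

```python
from typing import Dict, Optional, Tuple


def normalize(value: str) -> str:
    """Collapse whitespace and ignore case so trivial differences do not flag."""
    return " ".join(value.split()).casefold()


def find_discrepancies(
    canonical: Dict[str, str], listed: Dict[str, str]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return field -> (expected, found) for every wrong or missing fact."""
    issues = {}
    for field, expected in canonical.items():
        found = listed.get(field)
        if found is None or normalize(found) != normalize(expected):
            issues[field] = (expected, found)
    return issues


# Hypothetical fact sheet and the facts one roundup lists for the product.
canonical = {"price": "$29/month", "free_tier": "yes", "sso": "yes"}
roundup = {"price": "$25/month", "free_tier": "Yes"}

for field, (expected, found) in find_discrepancies(canonical, roundup).items():
    print(f"{field}: expected {expected!r}, roundup says {found!r}")
```

A missing field is reported the same way as a wrong one, because an omitted feature can distort a synthesized answer just as a stale price can.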

Remember that the candidate pool is not static

The retrievable web changes constantly: new roundups publish, rankings shift, review sites re-test. A page in the candidate pool today may be gone next quarter. This study is a snapshot of May 2026; the shape (third-party-dominated, listicle-led) is durable, but the specific pages are not. AI citations themselves rotate even faster. Because AI citations decay, GEO has to be run as a continuous program rather than a one-time project.

The strategy that follows from the data: audit the full candidate pool for your buyer queries; in review-dominated verticals, move budget off-site toward review and comparison placements; in brand-held verticals, build on-site authority while the window is open; and treat getting accurately included in ranked roundups as core GEO work, not optional PR.

Methodology & limitations

This study analyzed the organic web-search retrieval landscape for commercial queries using programmatic web search. It is a proxy for what AI search engines can cite — not a direct log of what any specific AI engine actually cited. We are spelling out the limitations in full because overclaiming would damage the credibility of every other number in this article.

What we did. We ran 300 commercial-intent buyer queries (10 each across 30 verticals) through programmatic web search in May 2026. For each query we classified the top organic results into six source types and recorded the dominant format of the top three. In total, 2,261 results were classified. The verticals span physical products, B2B software, and considered-purchase services to avoid skewing toward any one category. Classification was based on domain identity — a site's primary business — so RTINGS and Consumer Reports are "review," a vendor's own domain is "brand," Reddit and Quora are "community."

Limitation 1: This is a proxy, not direct citation logging. The cleanest way to study AI citations would be to log what ChatGPT, Perplexity, and Gemini actually cite at scale. We did not do that. We studied the organic retrieval landscape — the candidate pool AI search engines draw from — because that pool is what determines what an AI can cite. Web-search results approximate but are not identical to an LLM's retrieval and citation behavior. AI engines apply their own re-ranking, recency weighting, and selection logic on top of the organic pool, and some run on Bing's index rather than Google's. Read every number here as "the retrievable web for commercial queries," not "AI engine X cited this."

Limitation 2: The 0% Reddit figure is almost certainly an artifact — do not over-read it. Across all 300 queries, Reddit appeared in the top 10 organic results for 0% of them. We are reporting that honestly, but we are also telling you plainly not to conclude "Reddit does not matter for AI search." It is very likely an artifact of programmatic web search returning a curated, de-duplicated result set that under-represents forum content. Reddit is well documented to carry significant weight inside LLM training data and inside the live retrieval of several AI engines — more weight than this proxy shows. Community content (Reddit, Quora, forums) was only 2.2% of our classified results overall, and that figure should be read as a floor, not a true estimate of community influence on AI answers.

Limitation 3: Classification involves judgment. Source-type classification is rules-based but not perfectly objective at the margins — a site like Zapier blends a brand domain with a genuine editorial operation, and CNBC blends news with review content. We classified by primary business identity and applied it consistently, but a different rubric would shift the percentages by a few points. Treat the figures as accurate to within a few percentage points, and treat the direction — third-party dominance, listicle prevalence — as the robust finding.

Limitation 4: Snapshot in time, US/English commercial intent. This is a May 2026 snapshot of US-English commercial buyer queries. The retrievable web changes; rankings rotate. The structural findings (third-party-dominated, listicle-led, brand-held in B2B software) are durable patterns, but the specific domains and exact percentages will drift.

In short: trust the shape of the landscape this study reveals, and the strategic conclusions that follow from it. Do not treat any single percentage as a precise, AI-engine-specific citation rate — that is not what this method measures, and pretending otherwise would be dishonest.

Frequently asked questions

Does this study log actual ChatGPT or Perplexity citations?

No. This study analyzed the organic web-search retrieval landscape — the candidate pool AI search engines draw from for commercial queries — using programmatic web search across 300 queries. It is a proxy for what AI search engines can cite, not a direct log of what any specific engine actually cited. AI engines apply their own re-ranking and selection logic on top of the organic pool, so the final citation set differs from the retrieval landscape. Read the methodology section for the full set of limitations.

What does the 67% third-party figure actually mean?

It means that across 2,261 classified organic results for commercial buyer queries, 67% came from sources the brand does not own — primarily review and comparison aggregators (50.2%) and editorial coverage (12.8%) — while only 31.1% came from brand-owned domains. The practical implication is that AI search engines building an answer about which product to buy are working from a candidate pool that is two-thirds independent content. Your own pages are a minority of the field.

Is the 67% figure the same as the "84-94% of AI citations are earned" statistic?

No, and we are careful to distinguish them. The 84-94% figures come from studies examining citation links inside actual AI answers. Our 67% comes from classifying the organic retrieval landscape — the candidate pool, not the final citations. The two corroborate each other directionally — both show third-party content dominates AI search visibility — but they are different measurements. Our number is independent corroboration of the direction, not a restatement of the same finding.

Why did Reddit appear in 0% of queries — does Reddit not matter for AI?

Do not conclude that. The 0% figure is very likely an artifact of programmatic web search returning a curated, de-duplicated result set that under-represents forum content. Reddit is well documented to carry significant weight inside LLM training data and inside the live retrieval of several AI search engines — more than this proxy shows. The 0% should be read as a limitation of the method, not as evidence that Reddit is irrelevant to AI search visibility.

Which verticals are easiest for a brand to win on its own domain?

B2B categories, most of them software. In our dataset, business checking accounts (53% brand-owned), HR/payroll software (49%), AI writing tools (48%), CRM software (45%), and accounting software (45%) were the only verticals where brand-owned content rivaled or exceeded review content. Buyers in these categories still land on vendor domains directly. Review-dominated consumer categories such as robot vacuums, VPNs, air purifiers, and credit cards are far harder to win on-site alone.

What is the most common content format AI search engines retrieve for commercial queries?

The ranked roundup, or listicle — a "best X for Y" article ranking multiple products. It was the dominant top-three format for 53% of the 300 queries, and even more prevalent for high-volume "best [category] 2026" queries. This matters because a listicle is a format brands cannot credibly author about themselves, which makes being accurately included in third-party roundups a core piece of GEO work.

Which third-party domains should I prioritize for off-site GEO?

Start with the domains that recur most across commercial queries: NerdWallet (appeared in 58 of 300 queries), CNBC, Consumer Reports, U.S. News, RTINGS, TechRadar, and Bankrate. The right targets are category-specific — NerdWallet and Bankrate for financial products, RTINGS and Consumer Reports for hardware, G2 and Capterra for software. Audit which review and comparison sites rank for your specific buyer queries and prioritize being present and accurately represented on those.

How should this change my GEO budget?

If your vertical is review-dominated (physical products, consumer finance), shift budget off-site: review-site relationships, accurate product data on aggregators, and inclusion in ranked roundups. If your vertical is still brand-held (B2B software), invest in building a comprehensive, answer-first on-site content library while that window is open. In both cases, on-site optimization remains the foundation — it is just not, on its own, the whole program. Treat your own domain as one-third of the field, because that is what the data shows it is.
