
The AI Citation Index: Who AI Search Actually Surfaces Across 30 Verticals (2026)

10 min read · LumenGEO Research
Tags: AI citations, original research, citation analysis, GEO data, AI search, retrieval landscape, aggregators

We ran 150 commercial buyer queries across 30 verticals, sampled each one three times, and recorded every domain that surfaced: 3,684 result rows and 763 unique domains in the candidate pool AI search engines draw from. The headline finding: the most-surfaced single source across the whole set is not a vendor and not a news outlet. It is Zapier's blog, which showed up in 10 of 30 verticals. Third-party content (review sites, comparison aggregators, editorial roundups) is the connective tissue of the retrievable web: 64.2% of the surfaced URLs are "best/top X" roundups, and a brand's own domain is only one lever among several. This is the AI Citation Index.

Most GEO advice tells you to optimize your own pages. That is not wrong, but it answers the smaller question. The bigger question is: when someone asks an AI search engine which product to buy, what pages does it actually have to choose from? That pool, the retrieval candidate set, is where the answer gets shaped. So we mapped it.

This is the flagship read of the AI Citation Index: who AI search retrieves across 30 commercial verticals, which sources span the most categories, and what the structure of that pool means for how you spend GEO effort. Read the measurement note first, because it changes how you should quote every number below.

Across 30 verticals and 150 buyer queries (sampled three times each for 450 samples, 3,684 result rows), the most-surfaced single source was Zapier's blog, present in 10 of 30 verticals. Editorial and aggregator sources span the most categories. A brand's own domain is a minority of the field, which means GEO for commercial queries is largely an off-domain game.

What this index measures (read this first)

This is a multi-sample web-search retrieval landscape: the candidate pool AI search engines draw from. It is not a direct log of what ChatGPT, Perplexity, or Gemini cited. We are stating that up front because it governs how every figure here should be read.

AI search engines answer buyer questions by retrieving a set of web pages, reading them, and synthesising an answer with citations. The pages they retrieve come overwhelmingly from the organic search index. So if you want to understand what AI search can surface for a query, you study what web search returns for that query, repeatedly, and look at the shape. That is what the index does.

We selected 30 commercial verticals, deliberately mixing physical products (mattresses, running shoes, electric cars), B2B software (CRM, project management, accounting), and considered-purchase services (life insurance, business bank accounts, smart-home security). For each vertical we ran 5 high-intent buyer queries ("best [category]", "[Brand A] vs [Brand B]", segment variants like "best [category] for small business"). That is 150 queries. Then we pulled each query three times to separate the stable core of the pool from the sample-dependent noise, 450 samples in total, yielding 3,684 classified result rows and 763 unique domains.
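The sampling arithmetic can be sketched in a few lines of Python. This is an illustration only, not the study's actual harness: the two templates are the ones quoted above, the three verticals are a small illustrative subset of the 30, and `build_sampling_plan` is a hypothetical helper name.

```python
from itertools import product

# Query templates quoted in the article. The vertical list is a
# three-item illustrative subset, not the full set of 30.
TEMPLATES = [
    "best {category}",
    "best {category} for small business",
]
VERTICALS = ["CRM", "project management", "accounting software"]
SAMPLES_PER_QUERY = 3  # each query is pulled three times


def build_sampling_plan(verticals, templates, samples_per_query):
    """Return one (vertical, query, sample_index) tuple per planned pull."""
    plan = []
    for vertical, template in product(verticals, templates):
        query = template.format(category=vertical)
        for sample_index in range(samples_per_query):
            plan.append((vertical, query, sample_index))
    return plan


plan = build_sampling_plan(VERTICALS, TEMPLATES, SAMPLES_PER_QUERY)
print(len(plan))  # 18: 3 verticals x 2 templates x 3 samples

# The study's reported totals follow the same arithmetic.
total_queries = 30 * 5             # 150 queries
total_samples = total_queries * 3  # 450 samples
print(total_queries, total_samples)
print(round(3684 / total_samples, 1))  # 8.2 classified rows per sample on average
```

On the reported totals, that works out to roughly eight classified result rows per sample.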

We sorted every surfaced domain into three types:

  • Brand: the vendor's or manufacturer's own domain.
  • Aggregator: sites whose business is ranking and reviewing products (G2, NerdWallet, Capterra, category specialists like passwordmanager.com).
  • Editorial: publications that cover categories but are not primarily a review business (TechRadar, CNBC, Zapier's blog).
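A rules-based classifier along these lines might look like the sketch below. The lookup tables contain only the example domains named in the list above. The `classify` and `normalize` names, and the fallback to "brand" for anything unlisted, are assumptions made for illustration rather than the rubric the index actually used.

```python
# Hypothetical lookup tables, seeded with the examples named in the article.
AGGREGATORS = {"g2.com", "nerdwallet.com", "capterra.com", "passwordmanager.com"}
EDITORIAL = {"techradar.com", "cnbc.com", "zapier.com"}


def normalize(domain: str) -> str:
    """Lowercase a hostname and drop a leading 'www.'."""
    domain = domain.strip().lower()
    return domain[4:] if domain.startswith("www.") else domain


def classify(domain: str) -> str:
    """Return 'aggregator', 'editorial', or 'brand' for a hostname.

    Treating every unlisted domain as 'brand' is an assumption of this
    sketch; borderline cases would need manual review.
    """
    d = normalize(domain)
    if d in AGGREGATORS:
        return "aggregator"
    if d in EDITORIAL:
        return "editorial"
    return "brand"


for host in ["www.G2.com", "zapier.com", "hubspot.com"]:
    print(host, "->", classify(host))
```

A lookup table keeps each judgment call visible and easy to revisit, which matters for borderline cases such as Zapier's blog.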

What this gives you is a reproducible picture of the commercial retrieval landscape. What it does not give you is a direct measurement of which page a specific LLM cited. AI engines apply their own re-ranking, recency weighting, and selection on top of this pool, and some run on Bing's index rather than Google's. So read every number here as "what AI search retrieves for this query", a lower-bound proxy, never "Perplexity cited this." And five queries per vertical is a snapshot, not a census, so no universal claims follow from any single vertical.

A few sources are the connective tissue of the whole web

The same handful of domains surface again and again across unrelated categories. Zapier's blog appeared in 10 of 30 verticals; TechRadar in 9; G2 in 8; NerdWallet in 6 (with 66 appearances and a 3.05 mean position, the strongest finance presence in the set). These are the cross-vertical authorities, the sources AI search reaches for no matter the category.

Here are the top 12 sources by how many verticals they surfaced across:

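| Source | Type | Verticals | Appearances | Mean position |
|---|---|---|---|---|
| zapier.com | Editorial | 10 | 55 | 4.93 |
| techradar.com | Editorial | 9 | 40 | 4.65 |
| g2.com | Aggregator | 8 | 40 | 4.67 |
| nerdwallet.com | Aggregator | 6 | 66 | 3.05 |
| cnbc.com | Editorial | 6 | 45 | 4.33 |
| youtube.com | Editorial | 6 | 22 | 4.91 |
| cybernews.com | Editorial | 5 | 25 | 5.36 |
| hubspot.com | Brand | 5 | 21 | 4.71 |
| tomsguide.com | Editorial | 5 | 16 | 4.75 |
| uschamber.com | Editorial | 5 | 17 | 5.24 |
| research.com | Aggregator | 5 | 16 | 6.56 |
| security.org | Aggregator | 4 | 49 | 2.8 |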
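The three metrics reported per source (verticals spanned, appearances, mean position) can be derived from raw result rows as in this minimal sketch. The rows are invented toy data, `summarize` is a hypothetical helper, and breaking ties by appearance count is an assumption rather than the index's stated ordering.

```python
from collections import defaultdict

# Toy result rows: (vertical, domain, position). All values are invented.
rows = [
    ("crm", "zapier.com", 4),
    ("crm", "hubspot.com", 1),
    ("crm", "zapier.com", 6),
    ("email marketing", "zapier.com", 5),
    ("email marketing", "hubspot.com", 3),
    ("vpn", "techradar.com", 2),
]


def summarize(rows):
    """Aggregate rows into per-domain reach, appearances, and mean position."""
    verticals = defaultdict(set)
    positions = defaultdict(list)
    for vertical, domain, position in rows:
        verticals[domain].add(vertical)
        positions[domain].append(position)

    table = []
    for domain, pos in positions.items():
        table.append({
            "domain": domain,
            "verticals": len(verticals[domain]),
            "appearances": len(pos),
            "mean_position": round(sum(pos) / len(pos), 2),
        })
    # Widest cross-vertical reach first; ties broken by appearances (assumed).
    table.sort(key=lambda r: (-r["verticals"], -r["appearances"]))
    return table


for row in summarize(rows):
    print(row)
```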

A few things worth noting.

Zapier is the instructive outlier. It is a software brand, not a review site, but its blog functions like an editorial publication, and it surfaces across 10 software verticals (cloud storage, CRM, email marketing, HR, project management, time tracking, video conferencing, website builders, and more). Zapier is the existence proof for the brand-owned play: a vendor can earn cross-category visibility on its own domain. But notice what it took: years of comprehensive, answer-first content, not a handful of optimised product pages. That is the bar, and it is why we class its blog as editorial rather than brand-marketing.

The reach belongs to editorial and aggregators, not vendors. Of the 12 sources spanning the most verticals, only one (HubSpot) is a product brand, and it appears in five. The breadth is owned by publications and directories: the sources AI search treats as general-purpose authorities. NerdWallet (66 appearances, mean position 3.05) and security.org (49 appearances, mean position 2.8) are not optional in finance and security; they are most of the candidate pool.

The recurring domains are your off-site target list. If you sell a financial product, NerdWallet and CNBC are the pool. If you sell security software, security.org and cybernews.com are. Getting present and accurately represented on the specific domains AI search retrieves most often for your category is, in large part, what GEO looks like off your own site.

→ So what

Reach across the commercial web is concentrated in editorial publications and aggregators, not in vendor domains. A GEO program aimed only at the brand's own pages is competing for the part of the field with the least cross-category reach. The bigger lever is getting into the roundups and directories that AI search retrieves over and over.

The candidate pool is majority non-brand, and overwhelmingly roundups

Counting every unique domain in the pool, 48.6% were brand-owned, 30.4% aggregators, and 21.0% editorial. But brand domains are long-tail: weight by how often each one actually surfaces and brand drops to 41.6%, with aggregator climbing to 36.8% and editorial at 21.6%. Either way, more than half the field is third-party, and 64.2% of all surfaced URLs are "best/top X" roundup-format pages.

Here is the global source-type split, by unique domains and by how often they appear:

| Source type | Share of unique domains | Share of appearances | What it is |
|---|---|---|---|
| Brand | 48.6% | 41.6% | The vendor's own domain |
| Aggregator | 30.4% | 36.8% | Review and comparison directories |
| Editorial | 21.0% | 21.6% | Publications covering the category |

The gap between the two columns is the real story. There are more unique brand domains in the pool than anything else, which is no surprise: every vendor has a site. But each brand domain tends to surface for its own narrow slice and then disappear. Aggregators do the opposite: fewer unique domains, but each one shows up across far more queries, which is why aggregator share rises from 30.4% to 36.8% once you weight by appearances. The aggregators are the connective tissue.
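The difference between the two weightings is easy to reproduce on toy data. In the sketch below every domain and count is invented, the `.example` hostnames are placeholders, and each domain is assumed to carry exactly one source type.

```python
from collections import Counter

# Toy data: one (domain, source_type) entry per appearance. Invented values.
appearances = [
    ("vendor-a.example", "brand"),
    ("vendor-b.example", "brand"),
    ("vendor-c.example", "brand"),
    ("reviews.example", "aggregator"),
    ("reviews.example", "aggregator"),
    ("reviews.example", "aggregator"),
    ("news.example", "editorial"),
    ("news.example", "editorial"),
]


def share(counter):
    """Convert raw counts into percentage shares, rounded to one decimal."""
    total = sum(counter.values())
    return {key: round(100 * count / total, 1) for key, count in counter.items()}


# Each unique domain counts once, however often it surfaces.
by_domain = Counter(source_type for _, source_type in set(appearances))
# Each appearance counts once, so frequently surfaced domains weigh more.
by_appearance = Counter(source_type for _, source_type in appearances)

print(share(by_domain))      # brand 60.0, aggregator 20.0, editorial 20.0
print(share(by_appearance))  # brand 37.5, aggregator 37.5, editorial 25.0
```

Three brand domains that each surface once are outweighed by a single aggregator that surfaces three times, which is the same mechanism behind the shift from 30.4% to 36.8% in the real data.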

And the dominant format is unmistakable: 64.2% of surfaced URLs are roundup-shaped, a ranked "best/top X for Y" list. That carries a slightly uncomfortable consequence: the format that dominates AI's view of your category is one you cannot publish credibly about yourself. A ranked roundup of the seven best project-management tools is only persuasive when a third party writes it; your own "why we are number one" page does not occupy that slot. So in a roundup-dominated pool, the highest-leverage move is not writing a better page; it is getting placed, accurately and favourably, inside the roundups that already surface.
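One rough way to estimate a roundup share for your own query set is a pattern match on page titles. The regular expression below is an assumed heuristic for "best/top X" titles, not the classification rule used for the index, and the sample titles are invented.

```python
import re

# Assumed heuristic: a title is roundup-shaped if it opens with an optional
# "the", an optional count, and then the word "best" or "top".
ROUNDUP = re.compile(r"^\s*(the\s+)?(\d+\s+)?(best|top)\b", re.IGNORECASE)


def is_roundup(title: str) -> bool:
    """Return True when a page title looks like a 'best/top X' roundup."""
    return ROUNDUP.search(title) is not None


def roundup_share(titles) -> float:
    """Percentage of titles that look like roundups, to one decimal place."""
    if not titles:
        return 0.0
    hits = sum(is_roundup(t) for t in titles)
    return round(100 * hits / len(titles), 1)


titles = [
    "The 7 Best Project Management Tools in 2026",
    "Best CRM for Small Business",
    "Top 10 VPNs Compared",
    "Acme CRM Pricing",
    "HubSpot vs Salesforce",
]
print(roundup_share(titles))  # 60.0
```

A title-only heuristic misses roundups with unconventional headlines, so treat the result as a floor, not a measurement.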

Brand domains are 48.6% of unique sources but only 41.6% of appearances, because each one surfaces narrowly. Aggregators are the inverse, rising to 36.8% of appearances on fewer domains. And 64.2% of all surfaced URLs are "best/top X" roundups, a format brands cannot author about themselves. Being included in those roundups, accurately, is core GEO work, not PR garnish.

Niche specialists win narrow verticals

Zoom out and editorial giants own the cross-vertical reach. Zoom in on a single category and the picture flips: tightly focused specialists, often sites built around one product class, take the top of the pool. A vertical-specialist domain that means nothing in 29 categories can be the single most-surfaced source in the 30th.

The pattern is clearest in the verticals where the five queries draw from a stable, overlapping pool (a high agreement between repeated samples), so the leaderboard is a genuine ranking rather than five disjoint result sets. A few examples:

  • Password managers are led by passwordmanager.com, a category specialist surfacing at a 1.83 mean position, ahead of the general-purpose tech press.
  • VPNs are led by thebestvpn.com (1.5 mean position), another single-purpose comparison site.
  • Smart-home security is led by safehome.org (22 appearances), a vertical-specialist directory.
  • Mattresses are led by naplab.com (15 appearances, 2.4 mean position), a hands-on testing site for exactly one product class.
  • Running shoes are led by runrepeat.com (2.75 mean position), again a category specialist.
  • Electric cars are led by edmunds.com (2.77 mean position), the automotive authority.

There are brand-led narrow verticals too, where the vendor's own content is strong enough to top its category. Business bank accounts are led by airwallex.com's blog (21 appearances, best position 1). Time-tracking software is the most brand-heavy vertical in the set: 80.6% of its surfaced domains are brand-owned, led by apploye.com. CRM is led by hubspot.com. These are the existence proofs that on-domain content can win a category outright, but notice they are categories where a vendor has invested in genuinely useful comparison content, not thin product pages.

The takeaway for a brand: there are two games, and which one you play depends on your vertical. If a category specialist owns your space, you compete to be ranked well inside that specialist's roundups. If your vertical is still brand-held, your own domain is a live lever; use it while the window is open.

→ So what

Cross-vertical reach and within-vertical leadership are different contests. Editorial giants span categories; niche specialists win single ones. Find the specialist that owns your vertical's pool, then work to be accurately and favourably ranked inside it, because that one site can be most of your category's candidate pool.

Not every vertical has a clean leaderboard

In some verticals the five queries pull from genuinely disjoint pools, so there is no single ranking to report. Our broad "saas" probe is the clearest case: its queries fan out across email tools, project management, and CRM-adjacent results that barely overlap, so a "top domain" would be misleading and we do not present one. Where repeated samples disagree, the honest read is the source-type mix, not a ranked list. Treat any leaderboard here as reliable only where the pulls agreed, and as directional given five queries per vertical. This is also why we pulled each query three times instead of once: the pool shifts between samples taken minutes apart, and a single check of "did I get surfaced" tells you very little.
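Agreement between repeated samples can be quantified in several ways. The sketch below uses mean pairwise Jaccard overlap of the surfaced domain sets, which is an assumed metric chosen for illustration; the article does not state which measure the index uses. The domain names are placeholders.

```python
from itertools import combinations


def jaccard(a, b) -> float:
    """Jaccard overlap of two domain collections (1.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_agreement(samples) -> float:
    """Average Jaccard overlap across every pair of repeated samples."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Three pulls of the same query; placeholder domains.
stable_pulls = [
    ["reviews.example", "news.example", "vendor-a.example"],
    ["reviews.example", "news.example", "vendor-b.example"],
    ["reviews.example", "news.example", "vendor-a.example"],
]
disjoint_pulls = [["a.example"], ["b.example"], ["c.example"]]

print(round(mean_pairwise_agreement(stable_pulls), 3))  # 0.667
print(mean_pairwise_agreement(disjoint_pulls))          # 0.0
```

A score near 1 suggests a leaderboard is meaningful; a score near 0 corresponds to the disjoint-pool case, where only the source-type mix is worth reporting.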

What this means for your GEO strategy

The structure of the index points to a layered strategy: own-domain content is one lever, but the larger game is getting into the roundups and aggregators that AI search retrieves across your category. Four moves follow directly from the data.

Audit the candidate pool, not just your own page. Run your top buyer queries through AI search and plain web search, more than once, and write down what comes back: which aggregators surface, which roundups, which competitor pages. That pool is the field you are actually competing on. A program that starts and ends with "improve our pages" is addressing the minority of it.
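A first pass at this audit can be as simple as tallying which domains recur across repeated pulls of the same query. The sketch assumes you have already collected the result URLs by whatever means you use; the URLs are invented placeholders, and `audit` and `domain_of` are hypothetical helper names.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Extract the hostname from a URL, dropping a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def audit(pulls):
    """Split surfaced domains into those seen in every pull and the rest.

    `pulls` is a list of URL lists, one list per repeated pull of a query.
    """
    seen_in = Counter()
    for urls in pulls:
        # Count each domain at most once per pull.
        for domain in {domain_of(u) for u in urls}:
            seen_in[domain] += 1
    stable = sorted(d for d, n in seen_in.items() if n == len(pulls))
    unstable = sorted(d for d, n in seen_in.items() if n < len(pulls))
    return stable, unstable


pulls = [
    ["https://www.reviews.example/best-crm", "https://vendor-a.example/pricing",
     "https://news.example/crm-roundup"],
    ["https://reviews.example/best-crm", "https://news.example/crm-roundup",
     "https://vendor-b.example/"],
    ["https://reviews.example/best-crm-small-business",
     "https://news.example/crm-roundup"],
]
stable, unstable = audit(pulls)
print("stable core:", stable)
print("sample-dependent:", unstable)
```

The stable core is the part of the pool worth targeting first; the sample-dependent remainder is the noise that a single check would mistake for signal.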

Find the specialist that owns your vertical. Single categories are often dominated by one or two specialist sites (passwordmanager.com, safehome.org, naplab.com, edmunds.com). Getting ranked well and described accurately inside that specialist's roundups is likely higher-leverage than any on-site change you can make.

Get into the roundups, accurately. Because 64.2% of surfaced URLs are roundups, inclusion in the ranked "best X" lists for your category is non-negotiable, and accuracy of inclusion is the real prize. AI search reading a roundup that lists you with a wrong price or a stale description will fold that error into its answer. Audit how the surfacing lists describe you, and fix what is wrong.

If your vertical is still brand-held, build the moat now. Time-tracking (80.6% brand domains), business bank accounts (airwallex.com leading), and CRM (hubspot.com leading) prove a vendor domain can still win a category. If you sell B2B software, that window is open: build the comprehensive, answer-first content library that makes your domain a retrieval target. Zapier is the model, and the bar.

And run it continuously. The pool is not static (roundups publish, rankings shift, the set even moves between identical pulls minutes apart), so this index is a snapshot. The shape (third-party-heavy, roundup-led, specialist-owned within verticals) is durable; the specific pages are not.

The strategy the index supports: audit the full candidate pool for your buyer queries; identify and get accurately ranked inside the specialist site that owns your vertical; treat inclusion in ranked roundups as core work because 64.2% of surfaced URLs are roundups; and, if you are still brand-held, build on-domain authority while the window is open. Own-domain is one lever. The roundups and aggregators are the bigger game.

Methodology and limitations

The AI Citation Index analyses a multi-sample web-search retrieval landscape, a proxy for what AI search engines can surface, not a direct log of what any engine actually cited. We spell out the limits in full because overclaiming would damage every number above.

What we did. We ran 150 commercial buyer queries (5 each across 30 verticals) through programmatic web search, sampling each query three times for 450 samples. That produced 3,684 classified result rows and 763 unique domains. Each domain was classed by primary business identity (brand, aggregator, or editorial), and we recorded position, appearance counts, cross-vertical reach, and the agreement between repeated samples per vertical.

It is a proxy, not citation logging. The cleanest way to study AI citations would be to log what ChatGPT, Perplexity, and Gemini cite at scale. We did not do that. We mapped the retrieval candidate pool because that pool determines what an AI can surface. Web-search results approximate but are not identical to an LLM's retrieval and citation behaviour; engines apply their own re-ranking and selection, and some use Bing's index. Read every figure as "the retrievable web for commercial queries," a lower bound, not "engine X cited this."

Five queries per vertical is a snapshot, not a census. Per-vertical leaderboards are reliable only where the three samples agreed; where they drew from disjoint pools (our broad "saas" probe is the clearest case) we report the source-type mix instead of a ranking. Treat any single-vertical figure as directional.

Classification involves judgment. Sorting by primary business identity is rules-based but not perfectly objective at the margins: Zapier's blog blends a brand domain with a genuine editorial operation, which is why we class it editorial. A different rubric would move the percentages by a few points. Trust the direction rather than the last decimal place of any figure. And this is a single capture of US-English commercial intent: the structural findings are durable, but the specific domains and exact shares will drift.

Find your category in the index

The full per-vertical breakdown is browsable. Each category page in the interactive AI Citation Index shows the specific domains AI search surfaces, the brand-versus-third-party split, the run-to-run stability, and the exact queries we sampled. Start with VPNs, CRM, mattresses, or the full index of 30 verticals to find the sites that own your category's retrieval pool.

Frequently asked questions

Does the AI Citation Index log actual ChatGPT or Perplexity citations?

No. The index maps a multi-sample web-search retrieval landscape, the candidate pool AI search engines draw from, across 150 commercial queries sampled three times each (450 samples, 3,684 result rows, 763 unique domains). It is a proxy for what AI search can surface, not a log of what any engine cited. Engines apply their own re-ranking and selection on top of this pool, so the final citation set differs from the retrieval landscape.

What is the single most-surfaced source across the 30 verticals?

Zapier's blog, which surfaced in 10 of the 30 verticals, the widest cross-category reach in the index. It is a software brand whose blog functions like an editorial publication. TechRadar (9 verticals) and G2 (8 verticals) follow, and NerdWallet has the strongest single-category presence in finance with 66 appearances at a 3.05 mean position.

How much of the candidate pool is third-party versus brand-owned?

By unique domains, brand-owned is 48.6%, aggregators 30.4%, and editorial 21.0%. But brand domains surface narrowly, so weighting by how often each domain appears drops brand to 41.6% and raises aggregators to 36.8% (editorial 21.6%). Either way, more than half the field is third-party, and the aggregators carry disproportionate weight relative to their domain count.

Why does the index focus so much on roundups?

Because 64.2% of all surfaced URLs are roundup-format "best/top X for Y" pages, the dominant content shape in the retrieval pool. That matters because a roundup is a format a brand cannot credibly author about itself, which makes being accurately included in third-party roundups a core piece of GEO work rather than optional PR.

Which verticals can a brand still win on its own domain?

Mostly B2B software with strong content investment. In this index, time-tracking software was 80.6% brand-owned domains, business bank accounts were led by airwallex.com's blog, and CRM was led by hubspot.com. Physical-product and consumer categories tend to be owned by niche specialists (passwordmanager.com, safehome.org, naplab.com, edmunds.com), which are far harder to win on-site alone.

How should this change my GEO budget?

Treat your own domain as one lever, not the whole program. Identify the specialist site that owns your vertical's pool and work to be ranked accurately inside it. Because 64.2% of surfaced URLs are roundups, prioritise inclusion in the ranked "best X" lists for your category. If your vertical is still brand-held, invest in a comprehensive on-domain library while that window is open, and run the whole thing continuously, because the pool shifts even between identical pulls.
