— Glossary

Every GEO Term Defined

The definitive reference for Generative Engine Optimization terminology. 35 terms covering AI citations, crawlers, scoring models, schema markup, and AI search optimization, with each definition written to be quotable by AI.

A

AI Citation

An AI citation is a reference to a specific brand, domain, or source that an AI search engine includes in its generated response to a user query.

When a user asks ChatGPT, Perplexity, or Google AI Overviews a question, the AI may reference specific websites, brands, or data sources in its answer. Each of these references is an AI citation. Unlike traditional search where users click through to websites, AI citations determine whether your brand gets mentioned at all in the AI-generated response. Research from the GEO paper (Georgia Tech, 2024) found that only 6.5% of unique domains in source documents actually receive inline citations in AI-generated answers, making citation a highly competitive signal.

AI Crawler

An AI crawler is an automated bot operated by an AI company that visits and indexes web pages to build the training data or real-time retrieval corpus used by large language models.

AI crawlers function similarly to traditional search engine crawlers like Googlebot, but they serve a different purpose: feeding content into AI systems for training or retrieval-augmented generation. The major AI crawlers include GPTBot (OpenAI), ChatGPT-User (OpenAI, for real-time browsing), PerplexityBot (Perplexity AI), and ClaudeBot (Anthropic). Google-Extended is a related robots.txt token rather than a separate crawler: it controls whether content fetched by Google's existing crawlers may be used for Gemini. Each crawler is addressed by its own user-agent token in robots.txt, so access can be allowed or blocked per bot, and blocking a crawler can keep your content out of the AI responses it feeds. Monitoring which AI crawlers visit your site is a foundational step in GEO.
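
As a minimal sketch of how per-crawler access could be checked, the Python snippet below parses a robots.txt policy with the standard library's urllib.robotparser and reports which AI crawler tokens may fetch a given URL. The policy text, the example.com URL, and the function name crawler_access are illustrative assumptions, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt policy, used only for illustration.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Allow: /

User-agent: *
Disallow: /private/
"""

# Tokens named in this glossary entry.
AI_CRAWLER_TOKENS = ["GPTBot", "ChatGPT-User", "PerplexityBot", "ClaudeBot", "Google-Extended"]


def crawler_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Return, for each AI crawler token, whether the policy lets it fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, url) for token in AI_CRAWLER_TOKENS}


access = crawler_access(ROBOTS_TXT, "https://example.com/glossary")
for token, allowed in access.items():
    print(f"{token}: {'allowed' if allowed else 'blocked'}")
```

With this policy, GPTBot is blocked from the whole site while crawlers without a dedicated rule fall back to the wildcard group.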

AI Overviews (Google)

AI Overviews is Google's feature that displays an AI-generated summary at the top of search results, synthesizing information from multiple web sources into a single narrative answer.

Launched as Search Generative Experience (SGE) in 2023 and rebranded to AI Overviews in 2024, this feature fundamentally changes Google search by placing an AI-generated response above all organic results. AI Overviews cite source URLs in footnotes and expandable cards, but they reduce click-through rates to the underlying pages by an estimated 18-64% depending on query type. Optimizing for AI Overviews requires structured content, definitive statements, and schema markup that helps Google's AI extract and attribute information correctly.

AI Search Engine

An AI search engine is a search platform that uses large language models to generate synthesized answers to user queries rather than returning a list of links.

AI search engines represent a fundamental shift from traditional search. Instead of presenting ten blue links, platforms like ChatGPT, Perplexity AI, Google Gemini, and Microsoft Copilot generate complete answers by retrieving relevant sources and synthesizing them into a coherent response. For brands, this means visibility depends not on ranking position but on whether the AI cites your content in its generated answer. Early data suggests that AI search is growing rapidly, with Perplexity alone processing over 100 million queries per month as of late 2024.

Answer Capsule

An answer capsule is a self-contained block of content on a web page that directly and completely answers a specific question in 40-60 words, formatted for easy extraction by AI systems.

Answer capsules are a core GEO content tactic. They are designed to be the exact snippet that an AI retrieves and cites when generating a response. An effective answer capsule uses declarative SVO (subject-verb-object) language, avoids hedging words like "might" or "could," and includes the target entity or brand name within the answer. The GEO research paper found that content with clear, extractable statements received significantly more citations than content that buried answers in long paragraphs.
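
The checks described above can be roughed out in a few lines of Python. This is only a sketch under stated assumptions: the function name check_answer_capsule is hypothetical, the hedging list contains just the two words named in the text, and the SVO requirement is not checked because it would need a real parser.

```python
import re

# Only the two hedging words named in the text; extend as needed.
HEDGING_WORDS = {"might", "could"}


def check_answer_capsule(text: str, entity: str) -> dict[str, bool]:
    """Check a candidate capsule for length, hedging words, and entity mention."""
    words = re.findall(r"[A-Za-z0-9'-]+", text)
    lowered = {word.lower() for word in words}
    return {
        "length_ok": 40 <= len(words) <= 60,          # 40-60 word target
        "no_hedging": lowered.isdisjoint(HEDGING_WORDS),
        "names_entity": entity.lower() in text.lower(),
    }


report = check_answer_capsule(
    "An answer capsule is a self-contained block of content on a web page "
    "that directly and completely answers a specific question.",
    entity="answer capsule",
)
print(report)
```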

Answer Engine Optimization (AEO)

Answer Engine Optimization (AEO) is the practice of structuring web content to be selected as a direct answer by search engines and AI systems, overlapping significantly with GEO but predating the generative AI era.

AEO originated in the era of Google Featured Snippets and voice search, where the goal was to have your content selected as the single "answer" to a query. With the rise of generative AI, AEO has evolved to encompass optimization for AI-generated responses. The key difference from traditional SEO is the focus on answer completeness and extractability rather than keyword density or backlink profiles. AEO and GEO share techniques like structured data, declarative headings, and concise answer formatting.

B

Brand Mention Seeding

Brand mention seeding is the GEO tactic of deliberately placing your brand name alongside target keywords in authoritative third-party sources to increase the likelihood that AI models associate your brand with those topics.

AI models build associations between entities based on co-occurrence patterns in their training data and retrieval corpus. Brand mention seeding works by ensuring your brand name appears in contexts that AI systems index: industry publications, Wikipedia references, academic citations, expert roundups, and high-authority directories. This is not link building for PageRank purposes — it is entity association building for AI knowledge graphs. The more frequently an AI encounters "[Your Brand]" in the context of "[Target Topic]," the more likely it is to cite your brand when generating answers about that topic.

C

ChatGPT-User (Crawler)

ChatGPT-User is the user-agent string used by OpenAI's real-time web browsing crawler, which fetches live web pages when ChatGPT users trigger a search during a conversation.

Unlike GPTBot, which crawls the web for training data, ChatGPT-User operates in real-time when a ChatGPT Plus or Enterprise user asks a question that requires current information. The crawler fetches and reads web pages on demand, then ChatGPT synthesizes the retrieved content into its response. This is the crawler that directly determines whether your content appears in ChatGPT's browsing-mode answers. Blocking ChatGPT-User via robots.txt prevents your content from being cited in real-time ChatGPT responses, even if your site was included in OpenAI's training data.

Citation Density

Citation density is the ratio of citations a specific domain receives relative to the total number of citations in an AI-generated response for a given query.

If an AI response includes 8 source citations and your domain accounts for 3 of them, your citation density for that query is 37.5%. Citation density matters because a single mention in a sea of citations carries less weight than being one of two or three cited sources. Higher citation density signals that the AI considers your content highly relevant and authoritative for the topic. Tracking citation density across multiple queries reveals which topics you dominate versus where competitors hold stronger positions.
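
The arithmetic in that example can be written as a small helper. The function name citation_density is an assumption made for illustration.

```python
def citation_density(domain_citations: int, total_citations: int) -> float:
    """Percentage of a response's citations that point to one domain."""
    if total_citations <= 0:
        return 0.0  # no citations in the response, so no share to report
    return 100.0 * domain_citations / total_citations


# The example from the text: 3 of 8 citations.
print(citation_density(3, 8))  # 37.5
```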

Citation Pipeline

A citation pipeline is the end-to-end process by which content on a website gets discovered, indexed, retrieved, and ultimately cited by an AI search engine in its generated response.

The citation pipeline has four stages: (1) Crawl: an AI crawler discovers and indexes your page; (2) Retrieve: the AI's retrieval system selects your page as relevant to a user query; (3) Evaluate: a reranking model scores your content against other retrieved sources; (4) Cite: the AI includes your brand or domain in its generated answer. Failure at any stage breaks the pipeline entirely. For example, blocking GPTBot stops the pipeline at Stage 1, while unstructured content may pass Stages 1-2 but fail at Stages 3-4. Understanding the pipeline helps diagnose why a site is not getting cited.
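
One way to use the four stages diagnostically is to record which ones a page is known to pass and report the first failure. The sketch below assumes you already have those pass/fail observations from your own checks; the Stage enum and first_failed_stage are hypothetical names.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    CRAWL = 1
    RETRIEVE = 2
    EVALUATE = 3
    CITE = 4


def first_failed_stage(passed: dict[Stage, bool]) -> Optional[Stage]:
    """Return the earliest stage that failed, or None if all four passed."""
    for stage in Stage:  # Enum iteration follows definition order
        if not passed.get(stage, False):
            return stage
    return None


# A page that is crawled and retrieved but loses out at reranking.
observed = {Stage.CRAWL: True, Stage.RETRIEVE: True, Stage.EVALUATE: False, Stage.CITE: False}
print(first_failed_stage(observed))
```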

Citation Presence

Citation presence is a binary signal that indicates whether a brand or domain appears at all in an AI-generated response for a specific query, scored as present (1) or absent (0).

Citation presence is the most fundamental GEO metric. Before measuring how prominently or how often you are cited, you first need to know if you are cited at all. In the LumenGEO scoring model, citation presence accounts for 50 of the total 100 GEO Score points because appearing in the response is a prerequisite for all other metrics. Tracking presence across a portfolio of target queries reveals your overall AI search coverage and helps identify gaps where competitors are being cited and you are not.

Citation Prominence

Citation prominence measures where within an AI-generated response a brand or domain is mentioned, with earlier and more emphasized positions carrying higher prominence scores.

Not all citation positions are equal. Research on the "lost in the middle" effect (Liu et al., 2023) demonstrated that information placed at the beginning or end of a context window receives disproportionate weight in AI-generated outputs. Citation prominence captures this positional value: being cited in the first sentence of an AI response is worth significantly more than being mentioned in a footnote or the final paragraph. In the LumenGEO scoring model, prominence accounts for 30 of 100 GEO Score points and considers both position within the response and whether the citation includes a direct recommendation or endorsement.

Citation Quality

Citation quality is a composite metric that evaluates the depth and nature of how an AI search engine references a brand, ranging from a bare URL mention to a detailed recommendation with context.

A citation that says "according to LumenGEO, the best approach is..." carries far more value than a citation that merely lists lumengeo.co as one of several sources. Citation quality differentiates between passive mentions (appearing in a source list), active mentions (being referenced in the response body), and endorsed mentions (being recommended or described positively). In the LumenGEO GEO Score model, quality accounts for 20 of 100 points and evaluates whether citations include the brand name, provide descriptive context, or position the brand as an authority.

Citation Signal

A citation signal is any attribute of a web page or domain that increases the probability of an AI search engine selecting and citing that source in its generated response.

Citation signals are to GEO what ranking factors are to SEO. They include structural signals (schema markup, heading hierarchy, BreadcrumbList), content signals (answer capsules, entity density, declarative statements), authority signals (backlinks, brand mentions across the web, domain age), and technical signals (crawler accessibility, page speed, HTTPS). The GEO research paper from Georgia Tech identified that adding statistics, quotations, and citations to source content increased AI citation rates by 30-40% in controlled experiments.

Content Extractability

Content extractability is the degree to which an AI system can identify, isolate, and accurately extract discrete facts, answers, or claims from a web page for use in a generated response.

High extractability means your content is structured so that an AI can pull specific statements without misinterpreting context. Low extractability occurs when answers are buried in long paragraphs, split across multiple sections, or obscured by marketing language. Techniques that improve extractability include using declarative headings that match query patterns, placing answer capsules immediately after headings, using definition-style sentences ("X is Y that does Z"), and implementing structured data. A page can rank well in traditional search but have poor extractability if its content is not formatted for AI retrieval.

Cross-Encoder Reranking

Cross-encoder reranking is the second-stage retrieval process in which an AI system jointly evaluates a query and each candidate document together to produce a fine-grained relevance score, determining which sources ultimately get cited.

In a typical RAG pipeline, the first stage uses a bi-encoder to quickly retrieve hundreds of potentially relevant documents. The second stage, cross-encoder reranking, takes the top candidates and evaluates each one by processing the full query-document pair through a transformer model simultaneously. This produces much more accurate relevance scores than the initial retrieval but is computationally expensive, which is why it is only applied to a shortlist. For GEO, this means your content needs to pass two tests: initial retrieval (broad relevance) and reranking (deep semantic relevance to the specific query). Content that is topically relevant but does not directly address the query's intent often fails at the reranking stage.
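
The two-stage shape can be illustrated with a toy Python sketch. It does not use real transformer models: a bag-of-words cosine similarity stands in for the bi-encoder (each text is encoded on its own), and a word-order overlap score stands in for the cross-encoder (query and document are examined together). All function names and the sample documents are assumptions for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Stage-1 stand-in: encode one text independently as word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def joint_score(query: str, doc: str) -> float:
    """Stage-2 stand-in: fraction of adjacent query word pairs found adjacent in the doc."""
    q, d = query.lower().split(), doc.lower().split()
    query_pairs = list(zip(q, q[1:]))
    doc_pairs = set(zip(d, d[1:]))
    if not query_pairs:
        return 0.0
    return sum(pair in doc_pairs for pair in query_pairs) / len(query_pairs)


def retrieve_then_rerank(query: str, docs: list[str], shortlist_size: int = 2) -> list[str]:
    q_vec = embed(query)
    # Stage 1: cheap independent scoring over every document.
    shortlist = sorted(docs, key=lambda doc: cosine(q_vec, embed(doc)), reverse=True)[:shortlist_size]
    # Stage 2: costlier joint scoring, applied only to the shortlist.
    return sorted(shortlist, key=lambda doc: joint_score(query, doc), reverse=True)


docs = [
    "citations ai monitor",
    "tools that monitor ai citations daily",
    "recipes for sourdough bread",
]
ranked = retrieve_then_rerank("monitor ai citations", docs)
print(ranked)
```

In this toy run the scrambled document wins the first stage on word overlap alone, but the second stage promotes the document that actually matches the query's phrasing.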

D

Declarative Heading

A declarative heading is an H2 or H3 tag that states a complete fact or answer as the heading text itself, rather than using a vague or question-based heading.

Traditional SEO often uses question headings ("What is GEO?") to match search queries. GEO takes a different approach: declarative headings state the answer directly ("GEO Is the Practice of Optimizing Content for AI Search Engines"). This works because AI systems use heading text as strong signals for what a section contains, and a declarative heading gives the AI a complete, citable statement before it even reads the paragraph below. The GEO research paper found that authoritative, statement-style headings correlated with higher citation rates in AI-generated responses.

DefinedTermSet Schema

DefinedTermSet schema is a structured data type from schema.org that marks up a collection of defined terms and their definitions, helping AI systems identify and extract glossary-style content.

By wrapping a glossary page in DefinedTermSet schema, you explicitly tell AI crawlers that the page contains authoritative term definitions. Each term is marked as a DefinedTerm with a name, description, and optional URL. This structured data format is especially valuable for GEO because AI systems prioritize definitional content when answering "what is" queries. Implementing DefinedTermSet schema on this page, for example, signals to AI crawlers that each term definition here is a citable, authoritative source for that concept.
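
As a hedged illustration, the Python sketch below builds DefinedTermSet JSON-LD from a dictionary of terms. The function name defined_term_set and the sample glossary name are assumptions; the schema.org types and the hasDefinedTerm property are real, but which optional properties a given page should include is left open.

```python
import json


def defined_term_set(name: str, terms: dict[str, str]) -> str:
    """Serialize a glossary as schema.org DefinedTermSet JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "DefinedTermSet",
        "name": name,
        "hasDefinedTerm": [
            {"@type": "DefinedTerm", "name": term, "description": definition}
            for term, definition in terms.items()
        ],
    }
    return json.dumps(data, indent=2)


markup = defined_term_set(
    "GEO Glossary",  # hypothetical glossary name
    {"Citation Density": "The ratio of citations a specific domain receives relative to the total number of citations in an AI-generated response."},
)
print(markup)
```

The resulting string would typically be placed inside a script tag of type application/ld+json on the glossary page.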

E

Entity Density

Entity density is the concentration of named entities (brands, people, products, organizations, concepts) within a piece of content relative to its total word count.

AI systems use named entity recognition to build knowledge graphs and determine what a piece of content is about. Higher entity density — without keyword stuffing — signals to AI models that a page is information-rich and authoritative. For GEO, this means mentioning relevant entities (competitor names, industry terms, product categories, authoritative sources) throughout your content in a natural way. The Georgia Tech GEO paper found that adding relevant statistics and entity-rich citations to source content increased citation rates by up to 40%. A page with 5 relevant named entities per 100 words is generally more citable than one with 1 entity per 100 words.
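
A rough way to estimate the per-100-words figure is sketched below. It assumes the caller supplies the list of entities to look for, since real named entity recognition would need an NLP library; the function name entity_density is hypothetical.

```python
import re


def entity_density(text: str, entities: list[str]) -> float:
    """Mentions of the supplied entities per 100 words of text."""
    word_count = len(text.split())
    if word_count == 0:
        return 0.0
    lowered = text.lower()
    mentions = sum(
        len(re.findall(r"\b" + re.escape(entity.lower()) + r"\b", lowered))
        for entity in entities
    )
    return 100.0 * mentions / word_count


sample = "GPTBot and PerplexityBot crawl many web pages while ClaudeBot waits"
density = entity_density(sample, ["GPTBot", "PerplexityBot", "ClaudeBot"])
print(density)  # 30.0 (3 mentions in 10 words)
```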

F

FAQPage Schema

FAQPage schema is a structured data markup type that identifies a page as containing a list of questions and answers, enabling AI systems and search engines to extract and display individual Q&A pairs.

FAQPage schema (schema.org/FAQPage) has been a staple of SEO for Google's rich results, but it serves a different purpose in GEO. AI crawlers use FAQ schema to identify pages that contain direct answers to specific questions, making them prime candidates for citation when those questions arise in user conversations. Each question-answer pair becomes an independently retrievable unit that the AI can cite. Implementing FAQPage schema is particularly effective for commercial queries where users ask "how does X work" or "what is the best Y for Z" — exactly the queries where AI search engines are most active.
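
For illustration, the sketch below generates FAQPage JSON-LD from question-answer pairs in Python. The function name faq_page and the sample pair are assumptions; the Question, Answer, mainEntity, and acceptedAnswer names come from schema.org.

```python
import json


def faq_page(pairs: list[tuple[str, str]]) -> str:
    """Serialize question-answer pairs as schema.org FAQPage JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2)


faq_markup = faq_page([
    ("What is an AI citation?",
     "An AI citation is a reference to a specific brand, domain, or source that an AI search engine includes in its generated response."),
])
print(faq_markup)
```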

Fan-Out Query

A fan-out query is a single user prompt to an AI search engine that triggers multiple sub-queries across different retrieval systems and sources to compile a comprehensive answer.

When a user asks Perplexity "What are the best tools for monitoring AI citations?", the system does not execute a single search. Instead, it fans out into multiple parallel retrieval operations: a web search, a news search, possibly an academic search, and comparisons against its internal knowledge. Each sub-query retrieves different candidate sources, which are then merged and reranked. For GEO, this means your content should be optimized for multiple query variations of the same topic, because a single user prompt may retrieve your page through one sub-query pathway even if it misses others.

G

GEO (Generative Engine Optimization)

Generative Engine Optimization (GEO) is the practice of optimizing web content, brand presence, and technical infrastructure to increase the likelihood of being cited by AI-powered search engines in their generated responses.

GEO was formally defined in the 2024 research paper "GEO: Generative Engine Optimization" by researchers at Princeton, Georgia Tech, the Allen Institute for AI, and IIT Delhi. The paper demonstrated that specific content optimization techniques, including adding statistics, quotations, and authoritative citations, could increase a website's citation rate in AI-generated responses by 30-40%. GEO differs from traditional SEO in its goal: rather than ranking higher in a list of links, GEO aims to get your brand mentioned and recommended within the AI's synthesized answer. GEO encompasses content optimization, technical readiness (crawler access, schema markup), and entity building (brand mention seeding, knowledge graph presence).

GEO Score

A GEO Score is a 0-100 metric that measures a domain's overall visibility and citation strength across AI search engines, combining presence, prominence, and quality signals.

The GEO Score provides a single number that quantifies how well a domain performs in AI search. The LumenGEO scoring model weights three components: Presence (50 points) measures whether you appear at all in AI responses for target queries; Prominence (30 points) measures where in the response you are cited and how strongly; Quality (20 points) evaluates the depth and nature of citations. Score bands range from Critical (0-20) to Excellent (81-100). The score is calibrated against data from 37 real GEO experiments conducted across multiple industries and AI platforms.
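
The 50/30/20 weighting can be expressed as a short calculation. This is a sketch only: it assumes each component has already been normalized to a fraction between 0 and 1, which the text does not specify, and it does not reproduce how LumenGEO derives those component values. The name geo_score is hypothetical.

```python
# Point weights stated in the text.
WEIGHTS = {"presence": 50, "prominence": 30, "quality": 20}


def geo_score(presence: float, prominence: float, quality: float) -> float:
    """Combine three component fractions (assumed to lie in 0..1) into a 0-100 score."""
    components = {"presence": presence, "prominence": prominence, "quality": quality}
    for name, value in components.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return sum(WEIGHTS[name] * value for name, value in components.items())


# Full presence, half of the available prominence and quality points.
print(geo_score(1.0, 0.5, 0.5))  # 75.0
```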

GPTBot (Crawler)

GPTBot is OpenAI's web crawler that discovers and indexes web content for use in training and improving OpenAI's language models, including future versions of GPT.

GPTBot uses the user-agent string "GPTBot" and respects robots.txt directives. Unlike ChatGPT-User (which fetches pages in real-time during browsing), GPTBot crawls the web proactively to build OpenAI's training and retrieval corpus. Allowing GPTBot access is important for long-term GEO because it determines whether your content is available in GPT's base knowledge. However, many publishers block GPTBot due to copyright concerns. The trade-off is clear: blocking GPTBot protects your content from being used for training but may reduce your brand's presence in future AI-generated responses.

K

Knowledge Graph

A knowledge graph is a structured database of entities and their relationships that AI systems use to understand real-world concepts, brands, people, and the connections between them.

Google's Knowledge Graph, Wikidata, and the implicit knowledge graphs built by LLMs during training all serve the same purpose: mapping what exists and how things relate. For GEO, knowledge graph presence means your brand is recognized as a distinct entity with known attributes (industry, products, founding date, key people). Brands that exist in knowledge graphs are more likely to be cited by AI because the model already "knows" about them. Building knowledge graph presence involves structured data on your website (Organization schema), Wikipedia and Wikidata entries, consistent NAP (name, address, phone) across the web, and entity-rich content that reinforces your brand's associations with target topics.

L

LLM Optimization

LLM optimization is the broad practice of making content, data, and digital assets more likely to be accurately understood, retrieved, and cited by large language models.

LLM optimization is an umbrella term that encompasses GEO, AEO, and any technique aimed at improving visibility within AI systems. While GEO focuses specifically on AI search engines, LLM optimization also includes optimizing for AI assistants (Siri, Alexa), AI coding tools (GitHub Copilot), AI writing assistants, and enterprise AI systems. The core principles are the same: make your content structurally clear, factually definitive, and easily extractable. As LLMs become embedded in more products beyond search, LLM optimization will become a critical discipline for any brand that wants to be discoverable in AI-first interfaces.

Lost in the Middle Effect

The lost in the middle effect is the documented tendency of large language models to pay disproportionate attention to information at the beginning and end of their context window while underweighting information in the middle.

Documented by Liu et al. (2023), this effect has direct implications for GEO. When an AI retrieves 10 source documents to generate an answer, the effect implies that documents placed 4th through 7th in the context are less likely to be used than those placed 1st-3rd or 8th-10th. For content creators, this means that being retrieved is not enough: your content needs to be ranked highly enough in the retrieval stage to land in the first or last positions of the AI's context window. This also explains why citation prominence (position within the response) varies even among retrieved sources.

M

Microsummary

A microsummary is a 1-2 sentence description embedded in a web page's metadata or visible content that concisely states what the page is about and what value it provides, designed for AI extraction.

Microsummaries function as pre-packaged abstracts that AI systems can use when generating responses. They appear as meta descriptions, the opening sentence of a page, or as summary statements within structured data. An effective microsummary for GEO includes the target entity, the primary value proposition, and a specific claim or data point. For example, "LumenGEO is a GEO optimization platform that monitors AI citations across ChatGPT, Perplexity, and Gemini for over 200 brands" gives an AI everything it needs to cite the brand accurately in a single sentence.

P

PerplexityBot (Crawler)

PerplexityBot is the web crawler operated by Perplexity AI that fetches and indexes web pages for use in Perplexity's real-time AI search engine responses.

Perplexity AI is one of the most citation-forward AI search engines, meaning it prominently displays source URLs alongside its generated answers. PerplexityBot crawls the web to build the retrieval index that Perplexity searches when a user submits a query. Allowing PerplexityBot access is particularly important for GEO because Perplexity's business model depends on providing sourced answers — making it one of the highest-citation-rate AI platforms. Perplexity explicitly lists its crawler's user-agent and provides documentation on how to control access via robots.txt.

R

RAG (Retrieval-Augmented Generation)

Retrieval-Augmented Generation (RAG) is the AI architecture that combines a retrieval system (which fetches relevant documents from the web or a knowledge base) with a generative model (which synthesizes those documents into a coherent response).

RAG is the foundation of every major AI search engine. When a user asks a question, the RAG pipeline first retrieves a set of potentially relevant documents using a fast retrieval model, then passes those documents to a large language model that generates a response while citing the sources it used. Understanding RAG is essential for GEO because it reveals the two failure points for citation: your content can fail to be retrieved (a retrieval problem, solved by crawler access, topical alignment, and authority) or it can be retrieved but not cited (a generation problem, solved by content extractability, declarative formatting, and entity density).

Retrieval-to-Citation Gap

The retrieval-to-citation gap is the difference between the number of source documents an AI system retrieves for a query and the number it actually cites in its generated response.

AI search engines typically retrieve 10-50 candidate documents for a single query but only cite 3-8 of them in the final response. The GEO research paper found that only 6.5% of unique domains in source documents received inline citations. This gap represents the central challenge of GEO: being in the retrieval set is necessary but insufficient. Closing the gap requires optimizing for the cross-encoder reranking stage and ensuring your content has high extractability — clear statements, authoritative framing, and definitive language that the generative model can directly incorporate into its answer.

S

Share of Model (SoM)

Share of Model (SoM) is a metric that measures how frequently a brand is cited by AI search engines across a defined set of queries relative to competitors, expressed as a percentage of total AI citations in a category.

Share of Model is the AI-era equivalent of share of voice in traditional advertising or share of search in SEO. If you track 100 queries in your category and all brands, including yours, are cited 400 total times across the AI responses, your 80 citations give you a 20% Share of Model (80 / 400). This metric is particularly useful for competitive benchmarking and executive reporting because it quantifies your brand's AI visibility in a way that mirrors familiar marketing metrics. SoM can be tracked across individual AI platforms (ChatGPT, Perplexity, Gemini) or aggregated across all of them.
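
A small helper makes the calculation explicit. Following the definition, the denominator is taken to be the total citations in the category, including the brand's own; the function name and the brand labels are illustrative assumptions.

```python
def share_of_model(citations_by_brand: dict[str, int], brand: str) -> float:
    """Brand's citations as a percentage of all citations in the category."""
    total = sum(citations_by_brand.values())  # includes the brand's own citations
    if total == 0:
        return 0.0
    return 100.0 * citations_by_brand.get(brand, 0) / total


# Hypothetical category: 400 citations in total, 80 of them yours.
counts = {"your-brand": 80, "competitor-a": 200, "competitor-b": 120}
print(share_of_model(counts, "your-brand"))  # 20.0
```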

Speakable Schema

Speakable schema is a structured data markup that identifies specific sections of a web page as particularly suitable for audio playback by voice assistants and text-to-speech AI systems.

Speakable schema (schema.org/SpeakableSpecification) was originally designed for Google Assistant and smart speaker results, but it has gained new relevance in the GEO era. By marking content as speakable, you signal to AI systems that these sections contain concise, standalone statements suitable for direct quotation. Many AI search engines use similar logic to the speakable selector when choosing which content to cite verbatim versus paraphrase. Implementing speakable schema on your key answer capsules and microsummaries reinforces their citability across voice AI and text-based AI search alike.

T

Topical Authority

Topical authority is the degree to which a domain is recognized by AI systems as a credible, comprehensive source on a specific subject, determined by content depth, breadth, and external signals.

AI models assess authority similarly to how humans do: a site with 50 in-depth articles about GEO is more likely to be cited on GEO topics than a site with one article. Topical authority is built through topic clusters (interlinked content covering all facets of a subject), consistent publishing, external citations from other authoritative sources, and structured data that maps your content hierarchy. For GEO, topical authority is particularly important because AI systems tend to repeatedly cite the same authoritative sources once they identify them — creating a compounding advantage for brands that invest early in building depth around target topics.

Topic Cluster

A topic cluster is a content architecture pattern that organizes a pillar page and multiple related sub-pages around a central topic, connected by internal links, to signal comprehensive coverage to search engines and AI systems.

Topic clusters are a proven SEO technique that becomes even more important for GEO. When an AI system crawls a site and finds a pillar page on "Generative Engine Optimization" linked to sub-pages on "AI Citation Signals," "GEO Score," "AI Crawler Access," and "Citation Pipeline," it builds a strong entity association between that domain and the GEO topic. This cluster structure increases the likelihood that the AI will cite the domain for any query within the cluster's scope. The internal linking between cluster pages also helps AI crawlers discover and index all related content efficiently.
