
Citations Not Rankings: How Google’s AI Overview Chooses its Sources

How does Google decide which sources get listed in its AI Overviews?

Does it simply pull the top-ranked organic results?

The answer to the last question is no: Google’s LLM, Gemini, isn’t primarily concerned with who ranks #1 in organic search when hunting for sources. 

Instead, Gemini cites the pages that:

  1. Best satisfy the intent behind the user’s prompt 
  2. Are logically structured (concise formatting and structured data) 
  3. Exhibit core trust signals 

Granted, there’s typically lots of overlap between the top-ranked organic results and the sources Gemini chooses to cite. 

However, this isn’t because Google is strictly pulling results from the top 10. It’s because some top-ranked pages already contain the right elements to get cited by AI tools.

In other words, it’s a side effect, not a selection rule. 

This also explains why some pages win citations in AI Overviews despite ranking far outside the top 10.

In this guide, we’ll unpack the exact processes Google uses to select sources for its AI Overviews, and explore the mindset shift necessary to target AI citations instead of rankings.

How Do AI Overview Citations Work? How Do They Differ From Featured Snippets?

Recent research from Ahrefs found that AI Overviews (AIOs) generate for 20.5% of all keyword searches on Google. 

Bear in mind, that’s only the baseline across more than 146 million results.

Also, they most commonly appear for informational, question-based, and longer queries (more than 5–7 words). 

AIOs work by generating an original answer to the user query. To synthesize the answer, Gemini uses a hybrid approach that combines:

  1. Its internal knowledge 
  2. External sources via RAG (retrieval-augmented generation) 

It then lists the primary 3–8 sources it pulled from in the source panel. Other sources are also available to view by clicking on inline links.

The source panel then changes depending on which link you click.

Scrolling down, users can dive deeper into the topic with Google’s AI Mode, or continue on to the organic search results.

If you’ve been in the SEO game for a while, AIOs may remind you of Featured Snippets, but there are some key differences. 

The most important difference is that Featured Snippets highlight a single organic result. 

Here’s what one looks like in Safari with AI Overviews turned off (about the only time they appear now).

As you can see, it still provides a direct answer to the user query, but it’s a direct quote from Goodwin University and not a synthesized answer. Also, since the Goodwin University result is the only one referenced, it stands to gain all the traffic from the placement. 

AIOs, on the other hand, feature multiple sources, diluting the chances that one brand will get to claim all the traffic.

What Are the Key Mechanics Involved in AI Overviews?

Besides understanding how Google selects sources for AIOs, it’s also helpful to know which mechanics are involved in the retrieval and generation processes. 

Query fan-out

To retrieve relevant documents, Google’s LLM doesn’t use the exact language from the prompt to look up content. Instead, it breaks the query down into a series of sub-queries, each targeting a specific intent. 

For instance, the query ‘Best VPN for Mac Mini M2 with good speeds for SEO tools’ is too long and complicated for retrieving accurate results from search indexes. To remedy this, Gemini (or any other LLM) will fan the query out into multiple queries. 

These include ‘top no-log VPNs for 2026’ to handle the privacy and trust aspect of the search, ‘fastest VPN for Mac Mini M2’ for device compatibility, and ‘VPN speed tests for heavy browser tools like Ahrefs’ for the performance concerns. 

As you can see, each sub-query directly relates to a specific part of the original query. 

This ensures that Google is able to retrieve content that’s truly relevant to the user’s request while addressing each nuanced form of intent. 
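To make the fan-out idea concrete, here’s a toy sketch in Python. The trigger phrases, intent labels, and sub-queries are invented for illustration only; Google’s actual decomposition logic is not public.

```python
# Toy sketch of query fan-out: one complex query is decomposed into
# intent-specific sub-queries before retrieval. All rules below are
# hypothetical examples, not Google's real mappings.

FAN_OUT_RULES = [
    # (trigger phrase, intent facet, targeted sub-query)
    ("vpn",         "trust/privacy",        "top no-log VPNs for 2026"),
    ("mac mini m2", "device compatibility", "fastest VPN for Mac Mini M2"),
    ("seo tools",   "performance",          "VPN speed tests for heavy browser tools like Ahrefs"),
]

def fan_out(query: str) -> dict:
    """Return one targeted sub-query per intent facet detected in the query."""
    q = query.lower()
    return {facet: sub for trigger, facet, sub in FAN_OUT_RULES if trigger in q}

for facet, sub in fan_out("Best VPN for Mac Mini M2 with good speeds for SEO tools").items():
    print(f"{facet}: {sub}")
```

Each sub-query can then be run against the index independently, with the retrieved candidates pooled before generation.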

Citation triggers

Certain formatting choices can serve as citation triggers for AI search systems, and Google’s AI Overviews are no different. 

In particular, the presence of structured data like schema markup and semantic HTML boosts parseability for LLMs, which makes your content easier to understand. 

Also, formatting choices like lists, tables, FAQ sections, and direct answers make it easy for LLMs to extract information. 
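As a sketch of what that structured data looks like in practice, here’s a small Python helper that emits schema.org FAQPage markup. The helper name and sample Q&A are my own; the JSON-LD field names (`@context`, `@type`, `mainEntity`, `acceptedAnswer`) are standard schema.org vocabulary.

```python
import json

def faq_jsonld(qa_pairs):
    """Build a schema.org FAQPage JSON-LD block from (question, answer) pairs."""
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }, indent=2)

# Paste the output inside a <script type="application/ld+json"> tag.
print(faq_jsonld([
    ("How often do AI Overviews appear?",
     "They generate for roughly 20% of keyword searches on Google."),
]))
```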

Placement signals

Where your citation appears in the AIO reveals certain things about its importance to the query. 

Citations that appear at the very top of the source panel are typically anchor sources that provide the core facts, statistics, and overall framework for the synthesized response. 

These sources often hail from domains with strong topical authority and E-E-A-T signals. 

Placements in the middle of the AIO often provide supporting details like examples and data points, while citations at the bottom handle caveats and alternatives. Understanding how this hierarchy works is crucial for optimizing your AIO citations. 

Obviously, you’ll want to target top placements the majority of the time by:

  1. Building content clusters with in-depth pillar pages (2,000+ words) that cover the full query lifecycle. That means following the ‘definition, how-to, examples, risks’ model, then creating cluster pages that dive deeper into each section. This is the most reliable way to build the kind of topical authority that LLMs recognize. 
  2. Ensuring each piece nails the core definitions, data points, and facts involved with the topic at hand. Use clear H2s for extractable chunks, and regularly update pages for freshness. 

At the same time, it’s also worth targeting middle and bottom citations by diversifying cluster content with alternative viewpoints, specific studies, and lesser-known information about topics. 

How Do AI Overviews Choose Which Content to Cite?

Next, let’s dive into the specifics of how (and why) certain sources get chosen while others are ignored. 

To do so, we need to explain the seven primary trust signals that AI search systems use to select sources. 

If you want your content to get cited in AIOs and by other AI platforms like ChatGPT, then your content must make it through the ‘seven layers of scrutiny,’ so to speak. 

Here’s how each layer works, plus optimization tips for each. 

#1: Core algorithm output

First, your content must survive Google’s core ranking systems to enter the citation candidate pool.

AI signals don’t replace Google’s classic search algorithm; they’re layered on top of it.

So, while some AIO citations don’t come from the top 10, research by Originality.ai found that nearly 90% of all citations come from the top 30 (with 52% coming from the top 10). 

You can think of Google’s core algorithm as table stakes for being considered as a source for AIOs. 

Bearing that in mind, optimizing for this layer is all about cornerstone on-page SEO:

  • Creating high-quality cluster content 
  • Optimizing metadata
  • Placing keywords properly 
  • Adding internal links to aid crawlability 
  • Covering technical SEO (Core Web Vitals, mobile friendliness, etc.)    

#2: Embedding similarity (semantic relevance) 

This is the first layer where AI-powered search enters the picture. It involves using vector embeddings to measure how well your content satisfies the user’s intent at the semantic level.

With these numerical embeddings, AI models like Gemini are able to understand the meaning behind words and the relationships between similar concepts.

As a result, an LLM can understand aspects of your content that are conceptually related to the prompt even if they don’t use the same language. 

Here’s an example. 

If the user query is ‘best VPN for Mac Mini M2,’ and an online guide never uses those exact words but mentions that ‘ProtonVPN works perfectly on Apple Silicon Macs,’ Gemini will still understand that the guide satisfies the user intent. 

It will make the connection that the Mac Mini M2 is an Apple Silicon Mac due to how close their embeddings are in semantic space. 
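Under the hood, that comparison is typically a similarity measure between embedding vectors, most commonly cosine similarity. Here’s a toy sketch; the three-dimensional vectors are invented purely for illustration (real embeddings have hundreds or thousands of dimensions).

```python
import math

def cosine(a, b):
    """Cosine similarity: near 1.0 = same direction (meaning), near 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Invented toy embeddings for three pieces of text:
query_vec         = [0.90, 0.80, 0.10]  # "best VPN for Mac Mini M2"
apple_silicon_vec = [0.85, 0.75, 0.20]  # "ProtonVPN works perfectly on Apple Silicon Macs"
unrelated_vec     = [0.10, 0.05, 0.90]  # "best pizza ovens for home use"

# The Apple Silicon sentence lands close to the query in vector space
# even though it shares almost no literal wording with it.
print(round(cosine(query_vec, apple_silicon_vec), 2))  # high similarity
print(round(cosine(query_vec, unrelated_vec), 2))      # low similarity
```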

Optimization tactics for this layer include:

  • Using natural question phrasing to align with the conversational search patterns that trigger AI Overviews 
  • Capturing fan-outs by entering queries into Google and analyzing the People Also Ask (PAA) section 
  • Fleshing out every angle of your topical universe (definitions, examples, risks, how-tos, etc.) 

#3: Cross-attention (capturing nuance)

Embeddings are rigid and often miss nuance like negation. That’s because queries go through a process called normalization before being mapped to vector embeddings. 

In the normalization process, language is standardized and simplified in order to group naturally with other embeddings in the vector database. 

For instance, the words ‘running’ and ‘ran’ would both normalize to ‘run.’ 

The problem is that nuance can get lost during this process. 

Here’s an example.

Original query: ‘Best VPN for Mac M2 that doesn’t slow down Ahrefs’ 

Normalized version: ‘Best VPN Mac M2 Ahrefs speed’ 

Here, the negation (doesn’t slow down) gets lost entirely during the normalization process. The cross-attention layer makes another pass of the document to capture anything that was missed. 
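Here’s a deliberately naive normalizer in Python that demonstrates the failure mode. The stopword list and contraction rule are invented for illustration; real normalization pipelines are far more elaborate.

```python
STOPWORDS = {"best", "for", "that"}  # toy stopword list, not a real one

def normalize(query: str) -> str:
    """Naive normalization: lowercase the query, then drop stopwords and
    n't-contractions, which silently deletes the negation."""
    words = query.lower().split()
    return " ".join(
        w for w in words if w not in STOPWORDS and not w.endswith("n't")
    )

print(normalize("Best VPN for Mac M2 that doesn't slow down Ahrefs"))
# -> "vpn mac m2 slow down ahrefs" (now it reads like the user WANTS slowdowns)
```

Cross-attention compensates by re-reading the full document against the full query, token by token, so relationships like “doesn’t slow down” survive.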

Here are some quick optimization tips:

  • Build ‘canonical explainer’ sections with original frameworks and assets 
  • Frequently reference and link to related entities to strengthen cross-document coherence 
  • Frame H2s positively to avoid confusion surrounding negation 

#4: BM25 keyword matching 

Surface-level keyword signals still matter for grounding AI search results in truly relevant content that uses literal query terms. 

As sophisticated as vector search and cross-attention are, they still lean on lexical keyword matching to ensure they provide accurate results (and not something that’s closely related but not actually relevant). 
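BM25 itself is a well-documented ranking formula, so it can be sketched directly. Here’s a compact version over pre-tokenized documents; the sample corpus is invented for illustration.

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.5, b=0.75):
    """Score one tokenized document against query terms using the BM25 formula."""
    avgdl = sum(len(d) for d in corpus) / len(corpus)  # average document length
    n_docs = len(corpus)
    score = 0.0
    for term in query_terms:
        tf = doc.count(term)                           # term frequency in this doc
        df = sum(1 for d in corpus if term in d)       # docs containing the term
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)  # rarer terms weigh more
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score

corpus = [
    ["fastest", "vpn", "for", "mac"],
    ["best", "pizza", "ovens", "reviewed"],
    ["vpn", "speed", "tests", "for", "ahrefs"],
]
# Only documents containing the literal query terms score above zero.
print(bm25_score(["vpn", "mac"], corpus[0], corpus))
```

This is the lexical backstop: semantically close but term-free documents score 0 here, which keeps purely “vibes-adjacent” content from being grounded as fact.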

Optimizing for this layer includes classic keyword best practices like:

  • Researching popular search terms with your target audience 
  • Using your target keyword in the first 100 words of your content 
  • Inserting LSI (latent semantic indexing) keywords 
  • Maintaining a keyword density of 1% – 2%
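The 1%–2% density target is easy to check programmatically. A quick helper (the function is my own, not a standard tool):

```python
def keyword_density(text: str, keyword: str) -> float:
    """Percentage of word positions where the keyword phrase starts."""
    words = text.lower().split()
    kw = keyword.lower().split()
    hits = sum(
        1 for i in range(len(words) - len(kw) + 1)
        if words[i:i + len(kw)] == kw
    )
    return 100.0 * hits / len(words)

body = ("vpn " + "filler " * 98 + "vpn").strip()   # 100 words, 2 keyword mentions
print(keyword_density(body, "VPN"))                # 2.0, inside the 1%-2% band
```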

#5: PCTR (Predictive click-through rate) 

Google’s AI predicts user engagement by weighing factors like SERP position, historical CTR patterns, and engagement signals such as dwell time and bounce rate. 

This layer exists so that AI Overviews cite content that people actually like to consume. 

Compelling meta descriptions and featured snippet-friendly structures also get rewarded. To optimize, use schema markup and provide key definitions, answers, and concepts within the first 60 words.

Also, monitor your CTR in Google Search Console. Ideally, you’ll want to have an 8%+ click-through rate for content ranked in the top 5. Ensure your site runs smoothly, has a strong user experience, and employs a readable structure and flow. 

#6: Freshness

AI models can interpret queries with temporal intent, so they prioritize fresh content. That’s especially true for fast-moving topics like SEO, cybersecurity, and software compatibility. 

That means you should frequently update your content so that it remains in the citation candidate pool for as long as possible. 

Adding ‘last updated’ banners to articles signals freshness, as does the NewsArticle schema type. Refresh your content clusters with yearly updates (like ‘updated for 2026’), tool updates, and new statistics. 
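Here’s a minimal sketch of that freshness markup, generated in Python for consistency with the earlier examples. The headline and dates are placeholders; `datePublished` and `dateModified` are standard schema.org properties.

```python
import json

def news_article_jsonld(headline: str, published: str, modified: str) -> str:
    """Emit schema.org NewsArticle JSON-LD with explicit freshness dates."""
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        "datePublished": published,   # ISO 8601 dates
        "dateModified": modified,     # the freshness signal crawlers read
    }, indent=2)

print(news_article_jsonld(
    "Citations Not Rankings (Updated for 2026)",  # placeholder headline
    "2025-03-01",
    "2026-01-15",
))
```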

#7: Boost/bury override signals 

The final layer involves manual quality controls, business policies, and safety overrides. Certain pages, like content from business partners, receive a boost during this layer, while other content, like spam, gets buried.

Your brand’s reputation plays a significant role in this layer. Thus, optimization tactics include building editorial brand mentions and backlinks on news outlets and media sites. Author bylines with clear credentials also demonstrate your expertise, as do real-world case studies, so include both as often as possible. 

Wrapping Up: AI Citations Instead of Rankings

As a final note, LLMs have distinct methods for citing content, and they don’t simply pull from the top organic rankings. 

That being said, Google’s core ranking systems play a foundational role in deciding which content gets cited, so the process isn’t entirely divorced from Google’s SERPs. 

Yet, the most important factors are semantic relevance, brand trust, and answer completeness. If a piece of content checks those boxes, it doesn’t matter if it’s ranked #1 or #20; it’s fair game to appear in AIOs. 

Do you need expert assistance shifting your brand’s SEO strategy to include AI Overviews?

Book a session with our team to develop a picture-perfect search strategy that includes organic and AI-based discovery.   
