
Google’s AI Search Pipeline Confirms the Problems Most Websites Don’t Know They Have

How do modern AI engines interpret content?

For the longest time, this question was shrouded in mystery. 

While search marketers like us came up with plenty of savvy ways to approximate how AI models interpret content, like running manual experiments at a snail’s pace, official sources remained extremely scarce. 

That is, until recently. 

Google’s Vertex AI Search API documentation has exposed the stack powering consumer-facing features like AI Mode and AI Overviews, and we’ve been reverse-engineering it for SEO insights. 

The kicker?

The documentation reveals a widespread problem. 

Most websites are misunderstood by AI due to three major issues. 

Chunking errors, embedding-level misinterpretation, and freshness decay can cause AI models to misinterpret or ignore your content. 

The good news is that we’ve developed solutions for these structural weaknesses. 

Keep reading for the ultimate ‘peek under the hood’ at Google’s AI search, plus how to make your site AI-ready. 

What’s the Four-Stage AI Search Pipeline?

Our research uncovered a commonality among most modern AI systems, not just Google’s.

Whether you’re working with Bing, ChatGPT, or Google’s enterprise search product, they all follow a similar four-stage search process:

  • Prepare – First, the AI system interprets the prompt by breaking it down into its most basic elements. It sets the stage for retrieval by normalizing language, expanding synonyms, and inferring freshness clues. 
  • Retrieve – Next, the AI searches its embedding index (or calls a plugin/API) to retrieve relevant content. Instead of interpreting a piece of content all at once, it’s processed in 300–500-token chunks, which is where context can become fragmented. 
  • Signal – This is where ranking signals enter the picture. Once the content chunks are retrieved, the system analyzes them using multiple layers of ranking signals. These decide which chunks survive until the ‘serve’ step.  
  • Serve – Lastly, the system constructs the final answer to the user’s prompt, which is synthesized from the best-ranked chunks of online content. There’s also a final stage of filtering that takes place.  

Here’s a more in-depth analysis of how the pipeline works. 

#1 Prepare: Query normalization, synonym expansion, and time-aware meaning

This is where the AI system interprets the prompt on its own, and there are several steps in the process. 

Query normalization takes place first, and it’s where the system rewrites the user’s query into a standardized format. 

It ‘flattens’ a query through steps like:

  • Lowercasing (removing uppercase letters) 
  • Removing plurals
  • Stripping punctuation (commas, periods, quotation marks, etc.) 
  • Stemming (reducing words to their root forms, like changing ‘running’ to ‘run’) 
  • Removing diacritics (accent marks) 

Basically, the goal is to break a query down so that it only includes search-relevant information. 

Here are a few examples:

  1. A query like ‘What do I do if my registration expired yesterday in Florida’ standardizes to ‘expired vehicle registration Florida what to do.’ The model removes the fluff words so that only relevant keywords remain. 
  2. Even a simple query like ‘How do I fix my leaking pipe’ undergoes normalization and becomes ‘how to fix leaking pipe.’ The extra words ‘do’ and ‘my’ aren’t relevant to the query, so they’re discarded. 

AI systems perform normalization to strip away the noise so that a query becomes easier to understand and match to similar embeddings (i.e., related information). 
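
To make this concrete, here’s a minimal sketch of what query normalization might look like, using only the Python standard library. The stopword list and suffix rules are simplified stand-ins, not Google’s actual implementation.

```python
import re
import string
import unicodedata

# Simplified stand-ins for a real stopword list and stemmer
STOPWORDS = {"do", "i", "my", "if", "the", "a", "an"}

def normalize_query(query: str) -> str:
    # Strip diacritics (e.g. 'café' -> 'cafe')
    query = unicodedata.normalize("NFKD", query).encode("ascii", "ignore").decode()
    # Lowercase and drop punctuation
    query = query.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = []
    for word in query.split():
        if word in STOPWORDS:
            continue  # discard fluff words
        # Crude stemming: trim common suffixes ('leaking' -> 'leak')
        word = re.sub(r"(ning|ing|ed|s)$", "", word)
        tokens.append(word)
    return " ".join(tokens)

print(normalize_query("How do I fix my leaking pipe?"))
# -> "how fix leak pipe" (a rough approximation, not Google's exact rewrite)
```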

Besides query normalization, AI models also use a process called synonym expansion that connects the query to related terms and concepts. That way, the AI can match a query to content that’s relevant but uses totally different phrasing. 

For instance, the word ‘attorney’ would expand into terms like lawyer, law firm, and legal counsel. 
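
As a rough illustration, you can think of synonym expansion as attaching related terms to each query token before retrieval. The hand-made mapping below is purely illustrative; real systems derive related terms from embeddings, knowledge graphs, and query logs rather than a static dictionary.

```python
# Hypothetical, hand-made synonym map (illustrative only)
SYNONYMS = {
    "attorney": ["lawyer", "law firm", "legal counsel"],
    "fix": ["repair"],
}

def expand_query(tokens: list[str]) -> list[str]:
    expanded = []
    for token in tokens:
        expanded.append(token)
        expanded.extend(SYNONYMS.get(token, []))  # append related terms if known
    return expanded

print(expand_query(["attorney", "near", "me"]))
# -> ['attorney', 'lawyer', 'law firm', 'legal counsel', 'near', 'me']
```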

Time-aware meaning is another factor AI models weigh. This simply means that AIs reinterpret queries based on recency expectations. 

Imagine that a user asks ChatGPT, “What’s the score for the 49ers game?” 

At face value, this could mean the score to any 49ers game, and there have been thousands. 

Yet LLMs interpret queries temporally, so they would assume the user means the score of the most recent 49ers game, even though the prompt doesn’t say that outright. 

Thanks to time-aware meaning, the system would rewrite the query as, “What’s the score for the latest 49ers game?” 
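
Here’s a hypothetical sketch of that kind of rewrite. The trigger terms and the rule itself are illustrative assumptions, not Google’s actual logic.

```python
# Illustrative trigger terms (an assumption, not a documented list)
TIME_SENSITIVE_TERMS = {"score", "game", "news", "price", "weather"}

def add_recency_intent(query: str) -> str:
    tokens = set(query.lower().split())
    # If the query touches a time-sensitive topic but carries no explicit
    # recency qualifier, assume the user means the most recent event.
    if tokens & TIME_SENSITIVE_TERMS and not tokens & {"latest", "today", "yesterday"}:
        return f"{query} latest"
    return query

print(add_recency_intent("49ers game score"))  # -> "49ers game score latest"
```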

#2 Retrieve: Chunking limits and structured data

The AI system will use the normalized queries it rewrote to search its embedding index for the most relevant content. 

It ingests the content it retrieves in 300–500-token chunks. 

Moreover, these chunks are retrieved independently, which is where AI models may wind up losing context. If a chunk is missing a crucial piece of information found in the previous chunk, it can throw everything off. 

This is also the step where the system processes any structured data present in the content, such as semantic HTML and schema markup. This data helps the AI recognize entities, content types (FAQ, product page, etc.), and understand context. 
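
As a simplified illustration, heading-aware chunking might look something like the sketch below. Tokens are approximated with a whitespace split here; a production system would use a real tokenizer and smarter boundaries.

```python
MAX_TOKENS = 500  # approximate upper bound on a retrieval chunk

def chunk_sections(sections: list[tuple[str, str]]) -> list[str]:
    """sections: (heading, body_text) pairs in document order."""
    chunks = []
    for heading, body in sections:
        words = body.split()  # crude stand-in for real tokenization
        for start in range(0, len(words), MAX_TOKENS):
            piece = " ".join(words[start:start + MAX_TOKENS])
            # Each chunk carries its heading so it stays interpretable on its own
            chunks.append(f"{heading}\n{piece}")
    return chunks
```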

#3 Signal: Seven primary ranking signals 

The AI wants to ensure the chunks it retrieved are accurate and come from trustworthy sources. To do so, the model will evaluate seven primary ranking signals:

  1. Base Ranking – This refers to Google’s standard information retrieval process, the one it has used for years. It’s the backbone of the entire system, so things like content quality and crawlability still matter. 
  2. Gecko Score – This score measures the embedding similarity between a query and the retrieved chunks of content. It converts the query and content chunks into embeddings and measures how close they are in vector space. The closer their embeddings, the stronger the semantic match. 
  3. Jetstream – A cross-attention model, Jetstream interprets things like negation, nuance, and granular details more accurately than embeddings can. That’s because an embedding compresses an entire 500-token chunk into a single point in vector space, discarding its internal structure. Jetstream exists to bring that structure back and improve interpretation. 
  4. BM25 – What’s this? A lexical keyword-matching model in an AI search stack? Believe it or not, old-school keyword retrieval is still a ranking signal, albeit a parallel one. Since AI models can misread content at times, keywords act as unambiguous anchors that solidify what the page is about, which entities are present, and whether a page truly addresses a user’s query. 
  5. PCTR – This is the same 3-tier engagement system that Google uses for Ads and YouTube. It estimates how likely users are to click on your content based on past engagement signals, layout, and metadata. 
  6. Freshness – As stated in the section on time-aware meaning, AI systems always go for the freshest, most frequently updated pages. 
  7. Boost/Bury – These are policy controls that boost trusted sources and bury irrelevant or outdated ones. 

These signals are incredibly important for getting AI to properly understand your content. 
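
To illustrate the embedding-similarity idea behind a score like Gecko, here’s a toy sketch: the query and each chunk become vectors, and cosine similarity stands in for semantic closeness. The vectors below are made up for demonstration; in practice an embedding model would produce them.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Toy vectors standing in for real embeddings
query_vec = [0.9, 0.1, 0.3]
chunk_vecs = {"chunk_a": [0.8, 0.2, 0.4], "chunk_b": [0.1, 0.9, 0.0]}

ranked = sorted(chunk_vecs, key=lambda c: cosine_similarity(query_vec, chunk_vecs[c]), reverse=True)
print(ranked)  # ['chunk_a', 'chunk_b'], since chunk_a sits closer to the query in vector space
```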

#4 Serve: Final answer generation and filtering 

The last step is to provide the user with a final response that incorporates the highest-rated chunks the model retrieved. 

This phase also involves filtering content that’s contradictory, outdated, or unclear, even if it ranked well during the retrieval process. 

What Are the Three Most Common Failure Points for Websites?

After exploring the data, we pinpointed three main areas where most websites struggle. 

Since most AI models follow the prepare, retrieve, signal, and serve pipeline, a major part of securing better visibility is ensuring your content can make it through each stage successfully. 

Issue #1: Chunking errors

Google’s retrieval units are capped at approximately 500 tokens. 

That means the key information in your article should be self-contained inside clearly labeled, brief content sections. 

Ancestor headings (H1s, H2s, H3s) are carried along with each chunk, which reinforces the need for concise formatting. To get the best results, keep each subheading’s section focused on its stated topic and cut the fluff. 

Here’s what to avoid:

  1. Writing chunks that don’t contain full concepts
  2. Splitting key definitions, entities, and service descriptions across boundaries
  3. Headings that don’t match the text they introduce 
  4. Non-semantic HTML (div soup) 
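
One rough way to audit this is to check whether each heading-level section stays within the chunk budget. The sketch below assumes your content is already split into (heading, body) pairs and uses a plain word count as a stand-in for real token counts.

```python
TOKEN_BUDGET = 500  # approximate size of one retrieval chunk

def flag_oversized_sections(sections: list[tuple[str, str]]) -> list[str]:
    warnings = []
    for heading, body in sections:
        approx_tokens = len(body.split())
        if approx_tokens > TOKEN_BUDGET:
            # Key details risk being split away from their heading
            warnings.append(f"'{heading}' is ~{approx_tokens} tokens; consider splitting it")
    return warnings
```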

Issue #2: Embedding-level misinterpretation 

Embedding models require clear topical focus, repeated signals, and a clean document structure to function properly. 

Problems arise whenever content is unclear or drifts in focus. 

If an article is overly vague or structurally chaotic, the embedding model may assign the wrong meaning to your content chunks, which completely throws off the pipeline. 

Here are some common causes of embedding-level misinterpretations:

  • A services page reads more like a blog post, so the model tags it as informational instead of transactional. 
  • A product page contains too much marketing fluff, leading an embedding model to conclude the page is about ‘company values’ instead of product specs. 
  • An article on a complex topic mixes unrelated concepts into one chunk, causing the embedding model to not know what it’s about. 

The key takeaway here is actually pretty simple. 

Produce content that’s hyper-focused, tightly structured, and free of fluff.  

Issue #3: Freshness and relevance decay 

The final problem is freshness decay, which can hit even content you could have sworn was evergreen. 

As we’ve discussed, time-aware meaning causes AI models to opt for the most up-to-date content. 

Even if a piece is evergreen, it can become outdated if it isn’t regularly updated or refreshed with the most recent terminology. 

Other signals of content staleness include:

  1. Outdated schema 
  2. Old timestamps 
  3. Infrequent site updates
  4. Factual drift 
  5. Content that’s in contradiction with newer sources 
  6. Missing modern entities 

Make sure you frequently update and refresh your content so that you don’t get deprioritized in favor of newer sources. 

How Does Next Net Solve All These Problems?

Once we understood the stack and the signals AI search uses, it became clear to us how to solve all these issues. 

In particular, we:

  • Provide clear vector representations of content 
  • Identify and correct semantic mismatch 
  • Structure meaning in retrieval-friendly formats
  • Detect drift and maintain topical stability 
  • Improve trust signals for AI engines 

We don’t just incorporate a few best practices and call it a day. Instead, we meticulously prepare websites for the actual way AI reads them. 

Final Takeaways: Google’s AI Search Pipeline

Learning the inner workings of Google’s Vertex AI Search API certainly provided some eye-opening insights, especially in terms of chunking and embedding misinterpretations. 

It was also surprising to learn that BM25-style keyword matching is still in play. 

To summarize, AI search does not struggle with keywords. It struggles with unclear meaning. 

Do you want your website to make it through the AI search pipeline without a hitch?

Book a strategy call with our team to learn how. 
