We’ve all been there. You build a RAG (Retrieval-Augmented Generation) pipeline, dump your documents into a vector database, and feel like a wizard. You search for ‘guidelines for remote work’ and it finds documents about ‘working from home policy.’ It feels like magic.
Then, you put it into production.
A user searches for a specific error code like ERR-502 or a specific product SKU like XJ-900.
The result? Hallucinations, or worse, totally irrelevant documents that just happen to share a similar ‘vibe’ in the embedding space.
The hard truth is that Vector Search is not a silver bullet. It understands meaning, but it is terrible at precision.
The Two Extremes of Search
To understand why your AI agent is failing, you have to look at the two distinct ways computers search for information.
1. The ‘Vibe’ Search (Vector Embeddings)
Vector search turns text into numbers (embeddings). It finds concepts that are semantically close.
Superpower: Understanding synonyms and intent (e.g., ‘monitor’ = ‘display’).
Kryptonite: Exact matches. To a vector model, ‘Version 1.2’ and ‘Version 2.1’ can land almost on top of each other in embedding space, because the model encodes the shared concept of ‘a version number’ rather than the specific digits, even though they refer to completely different technical facts.
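For intuition, ‘semantically close’ just means a small angle between embedding vectors, measured by cosine similarity. The three-dimensional vectors below are made up for illustration (real embeddings have hundreds of dimensions), but they show how two different version strings can score as near-duplicates:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Made-up toy "embeddings" for two version strings. The vectors are nearly
# identical, even though the versions they name are different facts.
v1_2 = [0.81, 0.40, 0.42]  # hypothetical embedding of "Version 1.2"
v2_1 = [0.80, 0.42, 0.41]  # hypothetical embedding of "Version 2.1"
print(cosine_similarity(v1_2, v2_1))  # close to 1.0
```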
2. The ‘Ctrl+F’ Search (BM25 / Keywords)
This is the old-school search logic (like Elasticsearch or Lucene). It looks for the exact tokens you typed.
Superpower: Precision. If you search for ERR-502, it finds that exact string.
Kryptonite: Vocabulary mismatch. If you search ‘laptop screen,’ it might miss a document that only uses the word ‘display.’
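For illustration, here is a minimal pure-Python sketch of Okapi BM25 scoring (the real thing lives inside Lucene; k1 and b are the usual defaults). Notice that only the exact token ERR-502 matters:

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs  # average document length
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if tf[term] == 0:
                continue  # exact token not present: zero contribution
            df = sum(1 for d in docs if term in d)  # document frequency
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            )
        scores.append(score)
    return scores

docs = [
    "restart the gateway after ERR-502 appears in the log".split(),
    "the display flickers when the laptop wakes from sleep".split(),
]
print(bm25_scores(["ERR-502"], docs))  # only the first document scores above zero
```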
The Fix: Hybrid Search
If you want a production-grade system, you cannot choose one. You need both. This is called Hybrid Search.
In a Hybrid architecture, every user query triggers two parallel searches:
Vector Search scans for semantic meaning.
BM25 Search scans for exact keywords.
You get two lists of results. One captures the intent, the other captures the specifics.
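A sketch of that fan-out step. The two retriever functions here are placeholders, not a real API; in production you would call your vector store and keyword index, ideally concurrently:

```python
def hybrid_retrieve(query, search_vectors, search_bm25, top_k=10):
    """Fan one query out to both retrievers; return two ranked ID lists."""
    semantic_ids = search_vectors(query, top_k)  # ranked by cosine similarity
    keyword_ids = search_bm25(query, top_k)      # ranked by BM25 score
    return semantic_ids, keyword_ids

# Toy stand-ins so the sketch runs end to end:
semantic, keyword = hybrid_retrieve(
    "ERR-502 gateway timeout",
    search_vectors=lambda q, k: ["doc_networking", "doc_gateway_errors"],
    search_bm25=lambda q, k: ["doc_gateway_errors"],
)
```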
The ‘Merging’ Problem
Now you have a new engineering problem.
Vector search returns a Cosine Similarity score (usually 0.0 to 1.0).
BM25 returns an unbounded Relevance Score (it can be 5.0, 15.0, or 42.0, depending on term frequency and document length).
You cannot compare these numbers. A 0.8 in vectors is not ‘better’ or ‘worse’ than a 12.0 in BM25. They are apples and oranges.
Enter Reciprocal Rank Fusion (RRF)
The solution is to ignore the scores and look at the rank.
Reciprocal Rank Fusion (RRF) doesn’t care that Document A got a score of 0.89. It cares that Document A was the #1 result. It takes the rankings from both lists and fuses them into a new, unified score.
If a document appears at the top of both lists, it skyrockets to the top of the final ranking. If it appears in only one, it still earns a score, but it ranks below the documents both searches agree on.
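A minimal sketch of RRF, assuming the widely used smoothing constant k = 60 and two hypothetical result lists of document IDs:

```python
def rrf_fuse(ranked_lists, k=60):
    """Fuse ranked lists of doc IDs: score(d) = sum of 1 / (k + rank(d))."""
    scores = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

vector_hits = ["doc_b", "doc_a", "doc_c"]  # ranked by cosine similarity
bm25_hits = ["doc_a", "doc_d"]             # ranked by BM25
fused = rrf_fuse([vector_hits, bm25_hits])
# doc_a is only #2 and #1 in the individual lists, yet wins the fused ranking
```

Note that the original scores never enter the calculation: only the positions do, which is exactly what makes the two incomparable scales a non-issue.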
But… What is a ‘Good’ RRF Score?
Congratulations, you have implemented Hybrid Search! But you have introduced a final, tricky variable.
RRF outputs abstract scores that look like 0.0163 or 0.032.
Unlike Cosine Similarity, where we know 0.85 is usually ‘good,’ RRF scores are unintuitive.
Is 0.016 a strong match or noise?
At what threshold should you cut off the results to prevent hallucinations?
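The scale of those numbers is no mystery once you write the formula down. RRF scores a document as the sum of 1/(k + rank) over each list it appears in; assuming the common default k = 60, the ceiling is easy to compute:

```python
k = 60  # the smoothing constant most implementations default to

# Best possible contribution from a single list (ranked #1):
top_of_one_list = 1 / (k + 1)    # roughly 0.0164

# Ranked #1 by BOTH vector search and BM25:
top_of_both_lists = 2 / (k + 1)  # roughly 0.0328
```

In other words, with k = 60 every fused score is squeezed into a narrow band near zero, which is why eyeballing a ‘good’ threshold is so hard.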
Most developers guess. They pick a random number like 0.02 and hope for the best. Do not do that.
There is a mathematical way to calculate exactly where the ‘Noise Floor’ ends and ‘High Confidence’ begins.
Read the sequel: Stop Guessing – A Mathematical Approach to RRF Thresholds in Hybrid Search