Why isn't basic embed-and-retrieve enough for production RAG?

Because real corpora are large, technical, and full of near-duplicate language. Pure vector search misses exact terms and returns passages that are semantically close but wrong. Production RAG needs keyword search alongside vectors, a reranking pass to order candidates properly, and structure so it retrieves at the right level, not just the nearest chunk.

What is hybrid retrieval in RAG?

Hybrid retrieval combines dense vector search (semantic similarity) with sparse keyword search such as BM25 (exact-term matching). Vectors catch meaning when the wording differs; keywords catch precise terms, codes, and article numbers that vectors blur. Regulations are full of exact references, so you need both, then a reranker to merge and order the results.

What is the embedding-dimension mismatch failure?

It's when the embeddings in your store were produced by a different model or configuration than the one you query with, so the vectors don't line up. Often nothing errors. Similarity scores are still computed, just against the wrong space, so retrieval returns confident, plausible, wrong results. It's a classic silent failure and it can sit undetected for a long time.

What is RAPTOR / hierarchical retrieval?

It's building a tree of summaries over your documents so retrieval can work at multiple levels: a high-level summary node for broad questions, leaf chunks for specifics. For long regulations, this lets the system answer a general question from a summary instead of stitching together dozens of fragments, which improves both accuracy and cost.

tsukumo

RAG in production over regulations: the pipeline that holds · tsukumo

tsukumo

Agency17 June 20264 min read

RAG in production over regulations and client documents

Production RAG over regulations isn't embed-and-retrieve. It's a layered pipeline (hybrid search, reranking, hierarchical summaries, graph context), and the failure that bites hardest is a silent embedding-dimension mismatch that returns confident garbage.

The short answer

Production RAG over regulations and client documents is not embed-and-retrieve. It is a layered pipeline: hybrid search (dense vectors plus keyword BM25), a cross-encoder reranking pass, hierarchical summaries so retrieval can work at the right altitude, and graph context to connect related rules. And the failure that hurts most is silent: an embedding-dimension mismatch that returns plausible, confident garbage with no error. Naive RAG demos beautifully and fails quietly on real corpora.

tsukumo

RAG in production over regulations and client documents

Short version: retrieval-augmented generation looks trivial in a demo and gets hard the moment the corpus is real. Over a fiduciary's regulations, internal procedures, and client documents, naive embed-and-retrieve returns passages that are semantically close and factually wrong. What holds in production is a layered pipeline: hybrid search, a reranking pass, hierarchical summaries, and graph context. And the failure that cost us the most was silent: an embedding-dimension mismatch that returned confident garbage without throwing a single error.

Stage	What it adds
Hybrid search (dense + BM25)	Vectors for meaning, keyword search for exact terms and references
Cross-encoder reranking	Re-orders candidates by true relevance, not just retrieval score
Hierarchical summaries (RAPTOR)	Lets retrieval work at the right altitude: summary for broad, leaf for specific
Graph augmentation	Connects related rules and references so context isn't a disconnected chunk
Multiple collections	Keeps distinct corpora (regulation vs procedure vs client docs) cleanly separated

RAG in production over regulations and client documents

Why naive RAG fails on a real corpus#

The pipeline that holds#

Hybrid search: meaning and exactness#

Reranking: order is most of the answer#

Hierarchy and graph: altitude and connection#

The failure that bites: a silent embedding-dimension mismatch#

What this means for your team#

Fast and production-grade: how an agentic studio ships both

What agentic product development actually is (and how it beats a dev shop)

When to scale your agent setup: the team signals that actually matter

Want this running on your team?