Agency17 June 20264 min read
RAG in production over regulations and client documents
Production RAG over regulations isn't embed-and-retrieve. It's a layered pipeline (hybrid search, reranking, hierarchical summaries, graph context), and the failure that bites hardest is a silent embedding-dimension mismatch that returns confident garbage.
The short answer
Production RAG over regulations and client documents is not embed-and-retrieve. It is a layered pipeline: hybrid search (dense vectors plus keyword BM25), a cross-encoder reranking pass, hierarchical summaries so retrieval can work at the right altitude, and graph context to connect related rules. And the failure that hurts most is silent: an embedding-dimension mismatch that returns plausible, confident garbage with no error. Naive RAG demos beautifully and fails quietly on real corpora.

Short version: retrieval-augmented generation looks trivial in a demo and gets hard the moment the corpus is real. Over a fiduciary's regulations, internal procedures, and client documents, naive embed-and-retrieve returns passages that are semantically close and factually wrong. What holds in production is a layered pipeline: hybrid search, a reranking pass, hierarchical summaries, and graph context. And the failure that cost us the most was silent: an embedding-dimension mismatch that returned confident garbage without throwing a single error.