A model that responds by selecting the best match from a predefined response database.
A retrieval-based model produces its output not by synthesizing new text, but by searching a curated database of candidate responses and returning the one that best matches the input. This selection process relies on similarity measures ranging from classical techniques like TF-IDF and BM25 to modern dense vector representations produced by neural encoders. Given a user query, the model computes a relevance score between the query and each candidate, then returns the highest-ranked result. This architecture stands in contrast to generative models, which construct responses token by token and can produce entirely novel text.
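As a concrete illustration, the sketch below implements this selection step with TF-IDF vectors and cosine similarity. It is a minimal example, not a production recipe: the scikit-learn library, the candidate responses, and the query are assumptions chosen for brevity.

```python
# Minimal sketch of retrieval-based response selection using TF-IDF
# cosine similarity (assumes scikit-learn is installed). The candidate
# responses and the query are illustrative placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

candidates = [
    "You can reset your password from the account settings page.",
    "Our support team is available Monday through Friday, 9am to 5pm.",
    "Refunds are processed within 5 to 7 business days.",
]

vectorizer = TfidfVectorizer()
candidate_vectors = vectorizer.fit_transform(candidates)  # index the response database once

def respond(query: str) -> str:
    """Score every candidate against the query and return the best match."""
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, candidate_vectors)[0]
    return candidates[scores.argmax()]

print(respond("How do I change my password?"))
# -> "You can reset your password from the account settings page."
```

Because the system can only ever return one of the indexed candidates, every possible output is known in advance, which is the property the next paragraph turns on.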
The practical appeal of retrieval-based models lies in their predictability and controllability. Because every possible output is drawn from a hand-curated or carefully indexed corpus, the system cannot hallucinate facts or produce off-brand language — a critical advantage in customer service, medical Q&A, and enterprise chatbots where response accuracy is non-negotiable. Early deployments used keyword matching and rule-based filters, but the introduction of dual-encoder architectures and models like Dense Passage Retrieval (DPR) dramatically improved the ability to match semantically similar queries even when surface-level wording differs.
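A minimal sketch of the dense, bi-encoder style of matching is shown below. It assumes the sentence-transformers package and the all-MiniLM-L6-v2 checkpoint purely for illustration; DPR proper uses separately trained question and passage encoders, but the scoring pattern is the same.

```python
# Sketch of dense (bi-encoder) retrieval: queries and candidates are embedded
# into the same vector space and compared by cosine similarity.
# Assumes the sentence-transformers package and a public MiniLM checkpoint.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

candidates = [
    "You can reset your password from the account settings page.",
    "Refunds are processed within 5 to 7 business days.",
]
candidate_embeddings = model.encode(candidates, convert_to_tensor=True)

# A paraphrased query with little word overlap can still match semantically.
query_embedding = model.encode("I forgot my login credentials", convert_to_tensor=True)
scores = util.cos_sim(query_embedding, candidate_embeddings)[0]
print(candidates[int(scores.argmax())])
# -> "You can reset your password from the account settings page."
```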
Retrieval-based approaches have gained renewed importance in the era of large language models through the paradigm of Retrieval-Augmented Generation (RAG), where a retrieval component fetches relevant documents that a generative model then uses to ground its response. This hybrid design combines the factual reliability of retrieval with the fluency of generation, and has become a dominant pattern for building knowledge-intensive NLP systems. The underlying retrieval machinery — dense indexes, approximate nearest-neighbor search, and bi-encoder models — is now a core component of modern AI infrastructure.
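The sketch below illustrates the RAG pattern under simplifying assumptions: the embed and generate functions are hypothetical stand-ins for a trained bi-encoder and a generative model, and the two documents are placeholders. In a real deployment the brute-force dot product would be replaced by an approximate nearest-neighbor index over a much larger corpus.

```python
# Minimal sketch of the RAG pattern: retrieve top-k passages, then hand them
# to a generator as grounding context. `embed` and `generate` are hypothetical
# stand-ins, not any specific library's API.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder dense encoder; a real system would use a trained bi-encoder."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

documents = [
    "The warranty covers manufacturing defects for two years.",
    "Battery replacements are available at authorized service centers.",
]
doc_vectors = np.stack([embed(d) for d in documents])

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank documents by cosine similarity and return the top-k passages."""
    q = embed(query)
    scores = doc_vectors @ q  # vectors are unit-normalized, so dot product equals cosine
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

def generate(prompt: str) -> str:
    """Stand-in for a generative model call; a real system would query an LLM here."""
    return "[answer grounded in the retrieved context]"

def answer(query: str) -> str:
    """Ground the generator in retrieved passages rather than parametric memory alone."""
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return generate(prompt)

print(answer("How long is the warranty?"))
```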