
Envisioning is an emerging technology research institute and advisory.


2011 — 2026


RAG (Retrieval-Augmented Generation)

Enhances language model outputs by retrieving relevant documents before generating responses.

Year: 2020 · Generality: 774

Retrieval-Augmented Generation (RAG) is an architecture that combines a neural language model with an external document retrieval system, allowing the model to consult a knowledge base at inference time rather than relying solely on information encoded in its parameters. When a query arrives, a retrieval component — typically a dense vector search over a large corpus — identifies the most semantically relevant passages. Those passages are then concatenated with the original query and fed into a generative model, which synthesizes a response grounded in the retrieved content. This two-stage pipeline was formalized in a 2020 paper from Facebook AI Research and quickly became a foundational pattern for knowledge-intensive NLP tasks.
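The two-stage pipeline described above can be sketched in a few lines. This is a toy illustration, not the FAIR implementation: the corpus, the `embed` function (a bag-of-words stand-in for a dense encoder), and the prompt template are all hypothetical.

```python
# Minimal sketch of the RAG pipeline: retrieve relevant passages,
# then concatenate them with the query before generation.
from collections import Counter
import math

CORPUS = [
    "RAG combines a retriever with a generative language model.",
    "FAISS enables fast approximate nearest-neighbor search.",
    "BART is a sequence-to-sequence transformer model.",
]

def embed(text: str) -> Counter:
    # Stand-in for a dense encoder: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    return sorted(CORPUS, key=lambda doc: cosine(q, embed(doc)), reverse=True)[:k]

def build_prompt(query: str) -> str:
    # Retrieved passages are concatenated with the query; a generative
    # model would then synthesize an answer grounded in this context.
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("What does RAG combine?"))
```

In a production system the retriever would search millions of passages and the prompt would be handed to a fine-tuned generator; the control flow, however, is exactly this retrieve-concatenate-generate sequence.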

The retrieval step usually relies on dense passage retrieval (DPR), where both queries and documents are encoded into a shared embedding space using a bi-encoder architecture. Approximate nearest-neighbor search — via tools like FAISS — makes it practical to search millions of documents in milliseconds. The generative component is typically a sequence-to-sequence model such as BART or a decoder-only model like GPT, fine-tuned to condition its outputs on the retrieved context. Some RAG variants marginalize over multiple retrieved documents during training, while others select a single top-ranked passage, trading coverage for simplicity.
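The bi-encoder search step can be illustrated with plain NumPy. The embeddings below are random placeholders for the output of trained query and passage encoders, and the brute-force inner-product search stands in for what FAISS (exactly, with `IndexFlatIP`; approximately, with its IVF or HNSW indexes) does over millions of documents.

```python
# Sketch of dense retrieval in a shared embedding space (toy vectors).
import numpy as np

rng = np.random.default_rng(0)

# Pretend these came from the passage encoder of a DPR bi-encoder.
doc_embeddings = rng.normal(size=(1000, 128)).astype("float32")
doc_embeddings /= np.linalg.norm(doc_embeddings, axis=1, keepdims=True)

def search(query_vec: np.ndarray, k: int = 5) -> np.ndarray:
    # Normalize, then rank by inner product (cosine similarity here).
    # FAISS replaces this brute-force scan with ANN search at scale.
    q = query_vec / np.linalg.norm(query_vec)
    scores = doc_embeddings @ q
    return np.argsort(scores)[::-1][:k]

query = rng.normal(size=128).astype("float32")
top_ids = search(query)
print(top_ids)
```

The key design point is that queries and documents live in the same vector space, so relevance reduces to a nearest-neighbor problem.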

RAG addresses a fundamental limitation of parametric language models: their knowledge is frozen at training time and can become stale, incomplete, or confidently wrong. By externalizing factual knowledge into a retrievable corpus, RAG systems can be updated simply by refreshing the document index rather than retraining the entire model. This makes them far more practical for domains where accuracy and currency matter, such as medical question answering, enterprise search, and legal research.
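The "refresh the index, not the model" property can be made concrete with a minimal sketch. `DocumentIndex` and its keyword-overlap `retrieve` are hypothetical stand-ins for a real vector index; the point is that adding a document changes what is retrievable without touching any model weights.

```python
# Sketch: knowledge updates happen in the index, not in the model.

class DocumentIndex:
    def __init__(self):
        self.docs: list[str] = []

    def add(self, doc: str) -> None:
        # A real system would also embed the doc and insert the vector
        # into an ANN index; here we just store the text.
        self.docs.append(doc)

    def retrieve(self, query: str) -> list[str]:
        terms = set(query.lower().split())
        return [d for d in self.docs if terms & set(d.lower().split())]

index = DocumentIndex()
index.add("The 2020 guideline recommends drug A.")
print(index.retrieve("guideline"))

index.add("The 2024 guideline recommends drug B.")  # corpus refresh
print(index.retrieve("guideline"))  # new fact is immediately retrievable
```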

Since its introduction, RAG has evolved considerably. Modern implementations incorporate reranking stages, query rewriting, iterative retrieval, and hybrid sparse-dense search to improve recall and precision. The pattern has also been extended to multimodal settings, where retrieved content may include images or structured data. RAG now sits at the center of most production deployments of large language models that require factual grounding, making it one of the most practically impactful architectural ideas in recent NLP.
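One of the extensions mentioned above, the retrieve-then-rerank stage, can be sketched as a two-pass scorer. Both scoring functions here are toy stand-ins: `first_pass_score` for BM25 or a bi-encoder, and `rerank_score` for a more expensive cross-encoder that reads query and document jointly.

```python
# Sketch of retrieve-then-rerank: a cheap first pass narrows the
# candidate set, a costlier second pass reorders the survivors.

def first_pass_score(query: str, doc: str) -> float:
    # Jaccard overlap of tokens, standing in for BM25 or dense scores.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0

def rerank_score(query: str, doc: str) -> float:
    # Stand-in for a cross-encoder; adds a bonus for exact phrase match.
    return first_pass_score(query, doc) + 0.1 * (query.lower() in doc.lower())

def retrieve_and_rerank(query: str, corpus: list[str], k1: int = 10, k2: int = 3) -> list[str]:
    candidates = sorted(corpus, key=lambda d: first_pass_score(query, d), reverse=True)[:k1]
    return sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)[:k2]

docs = [
    "reranking reorders retrieved results",
    "faiss performs nearest neighbor search",
]
print(retrieve_and_rerank("reranking retrieved results", docs, k1=2, k2=1))
```

Because the expensive scorer only sees the first pass's top-k candidates, this pattern buys cross-encoder precision at roughly bi-encoder cost.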

Related

Retrieval-Based Model

A model that responds by selecting the best match from a predefined response database.

Generality: 692
RAFT (Retrieval Augmented Fine-Tuning)

Fine-tuning technique that trains models to answer questions using retrieved context documents.

Generality: 293
Contextual Retrieval

A retrieval method that uses semantic context rather than exact keyword matching.

Generality: 591
Reranking

Reordering an initial set of retrieved results using a more sophisticated secondary model.

Generality: 580
IR (Information Retrieval)

Finding and ranking relevant documents from large collections in response to user queries.

Generality: 838
S2R (Speech-to-Retrieval)

Maps spoken audio directly to retrieval-ready representations, bypassing error-prone transcription pipelines.

Generality: 174