Fine-tuning technique that trains models to answer questions using retrieved context documents.
Retrieval Augmented Fine-Tuning (RAFT) is a training methodology that teaches language models how to effectively use retrieved documents when answering domain-specific questions. Unlike standard fine-tuning, which trains a model on question-answer pairs alone, RAFT exposes the model during training to a mix of relevant and irrelevant retrieved documents alongside each question. This forces the model to learn which information is useful, how to extract key facts from supporting context, and how to ignore distractors — skills that are essential for reliable performance in retrieval-augmented generation (RAG) pipelines.
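The training setup described above can be sketched in a few lines. The snippet below assembles a single RAFT-style training prompt by shuffling one relevant document in among distractors; the `<DOC>` tag layout and the function name are illustrative assumptions, not the paper's exact format.

```python
import random

def build_raft_prompt(question, oracle_doc, distractor_docs, seed=0):
    """Assemble a RAFT-style training prompt: the question plus a shuffled
    mix of the relevant (oracle) document and irrelevant distractors.
    The tag format here is illustrative, not a fixed standard."""
    docs = [oracle_doc] + list(distractor_docs)
    rng = random.Random(seed)  # seeded for reproducible shuffling
    rng.shuffle(docs)          # hide the oracle's position from the model
    context = "\n".join(
        f"<DOC {i}>\n{d}\n</DOC {i}>" for i, d in enumerate(docs)
    )
    return f"{context}\n\nQuestion: {question}\nAnswer:"

prompt = build_raft_prompt(
    "What year was the merger completed?",
    "The merger closed in 2019 after regulatory approval.",
    ["The company was founded in 1987.", "Quarterly revenue rose 4%."],
)
```

During fine-tuning, the target completion for such a prompt is the gold answer grounded in the oracle passage, so the model learns to locate and use the relevant document regardless of where it lands in the shuffled context.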
The mechanics of RAFT involve constructing training examples that pair each question with a set of retrieved passages, some genuinely relevant ("oracle" documents) and some deliberately unhelpful ("distractor" documents). The model is trained to produce answers that cite or reason from the correct passages while disregarding the irrelevant ones; in the original formulation, a fraction of training examples omits the oracle document entirely, pushing the model to internalize domain knowledge rather than rely on retrieval alone. This stands in contrast to standard RAG setups, where a pre-trained or instruction-tuned model is simply handed retrieved documents at inference time without ever having been explicitly trained to navigate that kind of noisy, multi-document context.
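The dataset-construction step can be sketched as follows. This is a minimal illustration assuming a simple dict schema (`question`/`oracle`/`answer` keys) and a flat document corpus; the field names, the `p_oracle` parameter, and the sampling strategy are assumptions for the sketch, not a prescribed interface.

```python
import random

def make_raft_examples(qa_pairs, corpus, k_distractors=3, p_oracle=0.8, seed=0):
    """Build RAFT-style training examples. With probability p_oracle the
    oracle document is included among the retrieved passages; otherwise the
    model sees only distractors. The dict schema here is an assumption."""
    rng = random.Random(seed)
    examples = []
    for qa in qa_pairs:
        # Sample distractors from the corpus, excluding the oracle itself.
        pool = [d for d in corpus if d != qa["oracle"]]
        docs = rng.sample(pool, k_distractors)
        if rng.random() < p_oracle:
            # Insert the oracle at a random position among the distractors.
            docs.insert(rng.randrange(len(docs) + 1), qa["oracle"])
        examples.append(
            {"question": qa["question"], "context": docs, "answer": qa["answer"]}
        )
    return examples
```

Setting `p_oracle` below 1.0 yields the oracle-free examples mentioned above: the model cannot answer those from context, so supervised loss on the gold answer encourages it to memorize the underlying domain facts.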
RAFT matters because it closes a significant gap between how models are trained and how they are deployed. A model fine-tuned on clean question-answer pairs may struggle when suddenly presented with a cluttered retrieval result at inference time. By simulating realistic retrieval conditions during fine-tuning, RAFT produces models that are substantially more robust and accurate in production RAG systems. This is especially valuable in enterprise and domain-specific applications — such as medical, legal, or technical documentation — where precision and source fidelity are critical.
The approach was formally introduced by researchers at UC Berkeley in 2024, though it builds on earlier work in retrieval-augmented generation and domain adaptation. RAFT represents a practical bridge between two major paradigms in modern NLP — fine-tuning for specialization and retrieval for knowledge grounding — and has quickly become an influential recipe for building reliable, knowledge-intensive AI assistants.