A decoding method that reduces hallucinations by contrasting outputs across transformer layers.
DoLa (Decoding by Contrasting Layers) is an inference-time technique for large language models that reduces hallucinations and improves factual accuracy by exploiting the internal layer structure of transformer networks. Rather than relying solely on the final layer's output distribution, DoLa computes the next-token probability by contrasting the logits from a later "mature" layer against those from an earlier "premature" layer. The intuition is that factual knowledge tends to be injected into the model's representations at specific layers, so amplifying the difference between a layer that has processed this knowledge and one that has not steers the model toward more factually grounded predictions.
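The core contrast can be sketched in a few lines of numpy. This is a minimal illustration, not the reference implementation: the function name `dola_contrast` and the toy logit values are invented for the example, and the plausibility cutoff mirrors (in simplified form) the adaptive constraint described in the DoLa paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def dola_contrast(mature_logits, premature_logits, alpha=0.1):
    """Score tokens by log p_mature - log p_premature (sketch of DoLa).

    Tokens whose mature-layer probability falls below alpha * max(p_mature)
    are masked out, so the contrast cannot promote implausible tokens that
    merely happen to have tiny premature-layer probability.
    """
    p_m = softmax(np.asarray(mature_logits, dtype=float))
    p_p = softmax(np.asarray(premature_logits, dtype=float))
    scores = np.log(p_m) - np.log(p_p)
    scores[p_m < alpha * p_m.max()] = -np.inf  # plausibility filter
    return scores

# Toy 4-token vocabulary: token 0 is predicted similarly by both layers
# (a "generic" token), while token 2 emerges only in the mature layer.
mature = np.array([2.0, 0.5, 3.0, -1.0])
premature = np.array([2.0, 0.4, 0.5, -1.0])
next_token = int(np.argmax(dola_contrast(mature, premature)))  # -> 2
```

Note how token 0, despite a respectable mature-layer probability, is demoted because the premature layer already predicts it just as strongly; token 2, which gains probability only late in the network, wins the contrast.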
In practice, DoLa selects the premature layer dynamically at each decoding step, choosing the candidate early layer whose output distribution diverges most from the mature layer's, as measured by Jensen-Shannon divergence. The final decoding distribution is then derived from this contrast, effectively suppressing tokens that are predicted similarly across layers (often generic or repetitive tokens) and boosting tokens that emerge strongly only in the mature layer (often factually specific ones). This requires no additional training, fine-tuning, or external knowledge retrieval, making it a lightweight and broadly applicable improvement over standard greedy or sampling-based decoding.
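The dynamic selection step can likewise be sketched under simple assumptions: each layer's hidden state has already been projected to vocabulary logits (via the model's output head, i.e. "early exit"), and `select_premature_layer` is a hypothetical helper name, not an API from any library.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def js_divergence(p, q):
    """Jensen-Shannon divergence between two probability vectors."""
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0  # 0 * log(0/x) contributes nothing
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def select_premature_layer(layer_logits):
    """Pick the candidate early layer that diverges most (by JSD) from the
    final (mature) layer -- a sketch of DoLa's dynamic selection.
    `layer_logits` lists per-layer logit vectors; the last entry is mature."""
    p_mature = softmax(np.asarray(layer_logits[-1], dtype=float))
    divergences = [
        js_divergence(softmax(np.asarray(logits, dtype=float)), p_mature)
        for logits in layer_logits[:-1]
    ]
    return int(np.argmax(divergences))

# Toy example: layer 1 predicts a different token than the mature layer,
# so it diverges most and is chosen as the premature contrast layer.
layers = [
    [1.0, 1.0, 1.0, 1.0],   # early layer: uninformative, near-uniform
    [0.0, 0.0, 0.0, 3.0],   # early layer: confidently wrong token
    [3.0, 0.0, 0.0, 0.0],   # mature (final) layer
]
chosen = select_premature_layer(layers)  # -> 1
```

Selecting the most-divergent layer, rather than a fixed one, lets the contrast adapt per token: for easy tokens where all layers already agree, the divergence (and hence the correction) is small, while for knowledge-dependent tokens the contrast is sharpest exactly where the layers disagree.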
DoLa is particularly valuable in open-ended generation tasks where hallucination is a persistent problem, such as question answering, summarization, and dialogue. Empirical results have shown meaningful improvements on benchmarks like TruthfulQA and StrategyQA without sacrificing fluency or coherence. Because it operates entirely at inference time and is model-agnostic, DoLa can be applied to most autoregressive transformer models, positioning it as a practical tool for improving the reliability of deployed language systems.