
DoLa (Decoding by Contrasting Layers)

A decoding method that reduces hallucinations by contrasting outputs across transformer layers.

Year: 2023 · Generality: 101

DoLa (Decoding by Contrasting Layers) is an inference-time technique for large language models that reduces hallucinations and improves factual accuracy by exploiting the internal layer structure of transformer networks. Rather than relying solely on the final layer's output distribution, DoLa computes the next-token probability by contrasting the logits from a later "mature" layer against those from an earlier "premature" layer. The intuition is that factual knowledge tends to be injected into the model's representations at specific layers, and by amplifying the difference between a layer that has processed this knowledge and one that hasn't, the model is steered toward more factually grounded predictions.
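In simplified form (omitting the plausibility constraint discussed below), the contrast is a pointwise log-ratio of the two layers' output distributions:

```latex
\hat{p}(x_t \mid x_{<t}) \;=\; \mathrm{softmax}\!\left(\log q_N(x_t \mid x_{<t}) \;-\; \log q_M(x_t \mid x_{<t})\right)
```

where $q_N$ is the next-token distribution read out from the mature (final) layer and $q_M$ the one read out from the premature layer.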

In practice, DoLa selects the premature layer dynamically — choosing the earlier layer whose output diverges most from the mature layer, as measured by Jensen-Shannon divergence. The final decoding distribution is then derived from this contrast, effectively suppressing tokens that are predicted similarly across layers (often generic or repetitive tokens) and boosting tokens that emerge strongly only in the mature layer (often factually specific ones). This requires no additional training, fine-tuning, or external knowledge retrieval, making it a lightweight and broadly applicable improvement over standard greedy or sampling-based decoding.
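As a concrete illustration, here is a minimal NumPy sketch of one DoLa decoding step. It assumes early-exit logits for each layer are already available (in a real model these come from applying the unembedding head to intermediate hidden states); the function and variable names are ours, not from any library, and the plausibility mask is a simplified stand-in for the adaptive plausibility constraint used in the paper.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two probability vectors."""
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * (np.log(a + eps) - np.log(b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def dola_next_token_distribution(layer_logits, candidate_layers, alpha=0.1):
    """One DoLa step. layer_logits: (num_layers, vocab_size) early-exit
    logits for the current position; candidate_layers: indices of possible
    premature layers; alpha: plausibility threshold."""
    mature = softmax(layer_logits[-1])
    # Dynamic selection: pick the candidate layer whose distribution
    # diverges most from the mature layer's, measured by JSD.
    divs = [js_divergence(mature, softmax(layer_logits[l]))
            for l in candidate_layers]
    premature = softmax(layer_logits[candidate_layers[int(np.argmax(divs))]])
    # Keep only tokens the mature layer already deems plausible,
    # then contrast log-probabilities between the two layers.
    mask = mature >= alpha * mature.max()
    scores = np.where(mask,
                      np.log(mature + 1e-12) - np.log(premature + 1e-12),
                      -np.inf)
    return softmax(scores)
```

A toy usage, with random logits standing in for a real model's early exits:

```python
rng = np.random.default_rng(0)
logits = rng.normal(size=(32, 50))        # 32 layers, toy vocabulary of 50
p = dola_next_token_distribution(logits, candidate_layers=[4, 8, 12, 16])
next_token = int(np.argmax(p))            # greedy pick under the contrasted distribution
```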

DoLa is particularly valuable in open-ended generation tasks where hallucination is a persistent problem, such as question answering, summarization, and dialogue. Empirical results have shown meaningful improvements on benchmarks like TruthfulQA and StrategyQA without sacrificing fluency or coherence. Because it operates entirely at inference time and is model-agnostic, DoLa can be applied to most autoregressive transformer models, positioning it as a practical tool for improving the reliability of deployed language systems.

Related

Large Language Diffusion Models

Generative architectures applying diffusion-based denoising processes to large-scale natural language generation.

Generality: 337
DLMs (Deep Language Models)

Deep neural networks trained to understand, generate, and translate human language.

Generality: 796
Speculative Decoding

A technique that accelerates LLM inference by drafting and verifying token sequences in parallel.

Generality: 520
LoRA (Low-Rank Adaptation)

A parameter-efficient method for fine-tuning large pre-trained models using low-rank matrices.

Generality: 398
Self-Speculative Decoding

A technique where a single model drafts and verifies tokens to accelerate inference.

Generality: 186
DPO (Direct Preference Optimization)

A training method that fine-tunes language models directly from human preference data.

Generality: 494