
Envisioning is an emerging technology research institute and advisory.




Multi-Token Prediction

A generation strategy where language models predict multiple output tokens simultaneously.

Year: 2024
Generality: 380

Multi-token prediction is a language model training and inference strategy in which a model is trained to predict several future tokens at once, rather than forecasting only the immediately next token. In the standard autoregressive paradigm, a model conditions on all previous tokens to predict the next single token, repeating this process sequentially. Multi-token prediction extends this by adding auxiliary prediction heads that forecast tokens two, three, or more steps ahead from the same hidden representation, effectively asking the model to plan further into the future at each position during training.
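The difference between the two objectives can be illustrated with a toy sketch of how training targets are constructed at each position (the function names and the four-token horizon here are illustrative, not taken from any particular library):

```python
def next_token_targets(tokens):
    """Standard objective: at position t, predict tokens[t + 1]."""
    return [(t, tokens[t + 1]) for t in range(len(tokens) - 1)]

def multi_token_targets(tokens, horizon=4):
    """Multi-token objective: at position t, predict the next
    `horizon` tokens, tokens[t + 1 .. t + horizon]."""
    targets = []
    for t in range(len(tokens) - 1):
        future = tokens[t + 1 : t + 1 + horizon]
        targets.append((t, future))
    return targets

seq = [10, 11, 12, 13, 14, 15]
print(next_token_targets(seq)[0])   # (0, 11): position 0 predicts one token
print(multi_token_targets(seq)[0])  # (0, [11, 12, 13, 14]): four tokens ahead
```

Every position still conditions only on its prefix; what changes is how much of the future the model is asked to account for from that single position.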

The mechanism typically attaches multiple independent output heads to a shared transformer backbone, with each head responsible for predicting the token at a different future offset. During training, the losses from all heads are combined, encouraging the model to develop internal representations that encode richer, more forward-looking contextual information. At inference time, the additional heads can be used for speculative decoding: generating candidate future tokens in parallel and verifying them, which can substantially reduce the number of sequential forward passes required and improve throughput.

The significance of multi-token prediction lies in two distinct benefits. First, it acts as a richer self-supervised training signal: by forcing the model to simultaneously account for multiple future tokens, the representations it learns tend to capture higher-level semantic structure rather than low-level token-by-token statistics. Empirical results have shown improvements on code generation and reasoning benchmarks when models are trained with this objective. Second, the inference-time speedup from speculative decoding can be substantial on hardware where parallel computation is cheap, making deployment of large models more practical.
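The verification step behind the speculative speedup can be sketched as an accept-longest-prefix loop. This is a simplified serial version for clarity; in practice the drafted tokens are verified in a single batched forward pass, and the toy "base model" below (a sequence that always continues by adding one) is purely illustrative:

```python
def speculative_accept(draft_tokens, verify_next):
    """Accept the longest prefix of the drafted tokens that the base
    model would have produced itself. `verify_next(prefix)` returns
    the base model's next token given the accepted prefix so far."""
    accepted = []
    for tok in draft_tokens:
        if verify_next(accepted) == tok:
            accepted.append(tok)
        else:
            break  # first disagreement invalidates the rest of the draft
    return accepted

# Toy "base model": continues an arithmetic sequence by +1, starting at 5.
def toy_verify(prefix, start=5):
    return (prefix[-1] + 1) if prefix else start

draft = [5, 6, 9, 10]  # four tokens drafted by the extra heads
print(speculative_accept(draft, toy_verify))  # [5, 6] — diverges at 9
```

When most drafted tokens are accepted, several output tokens are committed per verification step instead of one, which is where the throughput gain comes from.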

Interest in multi-token prediction as a deliberate training objective grew notably around 2024, when Meta AI published research demonstrating that training large language models with four-token prediction heads improved both sample efficiency and downstream task performance. This distinguished the approach from earlier speculative decoding work, which had focused purely on inference acceleration without modifying the training objective. The technique is now considered a promising direction for making next-generation language models both more capable and more efficient.

Related

Next Token Prediction
A training objective where models learn to predict the next token in a sequence.
Generality: 794

Token Speculation Techniques
Methods that predict multiple candidate tokens in parallel to accelerate text generation.
Generality: 450

NTP (Next Token Prediction)
A training objective where language models learn to predict the next token in a sequence.
Generality: 795

Next Word Prediction
A training objective where models learn to predict the next token in a sequence.
Generality: 794

Self-Reasoning Token
Specialized tokens that train language models to anticipate and plan for future outputs.
Generality: 104

Sequence Prediction
Forecasting the next item(s) in a sequence by learning patterns from prior observations.
Generality: 794