
Next Word Prediction

A training objective where models learn to predict the next token in a sequence.

Year: 2018
Generality: 794

Next word prediction is a foundational task in natural language processing in which a model learns to estimate the probability of the next word (or token) given all preceding words in a sequence. Formally, this is the problem of modeling P(wₙ | w₁, w₂, ..., wₙ₋₁), and it serves as both a practical capability and a powerful self-supervised training signal. Because text corpora are abundantly available and require no manual labeling, next word prediction allows models to learn rich representations of language structure simply by training on raw text at massive scale.
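To make the self-supervised aspect concrete, the minimal sketch below (plain Python, using a made-up toy sentence rather than any real corpus) shows how raw text alone yields (context, next-word) training pairs without manual labeling.

```python
# Minimal sketch: deriving self-supervised (context, next-word) pairs from raw text.
# The sentence is a toy example; real training uses billions of tokens.
text = "the cat sat on the mat"
words = text.split()

# Each position n gives one training example: predict words[n] from words[:n].
training_pairs = [(words[:n], words[n]) for n in range(1, len(words))]

for context, next_word in training_pairs:
    print(f"P({next_word!r} | {' '.join(context)!r})")
```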

The mechanics have evolved considerably over time. Early n-gram language models estimated word probabilities from co-occurrence counts, but were limited by their fixed context windows and inability to generalize. Recurrent neural networks improved on this by maintaining a hidden state that could theoretically capture long-range dependencies, though they struggled in practice with vanishing gradients. The transformer architecture, introduced in 2017, addressed these limitations through self-attention mechanisms that directly relate every token in a sequence to every other, enabling far more effective modeling of long-range context. Modern large language models such as GPT are trained almost entirely on this objective, predicting the next token autoregressively across billions of examples.
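As a hedged illustration of the modern objective, the sketch below (PyTorch, with a hypothetical `model` assumed to map token ids to vocabulary logits) shows the standard autoregressive loss: shift the sequence by one position and score each predicted next token with cross-entropy.

```python
import torch.nn.functional as F

def next_token_loss(model, token_ids):
    """Autoregressive training objective: predict token t+1 from tokens 1..t.

    Assumes `model` maps ids of shape (batch, seq_len) to logits of shape
    (batch, seq_len, vocab_size), and `token_ids` is a batch of sequences.
    """
    inputs = token_ids[:, :-1]        # every token except the last
    targets = token_ids[:, 1:]        # the same sequence shifted left by one
    logits = model(inputs)            # (batch, seq_len - 1, vocab_size)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # flatten batch and time dims
        targets.reshape(-1),
    )
```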

The significance of next word prediction extends well beyond autocomplete features on smartphones or search engines. Researchers discovered that models trained at sufficient scale on this simple objective develop emergent capabilities, including reasoning, translation, and code generation, for which they were never explicitly trained. This insight reframed next word prediction from a narrow NLP task into a general-purpose pretraining strategy. Applications now span conversational AI, document summarization, creative writing assistance, and software development tools, making next word prediction one of the most consequential training paradigms in modern machine learning.

Related

Next Token Prediction
A training objective where models learn to predict the next token in a sequence.
Generality: 794

NTP (Next Token Prediction)
A training objective where language models learn to predict the next token in a sequence.
Generality: 795

Multi-Token Prediction
A generation strategy where language models predict multiple output tokens simultaneously.
Generality: 380

Sequence Prediction
Forecasting the next item(s) in a sequence by learning patterns from prior observations.
Generality: 794

Autoregressive Prediction
A modeling approach that predicts each sequence element from its preceding values.
Generality: 792

Autocomplete
A system that predicts and suggests completions for partial user input in real time.
Generality: 624