
Replaced Token Detection

A self-supervised task where models learn to identify intentionally substituted tokens in sequences.

Year: 2020 · Generality: 339

Replaced token detection is a self-supervised pre-training objective in which a model is trained to identify which tokens in an input sequence have been swapped out for plausible but incorrect alternatives. Unlike masked language modeling, where tokens are hidden and the model must predict what belongs in each gap, replaced token detection presents a complete sequence — some tokens genuine, others substituted — and asks the model to classify each token as real or fake. This binary discrimination task is computationally efficient because the model receives a learning signal from every token in the sequence, not just the small fraction that would typically be masked.
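To make the task concrete, here is a minimal PyTorch sketch of what a single training example looks like; the token IDs and the replaced position are invented for illustration:

```python
import torch

# Toy token IDs for "[CLS] this movie was great [SEP]" (IDs are made up).
original  = torch.tensor([101, 2023, 3185, 2001, 2307, 102])
corrupted = original.clone()
corrupted[4] = 2919  # swap "great" for a plausible but incorrect alternative

# The training target is one binary label per token: 1 = replaced, 0 = original.
labels = (corrupted != original).long()
print(labels)  # tensor([0, 0, 0, 0, 1, 0])
```

Every position carries a label, so every position contributes to the loss.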

The mechanism relies on a two-component architecture. A smaller generator network, often trained with a masked language modeling objective, proposes replacement tokens that are contextually plausible but incorrect. A larger discriminator network then processes the resulting sequence and learns to detect which tokens were swapped. Because the generator produces believable substitutions rather than random noise, the discriminator must develop a nuanced understanding of syntax, semantics, and contextual coherence to succeed. This setup was central to the ELECTRA model introduced by Google Research in 2020, which demonstrated that the approach could match or outperform BERT-scale models using significantly less compute.
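The following is a minimal, self-contained PyTorch sketch of this generator-discriminator setup, loosely modeled on the ELECTRA recipe; the tiny model sizes, vocabulary, masking rate, and the discriminator loss weight of 50 (the value reported in the ELECTRA paper) are illustrative assumptions rather than a faithful reproduction:

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Minimal transformer encoder standing in for a BERT-style model
    (positional embeddings and other details omitted for brevity)."""
    def __init__(self, vocab_size, d_model, n_layers):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, ids):
        return self.encoder(self.embed(ids))

vocab_size, mask_id = 1000, 0
generator     = TinyEncoder(vocab_size, d_model=64,  n_layers=2)   # small generator
gen_head      = nn.Linear(64, vocab_size)                          # MLM head over the vocab
discriminator = TinyEncoder(vocab_size, d_model=128, n_layers=4)   # larger discriminator
disc_head     = nn.Linear(128, 1)                                  # per-token real/fake head

def rtd_step(ids, mask_prob=0.15):
    # 1. Mask a random subset of positions; the generator fills them in.
    mask = torch.rand(ids.shape) < mask_prob
    gen_logits = gen_head(generator(ids.masked_fill(mask, mask_id)))
    samples = torch.distributions.Categorical(logits=gen_logits).sample()

    # 2. Corrupted input: generator samples at masked positions, originals elsewhere.
    #    A sample that happens to match the original token counts as "real".
    corrupted = torch.where(mask, samples, ids)
    labels = (corrupted != ids).float()

    # 3. The discriminator classifies EVERY token as original (0) or replaced (1).
    disc_logits = disc_head(discriminator(corrupted)).squeeze(-1)
    disc_loss = nn.functional.binary_cross_entropy_with_logits(disc_logits, labels)

    # The generator trains with an ordinary MLM loss on the masked positions;
    # no gradient reaches it through the (non-differentiable) sampling step.
    gen_loss = nn.functional.cross_entropy(gen_logits[mask], ids[mask])
    return gen_loss + 50.0 * disc_loss

ids = torch.randint(1, vocab_size, (2, 16))  # a fake batch of token IDs
rtd_step(ids).backward()
```

After pre-training, the generator is discarded and only the discriminator is fine-tuned on downstream tasks, which is the deployment pattern the ELECTRA paper describes.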

The appeal of replaced token detection lies in its sample efficiency. Standard masked language modeling computes loss only on the roughly 15% of tokens that are masked, leaving the majority of the sequence unused for learning. Replaced token detection turns every token into a training signal, making each forward pass substantially more informative. This translates into faster convergence and strong performance on downstream NLP benchmarks even when pre-training budgets are constrained.
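A back-of-envelope calculation makes the difference in signal density concrete; the sequence length of 512 is an assumption, chosen as a typical BERT-scale value:

```python
seq_len = 512                      # assumed pre-training sequence length
mlm_signal = int(0.15 * seq_len)   # MLM: loss from 76 masked positions
rtd_signal = seq_len               # RTD: loss from all 512 positions
print(rtd_signal / mlm_signal)     # ~6.7x more positions contribute per pass
```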

Beyond efficiency, the task encourages models to build richer contextual representations because distinguishing a plausible replacement from the original token demands fine-grained language understanding. The approach has influenced subsequent work in self-supervised learning across both text and other modalities, establishing replaced token detection as a meaningful alternative to masking-based objectives in the design of foundation models.

Related

Next Token Prediction

A training objective where models learn to predict the next token in a sequence.

Generality: 794
MLM (Masked Language Modeling)

A pre-training objective where models learn to predict randomly hidden tokens using bidirectional context.

Generality: 694
NTP (Next Token Prediction)

A training objective where language models learn to predict the next token in a sequence.

Generality: 795
Token Speculation Techniques

Methods that predict multiple candidate tokens in parallel to accelerate text generation.

Generality: 450
Self-Reasoning Token

Specialized tokens that train language models to anticipate and plan for future outputs.

Generality: 104
Multi-Token Prediction

A generation strategy where language models predict multiple output tokens simultaneously.

Generality: 380