
Envisioning is an emerging technology research institute and advisory.



Unembedding

The linear projection that converts a transformer's internal representations back into vocabulary predictions.

Year: 2021 · Generality: 0.45

Unembedding is the final transformation step in language models and other neural architectures that maps learned internal representations back into a human-interpretable output space. In transformer-based language models specifically, the unembedding matrix (sometimes called the output projection or language model head) takes the high-dimensional hidden state produced by the final layer and projects it into a probability distribution over the model's vocabulary. This is typically implemented as a learned weight matrix, and in many modern architectures the same matrix is shared with the input embedding layer — a technique known as weight tying — which reduces parameter count and often improves performance.
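Weight tying, as described above, can be sketched in a few lines of NumPy. This is a minimal illustration with toy dimensions, not a real model: the names `W_E` and `W_U` follow common interpretability notation, and the tied unembedding is simply the transpose of the input embedding matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab_size, d_model = 100, 16  # toy sizes; real models use tens of thousands / thousands

# Input embedding: maps a token id to a d_model-dimensional vector.
W_E = rng.normal(size=(vocab_size, d_model))

# Weight tying: the unembedding reuses the same parameters, transposed,
# so no separate output projection matrix is learned.
W_U = W_E.T  # shape (d_model, vocab_size)

token_id = 42
h = W_E[token_id]      # embed the token
logits = h @ W_U       # project straight back to vocabulary scores

# With tied weights, the logit a token assigns to itself is exactly
# the squared norm of its own embedding vector.
assert np.isclose(logits[token_id], h @ h)
```

The parameter saving is the point: a separate output head would add another `vocab_size × d_model` matrix, which for large vocabularies is a substantial fraction of a small model's weights.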

The mechanics of unembedding are straightforward: the hidden state vector is multiplied by the unembedding matrix to produce a vector of raw scores (logits) over all possible output tokens. These logits are then passed through a softmax function to yield probabilities, from which the next token is sampled or selected. Despite its apparent simplicity, the unembedding matrix encodes rich structure — research in mechanistic interpretability has shown that individual directions in the residual stream can be decoded through the unembedding matrix to reveal meaningful semantic content, making it a key tool for understanding what information models have learned to represent internally.
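The hidden-state-to-probabilities pipeline above can be written out directly. A minimal NumPy sketch with toy dimensions; the hidden state and unembedding matrix are random stand-ins for a trained model's values.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab_size = 8, 50            # toy dimensions for illustration

W_U = rng.normal(size=(d_model, vocab_size))  # unembedding matrix
h = rng.normal(size=(d_model,))               # final-layer hidden state

# 1. Project the hidden state to raw scores (logits) over the vocabulary.
logits = h @ W_U

# 2. Softmax turns logits into a probability distribution
#    (subtracting the max first for numerical stability).
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# 3. Select the next token -- greedily here; sampling from probs is also common.
next_token = int(np.argmax(probs))
```

In practice step 3 is usually temperature-scaled sampling rather than argmax, but the projection and softmax are exactly this computation.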

The concept gained particular relevance in the early 2020s as mechanistic interpretability emerged as a subfield focused on reverse-engineering the computations performed by large language models. Researchers began treating the unembedding matrix not just as an output layer but as a lens for probing intermediate model states — a technique sometimes called the "logit lens." By applying the unembedding matrix to hidden states at intermediate layers, practitioners can observe how a model's token predictions evolve across depth, revealing how information is progressively refined. This perspective has made unembedding central to interpretability research, circuit analysis, and efforts to understand how transformers store and retrieve factual knowledge.
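The logit-lens idea reduces to applying the same unembedding projection at every layer instead of only the last. A hedged sketch: the per-layer hidden states below are random placeholders standing in for a real transformer's residual-stream activations (which would also normally be layer-normalized before decoding).

```python
import numpy as np

rng = np.random.default_rng(2)
d_model, vocab_size, n_layers = 8, 50, 4  # toy dimensions

W_U = rng.normal(size=(d_model, vocab_size))  # unembedding matrix

# Stand-ins for the residual-stream state after each layer.
hidden_states = [rng.normal(size=(d_model,)) for _ in range(n_layers)]

# Logit lens: decode every intermediate state through the unembedding,
# not just the final one, to watch the top prediction evolve with depth.
top_tokens = []
for layer, h in enumerate(hidden_states):
    logits = h @ W_U
    top_tokens.append(int(np.argmax(logits)))
    print(f"layer {layer}: top token id = {top_tokens[-1]}")
```

With a trained model, the sequence of top tokens often converges toward the final prediction over the later layers, which is what makes this a useful probe of how the answer is assembled.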

Related

Embedding
A dense vector representation that encodes semantic relationships between discrete items.
Generality: 0.88

Unified Embedding
A single vector space representation that integrates multiple heterogeneous data types for AI models.
Generality: 0.62

Contextual Embedding
Word representations that dynamically shift meaning based on surrounding context.
Generality: 0.75

Joint Embedding Architecture
A neural network design that maps multiple data modalities into a shared representational space.
Generality: 0.65

Embedding Space
A learned vector space where similar data points cluster geometrically close together.
Generality: 0.79

Encoder-Decoder Transformer
A transformer architecture that encodes input sequences and decodes them into outputs.
Generality: 0.72