
Envisioning is an emerging technology research institute and advisory.




Probe

A lightweight model trained on internal representations to reveal what a neural network has learned.

Year: 2016 · Generality: 496

A probe is a diagnostic technique in machine learning interpretability where a simple secondary model — typically a linear classifier or shallow network — is trained on the internal activations of a larger, pre-trained model. The goal is to test whether specific types of information, such as part-of-speech tags, syntactic structure, or semantic properties, are encoded within a particular layer's representations. By holding the target model's weights fixed and only training the probe, researchers can attribute any predictive success to the information already present in the representations rather than to the probe's own capacity. This makes probing a relatively controlled method for interrogating what a model has implicitly learned during training.
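The setup described above can be sketched in a few lines. This is a minimal, self-contained illustration using synthetic data: the "frozen model" is a fixed random projection standing in for a pre-trained network's layer, and the probed property is whether an input feature is positive. All names and the data-generating setup are hypothetical, not from any particular library's probing API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pre-trained model's layer: a fixed (frozen) random
# projection followed by a nonlinearity. Its weights are never updated.
W_frozen = rng.normal(size=(10, 32)) * 0.3

def frozen_activations(x):
    return np.tanh(x @ W_frozen)

# Synthetic inputs whose first feature determines a binary property,
# so that property is implicitly encoded in the activations.
X = rng.normal(size=(2000, 10))
y = (X[:, 0] > 0).astype(float)
H = frozen_activations(X)  # representations the probe sees

# Train ONLY the probe: logistic regression by gradient descent.
w, b = np.zeros(H.shape[1]), 0.0
lr = 0.5
for _ in range(300):
    p = 1 / (1 + np.exp(-(H @ w + b)))  # probe's predicted probability
    w -= lr * H.T @ (p - y) / len(y)
    b -= lr * np.mean(p - y)

# High accuracy here attributes the information to the representations,
# since the target model's weights stayed fixed throughout.
acc = np.mean(((H @ w + b) > 0) == (y > 0.5))
```

Because the probe is linear and the target model is frozen, any predictive success must come from information already present in `H` rather than from capacity the probe adds.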

Probing became especially prominent with the rise of large pretrained language models like ELMo and BERT in the late 2010s, where researchers sought to understand why contextual embeddings transferred so effectively across tasks. Studies using probes revealed that different layers of these models encode qualitatively different linguistic properties — lower layers capturing surface-level features and higher layers encoding more abstract semantic content. This layered structure of learned representations was not obvious from model architecture alone, and probing provided a tractable empirical window into it.

Despite its utility, probing has important limitations. A probe's success does not necessarily mean the target model actively uses that information during inference — it only demonstrates that the information is recoverable from the representations. Critics have also noted that a sufficiently expressive probe can extract information that is only weakly or incidentally encoded, inflating apparent interpretability. These concerns have spurred refinements such as minimum description length probes and control tasks, which help calibrate how much a probe's accuracy reflects genuine encoding versus probe-side learning. Probing remains a widely used tool in mechanistic interpretability, model evaluation, and the study of transfer learning.
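One of the refinements mentioned above, the control task, can be sketched by training the same probe twice: once on the real labels and once on randomly permuted labels. The gap between the two accuracies (sometimes called selectivity) estimates how much of the probe's success reflects genuine encoding rather than probe-side learning. The setup below reuses the same hypothetical synthetic-data construction as before and is an illustration of the idea, not any paper's exact protocol.

```python
import numpy as np

rng = np.random.default_rng(1)

def train_linear_probe(H, y, lr=0.5, steps=300):
    """Train a logistic-regression probe on fixed representations H
    and return its accuracy on the same labels."""
    w, b = np.zeros(H.shape[1]), 0.0
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(H @ w + b)))
        w -= lr * H.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return np.mean(((H @ w + b) > 0) == (y > 0.5))

# Frozen representations that (indirectly) encode the sign of one feature.
X = rng.normal(size=(2000, 10))
H = np.tanh(X @ (rng.normal(size=(10, 32)) * 0.3))
y_real = (X[:, 0] > 0).astype(float)

# Control task: same inputs, randomly shuffled labels. A probe can only
# score well here by memorizing, not by reading out encoded structure.
y_control = rng.permutation(y_real)

acc_real = train_linear_probe(H, y_real)
acc_control = train_linear_probe(H, y_control)
selectivity = acc_real - acc_control  # large gap -> genuine encoding
```

If `acc_control` were nearly as high as `acc_real`, the probe's apparent success would say more about the probe's expressiveness than about what the model learned.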

Related

Mechanistic Interpretability
Reverse-engineering neural networks to understand the causal mechanisms behind their outputs.
Generality: 527

Unembedding
The linear projection that converts a transformer's internal representations back into vocabulary predictions.
Generality: 450

NLD (Neural Lie Detectors)
AI systems that detect deception or inconsistencies in the outputs of other AI models.
Generality: 102

Black Box Problem
The challenge of understanding why and how ML models reach their decisions.
Generality: 792

Ablation
Systematically removing model components to measure their individual contribution to performance.
Generality: 700

Capability Elucidation
Systematic methods to reveal what tasks and latent abilities an AI system possesses.
Generality: 493