SSL (Self-Supervised Learning)

A learning paradigm where models generate their own supervisory signal from unlabeled data.

Year: 2018 · Generality: 820

Self-supervised learning (SSL) is a machine learning paradigm in which a model learns useful data representations without relying on human-provided labels. Instead of external annotation, the training signal is derived from the data itself by constructing pretext tasks — artificially generated prediction problems where one part of the input is used to predict another. Examples include predicting masked words in a sentence, forecasting the next frame in a video, or identifying whether two image patches come from the same image. Because supervision emerges from the data's own structure, SSL can exploit vast quantities of unlabeled data that would otherwise be unusable in traditional supervised settings.
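As a concrete illustration, the sketch below shows a masked-token pretext task in the spirit of masked language modeling: random tokens are hidden and the model is trained to recover them, so the training labels come entirely from the unlabeled sequences themselves. The model architecture, vocabulary size, and masking rate are illustrative placeholders, not any particular library's defaults.

```python
# Minimal sketch of a masked-token pretext task: the "labels" are simply the
# tokens we hide, so no human annotation is required.
import torch
import torch.nn as nn

VOCAB_SIZE, EMBED_DIM, MASK_ID = 1000, 64, 0  # toy sizes; MASK_ID is a reserved token

class TinyMaskedModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(EMBED_DIM, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.head = nn.Linear(EMBED_DIM, VOCAB_SIZE)  # predicts the hidden token

    def forward(self, token_ids):
        return self.head(self.encoder(self.embed(token_ids)))

def masked_lm_step(model, token_ids, mask_prob=0.15):
    """One self-supervised step: hide random tokens, then predict them back."""
    mask = torch.rand(token_ids.shape) < mask_prob
    inputs = token_ids.masked_fill(mask, MASK_ID)   # corrupt the input
    logits = model(inputs)                          # (batch, seq, vocab)
    return nn.functional.cross_entropy(
        logits[mask], token_ids[mask]               # supervise only masked positions
    )

model = TinyMaskedModel()
batch = torch.randint(1, VOCAB_SIZE, (8, 32))       # unlabeled token sequences
loss = masked_lm_step(model, batch)
loss.backward()
```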

The mechanics of SSL typically involve two stages. First, a model is pretrained on a pretext task using large unlabeled datasets, forcing it to develop rich internal representations that capture meaningful structure in the data. Second, these learned representations are transferred to downstream tasks — often via fine-tuning on a small labeled dataset — where they consistently outperform models trained from scratch. Contrastive methods like SimCLR and MoCo, masked modeling approaches like BERT and MAE, and generative techniques like GPT all fall under the SSL umbrella, each with different inductive biases about what structure is worth learning.
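The contrastive branch of this family can be illustrated with a simplified SimCLR-style objective: two augmented views of the same example are pulled together in embedding space while all other pairs in the batch are pushed apart. The sketch below is a hedged approximation under toy assumptions; the encoder, the noise-based `augment` placeholder, and the hyperparameters are illustrative rather than a faithful reproduction of SimCLR's pipeline.

```python
import torch
import torch.nn.functional as F

def augment(x):
    # Placeholder augmentation: additive noise. Real SSL pipelines use
    # crops, flips, color jitter, and similar transformations.
    return x + 0.1 * torch.randn_like(x)

def nt_xent_loss(z1, z2, temperature=0.5):
    """Simplified NT-Xent: view pairs (z1[i], z2[i]) of the same example are
    positives; every other pair in the batch acts as a negative."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)   # (2N, D) unit-length embeddings
    sim = z @ z.T / temperature                   # pairwise cosine similarities
    sim.fill_diagonal_(float("-inf"))             # a view is never its own positive
    n = z1.shape[0]
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
    return F.cross_entropy(sim, targets)

# Stage 1: pretrain an encoder on unlabeled data with the contrastive loss.
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(784, 128))
x = torch.rand(16, 1, 28, 28)                     # a batch of unlabeled images
loss = nt_xent_loss(encoder(augment(x)), encoder(augment(x)))
loss.backward()
# Stage 2 (not shown): keep the pretrained encoder, attach a small task head,
# and fine-tune on a labeled downstream dataset.
```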

SSL has become one of the most consequential ideas in modern AI, underpinning virtually every major foundation model in natural language processing and computer vision. Its importance stems from a practical reality: labeled data is expensive and scarce, while unlabeled data is abundant. By closing this gap, SSL has enabled models of unprecedented capability — GPT-style language models, vision transformers, and multimodal systems — to be trained at scale. The paradigm has shifted the field's center of gravity away from task-specific supervised learning toward general-purpose pretraining, fundamentally changing how large models are built and deployed.

Related

Self-Supervised Pretraining
A technique where models learn rich representations from unlabeled data before fine-tuning on specific tasks.
Generality: 794

Semi-Supervised Learning
Training models using both small labeled datasets and large unlabeled datasets together.
Generality: 796

Contrastive Learning
A self-supervised technique that learns representations by comparing similar and dissimilar data pairs.
Generality: 694

Unsupervised Learning
Machine learning that discovers hidden patterns in data without labeled examples.
Generality: 850

Non-Contrastive Learning
Self-supervised representation learning that requires no negative example pairs.
Generality: 575

Supervision
Training ML models using labeled input-output pairs to guide learning.
Generality: 820