
Self-Supervised Pretraining

A technique where models learn rich representations from unlabeled data before fine-tuning on specific tasks.

Year: 2018 · Generality: 794

Self-supervised pretraining is a machine learning paradigm in which a model is trained on a large corpus of unlabeled data by solving pretext tasks derived from the data's own structure. Rather than requiring human-annotated labels, the training signal is generated automatically — for example, by masking tokens in a sentence and predicting them, predicting the next word in a sequence, or reconstructing a corrupted image patch. These pretext tasks force the model to develop deep, generalizable internal representations that capture semantic and structural patterns in the data. The pretrained model is then fine-tuned on a downstream task using a comparatively small labeled dataset, dramatically reducing the annotation burden while achieving strong performance.
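To make the pretext-task idea concrete, the sketch below implements a toy masked-token objective in PyTorch: a random fraction of tokens is replaced with a mask symbol and the model is trained to recover the originals, so the loss is derived entirely from the unlabeled data itself. All names and dimensions here (TinyEncoder, VOCAB_SIZE, the 15% mask rate) are illustrative assumptions, not a reference implementation.

```python
import torch
import torch.nn as nn

# Toy setup: vocabulary size, mask token id, and model width are
# placeholder assumptions for illustration only.
VOCAB_SIZE, MASK_ID, D_MODEL = 1000, 0, 64

class TinyEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, VOCAB_SIZE)  # predicts original tokens

    def forward(self, ids):
        return self.head(self.encoder(self.embed(ids)))

def mask_tokens(ids, mask_rate=0.15):
    # Replace ~15% of tokens with MASK_ID; the originals become the targets.
    mask = torch.rand(ids.shape) < mask_rate
    corrupted = ids.clone()
    corrupted[mask] = MASK_ID
    targets = ids.clone()
    targets[~mask] = -100  # positions ignored by the cross-entropy loss
    return corrupted, targets

model = TinyEncoder()
ids = torch.randint(1, VOCAB_SIZE, (8, 32))   # a batch of unlabeled token ids
corrupted, targets = mask_tokens(ids)
logits = model(corrupted)
loss = nn.functional.cross_entropy(
    logits.view(-1, VOCAB_SIZE), targets.view(-1), ignore_index=-100
)
loss.backward()  # training signal generated without any human labels
```

After pretraining, the same encoder would typically be reused with a small task-specific head and fine-tuned on the labeled downstream dataset.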

The mechanics differ across modalities but share a common principle: exploit the redundancy and structure inherent in raw data to create a self-supervised training objective. In natural language processing, models like BERT use masked language modeling, while GPT-style models use autoregressive next-token prediction. In computer vision, contrastive methods such as SimCLR train encoders to produce similar representations for different augmented views of the same image, while masked image modeling approaches like MAE reconstruct missing pixel regions. Each strategy encourages the model to learn features that are broadly useful rather than narrowly fitted to a single task.
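As one concrete instance of the contrastive strategy, here is a minimal sketch of a SimCLR-style NT-Xent loss in PyTorch. The image encoder and augmentation pipeline are omitted; the random stand-in embeddings, batch size, and temperature are placeholder assumptions.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """Contrastive loss over two augmented views of the same batch.

    z1, z2: (N, D) embeddings of two augmentations of the same N images.
    Each pair (z1[i], z2[i]) is a positive; all other rows act as negatives.
    """
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, D), unit norm
    sim = z @ z.T / temperature                         # scaled cosine similarity
    sim.fill_diagonal_(float("-inf"))                   # exclude self-similarity
    n = z1.size(0)
    # The positive for row i is row i+n, and vice versa.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Usage with stand-in embeddings (normally produced by an encoder
# applied to two augmented views of each image):
z1, z2 = torch.randn(16, 128), torch.randn(16, 128)
loss = nt_xent_loss(z1, z2)
```

The same objective structure underlies masked image modeling only loosely; approaches like MAE instead use a reconstruction loss over the hidden pixel regions, but both reward representations that generalize beyond any single task.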

Self-supervised pretraining has fundamentally reshaped modern AI development by making it practical to leverage the enormous quantities of unlabeled text, images, and audio available on the internet. The resulting pretrained models — often called foundation models — serve as powerful starting points for a wide range of downstream applications, from document classification and machine translation to object detection and protein structure prediction. This paradigm has also narrowed the gap between supervised and unsupervised learning, and continues to drive state-of-the-art results across virtually every major benchmark in NLP, vision, and multimodal AI.

Related

SSL (Self-Supervised Learning)
A learning paradigm where models generate their own supervisory signal from unlabeled data.
Generality: 820

Pretrained Model
A model trained on large data, reused or fine-tuned for new tasks.
Generality: 838

Continual Pre-Training
Incrementally updating a pre-trained model on new data while preserving prior knowledge.
Generality: 575

Contrastive Learning
A self-supervised technique that learns representations by comparing similar and dissimilar data pairs.
Generality: 694

MLM (Masked Language Modeling)
A pre-training objective where models learn to predict randomly hidden tokens using bidirectional context.
Generality: 694

Foundation Model
A large pre-trained model adaptable to many tasks without retraining from scratch.
Generality: 838