A learning paradigm where models generate their own supervisory signal from unlabeled data.
Self-supervised learning (SSL) is a machine learning paradigm in which a model learns useful data representations without relying on human-provided labels. Instead of external annotation, the training signal is derived from the data itself by constructing pretext tasks: artificially generated prediction problems in which one part of the input is used to predict another. Examples include predicting masked words in a sentence, forecasting the next frame in a video, or identifying whether two image patches come from the same image. Because supervision emerges from the data's own structure, SSL can exploit vast quantities of unlabeled data that traditional supervised training cannot use.
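To make the idea concrete, here is a minimal Python sketch of a masked-prediction pretext task. The whitespace tokenizer, the `[MASK]` token, and the `make_pretext_example` helper are illustrative assumptions rather than any library's API; real systems such as BERT use subword vocabularies and mask roughly 15% of tokens.

```python
import random

MASK = "[MASK]"  # placeholder token; real tokenizers reserve a special id

def make_pretext_example(sentence: str, mask_prob: float = 0.15, seed: int = 0):
    """Turn one unlabeled sentence into a (masked input, targets) pair.

    The "labels" are simply the original tokens at the masked positions,
    so the supervisory signal comes entirely from the data itself.
    """
    rng = random.Random(seed)
    tokens = sentence.split()  # toy whitespace tokenizer (an assumption)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[i] = tok  # position -> original token the model must predict
        else:
            masked.append(tok)
    return masked, targets

inp, tgt = make_pretext_example("the cat sat on the mat", mask_prob=0.3)
print(inp)  # e.g. ['the', '[MASK]', 'sat', 'on', 'the', '[MASK]']
print(tgt)  # e.g. {1: 'cat', 5: 'mat'}
```

Note that no human annotation appears anywhere: the same raw corpus supplies both the inputs and the prediction targets.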
The mechanics of SSL typically involve two stages. First, a model is pretrained on a pretext task using large unlabeled datasets, forcing it to develop rich internal representations that capture meaningful structure in the data. Second, these learned representations are transferred to downstream tasks, often via fine-tuning on a small labeled dataset, where they frequently outperform models trained from scratch, particularly when labeled examples are scarce. Contrastive methods like SimCLR and MoCo, masked modeling approaches like BERT and MAE, and generative techniques like GPT all fall under the SSL umbrella, each with different inductive biases about what structure is worth learning.
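As an illustration of the contrastive family, the sketch below implements an NT-Xent-style loss of the kind used by SimCLR, in plain NumPy. The function name, batch shapes, and temperature default are assumptions chosen for readability, not SimCLR's actual implementation.

```python
import numpy as np

def nt_xent_loss(z1: np.ndarray, z2: np.ndarray, tau: float = 0.5) -> float:
    """NT-Xent (normalized temperature-scaled cross-entropy) loss.

    z1 and z2 are (N, d) embeddings of two augmented views of the same
    N inputs; each (z1[i], z2[i]) pair is a positive, and every other
    pair in the batch serves as a negative.
    """
    z = np.concatenate([z1, z2], axis=0)              # (2N, d) anchors
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalize rows
    sim = (z @ z.T) / tau                             # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                    # exclude self-similarity
    n = len(z1)
    # For anchor i < n the positive sits at i + n, and vice versa.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # Cross-entropy of each anchor's positive against all 2N - 1 candidates.
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    return float(-(sim[np.arange(2 * n), pos] - logsumexp).mean())

rng = np.random.default_rng(0)
z1 = rng.normal(size=(8, 32))
z2 = rng.normal(size=(8, 32))
print(nt_xent_loss(z1, z2))  # random inputs: loss near log(2N - 1) ≈ 2.7
```

After pretraining with an objective like this, the encoder that produced `z1` and `z2` is kept, and a small task-specific head is fine-tuned on labeled data, which is the transfer step described above.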
SSL has become one of the most consequential ideas in modern AI, underpinning virtually every major foundation model in natural language processing and computer vision. Its importance stems from a practical reality: labeled data is expensive and scarce, while unlabeled data is abundant. By turning that abundant unlabeled data into a training signal, SSL has enabled models of unprecedented capability, including GPT-style language models, vision transformers, and multimodal systems, to be trained at scale. The paradigm has shifted the field's center of gravity away from task-specific supervised learning toward general-purpose pretraining, fundamentally changing how large models are built and deployed.