Envisioning is an emerging technology research institute and advisory.


Semi-Supervised Learning

Training models on a small labeled dataset together with a large pool of unlabeled data.

Year: 2006 · Generality: 796

Semi-supervised learning is a machine learning paradigm that trains models using a combination of a small amount of labeled data and a much larger pool of unlabeled data. It occupies the space between supervised learning, which requires fully labeled datasets, and unsupervised learning, which operates without any labels at all. This approach is especially valuable in domains where labeling data is costly, slow, or requires specialized expertise — such as medical imaging, speech recognition, or natural language processing — while raw unlabeled data is abundant and cheap to collect.

The core assumption underlying most semi-supervised methods is that the structure of the unlabeled data carries meaningful information about the underlying data distribution, which can guide the learning process. Several techniques exploit this idea in different ways. Self-training iteratively uses a model's own high-confidence predictions on unlabeled examples as pseudo-labels, expanding the effective training set over successive rounds. Co-training trains two models on different feature views of the data, allowing each to label examples for the other. Graph-based methods construct a similarity graph over all data points and propagate labels from labeled nodes to unlabeled ones. More recently, consistency regularization methods — used in approaches like MixMatch, FixMatch, and Unsupervised Data Augmentation (UDA) — enforce that a model's predictions remain stable under data augmentation, leveraging unlabeled data to improve generalization.
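Self-training, the first of these techniques, can be sketched in a few lines. The snippet below is a minimal illustration rather than a production method: it assumes a toy nearest-centroid classifier (chosen only to keep the example self-contained; in practice any classifier that outputs confidences works), and each round it promotes predictions above a confidence threshold to pseudo-labels.

```python
import numpy as np

def self_train(X_lab, y_lab, X_unlab, threshold=0.9, rounds=5):
    """Minimal self-training sketch: grow the labeled pool with
    high-confidence pseudo-labels over successive rounds."""
    X_lab, y_lab = X_lab.copy(), y_lab.copy()
    X_unlab = X_unlab.copy()
    for _ in range(rounds):
        if len(X_unlab) == 0:
            break
        # "Fit": one centroid per class from the current labeled pool
        # (a stand-in for retraining a real model each round).
        classes = np.unique(y_lab)
        centroids = np.stack([X_lab[y_lab == c].mean(axis=0) for c in classes])
        # "Predict": softmax over negative distances as a confidence proxy.
        d = np.linalg.norm(X_unlab[:, None, :] - centroids[None, :, :], axis=2)
        p = np.exp(-d) / np.exp(-d).sum(axis=1, keepdims=True)
        conf, pred = p.max(axis=1), classes[p.argmax(axis=1)]
        keep = conf >= threshold
        if not keep.any():
            break
        # Promote confident predictions to pseudo-labels and shrink
        # the unlabeled pool accordingly.
        X_lab = np.vstack([X_lab, X_unlab[keep]])
        y_lab = np.concatenate([y_lab, pred[keep]])
        X_unlab = X_unlab[~keep]
    return X_lab, y_lab
```

With one labeled seed point per cluster and forty unlabeled points drawn around two well-separated centers, a single round typically absorbs the entire unlabeled pool; the confidence threshold is what guards against propagating early mistakes, which is the main failure mode of self-training.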

The practical impact of semi-supervised learning has grown substantially with deep learning. Modern methods can approach the performance of fully supervised models using only a fraction of the labeled examples, dramatically reducing annotation costs. This is particularly significant in fields like healthcare, where expert annotation is scarce and expensive, or in low-resource languages where labeled corpora are limited.

Semi-supervised learning also connects closely to related paradigms such as self-supervised learning, transfer learning, and active learning, all of which address the challenge of learning effectively when labeled data is scarce. As datasets grow larger and labeling bottlenecks persist, semi-supervised techniques remain a critical tool for building capable models under real-world constraints.

Related

SSL (Self-Supervised Learning)
A learning paradigm where models generate their own supervisory signal from unlabeled data.
Generality: 820

Supervised Learning
Training models on labeled input-output pairs to predict or classify new data.
Generality: 900

Supervision
Training ML models using labeled input-output pairs to guide learning.
Generality: 820

Scaled Supervision Method
An AI training approach that improves model performance through large-scale, high-quality labeled data.
Generality: 337

Unsupervised Learning
Machine learning that discovers hidden patterns in data without labeled examples.
Generality: 850

Self-Supervised Pretraining
A technique where models learn rich representations from unlabeled data before fine-tuning on specific tasks.
Generality: 794