Envisioning is an emerging technology research institute and advisory.


MTL (Multi-Task Learning)

Training a single model simultaneously on multiple related tasks to improve generalization.

Year: 1997 · Generality: 796

Multi-Task Learning (MTL) is a machine learning paradigm in which a single model is trained to perform multiple tasks simultaneously, rather than training separate models for each task in isolation. The core intuition is that related tasks share underlying structure — common features, representations, or inductive biases — and that learning them jointly allows the model to exploit these relationships. By sharing parameters or intermediate representations across tasks, the model receives richer training signal and is less likely to overfit to the noise of any single task. This makes MTL particularly valuable when labeled data for individual tasks is scarce, since supervision from one task can effectively regularize learning on another.

In practice, MTL architectures typically feature a shared backbone that learns common representations, alongside task-specific output heads that specialize for each objective. The degree of sharing can vary: hard parameter sharing ties the same weights across all tasks, while soft parameter sharing allows separate parameters that are regularized to remain similar. In natural language processing, for example, a single transformer model might simultaneously learn named entity recognition, sentiment classification, and syntactic parsing — each task reinforcing the shared language representations. In computer vision, joint training on depth estimation, surface normal prediction, and semantic segmentation has been shown to improve performance on all three objectives compared to single-task baselines.
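The hard-parameter-sharing layout described above can be sketched in a few lines of NumPy. The dimensions, task pairing (a scalar regression head and a 3-way classification head), and random initialization here are illustrative assumptions, not a reference implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration.
D_IN, D_SHARED = 8, 16

# Hard parameter sharing: one backbone weight matrix used by every task.
W_shared = rng.normal(0, 0.1, (D_IN, D_SHARED))

# Task-specific heads: task A regresses a scalar, task B classifies 3 ways.
W_head_a = rng.normal(0, 0.1, (D_SHARED, 1))
W_head_b = rng.normal(0, 0.1, (D_SHARED, 3))

def forward(x):
    h = np.maximum(x @ W_shared, 0.0)   # shared representation (ReLU)
    out_a = h @ W_head_a                # task A output (regression)
    logits_b = h @ W_head_b             # task B output (classification logits)
    return out_a, logits_b

x = rng.normal(size=(4, D_IN))          # a mini-batch of 4 examples
out_a, logits_b = forward(x)
print(out_a.shape, logits_b.shape)      # (4, 1) (4, 3)
```

Because every gradient from either task's loss flows through `W_shared`, each task's supervision shapes the representation the other task consumes — which is exactly the sharing (and, when tasks conflict, the interference) discussed here.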

The challenge in MTL lies in managing task relationships carefully. Not all tasks benefit equally from joint training — when tasks conflict or require incompatible representations, naive sharing can lead to negative transfer, where performance on one task degrades due to interference from another. Researchers have developed techniques such as gradient surgery, task weighting, and learned routing mechanisms to mitigate this. Selecting which tasks to train together, and how to balance their losses, remains an active area of research.
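As one concrete illustration, the projection step at the heart of PCGrad-style gradient surgery can be written directly: when two task gradients conflict (negative inner product), one is projected onto the normal plane of the other so it no longer opposes it. This is a minimal sketch of that single step, not the full algorithm, which also shuffles task order and handles many tasks:

```python
import numpy as np

def project_conflict(g_i, g_j):
    """PCGrad-style gradient surgery: if task gradients g_i and g_j
    conflict (negative inner product), remove from g_i its component
    along g_j; otherwise return g_i unchanged."""
    dot = float(g_i @ g_j)
    if dot < 0:
        g_i = g_i - (dot / float(g_j @ g_j)) * g_j
    return g_i

g_a = np.array([1.0, 0.0])    # gradient from task A
g_b = np.array([-1.0, 1.0])   # gradient from task B (conflicts with A)

g_a_fixed = project_conflict(g_a, g_b)
print(g_a_fixed)               # [0.5 0.5]
print(float(g_a_fixed @ g_b))  # 0.0 -- no longer opposes task B
```

Task weighting attacks the same interference problem differently, by rescaling per-task losses rather than editing gradients; both leave open the harder question of which tasks belong together in the first place.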

MTL has become foundational in modern large-scale models. Systems like GPT and T5 are trained on diverse objectives that can be viewed through an MTL lens, and instruction-tuned models explicitly optimize across hundreds of tasks simultaneously. The paradigm bridges the gap between narrow specialist models and general-purpose systems, making it central to the pursuit of more capable and data-efficient AI.

Related

Meta-Learning
A paradigm enabling models to learn how to learn across tasks efficiently.
Generality: 756

Transfer Learning
Reusing a model trained on one task to accelerate learning on another.
Generality: 820

MLLMs (Multimodal Large Language Models)
AI systems that understand and generate content across text, images, audio, and more.
Generality: 794

MoT (Mixture of Transformers)
An architecture combining multiple specialized transformers to capture richer, more diverse representations.
Generality: 337

Multimodal
AI systems that process and integrate multiple data types like text, images, and audio.
Generality: 796

Multi-Token Prediction
A generation strategy where language models predict multiple output tokens simultaneously.
Generality: 380
Generality: 380