
Envisioning is an emerging technology research institute and advisory.


2011 — 2026


MDL (Minimum Description Length)

An information-theoretic principle selecting the model that most compresses data plus its own description.

Year: 1978
Generality: 692

Minimum Description Length (MDL) is a model selection principle grounded in information theory that operationalizes Occam's Razor: the best explanation for a dataset is the one that produces the shortest combined description of both the model and the data encoded under that model. Formally, given a set of candidate models, MDL selects the model H that minimizes L(H) + L(D|H), where L(H) is the code length needed to describe the hypothesis and L(D|H) is the code length needed to describe the data given that hypothesis. This two-part framing forces a principled trade-off — a more complex model may fit the data better and reduce L(D|H), but its increased complexity raises L(H), so only genuinely useful structure earns its representational cost.
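The L(H) + L(D|H) trade-off can be sketched numerically. The following is a minimal, hedged illustration, not the only valid MDL coding scheme: it assumes a crude two-part code in which each real-valued parameter costs ½·log₂(n) bits (a common textbook choice) and residuals are charged the Gaussian code length under the maximum-likelihood noise variance. The function name `two_part_mdl` is hypothetical.

```python
import numpy as np

def two_part_mdl(x, y, max_degree=8):
    """Score polynomial models by a crude two-part code length (in bits).

    L(H): (d + 1) / 2 * log2(n) bits for d + 1 coefficients
    (a coarse but standard parameter encoding).
    L(D|H): n / 2 * log2(2 * pi * e * sigma^2) bits, the Gaussian
    code length for residuals with ML variance sigma^2.
    """
    n = len(x)
    scores = {}
    for d in range(max_degree + 1):
        coeffs = np.polyfit(x, y, d)
        resid = y - np.polyval(coeffs, x)
        sigma2 = max(np.mean(resid ** 2), 1e-12)  # guard against a perfect fit
        l_data = 0.5 * n * np.log2(2 * np.pi * np.e * sigma2)
        l_model = 0.5 * (d + 1) * np.log2(n)
        scores[d] = l_model + l_data
    return min(scores, key=scores.get), scores

# Data generated by a quadratic plus noise: higher-degree fits shrink
# L(D|H) slightly, but not enough to repay their extra L(H).
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 200)
y = 1 + 2 * x - 3 * x ** 2 + rng.normal(0, 0.5, size=x.size)
best_degree, scores = two_part_mdl(x, y)
print(best_degree)
```

Here the criterion coincides with BIC up to scaling; refined MDL codes (such as NML, discussed below) replace the ½·log₂(n)-per-parameter cost with a more exact complexity term.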

MDL connects deeply to several foundational ideas in machine learning and statistics. It has strong ties to Bayesian inference, where minimizing description length corresponds to maximizing a penalized posterior probability, and to Kolmogorov complexity, which provides the theoretical ideal of the shortest possible description of any object. In practice, exact Kolmogorov complexity is uncomputable, so MDL implementations use tractable surrogate code lengths derived from probabilistic models, leading to practical variants such as the Normalized Maximum Likelihood (NML) and prequential (predictive sequential) formulations developed through the 1980s and 1990s.
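The Bayesian correspondence can be made concrete in a toy setting. In the sketch below (illustrative names and numbers only), two coin hypotheses are compared: because L(H) = −log₂ P(H) and L(D|H) = −log₂ P(D|H), minimizing total code length is exactly MAP hypothesis selection.

```python
import math

# Two candidate coins: H_fair (p = 0.5) and H_biased (p = 0.8),
# with a uniform prior. Data: 8 heads out of 10 flips.
data_heads, n = 8, 10
hypotheses = {"fair": 0.5, "biased": 0.8}
prior = {"fair": 0.5, "biased": 0.5}

def code_length(name):
    """Total two-part code length L(H) + L(D|H) in bits."""
    p = hypotheses[name]
    log_lik = (data_heads * math.log2(p)
               + (n - data_heads) * math.log2(1 - p))
    return -math.log2(prior[name]) - log_lik

best = min(hypotheses, key=code_length)
print(best)  # with 8/10 heads, the biased coin yields the shorter code
```

The fair coin costs 1 + 10 = 11 bits (1 bit of prior, 1 bit per flip), while the biased coin costs about 8.2 bits, so MDL and MAP agree on the biased hypothesis.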

In machine learning, MDL provides a theoretically motivated alternative to heuristics such as L1/L2 regularization penalties or held-out validation for model selection. It has been applied to decision tree pruning, neural architecture selection, feature selection, and clustering, offering a unified lens through which model complexity and generalization are jointly managed. Because MDL penalizes models that encode noise rather than signal, it naturally guards against overfitting without requiring a separate validation set. Its influence extends into modern deep learning discussions around compression-based generalization bounds, making it a durable and increasingly relevant framework as practitioners seek principled explanations for why large models generalize.
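As a small illustration of validation-free complexity control, the sketch below (assumed coding choices, hypothetical function name `mdl_bins`) uses two-part MDL to pick a histogram bin count: uniform data earns no compression from extra bins, while clearly bimodal data does.

```python
import numpy as np

def mdl_bins(x, max_bins=32):
    """Pick a histogram bin count on [0, 1] by a two-part MDL score.

    L(D|H): -sum_k c_k * log2(K * c_k / n) bits, the code length of the
    data under the ML histogram density (constant quantization term dropped).
    L(H): (K - 1) / 2 * log2(n) bits for the K - 1 free bin probabilities.
    """
    n = len(x)
    best_k, best_score = 1, float("inf")
    for k in range(1, max_bins + 1):
        counts, _ = np.histogram(x, bins=k, range=(0.0, 1.0))
        nz = counts[counts > 0]  # empty bins contribute zero code length
        l_data = -np.sum(nz * np.log2(k * nz / n))
        l_model = 0.5 * (k - 1) * np.log2(n)
        if l_data + l_model < best_score:
            best_k, best_score = k, l_data + l_model
    return best_k

rng = np.random.default_rng(1)
uniform = rng.uniform(0, 1, 1000)
bimodal = np.clip(np.concatenate([rng.normal(0.25, 0.03, 500),
                                  rng.normal(0.75, 0.03, 500)]), 0, 1)
print(mdl_bins(uniform), mdl_bins(bimodal))
```

No held-out data is used: the extra bins for the uniform sample fit only sampling noise, so their savings in L(D|H) never repay the added L(H).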

Related

Occam's Razor

Prefer the simplest model that adequately explains the data.

Generality: 792
MLE (Maximum Likelihood Estimation)

A parameter estimation method that finds values making observed data most probable.

Generality: 875
Kolmogorov Complexity

The length of the shortest program that produces a given string as output.

Generality: 760
MRL (Matryoshka Representation Learning)

A technique that encodes information at multiple granularities within a single embedding vector.

Generality: 293
Model Compression

Techniques that shrink machine learning models while preserving predictive accuracy.

Generality: 795
Solomonoff Induction

A universal Bayesian framework for prediction grounded in algorithmic information theory.

Generality: 678