Envisioning is an emerging technology research institute and advisory.


Occam's Razor

Prefer the simplest model that adequately explains the data.

Year: 1990 · Generality: 792

Occam's Razor is a philosophical principle holding that, among competing explanations of equal predictive power, the simplest one should be preferred. In machine learning, this translates directly into a bias toward models with fewer parameters, fewer assumptions, and lower structural complexity. Rather than a hard rule, it serves as a guiding heuristic for model selection — a reminder that unnecessary complexity is a liability, not an asset.

The practical motivation for this principle in ML is the problem of overfitting. A sufficiently complex model can memorize training data, capturing noise rather than the true underlying pattern, and consequently fail to generalize to new examples. Simpler models, by contrast, are less likely to fit spurious structure and tend to perform more reliably on held-out data. This intuition is formalized in several theoretical frameworks, including the Vapnik-Chervonenkis (VC) dimension, which quantifies model capacity and its relationship to generalization error, and minimum description length (MDL), which frames learning as compression — the best model is the one that most compactly encodes both the data and the model itself.
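This overfitting dynamic is easy to reproduce. The sketch below (an illustrative setup, not drawn from this entry) fits polynomials of increasing degree to noisy samples of a sine curve using NumPy: training error falls monotonically as capacity grows, while the too-simple line underfits and the high-degree fit chases noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: noisy samples of a simple underlying function.
x_train = np.linspace(0, 1, 15)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.3, size=x_train.shape)
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)  # noiseless held-out targets

def mse(degree):
    # Fit a polynomial of the given degree; report train and test MSE.
    coeffs = np.polyfit(x_train, y_train, degree)
    train = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train, test

for d in (1, 3, 9):
    train, test = mse(d)
    print(f"degree {d}: train MSE {train:.3f}, test MSE {test:.3f}")
```

Because higher-degree polynomial families contain the lower-degree ones, training error can only decrease with degree; the question Occam's Razor answers is whether that extra capacity buys anything on unseen data.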

Occam's Razor also underpins many regularization techniques used throughout modern machine learning. L1 and L2 regularization, for instance, penalize model complexity by adding terms to the loss function that discourage large parameter values, effectively enforcing parsimony during training. Bayesian model selection operationalizes the same idea through prior distributions that favor simpler hypotheses, with more complex models required to earn their complexity through substantially better likelihood.
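As a minimal illustration of an L2 penalty (a hypothetical ridge-regression setup, not taken from this entry), the closed-form solution below adds a term lam * ||w||^2 to the least-squares loss; increasing lam shrinks the learned weight vector toward zero, enforcing exactly the parsimony described above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 20 samples, 10 features, only 2 truly informative.
X = rng.normal(size=(20, 10))
true_w = np.zeros(10)
true_w[:2] = [3.0, -2.0]
y = X @ true_w + rng.normal(0, 0.1, size=20)

def ridge(X, y, lam):
    # Minimize ||Xw - y||^2 + lam * ||w||^2 via the closed-form
    # normal equations: w = (X^T X + lam I)^{-1} X^T y.
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

w_plain = ridge(X, y, 0.0)   # ordinary least squares
w_reg = ridge(X, y, 10.0)    # L2-regularized fit

print("unregularized weight norm:", np.linalg.norm(w_plain))
print("regularized weight norm:  ", np.linalg.norm(w_reg))
```

The norm of the ridge solution is non-increasing in lam, so the regularized weights are always at least as "simple" (in the L2 sense) as the unregularized ones.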

Despite its intuitive appeal, Occam's Razor is not without nuance. The rise of large neural networks has complicated the picture: massively overparameterized models can generalize surprisingly well, a phenomenon that challenges classical notions of the complexity-generalization tradeoff. Modern research into implicit regularization, the lottery ticket hypothesis, and double descent has revealed that the relationship between model size and generalization is richer than the simple principle suggests. Occam's Razor remains a valuable default, but contemporary ML has shown that simplicity and performance do not always align as neatly as the principle implies.

Related


Simplicity Bias

The tendency of ML models to favor simpler patterns or hypotheses over complex ones.

Generality: 520
MDL (Minimum Description Length)

An information-theoretic principle selecting the model that most compresses data plus its own description.

Generality: 692
Bias-Variance Dilemma

The fundamental trade-off between model simplicity and sensitivity to training data.

Generality: 838
Bias-Variance Trade-off

The fundamental tension between model complexity and generalization that governs prediction error.

Generality: 875
Sparsity

A principle where models use mostly zero values to improve efficiency.

Generality: 752
Regularization

A technique that penalizes model complexity to prevent overfitting and improve generalization.

Generality: 876