
Empirical Risk Minimization

A core ML principle that minimizes average training loss to learn model parameters.

Year: 1974 · Generality: 838

Empirical Risk Minimization (ERM) is a foundational framework in statistical learning theory that guides how models are trained. Because the true underlying data distribution is almost never known in practice, ERM substitutes the theoretical goal of minimizing expected loss — called the true risk — with the tractable goal of minimizing average loss over the available training samples. This empirical average, computed across a finite dataset, serves as a proxy for the true risk, and optimizing it yields model parameters that fit the observed data as well as possible under a chosen loss function.
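In standard statistical-learning notation (a sketch of the usual definitions, not drawn from this entry), the substitution looks like this:

```latex
% True risk: expected loss under the unknown data distribution D
R(h) = \mathbb{E}_{(x,y)\sim D}\left[\ell(h(x), y)\right]

% Empirical risk: average loss over the n training samples
\hat{R}_n(h) = \frac{1}{n}\sum_{i=1}^{n} \ell\big(h(x_i), y_i\big)

% ERM picks the hypothesis with the lowest empirical risk
\hat{h} = \arg\min_{h \in \mathcal{H}} \hat{R}_n(h)
```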

The mechanics of ERM are straightforward: given a hypothesis class (a set of candidate models), a loss function measuring prediction error, and a training dataset, the learner selects the hypothesis that achieves the lowest average loss on that data. This process underpins a vast range of algorithms — from linear regression and logistic regression to more complex neural network training — wherever gradient-based or combinatorial optimization is used to fit parameters to data. The choice of loss function (squared error, cross-entropy, hinge loss, etc.) shapes what the minimization procedure rewards and penalizes.
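As an illustration, here is a minimal sketch of ERM for linear regression with squared-error loss, fit by gradient descent (the function names and toy data are hypothetical, chosen for this example):

```python
import numpy as np

def empirical_risk(w, X, y):
    """Average squared-error loss of a linear model over the training set."""
    return np.mean((X @ w - y) ** 2)

def erm_linear_regression(X, y, lr=0.1, steps=1000):
    """Minimize the empirical risk over the hypothesis class of linear models."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        # Gradient of the mean squared error: (2/n) * X^T (Xw - y)
        grad = (2.0 / n) * X.T @ (X @ w - y)
        w -= lr * grad
    return w

# Toy data: y = 3x plus noise, with a bias column prepended to X.
rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.uniform(-1, 1, 100)]
y = 3.0 * X[:, 1] + 0.1 * rng.normal(size=100)

w = erm_linear_regression(X, y)
print("weights:", w, "empirical risk:", empirical_risk(w, X, y))
```

Swapping the squared error for cross-entropy or hinge loss changes what the same minimization loop rewards, which is exactly the role of the loss function described above.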

A central concern with ERM is the tension between fitting training data well and generalizing to unseen examples. Minimizing empirical risk too aggressively can lead to overfitting, where a model memorizes training noise rather than learning the true signal. Statistical learning theory, particularly the work of Vladimir Vapnik and Alexey Chervonenkis in the 1970s, formalized this tension through concepts like VC dimension and generalization bounds, showing that ERM produces consistent estimators — converging to the true risk minimizer — when the hypothesis class is sufficiently constrained relative to sample size.
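One classical form of the resulting guarantee, stated here from standard textbook treatments rather than from this entry: for a hypothesis class of VC dimension d and n training samples, with probability at least 1 − η,

```latex
R(h) \;\le\; \hat{R}_n(h) + \sqrt{\frac{d\left(\ln\frac{2n}{d} + 1\right) + \ln\frac{4}{\eta}}{n}}
```

The bound tightens as n grows relative to d, which is the formal sense in which a constrained hypothesis class makes empirical risk a trustworthy proxy for true risk.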

ERM remains the conceptual backbone of modern machine learning. Regularization techniques such as L1/L2 penalties, dropout, and early stopping can all be understood as modifications to pure ERM that penalize complexity to improve generalization. Understanding ERM is therefore essential for diagnosing model behavior, designing training objectives, and reasoning about when learned models will perform reliably in deployment.
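In this view, regularized training minimizes the empirical risk plus a complexity penalty; a common way to write this (standard notation, not taken from this entry) is:

```latex
\hat{h} = \arg\min_{h \in \mathcal{H}} \left[\frac{1}{n}\sum_{i=1}^{n} \ell(h(x_i), y_i) + \lambda\,\Omega(h)\right]
```

Here Ω(h) measures model complexity (for example, the squared L2 norm of the weights) and λ trades off data fit against simplicity; setting λ = 0 recovers pure ERM.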

Related

Minimax Loss

An optimization strategy that minimizes the worst-case maximum loss an adversary can cause.

Generality: 520

Loss Optimization

Iteratively adjusting model parameters to minimize prediction error measured by a loss function.

Generality: 875

Regularization

A technique that penalizes model complexity to prevent overfitting and improve generalization.

Generality: 876

Prediction Error

The gap between a model's predicted values and the actual observed outcomes.

Generality: 875

MLE (Maximum Likelihood Estimation)

A parameter estimation method that finds values making observed data most probable.

Generality: 875

Loss Function

A mathematical measure of error that guides model training toward better predictions.

Generality: 909