A core machine-learning principle: learn model parameters by minimizing the average loss over the training data.
Empirical Risk Minimization (ERM) is a foundational framework in statistical learning theory that guides how models are trained. Because the true underlying data distribution is almost never known in practice, ERM substitutes the theoretical goal of minimizing expected loss — called the true risk — with the tractable goal of minimizing average loss over the available training samples. This empirical average, computed across a finite dataset, serves as a proxy for the true risk, and optimizing it yields model parameters that fit the observed data as well as possible under a chosen loss function.
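The substitution can be made concrete in a few lines. The following is a minimal sketch, not a library implementation; the names `empirical_risk` and `squared_error` and the toy dataset are illustrative assumptions:

```python
# Hypothetical sketch: empirical risk is the average of a per-example loss
# over a finite dataset, standing in for the unknowable true (expected) risk.

def squared_error(y_true, y_pred):
    # One common choice of loss function; others (cross-entropy, hinge) plug in the same way.
    return (y_true - y_pred) ** 2

def empirical_risk(model, data, loss=squared_error):
    """Average loss of `model` over the finite sample `data` (list of (x, y) pairs)."""
    return sum(loss(y, model(x)) for x, y in data) / len(data)

# A candidate hypothesis: predict y = 2x, evaluated on a tiny toy sample.
h = lambda x: 2 * x
data = [(1, 2.1), (2, 3.9), (3, 6.2)]
risk = empirical_risk(h, data)
```

As the sample grows, this average tends (under the usual i.i.d. assumptions) toward the expected loss it approximates.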
The mechanics of ERM are straightforward: given a hypothesis class (a set of candidate models), a loss function measuring prediction error, and a training dataset, the learner selects the hypothesis that achieves the lowest average loss on that data. This process underpins a vast range of algorithms — from linear regression and logistic regression to more complex neural network training — wherever gradient-based or combinatorial optimization is used to fit parameters to data. The choice of loss function (squared error, cross-entropy, hinge loss, etc.) shapes what the minimization procedure rewards and penalizes.
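The selection step can be sketched directly for a deliberately tiny hypothesis class. Here the class is lines through the origin, searched over a finite grid of slopes; real training replaces the grid search with gradient-based optimization, and the dataset and grid resolution below are illustrative assumptions:

```python
# Sketch: ERM over the hypothesis class {x -> w*x} by direct search.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]  # toy sample of a roughly linear relation

def emp_risk(w):
    # Average squared-error loss of the hypothesis x -> w*x on the training data.
    return sum((y - w * x) ** 2 for x, y in data) / len(data)

# Finite "hypothesis class": slopes w in [0, 4] at resolution 0.01.
candidates = [w / 100 for w in range(0, 401)]

# ERM selects the hypothesis with the lowest average training loss.
w_star = min(candidates, key=emp_risk)
```

The selected slope lands near 2, the value that best explains the sample under squared error; swapping in a different loss would change which hypothesis the same procedure picks.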
A central concern with ERM is the tension between fitting training data well and generalizing to unseen examples. Minimizing empirical risk too aggressively can lead to overfitting, where a model memorizes training noise rather than learning the true signal. Statistical learning theory, particularly the work of Vladimir Vapnik and Alexey Chervonenkis in the 1970s, formalized this tension through concepts like VC dimension and generalization bounds, showing that ERM produces consistent estimators — converging to the true risk minimizer — when the hypothesis class is sufficiently constrained relative to sample size.
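The overfitting failure mode can be made vivid with an extreme, unconstrained hypothesis: a lookup table that memorizes the training set. This is a contrived sketch (the dataset and the fallback prediction of 0.0 are illustrative assumptions), showing why driving empirical risk to zero proves nothing about generalization:

```python
# Sketch: a memorizing hypothesis achieves zero empirical risk
# without learning anything that transfers to unseen inputs.
train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # noisy samples of roughly y = 2x
lookup = dict(train)

def memorizing_model(x):
    # Perfect recall on training points; an arbitrary constant (0.0) everywhere else.
    return lookup.get(x, 0.0)

# Empirical risk (average squared error) on the training set is exactly zero...
train_risk = sum((y - memorizing_model(x)) ** 2 for x, y in train) / len(train)

# ...yet the model is useless on any point outside the training set.
unseen_prediction = memorizing_model(4.0)
```

Constraining the hypothesis class (e.g. to low-dimensional linear models) rules out such memorizers, which is exactly the role VC-style capacity control plays in the consistency guarantees.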
ERM remains the conceptual backbone of modern machine learning. Regularization techniques such as L1/L2 penalties, dropout, and early stopping can all be understood as modifications to pure ERM that penalize complexity to improve generalization. Understanding ERM is therefore essential for diagnosing model behavior, designing training objectives, and reasoning about when learned models will perform reliably in deployment.
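The "regularization as modified ERM" view can be sketched by adding an L2 penalty to the empirical objective from the earlier examples. The dataset, the penalty weight `lam`, and the grid search are illustrative assumptions; setting `lam=0` recovers plain ERM:

```python
# Sketch: pure ERM vs L2-regularized ERM over the hypothesis class {x -> w*x}.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]

def objective(w, lam=0.0):
    # Empirical risk (average squared error) plus an L2 complexity penalty.
    emp = sum((y - w * x) ** 2 for x, y in data) / len(data)
    return emp + lam * w ** 2  # lam = 0.0 is plain ERM

grid = [w / 1000 for w in range(0, 4001)]  # slopes in [0, 4] at resolution 0.001
w_erm = min(grid, key=lambda w: objective(w, lam=0.0))
w_reg = min(grid, key=lambda w: objective(w, lam=0.5))
# The penalty pulls the regularized minimizer toward zero, trading a little
# training fit for reduced complexity: w_reg < w_erm.
```

Dropout and early stopping admit the same reading less directly: each limits how far the optimizer can push pure empirical-risk minimization.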