The fundamental trade-off between model simplicity and sensitivity to training data.
The bias-variance dilemma is a core concept in supervised machine learning describing the inherent tension between two sources of prediction error. Bias measures how far a model's average predictions are from the true values: a high-bias model is too rigid, making overly simplistic assumptions that cause it to underfit the data. Variance measures how much a model's predictions fluctuate across different training sets: a high-variance model is too flexible, fitting noise in the training data rather than the underlying signal, a phenomenon known as overfitting. Under squared-error loss, the expected prediction error decomposes exactly into squared bias, variance, and irreducible noise, making the trade-off precise and quantifiable.
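A compact statement of this decomposition, sketched here with assumed notation: $y = f(x) + \varepsilon$ for the data-generating process, $\hat{f}_D$ for the model trained on a random training set $D$, and $\sigma^2$ for the variance of the zero-mean noise $\varepsilon$:

```latex
% Expected squared error at a fixed input x, averaged over training sets D and noise:
\mathbb{E}_{D,\,\varepsilon}\!\left[\big(y - \hat{f}_D(x)\big)^2\right]
  = \underbrace{\big(\mathbb{E}_D[\hat{f}_D(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\big(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\big)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The noise term is a floor on achievable error regardless of the model; only the first two terms are under the modeler's control, and they pull in opposite directions.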
In practice, model complexity sits at the heart of this dilemma. Simple models — linear regression, shallow decision trees — tend to have high bias and low variance: they consistently miss patterns but do so predictably. Complex models — deep neural networks, high-degree polynomials — tend to have low bias and high variance: they can capture intricate patterns but are sensitive to the specific training sample used. As model complexity increases, bias typically falls while variance rises, and the optimal model sits at the sweet spot where total error is minimized.
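One way to see this numerically is a small resampling experiment. The following sketch (assuming only NumPy, with an arbitrary sine target, noise level, and degree grid chosen for illustration) refits polynomials on fresh training draws and estimates the squared-bias and variance terms at each complexity level:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)   # true signal (illustrative choice)
x_test = np.linspace(0, 1, 50)
n_trials, n_train, sigma = 200, 20, 0.3

for degree in (1, 3, 9):
    preds = np.empty((n_trials, x_test.size))
    for t in range(n_trials):
        # Draw a fresh training set each trial to expose sensitivity to the sample.
        x_tr = rng.uniform(0, 1, n_train)
        y_tr = f(x_tr) + rng.normal(0, sigma, n_train)
        coefs = np.polyfit(x_tr, y_tr, degree)
        preds[t] = np.polyval(coefs, x_test)
    bias2 = np.mean((preds.mean(axis=0) - f(x_test)) ** 2)  # squared bias, averaged over x
    variance = np.mean(preds.var(axis=0))                   # variance, averaged over x
    print(f"degree {degree}: bias^2 = {bias2:.3f}, variance = {variance:.3f}")
```

The degree-1 fit should show large squared bias and small variance, the degree-9 fit the reverse, with an intermediate degree minimizing their sum.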
Understanding this trade-off has driven the development of many foundational ML techniques. Regularization methods such as L1 and L2 penalties constrain model complexity to reduce variance at the cost of a small increase in bias. Ensemble methods like bagging reduce variance by averaging predictions across many models trained on different data subsets, while boosting reduces bias by iteratively correcting the errors of earlier models. Cross-validation offers a practical way to estimate where a model sits on the bias-variance spectrum without requiring a separate held-out test set.
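As a concrete sketch of the regularization point, assuming scikit-learn is available (the degree-9 feature map and the penalty grid are illustrative choices, not recommendations), cross-validated error can be swept across L2 penalty strengths:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(40, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.3, size=40)

# Sweep the L2 penalty strength: a tiny alpha leaves the flexible degree-9
# model nearly unregularized (high variance), while a large alpha shrinks
# the coefficients hard (high bias).
for alpha in (1e-6, 1e-2, 1.0, 100.0):
    model = make_pipeline(PolynomialFeatures(degree=9), Ridge(alpha=alpha))
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    print(f"alpha={alpha:g}: cross-validated MSE = {-scores.mean():.3f}")
```

Here cross-validation plays double duty: it selects the penalty strength and locates the model on the spectrum, with error typically bottoming out at an intermediate alpha.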
The dilemma has also shaped modern thinking about deep learning, where very large models sometimes defy classical expectations. In the phenomenon called double descent, test error rises as model capacity approaches the interpolation threshold (the point at which the model can fit the training data exactly) and then decreases again as capacity grows far beyond it. This has prompted researchers to revisit and refine the classical bias-variance framework, making it an active area of theoretical inquiry even as it remains an essential lens for practitioners designing and evaluating models.