The fundamental tension between model complexity and generalization that governs prediction error.
The bias-variance trade-off describes a core tension in supervised machine learning. Under squared-error loss, a model's expected prediction error on unseen data decomposes into three components: squared bias, variance, and irreducible noise. Bias measures how far the model's average prediction, taken over possible training sets, lies from the true value; it arises when the model makes overly simplistic assumptions about the underlying data-generating process. Variance measures how much predictions fluctuate across different training sets; it arises when the model is too sensitive to the specific data it was trained on. These two sources of error pull in opposite directions: reducing bias typically requires increasing model complexity, which tends to inflate variance, and vice versa.
In practice, a high-bias model underfits the data: a linear regression applied to a highly nonlinear problem, for example, will consistently miss important patterns regardless of how much data it sees. A high-variance model overfits, memorizing training examples along with their noise, and fails to generalize. The goal of model selection, regularization, and hyperparameter tuning is largely to navigate this trade-off, finding a level of complexity where total expected error is minimized. Cross-validation guides the search for that level; dropout and L1/L2 regularization constrain effective complexity to curb variance; and ensemble methods act on both sides of the balance, with bagging chiefly reducing variance and boosting chiefly reducing bias.
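The underfitting-to-overfitting transition is easy to see by sweeping model complexity and comparing training and test error. In the sketch below, complexity is the polynomial degree; the target function sin(2x), sample sizes, and noise level are again illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
true_f = lambda x: np.sin(2.0 * x)

# One fixed training set and a larger held-out test set.
x_train = rng.uniform(0.0, np.pi, 30)
y_train = true_f(x_train) + rng.normal(0.0, 0.25, 30)
x_test = rng.uniform(0.0, np.pi, 200)
y_test = true_f(x_test) + rng.normal(0.0, 0.25, 200)

def errors_for_degree(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coef = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coef, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coef, x_test) - y_test) ** 2)
    return train_mse, test_mse

errors = {d: errors_for_degree(d) for d in range(1, 11)}
```

Training error keeps falling as the degree grows, but test error follows the familiar U-shape: it drops while added flexibility removes bias, then climbs once the model starts fitting noise.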
Understanding the bias-variance trade-off is essential for diagnosing model behavior and choosing appropriate remedies. When a model performs poorly on training data, high bias is usually the culprit; when it performs well on training data but poorly on held-out data, high variance is the likely cause. The framework also informs the surprising success of modern deep learning, where very large models trained with sufficient data and regularization can achieve low bias without catastrophic variance — a phenomenon that has prompted researchers to revisit classical bias-variance intuitions through the lens of double descent and interpolation regimes.
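The diagnostic rule of thumb above can be captured as a small helper. This is only an illustrative heuristic, not a standard API: the function name `diagnose`, the `target_err` baseline (an estimate of achievable error), and the `gap_tol` threshold are all hypothetical choices.

```python
def diagnose(train_err, val_err, target_err, gap_tol=0.05):
    """Rough bias/variance diagnosis from train vs. validation error.

    Compares training error to a target (roughly, the irreducible error)
    and validation error to training error, each against a tolerance.
    """
    if train_err - target_err > gap_tol:
        # Poor fit even on data the model has seen: underfitting.
        return "high bias"
    if val_err - train_err > gap_tol:
        # Good training fit that fails to transfer: overfitting.
        return "high variance"
    return "balanced"

# Example readings against a target error of 0.05:
verdict_bias = diagnose(train_err=0.30, val_err=0.32, target_err=0.05)
verdict_var = diagnose(train_err=0.02, val_err=0.25, target_err=0.05)
```

A "high bias" verdict suggests more capacity or better features; "high variance" suggests more data, stronger regularization, or a simpler model.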