When a model memorizes training data noise instead of learning generalizable patterns.
Overfitting occurs when a machine learning model learns the training data too precisely — capturing not just the true underlying signal but also its random noise and idiosyncratic quirks. The result is a model that performs impressively on training examples but fails to generalize to new, unseen data. Intuitively, the model has memorized rather than learned: it has tuned itself so tightly to one dataset that its "knowledge" doesn't transfer. This is one of the most fundamental failure modes in machine learning and affects virtually every class of model, from linear regression to deep neural networks.
Overfitting typically emerges when a model has too much capacity relative to the amount of training data available. A neural network with millions of parameters trained on a few thousand examples has ample room to memorize every sample rather than extract meaningful structure. Overfitting can also arise from training for too many epochs, using overly expressive feature sets, or failing to apply any form of regularization. The telltale sign is a growing gap between training loss and validation loss as training progresses: the model keeps improving on data it has seen while degrading on data it hasn't.
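This capacity-versus-data effect, and the train/validation gap that diagnoses it, can be demonstrated in a few lines. The following is a minimal sketch using NumPy only; the sinusoid-plus-noise dataset and the choice of polynomial degrees are illustrative, not canonical. As the polynomial degree (model capacity) grows, training error keeps shrinking while validation error stops tracking it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem (an assumption for illustration): y = sin(x) + Gaussian noise.
# Only 15 training points, so a high-degree polynomial can memorize them.
x_train = rng.uniform(0, 3, 15)
y_train = np.sin(x_train) + rng.normal(0, 0.3, 15)
x_val = rng.uniform(0, 3, 100)
y_val = np.sin(x_val) + rng.normal(0, 0.3, 100)

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on a dataset."""
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

# Increasing degree = increasing capacity. Watch the train/val gap widen.
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    print(f"degree {degree}: train MSE {mse(coeffs, x_train, y_train):.4f}, "
          f"val MSE {mse(coeffs, x_val, y_val):.4f}")
```

Because the models are nested, training error can only decrease as degree grows; the validation error is what reveals whether that decrease reflects signal or memorized noise.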
Practitioners have developed a rich toolkit for combating overfitting. Regularization techniques like L1 (lasso) and L2 (ridge) penalize large parameter weights, discouraging the model from fitting noise. Dropout randomly deactivates neurons during training, forcing the network to learn redundant representations. Early stopping halts training when validation performance begins to decline. Data augmentation artificially expands the training set, and cross-validation provides more reliable estimates of true generalization performance. Choosing a simpler model architecture — one with fewer parameters — is often the most direct remedy.
Understanding overfitting is inseparable from the broader bias-variance tradeoff: reducing model complexity decreases variance (overfitting) but risks increasing bias (underfitting). Finding the right balance is central to building models that are useful in production. As deep learning pushed model sizes into the billions of parameters, overfitting research intensified, yielding techniques like batch normalization, weight decay schedules, and massive dataset curation as standard practice.
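The tradeoff has a standard mathematical form. For squared error, the expected error of a learned predictor f̂ at a point x decomposes as:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  \;=\; \underbrace{\mathrm{Bias}\big[\hat{f}(x)\big]^2}_{\text{underfitting}}
  \;+\; \underbrace{\mathrm{Var}\big[\hat{f}(x)\big]}_{\text{overfitting}}
  \;+\; \sigma^2
```

where σ² is the irreducible noise in the data. Shrinking one of the first two terms by adjusting model complexity typically grows the other, which is why the remedies above aim at a balance rather than eliminating either term outright.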