A technique that penalizes model complexity to prevent overfitting and improve generalization.
Regularization is a family of techniques used in machine learning to prevent overfitting — the tendency of models to memorize training data rather than learn generalizable patterns. When a model is too complex relative to the amount of training data available, it can fit noise and idiosyncrasies in the training set, causing it to perform poorly on new, unseen examples. Regularization counteracts this by adding a penalty term to the loss function that grows with model complexity, effectively discouraging the learning algorithm from assigning large weights to any single feature or combination of features.
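To make the "penalty term added to the loss" concrete, here is a minimal sketch for a linear model with an L2 penalty. All names (`regularized_loss`, `w`, `X`, `y`, `lam`) and the toy data are illustrative, not from the text:

```python
import numpy as np

def regularized_loss(w, X, y, lam):
    """Mean squared error plus lam times the sum of squared weights."""
    residuals = X @ w - y
    mse = np.mean(residuals ** 2)
    penalty = lam * np.sum(w ** 2)  # grows with weight magnitude
    return mse + penalty

# Toy data: 20 examples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, 0.0, -2.0]) + rng.normal(scale=0.1, size=20)

w = np.array([1.0, 0.0, -2.0])
base = regularized_loss(w, X, y, lam=0.0)       # unpenalized loss
penalized = regularized_loss(w, X, y, lam=0.1)  # adds 0.1 * (1 + 0 + 4) = 0.5
```

The optimizer minimizes this combined objective, so any reduction in training error must be worth the extra penalty it incurs, which is exactly how large weights get discouraged.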
The two most common forms are L1 regularization (Lasso) and L2 regularization (Ridge). L2 adds a penalty proportional to the sum of squared model weights, shrinking all weights toward zero but rarely eliminating them entirely. L1 adds a penalty proportional to the sum of absolute weight values, which has the useful property of driving some weights to exactly zero — effectively performing feature selection. Elastic Net combines both penalties, offering a balance between sparsity and smooth weight shrinkage. The strength of regularization is controlled by a hyperparameter (often denoted λ or α) that must be tuned, typically via cross-validation.
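The contrast between the two penalties can be sketched in a few lines of numpy. Ridge has a closed-form solution; for Lasso we show the soft-thresholding operator its optimizers rely on, which is what sends small weights to exactly zero. The helper names and the regularization strengths chosen here are illustrative assumptions:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: w = (X^T X + lam*I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def soft_threshold(w, t):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

# Toy data where two of four features are irrelevant.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
true_w = np.array([2.0, 0.0, 0.0, -1.5])
y = X @ true_w + rng.normal(scale=0.1, size=50)

w_ols = ridge_fit(X, y, lam=0.0)     # ordinary least squares baseline
w_ridge = ridge_fit(X, y, lam=10.0)  # all weights shrunk, rarely exactly zero
w_l1 = soft_threshold(w_ols, 0.2)    # small weights driven to exactly zero
```

Inspecting `w_ridge` shows every coefficient pulled toward zero but still nonzero, while `w_l1` zeroes out the two irrelevant features, which is the sparsity the text describes. In practice one would use a tuned library implementation (e.g. scikit-learn's `Ridge`, `Lasso`, and `ElasticNet`) rather than these building blocks.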
Beyond L1 and L2, regularization encompasses a broader set of strategies. Dropout in neural networks randomly deactivates neurons during training, forcing the network to learn redundant representations rather than relying on any single unit. Early stopping halts training once performance on a held-out validation set stops improving, before the model fully fits the training data. Data augmentation and noise injection implicitly regularize by expanding the effective training distribution. Weight decay, batch normalization, and max-norm constraints serve similar purposes in deep learning contexts.
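Of these, dropout is the easiest to show directly. Below is a sketch of "inverted" dropout, the common formulation in which surviving activations are scaled up during training so their expected value is unchanged and inference needs no adjustment; the function name, the keep probability, and the toy activations are assumptions for illustration:

```python
import numpy as np

def dropout(activations, keep_prob, training, rng):
    """Inverted dropout: zero each unit with prob 1 - keep_prob during
    training, scaling survivors by 1/keep_prob; identity at inference."""
    if not training:
        return activations
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
a = np.ones((4, 8))                                    # a layer's activations
out = dropout(a, keep_prob=0.5, training=True, rng=rng)
# Dropped units are 0.0; kept units are scaled to 1 / 0.5 = 2.0.
eval_out = dropout(a, keep_prob=0.5, training=False, rng=rng)  # unchanged
```

Because roughly half the units vanish on every forward pass, no downstream weight can depend on one particular neuron always being present, which is the redundancy-forcing effect described above.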
Regularization is one of the most practically important concepts in applied machine learning. It is nearly universally applied in modern models — from linear regression to large-scale deep neural networks — because the risk of overfitting increases with model capacity and data scarcity. Understanding the bias-variance tradeoff that regularization manages is foundational to building models that perform reliably in production, making it an essential tool for any practitioner working with learned systems.