A method that fits models to data by minimizing squared prediction errors.
Least squares regression is a foundational optimization technique that estimates the parameters of a model by minimizing the sum of squared differences between observed data points and the values the model predicts. In machine learning it provides the mathematical basis for ordinary linear regression, where the goal is to find a weight vector whose linear function best approximates the target variable across a training dataset. The closed-form solution, known as the normal equations, yields exact parameter estimates without iterative optimization whenever the input matrix has full column rank. This makes the method computationally attractive when the number of features is manageable and the data satisfies standard assumptions such as linearity, independence, and homoscedasticity.
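As a minimal sketch of the normal equations in practice, assuming NumPy and a design matrix whose first column of ones models the intercept (the synthetic data and variable names here are illustrative, not from the source):

```python
import numpy as np

# Illustrative data: 100 samples, 3 features, plus an intercept column.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 3))])
true_w = np.array([2.0, -1.0, 0.5, 3.0])
y = X @ true_w + rng.normal(scale=0.1, size=100)

# Normal equations: solve (X^T X) w = X^T y rather than inverting
# X^T X explicitly, which is cheaper and numerically safer.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

# np.linalg.lstsq solves the same problem through a more stable
# factorization and also handles rank-deficient X.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```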
The mechanics of least squares hinge on a geometric interpretation: the fitted values produced by the optimal parameter vector are the orthogonal projection of the observed output onto the column space of the input matrix, the closest possible approximation in terms of Euclidean distance. This perspective connects least squares to linear algebra and explains why the method is equivalent to maximum likelihood estimation under the assumption of Gaussian-distributed residuals. When the input matrix is ill-conditioned or features are collinear, regularized variants such as Ridge regression (L2 penalty) and Lasso (L1 penalty) extend the framework to improve generalization and handle high-dimensional settings.
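Both the projection view and the Ridge variant can be illustrated in a few lines. This is a sketch assuming NumPy, with the penalty strength `lam` and the random test data chosen arbitrarily for illustration:

```python
import numpy as np

def ols_fit(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares via the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def ridge_fit(X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Ridge regression: w = (X^T X + lam * I)^{-1} X^T y.

    Adding lam * I keeps the system invertible even when
    columns of X are nearly collinear.
    """
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Geometric check: the OLS residual is orthogonal to every column of X,
# i.e. the fitted values are the projection of y onto the column space.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = rng.normal(size=50)
residual = y - X @ ols_fit(X, y)
assert np.allclose(X.T @ residual, 0.0, atol=1e-8)
```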
In modern machine learning, least squares serves both as a standalone algorithm and as a conceptual building block. It underpins the mean squared error (MSE) loss function widely used in regression tasks, and its gradient forms the basis for understanding how gradient descent updates weights in neural networks. Weighted least squares and iteratively reweighted least squares (IRLS) extend the method to handle non-uniform error distributions and generalized linear models, broadening its applicability considerably.
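To make the connection to gradient-based training and to weighted variants concrete, here is a sketch of gradient descent on the MSE loss alongside a weighted least squares solver; the learning rate, step count, and function names are illustrative assumptions, not from the source:

```python
import numpy as np

def gd_least_squares(X, y, lr=0.01, steps=5000):
    """Minimize the MSE loss (1/n) * ||X w - y||^2 by gradient descent.

    The gradient is (2/n) * X^T (X w - y); each update moves the
    weights a small step against it, mirroring how neural-network
    weights are updated under an MSE objective.
    """
    n = X.shape[0]
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = (2.0 / n) * X.T @ (X @ w - y)
        w -= lr * grad
    return w

def wls_fit(X, y, weights):
    """Weighted least squares: w = (X^T W X)^{-1} X^T W y, W = diag(weights).

    Larger weights make the fit track those observations more
    closely, accommodating non-uniform error variance.
    """
    W = np.diag(weights)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
```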
Despite the rise of more complex models, least squares regression remains highly relevant due to its interpretability, efficiency, and strong theoretical guarantees. It is often the first model applied to a regression problem, serving as a performance baseline and a diagnostic tool for understanding data structure. Its simplicity makes it indispensable in domains ranging from econometrics and signal processing to scientific computing and feature engineering pipelines.