A supervised learning approach that predicts continuous numerical outcomes from input variables.
Regression is a class of supervised machine learning methods that model the relationship between one or more input features and a continuous target variable. Unlike classification, which assigns inputs to discrete categories, regression produces real-valued outputs — making it the natural choice for tasks like predicting house prices, estimating energy consumption, or forecasting demand. The learned model captures how changes in input variables correspond to changes in the output, enabling predictions on new, unseen data.
The most foundational form is linear regression, which fits a weighted sum of input features to minimize prediction error, typically measured as mean squared error. The optimal weights can be found analytically via the normal equations or iteratively through gradient descent. Common variants include polynomial regression (which gains expressive power by transforming the input features), ridge and lasso regression (which add regularization to prevent overfitting), and logistic regression (which, despite its name, is used for classification by modeling class probabilities). Nonlinear regression methods, including regression trees, support vector regression, and neural networks, can capture complex, high-dimensional relationships that linear models cannot.
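As a concrete illustration, the normal-equation solution can be written in a few lines of NumPy. The data here is a made-up toy set whose points lie exactly on the line y = 2x + 1, so the recovered weights should match that line:

```python
import numpy as np

# Toy dataset (hypothetical values): each point satisfies y = 2*x + 1.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

# Prepend a column of ones so the model learns an intercept term.
Xb = np.hstack([np.ones((X.shape[0], 1)), X])

# Normal equations: solve (X^T X) w = X^T y for the weight vector w.
# np.linalg.solve is preferred over explicitly inverting X^T X.
w = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
print(w)  # [intercept, slope], approximately [1.0, 2.0]
```

For large or ill-conditioned feature matrices, practitioners typically reach for `np.linalg.lstsq`, gradient descent, or a library such as scikit-learn rather than forming X^T X directly.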
Regression sits at the core of machine learning practice. Nearly every neural network trained on a continuous output — whether predicting protein structure energies, stock returns, or sensor readings — is solving a regression problem. Evaluation metrics such as mean absolute error (MAE), root mean squared error (RMSE), and R² score provide standardized ways to assess model quality and compare approaches.
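Each of these metrics follows directly from the prediction errors; a minimal sketch with NumPy, using hypothetical true and predicted values:

```python
import numpy as np

# Hypothetical targets and model predictions for illustration.
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])

errors = y_pred - y_true

# MAE: average absolute deviation between predictions and targets.
mae = np.mean(np.abs(errors))

# RMSE: square root of the mean squared error; penalizes large errors more.
rmse = np.sqrt(np.mean(errors ** 2))

# R^2: 1 minus the ratio of residual variance to total variance of the targets.
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(mae, rmse, r2)  # mae = 0.5 for this toy data
```

Libraries such as scikit-learn expose the same quantities as `mean_absolute_error`, `mean_squared_error`, and `r2_score`.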
The technique's power lies in its interpretability and versatility. Simple linear models offer transparent, auditable predictions that are critical in regulated industries like healthcare and finance. More complex regression models, meanwhile, achieve state-of-the-art accuracy on challenging benchmarks. Understanding regression — its assumptions, failure modes, and regularization strategies — remains one of the most essential competencies in applied machine learning, forming the conceptual backbone from which more advanced methods are built.