A classification algorithm that models the probability of a binary outcome.
Logistic regression is a foundational supervised learning algorithm used primarily for binary classification, where the goal is to predict whether an input belongs to one of two classes. Rather than outputting a raw continuous value, it models the probability that a given example belongs to the positive class by passing a linear combination of the input features through the sigmoid (logistic) function, sigma(z) = 1 / (1 + e^(-z)). This S-shaped function squashes any real-valued number into the open interval (0, 1), producing an interpretable probability score. A decision threshold, typically 0.5, then converts this probability into a discrete class label.
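A minimal sketch of this forward pass in NumPy (the feature values, weights, and intercept below are made-up illustrations, not fitted parameters):

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    """P(y = 1 | x) for each row of X: sigmoid of a linear combination."""
    return sigmoid(X @ w + b)

# Illustrative inputs with assumed (not fitted) parameters.
X = np.array([[0.5, 1.2],
              [-1.0, 0.3]])
w = np.array([0.8, -0.4])  # one coefficient per feature
b = 0.1                    # intercept

probs = predict_proba(X, w, b)
labels = (probs >= 0.5).astype(int)  # 0.5 decision threshold
```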
Training logistic regression means finding the weights (coefficients) and intercept that best fit the data. This is done by maximizing the log-likelihood of the observed labels given the model's predictions, which is equivalent to minimizing the binary cross-entropy loss. Unlike linear regression, which admits a closed-form solution, this optimization is typically performed iteratively using gradient descent or more sophisticated solvers such as L-BFGS. Regularization terms (L1 or L2) are commonly added to the loss to prevent overfitting by encouraging sparse or small-magnitude weights, respectively.
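One way such a training loop might look, sketched as plain batch gradient descent on the cross-entropy loss with an L2 penalty (the learning rate lr, iteration count n_iters, and penalty strength l2 are arbitrary assumed defaults):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iters=1000, l2=0.01):
    """Minimize binary cross-entropy (plus an L2 penalty) by gradient descent."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        p = sigmoid(X @ w + b)                      # predicted P(y = 1 | x)
        error = p - y                               # gradient of the loss w.r.t. the logits
        grad_w = X.T @ error / n_samples + l2 * w   # L2 term shrinks the weights
        grad_b = error.mean()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

The key gradient, X.T @ (p - y), falls out of differentiating the cross-entropy loss through the sigmoid, which is why the update takes the same form as the linear-regression gradient.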
Despite its name, logistic regression is a classification algorithm, not a regression one. It generalizes naturally to multi-class problems through extensions such as one-vs-rest (OvR) schemes or multinomial logistic regression (softmax regression), where a separate set of weights is learned for each class. The model's linear decision boundary makes it highly interpretable: each coefficient gives the change in the log-odds of the positive class for a one-unit increase in its feature (equivalently, exponentiating a coefficient yields an odds ratio). This is a significant advantage in domains like medicine, finance, and social science, where explainability matters.
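The multinomial (softmax) extension can be sketched as follows, with one weight vector per class; the shapes and random inputs here are illustrative assumptions only:

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax: turns one score per class into a probability per class."""
    Z = Z - Z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    expZ = np.exp(Z)
    return expZ / expZ.sum(axis=1, keepdims=True)

# One weight column per class; sizes are arbitrary for illustration.
n_features, n_classes = 4, 3
rng = np.random.default_rng(0)
W = rng.normal(size=(n_features, n_classes))
b = np.zeros(n_classes)

X = rng.normal(size=(5, n_features))  # five made-up examples
probs = softmax(X @ W + b)            # each row sums to 1
predicted_class = probs.argmax(axis=1)
```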
Logistic regression remains widely used in modern machine learning as a strong baseline, a building block in ensemble methods, and the output layer of neural networks for classification tasks. Its computational efficiency, probabilistic outputs, and interpretability make it a go-to choice when the classes are approximately linearly separable or when model transparency is a priority. Understanding logistic regression also provides essential intuition for more complex models, including neural networks and generalized linear models.
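In practice it is often reached for via a library. A minimal baseline sketch using scikit-learn's LogisticRegression on synthetic data (the hyperparameters shown are just common defaults, not a recommendation) might look like:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data, purely for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(penalty="l2", C=1.0, solver="lbfgs", max_iter=1000)
clf.fit(X_train, y_train)

print("accuracy:", clf.score(X_test, y_test))
print("P(y=1) for first test example:", clf.predict_proba(X_test[:1])[0, 1])
```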