An ensemble method that combines weak learners sequentially into a strong predictor.
Boosting is an ensemble learning technique that converts a sequence of weak learners — models that perform only slightly better than random chance — into a single, highly accurate predictor. Unlike bagging methods that train models independently in parallel, boosting trains models sequentially, with each new model explicitly correcting the errors made by its predecessors. The result is a weighted combination of classifiers that collectively achieve far greater predictive power than any individual component.
The classic mechanism works by maintaining a distribution of weights over the training data that evolves across iterations. Initially, all examples are weighted equally. After each weak learner is trained, the weights of misclassified examples are increased so that the next learner is forced to focus on the hardest cases. Each weak learner is then assigned a vote weight that grows as its weighted error shrinks, and the final prediction is a weighted majority vote across all learners. This adaptive reweighting is what distinguishes boosting from simpler ensemble strategies.
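To make the reweighting loop concrete, here is a minimal AdaBoost-style sketch in Python, using scikit-learn decision stumps as the weak learners. The function names, the stopping rule, and the choice of stump are illustrative assumptions, not any particular library's API.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost(X, y, n_rounds=50):
    """Sequentially fit decision stumps on reweighted data; labels y must be -1/+1."""
    y = np.asarray(y)
    n = len(y)
    w = np.full(n, 1.0 / n)                      # start from a uniform distribution
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = max(np.sum(w[pred != y]), 1e-10)   # weighted error of this round's learner
        if err >= 0.5:                           # no better than chance: stop early
            break
        alpha = 0.5 * np.log((1 - err) / err)    # vote weight grows as error shrinks
        w *= np.exp(-alpha * y * pred)           # up-weight the misclassified examples
        w /= w.sum()                             # renormalize to a distribution
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def predict(learners, alphas, X):
    """Final prediction: a weighted majority vote across all weak learners."""
    votes = sum(a * clf.predict(X) for a, clf in zip(alphas, learners))
    return np.sign(votes)
```

The vote weight used here, one half the log-odds of the weighted error, is the standard discrete AdaBoost choice: an accurate learner gets a large say in the final vote, while a learner barely better than chance contributes almost nothing.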
AdaBoost (Adaptive Boosting), introduced by Yoav Freund and Robert Schapire in 1996, was the first widely adopted boosting algorithm and remains a foundational reference point. Subsequent innovations — most notably Gradient Boosting Machines (GBM) and its highly optimized descendants XGBoost, LightGBM, and CatBoost — reframed boosting as iterative gradient descent in function space, dramatically expanding its flexibility and performance. These gradient-based variants dominate structured/tabular data competitions and production systems today.
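The gradient-based view is easiest to see in the simplest case, squared-error regression, where the negative gradient of the loss at each step is just the current residual vector. The sketch below fits each new tree to those residuals and takes a small, shrunken step in function space; the helper names and hyperparameters are assumptions for illustration, not the API of GBM, XGBoost, LightGBM, or CatBoost.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_rounds=100, learning_rate=0.1, max_depth=3):
    """Gradient boosting for squared error: each tree is fit to the current residuals."""
    y = np.asarray(y, dtype=float)
    f0 = float(np.mean(y))                        # initial constant prediction
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_rounds):
        residuals = y - pred                      # negative gradient of 0.5 * (y - f)^2
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)
        pred += learning_rate * tree.predict(X)   # shrunken step along the fitted gradient
        trees.append(tree)
    return f0, trees

def predict(f0, trees, X, learning_rate=0.1):
    """Sum the initial constant and the shrunken contribution of every tree."""
    return f0 + learning_rate * sum(t.predict(X) for t in trees)
```

Swapping in a different loss changes only what the residuals are replaced with; the production libraries named above add second-order information, regularization, and heavily optimized tree construction on top of this same skeleton.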
Boosting matters because it drives down bias and, with standard regularization such as shrinkage and shallow trees, often reduces variance as well, making it one of the most effective off-the-shelf approaches for supervised learning on tabular data. Its sequential nature does introduce a computational cost, since models cannot be trained in parallel as easily as in bagging, but modern implementations mitigate this through parallelized tree construction and hardware acceleration. Tree-based boosting also provides natural mechanisms for feature importance estimation, making trained models more interpretable than many black-box alternatives.
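As one concrete way to read off such importance scores, the snippet below trains scikit-learn's GradientBoostingClassifier on a synthetic dataset and prints its impurity-based feature importances; the dataset and hyperparameters are chosen purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic data: 10 features, of which only 4 carry signal.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=4, random_state=0)

model = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, max_depth=3)
model.fit(X, y)

# Impurity-based importances: each feature's share of the total split-quality gain.
for i, score in enumerate(model.feature_importances_):
    print(f"feature {i}: {score:.3f}")
```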