Envisioning is an emerging technology research institute and advisory.


Bagging

Ensemble method that trains multiple models on random data subsets and aggregates predictions.

Year: 1994 · Generality: 694

Bagging, short for Bootstrap Aggregating, is an ensemble learning technique designed to reduce variance and combat overfitting in predictive models. Rather than relying on a single model trained on the full dataset, bagging constructs a collection of models—each trained on a different random subset of the training data—and combines their outputs into a single, more reliable prediction. This approach is especially effective with high-variance models like decision trees, which are sensitive to small fluctuations in training data.

The mechanics of bagging center on a statistical technique called bootstrap sampling: given a training set of n examples, each model in the ensemble is trained on a new dataset of size n drawn with replacement from the original. Because sampling is done with replacement, each bootstrap sample contains, on average, about 63% of the distinct original examples, with the remaining slots filled by duplicates. The models are trained independently and in parallel, making bagging computationally efficient. For regression tasks, predictions are averaged across all models; for classification, the ensemble takes a majority vote. The diversity introduced by different training subsets ensures that individual model errors tend to cancel out in the aggregate.
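The steps above — bootstrap sampling, independent training, and majority voting — can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the base learner is a hypothetical one-feature "decision stump," and names like `train_stump` and `bagging_predict` are invented for this example.

```python
import random
from collections import Counter

def bootstrap_sample(data):
    """Draw n examples with replacement from a dataset of size n.
    On average each sample covers ~63% of the distinct originals,
    since (1 - 1/n)^n -> 1/e ~ 0.368 as n grows."""
    return [random.choice(data) for _ in range(len(data))]

def train_stump(sample):
    """Fit a threshold classifier (predict 1 if x >= t) by picking the
    candidate threshold with the best accuracy on this sample."""
    best = None
    for t, _ in sample:
        acc = sum((1 if x >= t else 0) == y for x, y in sample) / len(sample)
        if best is None or acc > best[1]:
            best = (t, acc)
    return best[0]

def bagging_predict(thresholds, x):
    """Classification: majority vote across all stumps in the ensemble."""
    votes = [1 if x >= t else 0 for t in thresholds]
    return Counter(votes).most_common(1)[0][0]

random.seed(0)
# Toy 1-D dataset: the label is 1 whenever the feature exceeds 5.
data = [(x, int(x > 5)) for x in range(11)]
# Train each stump independently on its own bootstrap sample.
ensemble = [train_stump(bootstrap_sample(data)) for _ in range(25)]
print(bagging_predict(ensemble, 8.0))  # expected: class 1
print(bagging_predict(ensemble, 2.0))  # expected: class 0
```

For a regression variant, the final line of `bagging_predict` would average the base models' outputs instead of taking a vote, exactly as the paragraph above describes.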

Bagging matters because it provides a principled, general-purpose way to improve the stability and accuracy of any learning algorithm that exhibits high variance. It laid the groundwork for some of the most successful algorithms in applied machine learning—most notably Random Forests, which extend bagging by also randomizing the feature selection at each split, further decorrelating the individual trees. The technique, introduced by Leo Breiman in 1994, demonstrated that simple aggregation of weak, unstable learners could yield predictions competitive with or superior to more complex single models.

Beyond decision trees, bagging has been applied to neural networks, regression models, and other learners, though its benefits are most pronounced when the base model is unstable. Its conceptual simplicity, combined with strong empirical performance, has made it a foundational concept in the ensemble learning literature and a standard tool in modern machine learning practice.

Related

Ensemble Methods
Combining multiple trained models to produce predictions stronger than any single model.
Generality: 771

Ensemble Algorithm
Combines multiple models to boost predictive accuracy, robustness, and generalization.
Generality: 796

Ensemble Learning
Combining multiple models to produce predictions more accurate than any single model.
Generality: 836

Boosting
An ensemble method that combines weak learners sequentially into a strong predictor.
Generality: 796

Random Forest
An ensemble of decision trees that improves accuracy and resists overfitting.
Generality: 796

Out-of-Bag Evaluation
A built-in validation method for ensemble models using bootstrap sampling's unused data.
Generality: 492