
Envisioning is an emerging technology research institute and advisory.


2011 — 2026


Cross-Validation

A resampling technique that estimates how well a model generalizes to unseen data.

Year: 1974 · Generality: 838

Cross-validation is a foundational model evaluation technique in machine learning that estimates how well a trained model will perform on independent, unseen data. Rather than relying on a single train-test split—which can produce misleading results depending on how the data happens to be divided—cross-validation systematically rotates which portion of the data is held out for evaluation. The most widely used variant, k-fold cross-validation, partitions the dataset into k equally sized subsets. The model is trained k times, each time using a different fold as the validation set and the remaining k−1 folds as training data. Performance metrics are then averaged across all k runs, yielding a more stable and reliable estimate of generalization ability.
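The fold rotation described above can be sketched in plain Python. This is a minimal illustration, not a library implementation; the function name `k_fold_indices` is ours, and real workflows would typically use an established tool such as scikit-learn's `KFold`:

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Every sample lands in exactly one validation fold; the remaining
    k-1 folds form the training set for that run.
    """
    # Distribute any remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, val_idx
        start += size

# 10 samples, 5 folds: five train/validation rotations of 8 + 2 indices.
folds = list(k_fold_indices(10, 5))
```

Averaging a model's score over these five rotations gives the stable estimate the paragraph describes, at the cost of training the model five times.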

Beyond standard k-fold, several specialized variants exist to handle different data conditions. Stratified k-fold preserves the class distribution within each fold, making it essential for imbalanced classification problems. Leave-one-out cross-validation (LOOCV) is an extreme case where k equals the number of samples, useful when data is very scarce but computationally expensive. Time-series data requires walk-forward or rolling-window validation to respect temporal ordering and prevent data leakage from future observations into past training windows.
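The temporal-ordering constraint for time series can be made concrete with a short sketch. The function below (a hypothetical helper, `walk_forward_splits`) uses an expanding training window; each validation block lies strictly after the data the model was trained on, which is what prevents leakage from the future:

```python
def walk_forward_splits(n_samples, initial_train, horizon):
    """Yield (train_idx, val_idx) pairs that respect temporal order.

    The training window expands forward through time; the validation
    block is always the `horizon` observations that come next.
    """
    splits = []
    end = initial_train
    while end + horizon <= n_samples:
        train_idx = list(range(0, end))            # all data up to `end`
        val_idx = list(range(end, end + horizon))  # the next block in time
        splits.append((train_idx, val_idx))
        end += horizon
    return splits

# 10 observations, first 4 reserved for the initial fit, validated 2 at a time.
splits = walk_forward_splits(10, initial_train=4, horizon=2)
```

Unlike k-fold, no validation index ever precedes a training index, and early observations are reused in every training window.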

Cross-validation plays a critical role in the model selection and hyperparameter tuning pipeline. By comparing cross-validated scores across different model architectures or parameter settings, practitioners can make principled choices without overfitting their decisions to a fixed test set. It also provides diagnostic information: high variance across folds suggests the model is sensitive to the specific training data, while consistently poor scores across folds indicate underfitting. Nested cross-validation—where an outer loop estimates generalization error and an inner loop tunes hyperparameters—offers an unbiased evaluation when both tasks must be performed on the same dataset.
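The nested structure can be sketched with a toy scoring function standing in for model training (names like `fit_score` and the score formula are illustrative assumptions, not a real API). The key property to observe is that the inner loop only ever sees the outer training data, so the outer score remains an honest estimate:

```python
from statistics import mean

def k_fold(n, k):
    """Contiguous k-fold index splitter (toy version; assumes n divisible by k)."""
    size = n // k
    for i in range(k):
        val = list(range(i * size, (i + 1) * size))
        train = [j for j in range(n) if j not in val]
        yield train, val

def fit_score(train_idx, val_idx, param):
    """Stand-in for training with hyperparameter `param` and scoring on val.

    Toy behavior: the score peaks at param == 3 and grows slightly
    with training-set size."""
    return -abs(param - 3) + 0.01 * len(train_idx)

def nested_cv(n, outer_k, inner_k, param_grid):
    outer_scores = []
    for outer_train, outer_val in k_fold(n, outer_k):
        # Inner loop: tune the hyperparameter using only outer_train.
        best_param = max(
            param_grid,
            key=lambda p: mean(
                fit_score([outer_train[i] for i in tr],
                          [outer_train[i] for i in va], p)
                for tr, va in k_fold(len(outer_train), inner_k)
            ),
        )
        # Outer loop: score the tuned setting on data the inner loop never saw.
        outer_scores.append(fit_score(outer_train, outer_val, best_param))
    return mean(outer_scores)

# 20 samples, 5 outer folds, 4 inner folds, four candidate hyperparameters.
estimate = nested_cv(20, outer_k=5, inner_k=4, param_grid=[1, 2, 3, 4])
```

Tuning and evaluating on the same folds without this separation would report an optimistically biased score, since the hyperparameter choice would have been fitted to the evaluation data.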

The practical importance of cross-validation grows when labeled data is limited, as it allows nearly all available examples to contribute to both training and evaluation. In modern deep learning, where datasets are often large and training is expensive, simpler held-out validation sets are common, but cross-validation remains the gold standard for tabular data, scientific applications, and any setting where reliable performance estimates are critical.

Related

Validation Data

A held-out dataset used to tune and evaluate models during training.

Generality: 820
Validation Set

A held-out dataset used to tune hyperparameters and guide model development.

Generality: 820
Validation Metric

A quantitative measure used to evaluate model performance on held-out data.

Generality: 780
Out-of-Bag Evaluation

A built-in validation method for ensemble models using bootstrap sampling's unused data.

Generality: 492
Generalization

A model's ability to perform accurately on new, previously unseen data.

Generality: 913
Test Set

A held-out dataset used to evaluate a trained model's real-world generalization.

Generality: 820