A held-out subset of data used to tune and evaluate a model during training.
Validation data is a dedicated subset of a dataset that is withheld from the training process and used to assess model performance during development. Unlike training data, which the model directly learns from, validation data provides an independent signal that reveals how well the model is generalizing to unseen examples at each stage of training. This feedback loop allows practitioners to make informed decisions about model architecture, regularization strength, learning rate, and other hyperparameters without contaminating the final evaluation.
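A minimal sketch of that feedback loop, using a toy 1-D threshold classifier whose threshold stands in for a hyperparameter such as regularization strength; the data and candidate values are hypothetical:

```python
def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def predict(xs, threshold):
    """Toy classifier: predict 1 when the feature exceeds the threshold."""
    return [1 if x >= threshold else 0 for x in xs]

# Hypothetical pre-split data: features and binary labels.
train_x, train_y = [0.1, 0.4, 0.35, 0.8], [0, 0, 0, 1]
val_x,   val_y   = [0.2, 0.6, 0.9],       [0, 1, 1]

# Candidate hyperparameter values are compared on the validation set,
# never on the training set, so the choice reflects generalization.
candidates = [0.3, 0.5, 0.7]
best = max(candidates, key=lambda t: accuracy(predict(val_x, t), val_y))
```

The training set would be used to fit each candidate model; only the comparison between candidates touches the validation set.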
The mechanics of validation data are straightforward: after each training epoch or optimization step, the model's current parameters are frozen and its predictions are evaluated against the validation set. Metrics such as accuracy, loss, or F1 score computed on this set serve as a proxy for real-world performance. When validation performance stops improving while training performance continues to rise, this divergence is a classic signal of overfitting — the model is memorizing training examples rather than learning generalizable patterns. Early stopping, one of the most common regularization techniques, relies entirely on monitoring validation loss to halt training at the right moment.
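The early-stopping logic can be sketched as a loop that tracks the best validation loss seen so far and halts once it stalls. The `train_step` and `validation_loss` callables are assumptions standing in for a real framework's training and evaluation hooks, and the per-epoch losses below are simulated numbers:

```python
def early_stop_training(train_step, validation_loss, max_epochs=100, patience=3):
    """Train until validation loss fails to improve for `patience` epochs."""
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_step()              # update parameters on training data
        loss = validation_loss()  # evaluate current parameters on validation set
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break             # validation loss has stalled: stop training
    return best_epoch, best_loss

# Simulated per-epoch validation losses (hypothetical): improvement, then stall.
_losses = iter([1.0, 0.8, 0.7, 0.75, 0.76, 0.77])
best_epoch, best_loss = early_stop_training(lambda: None, lambda: next(_losses))
```

With these simulated losses, training halts after three epochs without improvement, and the parameters from the best epoch (epoch 2, loss 0.7) would be the ones kept.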
Validation data occupies a distinct role in the standard three-way data split: training, validation, and test. The test set is reserved for a single final evaluation after all modeling decisions are made, ensuring an unbiased estimate of deployed performance. Because the validation set influences model selection and hyperparameter tuning, it is technically "seen" by the development process, even if not by the model's gradient updates directly. This is why a separate test set remains essential for honest reporting.
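The three-way split itself is a simple partition. The 70/15/15 proportions below are illustrative, not prescriptive; shuffling first guards against ordering bias in the source data:

```python
import random

def three_way_split(data, val_frac=0.15, test_frac=0.15, seed=0):
    """Partition data into train, validation, and test subsets."""
    items = list(data)
    random.Random(seed).shuffle(items)  # shuffle to avoid ordering bias
    n = len(items)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = items[:n_test]                # held out until the single final evaluation
    val = items[n_test:n_test + n_val]   # consulted repeatedly during development
    train = items[n_test + n_val:]       # used directly for gradient updates
    return train, val, test

train, val, test = three_way_split(range(100))
```

The comments mark the distinction the paragraph draws: the validation set is "seen" by the development process through repeated consultation, while the test set is touched exactly once.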
In practice, when labeled data is scarce, k-fold cross-validation offers an alternative: the dataset is partitioned into k subsets, and each fold takes a turn as the validation set while the remaining folds serve as training data. This approach maximizes data utilization, and averaging the k per-fold scores yields a more reliable performance estimate than any single split. Validation data is a foundational concept in machine learning workflows, underpinning model selection, hyperparameter search, and the prevention of overfitting across virtually every application domain.
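The k-fold rotation can be sketched as a generator that yields one (train, validation) pair per fold; for simplicity this assumes the dataset size is divisible by k:

```python
def k_fold_splits(data, k=5):
    """Yield (train, validation) pairs; each fold serves once as validation.

    Assumes len(data) is divisible by k, for brevity.
    """
    items = list(data)
    fold_size = len(items) // k
    for i in range(k):
        start, stop = i * fold_size, (i + 1) * fold_size
        validation = items[start:stop]        # this fold is held out
        train = items[:start] + items[stop:]  # everything else trains the model
        yield train, validation

splits = list(k_fold_splits(range(10), k=5))
```

In a real workflow, a fresh model would be trained on each `train` portion and scored on the matching `validation` portion, with the k scores averaged into the final estimate.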