Optimizing the model configuration settings that are fixed before training begins.
Hyperparameter tuning is the process of finding the optimal configuration settings for a machine learning model — settings that govern how the model learns rather than what it learns. Unlike model parameters such as weights and biases, which are updated automatically during training via gradient descent or similar algorithms, hyperparameters must be specified before training begins. Common examples include the learning rate, number of hidden layers, batch size, dropout rate, and regularization strength. Because these choices can dramatically affect a model's ability to generalize, selecting them carefully is essential to building high-performing systems.
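As a minimal sketch of this distinction, using scikit-learn with arbitrarily chosen values: the constructor arguments below are hyperparameters fixed before training, while the weights stored in `coef_` and `intercept_` are parameters learned during `fit()`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Hyperparameters: chosen before training begins (values here are arbitrary).
clf = SGDClassifier(
    alpha=1e-4,               # regularization strength
    learning_rate="constant",
    eta0=0.01,                # learning rate
    max_iter=1000,
    random_state=0,
)

clf.fit(X, y)

# Parameters: learned automatically during fit() via gradient descent.
print(clf.coef_.shape, clf.intercept_)
```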
The core challenge of hyperparameter tuning is that the search space is often vast and mixes continuous and discrete dimensions, while the objective is non-convex and expensive to evaluate, so exhaustive evaluation is rarely feasible. The simplest approach, grid search, evaluates every combination of a predefined set of values, but scales poorly with the number of hyperparameters. Random search, shown by Bergstra and Bengio (2012) to be surprisingly effective, samples configurations randomly and often finds good solutions faster. More sophisticated methods include Bayesian optimization, which builds a probabilistic surrogate model of the objective function to intelligently select the next configuration to evaluate, and evolutionary strategies, which evolve populations of configurations over generations.
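The grid-versus-random contrast can be sketched with scikit-learn's built-in searchers; the synthetic dataset, SVC model, and value ranges below are illustrative placeholders. Both searches fit nine candidate configurations, but the random search draws them from continuous log-uniform distributions, sampling each individual hyperparameter axis more densely, which is the intuition behind Bergstra and Bengio's result.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

# Grid search: evaluates every combination (3 x 3 = 9 candidates).
grid = GridSearchCV(
    SVC(),
    param_grid={"C": [0.1, 1, 10], "gamma": [1e-3, 1e-2, 1e-1]},
    cv=3,
)
grid.fit(X, y)

# Random search: samples 9 configurations from continuous distributions,
# covering each hyperparameter axis with 9 distinct values instead of 3.
rand = RandomizedSearchCV(
    SVC(),
    param_distributions={"C": loguniform(1e-2, 1e2),
                         "gamma": loguniform(1e-4, 1e0)},
    n_iter=9,
    cv=3,
    random_state=0,
)
rand.fit(X, y)

print("grid:  ", grid.best_params_, grid.best_score_)
print("random:", rand.best_params_, rand.best_score_)
```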
Modern machine learning has pushed hyperparameter tuning further with automated machine learning (AutoML) frameworks such as Hyperopt, Optuna, and Google Vizier, which integrate search algorithms, parallel evaluation, and early stopping to dramatically reduce the human effort involved. Techniques like successive halving and Hyperband allocate computational resources adaptively, terminating poor-performing configurations early and concentrating effort on promising ones. Neural architecture search (NAS) extends these ideas to the structure of the model itself, treating architectural choices as hyperparameters to be optimized.
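A sketch of adaptive resource allocation using Optuna's Hyperband pruner follows; the objective is a toy stand-in for a real training loop, and `simulated_accuracy` is an invented placeholder for a validation metric. Each trial reports intermediate values step by step, and the pruner terminates trials whose learning curves lag behind, concentrating compute on promising configurations.

```python
import math

import optuna

def simulated_accuracy(lr, dropout, step):
    # Invented stand-in for a validation metric; a real objective would
    # train a model for `step` epochs and evaluate it.
    peak = 1.0 - 0.05 * (math.log10(lr) + 3) ** 2 - (dropout - 0.2) ** 2
    return peak * (1 - math.exp(-step / 20))

def objective(trial):
    # Sample hyperparameters for this trial.
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    dropout = trial.suggest_float("dropout", 0.0, 0.5)
    acc = 0.0
    for step in range(50):                 # stand-in for training epochs
        acc = simulated_accuracy(lr, dropout, step)
        trial.report(acc, step)            # intermediate value for the pruner
        if trial.should_prune():           # Hyperband cuts this trial early
            raise optuna.TrialPruned()
    return acc

study = optuna.create_study(
    direction="maximize",
    pruner=optuna.pruners.HyperbandPruner(),
)
study.optimize(objective, n_trials=30)
print(study.best_params)
```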
Hyperparameter tuning matters because even a well-designed model can underperform significantly with poor configuration choices, while a simpler model with well-tuned hyperparameters can outperform more complex alternatives. As models grow larger and training runs more expensive, efficient tuning strategies have become a critical component of the machine learning development pipeline, directly influencing both model quality and the cost of experimentation.