A single parameter-update iteration of an optimization algorithm during model training.
In machine learning, a step refers to one complete cycle of parameter updating within an optimization algorithm. During neural network training, each step involves computing the gradient of the loss function with respect to the model's current parameters, then adjusting those parameters in the direction that reduces the loss. The magnitude of each adjustment is governed by the learning rate, a critical hyperparameter that determines how aggressively the model moves through the parameter space on each step. Too large a learning rate causes unstable, divergent training; too small a rate leads to painfully slow convergence.
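The update rule described above can be sketched in a few lines. This is a minimal, illustrative gradient descent step on a toy one-dimensional loss; the function name `sgd_step` and the constants are assumptions for the example, not part of any particular library.

```python
import numpy as np

def sgd_step(params, grads, lr=0.01):
    """One optimization step: move each parameter against its gradient.

    The learning rate `lr` controls the magnitude of the adjustment:
    too large and updates overshoot; too small and convergence is slow.
    """
    return [p - lr * g for p, g in zip(params, grads)]

# Toy example: minimize f(w) = w^2, whose gradient is 2w.
w = np.array([4.0])
for _ in range(3):
    grad = 2 * w                       # gradient of the loss at the current parameters
    (w,) = sgd_step([w], [grad], lr=0.1)
# Each step shrinks w by a factor of 0.8: 4.0 -> 3.2 -> 2.56 -> 2.048
```

In a real framework the gradient would come from automatic differentiation rather than a hand-written formula, but the per-step arithmetic is the same.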
Steps are the atomic unit of training progress. In stochastic gradient descent (SGD) and its variants — Adam, RMSProp, AdaGrad — each step processes either a single sample or a mini-batch of samples, computes gradients via backpropagation, and applies those gradients to update weights. This distinguishes a step from an epoch, which represents a full pass through the entire training dataset. A single epoch may contain thousands or millions of steps depending on dataset size and batch size.
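The step/epoch relationship is simple arithmetic: the number of steps in one epoch is the dataset size divided by the batch size, rounded up for a final partial batch. A small sketch (the function name and figures are illustrative):

```python
import math

def steps_per_epoch(dataset_size, batch_size):
    # One step processes one mini-batch; an epoch is a full pass over the data.
    # The ceiling accounts for a final, smaller batch when sizes don't divide evenly.
    return math.ceil(dataset_size / batch_size)

# e.g. 1.2M training examples with a batch size of 256:
assert steps_per_epoch(1_200_000, 256) == 4688
```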
The concept matters enormously in practice because training dynamics are analyzed and tuned at the step level. Learning rate schedules, gradient clipping, and warmup strategies all operate on a per-step basis. Researchers monitor metrics like loss curves and gradient norms across steps to diagnose problems such as vanishing gradients, exploding gradients, or premature convergence. In reinforcement learning, the term takes on a related but distinct meaning: a step often refers to one interaction between an agent and its environment, after which a policy update may or may not occur.
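As one concrete example of per-step tuning, a common learning rate schedule combines a linear warmup with cosine decay, computed fresh at every step. This is a hedged sketch of that general shape; the function name and the specific constants are assumptions, not a standard API:

```python
import math

def lr_at_step(step, base_lr=1e-3, warmup_steps=1000, total_steps=100_000):
    """Per-step learning rate: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        # Ramp linearly from near zero up to base_lr over the warmup period.
        return base_lr * (step + 1) / warmup_steps
    # After warmup, decay along a half-cosine toward zero at total_steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))
```

The schedule peaks at `base_lr` exactly when warmup ends and reaches (approximately) zero at the final step, which is why such schedules must be parameterized by the total step count in advance.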
Modern large-scale training runs are frequently described in terms of total steps rather than epochs, since datasets may be so large that a single epoch is impractical. Compute budgets, checkpoint schedules, and evaluation intervals are all expressed in steps, making it a universal unit of measurement across virtually every gradient-based learning paradigm.
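For a sense of scale, a token-budgeted run's total step count falls out of the batch geometry. The figures below are purely illustrative:

```python
# Tokens consumed per step = sequences per batch * tokens per sequence.
token_budget = 300_000_000_000      # 300B-token compute budget (illustrative)
batch_size = 512                    # sequences per step
seq_len = 2048                      # tokens per sequence
tokens_per_step = batch_size * seq_len          # 1,048,576 tokens each step
total_steps = token_budget // tokens_per_step   # ~286K steps for the whole run
```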