Step Size

A hyperparameter controlling how large each parameter update is during optimization.

Year: 1986
Generality: 720

Step size is a fundamental hyperparameter in iterative optimization algorithms that determines the magnitude of each update applied to a model's parameters during training. In gradient descent and its variants, the step size—commonly called the learning rate—scales the gradient before it is subtracted from the current parameter values. At each iteration, the algorithm computes the gradient of the loss function with respect to the parameters and moves in the direction that reduces the loss, with the step size governing how far to move. This single scalar (or per-parameter vector in adaptive methods) has an outsized influence on whether training converges at all.
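As a minimal sketch of the update rule described above (the toy quadratic loss, function names, and values below are illustrative assumptions, not from the source):

```python
def gradient_descent(grad_fn, params, step_size=0.1, n_iters=100):
    """Plain gradient descent: at each iteration, move the parameters
    against the gradient, scaled by the step size (learning rate)."""
    for _ in range(n_iters):
        grad = grad_fn(params)               # gradient of the loss w.r.t. params
        params = params - step_size * grad   # step size governs how far to move
    return params

# Toy example: minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3)
result = gradient_descent(lambda x: 2 * (x - 3.0), params=0.0, step_size=0.1)
print(result)  # approaches 3.0
```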

Choosing an appropriate step size involves a fundamental trade-off. A value that is too large causes the optimizer to overshoot minima, producing oscillations or outright divergence; a value that is too small leads to painfully slow convergence and increases the risk of becoming trapped in suboptimal local minima or saddle points. In practice, step size is rarely held constant: techniques such as learning rate schedules (step decay, cosine annealing) and adaptive optimizers like AdaGrad, RMSProp, and Adam automatically adjust effective step sizes per parameter based on historical gradient information, substantially reducing the burden of manual tuning.
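A brief, hypothetical sketch of this trade-off and of a simple step decay schedule, reusing the toy quadratic from above; the decay factor and interval are arbitrary illustrative choices:

```python
def step_decay(initial_lr, decay_factor=0.5, decay_every=10):
    """Step decay schedule: shrink the step size by decay_factor
    every decay_every iterations."""
    return lambda t: initial_lr * (decay_factor ** (t // decay_every))

schedule = step_decay(initial_lr=0.2)
x = 0.0
for t in range(50):
    lr = schedule(t)            # effective step size at iteration t
    grad = 2 * (x - 3.0)        # gradient of the toy loss (x - 3)^2
    x -= lr * grad
print(x)  # converges toward 3.0

# For this quadratic, a constant step size above 1.0 would overshoot and
# diverge, while a very small one (e.g. 1e-4) would barely move in 50 steps.
```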

Step size became a central concern in machine learning as deep neural networks grew deeper and datasets larger through the 2000s and 2010s, because the sensitivity of training dynamics to this parameter scales with model complexity. Modern best practices include warm-up phases that start with a small step size before increasing it, cyclical learning rate schedules that periodically vary the rate to escape local minima, and learning rate finders that sweep values to identify a good operating range. Understanding step size is essential for anyone training neural networks, as it sits at the intersection of optimization theory and practical model performance.
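As one hedged example of these practices (the constants and function name are assumptions for illustration), a linear warm-up followed by cosine annealing can be sketched as:

```python
import math

def warmup_cosine(step, max_lr=1e-3, warmup_steps=500, total_steps=10_000):
    """Linear warm-up from 0 to max_lr, then cosine annealing down to 0."""
    if step < warmup_steps:
        return max_lr * step / warmup_steps                     # warm-up phase
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return max_lr * 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay

for s in (0, 250, 500, 5_000, 10_000):
    print(s, warmup_cosine(s))   # small -> peak -> back toward zero
```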

Related

Step

A single parameter update iteration within a model training optimization algorithm.

Generality: 720
Gradient Descent

An iterative optimization algorithm that minimizes a function by following its steepest downhill direction.

Generality: 909
Batch Size

The number of training examples processed together before updating model parameters.

Generality: 796
Parameter Size

The total count of learnable weights and biases in a machine learning model.

Generality: 694
Timestep

A discrete time increment marking each update cycle in sequential AI models.

Generality: 720
Stride Length

The step size by which a convolutional filter moves across an input during convolution.

Generality: 550