Envisioning is an emerging technology research institute and advisory.


Gradient Descent

An iterative optimization algorithm that minimizes a function by following its steepest downhill direction.

Year: 1986
Generality: 909

Gradient descent is the workhorse optimization algorithm of modern machine learning, used to tune the parameters of models—from simple linear regression to deep neural networks with billions of weights. The core idea is straightforward: given a loss function that measures how poorly a model performs, compute the gradient of that loss with respect to each parameter. The gradient points in the direction of steepest increase, so moving in the opposite direction—downhill—reduces the loss. This update is scaled by a learning rate, a hyperparameter that controls step size. Repeat this process across many iterations, and the parameters converge toward values that minimize the loss.
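
The update rule described above can be sketched in a few lines of Python; the toy loss f(w) = (w − 3)², the starting point, and the function names are illustrative assumptions, not from this page:

```python
# Minimal gradient descent sketch on a toy loss, f(w) = (w - 3)^2.
# Its gradient f'(w) = 2 * (w - 3) points uphill, so each update
# steps in the opposite direction, scaled by the learning rate.

def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step opposite the gradient until (hopefully) converged."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w

w_star = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(round(w_star, 4))  # converges toward the minimum at w = 3
```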

In practice, three main variants dominate. Batch gradient descent computes the gradient over the entire dataset before each update, which is stable but computationally expensive. Stochastic gradient descent (SGD) updates parameters after each individual training example, introducing noise that can actually help escape shallow local minima. Mini-batch gradient descent strikes a balance, computing gradients over small random subsets of data—this is the standard approach in deep learning, combining computational efficiency with enough stochasticity to navigate complex loss landscapes. Modern optimizers like Adam, RMSProp, and AdaGrad build on SGD by adapting the learning rate per parameter, dramatically improving convergence in practice.
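
The mini-batch variant can be sketched as follows, assuming a synthetic one-dimensional least-squares problem; the dataset, learning rate, and batch size are illustrative choices:

```python
import random

# Mini-batch SGD sketch for fitting y = w * x by least squares.
# Each update uses the gradient over a small random batch rather than
# the full dataset (batch GD) or a single example (pure SGD).

random.seed(0)
data = [(x, 2.0 * x) for x in [i / 10 for i in range(-50, 50)]]  # true w = 2

w = 0.0
learning_rate, batch_size = 0.01, 8
for step in range(300):
    batch = random.sample(data, batch_size)
    # Gradient of the batch mean squared error: (2/m) * sum((w*x - y) * x)
    grad = sum(2 * (w * x - y) * x for x, y in batch) / batch_size
    w -= learning_rate * grad

print(round(w, 3))  # close to the true slope 2.0
```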

Gradient descent became central to machine learning through its pairing with backpropagation, the algorithm that efficiently computes gradients through layered neural networks using the chain rule of calculus. Without an efficient way to compute gradients, training deep networks would be computationally intractable. Together, backpropagation and gradient descent form the foundation of nearly all neural network training pipelines used today.
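
The chain-rule bookkeeping behind backpropagation can be illustrated on a single weight (all names and values here are hypothetical): each stage's local derivative is computed on the backward pass and multiplied together, and the analytic result can be checked against a finite-difference estimate.

```python
import math

# Chain-rule sketch for L(w) = (sigmoid(w * x) - y)^2, a single weight.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_grad(w, x, y):
    z = w * x                  # forward: linear stage
    a = sigmoid(z)             # forward: activation
    L = (a - y) ** 2           # forward: squared error
    dL_da = 2 * (a - y)        # backward: loss w.r.t. activation
    da_dz = a * (1 - a)        # backward: sigmoid derivative
    dz_dw = x                  # backward: linear stage w.r.t. weight
    return L, dL_da * da_dz * dz_dw  # chain rule: multiply local derivatives

w, x, y = 0.5, 2.0, 1.0
L, g = loss_and_grad(w, x, y)
eps = 1e-6
numeric = (loss_and_grad(w + eps, x, y)[0] - loss_and_grad(w - eps, x, y)[0]) / (2 * eps)
print(abs(g - numeric) < 1e-6)  # analytic and numeric gradients agree
```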

Despite its ubiquity, gradient descent has well-known limitations. It can get trapped in local minima or saddle points, and its performance is highly sensitive to the choice of learning rate. In high-dimensional non-convex loss landscapes—typical of deep learning—convergence is not guaranteed to a global optimum. Nevertheless, empirical results consistently show that gradient-based optimization finds solutions that generalize remarkably well, making it indispensable to the field.
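
Learning-rate sensitivity is easy to demonstrate on the quadratic f(w) = w², whose gradient is 2w; the step sizes below are illustrative. Each update multiplies w by (1 − 2·lr), so a small rate contracts toward the minimum while a large one overshoots and diverges:

```python
# Sketch of learning-rate sensitivity on f(w) = w^2 (gradient 2w).

def run(learning_rate, w=1.0, steps=50):
    for _ in range(steps):
        w -= learning_rate * 2 * w  # w is multiplied by (1 - 2 * lr) each step
    return w

print(abs(run(0.1)))   # shrinks toward the minimum at 0
print(abs(run(1.1)))   # grows without bound: the step size is too large
```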

Related

Loss Optimization
Iteratively adjusting model parameters to minimize prediction error measured by a loss function.
Generality: 875

Optimization Problem
Finding the best solution from all feasible options given an objective and constraints.
Generality: 962

Step Size
A hyperparameter controlling how large each parameter update is during optimization.
Generality: 720

Symbolic Descent
An optimization method that searches over symbolic programs instead of tuning neural network weights.
Generality: 264

Convergence
The point at which a learning algorithm's parameters stabilize and stop improving meaningfully.
Generality: 874

Autograd
An automatic differentiation engine that computes gradients for training machine learning models.
Generality: 752