
Envisioning is an emerging technology research institute and advisory.


Q-Learning

A model-free reinforcement learning algorithm that learns optimal action values through experience.

Year: 1989 · Generality: 792

Q-learning is a model-free reinforcement learning algorithm that enables an agent to learn how to act optimally in an environment by estimating the value of taking specific actions in specific states. At its core, Q-learning maintains a Q-function — often represented as a table or, in modern implementations, a neural network — that maps state-action pairs to expected cumulative rewards. The agent iteratively updates these Q-values using the Bellman equation: after taking an action and observing the resulting reward and next state, it adjusts its estimate to bring it closer to the true long-term value. Over many interactions, this process converges toward an optimal policy without ever requiring a model of the environment's transition dynamics.
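The update described above can be made concrete with a minimal tabular sketch. The environment here (a five-state corridor with a reward at the right end), the constants, and all helper names are invented purely for illustration:

```python
import random

# Toy tabular Q-learning on a hypothetical 5-state corridor: the agent
# starts at state 0 and receives reward 1.0 only upon reaching state 4.
N_STATES = 5
ACTIONS = [0, 1]                       # 0 = move left, 1 = move right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Deterministic transitions; reward only on entering the goal state."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)

def greedy(state):
    """Greedy action with random tie-breaking."""
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

random.seed(0)
for _ in range(500):                   # episodes
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy behavior policy
        a = random.choice(ACTIONS) if random.random() < EPSILON else greedy(s)
        s2, r = step(s, a)
        # Bellman update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        target = r + GAMMA * max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

# After training, the greedy policy should point right in every non-goal state.
policy = {s: greedy(s) for s in range(N_STATES - 1)}
```

Note that the agent never consults a transition model: it learns the Q-values purely from the `(state, action, reward, next state)` tuples it experiences.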

A defining characteristic of Q-learning is that it is an off-policy algorithm, meaning the agent can learn about the optimal policy while following a different, often more exploratory, behavior policy. This is typically managed through an epsilon-greedy strategy, where the agent occasionally takes random actions to explore the environment rather than always exploiting its current best estimates. This separation between the learning target and the behavior policy makes Q-learning particularly flexible and sample-efficient in many settings, allowing it to incorporate experience from diverse sources, including replay buffers of past interactions.
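The off-policy split can be sketched as two small helpers with hypothetical names, assuming the Q-function is stored as a dict mapping `(state, action)` pairs to floats: the behavior policy sometimes explores, while the learning target always uses the greedy maximum.

```python
import random

def behavior_action(Q, state, actions, epsilon=0.1):
    """Epsilon-greedy behavior policy: explore with prob. epsilon,
    otherwise exploit the current best estimate."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def learning_target(Q, reward, next_state, actions, gamma=0.9):
    """Off-policy learning target: uses the greedy max over next actions,
    regardless of which action the behavior policy actually takes next."""
    return reward + gamma * max(Q[(next_state, a)] for a in actions)
```

Because `learning_target` never depends on the action the agent really takes in `next_state`, the transitions it learns from can come from anywhere, including a replay buffer of old experience.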

Q-learning gained renewed prominence with the introduction of Deep Q-Networks (DQN) by DeepMind in 2015, which replaced the traditional lookup table with a deep convolutional neural network to approximate the Q-function. This allowed the algorithm to scale to high-dimensional state spaces — most famously, learning to play Atari video games directly from raw pixel inputs, matching or exceeding human performance on many titles. Techniques such as experience replay and target networks were introduced to stabilize training, addressing the instability that arises when neural networks are used as function approximators in this setting.
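The two stabilization tricks can be sketched with standard-library Python alone. The "network" below is just a dict of parameters standing in for a real neural net, and all names are illustrative rather than drawn from any specific library:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of past transitions (s, a, r, s_next, done)."""
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions fall off

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        """Uniform random minibatch, breaking temporal correlation
        between consecutive experiences."""
        return random.sample(self.buffer, batch_size)

online_params = {"w": 0.0}                     # updated every training step
target_params = dict(online_params)            # frozen copy used for targets
SYNC_EVERY = 100

for step in range(1000):
    # ... a gradient update of online_params would happen here; the
    # increment below is a placeholder for that training step ...
    online_params["w"] += 0.01
    if step % SYNC_EVERY == 0:
        target_params = dict(online_params)    # periodic hard update
```

Sampling minibatches from the buffer decorrelates updates, while computing Bellman targets from the slowly updated `target_params` keeps the regression target from chasing the very parameters being trained.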

Today, Q-learning and its deep variants remain foundational to reinforcement learning research and practice. The algorithm underpins a wide range of applications, from robotic control and autonomous navigation to game playing and resource scheduling. Its simplicity, theoretical guarantees of convergence under tabular conditions, and adaptability to complex function approximators make it one of the most studied and widely applied algorithms in the field.

Related

Q-Value
Expected cumulative reward for taking an action in a given state under a policy.
Generality: 756

DQN (Deep Q-Networks)
Reinforcement learning method combining Q-learning with deep neural networks for complex environments.
Generality: 694

RL (Reinforcement Learning)
A learning paradigm where an agent maximizes cumulative rewards through environmental interaction.
Generality: 908

DRL (Deep Reinforcement Learning)
Neural networks combined with reinforcement learning to master complex sequential decision-making tasks.
Generality: 796

Value Function
A function estimating expected cumulative reward from a given state or action.
Generality: 842

Policy Learning
Reinforcement learning approach that directly optimizes a policy to maximize cumulative reward.
Generality: 794