
Envisioning is an emerging technology research institute and advisory.


2011 — 2026


RL (Reinforcement Learning)

A learning paradigm where an agent maximizes cumulative rewards through environmental interaction.

Year: 1980 · Generality: 908

Reinforcement learning (RL) is a branch of machine learning in which an autonomous agent learns to make decisions by interacting with an environment and receiving feedback in the form of rewards or penalties. Unlike supervised learning, which relies on labeled input-output pairs, RL agents are not told what the correct action is — they must discover effective strategies through trial and error. The agent observes the current state of the environment, selects an action according to a policy, receives a reward signal, and transitions to a new state. The goal is to learn a policy that maximizes cumulative reward over time, a quantity often formalized as the expected discounted return.
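The observe-act-reward loop described above can be sketched in a few lines of Python. The toy "corridor" environment and the random policy below are illustrative assumptions, not a standard benchmark; the point is the interaction cycle and the discounted return.

```python
import random

# Toy 1-D "corridor" environment: states 0..4; reaching state 4 ends the
# episode with reward +1. All names and dynamics here are illustrative.
class Corridor:
    def __init__(self, n=5):
        self.n = n
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):  # action: -1 (left) or +1 (right)
        self.state = max(0, min(self.n - 1, self.state + action))
        done = self.state == self.n - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done

def discounted_return(rewards, gamma=0.9):
    # G = r_0 + gamma * r_1 + gamma^2 * r_2 + ...
    return sum(gamma ** t * r for t, r in enumerate(rewards))

env = Corridor()
random.seed(0)
state, rewards, done = env.reset(), [], False
while not done:
    action = random.choice([-1, 1])        # random policy: no learning yet
    state, reward, done = env.step(action)
    rewards.append(reward)
print(discounted_return(rewards))
```

A learning agent would replace the random `choice` with a policy that improves as reward feedback accumulates; the discount factor gamma trades off immediate against future reward.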

The mathematical backbone of RL is the Markov Decision Process (MDP), which provides a principled framework for modeling sequential decision-making under uncertainty. Core algorithms fall into several families: value-based methods (such as Q-learning) estimate the long-term value of state-action pairs; policy gradient methods directly optimize the policy parameters; and actor-critic architectures combine both approaches. The advent of deep reinforcement learning — pairing deep neural networks with RL algorithms — dramatically expanded the complexity of problems RL could tackle, enabling agents to learn directly from high-dimensional inputs like raw pixels.
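As a minimal concrete instance of the value-based family, tabular Q-learning can be sketched on the same kind of toy corridor MDP. The environment dynamics and hyperparameters below are illustrative assumptions, not a reference implementation:

```python
import random

# Tabular Q-learning on a 5-state corridor: move left (-1) or right (+1);
# reaching the rightmost state yields reward +1 and ends the episode.
def q_learning(n_states=5, actions=(-1, 1), episodes=500,
               alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy: explore with probability epsilon, else act greedily.
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda act: Q[(s, act)])
            s_next = max(0, min(n_states - 1, s + a))
            r = 1.0 if s_next == n_states - 1 else 0.0
            # Value-based update: move Q(s,a) toward r + gamma * max_a' Q(s',a').
            best_next = max(Q[(s_next, a2)] for a2 in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q

Q = q_learning()
# The greedy policy recovered from Q moves right in every non-terminal state.
print({s: max((-1, 1), key=lambda act: Q[(s, act)]) for s in range(4)})
```

A policy gradient method would instead parameterize the policy directly and adjust its parameters along the gradient of expected return; deep RL replaces the Q table with a neural network so the same update can operate on high-dimensional states.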

RL has produced some of the most striking demonstrations of machine intelligence. DeepMind's DQN mastered Atari games from raw screen input in 2013, AlphaGo defeated world champion Go players in 2016, and OpenAI Five competed at a professional level in Dota 2. More recently, RL has become central to aligning large language models with human preferences through techniques like Reinforcement Learning from Human Feedback (RLHF). These successes have cemented RL as a critical tool wherever sequential decision-making, planning, or adaptive control is required.

Despite its power, RL remains challenging in practice. Agents often require enormous amounts of experience to learn effective policies, reward functions can be difficult to specify correctly, and learned behaviors may not generalize well beyond training conditions. Active research areas include sample efficiency, safe exploration, multi-agent RL, and offline RL — where agents learn from fixed datasets rather than live interaction — all aimed at making RL more practical for real-world deployment.

Related

  • DRL (Deep Reinforcement Learning): Neural networks combined with reinforcement learning to master complex sequential decision-making tasks. (Generality: 796)
  • Policy Learning: Reinforcement learning approach that directly optimizes a policy to maximize cumulative reward. (Generality: 794)
  • Q-Learning: A model-free reinforcement learning algorithm that learns optimal action values through experience. (Generality: 792)
  • IRL (Inverse Reinforcement Learning): Inferring an agent's reward function by observing its behavior. (Generality: 652)
  • RLAIF (Reinforcement Learning with AI Feedback): Training AI agents using feedback generated by other AI models instead of humans. (Generality: 487)
  • RLHF (Reinforcement Learning from Human Feedback): Training AI systems using human preference signals as a reward mechanism. (Generality: 756)