Reinforcement learning method combining Q-learning with deep neural networks for complex environments.
Deep Q-Networks (DQN) are a class of reinforcement learning algorithms that use deep neural networks to approximate the Q-value function — a measure of how valuable it is to take a particular action from a given state. Classical Q-learning maintains a lookup table mapping every state-action pair to an expected cumulative reward, but this approach becomes computationally intractable in environments with high-dimensional inputs like raw pixel images. DQN sidesteps this limitation by training a convolutional neural network to generalize across similar states, effectively compressing the Q-table into a learned function that scales to complex, realistic problems.
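As a rough illustration of what "approximating the Q-function with a network" means in practice, here is a minimal sketch in PyTorch, assuming the Atari-style preprocessing of four stacked 84×84 grayscale frames; the class name QNetwork and the layer sizes follow common convention rather than any single canonical implementation:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a stack of preprocessed frames to one Q-value per action."""

    def __init__(self, num_actions: int, in_channels: int = 4):
        super().__init__()
        # The convolutional trunk compresses raw pixels into features,
        # replacing the intractable per-state rows of a lookup table.
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        # With 84x84 inputs, the trunk produces a 7x7x64 feature map.
        self.head = nn.Sequential(
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, num_actions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 4, 84, 84) float tensor of stacked frames.
        # Output: (batch, num_actions), one Q-value per action.
        return self.head(self.features(x))
```

Acting greedily then amounts to `q_net(state).argmax(dim=1)`: a single forward pass scores every action at once, rather than looking up a table row.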
Two key innovations made DQN training stable enough to work in practice. The first is experience replay, in which the agent stores past transitions in a memory buffer and samples random mini-batches during training, breaking the temporal correlations that would otherwise destabilize gradient updates. The second is a target network — a periodically frozen copy of the main network used to compute training targets, preventing the feedback loop that arises when both the predictions and the targets shift simultaneously. Together, these techniques transformed a theoretically appealing but practically fragile idea into a reliable learning algorithm.
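A compact sketch of both stabilizers, again assuming PyTorch; the names ReplayBuffer and dqn_loss, the buffer capacity, and the discount factor are illustrative choices, not the exact original setup:

```python
import random
from collections import deque

import torch
import torch.nn.functional as F

class ReplayBuffer:
    """Fixed-size memory of past transitions, sampled uniformly at random."""

    def __init__(self, capacity: int = 100_000):
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        # Random mini-batches break the temporal correlation between
        # consecutive transitions that destabilizes gradient updates.
        batch = random.sample(self.memory, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return (torch.stack(states),
                torch.tensor(actions),
                torch.tensor(rewards, dtype=torch.float32),
                torch.stack(next_states),
                torch.tensor(dones, dtype=torch.float32))

def dqn_loss(online_net, target_net, batch, gamma: float = 0.99):
    states, actions, rewards, next_states, dones = batch
    # Q-values the online network predicts for the actions actually taken.
    q_pred = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # Targets come from the frozen copy, so they hold still between syncs
    # instead of shifting with every gradient step.
    with torch.no_grad():
        q_next = target_net(next_states).max(dim=1).values
        q_target = rewards + gamma * (1.0 - dones) * q_next
    return F.smooth_l1_loss(q_pred, q_target)

# Periodically (e.g. every few thousand steps) sync the frozen copy:
# target_net.load_state_dict(online_net.state_dict())
```

The target network is simply a second QNetwork whose weights are copied from the online network at intervals; between copies its outputs are treated as constants, which is what breaks the feedback loop described above.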
DQN gained widespread attention after DeepMind demonstrated it learning to play dozens of Atari 2600 games directly from pixel inputs, achieving superhuman performance on several titles without any game-specific engineering. This result was striking because a single architecture and set of hyperparameters were applied uniformly across games with very different dynamics, suggesting that DQN had learned genuinely general strategies rather than narrow heuristics.
The impact of DQN extends well beyond Atari. It established a template for deep reinforcement learning research and spawned a family of improvements — including Double DQN, Dueling DQN, Prioritized Experience Replay, and Rainbow — each addressing specific weaknesses in the original formulation. DQN remains a foundational reference point in RL, taught in virtually every modern course on the subject and serving as a baseline against which newer algorithms are routinely benchmarked.