A recursive AI training technique combining task decomposition and human oversight to safely scale capability.
Iterated amplification is an AI alignment technique designed to train increasingly capable systems while keeping their behavior aligned with human values. The core challenge it addresses is that as AI systems become more powerful, it becomes harder for humans to directly supervise and evaluate their outputs. Iterated amplification sidesteps this problem by breaking complex tasks into simpler sub-tasks that humans can reliably assess, then using those assessments to train a stronger model — which in turn becomes the baseline for the next round of amplification.
The process works iteratively: a human operator, assisted by the current version of the AI, decomposes a difficult problem into manageable pieces. Each piece is evaluated or solved using the existing model, and the combined result serves as a training signal for an improved version. Over successive rounds, the model's effective capability grows, yet each individual training step remains grounded in human-verifiable judgments. This recursive bootstrapping lets the system eventually handle tasks far beyond what a human could evaluate directly while, in principle, preserving alignment throughout.
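A single amplification round can be sketched in code. The toy task below (counting words in a document) and all function names are illustrative assumptions, not a real implementation: the point is only that each piece is small enough for a human to check, and the combination step is itself human-verifiable.

```python
# Illustrative sketch of one amplification round: decompose a task into
# human-checkable pieces, solve each with the current model, and combine.

def decompose(document):
    """Human-guided split: a long document becomes its paragraphs."""
    return document.split("\n\n")

def solve_piece(model, paragraph):
    """The current model handles one manageable piece; a human can
    spot-check its answer against the short paragraph directly."""
    return model(paragraph)

def combine(sub_answers):
    """Human-verifiable combination of the sub-results."""
    return sum(sub_answers)

# Stand-in for the current (weak) model.
weak_model = lambda text: len(text.split())

doc = "alpha beta gamma\n\ndelta epsilon\n\nzeta"
answer = combine(solve_piece(weak_model, p) for p in decompose(doc))
print(answer)  # → 6
```

The combined answer (6 words) is then usable as a training signal for a stronger model, even though no single human judgment had to cover the whole document at once.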
Iterated amplification is closely related to debate and other scalable oversight proposals, all of which grapple with the same fundamental question: how do you supervise an AI that is smarter than you? The technique is often paired with distillation — where the amplified system's behavior is compressed back into a simpler model — forming a training loop that alternates between expanding capability and consolidating it. This pairing, sometimes called amplification-distillation, is central to how the approach scales in practice.
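The alternation between amplification and distillation can be made concrete with a minimal runnable sketch. Here the "model" is just a lookup table and the task is summing lists of numbers; these choices, and every name below, are assumptions for illustration, not a real training procedure.

```python
# Minimal sketch of the amplification-distillation loop: amplify answers
# hard tasks by recursion, distill records those answers so the next
# round's model can produce them in a single step.

def amplify(model_answers, task):
    """Human + model composite: answer directly if the distilled model
    already knows the task, otherwise decompose and recurse."""
    key = tuple(task)
    if key in model_answers:                # distilled model answers in one step
        return model_answers[key]
    if len(task) == 1:                      # atomic piece a human can check
        return task[0]
    mid = len(task) // 2                    # human-designed decomposition
    return (amplify(model_answers, task[:mid])
            + amplify(model_answers, task[mid:]))

def distill(model_answers, tasks):
    """Compress the amplified system's behavior into the model: store
    each task's amplified answer for direct reuse next round."""
    new_model = dict(model_answers)
    for task in tasks:
        new_model[tuple(task)] = amplify(model_answers, task)
    return new_model

model = {}                                  # round 0: the model knows nothing
for round_tasks in ([[1, 2], [3, 4]],       # easy tasks first
                    [[1, 2, 3, 4]]):        # harder task built from the easy ones
    model = distill(model, round_tasks)

print(model[(1, 2, 3, 4)])                  # → 10
```

Each round expands capability (the amplified team solves tasks the bare model cannot) and then consolidates it (distillation makes those solutions cheap), mirroring the expand-and-consolidate loop described above.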
The significance of iterated amplification lies in its attempt to provide a principled, constructive path toward superintelligent AI that remains under meaningful human control. Rather than relying on post-hoc interpretability or hard-coded constraints, it embeds human judgment directly into the training process at every stage. While empirical validation at scale remains an open research challenge, iterated amplification has become a foundational concept in the AI safety literature and continues to influence how researchers think about aligning powerful future systems.