A mechanism that checks whether a system's outputs meet correctness and quality criteria.
An output verifier is a component or process that evaluates the results produced by a model or system against some standard of correctness, quality, or expected behavior. In machine learning contexts, output verifiers have become especially prominent in reinforcement learning from human feedback (RLHF) and reasoning-focused systems, where they serve as automated judges that assess whether a model's response satisfies a given goal—such as solving a math problem correctly, producing valid code, or adhering to safety constraints. Rather than relying solely on human evaluation, output verifiers allow correctness signals to be generated at scale and used to guide training or filter inference-time outputs.
Output verifiers generally fall into two categories: rule-based and learned. Rule-based verifiers apply deterministic checks—for example, executing generated code against test cases and comparing the results to expected outputs, or running a symbolic math solver to confirm a proof step. Learned verifiers, sometimes called reward models or process reward models (PRMs), are themselves neural networks trained to predict whether an output is correct or high-quality. Process reward models go further by assigning correctness scores to intermediate reasoning steps, not just final answers, enabling finer-grained supervision of chain-of-thought reasoning.
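A rule-based code verifier of the kind described above can be sketched in a few lines: run the candidate program in a subprocess on each test input and compare its output to the expected answer. The function name, test-case format, and timeout are illustrative assumptions, not a standard API.

```python
import subprocess
import sys

def verify_code(candidate_source: str, test_cases: list[tuple[str, str]]) -> bool:
    """Rule-based verifier: execute the candidate program on each test input
    and compare its stdout to the expected output (illustrative sketch)."""
    for stdin_text, expected in test_cases:
        result = subprocess.run(
            [sys.executable, "-c", candidate_source],
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=5,  # guard against non-terminating candidates
        )
        if result.returncode != 0 or result.stdout.strip() != expected.strip():
            return False
    return True

# A model-generated solution to "read an integer, print its square":
candidate = "n = int(input()); print(n * n)"
print(verify_code(candidate, [("3", "9"), ("10", "100")]))  # True
```

Because the check is deterministic and cheap, verdicts like this can be generated at whatever scale training requires; a production version would also sandbox the subprocess and catch timeout exceptions.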
The practical importance of output verifiers has grown substantially with the rise of large language models (LLMs) applied to tasks with verifiable ground truth, such as mathematics, coding, and logical reasoning. In these settings, verifiers enable techniques like best-of-N sampling—generating multiple candidate outputs and selecting the one the verifier scores highest—as well as more sophisticated search procedures like Monte Carlo Tree Search over reasoning traces. This verifier-guided inference can dramatically improve model performance without any additional training of the base model.
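Best-of-N sampling as described above reduces to a one-line selection rule: sample N candidates and keep the one the verifier scores highest. The sketch below uses toy stand-ins for the model and the verifier (an arithmetic question with a deterministic answer check); the function names are assumptions for illustration.

```python
import random

def best_of_n(generate, verifier_score, prompt: str, n: int = 8) -> str:
    """Best-of-N sampling: draw N candidate outputs for the prompt and
    return the candidate with the highest verifier score."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=verifier_score)

# Toy stand-ins (not a real model or learned verifier): the "model" guesses
# answers near 7 * 6, and the "verifier" checks the arithmetic exactly.
def toy_generate(prompt: str) -> str:
    return str(random.choice([40, 41, 42, 43]))

def toy_verifier(answer: str) -> float:
    return 1.0 if answer == "42" else 0.0

print(best_of_n(toy_generate, toy_verifier, "What is 7 * 6?", n=8))
```

The base model is never updated here: all of the gain comes from spending more inference-time compute and letting the verifier pick among samples, which is why verifier quality directly bounds how much best-of-N can help.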
Output verifiers are also central to scalable oversight research, which aims to maintain human control over AI systems whose outputs humans cannot easily evaluate directly. By training verifiers on problems where correctness is checkable and then applying them to harder problems, researchers hope to extend reliable quality assessment beyond the limits of direct human judgment. As AI systems tackle increasingly complex tasks, robust output verification is widely seen as a prerequisite for safe and reliable deployment.