A formal framework committing AI labs to safety evaluations before scaling models further.
A Responsible Scaling Policy (RSP) is a formal organizational commitment by an AI developer to evaluate the potential dangers of increasingly capable AI systems before proceeding with further scaling. Rather than treating safety as an afterthought, an RSP establishes concrete thresholds — often called "safety levels" or "capability thresholds" — at which a model must be assessed for dangerous capabilities before additional compute or data is invested. Anthropic introduced the first widely publicized RSP in 2023, and the framework has since influenced how frontier AI labs think about the relationship between model capability and deployment readiness.
In practice, an RSP defines a tiered system of risk levels tied to specific model capabilities. As a model approaches a threshold — such as the ability to meaningfully assist in creating biological weapons, or to autonomously conduct long-horizon cyberattacks — the policy mandates a structured evaluation process. If the model crosses that threshold without adequate safety mitigations in place, the organization commits to halting further scaling until those mitigations are developed and validated. This creates a feedback loop between capability research and safety research, ensuring the two advance in tandem rather than independently.
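The gating logic described above can be sketched in code. This is a minimal, hypothetical illustration of a tiered threshold check; the threshold names, scores, and structure are invented for this sketch and do not come from any actual lab's policy.

```python
from dataclasses import dataclass

@dataclass
class CapabilityThreshold:
    """One capability threshold in a hypothetical tiered RSP."""
    name: str
    trigger_score: float      # evaluation score at which the threshold is crossed
    mitigations_ready: bool   # whether the required safety mitigations are validated

def scaling_decision(eval_scores: dict[str, float],
                     thresholds: list[CapabilityThreshold]) -> str:
    """Allow further scaling only if no threshold is crossed without mitigations."""
    for t in thresholds:
        crossed = eval_scores.get(t.name, 0.0) >= t.trigger_score
        if crossed and not t.mitigations_ready:
            # The policy commits the organization to pause here until
            # mitigations are developed and validated.
            return f"pause: {t.name} crossed without mitigations"
    return "proceed"

# Illustrative evaluation results (purely invented numbers).
thresholds = [
    CapabilityThreshold("bioweapon_uplift", trigger_score=0.5, mitigations_ready=False),
    CapabilityThreshold("autonomous_cyber", trigger_score=0.7, mitigations_ready=True),
]

print(scaling_decision({"bioweapon_uplift": 0.6, "autonomous_cyber": 0.8}, thresholds))
# pauses: bioweapon_uplift is crossed but its mitigations are not ready
```

The key design point the sketch captures is that the decision defaults to "proceed" only when every crossed threshold has validated mitigations; a single unmitigated crossing is sufficient to halt scaling.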
The policy framework matters because the history of AI development has often seen safety and alignment work lag behind raw capability improvements. RSPs attempt to institutionalize a precautionary principle: the burden of proof shifts from "prove it's dangerous" to "prove it's safe enough to scale." This is a meaningful philosophical shift, particularly for organizations operating at the frontier where the consequences of miscalibration are potentially severe and irreversible.
Critics note that RSPs are self-imposed and largely unverifiable by outside parties, raising questions about accountability and enforcement. Nonetheless, they represent an important step toward operationalizing AI safety commitments in a concrete, publicly stated form that external auditors could, in principle, check against. As governments and standards bodies develop external oversight mechanisms, RSPs may serve as a template for more formal regulatory requirements around frontier model development.