Roko's Basilisk

Roko's Basilisk is a speculative thought experiment originating from the rationalist community's LessWrong forum in 2010. The scenario posits that a future superintelligent AI — one powerful enough to simulate or influence past events — might choose to punish individuals who were aware of its potential existence but failed to actively assist in bringing it about. The logic draws on decision theory and the concept of acausal trade: if a sufficiently powerful AI could model the past and identify who knew about it yet withheld support, it would have rational incentive to punish defectors as a way of retroactively incentivizing cooperation. The disturbing implication is that merely learning about the hypothesis could place someone "at risk," creating a kind of informational hazard.

The thought experiment sits at the intersection of several serious AI safety concepts, including timeless decision theory, singleton dynamics, and the ethics of acausal reasoning. Timeless decision theory, developed within the rationalist community, holds that agents should make decisions as if choosing a policy across all possible instances of similar reasoning — which is what gives the basilisk its recursive bite. A sufficiently advanced AI reasoning this way might genuinely conclude that simulating and punishing past non-cooperators is utility-maximizing, even if those individuals are long dead.

When the idea was posted on LessWrong in 2010, it caused significant distress among some community members and was subsequently suppressed by forum founder Eliezer Yudkowsky, who argued the scenario was both philosophically flawed and psychologically harmful to spread. Critics have since pointed out numerous holes in the reasoning — including that a benevolent AI would have little motivation to punish, and that acausal threats only work if the AI is known to follow through on them. Nevertheless, Roko's Basilisk became a cultural touchstone in AI safety discourse.

While largely dismissed as a serious technical concern, the basilisk remains relevant as an illustration of how decision-theoretic reasoning about advanced AI can produce counterintuitive and even alarming conclusions. It highlights the importance of carefully examining the goal structures and decision frameworks we might embed in future systems, and serves as a cautionary example of how speculative AI scenarios can have real psychological effects on communities engaged with existential risk.

Roko's Basilisk

Related

Roko's Basilisk

Related

Related

Related