An AI system that builds broad internal representations of the world to support prediction, planning, and action across many tasks.
A general world model is an AI system capable of constructing rich, flexible internal representations of its environment that generalize across a wide range of scenarios and tasks. Rather than being optimized for a single domain, these models learn underlying structure from diverse data — spanning vision, language, physics, and interaction — and use that structure to simulate plausible futures, reason about consequences, and guide decision-making. The ambition is to give machines something analogous to a human's mental model of the world: a dynamic, updatable representation that supports prediction, planning, and imagination without requiring task-specific retraining.
Technically, general world models typically combine representation learning with predictive modeling. A system learns to encode observations into a compact latent space, then trains a dynamics model to predict how that latent state evolves given actions or context. This architecture allows the model to "imagine" trajectories — rolling out hypothetical futures internally — which is especially valuable in reinforcement learning, where real-world interaction is expensive or dangerous. Approaches like Dreamer and its successors demonstrated that agents trained almost entirely in imagination, using a learned world model, could match or exceed the performance of agents trained through direct environment interaction.
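As a concrete illustration, the sketch below wires these pieces together in PyTorch: an encoder that compresses observations into a latent state, a dynamics network that steps that state forward given an action, and a reward head for scoring imagined futures. This is a minimal toy, not Dreamer or any other published architecture; the class name, layer sizes, and dimensions are illustrative assumptions.

```python
# Minimal latent world-model sketch (illustrative only): encoder,
# latent dynamics, and reward head, with an imagine() routine that
# rolls a candidate action sequence forward entirely in latent space.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, LATENT_DIM = 64, 4, 32  # illustrative sizes

class WorldModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: observation -> compact latent state
        self.encoder = nn.Sequential(
            nn.Linear(OBS_DIM, 128), nn.ReLU(), nn.Linear(128, LATENT_DIM))
        # Dynamics: (latent state, action) -> next latent state
        self.dynamics = nn.Sequential(
            nn.Linear(LATENT_DIM + ACT_DIM, 128), nn.ReLU(),
            nn.Linear(128, LATENT_DIM))
        # Reward head: latent state -> predicted scalar reward
        self.reward = nn.Linear(LATENT_DIM, 1)

    def imagine(self, obs, actions):
        """Roll out a hypothetical trajectory purely in latent space.

        actions: tensor of shape (horizon, ACT_DIM).
        Returns predicted rewards along the imagined trajectory.
        """
        z = self.encoder(obs)
        rewards = []
        for a in actions:
            z = self.dynamics(torch.cat([z, a], dim=-1))
            rewards.append(self.reward(z))
        return torch.stack(rewards)

model = WorldModel()
obs = torch.randn(OBS_DIM)          # a stand-in observation
plan = torch.randn(10, ACT_DIM)     # a 10-step candidate action sequence
print(model.imagine(obs, plan).shape)  # torch.Size([10, 1])
```

In a full training loop the encoder, dynamics, and reward networks would be fit to logged experience before imagined rollouts are trusted; the point of the sketch is only that, once learned, trajectories can be evaluated without touching the real environment.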
The push toward general world models, as opposed to task-specific ones, accelerated with the rise of large-scale foundation models. Researchers observed that models trained on vast multimodal datasets appeared to internalize surprisingly broad knowledge about physical dynamics, social behavior, and causal structure. This sparked interest in whether such models could serve as general-purpose simulators of reality — a substrate for planning and reasoning across domains. Video generation models like Sora brought the question into sharper focus, with some arguing that high-fidelity video prediction implies latent world knowledge, and others cautioning against conflating perceptual fidelity with genuine causal understanding.
General world models matter because they represent a potential path toward more sample-efficient, adaptable, and generalizable AI. An agent with an accurate world model can plan without trial and error, transfer knowledge across domains, and reason about novel situations. However, significant challenges remain: ensuring the internal model stays calibrated with reality, handling uncertainty, and scaling to the full complexity of open-ended environments. The concept sits at the intersection of reinforcement learning, unsupervised learning, and cognitive science, and is increasingly central to debates about the architecture of future intelligent systems.
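That ability to plan without trial and error can be made concrete with a simple random-shooting planner, sketched below against the hypothetical WorldModel above: sample many candidate action sequences, score each by the return the model predicts in imagination, and keep the best. Real systems typically use stronger optimizers (cross-entropy method, gradient-based planning), but the principle is the same; everything here is an illustrative assumption, not a reference implementation.

```python
# Random-shooting planner sketch (illustrative): score candidate action
# sequences by the return the world model predicts in imagination, so
# no real environment interaction is needed to choose a plan.
import torch

ACT_DIM, HORIZON, N_CANDIDATES = 4, 10, 256  # illustrative sizes

def plan(imagine, obs):
    # Sample random candidate action sequences.
    candidates = torch.randn(N_CANDIDATES, HORIZON, ACT_DIM)
    # Total predicted reward along each imagined trajectory.
    returns = torch.stack([imagine(obs, seq).sum() for seq in candidates])
    # Return the highest-scoring sequence (an agent might execute only
    # its first action, then replan).
    return candidates[returns.argmax()]

best = plan(model.imagine, torch.randn(64))  # uses the WorldModel sketch above
print(best.shape)  # torch.Size([10, 4])
```

The quality of such a plan is bounded by the quality of the model, which is exactly why the calibration and uncertainty challenges noted above are central to making general world models useful in practice.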