Envisioning is an emerging technology research institute and advisory.

Spatial Foundation Models

AI models trained on 3D environments to understand spatial relationships and physical interactions

Spatial Foundation Models extend the capabilities of large language models (LLMs) beyond text and 2D imagery into three-dimensional understanding. Where conventional language models operate on sequences of text tokens, these systems are trained on multimodal datasets that include 3D spatial relationships, physical dynamics, depth information, and embodied interaction data captured from real-world environments. The architecture typically combines transformer-based models with specialized encoders for point clouds, mesh data, depth maps, and spatial scene graphs. By training on large datasets of 3D scans, robotic manipulation sequences, and spatial navigation patterns, these models learn how objects relate to one another in physical space, how physical forces govern movement, and how humans interact with their surroundings. This spatial reasoning lets the models predict occluded geometry, understand affordances (the actions an object enables), and generate coherent instructions for tasks that require physical manipulation or navigation.
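To make the idea of a spatial scene graph more concrete, the sketch below derives "on" relations from axis-aligned bounding boxes. It is an illustration only: the `Box` type, the `is_on` rule, and the 2 cm tolerance are assumptions made for this example, and a learned model would infer such relations from data rather than from a hand-written rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned bounding box in metres; z is vertical."""
    name: str
    min_xyz: tuple
    max_xyz: tuple


def overlaps_1d(a_min, a_max, b_min, b_max):
    """True if two intervals share more than a single point."""
    return a_min < b_max and b_min < a_max


def is_on(top, base, tol=0.02):
    """top rests on base: footprints overlap and top's bottom meets base's top."""
    footprint = all(
        overlaps_1d(top.min_xyz[i], top.max_xyz[i],
                    base.min_xyz[i], base.max_xyz[i])
        for i in (0, 1)  # x and y axes only
    )
    return footprint and abs(top.min_xyz[2] - base.max_xyz[2]) <= tol


def scene_graph(boxes):
    """Return (subject, relation, object) triples over all ordered pairs."""
    edges = []
    for a in boxes:
        for b in boxes:
            if a is not b and is_on(a, b):
                edges.append((a.name, "on", b.name))
    return edges


# Illustrative scene: a mug standing on a table, and a lamp elsewhere.
table = Box("table", (0.0, 0.0, 0.0), (1.2, 0.8, 0.75))
mug = Box("mug", (0.5, 0.3, 0.75), (0.58, 0.38, 0.85))
lamp = Box("lamp", (2.0, 2.0, 0.0), (2.3, 2.3, 1.5))
edges = scene_graph([table, mug, lamp])
```

Here `edges` contains the single triple `("mug", "on", "table")`. Triples of this kind are one plausible form for the scene-graph input mentioned above, alongside raw point clouds and depth maps.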

Spatial Foundation Models address a critical gap in extended reality (XR) systems, autonomous robotics, and spatial computing applications. Traditional AI systems struggle with tasks that depend on the three-dimensional nature of the physical world: determining whether a virtual object will fit in a real space, predicting how items should be arranged for optimal accessibility, or generating natural movement paths through cluttered environments. These limitations have constrained the development of intelligent AR/VR experiences and embodied AI agents. By incorporating spatial reasoning at the foundational level, these models enable automatic scene understanding for mixed reality applications, content placement that respects real-world constraints, and natural language interfaces that translate verbal instructions into precise spatial actions. The technology also supports more intuitive human-robot collaboration, where systems can understand commands like "place this on the shelf" without explicit coordinate programming.
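The fit check and the "place this on the shelf" example can be sketched with simple box geometry. This is a minimal illustration under stated assumptions: `FreeSpace`, `fits`, and `place` are hypothetical names, the free volume above the shelf is assumed to be known already, and only a quarter turn about the vertical axis is tried. A spatial foundation model would have to estimate these quantities from sensor data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FreeSpace:
    """Hypothetical empty axis-aligned volume above a surface, in metres."""
    name: str
    origin: tuple  # (x, y, z) of the corner with the smallest coordinates
    size: tuple    # (width, depth, height)


def fits(item_size, space):
    """Return an upright orientation of item_size that fits, or None.

    Only the two horizontal extents may be swapped, which corresponds
    to a 90-degree turn about the vertical axis.
    """
    w, d, h = item_size
    for cand in ((w, d, h), (d, w, h)):
        if all(c <= s for c, s in zip(cand, space.size)):
            return cand
    return None


def place(item_size, target, spaces):
    """Resolve a named target to the centre point for the item, or None."""
    space = next((s for s in spaces if s.name == target), None)
    if space is None:
        return None  # the command names a surface that is not in the scene
    oriented = fits(item_size, space)
    if oriented is None:
        return None  # the item does not fit in either upright orientation
    x = space.origin[0] + space.size[0] / 2
    y = space.origin[1] + space.size[1] / 2
    z = space.origin[2] + oriented[2] / 2  # item rests on the surface
    return (x, y, z)


# Illustrative values: a shelf with 0.8 x 0.3 x 0.35 m of free space.
spaces = [FreeSpace("shelf", (1.0, 0.0, 1.2), (0.8, 0.3, 0.35))]
book = (0.25, 0.6, 0.3)  # too deep as given, fits after a quarter turn
pose = place(book, "shelf", spaces)
```

With these values `pose` is approximately `(1.4, 0.15, 1.35)`, and `place` returns `None` for an item that is too large or for a target that is not in the scene. The point of the sketch is the shape of the problem: a verbal target has to be resolved to a region, and the item checked against its dimensions, before any coordinates can be produced.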

Early research implementations have shown promising results in applications ranging from AR content generation to robotic task planning. Pilot programs in industrial settings suggest these models can significantly reduce the time required to configure spatial computing applications by automatically adapting virtual interfaces to physical workspace layouts. In the consumer space, spatial foundation models are beginning to enable more natural interactions with smart home devices, as well as AR navigation systems that understand contextual placement rather than just GPS coordinates. The technology aligns with broader industry trends toward embodied AI and the convergence of physical and digital experiences. As spatial computing platforms mature and 3D sensing becomes ubiquitous through devices like smartphones and AR glasses, the training data available for these models will expand dramatically. Better models should in turn make spatial applications more useful and more widely deployed, which generates still more data. This virtuous cycle suggests that spatial foundation models will become increasingly sophisticated, eventually enabling AI systems that reason about and interact with the physical world with human-like spatial intelligence, and changing how immersive technologies are designed and experienced.

TRL: 3/9 (Conceptual)
Impact: 5/5
Investment: 5/5
Category: Software

Related Organizations

Google DeepMind

United Kingdom · Research Lab

95%

Developers of the Gemini family of models, which are trained from the start to be multimodal across text, images, video, and audio.

Developer

OpenAI

United States · Company

95%

Creator of GPT-4o, a natively multimodal model capable of reasoning across audio, vision, and text in real-time.

Developer

Allen Institute for AI (AI2)

United States · Nonprofit

90%

Creator of Semantic Scholar and various open-source models for scientific text processing.

Researcher

CSM.ai

United States · Startup

90%

Common Sense Machines builds AI that translates 2D images into 3D assets.

Developer

Luma AI

United States · Startup

90%

Creators of Dream Machine, a high-quality video generation model, and 3D capture technology.

Developer

NVIDIA

United States · Company

90%

Developing foundation models for robotics (Project GR00T) and vision-language models like VILA.

Developer

Stanford University (HAI)

United States · University

90%

Human-Centered AI Institute conducting research on the BEHAVIOR benchmark.

Researcher

Runway

United States · Startup

85%

Applied AI research company shaping the next era of art, entertainment and human creativity.

Developer

Stability AI

United Kingdom · Company

85%

Open source generative AI company, creators of Stable Audio.

Developer

Connections

Embodied AI Agents

AI systems that perceive and navigate 3D spaces like physical or virtual worlds

TRL: 3/9
Impact: 4/5
Investment: 4/5
Category: Software

Semantic Scene Understanding

Real-time spatial comprehension of rooms, objects, and their functional relationships

TRL: 6/9
Impact: 5/5
Investment: 5/5
Category: Software

Spatial Operating Systems

Operating systems that organize apps and data in 3D space instead of flat screens

TRL: 6/9
Impact: 5/5
Investment: 5/5
Category: Software

World Graph Indexing

Maps physical spaces as networks of connected anchors, objects, and spatial relationships

TRL: 5/9
Impact: 5/5
Investment: 4/5
Category: Software

Spatial Design Collaboration

Real-time co-creation of 3D environments using mixed reality workspaces

TRL: 6/9
Impact: 5/5
Investment: 4/5
Category: Applications

Assistive Spatial Navigation

XR systems that guide blind, low-vision, and mobility-impaired users through physical spaces

TRL: 6/9
Impact: 5/5
Investment: 3/5
Category: Applications
