Envisioning is an emerging technology research institute and advisory.


Robot Foundation Models (RFM)

AI models that let robots learn general skills and transfer knowledge across different robot types

Robot foundation models are large-scale AI models trained on diverse robotic data that enable robots to understand and perform a wide variety of tasks without task-specific training. These models learn generalizable representations of actions, objects, and environments that transfer across different robot platforms, tasks, and scenarios. Cross-embodiment capability means that knowledge learned on one type of robot (e.g., a robotic arm) can transfer to different robot forms (e.g., a humanoid or mobile robot), dramatically reducing the training required for each new application.
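Cross-embodiment transfer can be pictured as one shared policy producing an embodiment-agnostic "latent action" that small per-platform decoders translate into each robot's native command space. The sketch below is purely illustrative (every name and the toy arithmetic are assumptions, not any real model's API), but it shows the structural idea: the expensive shared backbone is reused across embodiments, while only a cheap decoder differs per platform.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Observation:
    image_embedding: List[float]  # stand-in for learned vision features
    instruction: str              # natural-language task description

def foundation_policy(obs: Observation) -> List[float]:
    """Toy shared backbone: maps any observation to an embodiment-agnostic
    latent action. Real models use large transformers; here we just scale
    the visual features by a function of the instruction length."""
    scale = 1.0 + 0.1 * len(obs.instruction.split())
    return [x * scale for x in obs.image_embedding]

# Per-embodiment decoders translate the shared latent into each robot's
# native command space (e.g., a 7-DoF arm vs. a 2-DoF mobile base).
DECODER_DOF: Dict[str, int] = {"arm_7dof": 7, "mobile_base": 2}

def decode(latent: List[float], embodiment: str) -> List[float]:
    dof = DECODER_DOF[embodiment]
    # Toy projection: tile/truncate the latent to the embodiment's DoF.
    return [latent[i % len(latent)] for i in range(dof)]

obs = Observation(image_embedding=[0.2, -0.1, 0.4], instruction="pick up the cup")
latent = foundation_policy(obs)          # computed once, shared by all robots
arm_cmd = decode(latent, "arm_7dof")     # 7 command values for an arm
base_cmd = decode(latent, "mobile_base") # 2 command values for a mobile base
```

The point of the structure is that knowledge lives in `foundation_policy`; adding a new robot form only requires a new entry in the decoder table, which is what makes cross-embodiment reuse cheap.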

The foundation model approach mirrors the success of large language models in natural language processing, applying similar principles to robotics. By training on massive datasets of robotic demonstrations, sensor data, and task executions, these models develop a general understanding of manipulation, navigation, and interaction that can be adapted to specific tasks with minimal additional training. This enables robots to quickly learn new tasks, adapt to novel situations, and operate in diverse environments. The technology is fundamental to developing general-purpose robots that can perform many different tasks rather than being specialized for single applications, representing a major step toward truly versatile robotic systems.
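"Adapted with minimal additional training" typically means freezing the large pretrained backbone and fitting only a small task-specific head on a handful of demonstrations. The following is a minimal sketch of that pattern under toy assumptions (the `backbone` function and the 1-D regression task are hypothetical stand-ins, not any published method):

```python
from typing import List, Tuple

def backbone(x: List[float]) -> List[float]:
    """Frozen pretrained feature extractor (stands in for the foundation
    model); it is never updated during adaptation."""
    return [xi * 2.0 for xi in x]

def fit_head(demos: List[Tuple[List[float], float]],
             lr: float = 0.01, epochs: int = 200) -> List[float]:
    """Fit a linear head by gradient descent on a few demonstrations."""
    dim = len(backbone(demos[0][0]))
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in demos:
            feats = backbone(x)
            err = sum(wi * fi for wi, fi in zip(w, feats)) - y
            # Only the head weights move; the backbone stays frozen.
            w = [wi - lr * err * fi for wi, fi in zip(w, feats)]
    return w

# Three demonstrations suffice to adapt the head for this toy task (y = 2x).
demos = [([1.0], 2.0), ([2.0], 4.0), ([3.0], 6.0)]
w = fit_head(demos)
pred = sum(wi * fi for wi, fi in zip(w, backbone([4.0])))  # converges to ~8.0
```

The same division of labor — broad competence in the frozen model, a thin adapted layer per task — is what lets a single pretrained system serve many downstream applications.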

  • Technology Readiness Level: 4/9 (Formative)
  • Impact: 3/5 (Medium)
  • Investment: 3/5 (Medium)
  • Category: Software

Related Organizations

  • Covariant — United States · Startup · Developer (95%)
    AI robotics company building a universal AI brain for robots.
  • Google DeepMind — United Kingdom · Research Lab · Developer (95%)
    Developers of the Gemini family of models, which are trained from the start to be multimodal across text, images, video, and audio.
  • Physical Intelligence — United States · Startup · Developer (95%)
    A startup building a general-purpose brain for robots, backed by OpenAI and Thrive Capital.
  • NVIDIA — United States · Company · Developer (90%)
    Developing foundation models for robotics (Project GR00T) and vision-language models like VILA.
  • Skild AI — United States · Startup · Developer (90%)
    Building a shared general-purpose brain for diverse robot embodiments, leveraging massive training data.
  • 1X Technologies — Norway · Startup · Developer (85%)
    A Norwegian robotics company (backed by OpenAI) developing androids like EVE and NEO.
  • Figure AI — United States · Startup · Deployer (85%)
    Developing general-purpose humanoid robots designed for commercial workforce deployment.
  • Hugging Face — United States · Company · Developer (85%)
    The global hub for open-source AI models and datasets. Founded by French entrepreneurs with a major office in Paris.
  • Sanctuary AI — Canada · Startup · Developer (85%)
    Developing general-purpose humanoid robots (Phoenix) powered by Carbon, their AI control system.
  • Toyota Research Institute — United States · Research Lab · Researcher (85%)
    R&D arm of Toyota Motor Corporation.

Supporting Evidence

Paper

GR00T N1: An Open Foundation Model for Generalist Humanoid Robots

arXiv · Mar 12, 2025

Introduction of GR00T N1, a Vision-Language-Action (VLA) foundation model with a dual-system architecture designed for generalist humanoid robots, capable of interpreting environments and generating motor actions.

Support 95% · Confidence 98%

Paper

HALO: A Unified Vision-Language-Action Model for Embodied Multimodal Chain-of-Thought Reasoning

arXiv · Feb 1, 2026

Proposal of HALO, a unified VLA model enabling embodied multimodal chain-of-thought reasoning, using a Mixture-of-Transformers architecture to decouple reasoning, foresight, and action prediction.

Support 92% · Confidence 95%

Paper

OmniVLA: An Omni-Modal Vision-Language-Action Model for Robot Navigation

arXiv · Sep 1, 2025

Presentation of OmniVLA, a foundation model trained on over 9,500 hours of data across 10 platforms, capable of processing omni-modal goals (images, poses, language) for robust navigation.

Support 90% · Confidence 95%

Paper

ZEST: Zero-shot Embodied Skill Transfer for Athletic Robot Control

arXiv · Feb 1, 2026

Introduction of ZEST, a framework for zero-shot transfer of athletic skills to humanoid robots using reinforcement learning trained on diverse motion sources, deployed on Boston Dynamics' Atlas.

Support 85% · Confidence 95%
