Envisioning is an emerging technology research institute and advisory.

Foundation Models for Robotics

Vision-language-action models from Google (RT-2), NVIDIA (GR00T), and startups like Physical Intelligence are enabling robots to understand natural language instructions and generalize manipulation skills across novel objects and environments.

Foundation models for robotics bridge the gap between AI language understanding and physical manipulation. Google's RT-2 demonstrated that large language models fine-tuned on robot data can transfer commonsense reasoning to physical tasks. NVIDIA's Project GR00T provides a foundation model specifically for humanoid robots. Physical Intelligence (Pi) raised over $400 million to build a 'foundation model for robots' that can learn manipulation skills from demonstration.
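Models in the RT-2 family emit robot actions as discrete tokens from the language model's own vocabulary: each action dimension is quantized into uniform bins, and the model predicts one bin token per dimension. The sketch below illustrates the de-tokenization step only; the bin count and the action ranges are illustrative assumptions, not the published RT-2 values.

```python
# Sketch of RT-2-style action de-tokenization: the policy emits one
# discrete token per action dimension, and each token indexes a
# uniform bin over that dimension's continuous range.
# NUM_BINS and the ranges below are illustrative assumptions.

NUM_BINS = 256

def detokenize(token_ids, ranges):
    """Map per-dimension bin tokens back to continuous action values."""
    actions = []
    for token, (lo, hi) in zip(token_ids, ranges):
        # Take the centre of the bin that the token indexes.
        frac = (token + 0.5) / NUM_BINS
        actions.append(lo + frac * (hi - lo))
    return actions

# Hypothetical 3-DoF end-effector delta (x, y, z) in metres.
ranges = [(-0.1, 0.1)] * 3
delta = detokenize([0, 128, 255], ranges)
```

The inverse mapping (binning a demonstrated continuous action into tokens) is what lets ordinary language-model training machinery be reused unchanged on robot trajectories.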

The fundamental challenge in robotics has always been generalization: robots excel at repetitive tasks in controlled environments but fail when objects, lighting, or layouts change. Foundation models address this by providing broad world knowledge — a robot that understands language descriptions of objects and their properties can handle novel items it has never encountered before.
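One common mechanism behind this kind of generalization is grounding novel objects in a shared image-text embedding space, CLIP-style: the robot matches what it sees against language descriptions rather than a fixed object catalogue. The sketch below is a toy illustration of that idea, not any specific model's API; the embeddings are hand-made stand-ins for real encoder outputs.

```python
import numpy as np

# Toy sketch of language-grounded object recognition: compare an
# object's image embedding against text-description embeddings in a
# shared space and pick the closest. All vectors here are invented
# stand-ins for real image/text encoder outputs.

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def ground(object_embedding, descriptions):
    """Return the language description that best matches the object."""
    return max(descriptions, key=lambda d: cosine(object_embedding, descriptions[d]))

# Hypothetical text embeddings for two candidate descriptions.
descriptions = {
    "a soft sponge": np.array([1.0, 0.1, 0.0]),
    "a metal wrench": np.array([0.0, 1.0, 0.2]),
}

# Image embedding of an object the robot has never seen before.
novel_object = np.array([0.9, 0.2, 0.1])
label = ground(novel_object, descriptions)
```

Because the match is computed against open-ended language rather than a closed label set, the same mechanism extends to objects and properties ("the squishy one", "the heavier tool") never present in the robot's training data.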

This represents a potential 'ChatGPT moment' for robotics: just as language models suddenly made AI useful for general text tasks, robotic foundation models could make robots useful for general physical tasks. The US leads in this research through its AI companies (Google, NVIDIA, OpenAI) and robotics startups, creating a potential advantage in deploying intelligent robots at scale.

TRL: 5/9 (Validated)
Impact: 4/5
Investment: 5/5
Category: Hardware
