Envisioning is an emerging technology research institute and advisory.

Computer Use

AI models directly interacting with graphical user interfaces by perceiving and controlling screens

Year: 2024
Generality: 723

Computer Use refers to the capability of AI models to interact directly with computer interfaces by viewing screenshots and issuing commands to move the cursor, click buttons, type text, and navigate windows, in effect controlling a computer as a human would through a graphical interface. Unlike tool-use APIs, where a model calls predefined functions, computer use gives a model perception of and control over arbitrary software, web applications, and operating system interfaces in real time. The model sees a screenshot, reasons about what action is needed, executes a click or keystroke, observes the resulting screen state, and repeats the cycle until the task is complete.
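
The observe, reason, act, and iterate cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: `FakeScreen`, `toy_policy`, and `run_episode` are hypothetical names, the "screenshot" is a small dictionary rather than pixels, and a hand-written rule stands in for the model's reasoning.

```python
from dataclasses import dataclass


@dataclass
class Action:
    kind: str      # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


class FakeScreen:
    """Hypothetical stand-in for a desktop: one text field and one submit button."""

    def __init__(self) -> None:
        self.field = ""
        self.focused = False
        self.submitted = False

    def screenshot(self) -> dict:
        # A real system would return pixels; a dict keeps the sketch runnable.
        return {"field": self.field, "focused": self.focused, "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        if action.kind == "click":
            if 0 <= action.x < 200 and 0 <= action.y < 30:      # text field area
                self.focused = True
            elif 0 <= action.x < 80 and 40 <= action.y < 70:    # submit button area
                self.submitted = True
                self.focused = False
        elif action.kind == "type" and self.focused:
            self.field += action.text


def toy_policy(obs: dict, goal_text: str) -> Action:
    """Rule-based placeholder for the model's reasoning step."""
    if obs["submitted"]:
        return Action("done")
    if obs["field"] == goal_text:
        return Action("click", x=40, y=55)       # click submit
    if not obs["focused"]:
        return Action("click", x=100, y=15)      # focus the text field
    return Action("type", text=goal_text)


def run_episode(screen: FakeScreen, goal_text: str, max_steps: int = 10) -> list:
    history = []
    for _ in range(max_steps):
        obs = screen.screenshot()                # observe
        action = toy_policy(obs, goal_text)      # reason
        if action.kind == "done":
            break
        screen.apply(action)                     # act, then loop to observe again
        history.append(action.kind)
    return history
```

Calling `run_episode(FakeScreen(), "hello")` focuses the field, types the text, and clicks submit. The `max_steps` cap is a design choice worth keeping in any real loop, since a model that misreads the screen could otherwise iterate indefinitely.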

Anthropic released Computer Use as a public beta capability with the upgraded Claude 3.5 Sonnet in October 2024, making it the first major foundation model to ship this capability at scale. The technical challenge is substantial: the model must process high-resolution images (screenshots), reason about spatial layout and semantics ("where is the submit button?"), map that reasoning to coordinate-based actions, and maintain context across multiple screen states. In systems of this kind, a vision encoder such as a vision transformer typically provides the perception layer, while reinforcement learning and human demonstrations help train action selection. Importantly, computer use works alongside language: a model can read text on screen and use that context to navigate. It differs fundamentally from API-based tool use because it does not require pre-integration. If a human can use a piece of software, so can the model, given sufficient capability.
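
One small piece of the coordinate-mapping step can be made concrete. The sketch below assumes, as an illustration only, that the screenshot is downscaled before the model sees it, so a point the model picks in screenshot space has to be rescaled to the real display before the click is issued. The function name `to_screen_coords` and the example resolutions are hypothetical.

```python
def to_screen_coords(x: int, y: int, shot_size: tuple, screen_size: tuple) -> tuple:
    """Map a point chosen on a (possibly downscaled) screenshot to display pixels."""
    shot_w, shot_h = shot_size
    screen_w, screen_h = screen_size
    if not (0 <= x < shot_w and 0 <= y < shot_h):
        raise ValueError(f"point ({x}, {y}) lies outside the {shot_w}x{shot_h} screenshot")
    # Scale each axis independently; the two aspect ratios may differ slightly.
    sx = int(x * screen_w / shot_w)
    sy = int(y * screen_h / shot_h)
    # Clamp so rounding can never produce an off-screen click.
    return min(sx, screen_w - 1), min(sy, screen_h - 1)
```

For example, a click at the center of a 1024x768 screenshot, `to_screen_coords(512, 384, (1024, 768), (2048, 1536))`, maps to `(1024, 768)` on the full-resolution display. Rejecting out-of-bounds points instead of silently clamping them makes a model's spatial mistakes visible rather than turning them into clicks at the screen edge.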

Computer use unlocks automation of knowledge work that was previously hard to automate: navigating complex web portals, managing spreadsheets with dynamic layouts, executing multi-step workflows across disparate systems, and testing software. It also raises significant safety concerns. A system that can control a computer without guardrails could exfiltrate data, make unauthorized transactions, or introduce malware. For this reason, computer use applications typically run in sandboxed or isolated environments and often require explicit user authorization before actions are taken, at least for sensitive ones. The potential impact is large: routine office work, debugging, research, and many forms of personal assistance could be partially or fully automated.
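
The authorization guardrail can be illustrated with a per-action gate. This is a sketch under stated assumptions, not a description of any shipping product: `run_with_authorization`, `execute`, and `authorize` are hypothetical names, and actions are plain strings for simplicity. Every proposed action is passed to an approval callback before it runs, and the run halts at the first denial.

```python
from typing import Callable, Iterable, List, Tuple


def run_with_authorization(
    actions: Iterable[str],
    execute: Callable[[str], None],
    authorize: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Execute actions one at a time, asking for approval before each."""
    executed: List[str] = []
    blocked: List[str] = []
    for action in actions:
        if authorize(action):
            execute(action)
            executed.append(action)
        else:
            blocked.append(action)
            break  # stop at the first denial instead of continuing the plan
    return executed, blocked
```

In an interactive setting `authorize` might prompt the user; in the usage below it is a simple rule that denies anything touching a banking site.

```python
log = []
done, denied = run_with_authorization(
    ["click submit", "open bank.example", "type password"],
    execute=log.append,
    authorize=lambda a: "bank" not in a,
)
```

Halting on denial, rather than skipping the denied step, is a deliberately conservative choice: later actions in a plan usually depend on earlier ones, so continuing could act on a screen state the plan never anticipated.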

Related

GCC (General Computer Control)

An AI system's ability to autonomously operate diverse software without task-specific programming.

Generality: 337
LAM (Large Action Model)

AI systems that interpret human intent and execute actions directly within digital applications.

Generality: 337
A2UI (Agent-to-User Interface)

The interaction layer connecting autonomous AI agents directly to human users.

Generality: 294
ACI (Agent-Computer Interface)

The interface layer enabling autonomous AI agents to interact with computer systems.

Generality: 323
Cognitive Computing

AI systems that simulate human reasoning, learning, and natural language understanding.

Generality: 694
Compute

The processing power and hardware resources required to train and run AI models.

Generality: 875