
Envisioning is an emerging technology research institute and advisory.



ARC-AGI

Benchmark testing artificial reasoning and abstraction ability through novel, unseen visual puzzles

Year: 2019
Generality: 634

ARC-AGI (Abstraction and Reasoning Corpus) is a benchmark and open-source dataset created by François Chollet in 2019 to evaluate artificial intelligence systems on their ability to reason abstractly and solve novel problems through pattern recognition, not memorization. The benchmark consists of hundreds of visual puzzles where an AI system is shown a small number of input-output examples and must infer the underlying rule or transformation, then apply it to new test inputs it has never seen. Each puzzle involves colored grids and spatial transformations—for instance, given three examples of how a shape is rotated or filled, predict the transformation on a fourth, unseen example.
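The few-shot format described above can be sketched in a few lines of Python. This is a toy illustration, not the official ARC-AGI tooling: the miniature task, the 0–9 color encoding, and the tiny candidate-rule solver are all assumptions made for the example. A real ARC solver must handle a far richer space of transformations than this fixed list.

```python
# Toy sketch of the ARC-AGI task format: each task provides a few
# train input/output grid pairs, and the solver must infer the shared
# transformation and apply it to an unseen test input.
# Grids are 2D lists of integers 0-9, each integer denoting a color.

def rotate90(grid):
    # Rotate clockwise: transpose, then reverse each row.
    return [list(row)[::-1] for row in zip(*grid)]

# A deliberately tiny hypothesis space of candidate transformations.
CANDIDATES = {
    "identity": lambda g: g,
    "rotate90": rotate90,
    "flip_horizontal": lambda g: [row[::-1] for row in g],
    "flip_vertical": lambda g: g[::-1],
}

def infer_rule(train_pairs):
    """Return the first candidate consistent with ALL train examples --
    the few-shot induction step that ARC demands."""
    for name, fn in CANDIDATES.items():
        if all(fn(inp) == out for inp, out in train_pairs):
            return name, fn
    return None, None

# Miniature task: every output is the input rotated 90 degrees clockwise.
task = {
    "train": [
        ([[1, 0], [0, 0]], [[0, 1], [0, 0]]),
        ([[0, 2], [0, 0]], [[0, 0], [0, 2]]),
    ],
    "test_input": [[3, 0], [0, 4]],
}

name, fn = infer_rule(task["train"])
print(name)                    # rotate90
print(fn(task["test_input"]))  # [[0, 3], [4, 0]]
```

The design point this makes concrete: because the solver only sees two examples, it must generalize from the pair structure rather than memorize outputs, which is exactly the behavior the real benchmark probes at much greater difficulty.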

The design of ARC-AGI is deliberately adversarial to large language models and memorization-based approaches. Puzzles are hand-crafted to be simple for humans (often solvable by children) but require genuine reasoning: identifying abstract patterns, generalizing across few examples, and avoiding overfitting to surface-level features. Test sets are kept private to prevent benchmark gaming. Crucially, the puzzles are designed to be novel—no training data contains the exact transformation, forcing the system to rely on reasoning rather than pattern matching. The original dataset (v1.0) contained 400 puzzles; subsequent versions (ARC-AGI-2, ARC-AGI-3, released in 2025-2026) expand the corpus and increase difficulty.

ARC-AGI has become a proxy for measuring progress toward artificial general intelligence because it tests abstraction ability—the hallmark of flexible, human-like reasoning. In 2024, Chollet announced the ARC Prize, a $2 million competition for the first system to solve 85% of the private test set. As of late 2025, the best AI systems score less than 1% on ARC-AGI-3, while humans easily exceed 85%. This gap highlights a critical limitation of current AI: exceptional performance on benchmarks like ImageNet or MMLU contrasts sharply with near-total failure on novel reasoning tasks. ARC-AGI has influenced benchmark design across the field and galvanized research into inductive reasoning, few-shot learning, and genuine abstraction.

Related

Raven's Progressive Matrices

A visual reasoning benchmark used to evaluate abstract pattern recognition in AI systems.

Generality: 384
AlphaGeometry

A neuro-symbolic AI system that solves olympiad-level geometry problems at human-expert level.

Generality: 94
AGI (Artificial General Intelligence)

A hypothetical AI system capable of performing any intellectual task a human can.

Generality: 895
Adaptive Reasoning

AI capability to flexibly construct and revise multi-step inferences when facing novel problems.

Generality: 701
Mirage Effect

When multimodal AI models produce confident visual analysis of images that were never provided.

Generality: 542
Moravec's Paradox

AI finds abstract reasoning easy but struggles with basic human sensorimotor skills.

Generality: 678