Envisioning is an emerging technology research institute and advisory.

Scaling Laws

Predictable power-law relationships between model size, data, compute, and performance.

Year: 2020
Generality: 724

Scaling laws are empirical relationships that describe how the performance of machine learning models improves as three key resources increase: model parameter count, training dataset size, and compute budget. Rather than improving erratically, model loss tends to follow smooth power-law curves in each of these variables as long as the others are not a bottleneck, so each order-of-magnitude increase in scale yields a roughly constant proportional reduction in loss. This regularity allows researchers to forecast how well a large model will perform by extrapolating from smaller training runs, and to make principled decisions about how to split a fixed compute budget between model size and data volume.
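
To make the forecasting idea concrete, here is a minimal sketch that fits a pure power law, loss = a * size^(-alpha), by linear regression in log-log space and then extrapolates to a larger model. The function names, the constants a = 10 and alpha = 0.1, and the data points are all synthetic assumptions chosen for illustration, not measurements from any real model. Published scaling laws usually also include an irreducible loss term, which this sketch leaves out.

```python
import math

def fit_power_law(sizes, losses):
    """Least-squares fit of loss = a * size**(-alpha) in log-log space.

    Taking logs gives log(loss) = log(a) - alpha * log(size), a straight line,
    so ordinary linear regression recovers both constants.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, alpha)

def predict_loss(a, alpha, size):
    """Evaluate the fitted power law at a given model size."""
    return a * size ** (-alpha)

# Synthetic "small run" results, generated from a = 10, alpha = 0.1 (illustrative only).
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [10.0 * s ** -0.1 for s in sizes]

a, alpha = fit_power_law(sizes, losses)

# Extrapolate one order of magnitude beyond the largest fitted run.
forecast = predict_loss(a, alpha, 1e10)
print(f"a={a:.3f}, alpha={alpha:.3f}, forecast loss at 1e10 params={forecast:.3f}")
```

With real measurements the points scatter around the line, and the reliability of the forecast depends on how far beyond the fitted range the extrapolation reaches.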

Scaling laws reflect how neural networks learn from data. Larger models have greater capacity to represent complex functions, while more data reduces overfitting and exposes the model to a richer distribution of patterns. Crucially, these factors interact: a very large model trained on too little data will underperform, and a small model cannot fully exploit a very large dataset. The Chinchilla scaling laws, published by Hoffmann et al. in 2022, refined earlier estimates by showing that many prominent large language models had been significantly undertrained relative to their size, and that compute-optimal training requires scaling data and parameters in roughly equal proportion.
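
The equal-proportion rule can be sketched as a small budget calculator. Two assumptions here are not stated in the text above: training compute is approximated as C = 6 * N * D FLOPs for N parameters and D tokens, and the compute-optimal ratio is taken as roughly 20 tokens per parameter, a commonly quoted rule of thumb derived from the Chinchilla results. The function name and the example budget are hypothetical.

```python
import math

FLOPS_PER_PARAM_TOKEN = 6.0  # assumption: training compute C ~= 6 * N * D
TOKENS_PER_PARAM = 20.0      # assumption: rule-of-thumb compute-optimal ratio

def compute_optimal_allocation(compute_flops, tokens_per_param=TOKENS_PER_PARAM):
    """Split a FLOP budget into (parameters, tokens) under the assumptions above.

    Substituting D = r * N into C = 6 * N * D gives N = sqrt(C / (6 * r)).
    """
    n_params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Example budget chosen to correspond to about 70B parameters and 1.4T tokens.
budget = 5.88e23
n_params, n_tokens = compute_optimal_allocation(budget)
print(f"params ~ {n_params:.3g}, tokens ~ {n_tokens:.3g}")

# With 100x more compute, parameters and tokens each grow by about 10x.
big_params, big_tokens = compute_optimal_allocation(100 * budget)
print(f"param growth: {big_params / n_params:.2f}x, token growth: {big_tokens / n_tokens:.2f}x")
```

Because both quantities scale with the square root of compute under these assumptions, neither model size nor data volume absorbs the whole budget increase.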

Scaling laws matter because they transform AI development from an art into something closer to an engineering discipline. Instead of relying on intuition or trial-and-error, teams can use scaling predictions to plan multi-million-dollar training runs with reasonable confidence in the outcome. They also carry strategic implications: if performance scales smoothly and predictably with compute, then sustained investment in hardware and data becomes a reliable path to capability improvements, which has shaped the economics and competitive dynamics of frontier AI development.

However, scaling laws have important limitations. They describe aggregate metrics such as pretraining loss or average benchmark scores, and do not guarantee that specific capabilities (reasoning, factual accuracy, or safety) emerge reliably at a given scale. Some abilities appear to emerge abruptly rather than smoothly, complicating extrapolation. Nonetheless, scaling laws remain one of the most practically influential empirical findings in modern deep learning research.

Related

Scaling Hypothesis
Increasing model size, data, and compute reliably improves machine learning performance.
Generality: 753

Chinchilla Scaling
Optimal LLM training balances model size and data quantity for a fixed compute budget.
Generality: 337

Internet Scale
ML systems designed to train, serve, or process data across billions of users and devices.
Generality: 520

Inference Scaling
Improving model outputs by allocating more compute during inference rather than during training.
Generality: 812

Scaled Supervision Method
An AI training approach that improves model performance through large-scale, high-quality labeled data.
Generality: 337

Scale Separation
Distinguishing phenomena operating at fundamentally different magnitudes, time scales, or spatial dimensions.
Generality: 521