The total computational resources consumed while training a machine learning model.
Training compute refers to the aggregate computational work performed during the process of fitting a machine learning model to data — typically measured in floating-point operations (FLOPs). It encompasses every forward pass, backward pass, and parameter update executed across the full training run, and scales with model size, dataset size, and the number of training steps. As deep learning models have grown dramatically in scale, training compute has become one of the primary axes along which AI progress is measured and planned, often requiring clusters of specialized accelerators such as GPUs or TPUs running for weeks or months.
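For dense transformer models, total training compute is often estimated with the rule of thumb C ≈ 6·N·D FLOPs, where N is the parameter count and D the number of training tokens (roughly 2 FLOPs per parameter per token for the forward pass and 4 for the backward pass). A minimal sketch of that back-of-the-envelope estimate, with an illustrative, hypothetical model configuration:

```python
def estimate_training_flops(n_params: float, n_tokens: float) -> float:
    """Estimate total training compute for a dense transformer.

    Uses the common approximation C ~ 6 * N * D FLOPs: about
    2 FLOPs per parameter per token for the forward pass and
    4 for the backward pass. Illustrative only; exact accounting
    depends on architecture and implementation details.
    """
    return 6.0 * n_params * n_tokens


# Hypothetical example: a 70B-parameter model trained on 1.4T tokens
flops = estimate_training_flops(70e9, 1.4e12)
print(f"{flops:.2e} FLOPs")  # on the order of 6e23 FLOPs
```

The approximation ignores attention-specific FLOPs and any recomputation, so it is a lower-bound sketch rather than an exact cost model.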
The significance of training compute is formalized in neural scaling laws, which describe empirical relationships between compute budget, parameter count, dataset size, and resulting model performance. Landmark work by researchers at OpenAI and DeepMind, including the Chinchilla scaling laws, demonstrated that a fixed compute budget is spent most efficiently when model size and data volume are scaled together; the Chinchilla analysis suggests on the order of 20 training tokens per parameter. This insight transformed how practitioners allocate training budgets and has driven the field toward deliberately compute-optimal training strategies rather than simply maximizing model size.
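The compute-optimal allocation can be sketched numerically. Assuming the C ≈ 6·N·D cost approximation and the Chinchilla rule of thumb of roughly 20 tokens per parameter, solving for N gives N = sqrt(C / 120); the ratio is a simplification of the fitted scaling-law coefficients, not a substitute for them:

```python
import math


def chinchilla_optimal(compute_flops: float,
                       tokens_per_param: float = 20.0) -> tuple[float, float]:
    """Split a compute budget into roughly compute-optimal N and D.

    Assumes C ~ 6 * N * D and the Chinchilla heuristic D ~ 20 * N;
    substituting gives N = sqrt(C / (6 * tokens_per_param)).
    A rough sketch, not the fitted scaling-law exponents.
    """
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


# A budget of ~5.9e23 FLOPs (comparable to Chinchilla's own run)
n, d = chinchilla_optimal(5.9e23)
print(f"params = {n:.2e}, tokens = {d:.2e}")
```

For this budget the sketch yields roughly 7e10 parameters and 1.4e12 tokens, consistent with the 70B-parameter, 1.4T-token configuration reported for Chinchilla.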
Training compute matters beyond raw performance: it is a proxy for cost, energy consumption, and accessibility. The exponential growth in compute required by frontier models has concentrated cutting-edge AI development among a small number of well-resourced organizations, raising questions about research equity and environmental impact. Compute efficiency — achieving equivalent performance with fewer FLOPs — has consequently become a major research objective, motivating advances in architecture design, mixed-precision training, and data curation. Understanding training compute is now essential for anyone reasoning about the economics, capabilities, and limitations of modern AI systems.