The total computational, energy, and financial resources required to train an AI model.
Training cost refers to the aggregate of resources consumed when optimizing a machine learning model on data — encompassing compute time, hardware expenses, energy consumption, and the human labor involved in orchestrating the process. For small models trained on modest datasets, these costs may be negligible. But for large-scale systems like foundation models and large language models, training runs can require thousands of specialized accelerators operating for weeks, translating into millions of dollars and megawatt-hours of electricity.
The primary driver of training cost is the number of floating-point operations (FLOPs) required to complete a training run, which scales with model size and the total number of tokens processed (a function of dataset size and training steps). Hardware efficiency, parallelism strategies (data, model, and pipeline parallelism), and numerical precision (e.g., using bfloat16 or int8 instead of float32) all influence how efficiently those FLOPs are executed. Memory bandwidth and interconnect speed between accelerators also become critical bottlenecks at scale, meaning raw compute alone does not determine total cost.
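For dense transformer models, a widely used rule of thumb is that training requires roughly 6 × N × D FLOPs for N parameters and D training tokens. The sketch below turns that approximation into a back-of-envelope cost estimate; the function name, sustained-throughput figure, and dollar price are illustrative assumptions, not measurements of any particular system.

```python
def estimate_training_cost(
    n_params: float,                 # model parameters (N)
    n_tokens: float,                 # training tokens (D)
    flops_per_sec: float,            # assumed sustained FLOPs/s per accelerator
    n_accelerators: int,             # accelerators running in parallel
    dollars_per_accel_hour: float,   # assumed price per accelerator-hour
) -> dict:
    """Back-of-envelope training cost from the ~6*N*D FLOPs rule of thumb
    (forward plus backward pass for a dense transformer)."""
    total_flops = 6 * n_params * n_tokens
    seconds = total_flops / (flops_per_sec * n_accelerators)
    hours = seconds / 3600
    dollars = hours * n_accelerators * dollars_per_accel_hour
    return {"total_flops": total_flops, "wall_clock_hours": hours, "dollars": dollars}

# Illustrative numbers only: a 7e9-parameter model trained on 1e12 tokens,
# 256 accelerators each sustaining 1.5e14 FLOPs/s, at $2 per accelerator-hour.
est = estimate_training_cost(7e9, 1e12, 1.5e14, 256, 2.0)
```

Note that the 6ND rule ignores attention overhead, activation recomputation, and imperfect hardware utilization, so real runs typically cost more than this estimate suggests.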
Training cost has profound implications for who can participate in frontier AI research. When a single training run costs tens of millions of dollars, only well-resourced corporations and national labs can afford to experiment at the cutting edge, concentrating capability and raising concerns about equitable access to AI development. This dynamic has accelerated interest in techniques that reduce cost without sacrificing performance — including transfer learning, parameter-efficient fine-tuning, mixture-of-experts architectures, and improved data curation that reduces the volume of training needed.
The concept gained widespread attention as empirical scaling laws demonstrated that model performance improves predictably with compute, making training cost a central variable in strategic AI planning. Researchers now routinely report training costs alongside benchmark results, and organizations publish compute budgets as a transparency measure. Tools for estimating and tracking FLOPs, carbon emissions, and dollar costs have become standard parts of the ML practitioner's toolkit, reflecting how central resource accounting has become to responsible model development.
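The predictability referred to above is usually expressed as a parametric power law in which loss decreases smoothly as parameters and data grow. The sketch below uses the common form L(N, D) = E + A/N^α + B/D^β; the constants here are illustrative assumptions, not published fits.

```python
def predicted_loss(n_params: float, n_tokens: float,
                   E: float = 1.7, A: float = 400.0, alpha: float = 0.34,
                   B: float = 410.0, beta: float = 0.28) -> float:
    """Loss predicted by a parametric scaling law: an irreducible term E plus
    penalty terms that shrink as parameters (N) and data (D) grow.
    Constants are illustrative, not fitted to any real training runs."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Scaling up both model size and data lowers the predicted loss,
# which is what makes training cost a plannable quantity.
small_run = predicted_loss(1e9, 2e10)    # smaller model, less data
large_run = predicted_loss(1e10, 2e11)   # 10x parameters, 10x tokens
```

Under this form, large_run is strictly below small_run, and both remain above the irreducible term E, capturing the diminishing but predictable returns that make compute budgets a central planning variable.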