Simultaneous execution of multiple tasks across processors to accelerate computation.
Parallelism is a computational strategy in which a problem is decomposed into smaller, independent subtasks that execute simultaneously across multiple processing units. In machine learning, this principle is indispensable: training large neural networks involves billions of arithmetic operations, and distributing that workload across GPU cores, specialized accelerators, or entire clusters of machines can reduce training time from weeks to hours. Two primary forms dominate ML practice—data parallelism, where different batches of training data are processed concurrently on replicated model copies, and model parallelism, where different layers or partitions of a model are assigned to different devices when the model itself is too large to fit on a single accelerator.
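The property that makes data parallelism work can be shown in a few lines: splitting a batch across equal-sized worker shards and averaging the per-shard gradients reproduces the full-batch gradient exactly, so every replica applies the same update. The toy linear model, shard count, and NumPy setup below are illustrative, not any framework's API:

```python
import numpy as np

# Hypothetical setup: a linear model y ≈ X @ w trained with mean-squared error.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))   # global batch of 8 examples, 3 features
y = rng.normal(size=8)
w = np.zeros(3)

def local_grad(X_shard, y_shard, w):
    # Gradient of 0.5 * mean((X_shard @ w - y_shard)**2) on one worker's shard.
    return X_shard.T @ (X_shard @ w - y_shard) / len(y_shard)

# Data parallelism: split the batch across 2 simulated workers.
shards = np.split(np.arange(8), 2)
grads = [local_grad(X[idx], y[idx], w) for idx in shards]
avg_grad = np.mean(grads, axis=0)   # stands in for the all-reduce step

# For equal-sized shards, the averaged gradient equals the full-batch gradient.
full_grad = local_grad(X, y, w)
assert np.allclose(avg_grad, full_grad)
w -= 0.1 * avg_grad                 # identical update on every model replica
```

Because every replica sees the same averaged gradient, the model copies stay bitwise-synchronized without ever exchanging weights, only gradients.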
The mechanics of parallelism in ML require careful coordination. In data-parallel training, each worker computes gradients on its local batch, and those gradients must be aggregated, typically via all-reduce operations, before weights are updated. Frameworks like PyTorch's DistributedDataParallel (DDP) and TensorFlow's MirroredStrategy automate much of this synchronization. Pipeline parallelism, a common form of model parallelism, splits the model into sequential stages and streams micro-batches through them to keep all devices busy and minimize idle time. Tensor parallelism, a finer-grained variant, splits individual weight matrices across devices, enabling efficient training of models with hundreds of billions of parameters.
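The all-reduce step mentioned above can be sketched in pure Python. This is a didactic stand-in for the ring all-reduce that collective libraries such as NCCL implement on real interconnects; the function name and list-based "device buffers" are hypothetical, and real implementations overlap the communication steps with computation:

```python
from copy import deepcopy

def ring_allreduce(bufs):
    """Ring all-reduce over a list of equal-length per-worker buffers.

    After the two phases every worker holds the element-wise sum. Each
    worker exchanges only 1/n of the data per step, which is what makes
    the ring algorithm bandwidth-efficient.
    """
    n = len(bufs)
    chunk = len(bufs[0]) // n       # assumes buffer length divisible by n
    bufs = deepcopy(bufs)           # simulate private device memory

    def sl(c):                      # index range covered by chunk c
        return range(c * chunk, (c + 1) * chunk)

    # Phase 1 (reduce-scatter): in step s, worker i sends chunk (i - s) mod n
    # to its right neighbour, which adds it to its own copy. After n - 1
    # steps, worker i holds the complete sum of chunk (i + 1) mod n.
    for s in range(n - 1):
        sends = [(i, (i - s) % n, [bufs[i][k] for k in sl((i - s) % n)])
                 for i in range(n)]
        for i, c, data in sends:
            for j, k in enumerate(sl(c)):
                bufs[(i + 1) % n][k] += data[j]

    # Phase 2 (all-gather): each completed chunk circulates around the ring,
    # overwriting stale copies, until every worker has every summed chunk.
    for s in range(n - 1):
        sends = [(i, (i + 1 - s) % n, [bufs[i][k] for k in sl((i + 1 - s) % n)])
                 for i in range(n)]
        for i, c, data in sends:
            for j, k in enumerate(sl(c)):
                bufs[(i + 1) % n][k] = data[j]
    return bufs

# Three workers, each holding a local "gradient" of three elements:
print(ring_allreduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
# → [[12, 15, 18], [12, 15, 18], [12, 15, 18]]
```

In DDP, this collective runs automatically during the backward pass, bucketing gradients so communication overlaps with the remaining gradient computation.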
Parallelism became central to modern deep learning with the GPU revolution of the late 2000s, when researchers discovered that the massively parallel architecture of graphics processors was ideally suited to the matrix multiplications underlying neural network training. Amdahl's Law—which states that the speedup from parallelization is bounded by the fraction of work that must remain sequential—provides a useful theoretical ceiling, but in practice, communication overhead, load imbalance, and memory bandwidth constraints are the dominant engineering challenges. Techniques such as gradient compression, asynchronous updates, and mixed-precision arithmetic have emerged to push practical efficiency closer to theoretical limits.
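Amdahl's Law is a one-line formula: the speedup on n workers is S(n) = 1 / (f + (1 - f)/n), where f is the fraction of work that must stay sequential. A quick sketch (function name illustrative) shows how harshly even a small serial fraction caps the achievable speedup:

```python
def amdahl_speedup(serial_frac, n_workers):
    # Amdahl's Law: S(n) = 1 / (f + (1 - f) / n), where f is the
    # fraction of the workload that cannot be parallelized.
    return 1.0 / (serial_frac + (1.0 - serial_frac) / n_workers)

# Even with only 5% serial work, 1024 workers deliver under a 20x speedup,
# because S(n) -> 1/f = 20 as n grows without bound.
print(round(amdahl_speedup(0.05, 1024), 1))  # → 19.6
```

This ceiling is why the engineering effort in large-scale training goes into shrinking f itself: overlapping communication with computation, compressing gradients, and removing synchronization points.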
As models have grown from millions to trillions of parameters, parallelism has evolved from an optimization into a necessity. The ability to orchestrate thousands of accelerators in concert now determines which research organizations and companies can train frontier models, making parallelism one of the most consequential engineering disciplines in contemporary AI development.