A data structure that manages ordered task or element processing, typically FIFO.
A queue is a data structure that organizes elements or tasks for sequential processing, most commonly following a First-In-First-Out (FIFO) discipline where the earliest-added item is the first to be retrieved. In machine learning and AI systems, queues appear throughout the computational stack — from low-level hardware scheduling to high-level orchestration of training jobs. Their simplicity belies their importance: nearly every system that must coordinate asynchronous or concurrent work relies on some form of queuing to prevent race conditions, manage backpressure, and ensure predictable throughput.
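The FIFO discipline described above can be sketched in a few lines; Python's `collections.deque` is a common choice because it gives O(1) appends and pops at both ends (the task names are illustrative placeholders):

```python
from collections import deque

# A minimal FIFO queue: append() enqueues at the right, popleft()
# dequeues from the left, so the earliest-added item is retrieved first.
tasks = deque()
tasks.append("load_batch_0")
tasks.append("load_batch_1")
tasks.append("load_batch_2")

first = tasks.popleft()   # first in, first out
second = tasks.popleft()
```

Here `first` is `"load_batch_0"`, the earliest-enqueued item, matching the FIFO retrieval order defined above.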
In practice, ML pipelines use queues extensively to decouple data ingestion from model computation. A data-loading queue, for instance, allows CPU-based preprocessing workers to fetch and augment batches in parallel while the GPU processes the previous batch, reducing GPU idle time and substantially improving training throughput. Frameworks like TensorFlow and PyTorch expose queue-like abstractions (e.g., tf.queue, DataLoader with prefetching) precisely for this reason. Distributed training systems extend this further, using message queues to coordinate gradient exchanges between workers or to schedule parameter updates across nodes.
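The producer/consumer decoupling can be sketched with Python's standard `queue.Queue` and threads. This is a simplified illustration of the pattern, not any framework's actual loader: `preprocess` and the `sum(...)` training step are hypothetical stand-ins, and the bounded `maxsize` lets the producer run at most a few batches ahead of the consumer:

```python
import queue
import threading

batch_queue = queue.Queue(maxsize=4)  # bounded: producer blocks when full
NUM_BATCHES = 8

def preprocess(i):
    """Stand-in for CPU-side loading and augmentation."""
    return [i] * 3

def producer():
    for i in range(NUM_BATCHES):
        batch_queue.put(preprocess(i))  # blocks if the consumer lags behind
    batch_queue.put(None)               # sentinel: no more batches

results = []

def consumer():
    while True:
        batch = batch_queue.get()
        if batch is None:
            break
        results.append(sum(batch))      # stand-in for a GPU training step

t_prod = threading.Thread(target=producer)
t_cons = threading.Thread(target=consumer)
t_prod.start(); t_cons.start()
t_prod.join(); t_cons.join()
```

Because the queue is bounded, the producer applies backpressure automatically: if preprocessing outpaces the consumer, `put` blocks until a slot frees up, capping memory use while still keeping the consumer fed.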
Beyond standard FIFO queues, priority queues order elements by a user-defined key rather than arrival time, enabling more sophisticated scheduling. Reinforcement learning systems, for example, use priority queues in prioritized experience replay, where transitions with higher temporal-difference error are sampled more frequently, accelerating learning. Search algorithms like A* also rely on priority queues to efficiently expand the most promising nodes in a search frontier.
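A minimal priority-queue sketch of the prioritized-replay idea uses Python's `heapq`. Since `heapq` is a min-heap, negating the priority key (here a hypothetical temporal-difference error; the transition names are placeholders) makes the highest-error transition pop first:

```python
import heapq

# Priority queue keyed by TD error: store (-error, transition) so the
# min-heap yields the largest-error transition first.
replay = []
heapq.heappush(replay, (-0.2, "transition_a"))
heapq.heappush(replay, (-1.5, "transition_b"))
heapq.heappush(replay, (-0.7, "transition_c"))

neg_err, item = heapq.heappop(replay)
# item is "transition_b": TD error 1.5, the most informative transition
```

An A* frontier follows the same pattern, with the key being the estimated total path cost f(n) = g(n) + h(n) rather than a TD error.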
Queues matter to AI practitioners not just as an implementation detail but as a design principle: well-designed queuing strategies can be the difference between a training pipeline that saturates hardware and one that wastes most of its cycles waiting. As models grow larger and training becomes increasingly distributed across heterogeneous hardware, thoughtful queue management — including bounded queues to prevent memory overflow and adaptive scheduling to handle variable-length tasks — becomes a first-class engineering concern in modern ML infrastructure.
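The memory-safety role of a bounded queue can be shown directly: a non-blocking enqueue on a full `queue.Queue` raises `queue.Full`, forcing the producer to make an explicit backpressure decision (block, drop, or slow down) instead of letting the queue grow without bound. The batch names are illustrative:

```python
import queue

# Bounded queue with room for only two items.
q = queue.Queue(maxsize=2)
q.put_nowait("batch_0")
q.put_nowait("batch_1")

try:
    q.put_nowait("batch_2")   # third non-blocking put exceeds the bound
    overflowed = False
except queue.Full:
    overflowed = True         # producer must wait or shed load
```

After this runs, `overflowed` is `True` and the queue still holds exactly two items, so memory stays capped no matter how fast the producer is.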