A multi-dimensional array serving as the core data structure in deep learning.
A tensor is a generalized multi-dimensional array that extends the familiar concepts of scalars, vectors, and matrices to an arbitrary number of dimensions. A scalar is a 0-dimensional tensor, a vector is 1-dimensional, a matrix is 2-dimensional, and higher-order tensors capture increasingly complex structures. In machine learning, tensors serve as the universal container for data and model parameters alike — a batch of color images, for instance, is naturally represented as a 4D tensor with dimensions corresponding to batch size, height, width, and color channels.
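The scalar-to-higher-order progression can be sketched in NumPy; the batch size, image resolution, and channel count below are illustrative values, not taken from the text:

```python
import numpy as np

scalar = np.array(3.14)             # 0-dimensional tensor
vector = np.array([1.0, 2.0, 3.0])  # 1-dimensional tensor
matrix = np.ones((2, 3))            # 2-dimensional tensor

# A batch of color images as a 4D tensor:
# (batch size, height, width, color channels) -- example shape
images = np.zeros((32, 224, 224, 3))

print(scalar.ndim, vector.ndim, matrix.ndim, images.ndim)  # 0 1 2 4
print(images.shape)                                        # (32, 224, 224, 3)
```

The number of dimensions (`ndim`) is what distinguishes each case; the shape tuple records the extent along each dimension.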
Tensors are not merely passive data containers; they are the substrate on which all computation in modern deep learning frameworks operates. Libraries like TensorFlow and PyTorch define rich algebras over tensors, supporting operations such as element-wise arithmetic, matrix multiplication, convolution, and reshaping. Critically, these frameworks track the computational graph formed by tensor operations, enabling automatic differentiation — the engine behind backpropagation. This means gradients can be computed with respect to any tensor in the graph, making end-to-end training of complex models both practical and efficient.
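The idea of tracking a computational graph and differentiating through it can be illustrated with a toy scalar autodiff class in pure Python. This is a minimal sketch of the mechanism, not the real API of PyTorch or TensorFlow: each operation records its inputs, and `backward()` walks the graph in reverse applying the chain rule.

```python
# Toy reverse-mode automatic differentiation over scalars (illustrative only).
class Value:
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # edges of the computational graph
        self._backward = lambda: None    # local chain-rule step

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward_fn():
            self.grad += other.data * out.grad   # d(xy)/dx = y
            other.grad += self.data * out.grad   # d(xy)/dy = x
        out._backward = backward_fn
        return out

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward_fn():
            self.grad += out.grad                # d(x+y)/dx = 1
            other.grad += out.grad               # d(x+y)/dy = 1
        out._backward = backward_fn
        return out

    def backward(self):
        # Topologically order the graph, then apply chain-rule steps in reverse.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

# Gradient of z = x*y + x with respect to x and y:
x, y = Value(2.0), Value(3.0)
z = x * y + x
z.backward()
print(x.grad, y.grad)  # 4.0 2.0
```

Real frameworks apply exactly this pattern to whole tensors rather than scalars, which is what makes gradients available for any tensor in the graph.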
The practical importance of tensors in deep learning is also tied to hardware. Graphics processing units (GPUs) and dedicated accelerators like TPUs are architecturally optimized for the kinds of dense, parallelizable arithmetic that tensor operations require. By expressing neural network computations as sequences of tensor operations, frameworks can transparently dispatch work to these accelerators, achieving the throughput necessary to train large models on massive datasets. The alignment between the tensor abstraction and hardware capabilities is a key reason deep learning scaled so dramatically through the 2010s.
Beyond images, tensors naturally represent virtually every modality encountered in machine learning: batched sequences of word embeddings in NLP form 3D tensors, video data adds a temporal dimension to image tensors, and graph neural networks operate on tensors encoding node features and adjacency structure. This universality makes the tensor the single most important data structure in the field, and fluency with tensor shapes, broadcasting rules, and contraction operations is considered a foundational skill for any deep learning practitioner.
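Broadcasting and contraction, the two skills named above, can be sketched in NumPy; the shapes here are illustrative:

```python
import numpy as np

# Broadcasting: a (3,) bias vector is implicitly stretched across the
# batch axis of a (2, 3) activation matrix -- no explicit copy is made.
activations = np.array([[1.0, 2.0, 3.0],
                        [4.0, 5.0, 6.0]])
bias = np.array([10.0, 20.0, 30.0])
shifted = activations + bias        # shape (2, 3)

# Contraction: einsum sums over the shared index k, here a batched
# matrix multiply of (batch, i, k) with (batch, k, j).
a = np.random.rand(8, 4, 5)
b = np.random.rand(8, 5, 6)
c = np.einsum('bik,bkj->bij', a, b)

print(shifted[0])   # [11. 22. 33.]
print(c.shape)      # (8, 4, 6)
```

The einsum notation makes the contracted index explicit, which is often the clearest way to reason about which dimensions of two tensors are being summed over.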