An algorithm whose runtime or memory usage scales directly with input size.
Linear complexity, denoted O(n) in Big-O notation, describes an algorithm whose time or space requirements grow in direct proportion to the size of its input. If the input doubles, the computational cost doubles as well. This predictable, proportional scaling makes linear complexity one of the most desirable efficiency classes in practical algorithm design, sitting just above constant-time O(1) operations and well below quadratic or exponential alternatives. A classic example is a simple linear search through an unsorted list — each additional element requires one additional comparison.
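The linear-search example can be sketched in a few lines; this is a minimal illustration, with `linear_search` as a hypothetical helper name:

```python
def linear_search(items, target):
    """Return the index of target in items, or -1 if absent.

    Each element costs one comparison, so the worst-case work
    grows in direct proportion to len(items): O(n).
    """
    for i, item in enumerate(items):  # one comparison per element
        if item == target:
            return i
    return -1  # target absent after n comparisons
```

Doubling the list length doubles the worst-case number of comparisons, which is exactly the proportional scaling described above.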
In machine learning, linear complexity appears throughout data preprocessing, feature engineering, and inference pipelines. Scanning a dataset to compute mean and variance, applying a learned linear transformation to input features, or performing a single forward pass through a shallow model all exhibit O(n) behavior. As datasets scale into the billions of examples, the difference between linear and superlinear algorithms becomes the deciding factor in whether a system is deployable at all. This is why researchers actively seek linear-time approximations to inherently expensive operations, such as attention mechanisms in transformers.
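The mean-and-variance scan mentioned above can be done in a single O(n) pass. One standard way is Welford's online algorithm; the sketch below uses the population variance and a hypothetical function name:

```python
def mean_and_variance(xs):
    """Single O(n) pass over xs using Welford's online algorithm."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n            # running mean update
        m2 += delta * (x - mean)     # running sum of squared deviations
    return mean, (m2 / n if n else 0.0)
```

Because each element is visited exactly once, processing twice as much data takes twice as long, regardless of the values themselves.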
The significance of linear complexity becomes especially clear when contrasted with quadratic O(n²) algorithms. The standard self-attention mechanism in transformer models, for instance, computes pairwise relationships between all tokens, resulting in O(n²) complexity with respect to sequence length. This bottleneck has driven an entire subfield of research into subquadratic alternatives, including linear attention approximations such as Performer and Linformer and state-space models such as Mamba, that aim to recover the expressive power of full attention while reducing complexity to O(n). Achieving linear scaling in such architectures unlocks the ability to process much longer sequences without prohibitive memory or compute costs.
Understanding linear complexity is foundational for ML practitioners reasoning about scalability. When profiling a training loop or inference pipeline, identifying which components are O(n) versus O(n log n) or O(n²) directly informs architectural decisions, hardware requirements, and the practical upper bound on problem size a system can handle. Big-O analysis, while an asymptotic abstraction, remains an indispensable tool for building systems that remain tractable as data and model sizes continue to grow.
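The practical gap between these complexity classes is easy to make concrete with back-of-envelope operation counts; the snippet below is a simple illustration, with `cost_table` as a hypothetical helper:

```python
import math

def cost_table(sizes):
    """Approximate operation counts for O(n), O(n log n), and O(n^2)."""
    return [(n, n, int(n * math.log2(n)), n * n) for n in sizes]

for n, lin, nlogn, quad in cost_table([1_000, 1_000_000]):
    print(f"n={n:>9,}  O(n)={lin:>13,}  "
          f"O(n log n)={nlogn:>15,}  O(n^2)={quad:>18,}")
```

At n = 10⁶, the quadratic column is already a million times larger than the linear one, which is why identifying the O(n²) components of a pipeline is usually the first step in scaling it.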