A mathematical framework modeling dynamic systems through evolving hidden state variables.
State-space models (SSMs) represent dynamic systems by tracking a set of latent state variables that evolve over time according to transition equations, while observations are generated from those states through a separate emission process. In the classical formulation, two equations define the system: a state transition equation describing how the hidden state changes from one time step to the next (often incorporating noise or stochasticity), and an observation equation linking the hidden state to measurable outputs. This separation between latent dynamics and observed signals makes SSMs especially powerful for modeling systems where the underlying process is not directly observable — a common situation in engineering, economics, neuroscience, and machine learning.
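The two defining equations can be made concrete with a minimal linear-Gaussian instance. In the sketch below, the matrices `A` and `C` and the noise scales are illustrative choices, not values from the text:

```python
import numpy as np

# Minimal linear-Gaussian state-space model (illustrative parameters):
#   state transition:  x_t = A x_{t-1} + w_t,   w_t ~ N(0, q^2 I)
#   observation:       y_t = C x_t + v_t,       v_t ~ N(0, r^2)
rng = np.random.default_rng(0)

A = np.array([[0.9, 0.1],
              [0.0, 0.8]])   # latent dynamics (eigenvalues inside the unit circle)
C = np.array([[1.0, 0.0]])   # emission: only the first state component is measured
q, r = 0.05, 0.1             # process and measurement noise scales

def simulate(T, x0=np.zeros(2)):
    xs, ys = [], []
    x = x0
    for _ in range(T):
        x = A @ x + q * rng.standard_normal(2)   # hidden state evolves
        y = C @ x + r * rng.standard_normal(1)   # only y is observed
        xs.append(x)
        ys.append(y)
    return np.stack(xs), np.stack(ys)

states, obs = simulate(100)
print(states.shape, obs.shape)  # (100, 2) (100, 1)
```

The separation shows up directly in the code: the loop body never exposes `x` to a downstream consumer, only the noisy emissions `y`, which is exactly the "latent dynamics vs. observed signals" split described above.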
In machine learning, SSMs gained renewed prominence as sequence modeling alternatives to recurrent neural networks and transformers. Modern deep SSMs — such as S4, Mamba, and related architectures — parameterize the state transition and emission matrices using structured or learned representations, enabling efficient long-range sequence modeling with linear or near-linear computational complexity. Unlike attention-based models that compare all token pairs, SSMs compress history into a fixed-size state vector, making them highly efficient for long sequences. The Kalman filter, a classic SSM algorithm, performs optimal inference in linear-Gaussian systems and remains foundational to understanding how SSMs propagate uncertainty through time.
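As a sketch of the inference side, here is the textbook Kalman filter predict/update recursion for the linear-Gaussian case. The random-walk demo at the bottom and all parameter values are illustrative assumptions, not taken from the text:

```python
import numpy as np

def kalman_filter(ys, A, C, Q, R, x0, P0):
    """State estimates for x_t = A x_{t-1} + w_t (w ~ N(0,Q)),
    y_t = C x_t + v_t (v ~ N(0,R)): the standard predict/update loop."""
    x, P = x0, P0
    means = []
    for y in ys:
        # Predict: push mean and covariance through the dynamics
        x = A @ x
        P = A @ P @ A.T + Q
        # Update: correct with the new observation
        S = C @ P @ C.T + R                   # innovation covariance
        K = P @ C.T @ np.linalg.inv(S)        # Kalman gain
        x = x + K @ (y - C @ x)
        P = (np.eye(len(x)) - K @ C) @ P
        means.append(x)
    return np.stack(means)

# Demo (hypothetical setup): recover a latent 1-D random walk from noisy data
rng = np.random.default_rng(0)
truth = np.cumsum(0.1 * rng.standard_normal(200))   # latent random walk
obs = truth + 0.5 * rng.standard_normal(200)        # noisy observations
est = kalman_filter(obs[:, None], A=np.eye(1), C=np.eye(1),
                    Q=0.01 * np.eye(1), R=0.25 * np.eye(1),
                    x0=np.zeros(1), P0=np.eye(1))
print(est.shape)  # (200, 1)
```

Note how the covariance `P` is carried forward alongside the mean: this is the sense in which SSMs "propagate uncertainty through time," with `K` weighing the prediction against each new measurement.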
SSMs matter in modern AI because they offer a principled, computationally tractable way to handle temporal dependencies, uncertainty, and missing data. Their recurrent structure allows constant-memory inference at test time, while their convolutional equivalence during training enables parallelization — a rare combination. As sequence lengths in applications like genomics, audio, and time-series forecasting grow into the tens of thousands, SSMs have emerged as a compelling complement or alternative to transformer architectures, combining theoretical grounding from control theory with the flexibility of deep learning.
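The recurrent/convolutional duality mentioned above can be checked numerically on a toy linear SSM (sizes and parameter draws below are arbitrary). Unrolling the recurrence x_t = A x_{t-1} + B u_t, y_t = C x_t from a zero initial state gives the output as a causal convolution of the input with the kernel K_k = C A^k B:

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 4, 32                                   # toy state size and sequence length
A = 0.5 * np.eye(n) + 0.1 * rng.standard_normal((n, n))
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))
u = rng.standard_normal(T)                     # scalar input sequence

# Recurrent view: sequential, constant memory (one state vector)
x = np.zeros((n, 1))
y_rec = []
for t in range(T):
    x = A @ x + B * u[t]
    y_rec.append((C @ x).item())
y_rec = np.array(y_rec)

# Convolutional view: materialize kernel K_k = C A^k B, then causal convolution
K = np.array([(C @ np.linalg.matrix_power(A, k) @ B).item() for k in range(T)])
y_conv = np.array([sum(K[k] * u[t - k] for k in range(t + 1)) for t in range(T)])

print(np.allclose(y_rec, y_conv))  # True
```

The recurrent view is what runs at test time (constant memory per step); the convolutional view is what makes training parallelizable, since the whole kernel can be applied to the sequence at once.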