Deep SSMs combine neural networks with state-space models to capture complex sequential dynamics.
Deep State-Space Models (Deep SSMs) merge the representational capacity of deep neural networks with the structured probabilistic framework of classical state-space models. Traditional SSMs represent a system through a hidden state that evolves over time according to transition dynamics, with observations generated from that state — but they typically assume linear relationships and Gaussian noise. Deep SSMs replace or augment these components with neural networks, enabling the model to learn highly nonlinear transition and emission functions directly from data. This makes them far more expressive than their classical counterparts while retaining the interpretable latent-state structure that makes SSMs appealing for sequential data.
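The generative structure described above can be sketched in a few lines: a latent state evolves under a learned nonlinear transition function, and observations are produced by a learned emission function, each corrupted by noise. This is a minimal illustration, not a real training setup; the tiny MLPs, dimensions, and noise scales are all hypothetical choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(params, x):
    """Tiny two-layer MLP: tanh hidden layer, linear output."""
    W1, b1, W2, b2 = params
    return np.tanh(x @ W1 + b1) @ W2 + b2

def init_mlp(rng, d_in, d_hidden, d_out):
    """Random, untrained weights (hypothetical sizes for illustration)."""
    return (rng.normal(0, 0.5, (d_in, d_hidden)), np.zeros(d_hidden),
            rng.normal(0, 0.5, (d_hidden, d_out)), np.zeros(d_out))

d_z, d_x = 2, 3                              # latent and observed dimensions
transition = init_mlp(rng, d_z, 16, d_z)     # z_t = f(z_{t-1}) + process noise
emission   = init_mlp(rng, d_z, 16, d_x)     # x_t = g(z_t)    + observation noise

def generate(T, sigma_z=0.1, sigma_x=0.05):
    """Roll the Deep SSM forward: nonlinear transition, nonlinear emission."""
    z, zs, xs = np.zeros(d_z), [], []
    for _ in range(T):
        z = mlp(transition, z) + sigma_z * rng.normal(size=d_z)
        xs.append(mlp(emission, z) + sigma_x * rng.normal(size=d_x))
        zs.append(z)
    return np.array(zs), np.array(xs)

zs, xs = generate(50)   # 50 latent states and their noisy observations
```

Replacing `mlp(transition, z)` with a linear map `A @ z` (and `mlp(emission, z)` with `C @ z`) recovers the classical linear-Gaussian SSM, which is exactly the restriction that Deep SSMs lift.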
The mechanics of Deep SSMs typically involve parameterizing the state transition distribution and the observation model using neural networks, then performing inference over the latent states using techniques such as variational inference, particle filtering, or amortized inference networks. Variants like the Deep Kalman Filter, Structured Inference Networks, and later architectures such as S4 and Mamba have explored different trade-offs between expressiveness, tractability, and computational efficiency. Recurrent neural networks and, more recently, structured convolutional approaches have been used to propagate information across time steps, allowing these models to handle long-range dependencies that challenge standard RNNs.
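Of the inference techniques mentioned above, particle filtering is the easiest to show compactly. The sketch below runs a bootstrap particle filter on a simple one-dimensional nonlinear SSM; the specific dynamics (`tanh(a * z)`) and noise levels are hypothetical stand-ins for what would be learned networks in a real Deep SSM.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 1-D nonlinear SSM: z_t = tanh(a * z_{t-1}) + process noise,
# x_t = z_t + observation noise. In a Deep SSM the transition would be a network.
a, sigma_z, sigma_x = 2.0, 0.3, 0.2

def simulate(T):
    """Generate a latent trajectory and noisy observations of it."""
    z, zs, xs = 0.0, [], []
    for _ in range(T):
        z = np.tanh(a * z) + sigma_z * rng.normal()
        xs.append(z + sigma_x * rng.normal())
        zs.append(z)
    return np.array(zs), np.array(xs)

def bootstrap_filter(xs, n_particles=500):
    """Bootstrap particle filter: propagate, reweight by likelihood, resample."""
    particles = np.zeros(n_particles)
    means = []
    for x in xs:
        # Propagate each particle through the transition model.
        particles = np.tanh(a * particles) + sigma_z * rng.normal(size=n_particles)
        # Weight by the Gaussian observation likelihood.
        w = np.exp(-0.5 * ((x - particles) / sigma_x) ** 2)
        w /= w.sum()
        means.append(np.sum(w * particles))      # posterior-mean estimate
        # Multinomial resampling to avoid weight degeneracy.
        particles = rng.choice(particles, size=n_particles, p=w)
    return np.array(means)

zs, xs = simulate(100)
est = bootstrap_filter(xs)
rmse = np.sqrt(np.mean((est - zs) ** 2))
```

Variational approaches (as in the Deep Kalman Filter) replace this sampling loop with an amortized inference network trained to approximate the same latent posterior, trading Monte Carlo error for a learned approximation that scales better.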
Deep SSMs matter because they offer a principled way to model uncertainty in sequential systems — a property that purely discriminative deep learning models often lack. This makes them particularly valuable in domains where quantifying prediction confidence is as important as accuracy itself, including biomedical signal processing, financial forecasting, robotics, and climate modeling. Their ability to disentangle latent dynamics from noisy observations also supports tasks like anomaly detection, imputation of missing data, and counterfactual simulation.
Interest in Deep SSMs built through the mid-2010s with models like the Deep Kalman Filter and Structured Inference Networks, and surged in the early 2020s when structured SSM architectures such as S4, and later Mamba, demonstrated competitive or superior performance to Transformers on long-sequence benchmarks. This renewed attention has positioned Deep SSMs as a serious architectural alternative in sequence modeling, particularly where efficiency and a temporal inductive bias are priorities.