A technique in which a model updates its parameters at inference time to improve performance on the data it is currently seeing.
Test-Time Training (TTT) is a machine learning paradigm that blurs the traditional boundary between training and inference by allowing a model to continue updating its parameters when it encounters new test data. Rather than treating a trained model as frozen during deployment, TTT performs additional gradient-based optimization at inference time, typically using a self-supervised or auxiliary objective constructed from the test input itself. This makes the model's behavior adaptive rather than static, enabling it to respond to conditions that were not fully anticipated during the original training phase.
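The core inference-time loop can be summarized in a few lines. The following is a minimal sketch, assuming a PyTorch model and a self-supervised loss that can be computed from the unlabeled test input alone; the names `ttt_predict` and `aux_loss_fn` are illustrative rather than taken from any particular library, and the copy-then-discard pattern reflects one common variant in which the deployed weights are reset after each sample.

```python
# Minimal sketch of test-time training at inference (hypothetical helper, not a library API).
import copy
import torch

def ttt_predict(model, aux_loss_fn, x_test, steps=10, lr=1e-3):
    """Adapt a copy of the model on one unlabeled test input, then predict with the adapted weights."""
    adapted = copy.deepcopy(model)           # keep the deployed weights untouched (episodic variant)
    adapted.train()
    optimizer = torch.optim.SGD(adapted.parameters(), lr=lr)

    for _ in range(steps):                   # a few gradient steps on the auxiliary objective
        optimizer.zero_grad()
        loss = aux_loss_fn(adapted, x_test)  # self-supervised: no ground-truth labels required
        loss.backward()
        optimizer.step()

    adapted.eval()
    with torch.no_grad():
        return adapted(x_test)               # prediction produced by the adapted parameters
```

The number of steps and the learning rate are the main knobs controlling how aggressively the model adapts to each test input, at a proportional cost in inference latency.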
The mechanics of TTT generally involve a two-branch architecture built around a shared encoder. A primary task branch handles the main prediction objective, while an auxiliary branch defines a self-supervised task — such as predicting image rotations, reconstructing masked inputs, or solving a contrastive objective — that can be optimized without ground-truth labels. When a test sample arrives, the model briefly trains on this auxiliary task using the test instance (and sometimes nearby unlabeled data), updates its shared parameters, and then produces a prediction with the adapted weights. This process can be applied to a single sample or a small batch, and the degree of adaptation is controlled by the number of update steps and the learning rate.
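To make the shared-encoder, two-branch design concrete, here is a hedged sketch using rotation prediction as the auxiliary task. The class `TTTNet` and the helper `rotation_loss` are hypothetical names introduced for illustration; the rotation loss assumes square image inputs so that rotated copies can be batched together.

```python
# Sketch of a shared-encoder model with a primary head and a rotation-prediction auxiliary head.
import torch
import torch.nn as nn

class TTTNet(nn.Module):
    def __init__(self, encoder, feat_dim, num_classes):
        super().__init__()
        self.encoder = encoder                               # shared between both branches
        self.main_head = nn.Linear(feat_dim, num_classes)    # primary task branch
        self.aux_head = nn.Linear(feat_dim, 4)                # auxiliary branch: 4 rotation classes

    def forward(self, x):
        return self.main_head(self.encoder(x))

    def aux_logits(self, x):
        return self.aux_head(self.encoder(x))

def rotation_loss(model, x):
    """Self-supervised loss: predict which of four rotations was applied to the input."""
    rotated, labels = [], []
    for k in range(4):                                        # 0, 90, 180, 270 degrees
        rotated.append(torch.rot90(x, k, dims=(-2, -1)))
        labels.append(torch.full((x.size(0),), k, dtype=torch.long, device=x.device))
    rotated = torch.cat(rotated)
    labels = torch.cat(labels)
    return nn.functional.cross_entropy(model.aux_logits(rotated), labels)
```

With these pieces, the earlier sketch could be invoked as `ttt_predict(net, rotation_loss, x_test)`. Because both heads share the encoder, the gradient steps taken on the rotation loss at test time move the same features the primary head relies on, which is how adaptation on the label-free auxiliary task can transfer to the main prediction.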
TTT is especially valuable when test data exhibits distribution shift relative to training data — a pervasive challenge in real-world deployment where data statistics evolve over time, vary across geographic regions, or differ due to sensor changes. By adapting on the fly, TTT can recover accuracy that a static model would lose under such shifts, without requiring labeled test data or a full retraining cycle. It complements related techniques like domain adaptation and continual learning but is distinctive in operating at the level of individual test instances or small batches during live inference.
The approach carries practical trade-offs: inference becomes more computationally expensive since optimization steps must be run at test time, and poorly chosen auxiliary tasks can lead to degraded rather than improved performance. Research has focused on identifying robust auxiliary objectives, efficient update schedules, and theoretical guarantees for when TTT reliably helps. As models are increasingly deployed in dynamic, open-world settings, TTT represents a compelling direction for building systems that remain accurate beyond their original training distribution.