
Non-stationarity of objectives
An optimization target that shifts over time, so that the loss or reward a model seeks to minimize or maximize is itself a moving quantity rather than a fixed function of model parameters and data.
Non-stationary objectives arise when the function defining performance (loss, reward, or utility) changes during training or deployment, turning optimization into a tracking problem rather than a standard convergence problem. This violates common machine-learning assumptions such as IID data and a fixed objective, introduces time dependence into gradients and optima, and creates challenges for stability, generalization, and evaluation.

In practice, non-stationarity appears in reinforcement learning when environment dynamics or opponent policies change, in online learning and streaming settings with concept drift, in multi-agent systems where other agents adapt, and in continual or lifelong learning where the task distribution evolves. Theoretically, it shifts analysis from static guarantees (e.g., convergence to a minimizer) to dynamic guarantees (e.g., bounds on dynamic regret, tracking error, or transfer efficiency).

Addressing non-stationary objectives calls for methods such as change-point detection, adaptive learning rates and optimizers, online convex optimization with dynamic-regret bounds, meta-learning for rapid adaptation, ensemble or memory-based approaches that retain past knowledge, and formulations that explicitly model time-varying reward or loss (non-stationary MDPs, non-stationary bandits). Evaluation metrics and training curricula must also be adapted (for example, using dynamic rather than static regret, forward/backward transfer metrics, or time-weighted validation), because standard stationary benchmarks can hide catastrophic failures when objectives move.
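To make the contrast between static and dynamic guarantees concrete, the loss can be indexed by time and compared against either a single fixed comparator or the per-step optimum. The following LaTeX sketch uses generic notation (a time-indexed loss L_t, iterates θ_t, horizon T) chosen for illustration rather than drawn from a particular reference:

```latex
% Stationary optimization: one fixed objective, one fixed minimizer.
\min_{\theta} L(\theta)

% Non-stationary objective: the loss itself changes with time t.
\min_{\theta} L_t(\theta), \qquad t = 1, \dots, T

% Static regret: compare to the best single parameter in hindsight.
\mathrm{Regret}^{\mathrm{static}}_T
  = \sum_{t=1}^{T} L_t(\theta_t) \;-\; \min_{\theta} \sum_{t=1}^{T} L_t(\theta)

% Dynamic regret: compare to the per-step optimum, a moving target.
\mathrm{Regret}^{\mathrm{dynamic}}_T
  = \sum_{t=1}^{T} L_t(\theta_t) \;-\; \sum_{t=1}^{T} \min_{\theta} L_t(\theta)
```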
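As a concrete illustration of one of the simplest mitigation strategies, the Python sketch below tracks a drifting two-armed bandit with a constant step size (exponential recency-weighted averaging) and epsilon-greedy selection, a standard textbook approach to non-stationary bandit problems; the specific constants, variable names, and random-walk drift model are assumptions made for this example.

```python
import random

# Minimal sketch of tracking a drifting (non-stationary) two-armed bandit with a
# constant step size (exponential recency-weighted averaging) and epsilon-greedy
# action selection. All names and constants are illustrative assumptions.

ALPHA = 0.1      # constant step size: recent rewards weigh more, older ones decay
EPSILON = 0.1    # exploration rate for epsilon-greedy selection
STEPS = 10_000

true_means = [0.0, 0.0]   # the environment's hidden reward means, which drift over time
estimates = [0.0, 0.0]    # the agent's tracked value estimates, one per arm

dynamic_regret = 0.0      # cumulative gap to the per-step best arm (a moving comparator)

for t in range(STEPS):
    # The objective is non-stationary: each arm's mean takes a small random-walk step.
    for a in range(len(true_means)):
        true_means[a] += random.gauss(0.0, 0.01)

    # Epsilon-greedy choice on the tracked estimates.
    if random.random() < EPSILON:
        action = random.randrange(len(estimates))
    else:
        action = max(range(len(estimates)), key=lambda a: estimates[a])

    reward = random.gauss(true_means[action], 1.0)

    # Dynamic (per-step) regret compares the chosen arm to the currently best arm.
    dynamic_regret += max(true_means) - true_means[action]

    # The constant step size keeps tracking the moving mean; a 1/n sample average
    # would converge to a fixed value and lag behind the drift.
    estimates[action] += ALPHA * (reward - estimates[action])

print(f"dynamic regret over {STEPS} steps: {dynamic_regret:.1f}")
```

The design choice being illustrated is that a constant step size acts as built-in forgetting, weighting recent rewards exponentially more, whereas a 1/n sample-average step size implicitly assumes a stationary objective and gradually stops adapting.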
The phrase first appeared in ML contexts in the 1990s, in research on adaptive/online algorithms and non-stationary environments; the concept gained widespread attention and practical urgency in the 2010s, as deep learning and deep reinforcement learning systems were deployed in non-stationary real-world settings and the fields of continual learning and concept-drift detection matured.
Key contributors and communities include the reinforcement learning community (e.g., Richard S. Sutton and Andrew G. Barto for foundational work on learning in non-stationary environments), the online-learning and learning-theory community (e.g., Nicolò Cesa-Bianchi and Gábor Lugosi on regret and adversarial or changing environments), stream-mining and concept-drift researchers (e.g., João Gama), the non-stationary bandits literature (e.g., the work of Aurélien Garivier and Eric Moulines), and the continual/lifelong learning community (e.g., Sebastian Thrun, and deep continual-learning methods such as Elastic Weight Consolidation by Kirkpatrick et al.). These groups collectively developed the theoretical tools and practical algorithms used to detect, adapt to, and mitigate shifting objectives in AI and ML systems.
