
TAID
Temporally Adaptive Interpolated Distillation
A distillation method for sequence and video models that transfers temporal dynamics by adaptively interpolating teacher signals to match the student's temporal resolution and latency constraints.
Temporally Adaptive Interpolated Distillation (TAID) is a teacher–student distillation paradigm that explicitly handles mismatches in temporal granularity between teacher and student models by interpolating teacher representations (or soft labels) onto the student's timestamps and weighting those interpolated signals according to temporal importance and runtime constraints.
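A minimal sketch of the core alignment step described above: resampling a teacher's higher-rate sequence of features or logits onto the student's timestamps. The function name and the choice of linear interpolation are illustrative assumptions, not taken from any published TAID implementation; a spline or learned temporal kernel could be substituted.

```python
import torch
import torch.nn.functional as F

def interpolate_to_student(teacher_seq: torch.Tensor, student_len: int) -> torch.Tensor:
    """Resample a teacher sequence onto the student's time axis.

    teacher_seq: (batch, T_teacher, dim) features or logits from a
    high-frame-rate teacher; student_len: number of student timesteps.
    """
    # F.interpolate expects (batch, channels, length), so move time to the last axis.
    x = teacher_seq.transpose(1, 2)                                  # (B, dim, T_teacher)
    x = F.interpolate(x, size=student_len, mode="linear", align_corners=True)
    return x.transpose(1, 2)                                         # (B, student_len, dim)
```

For example, a teacher run at 64 frames per clip can be aligned to a student that sees only 16 frames by calling `interpolate_to_student(teacher_feats, 16)` before computing any distillation loss.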
TAID extends classical knowledge distillation to temporal domains (video, speech, sensor time series) by addressing a common mismatch: teachers are often trained at high frame rates or with long temporal receptive fields, while deployed students operate at lower frame rates or under real-time latency budgets. In practice, TAID aligns teacher and student sequences via analytic or learned interpolation (e.g., linear, spline, or learned temporal attention kernels), then applies distillation losses on the interpolated features, logits, or temporal gradients. Loss components typically include per-timestep feature regression, temporally smoothed KL divergence on logits, and consistency or continuity regularizers that preserve dynamics across time. A temporally adaptive weighting scheme focuses the distillation on salient instants (motion boundaries, speech onsets) and de-emphasizes redundant frames, which improves robustness to frame-rate changes and lets smaller models capture fine-grained temporal structure despite subsampled inputs; a sketch of these loss components follows below. TAID is applicable to action recognition, temporal segmentation, online detection, streaming ASR, and efficient sensor-model deployment. Theoretically, it sits at the intersection of knowledge distillation (teacher–student compression), temporal alignment and interpolation theory, and sequence modeling (temporal attention and contrastive sequence objectives).
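A sketch of how the loss components mentioned above could be combined, assuming teacher tensors have already been resampled onto the student's timestamps (e.g., with `interpolate_to_student` from the previous snippet). The saliency-based weighting, `temperature`, and `alpha` are illustrative choices, not a prescribed recipe; the continuity regularizer mentioned in the text is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def taid_loss(student_feats, teacher_feats_aligned,
              student_logits, teacher_logits_aligned,
              temperature: float = 2.0, alpha: float = 0.5):
    """Composite temporal distillation loss (illustrative).

    All tensors share the student's time axis:
      student_feats / teacher_feats_aligned:   (B, T_s, D)
      student_logits / teacher_logits_aligned: (B, T_s, C)
    """
    # Temporally adaptive weights: emphasize instants where the teacher's
    # representation changes quickly (a crude motion/onset saliency proxy).
    diffs = teacher_feats_aligned[:, 1:] - teacher_feats_aligned[:, :-1]
    saliency = diffs.norm(dim=-1)                              # (B, T_s - 1)
    saliency = F.pad(saliency, (1, 0), value=saliency.mean().item())
    weights = torch.softmax(saliency, dim=1)                   # sums to 1 over time

    # Per-timestep feature regression, weighted by temporal saliency.
    feat_err = (student_feats - teacher_feats_aligned).pow(2).mean(dim=-1)   # (B, T_s)
    feat_loss = (weights * feat_err).sum(dim=1).mean()

    # Temperature-scaled KL between teacher and student logits at each timestep.
    log_p_s = F.log_softmax(student_logits / temperature, dim=-1)
    p_t = F.softmax(teacher_logits_aligned / temperature, dim=-1)
    kl = (p_t * (p_t.clamp_min(1e-8).log() - log_p_s)).sum(dim=-1)           # (B, T_s)
    kl_loss = (weights * kl).sum(dim=1).mean() * temperature ** 2

    return alpha * feat_loss + (1.0 - alpha) * kl_loss
```

The softmax over saliency is one simple way to make the weighting "temporally adaptive": frames near motion boundaries or onsets receive more of the distillation budget, while static, redundant frames contribute less.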
First used in the early 2020s in preprints and workshop papers addressing video and streaming-model compression; the approach gained broader traction in 2022–2024 as research and industry prioritized real-time, low-latency sequence models and frame-rate–robust distillation techniques.
TAID builds directly on foundational distillation work (Hinton et al.) and intermediate-representation distillation (e.g., FitNets), and leverages advances in temporal modeling from the video and sequence community (temporal segment and 3D-convolution approaches). Development and validation have been driven by research groups focused on efficient video and speech models at major labs and universities (Google Research, DeepMind, Meta/FAIR, and leading academic groups at CMU, Stanford, and elsewhere), along with multiple independent teams publishing preprints on temporal distillation, frame-rate adaptation, and streaming inference.



