An information-theoretic framework for learning compact representations that preserve predictive power.
Information Bottleneck (IB) theory frames representation learning as a principled optimization over mutual information. Given an input variable X and a target variable Y, the goal is to find a compressed representation T that discards as much irrelevant information from X as possible while retaining whatever is necessary to predict Y. This tradeoff is formalized through a Lagrangian objective: minimize I(X;T) − β·I(T;Y), where β controls the balance between compression and predictive fidelity. Rooted in rate-distortion theory, IB defines a sufficiency criterion in purely information-theoretic terms — T is sufficient for Y when I(T;Y) equals I(X;Y) — and traces out a continuum of optimal encoders parameterized by β, whose achievable (I(X;T), I(T;Y)) pairs form the information curve.
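To make the objective concrete, the discrete case can be evaluated directly from a joint distribution and a stochastic encoder. The sketch below is illustrative rather than canonical: the function names, the toy distribution, and the use of nats are assumptions, and an actual IB solver would iterate a Blahut–Arimoto-style self-consistent update rather than merely scoring a fixed encoder.

```python
import numpy as np

def mutual_information(p_joint):
    """I(A;B) in nats, computed from a joint distribution over (a, b)."""
    p_a = p_joint.sum(axis=1, keepdims=True)
    p_b = p_joint.sum(axis=0, keepdims=True)
    mask = p_joint > 0
    return float(np.sum(p_joint[mask] * np.log(p_joint[mask] / (p_a @ p_b)[mask])))

def ib_objective(p_xy, p_t_given_x, beta):
    """IB Lagrangian I(X;T) - beta * I(T;Y) for finite alphabets.

    p_xy:        (|X|, |Y|) joint distribution of input and target
    p_t_given_x: (|X|, |T|) stochastic encoder; each row sums to 1
    """
    p_x = p_xy.sum(axis=1)
    p_xt = p_x[:, None] * p_t_given_x          # joint p(x, t)
    # Markov chain Y - X - T implies p(t, y) = sum_x p(t|x) p(x, y)
    p_ty = p_t_given_x.T @ p_xy                # joint p(t, y)
    return mutual_information(p_xt) - beta * mutual_information(p_ty)

# Toy check: 3-symbol input, binary target, random 2-state encoder
rng = np.random.default_rng(0)
p_xy = np.array([[0.3, 0.1], [0.1, 0.3], [0.1, 0.1]])
encoder = rng.dirichlet(np.ones(2), size=3)
print(ib_objective(p_xy, encoder, beta=5.0))
```

Sweeping β and minimizing this objective over encoders is what traces out the information curve described above.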
In machine learning, IB provides a normative lens for understanding feature extraction and representation learning in both supervised and unsupervised settings. Its most influential application has been to deep neural networks, where hidden layers are interpreted as progressively compressed representations of the input, each retaining only the information most relevant to the output. This perspective sparked significant debate about whether deep networks undergo distinct fitting and compression phases during training, and whether IB dynamics explain their generalization behavior. Practical scalability is achieved through the Variational Information Bottleneck (VIB), which replaces intractable mutual information terms with tractable variational bounds, enabling IB-style regularization in large-scale models.
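Below is a minimal VIB sketch in PyTorch, assuming a classification task, a diagonal-Gaussian encoder, and a standard-normal prior; the layer widths, bottleneck size, and β value are illustrative, not prescribed by the method. Note that in the common VIB convention β multiplies the compression term, the reciprocal of how β enters the Lagrangian above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VIB(nn.Module):
    """Variational Information Bottleneck sketch.

    The encoder outputs a Gaussian p(t|x); a KL term to a standard-normal
    prior upper-bounds I(X;T), and the decoder's cross-entropy lower-bounds
    I(T;Y) up to a constant. Layer sizes here are illustrative.
    """
    def __init__(self, in_dim=784, bottleneck=32, n_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, 2 * bottleneck),   # mean and log-variance
        )
        self.decoder = nn.Linear(bottleneck, n_classes)

    def forward(self, x):
        mu, logvar = self.encoder(x).chunk(2, dim=-1)
        # Reparameterization trick: sample t ~ N(mu, sigma^2) differentiably
        t = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.decoder(t), mu, logvar

def vib_loss(logits, y, mu, logvar, beta=1e-3):
    # Lower bound on I(T;Y): decoder negative log-likelihood
    nll = F.cross_entropy(logits, y)
    # Upper bound on I(X;T): KL(N(mu, sigma^2) || N(0, I)), averaged over batch
    kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(dim=-1).mean()
    return nll + beta * kl
```

The KL term is tractable because the unknown marginal p(t) is replaced by the fixed prior, which can only loosen the bound; this substitution is what turns the intractable mutual information terms into trainable quantities.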
Despite its theoretical appeal, IB faces real empirical challenges. Estimating mutual information reliably in high-dimensional continuous spaces is notoriously difficult, and deterministic networks — which dominate practice — require injected stochasticity to make information measures well-defined. Critics have shown that observed compression effects can be artifacts of activation functions or binning choices rather than fundamental training dynamics. These debates have sharpened the community's understanding of what IB can and cannot explain.
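The binning concern is easy to reproduce in one dimension. The following sketch, assuming a deterministic scalar tanh "layer" as a toy stand-in for a saturating activation, shows that the plug-in estimate tracks the bin count rather than any intrinsic quantity: for this invertible map the continuous I(X;T) is actually infinite.

```python
import numpy as np

def binned_mi(x, t, n_bins):
    """Plug-in MI estimate (nats) from a 2-D histogram with n_bins per axis."""
    joint, _, _ = np.histogram2d(x, t, bins=n_bins)
    joint /= joint.sum()
    p_x = joint.sum(axis=1, keepdims=True)
    p_t = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (p_x @ p_t)[mask])))

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
t = np.tanh(3.0 * x)   # deterministic, invertible "layer"
for n_bins in (8, 32, 128):
    print(n_bins, round(binned_mi(x, t, n_bins), 3))  # estimate grows with bin count
```

Apparent "compression" in such plots can therefore reflect how saturation interacts with the binning grid rather than a change in what the representation carries.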
Beyond deep learning interpretation, IB connects to a broad ecosystem of theoretical frameworks including minimum description length, PAC-Bayes bounds, and even analogies to renormalization group methods in physics. It has influenced work on disentangled representations, privacy-preserving learning, and multi-view learning, cementing its status as one of the more fruitful theoretical ideas in modern machine learning.