A technique that compresses neural networks by reducing weights and activations to binary values.
Binary Quantization Learning (BQL) is a model compression technique that replaces the high-precision floating-point weights and activations of a neural network with binary values, typically represented as -1 and +1 or 0 and 1. By collapsing the continuous numerical space into a single bit per value, BQL dramatically reduces both memory consumption and the arithmetic complexity of inference. Floating-point multiply-accumulate operations — the dominant cost in standard neural network computation — can be replaced with fast bitwise XNOR and popcount operations, yielding substantial speedups on compatible hardware.
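To make the XNOR-popcount replacement concrete, the following minimal pure-Python sketch (the packing scheme and the helper name pack_bits are illustrative, not from any particular implementation) verifies that a dot product between two {-1, +1} vectors reduces to one XNOR and one popcount. On real hardware the packed words sit in registers and the popcount is a single instruction, which is where the speedup comes from.

```python
import random

random.seed(0)
n = 64

# Random {-1, +1} vectors, standing in for a binarized weight row and activation vector.
a = [random.choice((-1, 1)) for _ in range(n)]
b = [random.choice((-1, 1)) for _ in range(n)]

def pack_bits(v):
    """Pack a {-1, +1} vector into one machine word: +1 -> bit 1, -1 -> bit 0."""
    word = 0
    for i, x in enumerate(v):
        if x > 0:
            word |= 1 << i
    return word

a_bits, b_bits = pack_bits(a), pack_bits(b)

# XNOR marks the positions where the two signs agree; popcount tallies them.
agreements = bin(~(a_bits ^ b_bits) & ((1 << n) - 1)).count("1")

# For {-1, +1} vectors: dot = agreements - disagreements = 2 * agreements - n.
dot_bitwise = 2 * agreements - n
assert dot_bitwise == sum(x * y for x, y in zip(a, b))
print(dot_bitwise)
```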
In practice, binarizing a network naively causes severe accuracy degradation, because rounding each value to a single bit discards most of its information. To mitigate this, BQL methods typically introduce learned scaling factors, straight-through estimators that approximate gradients through the non-differentiable binarization step during training, or mixed-precision schemes that keep sensitive layers (such as the first and last layers) at higher precision. Post-training quantization variants also exist, though quantization-aware training generally recovers more accuracy. Architectures such as BinaryConnect (binarized weights) and XNOR-Net (binarized weights and activations) demonstrated that carefully designed training pipelines can retain much of the full-precision accuracy, with XNOR-Net scaling the approach to benchmarks as large as ImageNet.
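A minimal quantization-aware-training sketch in PyTorch, under generic assumptions (the module name BinaryLinear, the scale parameter alpha, and the toy training loop are illustrative, not taken from any specific BQL paper): the forward pass uses binarized weights rescaled by a learned factor, while the straight-through estimator lets gradients update the latent full-precision weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with a straight-through estimator for the backward pass."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        # torch.sign maps exact zeros to 0; real implementations usually map them to +1.
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Pass the gradient straight through, clipped where |x| > 1 so weights
        # far from the binarization threshold stop drifting.
        return grad_output * (x.abs() <= 1).float()

class BinaryLinear(nn.Module):
    """Linear layer with binarized weights and a learned per-output scale (alpha)."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.alpha = nn.Parameter(torch.ones(out_features))  # learned scaling factor

    def forward(self, x):
        w_bin = BinarizeSTE.apply(self.weight)   # {-1, +1} weights in the forward pass
        return F.linear(x, w_bin) * self.alpha   # rescale the binarized output

# Quantization-aware training loop: the forward pass always sees binarized weights,
# while the optimizer updates the underlying full-precision copy via the STE.
layer = BinaryLinear(16, 4)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
x, target = torch.randn(8, 16), torch.randn(8, 4)
for _ in range(5):
    loss = F.mse_loss(layer(x), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```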
BQL is especially relevant for deploying deep learning models on edge devices — smartphones, microcontrollers, FPGAs, and other embedded systems — where limited memory bandwidth, tight power budgets, and the frequent absence of floating-point hardware make standard full-precision models impractical. As demand for on-device AI inference has grown, binary and low-bit quantization methods have become a core part of the model efficiency toolkit, complementing other compression strategies such as pruning and knowledge distillation. The trade-off between compression ratio and accuracy loss remains an active research area, with newer approaches exploring learned mixed-bit-width policies and hardware-aware quantization to push the Pareto frontier further.