A flat subspace of one fewer dimension than its ambient space, used to separate data classes.
A hyperplane is a flat, affine subspace whose dimension is exactly one less than the space containing it. In two-dimensional space, a hyperplane is a line; in three dimensions, it is a plane; and in n-dimensional space, it is an (n−1)-dimensional surface. Mathematically, a hyperplane is defined by a linear equation of the form w · x + b = 0, where w is a vector normal (perpendicular) to the hyperplane and b is a scalar bias term that shifts the hyperplane away from the origin. This compact representation makes hyperplanes computationally tractable even in very high-dimensional feature spaces.
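As a concrete illustration of the equation w · x + b = 0, the short sketch below (with hypothetical values for w, b, and the test point) checks which side of a hyperplane a point falls on and how far from the hyperplane it lies:

```python
import numpy as np

# Illustrative values only: a hyperplane in 3-D defined by w . x + b = 0.
w = np.array([2.0, -1.0, 0.5])   # normal vector to the hyperplane
b = -3.0                         # scalar bias term

def side_of_hyperplane(x):
    """Return +1, -1, or 0 according to which side of the hyperplane x lies on."""
    return np.sign(w @ x + b)

point = np.array([2.0, 0.0, 1.0])
print(side_of_hyperplane(point))                 # signed side: +1 here
print(abs(w @ point + b) / np.linalg.norm(w))    # perpendicular distance to the hyperplane
```

The distance formula |w · x + b| / ‖w‖ is the same quantity that margin-based methods such as SVMs maximize over the training points.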
In machine learning, hyperplanes serve as decision boundaries that partition a feature space into distinct regions corresponding to different class labels. A linear classifier assigns a label to a new data point based on which side of the hyperplane it falls on. The challenge is finding the hyperplane that best separates the training data while generalizing well to unseen examples. Support Vector Machines (SVMs) formalize this by seeking the maximum-margin hyperplane — the one that maximizes the distance to the nearest data points from each class, called support vectors. This margin maximization is directly linked to better generalization performance through statistical learning theory.
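A minimal sketch of margin maximization, assuming scikit-learn is available; the toy data points are made up for illustration. Fitting a linear SVM recovers the normal vector w and bias b of the maximum-margin hyperplane, the support vectors, and the margin width 2 / ‖w‖:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical 2-D training data: two linearly separable classes.
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],   # class 0
              [6.0, 5.0], [7.0, 8.0], [8.0, 6.0]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

w = clf.coef_[0]        # normal vector of the separating hyperplane
b = clf.intercept_[0]   # bias term
print(f"hyperplane: {w[0]:.2f}*x1 + {w[1]:.2f}*x2 + {b:.2f} = 0")
print("support vectors:\n", clf.support_vectors_)
print("margin width:", 2.0 / np.linalg.norm(w))
```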
Hyperplanes also appear in neural networks, where each neuron in a fully connected layer computes a linear transformation that implicitly defines a hyperplane in its input space. Stacking many such layers, interleaved with nonlinear activations, allows deep networks to carve complex, non-linear decision boundaries out of simple half-spaces. In dimensionality reduction techniques like Principal Component Analysis (PCA), a closely related idea appears: data is projected onto a lower-dimensional affine subspace (a hyperplane when that subspace has dimension n−1) chosen to capture maximum variance.
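The following sketch (with toy random weights) makes the neuron-as-hyperplane view explicit: each row of a fully connected layer's weight matrix, together with its bias, defines one hyperplane w_i · x + b_i = 0 in the layer's input space, and a ReLU keeps only the positive side of each:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # 4 neurons over a 3-D input: 4 hyperplanes in R^3
b = rng.normal(size=4)

def layer(x):
    # Pre-activations are (unnormalized) signed distances of x from each hyperplane;
    # the ReLU zeroes out neurons whose hyperplane places x on the negative side.
    return np.maximum(0.0, W @ x + b)

x = rng.normal(size=3)
print(layer(x))
```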
The practical importance of hyperplanes extends to kernel methods, where data that is not linearly separable in the original feature space is implicitly mapped to a higher-dimensional space where a separating hyperplane does exist. This kernel trick allows SVMs and related algorithms to handle complex, non-linear classification problems without explicitly computing the high-dimensional transformation, making hyperplane-based methods both powerful and widely applicable across domains ranging from text classification to bioinformatics.
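A short sketch of this idea, assuming scikit-learn: concentric circles cannot be separated by any line in the original 2-D space, but an RBF kernel implicitly maps them to a space where a separating hyperplane exists, so the kernelized SVM succeeds where the linear one fails:

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic, non-linearly-separable data: two concentric rings.
X, y = make_circles(n_samples=400, noise=0.05, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_clf = SVC(kernel="linear").fit(X_train, y_train)
rbf_clf = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)

print("linear kernel accuracy:", linear_clf.score(X_test, y_test))  # near chance
print("RBF kernel accuracy:   ", rbf_clf.score(X_test, y_test))     # near perfect
```

The RBF model never constructs the high-dimensional features explicitly; the kernel trick evaluates inner products in that space directly, which is what keeps hyperplane-based methods tractable on non-linear problems.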