Whether two data classes can be perfectly divided by a single hyperplane.
Linear separability is a geometric property of a dataset describing whether its classes can be perfectly partitioned by a linear decision boundary. In two dimensions, this boundary takes the form of a straight line; in three dimensions, a plane; and in higher-dimensional spaces, a hyperplane. A dataset is linearly separable if at least one such boundary places every example of one class on one side and every example of the other class on the other, with no misclassifications. This property sits at the heart of classical classification theory and directly determines which algorithms can be applied to a given problem.
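Formally, for a binary dataset with labels encoded as $y_i \in \{-1, +1\}$ (a standard convention, chosen here for concreteness), linear separability means some weight vector and bias classify every point correctly:

$$\exists\, w \in \mathbb{R}^d,\ b \in \mathbb{R} \ \text{ such that }\ y_i \, (w \cdot x_i + b) > 0 \ \text{ for all } i.$$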
The concept is most consequential when evaluating the capabilities of linear classifiers. The Perceptron, one of the earliest learning algorithms, is mathematically guaranteed to converge to a correct solution only when the training data is linearly separable. Similarly, hard-margin Support Vector Machines seek the maximum-margin hyperplane separating two classes and are only well-defined under linear separability. When this condition fails, as it does for the XOR problem, whose four points cannot be split by any straight line, these methods break down: the Perceptron cycles through corrective updates indefinitely without converging, and the hard-margin SVM optimization problem becomes infeasible, exposing a fundamental limitation of purely linear models.
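A minimal sketch of this behavior, using a hand-rolled Perceptron (the function name and epoch budget are illustrative choices, not from the text): on the linearly separable AND problem the updates terminate, while on XOR they never do.

```python
import numpy as np

def perceptron(X, y, epochs=100):
    """Classic Perceptron update rule; labels y must be in {-1, +1}."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:  # misclassified (or on the boundary)
                w, b = w + yi * xi, b + yi
                errors += 1
        if errors == 0:       # a full pass with no mistakes:
            return w, b, True  # the data has been separated
    return w, b, False         # budget exhausted; never separated

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
print(perceptron(X, np.array([-1, -1, -1, 1]))[2])  # AND: True, converges
print(perceptron(X, np.array([-1, 1, 1, -1]))[2])   # XOR: False, cycles
```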
To handle non-linearly separable data, practitioners employ several strategies. Kernel methods implicitly map input features into a higher-dimensional space where the classes may become linearly separable, allowing algorithms like SVMs to operate effectively without explicitly computing the transformation. Alternatively, soft-margin formulations introduce slack variables that permit some misclassification, trading perfect separation for robustness. Feature engineering can also manually construct representations in which previously entangled classes become separable. Deep neural networks sidestep the issue entirely by learning hierarchical, nonlinear representations that progressively reshape the data manifold.
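A minimal sketch of the feature-engineering route (the lifted feature and the particular weights below are hand-picked for illustration): adding the product x1*x2 as a third coordinate makes XOR linearly separable, since a plane in the lifted space splits the four points. Kernel methods achieve the same kind of lift implicitly.

```python
import numpy as np

# XOR in its original two dimensions, where no separating line exists.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])

# Hand-crafted feature map: lift (x1, x2) to (x1, x2, x1 * x2).
phi = np.column_stack([X, X[:, 0] * X[:, 1]])

# One valid separating hyperplane in the lifted space, found by inspection.
w, b = np.array([1.0, 1.0, -2.0]), -0.5
print(np.sign(phi @ w + b) == y)  # [ True  True  True  True]
```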
Linear separability matters beyond its immediate algorithmic implications because it shapes intuitions about data complexity and model selection. Determining whether a dataset is linearly separable is itself computationally tractable: it reduces to a linear programming feasibility problem, making it a useful diagnostic. Understanding where linear boundaries succeed and fail motivates the design of more expressive models, and the concept remains a foundational reference point in computational learning theory, the study of VC dimension, and the bias-variance tradeoff.
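The linear programming diagnostic can be sketched as follows (is_linearly_separable is an illustrative helper, assuming SciPy's linprog): any strictly separating hyperplane can be rescaled to achieve margin at least 1, so the data is separable exactly when the constraints y_i(w . x_i + b) >= 1 admit a feasible point.

```python
import numpy as np
from scipy.optimize import linprog

def is_linearly_separable(X, y):
    """Feasibility LP: does some (w, b) satisfy y_i * (w . x_i + b) >= 1?
    Any strict separator can be scaled to meet this margin, so
    feasibility here is equivalent to linear separability."""
    n, d = X.shape
    # Variables z = (w_1, ..., w_d, b); rewrite each constraint as
    # -y_i * (x_i . w + b) <= -1 to match linprog's A_ub @ z <= b_ub form.
    A_ub = -y[:, None] * np.hstack([X, np.ones((n, 1))])
    b_ub = -np.ones(n)
    # Note: linprog's default bounds are (0, None), so free variables
    # must be requested explicitly.
    res = linprog(c=np.zeros(d + 1), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (d + 1), method="highs")
    return res.success  # True iff the LP is feasible

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
print(is_linearly_separable(X, np.array([-1, -1, -1, 1])))  # AND: True
print(is_linearly_separable(X, np.array([-1, 1, 1, -1])))   # XOR: False
```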