A supervised learning model that classifies data by finding the optimal separating hyperplane.
A Support Vector Machine is a supervised learning algorithm that solves classification and regression problems by identifying the hyperplane that maximally separates data points belonging to different classes. The core insight is margin maximization: rather than finding any boundary that separates classes, an SVM finds the one with the greatest distance to the nearest data points on either side — those boundary-defining points are called support vectors. This maximum-margin approach gives SVMs strong generalization properties, reducing the risk of overfitting even in high-dimensional feature spaces.
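The idea that the fitted model is defined by a small boundary-defining subset of the data can be seen directly. A minimal sketch, assuming scikit-learn (the article names no library): after fitting a linear SVM on separable clusters, only the support vectors are retained by the model.

```python
# Minimal sketch (assumes scikit-learn): a fitted linear SVM keeps only
# the margin-defining points as support vectors.
import numpy as np
from sklearn.svm import SVC

# Two well-separated Gaussian clusters in 2-D (synthetic data for illustration).
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) - 2, rng.randn(20, 2) + 2])
y = np.array([0] * 20 + [1] * 20)

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# The decision boundary depends only on these points; removing any
# non-support vector from the training set would leave it unchanged.
print(clf.support_vectors_.shape[0], "support vectors out of", len(X), "points")
```

Because the clusters are well separated, only a handful of points near the margin end up as support vectors, illustrating the sparsity mentioned later in the article.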
A critical innovation that expanded SVMs from linear to nonlinear problems is the kernel trick. By implicitly mapping input data into a higher-dimensional feature space using kernel functions — such as polynomial, radial basis function (RBF), or sigmoid kernels — SVMs can find linear separating hyperplanes in that transformed space, which correspond to complex nonlinear boundaries in the original input space. This allows a single algorithmic framework to handle a wide variety of data geometries without explicitly computing expensive high-dimensional transformations. The soft-margin extension further improved practical applicability by allowing some misclassifications, controlled by a regularization parameter, making SVMs robust to noisy or overlapping class distributions.
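The kernel trick and the soft-margin parameter can be demonstrated together. A hedged sketch, again assuming scikit-learn: on concentric circles, a dataset no hyperplane in the original space can separate, an RBF-kernel SVM succeeds where a linear one fails; the `C` argument is the regularization parameter described above.

```python
# Sketch (assumes scikit-learn): RBF kernel vs. linear kernel on a
# dataset with a nonlinear class boundary (concentric circles).
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings of points; radially separated, not linearly separable.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# C controls the soft margin: smaller C tolerates more misclassifications.
linear = SVC(kernel="linear", C=1.0).fit(X, y)
rbf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

print("linear accuracy:", linear.score(X, y))  # near chance for this geometry
print("rbf accuracy:", rbf.score(X, y))        # close to perfect
```

The RBF model never computes the high-dimensional feature map explicitly; the kernel evaluates inner products in that space directly, which is the efficiency gain the paragraph above describes.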
SVMs became a dominant machine learning method through the 1990s and 2000s, particularly excelling in text classification, image recognition, and bioinformatics tasks where data is high-dimensional but training sets are relatively small. They offer strong theoretical guarantees rooted in statistical learning theory and produce sparse, interpretable models defined entirely by their support vectors. While deep learning has displaced SVMs in many large-scale applications, they remain highly competitive in low-to-medium data regimes and are valued for their mathematical rigor, predictable behavior, and effectiveness without extensive hyperparameter tuning.