A supervised learning task that assigns input data to predefined discrete categories.
Classification is a fundamental supervised learning task in which a model learns to assign input data points to one of several predefined discrete categories, known as classes or labels. The model is trained on a labeled dataset — a collection of examples where the correct class for each input is already known — and learns to identify patterns and decision boundaries that distinguish one class from another. Once trained, the model can predict the class of previously unseen inputs with measurable accuracy.
The mechanics of classification vary widely depending on the algorithm used. Logistic regression models the probability of class membership using a sigmoid function, while decision trees partition the feature space through a series of hierarchical rules. Support vector machines find the optimal hyperplane that maximally separates classes in high-dimensional space. Neural networks, particularly deep learning architectures, learn hierarchical feature representations that enable classification of highly complex inputs such as images, audio, and natural language. In all cases, model performance is evaluated using metrics such as accuracy, precision, recall, F1 score, and the area under the ROC curve.
Classification problems are further distinguished by their structure. Binary classification involves exactly two classes — spam vs. not spam, malignant vs. benign — while multiclass classification handles three or more mutually exclusive categories. Multilabel classification allows a single input to belong to multiple classes simultaneously, as in tagging an image with several descriptive labels. Each variant introduces distinct modeling and evaluation challenges.
The practical importance of classification is enormous. It underpins applications ranging from email spam filtering and medical diagnosis to fraud detection, sentiment analysis, and autonomous vehicle perception. As datasets have grown larger and models more expressive, classification has remained one of the most active and consequential areas of machine learning research, serving as a benchmark for new architectures and training techniques alike.