A measure of angular similarity between two vectors, regardless of their magnitude.
Cosine similarity is a metric that quantifies the similarity between two vectors by computing the cosine of the angle between them. Given two vectors A and B, the cosine similarity is calculated as their dot product divided by the product of their magnitudes: cos(θ) = (A·B) / (‖A‖‖B‖). The result ranges from -1 to 1, where 1 indicates identical orientation, 0 indicates orthogonality (no similarity), and -1 indicates opposite directions. Crucially, the metric is insensitive to vector magnitude — only direction matters — making it well-suited for comparing objects whose scale varies independently of their content.
In machine learning and natural language processing, cosine similarity is most commonly applied to high-dimensional embeddings. In the vector space model of text, documents or words are represented as vectors where each dimension corresponds to a term or feature. Two documents with similar vocabulary distributions will point in similar directions through this space, even if one is much longer than the other. This property makes cosine similarity preferable to Euclidean distance for sparse, high-dimensional data, where raw distance measures are distorted by differences in vector length. Modern dense embedding models — such as word2vec, GloVe, and sentence transformers — also rely heavily on cosine similarity to surface semantic relationships between words, sentences, and documents.
Beyond NLP, cosine similarity is widely used in recommendation systems, image retrieval, and anomaly detection, wherever learned feature vectors need to be compared efficiently. It serves as the backbone of nearest-neighbor search in embedding spaces, enabling applications like semantic search, duplicate detection, and clustering. Its computational simplicity and geometric interpretability have made it one of the most broadly applied similarity measures across machine learning, and it remains a default choice when working with any kind of dense or sparse vector representation.