
Word Vector

Dense numerical representations of words encoding semantic meaning and linguistic relationships.

Year: 2013
Generality: 0.72

A word vector, also called a word embedding, is a dense, fixed-length numerical representation of a word in a continuous vector space. Unlike sparse representations such as one-hot encoding—where each word is an isolated binary flag with no relationship to any other—word vectors place semantically related words near each other in high-dimensional space. This geometric property means that words like "king" and "queen" cluster together, and arithmetic operations on vectors can capture analogies: the classic example being that vec("king") − vec("man") + vec("woman") ≈ vec("queen"). Typical embedding dimensions range from 50 to 1,000 values, compressing vocabulary-scale information into compact, learnable representations.
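
As an illustration of this vector arithmetic, the sketch below runs the king/queen analogy over a handful of toy vectors. The four-dimensional values are invented for readability; real embeddings would be pre-trained and far higher-dimensional, but the mechanics are the same.

```python
import numpy as np

# Toy 4-dimensional vectors standing in for real pre-trained embeddings
# (actual word vectors typically have 50-1,000 dimensions).
vectors = {
    "king":  np.array([0.80, 0.65, 0.10, 0.05]),
    "queen": np.array([0.78, 0.68, 0.72, 0.08]),
    "man":   np.array([0.75, 0.10, 0.05, 0.04]),
    "woman": np.array([0.74, 0.12, 0.70, 0.06]),
}

def cosine(a, b):
    """Cosine similarity: the usual closeness measure in embedding space,
    comparing direction while ignoring vector magnitude."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# vec("king") - vec("man") + vec("woman") should land nearest vec("queen").
target = vectors["king"] - vectors["man"] + vectors["woman"]
best = max((w for w in vectors if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(target, vectors[w]))
print(best)  # queen
```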

Word vectors are learned by training shallow neural networks or matrix factorization methods on large text corpora, using the distributional hypothesis—the idea that words appearing in similar contexts tend to have similar meanings. Word2Vec, introduced by Mikolov et al. in 2013, popularized two training objectives: Continuous Bag-of-Words (CBOW), which predicts a target word from surrounding context, and Skip-gram, which predicts context words from a target. GloVe (Global Vectors for Word Representation) takes a complementary approach, factorizing a global word co-occurrence matrix to produce embeddings that reflect corpus-wide statistical patterns. FastText extended these ideas by representing words as bags of character n-grams, enabling meaningful vectors for rare or morphologically complex words.
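
A minimal training sketch using the gensim library follows (assuming gensim 4.x; the four-sentence corpus is purely illustrative, whereas real embeddings are trained on millions to billions of tokens).

```python
from gensim.models import Word2Vec

# Tiny tokenized corpus, for demonstration only.
corpus = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "man", "walks", "in", "the", "city"],
    ["the", "woman", "walks", "in", "the", "city"],
]

# sg=1 selects Skip-gram (predict context words from the target);
# sg=0 selects CBOW (predict the target word from its context).
model = Word2Vec(
    sentences=corpus,
    vector_size=50,   # dimensionality of the learned embeddings
    window=2,         # context window on each side of the target word
    min_count=1,      # keep every word, even singletons (toy corpus)
    sg=1,
)

vec = model.wv["queen"]                      # dense 50-dimensional vector
print(model.wv.most_similar("queen", topn=3))
```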

Word vectors became a cornerstone of NLP pipelines throughout the 2010s, dramatically improving performance on tasks such as sentiment analysis, named entity recognition, machine translation, and question answering. Pre-trained embeddings allowed practitioners to transfer linguistic knowledge learned from billions of tokens into downstream models trained on far smaller datasets—an early and influential form of transfer learning in NLP.
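
The transfer pattern itself is simple: load pre-trained vectors into the embedding layer of a downstream model and optionally fine-tune them. The sketch below assumes PyTorch and substitutes random values for a real GloVe or Word2Vec file so it stays self-contained; the vocabulary and dimensions are hypothetical.

```python
import numpy as np
import torch
import torch.nn as nn

vocab = {"<pad>": 0, "king": 1, "queen": 2, "man": 3, "woman": 4}
dim = 50

# Pre-trained vectors would normally be parsed from a GloVe/Word2Vec file;
# random values stand in here so the example runs on its own.
pretrained = np.random.default_rng(0).normal(size=(len(vocab), dim)).astype("float32")

embedding = nn.Embedding.from_pretrained(
    torch.from_numpy(pretrained),
    freeze=False,                 # allow fine-tuning on the downstream task
    padding_idx=vocab["<pad>"],
)

tokens = torch.tensor([[vocab["king"], vocab["queen"]]])
print(embedding(tokens).shape)    # torch.Size([1, 2, 50])
```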

While static word vectors assign a single representation per word regardless of context, they laid the conceptual groundwork for contextual embeddings produced by models like ELMo, BERT, and GPT, where a word's representation shifts depending on its surrounding sentence. Despite being largely superseded by these transformer-based approaches for state-of-the-art tasks, word vectors remain widely used for their computational efficiency, interpretability, and strong performance in resource-constrained settings.

Related

Embedding
A dense vector representation that encodes semantic relationships between discrete items.
Generality: 0.88

Contextual Embedding
Word representations that dynamically shift meaning based on surrounding context.
Generality: 0.75

Embedding Space
A learned vector space where similar data points cluster geometrically close together.
Generality: 0.79

Vectorization
Converting raw data into numerical vectors so machine learning algorithms can process it.
Generality: 0.72

Vector Database
A database optimized for storing and searching high-dimensional vector embeddings.
Generality: 0.62

BOW (Bag of Words)
A text representation encoding documents as unordered token counts, ignoring word sequence.
Generality: 0.73