A text representation that encodes documents as unordered token counts, ignoring word order.
Bag of Words (BOW) is a foundational text representation technique that converts a document into a fixed-length numerical vector by counting how often each word from a predefined vocabulary appears, entirely disregarding the order in which those words occur. The resulting vector is typically high-dimensional and sparse — most vocabulary terms appear in any given document rarely or not at all — and each dimension corresponds to a specific token. Values can be raw counts, binary presence indicators, or weighted scores such as term frequency-inverse document frequency (TF-IDF), which downweights common words and amplifies informative ones.
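A minimal sketch of these weighting options, assuming scikit-learn's text vectorizers (the toy corpus and variable names are illustrative, not from the source):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Toy corpus for illustration only.
corpus = [
    "the dog bit the man",
    "the man bit the dog",
    "dogs and men can be friends",
]

# Raw counts: one row per document, one column per vocabulary term.
counts = CountVectorizer()
X_counts = counts.fit_transform(corpus)   # sparse (3, |V|) matrix
print(counts.get_feature_names_out())
print(X_counts.toarray())

# Binary presence indicators instead of counts.
print(CountVectorizer(binary=True).fit_transform(corpus).toarray())

# TF-IDF weighting: terms that occur in many documents are downweighted.
print(TfidfVectorizer().fit_transform(corpus).toarray().round(2))
```

Note that the first two rows of the count matrix come out identical, which previews the order-invariance discussed next.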
The model operates under an exchangeability assumption: a document's representation is invariant under any permutation of its tokens, so word order and neighboring context carry no information, and the sentences "the dog bit the man" and "the man bit the dog" produce identical vectors. Despite this obvious limitation, BOW vectors are surprisingly effective inputs for classical machine learning algorithms. Linear classifiers like logistic regression and support vector machines, as well as generative models like multinomial Naive Bayes, perform well on BOW features for tasks such as spam detection, sentiment analysis, and topic classification. The representation also serves as the input layer for latent semantic analysis (LSA) and probabilistic topic models like latent Dirichlet allocation (LDA), which recover hidden thematic structure from word co-occurrence statistics.
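The classical pipelines mentioned above are short to express in code. Below is a hedged sketch pairing BOW counts with multinomial Naive Bayes on a toy spam-detection task; the handful of training strings and labels are invented purely for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy data; a real task would use thousands of labeled documents.
train_texts = [
    "win cash now limited offer",
    "cheap pills click here",
    "meeting rescheduled to friday",
    "please review the attached report",
]
train_labels = ["spam", "spam", "ham", "ham"]

# CountVectorizer produces the BOW features; MultinomialNB models
# per-class word-count distributions over those features.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

print(model.predict(["claim your cash offer now", "the report is attached"]))
```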
BOW's practical appeal lies in its computational simplicity, interpretability, and compatibility with sparse linear algebra libraries. Feature hashing (the "hashing trick"), which maps tokens to a fixed number of columns without storing an explicit vocabulary, and dimensionality reduction via truncated SVD make it scalable to large corpora. These properties made it the dominant NLP baseline throughout the 1990s and 2000s, underpinning information retrieval systems and early machine learning pipelines for text.
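A sketch of those two scalability tricks together, assuming scikit-learn's HashingVectorizer and TruncatedSVD (the decomposition that also underlies LSA):

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.decomposition import TruncatedSVD

corpus = [
    "the dog bit the man",
    "the man bit the dog",
    "dogs and men can be friends",
]

# Feature hashing: tokens are mapped to a fixed number of columns,
# so no vocabulary is stored and memory stays bounded for any corpus size.
hasher = HashingVectorizer(n_features=2**10, alternate_sign=False)
X = hasher.transform(corpus)              # sparse (3, 1024) matrix

# Truncated SVD operates directly on sparse matrices (full PCA cannot),
# projecting the high-dimensional BOW space down to a few components.
svd = TruncatedSVD(n_components=2)
print(svd.fit_transform(X).shape)         # (3, 2)
```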
The model's core weakness — its blindness to syntax, semantics, and word order — motivated successive improvements. N-gram extensions partially recover local context by treating adjacent word sequences as atomic tokens. Ultimately, the limitations of BOW drove the field toward dense word embeddings (Word2Vec, GloVe) and then Transformer-based contextual representations (BERT, GPT), which capture meaning, order, and long-range dependencies that BOW fundamentally cannot. Nevertheless, BOW remains a competitive and interpretable baseline, especially in low-data or resource-constrained settings.
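To make the n-gram extension above concrete, a short sketch (again assuming scikit-learn) showing that adding bigram features separates the two sentences that plain BOW conflates:

```python
from sklearn.feature_extraction.text import CountVectorizer

# ngram_range=(1, 2) adds bigrams such as "dog bit" and "man bit" to the
# vocabulary, so the two sentences below no longer map to the same vector.
vec = CountVectorizer(ngram_range=(1, 2))
X = vec.fit_transform(["the dog bit the man", "the man bit the dog"])

print(vec.get_feature_names_out())
print(X.toarray())
```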