A language model that applies the same neural structure repeatedly to process hierarchical data.
A recursive language model is a type of neural network architecture designed to process language by applying the same set of parameters repeatedly across a hierarchical or tree-structured representation of text. Unlike sequential models that process tokens one after another in a linear chain, recursive models operate on parse trees or other nested structures, combining representations of child nodes to form representations of parent nodes. This makes them naturally suited for capturing compositional semantics — the idea that the meaning of a phrase is built systematically from the meanings of its parts.
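The bottom-up composition described above can be sketched in a few lines. This is a minimal illustration, not any particular published model: the tree is a nested tuple, the leaf embeddings are random toy vectors rather than learned ones, and a single shared weight matrix `W` combines each pair of children into a parent vector.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # embedding dimension (illustrative)

# One shared weight matrix W and bias b, reused at every internal node.
W = rng.standard_normal((d, 2 * d)) * 0.1
b = np.zeros(d)

# Toy leaf embeddings; a real model would learn these jointly with W.
vocab = {w: rng.standard_normal(d) for w in ["the", "movie", "was", "great"]}

def compose(tree):
    """Recursively build a vector for a (left, right) pair or a leaf word."""
    if isinstance(tree, str):           # leaf: look up the word vector
        return vocab[tree]
    left, right = tree
    children = np.concatenate([compose(left), compose(right)])
    return np.tanh(W @ children + b)    # same parameters at every node

# Parse tree for "the movie was great": ((the movie) (was great))
sentence_vec = compose((("the", "movie"), ("was", "great")))
print(sentence_vec.shape)  # (4,)
```

Because the same `W` is applied at every node, the model has a fixed number of parameters regardless of sentence length or tree depth, which is what makes the recursion well-defined over arbitrary trees.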
The core mechanism involves a shared weight function, often a simple neural network cell, that takes two or more child node representations as input and produces a single parent representation. This process recurses from the leaves of a parse tree (typically a constituency tree) up to the root, ultimately producing a single vector that encodes the meaning of the entire sentence. Variants such as the Recursive Neural Tensor Network (RNTN) introduced more expressive bilinear interaction terms between child vectors, allowing the model to capture phenomena like negation and sentiment modification more accurately.
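The tensor-based composition used by the RNTN can be sketched as follows. Here the parameter shapes and random initialization are illustrative assumptions; the key idea is that each output dimension k gets its own bilinear form over the concatenated children, computed alongside the standard linear term.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3  # small dimension for illustration

# RNTN parameters, shared across all tree nodes:
# V holds one (2d x 2d) bilinear slice per output dimension,
# W is the standard linear composition matrix.
V = rng.standard_normal((d, 2 * d, 2 * d)) * 0.01
W = rng.standard_normal((d, 2 * d)) * 0.1

def rntn_compose(c1, c2):
    """Combine two child vectors with both bilinear and linear terms."""
    c = np.concatenate([c1, c2])                  # shape (2d,)
    bilinear = np.einsum("i,kij,j->k", c, V, c)   # c^T V[k] c for each k
    return np.tanh(bilinear + W @ c)

parent = rntn_compose(rng.standard_normal(d), rng.standard_normal(d))
print(parent.shape)  # (3,)
```

The bilinear term lets the two children interact multiplicatively rather than only additively, which is what gives the RNTN its extra capacity for effects like negation flipping the sentiment of a phrase.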
Recursive language models gained significant traction in the early 2010s as a way to incorporate linguistic structure directly into deep learning pipelines. They demonstrated strong performance on tasks like sentiment analysis, particularly on the Stanford Sentiment Treebank, where fine-grained sentiment labels were available at every node of the parse tree. This allowed the model to learn how compositional operations like negation or intensification shift sentiment across phrases and clauses.
Despite their theoretical elegance, recursive models fell out of mainstream use with the rise of recurrent architectures like LSTMs and, later, Transformer-based models. Their reliance on external parse trees introduces a dependency on a separate parsing pipeline, which can propagate errors and limits scalability. However, they remain an important conceptual milestone in the history of NLP, demonstrating that structured linguistic knowledge could be integrated into learned representations — a theme that continues to influence research into syntax-aware and structure-aware language models today.