Special tokens that give language models explicit space to reason before answering.
A thought token is a designated token or token sequence inserted into a language model's context to make room for explicit intermediate reasoning before the model produces a final output. Unlike standard word or subword tokens that represent linguistic units, thought tokens serve a computational function: they carve out space in the model's generation process for chain-of-thought-style deliberation, allowing the model to "think" in a structured way that is visible in the token stream. This mechanism became particularly relevant with the emergence of reasoning-focused language models and inference-time compute scaling around 2022–2024.
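As a concrete illustration, the sketch below registers a pair of reasoning delimiters as special tokens using the Hugging Face transformers library, so that each delimiter maps to a single dedicated token ID rather than a run of subwords. This is a minimal sketch, not any particular model's recipe: the gpt2 checkpoint is only a placeholder, and the <think>/</think> names anticipate the convention described in the next paragraph.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; any causal LM would do for this illustration.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Register the delimiters as atomic special tokens so the tokenizer
# never splits them into subwords, then grow the embedding matrix to
# give each new token its own (randomly initialized) embedding row.
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<think>", "</think>"]}
)
model.resize_token_embeddings(len(tokenizer))
```

In practice the embeddings for the new tokens are learned during fine-tuning on reasoning traces; registering them up front simply reserves the vocabulary slots.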
In practice, thought tokens often appear as special delimiters—such as <think> and </think> tags—that bracket a model's internal reasoning trace. During generation, the model produces a scratchpad of intermediate steps within these boundaries before emitting its final answer. This approach draws on chain-of-thought prompting research but formalizes the reasoning phase as a first-class architectural or training-time construct rather than an emergent behavior elicited purely through prompting. Models like DeepSeek-R1 and OpenAI's o1 series popularized this pattern, training models to generate extended reasoning traces that substantially improve performance on complex mathematical, logical, and coding tasks.
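Serving code typically separates the two phases, for example hiding or collapsing the trace before showing the final answer to a user. The minimal Python sketch below splits a completion on the <think>/</think> convention described above; the exact delimiter strings are an assumption and differ across models.

```python
import re

def split_reasoning(completion: str) -> tuple[str, str]:
    """Split a model completion into (reasoning_trace, final_answer),
    assuming the trace is bracketed by <think>...</think> tags."""
    match = re.search(r"<think>(.*?)</think>", completion, flags=re.DOTALL)
    if match is None:
        # No trace emitted: treat the whole completion as the answer.
        return "", completion.strip()
    reasoning = match.group(1).strip()
    answer = completion[match.end():].strip()
    return reasoning, answer

trace, answer = split_reasoning(
    "<think>7 * 8 = 56, minus 6 is 50.</think>The result is 50."
)
# trace  -> "7 * 8 = 56, minus 6 is 50."
# answer -> "The result is 50."
```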
The significance of thought tokens lies in their role as a bridge between raw next-token prediction and deliberate, multi-step problem solving. By making reasoning explicit and token-countable, they enable inference-time scaling: allocating more compute budget to harder problems simply by allowing longer thought sequences. This shifts some of the intelligence burden from model parameters to generation-time computation, a paradigm sometimes called "thinking more" rather than "knowing more."
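One simple way to operationalize such a budget is to cap the number of tokens the model may spend inside the thought span and force the closing delimiter once the cap is reached. The sketch below is a hypothetical illustration of that control loop, not a real serving implementation: next_token stands in for a single-step sampling callback, and the delimiter names again assume the <think>/</think> convention.

```python
from typing import Callable, List

THINK_END = "</think>"  # assumed closing delimiter

def generate_with_thought_budget(
    next_token: Callable[[List[str]], str],  # hypothetical sampling callback
    prompt_tokens: List[str],                # assumed to end inside <think>
    max_thought_tokens: int,
    max_answer_tokens: int = 64,
    eos: str = "<eos>",
) -> List[str]:
    """Decode a thought phase capped at max_thought_tokens, then an answer."""
    tokens = list(prompt_tokens)
    spent = 0
    # Thought phase: sample until the model closes the trace itself
    # or the budget runs out.
    while spent < max_thought_tokens:
        tok = next_token(tokens)
        tokens.append(tok)
        spent += 1
        if tok == THINK_END:
            break
    else:
        # Budget exhausted: force the transition to the answer phase.
        tokens.append(THINK_END)
    # Answer phase: ordinary decoding until end-of-sequence.
    for _ in range(max_answer_tokens):
        tok = next_token(tokens)
        if tok == eos:
            break
        tokens.append(tok)
    return tokens
```

Raising max_thought_tokens for harder inputs is precisely the inference-time scaling knob described above: the same weights spend more generation-time compute before committing to an answer.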
Thought tokens also raise important questions about interpretability and faithfulness. Because the reasoning trace is generated autoregressively like any other text, it may not perfectly reflect the model's internal computations—the visible "thoughts" could be post-hoc rationalizations rather than causal reasoning steps. Nonetheless, empirical results consistently show that models trained to use thought tokens outperform those that answer directly, making this one of the more impactful recent developments in applied language model research.