Language models that generate continuous-valued embeddings instead of discrete tokens.
CALM, or Continuous Autoregressive Language Models, refers to a class of generative language models that operate in a continuous embedding space rather than producing discrete token predictions at each step. Traditional autoregressive language models generate text by predicting one token at a time from a fixed vocabulary, selecting the next token based on a probability distribution over all possible discrete symbols. CALM breaks from this paradigm by having the model output a continuous vector at each step, which is then fed back as input for the next generation step — bypassing the need to project back into a discrete vocabulary during intermediate computations.
The core mechanism involves training a model to autoregressively predict continuous representations — typically embeddings or latent vectors — rather than softmax distributions over tokens. This allows the model to propagate richer, higher-dimensional information between steps without the information bottleneck imposed by discretization. During inference, the final continuous output can be decoded into human-readable text through a separate decoding head or diffusion-based process. This architecture draws inspiration from continuous diffusion models applied to language and connects to broader research on latent-space generation, where the generative process unfolds in a smooth, differentiable manifold.
CALM-style approaches offer several potential advantages over discrete token models. Because the intermediate representations are continuous and differentiable, gradients can flow more freely through the generation process, potentially enabling more expressive and coherent long-range dependencies. They also sidestep some limitations of tokenization, such as sensitivity to subword segmentation and the inability to represent fine-grained semantic nuance within a single token slot. This makes them particularly appealing for tasks requiring nuanced semantic generation or for integration with other continuous modalities like audio and vision.
The relevance of CALM to modern machine learning grew significantly as researchers sought alternatives to the discrete bottleneck in large language models, especially in the context of scaling and multimodal generation. While discrete autoregressive models like GPT-style architectures remain dominant, continuous autoregressive approaches represent an active research frontier exploring whether language generation can be made more fluid, efficient, and expressive by embracing the geometry of continuous latent spaces rather than the combinatorics of token vocabularies.