Large Language Diffusion Models

A class of generative architectures that adapt diffusion-based stochastic denoising processes to produce and refine natural-language outputs at scale.

Large Language Diffusion Models (LLaDA) are architectures that bring diffusion- and score-based generative modeling, originally developed for continuous data such as images, into large-scale language generation. They perform forward noising and learned reverse denoising in continuous embedding or latent spaces in order to generate, infill, or refine discrete token sequences. Technically, these models train a neural denoiser (score network) to remove progressively applied noise from continuous representations of text (token embeddings, sentence latents, or other learned latents), enabling non-autoregressive or hybrid generation workflows that can offer improved mode coverage, controllability via guidance, and more flexible conditioning than strictly autoregressive Large Language Models (LLMs).

Key methodological components include choosing an appropriate noise model and scheduler for discrete-conditioned tasks, mapping discrete tokens to and from continuous spaces (embedding diffusion, vector quantization, or learned latents), applying classifier-free or score-based guidance for control, and using accelerated samplers or distillation to reach practical sampling speeds.

Practically, LLaDA-style approaches are explored for controlled text synthesis, infilling and repair, robust paraphrasing and data augmentation, and as refinement stages combined with autoregressive LLMs. They inherit both the theoretical benefits of diffusion and energy-based methods (well-behaved likelihood surrogates, denoising training objectives) and the engineering challenges of high compute cost, discrete-to-continuous coupling, and evaluation of semantic fidelity.
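To make the training side concrete, the sketch below shows a minimal embedding-diffusion training step in PyTorch: discrete tokens are embedded, Gaussian noise is applied at a random timestep according to a cumulative noise schedule, and a denoiser is trained to recover the clean embeddings and to round them back to tokens. All names here (Denoiser, training_step, the added cross-entropy rounding term) are illustrative assumptions, not a specific published implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Denoiser(nn.Module):
    """Transformer-style denoiser that predicts the clean embedding x0
    from a noisy embedding x_t and the diffusion timestep t."""

    def __init__(self, dim: int = 256, n_layers: int = 4, n_heads: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, n_heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.time_mlp = nn.Sequential(nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, dim))
        self.out = nn.Linear(dim, dim)

    def forward(self, x_t, t):
        # Broadcast a simple timestep embedding over the sequence dimension.
        t_emb = self.time_mlp(t.float().unsqueeze(-1)).unsqueeze(1)
        return self.out(self.encoder(x_t + t_emb))


def training_step(denoiser, token_emb, tokens, alphas_cumprod, optimizer):
    """One denoising step: embed discrete tokens, apply Gaussian forward
    noise at a random timestep, and train the denoiser to recover the
    clean embeddings."""
    x0 = token_emb(tokens)                                 # (B, L, D) continuous latents
    t = torch.randint(0, len(alphas_cumprod), (tokens.size(0),), device=tokens.device)
    a_bar = alphas_cumprod[t].view(-1, 1, 1)               # noise level per example
    noise = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise   # forward noising q(x_t | x_0)

    x0_pred = denoiser(x_t, t)
    # x0-prediction loss plus a rounding term that ties latents back to tokens.
    logits = x0_pred @ token_emb.weight.T                  # (B, L, V)
    loss = F.mse_loss(x0_pred, x0) + F.cross_entropy(
        logits.view(-1, logits.size(-1)), tokens.view(-1))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Here alphas_cumprod would come from a standard beta schedule, for example torch.cumprod(1 - torch.linspace(1e-4, 0.02, 1000), dim=0); the extra cross-entropy term is one common way to keep latents decodable back to discrete tokens, with vector quantization or learned latent encoders as the alternatives mentioned above.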
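On the generation side, the following sketch (reusing the same hypothetical denoiser and embedding table) illustrates a reverse-denoising loop with classifier-free guidance and a final rounding of continuous latents back to discrete tokens. The additive conditioning and the DDIM-style deterministic update are simplifying assumptions for illustration, not a prescribed recipe.

```python
import torch


@torch.no_grad()
def sample(denoiser, token_emb, cond, alphas_cumprod, seq_len, guidance_scale=2.0):
    """Reverse denoising sketch: start from Gaussian noise, iteratively
    predict the clean embeddings, mix conditional and unconditional
    predictions (classifier-free guidance), and round the final latents
    to discrete tokens."""
    device = alphas_cumprod.device
    T = len(alphas_cumprod)
    x_t = torch.randn(cond.size(0), seq_len, token_emb.embedding_dim, device=device)

    for t in reversed(range(T)):
        t_batch = torch.full((x_t.size(0),), t, device=device, dtype=torch.long)
        # Conditional and unconditional passes; conditioning is assumed to be
        # injected by simply adding a context embedding (an illustrative choice).
        x0_cond = denoiser(x_t + cond, t_batch)
        x0_uncond = denoiser(x_t, t_batch)
        x0_hat = x0_uncond + guidance_scale * (x0_cond - x0_uncond)

        a_bar = alphas_cumprod[t]
        a_bar_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0, device=device)
        # DDIM-style deterministic update toward the predicted clean latents.
        eps = (x_t - a_bar.sqrt() * x0_hat) / (1 - a_bar).sqrt()
        x_t = a_bar_prev.sqrt() * x0_hat + (1 - a_bar_prev).sqrt() * eps

    # Round continuous latents back to discrete tokens via maximum inner
    # product with the embedding matrix (a simple rounding rule).
    logits = x_t @ token_emb.weight.T
    return logits.argmax(dim=-1)
```

A call such as tokens = sample(denoiser, token_emb, cond, alphas_cumprod, seq_len=64) would return one guided sample per conditioning example; in practice, accelerated samplers or distilled models replace this naive full-length loop for speed.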

First use: the underlying idea of applying diffusion/score-based methods to text began appearing in the literature around 2021–2022; the specific framing and wider interest in "large language diffusion" approaches grew noticeably in 2023–2024 as latent-diffusion techniques and scaling experiments made diffusion-based text generation more practical.

Key contributors: foundational work on diffusion and score-based generative modeling by Jonathan Ho, Yang Song, Jascha Sohl‑Dickstein, and collaborators established the core algorithms; Rombach et al.'s latent diffusion work helped popularize latent-space diffusion for scalable generation; and subsequent text-focused research (e.g., Diffusion-LM and related papers), along with engineering efforts from Google Research / DeepMind, OpenAI, Stability AI, NVIDIA, and several academic labs, has driven the adaptation, scaling, and evaluation of diffusion methods for large-scale language tasks.
