An AI model that takes a text sequence as input and generates a text sequence as output.
A text-to-text model is a neural network that frames all language tasks as a mapping from one text sequence to another. Rather than treating translation, summarization, classification, or question answering as fundamentally different problems requiring specialized architectures, the text-to-text paradigm unifies them under a single framework: given some input text, produce the appropriate output text. Even classification is performed by literally generating the label word, such as "positive" or "acceptable". This approach was crystallized by Google's T5 (Text-to-Text Transfer Transformer), introduced in 2019, which demonstrated that a single model trained with a consistent input-output format could achieve strong performance across a wide range of NLP benchmarks.
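To make the unified format concrete, the sketch below drives a single public T5 checkpoint across three different tasks, with the task selected purely by a plain-text prefix. It assumes the Hugging Face transformers library (and its sentencepiece dependency) is installed; the prefixes shown are the ones T5 was trained with.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load a small public T5 checkpoint (assumed available from the Hugging Face hub).
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# One model, three tasks: the task is chosen by a textual prefix alone.
prompts = [
    "translate English to German: The house is wonderful.",
    "summarize: Text-to-text models frame every language task as "
    "mapping an input sequence to an output sequence.",
    "cola sentence: The books is on the table.",  # grammaticality judgment
]

for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Note that nothing about the model changes between tasks; only the input string does, which is the whole point of the paradigm.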
The dominant architecture underlying text-to-text models is the Transformer, which uses self-attention to capture long-range dependencies between tokens in a sequence. Classic text-to-text models such as T5 employ an encoder-decoder structure: the encoder reads and contextualizes the input sequence, while the decoder autoregressively generates the output token by token, attending both to the encoded input (via cross-attention) and to previously generated tokens. This design is naturally suited to tasks where input and output lengths differ significantly, such as translation or summarization; decoder-only models such as the GPT family achieve the same text-to-text behavior by concatenating input and output into a single sequence.
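The encode-once, decode-step-by-step loop this describes can be sketched with PyTorch's built-in nn.Transformer module. Everything here is illustrative: the dimensions are toy values, the model is untrained, token id 0 is assumed to be the start symbol, and positional encodings are omitted for brevity.

```python
import torch
import torch.nn as nn

# Toy dimensions; this is an untrained sketch, not a real model.
vocab_size, d_model = 1000, 64
embed = nn.Embedding(vocab_size, d_model)
transformer = nn.Transformer(d_model=d_model, nhead=4,
                             num_encoder_layers=2, num_decoder_layers=2,
                             batch_first=True)
lm_head = nn.Linear(d_model, vocab_size)

src = torch.randint(0, vocab_size, (1, 12))      # input token ids
generated = torch.zeros(1, 1, dtype=torch.long)  # assume id 0 is the BOS token

# Encode the input sequence once.
memory = transformer.encoder(embed(src))

# Decode autoregressively: each step attends to the encoded input
# (cross-attention) and to all previously generated tokens (causal mask).
for _ in range(8):
    tgt = embed(generated)
    causal_mask = transformer.generate_square_subsequent_mask(tgt.size(1))
    out = transformer.decoder(tgt, memory, tgt_mask=causal_mask)
    next_id = lm_head(out[:, -1]).argmax(dim=-1, keepdim=True)  # greedy choice
    generated = torch.cat([generated, next_id], dim=1)

print(generated)  # random ids here, since the model is untrained
```

Because the encoder runs exactly once while the decoder loops for as many steps as needed, input and output lengths are free to differ, which is exactly what translation and summarization require.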
The power of text-to-text models comes largely from pretraining on massive text corpora using self-supervised objectives — such as masked span prediction — followed by fine-tuning on specific downstream tasks. This transfer learning approach allows a single pretrained model to be adapted efficiently to many applications with relatively little labeled data. More recent large language models like GPT-4 and Claude extend this paradigm further, using instruction tuning and reinforcement learning from human feedback to make models responsive to open-ended natural language prompts without task-specific fine-tuning.
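The masked span prediction objective can be made concrete with a toy version of T5-style span corruption: random spans are cut out of the input and replaced with sentinel tokens, and the training target reproduces each missing span after its matching sentinel. Real implementations sample span lengths and operate on token ids; the fixed-length, whitespace-tokenized version below is only a sketch.

```python
import random

def span_corrupt(tokens, span_len=3, n_spans=2):
    """Toy T5-style span corruption with fixed-length, non-overlapping spans.

    Returns (corrupted_input, target): the input has spans replaced by
    sentinels; the target lists each sentinel followed by its dropped span.
    """
    # Candidate starts spaced span_len apart, so sampled spans never overlap.
    candidates = list(range(0, len(tokens) - span_len, span_len))
    starts = sorted(random.sample(candidates, n_spans))
    corrupted, target, cursor = [], [], 0
    for i, start in enumerate(starts):
        sentinel = f"<extra_id_{i}>"  # T5 names its sentinel tokens this way
        corrupted += tokens[cursor:start] + [sentinel]
        target += [sentinel] + tokens[start:start + span_len]
        cursor = start + span_len
    corrupted += tokens[cursor:]
    return corrupted, target

words = "the quick brown fox jumps over the lazy dog near the river".split()
inp, tgt = span_corrupt(words)
print(" ".join(inp))  # input with random spans replaced by sentinels
print(" ".join(tgt))  # each sentinel followed by the span it hides
```

Because both the corrupted input and the reconstruction target are ordinary text sequences, this pretraining objective fits the same text-in, text-out interface as every downstream task.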
Text-to-text models have become the backbone of modern NLP, powering applications from machine translation and document summarization to code generation and conversational assistants. Their unifying framing has simplified model development pipelines and enabled rapid progress across the field. As model scale has increased, emergent capabilities — such as multi-step reasoning and in-context learning — have made text-to-text models central to the broader development of general-purpose AI systems.