Incoherent, semantically meaningless text output produced by a language model.
In AI and natural language processing, "word salad" refers to model outputs that are syntactically garbled, semantically incoherent, or so contextually disconnected that they convey no meaningful information. The term is borrowed from clinical psychology, where it describes fragmented, disorganized speech associated with certain psychiatric conditions, but in the ML context it specifically characterizes failure modes of generative language systems. Word salad outputs may superficially resemble natural language — containing real words and partial grammatical structures — yet fail to communicate any coherent idea or intent.
Word salad typically emerges from several underlying causes. Early rule-based NLP systems could produce it when template logic broke down or when input fell outside expected patterns. In neural language models, it can result from insufficient training data, poor sampling strategies (such as very high temperature settings that flatten the probability distribution over tokens), or model collapse during training. Adversarial inputs and prompt injection attacks can also deliberately induce word salad as a way to destabilize model outputs. The phenomenon became a prominent evaluation concern as large language models like GPT-2 and GPT-3 demonstrated that scale alone did not guarantee coherence under all conditions.
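The temperature effect mentioned above can be made concrete with a minimal sketch. The example below (with hypothetical logit values chosen for illustration) applies temperature-scaled softmax to a small vocabulary: at low temperature the probability mass concentrates on the top token, while at high temperature the distribution flattens toward uniform, so sampling is far more likely to pick implausible tokens and drift into word salad.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to a probability distribution at a given temperature.

    Dividing logits by a large temperature shrinks the gaps between them,
    so the resulting softmax distribution flattens and low-quality tokens
    gain probability mass.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits for a toy 4-token vocabulary.
logits = [4.0, 2.0, 0.5, -1.0]

low = softmax_with_temperature(logits, 0.5)   # sharp: mass on the top token
high = softmax_with_temperature(logits, 5.0)  # flat: near-uniform, salad-prone

print(max(low), max(high))
```

With these values the top token receives over 90% of the mass at temperature 0.5 but well under half at temperature 5.0, which is why very high temperatures are a common cause of incoherent sampling.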
Understanding and measuring word salad is important for evaluating language model quality and safety. Metrics such as perplexity, BERTScore, and human coherence ratings are commonly used to detect incoherent outputs, though no single automated metric perfectly captures the full range of failure modes. Reducing word salad has driven advances in decoding strategies — including beam search, nucleus sampling, and repetition penalties — as well as improvements in fine-tuning and reinforcement learning from human feedback (RLHF). As language models are deployed in high-stakes applications like medical documentation or legal drafting, the ability to reliably avoid incoherent outputs has become a core reliability and trustworthiness requirement.