The degree to which AI-generated content is anchored to verifiable, real-world knowledge.
Groundedness refers to the degree to which a language model's outputs are anchored to factual, verifiable information rather than fabricated or hallucinated content. In practice, a grounded model produces responses that can be traced back to a source document, knowledge base, or observable reality. This stands in contrast to models that generate plausible-sounding but unsupported claims — a failure mode commonly called hallucination. Groundedness has become a central evaluation criterion as large language models are deployed in high-stakes domains like medicine, law, and enterprise search, where factual accuracy is non-negotiable.
Achieving groundedness typically involves architectural and training choices that tether generation to retrieved evidence. Retrieval-augmented generation (RAG) is one of the most widely adopted approaches: rather than relying solely on parametric knowledge baked into model weights, the system retrieves relevant documents at inference time and conditions its output on that retrieved context. Other techniques include fine-tuning models on citation-producing tasks, applying faithfulness constraints during decoding, and using natural language inference (NLI) classifiers to verify that generated claims are entailed by source material.
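To make the RAG pattern concrete, the sketch below retrieves the best-matching passage from a toy corpus and conditions generation on it. Everything here is illustrative: the corpus, the prompt template, the TF-IDF retriever (standing in for the dense vector search production systems typically use), and the `generate()` stub (standing in for any LLM API) are assumptions, not a specific system's implementation.

```python
# Minimal retrieval-augmented generation sketch (illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy knowledge base standing in for an indexed document store.
corpus = [
    "The Eiffel Tower is 330 metres tall and located in Paris.",
    "Photosynthesis converts light energy into chemical energy in plants.",
    "The Great Barrier Reef is the world's largest coral reef system.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank corpus documents by TF-IDF cosine similarity to the query."""
    vectors = TfidfVectorizer().fit_transform(corpus + [query])
    scores = cosine_similarity(vectors[-1], vectors[:-1])[0]
    ranked = sorted(zip(scores, corpus), reverse=True)
    return [doc for _, doc in ranked[:k]]

def generate(prompt: str) -> str:
    """Hypothetical stand-in for a call to any language model API."""
    return f"[model output conditioned on: {prompt!r}]"

query = "How tall is the Eiffel Tower?"
context = "\n".join(retrieve(query))
# Conditioning on retrieved evidence, rather than parametric memory
# alone, is what makes the answer traceable to a citable source.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(generate(prompt))
```

A stricter pipeline would add a post-hoc step in the spirit of the NLI verifiers mentioned above, rejecting any output sentence that the retrieved context does not entail.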
Groundedness is closely related to but distinct from factuality and faithfulness. Factuality concerns whether a statement is true in the world; faithfulness concerns whether a summary or answer accurately reflects its source document; groundedness combines elements of both by requiring that outputs be explicitly derivable from some provided context. Evaluation methods range from human annotation, where raters check whether each claim in a response is supported by a cited source, to automated metrics like FactScore, which decomposes outputs into atomic claims and verifies each against a knowledge base.
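To illustrate the automated end of that spectrum, here is a FactScore-style scorer in miniature. The naive sentence split and token-overlap test are hypothetical stand-ins for the LLM-based claim decomposition and retrieval-backed NLI verification the actual metric uses; only the overall shape (decompose into atomic claims, verify each, report the supported fraction) is the point.

```python
# Toy FactScore-style groundedness scorer (illustrative simplifications).
import re

# Stand-in knowledge base; a real scorer would query a retrieval index.
knowledge_base = [
    "the eiffel tower is 330 metres tall",
    "the eiffel tower is located in paris",
]

def atomic_claims(response: str) -> list[str]:
    """Naive decomposition: one claim per sentence."""
    return [s.strip() for s in re.split(r"[.!?]", response) if s.strip()]

def is_supported(claim: str, threshold: float = 0.6) -> bool:
    """Crude verifier: a claim counts as grounded if enough of its
    tokens appear in some source (a real system would use NLI)."""
    tokens = set(claim.lower().split())
    return any(
        len(tokens & set(fact.split())) / len(tokens) >= threshold
        for fact in knowledge_base
    )

response = "The Eiffel Tower is 330 metres tall. It was painted gold in 1930."
claims = atomic_claims(response)
supported = sum(is_supported(c) for c in claims)
# Groundedness score = fraction of atomic claims backed by a source.
print(f"groundedness: {supported}/{len(claims)} claims supported")
```

Running this prints `groundedness: 1/2 claims supported`: the height claim is backed by a source while the invented painting claim is not, which mirrors how the decompose-and-verify scheme localizes exactly which parts of a response are ungrounded.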
The concept gained particular urgency around 2021–2022, as models like GPT-3 demonstrated impressive fluency alongside a troubling tendency to confabulate. Enterprises deploying conversational AI quickly discovered that ungrounded outputs eroded user trust and created liability risks. This drove significant investment in grounding techniques across both academic research and industry, making groundedness one of the defining reliability challenges of the current generation of large language models.