The subfield of AI enabling computers to understand, process, and generate human language.
Natural Language Processing (NLP) is the branch of artificial intelligence concerned with enabling computers to work meaningfully with human language in written or spoken form. It draws on computational linguistics, statistics, and machine learning to handle tasks spanning the full complexity of language: tokenization and parsing at the syntactic level, meaning extraction and coreference resolution at the semantic level, and discourse understanding and pragmatic inference at higher levels. The result is a broad toolkit that powers applications including machine translation, sentiment analysis, information retrieval, question answering, text summarization, and conversational agents.
Modern NLP is dominated by neural approaches, particularly transformer-based language models such as BERT, GPT, and their descendants. These models are pretrained on massive text corpora using self-supervised objectives—predicting masked tokens or next tokens—and then fine-tuned for specific downstream tasks. This pretraining paradigm dramatically improved performance across virtually every NLP benchmark and shifted the field away from hand-engineered feature pipelines toward general-purpose representations that transfer well across domains and languages. Large language models (LLMs) have pushed this further, demonstrating emergent capabilities in reasoning, code generation, and instruction following that were not explicitly trained.
NLP matters because language is the primary medium through which humans record knowledge, communicate intent, and interact with systems. The ability to process language at scale unlocks enormous practical value—from automating document review and customer support to enabling accessibility tools and scientific literature mining. It also raises significant challenges around bias, factual accuracy, and safety, since models trained on internet-scale text absorb the full spectrum of human expression, including harmful content and misinformation. As NLP systems become more capable and widely deployed, understanding their limitations and failure modes is as important as advancing their performance.