A machine learning approach using multi-layered neural networks to model complex data patterns.
Deep learning is a subfield of machine learning that trains artificial neural networks with many successive layers of computation to learn hierarchical representations of data. Rather than relying on hand-crafted features, deep networks automatically discover the structures needed to solve a task by transforming raw inputs — pixels, words, audio samples — through a cascade of learned transformations. Each layer extracts increasingly abstract features: early layers in an image model might detect edges, middle layers recognize shapes, and deeper layers identify objects. This end-to-end learning from raw data is what distinguishes deep learning from classical machine learning pipelines.
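The cascade of learned transformations described above can be sketched in a few lines of NumPy. This is a toy illustration with made-up names and random (untrained) weights, not a real model: each matrix multiply followed by a nonlinearity plays the role of one layer, turning the raw input vector into progressively more abstract feature vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Elementwise nonlinearity; without it, stacked layers would
    # collapse into a single linear transformation.
    return np.maximum(0.0, x)

x = rng.normal(size=(4,))      # raw input (e.g. a few pixel values)
W1 = rng.normal(size=(8, 4))   # layer 1 weights: low-level features
W2 = rng.normal(size=(3, 8))   # layer 2 weights: composes layer-1 features

h = relu(W1 @ x)               # layer 1: learned transformation of raw input
y = W2 @ h                     # layer 2: more abstract representation

print(y.shape)                 # prints (3,)
```

In a trained network, `W1` and `W2` are not random but are adjusted by gradient descent so that the final representation solves the task at hand.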
The mechanics rely on backpropagation, an algorithm that computes how much each parameter in the network contributed to the output error and adjusts weights accordingly via gradient descent. Modern deep learning also depends critically on hardware acceleration (GPUs and TPUs), large labeled datasets, and architectural innovations such as convolutional neural networks (CNNs) for spatial data, recurrent networks and transformers for sequential data, and residual connections that allow training of networks hundreds of layers deep. Regularization techniques such as dropout help these high-capacity models avoid overfitting, while batch normalization stabilizes and accelerates training.
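The backpropagation-plus-gradient-descent loop can be shown concretely on the smallest possible case: a single linear unit fitting a line. This is an illustrative sketch with hand-derived gradients (real frameworks compute them automatically); all names are invented for the example.

```python
import numpy as np

# Toy data: the unit should learn to approximate y = 2x + 1.
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = 2.0 * xs + 1.0

w, b = 0.0, 0.0   # parameters, initialized at zero
lr = 0.05         # learning rate (step size for gradient descent)

for _ in range(500):
    pred = w * xs + b                  # forward pass
    err = pred - ys                    # residuals
    # "Backward pass": gradients of mean squared error w.r.t. w and b,
    # i.e. how much each parameter contributed to the error.
    grad_w = 2.0 * np.mean(err * xs)
    grad_b = 2.0 * np.mean(err)
    # Gradient descent update: move each parameter against its gradient.
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))
```

After training, `w` and `b` converge close to the true values 2 and 1. A deep network does exactly this, but with millions of parameters and with backpropagation applying the chain rule through every layer instead of a hand-derived formula.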
Deep learning became the dominant paradigm in AI after 2012, when a deep convolutional network (AlexNet) dramatically outperformed all competitors on the ImageNet image classification benchmark. Since then, the approach has achieved human-level or superhuman performance across a remarkable range of tasks: speech recognition, machine translation, protein structure prediction, game playing, and generative modeling of images, text, and audio. Large language models like GPT and multimodal systems like CLIP are direct products of scaling deep learning to massive datasets and parameter counts.
The significance of deep learning lies in its generality and scalability. A single framework — differentiable computation graphs trained by gradient descent — has proven capable of tackling problems that previously required decades of domain-specific engineering. This has accelerated progress across science, medicine, and industry, while also raising important questions about interpretability, bias, energy consumption, and the societal implications of increasingly capable AI systems.