Layered computational models that learn from data by adjusting weighted connections.
Artificial Neural Networks (ANNs) are computational systems loosely inspired by the structure of biological brains, built from interconnected nodes organized into layers. A typical ANN consists of an input layer that receives raw data, one or more hidden layers that transform that data through learned representations, and an output layer that produces predictions or classifications. Each connection between nodes carries a numerical weight, and each node applies an activation function to its inputs before passing signals forward through the network.
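The forward flow described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: the layer sizes (3 inputs, 4 hidden units, 2 outputs), the random weight initialization, and the choice of ReLU and softmax activations are all assumptions made for the example.

```python
import numpy as np

# Minimal sketch of a forward pass through one hidden layer.
# Layer sizes and activation choices here are illustrative assumptions.
rng = np.random.default_rng(42)

x = rng.normal(size=(3,))             # one input example with 3 features

W1 = rng.normal(size=(3, 4))          # weights: input layer -> hidden layer
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 2))          # weights: hidden layer -> output layer
b2 = np.zeros(2)

hidden = np.maximum(0.0, x @ W1 + b1)          # ReLU activation at hidden nodes
logits = hidden @ W2 + b2                      # weighted sums at output nodes
probs = np.exp(logits) / np.exp(logits).sum()  # softmax: turn outputs into probabilities

print(probs)  # two values summing to 1, e.g. a two-class prediction
```

Each matrix multiplication applies the connection weights of one layer, and each activation function decides how strongly a node passes its signal onward.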
Training an ANN involves feeding it labeled examples and iteratively adjusting the connection weights to minimize the difference between the network's predictions and the correct answers. This is accomplished through backpropagation, an algorithm that computes how much each weight contributed to the prediction error; gradient descent then nudges each weight in the direction that reduces that error. Over many training iterations, the network learns internal representations that capture meaningful patterns in the data, allowing it to generalize to new, unseen examples.
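The loop below sketches this procedure on the classic XOR problem, assuming a tiny 2-4-1 network, sigmoid activations, a mean squared error loss, and an arbitrary learning rate and iteration count chosen for the example.

```python
import numpy as np

# Illustrative training loop: backpropagation plus plain gradient descent
# on XOR. Network size, learning rate, and step count are assumptions.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # target labels

W1 = rng.normal(size=(2, 4)); b1 = np.zeros((1, 4))          # input -> hidden
W2 = rng.normal(size=(4, 1)); b2 = np.zeros((1, 1))          # hidden -> output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for step in range(20_000):
    # Forward pass: compute predictions with the current weights.
    h = sigmoid(X @ W1 + b1)          # hidden-layer activations
    y_hat = sigmoid(h @ W2 + b2)      # network predictions

    # Backward pass: how much each weight contributed to the error.
    err = y_hat - y
    d_out = err * y_hat * (1 - y_hat)             # error signal at output layer
    d_hid = (d_out @ W2.T) * h * (1 - h)          # error signal at hidden layer

    # Gradient descent: move each weight against its (mean) gradient.
    W2 -= lr * h.T @ d_out / len(X)
    b2 -= lr * d_out.mean(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_hid / len(X)
    b1 -= lr * d_hid.mean(axis=0, keepdims=True)

# Predictions should move toward the targets [0, 1, 1, 0].
print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))
```

The backward pass propagates the output error back through the same weights used in the forward pass, which is why the whole procedure is called backpropagation.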
ANNs became practically significant in machine learning during the 1980s when backpropagation was popularized as a training method, and they experienced a dramatic resurgence in the 2010s as deep learning — the use of ANNs with many hidden layers — achieved breakthrough results in image recognition, speech processing, and natural language understanding. The combination of large datasets, powerful GPUs, and architectural innovations transformed ANNs from a niche research topic into the dominant paradigm in modern AI.
Today, ANNs underpin virtually every major AI application, from recommendation systems and medical diagnosis to autonomous vehicles and large language models. Their ability to learn complex, nonlinear mappings from raw, unstructured data without requiring hand-engineered features makes them extraordinarily versatile. Understanding ANNs is foundational to understanding nearly all of contemporary machine learning, as most advanced architectures — convolutional networks, recurrent networks, transformers — are specialized variants built on the same core principles.