A function that determines whether a neuron fires, introducing non-linearity into a neural network.
An activation function is a mathematical function applied at each neuron in a neural network. It takes the weighted sum of the neuron's inputs (usually plus a bias term) and transforms it into the neuron's output signal, determining how strongly, if at all, the neuron fires and so how much it contributes to the network's final computation.
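The computation described above can be sketched in a few lines of Python. This is a minimal illustration rather than how any particular framework implements it; the function names, the bias term, and the input and weight values are all assumptions chosen for the example.

```python
import math

def sigmoid(z):
    """Squash z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Pass positive values through unchanged; clamp negatives to zero."""
    return max(0.0, z)

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias  # pre-activation
    return activation(z)

# Illustrative values only (not from any real model).
x = [1.0, -2.0, 0.5]
w = [0.4, 0.3, -0.2]
b = 0.1

print(neuron(x, w, b, relu))     # pre-activation is negative, so ReLU gives 0.0
print(neuron(x, w, b, sigmoid))  # a graded value strictly between 0 and 1
```

The same weighted sum yields a hard zero under ReLU and a small but non-zero signal under sigmoid, which is why "firing" is better read as a matter of degree than as an on/off switch.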
Without activation functions, a neural network would simply compute a chain of linear operations. Stacking linear transformations, no matter how many layers deep, amounts to a single linear transformation, which cannot capture the complex, non-linear patterns involved in tasks like visual recognition, language understanding, or strategic reasoning. Activation functions inject the non-linearity that makes sufficiently large networks universal function approximators, able in principle to approximate any continuous function on a bounded domain to arbitrary accuracy.
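The collapse of stacked linear layers can be checked directly. The sketch below uses two small hand-picked weight matrices (hypothetical values, with biases omitted for brevity) and compares a two-layer linear network, its single-matrix equivalent, and the same network with a ReLU between the layers.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def relu_vec(v):
    """Apply ReLU element-wise."""
    return [max(0.0, t) for t in v]

# Hypothetical weights chosen so the effect is easy to see by hand.
W1 = [[1.0, -1.0], [-1.0, 1.0]]
W2 = [[1.0, 1.0]]
W_combined = matmul(W2, W1)  # one matrix equivalent to both linear layers

def linear_net(x):
    return matvec(W2, matvec(W1, x))

def collapsed_net(x):
    return matvec(W_combined, x)

def relu_net(x):
    return matvec(W2, relu_vec(matvec(W1, x)))

x = [2.0, 0.0]
print(linear_net(x), collapsed_net(x))  # identical: two linear layers are one
print(relu_net(x))                      # differs once ReLU sits between layers

# A linear map must satisfy f(x + y) = f(x) + f(y); the ReLU network does not.
print(relu_net([2.0, 2.0]), relu_net([2.0, 0.0])[0] + relu_net([0.0, 2.0])[0])
```

With these particular weights the ReLU network computes |a - b| for an input [a, b], a function no single linear layer can represent, while the purely linear version maps every input to zero.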
Different activation functions carry different tradeoffs. The Rectified Linear Unit (ReLU), long the dominant default, is computationally cheap and avoids vanishing gradients for positive inputs, but can suffer from "dying" neurons whose output gets stuck at zero, so they receive no gradient and stop learning. Sigmoid and tanh squash outputs to bounded ranges, (0, 1) and (-1, 1) respectively. Sigmoid is useful for probability outputs, but both functions saturate for large-magnitude inputs, and the resulting near-zero gradients cause vanishing gradient problems in deep networks. Modern architectures increasingly use smooth alternatives such as the Gaussian Error Linear Unit (GELU), common in Transformers, or Swish, which have performed well in large-scale gradient-based training. Some research explores learned or programmatic activation functions as an alternative to hand-designed formulas.
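The functions named above, and the gradient behaviour behind their tradeoffs, can be written out as follows. This is an illustrative sketch using only Python's standard library: GELU is given in its exact form via the error function, Swish is shown with its scaling parameter fixed at 1 (the variant also known as SiLU), and the sample inputs and the 10-layer depth are arbitrary choices for demonstration.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    return math.tanh(z)

def gelu(z):
    """Exact GELU: z times the standard normal CDF evaluated at z."""
    return 0.5 * z * (1.0 + math.erf(z / math.sqrt(2.0)))

def swish(z):
    """Swish with beta = 1 (also called SiLU)."""
    return z * sigmoid(z)

# Derivatives, which control how much gradient flows back through a neuron.
def relu_grad(z):
    return 1.0 if z > 0 else 0.0

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh_grad(z):
    return 1.0 - math.tanh(z) ** 2

for z in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(z, relu_grad(z), sigmoid_grad(z), tanh_grad(z))

# Sigmoid's derivative never exceeds 0.25, so a chain of 10 sigmoid layers
# scales the gradient by at most 0.25 ** 10 (weights aside).
print(0.25 ** 10)
```

At an input of 10 the sigmoid and tanh derivatives are close to zero (saturation), while ReLU's derivative is exactly 1. At negative inputs ReLU's derivative is exactly 0, which is the mechanism behind dying neurons.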
Fundamentally, activation functions remain a hand-designed component rather than something fully derived from theory. Automated search, in the style of AutoML and neural architecture search, has found task-specific activations that outperform hand-designed ones, suggesting that better activations may remain undiscovered. Open questions include whether learned activations will displace hand-designed functions, whether certain activation functions are better suited to specific network architectures or depths, and whether biological neurons employ activation-like functions that could inspire more efficient artificial ones.