A graph neural network that uses attention to dynamically weight neighbor node contributions.
A Graph Attention Network (GAT) is a type of graph neural network (GNN) that incorporates attention mechanisms to learn how much importance each neighboring node should receive when aggregating information. Rather than treating all neighbors equally—as simpler GNN variants do—GATs compute learned attention coefficients for each edge, allowing the model to focus on the most relevant connections for a given task. This makes GATs particularly well-suited to graphs where node relationships vary significantly in importance, such as social networks, citation graphs, and molecular structures.
The core mechanism works by having each node compute a compatibility score between its own feature vector and those of its neighbors; in the original formulation this is a single-layer feedforward network with a LeakyReLU activation, applied to the concatenation of the two (linearly projected) feature vectors. These scores are normalized via a softmax over each node's neighborhood to produce attention weights, which then determine how neighbor features are combined into an updated node representation. GATs also support multi-head attention—running several independent attention functions in parallel and concatenating or averaging their outputs—which stabilizes training and allows the model to capture diverse relational patterns simultaneously.
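The mechanism above can be sketched as a minimal single-head layer in NumPy. This is an illustrative sketch, not a production implementation: the function name `gat_layer` and the dense-adjacency representation are assumptions for clarity, and real implementations vectorize over edges and add dropout and nonlinearities.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def gat_layer(H, A, W, a, slope=0.2):
    """Minimal single-head GAT layer (illustrative sketch).
    H: (N, F) node features; A: (N, N) adjacency with self-loops;
    W: (F, F') shared projection; a: (2F',) attention vector."""
    Z = H @ W                                   # project all node features
    out = np.zeros_like(Z)
    for i in range(Z.shape[0]):
        nbrs = np.where(A[i] > 0)[0]            # neighborhood of node i
        # compatibility score e_ij = LeakyReLU(a^T [z_i || z_j])
        e = np.array([a @ np.concatenate([Z[i], Z[j]]) for j in nbrs])
        e = np.where(e > 0, e, slope * e)       # LeakyReLU
        alpha = softmax(e)                      # attention weights, sum to 1
        out[i] = alpha @ Z[nbrs]                # weighted neighbor aggregation
    return out
```

A multi-head variant would run this function with several independent `(W, a)` pairs and concatenate (or average) the resulting outputs per node.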
GATs handle several practical challenges that earlier graph convolutional approaches struggled with. Because attention weights are computed locally from node features rather than from fixed graph structure, GATs generalize naturally to unseen nodes and graphs during inference, making them effective for inductive learning tasks. They also adapt gracefully to nodes with varying numbers of neighbors, since the softmax normalization accounts for degree differences without requiring explicit normalization of the adjacency matrix.
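The inductive property follows from the parameterization: the layer's weights are sized by feature dimension, not by graph size, so the same parameters score any neighborhood, of any degree. A small sketch (the helper name `attend` and the random features are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
F_out = 3
a = rng.normal(size=2 * F_out)                  # shared attention vector

def attend(z_i, Z_nbrs, slope=0.2):
    # scores depend only on node features, never on graph structure or size
    e = np.array([a @ np.concatenate([z_i, z_j]) for z_j in Z_nbrs])
    e = np.where(e > 0, e, slope * e)           # LeakyReLU
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()                        # softmax normalizes any degree
    return alpha @ Z_nbrs

# the same parameters handle a node with 2 neighbors and one with 5
h2 = attend(rng.normal(size=F_out), rng.normal(size=(2, F_out)))
h5 = attend(rng.normal(size=F_out), rng.normal(size=(5, F_out)))
```

Both outputs have the same shape, which is why a GAT trained on one graph can be applied unchanged to unseen nodes or entirely new graphs at inference time.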
In practice, GATs have demonstrated strong performance on node classification benchmarks such as Cora, Citeseer, and the PPI (protein–protein interaction) dataset, and have been extended to link prediction and graph-level classification tasks. Their interpretability is a notable advantage: the learned attention weights can be inspected to understand which edges the model considers most informative, offering a degree of transparency uncommon in many deep learning architectures. Since their introduction, GATs have become a foundational building block in graph-based deep learning research and applications.