An explicit memory subsystem enabling neural networks to store and retrieve information persistently.
A neural long-term memory module is an explicit, addressable storage component—typically external to the core network—that allows models to write, retain, and retrieve information across sequences, episodes, or training runs without encoding everything into fixed weights. Unlike transient hidden states or context windows, these modules maintain persistent representations that can be selectively updated and queried, making them fundamentally different from the implicit memory encoded in a network's parameters. Architecturally, they range from associative key-value stores and differentiable external memories with read/write heads to sparse retrieval indices, and are integrated with controller networks—RNNs or transformers—trained end-to-end so that memory access becomes a learned attention or lookup operation.
The mechanics of reading and writing to these modules vary considerably. Content-based addressing retrieves entries by similarity to a query vector, while location-based addressing uses explicit indices. Differentiable designs allow gradients to flow through retrieval, enabling end-to-end training, whereas sparse non-differentiable retrieval trades gradient flow for scalability and speed. Key design challenges include memory eviction and compression policies, stability-plasticity trade-offs that prevent new writes from overwriting critical old information, and scaling storage to large knowledge bases without prohibitive computational cost.
Long-term memory modules matter broadly in machine learning. They extend effective context beyond a model's nominal receptive field, support episodic recall and rapid task adaptation without catastrophic forgetting, and underpin retrieval-augmented generation (RAG) systems that ground language model outputs in external knowledge stores. In reinforcement learning, explicit memory enables agents to recall past experiences and reason over longer horizons than recurrent states alone permit. These capabilities address some of the most persistent limitations of standard neural architectures: brittleness to distribution shift, inability to update knowledge without retraining, and degradation over long input sequences.
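The retrieval step that grounds a RAG system can be sketched as a nearest-neighbor lookup over precomputed document vectors. Everything here is a toy assumption: production systems use learned embedding models and approximate nearest-neighbor indices rather than exact cosine search, and the prompt format below is purely illustrative.

```python
import numpy as np

def top_k_retrieve(query_vec, doc_vecs, k=2):
    """Indices of the k documents most similar to the query (cosine).

    Toy sketch of the retrieval stage in RAG; `query_vec` and `doc_vecs`
    are assumed to come from some embedding model.
    """
    q = query_vec / (np.linalg.norm(query_vec) + 1e-8)
    d = doc_vecs / (np.linalg.norm(doc_vecs, axis=1, keepdims=True) + 1e-8)
    scores = d @ q
    return np.argsort(-scores)[:k]

def build_grounded_prompt(question, documents, doc_vecs, query_vec, k=2):
    # Prepend the retrieved passages so a generator model can condition
    # its answer on external knowledge rather than parameters alone.
    idx = top_k_retrieve(query_vec, doc_vecs, k)
    context = "\n".join(documents[i] for i in idx)
    return f"Context:\n{context}\n\nQuestion: {question}"
```

Separating storage (the document index) from generation is exactly the knowledge-update property the paragraph above describes: swapping documents in or out changes the model's effective knowledge without any retraining.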
The concept gained concrete traction in machine learning around 2014 with Neural Turing Machines and Memory Networks, which demonstrated that differentiable external memory could be trained end-to-end for algorithmic and question-answering tasks. Subsequent work—including Differentiable Neural Computers, Transformer-XL, and retrieval-augmented generation frameworks—expanded the paradigm from toy tasks to large-scale, knowledge-intensive applications, cementing long-term memory modules as a central tool for building more capable and adaptable AI systems.