C2C (Cache to Cache)

C2C
Cache to Cache

A mechanism in coherent shared‑memory systems that transfers cache lines directly between processor caches to satisfy remote memory requests without accessing main memory.

Cache‑to‑cache (C2C) transfer denotes the hardware capability, and the associated protocol behaviors, that allow one core’s cache controller to supply a cache line directly to another core’s cache in response to a read or write (read‑for‑ownership) request, rather than having the line fetched from DRAM. The supplying cache usually holds the line in a dirty state, in which case the copy in memory is stale; some protocols also forward clean shared copies. C2C transfer is realized within cache‑coherence protocols (e.g., MESI/MOESI and derivatives such as MESIF, whose Owned and Forward states designate which cache responds) and can be implemented in both snoop‑based and directory‑based coherence architectures. It reduces latency and memory bandwidth consumption for accesses to lines held in another cache, but shifts that traffic onto the on‑chip/off‑chip interconnect and coherence fabric.

For AI and ML workloads, where large models and parameter exchanges produce frequent accesses to data held in other caches, efficient C2C transfers reduce stalls in parallel training and inference, improve effective bandwidth for shared parameter buffers, and lower the per‑miss cost of false sharing (although they do not remove the coherence traffic that false sharing generates). They do, however, require careful hardware/software design (e.g., data placement, partitioning, coherence policy tuning, and awareness of manycore accelerator memory hierarchies) to avoid saturating interconnects or exacerbating contention.
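To make the mechanism concrete, here is a minimal Python sketch of how a snoop‑based, MOESI‑style protocol might decide between a cache‑to‑cache transfer and a memory read. It assumes a deliberately simplified model: a single line‑granular bus, no timing, and only a cache holding the line in the M or O state supplies data (real designs differ; MESIF's F state, for instance, lets a clean copy respond). The class name `SnoopBus` and its methods are hypothetical, chosen for illustration only.

```python
# Simplified MOESI-style snooping model (hypothetical; not any vendor's protocol).
M, O, E, S, I = "M", "O", "E", "S", "I"


class SnoopBus:
    def __init__(self, num_caches: int):
        self.num_caches = num_caches
        self.states = [{} for _ in range(num_caches)]  # per cache: line address -> state
        self.c2c_transfers = 0  # misses served by another cache
        self.memory_reads = 0   # misses served by DRAM

    def state(self, cache: int, line: int) -> str:
        return self.states[cache].get(line, I)

    def read(self, cache: int, line: int) -> str:
        """Handle a load; return where the data came from ("hit", "cache", "memory")."""
        if self.state(cache, line) != I:
            return "hit"
        others = [c for c in range(self.num_caches) if c != cache]
        owner = next((c for c in others if self.state(c, line) in (M, O)), None)
        sharers = [c for c in others if self.state(c, line) != I]
        if owner is not None:
            # C2C transfer: the dirty owner supplies the line and keeps
            # responsibility for the eventual writeback (M -> O).
            self.states[owner][line] = O
            self.c2c_transfers += 1
            source = "cache"
        else:
            # No dirty copy anywhere: memory supplies the line.
            self.memory_reads += 1
            source = "memory"
        for c in sharers:
            if self.state(c, line) == E:
                self.states[c][line] = S  # exclusive copy becomes shared
        self.states[cache][line] = S if sharers else E
        return source

    def write(self, cache: int, line: int) -> str:
        """Handle a store: fetch the line if absent, then invalidate all other copies."""
        source = "hit"
        if self.state(cache, line) == I:
            # Read-for-ownership: data may come from another cache (C2C) or memory.
            source = self.read(cache, line)
        for c in range(self.num_caches):
            if c != cache:
                self.states[c].pop(line, None)  # other copies -> I
        self.states[cache][line] = M  # writer now holds the only (dirty) copy
        return source


# False-sharing style ping-pong: two cores repeatedly write the same line.
bus = SnoopBus(num_caches=2)
print(bus.write(0, 0x40))  # memory
print(bus.read(1, 0x40))   # cache
for _ in range(3):
    bus.write(1, 0x40)
    bus.write(0, 0x40)
print(bus.c2c_transfers, bus.memory_reads)  # 6 1
```

In this toy run, only the first miss touches memory; every later miss is served cache‑to‑cache. That illustrates both sides of the trade‑off described above: DRAM is avoided, but each write still moves the whole line across the interconnect.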

First used: conceptually in cache‑coherence research of the 1970s; gained widespread practical importance in the 1990s with commercial multiprocessors and multi‑core CPUs, and saw renewed focus in the 2010s onward as many‑core accelerators and large‑scale ML workloads made on‑chip data movement efficiency critical.