
Envisioning is an emerging technology research institute and advisory.


C2C (Cache to Cache)

Hardware mechanism transferring cache lines directly between processor caches without accessing main memory.

Year: 2012 · Generality: 101

Cache-to-cache (C2C) transfer is a hardware capability in coherent shared-memory systems that allows one processor's cache controller to supply a cache line directly to another processor's cache in response to a read or write request, bypassing main memory entirely. Rather than routing the request down to DRAM and back, the owning cache intercepts the miss and forwards the data laterally across the interconnect fabric. This behavior is governed by cache-coherence protocols such as MESI, MOESI, and their derivatives, and can be implemented in both snoop-based and directory-based coherence architectures.

The mechanism works by detecting, during a cache miss, that another cache already holds the requested line in a modified or shared state. The coherence protocol coordinates the transfer: the supplying cache sends the line directly to the requesting cache, updates its own state, and optionally writes back to memory depending on the protocol variant. This reduces round-trip latency compared to a full DRAM fetch and conserves memory bandwidth, though it shifts traffic onto the on-chip or off-chip interconnect and can introduce contention on the coherence fabric under high-concurrency workloads.
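The transfer described above can be sketched in miniature. The following is an illustrative model, not any vendor's implementation: a snoop-based bus with two caches under MESI-like states, where a read miss is served cache-to-cache by the core holding the line in Modified state. All class and method names are invented for this sketch; the write-back-on-transfer step is the plain-MESI variant (MOESI would keep the dirty line in an Owned state instead).

```python
# Illustrative sketch of a cache-to-cache transfer under a MESI-like
# protocol. Names (Cache, Bus) are invented for this example.

MODIFIED, EXCLUSIVE, SHARED, INVALID = "M", "E", "S", "I"

class Cache:
    def __init__(self, name):
        self.name = name
        self.lines = {}  # addr -> (state, data)

    def state(self, addr):
        return self.lines.get(addr, (INVALID, None))[0]

class Bus:
    """Snoop-based interconnect: broadcasts a miss to all other caches."""
    def __init__(self, caches, memory):
        self.caches = caches
        self.memory = memory  # addr -> data, standing in for DRAM

    def read(self, requester, addr):
        # Snoop phase: does another cache already hold the line?
        for c in self.caches:
            if c is requester:
                continue
            st, data = c.lines.get(addr, (INVALID, None))
            if st in (MODIFIED, EXCLUSIVE, SHARED):
                if st == MODIFIED:
                    # Plain MESI writes the dirty line back to memory as
                    # part of the transfer; MOESI would skip this step.
                    self.memory[addr] = data
                # Supplier and requester both end in Shared state;
                # the data moves laterally, never touching DRAM latency.
                c.lines[addr] = (SHARED, data)
                requester.lines[addr] = (SHARED, data)
                return data, "cache-to-cache"
        # No cached copy anywhere: fall back to a DRAM fetch, and the
        # requester gets the line Exclusive.
        data = self.memory.get(addr)
        requester.lines[addr] = (EXCLUSIVE, data)
        return data, "memory"

# Core A has written the line; core B's read miss is served by A's cache.
mem = {0x40: "stale"}
a, b = Cache("A"), Cache("B")
bus = Bus([a, b], mem)
a.lines[0x40] = (MODIFIED, "fresh")
data, source = bus.read(b, 0x40)  # -> "fresh", "cache-to-cache"
```

After the transfer, both caches hold the line Shared and memory has been brought up to date, matching the protocol coordination described above.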

For AI and machine learning workloads, C2C efficiency has become increasingly important as models grow larger and parallel training across many cores or accelerator tiles becomes standard. Frequent parameter reads, gradient accumulations, and shared activation buffers generate dense patterns of remote cache accesses. Effective C2C transfers reduce stalls during these operations, improve effective bandwidth for shared data structures, and help mitigate the performance cost of false sharing. Conversely, poorly managed data placement or coherence policy mismatches can saturate interconnects and negate the benefits.
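The false-sharing cost mentioned above comes from unrelated per-thread data landing on the same cache line, so every write by one core triggers a coherence transfer on the others. A common mitigation is padding hot fields out to cache-line boundaries. The sketch below (assuming a 64-byte line, which is typical but not universal) uses `ctypes` struct layouts to show the difference; the struct names are invented for illustration.

```python
# Illustrative sketch: two per-thread counters packed into one cache
# line (false sharing) versus padded onto separate lines. Assumes a
# 64-byte cache line; real hardware may differ.
import ctypes

CACHE_LINE = 64  # assumed line size

class Packed(ctypes.Structure):
    # Both counters fall within the same 64-byte line, so independent
    # writes by two threads still ping-pong the line between caches.
    _fields_ = [("a", ctypes.c_uint64), ("b", ctypes.c_uint64)]

class Padded(ctypes.Structure):
    # Padding pushes the second counter onto its own line: wasted
    # bytes traded for the absence of coherence traffic.
    _fields_ = [
        ("a", ctypes.c_uint64),
        ("_pad", ctypes.c_ubyte * (CACHE_LINE - 8)),
        ("b", ctypes.c_uint64),
    ]

def same_line(off1, off2, line=CACHE_LINE):
    """True if two byte offsets fall in the same cache line."""
    return off1 // line == off2 // line

# Packed: a at offset 0, b at offset 8 -> same line (false sharing).
# Padded: b starts at offset 64 -> its own line.
```

Compilers and libraries offer the same idea natively (e.g. `alignas(64)` in C++ or `std::hardware_destructive_interference_size`); the padding trade-off is identical.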

As many-core accelerators, chiplet-based designs, and large-scale distributed training systems have proliferated through the 2010s and 2020s, hardware architects and ML system designers have paid growing attention to C2C behavior. Optimizations such as careful tensor partitioning, NUMA-aware memory allocation, and coherence domain tuning are now standard considerations when deploying large models on modern multi-socket or multi-tile hardware.

Related

Confidential Computing
Hardware-enforced secure enclaves that protect data during active computation.
Generality: 492

Prompt Caching
Storing prompt-response pairs to avoid redundant computation in large language model systems.
Generality: 339

C2PA (Coalition for Content Provenance and Authenticity)
An industry standard for cryptographically verifying the origin and history of digital content.
Generality: 322

AIMC (Analog In-Memory Computing)
A hardware paradigm that computes matrix operations directly inside analog memory arrays.
Generality: 293

Accelerated Computing
Using specialized hardware to dramatically speed up AI and machine learning workloads.
Generality: 794

Circuit
A structured computational subnetwork implementing specific functions within hardware or learned models.
Generality: 521