CPU Caches

A cache is a high-speed data storage layer that stores a subset of data, typically transient in nature, so that future requests for that data are served faster than is possible by accessing the data's primary storage location.

Info

In modern computer systems, cache is typically implemented using SRAM, which is faster and has lower latency than other types of memory such as DRAM. Frequently accessed data is therefore kept in the SRAM-based cache to allow for faster retrieval. In this role, the SRAM effectively serves as an extension of the processor's register set.

A minimum cache configuration for a single core, with split L1-i/L1-d

as typically found on a 16-bit MCU

The core has very fast L1 caches split into instruction (L1-i) and data (L1-d); beyond that, the core talks over a bus (e.g. the FSB) to the memory controller, which manages access to DRAM.

A modern configuration of memory cache with UMA/SMP architecture

Each core has private L1/L2; all cores share a last-level cache (L3) over a coherent ring/mesh; misses go through the Integrated Memory Controller to DRAM. That’s UMA because every core reaches the same DRAM with (approximately) uniform latency in a single socket.

Memory Speed Comparison

| Level | Typical latency (cycles) | Approx. latency (ns @ 3 GHz) | Notes |
|---|---|---|---|
| Register | ≈ 1 | ≈ 0.3 | In-core register file |
| L1 cache | 3–5 | 1.0–1.7 | Private per core |
| L2 cache | 10–15 | 3.3–5.0 | Often private per core |
| L3 cache | 30–60 | 10–20 | Shared last-level cache |
| DRAM | 150–300 | 50–100 | Off-chip main memory (UMA, 1-socket) |
Cache Coherency

In an SMP system, the caches of the individual CPUs cannot operate independently of each other: all processors are supposed to see the same memory content at all times. Maintaining this uniform view of memory is called cache coherency.