CPU Caches
A cache is a high-speed data storage layer that stores a subset of data, typically transient portions, so that future requests for that data are served faster than by accessing the data's primary storage location.
In modern computer systems, caches are typically implemented with SRAM, which is faster and has lower latency than the DRAM used for main memory. When data is accessed frequently, it is kept in the SRAM-backed cache so it can be retrieved quickly. In effect, the SRAM acts as an extension of the processor's register set.
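The effect described above can be sketched with a toy direct-mapped cache model (illustrative Python, not a model of any real CPU; the line size and line count are made-up teaching values): the first access to a line misses and fills it from "DRAM"; later accesses to the same line are served from the cache.

```python
# Toy direct-mapped cache: illustrates why frequently accessed data is
# served from the SRAM-backed cache after the first (compulsory) miss.
# All sizes are invented teaching values, not real hardware parameters.

LINE_SIZE = 64     # bytes per cache line
NUM_LINES = 8      # direct-mapped: exactly one slot per index

class ToyCache:
    def __init__(self):
        # Each slot holds the tag of the line it currently caches, or None.
        self.tags = [None] * NUM_LINES

    def access(self, addr):
        """Return 'hit' or 'miss' for a byte address."""
        line = addr // LINE_SIZE          # which 64-byte line
        index = line % NUM_LINES          # which slot (direct-mapped)
        tag = line // NUM_LINES           # disambiguates lines sharing a slot
        if self.tags[index] == tag:
            return "hit"
        self.tags[index] = tag            # fill the slot from "DRAM"
        return "miss"

cache = ToyCache()
print(cache.access(0x100))   # first touch of a line: miss
print(cache.access(0x104))   # same 64-byte line: hit (spatial locality)
print(cache.access(0x100))   # revisited data: hit (temporal locality)
```

Real caches are usually set-associative rather than direct-mapped, but the index/tag lookup idea is the same.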
A minimal cache configuration for a single core: L1-i/L1-d
typical of a simple embedded processor such as a 16-bit MCU
The core has very fast L1 caches split into instruction (L1-i) and data (L1-d) caches; beyond them, the core talks over a bus (e.g. the front-side bus, FSB) to the memory controller, which manages access to DRAM.
- L1-d : data cache
- L1-i : instruction cache
A modern cache configuration in a UMA/SMP architecture
Each core has private L1/L2; all cores share a last-level cache (L3) over a coherent ring/mesh; misses go through the Integrated Memory Controller to DRAM. That’s UMA because every core reaches the same DRAM with (approximately) uniform latency in a single socket.
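The hierarchy above can be sketched as a lookup chain (a toy Python model; the latency numbers are illustrative placeholders, not measurements): each core first checks its private caches, then the shared L3, and finally DRAM through the memory controller. The DRAM path is the same for every core, which is what makes the topology UMA.

```python
# Toy UMA lookup chain: private per-core caches in front of a shared LLC.
# Cycle counts are invented for illustration only.

DRAM_LATENCY = 200             # cycles; identical for every core => UMA

class Core:
    def __init__(self, shared_l3):
        self.l1 = set()        # private: set of cached line addresses
        self.l2 = set()        # private
        self.l3 = shared_l3    # shared last-level cache (one per socket)

    def load(self, line):
        """Return (level served from, cost in cycles) and fill caches."""
        if line in self.l1:
            return "L1", 4
        if line in self.l2:
            cost = 12
        elif line in self.l3:
            cost = 40
        else:
            self.l3.add(line)  # miss goes to DRAM via the memory controller
            cost = DRAM_LATENCY
        self.l1.add(line)      # fill the private caches on the way back
        self.l2.add(line)
        return ("L2" if cost == 12 else "L3" if cost == 40 else "DRAM"), cost

l3 = set()
core0, core1 = Core(l3), Core(l3)
print(core0.load(0x40))   # cold access: served from DRAM
print(core1.load(0x40))   # other core: hits the *shared* L3
print(core1.load(0x40))   # now resident in its own private L1
```

Note how the second core benefits from the first core's miss: the line is already in the shared last-level cache.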
Memory Speed Comparison
| Level | Typical latency (cycles) | Approx latency (ns @ 3 GHz) | Notes |
|---|---|---|---|
| Register | ≈ 1 | ≈ 0.3 | In-core register file |
| L1 Cache | 3–5 | 1.0–1.7 | Private per core |
| L2 Cache | 10–15 | 3.3–5.0 | Often private per core |
| L3 Cache | 30–60 | 10–20 | Shared last-level cache |
| DRAM (RAM) | 150–300 | 50–100 | Off-chip main memory (UMA, 1-socket) |
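One way to read the table is through the average memory access time, AMAT = hit_time + miss_rate × miss_penalty, applied level by level. The sketch below plugs in mid-range cycle counts from the table together with assumed hit rates (the hit rates are illustrative assumptions, not from the table).

```python
# AMAT across the hierarchy: each level contributes its hit time weighted by
# the fraction of accesses that reach it. Cycle counts are mid-range values
# from the table above; miss rates are assumed for illustration.

def amat(levels):
    """levels: list of (hit_time_cycles, miss_rate); last miss_rate must be 0."""
    total = 0.0
    reach = 1.0                   # fraction of accesses that get this far
    for hit_time, miss_rate in levels:
        total += reach * hit_time
        reach *= miss_rate
    return total

hierarchy = [
    (4,   0.10),   # L1: ~4 cycles, assume 90% hit rate
    (12,  0.30),   # L2: ~12 cycles, assume 70% of L1 misses hit here
    (45,  0.20),   # L3: ~45 cycles, assume 80% of L2 misses hit here
    (225, 0.0),    # DRAM: ~225 cycles, always "hits"
]
print(f"AMAT = {amat(hierarchy):.1f} cycles")   # 4 + 1.2 + 1.35 + 1.35 = 7.9
```

Even with DRAM ~50x slower than L1, high hit rates in the upper levels keep the average access close to L1 speed; this is why caches work.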
In an SMP system, the caches of the individual CPUs cannot operate independently of each other: all processors are supposed to see the same memory content at all times. Maintaining this uniform view of memory is called cache coherency.
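Coherency can be sketched with a toy two-state (valid/invalid) write-invalidate protocol in Python: when one core writes a line, the protocol invalidates every other core's cached copy, so a subsequent read on another core must miss and re-fetch the new value. Real protocols (MESI/MOESI) have more states and snooping/directory machinery, but the invalidate-on-write idea is the same.

```python
# Toy write-invalidate coherency: a write removes the line from every other
# core's cache so no core can keep reading a stale copy. This two-state
# (valid/invalid) model only shows the core idea behind MESI-like protocols.

class CoherentSystem:
    def __init__(self, num_cores):
        self.memory = {}                                   # line -> value ("DRAM")
        self.caches = [dict() for _ in range(num_cores)]   # per-core line -> value

    def read(self, core, line):
        cache = self.caches[core]
        if line in cache:                  # hit: valid private copy
            return cache[line]
        value = self.memory.get(line, 0)   # miss: fetch the current value
        cache[line] = value
        return value

    def write(self, core, line, value):
        for i, cache in enumerate(self.caches):
            if i != core:
                cache.pop(line, None)      # invalidate every other copy
        self.caches[core][line] = value
        self.memory[line] = value          # write-through, for simplicity

smp = CoherentSystem(num_cores=2)
smp.write(0, 0x40, 1)
print(smp.read(1, 0x40))     # core 1 sees 1, not a stale value
smp.write(1, 0x40, 2)        # invalidates core 0's cached copy
print(smp.read(0, 0x40))     # core 0 misses, re-fetches, and sees 2
```

Without the invalidation step in `write`, core 0 would keep returning its stale cached value of 1, which is exactly the inconsistency cache coherency exists to prevent.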