Memory Temporal and Spatial Locality

#hardware/memory

Memory locality is a key principle that modern computer systems use to enhance performance, especially when it comes to caching. It refers to the tendency of a processor to access the same set of memory locations repetitively over a short period of time. There are two types of locality:

Temporal Locality
- This refers to the reuse of specific data, and/or resources, within a relatively small time duration. In simpler terms, if a memory location is referenced then it will tend to be referenced again soon. This principle is used to justify the use of caches - if a data item has been used recently, keep it in the cache because there's a good chance we'll use it again soon.
- In short: If you touch data now, there's a good chance you'll touch the same data again soon.
Spatial Locality
- This refers to the use of data elements within relatively close storage locations. In simpler terms, if a memory location is referenced then memory locations with nearby addresses will tend to be referenced soon. This is why, when a cache line is loaded from memory, it doesn't just load a single data item, it loads a contiguous block of memory around that item.
- In short: If you touch address X, you're likely to touch nearby address soon.

The concept of memory locality is used to predict and optimize memory behavior. Since accessing data from the cache is faster than accessing it from main memory, utilizing temporal and spatial locality can lead to significant performance improvements. By anticipating the data that is likely to be used in the near future and storing it in the cache, the system can minimize slower memory operations and increase overall speed.

tldr;

temporal locality: 使用以前使用過的資料
spatial locality: 使用附近的資料

Temporal Locality

temporal good

On a 64-bit system, an integer is typically 4 bytes, so an array of 1024 integers can fit in a 32 KB L1 cache. This allows the entire working set to remain in L1 between uses.

int hot[1024]; // small working set, stays in cache
unsigned long sum = 0;
for (int i = 0; i < 10; ++i) {
	for (int j = 0; j < 1024; ++j) {
		hot[j] += 1;
		sum += hot[j];
	}
}

temporal_poor

size_t size = (8 * 1024 * 1024) / sizeof(int); // ~ 8 MB working set
int *cold = calloc(size, sizeof(int));
unsigned long sum = 0;
for (int i = 0; i < 10; ++i) {
	for (size_t i = 0; i < size; i += 64) {
		cold[i] += 1;
		sum += cold[i];
	}
}
free(cold);

Spatial Locality

row major traversal touches contigous addresses (good spatial locality).

It accesses elements with indices a[i * N + j] where j changes fastest, so memory addresses increase contiguously.
Because C arrays are row-major, elements in the same row are stored next to each other in memory.
This means each cache line fetch brings in several elements that will be used immediately, giving good spatial locality.

double sum = 0;
for (size_t i = 0; i < N; ++i) {
	for (size_t j = 0; j < N; ++j) {
		sum += a[i * N + j]; // contigous along the row
	}
}

column major traversal uses a large stride (poor spatial locality) in

double sum = 0;
for (int i = 0; i < N; ++i) {
	for (int j = 0; j < N; ++j) {
		sum += a[j * N + i];
	}
}