Memory Temporal and Spatial Locality

Memory locality is a key principle that modern computer systems use to enhance performance, especially when it comes to caching. It refers to the tendency of a processor to access the same set of memory locations repetitively over a short period of time. There are two types of locality:

The concept of memory locality is used to predict and optimize memory behavior. Since accessing data from the cache is faster than accessing it from main memory, utilizing temporal and spatial locality can lead to significant performance improvements. By anticipating the data that is likely to be used in the near future and storing it in the cache, the system can minimize slower memory operations and increase overall speed.

tldr;

temporal locality: 使用以前使用過的資料
spatial locality: 使用附近的資料

Temporal Locality

temporal good

On a 64-bit system, an integer is typically 4 bytes, so an array of 1024 integers can fit in a 32 KB L1 cache. This allows the entire working set to remain in L1 between uses.

int hot[1024]; // small working set, stays in cache
unsigned long sum = 0;
for (int i = 0; i < 10; ++i) {
	for (int j = 0; j < 1024; ++j) {
		hot[j] += 1;
		sum += hot[j];
	}
}

temporal_poor

size_t size = (8 * 1024 * 1024) / sizeof(int); // ~ 8 MB working set
int *cold = calloc(size, sizeof(int));
unsigned long sum = 0;
for (int i = 0; i < 10; ++i) {
	for (size_t i = 0; i < size; i += 64) {
		cold[i] += 1;
		sum += cold[i];
	}
}
free(cold);

Spatial Locality

row major traversal touches contigous addresses (good spatial locality).

It accesses elements with indices a[i * N + j] where j changes fastest, so memory addresses increase contiguously.
Because C arrays are row-major, elements in the same row are stored next to each other in memory.
This means each cache line fetch brings in several elements that will be used immediately, giving good spatial locality.

double sum = 0;
for (size_t i = 0; i < N; ++i) {
	for (size_t j = 0; j < N; ++j) {
		sum += a[i * N + j]; // contigous along the row
	}
}

column major traversal uses a large stride (poor spatial locality) in

double sum = 0;
for (int i = 0; i < N; ++i) {
	for (int j = 0; j < N; ++j) {
		sum += a[j * N + i];
	}
}