One of the main patterns is bandwidth saturation. A modern CPU contains many devices/units that are shared among the computing units (hardware threads with their attached functional units). While cache levels 1 and 2 are usually local to a computing unit and are only rarely a bottleneck, the bandwidth of the L3 cache, the intra-socket memory, and the inter-socket memory can be fully used up, so that the computing units have to wait for data.

Requirements:

* The maximum achievable bandwidth of the device/unit under consideration
* A benchmarking code with almost the same memory access pattern and the same number of required memory streams (arrays)
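
Once both requirements are available, the check itself is simple arithmetic: divide the data volume moved by the runtime and compare the result with the maximum achievable bandwidth. The following is a minimal sketch of that comparison. The function name `bandwidth_utilization` and all numbers are hypothetical and only illustrate the calculation; the maximum is assumed to come from a benchmark with the same number of memory streams as the code under investigation.

```python
def bandwidth_utilization(bytes_moved, runtime_s, max_bandwidth):
    """Return (achieved bandwidth in bytes/s, fraction of the maximum).

    max_bandwidth is assumed to be measured with a benchmark that uses
    the same access pattern and number of streams as the analyzed code.
    """
    if runtime_s <= 0 or max_bandwidth <= 0:
        raise ValueError("runtime_s and max_bandwidth must be positive")
    achieved = bytes_moved / runtime_s
    return achieved, achieved / max_bandwidth


if __name__ == "__main__":
    # Hypothetical example values, not measurements.
    n = 100_000_000            # elements per array
    bytes_moved = 3 * n * 8    # three streams of 8-byte doubles (2 loads, 1 store)
    runtime_s = 0.06           # assumed runtime of the kernel in seconds
    max_bw = 45e9              # assumed maximum bandwidth in bytes/s

    achieved, fraction = bandwidth_utilization(bytes_moved, runtime_s, max_bw)
    print(f"{achieved / 1e9:.1f} GB/s, {fraction:.0%} of the maximum")
```

A fraction close to 1 suggests that the shared device/unit is saturated. A clearly lower value points to a different limiting factor.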

A common benchmark for the memory subsystem is the [STREAM](benchmark-stream) benchmark. It gives a first impression of how fast the memory system can be.
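
To illustrate the kind of measurement such a benchmark performs, here is a rough sketch of a copy kernel with two memory streams. It is not the STREAM benchmark itself: it runs single-threaded, so it will usually not saturate a whole socket, and the function name `copy_bandwidth`, the buffer size, and the repetition count are assumptions chosen for illustration. As in STREAM's copy kernel, one read and one write per byte are counted; any extra write-allocate traffic is ignored.

```python
import time


def copy_bandwidth(size_bytes=64 * 1024 * 1024, repetitions=5):
    """Estimate copy bandwidth in bytes/s using two streams (read + write).

    size_bytes should be well above the last-level cache size so that
    the data is served from main memory and not from a cache.
    """
    # Fill the source with non-zero data so that all of its pages are
    # actually backed by physical memory.
    src = bytearray(b"\x01") * size_bytes
    dst = bytearray(size_bytes)
    src_view, dst_view = memoryview(src), memoryview(dst)

    # Warm-up copy: touches every page of the destination buffer.
    dst_view[:] = src_view

    best = float("inf")
    for _ in range(repetitions):
        start = time.perf_counter()
        dst_view[:] = src_view  # bulk copy performed in native code
        best = min(best, time.perf_counter() - start)

    # One read stream plus one write stream.
    return 2 * size_bytes / best


if __name__ == "__main__":
    print(f"copy bandwidth: {copy_bandwidth() / 1e9:.1f} GB/s")
```

The best time over several repetitions is used because the first iterations often include page-fault and warm-up effects. For a realistic maximum, a multi-threaded benchmark with proper thread pinning is needed.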