Feedback Driven Restructuring of Multi-Threaded Applications for NUCA Cache Performance in CMPs
This paper addresses feedback-directed restructuring techniques tuned to Non-Uniform Cache
Architectures (NUCA) in chip multiprocessors (CMPs) running multi-threaded
applications. Access time to NUCA caches depends on
the location of the referenced block, so the locality and
cache mapping of the application influence the overall
performance. We show techniques for altering the distribution of applications across the cache space so as to achieve
improved average memory access time. In CMPs running
multi-threaded applications, the aggregated accesses (and
locality) of the processors form the actual cache load and
pose specific issues. We consider a number of Splash-2
and Parsec benchmarks on an 8-processor system and
we show that a relatively simple remapping algorithm
is able to improve the average Static-NUCA (SNUCA)
cache access time by 5.5% and allows an SNUCA cache
to surpass the performance of a more complex Dynamic-NUCA (DNUCA) for most benchmarks.
Then, we present a more sophisticated remapping algorithm, relying on cache geometry information and on the
access distribution statistics from individual processors,
that reduces the average cache access time by 10.2% and
is very stable across all benchmarks.
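
To make the idea of remapping for lower average access time concrete, the following is a minimal sketch and not the algorithm evaluated in the paper. It assumes a hypothetical model in which memory is split into fixed-size chunks, each bank holds a limited number of chunks, and a profiling run has recorded how often each processor accesses each chunk. The names average_access_time, greedy_remap, accesses, latency, and capacity, as well as all numbers, are illustrative assumptions.

```python
# Illustrative sketch only: NOT the remapping algorithm from the paper.
# The chunk/bank model, all names, and all numbers are hypothetical.


def average_access_time(mapping, accesses, latency):
    """Return the access-weighted mean latency in cycles.

    mapping[c]     -> bank that holds chunk c
    accesses[c][p] -> profiled accesses to chunk c from processor p
    latency[p][b]  -> cycles for processor p to reach bank b
    """
    total_cycles = 0
    total_accesses = 0
    for chunk, bank in enumerate(mapping):
        for proc, count in enumerate(accesses[chunk]):
            total_cycles += count * latency[proc][bank]
            total_accesses += count
    return total_cycles / total_accesses if total_accesses else 0.0


def greedy_remap(accesses, latency, capacity):
    """Place the hottest chunks first, each in the cheapest bank with room."""
    num_banks = len(latency[0])
    if capacity * num_banks < len(accesses):
        raise ValueError("not enough bank capacity for all chunks")

    free = [capacity] * num_banks
    mapping = [None] * len(accesses)

    # Hottest chunks choose first, so they get the banks closest to their users.
    order = sorted(range(len(accesses)),
                   key=lambda c: sum(accesses[c]), reverse=True)
    for chunk in order:
        def cost(bank, chunk=chunk):
            # Total cycles all processors would spend on this chunk in `bank`.
            return sum(n * latency[p][bank]
                       for p, n in enumerate(accesses[chunk]))

        candidates = [b for b in range(num_banks) if free[b] > 0]
        best = min(candidates, key=cost)
        mapping[chunk] = best
        free[best] -= 1
    return mapping


# Hypothetical 2-processor, 2-bank system: each processor is near one bank.
latency = [[2, 6],
           [6, 2]]
# Hypothetical profile: accesses[c][p] for four chunks.
accesses = [[0, 100], [80, 0], [10, 10], [50, 5]]

# Static baseline: chunk c lives in bank c modulo the number of banks.
static_mapping = [c % 2 for c in range(len(accesses))]
remapped = greedy_remap(accesses, latency, capacity=2)

print("static  :", average_access_time(static_mapping, accesses, latency))
print("remapped:", average_access_time(remapped, accesses, latency))
```

The sketch uses per-processor access counts and per-bank latencies only as stand-ins for the "access distribution statistics" and "cache geometry information" mentioned above. How the paper actually collects and uses that information is not specified here.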