






















Software Managed Cache for Parallel Systems




Citation (APA):
Schleuniger, P., & Karlsson, S. (2012). Software Managed Cache for Parallel Systems. Poster session
presented at 7th International Conference on High-Performance and Embedded Architectures and Compilers ,
Paris, France.
Pascal Schleuniger and Sven Karlsson
Technical University of Denmark
Figure: current systems and envisioned future systems
Motivation
• Homogeneous multicores tend to have small private caches
• We envision heterogeneous multicores with a shared cache
  - cores can benefit from data stored in neighboring cores
  - memory consistency is cheap within a group of processing elements
  - data traffic on the global interconnect is reduced
• We argue for a software managed cache!
Contributions
• Software managed multi-banked first level data cache
• Application aware software controlled replacement strategies
Implementation
• Use a hardware efficient and energy efficient 4-way set associative cache
• The cache hit check works as follows:
  - tags are checked in sequence, one tag per cycle
  - a hashing function predicts which tag to check first
  - the check ends when a hit is found or no tags remain
  - this gives increased associativity at low power consumption
  - however, unless the first tag checked is a hit, the cache hit time increases
• Use hardware and software to implement the replacement policy
  - on cache misses, CPUs can execute a cache replacement algorithm
  - the cache can be balanced by relocating cache lines
  - the replacement policy may change dynamically
  - the software replacement algorithm is avoided when the expected memory latency is low
  - a simple algorithm implemented in hardware is used in cases where the software replacement algorithm is too slow
• Specific memory regions can be labeled
  - often used variables may have a higher priority
  - specific memory regions may have preferred locations in the cache
  - specific memory regions may be locked in cache
• Additional costs: duplication of cache tags
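The sequential hit check described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the number of sets, the line size, the XOR-folding hash in `predict_first_way`, and the wrap-around probe order are all assumptions made for illustration. Only the 4-way associativity, the one-tag-per-cycle check, and the hash-based prediction of the first tag come from the poster.

```python
WAYS = 4          # 4-way set associative, as stated on the poster
NUM_SETS = 64     # assumed number of sets
LINE_BYTES = 32   # assumed cache line size in bytes


def split_address(addr):
    """Split a byte address into (tag, set index)."""
    line = addr // LINE_BYTES
    return line // NUM_SETS, line % NUM_SETS


def predict_first_way(tag):
    """Hypothetical hash: XOR-fold the tag down to a way number."""
    return (tag ^ (tag >> 2) ^ (tag >> 4)) % WAYS


def lookup(tags, addr):
    """Check one tag per cycle, starting at the predicted way.

    tags[set][way] holds the stored tag, or None for an empty way.
    Returns (hit_way, cycles); hit_way is None on a miss.
    """
    tag, index = split_address(addr)
    first = predict_first_way(tag)
    for cycle in range(1, WAYS + 1):
        way = (first + cycle - 1) % WAYS   # assumed wrap-around probe order
        if tags[index][way] == tag:
            return way, cycle              # stop at the first hit
    return None, WAYS                      # all tags checked, no hit
```

In this sketch a correct prediction gives a one-cycle hit, while a line stored in the last way probed costs four cycles, which mirrors the trade-off between low power and increased hit time noted in the list above.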
Replacement Policy
Figure: replacement policy flow graph
Depending on where the data is fetched from, a cache miss can take up to
hundreds of clock cycles. Embedded processors typically stall on a cache miss.
Instead of waiting for the cache miss to be resolved, processors can execute
an advanced cache replacement algorithm.
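As a rough illustration of how hardware and software replacement might be combined, the following Python sketch picks a victim way on a miss. Everything concrete in it is an assumption: the `Line` fields, the `SOFTWARE_POLICY_CYCLES` cost, least-recently-used as the simple hardware algorithm, and lowest-priority-first as the software algorithm. The poster only states that the software algorithm is skipped when the expected memory latency is low, and that memory regions can carry priorities or be locked in cache.

```python
from dataclasses import dataclass

SOFTWARE_POLICY_CYCLES = 40   # assumed cost of running the software algorithm


@dataclass
class Line:
    tag: int
    priority: int = 0      # higher value = more important (assumed label)
    locked: bool = False   # locked lines are never evicted (assumed label)
    last_use: int = 0      # timestamp of the most recent access


def unlocked_ways(lines):
    return [way for way, line in enumerate(lines) if not line.locked]


def hardware_victim(lines):
    """Simple fallback (assumed LRU): oldest unlocked way, or None."""
    ways = unlocked_ways(lines)
    return min(ways, key=lambda w: lines[w].last_use) if ways else None


def software_victim(lines):
    """Application aware: evict the lowest priority first, then the oldest."""
    ways = unlocked_ways(lines)
    if not ways:
        return None
    return min(ways, key=lambda w: (lines[w].priority, lines[w].last_use))


def choose_victim(lines, expected_latency_cycles):
    """Run the software policy only if the miss latency can hide its cost."""
    if expected_latency_cycles > SOFTWARE_POLICY_CYCLES:
        return "software", software_victim(lines)
    return "hardware", hardware_victim(lines)
```

With a long expected latency, such as a fetch from off-chip memory, the software policy runs while the miss is outstanding. With a short one, the sketch falls back to the hardware policy so that the replacement decision does not become the bottleneck.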
Conclusions
• We propose a software managed multi-banked first level data cache for parallel systems
  - highly configurable
  - more area and power efficient than a pure hardware implementation of a highly associative cache
• We propose an application aware software controlled replacement strategy
  - uses both hardware and software to implement the replacement policy
DTU Informatics - Technical University of Denmark pass@imm.dtu.dk http://www.imm.dtu.dk
