16 research outputs found

    CPpf: A prefetch aware LLC partitioning approach

    Evaluation of Cache Inclusion Policies in Cache Management

    Processor speed has been increasing at a higher rate than memory speed for many years. Caches were designed to mitigate this gap and, ever since, several cache management techniques have been designed to further improve performance. Most of these techniques have been designed and evaluated on non-inclusive caches, even though many modern processors implement either inclusive or exclusive policies. Exclusive caches benefit from a larger effective capacity, so they may become more popular as the number of cores per last-level cache increases. This thesis aims to demonstrate that the best cache management techniques for exclusive caches are not necessarily the same as for non-inclusive or inclusive caches. To assess this claim, we evaluated several cache management techniques under different inclusion policies, core counts and cache sizes. We found that the configurations for inclusive and non-inclusive policies usually performed similarly, but for exclusive caches the best configurations were indeed different. Prefetchers affected performance more than replacement policies and determined which configurations were best. Exclusive caches also showed a higher speedup on multi-core systems. The least recently used (LRU) replacement policy is among the best policies for any prefetcher combination in exclusive caches, yet it is the policy used as the baseline in most cache replacement research. We therefore conclude that these results motivate further research on prefetchers and replacement policies targeted at exclusive caches.
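
    As a rough illustration of the baseline policy the abstract refers to, the sketch below models a single set of a set-associative cache under least recently used (LRU) replacement. The class and method names are illustrative choices, not code from the thesis, and the model ignores inclusion policy entirely.

        from collections import OrderedDict

        class LRUSet:
            """Toy model of one set of a set-associative cache with LRU replacement."""

            def __init__(self, ways: int):
                self.ways = ways
                self.lines = OrderedDict()  # tags ordered from LRU (front) to MRU (back)

            def access(self, tag) -> bool:
                """Return True on a hit; on a miss, fill the tag, evicting the LRU line if the set is full."""
                if tag in self.lines:
                    self.lines.move_to_end(tag)      # promote to MRU on a hit
                    return True
                if len(self.lines) >= self.ways:
                    self.lines.popitem(last=False)   # evict the least recently used line
                self.lines[tag] = True               # insert the new line at the MRU position
                return False

    For example, with ways=2 the access sequence A, B, A, C hits on the second A and then evicts B, since A was touched more recently.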

    Reference Speculation-driven Memory Management

    The “Memory Wall”, the vast gulf between processor execution speed and memory latency, has led to the development of large and deep cache hierarchies over the last twenty years. Although processor frequency is no longer on an exponential growth curve, the drive towards ever greater main memory capacity and limited off-chip bandwidth have kept this gap from closing significantly. In addition, future memory technologies such as Non-Volatile Memory (NVM) devices do not help to decrease the latency of the first reference to a particular memory address. To reduce the increasing off-chip memory access latency, this dissertation presents three intelligent speculation mechanisms that predict and manage future memory usage.

    First, we propose a novel hardware data prefetcher called the Signature Path Prefetcher (SPP), which offers effective solutions to major challenges in prefetcher design. SPP uses a compressed history signature that learns and accurately predicts long, complex data access patterns. Unlike other history-based algorithms, which miss many prefetching opportunities when address patterns cross physical page boundaries, SPP tracks the stream of data accesses across page boundaries and continues prefetching as soon as the stream moves to a new page. Finally, SPP uses the confidence it has in its predictions to adaptively throttle itself on a per-prefetch-stream basis. In our analysis, we find that SPP outperforms state-of-the-art hardware data prefetchers by 6.4% with higher prefetching accuracy and lower off-chip bandwidth usage.

    Second, we develop a holistic on-chip cache management system that tightly integrates data prefetching and cache replacement into one unified solution. We eliminate the use of the Program Counter (PC) in the cache replacement module by using a simple dead block predictor with global hysteresis. In addition to effectively predicting dead blocks in the Last-Level Cache (LLC) by observing program phase behavior, the replacement component gives feedback to the prefetching component to help decide the optimal fill level for each prefetch, while the prefetching component feeds confidence information about each individual prefetch to the LLC replacement component. A low-confidence prefetch is less likely to interfere with the contents of the LLC; as confidence in that prefetch increases, its position within the LLC replacement stack is solidified, and it is eventually brought into the L2 cache, close to where it will be used in the processor core.

    Third, we observe that the host machine in a virtualized system operates under different memory pressure regimes, as the memory demand from guest Virtual Machines (VMs) changes dynamically at runtime. Adapting to this runtime system state is critical to reducing the performance cost of VM memory management. We propose a novel dynamic memory management policy called Memory Pressure Aware (MPA) ballooning. MPA ballooning dynamically speculates on and allocates memory resources to each VM based on the current memory pressure regime, and it proactively reacts and adapts to sudden changes in memory demand from guest VMs. MPA ballooning requires neither additional hardware support, nor does it incur extra minor page faults in its memory pressure estimation.
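
    To make the signature idea concrete, the following is a heavily simplified sketch of a signature-path-style prefetcher: it folds the recent sequence of block deltas within a page into a compact signature, learns signature-to-next-delta transitions with counters, and only issues prefetches whose estimated confidence clears a threshold. The hash function, table organization, thresholds and lookahead depth are illustrative assumptions, not the dissertation's actual SPP design.

        SIG_BITS = 12          # width of the compressed history signature (assumed)
        CONF_THRESHOLD = 0.5   # minimum confidence before issuing a prefetch (assumed)
        BLOCKS_PER_PAGE = 64   # 4 KB page of 64-byte blocks

        class ToySignaturePrefetcher:
            def __init__(self):
                self.page_state = {}     # page -> (last block offset, current signature)
                self.pattern_table = {}  # signature -> {delta: occurrence count}

            @staticmethod
            def _fold(sig: int, delta: int) -> int:
                # Compress the new delta into the history signature (assumed hash).
                return ((sig << 3) ^ (delta & 0x3F)) & ((1 << SIG_BITS) - 1)

            def access(self, page: int, offset: int) -> list:
                """Train on one demand access and return block offsets to prefetch within the page."""
                last, sig = self.page_state.get(page, (offset, 0))
                delta = offset - last
                if delta != 0:
                    counts = self.pattern_table.setdefault(sig, {})
                    counts[delta] = counts.get(delta, 0) + 1   # learn: old signature led to this delta
                    sig = self._fold(sig, delta)
                self.page_state[page] = (offset, sig)

                # Predict: speculatively walk the most likely path while confidence stays high.
                prefetches, cur_sig, cur_off = [], sig, offset
                for _ in range(4):                              # lookahead depth (assumed)
                    counts = self.pattern_table.get(cur_sig)
                    if not counts:
                        break
                    delta, hits = max(counts.items(), key=lambda kv: kv[1])
                    if hits / sum(counts.values()) < CONF_THRESHOLD:
                        break                                   # per-stream confidence throttling
                    cur_off += delta
                    if not 0 <= cur_off < BLOCKS_PER_PAGE:
                        break   # the real SPP keeps following the stream across page boundaries
                    prefetches.append(cur_off)
                    cur_sig = self._fold(cur_sig, delta)
                return prefetches

    A driver would call access(page, offset) on every demand access and translate the returned block offsets back into addresses to prefetch.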
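
    The confidence-driven placement described for the unified prefetching and replacement scheme can be pictured with a small decision helper. The two fill levels and the thresholds below are assumptions chosen for illustration, not the dissertation's tuned policy.

        def choose_fill(confidence: float):
            """Map a prefetch's confidence to a fill level and an insertion position (illustrative)."""
            if confidence < 0.25:
                return ("LLC", "near-LRU")   # low confidence: keep it easy to evict so it cannot pollute the LLC
            if confidence < 0.75:
                return ("LLC", "mid-stack")  # moderate confidence: protect it a little longer in the LLC
            return ("L2", "MRU")             # high confidence: bring it close to the core, where it will be used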