
    Evaluation of L1 Residence for Perceptron Filter Enhanced Signature Path Prefetcher

    The rapid advancement of integrated circuit technology described by Moore’s Law has greatly increased computational power. Processors have exploited this by raising computation rates, while memory has mainly gained capacity. Because processor operating speeds far exceed memory access times, computer architects have added multiple levels of caches to avoid the penalty of repeated accesses to memory. Architects have further improved access efficiency by prefetching data from memory to hide the latency usually incurred on a cache miss. Previous work at Texas A&M, including its submission to the Third Data Prefetching Championship (DPC3), focused primarily on L2 cache prefetching; L1 prefetching has been explored less because of hardware limitations on implementation. In this paper, I evaluate the effect of L1 residence for Texas A&M’s Perceptron Filtered Signature Path Prefetcher (PPF). While an unoptimized move of the PPF from the L2 to the L1 degraded performance, optimizations such as using the L1 data stream to prefetch to all cache levels and adjusting table sizes and lengths matched L2 performance.
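
    As a rough sketch of the perceptron-filtering idea behind PPF (not the paper's implementation), the C++ snippet below hashes a few prefetch-time features (a PC, an SPP signature, and a block offset) into small weight tables, sums the selected weights, and issues a candidate prefetch only when the sum clears a threshold; the feature set, table sizes, weight widths, and threshold are assumptions made for illustration.

        // Minimal sketch of a perceptron-based prefetch filter in the spirit of PPF.
        // Table sizes, feature choices, weight widths, and the threshold are
        // illustrative assumptions, not the parameters used in the paper.
        #include <array>
        #include <cstdint>
        #include <cstdio>

        struct PerceptronFilter {
            static constexpr int TABLE_SIZE = 1024;  // assumed per-feature table size
            static constexpr int THRESHOLD  = 0;     // accept prefetch if sum >= THRESHOLD
            static constexpr int WEIGHT_MAX = 31, WEIGHT_MIN = -32;  // saturating weights

            // One weight table per feature (here: PC hash, SPP signature, block offset).
            std::array<std::array<int8_t, TABLE_SIZE>, 3> weights{};

            static int idx(uint64_t feature) { return static_cast<int>(feature % TABLE_SIZE); }

            // Sum the weights selected by each feature and compare with the threshold.
            bool accept(uint64_t pc, uint64_t signature, uint64_t block_offset) const {
                int sum = weights[0][idx(pc)] + weights[1][idx(signature)] + weights[2][idx(block_offset)];
                return sum >= THRESHOLD;
            }

            // Train toward useful prefetches: bump the selected weights up when a
            // prefetch was used, down when it was not (saturating in both directions).
            void train(uint64_t pc, uint64_t signature, uint64_t block_offset, bool useful) {
                const uint64_t features[3] = {pc, signature, block_offset};
                for (int f = 0; f < 3; ++f) {
                    int8_t& w = weights[f][idx(features[f])];
                    if (useful && w < WEIGHT_MAX) ++w;
                    if (!useful && w > WEIGHT_MIN) --w;
                }
            }
        };

        int main() {
            PerceptronFilter filter;
            // Hypothetical prefetch candidate produced by an underlying SPP-like engine.
            uint64_t pc = 0x400a10, signature = 0x1f3, offset = 5;
            filter.train(pc, signature, offset, /*useful=*/true);
            std::printf("issue prefetch? %s\n", filter.accept(pc, signature, offset) ? "yes" : "no");
        }

    Training nudges the selected weights up for prefetches that turn out useful and down for useless ones, which is the kind of feedback loop whose table sizes and thresholds would need retuning when the filter moves from the L2 to the L1.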

    Page size aware cache prefetching

    The increase in working set sizes of contemporary applications outpaces the growth in cache sizes, resulting in frequent main memory accesses that deteriorate system performance due to the disparity between processor and memory speeds. Prefetching data blocks into the cache hierarchy ahead of demand accesses has proven successful at attenuating this bottleneck. However, spatial cache prefetchers operating in the physical address space leave significant performance on the table by limiting their pattern detection to 4KB physical page boundaries, even though modern systems use page sizes larger than 4KB to mitigate address translation overheads. This paper exploits the high usage of large pages in modern systems to increase the effectiveness of spatial cache prefetching. We design and propose the Page-size Propagation Module (PPM), a microarchitectural scheme that propagates page size information to the lower-level cache prefetchers, enabling safe prefetching beyond 4KB physical page boundaries when the accessed blocks reside in large pages, at the cost of augmenting the first-level caches’ Miss Status Holding Register (MSHR) entries with one additional bit. PPM is compatible with any cache prefetcher without requiring design modifications. We capitalize on PPM’s benefits by designing a module that consists of two page size aware prefetchers that inherently use different page sizes to drive prefetching. The composite module uses adaptive logic to dynamically enable the most appropriate page size aware prefetcher. Finally, we show that the proposed designs are transparent to which cache prefetcher is used. We apply the proposed page size exploitation techniques to four state-of-the-art spatial cache prefetchers. Our evaluation shows that our proposals improve single-core geomean performance by up to 8.1% (2.1% at minimum) over the original implementation of the considered prefetchers, across 80 memory-intensive workloads. In multi-core contexts, we report geomean speedups of up to 7.7% across different cache prefetchers and core configurations. This work is supported by the Spanish Ministry of Science and Technology through the PID2019-107255GB project, the Generalitat de Catalunya (contract 2017-SGR-1414), the European Union Horizon 2020 research and innovation program under grant agreement No 955606 (DEEP-SEA EU project), the National Science Foundation through grants CNS-1938064 and CCF-1912617, and the Semiconductor Research Corporation project GRC 2936.001. Georgios Vavouliotis has been supported by the Spanish Ministry of Economy, Industry, and Competitiveness and the European Social Fund under FPI fellowship No. PRE2018-087046. Marc Casas has been partially supported by Grant RYC2017-23269, funded by MCIN/AEI/10.13039/501100011033 and the ESF ‘Investing in your future’.
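
    To make PPM's core idea concrete, the sketch below (a minimal illustration, not the paper's interface) clamps candidate prefetch targets to the page that contains the triggering access: a single large-page flag, standing in for the bit the paper adds to the L1 MSHR entries, decides whether candidates must stay inside a 4KB region or may cross 4KB boundaries within a large page. The clamp_to_page helper, the 64-byte block size, and the 2MB large-page size are assumptions made for illustration.

        // Minimal sketch of page-size aware prefetch clamping in the spirit of PPM.
        // The boolean large-page flag stands in for the one bit the paper adds to
        // L1 MSHR entries; the helper and the sizes below are illustrative assumptions.
        #include <cstdint>
        #include <cstdio>
        #include <vector>

        constexpr uint64_t BLOCK_SIZE = 64;
        constexpr uint64_t SMALL_PAGE = 4 * 1024;         // 4KB base page
        constexpr uint64_t LARGE_PAGE = 2 * 1024 * 1024;  // assumed 2MB large page

        // Keep only prefetch candidates that stay inside the page containing `trigger`.
        // With the large-page flag set, candidates may safely cross 4KB boundaries.
        std::vector<uint64_t> clamp_to_page(uint64_t trigger,
                                            const std::vector<int64_t>& block_deltas,
                                            bool is_large_page) {
            const uint64_t page = is_large_page ? LARGE_PAGE : SMALL_PAGE;
            const uint64_t base = trigger & ~(page - 1);
            std::vector<uint64_t> out;
            for (int64_t d : block_deltas) {
                uint64_t target = trigger + d * BLOCK_SIZE;
                if (target >= base && target < base + page)  // stays within the page
                    out.push_back(target);
            }
            return out;
        }

        int main() {
            uint64_t trigger = 0x7F3000FC0;          // near the top of a 4KB region
            std::vector<int64_t> deltas = {1, 2, 4};  // blocks ahead of the trigger
            std::printf("4KB page: %zu candidates survive\n",
                        clamp_to_page(trigger, deltas, false).size());
            std::printf("2MB page: %zu candidates survive\n",
                        clamp_to_page(trigger, deltas, true).size());
        }

    Running it with a trigger address near the top of a 4KB region shows all candidates being dropped under the 4KB assumption but surviving once the large-page flag is set, which is the extra prefetch coverage the paper's scheme exposes.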

    Seventh Biennial Report : June 2003 - March 2005

    No full text