987 research outputs found

    Cache Calculus: Modeling Caches through Differential Equations

    Caches are critical to performance, yet their behavior is hard to understand and model. In particular, prior work does not provide closed-form solutions of cache performance, i.e., simple expressions for the miss rate of a specific access pattern. Existing cache models instead use numerical methods that, unlike closed-form solutions, are computationally expensive and yield limited insight. We present cache calculus, a technique that models cache behavior as a system of ordinary differential equations, letting standard calculus techniques find simple and accurate solutions of cache performance for common access patterns.
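As a toy illustration of what a closed-form solution of cache performance looks like (this is not the paper's cache-calculus model; the cache parameters below are hypothetical): a fully associative LRU cache of S lines serving uniformly random accesses to N > S equally likely blocks has a steady-state miss rate of exactly 1 - S/N, which a short simulation confirms.

```python
import random
from collections import OrderedDict

def lru_miss_rate(cache_size, num_blocks, accesses, seed=0):
    """Simulate a fully associative LRU cache under uniformly random
    accesses to num_blocks distinct blocks; return the miss rate."""
    rng = random.Random(seed)
    cache = OrderedDict()  # keys in recency order, least recent first
    misses = 0
    for _ in range(accesses):
        block = rng.randrange(num_blocks)
        if block in cache:
            cache.move_to_end(block)       # hit: refresh recency
        else:
            misses += 1
            cache[block] = None
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least-recently-used block

    return misses / accesses

S, N = 64, 256                              # hypothetical cache and footprint sizes
simulated = lru_miss_rate(S, N, 200_000)
closed_form = 1 - S / N                     # cache holds the S most recent of N blocks
```

For this access pattern the simulation and the closed-form expression agree to within simulation noise; the paper's point is that such expressions are far cheaper and more insightful than running the simulation.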

    A Cache Model for Modern Processors

    Modern processors use high-performance cache replacement policies that outperform traditional alternatives like least-recently used (LRU). Unfortunately, current cache models use stack distances to predict LRU or its variants, and cannot capture these high-performance policies. Accurate predictions of cache performance enable many optimizations in multicore systems. For example, cache partitioning uses these predictions to divide capacity among applications in order to maximize performance, guarantee quality of service, or achieve other system objectives. Without an accurate model for high-performance replacement policies, these optimizations are unavailable to modern processors. We present a new probabilistic cache model designed for high-performance replacement policies. This model uses absolute reuse distances instead of stack distances, which makes it applicable to arbitrary age-based replacement policies. We thoroughly validate our model on several high-performance policies on synthetic and real benchmarks, where its median error is less than 1%. Finally, we present two case studies showing how to use the model to improve shared and single-stream cache performance.
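The distinction the abstract draws can be made concrete: an absolute reuse distance counts all accesses between two references to the same address, while a stack distance counts only the distinct addresses in that interval. A minimal sketch (conventions for these distances vary in the literature; this is one common choice):

```python
def reuse_and_stack_distances(trace):
    """For each reference: absolute reuse distance = number of accesses
    since the previous reference to the same address; stack distance =
    number of *distinct* addresses accessed in that interval. First
    references to an address yield None for both."""
    last_pos = {}
    reuse, stack = [], []
    for i, addr in enumerate(trace):
        if addr in last_pos:
            prev = last_pos[addr]
            reuse.append(i - prev - 1)
            stack.append(len(set(trace[prev + 1:i])))
        else:
            reuse.append(None)
            stack.append(None)
        last_pos[addr] = i
    return reuse, stack

# On the trace A B C B A, the second A has absolute reuse distance 3
# (the accesses B, C, B) but stack distance 2 (distinct blocks {B, C}).
reuse, stack = reuse_and_stack_distances(list("ABCBA"))
```

Stack distances directly predict LRU (a reference hits iff its stack distance is below the cache size), which is why models built on them cannot describe policies that do not maintain an LRU stack.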

    Bridging Theory and Practice in Cache Replacement

    Much prior work has studied processor cache replacement policies, but a large gap remains between theory and practice. The optimal policy (MIN) requires unobtainable knowledge of the future, and prior theoretically grounded policies use reference models that do not match real programs. Meanwhile, practical policies are designed empirically. Lacking a strong theoretical foundation, they do not make the best use of the information available to them. This paper bridges theory and practice. We propose that practical policies should replace lines based on their economic value added (EVA), the difference of their expected hits from the average. We use Markov decision processes to show that EVA is optimal under some reasonable simplifications. We present an inexpensive, practical implementation of EVA and evaluate it exhaustively over many cache sizes. EVA outperforms prior practical policies and saves area at iso-performance. These results show that formalizing cache replacement yields practical benefits.
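The EVA intuition can be illustrated with a toy single-reuse model (hypothetical numbers; this is not the paper's Markov-decision-process derivation or its hardware implementation): suppose each line either hits once at a fixed age or is evicted unused later. A line's EVA at age a is then its expected future hits minus the opportunity cost of the space it occupies, taken here as the cache's average hits per unit of lifetime times the line's expected remaining lifetime.

```python
def eva_curve(hit_prob, hit_age, evict_age):
    """Toy single-reuse sketch of the EVA idea: every line either hits
    once at hit_age (with probability hit_prob) or is evicted unused at
    evict_age.  Returns EVA as a function of a line's current age."""
    assert hit_age < evict_age
    # Cache-wide averages over one line lifetime.
    avg_hits = hit_prob
    avg_life = hit_prob * hit_age + (1 - hit_prob) * evict_age
    g = avg_hits / avg_life  # average hits per unit of occupied space-time

    curve = {}
    for a in range(evict_age):
        if a < hit_age:
            # The line may still hit at hit_age.
            e_hits = hit_prob
            e_life = hit_prob * (hit_age - a) + (1 - hit_prob) * (evict_age - a)
        else:
            # Survived past hit_age without hitting: it never will.
            e_hits = 0.0
            e_life = evict_age - a
        curve[a] = e_hits - g * e_life
    return curve

curve = eva_curve(hit_prob=0.6, hit_age=2, evict_age=4)  # hypothetical numbers
```

In this sketch a fresh line has EVA exactly zero (it is average by construction), lines young enough to still hit have positive EVA, and lines that outlived their reuse opportunity have negative EVA and are the ones a replacement policy should victimize.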

    Jigsaw: Scalable Software-Defined Caches (Extended Version)

    Shared last-level caches, widely used in chip-multiprocessors (CMPs), face two fundamental limitations. First, the latency and energy of shared caches degrade as the system scales up. Second, when multiple workloads share the CMP, they suffer from interference in shared cache accesses. Unfortunately, prior research addressing one issue either ignores or worsens the other: NUCA techniques reduce access latency but are prone to hotspots and interference, and cache partitioning techniques only provide isolation but do not reduce access latency. We present Jigsaw, a technique that jointly addresses the scalability and interference problems of shared caches. Hardware lets software define shares, collections of cache bank partitions that act as virtual caches, and map data to shares. Shares give software full control over both data placement and capacity allocation. Jigsaw implements efficient hardware support for share management, monitoring, and adaptation. We propose novel resource-management algorithms and use them to develop a system-level runtime that leverages Jigsaw to both maximize cache utilization and place data close to where it is used. We evaluate Jigsaw using extensive simulations of 16- and 64-core tiled CMPs. Jigsaw improves performance by up to 2.2x (18% avg) over a conventional shared cache, and significantly outperforms state-of-the-art NUCA and partitioning techniques. This work was supported in part by DARPA PERFECT contract HR0011-13-2-0005 and by Quanta Computer.

    Talus: A simple way to remove cliffs in cache performance

    Caches often suffer from performance cliffs: minor changes in program behavior or available cache space cause large changes in miss rate. Cliffs hurt performance and complicate cache management. We present Talus, a simple scheme that removes these cliffs. Talus works by dividing a single application's access stream into two partitions, unlike prior work that partitions among competing applications. By controlling the sizes of these partitions, Talus ensures that as an application is given more cache space, its miss rate decreases in a convex fashion. We prove that Talus removes performance cliffs, and evaluate it through extensive simulation. Talus adds negligible overheads, improves single-application performance, simplifies partitioning algorithms, and makes cache partitioning more effective and fair. National Science Foundation (U.S.) (Grant CCF-1318384).
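The convexity guarantee can be sketched with the underlying math: any point on the lower convex hull of an application's miss curve is achievable by splitting the access stream between two partitions sized at the bracketing hull vertices, so concave regions (cliffs) disappear. A minimal hull computation over a hypothetical cliff-shaped miss curve (the partition-sizing details of Talus itself are omitted):

```python
def convex_hull_miss_curve(miss):
    """Indices of the vertices of the lower convex hull of a miss curve
    miss[s], s = 0..S.  Operating on this hull instead of the raw curve
    is what removes performance cliffs."""
    hull = [0]
    for s in range(1, len(miss)):
        # Pop the last vertex while it would make the hull non-convex
        # (i.e., while slope(s1->s2) >= slope(s2->s)).
        while len(hull) >= 2:
            s1, s2 = hull[-2], hull[-1]
            if (miss[s2] - miss[s1]) * (s - s2) >= (miss[s] - miss[s2]) * (s2 - s1):
                hull.pop()
            else:
                break
        hull.append(s)
    return hull

def hull_miss_rate(miss, hull, s):
    """Miss rate at size s on the hull: linear interpolation between the
    bracketing hull vertices (realized by splitting the stream)."""
    for lo, hi in zip(hull, hull[1:]):
        if lo <= s <= hi:
            t = (s - lo) / (hi - lo)
            return miss[lo] + t * (miss[hi] - miss[lo])
    return miss[hull[-1]]

# Hypothetical miss curve with a cliff at size 3.
miss = [1.0, 1.0, 1.0, 0.2, 0.1]
hull = convex_hull_miss_curve(miss)   # the cliff's interior points drop out
smoothed = hull_miss_rate(miss, hull, 2)
```

On this curve, sizes 1 and 2 gain nothing over size 0, yet the hull gives every added unit of capacity a proportional share of the cliff's benefit, which is exactly the convex behavior the abstract describes.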

    Jigsaw: Scalable software-defined caches

    Shared last-level caches, widely used in chip-multiprocessors (CMPs), face two fundamental limitations. First, the latency and energy of shared caches degrade as the system scales up. Second, when multiple workloads share the CMP, they suffer from interference in shared cache accesses. Unfortunately, prior research addressing one issue either ignores or worsens the other: NUCA techniques reduce access latency but are prone to hotspots and interference, and cache partitioning techniques only provide isolation but do not reduce access latency. United States. Defense Advanced Research Projects Agency (DARPA PERFECT contract HR0011-13-2-0005). Quanta Computer (Firm).

    Jenga: Harnessing Heterogeneous Memories through Reconfigurable Cache Hierarchies

    Conventional memory systems are organized as a rigid hierarchy, with multiple levels of progressively larger and slower memories. Hierarchy allows a simple, fixed design to benefit a wide range of applications, because working sets settle at the smallest (and fastest) level they fit in. However, rigid hierarchies also cause significant overheads, because each level adds latency and energy even when it does not capture the working set. In emerging systems with heterogeneous memory technologies such as stacked DRAM, these overheads often limit performance and efficiency. We propose Jenga, a reconfigurable cache hierarchy that avoids these pathologies and approaches the performance of a hierarchy optimized for each application. Jenga monitors application behavior and dynamically builds virtual cache hierarchies out of heterogeneous, distributed cache banks. Jenga uses simple hardware support and a novel software runtime to configure virtual cache hierarchies. On a 36-core CMP with a 1 GB stacked-DRAM cache, Jenga outperforms a combination of state-of-the-art techniques by 10% on average and by up to 36%, and does so while saving energy, improving system-wide energy-delay product by 29% on average and by up to 96%.

    Scaling Distributed Cache Hierarchies through Computation and Data Co-Scheduling

    Cache hierarchies are increasingly non-uniform, so for systems to scale efficiently, data must be close to the threads that use it. Moreover, cache capacity is limited and contended among threads, introducing complex capacity/latency tradeoffs. Prior NUCA schemes have focused on managing data to reduce access latency, but have ignored thread placement; and applying prior NUMA thread placement schemes to NUCA is inefficient, as capacity, not bandwidth, is the main constraint. We present CDCS, a technique to jointly place threads and data in multicores with distributed shared caches. We develop novel monitoring hardware that enables fine-grained space allocation on large caches, and data movement support to allow frequent full-chip reconfigurations. On a 64-core system, CDCS outperforms an S-NUCA LLC by 46% on average (up to 76%) in weighted speedup and saves 36% of system energy. CDCS also outperforms state-of-the-art NUCA schemes under different thread scheduling policies. National Science Foundation (U.S.) (Grant CCF-1318384). Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science (Jacobs Presidential Fellowship). United States. Defense Advanced Research Projects Agency (PERFECT Contract HR0011-13-2-0005).

    The Doubters' Dilemma

    This book explores the extent and causes of attrition and retention in university Language & Culture (L&C) programs through a detailed analysis of an institutional case study at The Australian National University (ANU). Using extensive data collected through student surveys, coupled with data mining of university-wide enrolment data, the authors explore the enrolment and progress of students in all ANU L&C programs. Through their detailed statistical analysis of attrition and retention outcomes, the authors reveal serious inadequacies in the traditional, and common, methodology for determining the extent of student attrition and retention in tertiary L&C programs. Readers are shown why a year-to-year comparison of students who continue or discontinue language studies using traditional statistical methodology cannot provide data that is sufficiently meaningful to allow for sound policy- and decision-making. The authors instead suggest a more valid, replicable methodology that provides a new approach potentially applicable to all disciplines and all student retention measures. The authors also demonstrate that the empirical data supports a new hypothesis for the reasons for attrition, based on students' relative belief or doubt in their capacity to complete their studies successfully. By highlighting the importance of language capital as a factor in students' concerns about their capacity for success, and hence in their decisions to stay in, or leave, a university language program, the authors show the importance of the 'doubters' dilemma'. By taking a rigorous approach to hypothesis building and testing around enrolment and attrition data, the authors provide valuable insights into attrition issues, and potential retention strategies, in L&C programs, which will be relevant to institutions, policy-makers and teaching academics.

    Application of the Finite Element Method for the Analysis of Welded Structures with Crack-Type Defects

    This paper investigates the nonlinear deformation process of welded structures with crack-like defects. The energy J-integral and the equivalent plastic strain are used as the main parameters. The value of the J-integral for a structural element with a crack can be determined by numerical methods, for example the finite element method. Automated analysis of products with crack-like defects was carried out using the ANSYS software package and the CRACK software package developed at Karaganda State Technical University (KarGTU).