4 research outputs found

    Memory Centric Characterization and Analysis of SPEC CPU2017 Suite

    In this paper we provide a comprehensive, memory-centric characterization of the SPEC CPU2017 benchmark suite, using a number of mechanisms including dynamic binary instrumentation, measurements on native hardware with hardware performance counters, and OS-based tools. We present a number of results, including the working set sizes, memory capacity consumption, and memory bandwidth utilization of the various workloads. Our experiments reveal that the SPEC CPU2017 workloads are surprisingly memory intensive, with approximately 50% of all dynamic instructions being memory accesses. We also show that there is a large variation in the memory footprint and bandwidth utilization profiles across the suite, with some benchmarks using as much as 16 GB of main memory and up to 2.3 GB/s of memory bandwidth. We also perform an instruction execution and distribution analysis of the suite and find that the average instruction count of the SPEC CPU2017 workloads is an order of magnitude higher than that of the SPEC CPU2006 workloads. In addition, we find that the FP benchmarks of the SPEC CPU2017 suite have higher compute requirements: on average, FP workloads execute three times as many compute operations as INT workloads.
    Comment: 12 pages, 133 figures. A short version of this work has been published in the Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering.
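    The bandwidth and footprint figures above come from hardware performance counters and OS-based tools. As a rough illustration of that measurement technique (not the paper's actual methodology), the Python sketch below approximates a workload's DRAM bandwidth from LLC miss counts reported by Linux `perf stat`. The generic `LLC-load-misses`/`LLC-store-misses` event aliases and the 64-byte cache-line size are assumptions; event names vary by CPU, and a careful study would use uncore CAS counters instead.

```python
#!/usr/bin/env python3
"""Rough memory-bandwidth estimate for a workload via Linux `perf stat`.

Approximation: DRAM traffic ~ (LLC load misses + LLC store misses) * 64 B.
The event aliases and the 64-byte line size are assumptions; real
characterization studies typically read uncore CAS counters instead.
"""
import subprocess
import sys
import time

CACHE_LINE_BYTES = 64                          # assumed cache-line size
EVENTS = "LLC-load-misses,LLC-store-misses"    # generic perf event aliases

def estimate_bandwidth(cmd):
    start = time.perf_counter()
    # `-x,` makes perf emit machine-readable CSV lines on stderr.
    proc = subprocess.run(
        ["perf", "stat", "-x", ",", "-e", EVENTS, "--"] + cmd,
        capture_output=True, text=True,
    )
    elapsed = time.perf_counter() - start      # includes perf startup; rough
    misses = 0
    for line in proc.stderr.splitlines():
        fields = line.split(",")
        # CSV format: value,unit,event-name,...
        if len(fields) > 2 and fields[2].strip() in EVENTS.split(","):
            if fields[0].isdigit():            # skip "<not counted>" etc.
                misses += int(fields[0])
    bytes_moved = misses * CACHE_LINE_BYTES
    return bytes_moved / elapsed / 1e9         # GB/s

if __name__ == "__main__":
    # Usage: ./bw.py ./my_benchmark arg1 arg2
    print(f"approx. DRAM bandwidth: {estimate_bandwidth(sys.argv[1:]):.2f} GB/s")
```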

    RAMPART: RowHammer Mitigation and Repair for Server Memory Systems

    RowHammer attacks are a growing security and reliability concern for DRAM and computer systems, as they can induce many bit errors that overwhelm error detection and correction capabilities. System-level solutions are needed, as process technology and circuit improvements alone are unlikely to provide complete protection against RowHammer attacks in the future. This paper introduces RAMPART, a novel approach to mitigating RowHammer attacks and improving server memory system reliability by remapping addresses in each DRAM in a way that confines RowHammer bit flips to a single device for any victim row address. When RAMPART is paired with Single Device Data Correction (SDDC) and patrol scrub, error detection and correction methods in use today, the system can detect and correct bit flips from a successful attack, allowing the memory system to heal itself. RAMPART is compatible with DDR5 RowHammer mitigation features, as well as with a wide variety of algorithmic and probabilistic tracking methods. We also introduce BRC-VL, a variation of DDR5 Bounded Refresh Configuration (BRC) that improves system performance by reducing mitigation overhead, and show that it works well with probabilistic sampling methods to combat traditional and victim-focused mitigation attacks like Half-Double. The combination of RAMPART, SDDC, and scrubbing enables stronger RowHammer resistance by correcting the bit flips from one successful attack; uncorrectable errors become much less likely, requiring two successful attacks before the memory system is scrubbed.
    Comment: 16 pages, 13 figures. A version of this paper will appear in the Proceedings of MEMSYS2
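    RAMPART's key mechanism, remapping row addresses differently in each DRAM device so that any single victim row address collects bit flips from at most one device, can be illustrated with a toy model. The sketch below is a hedged illustration only: it uses per-device circular bit rotations as a stand-in for the paper's actual remapping function, and the row-address width, device count, and aggressor address are all made up for the demo.

```python
#!/usr/bin/env python3
"""Toy model of RAMPART-style per-device row remapping.

Each DRAM device applies its own invertible remap of row addresses
(here: a circular bit rotation by the device index -- a stand-in for
the paper's actual mapping). Hammering one logical row then disturbs
a *different* logical victim row on each device, so any single victim
row address collects bit flips from at most one device. Row-address
width, device count, and the aggressor address are illustrative.
"""
ROW_BITS = 16          # 2**16 rows per device (illustrative)
DEVICES = 10           # e.g. devices in one rank (illustrative)
MASK = (1 << ROW_BITS) - 1

def rotl(x, k):
    k %= ROW_BITS
    return ((x << k) | (x >> (ROW_BITS - k))) & MASK

def rotr(x, k):
    return rotl(x, ROW_BITS - (k % ROW_BITS))

def victims_on_device(aggressor, dev):
    """Logical rows disturbed on device `dev` when `aggressor` is hammered."""
    phys = rotl(aggressor, dev)                # per-device physical row
    return {rotr((phys + 1) & MASK, dev),      # physical neighbors, mapped
            rotr((phys - 1) & MASK, dev)}      # back to logical addresses

aggressor = 0x1234
hits = {}                                      # victim row -> devices hit
for dev in range(DEVICES):
    for victim in victims_on_device(aggressor, dev):
        hits.setdefault(victim, []).append(dev)

worst = max(len(devs) for devs in hits.values())
print(f"{len(hits)} distinct logical victim rows across {DEVICES} devices")
print(f"max devices flipping any single victim row: {worst}")
# With distinct per-device remaps this is 1 for almost all aggressors:
# each victim row's corruption is confined to a single device.
```

    Because each device exposes a different logical victim row for the same aggressor, one successful attack corrupts any given victim row's data on only one device, which is exactly the single-device failure pattern SDDC is designed to correct.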

    ๋ฉ”๋ชจ๋ฆฌ ๊ฐ€์ƒ ์ฑ„๋„์„ ํ†ตํ•œ ๋ผ์ŠคํŠธ ๋ ˆ๋ฒจ ์บ์‹œ ํŒŒํ‹ฐ์…”๋‹

    Ph.D. dissertation -- Department of Electrical and Computer Engineering, College of Engineering, Seoul National University, February 2023. Advisor: Jangwoo Kim.
    Ensuring fairness, or providing isolation, between multiple workloads with distinct characteristics collocated on a single shared-memory system is a challenge. Recent multicore processors offer hardware last-level cache (LLC) partitioning in support of isolation, with the partitioning often specified by the user. While more LLC capacity usually yields higher performance, in this dissertation we identify, through real-machine experiments, that a workload allocated more LLC capacity can instead suffer worse performance, a phenomenon we refer to as MiW (more is worse). Through various controlled experiments, we identify that the co-running workload with less LLC capacity incurs more frequent LLC misses; it stresses the main memory system shared by both workloads and degrades the performance of the former workload even though LLC partitioning is in place (a balloon effect). To resolve this problem, we propose virtualizing the data path of the main memory controllers and dedicating memory virtual channels (mVCs) to each group of applications, grouped as for LLC partitioning. mVCs can further fine-tune the performance of groups by differentiating buffer sizes among them. They can reduce total system cost by allowing latency-critical and throughput-oriented workloads to execute together on shared machines whose performance criteria could otherwise be met only on dedicated machines. Experiments on a simulated chip multiprocessor show that our proposals effectively eliminate the MiW phenomenon, hence providing additional opportunities for workload consolidation in a datacenter. Our case study demonstrates a potential 21.8% reduction in machine count with mVC in a scenario that would otherwise violate a service level objective (SLO).
    Table of contents:
    1. Introduction
      1.1 Research Contributions
      1.2 Outline
    2. Background
      2.1 Cache Hierarchy and Policies
      2.2 Cache Partitioning
      2.3 Benchmarks
        2.3.1 Working Set Size
        2.3.2 Top-down Analysis
        2.3.3 Profiling Tools
    3. More-is-Worse Phenomenon
      3.1 More LLC Leading to Performance Drop
      3.2 Synthetic Workload Evaluation
      3.3 Impact on Latency-critical Workloads
      3.4 Workload Analysis
      3.5 The Root Cause of the MiW Phenomenon
      3.6 Limitations of Existing Solutions
        3.6.1 Memory Bandwidth Throttling
        3.6.2 Fairness-aware Memory Scheduling
    4. Virtualizing Memory Channels
      4.1 Memory Virtual Channel (mVC)
      4.2 mVC Buffer Allocation Strategies
      4.3 Evaluation
        4.3.1 Experimental Setup
        4.3.2 Reproducing Hardware Results
        4.3.3 Mitigating MiW through mVC
        4.3.4 Evaluation on Four Groups
        4.3.5 Potentials for Operating Cost Savings with mVC
    5. Related Work
      5.1 Component-wise QoS/Fairness for Shared Resources
      5.2 Holistic Approaches to QoS/Fairness
      5.3 MiW on Recent Architectures
    6. Conclusion
      6.1 Discussion
      6.2 Future Work
    Bibliography
    Abstract (in Korean)