8 research outputs found

    Implicit Decomposition for Write-Efficient Connectivity Algorithms

    The future of main memory appears to lie in the direction of new technologies that provide strong capacity-to-performance ratios, but have write operations that are much more expensive than reads in terms of latency, bandwidth, and energy. Motivated by this trend, we propose sequential and parallel algorithms to solve graph connectivity problems using significantly fewer writes than conventional algorithms. Our primary algorithmic tool is the construction of an o(n)-sized "implicit decomposition" of a bounded-degree graph G on n nodes, which combined with read-only access to G enables fast answers to connectivity and biconnectivity queries on G. The construction breaks the linear-write "barrier", resulting in costs that are asymptotically lower than conventional algorithms while adding only a modest cost to querying time. For general non-sparse graphs on m edges, we also provide the first o(m) writes and O(m) operations parallel algorithms for connectivity and biconnectivity. These algorithms provide insight into how applications can efficiently process computations on large graphs in systems with read-write asymmetry.

    Improving Phase Change Memory Performance with Data Content Aware Access

    A prominent characteristic of write operations in Phase-Change Memory (PCM) is that their latency and energy are sensitive to the data to be written as well as the content that is overwritten. We observe that overwriting unknown memory content can incur significantly higher latency and energy compared to overwriting known all-zeros or all-ones content. This is because all-zeros or all-ones content is overwritten by programming the PCM cells only in one direction, i.e., using either SET or RESET operations, not both. In this paper, we propose data content aware PCM writes (DATACON), a new mechanism that reduces the latency and energy of PCM writes by redirecting these requests to overwrite memory locations containing all-zeros or all-ones. DATACON operates in three steps. First, it estimates how much a PCM write access would benefit from overwriting known content (e.g., all-zeros, or all-ones) by comprehensively considering the number of set bits in the data to be written, and the energy-latency trade-offs for SET and RESET operations in PCM. Second, it translates the write address to a physical address within memory that contains the best type of content to overwrite, and records this translation in a table for future accesses. We exploit data access locality in workloads to minimize the address translation overhead. Third, it re-initializes unused memory locations with known all-zeros or all-ones content in a manner that does not interfere with regular read and write accesses. DATACON overwrites unknown content only when it is absolutely necessary to do so. We evaluate DATACON with workloads from state-of-the-art machine learning applications, SPEC CPU2017, and NAS Parallel Benchmarks. Results demonstrate that DATACON significantly improves system performance and memory system energy consumption compared to the best of performance-oriented state-of-the-art techniques.
    Comment: 18 pages, 21 figures, accepted at ACM SIGPLAN International Symposium on Memory Management (ISMM
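    The first DATACON step described above (choosing which known content to overwrite based on the number of set bits) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the per-bit cost parameters `set_cost` and `reset_cost`, and the simple linear cost model are all assumptions made for illustration. The only facts taken from the abstract are that overwriting all-zeros needs only SET operations and overwriting all-ones needs only RESET operations.

```python
def count_set_bits(data: bytes) -> int:
    """Number of 1 bits in the data to be written."""
    return sum(bin(byte).count("1") for byte in data)


def choose_overwrite_target(data: bytes, set_cost: float, reset_cost: float) -> str:
    """Pick the known content that is cheaper to overwrite (hypothetical model).

    Assumption: cost grows linearly with the number of cells programmed.
    set_cost and reset_cost are hypothetical per-bit costs (latency, energy,
    or a blend of both); they are not values from the paper.
    """
    total_bits = 8 * len(data)
    ones = count_set_bits(data)
    # Over all-zeros content, only the 1 bits must be programmed (SET).
    cost_over_zeros = ones * set_cost
    # Over all-ones content, only the 0 bits must be programmed (RESET).
    cost_over_ones = (total_bits - ones) * reset_cost
    return "all-zeros" if cost_over_zeros <= cost_over_ones else "all-ones"
```

    With equal per-bit costs, sparse data such as `b"\x01"` maps to an all-zeros location and dense data such as `b"\xff"` maps to an all-ones location; unequal costs shift the crossover point.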

    Write Avoidance Techniques for Non-Volatile Memory-Based Last-Level Caches

    Doctoral thesis, Seoul National University Graduate School, Department of Computer Science and Engineering, February 2016. 신현식.
    Non-volatile memory (NVM) is considered to be a promising memory technology for last-level caches (LLC) due to its low leakage power and high storage density. However, NVM has some drawbacks, including high dynamic energy when modifying NVM cells, long latency for write operations, and limited write endurance. To overcome these problems, the thesis focuses on two approaches: cache coherence and an NVM capacity management policy for hybrid cache architecture (HCA). First, we review existing cache coherence protocols under the condition of NVM-based LLCs. Our analysis reveals that the LLCs perform unnecessary write operations because legacy protocols pay very little attention to reducing the number of write accesses to the LLC. Therefore, a write avoidance cache coherence protocol (WACC) is proposed to reduce the number of write operations to the LLC. In addition, novel HCA schemes are proposed to utilize SRAM efficiently. Previous studies on HCA have concentrated on detecting write-intensive blocks and placing them into the SRAM ways. In contrast, the proposed dynamic way adjusting algorithm (DWA) and linefill-aware cache partitioning (LCP) calculate the optimal numbers of NVM ways and SRAM ways to minimize NVM write counts, and assign the corresponding numbers of NVM ways and SRAM ways to cores. The simulation results show that WACC achieves a 13.2% reduction in dynamic energy consumption. For the HCA schemes, DWA and LCP reduce dynamic energy consumption by 26.9% and 37.2%, respectively.
    Contents: I. Introduction: 1.1 Purpose of the thesis; 1.2 Background; 1.3 Motivation; 1.4 Contributions; 1.5 Organization of the thesis. II. Related work: 2.1 Hybrid cache architecture (2.1.1 Write intensity prediction studies; 2.1.2 Static approaches; 2.1.3 Hybrid cache architecture for main memory); 2.2 Cache partitioning schemes. III. Write avoidance cache coherence protocol: 3.1 Limitation of existing cache coherence protocol; 3.2 Write avoidance cache coherence protocol. IV. NVM capacity management policy for hybrid cache architecture: 4.1 NVM capacity management policy (4.1.1 Concept of NVM capacity management policy; 4.1.2 Feasibility of NVM capacity management policy); 4.2 Dynamic way adjusting (4.2.1 Maximum stack distance; 4.2.2 Adjusting the number of NVM ways; 4.2.3 Algorithm of dynamic way adjusting); 4.3 Cache partitioning for hybrid cache architecture (4.3.1 Linefill-aware cache partitioning; 4.3.2 Metrics for cache partitioning; 4.3.3 Algorithm for cache partitioning); 4.4 Overhead of NVM capacity management policy. V. Experimental results: 5.1 Experimental environment; 5.2 Write access to NVM; 5.3 Dynamic energy consumption; 5.4 Lifetime; 5.5 Multi-core environment. VI. Conclusion: 6.1 Conclusion; 6.2 Future work. References. Abstract in Korean.

    Shared Resource Management for Non-Volatile Asymmetric Memory

    Non-volatile memory (NVM), such as Phase-Change Memory (PCM), is a promising energy-efficient candidate to replace DRAM. It is desirable because of its non-volatility, good scalability and low idle power. NVM, nevertheless, faces important challenges. The main problems are: writes are much slower and more power hungry than reads and write bandwidth is much lower than read bandwidth. Hybrid main memory architecture, which consists of a large NVM and a small DRAM, may become a solution for architecting NVM as main memory. Adding an extra layer of cache mitigates the drawbacks of NVM writes. However, writebacks from the last-level cache (LLC) might still (a) overwhelm the limited NVM write bandwidth and stall the application, (b) shorten lifetime and (c) increase energy consumption. Effectively utilizing shared resources, such as the last-level cache and the memory bandwidth, is crucial to achieving high performance for multi-core systems. No existing cache and bandwidth allocation scheme exploits the read/write asymmetry property, which is fundamental in NVM. This thesis tries to consider the asymmetry property in partitioning the cache and memory bandwidth for NVM systems. The thesis proposes three writeback-aware schemes to manage the resources in NVM systems. First, a runtime mechanism, Writeback-aware Cache Partitioning (WCP), is proposed to partition the shared LLC among multiple applications. Unlike past partitioning schemes, WCP considers the reduction in cache misses as well as writebacks. Second, a new runtime mechanism, Writeback-aware Bandwidth Partitioning (WBP), partitions NVM service cycles among applications. WBP uses a bandwidth partitioning weight to reflect the importance of writebacks (in addition to LLC misses) to bandwidth allocation. A companion Dynamic Weight Adjustment scheme dynamically selects the cache partitioning weight to maximize system performance. 
Third, Unified Writeback-aware Partitioning (UWP) partitions the last-level cache and the memory bandwidth cooperatively. UWP can further improve the system performance by considering the interaction of cache partitioning and bandwidth partitioning. The three proposed schemes improve system performance by considering the unique read/write asymmetry property of NVM.
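    The idea of weighting writebacks alongside LLC misses when dividing bandwidth can be sketched as follows. The abstract does not give the actual WBP formula, so the proportional-share rule, the function name `partition_bandwidth`, and all parameter names here are hypothetical; the sketch only shows how a single weight can raise the share of writeback-heavy applications.

```python
def partition_bandwidth(misses, writebacks, weight, total_cycles):
    """Split NVM service cycles among applications (hypothetical rule).

    Assumption: each application's demand is its LLC miss count plus
    `weight` times its writeback count, and cycles are shared in
    proportion to demand. This is an illustration, not the thesis's method.
    """
    demand = [m + weight * wb for m, wb in zip(misses, writebacks)]
    if not demand:
        return []
    total = sum(demand)
    if total == 0:
        # No traffic observed: fall back to an equal split.
        return [total_cycles / len(demand)] * len(demand)
    return [total_cycles * d / total for d in demand]
```

    For example, with two applications that have equal miss counts, a weight above zero gives more cycles to the one that issues writebacks, reflecting the higher cost of NVM writes.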

    Architectural Techniques for Multi-Level Cell Phase Change Memory Based Main Memory

    Phase change memory (PCM) has recently emerged as a promising technology to meet the fast-growing demand for large-capacity main memory in modern computing systems. Multi-level cell (MLC) PCM, which stores multiple bits in a single cell, offers high density with low per-byte fabrication cost. However, PCM suffers from long write latency, short cell endurance, limited write throughput and high peak power, which make it challenging to integrate into the memory hierarchy. To address the long write latency, I propose write truncation to reduce the number of write iterations with the assistance of an extra error correction code (ECC). I also propose form switch (FS) to reduce the storage overhead of the ECC. By storing highly compressible lines in single level cell (SLC) form, FS improves read latency as well. To attack the short cell endurance and large peak power, I propose elastic RESET (ER) to construct triple-level cell PCM. By reducing RESET energy, ER significantly reduces peak power and prolongs PCM lifetime. To improve the write concurrency, I propose fine-grained write power budgeting (FPB), which observes a global power budget and regulates power across write iterations according to the step-down power demand of each iteration. A global charge pump is also integrated onto a DIMM to boost power for hot PCM chips while staying within the global power budget. To further reduce the peak power, I propose intra-write RESET scheduling, which distributes cell RESET initializations across the whole write operation duration, so that the on-chip charge pump size can also be reduced.