75 research outputs found

    A Case for Self-Managing DRAM Chips: Improving Performance, Efficiency, Reliability, and Security via Autonomous in-DRAM Maintenance Operations

    Full text link
    The memory controller is in charge of managing DRAM maintenance operations (e.g., refresh, RowHammer protection, memory scrubbing) in current DRAM chips. Implementing new maintenance operations often necessitates modifications in the DRAM interface, memory controller, and potentially other system components. Such modifications are only possible with a new DRAM standard, which takes a long time to develop, leading to slow progress in DRAM systems. In this paper, our goal is to 1) ease, and thus accelerate, the process of enabling new DRAM maintenance operations and 2) enable more efficient in-DRAM maintenance operations. Our idea is to set the memory controller free from managing DRAM maintenance. To this end, we propose Self-Managing DRAM (SMD), a new low-cost DRAM architecture that enables implementing new in-DRAM maintenance mechanisms (or modifying old ones) with no further changes in the DRAM interface, memory controller, or other system components. We use SMD to implement new in-DRAM maintenance mechanisms for three use cases: 1) periodic refresh, 2) RowHammer protection, and 3) memory scrubbing. We show that SMD enables easy adoption of efficient maintenance mechanisms that significantly improve the system performance and energy efficiency while providing higher reliability compared to conventional DDR4 DRAM. A combination of SMD-based maintenance mechanisms that perform refresh, RowHammer protection, and memory scrubbing achieve 7.6% speedup and consume 5.2% less DRAM energy on average across 20 memory-intensive four-core workloads. We make SMD source code openly and freely available at [128]

    Re-designing Main Memory Subsystems with Emerging Monolithic 3D (M3D) Integration and Phase Change Memory Technologies

    Get PDF
    Over the past two decades, Dynamic Random-Access Memory (DRAM) has emerged as the dominant technology for implementing the main memory subsystems of all types of computing systems. However, inferring from several recent trends, computer architects in both the industry and academia have widely accepted that the density (memory capacity per chip area) and latency of DRAM based main memory subsystems cannot sufficiently scale in the future to meet the requirements of future data-centric workloads related to Artificial Intelligence (AI), Big Data, and Internet-of-Things (IoT). In fact, the achievable density and access latency in main memory subsystems presents a very fundamental trade-off. Pushing for a higher density inevitably increases access latency, and pushing for a reduced access latency often leads to a decreased density. This trade-off is so fundamental in DRAM based main memory subsystems that merely looking to re-architect DRAM subsystems cannot improve this trade-off, unless disruptive technological advancements are realized for implementing main memory subsystems. In this thesis, we focus on two key contributions to overcome the density (represented as the total chip area for the given capacity) and access latency related challenges in main memory subsystems. First, we show that the fundamental area-latency trade-offs in DRAM can be significantly improved by redesigning the DRAM cell-array structure using the emerging monolithic 3D (M3D) integration technology. A DRAM bank structure can be split across two or more M3D-integrated tiers on the same DRAM chip, to consequently be able to significantly reduce the total on-chip area occupancy of the DRAM bank and its access peripherals. This approach is fundamentally different from the well known approach of through-silicon vias (TSVs)-based 3D stacking of DRAM tiers. This is because the M3D integration based approach does not require a separate DRAM chip per tier, whereas the 3D-stacking based approach does. Our evaluation results for PARSEC benchmarks show that our designed M3D DRAM cellarray organizations can yield up to 9.56% less latency and up to 21.21% less energy-delay product (EDP), with up to 14% less DRAM die area, compared to the conventional 2D DDR4 DRAM. Second, we demonstrate a pathway for eliminating the write disturbance errors in single-level-cell PCM, thereby positioning the PCM technology, which has inherently more relaxed density and latency trade-off compared to DRAM, as a more viable option for replacing the DRAM technology. We introduce low-temperature partial-RESET operations for writing ‘0’s in PCM cells. Compared to traditional operations that write \u270\u27s in PCM cells, partial-RESET operations do not cause disturbance errors in neighboring cells during PCM writes. The overarching theme that connects the two individual contributions into this single thesis is the density versus latency argument. The existing PCM technology has 3 to 4× higher write latency compared to DRAM; nevertheless, the existing PCM technology can store 2 to 4 bits in a single cell compared to one bit per cell storage capacity of DRAM. Therefore, unlike DRAM, it becomes possible to increase the density of PCM without consequently increasing PCM latency. In other words, PCM exhibits inherently improved (more relaxed) density and latency trade-off. Thus, both of our contributions in this thesis, the first contribution of re-designing DRAM with M3D integration technology and the second contribution of making the PCM technology a more viable replacement of DRAM by eliminating the write disturbance errors in PCM, connect to the common overarching goal of improving the density and latency trade-off in main memory subsystems. In addition, we also discuss in this thesis possible future research directions that are aimed at extending the impacts of our proposed ideas so that they can transform the performance of main memory subsystems of the future

    Randomized Line-to-Row Mapping for Low-Overhead Rowhammer Mitigations

    Full text link
    Modern systems mitigate Rowhammer using victim refresh, which refreshes the two neighbours of an aggressor row when it encounters a specified number of activations. Unfortunately, complex attack patterns like Half-Double break victim-refresh, rendering current systems vulnerable. Instead, recently proposed secure Rowhammer mitigations rely on performing mitigative action on the aggressor rather than the victims. Such schemes employ mitigative actions such as row-migration or access-control and include AQUA, SRS, and Blockhammer. While these schemes incur only modest slowdowns at Rowhammer thresholds of few thousand, they incur prohibitive slowdowns (15%-600%) for lower thresholds that are likely in the near future. The goal of our paper is to make secure Rowhammer mitigations practical at such low thresholds. Our paper provides the key insights that benign application encounter thousands of hot rows (receiving more activations than the threshold) due to the memory mapping, which places spatially proximate lines in the same row to maximize row-buffer hitrate. Unfortunately, this causes row to receive activations for many frequently used lines. We propose Rubix, which breaks the spatial correlation in the line-to-row mapping by using an encrypted address to access the memory, reducing the likelihood of hot rows by 2 to 3 orders of magnitude. To aid row-buffer hits, Rubix randomizes a group of 1-4 lines. We also propose Rubix-D, which dynamically changes the line-to-row mapping. Rubix-D minimizes hot-rows and makes it much harder for an adversary to learn the spatial neighbourhood of a row. Rubix reduces the slowdown of AQUA (from 15% to 1%), SRS (from 60% to 2%), and Blockhammer (from 600% to 3%) while incurring a storage of less than 1 Kilobyte

    Doctor of Philosophy in Computing

    Get PDF
    dissertationThe demand for main memory capacity has been increasing for many years and will continue to do so. In the past, Dynamic Random Access Memory (DRAM) process scaling has enabled this increase in memory capacity. Along with continued DRAM scaling, the emergence of new technologies like 3D-stacking, buffered Dual Inline Memory Modules (DIMMs), and crosspoint nonvolatile memory promise to continue this trend in the years ahead. However, these technologies will bring with them their own gamut of problems. In this dissertation, I look at the problems facing these technologies from a current delivery perspective. 3D-stacking increases memory capacity available per package, but the increased current requirement means that more pins on the package have to be now dedicated to provide Vdd/Vss, hence increasing cost. At the system level, using buffered DIMMs to increase the number of DRAM ranks increases the peak current requirements of the system if all the DRAM chips in the system are Refreshed simultaneously. Crosspoint memories promise to greatly increase bit densities but have long read latencies because of sneak currents in the cross-bar. In this dissertation, I provide architectural solutions to each of these problems. We observe that smart data placement by the architecture and the Operating System (OS) is a vital ingredient in all of these solutions. We thereby mitigate major bottlenecks in these technologies, hence enabling higher memory densities

    RowPress: Amplifying Read Disturbance in Modern DRAM Chips

    Full text link
    Memory isolation is critical for system reliability, security, and safety. Unfortunately, read disturbance can break memory isolation in modern DRAM chips. For example, RowHammer is a well-studied read-disturb phenomenon where repeatedly opening and closing (i.e., hammering) a DRAM row many times causes bitflips in physically nearby rows. This paper experimentally demonstrates and analyzes another widespread read-disturb phenomenon, RowPress, in real DDR4 DRAM chips. RowPress breaks memory isolation by keeping a DRAM row open for a long period of time, which disturbs physically nearby rows enough to cause bitflips. We show that RowPress amplifies DRAM's vulnerability to read-disturb attacks by significantly reducing the number of row activations needed to induce a bitflip by one to two orders of magnitude under realistic conditions. In extreme cases, RowPress induces bitflips in a DRAM row when an adjacent row is activated only once. Our detailed characterization of 164 real DDR4 DRAM chips shows that RowPress 1) affects chips from all three major DRAM manufacturers, 2) gets worse as DRAM technology scales down to smaller node sizes, and 3) affects a different set of DRAM cells from RowHammer and behaves differently from RowHammer as temperature and access pattern changes. We demonstrate in a real DDR4-based system with RowHammer protection that 1) a user-level program induces bitflips by leveraging RowPress while conventional RowHammer cannot do so, and 2) a memory controller that adaptively keeps the DRAM row open for a longer period of time based on access pattern can facilitate RowPress-based attacks. To prevent bitflips due to RowPress, we describe and evaluate a new methodology that adapts existing RowHammer mitigation techniques to also mitigate RowPress with low additional performance overhead. We open source all our code and data to facilitate future research on RowPress.Comment: Extended version of the paper "RowPress: Amplifying Read Disturbance in Modern DRAM Chips" at the 50th Annual International Symposium on Computer Architecture (ISCA), 202

    Retrospective: RAIDR: Retention-Aware Intelligent DRAM Refresh

    Full text link
    Dynamic Random Access Memory (DRAM) is the prevalent memory technology used to build main memory systems of almost all computers. A fundamental shortcoming of DRAM is the need to refresh memory cells to keep stored data intact. DRAM refresh consumes energy and degrades performance. It is also a technology scaling challenge as its negative effects become worse as DRAM cell size reduces and DRAM chip capacity increases. Our ISCA 2012 paper, RAIDR, examines the DRAM refresh problem from a modern computing systems perspective, demonstrating its projected impact on systems with higher-capacity DRAM chips expected to be manufactured in the future. It proposes and evaluates a simple and low-cost solution that greatly reduces the performance & energy overheads of refresh by exploiting variation in data retention times across DRAM rows. The key idea is to group the DRAM rows into bins in terms of their minimum data retention times, store the bins in low-cost Bloom filters, and refresh rows in different bins at different rates. Evaluations in our paper (and later works) show that the idea greatly improves performance & energy efficiency and its benefits increase with DRAM chip capacity. The paper embodies an approach we have termed system-DRAM co-design. This short retrospective provides a brief analysis of our RAIDR paper and its impact. We briefly describe the mindset and circumstances that led to our focus on the DRAM refresh problem and RAIDR's development, discuss later works that provided improved analyses and solutions, and make some educated guesses on what the future may bring on the DRAM refresh problem (and more generally in DRAM technology scaling).Comment: Selected to the 50th Anniversary of ISCA (ACM/IEEE International Symposium on Computer Architecture), Commemorative Issue, 202
    • …
    corecore