75 research outputs found
A Case for Self-Managing DRAM Chips: Improving Performance, Efficiency, Reliability, and Security via Autonomous in-DRAM Maintenance Operations
The memory controller is in charge of managing DRAM maintenance operations
(e.g., refresh, RowHammer protection, memory scrubbing) in current DRAM chips.
Implementing new maintenance operations often necessitates modifications in the
DRAM interface, memory controller, and potentially other system components.
Such modifications are only possible with a new DRAM standard, which takes a
long time to develop, leading to slow progress in DRAM systems.
In this paper, our goal is to 1) ease, and thus accelerate, the process of
enabling new DRAM maintenance operations and 2) enable more efficient in-DRAM
maintenance operations. Our idea is to set the memory controller free from
managing DRAM maintenance. To this end, we propose Self-Managing DRAM (SMD), a
new low-cost DRAM architecture that enables implementing new in-DRAM
maintenance mechanisms (or modifying old ones) with no further changes in the
DRAM interface, memory controller, or other system components. We use SMD to
implement new in-DRAM maintenance mechanisms for three use cases: 1) periodic
refresh, 2) RowHammer protection, and 3) memory scrubbing. We show that SMD
enables easy adoption of efficient maintenance mechanisms that significantly
improve the system performance and energy efficiency while providing higher
reliability compared to conventional DDR4 DRAM. A combination of SMD-based
maintenance mechanisms that perform refresh, RowHammer protection, and memory
scrubbing achieve 7.6% speedup and consume 5.2% less DRAM energy on average
across 20 memory-intensive four-core workloads. We make SMD source code openly
and freely available at [128]
Re-designing Main Memory Subsystems with Emerging Monolithic 3D (M3D) Integration and Phase Change Memory Technologies
Over the past two decades, Dynamic Random-Access Memory (DRAM) has emerged as the dominant technology for implementing the main memory subsystems of all types of computing systems. However, inferring from several recent trends, computer architects in both the industry and academia have widely accepted that the density (memory capacity per chip area) and latency of DRAM based main memory subsystems cannot sufficiently scale in the future to meet the requirements of future data-centric workloads related to Artificial Intelligence (AI), Big Data, and Internet-of-Things (IoT). In fact, the achievable density and access latency in main memory subsystems presents a very fundamental trade-off. Pushing for a higher density inevitably increases access latency, and pushing for a reduced access latency often leads to a decreased density. This trade-off is so fundamental in DRAM based main memory subsystems that merely looking to re-architect DRAM subsystems cannot improve this trade-off, unless disruptive technological advancements are realized for implementing main memory subsystems.
In this thesis, we focus on two key contributions to overcome the density (represented as the total chip area for the given capacity) and access latency related challenges in main memory subsystems. First, we show that the fundamental area-latency trade-offs in DRAM can be significantly improved by redesigning the DRAM cell-array structure using the emerging monolithic 3D (M3D) integration technology. A DRAM bank structure can be split across two or more M3D-integrated tiers on the same DRAM chip, to consequently be able to significantly reduce the total on-chip area occupancy of the DRAM bank and its access peripherals. This approach is fundamentally different from the well known approach of through-silicon vias (TSVs)-based 3D stacking of DRAM tiers. This is because the M3D integration based approach does not require a separate DRAM chip per tier, whereas the 3D-stacking based approach does. Our evaluation results for PARSEC benchmarks show that our designed M3D DRAM cellarray organizations can yield up to 9.56% less latency and up to 21.21% less energy-delay product (EDP), with up to 14% less DRAM die area, compared to the conventional 2D DDR4 DRAM. Second, we demonstrate a pathway for eliminating the write disturbance errors in single-level-cell PCM, thereby positioning the PCM technology, which has inherently more relaxed density and latency trade-off compared to DRAM, as a more viable option for replacing the DRAM technology. We introduce low-temperature partial-RESET operations for writing ‘0’s in PCM cells. Compared to traditional operations that write \u270\u27s in PCM cells, partial-RESET operations do not cause disturbance errors in neighboring cells during PCM writes.
The overarching theme that connects the two individual contributions into this single thesis is the density versus latency argument. The existing PCM technology has 3 to 4× higher write latency compared to DRAM; nevertheless, the existing PCM technology can store 2 to 4 bits in a single cell compared to one bit per cell storage capacity of DRAM. Therefore, unlike DRAM, it becomes possible to increase the density of PCM without consequently increasing PCM latency. In other words, PCM exhibits inherently improved (more relaxed) density and latency trade-off. Thus, both of our contributions in this thesis, the first contribution of re-designing DRAM with M3D integration technology and the second contribution of making the PCM technology a more viable replacement of DRAM by eliminating the write disturbance errors in PCM, connect to the common overarching goal of improving the density and latency trade-off in main memory subsystems. In addition, we also discuss in this thesis possible future research directions that are aimed at extending the impacts of our proposed ideas so that they can transform the performance of main memory subsystems of the future
Randomized Line-to-Row Mapping for Low-Overhead Rowhammer Mitigations
Modern systems mitigate Rowhammer using victim refresh, which refreshes the
two neighbours of an aggressor row when it encounters a specified number of
activations. Unfortunately, complex attack patterns like Half-Double break
victim-refresh, rendering current systems vulnerable. Instead, recently
proposed secure Rowhammer mitigations rely on performing mitigative action on
the aggressor rather than the victims. Such schemes employ mitigative actions
such as row-migration or access-control and include AQUA, SRS, and Blockhammer.
While these schemes incur only modest slowdowns at Rowhammer thresholds of few
thousand, they incur prohibitive slowdowns (15%-600%) for lower thresholds that
are likely in the near future. The goal of our paper is to make secure
Rowhammer mitigations practical at such low thresholds.
Our paper provides the key insights that benign application encounter
thousands of hot rows (receiving more activations than the threshold) due to
the memory mapping, which places spatially proximate lines in the same row to
maximize row-buffer hitrate. Unfortunately, this causes row to receive
activations for many frequently used lines. We propose Rubix, which breaks the
spatial correlation in the line-to-row mapping by using an encrypted address to
access the memory, reducing the likelihood of hot rows by 2 to 3 orders of
magnitude. To aid row-buffer hits, Rubix randomizes a group of 1-4 lines. We
also propose Rubix-D, which dynamically changes the line-to-row mapping.
Rubix-D minimizes hot-rows and makes it much harder for an adversary to learn
the spatial neighbourhood of a row. Rubix reduces the slowdown of AQUA (from
15% to 1%), SRS (from 60% to 2%), and Blockhammer (from 600% to 3%) while
incurring a storage of less than 1 Kilobyte
Doctor of Philosophy in Computing
dissertationThe demand for main memory capacity has been increasing for many years and will continue to do so. In the past, Dynamic Random Access Memory (DRAM) process scaling has enabled this increase in memory capacity. Along with continued DRAM scaling, the emergence of new technologies like 3D-stacking, buffered Dual Inline Memory Modules (DIMMs), and crosspoint nonvolatile memory promise to continue this trend in the years ahead. However, these technologies will bring with them their own gamut of problems. In this dissertation, I look at the problems facing these technologies from a current delivery perspective. 3D-stacking increases memory capacity available per package, but the increased current requirement means that more pins on the package have to be now dedicated to provide Vdd/Vss, hence increasing cost. At the system level, using buffered DIMMs to increase the number of DRAM ranks increases the peak current requirements of the system if all the DRAM chips in the system are Refreshed simultaneously. Crosspoint memories promise to greatly increase bit densities but have long read latencies because of sneak currents in the cross-bar. In this dissertation, I provide architectural solutions to each of these problems. We observe that smart data placement by the architecture and the Operating System (OS) is a vital ingredient in all of these solutions. We thereby mitigate major bottlenecks in these technologies, hence enabling higher memory densities
RowPress: Amplifying Read Disturbance in Modern DRAM Chips
Memory isolation is critical for system reliability, security, and safety.
Unfortunately, read disturbance can break memory isolation in modern DRAM
chips. For example, RowHammer is a well-studied read-disturb phenomenon where
repeatedly opening and closing (i.e., hammering) a DRAM row many times causes
bitflips in physically nearby rows.
This paper experimentally demonstrates and analyzes another widespread
read-disturb phenomenon, RowPress, in real DDR4 DRAM chips. RowPress breaks
memory isolation by keeping a DRAM row open for a long period of time, which
disturbs physically nearby rows enough to cause bitflips. We show that RowPress
amplifies DRAM's vulnerability to read-disturb attacks by significantly
reducing the number of row activations needed to induce a bitflip by one to two
orders of magnitude under realistic conditions. In extreme cases, RowPress
induces bitflips in a DRAM row when an adjacent row is activated only once. Our
detailed characterization of 164 real DDR4 DRAM chips shows that RowPress 1)
affects chips from all three major DRAM manufacturers, 2) gets worse as DRAM
technology scales down to smaller node sizes, and 3) affects a different set of
DRAM cells from RowHammer and behaves differently from RowHammer as temperature
and access pattern changes.
We demonstrate in a real DDR4-based system with RowHammer protection that 1)
a user-level program induces bitflips by leveraging RowPress while conventional
RowHammer cannot do so, and 2) a memory controller that adaptively keeps the
DRAM row open for a longer period of time based on access pattern can
facilitate RowPress-based attacks. To prevent bitflips due to RowPress, we
describe and evaluate a new methodology that adapts existing RowHammer
mitigation techniques to also mitigate RowPress with low additional performance
overhead. We open source all our code and data to facilitate future research on
RowPress.Comment: Extended version of the paper "RowPress: Amplifying Read Disturbance
in Modern DRAM Chips" at the 50th Annual International Symposium on Computer
Architecture (ISCA), 202
Retrospective: RAIDR: Retention-Aware Intelligent DRAM Refresh
Dynamic Random Access Memory (DRAM) is the prevalent memory technology used
to build main memory systems of almost all computers. A fundamental shortcoming
of DRAM is the need to refresh memory cells to keep stored data intact. DRAM
refresh consumes energy and degrades performance. It is also a technology
scaling challenge as its negative effects become worse as DRAM cell size
reduces and DRAM chip capacity increases.
Our ISCA 2012 paper, RAIDR, examines the DRAM refresh problem from a modern
computing systems perspective, demonstrating its projected impact on systems
with higher-capacity DRAM chips expected to be manufactured in the future. It
proposes and evaluates a simple and low-cost solution that greatly reduces the
performance & energy overheads of refresh by exploiting variation in data
retention times across DRAM rows. The key idea is to group the DRAM rows into
bins in terms of their minimum data retention times, store the bins in low-cost
Bloom filters, and refresh rows in different bins at different rates.
Evaluations in our paper (and later works) show that the idea greatly improves
performance & energy efficiency and its benefits increase with DRAM chip
capacity. The paper embodies an approach we have termed system-DRAM co-design.
This short retrospective provides a brief analysis of our RAIDR paper and its
impact. We briefly describe the mindset and circumstances that led to our focus
on the DRAM refresh problem and RAIDR's development, discuss later works that
provided improved analyses and solutions, and make some educated guesses on
what the future may bring on the DRAM refresh problem (and more generally in
DRAM technology scaling).Comment: Selected to the 50th Anniversary of ISCA (ACM/IEEE International
Symposium on Computer Architecture), Commemorative Issue, 202
- …