Flexible-Latency DRAM: Understanding and Exploiting Latency Variation in Modern DRAM Chips
This article summarizes key results of our work on experimental
characterization and analysis of latency variation and latency-reliability
trade-offs in modern DRAM chips, which was published in SIGMETRICS 2016, and
examines the work's significance and future potential.
The goal of this work is to (i) experimentally characterize and understand
the latency variation across cells within a DRAM chip for the three
fundamental DRAM operations (activation, precharge, and restoration), and
(ii) develop new mechanisms that exploit our
understanding of the latency variation to reliably improve performance. To this
end, we comprehensively characterize 240 DRAM chips from three major vendors,
and make six major new observations about latency variation within DRAM.
Notably, we find that (i) there is large latency variation across the cells for
each of the three operations; (ii) variation characteristics exhibit
significant spatial locality: slower cells are clustered in certain regions of
a DRAM chip; and (iii) the three fundamental operations exhibit different
reliability characteristics when the latency of each operation is reduced.
Based on our observations, we propose Flexible-LatencY DRAM (FLY-DRAM), a
mechanism that exploits latency variation across DRAM cells within a DRAM chip
to improve system performance. The key idea of FLY-DRAM is to exploit the
spatial locality of slower cells within DRAM, and access the faster DRAM
regions with reduced latencies for the fundamental operations. Our evaluations
show that FLY-DRAM improves the performance of a wide range of applications by
13.3%, 17.6%, and 19.5%, on average, for each of the three different vendors'
real DRAM chips, in a simulated 8-core system.
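FLY-DRAM's key idea can be illustrated with a minimal sketch. This is not the authors' implementation; the region granularity, timing parameter names, and values below are hypothetical, chosen only to show how a memory controller could consult a profiled map of slow regions and apply reduced timings elsewhere.

```python
# Hypothetical sketch of FLY-DRAM's idea: because slow cells cluster
# spatially, a coarse per-region profile suffices to decide when reduced
# timing parameters are safe. All timing values here are illustrative.

STANDARD_TIMINGS = {"tRCD": 13, "tRP": 13}   # nanoseconds (illustrative)
REDUCED_TIMINGS = {"tRCD": 10, "tRP": 10}

class FlyDramController:
    def __init__(self, region_size, slow_regions):
        self.region_size = region_size
        self.slow_regions = set(slow_regions)  # from one-time profiling

    def timings_for(self, row_addr):
        region = row_addr // self.region_size
        # Profiled-slow regions keep standard timings; the rest go fast.
        if region in self.slow_regions:
            return STANDARD_TIMINGS
        return REDUCED_TIMINGS

ctrl = FlyDramController(region_size=512, slow_regions={3, 7})
assert ctrl.timings_for(row_addr=100) == REDUCED_TIMINGS   # region 0: fast
assert ctrl.timings_for(row_addr=1600) == STANDARD_TIMINGS  # region 3: slow
```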
Voltron: Understanding and Exploiting the Voltage-Latency-Reliability Trade-Offs in Modern DRAM Chips to Improve Energy Efficiency
This paper summarizes our work on experimental characterization and analysis
of reduced-voltage operation in modern DRAM chips, which was published in
SIGMETRICS 2017, and examines the work's significance and future potential.
We take a comprehensive approach to understanding and exploiting the latency
and reliability characteristics of modern DRAM when the DRAM supply voltage is
lowered below the nominal voltage level specified by DRAM standards. We perform
an experimental study of 124 real DDR3L (low-voltage) DRAM chips manufactured
recently by three major DRAM vendors. We find that reducing the supply voltage
below a certain point introduces bit errors in the data, and we comprehensively
characterize the behavior of these errors. We discover that these errors can be
avoided by increasing the latency of three major DRAM operations (activation,
restoration, and precharge). We perform detailed DRAM circuit simulations to
validate and explain our experimental findings. We also characterize the
various relationships between reduced supply voltage and error locations,
stored data patterns, DRAM temperature, and data retention.
Based on our observations, we propose a new DRAM energy reduction mechanism,
called Voltron. The key idea of Voltron is to use a performance model to
determine by how much we can reduce the supply voltage without introducing
errors and without exceeding a user-specified threshold for performance loss.
Our evaluations show that Voltron reduces the average DRAM and system energy
consumption by 10.5% and 7.3%, respectively, while limiting the average system
performance loss to only 1.8%, for a variety of memory-intensive quad-core
workloads. We also show that Voltron significantly outperforms prior dynamic
voltage and frequency scaling mechanisms for DRAM.
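Voltron's selection logic can be sketched abstractly. The sketch below is an assumption-laden illustration, not the paper's model: the voltages and predicted-loss numbers are made up, and the real mechanism predicts performance from application characteristics rather than taking a precomputed table.

```python
# Hypothetical sketch of Voltron's key idea: given predicted performance
# loss at each candidate supply voltage (lower voltage requires higher DRAM
# latency, hence some slowdown), pick the lowest voltage whose predicted
# loss stays within a user-specified bound. All numbers are illustrative.

def select_voltage(predicted_loss, max_loss):
    """predicted_loss: {voltage (V): predicted performance loss (fraction)}."""
    feasible = [v for v, loss in predicted_loss.items() if loss <= max_loss]
    # Lower voltage means lower energy, so take the minimum feasible voltage.
    return min(feasible)

model = {1.35: 0.000, 1.30: 0.005, 1.25: 0.015, 1.20: 0.035}
assert select_voltage(model, max_loss=0.02) == 1.25   # 1.20 V would lose 3.5%
assert select_voltage(model, max_loss=0.001) == 1.35  # only nominal is safe
```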
Adaptive-Latency DRAM: Reducing DRAM Latency by Exploiting Timing Margins
This paper summarizes the idea of Adaptive-Latency DRAM (AL-DRAM), which was
published in HPCA 2015, and examines the work's significance and future
potential. AL-DRAM is a mechanism that optimizes DRAM latency based on the DRAM
module and the operating temperature, by exploiting the extra margin that is
built into the DRAM timing parameters. DRAM manufacturers provide a large
margin for the timing parameters as a provision against two worst-case
scenarios. First, due to process variation, some outlier DRAM chips are much
slower than others. Second, chips become slower at higher temperatures. The
timing parameter margin ensures that the slow outlier chips operate reliably at
the worst-case temperature, and hence leads to a high access latency.
Using an FPGA-based DRAM testing platform, our work first characterizes the
extra margin for 115 DRAM modules from three major manufacturers. The
experimental results demonstrate that it is possible to reduce four of the most
critical timing parameters by a minimum/maximum of 17.3%/54.8% at 55°C while
maintaining reliable operation. AL-DRAM uses these observations to adaptively
select reliable DRAM timing parameters for each DRAM module based on the
module's current operating conditions. AL-DRAM does not require any changes to
the DRAM chip or its interface; it only requires multiple different timing
parameters to be specified and supported by the memory controller. Our real
system evaluations show that AL-DRAM improves the performance of
memory-intensive workloads by an average of 14% without introducing any errors.
Our characterization and proposed techniques have inspired several other works
on analyzing and/or exploiting different sources of latency and performance
variation within DRAM chips.
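AL-DRAM's adaptive selection can be sketched as a simple lookup. This is a hypothetical illustration, not the paper's mechanism as implemented: the table entries, temperature thresholds, and timing values below are invented for the example.

```python
# Hypothetical sketch of AL-DRAM's idea: the memory controller holds, per
# DRAM module, a table of timing parameters known reliable up to a given
# temperature, and picks the most aggressive entry the current operating
# temperature allows. Thresholds and timing values are illustrative.

def select_timings(table, temperature):
    """table: list of (max_temp, timings) pairs, sorted by max_temp ascending."""
    for max_temp, timings in table:
        if temperature <= max_temp:
            return timings
    # Hotter than every characterized range: use standard worst-case timings.
    return table[-1][1]

module_table = [
    (55, {"tRCD": 10, "tRAS": 28}),  # cool: aggressive timings are reliable
    (85, {"tRCD": 13, "tRAS": 35}),  # hot: fall back to standard timings
]
assert select_timings(module_table, temperature=45)["tRCD"] == 10
assert select_timings(module_table, temperature=70)["tRCD"] == 13
```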
Understanding Reduced-Voltage Operation in Modern DRAM Chips: Characterization, Analysis, and Mechanisms
The energy consumption of DRAM is a critical concern in modern computing
systems. Improvements in manufacturing process technology have allowed DRAM
vendors to lower the DRAM supply voltage conservatively, which reduces some of
the DRAM energy consumption. We would like to reduce the DRAM supply voltage
more aggressively, to further reduce energy. Aggressive supply voltage
reduction requires a thorough understanding of the effect voltage scaling has
on DRAM access latency and DRAM reliability.
In this paper, we take a comprehensive approach to understanding and
exploiting the latency and reliability characteristics of modern DRAM when the
supply voltage is lowered below the nominal voltage level specified by DRAM
standards. Using an FPGA-based testing platform, we perform an experimental
study of 124 real DDR3L (low-voltage) DRAM chips manufactured recently by three
major DRAM vendors. We find that reducing the supply voltage below a certain
point introduces bit errors in the data, and we comprehensively characterize
the behavior of these errors. We discover that these errors can be avoided by
increasing the latency of three major DRAM operations (activation, restoration,
and precharge). We perform detailed DRAM circuit simulations to validate and
explain our experimental findings. We also characterize the various
relationships between reduced supply voltage and error locations, stored data
patterns, DRAM temperature, and data retention.
Based on our observations, we propose a new DRAM energy reduction mechanism,
called Voltron. The key idea of Voltron is to use a performance model to
determine by how much we can reduce the supply voltage without introducing
errors and without exceeding a user-specified threshold for performance loss.
Voltron reduces the average system energy by 7.3% while limiting the average
system performance loss to only 1.8%, for a variety of workloads. Published
in the Proceedings of the ACM on Measurement and Analysis of Computing
Systems (POMACS).
RowHammer: A Retrospective
This retrospective paper describes the RowHammer problem in Dynamic Random
Access Memory (DRAM), which was initially introduced by Kim et al. at the ISCA
2014 conference. RowHammer is a prime (and perhaps
the first) example of how a circuit-level failure mechanism can cause a
practical and widespread system security vulnerability. It is the phenomenon
that repeatedly accessing a row in a modern DRAM chip causes bit flips in
physically-adjacent rows at consistently predictable bit locations. RowHammer
is caused by a hardware failure mechanism called DRAM disturbance errors,
which is a manifestation of circuit-level cell-to-cell interference in a scaled
memory technology.
Researchers from Google Project Zero demonstrated in 2015 that this hardware
failure mechanism can be effectively exploited by user-level programs to gain
kernel privileges on real systems. Many other follow-up works demonstrated
other practical attacks exploiting RowHammer. In this article, we
comprehensively survey the scientific literature on RowHammer-based attacks as
well as mitigation techniques to prevent RowHammer. We also discuss what other
related vulnerabilities may be lurking in DRAM and other types of memories,
e.g., NAND flash memory or Phase Change Memory, that can potentially threaten
the foundations of secure systems, as the memory technologies scale to higher
densities. We conclude by describing and advocating a principled approach to
memory reliability and security research that can enable us to better
anticipate and prevent such vulnerabilities. A version of this work appears
in the IEEE Transactions on Computer-Aided Design of Integrated Circuits and
Systems (TCAD) Special Issue on Top Picks in Hardware and Embedded Security,
2019.
Exploiting Row-Level Temporal Locality in DRAM to Reduce the Memory Access Latency
This paper summarizes the idea of ChargeCache, which was published in HPCA
2016 [51], and examines the work's significance and future potential. DRAM
latency continues to be a critical bottleneck for system performance. In this
work, we develop a low-cost mechanism, called ChargeCache, that enables faster
access to recently-accessed rows in DRAM, with no modifications to DRAM chips.
Our mechanism is based on the key observation that a recently-accessed row has
more charge and thus the following access to the same row can be performed
faster. To exploit this observation, we propose to track the addresses of
recently-accessed rows in a table in the memory controller. If a later DRAM
request hits in that table, the memory controller uses lower timing parameters,
leading to reduced DRAM latency. Row addresses are removed from the table after
a specified duration to ensure rows that have leaked too much charge are not
accessed with lower latency. We evaluate ChargeCache on a wide variety of
workloads and show that it provides significant performance and energy benefits
for both single-core and multi-core systems.
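The tracking mechanism described above can be sketched directly. This is a simplified model, not the HPCA 2016 implementation: the table here is an unbounded dict and the caching duration an arbitrary number, whereas the real design uses a small fixed-size hardware table and a duration tied to charge-leakage characteristics.

```python
# Sketch of ChargeCache's mechanism: track recently-accessed row addresses;
# a hit means the row was recently restored, still holds extra charge, and
# can be accessed with lowered timing parameters. Entries expire after a
# caching duration so rows that have leaked too much charge never get the
# fast path. Table capacity and duration are hypothetical parameters.

class ChargeCache:
    def __init__(self, caching_duration):
        self.caching_duration = caching_duration
        self.entries = {}  # row address -> time of last access

    def access(self, row, now):
        last = self.entries.get(row)
        hit = last is not None and now - last <= self.caching_duration
        self.entries[row] = now  # the row is freshly charged after this access
        return "lowered_timings" if hit else "standard_timings"

cc = ChargeCache(caching_duration=1000)
assert cc.access(row=42, now=0) == "standard_timings"    # first touch
assert cc.access(row=42, now=500) == "lowered_timings"   # recently charged
assert cc.access(row=42, now=2000) == "standard_timings" # entry expired
```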
LISA: Increasing Internal Connectivity in DRAM for Fast Data Movement and Low Latency
This paper summarizes the idea of Low-Cost Interlinked Subarrays (LISA),
which was published in HPCA 2016, and examines the work's significance and
future potential. Contemporary systems perform bulk data movement
inefficiently, by transferring data from DRAM to the processor, and then back
to DRAM, across a narrow off-chip channel. The use of this narrow channel
results in high latency and energy consumption. Prior work proposes to avoid
these high costs by exploiting the existing wide internal DRAM bandwidth for
bulk data movement, but the limited connectivity of wires within DRAM allows
fast data movement within only a single DRAM subarray. Each subarray is only a
few megabytes in size, greatly restricting the range over which fast bulk data
movement can happen within DRAM.
Our HPCA 2016 paper proposes a new DRAM substrate, Low-Cost Inter-Linked
Subarrays (LISA), whose goal is to enable fast and efficient data movement
across a large range of memory at low cost. LISA adds low-cost connections
between adjacent subarrays. By using these connections to interconnect the
existing internal wires (bitlines) of adjacent subarrays, LISA enables
wide-bandwidth data transfer across multiple subarrays with little (only 0.8%)
DRAM area overhead. As a DRAM substrate, LISA is versatile, enabling a variety
of new applications. We describe and evaluate three such applications in
detail: (1) fast inter-subarray bulk data copy, (2) in-DRAM caching using a
DRAM architecture whose rows have heterogeneous access latencies, and (3)
accelerated bitline precharging by linking multiple precharge units together.
Our extensive evaluations show that each of LISA's three applications
significantly improves performance and memory energy efficiency on a variety of
workloads and system configurations.
Heterogeneous-Reliability Memory: Exploiting Application-Level Memory Error Tolerance
This paper summarizes our work on characterizing application memory error
vulnerability to optimize datacenter cost via Heterogeneous-Reliability Memory
(HRM), which was published in DSN 2014, and examines the work's significance
and future potential. Memory devices represent a key component of datacenter
total cost of ownership (TCO), and techniques used to reduce errors that occur
on these devices increase this cost. Existing approaches to providing
reliability for memory devices pessimistically treat all data as equally
vulnerable to memory errors. Our key insight is that there exists a diverse
spectrum of tolerance to memory errors in new data-intensive applications, and
that traditional one-size-fits-all memory reliability techniques are
inefficient in terms of cost. This presents an opportunity to greatly reduce
server hardware cost by provisioning the right amount of memory reliability for
different applications.
Toward this end, in our DSN 2014 paper, we make three main contributions to
enable highly-reliable servers at low datacenter cost. First, we develop a new
methodology to quantify the tolerance of applications to memory errors. Second,
using our methodology, we perform a case study of three new data-intensive
workloads (an interactive web search application, an in-memory key--value
store, and a graph mining framework) to identify new insights into the nature
of application memory error vulnerability. Third, based on our insights, we
propose several new hardware/software heterogeneous-reliability memory system
designs to lower datacenter cost while achieving high reliability and discuss
their trade-offs. We show that our new techniques can reduce server hardware
cost by 4.7% while achieving 99.90% single server availability. This article
summarizes the DSN 2014 paper "Characterizing Application Memory Error
Vulnerability to Optimize Datacenter Cost via Heterogeneous-Reliability
Memory."
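The placement idea behind heterogeneous-reliability memory can be sketched minimally. This is an assumption-heavy illustration, not the paper's designs: the tier names and region labels are invented, and the real work derives tolerance from a vulnerability-characterization methodology rather than taking boolean labels as given.

```python
# Hypothetical sketch of heterogeneous-reliability placement: data an
# application can tolerate errors in goes to cheaper, less-reliable memory
# (e.g., without ECC), while vulnerable data stays in reliable, ECC-protected
# memory. Region names and tolerance labels here are illustrative.

def place(regions):
    """regions: {name: error_tolerant (bool)} -> {name: memory tier}."""
    return {name: ("non_ecc" if tolerant else "ecc")
            for name, tolerant in regions.items()}

placement = place({"query_cache": True, "index_metadata": False})
assert placement == {"query_cache": "non_ecc", "index_metadata": "ecc"}
```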
PreLatPUF: Exploiting DRAM Latency Variations for Generating Robust Device Signatures
Physically Unclonable Functions (PUFs) are potential security blocks to
generate unique and more secure keys in low-cost cryptographic applications.
Dynamic random-access memory (DRAM) has been proposed as one of the promising
candidates for generating robust keys. Unfortunately, existing techniques
for generating device signatures from DRAM are very slow, destructive (they
destroy the current data), and disruptive to system operation. In this paper, we
propose a precharge latency-based PUF (PreLatPUF) that exploits DRAM
precharge latency variations to generate signatures. The proposed
PreLatPUF is fast, robust, minimally disruptive, and non-destructive. Silicon
results from commercially available chips from different manufacturers show
that the proposed key generation technique is substantially faster than
existing approaches, while reliably reproducing the key under extreme
operating conditions.
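The signature-extraction idea can be sketched abstractly. This is not the PreLatPUF implementation: a fake read function stands in for real reduced-latency DRAM reads, and the marginal-cell positions are invented for the example.

```python
# Illustrative sketch of a precharge-latency PUF: reading a DRAM region with
# a deliberately reduced precharge latency makes a device-specific subset of
# marginal cells fail, and the set of failing bit positions serves as the
# device signature. `fake_read` below stands in for real hardware reads.

def signature(read_with_reduced_tRP, expected, addresses):
    """Return the set of addresses whose cells fail under reduced latency."""
    return {a for a in addresses if read_with_reduced_tRP(a) != expected}

# Fake device: cells 3 and 9 are marginal and flip under reduced latency.
def fake_read(addr, _marginal={3, 9}):
    return 0 if addr in _marginal else 1

sig = signature(fake_read, expected=1, addresses=range(16))
assert sig == {3, 9}
# The same marginal cells fail on repeated reads, so the key is reproducible.
assert signature(fake_read, expected=1, addresses=range(16)) == sig
```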
Reducing DRAM Refresh Overheads with Refresh-Access Parallelism
This article summarizes the idea of "refresh-access parallelism," which was
published in HPCA 2014, and examines the work's significance and future
potential. The overarching objective of our HPCA 2014 paper is to reduce the
significant negative performance impact of DRAM refresh with intelligent memory
controller mechanisms.
To mitigate the negative performance impact of DRAM refresh, our HPCA 2014
paper proposes two complementary mechanisms, DARP (Dynamic Access Refresh
Parallelization) and SARP (Subarray Access Refresh Parallelization). The goal
is to address the drawbacks of the state-of-the-art per-bank refresh mechanism by
building more efficient techniques to parallelize refreshes and accesses within
DRAM. First, instead of issuing per-bank refreshes in a round-robin order, as
is done today, DARP issues per-bank refreshes to idle banks in an
out-of-order manner. Furthermore, DARP proactively schedules refreshes during
intervals when a batch of writes are draining to DRAM. Second, SARP exploits
the existence of mostly-independent subarrays within a bank. With minor
modifications to DRAM organization, it allows a bank to serve memory accesses
to an idle subarray while another subarray is being refreshed. Our extensive
evaluations on a wide variety of workloads and systems show that our mechanisms
improve system performance (and energy efficiency) compared to three
state-of-the-art refresh policies, and their performance benefits increase as
DRAM density increases.
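DARP's out-of-order scheduling step can be sketched as follows. This is a simplified illustration, not the HPCA 2014 scheduler: the queue contents, bank count, and fallback policy below are assumptions made for the example.

```python
# Sketch of DARP's out-of-order per-bank refresh idea: instead of refreshing
# banks in round-robin order, the controller issues a pending refresh to a
# bank with no queued memory requests, so refreshes overlap with accesses to
# busy banks. Queue contents and the fallback policy are illustrative.

def pick_refresh_bank(pending_refreshes, request_queues):
    """Prefer a bank that needs refresh and is currently idle."""
    for bank in pending_refreshes:
        if not request_queues.get(bank):
            return bank
    # No idle candidate: fall back to the oldest pending refresh.
    return pending_refreshes[0]

pending = [0, 1, 2]                    # banks awaiting refresh, oldest first
queues = {0: ["rd"], 1: [], 2: ["wr"]}  # bank 1 is idle
assert pick_refresh_bank(pending, queues) == 1
assert pick_refresh_bank([0, 2], queues) == 0  # none idle: oldest (bank 0)
```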