Voltron: Understanding and Exploiting the Voltage-Latency-Reliability
  Trade-Offs in Modern DRAM Chips to Improve Energy Efficiency by Chang, Kevin K. et al.
Kevin K. Chang1,2 Abdullah Giray Yağlıkçı2 Saugata Ghose2 Aditya Agrawal3
Niladrish Chatterjee3 Abhijith Kashyap4,2 Donghyuk Lee3
Mike O’Connor3,5 Hasan Hassan6 Onur Mutlu6
1Facebook 2Carnegie Mellon University 3NVIDIA Research
4NVIDIA 5The University of Texas at Austin 6ETH Zürich
This paper summarizes our work on experimental characterization
and analysis of reduced-voltage operation in modern
DRAM chips, which was published in SIGMETRICS 2017 [29],
and examines the work’s significance and future potential. This
work is motivated by the need to reduce the energy consumption
of DRAM, which is a critical concern in modern computing
systems. Improvements in manufacturing process technology
have allowed DRAM vendors to lower the DRAM supply voltage
conservatively, which reduces some of the DRAM energy
consumption. We would like to reduce the DRAM supply voltage
more aggressively, to further reduce energy. Aggressive supply
voltage reduction requires a thorough understanding of the
effect voltage scaling has on DRAM access latency and DRAM
reliability.
We take a comprehensive approach to understanding and
exploiting the latency and reliability characteristics of modern
DRAM when the supply voltage is lowered below the nominal
voltage level specified by DRAM standards. Using an
open-source FPGA-based testing platform based on SoftMC [54],
we perform an experimental study of 124 real DDR3L
(low-voltage) DRAM chips manufactured recently by three major
DRAM vendors. We find that reducing the supply voltage below
a certain point introduces bit errors in the data, and we
comprehensively characterize the behavior of these errors. We
discover that these errors can be avoided by increasing the
latency of three major DRAM operations (activation, restoration,
and precharge). We perform detailed DRAM circuit simulations
to validate and explain our experimental findings. We
also characterize the various relationships between reduced
supply voltage and error locations, stored data patterns, DRAM
temperature, and data retention.
Based on our observations, we propose a new DRAM energy
reduction mechanism, called Voltron. The key idea of Voltron
is to use a performance model to determine by how much we
can reduce the supply voltage without introducing errors and
without exceeding a user-specified threshold for performance
loss. Our evaluations show that Voltron reduces the average
DRAM and system energy consumption by 10.5% and 7.3%,
respectively, while limiting the average system performance loss
to only 1.8%, for a variety of memory-intensive quad-core
workloads. We also show that Voltron significantly outperforms prior
dynamic voltage and frequency scaling mechanisms for DRAM.
We believe our experimental characterization and findings can
pave the way for new mechanisms that exploit DRAM voltage
to improve power, performance, energy, and reliability.
1. Motivation
In a wide range of modern computing systems, spanning
from warehouse-scale data centers to mobile platforms,
energy consumption is a first-order concern [39, 56, 65, 105, 107].
In these systems, the energy consumed by the DRAM-based
main memory system constitutes a significant fraction of the
total energy. For example, experimental studies of production
systems have shown that DRAM consumes 40% of the
total energy in servers [56, 140] and 40% of the total power in
graphics cards [115].
Improvements in manufacturing process technology have
allowed DRAM vendors to lower the DRAM supply voltage
conservatively, which reduces some of the DRAM energy
consumption [59, 60, 61]. In this work, we would like to re-
duce DRAM energy by further reducing DRAM supply voltage.
Vendors choose a conservatively high supply voltage, to pro-
vide a guardband that allows DRAM chips with worst-case
process variation to operate without errors under the
worst-case operating conditions [36]. The exact amount of supply
voltage guardband varies across chips, and lowering the
voltage below the guardband can result in erroneous or even
undefined behavior [29]. Therefore, we need to understand
how DRAM chips behave during reduced-voltage operation.
To our knowledge, no previously published work examines
the effect of using a wide range of different supply voltage
values on the reliability, latency, and retention characteristics
of DRAM chips.
Our goal in our SIGMETRICS 2017 paper [29] is to (i) char-
acterize and understand the relationship between supply volt-
age reduction and various characteristics of DRAM, including
DRAM reliability, latency, and data retention; and (ii) use the
insights derived from this characterization and understanding
to design a new mechanism that can aggressively lower the
supply voltage to reduce DRAM energy consumption while
keeping performance loss under a bound.
arXiv:1805.03175v1 [cs.AR] 8 May 2018
To this end, we build an FPGA-based testing platform based
on SoftMC [54] that allows us to tune the DRAM supply volt-
age and change DRAM timing parameters (i.e., the amount
of time the memory controller waits for a DRAM operation
to complete). We perform an experimental study on 124
real 4Gb DDR3L (low-voltage) DRAM chips manufactured
recently (between 2014 and 2016) by three major DRAM ven-
dors. Our extensive experimental characterization yields four
major observations on how DRAM latency, reliability, and
data retention are aected by reduced voltage.
Based on our experimental observations, we propose a new
low-cost DRAM energy reduction mechanism called Voltron.
The key idea of Voltron is to use a performance model to de-
termine by how much we can reduce the DRAM array voltage
at runtime without introducing errors and without exceeding
a user-specied threshold for acceptable performance loss.
2. Characterization of DRAM Under
Reduced Supply Voltage
In this section, we briey summarize our four major ob-
servations from our detailed experimental characterization
of 31 commodity DRAM modules, also called DIMMs, from
three vendors, when the DIMMs operate under reduced sup-
ply voltage (i.e., below the nominal voltage level of 1.35V).
Each DIMM comprises 4 DDR3L DRAM chips, totaling
124 chips for 31 DIMMs. Each chip has a 4Gb density. Thus,
each of our DIMMs has a 2GB capacity. Table 1 describes
the relevant information about the tested DIMMs. For a com-
plete discussion on all of our observations and experimental
methodology, we refer the reader to our SIGMETRICS 2017
paper [29].
Vendor          Total Number   Timing (ns)        Assembly
                of Chips       (tRCD/tRP/tRAS)    Year
A (10 DIMMs)    40             13.75/13.75/35     2015-16
B (12 DIMMs)    48             13.75/13.75/35     2014-15
C (9 DIMMs)     36             13.75/13.75/35     2015
Table 1: Main properties of the tested DIMMs. Reproduced
from [29].
2.1. DRAM Reliability as Voltage Decreases
We rst study the reliability of DRAM chips under low volt-
age, which was not studied by prior works on DRAM voltage
scaling (e.g., [36]; see Section 4 for a detailed discussion of
these works). Figure 1 shows the fraction of cache lines that
experience at least 1 bit of error (i.e., 1 bit ip) in each DIMM
(represented by each curve), categorized based on vendor.
We observe that we can reliably access data when DRAM
supply voltage is lowered below the nominal voltage level,
until a certain voltage value, Vmin, which is the minimum volt-
age level at which no bit errors occur. Furthermore, we nd
that we can reduce the voltage below Vmin to attain further
energy savings, but that errors start occurring in some of the
data read from memory. However, not all cache lines exhibit
1.025 1.05 1.075 1.1 1.125 1.15 1.175 1.2 1.25 1.3 1.35
Supply Voltage (V)
0
10−6
10−5
10−4
10−3
10−2
10−1
100
101
102
Fr
ac
tio
n o
f C
ac
he
 Li
ne
s
wi
th 
Er
ro
rs 
(%
)
Vendor A Vendor B Vendor C
Figure 1: The fraction of erroneous cache lines in eachDIMM
as we reduce the supply voltage, with a xed latency. Repro-
duced from [29].
errors for all supply voltage values below Vmin. Instead, the
number of erroneous cache lines for each DIMM increases
as we reduce the voltage further below Vmin. Specically,
Vendor A’s DIMMs experience a near-exponential increase
in errors as the supply voltage reduces below Vmin. This is
mainly due to themanufacturing process [90] and architectural
variation [87], which introduces strength and size variation
across the dierent DRAM cells within a chip.
We make two major conclusions: (i) the variation of errors
due to reduced-voltage operation across vendors is very
significant; and (ii) in most cases, there is a significant margin
in the voltage specification, i.e., Vmin for each chip is
significantly lower than the manufacturer-specified supply voltage
value.
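As a minimal sketch of how Vmin can be read off a voltage sweep like the one plotted in Figure 1, the Python snippet below walks down from the highest tested voltage and stops at the first level that shows any errors. The function name `find_vmin` and the sample numbers are hypothetical and are not measured data from [29].

```python
from typing import Dict, Optional


def find_vmin(error_fraction_by_voltage: Dict[float, float]) -> Optional[float]:
    """Return the lowest voltage reachable from the highest tested voltage
    without passing through any level that shows bit errors.

    Returns None if even the highest tested voltage shows errors.
    """
    vmin = None
    # Walk downward from the highest tested voltage.
    for voltage in sorted(error_fraction_by_voltage, reverse=True):
        if error_fraction_by_voltage[voltage] > 0.0:
            break  # first erroneous level: every lower level is treated as unsafe
        vmin = voltage
    return vmin


# Hypothetical sweep for one DIMM: voltage (V) -> fraction of erroneous cache lines.
sweep = {1.35: 0.0, 1.30: 0.0, 1.25: 0.0, 1.20: 0.0, 1.15: 1e-6, 1.10: 3e-4}
print(find_vmin(sweep))  # 1.2
```

Stopping at the first erroneous level, rather than taking the lowest error-free level anywhere in the sweep, is a conservative choice for this sketch.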
2.2. Longer Access Latency Mitigates
Voltage-Induced Errors
We observe that while reducing the voltage below Vmin
introduces bit errors in the data, we can prevent these errors
if we increase the timing parameters of three major DRAM
operations, i.e., activation, restoration, and precharge [27, 29,
55, 87, 90].1 When the supply voltage is reduced, the DRAM
cell capacitor charge takes a longer time to change, thereby
causing these DRAM operations to become slower to com-
plete. Errors are introduced into the data when the memory
controller does not account for this slowdown in the DRAM
operations. We nd that if the memory controller allocates
extra time for these operations to nish when the supply
voltage is below Vmin, errors no longer occur. We validate,
analyze, and explain this behavior using SPICE simulation
of a detailed circuit-level model, which we have openly re-
leased online [124]. Sections 4.1 and 4.2 of our SIGMETRICS
2017 paper [29] provide our extensive circuit-level analyses,
validated using data from real DRAM chips.
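To illustrate the idea, the following Python sketch picks, for a given supply voltage, the smallest tested latency at which no errors were observed. It is a hedged example only: the data layout, the function name `min_reliable_latency`, and every number in `measurements` are assumptions made for illustration, not results from [29].

```python
from typing import Dict, Optional, Tuple

# (supply voltage in V, tRCD/tRP latency in ns) -> number of observed bit errors.
# Every value below is invented for illustration.
ErrorCounts = Dict[Tuple[float, float], int]


def min_reliable_latency(errors: ErrorCounts, voltage: float) -> Optional[float]:
    """Smallest tested latency that showed zero errors at `voltage`, or None."""
    error_free = [
        latency
        for (v, latency), count in errors.items()
        if v == voltage and count == 0
    ]
    return min(error_free) if error_free else None


measurements: ErrorCounts = {
    (1.35, 10.0): 0,
    (1.15, 10.0): 42, (1.15, 12.5): 0,
    (1.05, 10.0): 900, (1.05, 12.5): 17, (1.05, 15.0): 0,
}

for v in (1.35, 1.15, 1.05):
    print(v, min_reliable_latency(measurements, v))
```

A memory controller could store such a table and program the longer timing parameters whenever it lowers the supply voltage.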
2.3. Spatial Locality of Errors
While reducing the supply voltage induces errors when
the DRAM latency is not long enough, we also show that not
all DRAM locations experience errors at all supply voltage
levels. To understand the locality of the errors induced by a
low supply voltage, we show the probability of each DRAM
row in a DIMM experiencing at least one bit of error across
all experiments.
1We refer the reader to our prior works [26, 27, 28, 29, 54, 55, 72, 75, 77,
78, 79, 80, 87, 88, 90, 91, 92, 96, 97, 112, 128, 129] for a detailed background on
DRAM.
Figure 2 shows the probability of each row experiencing
at least a one-bit error due to reduced voltage in the two
representative DIMMs. For each DIMM, we choose the supply
voltage at which errors start appearing (i.e., the voltage level
one step below Vmin), and we do not increase the DRAM
access latency (i.e., keep it at 10ns for both tRCD and tRP,
which are the activation and precharge timing parameters,
respectively). The x-axis and y-axis indicate the bank number
and row number (in thousands), respectively. Our tested
DIMMs are divided into eight banks, and each bank consists
of 32K rows of cells. Additional results showing the error
locations at dierent voltage levels are in our SIGMETRICS
2017 paper [29].
(a) DIMM B6 of vendor B at 1.05V.
(b) DIMM C2 of vendor C at 1.20V.
Figure 2: The probability of error occurrence for two representative
DIMMs, categorized into different rows and banks,
due to reduced voltage. Reproduced from [29].
The major observation is that when only a small number
of errors occur due to reduced supply voltage, these errors
tend to cluster physically in certain regions of a DRAM chip,
as opposed to being randomly distributed throughout the
chip.2 This observation implies that when we reduce the
supply voltage to the DRAM array, we need to increase the
fundamental operation latencies for only the regions where
errors can occur.
2We believe this observation is due to both process and architectural
variation across dierent regions in the DRAM chip.
2.4. Impact on Refresh Rate
Commodity DRAM chips guarantee that all cells can safely
retain data for 64ms, after which the cells are refreshed to
replenish charge that leaks out of the capacitors [26, 96, 97].
We observe that the eect of the supply voltage on retention
times is not statistically signicant. Even when we reduce the
supply voltage from 1.35V to 1.15V (i.e., a 15% reduction), the
rate at which charge leaks from the capacitors is so slow that
no data is lost during the 64ms refresh interval at both 20℃
and 70℃. Therefore, we conclude that using a reduced supply
voltage does not require any changes to the standard refresh
interval at 20℃ and 70℃. Detailed results are in Section 4.6
of our SIGMETRICS 2017 paper [29].
2.5. Other Experimental Observations
We refer the reader to our SIGMETRICS 2017 paper [29]
for more details on the other two key observations. First, we
find that the most commonly-used ECC scheme, SECDED [66,
99, 132], is unlikely to alleviate errors induced by a low sup-
ply voltage. This is because lowering voltage increases the
fraction of data that contains more than two bits of errors,
exceeding the one-bit correction capability of SECDED (see
Section 4.4 of our SIGMETRICS 2017 paper [29]). Second,
temperature aects the reliable access latency at low sup-
ply voltage levels and the eect is very vendor-dependent
(see Section 4.5 of our SIGMETRICS 2017 paper [29]). Out of
the three major vendors whose DIMMs we evaluate, DIMMs
from two vendors require longer activation and precharge
latencies to operate reliably at high temperature under low
supply voltage. The main reason is that DRAM chips become
slower at higher temperature [24, 87, 90].
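The SECDED limitation in the first observation can be made concrete with a short Python sketch. SECDED guarantees correction of one erroneous bit and detection of two erroneous bits per ECC word; with three or more, neither is guaranteed. The per-word error counts below are made up for illustration and are not data from [29].

```python
from typing import Iterable


def secded_outcome(bit_errors: int) -> str:
    """Classify what SECDED guarantees for one ECC word with `bit_errors` flipped bits."""
    if bit_errors < 0:
        raise ValueError("bit_errors must be non-negative")
    if bit_errors == 0:
        return "no error"
    if bit_errors == 1:
        return "corrected"
    if bit_errors == 2:
        return "detected, not corrected"
    return "beyond SECDED guarantees"


def fraction_beyond_correction(errors_per_word: Iterable[int]) -> float:
    """Fraction of ECC words with more than one bit error (not correctable)."""
    counts = list(errors_per_word)
    if not counts:
        return 0.0
    return sum(1 for n in counts if n > 1) / len(counts)


# Hypothetical per-word error counts at a low supply voltage.
observed = [0, 0, 1, 3, 2, 0, 4, 1]
print([secded_outcome(n) for n in observed])
print(fraction_beyond_correction(observed))  # 0.375
```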
3. Exploiting Reduced-Voltage Behavior
Based on the extensive understanding we have developed
on reduced-voltage operation of real DRAM chips, we pro-
pose a new mechanism called Voltron, which reduces DRAM
energy without sacricing memory throughput. Voltron ex-
ploits the fundamental observation that reducing the supply
voltage to DRAM requires increasing the latency of the three
DRAM operations in order to prevent errors. Using this ob-
servation, the key idea of Voltron is to use a performance
model to determine by how much to reduce the DRAM supply
voltage, without introducing errors and without exceeding a
user-specied threshold for performance loss. Voltron con-
sists of two main components: (i) array voltage scaling and
(ii) performance-aware voltage control.
3.1. Components of Voltron
Array Voltage Scaling. Unlike prior works, Voltron does
not reduce the voltage of the peripheral circuitry, which is
responsible for transferring commands and data between
the memory controller and the DRAM chip. If Voltron were
to reduce the voltage of the peripheral circuitry, we would
have to also reduce the operating frequency of DRAM. A re-
duction in the operating frequency reduces the memory data
throughput, which can signicantly degrade the performance
of applications that require high memory bandwidth. Instead,
Voltron reduces the voltage supplied to only the DRAM ar-
ray without changing the voltage supplied to the peripheral
circuitry, thereby allowing the DRAM channel to maintain
a high frequency while reducing the power consumption
of the DRAM array. To prevent errors from occurring dur-
ing reduced-voltage operation, Voltron increases the latency
of the three DRAM operations (activation, restoration, and
precharge) based on our observation in Section 2.2.
Performance-Aware Voltage Control. Array voltage scaling
provides system users with the ability to decrease DRAM
array voltage (Varray) to reduce DRAM power. Employing a
lower Varray provides greater power savings, but at the cost
of longer DRAM access latency, which leads to larger
performance degradation. This trade-off varies widely across
different applications, as each application has a different tolerance
to the increased memory latency. This raises the question of
how a system user or designer should pick a “suitable” array
voltage level for different applications. For our evaluations, we
say that an array voltage level is suitable if it does not degrade
system performance by more than a user-specified threshold.
Our goal is to provide a simple technique that can automatically
select a suitable Varray value for different applications.
To this end, we propose performance-aware voltage control, a
power–performance management policy that selects the minimum
Varray that satisfies a desired performance constraint.
The key observation is that an application’s performance loss
(due to increased memory latency) scales linearly with the
application’s memory demand (e.g., memory intensity). Based
on this empirical observation, we build a performance loss
predictor that leverages a linear model to predict
an application’s performance loss based on its characteristics
and the effect of different voltage level choices at runtime.
Using the performance loss predictor, Voltron finds a value
of Varray that can keep the predicted performance loss within the
user-specified target at runtime. We refer the reader to
Section 5.2 of our SIGMETRICS 2017 paper [29] for more detail
and for an evaluation of the performance model alone.
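The policy described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the predictor from [29]: we assume one linear model in MPKI per candidate voltage level, and the class `LevelModel`, the function `select_varray`, the candidate voltages, and all coefficients are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence

NOMINAL_V = 1.35  # nominal DDR3L supply voltage


@dataclass(frozen=True)
class LevelModel:
    """Hypothetical linear performance-loss model for one candidate Varray."""
    varray: float     # candidate array voltage (V)
    intercept: float  # predicted loss (%) at MPKI = 0
    slope: float      # additional predicted loss (%) per unit of MPKI

    def predicted_loss_pct(self, mpki: float) -> float:
        return self.intercept + self.slope * mpki


def select_varray(models: Sequence[LevelModel], mpki: float,
                  target_loss_pct: float) -> float:
    """Pick the minimum Varray whose predicted loss stays within the target.

    Falls back to the nominal voltage if no candidate satisfies the target.
    """
    ok = [m.varray for m in models if m.predicted_loss_pct(mpki) <= target_loss_pct]
    return min(ok, default=NOMINAL_V)


# Made-up coefficients: lower voltages add more latency, hence steeper slopes.
models = [
    LevelModel(1.30, 0.0, 0.02),
    LevelModel(1.20, 0.2, 0.08),
    LevelModel(1.10, 0.5, 0.20),
    LevelModel(0.90, 1.0, 0.45),
]

print(select_varray(models, mpki=20.0, target_loss_pct=5.0))  # 1.1
print(select_varray(models, mpki=2.0, target_loss_pct=5.0))   # 0.9
```

With these example coefficients, a memory-intensive application (MPKI of 20) is held at a higher Varray than a less intensive one (MPKI of 2), because its predicted loss grows faster as the voltage drops.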
3.2. Evaluation
We evaluate the system-level energy and performance im-
pact of Voltron using Ramulator [75, 124], integrated with
McPAT [93] and DRAMPower [25] for modeling the energy
consumption of both the processor and DRAM. Our work-
loads consist of 27 benchmarks from SPEC CPU2006 [134]
and YCSB [34]. We evaluate Voltron with a target perfor-
mance loss of 5%. Voltron executes the performance-aware
voltage control mechanism once every four million cycles.
We refer the reader to Section 6.1 of our SIGMETRICS 2017 paper [29]
for more detail on the system configuration and workloads.
We qualitatively and quantitatively compare Voltron to
MemDVFS, a dynamic DRAM frequency and voltage scaling
mechanism proposed by prior work [36].
Figure 3 shows the system energy savings and the system
performance (i.e., weighted speedup [43, 131]) loss due to
MemDVFS and Voltron, compared to a baseline DRAM with
a supply voltage of 1.35V. The graph uses box plots to show
the distribution among all workloads that are categorized
as either non-memory-intensive or memory-intensive. The
memory intensity is determined based on the commonly-used
metric MPKI (last-level cache misses per kilo-instruction). We
categorize an application as memory intensive when its MPKI
is greater than or equal to 15. We make two observations.
[Figure 3 plots omitted. Box plots of System Energy Savings (%)
and System Performance Loss (%) for MemDVFS and Voltron,
grouped into Non-Intensive and Intensive workloads.]
Figure 3: Energy (left) and performance (right) comparison
between Voltron and MemDVFS on non-memory-intensive
and memory-intensive workloads. Adapted from [29].
First, Voltron is eective and saves more energy than
MemDVFS. MemDVFS has almost zero eect on memory-
intensive workloads. This is because MemDVFS avoids scal-
ing DRAM frequency (and hence voltage) when an applica-
tion’s memory bandwidth utilization is above a xed thresh-
old. Reducing the frequency can result in a large performance
loss since the memory-intensive workloads require high mem-
ory throughput. As memory-intensive applications have high
memory bandwidth consumption that easily exceeds the xed
threshold used by MemDVFS, MemDVFS cannot perform fre-
quency and voltage scaling during most of the execution time.
In contrast, Voltron reduces system energy by 7.0% on aver-
age for memory-intensive workloads. Thus, we demonstrate
that Voltron is an eective mechanism that improves system
energy eciency not only on non-memory-intensive appli-
cations, but also (especially) on memory-intensive workloads
where prior work was unable to do so.
Second, as shown in Figure 3 (right), Voltron consistently
selects aVarray value that satises the performance loss bound
of 5% across all workloads. Voltron incurs an average (max-
imum) performance loss of 2.5% (4.4%) and 2.9% (4.1%) for
non-memory-intensive and memory-intensive workloads, re-
spectively. This demonstrates that our performance model
enables Voltron to select a low voltage value that saves en-
ergy while bounding performance loss based on the user’s
requirement.
Our SIGMETRICS 2017 paper contains extensive performance
and energy analysis of the Voltron mechanism in
Sections 6.2 to 6.8 [29]. In particular, we show that if we exploit
spatial locality of errors (Section 2.3), we can improve the
performance benefits of Voltron, reducing the average
performance loss for memory-intensive workloads to 1.8% (see
Section 6.5 of our SIGMETRICS 2017 paper [29]). We refer the
reader to these sections for a detailed evaluation of Voltron.
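For readers unfamiliar with the metrics used in this section, the Python sketch below shows how MPKI, the memory-intensity classification, and weighted speedup [43, 131] are commonly computed. The function names are ours, and expressing performance loss relative to the baseline's weighted speedup is an assumption of this sketch; the input numbers are invented.

```python
from typing import Sequence


def mpki(llc_misses: int, instructions: int) -> float:
    """Last-level cache misses per kilo-instruction."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return llc_misses / (instructions / 1000.0)


def is_memory_intensive(mpki_value: float, threshold: float = 15.0) -> bool:
    """An application is memory intensive when its MPKI is >= 15."""
    return mpki_value >= threshold


def weighted_speedup(ipc_shared: Sequence[float], ipc_alone: Sequence[float]) -> float:
    """Sum over applications of IPC when sharing the system / IPC when running alone."""
    if len(ipc_shared) != len(ipc_alone):
        raise ValueError("one IPC pair is required per application")
    return sum(shared / alone for shared, alone in zip(ipc_shared, ipc_alone))


def performance_loss_pct(ws_baseline: float, ws_mechanism: float) -> float:
    """Relative weighted-speedup loss of a mechanism versus the baseline, in percent."""
    return 100.0 * (1.0 - ws_mechanism / ws_baseline)


# Invented example: 30,000 LLC misses over 2,000,000 instructions.
app_mpki = mpki(30_000, 2_000_000)
print(app_mpki, is_memory_intensive(app_mpki))
```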
4. Related Work
To our knowledge, this is the rst work to (i) experimen-
tally characterize the reliability and performance of modern
low-power DRAM chips under dierent supply voltages, and
(ii) introduce a new mechanism that reduces DRAM energy
while retaining high memory data throughput by adjusting
the DRAM array voltage. We briey discuss other prior work
in DRAM energy reduction.
DRAM Frequency and Voltage Scaling. Many prior
works propose to reduce DRAM energy by adjusting the
memory channel frequency and/or the DRAM supply voltage
dynamically. Deng et al. [39] propose MemScale, which scales
the frequency of DRAM at runtime based on a performance
predictor of an in-order processor. Other work focuses on
developing management policies to improve system energy
eciency by coordinating DRAM DFS with DVFS on the
CPU [12,37,38] or GPU [115]. In addition to frequency scaling,
David et al. [36] propose to scale the DRAM supply voltage
along with the memory channel frequency, based on the
memory bandwidth utilization of applications.
In contrast to all these works, our work focuses on a de-
tailed experimental characterization of real DRAM chips as
the supply voltage varies. Our study provides fundamental ob-
servations for potential mechanisms that can mitigate DRAM
and system energy consumption. Furthermore, frequency
scaling hurts memory throughput, and thus significantly
degrades system performance, especially for memory-intensive
workloads (see Section 2.4 in our SIGMETRICS 2017
paper [29] for our quantitative analysis). We demonstrate
the importance and benefits of exploiting our experimental
observations by proposing Voltron, one new example mechanism
that uses our observations to reduce DRAM and system
energy without sacrificing memory throughput.
Low-Power Modes for DRAM. Modern DRAM chips
support various low-power standby modes. Entering and
exiting these modes incurs some amount of latency, which
delays memory requests that must be serviced. To increase
the opportunities to exploit these low-power modes, several
prior works propose mechanisms that increase periods of
memory idleness through data placement (e.g., [44, 83]) and
memory trac reshaping (e.g., [2, 9, 14, 40, 100]). Exploiting
low-power modes is orthogonal to our work on studying the
impact of reduced-voltage operation in DRAM. Furthermore,
low-power modes have a smaller eect on memory-intensive
workloads, which exhibit little idleness in memory accesses,
whereas, as we show in Section 3.2, our mechanism is espe-
cially eective on memory-intensive workloads.
Low-Power DDR DRAM Chips. Low-power DDR
(LPDDR) [59, 61, 112] is a specific type of DRAM that is optimized
for low-power systems like mobile devices. To reduce
power consumption, LPDDRx (currently in its 4th generation)
employs a few major design changes that differ from conventional
DDRx chips. First, LPDDRx uses a low-voltage swing
I/O interface that consumes 40% less I/O power than DDR4
DRAM [33]. Second, it supports additional low-power modes
with a lower supply voltage. Since the LPDDRx array design
remains the same as DDRx, our observations on the correlation
between access latency and array voltage are applicable
to LPDDRx DRAM as well. Voltron, our proposal, can provide
significant benefits in LPDDRx, since array energy consumption
is significantly higher than the energy consumption of
peripheral circuitry in LPDDRx chips [33]. We leave the detailed
evaluation of LPDDRx chips for future work since our
current experimental platform is not capable of evaluating
them. Two recent experimental works [72, 112] examine the
retention time behavior of LPDDRx chips and find it to be
similar to DDRx chips.
Low-Power DRAM Architectures. Prior works (e.g., [31,
35, 137, 150]) propose to modify the DRAM chip architecture
to reduce the ACTIVATE power by activating only a fraction of
a row instead of the entire row. Another common technique,
called sub-ranking or mini-ranks, reduces dynamic DRAM
power by accessing data from a subset of chips from a DRAM
module [139, 145, 152]. A couple of prior works [102, 144]
propose DRAM module architectures that integrate many
low-frequency LPDDR chips to enable DRAM power reduc-
tion. These proposed changes to DRAM chips or DIMMs are
orthogonal to our work.
Reducing Refresh Power. In modern DRAM chips, although
different DRAM cells have widely different retention
times [74, 96, 112], memory controllers conservatively refresh
all of the cells based on the retention time of a small fraction of
weak cells, which have the shortest retention time out of all of
the cells. To reduce DRAM refresh power, many prior works
(e.g., [3, 11, 13, 68, 69, 70, 71, 95, 96, 97, 106, 108, 110, 112, 119, 138])
propose mechanisms to reduce unnecessary refresh operations,
and, thus, refresh power, by characterizing the retention
time profile (i.e., the data retention behavior of each cell)
within the DRAM chips. However, these techniques do not
reduce the power of other DRAM operations, and these prior
works do not provide an experimental characterization of the
effect of reduced voltage levels on data retention time.
Improving DRAMEnergy Eciency by Reducing La-
tency or Improving Parallelism. Various prior works
(e.g., [26, 28, 54, 55, 80, 87, 88, 89, 90, 91, 92, 107, 128, 129, 130])
improve DRAM energy eciency by reducing the execu-
tion time through techniques that reduce the DRAM access
latency or improve parallelism between memory requests.
These mechanisms are orthogonal to ours, because they do
not reduce the voltage level of DRAM.
Improving Energy Eciency by Processing in Mem-
ory. Various prior works [4, 5, 6, 10, 16, 17, 28, 41, 45, 46, 48, 49,
50, 51, 53, 57, 58, 67, 73, 81, 101, 111, 113, 114, 118, 126, 127, 129,
130, 135, 136, 149] examine processing in memory to improve
5
energy eciency. Our analyses and techniques can be com-
bined with these works to enable low-voltage operation in
processing-in-memory engines.
Experimental Studies of DRAM Chips. Recent works
experimentally investigate various reliability, data retention,
and latency characteristics of modern DRAM chips [24, 27,
54, 63, 64, 70, 71, 76, 77, 87, 89, 90, 96, 97, 104, 112, 125, 132, 133]
usually using FPGA-based DRAM testing infrastructures, like
SoftMC [54], or using large-scale data from the field. None
of these works study these characteristics under reduced-
voltage operation, which we do in this paper.
Reduced-Voltage Operation in SRAM Caches. Prior
works propose dierent techniques to enable SRAM caches
to operate under reduced voltage levels (e.g., [7, 8, 32, 123, 141,
142]). These works are orthogonal to our experimental study
because we focus on understanding and enabling reduced-
voltage operation in DRAM, which is a signicantly dierent
memory technology than SRAM.
5. Signicance
Our SIGMETRICS 2017 paper [29] presents a new set of
detailed experimental characterization and analyses on the
voltage-latency-reliability trade-os in modern DRAM chips.
In this section, we describe the potential impact that our study
can bring to the research community and industry.
5.1. Potential Industry Impact
We believe our experimental characterization results and
proposed mechanism can have signicant impact in fast-
growing data centers as well as mobile systems, where DRAM
power consumption is growing due to higher demand for
memory capacity for certain types of service (e.g., mem-
cached). To reduce the energy and power consumed by
DRAM, DRAM manufacturers have been decreasing the sup-
ply voltage of DRAM chips with newer DRAM standards
(e.g., DDR4) or low-voltage variants of DDR, such as LPDDR4
(Low-Power DDR4) and DDR3L (DDR3 Low-voltage). How-
ever, the supply voltage reduction has been conservative with
each new DDR standard, which takes years to be adopted by
the vendors and the market. For example, since the release
of DDR3L (1.35V) in 2010, the supply voltage has reduced by
only 11% with the latest DDR4 standard (1.2V) released in
2014. Furthermore, since the release of DDR4 in 2014, the sup-
ply voltage for most commodity DDR4 chips has remained at
1.2V. As a result, further reducing DRAM supply voltage be-
low the standard voltage, as we do in our SIGMETRICS 2017
paper [29], can be a very eective way of reducing DRAM
power consumption. However, to do so, we need to carefully
and rigorously understand how DRAM chips behave under
reduced-voltage operation.
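For reference, the relative voltage reductions quoted in this paper follow directly from the nominal DDR3L level of 1.35V. The first expression below is the DDR3L-to-DDR4 reduction discussed above; the second is the 1.35V-to-1.15V range examined in Section 2.4, which we round to 15%.

```latex
\[
\frac{1.35\,\mathrm{V} - 1.2\,\mathrm{V}}{1.35\,\mathrm{V}} \approx 11.1\%,
\qquad
\frac{1.35\,\mathrm{V} - 1.15\,\mathrm{V}}{1.35\,\mathrm{V}} \approx 14.8\%.
\]
```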
To enable the development of new mechanisms that leverage
reduced-voltage operation in DRAM, we provide the first
set of comprehensive experimental results on the effect of
using a wide range of different supply voltage values on the
reliability, latency, and retention characteristics of DRAM
chips. In this work, we demonstrate how we can use our
experimental data to design a new mechanism, Voltron
(Section 3), which reduces DRAM energy consumption through
voltage reduction. Therefore, we believe that understanding
and leveraging reduced-voltage operation will help industry
improve the energy efficiency of memory subsystems.
5.2. Potential Research Impact
Our paper sheds new light on the feasibility of enabling
reduced-voltage operation in manufactured DRAM chips.
One important research question that our work raises is how
do modern DRAM chips behave under a wide range of supply
voltage levels? Existing systems are limited to a few DRAM
power states, which prevent DRAM from serving memory
accesses when it enters a low-power state. However, in our
work, we show that it is possible to operate commodity DRAM
chips under a wide range of supply voltage levels while still
being able to serve memory accesses under a dierent set of
trade-os. To facilitate further research initiative to exploit
reduced-voltage operation in DRAM chips, we have open-
sourced our characterization results, FPGA-based testing plat-
form [54], and DRAM SPICE circuit model (for validation)
in our GitHub repository [124]. We believe that these tools
can be extended for other research objectives besides study-
ing voltage reduction in DRAM. One potential direction is to
leverage our results to design mechanisms that reduce DRAM
latency by operating DRAM at a higher supply voltage.
5.3. Applicability to Other Memory Technologies
We believe the high-level ideas of our work can be leveraged
in the context of other memory technologies, such as
NAND flash memory [19, 20, 21], PCM [84, 85, 86, 103, 120,
121, 146, 147], STT-MRAM [30, 52, 82, 103, 109], RRAM [143],
or hybrid memory systems [1, 15, 42, 47, 62, 94, 98, 103, 116,
117, 121, 122, 146, 148, 151]. A recent work on NAND flash
memory, for example, proposes reducing the pass-through
voltage [18, 19, 20, 21] to reduce read disturb errors, which
in turn saves energy. We refer the reader to past works on
NAND flash memory for a more detailed analysis of
reliability-voltage trade-offs [18, 19, 20, 21, 22, 23]. We hope our work
inspires characterization and understanding of reduced-voltage
operation in other memory technologies, with the goal of
enabling a more energy-efficient system design.
6. Conclusion
Our SIGMETRICS 2017 paper [29] provides the first experimental
study that comprehensively characterizes and
analyzes the behavior of DRAM chips when the supply volt-
age is reduced below its nominal value. We demonstrate,
using 124 DDR3L DRAM chips, that the DRAM supply volt-
age can be reliably reduced to a certain level, beyond which
errors arise within the data. We then experimentally demon-
strate the relationship between the supply voltage and the
latency of the fundamental DRAM operations (activation,
restoration, and precharge). We show that bit errors caused
by reduced-voltage operation can be eliminated by increas-
ing the latency of the three fundamental DRAM operations.
By changing the memory controller configuration to allow
for the longer latency of these operations, we can thus fur-
ther lower the supply voltage without inducing errors in the
data. We also experimentally characterize the relationship
between reduced supply voltage and error locations, stored
data patterns, temperature, and data retention.
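The latency adjustment summarized above can be illustrated with a minimal sketch, assuming a table of per-voltage test results is available. For a given supply voltage, the snippet picks the smallest tested latency at which no bit errors were observed. The measurement table, its values, and the name `min_error_free_latency` are hypothetical placeholders for illustration only; they are not characterization data or code from the paper.

```python
from typing import Dict, Optional, Tuple

# Hypothetical measurements: (supply voltage in V, latency in ns) -> observed bit errors.
# All values below are invented placeholders, not characterization results.
Measurements = Dict[Tuple[float, float], int]


def min_error_free_latency(measurements: Measurements, voltage: float) -> Optional[float]:
    """Return the smallest tested latency with zero errors at `voltage`, or None."""
    error_free = [
        latency
        for (v, latency), errors in measurements.items()
        if v == voltage and errors == 0
    ]
    return min(error_free) if error_free else None


example: Measurements = {
    (1.35, 10.0): 0,  # placeholder nominal-voltage point
    (1.20, 10.0): 7,  # errors appear at the shorter latency
    (1.20, 12.5): 0,  # a longer latency removes them
    (1.20, 15.0): 0,
    (1.10, 15.0): 3,  # no tested latency is error-free at this voltage
}
```

In this sketch, a `None` result means that none of the tested latencies was sufficient at that voltage, so a memory controller would have to stay at a higher voltage.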
Based on these observations, we propose and evaluate
Voltron, a low-cost energy reduction mechanism that reduces
DRAM energy without affecting memory data throughput.
Voltron reduces the supply voltage for only the DRAM array,
while maintaining the nominal voltage for the peripheral cir-
cuitry to continue operating the memory channel at a high
frequency. Voltron uses a new piecewise linear performance
model to nd the array supply voltage that maximizes the
system energy reduction within a given performance loss
target. Our experimental evaluations across a wide variety of
workloads demonstrate that Voltron significantly reduces system
energy consumption with only very modest performance
loss.
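The voltage selection step described above can be sketched as follows, under stated assumptions. The snippet uses a hypothetical two-piece linear model that predicts performance loss from the extra latency required at each candidate array voltage, and then picks the lowest candidate voltage whose predicted loss stays within the target. The breakpoint, slopes, candidate list, and function names are illustrative placeholders, not the model or parameters from the paper. The sketch also assumes that a lower array voltage always yields a larger energy reduction, which a real mechanism would need to verify.

```python
from typing import Optional, Sequence, Tuple

# (array voltage in V, extra latency in ns needed at that voltage); placeholder values.
Candidate = Tuple[float, float]

MPKI_BREAKPOINT = 15.0   # hypothetical split between the two linear pieces
SLOPE_LOW_MPKI = 0.3     # hypothetical % loss per ns for less memory-intensive workloads
SLOPE_HIGH_MPKI = 0.8    # hypothetical % loss per ns for more memory-intensive workloads


def predicted_loss_pct(extra_latency_ns: float, mpki: float) -> float:
    """Piecewise linear sketch: the slope depends on which MPKI range applies."""
    slope = SLOPE_LOW_MPKI if mpki < MPKI_BREAKPOINT else SLOPE_HIGH_MPKI
    return slope * extra_latency_ns


def select_array_voltage(
    candidates: Sequence[Candidate], mpki: float, loss_target_pct: float
) -> Optional[float]:
    """Return the lowest candidate voltage whose predicted loss meets the target."""
    feasible = [
        volts
        for volts, extra_latency_ns in candidates
        if predicted_loss_pct(extra_latency_ns, mpki) <= loss_target_pct
    ]
    return min(feasible) if feasible else None


candidates = [(1.35, 0.0), (1.25, 2.0), (1.15, 6.0), (1.05, 12.0)]
```

With these placeholder numbers, a less memory-intensive workload tolerates a lower array voltage than a memory-intensive one for the same loss target, which is the qualitative behavior such a model is meant to capture.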
We conclude that it is very promising to understand and
exploit reduced-voltage operation in modern DRAM chips.
We hope that the experimental characterization, analysis, and
optimization techniques presented in our SIGMETRICS 2017
paper will enable the development of other new mechanisms
that can eectively exploit the trade-os between voltage,
reliability, and latency in DRAM to improve system perfor-
mance, eciency, and/or reliability. We also hope that our pa-
per’s studies inspire new experimental studies to understand
reduced-voltage operation in other memory technologies,
such as NAND ash memory, PCM, and STT-MRAM.
Acknowledgments
We thank the anonymous reviewers of SIGMETRICS 2017
and SAFARI group members for their feedback. We ac-
knowledge the support of Google, Intel, NVIDIA, Samsung,
VMware, and the United States Department of Energy. This
research was supported in part by the ISTC-CC, SRC, and NSF
(grants 1212962 and 1320531). Kevin Chang was supported
in part by an SRCEA/Intel Fellowship.
References
[1] N. Agarwal and T. F. Wenisch, “Thermostat: Application-Transparent Page Man-
agement for Two-Tiered Main Memory,” in ASPLOS, 2017.
[2] N. Aggarwal et al., “Power-Efficient DRAM Speculation,” in HPCA, 2008.
[3] A. Agrawal et al., “Mosaic: Exploiting the spatial locality of process variation to
reduce refresh energy in on-chip eDRAM modules,” in HPCA, 2014.
[4] J. Ahn et al., “A Scalable Processing-in-Memory Accelerator for Parallel Graph
Processing,” in ISCA, 2015.
[5] J. Ahn et al., “PIM-Enabled Instructions: A Low-Overhead, Locality-Aware
Processing-in-Memory Architecture,” in ISCA, 2015.
[6] B. Akin et al., “Data Reorganization in Memory Using 3D-stacked DRAM,” in
ISCA, 2015.
[7] A. R. Alameldeen et al., “Adaptive Cache Design to Enable Reliable Low-Voltage
Operation,” IEEE TC, 2011.
[8] A. R. Alameldeen et al., “Energy-Efficient Cache Design Using Variable-Strength
Error-Correcting Codes,” in ISCA, 2011.
[9] A. M. Amin and Z. A. Chishti, “Rank-aware Cache Replacement and Write Buffering
to Improve DRAM Energy Efficiency,” in ISLPED, 2010.
[10] O. O. Babarinsa and S. Idreos, “Jafar: Near-data processing for databases,” in
SIGMOD, 2015.
[11] S. Baek et al., “Refresh Now and Then,” IEEE TC, vol. 63, no. 12, pp. 3114–3126,
2014.
[12] R. Begum et al., “Energy-Performance Trade-offs on Energy-Constrained Devices
with Multi-component DVFS,” in IISWC, 2015.
[13] I. Bhati et al., “Flexible Auto-refresh: Enabling Scalable and Energy-efficient
DRAM Refresh Reductions,” in ISCA, 2015.
[14] M. Bi et al., “Delay-Hiding Energy Management Mechanisms for DRAM,” in
HPCA, 2010.
[15] S. Bock et al., “Concurrent Migration of Multiple Pages in Software-Managed
Hybrid Main Memory,” in ICCD, 2016.
[16] A. Boroumand et al., “LazyPIM: An Efficient Cache Coherence Mechanism for
Processing-in-Memory,” CAL, 2016.
[17] A. Boroumand et al., “Google Workloads for Consumer Devices: Mitigating Data
Movement Bottlenecks,” in ASPLOS, 2018.
[18] Y. Cai et al., “Read Disturb Errors in MLC NAND Flash Memory: Characteriza-
tion and Mitigation,” in DSN, 2015.
[19] Y. Cai et al., “Error Characterization, Mitigation, and Recovery in Flash-Memory-
Based Solid-State Drives,” Proceedings of the IEEE, 2017.
[20] Y. Cai et al., “Error Characterization, Mitigation, and Recovery in Flash Memory
Based Solid-State Drives,” arXiv:1706.08642 [cs.AR], 2017.
[21] Y. Cai et al., “Errors in Flash-Memory-Based Solid-State Drives: Analysis, Miti-
gation, and Recovery,” arXiv:1711.11427 [cs.AR], 2017.
[22] Y. Cai et al., “Vulnerabilities in MLC NAND Flash Memory Programming: Ex-
perimental Analysis, Exploits, and Mitigation Techniques,” in HPCA, 2017.
[23] Y. Cai et al., “Data Retention in MLC NAND Flash Memory: Characterization,
Optimization, and Recovery,” in HPCA, 2015.
[24] K. Chandrasekar et al., “Exploiting Expendable Process-Margins in DRAMs for
Run-Time Performance Optimization,” in DATE, 2014.
[25] K. Chandrasekar et al., “DRAMPower: Open-source DRAM Power & Energy Es-
timation Tool,” http://www.drampower.info.
[26] K. K. Chang et al., “Improving DRAM Performance by Parallelizing Refreshes
with Accesses,” in HPCA, 2014.
[27] K. K. Chang et al., “Understanding Latency Variation in Modern DRAM Chips:
Experimental Characterization, Analysis, and Optimization,” in SIGMETRICS,
2016.
[28] K. K. Chang et al., “Low-Cost Inter-Linked Subarrays (LISA): Enabling Fast Inter-
Subarray Data Movement in DRAM,” in HPCA, 2016.
[29] K. K. Chang et al., “Understanding Reduced-Voltage Operation in Modern DRAM
Devices: Experimental Characterization, Analysis, and Mechanisms,” in SIGMET-
RICS, 2017.
[30] M. T. Chang et al., “Technology Comparison for Large Last-Level Caches (L3Cs):
Low-Leakage SRAM, Low Write-Energy STT-RAM, and Refresh-Optimized
eDRAM,” in HPCA, 2013.
[31] N. Chatterjee et al., “Architecting an Energy-Efficient DRAM System for GPUs,”
in HPCA, 2017.
[32] Z. Chishti et al., “Improving Cache Lifetime Reliability at Ultra-Low Voltages,”
in MICRO, 2009.
[33] J. Choi, “LPDDR4: Evolution for new Mobile World,” in MEMCON, 2013.
Available: http://www.memcon.com/pdfs/proceedings2013/track1/LPDDR4_Evolution_for_a_New_Mobile_World.pdf
[34] B. F. Cooper et al., “Benchmarking Cloud Serving Systems with YCSB,” in SOCC,
2010.
[35] E. Cooper-Balis and B. Jacob, “Fine-Grained Activation for Power Reduction in
DRAM,” IEEE Micro, vol. 30, no. 3, pp. 34–47, 2010.
[36] H. David et al., “Memory Power Management via Dynamic Voltage/Frequency
Scaling,” in ICAC, 2011.
[37] Q. Deng et al., “CoScale: Coordinating CPU and Memory System DVFS in Server
Systems,” in MICRO, 2012.
[38] Q. Deng et al., “MultiScale: Memory System DVFS with Multiple Memory Con-
trollers,” in ISLPED, 2012.
[39] Q. Deng et al., “MemScale: Active Low-power Modes for Main Memory,” in AS-
PLOS, 2011.
[40] B. Diniz et al., “Limiting the Power Consumption of Main Memory,” in ISCA,
2007.
[41] J. Draper et al., “The Architecture of the DIVA Processing-in-memory Chip,” in
ICS, 2002.
[42] S. R. Dulloor et al., “Data Tiering in Heterogeneous Memory Systems,” in EuroSys,
2016.
[43] S. Eyerman and L. Eeckhout, “System-Level Performance Metrics for Multipro-
gram Workloads,” IEEE Micro, 2008.
[44] X. Fan et al., “Memory Controller Policies for DRAM Power Management,” in
ISLPED, 2001.
[45] A. Farmahini-Farahani et al., “NDA: Near-DRAM acceleration architecture lever-
aging commodity DRAM devices and standard memory modules,” in HPCA,
2015.
[46] B. B. Fraguela et al., “Programming the FlexRAM Parallel Intelligent Memory
System,” in PPoPP, 2003.
[47] K. Gai et al., “Smart Energy-Aware Data Allocation for Heterogeneous Memory,”
in HPCC, 2016.
[48] M. Gao et al., “Practical near-data processing for in-memory analytics frame-
works,” in PACT, 2015.
[49] M. Gao and C. Kozyrakis, “HRL: Efficient and flexible reconfigurable logic for
near-data processing,” in HPCA, 2016.
[50] M. Gokhale et al., “Processing in memory: the Terasys massively parallel PIM
array,” Computer, vol. 28, no. 4, pp. 23–31, 1995.
[51] Q. Guo et al., “3D-Stacked Memory-Side Acceleration: Accelerator and System
Design,” in WONDP, 2014.
[52] X. Guo et al., “Resistive Computation: Avoiding the Power Wall with Low-
Leakage, STT-MRAM Based Computing,” in ISCA, 2010.
[53] M. Hashemi et al., “Accelerating Dependent Cache Misses with an Enhanced
Memory Controller,” in ISCA, 2016.
[54] H. Hassan et al., “SoftMC: A Flexible and Practical Open-Source Infrastructure
for Enabling Experimental DRAM Studies,” in HPCA, 2017.
[55] H. Hassan et al., “ChargeCache: Reducing DRAM Latency by Exploiting Row
Access Locality,” in HPCA, 2016.
[56] U. Höelzle and L. A. Barroso, The Datacenter as a Computer: An Introduction to
the Design of Warehouse-Scale Machines. Morgan & Claypool, 2009.
[57] K. Hsieh et al., “Transparent Offloading and Mapping (TOM): Enabling
Programmer-Transparent Near-Data Processing in GPU Systems,” in ISCA, 2016.
[58] K. Hsieh et al., “Accelerating pointer chasing in 3D-stacked memory: Challenges,
mechanisms, evaluation,” in ICCD, 2016.
[59] JEDEC, “Low Power Double Data Rate 3 (LPDDR3),” 2012.
[60] JEDEC, “Addendum No.1 to JESD79-3 - 1.35V DDR3L-800, DDR3L-1066, DDR3L-
1333, DDR3L-1600, and DDR3L-1866,” 2013.
[61] JEDEC, “Low Power Double Data Rate 4 (LPDDR4),” 2014.
[62] X. Jiang et al., “CHOP: Adaptive Filter-Based DRAM Caching for CMP Server
Platforms,” in HPCA, 2010.
[63] M. Jung et al., “A New Bank Sensitive DRAMPower Model for Efficient Design
Space Exploration,” in PATMOS, 2016.
[64] M. Jung et al., “Reverse Engineering of DRAMs: Row Hammer with Crosshair,”
in MEMSYS, 2016.
[65] R. Kalla et al., “Power7: IBM’s Next-Generation Server Processor,” IEEE Micro,
vol. 30, no. 2, pp. 7–15, 2010.
[66] U. Kang et al., “Co-Architecting Controllers and DRAM to Enhance DRAM Pro-
cess Scaling,” in The Memory Forum, 2014.
[67] Y. Kang et al., “FlexRAM: toward an advanced intelligent memory system,” in
ICCD, 1999.
[68] S. Khan et al., “Detecting and Mitigating Data-Dependent DRAM Failures by
Exploiting Current Memory Content,” in MICRO, 2017.
[69] S. Khan et al., “A Case for Memory Content-Based Detection and Mitigation of
Data-Dependent Failures in DRAM,” CAL, 2016.
[70] S. Khan et al., “PARBOR: An Efficient System-Level Technique to Detect Data
Dependent Failures in DRAM,” in DSN, 2016.
[71] S. Khan et al., “The Efficacy of Error Mitigation Techniques for DRAM Retention
Failures: A Comparative Experimental Study,” in SIGMETRICS, 2014.
[72] J. S. Kim et al., “The DRAM Latency PUF: Quickly Evaluating Physical Unclonable
Functions by Exploiting the Latency–Reliability Tradeoff in Modern DRAM
Devices,” in HPCA, 2018.
[73] J. S. Kim et al., “GRIM-Filter: Fast Seed Location Filtering in DNA Read Mapping
Using Processing-in-Memory Technologies,” BMC Genomics, 2018.
[74] K. Kim and J. Lee, “A New Investigation of Data Retention Time in Truly
Nanoscaled DRAMs,” EDL, vol. 30, no. 8, pp. 846–848, 2009.
[75] Y. Kim et al., “Ramulator: A Fast and Extensible DRAM Simulator,” CAL, 2015.
[76] Y. Kim, “Architectural Techniques to Enhance DRAM Scaling,” Ph.D. dissertation,
Carnegie Mellon University, 2015.
[77] Y. Kim et al., “Flipping Bits in Memory Without Accessing Them: An Experimen-
tal Study of DRAM Disturbance Errors,” in ISCA, 2014.
[78] Y. Kim et al., “ATLAS: A Scalable and High-Performance Scheduling Algorithm
for Multiple Memory Controllers,” in HPCA, 2010.
[79] Y. Kim et al., “Thread Cluster Memory Scheduling: Exploiting Differences in
Memory Access Behavior,” in MICRO, 2010.
[80] Y. Kim et al., “A Case for Exploiting Subarray-Level Parallelism (SALP) in
DRAM,” in ISCA, 2012.
[81] P. M. Kogge, “EXECUBE-A New Architecture for Scaleable MPPs,” in ICPP, 1994.
[82] E. Kultursay et al., “Evaluating STT-RAM as an energy-efficient main memory
alternative,” in ISPASS, 2013.
[83] A. R. Lebeck et al., “Power Aware Page Allocation,” in ASPLOS, 2000.
[84] B. C. Lee et al., “Architecting Phase Change Memory as a Scalable DRAM Alter-
native,” in ISCA, 2009.
[85] B. C. Lee et al., “Phase Change Memory Architecture and the Quest for Scalabil-
ity,” CACM, vol. 53, no. 7, pp. 99–106, 2010.
[86] B. C. Lee et al., “Phase-Change Technology and the Future of Main Memory,”
IEEE Micro, vol. 30, no. 1, pp. 143–143, 2010.
[87] D. Lee et al., “Design-Induced Latency Variation in Modern DRAM Chips: Char-
acterization, Analysis, and Latency Reduction Mechanisms,” in SIGMETRICS,
2017.
[88] D. Lee et al., “Decoupled Direct Memory Access: Isolating CPU and IO Traffic
by Leveraging a Dual-Data-Port DRAM,” in PACT, 2015.
[89] D. Lee, “Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity,”
Ph.D. dissertation, Carnegie Mellon University, 2016.
[90] D. Lee et al., “Adaptive-Latency DRAM: Optimizing DRAM Timing for the
Common-Case,” in HPCA, 2015.
[91] D. Lee et al., “Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Ar-
chitecture,” in HPCA, 2013.
[92] D. Lee et al., “Simultaneous Multi Layer Access: A High Bandwidth and Low
Cost 3D-Stacked Memory Interface,” TACO, 2016.
[93] S. Li et al., “McPAT: An Integrated Power, Area, and Timing Modeling Frame-
work for Multicore and Manycore Architectures,” in MICRO, 2009.
[94] Y. Li et al., “Utility-Based Hybrid Memory Management,” in CLUSTER, 2017.
[95] C. H. Lin et al., “SECRET: Selective Error Correction for Refresh Energy Reduc-
tion in DRAMs,” in ICCD, 2012.
[96] J. Liu et al., “An Experimental Study of Data Retention Behavior in Modern
DRAM Devices: Implications for Retention Time Profiling Mechanisms,” in ISCA,
2013.
[97] J. Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” in ISCA, 2012.
[98] L. Liu et al., “Memos: A Full Hierarchy Hybrid Memory Management Frame-
work,” in ICCD, 2016.
[99] Y. Luo et al., “Characterizing Application Memory Error Vulnerability to Opti-
mize Datacenter Cost via Heterogeneous-Reliability Memory,” in DSN, 2014.
[100] C. Lyuh and T. Kim, “Memory Access Scheduling and Binding Considering En-
ergy Minimization in Multi-Bank Memory Systems,” in DAC, 2004.
[101] K. Mai et al., “Smart memories: a modular reconfigurable architecture,” in ISCA,
2000.
[102] K. T. Malladi et al., “Towards Energy-Proportional Datacenter Memory with Mo-
bile DRAM,” in ISCA, 2012.
[103] J. Meza et al., “A Case for Efficient Hardware/Software Cooperative Management
of Storage and Memory,” in WEED, 2013.
[104] J. Meza et al., “Revisiting Memory Errors in Large-Scale Production Data Centers:
Analysis and Modeling of New Trends from the Field,” in DSN, 2015.
[105] T. Mudge, “Power: a first-class architectural design constraint,” Computer, vol. 34,
no. 4, pp. 52–58, 2001.
[106] O. Mutlu, “The RowHammer problem and other issues we may face as memory
becomes denser,” in DATE, 2017.
[107] O. Mutlu, “Memory Scaling: A Systems Architecture Perspective,” IMW, 2013.
[108] O. Mutlu and L. Subramanian, “Research Problems and Opportunities in Memory
Systems,” SUPERFRI, 2014.
[109] H. Naeimi et al., “STT-RAM Scaling and Retention Failure,” Intel Technology Jour-
nal, 2013.
[110] T. Ohsawa et al., “Optimizing the DRAM Refresh Count for Merged DRAM/Logic
LSIs,” in ISLPED, 1998.
[111] M. Oskin et al., “Active pages: a computation model for intelligent memory,” in
ISCA, 1998.
[112] M. Patel et al., “The Reach Profiler (REAPER): Enabling the Mitigation of DRAM
Retention Failures via Profiling at Aggressive Conditions,” in ISCA, 2017.
[113] D. Patterson et al., “A Case for Intelligent RAM,” IEEE Micro, 1997.
[114] A. Pattnaik et al., “Scheduling Techniques for GPU Architectures with
Processing-In-Memory Capabilities,” in PACT, 2016.
[115] I. Paul et al., “Harmonia: Balancing Compute and Memory Power in High-
performance GPUs,” in ISCA, 2015.
[116] A. J. Peña and P. Balaji, “Toward the Efficient Use of Multiple Explicitly Managed
Memory Subsystems,” in CLUSTER, 2014.
[117] S. Phadke and S. Narayanasamy, “MLP aware heterogeneous memory system,”
in DATE, 2011.
[118] S. H. Pugsley et al., “NDC: Analyzing the impact of 3D-stacked memory+logic
devices on MapReduce workloads,” in ISPASS, 2014.
[119] M. K. Qureshi et al., “AVATAR: A Variable-Retention-Time (VRT) Aware Refresh
for DRAM Systems,” in DSN, 2015.
[120] M. K. Qureshi et al., “Enhancing Lifetime and Security of PCM-based Main Mem-
ory with Start-gap Wear Leveling,” in MICRO, 2009.
[121] M. K. Qureshi et al., “Scalable High Performance Main Memory System Using
Phase-change Memory Technology,” in ISCA, 2009.
[122] L. E. Ramos et al., “Page Placement in Hybrid Memory Systems,” in ICS, 2011.
[123] D. Roberts et al., “On-Chip Cache Device Scaling Limits and Effective Fault Repair
Techniques in Future Nanoscale Technology,” in DSD, 2007.
[124] SAFARI Research Group, “SAFARI Software Tools – GitHub Repository,”
https://github.com/CMU-SAFARI.
[125] B. Schroeder et al., “DRAM Errors in the Wild: A Large-Scale Field Study,” in
SIGMETRICS, 2009.
[126] V. Seshadri et al., “Fast Bulk Bitwise AND and OR in DRAM,” CAL, 2015.
[127] V. Seshadri, “Simple DRAM and Virtual Memory Abstractions to Enable Highly
Ecient Memory Systems,” Ph.D. dissertation, Carnegie Mellon University, 2016.
[128] V. Seshadri et al., “RowClone: Fast and Energy-Ecient In-DRAM Bulk Data
Copy and Initialization,” in MICRO, 2013.
[129] V. Seshadri et al., “Ambit: In-Memory Accelerator for Bulk Bitwise Operations
Using Commodity DRAM Technology,” in MICRO, 2017.
[130] V. Seshadri et al., “Gather-Scatter DRAM: In-DRAM Address Translation to Im-
prove the Spatial Locality of Non-Unit Strided Accesses,” in MICRO, 2015.
[131] A. Snavely and D. Tullsen, “Symbiotic Jobscheduling for a Simultaneous Multi-
threading Processor,” in ASPLOS, 2000.
[132] V. Sridharan et al., “Memory Errors in Modern Systems: The Good, The Bad, and
The Ugly,” in ASPLOS, 2015.
[133] V. Sridharan and D. Liberty, “A Study of DRAM Failures in the Field,” in SC, 2012.
[134] Standard Performance Evaluation Corp., “SPEC CPU2006 Benchmarks,”
http://www.spec.org/cpu2006.
[135] H. S. Stone, “A Logic-in-Memory Computer,” IEEE TC, 1970.
[136] Z. Sura et al., “Data access optimization in a processing-in-memory system,” in
CF, 2015.
[137] A. N. Udipi et al., “Rethinking DRAM Design and Organization for Energy-
Constrained Multi-Cores,” in ISCA, 2010.
[138] R. Venkatesan et al., “Retention-Aware Placement in DRAM (RAPID): Software
Methods for Quasi-Non-Volatile DRAM,” in HPCA, 2006.
[139] F. A. Ware and C. Hampel, “Improving Power and Data Efficiency with Threaded
Memory Modules,” in ICCD, 2006.
[140] M. Ware et al., “Architecting for Power Management: The IBM® POWER7™
Approach,” in HPCA, 2010.
[141] C. Wilkerson et al., “Trading Off Cache Capacity for Reliability to Enable Low
Voltage Operation,” in ISCA, 2008.
[142] C. Wilkerson et al., “Trading Off Cache Capacity for Low-Voltage Operation,”
IEEE Micro, 2009.
[143] H.-S. P. Wong et al., “Metal-Oxide RRAM,” Proc. IEEE, 2012.
[144] D. H. Yoon et al., “BOOM: Enabling Mobile Memory Based Low-power Server
DIMMs,” in ISCA, 2012.
[145] D. H. Yoon et al., “Adaptive Granularity Memory Systems: A Tradeoff Between
Storage Efficiency and Throughput,” in ISCA, 2011.
[146] H. Yoon et al., “Row Buffer Locality Aware Caching Policies for Hybrid Memories,”
in ICCD, 2012.
[147] H. Yoon et al., “Efficient Data Mapping and Buffering Techniques for Multilevel
Cell Phase-Change Memories,” TACO, vol. 11, no. 4, pp. 40:1–40:25, 2014.
[148] X. Yu et al., “Banshee: Bandwidth-Efficient DRAM Caching via Software/Hardware
Cooperation,” in MICRO, 2017.
[149] D. Zhang et al., “TOP-PIM: Throughput-Oriented Programmable Processing in
Memory,” in HPDC, 2014.
[150] T. Zhang et al., “Half-DRAM: A High-Bandwidth and Low-Power DRAM Archi-
tecture from the Rethinking of Fine-grained Activation,” in ISCA, 2014.
[151] W. Zhang and T. Li, “Exploring Phase Change Memory and 3D Die-Stacking
for Power/Thermal Friendly, Fast and Durable Memory Architectures,” in PACT,
Raleigh, NC, September 2009, pp. 101–112.
[152] H. Zheng et al., “Mini-rank: Adaptive DRAM Architecture for Improving Memory
Power Efficiency,” in MICRO, 2008.
