Characterizing, Exploiting, and Mitigating Vulnerabilities in MLC NAND
  Flash Memory Programming by Cai, Yu et al.
Yu Cai¹ Saugata Ghose² Yixin Luo¹,²
Ken Mai² Onur Mutlu³,² Erich F. Haratsch¹
¹Seagate Technology ²Carnegie Mellon University ³ETH Zürich
This paper summarizes our work on experimentally analyzing,
exploiting, and addressing vulnerabilities in multi-level
cell NAND flash memory programming, which was published
in the industrial session of HPCA 2017 [9], and examines the
work’s significance and future potential. Modern NAND flash
memory chips use multi-level cells (MLC), which store two bits
of data in each cell, to improve chip density. As MLC NAND
flash memory scaled down to smaller manufacturing process
technologies, manufacturers adopted a two-step programming
method to improve reliability. In two-step programming, the
two bits of a multi-level cell are programmed using two separate
steps, in order to minimize the amount of cell-to-cell program
interference induced on neighboring flash cells.
In this work, we demonstrate that two-step programming
exposes new reliability and security vulnerabilities in
state-of-the-art MLC NAND flash memory. We experimentally
characterize contemporary 1X-nm (i.e., 15–19nm) flash memory
chips, and find that a partially-programmed flash cell (i.e., a
cell where the second programming step has not yet been
performed) is much more vulnerable to cell-to-cell interference and
read disturb than a fully-programmed cell. We show that it
is possible to exploit these vulnerabilities on solid-state drives
(SSDs) to alter the partially-programmed data, causing
(potentially malicious) data corruption. Based on our observations,
we propose several new mechanisms that eliminate or mitigate
these vulnerabilities in partially-programmed cells, and at the
same time increase flash memory lifetime by 16%.
1. Introduction
Solid-state drives (SSDs), which consist of NAND flash
memory chips, are widely used for storage today due to
significant decreases in the per-bit cost of NAND flash memory,
which, in turn, have driven great increases in SSD capacity.
These improvements have been enabled by both aggressive
process technology scaling and the development of
multi-level cell (MLC) technology. NAND flash memory stores data
by changing the threshold voltage of each flash cell, where a
cell consists of a floating-gate transistor [44, 74, 81]. In
single-level cell (SLC) flash memory, the threshold voltage range
could represent only a single bit of data. A multi-level cell
uses the same threshold voltage range to represent two bits
of data within a single cell (i.e., the range is split up into four
windows, known as states, where each state represents one
of the data values 00, 01, 10, or 11), thereby doubling storage
capacity [11, 20, 37, 63, 92, 114]. In a NAND flash memory chip,
a row of cells is connected together by a common wordline,
which typically spans 32K–64K cells. Each wordline contains
two pages of data, where a page is the granularity at which
the data is read and written (i.e., programmed). The most
significant bits (MSBs) of all cells on the same wordline are
combined to form an MSB page, and the least significant bits
(LSBs) of all cells on the wordline are combined to form an
LSB page [13].
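As a minimal sketch of this organization, the following Python snippet splits a wordline into its MSB page and LSB page. The state-to-bits mapping follows Figure 1 (written as MSB, LSB); the function names and the four-cell wordline are illustrative assumptions, not part of any real flash interface.

```python
# State encoding taken from Figure 1, written as (MSB, LSB).
STATE_TO_BITS = {"ER": (1, 1), "P1": (0, 1), "P2": (0, 0), "P3": (1, 0)}
BITS_TO_STATE = {bits: state for state, bits in STATE_TO_BITS.items()}


def wordline_to_pages(states):
    """Split a wordline (list of cell states) into its MSB and LSB pages."""
    msb_page = [STATE_TO_BITS[s][0] for s in states]
    lsb_page = [STATE_TO_BITS[s][1] for s in states]
    return msb_page, lsb_page


def pages_to_wordline(msb_page, lsb_page):
    """Combine an MSB page and an LSB page back into per-cell states."""
    return [BITS_TO_STATE[(m, l)] for m, l in zip(msb_page, lsb_page)]


# Hypothetical four-cell wordline (real wordlines span 32K-64K cells).
wordline = ["ER", "P2", "P3", "P1"]
msb_page, lsb_page = wordline_to_pages(wordline)
print(msb_page, lsb_page)
```

Each cell contributes exactly one bit to each of the two pages, which is why a wordline holds two pages.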
To precisely control the threshold voltage of a flash cell, the
flash memory device uses incremental step pulse programming
(ISPP) [20, 37, 63, 114]. ISPP applies multiple short pulses of
a high programming voltage to each cell in the wordline
being programmed, with each pulse increasing the threshold
voltage of the cell by some small amount. SLC and older
MLC devices programmed the threshold voltage in one shot,
issuing all of the pulses back-to-back to program both bits
of data at the same time. However, as flash memory scales
down to smaller technology nodes, the distance between
neighboring flash cells decreases, which in turn increases the
program interference that occurs due to cell-to-cell coupling.
This program interference causes errors to be introduced into
neighboring cells during programming [13, 16, 29, 66, 68, 92]. To
reduce this interference by half [13], manufacturers have been
using two-step programming for MLC NAND flash memory
since the 40nm technology node [92]. A large fraction of
SSDs on the market today use sub-40nm MLC NAND flash
memory.
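The pulse-by-pulse behavior of ISPP can be illustrated with a toy loop. This is a sketch under stated assumptions: voltages are in arbitrary normalized units, every pulse adds the same fixed increment, and a verify check after each pulse decides whether to stop. Real devices behave less ideally, and the function name and parameters are hypothetical.

```python
def ispp_program(v_start, v_target, step, max_pulses=1000):
    """Toy ISPP loop: pulse until the cell's threshold voltage reaches v_target.

    Returns the final threshold voltage and the number of pulses applied.
    """
    v_th = v_start
    pulses = 0
    while v_th < v_target and pulses < max_pulses:
        v_th += step   # each pulse raises the threshold voltage slightly
        pulses += 1    # assumed verify check happens in the loop condition
    return v_th, pulses


# Hypothetical normalized values chosen only for illustration.
final_vth, pulses = ispp_program(v_start=0.0, v_target=1.0, step=0.25)
print(final_vth, pulses)
```

A smaller step gives finer control over the final threshold voltage at the cost of more pulses.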
Two-step programming stores each bit within an MLC flash
memory cell using two separate, partial programming steps, as
shown in Figure 1. An unprogrammed cell starts in the erased
(ER) state. The first programming step programs the LSB page:
for each flash cell within the page, the cell is partially
programmed depending on the LSB being written to the cell. If
the LSB of the cell should be 0, the cell is programmed into a
temporary program state (TP); otherwise, it remains in the
ER state. The maximum voltage of a partially-programmed
cell is approximately half of the maximum possible
threshold voltage of a fully-programmed flash cell. In its second
step, two-step programming programs the MSB page: it reads
the LSB value into a buffer inside the flash chip (called the
internal LSB buffer) to determine the partially-programmed
state of the cell’s threshold voltage, and then partially
programs the cell again, depending on whether the MSB of the
cell is a 0 or a 1. The second programming step moves the
threshold voltage from the partially-programmed state to
the desired final state (i.e., ER, P1, P2, or P3). By breaking
MLC programming into two separate steps, manufacturers
halve the program interference of each programming
operation [13, 68]. The SSD controller employs shadow program
sequencing [6, 7, 8, 13, 25, 91], which interleaves the partial
programming steps of a cell with the partial programming
steps of neighboring cells to ensure that a fully-programmed
cell experiences interference only from a single neighboring
partial programming step.¹
arXiv:1805.03291v1 [cs.AR] 8 May 2018
[Figure 1 (threshold voltage distributions; x-axis: Vth, y-axis:
probability density; bit values written as MSB LSB)
  Starting Vth (unprogrammed):           ER (XX)
  Temporary Vth (after 1. Program LSB):  ER (X1), TP (X0)
  Final Vth (after 2. Program MSB):      ER (11), P1 (01), P2 (00), P3 (10)]
Figure 1: Starting (after erase), temporary (after LSB
programming), and final (after MSB programming) states for
two-step programming. Reproduced from [9].
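The two programming steps can be written as a pair of state transitions. The sketch below uses the state names and bit encoding from Figure 1; the function names are assumptions introduced for illustration.

```python
# Final-state encoding from Figure 1, written as (MSB, LSB).
STATE_TO_BITS = {"ER": (1, 1), "P1": (0, 1), "P2": (0, 0), "P3": (1, 0)}


def program_lsb(lsb):
    """Step 1: an LSB of 1 leaves the cell in ER; an LSB of 0 moves it to TP."""
    return "ER" if lsb == 1 else "TP"


def program_msb(partial_state, msb):
    """Step 2: move from the partially-programmed state to the final state."""
    transitions = {
        ("ER", 1): "ER",
        ("ER", 0): "P1",
        ("TP", 0): "P2",
        ("TP", 1): "P3",
    }
    return transitions[(partial_state, msb)]


def two_step_program(msb, lsb):
    """Apply both steps in order, as two-step programming does."""
    return program_msb(program_lsb(lsb), msb)


for msb in (0, 1):
    for lsb in (0, 1):
        print(f"MSB={msb} LSB={lsb} -> {two_step_program(msb, lsb)}")
```

Note that the second step depends on the partially-programmed state, which is why the chip must read the LSB back before programming the MSB.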
2. Error Sources in Two-Step Programming
In our HPCA 2017 paper [9], we demonstrate that two-step
programming introduces new possibilities for flash memory
errors that can corrupt some of the data stored within flash
cells without accessing them, and that these errors can be
exploited to design malicious attacks. As there is a delay
between programming the LSB and the MSB of a single cell
due to the interleaved writes to neighboring cells, raw bit
errors can be introduced into the already-programmed LSB
page before the MSB page is programmed. These errors can
cause a cell to be programmed to an incorrect state in the
second programming step. During the second step, both the
MSB and LSB of each cell are required to determine the final
target threshold voltage of the cell. As shown in Figure 2,
the data to be programmed into the MSB is loaded from the
SSD controller to the internal MSB buffer (1 in the figure).
Concurrently, the LSB data is loaded into the internal LSB
buffer from the flash memory wordline (2). By buffering the
LSB data inside the flash chip and not in the SSD controller,
flash manufacturers avoid data transfer between the chip and
the controller during the second programming step, thereby
reducing the step’s latency. Unfortunately, this means that
the errors loaded from the internal LSB buffer cannot be
corrected as they would otherwise be during a read operation,
because the error correction (ECC) engine resides only
inside the controller (3), and not inside the flash chip. As a
result, the final cell voltage can be incorrectly set during MSB
programming, permanently corrupting the LSB data.
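To make the error path concrete, the following sketch programs an MSB page using an internal LSB buffer that contains one uncorrected error. The encoding follows Figure 1; the page contents and the position of the error are hypothetical.

```python
# Final-state encoding from Figure 1, keyed by (MSB, LSB).
FINAL_STATE = {(1, 1): "ER", (0, 1): "P1", (0, 0): "P2", (1, 0): "P3"}
STATE_TO_BITS = {state: bits for bits, state in FINAL_STATE.items()}


def program_msb_page(msb_page, lsb_buffer):
    """Second step: combine MSB data with the chip's internal LSB buffer."""
    return [FINAL_STATE[(m, l)] for m, l in zip(msb_page, lsb_buffer)]


def count_lsb_errors(states, intended_lsb):
    """Count cells whose final state encodes the wrong LSB."""
    return sum(STATE_TO_BITS[s][1] != l for s, l in zip(states, intended_lsb))


intended_lsb = [1, 0, 1, 1]
msb_page = [0, 1, 1, 0]

# Hypothetical error: cell 0 was disturbed before the second step, so the
# internal read returns 0 instead of 1, and no ECC is applied inside the chip.
lsb_buffer = [0, 0, 1, 1]

states = program_msb_page(msb_page, lsb_buffer)
print(states, count_lsb_errors(states, intended_lsb))
```

Cell 0 was meant to end in P1 but is programmed to P2, so the LSB error that was present only in the buffer becomes part of the stored data.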
¹We refer the reader to our prior works [6, 7, 8, 9, 11, 12, 14, 15, 16, 17,
18, 72] for a detailed background on NAND flash memory. Our recent survey
paper [6, 7, 8] provides an extensive survey of the state-of-the-art in NAND
flash memory.
[Figure 2 (block diagram of the SSD controller and the flash memory
chip). Labels: (1) MSB to be programmed, sent from the SSD controller
to the internal MSB buffer; (2) LSB page read with errors from the
wordline (cells MSB 0/LSB 0 ... MSB n/LSB n) into the internal LSB
buffer; (3) ECC engine in the SSD controller, through which data is
read without errors.]
Figure 2: In the second step of two-step programming, LSB
data does not go to the controller, and is not corrected when
read into the internal LSB buffer, resulting in program errors.
Reproduced from [9].
We briefly discuss two sources of errors that can corrupt
LSB data, and characterize their impact on real
state-of-the-art 1X-nm (i.e., 15–19nm) MLC NAND flash chips. We perform
our characterization using an FPGA-based flash testing
platform [10, 11] that allows us to issue commands directly to raw
NAND flash memory chips. In order to determine the
threshold voltage stored within each cell, we use the read-retry
mechanism built into modern SSD controllers [13, 17, 108, 130].
Throughout this work, we present normalized voltage values,
as actual voltage values are proprietary information to flash
manufacturers. Our complete characterization results can be
found in our HPCA 2017 paper [9].
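One way to picture how repeated reads can bound a cell's threshold voltage is a sweep over read reference voltages. The sketch assumes that a cell conducts when the read reference voltage exceeds its threshold voltage, and that the reference can be stepped through a list of normalized values. Both the sweep procedure and the `cell_conducts` callback are illustrative assumptions, not the interface of a real chip or of the platform in [10, 11].

```python
def estimate_vth(cell_conducts, v_refs):
    """Bound a cell's threshold voltage by sweeping the read reference voltage.

    cell_conducts(v_ref) must return True if the cell turns on at v_ref.
    Returns (lower, upper): the highest reference at which the cell stayed off
    and the lowest reference at which it turned on (None if not observed).
    """
    lower = None
    for v_ref in sorted(v_refs):
        if cell_conducts(v_ref):
            return lower, v_ref
        lower = v_ref
    return lower, None


# Hypothetical cell with a normalized threshold voltage of 37 (arbitrary units).
true_vth = 37
window = estimate_vth(lambda v_ref: true_vth < v_ref, range(0, 101, 10))
print(window)  # (30, 40)
```

The resolution of the estimate is limited by the spacing between successive reference voltages.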
2.1. Cell-to-Cell Program Interference
The first error source, cell-to-cell program interference,
introduces errors into a flash cell when neighboring cells
are programmed, as a result of parasitic capacitance
coupling [6, 7, 8, 13, 16, 28, 29, 32, 68]. While two-step
programming reduces program interference for fully-programmed
cells, we find that interference during two-step programming
is a significant error source for partially-programmed cells.
As an example, we look at a flash block in the commonly-used
all-bit-line (ABL) flash architecture [13, 19, 20], which is
shown in Figure 3. After the LSB page on Wordline 1 (Page 1
in Figure 3) is programmed, the next two pages that are
programmed (Pages 2 and 3) reside on directly-adjacent
wordlines. Therefore, before the MSB page on Wordline 1 (Page 4)
is programmed, the LSB page (Page 1) could be susceptible to
program interference when Pages 2 and 3 are programmed.
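The page ordering in this example can be captured with a few lines of arithmetic. The formulas below are inferred from the page numbers shown in Figures 3 and 5 (LSB page of Wordline w is 2w - 1, except Page 0 on Wordline 0; MSB page of Wordline w is 2w + 2). They are an assumption that ignores any special handling at the end of a block, and the function names are hypothetical.

```python
def lsb_page(wordline):
    """Page number of a wordline's LSB page (Page 0 is a special case)."""
    return 0 if wordline == 0 else 2 * wordline - 1


def msb_page(wordline):
    """Page number of a wordline's MSB page."""
    return 2 * wordline + 2


def page_location(page):
    """Inverse mapping: page number -> (wordline, 'LSB' or 'MSB')."""
    if page == 0:
        return 0, "LSB"
    if page % 2 == 1:
        return (page + 1) // 2, "LSB"
    return (page - 2) // 2, "MSB"


def interfering_pages(wordline):
    """Pages programmed after a wordline's LSB page and before its MSB page."""
    return [(p, *page_location(p))
            for p in range(lsb_page(wordline) + 1, msb_page(wordline))]


print(interfering_pages(1))  # [(2, 0, 'MSB'), (3, 2, 'LSB')]
```

For Wordline 1, the two intervening pages land on Wordlines 0 and 2, the two direct neighbors, which matches the example above.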
[Figure 3 (block diagram): flash cells arranged along bitlines and
wordlines, with an internal MSB buffer and an internal LSB buffer
holding MSB 0 ... MSB n and LSB 0 ... LSB n. Page numbering:
  Wordline 0: Page 0 (LSB), Page 2 (MSB)
  Wordline 1: Page 1 (LSB), Page 4 (MSB)
  Wordline 2: Page 3 (LSB), Page 6 (MSB)]
Figure 3: Internal architecture of a block of all-bit-line (ABL)
flash memory. Reproduced from [9].
Figure 4 shows the measured raw bit error rate for Page 1
in real NAND flash memory devices at four different points in time,
normalized to the error rate just after Page 1 is programmed:
A. Just after Page 1 is programmed (no interference),
B. Page 2 is programmed with pseudo-random data,
C. Pages 2 and 3 are programmed with pseudo-random data,
D. Pages 2 and 3 are programmed with a data pattern that
induces the worst-case program interference.
We observe that the amount of interference is especially high
when Pages 2 and 3 in Figure 3 are written with the worst-case
data pattern, after which the raw bit error rate of Page 1 is
4.9x the rate before interference. Note that the worst-case data
pattern that we write to Pages 2 and 3 requires no knowledge
of the data stored within Page 1 [9].
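The normalization used in Figure 4 amounts to dividing each measured raw bit error rate by the rate measured just after Page 1 is programmed. A minimal sketch, with made-up 16-bit pages standing in for real measurements:

```python
def raw_bit_error_rate(written, read_back):
    """Fraction of bits that differ between the written and read-back data."""
    if len(written) != len(read_back):
        raise ValueError("pages must have the same length")
    errors = sum(w != r for w, r in zip(written, read_back))
    return errors / len(written)


def normalized_rber(rber, baseline_rber):
    """Error rate expressed as a multiple of the baseline error rate."""
    return rber / baseline_rber


# Hypothetical data: 2 bit errors before interference, 8 bit errors after.
written = [0] * 16
read_before = [1] * 2 + [0] * 14
read_after = [1] * 8 + [0] * 8

baseline = raw_bit_error_rate(written, read_before)
after = raw_bit_error_rate(written, read_after)
print(normalized_rber(after, baseline))
```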
[Figure 4 (bar chart): x-axis: interference due to writes, with bars
A: before interference; B: Page 2, random data; C: Pages 2+3, random
data; D: Pages 2+3, worst-case data. y-axis: normalized RBER, 0 to 5.]
Figure 4: Normalized raw bit error rate of partially-programmed
Page 1, before and after cell-to-cell program
interference. Adapted from [9].
2.2. Read Disturb
The second error source, read disturb, disrupts the contents
of a flash cell when another cell is read [6, 7, 8, 18, 28, 32, 35,
77, 90, 115]. NAND flash memory cells are organized into
multiple flash blocks (two-dimensional cell arrays), where
each block contains a set of bitlines that connect multiple
flash cells in series. To accurately read the value from one
cell, the SSD controller applies a pass-through voltage to turn
on the unread cells on the bitline, which allows the value
to propagate through the bitline. Unfortunately, this
pass-through voltage induces a weak programming effect on an
unread cell: it slightly increases the cell threshold voltage [6,
7, 8, 18]. As more neighboring cells within a block are read, an
unread cell’s threshold voltage can increase enough to change
the data value stored in the cell [6, 7, 8, 18, 35, 90]. In two-step
programming, a partially-programmed cell is more likely to
have a lower threshold voltage than a fully-programmed cell,
and the weak programming effect is stronger on cells with
a lower threshold voltage. Measuring errors in real NAND
flash memory devices, we find that the raw bit error rate for
an LSB page in a partially-programmed or unprogrammed
wordline is an order of magnitude greater than the rate for an
LSB page in a fully-programmed wordline. However, existing
read disturb management solutions are designed to protect
fully-programmed cells [18, 31, 35, 36, 52, 105], and offer little
mitigation for partially-programmed cells.
3. Exploiting Two-Step Programming Errors
Two major issues arise from the program interference and
read disturb vulnerabilities of partially-programmed and
unprogrammed cells. First, the vulnerabilities induce a large
number of errors on these cells, exhausting the SSD’s error
correction capacity and limiting the SSD lifetime. Second,
the vulnerabilities can potentially allow (malicious)
applications to aggressively corrupt and change data belonging to
other programs and further hurt the SSD lifetime. We present
two example sketches of potential exploits in our HPCA 2017
paper [9], which we briefly summarize here.
3.1. Sketch of Program Interference Based Exploit
In this exploit, a malicious application can induce a signi-
cant amount of program interference onto a ash page that
belongs to another, benign victim application, corrupting the
page and shortening the SSD lifetime. Recall from Section 2.1
that writing the worst-case data pattern can induce 4.9x the
number of errors into a neighboring page (with respect to
an interference-free page). The goal of this exploit is for a
malicious application to write this worst-case data pattern
in a way that ensures that the page that is disrupted belongs
to the victim application, and that the page that is disrupted
experiences the greatest amount of program interference pos-
sible. Figure 5 illustrates the contents of the pages within
neighboring 8KB wordlines (rows of ash cells within a block).
The SSD controller uses shadow program sequencing to inter-
leave partial programming steps to pages in ascending order
of the page numbers shown on the left side of the gure. A
malicious application can write a small 16KB le with all 1s
to prepare for the attack ( 1 in the gure), and then waits
for the victim application to write its data to Wordline n ( 2 ).
Once the victim writes its data, the malicious application then
writes all 0s to a second 16KB le ( 3a and 3b ). This induces
the largest possible change in voltage on the victim data, and
can be used to ip bits within the data. In our HPCA 2017 pa-
per [9], we discuss how a malicious application can (1) work
around SSD scrambling and (2) monitor victim application
data writes.
[Figure 5 (page layout within a block; markers 1, 2, 3a, and 3b
correspond to the steps in the text):
  Wordline n – 2: Page 2n – 5 (LSB)
                  Page 2n – 2 (MSB): Malicious File A (all 1s) (1)
  Wordline n – 1: Page 2n – 3 (LSB): Malicious File A (all 1s) (1)
                  Page 2n (MSB): Malicious File B (all 0s) (3a, 3b)
  Wordline n:     Page 2n – 1 (LSB): Page Under Attack (Victim) (2)
                  Page 2n + 2 (MSB)
  Wordline n + 1: Page 2n + 1 (LSB): Malicious File B (all 0s) (3a, 3b)
                  Page 2n + 4 (MSB)]
Figure 5: Layout of data within a flash block during a
program interference based exploit. Reproduced from [9].
3.2. Sketch of Read Disturb Based Exploit
In this exploit, a malicious application can induce a
significant amount of read disturb onto several flash pages that
belong to other, benign victim applications. Recall from
Section 2.2 that the error rate after read disturb for an LSB page
in a partially-programmed wordline is an order of magnitude
greater than the error rate for an LSB page in a
fully-programmed wordline. The goal of this exploit is for a
malicious application to quickly perform a large number of read
operations in a very short amount of time, to induce read
disturb errors that corrupt both pages already written to
partially-programmed wordlines and pages that have yet to
be written. The malicious application writes an 8KB file, with
arbitrary data, to the SSD. Immediately after the file is written,
the malicious application repeatedly forces the file system
to send a new read request to the SSD. Each request induces
read disturb on the other wordlines within the flash block,
causing the cell threshold voltages of these wordlines to
increase. After the malicious application finishes performing
the repeated read requests, a victim application writes data
to a file. As the SSD is unaware that an attack took place,
it does not detect that the data cannot be written correctly
due to the increased cell threshold voltages. As a result, bit
flips can occur in the victim application’s data. Unlike the
program interference exploit, which attacks a single page, the
read disturb exploit can corrupt multiple pages with a single
attack, and the corruption can affect pages written at a much
later time than the attack if the host write rate is low.
4. Protection and Mitigation Mechanisms
We propose three mechanisms to eliminate or mitigate
the program interference and read disturb vulnerabilities of
partially-programmed and unprogrammed cells due to
two-step programming. Table 1 summarizes the costs and benefits
of each mechanism. We briefly discuss our three mechanisms
here, and provide more detail on them in our HPCA 2017
paper [9].
Table 1: Summary of our proposed protection mechanisms.
Reproduced from [9].
Mechanism                Protects Against   Overhead             Error Rate Reduction
Buffering LSB Data       interference,      2MB storage,         100%
in the Controller        read disturb       1.3–15.7% latency
Adaptive LSB Read        interference,      64B storage,         21–33%
Reference Voltage        read disturb       0.0% latency
Multiple Pass-Through    read disturb       0B storage,          72%
Voltages                                    0.0% latency
Our first mechanism buffers LSB data in the SSD controller,
eliminating the need to read the LSB page from flash memory
at the beginning of the second programming step, thereby
completely eliminating the vulnerabilities. It maintains a copy
of all partially-programmed LSB data within DRAM buffers
that exist in the SSD near the controller. Doing so ensures
that the LSB data is read without any errors from the DRAM
buffer, where it is free from the vulnerabilities (instead of
from the flash memory, where it incurs errors that are not
corrected), in the second programming step. Figure 6 shows a
flowchart of our modified two-step programming algorithm.
This solution increases the programming latency of the flash
memory by 4.9% in the common case, due to the long latency
of sending the LSB data from the controller to the internal
LSB buffer inside flash memory.
[Figure 6 (flowchart):
  Step 1: A: Send LSB data to internal LSB buffer
          B: Keep copy of LSB in DRAM buffer
          Program LSB page
  Step 2: C: Is LSB in DRAM buffer?
            YES: D: Retrieve LSB data from DRAM buffer
            NO:  G: Retrieve LSB data from flash chip
                 H: Correct LSB data using ECC engine
          E: Send LSB data to internal LSB buffer
          F: Send MSB data to internal MSB buffer
          Program MSB page]
Figure 6: Modified two-step programming, using a DRAM
buffer for LSB data (modifications shown in shaded boxes).
Reproduced from [9].
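The flow in Figure 6 can be sketched as controller-side logic. The step letters in the comments refer to the flowchart. `FakeChip` is a hypothetical stand-in for a flash chip, `ecc_correct` is a placeholder for the controller's ECC engine, and releasing the DRAM copy once the MSB page is programmed is our assumption; none of these names come from a real SSD firmware interface.

```python
class FakeChip:
    """Hypothetical flash chip model with internal LSB and MSB buffers."""

    def __init__(self):
        self.lsb_buffer = None
        self.msb_buffer = None
        self.stored_lsb = {}   # wordline -> LSB page as held in the cells
        self.final = {}        # wordline -> list of programmed (MSB, LSB)

    def load_lsb_buffer(self, page):
        self.lsb_buffer = list(page)

    def load_msb_buffer(self, page):
        self.msb_buffer = list(page)

    def program_lsb(self, wordline):
        self.stored_lsb[wordline] = list(self.lsb_buffer)

    def read_lsb(self, wordline):
        return list(self.stored_lsb[wordline])

    def program_msb(self, wordline):
        self.final[wordline] = list(zip(self.msb_buffer, self.lsb_buffer))


class Controller:
    """Two-step programming with a controller-side DRAM copy of LSB data."""

    def __init__(self, chip, ecc_correct):
        self.chip = chip
        self.ecc_correct = ecc_correct
        self.dram_lsb = {}     # wordline -> clean copy of the LSB page

    def program_lsb(self, wordline, lsb_page):
        self.chip.load_lsb_buffer(lsb_page)          # A
        self.dram_lsb[wordline] = list(lsb_page)     # B
        self.chip.program_lsb(wordline)

    def program_msb(self, wordline, msb_page):
        if wordline in self.dram_lsb:                # C
            lsb_page = self.dram_lsb.pop(wordline)   # D (copy released here)
        else:
            raw = self.chip.read_lsb(wordline)       # G
            lsb_page = self.ecc_correct(raw)         # H
        self.chip.load_lsb_buffer(lsb_page)          # E
        self.chip.load_msb_buffer(msb_page)          # F
        self.chip.program_msb(wordline)


chip = FakeChip()
controller = Controller(chip, ecc_correct=lambda page: page)
controller.program_lsb(1, [1, 0, 1])
chip.stored_lsb[1][0] ^= 1          # hypothetical error in the flash cells
controller.program_msb(1, [0, 0, 1])
print(chip.final[1])
```

Because the second step takes the LSB data from the DRAM copy, the error injected into the flash cells never reaches the internal LSB buffer.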
The two other mechanisms that we develop largely mitigate
(but do not fully eliminate) the probability of two-step
programming errors at much lower latency impact. Our second
mechanism adapts the LSB read operation to account for
threshold voltage changes induced by program interference
and read disturb. It adaptively learns an optimized read
reference voltage for LSB data, lowering the probability of an LSB
read error. Our third mechanism greatly reduces the errors
induced during read disturb, by customizing the pass-through
voltage for unprogrammed and partially-programmed flash
cells. State-of-the-art SSDs apply a single pass-through
voltage ($V_{pass}$) to all of the unread cells, as shown in Figure 7a.
This leaves a large gap between the pass-through voltage
and the threshold voltage of a partially-programmed or
unprogrammed cell, which greatly increases the impact of read
disturb [9, 18]. To minimize this gap, and, thus, the impact of
read disturb, we propose to use three pass-through voltages,
as shown in Figure 7b: $V_{pass}^{erase}$ for unprogrammed cells,
$V_{pass}^{partial}$ for partially-programmed cells, and the same
pass-through voltage as before ($V_{pass}$) for fully-programmed cells. This
mechanism decreases the number of errors induced by read
operations to neighboring cells by 72%, which translates to a
16% increase in NAND flash memory lifetime (see Section 6.3
of our HPCA 2017 paper [9] for more detail).
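This summary does not spell out how the second mechanism learns its read reference voltage, so the following is only one possible illustration: sweep a set of candidate reference voltages over cells whose LSB values are known and keep the candidate that produces the fewest LSB read errors. The voltages are in arbitrary normalized units, and all names and values are assumptions.

```python
def lsb_read(vths, v_ref):
    """Read LSBs of partially-programmed cells: below v_ref is ER (LSB = 1)."""
    return [1 if v < v_ref else 0 for v in vths]


def best_read_reference(vths, expected_lsb, candidates):
    """Pick the candidate reference voltage with the fewest LSB read errors."""
    def errors(v_ref):
        read = lsb_read(vths, v_ref)
        return sum(r != e for r, e in zip(read, expected_lsb))
    return min(candidates, key=errors)


# Hypothetical cells: the third cell is in ER, but interference or read
# disturb has pushed its threshold voltage above the default reference (30).
vths = [10, 25, 32, 60, 70]
expected_lsb = [1, 1, 1, 0, 0]

print(lsb_read(vths, 30))                                         # [1, 1, 0, 0, 0]
print(best_read_reference(vths, expected_lsb, [30, 35, 40, 45]))  # 35
```

Raising the reference slightly recovers the shifted cell without misreading the cells in the TP state.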
We conclude that, by eliminating or reducing the probabil-
ity of introducing errors during two-step programming, our
solutions completely close or greatly reduce the exposure to
reliability and security vulnerabilities.
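[Figure 7 (threshold voltage distributions for unprogrammed (ER),
partially-programmed (ER, TP), and fully-programmed (ER, P1, P2, P3)
wordlines; x-axis: Vth):
  (a) a single $V_{pass}$ for all three, leaving a large gap above the
      unprogrammed and partially-programmed distributions;
  (b) $V_{pass}^{erase}$ for unprogrammed, $V_{pass}^{partial}$ for
      partially-programmed, and $V_{pass}$ for fully-programmed wordlines.]
Figure 7: (a) Applying a single $V_{pass}$ to all unread wordlines; (b)
Our multiple pass-through voltage mechanism, where different
voltages are applied based on the wordline’s programming
status, to minimize the effects of read disturb.
Reproduced from [9].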
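Selecting a pass-through voltage per wordline reduces to a lookup on the wordline's programming status. In the sketch below, the three voltage values are placeholders in normalized units (actual values are proprietary and not given here); only their ordering, inferred from Figure 7b, is meaningful. The enum and function names are hypothetical.

```python
from enum import Enum


class Status(Enum):
    UNPROGRAMMED = "unprogrammed"
    PARTIAL = "partially programmed"
    FULL = "fully programmed"


# Placeholder normalized voltages (assumptions): erase < partial < full.
V_PASS = {
    Status.UNPROGRAMMED: 0.4,   # stands in for V_pass^erase
    Status.PARTIAL: 0.7,        # stands in for V_pass^partial
    Status.FULL: 1.0,           # the original V_pass
}


def pass_through_voltages(statuses, read_wordline):
    """Map every unread wordline in a block to its pass-through voltage."""
    return {
        wordline: V_PASS[status]
        for wordline, status in enumerate(statuses)
        if wordline != read_wordline
    }


block = [Status.FULL, Status.FULL, Status.PARTIAL, Status.UNPROGRAMMED]
print(pass_through_voltages(block, read_wordline=0))
```

The wordline being read is excluded because it receives a read reference voltage rather than a pass-through voltage.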
5. Related Work
To our knowledge, our HPCA 2017 paper [9] is the first
to (1) experimentally characterize both program interference
and read disturb errors that occur due to the two-step
programming method commonly used in MLC NAND flash
memory; (2) reveal new reliability and security vulnerabilities
exposed by two-step programming in flash memory; and
(3) develop novel solutions to reduce these vulnerabilities.
We briefly describe related works in the areas of DRAM and
NAND flash memory. We note that a thorough survey of
error mechanisms in NAND flash memory is provided in our
recent works [6, 7, 8].
5.1. Read Disturb Errors in DRAM
Commodity DRAM chips that are sold and used in the field
today exhibit read disturb errors [55], also called
RowHammer-induced errors [82], which are conceptually similar to the read
disturb errors found in NAND flash memory (see Section 2.2).
Repeatedly accessing the same row in DRAM can cause bit
flips in data stored in adjacent DRAM rows. In order to access
data within DRAM, the row of cells corresponding to the
requested address must be activated (i.e., opened for read
and write operations). This row must be precharged (i.e.,
closed) when another row in the same DRAM bank needs
to be activated. Through experimental studies on a large
number of real DRAM chips, we show that when a DRAM
row is activated and precharged repeatedly (i.e., hammered)
enough times within a DRAM refresh interval, one or more
bits in physically-adjacent DRAM rows can be flipped to the
wrong value [55].
In our original RowHammer paper [55], we tested
129 DRAM modules manufactured by three major
manufacturers (A, B, and C) between 2008 and 2014, using an
FPGA-based experimental DRAM testing infrastructure [38]
(more detail on our experimental setup, along with a list of
all modules and their characteristics, can be found in our
original RowHammer paper [55]). Figure 8 shows the rate
of RowHammer errors that we found, with the 129 modules
that we tested categorized based on their manufacturing date.
We find that 110 of our tested modules exhibit RowHammer
errors, with the earliest such module dating back to 2010.
In particular, we find that all of the modules manufactured
in 2012–2013 that we tested are vulnerable to RowHammer.
As with many NAND flash memory error mechanisms,
especially read disturb, RowHammer is a recent phenomenon that
especially affects DRAM chips manufactured with more
advanced manufacturing process technology generations [82].
The phenomenon is due to reliability problems caused by
DRAM technology scaling [82, 83, 84, 85].
[Figure 8 (scatter plot): x-axis: module manufacture date, 2008 to
2014; y-axis: errors per 10^9 cells, log scale from 0 to 10^6;
series: A modules, B modules, C modules.]
Figure 8: RowHammer error rate vs. manufacturing dates of
129 DRAM modules we tested. Reproduced from [55].
Figure 9 shows the distribution of the number of rows
(plotted in log scale on the y-axis) within a DRAM module that
flip the number of bits shown along the x-axis, as measured
for example DRAM modules from three different DRAM
manufacturers [55]. We make two observations from the figure.
First, the number of bits flipped when we hammer a row
(known as the aggressor row) can vary significantly within
a module. Second, each module has a different distribution
of the number of rows. Despite these differences, we find
that this DRAM failure mechanism affects more than 80%
of the DRAM chips we tested [55]. As indicated above, this
read disturb error mechanism in DRAM is popularly called
RowHammer [82].
[Figure 9 (plot): x-axis: victim cells per aggressor row, 0 to 120;
y-axis: count, log scale from 0 to 10^5; series: modules
A_1240^23, B_1146^11, and C_1223^19.]
Figure 9: Number of victim cells (i.e., number of bit errors)
when an aggressor row is repeatedly activated, for three
representative DRAM modules from three major manufacturers.
We label the modules in the format X_yyww^n, where X is the
manufacturer (A, B, or C), yyww is the manufacture year (yy)
and week of the year (ww), and n is the number of the selected
module. Reproduced from [55].
Various recent works show that RowHammer can be
maliciously exploited by user-level software programs to
(1) induce errors in existing DRAM modules [55, 82] and
(2) launch attacks to compromise the security of various sys-
tems [3, 4, 33, 34, 82, 101, 106, 107, 117, 123]. For example, by
exploiting the RowHammer read disturb mechanism, a user-
level program can gain kernel-level privileges on real laptop
systems [106,107], take over a server vulnerable to RowHam-
mer [34], take over a victim virtual machine running on the
same system [3], and take over a mobile device [117]. Thus,
the RowHammer read disturb mechanism is a prime (and
perhaps the first) example of how a circuit-level failure
mechanism in DRAM can cause a practical and widespread system
security vulnerability.
Note that various solutions to RowHammer exist [53,55,82],
but we do not discuss them in detail here. Our recent
work [82] provides a comprehensive overview. A very promis-
ing proposal is to modify either the memory controller or
the DRAM chip such that it probabilistically refreshes the
physically-adjacent rows of a recently-activated row, with
very low probability. This solution is called Probabilistic Ad-
jacent Row Activation (PARA) [55]. Our prior work shows
that this low-cost, low-complexity solution, which does not
require any storage overhead, greatly closes the RowHammer
vulnerability [55].
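The idea behind PARA can be sketched in a few lines: on every row activation, flip a biased coin, and on the rare success refresh the physically-adjacent rows. The sketch below refreshes both neighbors on a success, which is a simplifying assumption; the function name, the `refresh` callback, and the probability value are illustrative and not taken from [55].

```python
import random


def activate(row, num_rows, p, refresh, rng=random):
    """Activate `row`; with probability p, also refresh its physical neighbors."""
    if rng.random() < p:
        for neighbor in (row - 1, row + 1):
            if 0 <= neighbor < num_rows:   # skip neighbors outside the bank
                refresh(neighbor)


# Hammer one row many times with a small, hypothetical probability.
refreshed = []
rng = random.Random(0)
for _ in range(100_000):
    activate(5, num_rows=16, p=0.001, refresh=refreshed.append, rng=rng)
print(len(refreshed), set(refreshed))
```

No counters or per-row state are kept, which is consistent with the description above of a solution that requires no storage overhead. A row that is hammered many times within a refresh interval is very likely to trigger at least one neighbor refresh.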
The RowHammer effect in DRAM worsens as the
manufacturing process scales down to smaller node sizes [55, 82].
More findings on RowHammer, along with extensive
experimental data from real DRAM devices, can be found in our
prior works [53, 55, 82].
5.2. Cell-to-Cell Interference Errors in DRAM
Like NAND flash memory cells, DRAM cells are
susceptible to cell-to-cell interference. In DRAM, one important
way in which cell-to-cell interference exhibits itself is the
data-dependent retention behavior, where the retention time
of a DRAM cell is dependent on the values written to nearby
DRAM cells [46, 47, 48, 49, 70, 82, 97]. This phenomenon is called
data pattern dependence (DPD) [70]. Data pattern dependence
in DRAM is similar to the data-dependent nature of program
interference that exists in NAND flash memory (see
Section 2.1). Within DRAM, data pattern dependence occurs as a
result of parasitic capacitance coupling (between DRAM cells).
Due to this coupling, the amount of charge stored in one cell’s
capacitor can inadvertently affect the amount of charge stored
in an adjacent cell’s capacitor [46, 47, 48, 49, 70, 82, 97]. As
DRAM cells become smaller with technology scaling,
cell-to-cell interference worsens because parasitic capacitance
coupling between cells increases [46, 70]. More findings on
cell-to-cell interference and the data-dependent nature of cell
retention times in DRAM, along with experimental data
obtained from modern DRAM chips, can be found in our prior
works [46, 47, 48, 49, 70, 82, 97].
5.3. Errors in Emerging Memory Technologies
Emerging nonvolatile memories, such as phase-change
memory (PCM) [60, 61, 62, 100, 122, 125, 129], spin-transfer
torque magnetic RAM (STT-RAM or STT-MRAM) [57, 86],
metal-oxide resistive RAM (RRAM) [121], and memristors [26,
113], are expected to bridge the gap between DRAM and
NAND-flash-memory-based SSDs, providing DRAM-like
access latency and energy, and at the same time SSD-like
large capacity and nonvolatility (and hence SSD-like data
persistence). While their underlying designs are different
from DRAM and NAND flash memory, these emerging
memory technologies have been shown to exhibit similar types
of errors. PCM-based devices are expected to have a
limited lifetime, as PCM can only sustain a limited number
of writes [60, 100, 122], similar to the P/E cycling errors in
SSDs (though PCM’s write endurance is higher than that of
SSDs [60]). PCM suffers from (1) resistance drift [41, 98, 122],
where the resistance used to represent the value becomes
higher over time (and eventually can introduce a bit error),
similar to how charge leakage in NAND flash memory and
DRAM leads to retention errors over time; and (2) write
disturb [43], where the heat generated during the
programming of one PCM cell dissipates into neighboring cells and
can change the value that is stored within the neighboring
cells, similar in concept to cell-to-cell program interference
in NAND flash memory. STT-RAM suffers from (1) retention
failures, where the value stored for a single bit (as the
magnetic orientation of the layer that stores the bit) can flip
over time; and (2) read disturb (a conceptually different
phenomenon from the read disturb in DRAM and flash memory),
where reading a bit in STT-RAM can inadvertently induce a
write to that same bit [86].
Due to the nascent nature of emerging nonvolatile
memory technologies and the lack of availability of large-capacity
devices built with them, extensive and dependable
experimental studies have yet to be conducted on the reliability of
real PCM, STT-RAM, RRAM, and memristor chips. However,
we believe that error mechanisms conceptually or abstractly
similar to those for flash memory and DRAM are likely to be
prevalent in emerging technologies as well (as supported by
some recent studies [2, 43, 50, 86, 109, 110, 128]), albeit with
different underlying mechanisms and error rates.
5.4. Other Related Works
Memory Error Characterization and Understanding.
Prior works study various types of NAND flash memory
errors derived from circuit-level noise, such as data retention
noise [6, 7, 8, 11, 12, 14, 15, 73, 77, 79], read disturb noise [6, 7, 8, 18,
77, 90], cell-to-cell program interference noise [11, 13, 15, 16],
and P/E cycling noise [6, 7, 8, 11, 15, 17, 72, 77, 96]. Other prior
works examine the aggregate effect of these errors on large
sets of SSDs that are deployed in the production data centers
of Facebook [75], Google [103], and Microsoft [87]. None
of these works characterize how program interference and
read disturb significantly increase errors within the
unprogrammed or partially-programmed cells of an open block due
to the vulnerabilities in two-step programming, nor do they
develop mechanisms that exploit or mitigate such errors.
A concurrent work by Papandreou et al. [89] character-
izes the impact of read disturb on partially-programmed and
unprogrammed cells in state-of-the-art MLC NAND flash
memory. The authors come to similar conclusions as we
do about the impact of read disturb. However, unlike our
work, they do not (1) characterize the impact of cell-to-cell
program interference on partially-programmed cells, (2) pro-
pose exploits that can take advantage of the vulnerabilities in
partially-programmed cells, or (3) propose mechanisms that
mitigate or eliminate the vulnerabilities.
Similar to the characterization studies performed for
NAND flash memory, DRAM latency, reliability, and variation
have been experimentally characterized at both a small
scale (e.g., hundreds of chips) [21, 22, 23, 38, 46, 47, 48, 49, 51, 53,
55, 64, 65, 67, 70, 97, 99] and a large scale (e.g., tens of thousands
of chips) [40, 76, 104, 111, 112].
Program Interference Error Mitigation Mechanisms.
Prior works [13, 16] model the behavior of program inter-
ference, and propose mechanisms that estimate the optimal
read reference voltage once interference has occurred. These
works minimize program interference errors only for fully-
programmed wordlines, by modeling the change in the thresh-
old voltage distribution as a result of the interference. These
models are tted to the distributions of wordlines after both
the LSB and MSB pages are programmed, and are unable to
determine and mitigate the shift that occurs for wordlines
that are partially programmed. In contrast, we propose mechanisms
that specifically address the program interference resulting
from two-step programming, and reduce the number
of errors induced on LSB pages in both partially-programmed
and unprogrammed wordlines.
Read Disturb Error Mitigation Mechanisms. One
patent [31] proposes a mechanism that uses counters to mon-
itor the total number of reads to each block. Once a block’s
counter exceeds a threshold, the mechanism remaps and
rewrites all of the valid pages within the block to remove the
accumulated read disturb errors [31]. Another patent [105]
proposes to monitor the MSB page error rate to ensure that
it does not exceed the ECC error correction capability, to
avoid data loss. Both of these mechanisms monitor pages
only from fully-programmed wordlines. Unfortunately, as we
observed, LSB pages in partially-programmed and unpro-
grammed wordlines are twice as susceptible to read disturb
as pages in fully-programmed wordlines (see Section 2.2). If
only the MSB page error rate is monitored, read disturb may
be detected too late to correct some of the LSB pages.
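The counter-based approach described above can be illustrated with a minimal Python sketch. It is not the patented mechanism itself: the class name, the threshold value, and the remap callback are all hypothetical, chosen only to show the control flow of counting reads per block and rewriting the block once its counter exceeds a threshold.

```python
from collections import defaultdict
from typing import Callable, List


class BlockReadMonitor:
    """Toy per-block read counter (hypothetical; not a real SSD firmware API)."""

    def __init__(self, read_threshold: int, remap: Callable[[int], None]) -> None:
        self.read_threshold = read_threshold  # illustrative value, assumed tunable
        self.remap = remap                    # callback that rewrites the block's valid pages
        self.read_counts = defaultdict(int)   # block id -> reads since the last remap

    def on_read(self, block_id: int) -> bool:
        """Record one read; remap the block once its counter exceeds the threshold.

        Returns True if this read triggered a remap.
        """
        self.read_counts[block_id] += 1
        if self.read_counts[block_id] > self.read_threshold:
            self.remap(block_id)
            self.read_counts[block_id] = 0  # accumulated disturb is cleared by the rewrite
            return True
        return False


# Usage: with a threshold of 3, every fourth read of block 7 triggers a remap.
remapped: List[int] = []
monitor = BlockReadMonitor(read_threshold=3, remap=remapped.append)
triggers = [monitor.on_read(block_id=7) for _ in range(8)]
```

A per-block count of this kind carries no information about which wordlines in the block are still partially programmed, so a single threshold cannot account for their different susceptibility.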
Our earlier work [18] dynamically changes the pass-
through voltage for each block to reduce the impact of read
disturb. As a single voltage is applied to the whole block, this
mechanism does not help significantly with the LSB pages
in partially-programmed and unprogrammed wordlines. In
contrast, our read disturb mitigation technique (see Section 4)
specifically targets these LSB pages by applying multiple different
pass-through voltages in an open block, optimized to
the different programmed states of each wordline, to reduce
read disturb errors.
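The idea of selecting a pass-through voltage per wordline, rather than one voltage per block, can be sketched as follows. This is only an illustration under stated assumptions: the state names, the function name, and the voltage values (arbitrary normalized units, not measurements) are hypothetical, and the ordering of the values assumes that wordlines whose cells hold less charge can be bypassed with a lower voltage.

```python
from enum import Enum
from typing import Dict, List


class WordlineState(Enum):
    UNPROGRAMMED = "unprogrammed"
    PARTIALLY_PROGRAMMED = "partially-programmed"  # LSB page written, MSB page not yet
    FULLY_PROGRAMMED = "fully-programmed"


# Placeholder pass-through voltages in arbitrary normalized units (NOT measured values).
# Assumption: less-programmed wordlines tolerate a lower pass-through voltage.
VPASS_BY_STATE: Dict[WordlineState, float] = {
    WordlineState.UNPROGRAMMED: 0.80,
    WordlineState.PARTIALLY_PROGRAMMED: 0.90,
    WordlineState.FULLY_PROGRAMMED: 1.00,
}


def pass_through_voltages(states: List[WordlineState], target: int) -> Dict[int, float]:
    """Return a pass-through voltage for every wordline except the one being read."""
    return {
        index: VPASS_BY_STATE[state]
        for index, state in enumerate(states)
        if index != target
    }


# Usage: an open block with two fully-programmed wordlines, one partially-programmed
# wordline, and one unprogrammed wordline; wordline 0 is the read target.
open_block = [
    WordlineState.FULLY_PROGRAMMED,
    WordlineState.FULLY_PROGRAMMED,
    WordlineState.PARTIALLY_PROGRAMMED,
    WordlineState.UNPROGRAMMED,
]
voltages = pass_through_voltages(open_block, target=0)
```

A single-voltage scheme would correspond to a table in which all three entries are equal, which is why it cannot be tuned for the less-programmed wordlines of an open block.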
Other prior works [35,36,52] propose to use read reclaim to
mitigate read disturb errors. The key idea of read reclaim is to
remap the data in a block to a new flash block, if the block has
experienced a high number of reads [35, 36, 52]. Read reclaim
is similar to the remapping-based refresh mechanism [14, 15,
71, 80, 88] employed by many modern SSDs to mitigate data
retention errors [6,7,8]. Read reclaim can remap the contents
of a wordline only after the wordline is fully programmed,
and does not mitigate the impact of read disturb on partially-
programmed or unprogrammed wordlines.
Using Flash Memory for Security Applications. Some
prior works studied how flash memory can be used to enhance
the security of applications. One work [119] uses flash
memory as a secure channel to hide information, such as
a secure key. Other works [118, 124] use flash memory to
generate random numbers and digital fingerprints. None of
these works study vulnerabilities that exist within the flash
memory.
Based on our HPCA 2017 paper [9], recent work [58]
demonstrates how an attack can be performed on a real SSD
using our program interference based exploit (see Section 3.1).
The authors use our exploit to perform a file system level
attack on a Linux machine, using the attack to gain root
privileges.
Two-Step vs. One-Shot Programming. One-shot programming
shifts flash cells directly from the erased state
to their final target state in a single step. For smaller transistors
with less distance between neighboring flash cells,
such as those in sub-40nm planar (i.e., 2D) NAND flash memory,
two-step programming has replaced one-shot programming
to alleviate the cell-to-cell program interference that results
from capacitive coupling between neighboring cells [92]. 3D NAND
flash memory currently uses one-shot programming [94, 95, 127], as
3D NAND flash memory chips use larger process technology
nodes (i.e., 30–50 nm) [102, 126] and employ charge trap transistors
[30, 42, 45, 56, 93, 116, 120] for flash cells, as opposed
to the floating-gate transistors used in planar NAND flash
memory. However, once the number of 3D-stacked layers
reaches its upper limit [59, 69], 3D NAND flash memory is
expected to scale to smaller transistors [126], and we expect
that the increased program interference will again require
partial programming (just as it happened for planar NAND
flash memory in the past [54, 92]). More detail on 3D NAND
flash memory is provided in a recent survey article [8].
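The difference between the two programming methods can be made concrete with a toy state-sequence model. This sketch rests on assumptions not stated in this section: the state names (ER, P1, P2, P3, and an intermediate state TP) and the Gray-code mapping from (MSB, LSB) values to states follow a labeling commonly used to describe MLC NAND flash memory, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed (MSB, LSB) -> final state mapping for a two-bit cell (a common Gray-code labeling).
FINAL_STATE: Dict[Tuple[int, int], str] = {
    (1, 1): "ER",  # erased state
    (0, 1): "P1",
    (0, 0): "P2",
    (1, 0): "P3",
}


def one_shot_program(msb: int, lsb: int) -> List[str]:
    """One-shot: the cell moves from the erased state straight to its final state."""
    return ["ER", FINAL_STATE[(msb, lsb)]]


def two_step_program(msb: int, lsb: int) -> List[str]:
    """Two-step: step 1 programs only the LSB, step 2 programs the MSB."""
    # After step 1 the cell is partially programmed: it stays erased if LSB = 1,
    # or sits in an assumed intermediate state "TP" if LSB = 0.
    intermediate = "ER" if lsb == 1 else "TP"
    return ["ER", intermediate, FINAL_STATE[(msb, lsb)]]


# Usage: the state sequences visited when storing MSB = 1, LSB = 0.
one_shot_path = one_shot_program(msb=1, lsb=0)
two_step_path = two_step_program(msb=1, lsb=0)
```

Under these assumptions, the two-step path passes through the intermediate state before reaching the final state, whereas the one-shot path has no such intermediate point; the time a cell spends partially programmed is the window that the vulnerabilities studied in this work concern.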
6. Long-Term Impact
As we discuss in Section 5, our HPCA 2017 paper [9] makes
several novel contributions on characterizing, exploiting, and
mitigating vulnerabilities in the two-step programming algorithm
used in state-of-the-art MLC NAND flash memory. We
believe that these contributions are likely to have a significant
impact on academic research and industry.
6.1. Exposing the Existence of Errors
NAND flash manufacturers use two-step programming
widely in their contemporary MLC NAND flash devices. Prior
to our HPCA 2017 paper [9] and concurrent work by Papandreou
et al. [89], there was no publicly-available knowledge
about how two-step programming introduced new error
sources that did not exist in the prior one-shot programming
approach. Using real off-the-shelf contemporary NAND
flash memory chips, our HPCA 2017 paper exposes the fact
that fundamental limitations of the two-step programming
method introduce program errors that reduce the lifetime of
SSDs available on the market today.
Through a rigorous characterization, our HPCA 2017 pa-
per [9] analyzes two major sources of these errors, program
interference and read disturb, demonstrating how they can
corrupt data stored in a partially-programmed flash cell.
While prior works have addressed both program interference
(e.g., [13, 29, 68, 92]) and read disturb (e.g., [18, 31, 35, 105])
errors in the past, we find that none of these existing solutions
are able to protect the vulnerable partially-programmed
pages produced during two-step programming. We expect
that by exposing these errors and the unique vulnerabilities of
partially-programmed cells, our work will (1) provide NAND
flash memory manufacturers and the academic community
with significant insight into the problem; (2) foster the development
of new solutions that can reduce or eliminate this
vulnerability; and (3) inspire others to search for other reliability
and security vulnerabilities that exist in NAND flash
memory.
6.2. Security Implications for Flash Memory
Our HPCA 2017 paper [9] proposes two sketches of new
potential security exploits based on errors arising from two-
step programming. Malicious applications can be developed
to use these (or other similar) exploits to corrupt data belong-
ing to other applications. For example, our paper has already
enabled the development and demonstration of a file system
based attack by IBM security researchers [58]. In that work,
the researchers built upon our program interference based
exploits to show how to use the file system to acquire root
privileges on a real machine. The work confirms that our exploit
sketches are likely viable on a real system, and that the
threat of maliciously exploiting vulnerabilities in two-step
programming is real (and needs to be addressed).
As was the case for RowHammer attacks in DRAM (see
Section 5.1), our ndings have already generated signicant
interest and concern in the broader technology community
(e.g., [5, 24, 27, 39]). The reason behind the broader impact of
our work is that many existing drives in the eld today can
be attacked. After IBM researchers demonstrated the ability
to perform such attacks on a real system [58], there has been
further interest in NAND ash memory attacks (e.g., [1, 78]).
We hope and expect that other researchers will take our
cue and begin to investigate how other reliability issues in
NAND flash memory can be exploited by applications to
perform malicious attacks. We believe that this is a new area
of research that will grow in importance as SSDs and flash
memory become even more widely used.
6.3. Eliminating Program Error Attacks
Our HPCA 2017 paper [9] proposes three solutions that
either eliminate or mitigate vulnerabilities to program inter-
ference and read disturb during two-step programming. We
intentionally design all three of our solutions to be low over-
head and easily implementable in commercial SSDs. One of
our three solutions completely eliminates the vulnerabilities,
albeit with a small increase in flash programming latency. We
expect our work to have a direct impact on the NAND flash
memory industry, as manufacturers will likely incorporate
solutions such as the ones we propose to mitigate or elimi-
nate these vulnerabilities in their new SSDs. We also expect
manufacturers and researchers to explore new mechanisms,
inspired by our work and by our solutions, that can eliminate
these or other vulnerabilities and exploits due to NAND flash
memory reliability errors.
7. Conclusion
Our HPCA 2017 paper [9] shows that the two-step pro-
gramming mechanism commonly employed in modern MLC
NAND flash memory chips opens up new vulnerabilities to
errors, based on an experimental characterization of modern
1X-nm MLC NAND flash chips. We show that the root
cause of these vulnerabilities is the fact that when a partially-
programmed cell is set to an intermediate threshold voltage,
it is much more susceptible to both cell-to-cell program in-
terference and read disturb. We demonstrate that (1) these
vulnerabilities lead to errors that reduce the overall reliability
of flash memory, and (2) attackers can potentially exploit
these vulnerabilities to maliciously corrupt data belonging to
other programs. Based on our experimental observations and
the resulting understanding, we propose three new mecha-
nisms that can remove or mitigate these vulnerabilities, by
eliminating or reducing the errors introduced as a result of
the two-step programming method. Our experimental evaluation
shows that our new mechanisms are effective: they
can either eliminate the vulnerabilities with modest/low la-
tency overhead, or drastically reduce the vulnerabilities and
reduce errors with negligible latency or storage overhead.
We hope that the vulnerabilities we analyzed and exposed
in this work, along with the experimental data we provided,
open up new avenues for mitigation as well as for exposure of
other potential vulnerabilities due to internal flash memory
operation.
Acknowledgments
We thank the anonymous reviewers for their feedback on
our HPCA 2017 paper [9]. This work is partially supported by
the Intel Science and Technology Center, CMU Data Storage
Systems Center, and NSF grants 1212962 and 1320531.
References
[1] J. W. Aldershoff, “IBM Researchers: Rowhammer-Like Attack on Flash Memory
Can Provide Root Privileges to Attacker,” Myce, 2017.
[2] A. Athmanathan, M. Stanisavljevic, N. Papandreou, H. Pozidis, and E. Elefthe-
riou, “Multilevel-Cell Phase-Change Memory: A Viable Technology,” JETCAS,
2016.
[3] E. Bosman, K. Razavi, H. Bos, and C. Giuffrida, “Dedup Est Machina: Memory
Deduplication as an Advanced Exploitation Vector,” in SP, 2016.
[4] W. Burleson, O. Mutlu, and M. Tiwari, “Who is the Major Threat to Tomorrow’s
Security? You, the Hardware Designer,” in DAC, 2016.
[5] G. Burton, “Rowhammer-Style NAND Flash Attack Can Corrupt SSD Data,” The
Inquirer, 2017.
[6] Y. Cai, S. Ghose, E. F. Haratsch, Y. Luo, and O. Mutlu, “Error Characterization,
Mitigation, and Recovery in Flash-Memory-Based Solid-State Drives,” Proc. IEEE,
2017.
[7] Y. Cai, S. Ghose, E. F. Haratsch, Y. Luo, and O. Mutlu, “Error Characteri-
zation, Mitigation, and Recovery in Flash Memory Based Solid-State Drives,”
arXiv:1706.08642 [cs.AR], 2017.
[8] Y. Cai, S. Ghose, E. F. Haratsch, Y. Luo, and O. Mutlu, “Errors in Flash-Memory-
Based Solid-State Drives: Analysis, Mitigation, and Recovery,” arXiv:1711.11427
[cs.AR], 2017.
[9] Y. Cai, S. Ghose, Y. Luo, K. Mai, O. Mutlu, and E. F. Haratsch, “Vulnerabilities in
MLC NAND Flash Memory Programming: Experimental Analysis, Exploits, and
Mitigation Techniques,” in HPCA, 2017.
[10] Y. Cai, E. F. Haratsch, M. P. McCartney, and K. Mai, “FPGA-Based Solid-State
Drive Prototyping Platform,” in FCCM, 2011.
[11] Y. Cai, E. F. Haratsch, O. Mutlu, and K. Mai, “Error Patterns in MLC NAND Flash
Memory: Measurement, Characterization, and Analysis,” in DATE, 2012.
[12] Y. Cai, Y. Luo, E. F. Haratsch, K. Mai, and O. Mutlu, “Data Retention in MLC
NAND Flash Memory: Characterization, Optimization, and Recovery,” in HPCA,
2015.
[13] Y. Cai, O. Mutlu, E. F. Haratsch, and K. Mai, “Program Interference in MLC
NAND Flash Memory: Characterization, Modeling, and Mitigation,” in ICCD,
2013.
[14] Y. Cai, G. Yalcin, O. Mutlu, E. F. Haratsch, A. Cristal, O. Unsal, and K. Mai, “Flash
Correct and Refresh: Retention Aware Management for Increased Lifetime,” in
ICCD, 2012.
[15] Y. Cai, G. Yalcin, O. Mutlu, E. F. Haratsch, A. Cristal, O. Unsal, and K. Mai, “Error
Analysis and Retention-Aware Error Management for NAND Flash Memory,”
Intel Technology Journal, 2013.
[16] Y. Cai, G. Yalcin, O. Mutlu, E. F. Haratsch, O. Unsal, A. Cristal, and K. Mai, “Neigh-
bor Cell Assisted Error Correction in MLC NAND Flash Memories,” in SIGMET-
RICS, 2014.
[17] Y. Cai, E. F. Haratsch, O. Mutlu, and K. Mai, “Threshold Voltage Distribution in
MLC NAND Flash Memory: Characterization, Analysis, and Modeling,” in DATE,
2013.
[18] Y. Cai, Y. Luo, S. Ghose, E. F. Haratsch, K. Mai, and O. Mutlu, “Read Disturb Errors
in MLC NAND Flash Memory: Characterization, Mitigation, and Recovery,” in
DSN, 2015.
[19] R. Cernea et al., “A 34MB/s-Program-Throughput 16Gb MLC NAND with All-
Bitline Architecture in 56nm,” in ISSCC, 2008.
[20] R.-A. Cernea et al., “A 34 MB/s MLC Write Throughput 16 Gb NAND with All
Bit Line Architecture on 56 nm Technology,” JSSC, 2009.
[21] K. K. Chang, “Understanding and Improving the Latency of DRAM-Based Mem-
ory Systems,” Ph.D. dissertation, Carnegie Mellon Univ., 2017.
[22] K. K. Chang, A. Kashyap, H. Hassan, S. Ghose, K. Hsieh, D. Lee, T. Li, G. Pekhi-
menko, S. Khan, and O. Mutlu, “Understanding Latency Variation in Modern
DRAM Chips: Experimental Characterization, Analysis, and Optimization,” in
SIGMETRICS, 2016.
[23] K. K. Chang, A. G. Yaglikci, A. Agrawal, N. Chatterjee, S. Ghose, A. Kashyap,
H. Hassan, D. Lee, M. O’Connor, and O. Mutlu, “Understanding Reduced-Voltage
Operation in Modern DRAM Devices: Experimental Characterization, Analysis,
and Mechanisms,” in SIGMETRICS, 2017.
[24] R. Chirgwin, “Rowhammer RAM Attack Adapted to Hit Flash Storage,” The Reg-
ister, 2017.
[25] K. Choi, “NAND Flash Memory,” Samsung Electronics Co., Ltd., 2010.
[26] L. Chua, “Memristor—The Missing Circuit Element,” TCT, 1971.
[27] C. Cimpanu, “SSD Drives Vulnerable to Attacks That Corrupt User Data,” Bleep-
ing Computer, 2017.
[28] J. Cooke, “The Inconvenient Truths of NAND Flash Memory,” in Flash Memory
Summit, 2007.
[29] G. Dong, S. Li, and T. Zhang, “Using Data Postcompensation and Prediction to
Tolerate Cell-to-Cell Interference in MLC NAND Flash Memory,” TCAS I, 2010.
[30] B. Eitan, “Non-Volatile Semiconductor Memory Cell Utilizing Asymmetrical
Charge Trapping,” U.S. Patent No. 5,768,192, 1998.
[31] H. H. Frost, C. J. Camp, T. J. Fisher, J. A. Fuxa, and L. W. Shelton, “Efficient Reduction
of Read Disturb Errors in NAND Flash Memory,” U.S. Patent No. 7,818,525,
2010.
[32] L. M. Grupp, A. M. Caulfield, J. Coburn, S. Swanson, E. Yaakobi, P. H. Siegel,
and J. K. Wolf, “Characterizing Flash Memory: Anomalies, Observations, and
Applications,” in MICRO, 2009.
[33] D. Gruss, M. Lipp, M. Schwarz, D. Genkin, J. Juffinger, S. O’Connell,
W. Schoechl, and Y. Yarom, “Another Flip in the Wall of Rowhammer Defenses,”
arXiv:1710.00551 [cs.CR], 2017.
[34] D. Gruss, C. Maurice, and S. Mangard, “Rowhammer.js: A Remote Software-
Induced Fault Attack in JavaScript,” in DIMVA, 2016.
[35] K. Ha, J. Jeong, and J. Kim, “A Read-Disturb Management Technique for High-
Density NAND Flash Memory,” in APSys, 2013.
[36] K. Ha, J. Jeong, and J. Kim, “An Integrated Approach for Managing Read Disturbs
in High-Density NAND Flash Memory,” TCAD, 2016.
[37] T. Hara, K. Fukunda, K. Kanazawa, and N. Shibata, “A 146 mm2 8 Gb NAND
Flash Memory with 70 nm CMOS Technology,” in ISSCC, 2005.
[38] H. Hassan, N. Vijaykumar, S. Khan, S. Ghose, K. Chang, G. Pekhimenko, D. Lee,
O. Ergin, and O. Mutlu, “SoftMC: A Flexible and Practical Open-Source Infras-
tructure for Enabling Experimental DRAM Studies,” in HPCA, 2017.
[39] J. Hruska, “SSDs Vulnerable to Deliberate, Low-Level Data Corruption Attacks,”
ExtremeTech, 2017.
[40] A. Hwang, I. Stefanovici, and B. Schroeder, “Cosmic Rays Don’t Strike Twice:
Understanding the Nature of DRAM Errors and the Implications for System De-
sign,” in ASPLOS, 2012.
[41] D. Ielmini, A. L. Lacaita, and D. Mantegazza, “Recovery and Drift Dynamics of
Resistance and Threshold Voltages in Phase-Change Memories,” TED, 2007.
[42] J. Jang et al., “Vertical Cell Array Using TCAT (Terabit Cell Array Transistor)
Technology for Ultra High Density NAND Flash Memory,” in VLSIT, 2009.
[43] L. Jiang, Y. Zhang, and J. Yang, “Mitigating Write Disturbance in Super-Dense
Phase Change Memories,” in DSN, 2014.
[44] D. Kahng and S. M. Sze, “A Floating Gate and Its Application to Memory Devices,”
Bell System Technical Journal, 1967.
[45] R. Katsumata et al., “Pipe-Shaped BiCS Flash Memory with 16 Stacked Layers and
Multi-Level-Cell Operation for Ultra High Density Storage Devices,” in VLSIT,
2009.
[46] S. Khan, D. Lee, Y. Kim, A. Alameldeen, C. Wilkerson, and O. Mutlu, “The Efficacy
of Error Mitigation Techniques for DRAM Retention Failures: A Comparative
Experimental Study,” in SIGMETRICS, 2014.
[47] S. Khan, D. Lee, and O. Mutlu, “PARBOR: An Efficient System-Level Technique
to Detect Data-Dependent Failures in DRAM,” in DSN, 2016.
[48] S. Khan, C. Wilkerson, D. Lee, A. R. Alameldeen, and O. Mutlu, “A Case for
Memory Content-Based Detection and Mitigation of Data-Dependent Failures
in DRAM,” CAL, 2016.
[49] S. Khan, C. Wilkerson, Z. Wang, A. R. Alameldeen, D. Lee, and O. Mutlu, “De-
tecting and Mitigating Data-Dependent DRAM Failures by Exploiting Current
Memory Content,” in MICRO, 2017.
[50] W.-S. Khwa et al., “A Resistance-Drift Compensation Scheme to Reduce MLC
PCM Raw BER by Over 100x for Storage-Class Memory Applications,” in ISSCC,
2016.
[51] J. Kim, M. Patel, H. Hassan, and O. Mutlu, “The DRAM Latency PUF: Quickly
Evaluating Physical Unclonable Functions by Exploiting the Latency–Reliability
Tradeoff in Modern DRAM Devices,” in HPCA, 2018.
[52] N. Kim and J.-H. Jang, “Nonvolatile Memory Device, Method of Operating Non-
volatile Memory Device and Memory System Including Nonvolatile Memory De-
vice,” U.S. Patent No. 8,203,881, 2012.
[53] Y. Kim, “Architectural Techniques to Enhance DRAM Scaling,” Ph.D. dissertation,
Carnegie Mellon Univ., 2015.
[54] Y. S. Kim, D. J. Lee, C. K. Lee, H. K. Choi, S. S. Kim, J. H. Song, D. H. Song, J.-H.
Choi, K.-D. Suh, and C. Chung, “New Scaling Limitation of the Floating Gate
Cell in NAND Flash Memory,” in IRPS, 2010.
[55] Y. Kim, R. Daly, J. Kim, C. Fallin, J. H. Lee, D. Lee, C. Wilkerson, K. Lai, and
O. Mutlu, “Flipping Bits in Memory Without Accessing Them: An Experimental
Study of DRAM Disturbance Errors,” in ISCA, 2014.
[56] Y. Komori, M. Kido, M. Kito, R. Katsumata, Y. Fukuzumi, H. Tanaka, Y. Nagata,
M. Ishiduki, H. Aochi, and A. Nitayama, “Disturbless Flash Memory Due to High
Boost Eciency on BiCS Structure and Optimal Memory Film Stack for Ultra
High Density Storage Device,” in IEDM, 2008.
[57] E. Kültürsay, M. Kandemir, A. Sivasubramaniam, and O. Mutlu, “Evaluating STT-RAM
as an Energy-Efficient Main Memory Alternative,” in ISPASS, 2013.
[58] A. Kurmus, N. Ioannou, M. Neigschwandter, N. Papandreou, and T. Parnell,
“From Random Block Corruption to Privilege Escalation: A Filesystem Attack
Vector for Rowhammer-Like Attacks,” in WOOT, 2017.
[59] M. LaPedus, “How to Make 3D NAND,” Semiconductor Engineering, 2016.
[60] B. C. Lee, E. Ipek, O. Mutlu, and D. Burger, “Architecting Phase Change Memory
as a Scalable DRAM Alternative,” in ISCA, 2009.
[61] B. C. Lee, E. Ipek, O. Mutlu, and D. Burger, “Phase Change Memory Architecture
and the Quest for Scalability,” CACM, 2010.
[62] B. C. Lee, P. Zhou, J. Yang, Y. Zhang, B. Zhao, E. Ipek, O. Mutlu, and D. Burger,
“Phase-Change Technology and the Future of Main Memory,” IEEE Micro, 2010.
[63] C. Lee et al., “A 32-Gb MLC NAND Flash Memory with Vth Endurance Enhanc-
ing Schemes in 32 nm CMOS,” JSSC, 2011.
[64] D. Lee, “Reducing DRAM Energy at Low Cost by Exploiting Heterogeneity,” Ph.D.
dissertation, Carnegie Mellon Univ., 2016.
[65] D. Lee, S. Khan, L. Subramanian, S. Ghose, R. Ausavarungnirun, G. Pekhimenko,
V. Seshadri, and O. Mutlu, “Design-Induced Latency Variation in Modern DRAM
Chips: Characterization, Analysis, and Latency Reduction Mechanisms,” in SIG-
METRICS, 2017.
[66] D.-H. Lee and W. Sung, “Least Squares Based Cell-to-Cell Interference Cancela-
tion Technique for Multi-Level Cell NAND Flash Memory,” in ICASSP, 2012.
[67] D. Lee, Y. Kim, G. Pekhimenko, S. Khan, V. Seshadri, K. Chang, and O. Mutlu,
“Adaptive-Latency DRAM: Optimizing DRAM Timing for the Common-Case,”
in HPCA, 2015.
[68] J.-D. Lee, S.-H. Hur, and J.-D. Choi, “Effects of Floating-Gate Interference on
NAND Flash Memory Cell Operation,” EDL, 2002.
[69] S.-Y. Lee, “Limitations of 3D NAND Scaling,” EE Times, 2017.
[70] J. Liu, B. Jaiyen, Y. Kim, C. Wilkerson, and O. Mutlu, “An Experimental Study of
Data Retention Behavior in Modern DRAM Devices: Implications for Retention
Time Profiling Mechanisms,” in ISCA, 2013.
[71] Y. Luo, Y. Cai, S. Ghose, J. Choi, and O. Mutlu, “WARM: Improving NAND Flash
Memory Lifetime With Write-Hotness Aware Retention Management,” in MSST,
2015.
[72] Y. Luo, S. Ghose, Y. Cai, E. F. Haratsch, and O. Mutlu, “Enabling Accurate and
Practical Online Flash Channel Modeling for Modern MLC NAND Flash Mem-
ory,” JSAC, 2016.
[73] Y. Luo, S. Ghose, Y. Cai, E. F. Haratsch, and O. Mutlu, “HeatWatch: Improving 3D
NAND Flash Memory Device Reliability by Exploiting Self-Recovery and Tem-
perature Awareness,” in HPCA, 2018.
[74] F. Masuoka, M. Momodomi, Y. Iwata, and R. Shirota, “New Ultra High Density
EPROM and Flash EEPROM With NAND Structure Cell,” in IEDM, 1987.
[75] J. Meza, Q. Wu, S. Kumar, and O. Mutlu, “A Large-Scale Study of Flash Memory
Errors in the Field,” in SIGMETRICS, 2015.
[76] J. Meza, Q. Wu, S. Kumar, and O. Mutlu, “Revisiting Memory Errors in Large-
Scale Production Data Centers: Analysis and Modeling of New Trends from the
Field,” in DSN, 2015.
[77] N. Mielke, T. Marquart, N. Wu, J. Kessenich, H. Belgal, E. Schares, F. Trivedi,
E. Goodness, and L. R. Nevill, “Bit Error Rate in NAND Flash Memories,” in IRPS,
2008.
[78] M. Mimoso, “Rowhammer Attacks Come to MLC NAND Flash Memory,” Threat-
post, 2017.
[79] K. Mizoguchi, T. Takahashi, S. Aritome, and K. Takeuchi, “Data-Retention Char-
acteristics Comparison of 2D and 3D TLC NAND Flash Memories,” in IMW, 2017.
[80] V. Mohan, S. Sankar, and S. Gurumurthi, “reFresh SSDs: Enabling High En-
durance, Low Cost Flash in Datacenters,” Univ. of Virginia, Tech. Rep. CS-2012-
05, 2012.
[81] M. Momodomi, F. Masuoka, R. Shirota, Y. Itoh, K. Ohuchi, and R. Kirisawa, “Elec-
trically Erasable Programmable Read-Only Memory With NAND Cell Structure,”
U.S. Patent No. 4,959,812, 1988.
[82] O. Mutlu, “The RowHammer Problem and Other Issues We May Face as Memory
Becomes Denser,” in DATE, 2017.
[83] O. Mutlu, “Memory Scaling: A Systems Architecture Perspective,” in IMW, 2013.
[84] O. Mutlu, “Memory Scaling: A Systems Architecture Perspective,” in MEMCON,
2013.
[85] O. Mutlu and L. Subramanian, “Research Problems and Opportunities in Memory
Systems,” SUPERFRI, 2014.
[86] H. Naeimi, C. Augustine, A. Raychowdhury, S.-L. Lu, and J. Tschanz, “STT-RAM
Scaling and Retention Failure,” Intel Technology Journal, 2013.
[87] I. Narayanan, D. Wang, M. Jeon, B. Sharma, L. Caulfield, A. Sivasubramaniam,
B. Cutler, J. Liu, B. Khessib, and K. Vaid, “SSD Failures in Datacenters: What?
When? and Why?” in SYSTOR, 2016.
[88] Y. Pan, G. Dong, Q. Wu, and T. Zhang, “Quasi-Nonvolatile SSD: Trading Flash
Memory Nonvolatility to Improve Storage System Performance for Enterprise
Applications,” in HPCA, 2012.
[89] N. Papandreou, T. Parnell, T. Mittelholzer, H. Pozidis, T. Griffin, G. Tressler,
T. Fisher, and C. Camp, “Effect of Read Disturb on Incomplete Blocks in MLC
NAND Flash Arrays,” in IMW, 2016.
[90] N. Papandreou, T. Parnell, H. Pozidis, T. Mittelholzer, E. Eleftheriou, C. Camp,
T. Grin, G. Tressler, and A. Walls, “Using Adaptive Read Voltage Thresholds
to Enhance the Reliability of MLC NAND Flash Memory Systems,” in GLSVLSI,
2014.
[91] J. Park, J. Jeong, S. Lee, Y. Song, and J. Kim, “Improving Performance and Lifetime
of NAND Storage Systems Using Relaxed Program Sequence,” in DAC, 2016.
[92] K.-T. Park, M. Kang, D. Kim, S.-W. Hwang, B. Y. Choi, Y.-T. Lee, C. Kim, and
K. Kim, “A Zeroing Cell-to-Cell Interference Page Architecture with Temporary
LSB Storing and Parallel MSB Program Scheme for MLC NAND Flash Memories,”
JSSC, 2008.
[93] K. Park et al., “Three-Dimensional 128 Gb MLC Vertical NAND Flash Memory
With 24-WL Stacked Layers and 50 MB/s High-Speed Programming,” JSSC,
2015.
[94] T. Parnell, “NAND Flash Basics & Error Characteristics: Why Do We Need Smart
Controllers?” in Flash Memory Summit, 2016.
[95] T. Parnell and R. Pletka, “NAND Flash Basics & Error Characteristics,” in Flash
Memory Summit, 2017.
[96] T. Parnell, N. Papandreou, T. Mittelholzer, and H. Pozidis, “Modelling of the
Threshold Voltage Distributions of Sub-20nm NAND Flash Memory,” in GLOBE-
COM, 2014.
[97] M. Patel, J. Kim, and O. Mutlu, “The Reach Profiler (REAPER): Enabling the Mitigation
of DRAM Retention Failures via Profiling at Aggressive Conditions,” in
ISCA, 2017.
[98] A. Pirovano, A. L. Lacaita, F. Pellizzer, S. A. Kostylev, A. Benvenuti, and R. Bez,
“Low-Field Amorphous State Resistance and Threshold Voltage Drift in Chalco-
genide Materials,” TED, 2004.
[99] M. Qureshi, D. H. Kim, S. Khan, P. Nair, and O. Mutlu, “AVATAR: A Variable-
Retention-Time (VRT) Aware Refresh for DRAM Systems,” in DSN, 2015.
[100] M. K. Qureshi, V. Srinivasan, and J. A. Rivers, “Scalable High Performance Main
Memory System Using Phase-Change Memory Technology,” in ISCA, 2009.
[101] K. Razavi, B. Gras, E. Bosman, B. Preneel, C. Giuffrida, and H. Bos, “Flip Feng
Shui: Hammering a Needle in the Software Stack,” in USENIX Security, 2016.
[102] Samsung Electronics Co., Ltd., “Samsung V-NAND Technology,” http://www.
samsung.com/us/business/oem-solutions/pdfs/V-NAND_technology_WP.pdf.
2014.
[103] B. Schroeder, R. Lagisetty, and A. Merchant, “Flash Reliability in Production: The
Expected and the Unexpected,” in FAST, 2016.
[104] B. Schroeder, E. Pinheiro, and W.-D. Weber, “DRAM Errors in the Wild: A Large-
Scale Field Study,” in SIGMETRICS, 2009.
[105] A. Schushan, “Refreshing of Memory Blocks Using Adaptive Read Disturb
Threshold,” U.S. Patent Appl. No. 20140173239, 2014.
[106] M. Seaborn and T. Dullien, “Exploiting the DRAM Rowhammer Bug to Gain
Kernel Privileges,” Google Project Zero Blog, 2015.
[107] M. Seaborn and T. Dullien, “Exploiting the DRAM Rowhammer Bug to Gain
Kernel Privileges,” in BlackHat, 2015.
[108] H. Shim, S.-S. Lee, and B. Kim, “Highly Reliable 26nm 64Gb MLC E2NAND
(Embedded-ECC & Enhanced-Efficiency) Flash Memory with MSP (Memory Signal
Processing) Controller,” in VLSIT, 2011.
[109] S. Sills, S. Yasuda, A. Calderoni, C. Cardon, J. Strand, K. Aratani, and N. Ra-
maswamy, “Challenges for High-Density 16Gb ReRAM with 27nm Technology,”
in VLSIC, 2015.
[110] S. Sills, S. Yasuda, J. Strand, A. Calderoni, K. Aratani, A. Johnson, and N. Ra-
maswamy, “A Copper ReRAM Cell for Storage Class Memory Applications,” in
VLSIT, 2014.
[111] V. Sridharan, J. Stearley, N. DeBardeleben, S. Blanchard, and S. Gurumurthi,
“Feng Shui of Supercomputer Memory: Positional Effects in DRAM and SRAM
Faults,” in SC, 2013.
[112] V. Sridharan, N. DeBardeleben, S. Blanchard, K. B. Ferreira, J. Stearley, J. Shalf,
and S. Gurumurthi, “Memory Errors in Modern Systems: The Good, The Bad,
and the Ugly,” in ASPLOS, 2015.
[113] D. B. Strukov, G. S. Snider, D. R. Stewart, and R. S. Williams, “The Missing Mem-
ristor Found,” Nature, 2008.
[114] K.-D. Suh, B.-H. Suh, Y.-H. Lim, and J.-K. Kim, “A 3.3V 32 Mb NAND Flash Mem-
ory with Incremental Step Pulse Programming Scheme,” JSSC, 1995.
[115] K. Takeuchi, S. Satoh, T. Tanaka, K. Imamiya, and K. Sakui, “A Negative Vth
Cell Architecture for Highly Scalable, Excellently Noise-Immune, and Highly
Reliable NAND Flash Memories,” JSSC, 1999.
[116] H. Tanaka et al., “Bit Cost Scalable Technology with Punch and Plug Process for
Ultra High Density Flash Memory,” in VLSIT, 2007.
[117] V. van der Veen, Y. Fratanonio, M. Lindorfer, D. Gruss, C. Maurice, G. Vigna,
H. Bos, K. Razavi, and C. Giuffrida, “Drammer: Deterministic Rowhammer Attacks
on Mobile Platforms,” in CCS, 2016.
[118] Y. Wang, W.-K. Yu, S. Wu, G. Malysa, and G. E. Suh, “Flash Memory for Ubiqui-
tous Hardware Security Functions: True Random Number Generation and De-
vice Fingerprints,” in SP, 2012.
[119] Y. Wang, W.-K. Yu, S. Q. Xu, E. Kan, and G. E. Suh, “Hiding Information in Flash
Memory,” in SP, 2013.
[120] H. A. R. Wegener, A. J. Lincoln, H. C. Pao, M. R. O’Connell, R. E. Oleksiak, and
H. Lawrence, “The Variable Threshold Transistor, A New Electrically-Alterable,
Non-Destructive Read-Only Storage Device,” in IEDM, 1967.
[121] H.-S. P. Wong, H.-Y. Lee, S. Yu, Y.-S. Chen, Y. Wu, P.-S. Chen, B. Lee, F. T. Chen,
and M.-J. Tsai, “Metal-Oxide RRAM,” Proc. IEEE, 2012.
[122] H.-S. P. Wong, S. Raoux, S. Kim, J. Liang, J. P. Reifenberg, B. Rajendran,
M. Asheghi, and K. E. Goodson, “Phase Change Memory,” Proc. IEEE, 2010.
[123] Y. Xiao, X. Zhang, Y. Zhang, and R. Teodorescu, “One Bit Flips, One Cloud Flops:
Cross-VM Row Hammer Attacks and Privilege Escalation,” in USENIX Security,
2016.
[124] S. Q. Xu, W.-K. Yu, G. E. Suh, and E. Kan, “Understanding Sources of Variations
in Flash Memory for Physical Unclonable Functions,” in IMW, 2014.
[125] H. Yoon, J. Meza, N. Muralimanohar, N. P. Jouppi, and O. Mutlu, “Efficient Data
Mapping and Buffering Techniques for Multi-Level Cell Phase-Change Memories,”
TACO, 2014.
[126] J. H. Yoon, “3D NAND Technology – Implications to Enterprise Storage Appli-
cations,” in Flash Memory Summit, 2015.
[127] J. H. Yoon, R. Godse, G. Tressler, and H. Hunter, “3D-NAND Scaling and 3D-SCM
— Implications to Enterprise Storage,” in Flash Memory Summit, 2017.
[128] Z. Zhang, W. Xiao, N. Park, and D. J. Lilja, “Memory Module-Level Testing and
Error Behaviors for Phase Change Memory,” in ICCD, 2012.
[129] P. Zhou, B. Zhao, J. Yang, and Y. Zhang, “A Durable and Energy Efficient Main
Memory Using Phase Change Memory Technology,” in ISCA, 2009.
[130] L. Zuolo, C. Zambelli, R. Micheloni, and M. Indaco, “SSDExplorer: A Virtual
Platform for Performance/Reliability-Oriented Fine-Grained Design Space Ex-
ploration of Solid State Drives,” TCAD, 2015.
