Revitalizing Copybacks in Modern SSDs: Why and How by Hong, Duwon et al.
arXiv:1810.04603v1 [cs.OS] 10 Oct 2018
Revitalizing Copybacks in Modern SSDs: Why and How
Duwon Hong, Myungsuk Kim, Jisung Park, †Myoungsoo Jung, and Jihong Kim
Seoul National University and †Yonsei University
ABSTRACT
For modern flash-based SSDs, the performance overhead of inter-
nal data migrations is dominated by the data transfer time, not by
the flash program time as in old SSDs. In order to mitigate the per-
formance impact of data migrations, we propose rcopyback, a
restricted version of copyback. Rcopyback works like the orig-
inal copyback except that only n consecutive copybacks are al-
lowed. By limiting the number of successive copybacks, it guaran-
tees that no data reliability problem occurs when data is internally
migrated using rcopyback. In order to take full advantage of
rcopyback, we developed an rcopyback-aware FTL, rcFTL, which
intelligently decides whether rcopyback should be used or not
by exploiting varying host workloads. Our evaluation results show
that rcFTL can improve the overall I/O throughput by 54% on average
over an existing FTL which does not use copybacks.
KEYWORDS
Copyback, NAND flash memory, FTL, Storage system
1 INTRODUCTION
Flash-based SSDs move a large amount of data internally for sup-
porting various SSD management tasks such as garbage collection
(GC), wear leveling and reliability enhancement. For example, be-
cause of the erase-before-write constraint in the NAND flash mem-
ory, GC is required to reclaim invalid pages for future writes. Dur-
ing GC, valid pages of a GC victim block should be migrated to a
new target block with free pages. Since these internal data copy
operations directly interfere with I/O requests from user applica-
tions, how to efficiently handle internal data migrations is a key
challenge for designing a high-performance SSD.
Although there have been extensive investigations (e.g., [6, 8, 9,
12]) to mitigate the impact of internal data migrations on the SSD
performance, most existing techniques do not adequately handle a
new performance bottleneck of copy operations in modern SSDs.
Unlike old SSDs where the copy cost was dominated by the flash
program time tPROG , in recent high-end SSDs, the data transfer
time tDMA between flash cells and off-chip DRAM takes a large
portion of the copy cost. This shift in the performance bottleneck
is due to two recent flash/SSD technology changes: 1) innovations
in the flash cell design (which reduced tPROG ) [11] and 2) a high
degree of internal parallelism in high-end SSDs (which results
in frequent access collisions on a shared medium, e.g., a channel
bus or a serial DRAM bus, between flash cells and off-chip DRAM).
In order to minimize tDMA, copyback operations [2] are the
most effective solution because the copyback operation can move
pages within an SSD without off-chip data transfers, thus elimi-
nating tDMA completely. However, copyback operations are rarely
used in modern SSDs because they cause a fatal reliability problem.
When pages are migrated using copyback operations, they bypass
the off-chip error-correction code (ECC) module, so bit errors that
occur during copyback operations accumulate. If the number
of the accumulated bit errors exceeds the correction capacity of
the ECC module, the stored data in the copybacked page becomes
unreadable. Furthermore, since tPROG was responsible for a large
portion of the data migration time in old SSDs, the performance
improvement from copyback operations was marginal. In this pa-
per, however, we argue that it is time to revitalize old copyback
operations for modern SSDs.
We revisit copyback operations in the context of modern
high-density flash memory and propose a restricted version of copyback,
called rcopyback, which works like the original copyback ex-
cept that data migrated by n successive rcopyback operations
must be error-corrected by an off-chip ECC module. The proposed
rcopyback technique is based on a simple observation on the er-
ror propagation characteristics of successive copyback operations.
From our characterization study with recent 1x nm-node NAND
chips, we observed that if we properly limit the number of consec-
utive copyback operations, accumulated bit errors can be within
the error correction capability of a common ECC scheme. Further-
more, we observed that the overhead of internal data migrations is
significantly reduced even when only a small number of copyback
operations can be successively used. For example, when only two
consecutive copyback operations are possible, tDMA is effectively
reduced to about 1/3, because only every third migration of a page
requires an off-chip transfer.
Based on the rcopyback model from a detailed characteri-
zation study using 1x nm-node NAND MLC chips, we designed
a new FTL, called rcFTL, which takes advantage of rcopyback
for data migrations. In addition to basic extensions for support-
ing rcopyback, rcFTL implements an intelligent data-migration
mode selector for maximizing the effect of rcopyback on the
SSD performance. The mode selector decides whether rcopyback
or an off-chip copy operation is used for a given data migration.
For light-load intervals, rcFTL uses the off-chip copy mode, which
increases the number of future rcopyback-eligible blocks. On
the other hand, for heavy-load intervals, rcFTL maximally utilizes
rcopyback for higher I/O performance.
We have evaluated rcFTL using various benchmarks on our SSD
emulation environment [7]. Our experimental results show that
rcFTL can improve the overall I/O throughput by 54% on average
over a conventional FTL without copyback support. We also show
that the proposed migration mode selector is effective in maximiz-
ing the efficient use of rcopyback under varying workload re-
quirements.
[Figure 1 diagram. Left box: data migration by off-chip copy (Read, DMA out, DMA in, Program) between the planes of a NAND die and the DRAM buffer, passing through the channel, the FMC with its ECC module, and the buffer controller. Right box: data migration by copyback (Read for Copyback, Program for Copyback) within a plane of a NAND die.]
Figure 1: A data path comparison between an off-chip data
migration and an internal copyback.
2 MOTIVATIONS
A typical data migration in SSDs is performed by an off-chip data
copy as shown in the left dotted box of Fig. 1. An SSD firmware
reads data from a source page and transfers the data to a DRAM
buffer through a channel bus. Before the data are sent to the DRAM
buffer, errors are corrected by the ECC module of the flash memory
controller (FMC). In the program phase, in order to move the data
back to the target page, the SSD firmware takes a reverse data path
from the DRAM buffer to the target page. When no contention occurs
along the off-chip copy data path, the data copy latency tCOPY
can be expressed as follows:
    tCOPY = tR + tDMAout + tDMAin + tPROG
where tR is the data transfer time from NAND cells to a per-plane
register, and tDMAout and tDMAin are the DMA-out and DMA-in times
between the register and the DRAM buffer, respectively.
However, in a modern SSD which consists of multiple channels
and multiple NAND dies per channel, a large number of data mi-
grations may occur at the same time. A high degree of the paral-
lelism in data migrations may significantly increase tDMAin and
tDMAout because of contentions on the channel level as well as
the serial bus to/from the DRAM buffer. For example, when eight
data migrations are concurrently requested by the SSD firmware
and all eight have both their source and destination pages on the
same channel, tDMAin and tDMAout may increase eightfold
because all data transfers must be serialized.
On the other hand, when a copyback command is supported by
a NAND flash chip, a data migration can be performed without
requiring either tDMAout or tDMAin, as shown in the right dotted
box of Fig. 1. The SSD firmware can read data from the source page
to the per-plane local register and directly write it back to the
destination page from the per-plane local register. Since the copyback
command transfers data within a given plane, even when multiple
data migrations occur at the same time, all of them can complete
within (tR + tPROG) if they can be supported by the copyback command.
Thus, if the copyback command can be supported, it can significantly
reduce the overhead of SSD-internal data migrations. Unfortunately,
however, the copyback command is rarely used in modern SSDs because
it accumulates all the bit errors that occur in a page during its
migrations. Since tPROG, which was much larger than tDMA, dominated
tCOPY in older NAND flash memory, little effort was made to overcome
the error propagation problem of the copyback command.
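The latency model of this section can be written out as a short sketch. The tPROG value of 640 µs is the one used in Section 5.1; the tR and per-direction DMA values, the function names, and the `serialized` parameter (the number of transfers that must take turns on a shared bus) are illustrative assumptions, not measured values.

```python
def offchip_copy_latency_us(t_r, t_dma_out, t_dma_in, t_prog, serialized=1):
    """tCOPY = tR + tDMAout + tDMAin + tPROG for an off-chip copy.

    `serialized` is a hypothetical contention factor: when k transfers share
    one channel, the DMA terms are assumed to grow k-fold (1 = no contention).
    """
    return t_r + serialized * (t_dma_out + t_dma_in) + t_prog


def copyback_latency_us(t_r, t_prog):
    """A copyback stays inside the plane, so no DMA term appears."""
    return t_r + t_prog


T_PROG_US = 640.0   # average tPROG from Section 5.1
T_R_US = 60.0       # hypothetical read time
T_DMA_US = 100.0    # hypothetical DMA time per direction

if __name__ == "__main__":
    for k in (1, 8):
        off = offchip_copy_latency_us(T_R_US, T_DMA_US, T_DMA_US, T_PROG_US, k)
        cb = copyback_latency_us(T_R_US, T_PROG_US)
        print(f"serialized={k}: off-chip={off} us, copyback={cb} us")
```

Under these placeholder numbers the copyback latency does not change with the number of concurrent migrations, while the off-chip latency grows with the contention factor, which is the effect the text describes.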
In order to develop an effective solution to revitalize the copyback
for modern SSDs, our proposed technique is motivated by an
observation on internal data migration characteristics of storage
workloads: most pages migrate internally just a few times. For example,
Fig. 2 shows a probability distribution of internal data migrations
in RocksDB under the append-random workload of db_bench.
77% of pages migrate fewer than five times. Therefore, if we could
support 4 consecutive copybacks without causing any flash reliability
problem, about 86% of off-chip data migrations could be avoided
in this workload.
[Figure 2 plot. x-axis: No. of Data Migrations (0 to 24); y-axes: Probability and Cumulative Probability; labeled cumulative probability values: 0.77, 0.86, 0.91, 0.94, 0.96, 0.97.]
Figure 2: A probability distribution of internal data migrations.
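The relation between a migration-count distribution and the share of off-chip copies that a copyback threshold can avoid may be sketched as follows. The sketch assumes a fixed threshold n and a per-page count that restarts at zero after every off-chip copy, so that every (n+1)-th migration of a page goes off-chip. The function name and the example distribution are hypothetical; the sketch is not claimed to reproduce the 86% figure, which depends on the measured distribution.

```python
def avoided_fraction(migration_pmf, threshold):
    """Fraction of migrations that can be served by copyback.

    migration_pmf maps k (how many times a page migrates in total) to the
    probability of that k. With threshold n, a page that migrates k times
    needs k // (n + 1) off-chip copies under the stated assumption.
    """
    total = sum(k * p for k, p in migration_pmf.items())
    if total == 0:
        return 0.0
    offchip = sum((k // (threshold + 1)) * p for k, p in migration_pmf.items())
    return 1.0 - offchip / total


if __name__ == "__main__":
    # Hypothetical distribution, for illustration only.
    pmf = {1: 0.40, 2: 0.20, 3: 0.10, 4: 0.07, 8: 0.13, 16: 0.10}
    for n in (2, 3, 4):
        print(n, round(avoided_fraction(pmf, n), 3))
```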
3 RCOPYBACK: COPYBACK WITH A
THRESHOLD
3.1 Copyback Error-Propagation
Characteristics
In order to manage the flash reliability problem caused by succes-
sive copyback operations, it is important to understand the error
propagation characteristics when the same page experiences consecutive
copyback operations without the ECC module’s involvement.
Using 1x nm-node MLC NAND chips, we conducted experiments
on a total of 81,920 pages from 20 NAND chips. First,
we confirmed that, in our tested MLC NAND blocks, which consist
of 64 word lines (WLs, each of which can store two pages, the
MSB page and LSB page), MSB pages of WL 62 (see footnote 1) were the most
unreliable because outer WLs are more vulnerable to noise
(e.g., hot-carrier effects and gate-induced drain leakage (GIDL))
and more disturbed by Vpass. Since we are interested in finding
a safe bound on the number of consecutive copybacks over all
the possible data migrations, we focus on understanding the error
propagation characteristics when both the source and destination
pages are in WL 62, which is the worst combination from the bit-
error rate (BER) perspective.
As with other NAND flash reliability evaluations, we used the
NAND retention BER as our measurement metric. (The NAND retention
BER is based on the number N(x, t) of bit errors after a t-month
retention time at 30°C for x pre-cycled NAND cells [10].)
For a given upper limit on the number of consecutive copyback
operations, we measured N (x, t) values while changing both x’s
(i.e., P/E cycles) and t's (i.e., retention times). Fig. 3(a) shows how
retention BERs change as the number of successive copybacks increases.
Retention BER values were normalized to N(0, 0). For
example, N(3K, 1 year) values increase almost linearly with the
number of consecutive copybacks. When a block has been erased 3,000
times and 1-year data retention is required, copyback operations
cannot be used more than twice in succession.
Footnote 1: In fact, WL 63 is the most unreliable WL. Because of this reliability problem, WL 63 is configured to work as SLC cells.
[Figure 3(a) plot. x-axis: No. of Successive Copybacks (0 to 5); y-axis: Normalized BER; series: right after 1K P/E's, right after 3K P/E's, 1 year after 3K P/E's; annotations: max., avg., min., a "Maximum ECC limit" line, and a region of "uncorrectable errors".]
(a) BER variations over successive copybacks under varying P/E cycles.
[Figure 3(b) plot. x-axis: No. of P/E Cycles (0 to 3000); y-axis: Copyback Threshold (0 to 6); series: 3 months, 6 months, 1 year.]
(b) Changes in copyback threshold values under varying P/E cycles.
Figure 3: The effect of successive copybacks on the reliability.
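One way to read a copyback limit off a BER curve such as those in Fig. 3(a) is sketched below: take the largest number of successive copybacks whose worst-case normalized BER still lies within the ECC limit. The function name and all numeric values are hypothetical; the sample curve is only shaped to be roughly linear, as the text reports for the 3K P/E, 1-year case, and is not measured data.

```python
def threshold_from_ber(ber_by_count, ecc_limit):
    """Largest n such that the BER after 0..n successive copybacks stays
    within ecc_limit.

    ber_by_count[n] is the worst-case normalized BER after n copybacks.
    If even ber_by_count[0] exceeds the limit, 0 is returned, although in
    that case the data would not be correctable regardless of copybacks.
    """
    threshold = 0
    for n, ber in enumerate(ber_by_count):
        if ber > ecc_limit:
            break
        threshold = n
    return threshold


if __name__ == "__main__":
    # Hypothetical, roughly linear worst-case curve and ECC limit.
    curve = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
    print(threshold_from_ber(curve, ecc_limit=7.0))
```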
3.2 Rcopyback Operation Model
From our characterization study on the copyback error propaga-
tion, we constructed a table of copyback threshold CT (x, t) val-
ues for x P/E-cycled NAND blocks with t-month retention require-
ment. The CT (x, t) value indicates the maximum number of con-
secutive copyback operations that does not cause any reliability
problem for x P/E-cycled blocks when t-month data retention is
required. Fig. 3(b) shows how copyback threshold values change
under varying P/E cycles and different retention time requirements
for our evaluated MLC NAND chips. As the retention time require-
ment increases, the copyback threshold decreases for the same P/E
cycle. For the same retention time requirement, as expected, the
number of P/E cycles strongly affects the copyback threshold. For
the 1-year data retention requirement at 30°C (which is the JEDEC
client class retention requirement), the copyback threshold value
decreases from 5 to 2 as the P/E cycle count increases from 0 to 3,000.
That is, after 3K P/E cycles, the copyback command can be used
consecutively only twice. If a third data migration is required on
the same page, the page must be migrated using an off-chip data
copy so that the accumulated bit errors can be corrected by the ECC
engine of the FMC. Table 1 summarizes our proposed rcopyback
operation model with the 1-year data retention requirement based
on the copyback threshold values described in Fig. 3(b).
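The operation model of Table 1 amounts to a small lookup followed by a comparison. The sketch below uses the Table 1 values for the 1-year retention requirement; the function names are hypothetical, and two choices are assumptions that the table does not cover: a P/E count of 0 is treated like the 1-1000 range, and blocks beyond 3,000 P/E cycles get a threshold of 0 (off-chip copy only). Whether the source or the destination block's wear should be used is also not specified in the text.

```python
import bisect

# Upper P/E bounds of the Table 1 ranges and their copyback thresholds.
_PE_UPPER_BOUNDS = [1000, 2000, 3000]
_THRESHOLDS = [4, 3, 2]


def table1_threshold(pe_cycles):
    """Copyback threshold for a block with the given P/E cycle count."""
    if pe_cycles < 0:
        raise ValueError("pe_cycles must be non-negative")
    i = bisect.bisect_left(_PE_UPPER_BOUNDS, pe_cycles)
    # Assumption: beyond the last range, rcopyback is never allowed.
    return _THRESHOLDS[i] if i < len(_THRESHOLDS) else 0


def can_rcopyback(copyback_count, pe_cycles):
    """True if one more consecutive copyback stays within the threshold."""
    return copyback_count < table1_threshold(pe_cycles)


if __name__ == "__main__":
    # After 3K P/E cycles: two copybacks allowed, the third goes off-chip.
    print([can_rcopyback(c, 3000) for c in range(3)])
```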
4 DESIGN AND IMPLEMENTATION OF RCFTL
Based on the rcopyback model presented in Section 3,
we implemented rcFTL, which efficiently uses rcopyback operations
for data migrations. Fig. 4 shows an overall organization of
rcFTL. RcFTL, which is based on an existing page-level mapping FTL,
contains two additional modules, the error propagation management
(EPM) module and the data migration mode selector (DMMS)
module. The EPM module efficiently manages various data structures
for supporting rcopyback, while the DMMS module selects
the most appropriate data copy mode for a given data migration request.
Table 1: Rcopyback operation model.
P/E cycle            1-1000   1001-2000   2001-3000
Copyback threshold   4        3           2
[Figure 4 diagram. Components: Write Request, Write Buffer (Utilization), Garbage Collector (Foreground, Background), Wear leveler, Selected victim, Data Migration Mode Selector (DMMS) with Migration mode (Rcopyback, Off-Chip Copy), Error Propagation Management (EPM) with Per-Block Copyback Count Management, Multiple Active Block Management and Copyback Threshold, Extended Mapping Table (Mapping Table, Copyback count), and NAND Flash Memory (Copyback, Read, Program, Erase).]
Figure 4: An organizational overview of rcFTL.
4.1 Error Propagation Management
The main function of the EPM module is to monitor the cumulative
number of successive copyback operations for each page so that
no page is migrated by rcopyback more times than the copyback
threshold allows. A simple approach to keeping track of the cumulative
count is to maintain a per-page counter which is incremented whenever
rcopyback is used for the page. However, for recent high-capacity
NAND flash memory, the space overhead of per-page counting is
quite high. For example, 1.4 GB of memory is needed to support
a 3-bit per-page counter for a 16-TB SSD. In addition, in highly
optimized commercial SSDs, updating the per-page counter (in slower
DRAM) can incur a significant CPU cycle overhead as
well because memory accesses are optimized to occur in SRAM
for higher performance. In order to avoid the overhead of
per-page counting, the EPM module employs a per-block counting
approach. That is, the cumulative number of rcopyback operations
is managed at the block level, not at the page level. Since the
number of counters for per-block counting is at least two
orders of magnitude smaller than that for per-page counting,
the per-block counting technique significantly reduces the memory
footprint for maintaining counters and reduces the computing
overhead of bookkeeping operations to a negligible level.
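The counter-memory arithmetic can be checked with a few lines. The text does not state the page size behind the 1.4-GB figure; the sketch below assumes a 4-KiB counting unit and a decimal 16-TB capacity, which gives a value close to that figure, and assumes 128 pages per block (64 WLs with two pages each, as in Section 3.1). All names are hypothetical.

```python
def counter_bytes(capacity_bytes, unit_bytes, bits_per_counter=3):
    """Memory needed for one counter per unit (page or block)."""
    units = capacity_bytes // unit_bytes
    return units * bits_per_counter / 8


CAPACITY = 16 * 10**12      # 16 TB, decimal (assumption)
PAGE_BYTES = 4096           # assumed counting unit, not stated in the text
PAGES_PER_BLOCK = 128       # 64 WLs x 2 pages, from Section 3.1

per_page = counter_bytes(CAPACITY, PAGE_BYTES)
per_block = counter_bytes(CAPACITY, PAGE_BYTES * PAGES_PER_BLOCK)

if __name__ == "__main__":
    print(f"per-page counters:  {per_page / 1e9:.2f} GB")
    print(f"per-block counters: {per_block / 1e6:.2f} MB")
    print(f"reduction factor:   {per_page / per_block:.1f}x")
```

Under these assumptions the per-block scheme needs roughly 1/128 of the per-page memory, in line with the "two orders of magnitude" claim.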
Since all the pages in a block are assumed to have been migrated
by the same number of rcopyback operations in the per-block
counting scheme, when a source page p in a block b(c) with the
counter value c is migrated by rcopyback, the page p should
be moved to a page in a block b(c') where c' = c + 1. In order
to efficiently support this additional constraint, the EPM module
manages multiple active blocks at the same time. If the maximum
copyback threshold value is given by Mcpb, the EPM module maintains
(Mcpb + 1) active blocks, b(0), ..., b(Mcpb), where b(i) indicates a
block with its counter value i. Fig. 5 shows an example of how
data migrations are performed using rcopyback operations in
the per-block counting scheme. For example, when the block vb(1)
is selected as a GC victim block, its valid pages, C and D, are moved
to the active block b(2) when they are migrated using rcopyback
operations. When the block vb(Mcpb) is selected as a GC victim
block, its valid pages are moved using off-chip copies to the active
block b(0).
[Figure 5 diagram. (Mcpb+1) active blocks per plane, b(0), b(1), b(2), ..., b(Mcpb), and victim blocks vb(0), vb(1), ..., vb(Mcpb-1), vb(Mcpb) holding valid pages A to H and invalid pages; arrows labeled Copyback and Off-chip copy.]
Figure 5: Data migrations in the per-block counting scheme.
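The per-block counting rule of Section 4.1 can be sketched as a routing function over the (Mcpb + 1) active blocks of a plane. The class and method names are hypothetical, a Python list stands in for each active block, and the sketch uses a single fixed Mcpb rather than a P/E-dependent threshold; it is an illustration of the rule, not the rcFTL implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Plane:
    """Active blocks b(0)..b(Mcpb) of one plane, indexed by copyback count."""
    max_threshold: int                     # Mcpb
    active: list = field(init=False)       # active[i] models active block b(i)

    def __post_init__(self):
        self.active = [[] for _ in range(self.max_threshold + 1)]

    def migrate(self, page, victim_count, allow_copyback=True):
        """Move a valid page out of a victim block vb(victim_count).

        A copyback sends the page to b(victim_count + 1); once the count has
        reached Mcpb (or copyback is not selected), the page goes off-chip,
        is error-corrected, and lands in b(0).
        """
        if allow_copyback and victim_count < self.max_threshold:
            mode, dest = "rcopyback", victim_count + 1
        else:
            mode, dest = "off-chip", 0
        self.active[dest].append(page)
        return mode, dest


if __name__ == "__main__":
    plane = Plane(max_threshold=2)
    print(plane.migrate("C", victim_count=1))   # copyback into b(2)
    print(plane.migrate("G", victim_count=2))   # off-chip copy into b(0)
```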
4.2 Data Migration Mode Selection
Since the copyback threshold is rather small, using rcopyback
in a greedy fashion may not be the most effective use of it from an
overall I/O throughput perspective. For example, when no high I/O
throughput is required, it does not make sense to use rcopyback
for data migrations. Doing so may prevent more effective future
use of rcopyback when high I/O throughput is needed. Furthermore,
when high I/O bandwidth is not necessary, using off-chip
data migrations enables more future data migrations to be
supported by rcopyback.
In order to take full advantage of rcopyback, the DMMS
module intelligently chooses when to use the rcopyback operation
over a normal off-chip copy depending on the write buffer utilization
ratio u. When u is low, which indicates that the current host
I/O workload is not intensive, the DMMS module selects the off-chip
copy mode so that more future data migrations can be supported
by rcopyback. On the other hand, when u is high, the
DMMS module chooses the rcopyback mode for higher performance.
In our current implementation, the utilization threshold ratio
for the mode selection was set to 50%. (That is, if u is higher than
50%, the rcopyback mode is used for data migrations.) Since
rcFTL employs the per-block counting scheme and most data migration
decisions are made at block granularity, the DMMS module
makes its mode selection decisions at the per-block level as well.
When a data migration decision is made (e.g., by a foreground GC
task), the DMMS module selects a proper mode based on the current
u value. In order to filter out abrupt noise-like changes in u,
the DMMS module makes its mode selection based on a t-second
moving average of u. In the current implementation, t is set to the
average block write time.
In rcFTL, both the garbage collector and wear leveler operate
in an rcopyback-aware fashion. For urgent management tasks
(such as a foreground GC task), the rcopyback mode is actively
used regardless of the current u value. On the other hand,
when background management tasks (such as a background GC
task) are invoked, the DMMS module decides proper modes as explained
above.
Table 2: I/O characteristics of traces used for evaluations.
             OLTP   NTRX      Fileserver   Varmail
Read:Write   7:3    0.5:9.5   4:6          4:6
WAF          2.17   2.11      3.08         1.8
[Figure 6(a) plot. x-axis: OLTP, NTRX, FileServer, Varmail, Average; y-axis: Normalized I/O Throughput (1 to 1.8); series: rcFTL2, rcFTL3, rcFTL4.]
(a) Normalized I/O throughput.
[Figure 6(b) plot. x-axis: High, Mid, Low; y-axis: Normalized I/O Throughput (1 to 1.8); series: rcFTL2–, rcFTL2, rcFTL4–, rcFTL4.]
(b) Effect of the mode selector.
Figure 6: Performance comparisons of different rcFTL versions.
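The mode selection policy of Section 4.2 can be sketched as follows. The 50% utilization threshold and the rule that urgent tasks use rcopyback regardless of u come from the text. The class name, the method names, and the use of a fixed number of samples in place of the t-second moving-average window are assumptions made for illustration.

```python
from collections import deque


class ModeSelector:
    """Chooses a migration mode per victim block from buffer utilization."""

    def __init__(self, window=8, threshold=0.5):
        # Assumption: a window of the last `window` samples approximates
        # the t-second moving average of u described in the text.
        self.samples = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, utilization):
        """Record one write-buffer utilization sample in [0, 1]."""
        self.samples.append(utilization)

    def average(self):
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

    def select(self, block_count, copyback_threshold, urgent=False):
        """Return "rcopyback" or "off-chip" for a victim block."""
        if block_count >= copyback_threshold:
            return "off-chip"       # reliability limit reached, ECC needed
        if urgent or self.average() > self.threshold:
            return "rcopyback"      # foreground task or heavy host load
        return "off-chip"           # light load: reset counts for later


if __name__ == "__main__":
    dmms = ModeSelector(window=4)
    for u in (0.2, 0.3):
        dmms.observe(u)
    print(dmms.select(block_count=0, copyback_threshold=4))
    print(dmms.select(block_count=0, copyback_threshold=4, urgent=True))
```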
5 EXPERIMENTAL RESULTS
5.1 Experimental Setup
In order to evaluate the effectiveness of the proposed rcFTL technique,
we implemented rcFTL as a host-level FTL on a custom flash
storage system [7]. To keep the experiments efficient, we configured
our flash storage system to expose only a 64-GB storage capacity.
Our emulated storage system was
configured to have eight channels with eight NAND flash chips per
channel. Each NAND flash chip has 1024 blocks, each of which is
composed of 64 16-KB pages. The average tPROG was set to 640 µs and
the size of the write buffer was set to 10 MB. We evaluated rcFTL using
four I/O traces generated from Sysbench [3] and Filebench [1].
As shown in Table 2, the workloads differ in their read/write ratios
and WAF values. Using these workloads,
we evaluated the overall I/O throughput for three different rcFTL
versions, rcFTL2, rcFTL3, and rcFTL4, where rcFTLn indicates that the
maximum copyback threshold was set to n. All measurements were
normalized to a page-level mapping FTL which always migrates
data using the off-chip copy.
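As a consistency check, the geometry described in Section 5.1 can be multiplied out. The sketch assumes that KB, MB and GB are meant as binary units (KiB, MiB, GiB); the constant and function names are hypothetical.

```python
CHANNELS = 8
CHIPS_PER_CHANNEL = 8
BLOCKS_PER_CHIP = 1024
PAGES_PER_BLOCK = 64
PAGE_KIB = 16
WRITE_BUFFER_MIB = 10


def capacity_gib():
    """Total capacity of the emulated SSD in GiB."""
    pages = CHANNELS * CHIPS_PER_CHANNEL * BLOCKS_PER_CHIP * PAGES_PER_BLOCK
    return pages * PAGE_KIB / (1024 * 1024)


def write_buffer_pages():
    """Number of 16-KiB pages that fit in the write buffer."""
    return WRITE_BUFFER_MIB * 1024 // PAGE_KIB


if __name__ == "__main__":
    print(capacity_gib(), "GiB,", write_buffer_pages(), "buffer pages")
```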
5.2 Evaluation Results
Fig. 6(a) shows normalized I/O throughputs of different rcFTL versions.
As the copyback threshold increases, the I/O throughput increases
accordingly because more data migrations are supported
by rcopyback. The overall I/O throughput was improved on average
by 54% in rcFTL4 over the baseline FTL. Even rcFTL2, which
can use rcopyback only twice in a row, outperforms the baseline
by 41% on average. As the maximum copyback threshold increases,
the I/O throughput of NTRX increases faster than that of the other
traces. This difference comes from the difference in the update patterns
of the traces. In general, when data are updated sequentially (as in
Varmail), it is less likely that data are moved multiple times, thus
making rcFTL with a higher maximum copyback threshold less efficient.
In order to understand how the mode selector proposed in rcFTL
performs, we compared the performance of rcFTL with rcFTL– (which
uses rcopyback in a greedy fashion). Fig. 6(b) shows how these
two rcFTL versions compare under varying I/O intensity cases. In
order to generate workload fluctuations, which are needed to properly
evaluate the DMMS module, we generated three synthetic
workloads, High, Mid and Low, using Fio. In High, 70% of I/O requests
were issued without inter-request idle times while 30% were
issued with some idle times. For Mid and Low, the ratio between
the two request types is 50:50 and 30:70, respectively. When the I/O
intensity is lower, since the off-chip copy mode is more likely to be used
in rcFTL, the number of rcopyback-eligible blocks tends to be higher
than in rcFTL– because the per-block counters of more blocks are reset.
The increased number of rcopyback-eligible blocks, in turn, improves
the I/O throughput when the I/O intensity is high. In Fig. 6(b),
rcFTL2 outperforms rcFTL2– because of this effect. In particular, for the Low
case, rcFTL2 improves the I/O throughput by 17% over rcFTL2–.
6 RELATED WORK
There have been several studies to improve the performance of
flash-based storage systems with the copyback operation. How-
ever, many existing techniques [4, 13, 14] are not applicable for
modern NAND flash memory because they assumed an ideal SLC
NAND flash memory where no error propagation occurs from suc-
cessive copyback commands. Other studies such as Jang et al. [5]
considered the error propagation problem in their techniques. However,
their solution was to bring data out to the ECC module to
check the validity of data, thus minimizing the potential benefit
of using copyback. Our technique is different from existing tech-
niques in that the error propagation problem is fully controlled
while maximizing the potential benefit of copyback.
7 CONCLUSIONS
We have presented rcopyback to minimize performance degra-
dations from internal data migrations in modern highly-parallel
SSDs. From an experimental characterization study, we developed
an rcopyback operation model that takes the P/E cycle count and the
data retention requirement as its key inputs. Based on the rcopyback
operation model, we have implemented an rcopyback-aware FTL,
rcFTL, which intelligently manages when to use rcopyback for
a given I/O workload requirement. Our experimental results show
that rcFTL can improve the overall I/O throughput by 54% on aver-
age over an existing FTL with no copyback supported.
REFERENCES
[1] [n. d.]. Filebench. http://filebench.sourceforge.net. ([n. d.]).
[2] [n. d.]. NAND Flash Performance Improvement Using Internal Data Move. Tech-
nical Note 29-15. http://download.micron.com/pdf/technotes/nand/tn2915.pdf.
([n. d.]).
[3] [n. d.]. Sysbench. http://github.com/akopytov/sysbench. ([n. d.]).
[4] Abdul R Abdurrab et al. 2013. DLOOP: A flash translation layer exploiting plane-
level parallelism. In Proc. Int’l Symp. Parallel and Distributed Processing. 908–918.
[5] Woo Tae Chang et al. 2014. An Efficient Copy-Back Operation Scheme Using
Dedicated Flash Memory Controller in Solid-State Disks. Int’l Journal of Electri-
cal Energy 2, 1 (2014), 13–17.
[6] Aayush Gupta et al. 2009. DFTL: a flash translation layer employing demand-based
selective caching of page-level address mappings. In Proc. Int’l Conf. Architectural
Support for Programming Languages and Operating Systems. 229–240.
[7] Sang-Woo Jun et al. 2015. Bluedbm: An appliance for big data analytics. In Proc.
Int’l Symp. Computer Architecture. 1–13.
[8] Jeong-Uk Kang et al. 2006. A superblock-based flash translation layer for NAND
flash memory. In Proc. Int’l Conf. Embedded Software. 161–170.
[9] Jesung Kim et al. 2002. A space-efficient flash translation layer for CompactFlash
systems. IEEE Trans. Consumer Electronics 48, 2 (2002), 366–375.
[10] Myungsuk Kim et al. 2017. Improving performance and lifetime of large-page
NAND storages using erase-free subpage programming. In Proc. Design Automa-
tion Conf.
[11] Seungjae Lee et al. 2016. A 128Gb 2b/cell NAND flash memory in 14nm technol-
ogy with tPROG= 640µs and 800MB/s I/O rate. In Proc. Int’l Solid-State Circuits
Conf. 138–139.
[12] Sang-Won Lee et al. 2006. FAST: An efficient flash translation layer for flash
memory. In Proc. Int’l Conf. Embedded and Ubiquitous Computing. 879–887.
[13] Yoon Jae Seong et al. 2010. Hydra: A block-mapped parallel flash memory solid-
state disk architecture. IEEE Trans. Computers 59, 7 (2010), 905–921.
[14] Wei Wang and Tao Xie. 2015. PCFTL: A plane-centric flash translation layer
utilizing copy-back operations. IEEE Trans. Parallel and Distributed Systems 26,
12 (2015), 3420–3432.
