Data Representation for Efficient and Reliable Storage in Flash Memories by Wang, Yue
DATA REPRESENTATION FOR EFFICIENT AND RELIABLE STORAGE IN
FLASH MEMORIES
A Dissertation
by
YUE WANG
Submitted to the Office of Graduate Studies of
Texas A&M University
in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
Approved by:
Chair of Committee, Anxiao (Andrew) Jiang
Committee Members, Andreas Klappenecker
Jennifer Welch
Henry Pfister
Head of Department, Hank Walker
May 2013
Major Subject: Computer Science
Copyright 2013 Yue Wang
ABSTRACT
Recent years have witnessed a proliferation of flash memories as an emerging
storage technology with wide applications in many important areas. Like magnetic
recording and optical recording, flash memories have their own distinct properties
and usage environment, which introduce interesting new challenges for data
storage. These include accurate programming without overshooting, error correction,
reliably writing data to flash memories at low voltages, and file recovery for flash
memories. Solutions to these problems can significantly improve the longevity and
performance of the storage systems based on flash memories.
In this work, we explore several new data representation techniques for efficient
and reliable data storage in flash memories. First, we present a new data repre-
sentation scheme—rank modulation with multiplicity—to eliminate the overshooting
and charge leakage problems for flash memories. Next, we study the Half-Wits—
stochastic behavior of writing data to embedded flash memories at voltages lower
than recommended by a microcontroller’s specifications—and propose three software-
only algorithms that enable reliable storage at low voltages without modifying
hardware, which can reduce energy consumption by 30%. Then, we address the file
erasure recovery problem in flash memories. Instead of using only traditional
error-correcting codes, we design a new content-assisted decoder (CAD) to recover text
files. The new CAD can be combined with existing error-correcting codes, and
the experimental results show that CAD outperforms traditional error-correcting codes.
ACKNOWLEDGEMENTS
Foremost, the greatest gratitude is extended to my advisor, Dr. Anxiao (Andrew)
Jiang, for his thoughtful advice and guidance. He quickly became for me the
role model of a successful researcher in the field. His dedication to and passion for
research and education influenced me positively. His insights into novel
approaches, as well as into the issues and challenges of active research areas, inspired me
tremendously. He is open-minded and caring toward students, which helped make my
research experience focused and fruitful. It is a great honor and pleasure to work
with him. Without Andrew’s encouragement and help, this thesis would not have
been possible.
I would like to express my appreciation to Dr. Andreas Klappenecker, Dr. Jennifer
Welch, and Dr. Henry Pfister for serving on my degree committee. Their advice,
feedback, and encouragement have been invaluable.
In addition, I am indebted to my peer colleagues, Fenghui Zhang, Hao Li, Vishal
Kapoor, Shoeb Ahmed Mohammed, Qing Li and Yue Li for providing a stimulating
and fun environment in which to learn and grow. Those great discussions and fun
learning times will be memorable for years to come. I am especially thankful to
Hao Li who is always willing to share his experiences and provide help not only in
research but also in campus life.
Lastly, and most importantly, I must thank my parents and my husband, Yu Zhu,
for their unflagging love and concern. Without their support and encouragement,
this dissertation would simply have been impossible, and I could not have come this far. I also
greatly thank my son, Ryan, whose lovely and adorable smile ignites the light of my
life. To them I dedicate my dissertation.
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
1. INTRODUCTION
1.1 Flash Memories and Their Properties
1.1.1 Flash Cell Structure
1.1.2 NOR and NAND Flash
1.1.3 Basic Operations in Flash Memory
1.2 Challenges of Flash Memories
1.2.1 Accurate Programming without Overshooting
1.2.2 Asymmetric Errors in Flash Memories
1.2.3 Reliable Storage for Low-Power Devices
1.2.4 File Recovery for Non-Volatile Memories
1.3 Contributions of This Work
1.3.1 Rank Modulation with Multiplicity
1.3.2 Half-Wits: Software Techniques for Embedded Flash Storage at Low Voltages
1.3.3 Content-Assisted File Decoding for Non-Volatile Memories
1.4 Related Works
1.4.1 Rank Modulation for Flash Memories
1.4.2 Storage for Low-Power Embedded Devices
1.4.3 Error-Correcting Codes for Storage
2. RANK MODULATION WITH MULTIPLICITY
2.1 Introduction
2.1.1 Rank Modulation for Flash Memories
2.1.2 Existing Codes for Rank Modulation
2.1.3 Rank Modulation’s Drawback
2.1.4 Rank Modulation with Multiplicity
2.1.5 Storage Capacity Improvement by Rank Modulation with Multiplicity
2.2 Basic Operations
2.3 Unweighted Rewriting Cost
2.4 Sizes of Spheres
2.5 Weighted Rewriting Cost
3. EXPLOITING HALF-WITS: SMARTER STORAGE FOR LOW-POWER DEVICES
3.1 Storage on Low-Power Devices: Limitations and Challenges
3.2 Behavior of Storage on Half-Wits
3.2.1 Experimental Methodology
3.2.2 Unreliable, Low-Voltage Flash Memory Writes
3.2.3 Determining Factors That Affect Error Rates
3.2.4 Accumulative Memory Behavior
3.3 Design of a Low-Voltage Storage System
3.3.1 Modeling Low-Voltage Flash Memory
3.3.2 Design Goals
3.3.3 Proposed Methods
3.4 Evaluation
3.4.1 Comparison of the Proposed Storage Methods
3.4.2 Half-Wits Versus Wits in Practice
3.4.3 Finding a Crossover Point
3.5 Improvements and Alternatives
3.5.1 Sign Bits and Storing Complements
3.5.2 Memory Mapping Table
4. CONTENT-ASSISTED FILE DECODING FOR NON-VOLATILE MEMORIES
4.1 Introduction
4.2 The Models of File Decoding
4.2.1 Notations
4.2.2 File Decoding Model
4.3 The Content-Assisted Decoding Algorithms
4.3.1 Creating Dictionaries
4.3.2 Codeword Segmentation
4.3.3 Ambiguity Resolution
4.3.4 Post Processing
4.4 Experiments
4.4.1 Implementation Detail
4.4.2 Performance Evaluation
5. SUMMARIES AND FUTURE DIRECTIONS
5.1 Summaries and Contributions
5.2 Future Directions
REFERENCES
LIST OF FIGURES
1.1 The structure of a flash cell.
2.1 The value of $|S_{n,\lambda}|$ for $\lambda = 1, 2, 3, 4$.
2.2 Change rank-modulation state from $s$ to $s'$ with $d(s, s')$ pushes.
3.1 Operating at a lower voltage and tolerating errors instead of the conventional case of choosing the highest minimum voltage requirement may help decrease energy consumption. Considering that $\text{energy} = \text{voltage}^2 \times \text{time} / \text{resistance}$, decreasing voltage decreases the energy consumption quadratically.
3.2 As operating voltage decreases, flash-write errors increase. (a) shows an original ECG signal correctly stored at 2.0V (despite operating below the recommended threshold). As the voltage decreases in (b) and further in (c), erroneous writes (light-colored spikes, height varying according to the magnitude of the error) become more common. The black line shows the reconstructed signal that includes the errors.
3.3 Flash write error rates decrease as voltage increases. This trend holds for all the chips (MSP430F2131 and MSP430F1232) we tested, though error rates differ even between chips of the same model.
3.4 As the Hamming weight (number of 1s in the binary representation) of a number increases, the error rate of low-voltage flash writes declines. The data correspond to an MSP430F2131 running at 1.84V.
3.5 Worn-out flash memory blocks are biased toward ease of writing zeros. A lighter color represents a higher average number of errors over 50 trials. The middle block has been through 6000 write/erase cycles. The other two blocks are minimally used.
3.6 The error rate of a cell is not noticeably influenced by the value of its neighbor. The graph shows that the value of the second LSB does not greatly affect the error rate of the LSB. The bars show the error rate of the LSB for writing numbers from the same Hamming-weight equivalence class whose two LSBs are set to either 00 (dark bars) or to 10 (light bars).
3.7 Structure of the input/output sequence of the Berger code.
3.8 A diagram representing the RS-Berger code. An RS-Berger code is the concatenation of a Reed-Solomon code and a Berger code.
3.9 Reliability improvement using in-place writes over five voltages.
3.10 Reliability improvement using multiple-place writes over five voltages.
4.1 An example of correcting erasures in the codeword of a text.
4.2 The channel model for data storage.
4.3 The work-flow of a channel decoder with content-assisted decoding.
4.4 Examples of codeword segmentation. In Figure (b), a set of words means that the subcodeword $x[i]$ can be decoded to a word or word sequence chosen from any word in the word set. The → defines the word sequence order. The cross × means that a subcodeword $x[i]$ can be decoded neither to a word nor to a word sequence.
4.5 An illustrative example of the mapping to trellis decoding. The sets $W_1 = \{w_{1,1}, w_{1,2}\}$, $W_2 = \{w_{2,1}, w_{2,2}, w_{2,3}\}$, $W_3 = \{w_{3,1}, w_{3,2}, w_{3,3}\}$ and $W_4 = \{w_{4,1}, w_{4,2}\}$ correspond respectively to the subcodewords $x_1$, $x_2$, $x_3$ and $x_4$.
4.6 A comparison of the correction performance of three decoders: LDPC erasure hard decoder, CAD only, and CAD+LDPC error soft decoder.
LIST OF TABLES
3.1 CPU vs. flash memory voltage requirements.
3.2 Erroneous flash writes at low voltage. Insufficient electrical charge may result in some bits failing to transition from 1 (the initial state) to 0.
3.3 Performance comparison of the proposed methods at 1.8V and 1.9V. Error Correction Rate (ECR) shows the effectiveness of the methods.
3.4 Energy consumption and execution time for the accelerometer sensor application. At voltages below the recommended level (1.8V and 1.9V), the in-place writes method with a threshold of two is used.
4.1 The benchmark used in our performance evaluation.
1. INTRODUCTION
The representation of data plays a key role in storage systems. This thesis
focuses on data representation techniques for efficient and reliable storage
in flash memories. In this chapter, we first introduce flash memories and their key
properties, then point out the main challenges of flash memories. We also describe
the contributions of our work toward addressing those challenges. In addition, we present an
overview of related work on flash memories.
1.1 Flash Memories and Their Properties
Flash memory, invented by Dr. Fujio Masuoka, is a type of non-volatile memory
that can be electrically erased and reprogrammed. System designers consider flash
memory an almost “ideal” non-volatile memory because it can be electrically
erased and programmed in-system while offering very high density,
low cost per bit, random access, bit alterability, short read/write and cycle
times, and excellent reliability [9]. Flash memory is a milestone in the development of
data storage technology. Due to its high performance, the applications of flash
memories have expanded widely in recent years to cell phones, portable media
players, digital cameras, netbooks, tablets, and e-book readers. It is
also widely used by the video gaming device industry. As a result, flash memory has become
the dominant member of the family of non-volatile memories [26]. It is expected
that the worldwide flash memory market will reach $51.2 billion by 2015,
constituting a 12 percent share of the total semiconductor market.
1.1.1 Flash Cell Structure
The basic storage unit in a flash memory is a floating-gate transistor [9]. We
also call it a cell. Each memory cell resembles a standard MOSFET, except the
transistor has two gates instead of one. On top is the control gate (CG), as in other
MOS transistors, but below this there is a floating gate (FG) insulated all around
by an oxide layer. The FG is interposed between the CG and the MOSFET channel.
Because the FG is electrically isolated by its insulating layer, any electrons placed
on it are trapped there and, under normal conditions, will not discharge for many
years [42]. Figure 1.1 shows the structure of a cell. The threshold voltage of the
memory cell can be altered by changing the amount of charge present between the
gate and the channel. If no electrons are on the floating gate, the threshold voltage is
low, and the transistor is “on” under the reading voltage, whereas when many
electrons are injected into the floating gate, the threshold voltage becomes high, and the transistor is
“off”.
Figure 1.1: The structure of a flash cell. (Labeled parts: control gate, interpoly oxide, floating gate, tunnel oxide, source, drain, P-substrate.)
The cell level is determined by the amount of charge trapped in the floating gate.
Charge can be injected into the cell using the hot-electron injection mechanism or
the Fowler-Nordheim tunnelling mechanism. The charge can also be removed from
the cell using the Fowler-Nordheim tunnelling mechanism. By checking whether the
transistor is “on” or “off”, a single-level cell (SLC) can represent one bit of
information. In a multi-level cell (MLC), which stores more than one bit per cell,
the amount of current flowing through the transistor is sensed (rather than simply its
presence or absence), in order to determine more precisely the level of charge on the
floating gate. A $q$-level cell can store $\log_2 q$ bits.
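The relation between the number of cell levels and the stored bits can be checked with a few lines of Python. This is only an illustrative sketch; the function names `bits_per_cell` and `cells_needed` are hypothetical and do not come from this work.

```python
import math


def bits_per_cell(q):
    """Bits stored by one q-level cell, i.e. log2(q)."""
    if q < 2:
        raise ValueError("a cell needs at least two levels")
    return math.log2(q)


def cells_needed(num_bits, q):
    """Smallest n such that n cells with q levels each (q**n joint states)
    can represent every num_bits-bit value (2**num_bits values)."""
    n = 0
    while q ** n < 2 ** num_bits:
        n += 1
    return n


if __name__ == "__main__":
    for q in (2, 4, 8):
        print(q, "levels ->", bits_per_cell(q), "bits per cell")
    print("cells needed for one byte with q = 4:", cells_needed(8, 4))
```

An SLC corresponds to q = 2 and a two-bit MLC to q = 4. When q is not a power of two, bits must be spread jointly over several cells, which is what `cells_needed` accounts for.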
1.1.2 NOR and NAND Flash
There are two main types of flash memories: NOR flash and NAND flash.
In NOR flash, each cell has one end connected directly to ground, and the
other end connected directly to a bit line. Therefore, it allows random access to its
cells. NOR flash is commonly used in embedded applications requiring a discrete
non-volatile memory device, such as mobile phones. NOR flash is faster, but it is also
more expensive.
A NAND flash partitions every block into multiple sections called pages, and a
page is the unit of a read or write operation. Compared to NOR flash, NAND flash
has the advantage of higher cell density. However, it may be much more restrictive
on how its pages can be programmed, such as allowing a page to be programmed
only a few times before erasure [18]. NAND flash has found a market in devices to
which large files are frequently uploaded and replaced, such as MP3 players, digital
cameras and USB drives.
1.1.3 Basic Operations in Flash Memory
Flash memory has three basic operations.
1.1.3.1 Reading
The read operation is performed by applying a gate voltage to the cell and sensing
the current flowing through the device. In NOR type flash memory, each cell’s level
can be read individually, whereas in NAND type flash memory, the cells connected in
series must be read in series. The read operation is easy and fast in both types of
flash memory.
1.1.3.2 Writing/Programming
When programming, charge is injected into the cell using the hot-electron injection
mechanism or the Fowler-Nordheim tunneling mechanism by applying an appropriate voltage
to the control gate. Cells in both NOR and NAND flash can be programmed individually,
and the process is easy and fast [9].
1.1.3.3 Erasing
A prominent property of flash memories is block erasure. Cells in a flash memory
are organized into blocks, with each block containing $10^5$ or so cells. The state of a
cell can be raised individually (program operation). But to decrease a cell level, the
flash memory needs to erase the whole block (i.e., lower the states of all the cells
to 0) and then re-program all the cells. Such an operation is called block erasure.
To erase the whole block, a very large voltage of the opposite polarity is applied to the control gate,
pulling the electrons off the floating gate through quantum tunneling;
this is slow and energy-intensive. The block erasure operation not
only significantly reduces speed, but also reduces the lifetime of the flash memory.
This is because a block can only endure about $10^4$ to $10^6$ erasures, after which the
block may break down. Since the breakdown of a single block can make the
whole memory stop working, it is important to balance the erasures performed on
different blocks.
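The program/erase asymmetry described above can be captured in a small model. The following Python sketch is a simplified illustration, not a description of any real flash controller; the class `FlashBlock` and its methods are hypothetical names introduced here.

```python
class FlashBlock:
    """Toy model of a flash block: cell levels can only be raised
    individually; lowering any level requires erasing the whole block."""

    def __init__(self, num_cells, num_levels):
        self.levels = [0] * num_cells
        self.num_levels = num_levels
        self.erase_count = 0  # each erasure wears the block

    def program(self, index, level):
        """Raise one cell to `level`; lowering is not allowed."""
        if not 0 <= level < self.num_levels:
            raise ValueError("level out of range")
        if level < self.levels[index]:
            raise ValueError("cannot lower a cell without a block erasure")
        self.levels[index] = level

    def erase(self):
        """Block erasure: reset every cell to level 0."""
        self.levels = [0] * len(self.levels)
        self.erase_count += 1

    def rewrite(self, target):
        """Store `target`, erasing first only if some cell must decrease."""
        if any(t < cur for t, cur in zip(target, self.levels)):
            self.erase()
        for index, level in enumerate(target):
            self.program(index, level)


if __name__ == "__main__":
    block = FlashBlock(num_cells=4, num_levels=4)
    block.rewrite([1, 0, 2, 0])   # only increases: no erasure
    block.rewrite([1, 1, 2, 3])   # only increases: no erasure
    block.rewrite([0, 1, 2, 3])   # cell 0 must decrease: one erasure
    print(block.levels, block.erase_count)
```

In this model a rewrite is cheap as long as every cell level stays the same or increases, which is why data representations that avoid decreasing cell levels reduce the number of erasures.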
1.2 Challenges of Flash Memories
As mentioned in the previous section, block erasure is a fairly violent process.
Every time the system erases a block, it slightly damages the insulating barriers.
Typically, a flash memory lasts about $10^5$ erasure cycles. Block erasures
can therefore substantially reduce the writing speed, reliability and longevity of flash memories,
and it is important for storage schemes to minimize them. Although flash
memory has many advantages over other non-volatile memories, such as low cost per bit,
high storage density, and quick read and write operations, its
block-erasure property poses new challenges that
require new data representation and coding schemes for efficient and reliable storage
in flash memory.
1.2.1 Accurate Programming without Overshooting
When programming a cell, charge is injected into the cell, and the injected
charge becomes trapped. The amount of charge in a cell determines its level. Fast and
accurate programming schemes for multi-level flash memory are a topic of significant
research and design effort. Because of block erasure, flash memory does not support charge removal from
individual cells. Overshooting is therefore very costly for programming:
once the injected charge overshoots the target level, the block needs to be
erased and then reprogrammed. As a result, in industry, a cell is programmed with a
sequence of charge injection operations that shift the cell level cautiously and
monotonically toward the target charge level from below, in order to avoid undesired
global erasures caused by overshoots. Thus, programming a cell requires
quite a few programming cycles.
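The cautious, monotonic approach toward the target level can be illustrated with a toy simulation. The model below is an assumption made for illustration only: each pulse injects a random amount of charge between 0 and `max_step`, and the function name `program_cell` and its parameters are hypothetical.

```python
import random


def program_cell(target, max_step, rng):
    """Approach `target` from below with bounded random charge pulses.

    Assumed model: each pulse adds a uniform random amount of charge in
    [0, max_step]. A pulse is applied only while even the largest possible
    pulse cannot overshoot, so the final level never exceeds the target.
    """
    level = 0.0
    pulses = 0
    while level + max_step <= target:
        level += rng.uniform(0.0, max_step)
        pulses += 1
    return level, pulses


if __name__ == "__main__":
    rng = random.Random(1)
    for max_step in (0.5, 0.1, 0.02):
        level, pulses = program_cell(1.0, max_step, rng)
        print(f"max_step={max_step}: level={level:.3f} after {pulses} pulses")
```

Smaller pulses place the final level closer to the target but need more programming cycles, which is the trade-off noted in the text.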
It is interesting to study a new data representation scheme to avoid the problem of
overshooting while programming cells. In this work, we will present a generalization
of rank modulation, called rank modulation with multiplicity, in which different cells
can share the same rank. We focus on the rewriting of data based on this new
scheme, and study its basic properties.
1.2.2 Asymmetric Errors in Flash Memories
Flash memory is a storage medium with asymmetric properties [26]. After cells
are programmed, the data are not error-proof, because the cell levels can be changed
by various errors over time. Some important error sources include write disturb and
read disturb (disturbs caused by writing or reading), as well as leakage of charge
from the cells (called data retention problem). The errors in the cell levels have an
asymmetric distribution in the up and the down directions. Our research on rank
modulation with multiplicity provides a way to tolerate asymmetric errors better.
1.2.3 Reliable Storage for Low-Power Devices
While the reliability, low cost, and high storage density of flash memory make it
a natural choice for embedded systems [27], its relatively high voltage requirement
introduces challenges for energy-efficient designs aiming to maximize the system’s
effective lifetime (e.g., the run time on a typical battery whose voltage declines over
time). Lowering the common supply voltage would allow the CPU to operate in a
more energy efficient manner, but writes to the flash memory then become unreliable.
How to address the voltage limitations of flash memory and guarantee reliable
flash writes at lower voltages is a prominent topic. In this thesis, we present
software-only coding schemes that enable reliable storage at low voltages without modifying
hardware. They include three algorithms: in-place writes, multiple-place writes,
and RS-Berger codes.
1.2.4 File Recovery for Non-Volatile Memories
Non-volatile memories, especially flash memories, have emerged as a crucial technology
for storage systems due to their excellent speed and storage capacity. However,
as data density improves, the reliability of non-volatile
memories is attracting more and more attention [23]. File recovery will
be one of the biggest challenges for storage systems. The amount of stored data is
increasing at an explosive rate, but the data are not constantly checked to verify
their reliability. Storage systems need a bit-error rate of $10^{-20}$ after decoding.
With existing flash memory technologies, however, this cannot be achieved
unless extra-long error-correcting codes with substantial redundancy are used, which
is impractical. On the other hand, it is not difficult for storage systems to achieve much higher
bit-error rates, such as $10^{-3}$.
We are interested in designing a content-based file recovery system such that, as
long as the conventional error-correcting codes can reduce the bit-error/erasure rate
to $10^{-3}$ after decoding, our file recovery system can in practice recover the original files
completely.
1.3 Contributions of This Work
In this thesis, we address the challenges facing flash memories with three techniques:
a new data representation scheme for flash memories, called rank modulation
with multiplicity, to eliminate overshooting and charge leakage problems; Half-Wits,
a set of algorithms that enable reliable writes to flash memories at
low voltage; and content-assisted file decoding algorithms that make data storage in flash
memories reliable. In the following, we introduce the three topics.
1.3.1 Rank Modulation with Multiplicity
Rank modulation, a new data representation scheme, is proposed to eliminate
both the problem of overshooting while programming cells and the problem of mem-
ory endurance in aging devices [27]. This work proposes a generalization of rank
modulation, called rank modulation with multiplicity, in which different cells can
share the same rank.
We focus on the rewriting of data based on this new scheme. We study its basic
properties, including the rewriting cost, optimal ways to change rank modulation
states, and the expansion of rank modulation states given the rewriting cost. We
consider two rewriting costs, unweighted and weighted, and analyze each
in turn. This work has been published in ACTEMT [30].
1.3.2 Half-Wits: Software Techniques for Embedded Flash Storage at Low Voltages
This work analyzes the stochastic behavior of writing to embedded flash memory
at voltages lower than recommended by a microcontroller’s specifications to reduce
energy consumption. Flash memory integrated within a microcontroller typically
requires the entire chip to operate on a common supply voltage almost double what
the CPU portion requires. Our approach tolerates a lower supply voltage so that
the CPU may operate in a more energy efficient manner. Energy efficient coding
algorithms then cope with flash memory that behaves unpredictably.
The software-only coding algorithms proposed in this work (in-place writes,
multiple-place writes, and RS-Berger codes) enable reliable storage at low voltages on unmodified
hardware by exploiting the electrically cumulative nature of half-written data in
write-once bits. For a sensor monitoring application using the MSP430, coding with
in-place writes reduces the overall energy consumption by 34%. In-place writes are
competitive when the time spent on low-voltage operations such as computation is
at least four times greater than the time spent on writes to flash memory. The evalu-
ation of the proposed schemes shows that tightly maintaining the digital abstraction
for storage in embedded flash memory comes at a significant cost to energy con-
sumption with minimal gain in reliability. This work has been published in USENIX
FAST [47].
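The idea behind in-place writes, repeating a write to the same location so that partially written bits accumulate, can be sketched with a toy simulation. The error model is an assumption made for illustration: an erased byte is all 1s, a write can only flip bits from 1 to 0, and each required flip fails independently with probability `fail_prob`. The names `low_voltage_write` and `in_place_write` are hypothetical, and this is not the implementation evaluated in this work.

```python
import random

ERASED = 0xFF  # an erased byte: every bit is 1


def low_voltage_write(cell, value, fail_prob, rng):
    """One write attempt on a byte. Bits only move from 1 to 0, and each
    needed transition fails independently with probability fail_prob
    (an assumed error model)."""
    result = cell
    for bit in range(8):
        mask = 1 << bit
        wants_zero = not (value & mask)
        if wants_zero and (result & mask) and rng.random() >= fail_prob:
            result &= ~mask
    return result


def in_place_write(value, fail_prob, max_attempts, rng):
    """Rewrite the same location until read-back matches `value` or the
    attempt threshold is reached. Returns (stored byte, attempts used)."""
    cell = ERASED
    for attempt in range(1, max_attempts + 1):
        cell = low_voltage_write(cell, value, fail_prob, rng)
        if cell == value:
            return cell, attempt
    return cell, max_attempts


if __name__ == "__main__":
    rng = random.Random(0)
    stored, attempts = in_place_write(0x5A, fail_prob=0.3, max_attempts=8, rng=rng)
    print(f"stored=0x{stored:02X} after {attempts} attempt(s)")
```

Because a bit that has already reached 0 stays at 0, every extra attempt can only reduce the set of failed bits; the attempt threshold bounds the extra energy spent per write.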
1.3.3 Content-Assisted File Decoding for Non-Volatile Memories
To address the file recovery problem for data storage in non-volatile memories
such as flash memories, we propose a content-assisted decoding (CAD) method for
erasures recovery, which can be combined with existing storage solutions for text
files. We preload the dictionaries that include the frequency information of words
and phrases in the text of a given language. Thanks to the fast random access
of flash memories, our proposed decoder retrieves the statistical information from
the dictionaries quickly, then splits the whole noisy input codeword into small
subcodewords. Each subcodeword can be decoded into a word in the text, and
the whole noisy codeword is recovered as the most likely word sequence.
The CAD is modelled as a solution to an optimization problem, which mainly
includes two parts: (1) segment the whole noisy codeword into a sequence of sub-
codewords and each subcodeword has a set of candidate words to decode; (2) choose
the most likely word in the candidate word set for each subcodeword to form the
most likely word sequence, which is a recovery for the original text file. Each part is
also defined as an optimization problem and the dynamic programming algorithms
are designed to get the solutions. The evaluation of the proposed methods with a
set of benchmark files shows CAD can provide better erasure recovery capacity than
the traditional ECC. This work has been published in [34].
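To make the two-part optimization concrete, the following Python sketch solves a much simplified version of the problem with dynamic programming. It is a toy under stated assumptions, not the CAD algorithm of this work: erasures are marked at the character level with `?` rather than in codeword bits, the text has no spaces, and words are scored independently by their dictionary frequency. The function names and the small dictionary are hypothetical.

```python
import math


def matches(pattern, word):
    """True if `word` agrees with `pattern` on every non-erased character."""
    return len(pattern) == len(word) and all(
        p == "?" or p == c for p, c in zip(pattern, word)
    )


def decode(noisy, freq):
    """Segment `noisy` into dictionary words and pick the sequence with the
    highest sum of log-frequencies. Returns None if no segmentation exists."""
    total = sum(freq.values())
    max_len = max(len(w) for w in freq)
    n = len(noisy)
    # best[i] = (best score of a decoding of noisy[:i], backpointer)
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, None)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            prev_score = best[start][0]
            if prev_score == -math.inf:
                continue
            piece = noisy[start:end]
            for word, count in freq.items():
                if matches(piece, word):
                    score = prev_score + math.log(count / total)
                    if score > best[end][0]:
                        best[end] = (score, (start, word))
    if best[n][0] == -math.inf:
        return None
    words = []
    pos = n
    while pos > 0:
        start, word = best[pos][1]
        words.append(word)
        pos = start
    return words[::-1]


if __name__ == "__main__":
    freq = {"flash": 50, "flask": 5, "memory": 40, "me": 10, "mory": 1}
    print(decode("flas?mem?ry", freq))
```

Here the segmentation and the choice among candidate words are handled in a single pass; the text treats them as two separate optimization problems.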
1.4 Related Works
With the increasing importance of flash memories, numerous research results
on flash memories have been published. The work in this thesis
relates to a number of important research areas, including rank modulation for
flash memories, storage for low-power embedded devices, and error-correcting
codes for flash memories and embedded systems.
1.4.1 Rank Modulation for Flash Memories
Rank modulation is a scheme that uses the relative order of cells, instead of their
absolute values, to represent data. It was first proposed and studied in [27, 28]. In
addition to rewriting [27] and error correction [28], a family of Gray codes for rank
modulation is also presented in [27]. A drawback of the rank-modulation scheme is
the need for a large number of comparisons when reading the induced permutation
from a set of $n$ cell-charge levels. Instead, in a recent work [57], the $n$ cells are locally
viewed through a sliding window, resulting in a sequence of small permutations that
require fewer comparisons. Based on [57], Gray codes for the local rank
modulation scheme are studied in [49, 15, 16] in order to simulate conventional multi-level flash cells while
retaining the benefits of rank modulation.
The encoding approach for rank modulation in [27, 28] is based on the “push to
the top” operation, which raises the charge level of a single cell above the rest of the
cells. It is a good scheme that speeds up cell programming by eliminating the
overshooting problem. However, it is not optimal in terms of minimizing the increase
of cell levels. Gad presents a “minimal-push-up” operation model and proposes a
compressed encoding for rank modulation in [17].
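The “push to the top” operation can be illustrated as follows. This Python sketch is a simplified illustration under assumptions: charge levels are modeled as floating-point numbers, each push raises a cell by a fixed margin `delta` above the current maximum, and the function names are hypothetical.

```python
def read_permutation(charges):
    """Cell indices ordered from the highest charge to the lowest."""
    return sorted(range(len(charges)), key=lambda i: -charges[i])


def push_to_top(charges, cell, delta=1.0):
    """Raise one cell's charge above all other cells (never lowers it)."""
    charges[cell] = max(charges) + delta


def program_permutation(charges, target):
    """Reach the `target` order using only push-to-the-top operations:
    push cells from the lowest-ranked to the highest-ranked in `target`."""
    for cell in reversed(target):
        push_to_top(charges, cell)


if __name__ == "__main__":
    charges = [0.3, 1.2, 0.7]
    print(read_permutation(charges))      # order induced by the charges
    program_permutation(charges, [0, 2, 1])
    print(read_permutation(charges), charges)
```

Only the relative order matters, so injecting more charge than intended during a push does not corrupt the data. The sketch pushes every cell, which also shows why the operation is not optimal in terms of the increase of cell levels.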
An extension for rank modulation is to use permutations of a given multiset
to represent data. A series of papers [32, 31] discuss the permutation array under
the Chebyshev distance. Decoding algorithms for permutation arrays are proposed
in [55, 51, 33] and the capacity of a new WAM code based on permutation arrays is
studied in [14].
1.4.2 Storage for Low-Power Embedded Devices
Recent research focuses on optimizing use of off-chip flash memory. Off-chip mem-
ory allows for special features and larger memories than found on microcontrollers,
but introduces additional costs for components. Microhash [58] is a memory index
structure tailored for sensor devices with a large external flash memory. Mathur [39]
performs an extensive study of available flash memory candidates for sensor devices
and demonstrates that an off-chip parallel NAND flash memory decreases the energy
consumption of storage. Considering off-chip NAND flash memory the
best candidate for sensor devices, Agrawal [3] proposes a method that allows sensor
devices to exploit their flash memory while adapting to different amounts of RAM.
However, our storage schemes are designed for already deployed low-power devices
that use on-chip flash memory. Moreover, while devices at the scale of sensor nodes
might switch to block-grained, large off-chip flash memory, RFID-scale platforms
might not benefit from this transition because of their challenging resource limita-
tions to drive I/O.
1.4.3 Error-Correcting Codes for Storage
An error-correcting code is a system of adding redundant data, or parity data,
to a message, such that it can be recovered by a receiver even when a number of
errors (up to the capacity of the code being used) are introduced. Error-correcting
codes are frequently used in reliable storage in media such as CDs, DVDs, hard disks
and flash memories [35]. ECCs are usually categorized into convolutional codes and
block codes. Convolutional codes work on bit or symbol streams of arbitrary length,
while block codes are processed on fixed-size blocks. Examples of block codes are
repetition codes, Hamming codes, Reed-Solomon codes [44], turbo codes [7] and
low-density parity-check (LDPC) codes [19, 36]. These codes are widely used for the binary
symmetric channel (BSC) and do not take the asymmetric property of errors into account.
With the rapid development of flash memories, there has been much research on
error correction for flash memories. Most previously published flash error correction
codes [11, 13, 22] are designed for NAND flash memory. Chen [12] mentions that
NOR flash normally does not require error correction. The errors in flash cell levels
often have an asymmetric property. These techniques consider neither the asym-
metry in flash memory nor the resource limitations of low-power embedded devices.
In [10], error-correcting codes that correct asymmetric errors of limited magnitude
are designed and in [61], ECCs that correct different numbers of asymmetric errors
depending on the codewords’ Hamming weights are described for flash memories.
Jiang [25] suggests error-correcting codes for multi-level cell (MLC) flash memory
that cope well with the WOM property of flash memory and Zhou [60] discusses
solutions by selecting dynamic reading thresholds to reduce the asymmetric errors
due to voltage or resistance drift in flash memory.
Many previous codes leverage the fact that each cell of MLC flash memory rep-
resents more than one bit of information. But the fact that single-level cells (SLC)
are more suitable for embedded devices, in addition to the occurrence of errors in
low-voltage conditions, requires a reconsideration of these codes for SLCs at low
voltage. Zemor [59] introduces error-correcting WOM codes for flash memory. They
suggest codes that are able to correct up to one error when the flash memory is given
enough voltage. This work does not account for errors that occur at low voltage.
Godard [20] proposes hierarchical code correction and reliability management for
NOR flash memory. This work considers on-chip ECCs such as Hamming codes to
correct the errors in NOR flash memory.
The rest of the thesis is organized as follows. Chapter 2 describes a new data
representation scheme: rank modulation with multiplicity for flash memories. Chap-
ter 3 focuses on software techniques for reliable embedded flash storage under low
voltage. Chapter 4 presents content-assisted decoding algorithms for file recovery
in non-volatile memories. The thesis closes in Chapter 5, where general concluding
remarks and recommendations for future work are presented.
2. RANK MODULATION WITH MULTIPLICITY
In this chapter, we present a novel data representation scheme for multilevel
flash memory cells—rank modulation with multiplicity—in which a set of n cells
stores information according to their charge levels’ relative order and multiple cells
can share the same rank. We focus on the rewriting of data based on the new
rank modulation scheme. We study its basic properties, including the rewriting
cost, optimal ways to change rank-modulation states, and the expansion of rank
modulation states given the rewriting cost.
2.1 Introduction
Flash memory is a dominant nonvolatile memory technology and a prominent
candidate to replace the well-established magnetic recording technology in the near
future due to its properties of high reliability and storage density, as well as relatively
low cost. A prominent property of flash memories is that although it is easy to
increase a cell level, to decrease any cell level, a whole block of cells has to be erased
and reprogrammed, which is very costly. Therefore, fast and accurate programming
schemes for multilevel flash memories are a topic of significant research and design
efforts. The programming cycle sequence is designed to cautiously approach the
target charge level from below so as to avoid undesired global erases in case of
overshoots. Consequently, these attempts still require many programming cycles,
and they work only up to a moderate number of levels per cell. Besides the need
for accurate programming, another problem for multilevel flash cells is errors that
originate from low memory endurance [9], by which a drift of threshold levels in
aging devices may cause programming and read errors. To minimize the number of
expensive block erasure operations caused by overshooting and to maintain the data
integrity, a new data representation scheme is needed for flash memories.
2.1.1 Rank Modulation for Flash Memories
Rank modulation is a scheme that uses the relative order of cell levels to represent
data. Consider n cells $c_1, c_2, \ldots, c_n$ whose levels are $\ell_1, \ell_2, \ldots, \ell_n$, respectively, where
$\ell_i \neq \ell_j$ when $i \neq j$. Let $(a_1, a_2, \ldots, a_n)$ be a permutation of the set $\{1, 2, \ldots, n\}$,
such that $\ell_{a_1} > \ell_{a_2} > \cdots > \ell_{a_n}$. Then for $1 \le i \le n$, the cell $c_{a_i}$ has the i-th highest
level and is said to have rank i. The rank modulation scheme uses the ranks of cells
(instead of the real values of the cell levels) to represent data; namely, the information
bits are mapped to the permutation (a1, a2, . . . , an) [27]. In this way, no discrete cell
levels are needed and only a basic charge-comparing operation is required to read
the permutation. Rank modulation can make it simpler and more robust to program
flash memory cells, where the cell levels are only allowed to monotonically increase
during the programming process. Besides, it eliminates the overshooting problem in
flash memory and reduces corruption due to retention.
2.1.2 Existing Codes for Rank Modulation
Rank modulation was first proposed in [27], in which balanced Gray codes are
constructed. The authors also investigate rewriting schemes for random data modification
and present both an optimal scheme for the worst-case rewrite performance and an
approximation scheme for the average-case rewrite performance [27].
Error-correcting codes are very important for rank modulation, and they have at-
tracted interest among researchers. There have been some results on error-correcting
codes for rank modulation equipped with Kendall's τ-distance. In [29], a
one-error-correcting code is constructed based on metric embedding, whose size is provably
within half of the optimal size. In [5], the capacity of rank modulation codes is
derived for the full range of minimum distance between codewords. There has also
been some work on error-correcting codes for rank modulation equipped with the L∞
distance [54, 50]. This distance metric is more appropriate for cells where the noise in
cell levels has limited magnitude; the resulting codes are called limited-magnitude
rank-modulation codes. Some optimal codes for limited-magnitude errors are presented
in [54, 50]. Systematic error-correcting codes for rank modulation are explored and
proved to achieve the same capacity as general error-correcting codes in [62].
2.1.3 Rank Modulation’s Drawback
Although the rank modulation scheme is able to eliminate both the problem of
overshooting while programming cells and the problem of memory endurance in aging
devices, it does so at the cost of reduced storage capacity. With representation
schemes based on concrete cell levels, n cells with q levels can represent at most
$\log_2 q^n$ information bits; with the rank modulation scheme, they can store at most
$\log_2 n!$ bits.
2.1.4 Rank Modulation with Multiplicity
In order to improve the storage capacity of rank modulation, we study an ex-
tension of rank modulation, where multiple cells can have the same rank. The
general idea is that we see cells of similar levels as having the same rank, and
see cells of sufficiently different levels as having different ranks. There are nat-
urally various ways to define the similarity of cell levels, including the following
one. Let Δ and δ be two parameters, where Δ ≥ δ ≥ 0. For n cells whose levels
can be ordered as $\ell_{a_1} \ge \ell_{a_2} \ge \cdots \ge \ell_{a_n}$, we require that for $1 \le i < n$, either
$\ell_{a_i} - \ell_{a_{i+1}} \le \delta$ or $\ell_{a_i} - \ell_{a_{i+1}} > \Delta$. Then for $1 \le i < n$, if $\ell_{a_i} - \ell_{a_{i+1}} \le \delta$, we
say the cells $c_{a_i}$ and $c_{a_{i+1}}$ have the same rank; if $\ell_{a_i} - \ell_{a_{i+1}} > \Delta$, we say they have
different ranks. For example, assume δ = 0.2, Δ = 0.5, n = 8 and $(\ell_1, \ldots, \ell_8) =
(0.8, 2.2, 1.56, 0.21, 0.2, 2.1, 1.35, 1.38)$. Then $(a_1, \ldots, a_8) = (2, 6, 3, 8, 7, 1, 4, 5)$,
and $(\ell_{a_1}, \ldots, \ell_{a_8}) = (2.2, 2.1, 1.56, 1.38, 1.35, 0.8, 0.21, 0.2)$; so the cells c2, c6 have
rank 1, c3, c8, c7 have rank 2, c1 has rank 3, and c4, c5 have rank 4. (We may further
bound the maximum difference between the levels of the cells of the same rank.) Here
the parameter ∆ ensures the cell levels for different ranks are sufficiently apart so that
they can tolerate noise better, and δ is chosen appropriately so that the cell levels
for the same rank can be programmed successfully with high probability. Allowing
cells to have the same rank can help achieve higher storage capacity. And since the
gap between the cell levels of different ranks does not have a specific required value
– in particular it is not upper bounded – the cells can still be programmed easily
without the risk of charge overshooting (as long as the cell levels of each individual
rank are programmed well.) We can use the same low-rank-to-high-rank method to
program cells as in [27]. Note that when δ = ∆ = 0, as no two cells can practically
have exactly the same level, the scheme is reduced to the original rank modulation
where every cell has a distinct rank [27].
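The grouping rule above can be illustrated with a short Python sketch. This is only an illustration under our own assumptions: the function name ranks_with_multiplicity is hypothetical, cells are numbered from 1, and a gap that falls strictly between δ and Δ is treated as an invalid programming result.

```python
def ranks_with_multiplicity(levels, delta, Delta):
    """Group cells into ranks from their levels (cells numbered from 1, rank 1 highest).

    Hypothetical helper: adjacent cells in sorted order share a rank when their gap
    is at most delta, and start a new rank when the gap exceeds Delta.
    """
    # Cell indices sorted by decreasing level, i.e. (a_1, ..., a_n).
    order = sorted(range(1, len(levels) + 1), key=lambda i: -levels[i - 1])
    ranks = [{order[0]}]
    for prev, cur in zip(order, order[1:]):
        gap = levels[prev - 1] - levels[cur - 1]
        if gap <= delta:
            ranks[-1].add(cur)        # similar level: same rank
        elif gap > Delta:
            ranks.append({cur})       # sufficiently lower level: next rank
        else:
            raise ValueError(f"gap {gap:.3f} between cells {prev} and {cur} is ambiguous")
    return ranks


# The example from the text: delta = 0.2, Delta = 0.5, n = 8.
levels = [0.8, 2.2, 1.56, 0.21, 0.2, 2.1, 1.35, 1.38]
example_ranks = ranks_with_multiplicity(levels, delta=0.2, Delta=0.5)
```

On the levels of the example, the sketch groups the cells as {c2, c6}, {c3, c7, c8}, {c1}, {c4, c5}, in agreement with the ranks listed in the text.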
Let
$$S_n = \{(s_1, s_2, \ldots, s_k) \mid 1 \le k \le n;\ s_i \subseteq \{1, 2, \ldots, n\} \text{ and } |s_i| \ge 1 \text{ for } 1 \le i \le k;\ \bigcup_{i=1}^{k} s_i = \{1, \ldots, n\};\ s_i \cap s_j = \emptyset \text{ for } i \ne j\}.$$
Every element $(s_1, s_2, \ldots, s_k)$ in $S_n$ is an ordered partition of the set {1, 2, . . . , n}. We use
(s1, s2, . . . , sk) to denote the cells’ ranks, where for 1 ≤ i ≤ k, the cells with in-
dices in si have the rank i. (For the previous example, we have (s1, s2, . . . , sk) =
({2, 6}, {3, 7, 8}, {1}, {4, 5}).) The data are represented by the elements of Sn. Note
that the difficulty of programming cells varies for the different elements of Sn. It is
simple to program two cells into different ranks since we only need the gap between
their levels to be sufficiently large; but it is more challenging to program cells into
the same rank because their levels need to be similar. The more cells share the same
rank, the more difficult it is to program them. In the following, we consider only the
elements of Sn where every rank accommodates at most λ cells; that is, let
Sn,λ = {(s1, s2, . . . , sk) ∈ Sn | ∀ i, |si| ≤ λ}
and we use only the elements of Sn,λ to represent data. The parameter λ determines
the tradeoff between the complexity of cell programming and the storage capacity.
We call the scheme rank modulation with multiplicity λ.
The rank modulation with multiplicity λ uses the elements in $S_{n,\lambda}$, called
rank-modulation states, to represent data. Let $\mathcal{L} = \{0, 1, \ldots, L-1\}$ denote the alphabet
of the stored data. Then there is a surjective map $D : S_{n,\lambda} \to \mathcal{L}$, such that the
rank-modulation state $s = (s_1, s_2, \ldots, s_k) \in S_{n,\lambda}$ represents the data $D(s) \in \mathcal{L}$. The
number of stored information bits, log2 L, can be maximized by letting L = |Sn,λ|;
and by letting L < |Sn,λ|, the cost of rewriting data can be reduced.
Example 1. Let n = 3, λ = 2. Then Sn,λ = {({1}, {2}, {3}), ({1}, {3}, {2}), ({2}, {1}, {3}),
({2}, {3}, {1}), ({3}, {1}, {2}), ({3}, {2}, {1}), ({1}, {2, 3}), ({2}, {1, 3}), ({3}, {1, 2}),
({1, 2}, {3}), ({1, 3}, {2}), ({2, 3}, {1})}. So |S3,2| = 12. Up to log2 12 information
bits can be stored.
2.1.5 Storage Capacity Improvement by Rank Modulation with Multiplicity
The general value of |Sn,λ| can be computed by recursion:
$$|S_{n,\lambda}| = \sum_{i=1}^{\min\{n,\lambda\}} \binom{n}{i}\, |S_{n-i,\lambda}| \quad \text{for } n > 0; \qquad |S_{0,\lambda}| = 1.$$
We show |Sn,λ| for 2 ≤ n ≤ 16 and λ = 1, 2, 3, 4 in Figure 2.1.
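As a quick check of the recursion, the following Python sketch evaluates |Sn,λ| with memoization. The function name count_states is our own choice and not part of the original text.

```python
from functools import lru_cache
from math import comb


def count_states(n, lam):
    """Number of rank-modulation states |S_{n,lam}| via the recursion in the text."""

    @lru_cache(maxsize=None)
    def size(m):
        if m == 0:
            return 1  # |S_{0,lam}| = 1
        # Choose the i cells of the highest rank, then arrange the remaining m - i cells.
        return sum(comb(m, i) * size(m - i) for i in range(1, min(m, lam) + 1))

    return size(n)


# Values of |S_{n,lam}| for small n and lam = 1, 2, 3, 4 (the quantities plotted in Figure 2.1).
table = {lam: [count_states(n, lam) for n in range(2, 7)] for lam in (1, 2, 3, 4)}
```

For n = 3 and λ = 2 the recursion gives 3 · 3 + 3 · 1 = 12, which matches Example 1, and for λ = 1 it reduces to n!.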
Figure 2.1: The value of |Sn,λ| for λ = 1, 2, 3, 4.
The plot for λ = 1 shows the maximum number of symbols that can be represented
with the original rank modulation scheme. For λ = 2, 3, 4, the cardinality of Sn,λ
is visibly larger, as shown in Figure 2.1.
2.2 Basic Operations
For the rewriting of data, we consider the memory model where the cell levels can
only increase, not decrease. For flash memories, this is the way cells are programmed
via charge injection (without the expensive block erasure operation). Let us define
the basic operation we can use to change the rank-modulation state, in order to
rewrite data. The basic operation is a “push operation”, where we either push a cell
to a higher rank (if there are fewer than λ cells of that rank), or push the cell to
the top so that it has a higher rank than all the other n− 1 cells. More specifically,
let s = (s1, s2, . . . , sk) ∈ Sn,λ be a rank-modulation state. For any i, j such that
1 ≤ i < j ≤ k and |si| < λ, if |sj| > 1, with a push operation, we can change s to
(s1, . . . , si ∪ {p}, . . . , sj \ {p}, . . . , sk)
for some p ∈ sj; if |sj| = 1, we can change s to
(s1, . . . , si ∪ {p}, . . . , sj−1, sj+1, . . . , sk)
with p being the only element in sj. And for any i ∈ {1, 2, . . . , k} such that |si| > 1,
we can change s to
({p}, s1, . . . , si \ {p}, . . . , sk)
for some p ∈ si. For any i ∈ {2, 3, . . . , k} such that |si| = 1, we can change s to
({p}, s1, . . . , si−1, si+1, . . . , sk)
with p being the only element in si. (Note that if λ = 1, the push operation here is
reduced to the “push-to-top” operation for the original rank modulation scheme [27].)
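To make the definition concrete, the following Python sketch enumerates every state reachable from a given state with a single push operation. It is an illustration under our own conventions: a state is a tuple of frozensets with rank 1 first, and the name push_successors is hypothetical.

```python
def push_successors(state, lam):
    """All states reachable from `state` by one push operation.

    `state` is a tuple of frozensets of cell indices, rank 1 (highest) first.
    """
    successors = set()
    for j, group in enumerate(state):
        for p in group:
            rest = group - {p}
            # Push p into an existing higher rank i < j that still has room.
            for i in range(j):
                if len(state[i]) < lam:
                    new = list(state)
                    new[i] = state[i] | {p}
                    new[j] = rest
                    # Drop rank j if p was its only cell.
                    successors.add(tuple(g for g in new if g))
            # Push p to the top as a new rank, unless p is already alone in rank 1.
            if not (j == 0 and not rest):
                new = list(state)
                new[j] = rest
                successors.add((frozenset({p}),) + tuple(g for g in new if g))
    return successors


start = (frozenset({1}), frozenset({2}), frozenset({3}))
one_push = push_successors(start, lam=2)
```

For n = 3, λ = 2 and the state ({1}, {2}, {3}), the sketch yields five successors: three obtained by merging a cell into a higher rank and two obtained by pushing a cell to the top. With λ = 1 only the two push-to-top successors remain.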
2.3 Unweighted Rewriting Cost
For rewriting data, it is desirable to increase the cell levels as little as possible
with each rewrite, so that more rewrites can be performed before the cell levels
reach the maximum limit. (After that, the block erasure will be needed to lower the
cell levels back to the minimum value.) So in this section, we consider the cost of
changing the rank-modulation state from s to s′ as the minimum number of push
operations needed to change s to s′, which we denote by d(s, s′). We call d(s, s′) the
unweighted rewriting cost. (A weighted version of the rewriting cost will be studied
in a later section.) It is not hard to see that
$$\max_{s, s' \in S_{n,\lambda}} d(s, s') = n - 1.$$
An example of s and s′ that achieve this maximum unweighted rewriting cost,
d(s, s′) = n− 1, is s = ({1}, . . . , {i− 1}, {i}, {i+ 1}, . . . , {n}) and s′ = ({1}, . . . , {i−
1}, {i+1}, . . . , {n}, {i}) for some 1 ≤ i < n. (Every cell except ci needs to be pushed
once to change s to s′.)
Given two rank-modulation states s, s′ ∈ Sn,λ, we consider how to compute the
unweighted rewriting cost d(s, s′), and how to change s to s′ with this minimum
number of push operations. For the special case λ = 1, the answer is known [27]:
given $s = (s_1, s_2, \ldots, s_n)$ and $s' = (s'_1, s'_2, \ldots, s'_n)$, let $\phi : \{1, 2, \ldots, n\} \to \{1, 2, \ldots, n\}$
be a bijective map such that for i = 1, 2, . . . , n, we have $s'_i = s_{\phi(i)}$; let r be the
minimum integer in $\{0, 1, \ldots, n-1\}$ such that
$$\phi(r+1) < \phi(r+2) < \cdots < \phi(n);$$
then we have d(s, s′) = r, and the way to change the rank-modulation state from s
to s′ with r push operations is to sequentially push the cells with their indices in
$s'_r, s'_{r-1}, \ldots, s'_1$ to the top.
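For the case λ = 1, the rule above can be sketched in Python as follows. The sketch assumes that every rank holds one cell, so a state is simply a list of cell indices with rank 1 first, and it takes r = 0 when s = s′. The function names are our own.

```python
def rewrite_cost_lambda1(s, s_new):
    """Unweighted rewriting cost and push sequence for lambda = 1 (n >= 1).

    `s` and `s_new` list the cell indices from the highest rank to the lowest.
    """
    n = len(s)
    position = {cell: idx for idx, cell in enumerate(s)}
    phi = [position[cell] for cell in s_new]  # s_new[i] == s[phi[i]]
    # Smallest r such that the bottom n - r cells of s_new keep their order from s.
    r = n - 1
    while r > 0 and phi[r - 1] < phi[r]:
        r -= 1
    pushes = list(reversed(s_new[:r]))  # push s'_r first and s'_1 last
    return r, pushes


def apply_push_to_top(s, pushes):
    """Apply a sequence of push-to-top operations to the state `s`."""
    state = list(s)
    for cell in pushes:
        state.remove(cell)
        state.insert(0, cell)
    return state


cost, pushes = rewrite_cost_lambda1([1, 2, 3, 4], [1, 3, 4, 2])
```

In this usage example the cell c2 has to end up at the bottom, so every other cell is pushed once: the cost is 3 = n − 1, and applying the returned pushes in order reproduces the target state.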
For the case λ ≥ 2, we use a tool called virtual levels.
Definition 2. Given a rank-modulation state $s = (s_1, s_2, \ldots, s_k) \in S_{n,\lambda}$, a "realization"
of s is a vector $(v_1, v_2, \ldots, v_n) \in \mathbb{N}^n$ that satisfies two conditions: (1) for all
$1 \le i \le k$ and $j_1, j_2 \in s_i$, we have $v_{j_1} = v_{j_2}$; (2) for all $1 \le i_1 < i_2 \le k$, $j_1 \in s_{i_1}$
and $j_2 \in s_{i_2}$, we have $v_{j_1} > v_{j_2}$. We call $v_i$ the "virtual level" of the cell $c_i$, for
i = 1, 2, . . . , n.
Definition 3. Let $v = (v_1, v_2, \ldots, v_n)$ be a realization of $s \in S_{n,\lambda}$, and let
$v' = (v'_1, v'_2, \ldots, v'_n)$ be a realization of $s' \in S_{n,\lambda}$. The Hamming distance between v and
v′, denoted by H(v, v′), is $H(v, v') = |\{i \mid 1 \le i \le n,\ v_i \ne v'_i\}|$. And we say "v′
dominates v" if two conditions are satisfied: (1) for i = 1, 2, . . . , n, we have $v'_i \ge v_i$;
(2) we have $\{v'_i \mid 1 \le i \le n,\ v'_i \le \max_{1 \le j \le n} v_j\} \subseteq \{v_1, v_2, \ldots, v_n\}$. We denote "v′
dominates v" by $v' \ge v$.
Lemma 4. Let λ ≥ 2. Let $s, s' \in S_{n,\lambda}$ be two rank-modulation states, let
$v = (v_1, v_2, \ldots, v_n)$ be a realization of s, and let x be a non-negative integer. Then, s can
be changed into s′ by at most x push operations if and only if there exists a realization
$v' = (v'_1, v'_2, \ldots, v'_n)$ of s′ such that $v' \ge v$ and $H(v, v') \le x$.
Proof. First, assume that s can be changed into s′ by y ≤ x push operations. We will
construct a corresponding realization v′ of s′ as follows. Initially, for i = 1, 2, . . . , n,
let $v'_i = v_i$. Then for i = 1, 2, . . . , y, if the i-th push operation pushes a cell $c_{j_1}$ to
the same rank as another cell $c_{j_2}$, then assign to $v'_{j_1}$ the value of $v'_{j_2}$. Otherwise, the
i-th push operation pushes a cell $c_j$ to a rank that is higher than all the other n − 1
cells; in this case, let $z = \max_{1 \le b \le n} v'_b$, and we assign to $v'_j$ the value z + 1. Then,
let $v' = (v'_1, v'_2, \ldots, v'_n)$. It is simple to see that v′ is a realization of s′ and $v' \ge v$.
Since at most y cells are pushed, at least n − y cells have the same virtual levels in
v and v′; so we have $H(v, v') \le y \le x$.
Now consider the other direction. Assume that there exists a realization
$v' = (v'_1, v'_2, \ldots, v'_n)$ of s′ such that $v' \ge v$ and $H(v, v') \le x$. We will show how to change
s to s′ with H(v, v′) push operations. We first partition $\{v'_1, v'_2, \ldots, v'_n\}$ into two
subsets A and B as follows:
$$A = \{v'_i \mid 1 \le i \le n,\ v'_i > \max_{1 \le j \le n} v_j\};$$
$$B = \{v'_i \mid 1 \le i \le n,\ v'_i \le \max_{1 \le j \le n} v_j\}.$$
Since v′ ≥ v, we know that B ⊆ {v1, v2, . . . , vn}. Here B is the set of virtual levels
that are retained when we change s into s′, and A is the set of virtual levels in
v′ that are higher than any virtual level in v. For convenience, we shall denote A
as $A = \{a_1, a_2, \ldots, a_{|A|}\}$ such that $a_1 < a_2 < \cdots < a_{|A|}$, and denote B as
$B = \{b_1, b_2, \ldots, b_{|B|}\}$ such that $b_1 > b_2 > \cdots > b_{|B|}$.
We change the rank-modulation state from s to s′ as follows. Initially, for i =
1, 2, . . . , n, let the cell ci have the virtual level vi. We will push the cells to higher
virtual levels, and the rank-modulation state – which is determined by the virtual
levels of the n cells – will change accordingly. We push the cells using the following
two steps:
1. For i = 1, 2, . . . , |A|, push the cells in {cj | 1 ≤ j ≤ n, v′j = ai} to the virtual
level ai.
2. For i = 1, 2, . . . , |B|, push the cells in {cj | 1 ≤ j ≤ n, vj < v′j = bi} to the
virtual level bi.
During the above two steps, we will use the following method to make sure that for
i = 1, 2, . . . , |B|, there is always at least one cell of the virtual level bi:
• When we are to push a cell ci from the virtual level j1 ∈ B to j2 > j1, if ci
is the only cell of virtual level j1 at that moment, then before pushing ci, we
first push a cell in {cz | 1 ≤ z ≤ n, v′z = j1} to the virtual level j1. (Note that
if that cell is also the only cell of its own virtual level at that moment, then
the same rule applies. So there can be a chain reaction of cell pushing of this
type. But this chain reaction will stop somewhere because the virtual level of
the concerned cell keeps decreasing.)
In the above process, we push every cell at most once.
When the above process ends, the cells have virtual levels $(v'_1, v'_2, \ldots, v'_n)$, which
is a realization of s′. A cell $c_i$ (1 ≤ i ≤ n) is pushed if and only if $v_i \ne v'_i$; and
if it is pushed, it is pushed directly to the virtual level v′i. So the number of push
operations equals H(v,v′). We now show that these H(v,v′) push operations are
all valid operations for the rank-modulation states. Step 1) consists of the “push-
to-top” operations, and we sequentially push the cells to higher and higher ranks;
clearly, the number of cells at the virtual level ai (for 1 ≤ i ≤ |A|) is never more
than λ at any moment. Step 2) consists of the operations that push a cell to a higher
and existing rank; and since we process the virtual levels b1, b2, . . . , b|B| sequentially
(from high to low), when we process the virtual level bi (for 1 ≤ i ≤ |B|), all the
cells that are originally at level bi have already been pushed up; so as we push cells
from below into the level bi, there will be no more than λ cells in that level. So we
have changed s into s′ with H(v,v′) ≤ x valid push operations.
Theorem 5. Let λ ≥ 2. Let $s = (s_1, s_2, \ldots, s_k) \in S_{n,\lambda}$ and $s' = (s'_1, s'_2, \ldots, s'_{k'}) \in S_{n,\lambda}$
be two rank-modulation states, let $v = (v_1, v_2, \ldots, v_n)$ be a realization of s, and define
V as $V = \{u \mid u \text{ is a realization of } s',\ u \ge v\}$. Then we have
$$d(s, s') = \min_{u \in V} H(v, u).$$
Furthermore, define $v' = (v'_1, v'_2, \ldots, v'_n)$ as follows:
1. Let $h_{k'} = \max_{j \in s'_{k'}} v_j$. For all $i \in s'_{k'}$, let $v'_i = h_{k'}$.
2. For $i_1 = k' - 1, k' - 2, \ldots, 1$, do:
• If $\max_{j \in s'_{i_1}} v_j > h_{i_1+1}$, then let
$$h_{i_1} = \max_{j \in s'_{i_1}} v_j;$$
if $\max_{j \in s'_{i_1}} v_j \le h_{i_1+1} < \max_{1 \le j \le n} v_j$, then let
$$h_{i_1} = \min\{v_j \mid 1 \le j \le n,\ v_j > h_{i_1+1}\};$$
if $\max_{j \in s'_{i_1}} v_j \le h_{i_1+1}$ and $h_{i_1+1} \ge \max_{1 \le j \le n} v_j$, then let
$$h_{i_1} = h_{i_1+1} + 1.$$
• For all $i_2 \in s'_{i_1}$, let $v'_{i_2} = h_{i_1}$.
Then we have
$$v' \in V \quad \text{and} \quad H(v, v') = \min_{u \in V} H(v, u).$$
Proof. Lemma 4 leads to $d(s, s') = \min_{u \in V} H(v, u)$. When we assign values to
$(v'_1, v'_2, \ldots, v'_n)$ (which are virtual levels for the n cells corresponding to the
rank-modulation state s′), we are sequentially assigning virtual levels to the cells with
indices in $s'_{k'}, s'_{k'-1}, \ldots, s'_1$; and for $i = k', k' - 1, \ldots, 1$, we give the cells with indices
in $s'_i$ a virtual level that is as small as possible, as long as the condition $v' \in V$ is
satisfied. A proof by induction can show that compared to all the realizations of s′
in V , here each hi (1 ≤ i ≤ k′) – and therefore each virtual level v′i (1 ≤ i ≤ n)
– is individually minimized, and a cell is pushed only when necessary. (Since the
cells are pushed only upward, minimizing hi is a greedy and optimal approach for
minimizing hi−1, hi−2, . . . , h1 and for minimizing the number of cells that need to be
pushed.) So H(v,v′) = minu∈V H(v,u).
Theorem 5 shows how to find the realization v′ for s′ such that v′ dominates v
(the realization of s) and H(v, v′) = d(s, s′). The proof of Lemma 4 shows, given such
a realization v′, how to change the rank-modulation state from s to s′ with d(s, s′)
push operations. By combining them, we can not only compute d(s, s′), but also
transform s to s′ with the minimum unweighted rewriting cost. We show an example
below.
Example 6. Suppose λ = 2, n = 8, s = ({2, 3}, {7}, {4, 5}, {1, 8}, {6}), s′ =
({2, 3}, {4}, {1}, {7, 8}, {5, 6}). We let v = (2, 5, 5, 3, 3, 1, 4, 2) be a realization of s.
(See Figure 2.2.) Then by Theorem 5, we get the realization v′ = (5, 7, 7, 6, 3, 3, 4, 4)
of s′. (It can be seen that v′ ≥ v.) So we get d(s, s′) = H(v,v′) = 6. Then by the
steps specified in the proof of Lemma 4, we get the 6 push operations that change
s into s′. (See Figure 2.2, where the push operations are shown as arrows, and the
numbers beside arrows represent their order.)
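The construction of v′ in Theorem 5 can be traced with a short Python sketch. The sketch follows the three cases of the theorem as stated; the function names and the use of a dictionary from cell index to virtual level are our own choices.

```python
def theorem5_realization(v, s_new):
    """Realization v' of s_new built by the greedy rule of Theorem 5.

    `v` maps each cell index to its virtual level in a realization of s;
    `s_new` is the target state as a list of sets, rank 1 (highest) first.
    """
    v_max = max(v.values())
    used_levels = sorted(set(v.values()))
    v_new = {}
    h = None
    for group in reversed(s_new):  # process s'_{k'}, s'_{k'-1}, ..., s'_1
        top = max(v[j] for j in group)
        if h is None or top > h:
            h = top                                        # some cell of the rank stays put
        elif h < v_max:
            h = min(l for l in used_levels if l > h)       # next existing virtual level
        else:
            h = h + 1                                      # new level above all of v
        for j in group:
            v_new[j] = h
    return v_new


def hamming(v, v_new):
    """Number of cells whose virtual level changes, i.e. the number of pushed cells."""
    return sum(1 for j in v if v[j] != v_new[j])


# Example 6: lambda = 2, n = 8.
v = dict(enumerate((2, 5, 5, 3, 3, 1, 4, 2), start=1))
s_new = [{2, 3}, {4}, {1}, {7, 8}, {5, 6}]
v_new = theorem5_realization(v, s_new)
d = hamming(v, v_new)
```

On the data of Example 6 the sketch returns v′ = (5, 7, 7, 6, 3, 3, 4, 4) and d(s, s′) = 6; only the cells c5 and c7 keep their virtual levels.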
2.4 Sizes of Spheres
For a rank-modulation state s ∈ Sn,λ and an unweighted rewriting cost r ≥ 0, we
define the sphere of unweighted radius r centered at s as
$$\theta(s, r) \triangleq \{u \in S_{n,\lambda} \mid d(s, u) = r\}$$
and define the ball of unweighted radius r centered at s as
$$\beta(s, r) \triangleq \{u \in S_{n,\lambda} \mid d(s, u) \le r\}.$$
Clearly, $|\beta(s, r)| = \sum_{i=0}^{r} |\theta(s, i)|$. Knowing the sizes of spheres and balls is useful for
analyzing the performance of rewriting. For example, when the states in Sn,λ are
used to represent data of the alphabet $\mathcal{L}$, if the rank-modulation state is currently
$s \in S_{n,\lambda}$, for the next rewrite, the unweighted rewriting cost in the worst case is at
least $\min\{r \mid r \ge 0,\ |\beta(s, r)| \ge |\mathcal{L}|\}$.
Figure 2.2: Change rank-modulation state from s to s′ with d(s, s′) pushes.
We show how to compute |θ(s, r)| for s ∈ Sn,λ and 0 ≤ r ≤ n − 1. If λ = 1, we
have
$$|\theta(s, r)| = \frac{n!}{(n-r)!} - \frac{n!}{(n-r+1)!}$$
for 1 ≤ r ≤ n − 1 and |θ(s, 0)| = 1 [27]. So in the following, we consider λ ≥ 2.
Fix a realization v = (v1, v2, . . . , vn) for s = (s1, s2, . . . , sk) – say the realization
where the n cells have virtual levels from 1 to k – and we see that for any s′ ∈ Sn,λ,
Theorem 5 finds a unique realization $v' = (v'_1, v'_2, \ldots, v'_n)$ for s′ such that $v' \ge v$,
H(v, v′) = d(s, s′) and every virtual level $v'_i$ (1 ≤ i ≤ n) is minimized. So to compute
|θ(s, r)|, the number of states in the sphere θ(s, r), we can equivalently compute the
number of such unique realizations (of the states in θ(s, r)), because they are in
one-to-one correspondence.
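The closed form quoted above for λ = 1 can be cross-checked by brute force for small n. The following Python sketch is only a sanity check under our own conventions (states are tuples of cell indices with rank 1 first, and the function names are hypothetical): it runs a breadth-first search over push-to-top operations and compares the sphere sizes with the formula.

```python
from collections import deque
from math import factorial


def sphere_sizes_lambda1(n):
    """Sphere sizes |theta(s, r)| for r = 0..n-1 when lambda = 1, by breadth-first search."""
    start = tuple(range(1, n + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for i in range(1, n):
            # Push the cell of rank i + 1 to the top.
            t = (s[i],) + s[:i] + s[i + 1:]
            if t not in dist:
                dist[t] = dist[s] + 1
                queue.append(t)
    sizes = [0] * n
    for r in dist.values():
        sizes[r] += 1
    return sizes


def sphere_formula_lambda1(n, r):
    """Closed form for |theta(s, r)| when lambda = 1."""
    if r == 0:
        return 1
    return factorial(n) // factorial(n - r) - factorial(n) // factorial(n - r + 1)


sizes = sphere_sizes_lambda1(4)
```

For n = 4 the search gives sphere sizes 1, 3, 8, 12, which sum to 4! = 24 and agree with the formula for every r.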
Let σ1, σ2, . . . , σκ and X be κ+1 mutually disjoint sets of cells, where 1 ≤ |σi| ≤ λ
for 1 ≤ i ≤ κ and |X| = x ∈ {0, 1, . . . , n− 1}. For i = 1, . . . , κ, we assign the virtual
level κ + 1 − i to the cells in the set σi. Let δ ∈ {0, 1, . . . , n − 1}, t ∈ {1, 2, . . . , λ},
γ ∈ {x, x+ 1, . . . , n− 1} and tag ∈ {0, 1} be given parameters. Let R denote the set
of realizations (that is, assignments of virtual levels to the $x + \sum_{i=1}^{\kappa} |\sigma_i|$ cells) that
we can change this current realization into, given the following constraints:
1. We obtain a realization in R by pushing γ − x cells in $\bigcup_{i=1}^{\kappa} \sigma_i$ to higher virtual
levels, and by assigning the x cells in X to the virtual levels between 1 and
κ+ δ. Every cell is pushed or assigned at most once. For the realization in R,
every virtual level has at most λ cells.
2. For a realization in R, the maximum virtual level that has a cell is level κ+ δ,
and exactly t cells are in that virtual level κ+ δ.
3. For a realization in R, if a cell in $\bigcup_{i=1}^{\kappa} \sigma_i$ is pushed to a level j ∈ {2, 3, . . . , κ + δ},
or if a cell in X is assigned to a level j ∈ {2, 3, . . . , κ + δ}, then for this realization
in R, either some cell is in the virtual level j − 1, or 2 ≤ j ≤ κ and some cell
in $\sigma_{\kappa+1-j}$ is in the level j.
4. If tag = 1, then no cell in X can be assigned to the virtual level 1 unless for
this realization in R, some cell in σκ is in the virtual level 1.
We use f(|σ1| , |σ2| , . . . , |σκ| ;x; δ; t; γ; tag) to denote the cardinality of R. We
can see that the sphere size
$$|\theta(s, r)| = \sum_{\delta=0}^{r} \sum_{t=1}^{\lambda} f(|s_1|, |s_2|, \ldots, |s_k|; 0; \delta; t; r; 0).$$
We show how to use recursion to compute the value of f(|σ1| , |σ2| , . . . , |σκ| ;x; δ; t; γ; tag).
For simplicity, we only introduce the main recursion, and skip introducing the values
of f(|σ1| , |σ2| , . . . , |σκ| ;x; δ; t; γ; tag) for the boundary cases. (The boundary values
can be obtained easily.)
To change the given realization to a realization in R, say that we push y1 cells
in σκ to the maximum virtual level κ + δ, push y2 cells in σκ to the virtual levels
2, 3, . . . , κ + δ − 1, and assign y3 cells in X to the virtual level 1. Note that once
y1, y2, y3 are fixed, the number of cells in level 1 becomes fixed, and we do not need
to consider it any further. So we get the recursion:
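• If tag = 0, then let
$$\begin{aligned}
P_1 \triangleq \{&(y_1, y_2, y_3) \in \mathbb{Z}^3 \mid 0 \le y_1 < t;\ 0 \le y_2 \le |\sigma_\kappa|;\ 0 \le y_3 \le \min\{x, \lambda - |\sigma_\kappa| + y_1 + y_2\}; \\
&\text{either } y_1 + y_2 < |\sigma_\kappa| \text{ or } (y_1 + y_2 = |\sigma_\kappa| \text{ and } y_3 > 0)\}, \\
P_2 \triangleq \{&(y_1, y_2, y_3) \in \mathbb{Z}^3 \mid 0 \le y_1 \le \min\{t - 1, |\sigma_\kappa|\};\ y_2 = |\sigma_\kappa| - y_1;\ y_3 = 0\}, \\
P_3 \triangleq \{&(y_1, y_2, y_3) \in \mathbb{Z}^3 \mid y_1 = t;\ 0 \le y_2 \le |\sigma_\kappa|;\ 0 \le y_3 \le \min\{x, \lambda - |\sigma_\kappa| + y_1 + y_2\}; \\
&\text{either } y_1 + y_2 < |\sigma_\kappa| \text{ or } (y_1 + y_2 = |\sigma_\kappa| \text{ and } y_3 > 0)\}, \\
P_4 \triangleq \{&(y_1, y_2, y_3) \in \mathbb{Z}^3 \mid y_1 = t;\ y_2 = |\sigma_\kappa| - t \ge 0;\ y_3 = 0\}.
\end{aligned}$$
If tag = 1, then let
$$\begin{aligned}
P_1 \triangleq \{&(y_1, y_2, y_3) \in \mathbb{Z}^3 \mid 0 \le y_1 < t;\ 0 \le y_2 < |\sigma_\kappa| - y_1;\ 0 \le y_3 \le \min\{x, \lambda - |\sigma_\kappa| + y_1 + y_2\}\}, \\
P_2 \triangleq \{&(y_1, y_2, y_3) \in \mathbb{Z}^3 \mid 0 \le y_1 \le \min\{t - 1, |\sigma_\kappa|\};\ y_2 = |\sigma_\kappa| - y_1;\ y_3 = 0\}, \\
P_3 \triangleq \{&(y_1, y_2, y_3) \in \mathbb{Z}^3 \mid y_1 = t;\ 0 \le y_2 < |\sigma_\kappa| - t;\ 0 \le y_3 \le \min\{x, \lambda - |\sigma_\kappa| + y_1 + y_2\}\}, \\
P_4 \triangleq \{&(y_1, y_2, y_3) \in \mathbb{Z}^3 \mid y_1 = t;\ y_2 = |\sigma_\kappa| - t \ge 0;\ y_3 = 0\}.
\end{aligned}$$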
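• We have
$$\begin{aligned}
f(|\sigma_1|, |\sigma_2|, \ldots, |\sigma_\kappa|; x; \delta; t; \gamma; \mathit{tag}) = {}
& \sum_{(y_1, y_2, y_3) \in P_1} \binom{|\sigma_\kappa|}{y_1} \binom{|\sigma_\kappa| - y_1}{y_2} \binom{x}{y_3}\,
f(|\sigma_1|, \ldots, |\sigma_{\kappa-1}|; x + y_2 - y_3; \delta; t - y_1; \gamma - y_1 - y_3; 0) \\
{} + {} & \sum_{(y_1, y_2, y_3) \in P_2} \binom{|\sigma_\kappa|}{y_1} \binom{|\sigma_\kappa| - y_1}{y_2} \binom{x}{y_3}\,
f(|\sigma_1|, \ldots, |\sigma_{\kappa-1}|; x + y_2; \delta; t - y_1; \gamma - y_1; 1) \\
{} + {} & \sum_{(y_1, y_2, y_3) \in P_3} \binom{|\sigma_\kappa|}{y_1} \binom{|\sigma_\kappa| - y_1}{y_2} \binom{x}{y_3}
\sum_{1 \le z \le \lambda} f(|\sigma_1|, \ldots, |\sigma_{\kappa-1}|; x + y_2 - y_3; \delta - 1; z; \gamma - y_1 - y_3; 0) \\
{} + {} & \sum_{(y_1, y_2, y_3) \in P_4} \binom{|\sigma_\kappa|}{y_1} \binom{|\sigma_\kappa| - y_1}{y_2} \binom{x}{y_3}
\sum_{1 \le z \le \lambda} f(|\sigma_1|, \ldots, |\sigma_{\kappa-1}|; x + y_2; \delta - 1; z; \gamma - y_1; 1).
\end{aligned}$$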
Given any s ∈ Sn,λ and r ≤ n− 1, the time complexity of computing the sphere
size |θ(s, r)| using the above recursion is $O(n^4 \lambda^5)$.
Theorem 7. The above recursion correctly computes |θ(s, r)|.
Proof. In order to compute |θ(s, r)|, we enumerate all the possible ways to push r
cells by the recursion. The recursion is processed in the order of cells’ virtual levels:
the cells in σκ+1−i, 1 ≤ i ≤ κ with virtual level i are processed in the i-th round. If
a cell is pushed, it is only pushed once. We can get four disjoint cases after pushing
y1 + y2 cells in σκ and assigning y3 cells in X to some virtual levels:
1. P1: σκ is not empty, and at least one cell with the maximum virtual level κ + δ
does not come from σκ.
2. P2: σκ is empty (no cell is assigned to the virtual level that the cells in σκ
originally have), and at least one cell with the maximum virtual level κ + δ
does not come from σκ.
3. P3: σκ is not empty, and all the cells with the maximum virtual level κ + δ
come from σκ.
4. P4: σκ is empty (no cell is assigned to the virtual level that the cells in σκ
originally have), and all the cells with the maximum virtual level κ + δ
come from σκ.
For each case, we set the appropriate parameters in the above recursive function.
Since it covers all the possible ways to push r cells, the result from the recursion is
no less than |θ(s, r)|.
On the other hand, we need to prove that all the ways to push cells obtained from the
recursion are unique and valid. It is easy to see that all the push operation sequences
are unique because the above four cases are disjoint and each cell is processed only
once. In order to prove they are valid, we need to show that 1) at each step, no more
than λ cells have the same virtual level; and 2) if a cell is pushed, it is pushed to the
smallest possible virtual level. The parameters' upper bounds in P1, P2, P3, P4 as well as the
recursive function guarantee that the number of cells at each virtual level is never
more than λ at any moment. The parameter tag is used to assign each cell to the
smallest virtual level in its realization. For the cases P2 and P4, where no cell is assigned
to the virtual level that the cells in σκ originally have in the realization, we set tag to
1. Then during the next round of the recursion (when we deal with the cells in σκ−1), no
cell in X can be assigned to the virtual level that the cells in σκ−1 originally have, unless in
this realization, some cells in σκ−1 are not pushed up. Otherwise, the cells in X could
have been assigned to the virtual level that the cells in σκ originally have, which is lower
than the virtual level of σκ−1, when we dealt with the cells in σκ to get the state s′. Therefore, the
tag parameter guarantees that each cell is assigned to the smallest virtual level to
reach state s′, such that d(s, s′) = r.
2.5 Weighted Rewriting Cost
We have studied the unweighted rewriting cost, where every push operation is
considered to have cost one. In practice, however, the operations can have different
cost values: a push operation that increases the cell level less is preferable to
a push operation that increases the cell level more. So in this section, we present
the definition of weighted rewriting cost, which measures the cost of push operations
based on how much they increase the cell levels.
To give a combinatorial definition, we use virtual levels. Let s = (s1, s2, ..., sk) ∈
Sn,λ and s′ ∈ Sn,λ be two rank-modulation states. Let v = (v1, v2, ..., vn) be
the unique realization of s such that {v1, v2, ..., vn} = {1, 2, ..., k}. Let V ≜
{u | u is a realization of s′, u ≥ v}. By the previous analysis, we know that a
sequence of push operations that changes the rank-modulation state from s to s′ also
changes the realization from v to some u ∈ V (and vice versa). Virtual levels are a
reasonable simplification of real cell levels. So we define the weighted rewriting cost
of changing s into s′ as
$$w(s, s') = \min_{(u_1, u_2, \ldots, u_n) \in V} \sum_{i=1}^{n} (u_i - v_i).$$
Let v′ = (v′1, v′2, ..., v′n) be the unique realization of s′ that is generated by Theorem 5.
It has been shown that v′ minimizes the virtual level of every cell; so we have
$$w(s, s') = \sum_{i=1}^{n} (v'_i - v_i) = \sum_{i=1}^{n} \min_{(u_1, \ldots, u_n) \in V} (u_i - v_i).$$
And it is not hard to see that
$$\max_{s, s' \in S_{n,\lambda}} w(s, s') = n(n - 1).$$
Given a state s ∈ Sn,λ and an integer r ≥ 0, we can define the sphere of weighted
radius r centered at s as
$$\Theta(s, r) \triangleq \{u \in S_{n,\lambda} \mid w(s, u) = r\}.$$
The sphere size, |Θ(s, r)|, can be computed with a recursion similar to the one in the
previous section.
As in the previous section, we define σ1, σ2, ..., σκ and X to be κ + 1 mutually
disjoint sets of cells, where 1 ≤ |σi| ≤ λ for 1 ≤ i ≤ κ and |X| = x ∈ {0, 1, ..., n − 1}.
The parameters δ ∈ {0, 1, ..., n − 1}, t ∈ {1, 2, ..., λ}, γ ∈ {x, x + 1, ..., n − 1} and
tag ∈ {0, 1} are also given. Let R denote the set of realizations (that is,
assignments of virtual levels to the $x + \sum_{i=1}^{\kappa} |\sigma_i|$ cells) that we
can change the current realization into, given the following constraints:
1. We obtain a realization in R by pushing some cells in $\bigcup_{i=1}^{\kappa} \sigma_i$ to higher virtual
levels, and by assigning the x cells in X to the virtual levels between 1 and
k + δ, with the sum of the cells’ virtual levels increased by γ. Every cell is
pushed or assigned at most once. For the realization in R, every virtual level
has at most λ cells.
2. For a realization in R, the maximum virtual level that has a cell is level k + δ,
and exactly t cells are in that virtual level k + δ.
3. For a realization in R, if a cell in $\bigcup_{i=1}^{\kappa} \sigma_i$ is pushed to a level j ∈ {2, 3, ..., k + δ},
or if a cell in X is assigned to a level j ∈ {2, 3, ..., k + δ}, then for this realization
in R, either some cell is in the virtual level j − 1, or 2 ≤ j ≤ κ and some cell
in $\sigma_{\kappa+1-j}$ is in the level j.
4. If tag = 1, then no cell in X can be assigned to the virtual level 1 unless for
this realization in R, some cell in σκ is in the virtual level 1.
We use g(|σ1|, |σ2|, ..., |σκ|; x; δ; t; γ; tag) to denote the cardinality of R. We can
see that the sphere size is
$$|\Theta(s, r)| = \sum_{\delta=0}^{\min\{n-1,\, r\}} \sum_{t=1}^{\lambda} g(|s_1|, |s_2|, \ldots, |s_k|;\, 0;\, \delta;\, t;\, r;\, 0).$$
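We change the given realization to a realization in R in the same way as in the
previous section, and define the sets P1, P2, P3, P4 as before. We have
$$
\begin{aligned}
g(|\sigma_1|, |\sigma_2|, \ldots, |\sigma_\kappa|;\, x;\, \delta;\, t;\, \gamma;\, tag) ={}
& \sum_{(y_1, y_2, y_3) \in P_1} \binom{|\sigma_\kappa|}{y_1} \binom{|\sigma_\kappa| - y_1}{y_2} \binom{x}{y_3}\,
  g(|\sigma_1|, \ldots, |\sigma_{\kappa-1}|;\, x + y_2 - y_3;\, \delta;\, t - y_1;\, \gamma - y_1(\delta - 1 + \kappa) - y_2 - x + y_3;\, 0) \\
{}+{} & \sum_{(y_1, y_2, y_3) \in P_2} \binom{|\sigma_\kappa|}{y_1} \binom{|\sigma_\kappa| - y_1}{y_2} \binom{x}{y_3}\,
  g(|\sigma_1|, \ldots, |\sigma_{\kappa-1}|;\, x + y_2;\, \delta;\, t - y_1;\, \gamma - y_1(\delta - 1 + \kappa) - y_2 - x;\, 1) \\
{}+{} & \sum_{(y_1, y_2, y_3) \in P_3} \binom{|\sigma_\kappa|}{y_1} \binom{|\sigma_\kappa| - y_1}{y_2} \binom{x}{y_3}
  \sum_{1 \le z \le \lambda} g(|\sigma_1|, \ldots, |\sigma_{\kappa-1}|;\, x + y_2 - y_3;\, \delta - 1;\, z;\, \gamma - y_1(\delta - 1 + \kappa) - y_2 - x + y_3;\, 0) \\
{}+{} & \sum_{(y_1, y_2, y_3) \in P_4} \binom{|\sigma_\kappa|}{y_1} \binom{|\sigma_\kappa| - y_1}{y_2} \binom{x}{y_3}
  \sum_{1 \le z \le \lambda} g(|\sigma_1|, \ldots, |\sigma_{\kappa-1}|;\, x + y_2;\, \delta - 1;\, z;\, \gamma - y_1(\delta - 1 + \kappa) - y_2 - x;\, 1).
\end{aligned}
$$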
Theorem 8. The above recursion correctly computes |Θ(s, r)|.
Proof. The recursion is similar to the unweighted case, except for the parameter γ.
By pushing y1 cells in σκ to virtual level k + δ, pushing y2 cells in σκ to some virtual
level less than k + δ, and assigning y3 cells in X to virtual level k + 1 − κ, the total
virtual level of the cells increases by y1(δ − 1 + κ) + y2 + x − y3, or by
y1(δ − 1 + κ) + y2 + x (when y3 = 0). Therefore, the parameter γ is set to
γ − y1(δ − 1 + κ) − y2 − x + y3 or γ − y1(δ − 1 + κ) − y2 − x in the recursive
function g(·), which equals the remaining weighted distance from the current state to
the final state. The rest of the proof is the same as the proof of Theorem 7.
3. EXPLOITING HALF-WITS: SMARTER STORAGE FOR LOW-POWER
DEVICES
The high voltage requirement of on-chip flash memory is a barrier to reducing the
total energy consumption of low-power devices. This work examines the main factors
affecting the behavior of flash memory at low voltage. Based on our observations
of flash memory behavior at low voltage, we propose three storage schemes to
enable reliable storage on flash memory. The first scheme, in-place writes, makes
attempts at write time to store a value correctly in the given memory address. The
second scheme, multiple-place writes, tries to decrease the probability of error by
making attempts at both write time and read time. This method stores data in more
than one location hoping that the data will be stored correctly in at least one of
these locations. The third scheme is a hybrid error-correcting code combining Reed-
Solomon (RS) [44] and Berger [6] codes. The Berger code detects asymmetric errors
caused by the low write voltage. Given the approximate locations of errors, which are
determined by the Berger code, the RS code efficiently recovers the originally stored
data. Our evaluation shows that in-place writes can save 34% of energy consumption
for a sensing workload on the MSP430 microcontroller.
3.1 Storage on Low-Power Devices: Limitations and Challenges
Billions of microcontrollers appear in embedded systems ranging from thermostats
and utility meters to tollway payment transponders and pacemakers. Recent years
have witnessed a proliferation of low-power embedded devices [4, 8, 37], many of
which use on-chip flash memory for storage.
The relatively high voltage requirement of flash memory (Table 3.1) introduces
challenges for energy-efficient designs aiming to maximize the system’s effective
lifetime. Instrumenting the system to operate at a fixed low voltage vl is one way to
reduce power consumption; however, consistently correct results for flash
writes are guaranteed only if vl is higher than a manufacturer-specified threshold.
Moreover, in energy-limited devices that cannot provide a constant supply voltage,
scenarios may arise in which the flash memory is the only part of the circuit whose
operating requirements are not met. In such cases, applications can expect normal
operation when they are not performing flash writes and unpredictable behavior
when they are.
Table 3.1: CPU vs. flash memory voltage requirements
Microcontroller      CPU min. voltage    Flash write min. voltage
TI MSP430 [24]       1.8V                2.2V or 2.7V
PIC32M [40]          2.3V                3.0V
ATmega128L [53]      2.7V                4.5V
Because embedded flash memory typically shares a common voltage supply with
the CPU (separate power rails are cost prohibitive), a single voltage must be chosen
that satisfies different components with different minimum voltage requirements.
Current embedded systems address the voltage limitations of flash memory in one of
the following ways:
1. A system can choose a high supply voltage sufficient for both reliable writes to
flash memory and reliable CPU operation. This is a common choice for embed-
ded systems with on-chip flash memory, but causes the CPU to consume more
energy than necessary. For example, the TI MSP430F2131 microcontroller [24]
in active mode consumes almost double the power when operating at 2.2V instead
of 1.8V. Its on-board flash memory requires 2.2V for reliable writes.
2. A system can choose a low supply voltage sufficient for CPU operation, but
insufficient for reliable writes to flash memory. This choice allows the energy
source to last longer and for the CPU to compute more efficiently. An example
of such a system is the Intel WISP [48], a batteryless RFID tag that sets
its operating voltage to 1.8V—below its onboard flash memory’s 2.2V specified
minimum—to save power. Flash memory cannot be written on this device. The
microcontroller could use a low-power wireless interface (e.g., RF backscatter)
to store data remotely. Such an approach, however, raises privacy as well as
performance concerns [46].
3. A system can modify hardware to enable dynamic voltage scaling. This ap-
proach requires additional analog circuitry such as voltage regulators and GPIO-
controlled switches. Because many embedded systems are extremely cost sen-
sitive, this choice is unattractive for high-volume manufacturing with low per-
unit profit margins. An additional 50 cent part on a thermostat control can be
cost prohibitive. Moreover, small changes may necessitate a new PCB layout—
upsetting the delicate supply chain and invalidating stocked inventories of al-
ready fabricated PCBs.
Approach: Our approach reduces the operating voltage of the microcontroller
to a point at which the resulting power savings of the CPU portion of the workload
exceed the power cost of the algorithms for ensuring reliable writes (Figure 3.1). Our
low-power storage scheme benefits from the accumulative property of flash memory
by repeating writes to the same cell. Each write operation will increase the chance
of success by forcing some number of state transitions. That is, a failed write is
still progress. The technique requires minimal or no hardware modification and also
allows for RFID-scale and small-scale energy harvesting devices to better exploit
capacitors as power supplies. The capacitor provides finite energy and therefore the
voltage decays exponentially. The long tail of the curve provides insufficient voltage
for conventional writes to flash memory, but it is sufficient for reliable storage with
our techniques.
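To make the "long tail" argument concrete, the sketch below assumes an ideal capacitor discharging through a constant resistive load, so that V(t) = V0 · exp(−t / RC). This is a simplification (a real microcontroller is not a constant resistance), and the component values and the helper name `time_to_reach` are hypothetical, chosen only for illustration. It estimates how much additional operating time becomes usable for storage when writes remain possible down to 1.8V instead of stopping at 2.2V.

```python
import math

def time_to_reach(v_start: float, v_end: float, r_ohm: float, c_farad: float) -> float:
    """Time in seconds for an ideal RC discharge to fall from v_start to v_end."""
    if not (0 < v_end <= v_start):
        raise ValueError("need 0 < v_end <= v_start")
    return r_ohm * c_farad * math.log(v_start / v_end)

# Hypothetical values, not taken from the experiments in this work.
R_LOAD = 10_000.0   # ohms, assumed constant load
C_STORE = 10e-6     # farads
V_START = 3.0       # volts at the start of the discharge

t_conventional = time_to_reach(V_START, 2.2, R_LOAD, C_STORE)  # writes stop at 2.2V
t_low_voltage = time_to_reach(V_START, 1.8, R_LOAD, C_STORE)   # writes continue to 1.8V
extra = t_low_voltage - t_conventional

print(f"usable time above 2.2V: {t_conventional * 1e3:.1f} ms")
print(f"usable time above 1.8V: {t_low_voltage * 1e3:.1f} ms")
print(f"additional time gained: {extra * 1e3:.1f} ms")
```

Under this model the gain depends only on the product RC and the ratio of the two thresholds, not on the starting voltage.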
Figure 3.1: Operating at a lower voltage and tolerating errors, instead of the
conventional case of choosing the highest minimum voltage requirement, may help decrease
energy consumption. Considering that Energy = voltage^2 × time / resistance,
decreasing voltage decreases the energy consumption quadratically.
Of wits and half-wits: In 1982, Rivest and Shamir introduced the notion of
write-once bits (Wits) in the context of coding theory to make write-once storage
behave like read-write storage [45]. Bits in flash memory behave like wits because a
programmed bit cannot be reprogrammed without invoking an energy-intensive erase
operation on a block of memory much larger than the unit of a single write. We coin
the term Half-Wits to refer to wits used in a manner inconsistent with a manufacturer’s
specifications, resulting in stochastic behavior. Half-Wits in this work are wits of flash
memory used below the recommended supply voltage.
In examining error rates at low voltage and constructing a system that provides
reliable storage despite errors, our work suggests that it is appropriate to relax pre-
viously assumed constraints and reexamine the costly digital abstractions layered
above on-chip flash memory.
3.2 Behavior of Storage on Half-Wits
Before we can design effective coding algorithms, we must first understand the
behavior of errors on Half-Wits. By tolerating a lower voltage, an energy-limited
embedded device can decrease its power consumption and therefore extend its lifetime
on a finite energy supply. The minimum operating voltage of embedded devices that
use non-volatile on-chip storage is usually determined by the requirements of flash
memory. For example, the TI MSP430 microcontroller can operate at 1.8V, but its
nominal minimum voltage for flash writing and erasure is 2.2V (Table 3.1). Increasing
operating voltage from 1.8V to 2.2V causes the CPU to draw about 50% more power
without commensurate gain in clock speed because of the voltage squaring effect.
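The "about 50%" figure is consistent with the quadratic relation Energy = voltage^2 × time / resistance quoted in Figure 3.1. A quick check, assuming a fixed resistive load and an unchanged run time (a simplification; `relative_power` is an illustrative helper, not part of the original work):

```python
def relative_power(v_new: float, v_ref: float) -> float:
    """Power at v_new relative to v_ref for a fixed resistive load (P = V^2 / R)."""
    if v_ref <= 0:
        raise ValueError("v_ref must be positive")
    return (v_new / v_ref) ** 2

ratio = relative_power(2.2, 1.8)
print(f"power at 2.2V relative to 1.8V: {ratio:.2f}x")  # 1.49x
```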
The drawback of lowering voltage below flash memory requirements in order
to save power is the extra work necessary to ensure reliable writes to flash memory.
Figure 3.2 shows the result of running an MSP430F2131 at three different voltages (all
lower than the nominal minimum for flash writes) to store electrocardiogram (ECG)
data samples from the PhysioNet database [21] in flash memory. Many medical sensor
networks [38, 52] that provide ECG measurements are energy-limited and use on-chip
flash memory as primary storage.
Figure 3.2: As operating voltage decreases, flash-write errors increase. Panels:
(a) writes at 2.0V; (b) writes at 1.9V; (c) writes at 1.8V. Panel (a) shows
an original ECG signal correctly stored at 2.0V (despite operating below the
recommended threshold). As the voltage decreases in (b) and further in (c), erroneous
writes (light-colored spikes, height varying according to the magnitude of the error)
become more common. The black line shows the reconstructed signal that includes
the errors.
These graphs support the intuition that flash writes may not be error-free at low
voltages and that there exist voltage levels below the minimum recommended voltage
at which flash writes function correctly. To investigate the behavior of flash memory
at low voltage and determine the factors influencing the error rate, we performed
experiments on an automated testbed designed by Salajegheh [47].
3.2.1 Experimental Methodology
We use a consistent experimental setup for all of the experiments in this work.
Our choice of test platform is a TI MSP430 microcontroller with on-chip flash mem-
ory. More specifically, we tested two types of TI microcontrollers: MSP430F2131 and
MSP430F1232. The MSP430 is common in low-power embedded applications; we
note especially its use in sensor motes [43] and RFID-scale batteryless devices [48].
In our setup, an MSP430 microcontroller runs a test program that involves both
computation and flash operation. We power the microcontroller with an external
power supply held steady at a voltage below the nominal minimum for flash writes.
An external chip captures the contents of flash memory after each experiment.
To automate the testing of flash write behavior, we use a flash memory testbed
designed by Salajegheh [47]. The two major components of the testbed are a test
platform and a connected monitoring platform. The monitoring platform is based
on an additional MSP430 microcontroller. The test platform runs a test program at
low voltage. When the test program completes, the test platform sends the result
of the experiment to the monitoring chip via GPIO pins. The test and monitoring
platforms share 8+1 GPIO pins to carry one byte of data and a clock signal. Once the
test platform puts data on its eight data pins, it raises the clock pin. The monitoring
chip reads data from its GPIO pins whenever it detects a rising clock signal and logs
the results in its own flash memory. The monitoring chip runs at a voltage above
the nominal minimum for its own flash writes, thereby storing reliably.
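The byte-wide transfer between the two platforms can be sketched as a small simulation. This is only a toy model of the handshake described above: the class and function names are hypothetical, and the step that returns the clock to low before the next byte is an assumption (the text only states that the clock pin is raised once the data is on the pins).

```python
from typing import List

class MonitorChip:
    """Toy model of the monitoring platform: samples 8 data pins on a rising clock edge."""

    def __init__(self) -> None:
        self.log: List[int] = []
        self._last_clock = 0

    def on_pins(self, data_pins: int, clock: int) -> None:
        # Sample only on a 0 -> 1 transition of the clock pin.
        if self._last_clock == 0 and clock == 1:
            self.log.append(data_pins & 0xFF)
        self._last_clock = clock

def send_bytes(payload: bytes, monitor: MonitorChip) -> None:
    """Toy model of the test platform: put a byte on the data pins, then raise the clock."""
    for value in payload:
        monitor.on_pins(value, 0)  # data is stable, clock still low
        monitor.on_pins(value, 1)  # rising edge: the monitor samples the byte
        monitor.on_pins(value, 0)  # assumed: clock returns low before the next byte

monitor = MonitorChip()
send_bytes(bytes(range(256)), monitor)
print(len(monitor.log), "bytes logged")
```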
3.2.2 Unreliable, Low-Voltage Flash Memory Writes
The TI MSP430 datasheet [24] states that flash writes at any voltage lower than
the nominal minimum voltage (which is 2.2V in the case of MSP430F2131) are not
guaranteed to succeed. However, as the graphs in Figure 3.2 show, not all flash
writes fail at low voltages. On the contrary, in this specific experiment, most of the
writes (95.24% at 1.9V and 89.88% at 1.8V) succeed.
In a NOR flash memory, all cells are initialized to 1, and writing data to a byte of
flash memory means setting an appropriate number of bits to 0 by applying electrical
charge to the corresponding flash cells. At low voltage, there may be insufficient
charge to effect a transition to 0, and a flash write may store fewer 0 bits than
requested [42]. To be specific, we define errors as follows: when a byte of data d1 is
written in a flash memory address and then data d2 is read from that address, there
is an error if d1 ≠ d2. An experiment, explained next, investigates the behavior of
low-voltage flash memory and gives bit-level results.
Using the automated flash testbed explained in Section 3.2.1, the test platform
runs a program that writes numbers {0, · · · , 255} to flash memory, then sends the
contents of its flash memory to the monitoring platform via GPIO pins. Table 3.2
compares the written data and the intended data for cases in which errors occurred.
It demonstrates that, when both are represented as integers, the absolute value of
the stored data is always greater than or equal to the absolute value of the intended
data.
Table 3.2: Erroneous flash writes at low voltage. Insufficient electrical charge may
result in some bits failing to transition from 1 (the initial state) to 0.
Intended (binary)    00001100  00001101  00001110  00010100  00100111  10100100
Written (binary)     11101101  01011111  11111111  11111111  00101111  10101111
Hamming distance     4         3         5         6         1         3
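The entries of Table 3.2 can be checked mechanically. The short sketch below (helper names are illustrative) recomputes the Hamming distances and confirms that every error is a bit that stayed at 1 instead of becoming 0, which is why the stored value is never smaller than the intended one.

```python
INTENDED = ["00001100", "00001101", "00001110", "00010100", "00100111", "10100100"]
WRITTEN = ["11101101", "01011111", "11111111", "11111111", "00101111", "10101111"]

def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

def only_failed_zeros(intended: int, written: int) -> bool:
    """True if every 1 in `intended` is also 1 in `written` (no 1 was turned into 0 by mistake)."""
    return (intended & written) == intended

rows = [(int(i, 2), int(w, 2)) for i, w in zip(INTENDED, WRITTEN)]
distances = [hamming_distance(i, w) for i, w in rows]
print(distances)  # [4, 3, 5, 6, 1, 3], matching the table
print(all(only_failed_zeros(i, w) for i, w in rows))  # True
```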
3.2.3 Determining Factors That Affect Error Rates
We consider the following potential factors that may affect the error rate of setting
a bit to 0 in a flash memory at low voltage: voltage level, Hamming weight of the
data, wear-out history, permutation of 0s, and neighbor cells. The effects of each of
these variables are evaluated by designing an experiment to test a hypothesis. All
the experiments are performed on flash memories with minimal previous usage unless
stated otherwise.
Voltage level: Our hypothesis is that the lower a chip’s operating voltage (and
that of its on-chip flash memory), the higher the error rate of flash writes. Figure 3.3
confirms this hypothesis; moreover, the graph shows that for different chips of exactly
the same type, the error rate can be different even under equivalent voltages.
Experiment: Two MSP430F2131 and two MSP430F1232 microcontrollers run a
program that writes zeros to the data segment of their flash memory. We increased
the microcontroller’s operating voltage in 10-mV steps, and used the monitoring
platform to compute the byte error rates over 50 runs.
Figure 3.3: Flash write error rates decrease as voltage increases. This trend holds
for all the chips (MSP430F2131 and MSP430F1232) we tested, though error rates
differ even between chips of the same model.
Hamming weight: In an erased (i.e., having value 1) flash cell, writing a 1 is
always error-free because no change to the cell is necessary. However, setting a cell
to 0 might fail if there is not enough charge accumulated in that cell. Our hypothesis
is that the lower the Hamming weight (number of 1s in the binary representation)
of a number, the higher the probability of error when writing that number to flash
at low voltage.
Based on per-byte Hamming weight, there are nine equivalence classes of integers
that can be represented in one byte. The weight-8 equivalence class has only one
member, 255, which can always be written to an erased flash cell without error.
The other extreme case is the weight-0 equivalence class, containing only the value 0,
which requires all eight bits to transition to 0. Figure 3.4 shows the byte error rate for all
nine equivalence classes, measured in the following experiment.
Experiment: At 1.84V, an MSP430F2131 runs a program that writes numbers
from the same equivalence class to one block (64 bytes) of flash memory. We used
the monitoring platform to compute the average byte error rate of flash writes for
each of the nine equivalence classes over 50 runs.
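The nine Hamming-weight equivalence classes used in this experiment can be enumerated directly. The sketch below is illustrative only; it groups all byte values by their number of 1 bits and checks that the class of weight w has C(8, w) members.

```python
from collections import defaultdict
from math import comb

# Map each Hamming weight (0..8) to the byte values that have that many 1 bits.
classes = defaultdict(list)
for value in range(256):
    classes[bin(value).count("1")].append(value)

for weight in range(9):
    members = classes[weight]
    assert len(members) == comb(8, weight)
    print(f"weight {weight}: {len(members)} values, smallest member {members[0]}")
```

The weight-8 class contains only 255 and the weight-0 class contains only 0, as stated above.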
Corollary: To exploit the fact that the Hamming weight of a number affects error
rate when written to flash, one can transform numbers into numbers with greater
Hamming weights before writing them to flash memory.
Figure 3.4: As the Hamming weight (number of 1s in the binary representation)
of a number increases, the error rate of low-voltage flash writes declines. The data
correspond to an MSP430F2131 running at 1.84V.
Wear-out history: Flash memory has a limited lifetime (about 10^5 erase
cycles), after which the erase operations fail to reset the bits to 1 reliably. We
suspect that the more a flash memory is erased (worn out), the lower its error rate
of setting bits to 0 becomes. This counterintuitive hypothesis is consistent
with the notion that flash erasures (setting bits to 1) become harder with wear-out.
Figure 3.5 shows a heat map of bit error rate for three blocks of flash memory (192
bytes) on an MSP430F2131 microprocessor. Lighter colors in the heat map represent
higher error rates. The disproportionately dark color of the middle block is due to
more frequent erasure of that block compared to the other two blocks.
Experiment: An MSP430F2131 runs a program that writes zeros to all three blocks
of its flash memory. The MSP430 is first worn out such that one block has undergone
6000 write/erase cycles and two blocks have minimal previous usage. We used the
monitoring platform to compute the average error rate for all bits in the three blocks of
memory over 50 trials.
Corollary: Wear-out history affects error rate, so storing data in more than one
location might decrease the error rate, especially if those locations are in different
blocks of memory.
Figure 3.5: Worn-out flash memory blocks are biased toward ease of writing zeros.
Lighter color represents a higher average number of errors over 50 trials. The middle
block has undergone 6000 write/erase cycles. The other two blocks are minimally
used.
Permutation of 0s: Two numbers belonging to the same Hamming weight
equivalence class can have different permutations of 0 bits. We tested to see how
the error rate depends on the permutation of 0s in one byte of data. For example,
the numbers 240, 15, 170, and 71 all have four 0s in their binary representation but
in different places (240 has 0s in the right nibble, and 15 has all of its 0s in its left
nibble, etc.). The result of the experiment shows a similar byte error rate with mean
of 39.85 ± 4.29% for numbers in the same equivalence class. The small standard
deviation (4.29%) shows that the permutation of 0s does not significantly affect the
error rate and therefore we do not consider this to be a factor in our design directions.
Experiment: An MSP430F2131 runs a program that cycles through eight numbers
from the same Hamming-weight equivalence class, writing them to 192 consecutive
bytes of flash memory. We used the monitoring platform to compute the average
error rates for each of the 192 bytes over 50 trials.
Neighbor cells: Another factor that might affect the error rate of storage in
a flash cell at low voltage is the values of neighboring cells. However, our results
suggest that a cell’s error rate does not appear to depend on the values stored in
neighboring cells (Figure 3.6).
Experiment: In order to determine if the error rate of a cell is affected by its
neighbor, we consider all numbers from the same Hamming-weight equivalence class
whose two Least Significant Bits (LSBs) are set to either 00 (case 1) or 10 (case 2).
An example of case 1 is number 60 (0b00111100) and an example of case 2 is number
30 (0b00011110). This experiment fixes the Hamming weight variable and changes
the neighbor value of the LSB to be 0 or 1. We deem a write erroneous if the LSB
is not set to 0. The experiment was done for a Hamming weight of four and it was
repeated for five voltage levels in the interval of 1.82V to 1.84V with steps of 5mV.
The error rate for any voltage above 1.84V was close to 0% and for any voltage below
1.82V was close to 100%. We used the monitoring platform to compute the average
error rates of case 1 and case 2 for each of the voltage levels over 50 trials.
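The two groups of test values in this experiment can be generated as follows. This is a sketch with illustrative names; it selects the weight-four bytes whose two least significant bits are 00 (case 1) or 10 (case 2), and encodes the rule that a write counts as erroneous when the LSB stays at 1.

```python
def weight(value: int) -> int:
    """Hamming weight: number of 1 bits in the byte."""
    return bin(value).count("1")

WEIGHT = 4
# Case 1: two LSBs are 00. Case 2: two LSBs are 10 (the LSB's neighbor is 1).
case1 = [v for v in range(256) if weight(v) == WEIGHT and (v & 0b11) == 0b00]
case2 = [v for v in range(256) if weight(v) == WEIGHT and (v & 0b11) == 0b10]

def lsb_write_failed(stored: int) -> bool:
    """A write is deemed erroneous in this experiment if the LSB was not set to 0."""
    return (stored & 1) == 1

print(len(case1), len(case2))
print(60 in case1, 30 in case2)  # True True, the two examples from the text
```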
Figure 3.6: Error rate of a cell is not noticeably influenced by the value of its neighbor.
The graph shows that the value of the second LSB does not greatly affect the error
rate of the LSB. The bars show the error rate of the LSB for writing numbers from
the same Hamming-weight equivalence class whose two LSBs are set to either 00
(dark bars) or to 10 (light bars).
3.2.4 Accumulative Memory Behavior
It is helpful to understand a few details of the electrical nature of flash memory in
order to appreciate the expected behavior of conventional digital abstractions when
layered above embedded flash memory. Each flash memory cell is a floating-gate
(FG) transistor made up of a source, drain, control gate, and floating gate. The
floating gate is separated from the source and drain by an insulating oxide layer that
makes it difficult for electrons to travel into or out of the gate. Flash cells rely on
this oxide to maintain logical state in the absence of power, making the memory
non-volatile [42].
To write a memory cell (which has an erased value of 1), the control circuitry
applies a high field to the source. The application of this field greatly increases the
probability that electrons in the floating gate will tunnel to the source. If a sufficient
number of electrons tunnel to the source, the cell is subsequently read as a 0. To
erase a cell (that is, to restore a 1), the control circuitry applies a high field to both
the source and drain. This field energizes the electrons currently stored near the
source, allowing them to jump the oxide barrier to the floating gate where they are
once again trapped [42].
Not all electrons must transit in order for a write or erase operation to be success-
ful. The operation only needs to change the state of some majority of the electrons
so that subsequent read operations detect sufficient charge to discern the intended
value. Lowering the applied voltage (and thus the field strength) lowers the proba-
bility of state change for each electron but, as noted earlier, electrons that do transit
will remain in place.
A low-power storage scheme can benefit from this accumulative property by re-
peating writes to the same cell. Each write operation will increase the chance of
success by forcing some number of state transitions. In other words, a failed write is
still progress.
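A rough feel for why repetition helps can be obtained from a deliberately simple model. The sketch below assumes that each write attempt programs a still-erased bit independently with probability `p_bit` and that programmed bits never revert; the actual accumulation of charge across attempts is not modeled, and the numbers are hypothetical rather than measured.

```python
def byte_success_probability(p_bit: float, zeros: int, attempts: int) -> float:
    """Probability that all `zeros` bits read as 0 after `attempts` writes.

    Assumption: each attempt programs a still-erased bit independently with
    probability p_bit, and a programmed bit stays programmed.
    """
    if not 0.0 <= p_bit <= 1.0:
        raise ValueError("p_bit must be a probability")
    per_bit = 1.0 - (1.0 - p_bit) ** attempts
    return per_bit ** zeros

# Hypothetical per-bit success probability of 0.7 for a byte with eight 0 bits.
for k in (1, 2, 4, 8):
    print(k, round(byte_success_probability(0.7, zeros=8, attempts=k), 4))
```

In this model the success probability never decreases with additional attempts, which mirrors the statement that a failed write is still progress.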
3.3 Design of a Low-Voltage Storage System
This section presents our design for a software system that enables reliable flash
memory writes at low voltage. We first present a model that captures the basic
characteristics and behavior of flash memory. We then set design goals for the model
under consideration. We introduce three methods for reliable flash storage, which we
refer to as in-place writes, multiple-place writes, and RS-Berger codes. Each method
aims to meet our design goals for reliable non-volatile storage.
3.3.1 Modeling Low-Voltage Flash Memory
A NOR flash memory has a set of n cells that are initially set to 1. We represent
the state of the cells by c1, · · · , cn; the value of ci can be 0 or 1. A cell can be set
to 0 using a write operation. The 1 → 0 transition might fail at low voltage, while
the 1 → 1 transition always succeeds. Flash memory at low voltage, where errors occur
only in one direction, can be modeled as a Z-channel. Flash memory is a write-
once memory [45] and once a cell is set to 0 (i.e., once it is programmed), it cannot
be changed back to 1 without using an erase operation. In flash memory, cells are
organized by blocks, and an erase operation resets an entire block of cells. Block
erasures are costly in terms of time and energy and they cause wear to flash cells.
Operations: There are two operations in this model: (1) An update operation
that changes a subset of cells to 0 to represent a value, and (2) A decoding operation
that maps cell states (i.e., memory state) to a value. Updating a variable means
changing the values of c1, · · · , cn to c′1, · · · , c′n. Assuming that no erase operation
occurs, and therefore no bits are reset to 1 after being set to 0, we have ∀i ∈
{1, · · · , n}, ci ≥ c′i after an update. If the update operation is performed when
operating voltage is below the nominal minimum required for flash memory, the
update operation may not be error-free.
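The model can be written down in a few lines. The sketch below is an illustration under stated assumptions: cells are packed into an integer, each requested 1 → 0 transition fails independently with probability `p_fail` (a hypothetical parameter), and the function names are not from the original work. `is_valid_update` checks the constraint ci ≥ c′i for every cell.

```python
import random

def is_valid_update(old: int, new: int) -> bool:
    """c_i >= c'_i for every cell: no bit may go from 0 back to 1 without an erase."""
    return (new & ~old) == 0

def z_channel_write(old: int, data: int, p_fail: float,
                    rng: random.Random, width: int = 8) -> int:
    """Program `data` over `old`; each requested 1 -> 0 transition fails with probability p_fail."""
    new = old
    for bit in range(width):
        mask = 1 << bit
        wants_zero = (data & mask) == 0
        if wants_zero and (old & mask) and rng.random() >= p_fail:
            new &= ~mask  # the transition succeeded
    return new

rng = random.Random(0)
stored = z_channel_write(0xFF, 0b00001100, p_fail=0.3, rng=rng)
print(f"{stored:08b}", is_valid_update(0xFF, stored))
```

Because failures only leave bits at 1, the stored value always contains every 1 of the intended value, which is the Z-channel behavior described above.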
3.3.2 Design Goals
Our storage techniques, which aim to provide reliable storage for low-power de-
vices, are designed with the following metrics in mind:
• Error rate: The first and foremost design goal is to minimize the error rate to
provide applications with reliable non-volatile storage.
• Energy consumption: The energy consumed to achieve an acceptably low error
rate should not exceed the expected energy savings gained by running at a
lower voltage.
• Delay: We define delay as the difference between the execution time to store
data reliably at a low voltage and to store the same data at a high voltage.
The delay caused by the storage technique should be reasonably small.
3.3.3 Proposed Methods
Toward the design goals discussed previously, we propose methods to deal with
errors caused by using flash memory at low voltage.
3.3.3.1 In-Place Writes
Since the transition of a 1 to a 0 in a NOR flash memory at low voltage is
stochastic rather than guaranteed, the in-place writes method repeats the write of
each byte (to the same memory location) more than once if an error occurs, up to a
threshold number of attempts. Algorithm 1 gives the details of the Encode and Decode
procedures for in-place writes. The in-place writes method makes an attempt to write a byte
into memory, reads that memory address, and if the read result does not match the
attempted write value, the algorithm makes another attempt to write that value to
the same memory address. The write attempts can be controlled using the threshold.
The reason in-place writes decrease the error rate is that, as explained in Sec-
tion 3.2.4, each write attempt in the same memory location increases the accumu-
lated charge and therefore raises the probability of storing the intended bit sequence
successfully.
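The in-place writes procedure can be exercised against a simulated flash. The following is a minimal sketch, not the implementation evaluated in this work: `SimFlash` is a hypothetical toy model in which each requested 1 → 0 transition fails independently with probability `p_fail`, and bits that have been programmed stay programmed.

```python
import random

class SimFlash:
    """Toy NOR flash: bytes start erased (0xFF); writes can only clear bits and may fail."""

    def __init__(self, size: int, p_fail: float, seed: int = 0) -> None:
        self.cells = [0xFF] * size
        self.p_fail = p_fail
        self.rng = random.Random(seed)

    def write(self, data: int, address: int) -> None:
        for bit in range(8):
            mask = 1 << bit
            if (data & mask) == 0 and self.rng.random() >= self.p_fail:
                self.cells[address] &= ~mask & 0xFF  # transition to 0 succeeded

    def read(self, address: int) -> int:
        return self.cells[address]

def in_place_encode(flash: SimFlash, data: int, address: int, threshold: int) -> int:
    """Write `data`, rewriting the same address until it reads back correctly or
    `threshold` attempts have been made. Returns the number of attempts used."""
    flash.write(data, address)
    attempts = 1
    while flash.read(address) != data and attempts < threshold:
        flash.write(data, address)
        attempts += 1
    return attempts

def in_place_decode(flash: SimFlash, address: int) -> int:
    return flash.read(address)

flash = SimFlash(size=4, p_fail=0.5, seed=1)
used = in_place_encode(flash, 0x0C, address=0, threshold=64)
print(used, f"{in_place_decode(flash, 0):08b}")
```

Returning the attempt count is a design choice of this sketch; it makes the cost of reliability visible, which matters for the energy and delay goals listed in Section 3.3.2.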
3.3.3.2 Multiple-Place Writes
Another approach to increase the reliability of flash writes at low voltage is to
write a value to more than one location in flash memory if an error occurs, up to a
threshold number of locations. Later, to retrieve the stored data, the multiple-place writes
method reads the data from the specified address and several other addresses
associated with it, then returns the bitwise AND of all of the stored values. Algorithm 2
details the Encode and Decode procedures of the multiple-place writes method. The
method makes an attempt to write a byte into one memory address,
reads that memory address, and if the read result does not match the attempted
write value, makes another attempt to write that value to a different
memory address. The number of write attempts is bounded by the threshold.

Algorithm 1: The encoding and decoding algorithms for the in-place writes method, which
stores data to address by repeating the write up to a threshold number of attempts
if necessary.
Encode(data, address, threshold)
    Write_To_Flash(data, address)
    result ← Read_From_Flash(address)
    repeat ← 1
    while (result ≠ data) AND (repeat < threshold) do
        Write_To_Flash(data, address)
        result ← Read_From_Flash(address)
        repeat ← repeat + 1
Decode(address)
    result ← Read_From_Flash(address)
    return result

The reason the multiple-place writes approach can decrease the error rate is as
follows: All cells of flash memory are initially set to 1. An error means that writing
a 0 has failed and a bit cell ci has remained untouched (logical 1) although it was
intended to be set to 0. If the write of that cell has succeeded in at least one of the
locations, so that cell ci is 0 there, taking the AND of the values read from all locations
makes ci = 0 in the result. Writing a 1 to a cell does not
cause an error, since it means changing a cell from 1 to 1.
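The AND-based recovery can be sketched as follows. This is an illustration under assumptions, not the evaluated implementation: the flash is a plain list of bytes, each requested 1 → 0 transition fails independently with probability `p_fail`, and all function names are hypothetical. The decoder starts from the erased value 0xFF, so associated locations that were never written do not affect the result.

```python
import random
from typing import List

ERASED = 0xFF

def noisy_write(cells: List[int], data: int, address: int,
                p_fail: float, rng: random.Random) -> None:
    """Clear the requested bits of one byte; each 1 -> 0 transition fails with probability p_fail."""
    for bit in range(8):
        mask = 1 << bit
        if (data & mask) == 0 and rng.random() >= p_fail:
            cells[address] &= ~mask & 0xFF

def multi_place_encode(cells: List[int], data: int, address: int, threshold: int,
                       offset: int, p_fail: float, rng: random.Random) -> None:
    noisy_write(cells, data, address, p_fail, rng)
    result = cells[address]
    repeat = 1
    while result != data and repeat < threshold:
        phy_addr = address + repeat * offset
        noisy_write(cells, data, phy_addr, p_fail, rng)
        result &= cells[phy_addr]  # a bit is 0 if it is 0 in any location so far
        repeat += 1

def multi_place_decode(cells: List[int], address: int, threshold: int, offset: int) -> int:
    result = ERASED  # identity for AND; untouched locations are still 0xFF
    for i in range(threshold):
        result &= cells[address + i * offset]
    return result

rng = random.Random(0)
cells = [ERASED] * 8
multi_place_encode(cells, 0x27, address=0, threshold=8, offset=1, p_fail=0.5, rng=rng)
print(f"{multi_place_decode(cells, 0, 8, 1):08b}")
```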
3.3.3.3 RS-Berger Codes
Our third method to provide reliable flash memory at low voltage involves data
coding. We use the concatenation of Reed-Solomon [44] and Berger [6] codes, which
we call RS-Berger codes, to detect and correct errors at read time (Algorithm 3).
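For readers unfamiliar with Berger codes, the sketch below shows the standard construction in which the check symbol is the number of 0 bits in the information word. The function names and the 8-bit word width are illustrative choices, not details taken from this work. With errors that only leave bits at 1, the number of 0s in the data can only shrink while the stored check value can only grow, so any corruption of the data produces a mismatch.

```python
def berger_check(word: int, width: int = 8) -> int:
    """Berger check symbol: the number of 0 bits in the `width`-bit information word."""
    return width - bin(word & ((1 << width) - 1)).count("1")

def berger_ok(word: int, check: int, width: int = 8) -> bool:
    """True if the stored check symbol matches the stored word."""
    return berger_check(word, width) == check

word = 0b00001100
check = berger_check(word)           # six 0 bits
corrupted = 0b01001101               # two intended 0s stayed at 1
print(check, berger_ok(word, check), berger_ok(corrupted, check))  # 6 True False
```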
Algorithm 2: The encoding and decoding algorithms for the multiple-place writes method,
which stores data to address by repeating the writes at up to threshold locations if necessary.
The distance between consecutive associated locations is offset.
Encode(data, address, threshold, offset)
    Write_To_Flash(data, address)
    result ← Read_From_Flash(address)
    repeat ← 1
    while (result ≠ data) AND (repeat < threshold) do
        phy_addr ← address + (repeat × offset)
        Write_To_Flash(data, phy_addr)
        n_result ← Read_From_Flash(phy_addr)
        result ← result & n_result
        repeat ← repeat + 1
Decode(address, threshold, offset)
    result ← all 1s (the erased value)
    for i ← 0 to (threshold − 1) do
        phy_addr ← address + (i × offset)
        n_result ← Read_From_Flash(phy_addr)
        result ← result & n_result
    return result
Reed-Solomon (RS) codes are widely used error-correcting codes that can correct twice as
many erasures as errors. An RS code has three parameters $(n, k, d)$: $n$ is the total
number of symbols in a codeword, $k$ is the number of information symbols in a
codeword, and $d$ is the minimum Hamming distance between any two codewords in the
codebook. The parameters satisfy $d = n - k + 1$.
An $(n, k, d)$-RS code can correct up to $\lfloor (n-k)/2 \rfloor$ errors and up to $n - k$
erasures. Therefore, if the locations of the errors are known, the RS code's
error-correcting capacity is improved twofold.
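As a quick numerical illustration of these relations, the sketch below evaluates them for a code with 38 symbols of which 32 carry information, the codeword size used later in Section 3.4.1 (treating each byte as one symbol is an assumption here). The helper names are hypothetical. The mixed condition in `can_decode` is the standard bound for RS codes: a pattern of errors and erasures is correctable when twice the number of errors plus the number of erasures does not exceed $n - k$.

```python
def rs_capabilities(n, k):
    """Minimum distance and correction limits of an (n, k, d) RS code."""
    return {
        "d": n - k + 1,
        "max_errors": (n - k) // 2,   # error locations unknown
        "max_erasures": n - k,        # error locations known
    }

def can_decode(n, k, errors, erasures):
    """Standard RS bound for a mix of errors and erasures."""
    return 2 * errors + erasures <= n - k

caps = rs_capabilities(38, 32)
print(caps)
```

With six parity symbols, three errors at unknown positions already exhaust the code, whereas six symbols flagged as erasures can still be recovered.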
To detect the location of errors and therefore to improve the efficiency of the RS
code, we use a Berger code, an error-detecting code that can detect all asymmetric
errors [6]. As previously mentioned (Section 3.3.1), flash memory at low voltage can
Algorithm 3 The encoding and decoding algorithms for the RS-Berger codes writes
method. t is the maximum number of erasures the RS code can correct.
Encode(data_1,...,N, n)
    for i ← 1 to N do
        CW_i ← RS_Encode(data_i, n)
        Write_To_Flash(CW_i, address_i)
    for i ← 1 to n do
        for j ← 1 to N do
            sym_i,j ← CW_j(i)
        chk_i ← Berger_Encode(sym_i,(1,..,N))
        Write_To_Flash(chk_i, address_(N+1) + i − 1)
Decode(addr_1,...,(N+1), n, t)
    for i ← 1 to n do
        chk_i ← Read_From_Flash(addr_(N+1) + i − 1)
    for i ← 1 to N do
        CW_i ← Read_From_Flash(addr_i)
        for j ← 1 to n do
            sym_j,i ← CW_i(j)
    errors ← {}
    for i ← 1 to n do
        err ← Berger_Decode(sym_i,(1,..,N), chk_i)
        if err = 0 then
            errors ← errors ∪ {i}
    if |errors| ≤ t then
        for i ← 1 to N do
            result_i ← RS_Decode(CW_i, errors)
        return result
    else
        return “fail to correct errors”
be modeled as a Z-channel, for which a Berger code is suitable. A Berger codeword
consists of two parts: $k$ information bits and $\lceil \log_2(k+1) \rceil$ check bits. The check
bits of the Berger codeword represent the number of zeros in the $k$ information bits.
A Berger code can detect any number of zero-to-one errors, as long as no one-to-zero
errors occur in the same codeword. As a particular example, consider the case
$k = 6$ with information bits 010010. They contain four zeros; therefore, the
check bits are 100, and the valid codeword is 010010 100. To detect errors, we
compare the number of zeros in the information bits with the
binary number stored in the check bits. If they are equal, no error has occurred; otherwise,
errors are detected in the codeword.
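The example above can be reproduced with a short sketch. The function names are hypothetical, and bit strings are used only for readability. For $k \geq 1$, the integer `k.bit_length()` equals $\lceil \log_2(k+1) \rceil$, which avoids floating-point logarithms.

```python
def berger_encode(info_bits):
    """Append check bits that store the number of zeros among the information bits."""
    k = len(info_bits)
    r = k.bit_length()                      # equals ceil(log2(k + 1)) for k >= 1
    zeros = info_bits.count("0")
    return info_bits + format(zeros, "0{}b".format(r))

def berger_check(codeword, k):
    """Return True when the zero count matches the stored check value."""
    info, check = codeword[:k], codeword[k:]
    return info.count("0") == int(check, 2)

codeword = berger_encode("010010")          # the k = 6 example from the text
corrupted = "011010" + codeword[6:]         # one 0 failed to program and stayed 1
```

A zero-to-one error in the information bits lowers the zero count, while a zero-to-one error in the check bits raises the stored value, so the two can never be brought back into agreement by errors of this single direction.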
We use an (N + 1) × n matrix to represent RS-Berger codes (Figure 3.7). This
matrix has N RS codewords, each of which has n symbols. Each symbol (m bits) is
filled in one entry of the matrix. Each column of the matrix, consisting of m × N
bits, supplies the information bits for one Berger code block. After Berger encoding,
the (N + 1)th row records the check bits for the Berger codes.
Figure 3.7: Structure of input/output sequence of Berger code.
Figure 3.8 shows how the data are encoded and decoded using our RS-Berger
code. When encoding the data, we first use the RS code to generate N codewords (rows
of the matrix) and then apply a Berger code to compute the check bits over the
symbols at the same position in all codewords (each column of the matrix). When
decoding data, we first use the Berger decoder to check whether each column is
erroneous. If one entry in the column is erroneous, we treat all the symbols in the
column as erasures; otherwise, all the symbols in the column are considered correct.
Then, once the error locations are known, we apply RS decoding to correct the
erroneous sequences row by row.
Figure 3.8: A diagram representing the RS-Berger code. An RS-Berger code is the
concatenation of the Reed Solomon code and a Berger code.
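The column-wise detection step can be sketched as follows. This is an illustrative sketch only: the symbol values are made up, the check values are kept as plain integers and assumed to be read back correctly, and the RS decoding that would consume the flagged columns is not shown.

```python
def zero_count(column, m):
    """Berger check value of one column: zero bits over its m-bit symbols."""
    return sum(m - bin(sym).count("1") for sym in column)

def column_checks(codewords, m):
    """One check value per column of the N x n symbol matrix."""
    n = len(codewords[0])
    return [zero_count([cw[j] for cw in codewords], m) for j in range(n)]

def erased_columns(read_codewords, read_checks, m):
    """Columns whose recomputed zero count disagrees with the stored check."""
    fresh = column_checks(read_codewords, m)
    return [j for j in range(len(fresh)) if fresh[j] != read_checks[j]]

# N = 3 hypothetical codewords with n = 4 symbols of m = 8 bits each.
stored = [[0x12, 0x00, 0xF0, 0x7E],
          [0x34, 0xA5, 0x0F, 0x01],
          [0x56, 0xFF, 0x3C, 0x80]]
checks = column_checks(stored, m=8)

read_back = [row[:] for row in stored]
read_back[1][2] |= 0x10                     # a 0 that stayed 1 in column 2
erasures = erased_columns(read_back, checks, m=8)
```

Every symbol in a flagged column is then handed to the RS decoder as an erasure, which is what lets each row use its full erasure-correcting capability.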
3.4 Evaluation
Our storage techniques are designed for the resource limitations of low-power
devices. In this section, we first evaluate the suitability of the three methods proposed
in Section 3.3.3 for low-power devices; we then evaluate the hypothesis that for
CPU-bound workloads, operating at low voltage and managing errors is more energy
efficient than fixing the operating voltage to the maximum of all the components’
nominal minimum voltages.
Summary of results: For a sensor monitoring application that reads 256 data
samples from flash memory, aggregates data, and stores the results in flash memory,
use of in-place writes at 1.8V reduces the energy consumption up to 34% versus
running the same application at 2.2V (minimum voltage requirement for the flash
memory). This sensing application models a common workload for both wireless
sensor nodes and RFID-scale devices.
3.4.1 Comparison of the Proposed Storage Methods
The maximum number of write attempts for both the in-place writes and multiple-place
writes methods was set to two. The RS-Berger codes used three codewords
of size 38 bytes (32 bytes data and 6 bytes parity). These settings enable all three
methods to fit their data in 192 bytes of flash memory. Table 3.3 shows the energy
consumption and time taken for the same workload under each method. Both in-
place writes and multiple-place writes consume less energy and finish more quickly
at 1.9V than at 1.8V. Both of these methods are feedback-based and repeat writes
if they detect errors. Because there is a lower chance of error at 1.9V, fewer rewrites
are required than at 1.8V, so less energy and time are required.
The in-place writes method slightly outperforms the multiple-place writes method
at both voltage levels because its decoding procedure is less CPU-intensive. The RS-
Berger codes method has the best Error Correction Rate (ECR in Table 3.3) of all.
The multiple-place writes method seems to be the most suitable when there are
some memory cells that are hard to program and therefore rewriting in those cells
is not helpful (Figure 3.5 gives an example of such a case). Compared to RS-Berger
codes which always guarantee that a certain number of errors can be corrected,
the in-place writes and multiple-place writes methods are less reliable—they offer
no such guarantees. Therefore, for applications with a hard reliability requirement,
RS-Berger codes may be more suitable if the application knows the error rate in
advance and is willing to incur extra computational costs for RS-Berger encoding
and decoding.
Table 3.3: Performance comparison of the proposed methods at 1.8V and 1.9V. Error
Correction Rate (ECR) shows the effectiveness of methods.
Method      Voltage (V)   Time (ms)   E (µJ)   ECR
In-place    1.8           24.16       59       96%
M-place     1.8           25.00       63       84%
RS-Berger   1.8           334.45      160      100%
In-place    1.9           15.43       38       100%
M-place     1.9           16.85       40       100%
RS-Berger   1.9           334.73      180      100%
Error Correction Rate: As Table 3.3 illustrates, the two methods that do
not use coding—in-place writes and multiple-place writes—incur similar energy con-
sumption costs. We now compare the effectiveness of these two approaches with
respect to the error correction rate.
Figure 3.9 and Figure 3.10 demonstrate that flash storage reliability improves as
we increase the number of repeated writes/places at five different voltage levels (all
below the nominal minimum voltage for flash writes).
Experiment: Using our automated testbed, the test platform runs a program
that writes zeros to 192 consecutive bytes of flash memory (using in-place writes
and multiple-place writes methods in two different experiments). We increase the
maximum number of repeated writes from one to ten, one unit at a time. The
monitoring platform counts the number of incorrectly stored bytes (those that are
not set to zero after the experiment). The experiment was repeated for five different
voltages (1.86V-1.90V).
Figure 3.9: Reliability improvement using in-place writes over five voltages.
Figure 3.10: Reliability improvement using multiple-place writes over five voltages.
3.4.2 Half-Wits Versus Wits in Practice
To evaluate the end-to-end performance of our storage methods, we have tested
a sensor-monitoring application that is CPU-intensive and can benefit from a low-
voltage storage. This application reads from flash memory 256 accelerometer samples
(each ten bits); computes the maximum, minimum, mean, and standard deviation of
the samples; and stores the aggregate information in flash memory. This monitoring
application is a blend of CPU and I/O, but it is still a CPU-intensive workload.
Table 3.4 shows that providing the system with a low-voltage storage mechanism via
our methods helps to decrease the task’s total energy consumption by 34%.
Table 3.4: Energy consumption and execution time for the accelerometer sensor
application. At voltages below the recommended level (1.8V and 1.9V), the in-place writes
method with a threshold of two is used.
Method             In-place 1.8V   In-place 1.9V   Standard 2.2V   Standard 3.0V
Clock rate (MHz)   6               6               8               14
Energy (µJ)        270             300             410             760
Time (ms)          151.15          151.32          113.24          64.72
3.4.3 Finding a Crossover Point
We can empirically find the point at which the energy saved on computation
compensates for the added cost of repeated flash writes. We compare a workload
executed at 2.2V to the same one running at 1.8V using the in-place writes scheme
with the threshold k set to 2. We make the worst-case assumption that all data must
be written to flash twice (i.e., no bits change on the first attempt). The time spent
on flash writes while running at 1.8V is then twice the time spent when operating at
2.2V. We also assume that the clock rate of the system is set to the highest specified
for the CPU at each voltage. Specifically, the clock rate would be set to 6 MHz at
1.8V and to 8 MHz at 2.2V.
We empirically determined the power consumption of the CPU and of flash writes with
1.8V and 2.2V voltage supplies: $P_{C,1.8} = 1.8$ mW, $P_{C,2.2} = 3.4$ mW, $P_{F,1.8} = 3.7$ mW,
and $P_{F,2.2} = 5.8$ mW. The variables $T_C$ and $T_F$ are the time spent in computation and
on flash memory, respectively. With these assumptions, we can write the following
inequality to determine whether a given workload is likely to result in reduced energy
consumption:
\begin{align*}
& Energy_{1.8} \leq Energy_{2.2} \\
\Rightarrow\ & P_{C,1.8} \times T_{C,1.8} + P_{F,1.8} \times k \times T_{F,1.8} \leq P_{C,2.2} \times T_{C,2.2} + P_{F,2.2} \times T_{F,2.2} \\
\Rightarrow\ & P_{C,1.8} \times \frac{8\,\text{MHz}}{6\,\text{MHz}} \times T_{C,2.2} + P_{F,1.8} \times k \times \frac{8\,\text{MHz}}{6\,\text{MHz}} \times T_{F,2.2} \leq P_{C,2.2} \times T_{C,2.2} + P_{F,2.2} \times T_{F,2.2}
\end{align*}
The solution with $k = 2$ is $T_{C,2.2} \geq 4 \times T_{F,2.2}$. Therefore, in-place writes are
competitive with normal flash writes when the time spent on low-voltage operations
like computation is at least four times greater than the time spent on flash writes.
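The crossover ratio follows from collecting the computation terms on one side of the inequality and the flash terms on the other. The sketch below carries out that arithmetic with the measured power values quoted above; the function name and argument names are hypothetical.

```python
def crossover_ratio(p_c_low, p_f_low, p_c_high, p_f_high, k, f_low_mhz, f_high_mhz):
    """Smallest T_C / T_F (both measured at the high voltage) for which the
    low-voltage run with k write attempts uses no more energy."""
    slowdown = f_high_mhz / f_low_mhz            # every operation takes this much longer
    extra_flash = p_f_low * k * slowdown - p_f_high
    cpu_saving = p_c_high - p_c_low * slowdown   # must be positive for a crossover to exist
    return extra_flash / cpu_saving

# Powers in mW at 1.8V and 2.2V, clock rates 6 MHz and 8 MHz, threshold k = 2.
ratio = crossover_ratio(1.8, 3.7, 3.4, 5.8, k=2, f_low_mhz=6, f_high_mhz=8)
print(round(ratio, 2))
```

The computed ratio is slightly above four, which matches the rounded condition stated in the text.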
3.5 Improvements and Alternatives
This section describes several complementary ways to further improve the per-
formance of our schemes.
3.5.1 Sign Bits and Storing Complements
As discussed in Section 3.2.3, one of the major factors influencing the error rate
is the Hamming weight of a number. One way to improve the performance of the
low-voltage storage methods is to store numbers with greater Hamming weights
(weight ≥ 4) in flash memory. If a number is lightweight (weight < 4), the comple-
ment of the number would be stored and a sign bit would be set for future data access.
An array of sign bits can be stored separately from the data to avoid disturbing word
alignment. A previous work [41] uses a similar technique for multi-level cell (MLC)
flash memories with four levels; their techniques result in a significant decrease of
energy consumption. The sign-bit approach involves very lightweight computation
(counting the number of ones) and increases the number of writes by a factor of
one-eighth. Therefore, the effect of this improvement on energy consumption and
delay should be comparatively small.
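A minimal sketch of the sign-bit idea for 8-bit values is given below. The function names are hypothetical, and how the sign bits are packed into a separate array is left out.

```python
def encode_byte(value):
    """Return (stored_byte, sign_bit). Lightweight values (fewer than four
    ones) are stored as their complement, with the sign bit set."""
    if bin(value).count("1") < 4:
        return (~value) & 0xFF, 1
    return value, 0

def decode_byte(stored, sign):
    """Undo the complement when the sign bit is set."""
    return (~stored) & 0xFF if sign else stored

stored, sign = encode_byte(0x01)    # weight 1, so the complement 0xFE is stored
```

Because the complement of a byte with fewer than four ones has more than four ones, every stored byte has Hamming weight at least four, and one sign bit per byte accounts for the one-eighth increase in written bits.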
3.5.2 Memory Mapping Table
To exploit the fact that numbers with greater Hamming weights have a lower
probability of error, we can also map the most frequently used numbers in the user’s
data to the heavier numbers. The solution we suggest is to preprocess the data
to sort numbers based on their frequency of use. A simple memory mapping table
would map the most frequent numbers to the heaviest numbers. Such a table could
be preloaded in flash memory so that storing the table would not consume energy at
run time. Use of a memory mapping table would only increase the number of reads
and would not increase the number of writes. Therefore, the energy consumption
overhead and the delay should be smaller than those of the sign-bit method.
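One possible way to build such a table for byte-sized values is sketched below. The function name, the sample data, and the tie-breaking order are assumptions made for illustration; the text does not prescribe a particular construction.

```python
from collections import Counter

def build_mapping(samples):
    """Map the most frequent byte values to the stored bytes with the
    greatest Hamming weight. Returns (encode, decode) lookup tables."""
    by_freq = [value for value, _ in Counter(samples).most_common()]
    seen = set(by_freq)
    by_freq += [v for v in range(256) if v not in seen]   # unseen values go last
    by_weight = sorted(range(256), key=lambda b: -bin(b).count("1"))
    encode = dict(zip(by_freq, by_weight))
    decode = {stored: value for value, stored in encode.items()}
    return encode, decode

# Hypothetical preprocessing data: 0 is the most frequent value, then 7, then 1.
encode, decode = build_mapping([0, 0, 0, 7, 7, 1])
```

In this run the most frequent value, 0, is stored as 0xFF, so writing it requires programming no cells at all.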
4. CONTENT-ASSISTED FILE DECODING FOR NON-VOLATILE
MEMORIES
Non-volatile memories (NVMs) such as flash memories play a significant role
in meeting the data storage requirements of today’s computation activities. The
rapid increase of storage density for NVMs however brings reliability issues due to
closer alignment of adjacent cells on chip, and more levels that are programmed
into a cell. We propose a new method for error/erasure correction, which uses the
random access capability of NVMs and the redundancy that inherently exists in
information content. Although it is theoretically possible to remove the redundancy
via data compression, existing coding algorithms do not remove all of it for efficient
computation. Our method, named content-assisted decoding, can be combined with
existing storage solutions for text files. Using the statistical properties of words and
phrases in the text of a given language, our decoder identifies the location of each
subcodeword representing some word in a given noisy input codeword, and decodes
the received bit sequence to compute a most likely word sequence. In this work, we
focus on erasure recovery. The decoder can be adapted to work together with
traditional error-correcting code decoders to keep the number of errors after erasure
recovery within the correction capability of traditional ECC decoders. The combined
decoding framework is evaluated with a set of benchmark files.
4.1 Introduction
Non-volatile memories, especially flash memories featuring excellent I/O speed
and decent storage capacity have attracted great attention from the data storage
community. Flash memories are considered one of the most promising candidates for
replacing mechanical hard disks in the near future [42]. Towards this goal, significant
progress has been made for increasing the storage density and the endurance of flash
memories.
However, higher storage density brings important reliability challenges [23].
Existing solutions for reliable storage usually rely solely on error-correcting
codes. To satisfy the high data reliability requirement (the error rate should
be no more than $10^{-20}$ after decoding), the error-correcting codes need many more
parity check bits to reach the required error correction capacity, which conflicts with
the goal of high-density storage. In this chapter, we propose a new method for erasure
correction named content-assisted decoding. Our method uses the fast random access
capability of non-volatile memories and the redundancy that inherently exists in
information content. By looking up dictionaries that store the statistical properties
of words and phrases of the same language, our decoder first finds the locations of the
“space” symbols, then breaks the noisy input codeword into subcodewords, with each
subcodeword corresponding to a set of possible words. The decoder then recovers
the erasures in each noisy subcodeword to select a most likely word sequence as the
correction. The new scheme can be combined with existing storage solutions for text
files and improve the system’s erasure correction capacity. Consider the example in
Figure 4.1.
                            Codeword                               Text
Huffman encoding            (1, 0, 0, 0, 0, 1, 1, 1)               I am
LDPC encoding               (1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0)   I am
Noise received              (1, e, 0, e, 0, e, 1, e, 0, e, 1, e)   ×
LDPC decoding failure       (1, e, 0, e, 0, e, 1, e, 0, e, 1, 0)   ×
Content-assisted decoding   (1, 0, 0, 0, 0, 1, 1, 1, 0, e, 1, 0)   I am
LDPC decoding success       (1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0)   I am
Figure 4.1: An example of correcting erasures in the codeword of a text.
The English text “I am” is stored using the Huffman coding $\{I \to (1, 0), \sqcup \to
(0, 0), a \to (0, 1), m \to (1, 1)\}$, where $\sqcup$ denotes a space. The information bits are
encoded with a Low Density Parity Check (LDPC) code with parity check matrix $H$
(the bold bits in Figure 4.1, i.e., the last four bits of the codeword, denote the parity check bits).
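$$H = \begin{pmatrix}
1 & 0 & 1 & 1 & 0 & 0 & 1 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 1 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 1 & 1 \\
0 & 1 & 0 & 0 & 1 & 1 & 1 & 0 & 1 & 0 & 0 & 1
\end{pmatrix}$$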
Assume that the stored codeword suffers six erasures (marked by the symbol “e”).
The number of erasures exceeds the code’s correction capability, and LDPC decoding
fails. Our decoder takes in the noisy codeword and corrects the erasures in the
information symbols by looking up a dictionary which contains two words {I, am}.
This brings the number of erasures down to one. Therefore, the second trial of LDPC
decoding succeeds, and all the erasures are corrected. Our approach is suitable for
natural languages, and can potentially be extended to other types of data where
the redundancy in information content is not fully removed by data compression.
The dictionaries are preloaded in the flash memory. The scheme takes advantage of
the fast random access speed provided by flash memories for fast dictionary look-up
and content verification, so that the dictionary look-up process in our decoding
algorithm can run in linear time. For performance evaluation, we have tested a decoding
framework that combines a modified soft decision decoder of LDPC codes and our
scheme with a set of text file benchmarks. Experimental results show that our
decoder indeed increases the correction capability of the LDPC decoder and recovers
the erasures efficiently.
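The steps of Figure 4.1 can be traced with a small sketch. Note the substitutions: the LDPC decoder here is a simple iterative erasure ("peeling") decoder rather than the modified soft decision decoder used in our experiments, and the content-assisted step is a stand-in that fills in the only dictionary text, “I am”, after checking that it agrees with the known bits. The names `peel`, `HUFFMAN`, and `E` are chosen for this illustration.

```python
H = [
    [1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0],
    [1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1],
]
E = None  # erasure marker
HUFFMAN = {"I": (1, 0), " ": (0, 0), "a": (0, 1), "m": (1, 1)}

def peel(word):
    """Repeatedly solve any parity check that contains exactly one erased bit."""
    word = list(word)
    progress = True
    while progress:
        progress = False
        for row in H:
            erased = [j for j, h in enumerate(row) if h and word[j] is E]
            if len(erased) == 1:
                known = sum(word[j] for j, h in enumerate(row) if h and word[j] is not E)
                word[erased[0]] = known % 2      # parity of each check must be even
                progress = True
    return word

received = [1, E, 0, E, 0, E, 1, E, 0, E, 1, E]
first = peel(received)                           # stalls with five erasures left

# Stand-in for content-assisted decoding: the dictionary text "I am".
info = [b for s in "I am" for b in HUFFMAN[s]]
assert all(r is E or r == b for r, b in zip(first, info))
after_cad = info + first[8:]
second = peel(after_cad)                         # the single remaining erasure is solved
```

The first pass recovers only the last parity bit, as in the "LDPC decoding failure" row, while the second pass completes the codeword once the information bits have been filled in.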
The rest of the chapter is organized as follows. Section 4.2 presents the
preliminaries, and defines the text file decoding problem. Section 4.3 specifies the algorithms
of the content-assisted file decoder. Section 4.4 discusses implementation details and
experimental results.
4.2 The Models of File Decoding
In this section, we first introduce the terminologies used in this chapter, then we
describe the model of the data storage channel and define the file decoding problem.
4.2.1 Notations
Let $\mathbf{x}$ denote a binary codeword $(x_1, x_2, \cdots, x_n) \in \{0, 1\}^n$. We use $\mathbf{x}[i : j]$
to represent the subcodeword $(x_i, x_{i+1}, \cdots, x_j)$, and $\mathbf{x}[i]$ as shorthand for the
subcodeword $(x_1, x_2, \cdots, x_i)$. Let $\mathbf{x}'$ denote a ternary noisy codeword
$(x'_1, x'_2, \cdots, x'_n) \in \{0, 1, e\}^n$. Let the function $\mathrm{length}(\mathbf{x})$ compute the length of a
codeword $\mathbf{x}$ and $\mathrm{n\_erasure}(\mathbf{x}')$ compute the number of erasures in a noisy codeword
$\mathbf{x}'$. We say that $\mathbf{x}$ is a solution of $\mathbf{x}'$ if, for $1 \leq i \leq \mathrm{length}(\mathbf{x})$,
$$\begin{cases}
x_i = x'_i & \text{when } x'_i \neq e \\
x_i = 0 \text{ or } 1 & \text{when } x'_i = e
\end{cases}$$
Let $A$ be an alphabet set, and let $s \in A$ be a symbol. We denote a space by $\sqcup \in A$.
A word
$$\mathbf{w} \triangleq (s_1, s_2, \cdots, s_n)$$
of length $n$ is a finite sequence of symbols without any space. A phrase
$$\mathbf{p} \triangleq (\mathbf{w}_1, \sqcup, \mathbf{w}_2)$$
is defined as a combination of two words separated by a space. Define a text
$$\mathbf{t} \triangleq (\mathbf{w}_1, \sqcup, \mathbf{w}_2, \sqcup, \cdots, \sqcup, \mathbf{w}_n)$$
as a sequence of words separated by $\sqcup$. A word dictionary
$$D_w \triangleq \{[\mathbf{w}_1 : p_1], [\mathbf{w}_2 : p_2], \cdots\}$$
is a finite set of records where a record $[\mathbf{w} : p]$ has a key $\mathbf{w}$ and a value $p > 0$. The
value $p$ is an average probability that the word $\mathbf{w}$ occurs in a text. Similarly, a phrase
dictionary
$$D_p \triangleq \{[\mathbf{p}_1 : p_1], [\mathbf{p}_2 : p_2], \cdots\}$$
stores the probabilities that a set of phrases appear in any given text. In our scheme,
it refers to the set of valid phrases (“word combinations”) used in files.
The dictionary look-up operations denoted by $D_w[\mathbf{w}]$ and $D_p[\mathbf{p}]$ return the
probabilities of words and phrases, respectively. We use the notation $\mathbf{w} \triangleright D_w$ (or $\mathbf{p} \triangleright D_p$)
to indicate that there is a record in $D_w$ (or $D_p$) with key $\mathbf{w}$ (or $\mathbf{p}$), and $\mathbf{w} \ntriangleright D_w$ to
indicate that there is no such record. Let $\pi_s$ be a
bijective mapping from a symbol to a binary codeword, and let $\mathbf{x}_s = \pi_s(\sqcup)$. In this
work, the mapping $\pi_s$ is used during data compression before ECC encoding, and
it encodes each symbol separately. In the example of Section 4.1, $\pi_s$ refers to the
Huffman code
$$\{I \to (1, 0), \sqcup \to (0, 0), a \to (0, 1), m \to (1, 1)\}.$$
The bijective mapping from a word $\mathbf{w} = (s_1, \cdots, s_n)$ to its binary codeword is defined
as
$$\pi_w(\mathbf{w}) \triangleq (\pi_s(s_1), \cdots, \pi_s(s_n))$$
and the bijective mapping from a text to its binary representation is defined as
$$\pi_t(\mathbf{t}) \triangleq (\pi_w(\mathbf{w}_1), \mathbf{x}_s, \cdots, \mathbf{x}_s, \pi_w(\mathbf{w}_n))$$
where $\mathbf{x}_s = \pi_s(\sqcup)$. We use $\pi_s^{-1}$, $\pi_w^{-1}$ and $\pi_t^{-1}$ to denote the corresponding inverse
mappings.
4.2.2 File Decoding Model
The model of the data storage channel is shown in Figure 4.2.
[Block diagram: Source → Source Encoder → Channel Encoder → channel with Noise → Channel Decoder → Source Decoder]
Figure 4.2: The channel model for data storage.
A text $\mathbf{t}$ is generated by the source. The text is compressed by the source encoder
(e.g., a Huffman encoder), producing a binary codeword $\mathbf{y} = \pi_t(\mathbf{t}) \in \{0, 1\}^k$. The
compressed bits are fed to a channel encoder (e.g., an LDPC encoder), which outputs an
ECC codeword $\mathbf{x} = \psi(\mathbf{y}) \in \{0, 1\}^n$ where $n > k$. Here we assume that a systematic ECC
is used. The codeword is then stored in memory cells, where it suffers some erasures.
In this work, a binary erasure channel (BEC) with bit-erasure rate $f$ is assumed.
When the cells are read, the channel outputs a noisy codeword $\mathbf{x}' = (x'_1, x'_2, \cdots, x'_n)$
where $x'_i = x_i$ or $e$ for $1 \leq i \leq n$. The noisy codeword is first corrected by a
channel decoder (e.g., our proposed decoder), producing an estimated ECC codeword
$\hat{\mathbf{y}} = \psi^{-1}(\mathbf{x}')$. The source decoder decompresses the corrected codeword, and returns
an estimated text $\hat{\mathbf{t}} = \pi_t^{-1}(\hat{\mathbf{y}})$ upon success.
This work focuses on designing better channel decoders ψ−1 for correcting bit
erasures in text files. We propose a new decoding framework which connects a tradi-
tional ECC decoder with a content-assisted decoder (CAD) as shown in Figure 4.3.
Figure 4.3: The work-flow of a channel decoder with content-assisted decoding.
A noisy codeword is first passed into an ECC decoder. If decoding fails, the
decoding output is passed to the CAD. With the statistical information stored in Dw
and Dp, the CAD selects a word for each subcodeword to form a likely text as the
correction for the noisy codeword, which reduces the number of erasures left in the
codeword. The corrected text is fed back to the ECC decoder to further recover the
text. The text file decoding problem for our CAD is defined as follows.
Definition 9. Let $\mathbf{t}$ be some text generated from the source, and let $\mathbf{x}' \in \{0, 1, e\}^n$ be
a noisy channel output codeword of $\mathbf{t}$. Given two dictionaries $D_w$ and $D_p$, the text
file decoding problem for the CAD is to find an estimated text $\hat{\mathbf{t}}$ which is the most
likely correction for $\mathbf{x}'$, i.e.,
$$\arg\max_{\hat{\mathbf{t}}} \Pr\{\hat{\mathbf{t}} \mid \mathbf{x}', D_p, D_w\}.$$
4.3 The Content-Assisted Decoding Algorithms
The content-assisted decoder approximates the solution to the optimization problem
in Definition 9 in three steps: (1) estimate the “space” positions in the noisy
codeword to divide the codeword into subcodewords, with each subcodeword
representing a set of candidate words; (2) resolve ambiguity by selecting a word for each
subcodeword to form a most likely sequence; (3) feed the results of step (2)
to a modified LDPC soft decision decoder to further reduce the error rate.
We describe the algorithms of each step in this section.
4.3.1 Creating Dictionaries
The dictionaries Dw and Dp are used in our decoding algorithms. To create the
dictionaries, we simply count the frequencies of words and phrases of two words
which appear in a relatively large set of different texts in the same language as
the texts generated by the source. Fast dictionary look-up is achieved by storing
the dictionaries in a content-addressable way thanks to the random access in flash
memories, i.e., the probability in a dictionary record is addressed by the value of the
corresponding word or phrase. As we show later in Section 4.4, the completeness of
the dictionaries affects the decoding performance.
4.3.2 Codeword Segmentation
We aim to assign 0 or 1 to the erasure bits so that the noisy codeword is recovered
as a sequence of words separated by “spaces”. Since the dictionary is complete or
nearly complete, the probability that the input
text $\mathbf{t}$ contains a word $\mathbf{w}$ that is not in the dictionary ($\mathbf{w} \ntriangleright D_w$) is very low. The
erasures are evenly distributed over the information bits. Therefore, after decoding,
the erasures located in words that are not in the dictionary should be as few as
possible. If the dictionary is complete, the number of erasures located in
non-dictionary words is 0 because all words are in the dictionary. Following this intuition, we
define the codeword segmentation function $\sigma$ as follows: $\sigma$ takes a noisy
codeword $\mathbf{x}$ and a word dictionary $D_w$, and assigns 0 or 1 to the erasure bits so that
the corrected codeword represents a text, i.e., a sequence of valid words separated by
spaces, and the number of erasures located in non-dictionary words is minimized. Let
$\sigma(\mathbf{x}, D_w) = ((\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_k), (i_1, i_2, \cdots, i_{k-1}))$, where the number of records $|D_w|$ is
bounded by some constant $K$, and $i_j \in \mathbb{N}$ is the index of the first bit of the $j$-th
space in $\mathbf{x}$. Then the subcodewords are $\mathbf{x}_1 = \mathbf{x}[i_1 - 1]$, $\mathbf{x}_k = \mathbf{x}[i_{k-1} + \mathrm{length}(\mathbf{x}_s) : \mathrm{length}(\mathbf{x})]$,
and $\mathbf{x}_j = \mathbf{x}[i_{j-1} + \mathrm{length}(\mathbf{x}_s) : i_j - 1]$ for $j \in \{2, 3, \cdots, k - 1\}$. The mapping $\sigma$ is
required to satisfy the following properties:
1. For each subcodeword $\mathbf{x}_j$, $1 \leq j \leq k$, $\exists \mathbf{w}$ such that $\pi_w(\mathbf{w})$ is a solution to $\mathbf{x}_j$.
2. $\sum_{j=1}^{k} \mathrm{n\_erasure}(\mathbf{x}_j) \times \mathrm{flag}(j)$ is minimized, where
$$\mathrm{flag}(j) = \begin{cases}
0 & \text{if } \exists \mathbf{w} \triangleright D_w \text{ such that } \pi_w(\mathbf{w}) \text{ is a solution to } \mathbf{x}_j \\
1 & \text{otherwise}
\end{cases}$$
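Let the cost function $c(i)$ return the minimum number of erasures located in the
non-dictionary words after converting the subcodeword $\mathbf{x}[i]$ to represent a text. We
have the following recurrence for $i \geq l_{min}$:
$$c(i) \triangleq \min\Big\{ \min_{k = l_{min}}^{\min\{i, l_{max}\}} \big\{ c(i - l_s - k) + g(i - k - l_s + 1, i - k) + h(i - k + 1, i) \big\},\ h(1, i) \Big\}$$
where $l_{min}$ and $l_{max}$ are the shortest and longest codeword lengths for a single dictionary
word, respectively, $l_s = \mathrm{length}(\mathbf{x}_s)$, and $c(i) = \infty$ when $i < l_{min}$. Clearly, if the
dictionary is complete, $c(n) = 0$, where $n = \mathrm{length}(\mathbf{x})$.
The function $g(i, j)$ indicates whether $\mathbf{x}[i : j]$ can be decoded to $\sqcup$. The function
$h(i, j)$ computes the cost of obtaining a single word from the subcodeword $\mathbf{x}[i : j]$, which has length $j - i + 1$.
$$g(i, j) \triangleq \begin{cases}
0 & \text{if } \mathbf{x}_s \text{ is a solution to } \mathbf{x}[i : j] \\
\infty & \text{otherwise}
\end{cases}$$
$$h(i, j) \triangleq \begin{cases}
0 & \text{if } \exists \mathbf{w} \triangleright D_w,\ \pi_w(\mathbf{w}) \text{ is a solution to } \mathbf{x}[i : j] \\
\mathrm{n\_erasure}(\mathbf{x}[i : j]) & \text{else if } \exists \mathbf{w} \ntriangleright D_w,\ \pi_w(\mathbf{w}) \text{ is a solution to } \mathbf{x}[i : j] \\
\infty & \text{otherwise}
\end{cases}$$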
Example 10. Consider the example in Section 4.1. The input noisy codeword is $\mathbf{x}' =
(1, e, 0, e, 0, e, 1, e)$, and the word dictionary is $D_w = \{[\mathrm{I} : 0.5], [\mathrm{am} : 0.5]\}$. We have
$l_{min} = \mathrm{length}(\mathbf{x}_s) = 2$, $l_{max} = 4$ and $\sigma(\mathbf{x}', D_w) = (((1, e), (0, e, 1, e)), (3))$. Starting
from $c(1)$, we recursively compute $c(i)$ for all $1 \leq i \leq n$, where $n$ is the codeword length.
The results are shown in Figure 4.4a. For instance, to compute $c(2)$, we first note that
$c(1) = \infty$ because a codeword has length at least 2. We then compute $h(1, 2) = 0$,
because we can assign 0 to the first erasure to obtain $\mathbf{x}_1 = (1, 0)$, which can
be decoded as “I”. Finally, we have $c(2) = \min(0, \infty) = 0$.
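The recurrence for $c(i)$ can be evaluated bottom-up, as the following sketch does for the data of Example 10. It is a simplified illustration and not a transcription of our implementation: it tracks only the costs and the space positions, it tests candidate words by direct comparison instead of brute-force erasure assignment, and the helper names are hypothetical.

```python
import math

E = "e"
CODE = {"I": (1, 0), " ": (0, 0), "a": (0, 1), "m": (1, 1)}   # Huffman code of Section 4.1
SPACE = CODE[" "]

def is_solution(bits, noisy):
    """bits agrees with noisy on every position that is not erased."""
    return len(bits) == len(noisy) and all(n == E or n == b for b, n in zip(bits, noisy))

def encode_word(word):
    return tuple(b for s in word for b in CODE[s])

def spells_some_word(noisy):
    """True if noisy can be completed to a sequence of non-space symbols."""
    ok = [True] + [False] * len(noisy)
    for j in range(1, len(noisy) + 1):
        for sym, cw in CODE.items():
            if sym != " " and j >= len(cw) and ok[j - len(cw)] \
                    and is_solution(cw, noisy[j - len(cw):j]):
                ok[j] = True
    return ok[-1]

def h(noisy, dictionary):
    """Cost of reading noisy as a single word."""
    if any(is_solution(encode_word(w), noisy) for w in dictionary):
        return 0
    return noisy.count(E) if spells_some_word(noisy) else math.inf

def segment(noisy, dictionary):
    n, ls = len(noisy), len(SPACE)
    lengths = [len(encode_word(w)) for w in dictionary]
    lmin, lmax = min(lengths), max(lengths)
    c = [math.inf] * (n + 1)                 # c[i] is the cost of the prefix x[i]
    spaces = [[] for _ in range(n + 1)]
    for i in range(lmin, n + 1):
        c[i] = h(noisy[:i], dictionary)      # the h(1, i) term
        for k in range(lmin, min(i, lmax) + 1):
            j = i - ls - k                   # prefix length before the last space
            if j < lmin or c[j] == math.inf:
                continue
            if not is_solution(SPACE, noisy[j:j + ls]):
                continue                     # g(...) would be infinite
            cost = c[j] + h(noisy[i - k:i], dictionary)
            if cost < c[i]:
                c[i], spaces[i] = cost, spaces[j] + [j + 1]
    return c, spaces[n]

noisy = (1, E, 0, E, 0, E, 1, E)
c, space_positions = segment(noisy, ["I", "am"])
```

For this input the sketch reports zero cost for the whole codeword with a single space starting at bit 3, in agreement with the value of $\sigma$ given in the example.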
Our objective is to compute c(n) given an input codeword of length n, and to find
the space positions that achieve the minimum cost. When c(i) is computed
recursively starting from c(1), some entries are needed several times. For
instance, in Example 10, the entry c(2) is needed when we compute c(i)
for i > 2. A good way to speed up such computation is to use the dynamic
programming technique shown in Algorithm 4, which computes the final result iteratively
starting from c(1); an entry computed in an earlier iteration is saved for later
iterations.
Figure 4.4: Examples of codeword segmentation: (a) array c; (b) array m. In (b), a set
of words means that the subcodeword x[i] can be decoded to a word or word sequence
in which each word is chosen from the corresponding word set. The arrow → gives the
word sequence order. A cross × indicates that the subcodeword x[i] can be decoded
neither to a word nor to a word sequence.
The algorithm treats $c(i)$ as the entries of a one-dimensional array. Starting from
$c(1)$, the algorithm fills each entry from $c(1)$ to $c(n)$, as shown in Figure 4.4a. The
corresponding space locations for breaking the subcodeword $\mathbf{x}[i]$, and the set of word
sequences that $\mathbf{x}[i]$ can be decoded to represent, are recorded in a one-dimensional
array $m$. In practice, as $f$ is close to 0, the average number of erasures in the
subcodeword $\mathbf{x}[k : j]$ is small. The cardinality of the set of possible words $S_w$ for
a given noisy subcodeword $\mathbf{x}[k : j]$ is bounded by $2^{\mathrm{n\_erasure}(\mathbf{x}[k : j])}$. In practice,
we first brute-force search the set $\{\mathbf{w} \mid \mathbf{w} \triangleright D_w, \pi_w(\mathbf{w}) \text{ is a solution to } \mathbf{x}[k : j]\}$ and
record the results in $m$. If $\mathbf{x}[k : j]$ cannot be decoded to any word in the dictionary,
the set of non-dictionary words $\{\mathbf{w} \mid \mathbf{w} \ntriangleright D_w, \pi_w(\mathbf{w}) \text{ is a solution to } \mathbf{x}[k : j]\}$ is computed
and recorded in $m$. As we are interested in the space locations for the whole input
noisy codeword, after the entries of $c$ and $m$ have been filled (Figure 4.4), we get the
optimal solution from $m(n)$. The results are the ordered space locations and the sets
of words for the subcodewords between the spaces. Assuming that the subcodeword of
each word has a length bounded by some constant and that the number of erasures
in each word is small, the time complexity of our dynamic programming algorithm
is $O(n^2)$, and $O(n)$ space is used for storing the arrays $c$ and $m$.
Algorithm 4 CodewordSegmentation(x, D_w, l_min, l_max)
n ← length(x), l ← length(x_s)
Let c be an array of length n
Let wordSets and spaces be two arrays of empty lists
for i from 1 to n do
    c(i) = ∞
for i from l_min to n do
    flag = 0, k = l_min
    while flag = 0 AND k ≤ min(l_max, i) do
        if k = i then
            Brute force assign 0 or 1 to erasures in x[k]
            S_w ← {w | w ▷ D_w, π_w(w) is a solution to x[k]}
            if S_w ≠ ∅ then
                c(k) = 0, wordSets(k) = S_w
            else
                S_w ← {w | w ⋫ D_w, π_w(w) is a solution to x[k]}
                if S_w ≠ ∅ AND c(i) > n_erasure(x[k]) then
                    c(i) = n_erasure(x[k]), wordSets(k) = S_w
        else
            if c(i − k − l) ≠ ∞ AND x_s is a solution to x[i − k − l + 1 : i − k] then
                Brute force assign 0 or 1 to erasures in x[i − k + 1 : i]
                S_w ← {w | w ▷ D_w, π_w(w) is a solution to x[i − k + 1 : i]}
                if S_w ≠ ∅ AND c(i) > c(i − k − l) then
                    c(i) = c(i − k − l), wordSets(i) = wordSets(i − k − l) → S_w
                    spaces(i) = spaces(i − k − l) → i − k − l + 1
                else
                    S_w ← {w | w ⋫ D_w, π_w(w) is a solution to x[i − k + 1 : i]}
                    if S_w ≠ ∅ AND c(i) > c(i − k − l) + n_erasure(x[i − k + 1 : i]) then
                        c(i) = c(i − k − l) + n_erasure(x[i − k + 1 : i]), wordSets(i) = wordSets(i − k − l) → S_w
                        spaces(i) = spaces(i − k − l) → i − k − l + 1
        if c(i) = 0 then
            flag = 1
        k++
return wordSets(n) and spaces(n)
Example 11. For the example in Section 4.1, the tables c and m computed by
Algorithm 4 are shown in Figure 4.4. The minimum decoding cost is c(8) = 0, which
means the noisy codeword can be decoded as a sequence of dictionary words. The
index of the estimated space is 3. With the estimated space, the subcodeword
x[1 : 2] = (1, e) can be decoded to a word in the set {I}, and the subcodeword
x[5 : 8] = (0, e, 1, e) can be decoded to a word in the set {am}.
4.3.3 Ambiguity Resolution
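Given the subcodewords $(\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_k)$ between the estimated spaces, and a
list of word sets $(W_1, W_2, \cdots, W_k)$ computed by the codeword segmentation
algorithm, for $i \in \{1, \cdots, k\}$ we select a word $\mathbf{w}_i$ from $W_i$ to form a most probable
text $\hat{\mathbf{t}} = (\mathbf{w}_1, \sqcup, \mathbf{w}_2, \sqcup, \cdots, \sqcup, \mathbf{w}_k)$. The codeword $\pi_t(\hat{\mathbf{t}})$ is a correction for the input
noisy codeword. Specifically, this step computes
\begin{align*}
& \arg\max_{(\mathbf{w}_1, \mathbf{w}_2, \cdots, \mathbf{w}_k) \in W_1 \times W_2 \times \cdots \times W_k} \Pr\{(\mathbf{w}_1, \mathbf{w}_2, \cdots, \mathbf{w}_k) \mid (\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_k)\} \\
= & \arg\max_{(\mathbf{w}_1, \mathbf{w}_2, \cdots, \mathbf{w}_k) \in W_1 \times W_2 \times \cdots \times W_k} \Pr\{(\mathbf{w}_1, \mathbf{w}_2, \cdots, \mathbf{w}_k), (\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_k)\}
\end{align*}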
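Let the function $P(\mathbf{w}_i)$ compute the maximal joint probability when some word
$\mathbf{w}_i$ is selected from $W_i$ and appended to the previously selected word sequence
$(\mathbf{w}_1, \mathbf{w}_2, \cdots, \mathbf{w}_{i-1})$. For $i \in [2, k]$, we have
\begin{align*}
P(\mathbf{w}_i) &\triangleq \max_{(\mathbf{w}_1, \cdots, \mathbf{w}_{i-1}) \in W_1 \times \cdots \times W_{i-1}} \Pr\{(\mathbf{w}_1, \cdots, \mathbf{w}_i), (\mathbf{x}_1, \cdots, \mathbf{x}_i)\} \\
&= \max_{(\mathbf{w}_1, \cdots, \mathbf{w}_{i-1}) \in W_1 \times \cdots \times W_{i-1}} \Pr\{\mathbf{w}_1\} \Pr\{\mathbf{x}_1 \mid \mathbf{w}_1\} \Pr\{\mathbf{w}_2 \mid \mathbf{w}_1\} \Pr\{\mathbf{x}_2 \mid \mathbf{w}_2\} \\
&\qquad \Pr\{\mathbf{w}_3 \mid (\mathbf{w}_1, \mathbf{w}_2)\} \Pr\{\mathbf{x}_3 \mid \mathbf{w}_3\} \cdots \Pr\{\mathbf{w}_i \mid (\mathbf{w}_1, \mathbf{w}_2, \cdots, \mathbf{w}_{i-1})\} \Pr\{\mathbf{x}_i \mid \mathbf{w}_i\}
\end{align*}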
75
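Assume the words in a text form a one-step Markov chain, i.e., for $i \ge 2$,
\[
\Pr\{w_i \mid (w_1, w_2, \cdots, w_{i-1})\} = \Pr\{w_i \mid w_{i-1}\}.
\]
Therefore, we rewrite the equation above as:
\begin{align*}
P(w_i) ={}& \max_{(w_1, \cdots, w_{i-1}) \in W_1 \times \cdots \times W_{i-1}} \Pr\{w_1\} \Pr\{x_1 \mid w_1\} \Pr\{w_2 \mid w_1\} \Pr\{x_2 \mid w_2\} \\
& \quad \Pr\{w_3 \mid w_2\} \Pr\{x_3 \mid w_3\} \cdots \Pr\{w_i \mid w_{i-1}\} \Pr\{x_i \mid w_i\} \\
={}& \max_{(w_1, \cdots, w_{i-1}) \in W_1 \times \cdots \times W_{i-1}} \Pr\{w_1\} \Pr\{w_2 \mid w_1\} \cdots \Pr\{w_i \mid w_{i-1}\} \prod_{k=1}^{i} \Pr\{x_k \mid w_k\} \\
={}& \max_{(w_1, \cdots, w_{i-1}) \in W_1 \times \cdots \times W_{i-1}} \Pr\{w_i \mid w_{i-1}\} \Pr\{x_i \mid w_i\} \\
& \quad \Pr\{w_1\} \Pr\{w_2 \mid w_1\} \cdots \Pr\{w_{i-1} \mid w_{i-2}\} \prod_{k=1}^{i-1} \Pr\{x_k \mid w_k\} \\
={}& \max_{(w_1, \cdots, w_{i-1}) \in W_1 \times \cdots \times W_{i-1}} \Pr\{w_i \mid w_{i-1}\} \Pr\{x_i \mid w_i\} \Pr\{(w_1, \cdots, w_{i-1}), (x_1, \cdots, x_{i-1})\} \\
={}& \max_{w_{i-1} \in W_{i-1}} \Pr\{x_i \mid w_i\} \Pr\{w_i \mid w_{i-1}\} \max_{(w_1, \cdots, w_{i-2}) \in W_1 \times \cdots \times W_{i-2}} \Pr\{(w_1, \cdots, w_{i-1}), (x_1, \cdots, x_{i-1})\} \\
={}& \max_{w_{i-1} \in W_{i-1}} \Pr\{x_i \mid w_i\} \Pr\{w_i \mid w_{i-1}\} P(w_{i-1})
\end{align*}
and
\[
P(w_1) = \Pr\{w_1\} \Pr\{x_1 \mid w_1\}.
\]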
The conditional probability $\Pr\{x_i \mid w_i\}$ is computed from the channel statistics by
\[
\Pr\{x_i \mid w_i\} = f^{\,\mathrm{n\_erasure}(x_i)} (1 - f)^{\mathrm{length}(x_i) - \mathrm{n\_erasure}(x_i)}.
\]
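As a small illustration of this formula, the following Python sketch evaluates the likelihood of a received subcodeword. The function name, the use of the character 'e' for an erasure, and the value f = 0.1 are assumptions made for the example; they are not taken from the implementation described in this chapter.

```python
def erasure_likelihood(x, f):
    """Pr{x | w} on a binary erasure channel with erasure probability f.

    x is a received subcodeword given as a sequence of 0, 1, or 'e'
    (erasure). The transmitted word w does not appear in the formula:
    only the number of erased positions matters.
    """
    n_erasure = sum(1 for bit in x if bit == 'e')
    return f ** n_erasure * (1 - f) ** (len(x) - n_erasure)


# Subcodeword x[5:8] = (0, e, 0, e) from Example 11, with f = 0.1 (assumed).
likelihood = erasure_likelihood((0, 'e', 0, 'e'), 0.1)
```

Because the value depends only on the received subcodeword, every candidate word for that subcodeword receives the same likelihood.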
Since the number of erasures in $x_i$ is fixed, $\Pr\{x_i \mid w_i\}$ is the same for all $w_i \in W_i$.
Our goal is to find a word sequence $(w_1, w_2, \cdots, w_k) \in W_1 \times W_2 \times \cdots \times W_k$
that maximizes $\Pr\{(w_1, w_2, \cdots, w_k) \mid (x_1, x_2, \cdots, x_k)\}$. The factor
$\Pr\{x_i \mid w_i\}$ does not affect which sequence attains the maximum, so we can
remove it from $P(w_i)$ such that
\[
P(w_i) = \max_{w_{i-1} \in W_{i-1}} \Pr\{w_i \mid w_{i-1}\} P(w_{i-1})
\]
and
\[
P(w_1) = \Pr\{w_1\}.
\]
The probabilities $\Pr\{w_i\}$ and $\Pr\{w_i \mid w_{i-1}\}$ are looked up from the dictionaries:
\[
\Pr\{w_i\} = D_w[w_i],
\]
\[
\Pr\{w_i \mid w_{i-1}\} = D_p[(w_{i-1}, \sqcup, w_i)].
\]
The derived recurrence suggests that the optimization problem can be mapped to
the problem of trellis decoding, which is again solved by dynamic programming. The
trellis for our problem has $k$ time stages. The observed codeword at the $i$-th stage
is $x_i$ for $i \in \{1, \cdots, k\}$. There are $|W_i|$ vertices at stage $i$, each representing
an element $w$ of $W_i$ and associated with the probability $\Pr\{w\}$. The weight
of the directed edge from a vertex at stage $i$ with word $w_x$ to a vertex at stage
$i + 1$ with word $w_y$ is the conditional probability $\Pr\{w_y \mid w_x\}$. An example of
the mapping is shown in Figure 4.5. Our target is to compute the sequence that
achieves $\max_{w_k \in W_k} P(w_k)$, which corresponds to the Viterbi path in the
trellis, starting from a vertex in stage 1 and ending at a vertex in stage $k$.
Figure 4.5: An illustrative example of the mapping to trellis decoding. The sets
$W_1 = \{w_{1,1}, w_{1,2}\}$, $W_2 = \{w_{2,1}, w_{2,2}, w_{2,3}\}$, $W_3 = \{w_{3,1}, w_{3,2}, w_{3,3}\}$ and
$W_4 = \{w_{4,1}, w_{4,2}\}$ correspond to the subcodewords $x_1$, $x_2$, $x_3$ and $x_4$, respectively.
Algorithm 5 Viterbi((W_1, · · · , W_k), (x_1, · · · , x_k), D_w, D_p)
n ← max_{l∈[1,k]} |W_l|
Let p and s be two n × k tables
for i from 1 to |W_1| do
  p(i, 1) ← D_w[W_1[i]]
for t from 2 to k do
  for i from 1 to |W_t| do
    pmax ← 0, index ← 1
    for j from 1 to |W_{t−1}| do
      p′ ← D_p[(W_{t−1}[j], ⊔, W_t[i])] · p(j, t − 1)
      if p′ > pmax then
        pmax ← p′
        index ← j
    p(i, t) ← pmax
    s(i, t) ← index
index ← argmax_{i∈[1,|W_k|]} p(i, k)
words ← [W_k[index]]
for t from k down to 2 do
  i ← s(index, t)
  words.appendToFront(W_{t−1}[i])
  index ← i
return words
The dynamic programming algorithm for solving our trellis decoding problem is
specified in Algorithm 5, which is adapted from Viterbi decoding [56]. The final
solution is computed iteratively, starting from $P(w_1)$ according to the recurrence.
When the last iteration is finished, we trace back along the Viterbi path recorded
in the table $s$, and collect the selected words to form an estimated text $\hat{t}$. The
complexity of the Viterbi decoding algorithm is $O(n^2 k)$, where $k$ is the length of the
input codeword list, and $n = \max_{i \in [1,k]} |W_i|$ is the cardinality of the largest input
word set. The algorithm requires $O(nk)$ space for storing the tables $p$ and $s$.
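The procedure can be sketched in Python as follows. This is a minimal illustration of the recurrence $P(w_i) = \max_{w_{i-1}} \Pr\{w_i \mid w_{i-1}\} P(w_{i-1})$ rather than the implementation used in the experiments. The plain dictionaries `unigram` and `bigram` stand in for $D_w$ and $D_p$, the bigram key is assumed to be a pair (previous word, word) without the space symbol, missing entries are treated as probability 0, and the toy probabilities are invented for the example.

```python
def viterbi(word_sets, unigram, bigram):
    """Pick one word per set to maximize Pr{w1} * prod Pr{w_i | w_{i-1}}.

    word_sets: list of non-empty lists of candidate words (W_1, ..., W_k).
    unigram:   dict word -> Pr{w}                 (stands in for D_w).
    bigram:    dict (prev, word) -> Pr{word|prev} (stands in for D_p).
    """
    # p[t][i]: best score of any path ending at word_sets[t][i].
    # s[t][i]: index in word_sets[t-1] of the predecessor on that path.
    p = [[unigram.get(w, 0.0) for w in word_sets[0]]]
    s = [[0] * len(word_sets[0])]
    for t in range(1, len(word_sets)):
        p_t, s_t = [], []
        for w in word_sets[t]:
            scores = [bigram.get((prev, w), 0.0) * p[t - 1][j]
                      for j, prev in enumerate(word_sets[t - 1])]
            best = max(range(len(scores)), key=scores.__getitem__)
            p_t.append(scores[best])
            s_t.append(best)
        p.append(p_t)
        s.append(s_t)
    # Trace back from the best vertex of the last stage.
    index = max(range(len(p[-1])), key=p[-1].__getitem__)
    words = [word_sets[-1][index]]
    for t in range(len(word_sets) - 1, 0, -1):
        index = s[t][index]
        words.insert(0, word_sets[t - 1][index])
    return words


# Hypothetical toy dictionaries, for illustration only.
unigram = {"I": 0.6, "a": 0.4}
bigram = {("I", "am"): 0.5, ("I", "an"): 0.1,
          ("a", "am"): 0.01, ("a", "an"): 0.2}
best = viterbi([["I", "a"], ["am", "an"]], unigram, bigram)
```

The two nested loops over adjacent stages give the $O(n^2 k)$ running time, and the tables `p` and `s` account for the $O(nk)$ space.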
4.3.4 Post Processing
Additional errors may be introduced during codeword segmentation and ambiguity
resolution if unknown or rare words or phrases occur in the input codeword.
Unknown words (phrases) are words (phrases) that are not in $D_w$ ($D_p$), and
rare words (phrases) are words (phrases) that are in $D_w$ ($D_p$) but have a low
frequency. Upon meeting an unknown word, the codeword segmentation algorithm
tends either to split its codeword into subcodewords representing short known words
separated by the space symbol, or to decode it as some known word whose codeword
has the same length as the unknown codeword. Such segmentation and ambiguity
resolution introduce additional bit errors. We use a simple post-processing step that
further reduces the errors by applying the ECC error decoder to the output of our
content-assisted decoder (CAD). Since the CAD recovers most of the erasures, the
error rate of the corrected codeword obtained from the CAD is much smaller than the
original channel erasure rate, and is usually within the error-correction capability of
the ECC. Moreover, because the noisy codeword contains only erasures, the bits
received as 0 or 1 are known to be correct. We use this information to modify the
ECC error decoder and improve its error-correction capability. In our work, we use
an LDPC code as the error-correcting code and decode it with the iterative belief
propagation algorithm. However, we modify the message-passing functions for
messages from the variable nodes to the check nodes in the following way: if the
variable node is 0 (1) in the original noisy codeword, its likelihood of being 0 is
always 1 (0), no matter what messages that node receives from the check nodes.
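The modified variable-node update can be sketched as below. This is only an illustration under stated assumptions: messages are expressed as log-likelihood ratios, log(Pr{bit = 0} / Pr{bit = 1}), which is a common formulation of belief propagation but is not stated in the text, and the function and parameter names are hypothetical.

```python
import math


def variable_to_check(channel_llr, incoming, known_bit=None):
    """Messages from one variable node to each neighboring check node.

    channel_llr: log(Pr{bit = 0} / Pr{bit = 1}) for a bit filled in by the CAD.
    incoming:    LLR messages received from the neighboring check nodes.
    known_bit:   0 or 1 if the bit was unerased in the original noisy
                 codeword, or None if the bit was erased.
    """
    if known_bit is not None:
        # Clamp: Pr{bit = 0} is 1 for a known 0 and 0 for a known 1,
        # whatever the check nodes report.
        clamped = math.inf if known_bit == 0 else -math.inf
        return [clamped] * len(incoming)
    total = channel_llr + sum(incoming)
    # Usual sum-product rule: leave out the target check's own message.
    return [total - m for m in incoming]
```

A practical decoder would also have to guard against infinite incoming messages at unclamped nodes, for example by saturating them to a large finite value; that detail is omitted here.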
4.4 Experiments
In this section, we evaluate the performance of our proposed content-assisted
decoding scheme and discuss the experimental results.
4.4.1 Implementation Detail
Our implementation supports basic punctuation marks in the input text files,
including ‘,’, ‘.’, ‘?’ and ‘!’. This is done by adding another function to the definition
of c(i). The function measures the number of erasures in a subcodeword that can
be decoded as a word followed by a punctuation mark.
When we estimate the last “space” position for the codeword x[i], we begin
with the last subcodeword of length $k = l_{mean}$, then search subcodewords of length
$l_{mean} - 1$, $l_{mean} + 1$, $l_{mean} - 2$, $l_{mean} + 2$, $\cdots$, until we find a good last word such
that c(i) = 0, where $l_{mean}$ is the mean of length($\pi_w(w)$) over all $w \in D_w$. Because
the subcodeword length is near $l_{mean}$ with high probability, this heuristic
speeds up the codeword segmentation algorithm.
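The search order can be illustrated with a short Python generator. This is a sketch of the ordering only; the function name and the `max_len` bound are hypothetical, and `l_mean` is assumed to have been rounded to an integer.

```python
def candidate_lengths(l_mean, max_len):
    """Yield lengths in the order l_mean, l_mean-1, l_mean+1, l_mean-2, ...

    Lengths outside [1, max_len] are skipped, where max_len is the number
    of bits still available for the last word.
    """
    if 1 <= l_mean <= max_len:
        yield l_mean
    for d in range(1, max(l_mean, max_len)):
        for k in (l_mean - d, l_mean + d):
            if 1 <= k <= max_len:
                yield k


order = list(candidate_lengths(4, 7))
```

The segmentation loop would stop consuming this generator as soon as a length yields a last word with cost 0.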
During ambiguity resolution, floating-point underflow may occur when the input
codeword is very long, because many probabilities are multiplied together. We thus
use a logarithmic version of the recurrence, which uses additions instead of
multiplications of floating-point numbers. This significantly delays the underflow.
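In the logarithmic version, the recurrence becomes $\log P(w_i) = \max_{w_{i-1} \in W_{i-1}} \left[ \log \Pr\{w_i \mid w_{i-1}\} + \log P(w_{i-1}) \right]$, which selects the same path because the logarithm is monotonic. The following Python snippet illustrates the numerical effect on a hypothetical path; the path length and the transition probability are invented for the example.

```python
import math

# Hypothetical path: 400 transitions, each with probability 0.01.
probs = [0.01] * 400

product = math.prod(probs)                    # direct multiplication
log_score = sum(math.log(p) for p in probs)   # log-domain accumulation
```

Here `product` collapses to 0.0 in double precision, so all such paths would tie, whereas `log_score` remains an ordinary finite number that can still be compared across paths.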
A smoothing technique is used for computing $\Pr\{w_i \mid w_{i-1}\}$. The probability
$\Pr\{w_i\}$ is used if the phrase $(w_{i-1}, \sqcup, w_i)$ is unknown to $D_p$. If no word in a
word set $W_i$ is in the dictionary, we set $\Pr\{w\} = a$ for all $w \in W_i$, where $0 < a < 1$
is a random number. The reason is that returning 0 for unknown words or phrases
would make the whole joint probability 0 and eliminate the path.
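A minimal Python sketch of these two fallback rules is given below. The function names, the pair-keyed `bigram` dictionary, and the interval used to draw the random number are assumptions made for illustration. The treatment of a word set in which only some words are unknown (they receive probability 0 here) is also an assumption, since the text does not specify that case.

```python
import random


def smoothed_unigrams(word_set, unigram, rng=random):
    """Pr{w} for each w in word_set; if none is known, all share one random a."""
    if any(w in unigram for w in word_set):
        return {w: unigram.get(w, 0.0) for w in word_set}
    a = rng.uniform(0.01, 0.99)  # 0 < a < 1; the interval is an arbitrary choice
    return {w: a for w in word_set}


def transition_prob(prev, word, word_probs, bigram):
    """Pr{word | prev}, backing off to Pr{word} for an unknown phrase."""
    if (prev, word) in bigram:
        return bigram[(prev, word)]
    return word_probs[word]
```

For example, with `bigram = {("I", "am"): 0.5}` and `word_probs = {"am": 0.3}`, the phrase ("I", "am") returns 0.5, while the unseen phrase ("you", "am") falls back to 0.3.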
4.4.2 Performance Evaluation
We evaluate the decoding performance of our proposed CAD by comparing the bit
erasure rate of using the LDPC erasure hard decoder alone, the bit error rate of using
the CAD, and the bit error rate of applying the modified LDPC error soft-decision
decoder on top of the CAD. The test inputs include 3 self-collected paragraphs and
24 paragraphs randomly extracted from the Canterbury Corpus, the Calgary Corpus,
the Large Corpus [2], and the Large Text Compression Benchmark [1] (see Table 4.1).
All the test inputs use basic punctuation marks. In the future, we would like to
support numbers, more punctuation marks and math symbols. The functions $\pi_s$ and
$\pi_s^{-1}$ are implemented with Huffman coding. We use a random (3584, 3141) LDPC code.
Table 4.1: The benchmark used in our performance evaluation
Name        Category                             From
email       Email discussion                     Calgary
lcet10      Lecture notes                        Canterbury
alice       Novel                                Canterbury
conf-intro  Call for paper                       Self-collected
bible       The King James version of the bible  Large
asyoulike   Shakespeare play                     Canterbury
plrabn      Poetry                               Canterbury
news        Web news                             Self-collected
enwiki8     Wikipedia texts                      Large
world192    The world fact book                  Large
The decoding results of each scheme for f = 0.1 are shown in Figure 4.6. At this
bit-erasure rate, the LDPC erasure hard decoder fails to converge with high
probability. The results are averaged over 100 experiments. The use of the CAD
successfully recovers the erasures and brings the number of errors down far enough
to make the ECC decoding effective again. The completeness of the dictionaries
determines the decoding performance. For instance, in the benchmarks conf-intro,
enwiki8, plrabn and world192, where most of the words or phrases are unknown to the
dictionaries, our decoder introduces additional errors by aggressively breaking the
codewords of the unknown words with spaces.
Figure 4.6: Comparison of the correction performance of three decoders: the LDPC
erasure hard decoder, CAD only, and CAD + LDPC error soft decoder.
5. SUMMARIES AND FUTURE DIRECTIONS
This thesis was motivated by the need for effective data representation and coding
schemes that enable efficient and reliable data storage in flash memories, taking
into account their unique properties: increasing a flash cell level is easy, but
decreasing a flash cell level is very costly because it incurs a block erasure; flash
memories are usually used in low-power embedded devices, where energy consumption
is the most important factor in the system’s performance; and flash memories support
fast random access to the data. The topics we have discussed in this work
include rank modulation with multiplicity for flash memories, software techniques
for reliable embedded flash storage at low voltages, and content-assisted file decoding
for flash memories. For systems using flash memories, our proposed techniques can
extend their longevity and improve their reliability and performance. In this chapter,
we summarize our contributions and present suggestions for future work.
5.1 Summaries and Contributions
We have presented a new data representation scheme for flash memories, called
rank modulation with multiplicity. It is an extension of rank modulation with
the advantages of higher capacity and efficient programming. We have focused on
the rewriting of data based on this new scheme and have studied its basic properties,
including the rewriting cost, optimal ways to change rank modulation states, and the
expansion of rank modulation states given the rewriting cost. We have considered
both the unweighted and the weighted rewriting cost and analyzed each case.
This scheme addresses both the problem of overshooting while programming
cells and the problem of memory endurance in aging devices.
The high voltage requirement of on-chip flash memories is a barrier to reducing the
total energy consumption of low-power devices. We have examined the main factors
affecting the behavior of flash memories at low voltage. Based on our observations of
flash memory behavior at low voltage, we have proposed three algorithms (in-place
writes, multiple-place writes, and RS-Berger codes) that aim to make flash memories
usable and reliable at low voltage while tolerating the resource limitations of
low-power devices. Our evaluation shows that in-place writes can save 34% of energy
consumption for a sensing workload on the MSP430 microcontroller. Our storage
techniques enable battery-powered devices to require fewer or smaller batteries or to
become batteryless.
For reliable file storage in flash memories, we have presented the
content-assisted decoder, which makes use of the fast random access of
flash memories and the redundancy in the content of text files to
recover the erasures in the codeword. To the best of our knowledge, this is the first
decoding scheme for flash memories that is based on looking up dictionaries for
information verification and error/erasure correction. The dictionaries are built
from the statistical properties of words and phrases in the text of a given language.
We have designed dynamic programming algorithms for word segmentation and for
choosing the most likely word for each segment, so as to form the most likely word
sequence as a recovery of the original input text file. We have evaluated the new
decoding scheme on a set of benchmark files.
5.2 Future Directions
In this section, we discuss potential directions for future research.
One of our general objectives is to construct rewriting codes for flash memories
based on the rank modulation with multiplicity scheme proposed in this thesis and to
explore error-correcting codes for it. It will be interesting to study the following
topics on rank modulation with multiplicity in the future:
• Analyze the rewriting performance of rank modulation with multiplicity and
find bounds in terms of the rewriting ball size, such as a Gilbert-type lower
bound and a sphere-packing upper bound.
• Construct good rewriting codes that achieve or approach the bounds.
• Define an error model for rank modulation with multiplicity in which the number
of errors is the minimal number of adjacent transpositions required to change
a given stored permutation into an erroneous one, and study the
corresponding error-correcting codes.
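For ordinary permutations, the minimal number of adjacent transpositions between two permutations is known as the Kendall tau distance and equals the number of inversions of one permutation relative to the other. The following Python sketch computes it for the case without multiplicity only; extending it to rank modulation with multiplicity is part of the open problem above, and the function name is hypothetical.

```python
def adjacent_transposition_distance(stored, received):
    """Minimal number of adjacent swaps that turn `stored` into `received`.

    Both arguments are sequences holding the same distinct items
    (ordinary permutations; multiplicity is not handled in this sketch).
    """
    position = {item: i for i, item in enumerate(received)}
    relabeled = [position[item] for item in stored]
    # The distance equals the number of inversions in the relabeled sequence.
    return sum(1
               for i in range(len(relabeled))
               for j in range(i + 1, len(relabeled))
               if relabeled[i] > relabeled[j])


distance = adjacent_transposition_distance([1, 2, 3, 4], [2, 1, 4, 3])
```

In this example two adjacent swaps (of 1 and 2, and of 3 and 4) suffice, so the stored permutation is at distance 2 from the erroneous one.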
We have designed the RS-Berger code for reliable flash writes at voltages
below the minimum required by the specification. Although the RS-Berger code
corrects errors effectively, it consumes much energy due to the computationally
intensive Reed-Solomon decoding. Future work includes finding more
energy-efficient coding schemes to combat flash write errors caused by low voltage.
Currently, the systems cannot take full advantage of dynamic voltage scaling. The
new coding schemes should support dynamic voltage adjustment for flash writes and
consume less energy.
Another plan is to introduce benchmarks for the storage systems of low-power
devices. The standard benchmarks that are currently used to evaluate storage
systems are designed for desktop computers and are not immediately applicable to the
low-power domain.
We have provided content-assisted decoding algorithms for file erasure recovery.
The algorithms can be slightly modified and upgraded to support error correction.
It is also very desirable to extend our content-assisted file decoding method to
support more general files. Currently, the decoder supports only plain text
files with letters and basic punctuation marks, including ‘,’, ‘.’, ‘?’ and ‘!’. In the
future, we plan to support numbers, more punctuation marks, math symbols and
documents with format information. Our final goal is for the decoder to handle more
general types of files, such as pictures, music and videos. This requires solutions to
the following problems:
• Define the dictionaries and collect data for them. Currently, the
dictionaries hold the statistical properties of words and phrases in the texts. For
image, audio or video files, what should the dictionaries include, and how can
they be obtained for these different types of files?
• Construct a source encoder that compresses the original files such that we can still
make use of the redundancy left in the information bits for content verification.
How should the format information in the documents be compressed?
• Design algorithms to split image, audio or video files. Text files are
segmented by words or phrases. What is the segmentation unit for image,
audio or video files?
These all remain open questions.
REFERENCES
[1] Large Text Compression Benchmark. http://mattmahoney.net/dc/text.html,
May 2012.
[2] The Canterbury Corpus Benchmark. http://corpus.canterbury.ac.nz/index.html,
May 2012.
[3] D. Agrawal, B. Li, Z. Cao, D. Ganesan, Y. Diao, and P. J. Shenoy. Exploiting
the interplay between memory and flash storage in embedded sensor devices. In
RTCSA, pages 227–236, 2010.
[4] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci. Wireless sensor
networks: a survey. Computer Networks, 38(4):393–422, 2002.
[5] A. Barg and A. Mazumdar. Codes in permutations and error correction for rank
modulation. IEEE Transactions on Information Theory, 56(7):3158–3165, 2010.
[6] J. M. Berger. A note on error detection codes for asymmetric channels.
Information and Control, 4(1):68–73, 1961.
[7] C. Berrou, A. Glavieux, and P. Thitimajshima. Near Shannon limit
error-correcting coding and decoding: Turbo-codes. In Communications, 1993. ICC
’93 Geneva. Technical Program, Conference Record, IEEE International
Conference on, volume 2, pages 1064–1070, May 1993.
[8] M. Buettner, B. Greenstein, D. Wetherall, and J. R. Smith. Revisiting smart
dust with RFID sensor networks. In 7th ACM Workshop on Hot Topics in
Networks, Oct. 2008.
[9] P. Cappelletti, C. Golla, P. Olivo, and E. Zanoni. Flash memories. Kluwer
Academic Publishers, Norwell, MA, USA, 1999.
[10] Y. Cassuto, M. Schwartz, V. Bohossian, and J. Bruck. Codes for asymmetric
limited-magnitude errors with application to multilevel flash memories. IEEE
Transactions on Information Theory, 56(4):1582–1595, April 2010.
[11] B. Chen, X. Zhang, and Z. Wang. Error correction for multi-level NAND flash
memory using Reed-Solomon codes. In Signal Processing Systems, 2008. SiPS
2008. IEEE Workshop on, pages 94–99, Oct. 2008.
[12] S. Chen. What types of ECC should be used on flash memory? Application
note for SPANSION. http://www.spansion.com/Support, March 2011.
[13] M. Fujino and V. G. Moshnyaga. An efficient Hamming distance comparator
for low-power applications. In Electronics, Circuits and Systems, 2002. 9th
International Conference on, volume 2, pages 641–644, 2002.
[14] E. En Gad, A. Jiang, and J. Bruck. Trade-offs between instantaneous and total
capacity in multi-cell flash memories. In ISIT, pages 990–994, 2012.
[15] E. En Gad, M. Langberg, M. Schwartz, and J. Bruck. Generalized Gray codes
for local rank modulation. In ISIT, pages 874–878, 2011.
[16] E. En Gad, M. Langberg, M. Schwartz, and J. Bruck. Constant-weight Gray
codes for local rank modulation. IEEE Transactions on Information Theory,
57(11):7431–7442, 2011.
[17] E. En Gad, A. Jiang, and J. Bruck. Compressed encoding for rank modulation.
In ISIT, pages 884–888, 2011.
[18] E. Gal and S. Toledo. Algorithms and data structures for flash memories. ACM
Comput. Surv., 37(2):138–163, June 2005.
[19] R. Gallager. Low-density parity-check codes. IRE Transactions on Information
Theory, 8(1):21–28, Jan. 1962.
[20] B. Godard, J. M. Daga, L. Torres, and G. Sassatelli. Hierarchical code
correction and reliability management in embedded NOR flash memories. In Test
Symposium, 2008 13th European, pages 84–90, May 2008.
[21] A. L. Goldberger, L. A. N. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov,
R. G. Mark, J. E. Mietus, G. B. Moody, C. K. Peng, and H. E. Stanley.
PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource
for complex physiologic signals. Circulation, 101(23):e215–e220, 2000.
[22] S. Gregori, A. Cabrini, O. Khouri, and G. Torelli. On-chip error correcting
techniques for new-generation flash memories. Proceedings of the IEEE,
91(4):602–616, April 2003.
[23] L. M. Grupp, J. D. Davis, and S. Swanson. The bleak future of NAND flash
memory. In Proceedings of the 10th USENIX conference on File and Storage
Technologies, FAST’12, pages 2–2, Berkeley, CA, USA, 2012. USENIX
Association.
[24] Texas Instruments Incorporated. MSP430 ultra-low power microcontrollers.
http://www.ti.com/msp430, May 2010.
[25] A. Jiang. On the generalization of error-correcting WOM codes. In ISIT, pages
1391–1395, June 2007.
[26] A. Jiang and J. Bruck. Data Storage. In-Tech Publisher, New York, USA, 2010.
[27] A. Jiang, R. Mateescu, M. Schwartz, and J. Bruck. Rank modulation for flash
memories. In ISIT, pages 1731–1735, 2008.
[28] A. Jiang, M. Schwartz, and J. Bruck. Error-correcting codes for rank modula-
tion. In ISIT, pages 1736–1740, 2008.
[29] A. Jiang, M. Schwartz, and J. Bruck. Correcting charge-constrained errors
in the rank-modulation scheme. IEEE Transactions on Information Theory,
56(5):2112–2120, 2010.
[30] A. Jiang and Y. Wang. Rank modulation with multiplicity. In Proceedings of
IEEE Workshop on Application of Communication Theory to Emerging Memory
Technologies (ACTEMT), pages 1928–1932, 2010.
[31] T. Kløve. Lower bounds on the size of spheres of permutations under the
Chebyshev distance. Des. Codes Cryptography, 59(1-3):183–191, 2011.
[32] T. Kløve, T. Lin, S. Tsai, and W. Tzeng. Permutation arrays under the
Chebyshev distance. IEEE Transactions on Information Theory, 56(6):2611–2617,
2010.
[33] C. Lee, T. Lin, M. Shieh, S. Tsai, and H. Wu. Decoding permutation arrays
with ternary vectors. Des. Codes Cryptography, 61(1):1–9, 2011.
[34] Y. Li, Y. Wang, A. Jiang, and J. Bruck. Content-assisted file decoding for non-
volatile memories. In Proceedings of the 46th Asilomar Conference on Signals,
Systems and Computers, Pacific Grove, CA, USA, 2012.
[35] S. Lin and D. J. Costello Jr. Error Control Coding - Fundamentals and Ap-
plications. Prentice Hall computer applications in electrical engineering series.
Prentice Hall, Upper Saddle River, NJ, USA, 1983.
[36] D. J. C. MacKay and R. M. Neal. Near Shannon limit performance of low density
parity check codes. Electronics Letters, 32(18):1645, Aug. 1996.
[37] A. M. Mainwaring, J. Polastre, and R. Szewczyk. Wireless Sensor Networks for
Habitat Monitoring. In Mobile Computing and Networking, pages 88–97, 2002.
[38] D. Malan, T. Fulford-Jones, M. Welsh, and S. Moulton. CodeBlue: An Ad Hoc
Sensor Network Infrastructure for Emergency Medical Care. In Wearable and
Implantable Body Sensor Networks, 2004.
[39] G. Mathur, P. Desnoyers, D. Ganesan, and P. J. Shenoy. Ultra-low power data
storage for sensor networks. In IPSN, pages 374–381, 2006.
[40] Microchip. 32-bit PIC MCUs. http://www.microchip.com/en_US/family/pic32,
June 2010.
[41] V. Papirla and C. Chakrabarti. Energy-aware error control coding for flash
memories. In Proceedings of the 46th Annual Design Automation Conference,
DAC ’09, pages 658–663, New York, NY, USA, 2009.
[42] P. Pavan, R. Bez, P. Olivo, and E. Zanoni. Flash memory cells - an overview.
Proceedings of the IEEE, 85(8):1248–1271, 1997.
[43] J. Polastre, R. Szewczyk, and D. Culler. Telos: enabling ultra-low power wireless
research. In Information Processing in Sensor Networks, Fourth International
Symposium on, pages 364–369, April 2005.
[44] I. S. Reed and G. Solomon. Polynomial codes over certain finite fields. Journal
of the Society for Industrial and Applied Mathematics, 8(2):300–304, 1960.
[45] R. L. Rivest and A. Shamir. How to Reuse a “Write-Once” Memory. Information
and Control, 55:1–19, 1982.
[46] M. Salajegheh, S. Clark, B. Ransford, K. Fu, and A. Juels. CCCP: secure
remote storage for computational RFIDs. In Proceedings of the 18th conference on
USENIX security symposium, SSYM’09, pages 215–230, Berkeley, CA, USA,
2009. USENIX Association.
[47] M. Salajegheh, Y. Wang, K. Fu, A. Jiang, and E. Learned-Miller. Exploiting
half-wits: smarter storage for low-power devices. In Proceedings of the
9th USENIX conference on File and storage technologies, FAST’11, pages 4–4,
Berkeley, CA, USA, 2011. USENIX Association.
[48] A. P. Sample, D. J. Yeager, P. S. Powledge, A. V. Mamishev, and J. R. Smith.
Design of an RFID-based battery-free programmable sensing platform. IEEE
Transactions on Instrumentation and Measurement, 57(11):2608–2615, Nov.
2008.
[49] M. Schwartz. Constant-weight Gray codes for local rank modulation. In ISIT,
pages 869–873, 2010.
[50] M. Schwartz and I. Tamo. Optimal permutation anticodes with the infinity norm
via permanents of (0, 1)-matrices. J. Comb. Theory, Ser. A, 118(6):1761–1774,
2011.
[51] M. Shieh and S. Tsai. Decoding frequency permutation arrays under Chebyshev
distance. IEEE Transactions on Information Theory, 56(11):5730–5737, 2010.
[52] V. Shnayder, B. R. Chen, K. Lorincz, T. R. F. F. Jones, and M. Welsh. Sensor
networks for medical care. In Proceedings of the 3rd international conference
on Embedded networked sensor systems, SenSys ’05, pages 314–314, New York,
NY, USA, 2005.
[53] Atmel AVR Solutions. ATmega128L. http://www.atmel.com/atmel/acrobat/
doc2467.pdf, July 2010.
[54] I. Tamo and M. Schwartz. Correcting limited-magnitude errors in the rank-
modulation scheme. IEEE Transactions on Information Theory, 56(6):2551–
2560, 2010.
[55] S. Tsai and M. Shieh. Decoding frequency permutation arrays under infinite
norm. In ISIT, pages 2713–2717, 2009.
[56] A. Viterbi. Error bounds for convolutional codes and an asymptotically optimum
decoding algorithm. IEEE Transactions on Information Theory, 13(2):260–269,
April 1967.
[57] Z. Wang, A. Jiang, and J. Bruck. On the capacity of bounded rank modulation
for flash memories. In ISIT, pages 1234–1238, 2009.
[58] D. Zeinalipour-Yazti, S. Lin, V. Kalogeraki, D. Gunopulos, and W. A. Najjar.
Microhash: An efficient index structure for flash-based sensor devices. In FAST,
pages 31–44, 2005.
[59] G. Zemor and G. D. Cohen. Error-correcting WOM-codes. IEEE Transactions
on Information Theory, 37(3):730–734, May 1991.
[60] H. Zhou, A. Jiang, and J. Bruck. Error-correcting schemes with dynamic thresh-
olds in nonvolatile memories. In ISIT, pages 2109–2113, Aug. 2011.
[61] H. Zhou, A. Jiang, and J. Bruck. Nonuniform codes for correcting asymmetric
errors. In ISIT, pages 1011–1015, Aug. 2011.
[62] H. Zhou, A. Jiang, and J. Bruck. Systematic error-correcting codes for rank
modulation. In ISIT, pages 2978–2982, 2012.
