Polar-Coded Forward Error Correction for MLC NAND Flash Memory Polar FEC
  for NAND Flash Memory by Song, Haochuan et al.
arXiv:1802.04576v1 [cs.AR] 13 Feb 2018
SCIENCE CHINA
Information Sciences
. RESEARCH PAPER .
Polar-Coded Forward Error Correction for MLC
NAND Flash Memory
Haochuan Song1,2,3, Frankie Fu4, Cloud Zeng4, Jin Sha5, Zaichen Zhang2,3,
Xiaohu You3 & Chuan Zhang1,2,3*
1Lab of Efficient Architectures for Digital-communication and Signal-processing (LEADS);
2Quantum Information Center of Southeast University;
3National Mobile Communications Research Laboratory, Southeast University, Nanjing 211189, China;
4Lite-On Technology Corporation, Guangzhou 510000, China;
5School of Electronic Science and Engineering, Nanjing University, Nanjing 210046, China
Abstract With ever-growing storage density, high speed, and low-cost data access, flash memory has
inevitably become popular. Multi-level cell (MLC) NAND flash memory, which balances data density and
memory stability well, occupies the largest share of the flash memory market. With aggressive
memory scaling, however, reliability decays sharply owing to multiple interferences. Therefore, the control
system should embed a suitable error correction code (ECC) to guarantee data integrity
and accuracy. We propose the pre-check scheme, a multi-strategy polar code scheme that strikes a
balance between a reasonable frame error rate (FER) and decoding latency. Three decoders, namely the
binary-input, quantized-soft, and pure-soft decoders, are embedded in this scheme. Since the calculation of
soft log-likelihood ratio (LLR) inputs needs multiple sensing operations and optional quantization boundaries,
a 2-bit quantized hard-decision decoder is proposed to outperform the hard-decoded LDPC bit-flipping decoder
with fewer sensing operations. We notice that polar codes have much lower computational complexity than
LDPC codes. The stepwise maximum mutual information (SMMI) scheme is also proposed to obtain
overlapped boundaries without exhaustive search. A mapping scheme using Gray code is employed and
proved to achieve better raw error performance than the alternatives. Hardware architectures are
also given in this paper.
Keywords Polar coding, non-volatile memory, error correcting code, NAND, flash memory
Citation Song H, Fu F, Zeng C, Sha J, Zhang Z, You X, Zhang C. Polar FEC for NAND flash memory. Sci
China Inf Sci, for review
1 Introduction
Nowadays, ever-developing digital technologies enable us to achieve extremely high communication
speed. However, the traditional hard disk drive (HDD) can no longer meet the throughput and latency
requirements of most state-of-the-art application scenarios. To this end, NAND flash memory, which
offers lower access time, higher compactness, and less noise, has become increasingly popular in the
storage market [1, 2].
The past decade has witnessed a steady fall in the price of flash memory, and further price drops are
expected in the future [3,4]. This trend has enabled the solid state drive (SSD), which is mainly based on
NAND flash memory, to occupy a large share of both business and consumer markets.
*Corresponding author (email: chzhang@seu.edu.cn)
Song H, et al. Sci China Inf Sci 2
1.1 Challenges and motivation
As the required storage density increases, most NAND flash designs consider storing 4 bits in a single cell
[5–8], which results in worse raw error performance. Therefore, powerful forward error correction (FEC)
methods are required, and a large body of research on conventional error correction code (ECC) schemes
for NAND flash memory has emerged [9–13]. Recently, low-density parity-check (LDPC) codes have been
considered. To balance performance and complexity, a hybrid scheme combining a hard decoder and a soft
decoder is usually employed. However, the accepted soft decoders such as min-sum and belief-propagation
(BP) suffer from high complexity. Identifying an alternative code might serve as a solution.
Recently, polar codes [14] have shown capacity-achieving performance and reasonable complexity [15,
16]. Besides their good performance over binary-input discrete memoryless channels (B-DMCs), the
encoding and decoding complexity of an $N$-bit polar code is as low as $O(N \log N)$, which is much lower
than that of LDPC codes. Consequently, polar codes have been selected as the control channel code for the
enhanced mobile broadband (eMBB) scenario by 3GPP [17]. Inspired by the limited existing literature [18],
this paper proposes an efficient polar-coded forward error correction scheme for multi-level cell (MLC)
NAND flash memory.
1.2 Contributions
To balance performance and delay, this paper proposes a pre-check scheme based on polar codes for
MLC NAND flash. Our main contributions are:
• We propose the pre-check scheme to arrange pure-soft, quantized-soft, and binary-input polar
decoders in different life stages of the SSD.
• We prove that the polar code is a balanced code for which each codeword contains an equal
number of zero and one bits.
• We propose a well-designed hard-decision binary-input polar decoder. This decoder directly employs
the 1-bit hard results returned from the voltage detector and utilizes a single XOR gate to calculate
log-likelihood ratios (LLRs).
• We compare the complexities of the binary-input SC polar decoder, the SC polar decoder, the binary-input
bit-flipping LDPC decoder, and the layered BP decoder. Results show that the binary-input SC polar decoder
has the lowest complexity given a target error performance. Besides, it also performs better than the
traditional hard-decision bit-flipping LDPC decoder.
• We propose a new quantized-soft polar decoder with a refined boundary-defining scheme to improve
the empirical method.
• We clarify that Gray code is the optimal scheme to map 2 bits in 1 cell.
1.3 Notations
Let $L$ and $\mathcal{L}$ designate likelihood ratio (LR) and LLR, respectively. Sets are denoted by uppercase
calligraphic letters such as $\mathcal{A}$. We indicate the probability density function (PDF) of voltage
distribution $i$ by $p^{(i)}$. The uppercase letter $P$ designates a probability accumulated from PDFs. The
entropy function is $H$.
1.4 Paper outline
The remainder of this paper is organized as follows. Section 2 reviews background of NAND flash and
polar codes. Section 3 proposes the Gray mapping scheme and pre-check scheme. Three polar decoders
are discussed in this section too. In Section 4, hardware architecture of proposed binary-input decoder is
detailed. In Section 5, performance and complexity are compared for different decoders. Finally, Section
6 concludes this paper. Proof for Gray mapping scheme and the correction of previous work [11] are
presented in Appendix.
2 Background of MLC NAND flash memory and polar codes
2.1 Modeling of NAND flash memory
Floating gate transistors constitute the NAND flash memory [1]. Programming is an operation that
injects a certain quantity of charge step by step to reach a target voltage. Unavoidably influenced by
multiple interferences, the stored voltages spread over wide ranges, which results in overlapped regions.
The voltage distribution adopted in this work originates from [20]. The Gaussian distribution is selected
for both convenience and accuracy of modeling [21].
For design purposes, each cell is initialized with 4 distributions that are well separated from each other.
However, these distributions get closer with increasing program/erase (P/E) cycles and multiple
interferences. Raw errors happen when overlapped regions exist.
2.2 Basics of polar codes
Proposed by E. Arıkan in [14], polar codes have the capability of achieving the symmetric capacity I(W )
of any given B-DMC W , so long as the code length N goes to infinity. To better understand polar codes,
LLR-based min-sum SC decoding algorithm [19] is introduced below.
For a code with parameters $(N, K, \mathcal{A}, u_{\mathcal{A}^c})$, the code length and information length are
represented by $N$ and $K$. The source vector, i.e., the input vector of the encoder, is denoted by $u_1^N$,
which consists of an information part $u_{\mathcal{A}}$ and a frozen part $u_{\mathcal{A}^c}$. Note that the
frozen bits $u_{\mathcal{A}^c}$ are usually set to 0.
The LLR-based min-sum SC decoding algorithm is defined as
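$$\mathcal{L}_N^{(2i)}\left(y_1^N, \hat{u}_1^{2i-1}\right) = (-1)^{\hat{u}_{2i-1}} \mathcal{L}_{N/2}^{(i)}\left(y_1^{N/2}, \hat{u}_{1,o}^{2i-2} \oplus \hat{u}_{1,e}^{2i-2}\right) + \mathcal{L}_{N/2}^{(i)}\left(y_{N/2+1}^{N}, \hat{u}_{1,e}^{2i-2}\right), \tag{1}$$
$$\mathcal{L}_N^{(2i-1)}\left(y_1^N, \hat{u}_1^{2i-2}\right) \simeq \operatorname{sgn}\left[\mathcal{L}_{N/2}^{(i)}\left(y_1^{N/2}, \hat{u}_{1,o}^{2i-2} \oplus \hat{u}_{1,e}^{2i-2}\right)\right] \operatorname{sgn}\left[\mathcal{L}_{N/2}^{(i)}\left(y_{N/2+1}^{N}, \hat{u}_{1,e}^{2i-2}\right)\right] \cdot \min\left[\left|\mathcal{L}_{N/2}^{(i)}\left(y_1^{N/2}, \hat{u}_{1,o}^{2i-2} \oplus \hat{u}_{1,e}^{2i-2}\right)\right|, \left|\mathcal{L}_{N/2}^{(i)}\left(y_{N/2+1}^{N}, \hat{u}_{1,e}^{2i-2}\right)\right|\right]. \tag{2}$$
The symbol $\mathcal{L}$ in (1) and (2) denotes the LLR
$$\mathcal{L}_N^{(i)}\left(y_1^N, \hat{u}_1^{i-1}\right) \triangleq \ln L_N^{(i)}\left(y_1^N, \hat{u}_1^{i-1}\right), \tag{3}$$
where $L_N^{(i)}(y_1^N, \hat{u}_1^{i-1})$ is the LR, and $\hat{u}_i$ ($i \in \mathcal{A}$) is estimated as
$$\hat{u}_i = \begin{cases} 0, & \text{if } \mathcal{L}_N^{(i)}\left(y_1^N, \hat{u}_1^{i-1}\right) > 0; \\ 1, & \text{otherwise.} \end{cases} \tag{4}$$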
The hardware architecture of this algorithm is explained in [16].
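As an illustration of how (1), (2), and (4) fit together, the following minimal Python sketch evaluates the recursion literally. It is not the architecture of [16]: it recomputes intermediate LLRs instead of storing them, and the helper names (`f_minsum`, `g_sum`, `llr`, `sc_decode`, `polar_encode`) are introduced here for illustration only. Channel LLRs are assumed to be positive when a 0 is more likely, which is the sign convention of (4).

```python
def f_minsum(a, b):
    """Min-sum update of (2): product of signs times the smaller magnitude."""
    sign = (1 if a >= 0 else -1) * (1 if b >= 0 else -1)
    return sign * min(abs(a), abs(b))


def g_sum(a, b, u):
    """Update of (1): (-1)^u * a + b, where u is the previously decided bit."""
    return (-a if u else a) + b


def llr(y, u_prev, i):
    """LLR of u_i (1-based) given channel LLRs y and the earlier decisions u_prev."""
    n = len(y)
    if n == 1:
        return y[0]
    half = n // 2
    j = (i + 1) // 2
    pairs = u_prev[: 2 * (j - 1)]
    u_odd, u_even = pairs[0::2], pairs[1::2]
    a = llr(y[:half], [o ^ e for o, e in zip(u_odd, u_even)], j)
    b = llr(y[half:], u_even, j)
    if i % 2 == 1:
        return f_minsum(a, b)
    return g_sum(a, b, u_prev[-1])


def sc_decode(y, info_set):
    """Successive cancellation: frozen bits are 0, information bits follow (4)."""
    u = []
    for i in range(1, len(y) + 1):
        if i in info_set:
            u.append(0 if llr(y, u, i) > 0 else 1)
        else:
            u.append(0)
    return u


def polar_encode(u):
    """x = u * G_N, using the odd/even split that matches the recursion above."""
    if len(u) == 1:
        return list(u)
    u_odd, u_even = u[0::2], u[1::2]
    return polar_encode([o ^ e for o, e in zip(u_odd, u_even)]) + polar_encode(u_even)
```

For example, encoding a source vector, mapping each code bit to a channel LLR of +4 (bit 0) or -4 (bit 1), and calling `sc_decode` with the same information set returns the original source vector when no errors are present.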
3 Multi-strategy ECC scheme
In this section, we first demonstrate the adopted Gray mapping scheme. Then we propose the pre-check
scheme with multi-strategy ECC and 3 corresponding polar decoders.
3.1 Gray mapping and detection
The programmed symbols for each state of MLC NAND flash memory are shown in Figure 1. Note that a
raw error happens when a state is mistaken for one of its neighboring states. Moreover, S1 and S2
differ in 2 bits under direct mapping. Hence, we should consider a mapping scheme that is capable
of reducing raw error bits. To this end, Gray mapping, with the minimum difference between adjacent states,
is the optimal choice. The proof is shown in Appendix A.
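The difference between the two mappings can be checked with a few lines of Python. This is only an illustrative count of differing bits between neighboring states, not the proof of Appendix A; the label lists follow Figure 1 (direct mapping 00, 01, 10, 11 and Gray mapping 00, 10, 11, 01 for S0 to S3), and the function names are introduced here.

```python
DIRECT = ["00", "01", "10", "11"]  # labels of S0..S3 under direct mapping
GRAY = ["00", "10", "11", "01"]    # labels of S0..S3 under Gray mapping


def hamming(a, b):
    """Number of bit positions in which two labels differ."""
    return sum(x != y for x, y in zip(a, b))


def adjacent_bit_errors(mapping):
    """Bit errors caused by mistaking each state for its right-hand neighbor."""
    return [hamming(mapping[i], mapping[i + 1]) for i in range(len(mapping) - 1)]
```

With these labels, a confusion between S1 and S2 costs 2 bits under direct mapping but only 1 bit under Gray mapping, while the other two boundaries cost 1 bit in both cases.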
Figure 1 Modeling and Gray mapping scheme. (Probability density of states S0, S1, S2, S3; direct mapping labels them 00, 01, 10, 11 and Gray mapping labels them 00, 10, 11, 01.)
Figure 2 Error correction module in SSD controller. (Blocks: polar encoder, Gray mapping, voltage sensing, pre-processing, and a polar decoder comprising binary-input, quantized-soft, and pure-soft decoders.)
Figure 3 BER results of a (1024,512) hard-decision polar decoder. (BER versus raw error rate; curves: direct mapping and Gray mapping.)
Figure 4 Flow chart of pre-check scheme. (Blocks: disturbed raw data, binary-input decoder, quantized-soft decoder, pure-soft decoder, decoding finished.)
3.2 Control system
The overall architecture of the error correction module is illustrated in Figure 2. The polar encoder encodes
the external bit stream into binary codewords. These codewords are then mapped pairwise to a certain
voltage in each cell. To recover the stored data, the detector first senses a cell several times and compares
the stored voltage to reference voltages. After that, the pre-check scheme determines which decoder should
be picked and converts the comparison results into LLRs to feed the corresponding decoder.
Figure 4 illustrates the flow of each step in the pre-check scheme. Cell state will be checked at the
beginning to determine which decoder should be picked. When cell distortion appears slight, the binary-
input decoder is chosen owing to its low decoding latency. When distortion is getting worse, soft-decision
decoders should be selected to guarantee data integrity.
3.3 Pre-check scheme
This scheme aims to select the most suitable decoder according to the cell condition, so as to meet the
demand for storage reliability.
Assume the mean values of the four states are $V$, $2V$, $3V$, and $4V$, respectively, and the standard deviation
is $\sigma$, which is identical for all distributions.
The cell state can be expressed as the set of equations
$$p^{(i)}(x) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left(-\frac{(x-(i+1)V)^2}{2\sigma^2}\right) \quad (i = 0, 1, 2, 3). \tag{5}$$
By solving
$$p^{(i)}(x) = p^{(i+1)}(x) \quad (i = 0, 1, 2), \tag{6}$$
we can obtain the intersections $[R_1, R_2, R_3]$ between the 4 distributions, which are
$$R_1 = \frac{3}{2}V, \quad R_2 = \frac{5}{2}V, \quad R_3 = \frac{7}{2}V.$$
Since the mean values are evenly spaced and the standard deviations are identical, the reference voltage $R_i$
is the midpoint between $\mu^{(i-1)}$ and $\mu^{(i)}$. A raw error occurs when the sensed voltage crosses
the reference voltage. For example, if a voltage of state $p^{(0)}$ is greater than $R_1$, it is more likely to
be considered as a voltage in $p^{(1)}$ (i.e., an error happens). Therefore, we can calculate the raw error
probability for each overlapped region by
$$P_E = \begin{cases} \int_{-\infty}^{R_i} p^{(i)}(x)\,dx = 0.1995\sqrt{2\pi}\left[1 - \operatorname{erf}\left(\frac{\sqrt{2}V}{4\sigma}\right)\right], & \text{leftmost and rightmost distributions;} \\ \int_{-\infty}^{R_i} 2p^{(i)}(x)\,dx = 0.3990\sqrt{2\pi}\left[1 - \operatorname{erf}\left(\frac{\sqrt{2}V}{4\sigma}\right)\right], & \text{middle distributions.} \end{cases} \tag{7}$$
$P_E$ is a function of the variable $\frac{\sqrt{V}}{\sigma}$, where $V$ is the distance between two adjacent
distributions and $\sigma$ is the standard deviation. In NAND flash memory, the values of $\sqrt{V}$ and $\sigma$
change over time due to voltage shifting and cell distortion. Since $P_E$ is monotonically decreasing with
$\frac{\sqrt{V}}{\sigma}$ and $\frac{\sqrt{V}}{\sigma}$ is decreasing over time (the experiment in [21] has shown
that the signal-to-noise ratio (SNR) in NAND flash memory degrades by about 0.13 dB per 1k P/E cycles), the
value of $P_E$ increases over time.
With numerical PE , we can set several thresholds to adjust the decoding scheme to satisfy performance
requirements of the system.
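A minimal Python sketch of this threshold-based selection is given below. The tail probability follows (7), where the constant $0.1995\sqrt{2\pi}$ is approximately 1/2 and the sketch uses 1/2 directly. The function names and the two threshold values are hypothetical placeholders chosen for illustration; the paper does not specify them.

```python
import math


def raw_error_probability(v, sigma, middle=False):
    """Raw error probability of (7) for equally spaced Gaussians with spacing v."""
    tail = 0.5 * (1.0 - math.erf(math.sqrt(2.0) * v / (4.0 * sigma)))
    # Middle distributions overlap with a neighbor on both sides.
    return 2.0 * tail if middle else tail


def select_decoder(p_e, soft_threshold=1e-3, pure_soft_threshold=1e-2):
    """Pick a decoder from P_E; both thresholds are illustrative assumptions."""
    if p_e < soft_threshold:
        return "binary-input"
    if p_e < pure_soft_threshold:
        return "quantized-soft"
    return "pure-soft"
```

In a controller, the thresholds would be tuned against the FER target of the system; the sketch only shows that the decision reduces to comparing one scalar with preset levels.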
3.4 Pure-soft decoder
The sensed voltage needs to be converted into digital LLR to feed the pure-soft decoder.
Given the model of NAND flash memory in Section 2, the whole voltage range can be described with
4 Gaussian distributions indicated by p(0)(x), p(1)(x), p(2)(x), and p(3)(x). To obtain the definition of
LLR in NAND flash memory, there are some basic ideas that need to be clarified.
Lemma 1.
Polar code is a balanced code for which each codeword contains an equal number of
zero and one bits.
Proof. The codeword $x_1^N$ and its $i$-th element $x_N^{(i)}$ are constructed as
$$x_1^N = u_1^N \cdot G_N, \qquad x_N^{(i)} = u_1^N G_N^{(i)},$$
where $u_1^N$ is the source information, $G_N$ is the generator matrix, and $G_N^{(i)}$ denotes the $i$-th column of $G_N$.
With the property of multiplication in $GF(2)$, whether $x_N^{(i)}$ is 0 or 1 is determined only by the number
of 1's in $u_1^N$ whose corresponding places in $G_N^{(i)}$ are 1. For example, if $N = 4$, $i = 2$, then we have
$$x_4^{(2)} = u_1^4 \cdot [0\ 0\ 1\ 1]^T = u_3 + u_4. \tag{8}$$
Since some elements in $G_N$ are 0, only a part of the elements in $u_1^N$ participate in the calculation. In the
example of (8), only $u_3$ and $u_4$ are concerned.
Assume that the number of 1's in $G_N^{(i)}$ is $G_i$. The probabilities of $x_N^{(i)}$ being 0 or 1 can be denoted by
$$P(x_N^{(i)} = 0) = C_{G_i}^{0}\left(\tfrac{1}{2}\right)^{0}\left(\tfrac{1}{2}\right)^{G_i-0} + C_{G_i}^{2}\left(\tfrac{1}{2}\right)^{2}\left(\tfrac{1}{2}\right)^{G_i-2} + \ldots;$$
$$P(x_N^{(i)} = 1) = C_{G_i}^{1}\left(\tfrac{1}{2}\right)^{1}\left(\tfrac{1}{2}\right)^{G_i-1} + C_{G_i}^{3}\left(\tfrac{1}{2}\right)^{3}\left(\tfrac{1}{2}\right)^{G_i-3} + \ldots$$
$$\because\ C_n^0 + C_n^2 + \ldots = C_n^1 + C_n^3 + \ldots = 2^{n-1} \quad \therefore\ P(x_N^{(i)} = 0) = P(x_N^{(i)} = 1) = \frac{1}{2}.$$
Lemma 1 is the foundation for LLR calculation in NAND flash memory. This a priori property guarantees
the usage of Bayes Law within LLR calculation for all the polar decoders discussed in this paper.
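The equiprobability used in Lemma 1 can be checked by brute force for small $N$. The sketch below assumes, as the proof does, that every source bit is independently 0 or 1 with probability 1/2; it enumerates all $2^N$ source vectors and counts the 1's in each codeword position. The recursive encoder and the function names are written for illustration.

```python
from itertools import product


def polar_encode(u):
    """x = u * G_N via the recursive odd/even split of the polar transform."""
    if len(u) == 1:
        return list(u)
    u_odd, u_even = u[0::2], u[1::2]
    return polar_encode([o ^ e for o, e in zip(u_odd, u_even)]) + polar_encode(u_even)


def ones_per_position(n):
    """Count, over all 2^n source vectors, how often each code bit equals 1."""
    counts = [0] * n
    for u in product((0, 1), repeat=n):
        for i, bit in enumerate(polar_encode(list(u))):
            counts[i] += bit
    return counts
```

For $N = 4$ every position is 1 in exactly 8 of the 16 codewords, which matches $P(x_N^{(i)} = 1) = 1/2$.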
Lemma 2. For any stored bit $b_i$, its LLR is defined as
$$\mathcal{L}(b_i) = \log \frac{\sum_{k \in \mathcal{O}_i} p^{(k)}(V_d)}{\sum_{k \in \mathcal{Z}_i} p^{(k)}(V_d)}, \tag{9}$$
where $p^{(k)}$ denotes the $k$-th PDF of the voltage distribution, $V_d$ denotes the sensed voltage, $\mathcal{O}_i$
contains the distributions where $b_i = 1$, and $\mathcal{Z}_i$ contains the distributions where $b_i = 0$.
Figure 5 An example for soft LLR calculation based on a specific division of $\mathcal{O}_i$ and $\mathcal{Z}_i$.
Figure 6 Non-uniform sensing operations [11].
Proof. According to the definition, the LLR of $b_i$ should be denoted by
$$\mathcal{L}(b_i) = \log \frac{p(b_i = 1 | V_d)}{p(b_i = 0 | V_d)}. \tag{10}$$
However, since the a posteriori probability $p(b_i | V_d)$ is difficult to acquire directly, we
transform (10) into the form of a likelihood function according to Bayes' theorem as
$$\mathcal{L}(b_i) = \log \frac{p(b_i = 1 | V_d)}{p(b_i = 0 | V_d)} = \log \frac{p(V_d | b_i = 1)\,p(b_i = 1)}{p(V_d | b_i = 0)\,p(b_i = 0)} = \log \frac{p(V_d | b_i = 1)}{p(V_d | b_i = 0)}, \tag{11}$$
where $\frac{p(b_i = 1)}{p(b_i = 0)} = 1$ according to Lemma 1, and $p(V_d | b_i)$ is proportional to the summation of
the PDFs for which $b_i$ takes the given value. Therefore, the LLR of $b_i$ is exactly the form in (9). Note that
$\mathcal{O}_i$ and $\mathcal{Z}_i$ differ according to the adopted mapping scheme. An example is shown in Figure 5.
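Equation (9) translates directly into code. The sketch below is an illustration under stated assumptions: the Gray labels 00, 10, 11, 01 of Figure 1 are read as MSB followed by LSB, the sets $\mathcal{O}_i$ and $\mathcal{Z}_i$ are derived from those labels, and the means and standard deviations are passed in by the caller. Function and variable names are introduced here.

```python
import math

GRAY = ["00", "10", "11", "01"]  # state k -> label, read as MSB then LSB (assumed)


def gaussian_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)


def soft_llr(vd, bit, means, sigmas, mapping=GRAY):
    """LLR of (9) for the sensed voltage vd; bit=0 selects the MSB, bit=1 the LSB."""
    ones = [k for k, label in enumerate(mapping) if label[bit] == "1"]   # set O_i
    zeros = [k for k, label in enumerate(mapping) if label[bit] == "0"]  # set Z_i
    num = sum(gaussian_pdf(vd, means[k], sigmas[k]) for k in ones)
    den = sum(gaussian_pdf(vd, means[k], sigmas[k]) for k in zeros)
    return math.log(num / den)
```

With equally spaced means and identical deviations, a voltage sensed exactly halfway between the two middle states gives an LSB LLR of zero (no information) and a positive MSB LLR, since both neighboring states carry MSB = 1.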
3.5 Quantized-soft decoder
The LLR calculation mentioned in Section 3.4 can achieve the best performance of error correction.
However, it requires an accurate value of the sensed voltage, which is unrealistic in circuits. Therefore, a
proper scheme which can balance the numerical accuracy and sensing latency is highly needed.
3.5.1 Problems in quantized-soft decoder
The main constraint is that the detector can only return a comparison result between the sensed voltage
and preset references, which we call a “hard result” and which contains only 1 bit of information.
This raises two problems. The first one is obtaining proper references (or boundaries). The definition
of overlapped regions is crucial to calculate LLRs.
Another problem is the number of sensing operations. Considering that LLR contains information
more than 1 bit, we need multiple sensing operations to convert hard results into LLR. An example is
shown in Figure 6 [11].
3.5.2 Boundaries defined by constant ratio
In our previous work [22], we adopted the boundary-defining scheme that was proposed in [10] and
expanded in [11]. In this section, we show the basic idea in [10] and the re-derived quadratic equation
set, which differs from the equations in [11].
$B_l^{(k)}$ and $B_r^{(k)}$ are the 2 boundaries restricting the $k$th region, and $R$ is a preset ratio. The relation
among $B_l^{(k)}$, $B_r^{(k)}$, and $R$ is
$$\frac{p^{(k)}(B_l^{(k)})}{p^{(k+1)}(B_l^{(k)})} = \frac{p^{(k+1)}(B_r^{(k)})}{p^{(k)}(B_r^{(k)})} = R, \tag{12}$$
where $p^{(k)}(x)$ is the $k$th voltage distribution. Under Gaussian estimation, this calculation is significantly
simplified compared with [10].
Figure 7 4-input, 7-output MLC model for MMI scheme.
Figure 8 Different references for the LSB and the MSB.
Figure 9 Quantization boundaries for the LSB in MLC.
Figure 10 Channel model for the LSB.
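Let $\sigma_k^2$ and $\mu_k$ be the variance and mean value of $p^{(k)}$; then we have
$$\begin{cases} 2\sigma_k^2\sigma_{k+1}^2 \log\left(\frac{\sigma_k}{\sigma_{k+1}} R\right) = -\sigma_{k+1}^2\left(B_l^{(k)} - \mu_k\right)^2 + \sigma_k^2\left(B_l^{(k)} - \mu_{k+1}\right)^2, \\ 2\sigma_k^2\sigma_{k+1}^2 \log\left(\frac{\sigma_{k+1}}{\sigma_k} R\right) = -\sigma_k^2\left(B_r^{(k)} - \mu_{k+1}\right)^2 + \sigma_{k+1}^2\left(B_r^{(k)} - \mu_k\right)^2. \end{cases} \tag{13}$$
Eq. (13) is derived from (12). The work of [11] does not show the derivation, and its result is
slightly wrong. We add red corrections and further derive it in Appendix B.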
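To make the first line of (13) concrete, the sketch below expands it into a quadratic $aB^2 + bB + c = 0$ in the left boundary and returns its real roots. This is an illustrative solver written for this text, not the derivation of Appendix B; the function names are assumptions, and when the two deviations are equal the quadratic term vanishes and the equation becomes linear.

```python
import math


def log_pdf(x, mu, sigma):
    """Logarithm of the Gaussian PDF, used to check a root without underflow."""
    return -math.log(sigma * math.sqrt(2.0 * math.pi)) - (x - mu) ** 2 / (2.0 * sigma ** 2)


def left_boundary_roots(mu_k, s_k, mu_k1, s_k1, ratio):
    """Real roots B of p_k(B) / p_{k+1}(B) = ratio, i.e. the first line of (13)."""
    a = s_k ** 2 - s_k1 ** 2
    b = 2.0 * (s_k1 ** 2 * mu_k - s_k ** 2 * mu_k1)
    c = (s_k ** 2 * mu_k1 ** 2 - s_k1 ** 2 * mu_k ** 2
         - 2.0 * s_k ** 2 * s_k1 ** 2 * math.log(s_k * ratio / s_k1))
    if abs(a) < 1e-15:          # equal deviations: the equation is linear
        return [-c / b]
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return []
    r = math.sqrt(disc)
    return sorted([(-b - r) / (2.0 * a), (-b + r) / (2.0 * a)])
```

When two real roots exist, the one lying between $\mu_k$ and $\mu_{k+1}$ is the boundary of interest; the right boundary follows from the second line of (13) in the same way with the roles of the two distributions exchanged.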
3.5.3 Boundaries defined by stepwise mutual information
The boundary-defining scheme of constant ratio mentioned in Section 3.5.2 is effective to locate the
overlapped regions. However, there still remains an unsolved problem that the value of R is mostly
determined by empirical evidence.
A different scheme called maximum mutual information (MMI) is proposed in [23] which aims to set
quantization boundaries that maximize the mutual information. MMI quantizes the whole voltage range
into (M + 1) regions with M sensing operations.
However, MMI is a general case instead of an optimal choice for boundary selection because the
mutual information defined in [23] is calculated for each region instead of original bits, whereas LLRs are
calculated bitwise. In this work, we calculate mutual information for the most significant bit (MSB) and
the least significant bit (LSB) separately, which we call stepwise mutual information (SMMI).
Figure 8 shows the relationship between reference voltages and mapped bits. It is obvious that the
judgement of the LSB only relates to 2 quantization boundaries q3 and q4. Similarly, q1, q2, q5, and q6
are responsible for sensing operation of the MSB. We take the LSB as an example to demonstrate the
channel and the entropy calculation under SMMI strategy.
In Figure 9, the whole range is separated into 3 quantized regions, hence this quantization model is
equivalent to a 2-input, 3-output channel model with X ∈ {0, 1} and Y ∈ {0, e, 1} given in Figure 10,
which is similar to the model of single-level cell (SLC) NAND flash memory with 2 reads in [24].
Figure 11 9 references in practical scheme.
Figure 12 Channel models for the LSB in practical scheme.
Figure 13 Channel models for the MSB in practical scheme.
According to Lemma 1, $X$ sends 0 and 1 with equal probability. Therefore, the mutual information
$I$ between $X$ and $Y$ is calculated as
$$I(X;Y) = H(Y) - H(Y|X) = H\left(\frac{p_1 + p_3}{2}, \frac{e_0 + e_1}{2}, \frac{p_2 + p_4}{2}\right) - \frac{1}{2}H(p_1, e_0, p_2) - \frac{1}{2}H(p_3, e_1, p_4). \tag{14}$$
For a settled voltage distribution, the mutual information between $X$ and $Y$ can be numerically
maximized to obtain the desired boundaries $q_3$ and $q_4$ that yield the SMMI.
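The LSB computation in (14) can be sketched as follows. Several choices here are assumptions made for illustration: the two states behind each LSB value are taken as equiprobable, $Y = 0$, $e$, $1$ correspond to the regions below $q_3$, between $q_3$ and $q_4$, and above $q_4$, and the maximization uses a coarse grid search as a stand-in, since the text only states that the mutual information is maximized numerically.

```python
import math


def cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def transition(q3, q4, states, means, sigmas):
    """P(Y=0), P(Y=e), P(Y=1) for an equiprobable mixture of the given states."""
    low = sum(cdf(q3, means[k], sigmas[k]) for k in states) / len(states)
    high = sum(1.0 - cdf(q4, means[k], sigmas[k]) for k in states) / len(states)
    return low, 1.0 - low - high, high


def lsb_mutual_information(q3, q4, means, sigmas):
    """I(X;Y) of (14): X=0 covers states 0 and 1, X=1 covers states 2 and 3."""
    p1, e0, p2 = transition(q3, q4, (0, 1), means, sigmas)
    p3, e1, p4 = transition(q3, q4, (2, 3), means, sigmas)
    h_y = entropy([(p1 + p3) / 2.0, (e0 + e1) / 2.0, (p2 + p4) / 2.0])
    return h_y - 0.5 * entropy([p1, e0, p2]) - 0.5 * entropy([p3, e1, p4])


def smmi_lsb_boundaries(means, sigmas, lo, hi, steps=80):
    """Grid search for (I, q3, q4) with q3 <= q4 inside [lo, hi]."""
    grid = [lo + (hi - lo) * t / steps for t in range(steps + 1)]
    return max((lsb_mutual_information(a, b, means, sigmas), a, b)
               for a in grid for b in grid if a <= b)
```

Because the pair $q_3 = q_4$ (a single hard-decision read) is part of the search grid, the returned mutual information is never lower than that of hard detection at the same grid point.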
3.5.4 Practical SMMI boundary calculation
In the MMI example shown above, a 4 input, 7 output MLC model shown in Figure 7 was adopted for
illustration purposes. However, there are at least 3 sensing operations in 1 overlapped region in a practical
control system as demonstrated in Figure 11, where the intersections of two distributions in the middle
are called “hard-decision boundaries” and {qi, i = 1, 2, ...6} mentioned before are called “soft-decision
boundaries”. Channel models for the LSB and the MSB in this scheme are shown in Figure 12 and
Figure 13.
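The mutual information for the LSB in this case is calculated as
$$\begin{aligned} I(X;Y) &= H(Y) - H(Y|X) \\ &= H\left(\frac{p_{00} + p_{10}}{2}, \frac{e_{01} + e_{11}}{2}, \frac{e_{02} + e_{12}}{2}, \frac{p_{01} + p_{11}}{2}\right) - \frac{1}{2}H(p_{00}, e_{01}, e_{02}, p_{01}) - \frac{1}{2}H(p_{10}, e_{11}, e_{12}, p_{11}), \end{aligned} \tag{15}$$
and the mutual information for the MSB is calculated as
$$\begin{aligned} I(X;Y) &= H(Y) - H(Y|X) \\ &= H\left(\frac{p_{00} + p_{10}}{2}, \frac{e_{01} + e_{11}}{2}, \frac{e_{02} + e_{12}}{2}, \frac{e_{03} + e_{13}}{2}, \frac{e_{04} + e_{14}}{2}, \frac{p_{01} + p_{11}}{2}\right) \\ &\quad - \frac{1}{2}H(p_{00}, e_{01}, e_{02}, e_{03}, e_{04}, p_{01}) - \frac{1}{2}H(p_{10}, e_{11}, e_{12}, e_{13}, e_{14}, p_{11}). \end{aligned} \tag{16}$$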
3.5.5 LLR calculation
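According to (9), quantized LLRs are calculated as follows:
$$\mathcal{L}_i^{\mathrm{LSB}} = \log \frac{\int_{R_i} \left[p^{(2)}(x) + p^{(3)}(x)\right] dx}{\int_{R_i} \left[p^{(0)}(x) + p^{(1)}(x)\right] dx}, \qquad \mathcal{L}_i^{\mathrm{MSB}} = \log \frac{\int_{R_i} \left[p^{(1)}(x) + p^{(2)}(x)\right] dx}{\int_{R_i} \left[p^{(0)}(x) + p^{(3)}(x)\right] dx}. \tag{17}$$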
Figure 14 Detection in binary-input decoder. (States $p^{(0)}$ to $p^{(3)}$ carry MSB/LSB labels 0/0, 1/0, 1/1, 0/1; references $V_0$, $V_1$, $V_2$; the voltage is compared to $V_1$ first to decide the LSB.)
$\mathcal{L}_i^{\mathrm{LSB}}$ and $\mathcal{L}_i^{\mathrm{MSB}}$ designate the LLRs of the LSB and the MSB for the
quantization region $R_i$. We take the LSB as an example to further explain (17).
Under Gray mapping scheme in Section 3.1 (illustrated in Figure 14), p(2)(x) and p(3)(x) are 2 dis-
tributions where LSB= 1. Meanwhile, p(0)(x) and p(1)(x) are distributions where LSB= 0. Under this
condition, the numerator in (17) which contains the integral with respect to x of PDF (p(2)(x) + p(3)(x))
over the interval Ri represents the probability for LSB= 1. In this way, the denominator is the probability
where LSB= 0.
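Under Gaussian estimation, the desired LLRs can be easily calculated with the Q-function as
$$\mathcal{L}_i^{\mathrm{LSB}} = \log \frac{\sum_{j=2,3} \left[Q\left(\frac{q_r - \mu_j}{\sigma_j}\right) - Q\left(\frac{q_l - \mu_j}{\sigma_j}\right)\right]}{\sum_{k=0,1} \left[Q\left(\frac{q_r - \mu_k}{\sigma_k}\right) - Q\left(\frac{q_l - \mu_k}{\sigma_k}\right)\right]}, \qquad \mathcal{L}_i^{\mathrm{MSB}} = \log \frac{\sum_{j=1,2} \left[Q\left(\frac{q_r - \mu_j}{\sigma_j}\right) - Q\left(\frac{q_l - \mu_j}{\sigma_j}\right)\right]}{\sum_{k=0,3} \left[Q\left(\frac{q_r - \mu_k}{\sigma_k}\right) - Q\left(\frac{q_l - \mu_k}{\sigma_k}\right)\right]}, \tag{18}$$
where $q_l$ and $q_r$ denote the left and right boundaries of the quantization region $R_i$.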
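The Q-function form of (18) can be evaluated as in the following sketch. The region probability is written here as $Q(\cdot)$ at the left boundary minus $Q(\cdot)$ at the right boundary so that each term is positive; this flips the sign of numerator and denominator together and leaves the ratio in (18) unchanged. The state sets are passed explicitly, and all names are chosen for illustration.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def region_prob(ql, qr, mu, sigma):
    """Probability that a Gaussian(mu, sigma) voltage falls in [ql, qr]."""
    return q_func((ql - mu) / sigma) - q_func((qr - mu) / sigma)


def quantized_llr(ql, qr, means, sigmas, ones, zeros):
    """LLR of one quantization region; ones/zeros list the states with bit 1/0."""
    num = sum(region_prob(ql, qr, means[k], sigmas[k]) for k in ones)
    den = sum(region_prob(ql, qr, means[k], sigmas[k]) for k in zeros)
    return math.log(num / den)
```

For the LSB the call uses `ones=(2, 3)` and `zeros=(0, 1)`, and for the MSB `ones=(1, 2)` and `zeros=(0, 3)`, matching the index sets in (18).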
3.6 Binary-input decoder
A sensing strategy is shown in Figure 14. Three reference voltages, denoted by $V_0$, $V_1$, and $V_2$,
separate the 4 voltage distributions. The detector first compares the current voltage with $V_1$ to decide the LSB
and then with $V_0$ or $V_2$ to decide the MSB. A detailed description can be found in [22].
According to (4), uˆ is judged by the sign bit of LLR. Therefore, hard results can be fully utilized since
they can represent the sign bit of LLR. In other words, they can be transformed into a special form of
quantized LLR consisting of only a sign bit, for which it is called “binary-input decoder”.
Magnitude of LLR is not concerned in this scenario and only sign bits will participate in the subsequent
calculation, which makes it possible to apply simple bit operations in hardware without adder-subtractors
in traditional processing element (PE) design [15]. This design is hardware-friendly and will be further
discussed in Section 4.
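The two-step detection and the sign-only LLR can be sketched as follows. The bit assignment assumes the Gray labels of Figure 14 (MSB/LSB of 0/0, 1/0, 1/1, 0/1 for increasing voltage), and the mapping of bit 0 to +1 and bit 1 to -1 follows the decision rule (4). The function names are introduced for this example.

```python
def detect_bits(v, v0, v1, v2):
    """Two comparisons per cell: V1 decides the LSB, then V0 or V2 decides the MSB."""
    lsb = 1 if v > v1 else 0
    if lsb == 0:
        msb = 1 if v > v0 else 0   # below V1: state 0 (MSB 0) or state 1 (MSB 1)
    else:
        msb = 0 if v > v2 else 1   # above V1: state 2 (MSB 1) or state 3 (MSB 0)
    return msb, lsb


def sign_llr(bit):
    """Sign-only LLR fed to the binary-input decoder: +1 for bit 0, -1 for bit 1."""
    return 1 if bit == 0 else -1
```

Each cell therefore needs only two sensing operations, and the decoder input is a stream of ±1 values with no magnitude information.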
4 Architecture of proposed binary-input decoders
4.1 Two’s complement analysis
According to (1), Type I PE will result in 0 if LLRs are quantized to ±1. In other words, data transferred
between entities in different levels are not completely in binary form and hence can not be represented
by a single bit. Therefore, 2-bit 2’s complement is adopted for simplicity of logical functions and demand
of indicating 3 possible LLRs {0,±1}.
4.2 Input and output analysis
4.2.1 Type I PE
According to [15], universal Type I PE based on min-sum SC algorithm is a series of half or full adder-
subtractors. Calculation of LLRs in (1) is significantly simplified under 2-bit quantization.
Table 1 Results of Type I PE.
u    X    Y    Z
0   -1   -1   -2 → -1
0   -1    0   -1
0   -1    1    0
0    0   -1   -1
0    0    0    0
0    0    1    1
0    1   -1    0
0    1    0    1
0    1    1    2 → 1
1   -1   -1    0
1   -1    0    1
1   -1    1    2 → 1
1    0   -1   -1
1    0    0    0
1    0    1    1
1    1   -1   -2 → -1
1    1    0   -1
1    1    1    0
Table 2 Corresponding 2’s complement.
u    X    Y    Z
0   11   11   11
0   11   00   11
0   11   01   00
0   00   11   11
0   00   00   00
0   00   01   01
0   01   11   00
0   01   00   01
0   01   01   01
1   11   11   00
1   11   00   01
1   11   01   01
1   00   11   11
1   00   00   00
1   00   01   01
1   01   11   11
1   01   00   11
1   01   01   00
Unlike the universal Type I PE calculation with arbitrary inputs, the binary PE has a limited input set
$\mathcal{I} = \{-1, 0, +1\}$, which exhaustively lists all possible results. Suppose $X$ and $Y$ are two 2-bit operands,
$u$ is the last decoded bit, which chooses the calculation pattern, and $Z$ is the output. The mathematical
function of Type I PE is
$$Z = \begin{cases} X + Y, & u = 0, \\ -X + Y, & u = 1. \end{cases} \tag{19}$$
Note that the results of (19) may be ±2, which are saturated to ±1 for simplicity of calculation.
All possible results are listed in Table 1. By transforming Table 1 into 2’s complement, as shown in
Table 2, we can focus directly on the input and output rather than on intermediate values such as ±1 or 0.
In particular, we can separate the MSB and the LSB of the output $Z$ and treat this PE as a combinational
logic circuit with a 5-bit input $(u, X_M, X_L, Y_M, Y_L)$ and a 2-bit output $(Z_M, Z_L)$. Table 2 is therefore
the truth table of this logic circuit, which enables us to build the corresponding logic functions.
4.2.2 Type II PE
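The architecture of Type II PE is more straightforward. With binary input, (2) can be pruned to
$$\mathcal{L}_N^{(i)} = \mathcal{L}_{N/2}^{\left(\frac{i+1}{2}\right)}\left(y_1^{N/2}, \hat{u}_{1,o}^{i-1} \oplus \hat{u}_{1,e}^{i-1}\right) \cdot \mathcal{L}_{N/2}^{\left(\frac{i+1}{2}\right)}\left(y_{N/2+1}^{N}, \hat{u}_{1,e}^{i-1}\right), \tag{20}$$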
without obtaining the minimum of 2 inputs since their absolute values have already been quantized to 1.
Considering the property of multiplication, the output will be 0 once there exists a 0 in 2 inputs.
Therefore, hardware architecture design can be simplified by independently considering inputs ±1. Note
that both 2’s complements of ±1 have the same LSB as 1 and the outputs can only be ±1, which means
the LSB will constantly be 1. Therefore, we can extract the MSB to analyze the input and output (I/O).
I/O analysis and the corresponding 2’s complements have been shown in Table 3 and 4 by adopting the
method mentioned in Section 4.2.1.
Table 3 Results of Type II PE.
 X    Y    Z
-1   -1    1
-1    1   -1
 1    1    1
 1   -1   -1
Table 4 2’s complement of the MSB.
XM   YM   ZM
 1    1    0
 1    0    1
 0    0    0
 0    1    1
Figure 15 Proposed architecture of binary Type I PE.
Figure 16 Proposed architecture of binary Type II PE.
We can conclude from Table 4 that the calculation of the MSB of Type II PE using 2’s complement
is equivalent to an XOR operation. Therefore, Type II PE can be pruned to an XOR operation on the MSB and
a fixed 1 in the LSB.
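The XOR reduction can be checked against the product in (20) for all inputs in $\{-1, 0, +1\}$. In the sketch below, the gating that forces the output to 0 when either input is 0 is one possible realization written for this example; it is not taken from Figure 16, and the function names are illustrative.

```python
ENC = {-1: (1, 1), 0: (0, 0), 1: (0, 1)}  # value -> (MSB, LSB) in 2-bit 2's complement
DEC = {bits: value for value, bits in ENC.items()}


def type2_reference(x, y):
    """Product of two LLRs restricted to {-1, 0, +1}, as in (20)."""
    return x * y


def type2_gates(xm, xl, ym, yl):
    """MSB by XOR, LSB fixed to 1, both gated to 0 when an input is 0."""
    nonzero = xl & yl              # the LSB is 1 exactly for the values +1 and -1
    return (xm ^ ym) & nonzero, nonzero
```

For the four nonzero input pairs the MSB output reproduces Table 4, and any pair containing a 0 yields the code 00.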
4.3 Design of binary PEs
4.3.1 Design of binary Type I PE
Binary Type I PE can be treated as a combinational logic circuit based on the analysis in Table 2.
In this part, variable settings in Section 4.2.1 are adopted and therefore X and Y are two binary
input operands, the last-decoded bit u is a selection bit and the output is represented by Z. With 2-bit
quantization for X,Y and Z, binary Type I PE consists of 5 inputs (u,XM , XL, YM , YL) and 2 outputs
(ZM , ZL). The logical functions are listed as follows:
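• $u = 0$
$$\begin{aligned} Z_M &= X_L' Y_M + X_M Y_L' + X_M Y_M, \\ Z_L &= X_L' Y_L + X_M' Y_M' Y_L + X_L Y_L' + X_M Y_M; \end{aligned} \tag{21}$$
• $u = 1$
$$\begin{aligned} Z_M &= X_M' X_L Y_L' + X_M' Y_M, \\ Z_L &= X_L' Y_L + X_M' Y_M + X_L Y_L' + X_M Y_M'. \end{aligned} \tag{22}$$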
The gate-level circuit diagram of binary Type I PE is depicted in Figure 15.
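A set of sum-of-products expressions for the binary Type I PE can be checked exhaustively against the saturating arithmetic of (19) and Table 2. The expressions in the sketch below were derived here from Table 2 and should be read as an assumption of this example rather than as the circuit of Figure 15; the function names are also introduced here.

```python
ENC = {-1: (1, 1), 0: (0, 0), 1: (0, 1)}  # value -> (MSB, LSB) in 2-bit 2's complement
DEC = {bits: value for value, bits in ENC.items()}


def type1_reference(u, x, y):
    """Eq. (19) with the result saturated to the range [-1, +1]."""
    z = (-x if u else x) + y
    return max(-1, min(1, z))


def type1_gates(u, xm, xl, ym, yl):
    """Two-level logic for (ZM, ZL); n(b) is the complement of bit b."""
    n = lambda b: 1 - b
    if u == 0:
        zm = (n(xl) & ym) | (xm & n(yl)) | (xm & ym)
        zl = (n(xl) & yl) | (n(xm) & n(ym) & yl) | (xl & n(yl)) | (xm & ym)
    else:
        zm = (n(xm) & xl & n(yl)) | (n(xm) & ym)
        zl = (n(xl) & yl) | (n(xm) & ym) | (xl & n(yl)) | (xm & n(ym))
    return zm, zl
```

Looping over both values of $u$ and all nine $(X, Y)$ pairs and comparing `type1_gates` with `type1_reference` covers every row of Table 2; the unused input code 10 is treated as a don't-care.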
4.3.2 Design of binary Type II PE
The core of the Type II PE design can be summarized in 3 key points based on the aforementioned I/O
analysis.
1) MSB of Type II PE’s output can be simply calculated by an XOR operation under 2-bit 2’s
complement;
2) The LSB of Type II PE’s output is fixed to 1;
3) The output will be 0 once there exists a 0 in the inputs.
Architecture of binary Type II PE is shown in Figure 16.
Figure 17 FER performance of a (8192, 7168) polar code and a (8192, 7168) QC-LDPC code. (FER versus raw error rate; curves: Polar Binary, Polar QSoft, Polar QSoft 6 sensing, LDPC Bitflipping.)
Figure 18 Comparison of decoding complexity between different algorithms. (Number of additions for SC polar, binary SC, LBP, and bit flipping.)
5 Performance assessment
In this section, we provide the error performance of different codes and discuss their complexities.
5.1 Settings of simulation
We adopt a (8192, 7168) polar code using different inputs under MLC NAND flash memory channels.
Besides, a (8192, 7168) QC-LDPC code using the bit-flipping decoding algorithm is also used for comparison.
The selection of the information length is based on [25, 26].
We adopt a 2-bit/cell MLC NAND flash memory model [20] as the simulation environment. It is
assumed that the mean value of Gaussian distribution for erase state which represents 00 is 0 volt and
the target voltages in programming states are 3.25 volt, 4.55 volt, and 6.5 volt for symbols 10, 11, and
01, respectively. Standard deviations for each state are set to 2σ, σ, σ, and 1.4σ, where σ changes over
time due to multiple interferences. Hard-decision boundaries in binary decoder are the 3 intersections
between 4 Gaussian distributions, and SMMI is applied to obtain other soft-decision boundaries.
The binary-input decoder employs 2-bit quantized LLRs. Floating-point LLRs are used in the quantized-soft
decoders. The maximum number of iterations is set to 15 in hard-decision bit-flipping LDPC decoding.
5.2 Simulation
The results are given as FER versus raw error probability, and the design of the x-axis is explained as follows.
The MLC flash memory is modeled as 4 Gaussian distributions and has 3 hard-decision boundaries. In
hard decoding, a raw error happens once the voltage in a Gaussian distribution shifts into an adjacent
distribution (i.e., crosses the left or right hard-decision boundary). Under the Gaussian distribution, the
raw error probability $P$ can be calculated by the Q-function.
In Figure 17, the binary-input polar decoder clearly outperforms the hard-decision bit-flipping LDPC
decoder. With more sensing operations, the quantized-soft polar decoder is capable of correcting
more error bits than the binary-input polar decoder, which assures the data stability of the whole system.
5.3 Complexity analysis
5.3.1 Decoding of polar code
The complexity of full size SC is N log2N , where N is the code length [14]. For LLR-based min-sum SC
decoding, the decoder complexity is:
• Type I PEs: (N log2N)/2 additions;
Song H, et al. Sci China Inf Sci 13
• Type II PEs: (N log2N)/2 comparisons/selections (equivalent of addition) and (N log2N)/2 sign-bit
multiplications (equivalent of XOR);
Overall, the decoding complexity is N log2N additions (XOR is negligible compared to addition).
For binary-input SC decoding, LLRs are quantized to ±1, which means that the comparison in Type II
PEs is no longer needed. Therefore, the overall decoding complexity is (N log2N)/2 2-bit additions and
(N log2N)/2 XOR operations.
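The two PE operation types counted above can be illustrated with the standard LLR-based min-sum SC update rules. The sketch below is an assumption about how the PE types map to these rules: the addition-only update (commonly called g) is taken to be Type I, and the compare-and-sign update (commonly called f) is taken to be Type II, which matches the operation counts listed. The function names are hypothetical.

```python
def g_update(llr_a, llr_b, partial_sum_bit):
    """Addition-only update (assumed Type I PE): b + (1 - 2u) * a."""
    return llr_b + (1 - 2 * partial_sum_bit) * llr_a


def f_minsum(llr_a, llr_b):
    """Min-sum update (assumed Type II PE): sign(a) * sign(b) * min(|a|, |b|).

    The min() is the comparison/selection; the sign product is an XOR of sign bits.
    """
    sign = -1.0 if (llr_a < 0) != (llr_b < 0) else 1.0
    return sign * min(abs(llr_a), abs(llr_b))
```

When the channel LLRs are quantized to ±1, both magnitudes entering the first-stage `f_minsum` are equal, so the `min` selection is trivial there and only the sign XOR does useful work. This is consistent with the claim that the Type II comparison can be dropped for binary-input decoding.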
5.3.2 Decoding of LDPC code
Among the various LDPC decoding algorithms, the min-sum algorithm is the most widely used [9–12].
In this section, we adopt the complexity analysis of LBP decoding with the min-sum algorithm in [27].
The code length and information length are represented by N and K, and the column and row weights
are denoted by dv and dc, respectively.
For LBP decoding, the complexity in one iteration is:
• Check node processing: Ndv + 2(N −K) additions and (2dc − 3)(N −K) + 2(N −K) comparisons
(equivalent of addition);
• Variable node processing: Ndv additions;
Overall, the decoding complexity is (N−K)(2dc+1)+2Ndv additions per iteration. According to [27],
LBP decoding converges within 15 to 20 iterations (denoted by I), and the average column weight is
d¯v = 3.9375 (when the code rate R = 0.75). Consequently, log2N is smaller than dvI whenever N is less
than 8K bytes, as is typical in storage systems: for N = 8K bytes = 65536 bits, log2N = 16, whereas
dvI is at least 3.9375 × 15 ≈ 59. Therefore, the computational complexity of SC polar decoding is much
lower than that of LDPC LBP decoding. The complexity of standard BP decoding with the min-sum
algorithm is similar to this result.
For hard-decision bit-flipping decoding, the complexity in one iteration is:
• Syndrome calculation: (N −K)dc additions and multiplication in GF (2);
• Number of unsatisfied parity checks: Edc additions where E is number of 1’s in the syndrome;
• (N − 1) comparisons (equivalent of additions) to obtain the largest number of unsatisfied parity
checks.
The complexity of bit-flipping decoding is mainly determined by the (N − 1) comparisons. Hence the
overall decoding complexity is I(N − 1) additions in the worst case. In [9], the number of iterations
of the modified gradient descent bit-flipping (MGDBF) decoder is set to 30.
5.3.3 Comparison of decoding complexity
With code length N = 8192, information length K = 7372, iteration count I = 20, column weight
dv = 4, and row weight dc = 30, the decoding complexity is compared in Figure 18. The proposed
binary-input SC decoder has the lowest complexity. Moreover, polar codes using the SC
algorithm have much lower computational complexity than traditional LDPC codes using LBP
decoding.
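The comparison can be reproduced numerically from the operation counts given in Sections 5.3.1 and 5.3.2. The sketch below simply evaluates those formulas for the stated parameters. The function names are hypothetical, and the accounting behind Figure 18 may differ in detail (for example, in how 2-bit additions and XORs are weighted).

```python
def sc_additions(n):
    """Full-size min-sum SC decoding: N * log2(N) additions (N a power of two)."""
    if n <= 0 or n & (n - 1):
        raise ValueError("code length must be a power of two")
    return n * (n.bit_length() - 1)


def binary_input_sc_operations(n):
    """Binary-input SC decoding: (2-bit additions, XOR operations)."""
    half = sc_additions(n) // 2
    return half, half


def lbp_additions(n, k, dv, dc, iterations):
    """LBP min-sum decoding: I * ((N - K)(2*dc + 1) + 2*N*dv) additions."""
    return iterations * ((n - k) * (2 * dc + 1) + 2 * n * dv)


def bit_flipping_additions(n, iterations):
    """Hard-decision bit-flipping decoding, worst case: I * (N - 1) additions."""
    return iterations * (n - 1)


if __name__ == "__main__":
    N, K, I, DV, DC = 8192, 7372, 20, 4, 30
    print("SC (min-sum):       ", sc_additions(N))
    print("SC (binary-input):  ", binary_input_sc_operations(N))
    print("LDPC LBP (min-sum): ", lbp_additions(N, K, DV, DC, I))
    print("LDPC bit-flipping:  ", bit_flipping_additions(N, I))
```

With these parameters the formulas give 106496 additions for min-sum SC, 53248 2-bit additions plus 53248 XORs for binary-input SC, 2311120 additions for LBP, and 163820 additions for bit-flipping.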
6 Conclusion
This paper demonstrates that the polar-coded scheme holds great promise for the data stability of MLC
NAND flash memory. First, the proposed multi-strategy pre-check scheme balances error performance
and decoding latency. Second, the binary-input decoder is proposed to relieve the quantization burden
of the quantized-soft decoder and to lower the computational complexity compared to LDPC codes.
Third, a new method named SMMI is proposed to calculate quantization boundaries without boundary
searching. Finally, the Gray code has been proved to be the optimal mapping scheme in our system.
Conflict of interest The authors declare that they have no conflict of interest.
References
1 S. Li, T. Zhang. Improving multi-level NAND flash memory storage reliability using concatenated BCH-TCM coding.
IEEE Trans. VLSI Syst., 2010, vol. 18, no. 10, pp. 1412–1420.
2 J. Kim, W. Sung. Low-energy error correction of NAND flash memory through soft-decision decoding. EURASIP
Journal on Advances in Signal Processing, 2012, vol. 2012.
3 L. M. Grupp, J. D. Davis, S. Swanson. The bleak future of NAND flash memory. In: Proceedings of the 10th USENIX
conference on File and Storage Technologies, 2012.
4 J. Bellorado, E. Yaakobi. Signal Processing and Coding for Non-Volatile Memories.
faculty.cse.tamu.edu/ajiang/NVMW Tutorial.eps, 2013.
5 G. Marotta, A. Macerola, A. D'Alessandro, et al. A 3bit/cell 32Gb NAND flash memory at 34nm with 6MB/s program
throughput and with dynamic 2b/cell blocks configuration mode for a program throughput increase up to 13MB/s.
In: Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2010.
6 Y. Li, S. Lee, Y. Fong, et al. A 16 Gb 3-bit per cell (X3) NAND flash memory on 56 nm technology with 8 MB/s
write rate. IEEE Journal of Solid-State Circuits, 2009, vol. 44, no. 1, pp. 195–207.
7 N. Shibata, H. Maejima, K. Isobe, et al. A 70 nm 16 Gb 16-level-cell NAND flash memory. IEEE Journal of
Solid-State Circuits, 2008, vol. 43, no. 4, pp. 929–937.
8 C. Trinh, N. Shibata, T. Nakano, et al. A 5.6 MB/s 64Gb 4b/cell NAND flash memory in 43nm CMOS. In: Solid-State
Circuits Conference-Digest of Technical Papers(ISSCC), 2009.
9 K. C. Ho, C. L. Chen, Y. C. Liao, H. C. Chang, C. Y. Lee. A 3.46 Gb/s (9141, 8224) LDPC-based ECC scheme
and on-line channel estimation for solid-state drive applications. In: Proceedings of IEEE Int. Symp. Circuits and
Systems (ISCAS), Lisbon, Portugal, 2015.
10 G. Dong, N. Xie, and T. Zhang, On the use of soft-decision error-correction codes in NAND flash memory. IEEE
Trans. Circuits Syst. I, 2011.
11 J. Kim, D.-h. Lee, W. Sung. Performance of rate 0.96 (68254, 65536) EG-LDPC code for NAND flash memory error
correction. In: Proceedings of IEEE International Conference on Communications (ICC), Ottawa, Canada, 2012.
12 Z. Cui, Z. Wang, X. Huang. Multilevel error correction scheme for MLC flash memory. In: IEEE International
Symposium on Circuits and Systems (ISCAS), 2014.
13 B. Chen, X. Zhang, Z. Wang. Error correction for multi-level NAND flash memory using Reed-Solomon codes. In:
Proceedings of IEEE Workshop on Signal Processing Systems (SiPS), 2008.
14 E. Arıkan. Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input
memoryless channels. IEEE Trans. Inf. Theory, 2009, vol. 55, no. 7, pp. 3051–3073.
15 C. Zhang, B. Yuan, K. K. Parhi. Reduced-latency SC polar decoder architectures. In: Proceedings of IEEE International
Conference on Communications (ICC), 2012, pp. 3471–3475.
16 C. Zhang, K. K. Parhi. Low-latency sequential and overlapped architectures for successive cancellation polar decoder.
IEEE Trans. Signal Process., 2013, vol. 61, no. 10, pp. 2429–2441.
17 MCC Support. Final Report of 3GPP TSG RAN WG1 #87. In: 3GPP TSG WG1 Meeting #87,
www.3gpp.org/ftp/tsg ran/WG1 RL1/TSGR1 87/Report/, 2016.
18 Y. Li, H. Alhussien, E. Haratsch, et al. A study of polar codes for MLC NAND flash memories. In: International
Conference on Computing, Networking and Communications, 2015.
19 C. Leroux, I. Tal, A. Vardy, et al. Hardware architectures for successive cancellation decoding of polar codes. In:
Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech,
2011, pp. 1665–1668.
20 G. Atwood, A. Fazio, D. Mills, B. Reaves. Intel StrataFlash memory technology overview. Intel Technology Journal,
1997.
21 Y. Cai, E. F. Haratsch, O. Mutlu, K. Mai. Error patterns in MLC NAND flash memory: Measurement, characterization,
and analysis. In: Proceedings of Conference on Design, Automation and Test in Europe, 2012, pp. 521–526.
22 H. Song, C. Zhang, S. Zhang, et al. Polar code-based error correction code scheme for NAND flash memory applications.
In: Proceedings of International Conference on Wireless Communications and Signal Processing (WCSP), 2016.
23 J. Wang, T. Courtade, H. Shankar, et al. Soft information for LDPC decoding in flash: mutual-information optimized
quantization. In: Proceedings of IEEE Global Telecommunications Conference (GLOBECOM), 2011.
24 J. Wang, G. Dong, T. Zhang, et al. Mutual-information optimized quantization for LDPC decoding of accurately
modeled flash data. arXiv:1202.1325, 2012.
25 N. Mielke, T. Marquart, N. Wu, et al. Bit error rate in NAND flash memories. In: Proceedings of IEEE International
Reliability Physics Symposium (IRPS), 2008, pp. 9–19.
26 K. Takeuchi. Novel co-design of NAND flash memory and NAND flash controller circuits for sub-30 nm low-power high-
speed solid-state drives (SSD). IEEE Journal of Solid-State Circuits, 2009, vol. 44, no. 4, pp. 1227–1234.
27 Y. Blankenship, S. Kuffner. LDPC decoding for 802.22 standard. IEEE P802.22, 2007.
28 Q. Xu, Z. Pan, N. Liu, et al. A complexity-reduced fast successive cancellation list decoder for polar codes. Science
China Information Sciences, 2018, vol. 61, no. 2, pp. 022309.
29 Z. Chen, L. Yin, Y. Pei, et al. CodeHop: Physical layer error correction and encryption with LDPC-based code
hopping. Science China Information Sciences, 2016, vol. 59, no. 10, pp. 102309.
Table A1 24 Different Schemes and Number of Changes.
Scheme Changes   Scheme Changes   Scheme Changes   Scheme Changes   Scheme Changes   Scheme Changes
DCBA   3         DABC   3         CBDA   4         BCDA   3         BADC   3         ABCD   3
DCAB   4         DACB   4         CBAD   3         BCAD   4         BACD   4         ABDC   4
DBCA   5         CDBA   4         CABD   5         BDCA   5         ACBD   5         ADBC   4
DBAC   5         CDAB   3         CADB   5         BDAC   5         ACDB   5         ADCB   3
Table A2 Example of Gray Mapping Scheme (combination ABCD).
State  A  B  C  D
MSB    0  1  1  0
LSB    0  0  1  1
Appendix A Proof for Gray code mapping scheme
Lemma 3. The Gray code achieves the best coding gain among all mapping schemes.
Proof. As mentioned in Section 3.1, most raw errors happen when a voltage is mistaken for one of its
adjacent levels. Therefore, we can focus on the overlapped regions when discussing mapping schemes. For
convenience, we use 4 column vectors to denote the 4 states of a 2-bit memory cell, namely
$$A = \begin{pmatrix} 0 \\ 0 \end{pmatrix},\quad B = \begin{pmatrix} 1 \\ 0 \end{pmatrix},\quad C = \begin{pmatrix} 1 \\ 1 \end{pmatrix},\quad D = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$
By taking all permutations of these states, we get 24 different schemes, as shown in Table A1.
Each scheme has 2 rows, and whenever a bit differs from the adjacent bit in the same row, we call it a
change. For example, the combination ABCD indicates the mapping scheme shown in Table A2; in this case,
the number of changes is 3. The counts for all schemes are listed in Table A1. We can conclude by
enumeration that the number of changes is 3 if and only if the mapping scheme is a Gray code. All other
schemes have 4 or 5 changes.
We already know that raw errors usually happen in the overlapped regions. For a 2-bit cell, there are 3
overlapped regions. Assume the raw error probabilities of these regions are $P_1$, $P_2$, and $P_3$,
respectively. Then the expected number of raw errors $N_G$ for mapping schemes using the Gray code is
$$N_G = P_1 + P_2 + P_3, \tag{A1}$$
whereas for every other alternative it is
$$N_A = \alpha P_1 + \beta P_2 + \gamma P_3 \quad (\alpha + \beta + \gamma = 4 \text{ or } 5,\ \alpha\beta\gamma \neq 0). \tag{A2}$$
Because each coefficient is at least 1 and at least one of them exceeds 1, $N_A$ is strictly larger than
$N_G$ whenever the probabilities are positive. In other words, the Gray code is the best choice among the
mapping schemes.
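The enumeration behind Table A1 can be checked mechanically. The sketch below uses the state vectors A to D defined in the proof, written as (MSB, LSB) pairs, and counts a change for every bit that differs between neighbouring states of a scheme. The function names are hypothetical.

```python
from itertools import permutations

# State vectors from the proof, written as (MSB, LSB).
STATES = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (0, 1)}


def count_changes(scheme):
    """Number of bit changes between adjacent states, summed over both rows."""
    total = 0
    for left, right in zip(scheme, scheme[1:]):
        total += sum(a != b for a, b in zip(STATES[left], STATES[right]))
    return total


def enumerate_schemes():
    """Map each of the 24 orderings of A, B, C, D to its number of changes."""
    return {"".join(p): count_changes(p) for p in permutations("ABCD")}


if __name__ == "__main__":
    for scheme, changes in sorted(enumerate_schemes().items()):
        print(scheme, changes)
```

Running the enumeration yields 8 schemes with 3 changes, 8 with 4, and 8 with 5, and the 3-change schemes are exactly those in which every pair of adjacent states differs in a single bit.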
Appendix B Calculation of boundaries in quantized-soft decoder
In Section 3.5 we have mentioned that the derivation for (13) is wrong in [11]. This section will re-derive (12).
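Since $p^{(k)}(x)$ is a Gaussian distribution, $p^{(k)}(B_l^{(k)})$ and $p^{(k+1)}(B_l^{(k)})$ are
$$p^{(k)}(B_l^{(k)}) = \frac{1}{\sigma_k\sqrt{2\pi}} \exp\left(-\frac{(B_l^{(k)} - \mu_k)^2}{2\sigma_k^2}\right),$$
$$p^{(k+1)}(B_l^{(k)}) = \frac{1}{\sigma_{k+1}\sqrt{2\pi}} \exp\left(-\frac{(B_l^{(k)} - \mu_{k+1})^2}{2\sigma_{k+1}^2}\right).$$
Therefore, the fraction on the left is expanded as
$$\frac{\sigma_k}{\sigma_{k+1}} R = \exp\left(-\frac{(B_l^{(k)} - \mu_k)^2}{2\sigma_k^2} + \frac{(B_l^{(k)} - \mu_{k+1})^2}{2\sigma_{k+1}^2}\right).$$
Taking the log of both sides of the equation, we get
$$\log\left(\frac{\sigma_k}{\sigma_{k+1}} R\right) = -\frac{(B_l^{(k)} - \mu_k)^2}{2\sigma_k^2} + \frac{(B_l^{(k)} - \mu_{k+1})^2}{2\sigma_{k+1}^2}.$$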
Multiplying both sides by $2\sigma_k^2\sigma_{k+1}^2$ to remove the denominators, we have
$$2\sigma_k^2\sigma_{k+1}^2 \log\left(\frac{\sigma_k}{\sigma_{k+1}} R\right) = -\sigma_{k+1}^2 (B_l^{(k)} - \mu_k)^2 + \sigma_k^2 (B_l^{(k)} - \mu_{k+1})^2,$$
as shown in (13).
The other fraction can be expanded in the same way.
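The final relation is a quadratic in the boundary, so it can be solved directly once R is fixed. The sketch below assumes that R denotes the density ratio p(k)/p(k+1) evaluated at the boundary, which is what the expansion above implies. The function names are hypothetical, and choosing the relevant root (normally the one lying between the two means) is left to the caller.

```python
import math


def gaussian_pdf(x, mu, std):
    """Gaussian density with mean mu and standard deviation std."""
    return math.exp(-((x - mu) ** 2) / (2.0 * std * std)) / (std * math.sqrt(2.0 * math.pi))


def boundaries_for_ratio(mu_k, std_k, mu_k1, std_k1, ratio):
    """Real solutions B of
    2*s_k^2*s_k1^2*log(s_k/s_k1 * R) = -s_k1^2*(B - mu_k)^2 + s_k^2*(B - mu_k1)^2.
    """
    var_k, var_k1 = std_k ** 2, std_k1 ** 2
    lhs = 2.0 * var_k * var_k1 * math.log(std_k / std_k1 * ratio)
    # Collect the equation as a*B^2 + b*B + c = 0.
    a = var_k - var_k1
    b = 2.0 * (var_k1 * mu_k - var_k * mu_k1)
    c = var_k * mu_k1 ** 2 - var_k1 * mu_k ** 2 - lhs
    if abs(a) < 1e-15:            # equal variances: the equation is linear
        return [-c / b]
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return []                 # no real boundary for this ratio
    root = math.sqrt(disc)
    return sorted([(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)])
```

For equal variances and R = 1 the single solution is the midpoint of the two means, as expected for the hard-decision boundary between two identical Gaussians.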
