Concatenated LDPC-TCM coding for reliable storage in multi-level flash memories by Xu, Q. et al.
Xu, Q., Gong, P. & Chen, T. (2014). Concatenated LDPC-TCM coding for reliable storage in multi-
level flash memories. 2014 9th International Symposium on Communication Systems, Networks 
and Digital Signal Processing (CSNDSP), pp. 166-170. doi: 10.1109/CSNDSP.2014.6923818 
Permanent City Research Online URL: http://openaccess.city.ac.uk/8192/
 
Concatenated LDPC-TCM Coding for Reliable
Storage in Multi-Level Flash Memories
Quan Xu, Pu Gong and Thomas M. Chen
School of Engineering and Mathematical Sciences
City University London
Northampton Square, London EC1V 0HB, United Kingdom
Email: {Quan.Xu.1, Pu.Gong.1, Tom.Chen.1}@city.ac.uk
Abstract—In this paper, we present an efficient fault-tolerant
solution that concatenates trellis coded modulation (TCM) with
an outer low-density parity-check (LDPC) code for multi-level per
cell (MLC) flash memory. Traditional flash coding systems employ
simple hard-decision-based codes, such as Bose-Chaudhuri-Hocquenghem
(BCH) codes, that can correct a fixed, specified number of errors.
By using the Bahl, Cocke, Jelinek, and Raviv (BCJR) algorithm, the
TCM decoder within the proposed design provides soft decisions,
which make it possible to use the more powerful LDPC codes.
Moreover, the error-correction performance is further improved
because TCM lowers the raw error rate of the MLC and hence relieves
the burden on the outer LDPC code. The effectiveness of concatenated
LDPC-TCM systems has been demonstrated through computer
simulations.
I. INTRODUCTION
Error correction codes (ECC) have become an important
approach to enhancing the data reliability of MLC flash memory.
Because flash systems provide only hard information to their
decoders, simple algebraic codes such as BCH codes are generally
employed in current design practice. As the storage density of
MLC flash increases, however, there is a growing need for more
advanced ECC techniques. LDPC codes are well known for their
ability to approach the capacity limit of the additive white
Gaussian noise (AWGN) channel, but they are typically decoded
with soft information. To realize the benefits of LDPC decoding
in flash memory, researchers are trying to extract soft
information from the sensing outputs of flash systems [1, 2].
Trellis coded modulation is a powerful means of achieving
coding gain in digital communication systems. Several studies
have attempted to apply TCM to flash systems. Lou and
Sundberg were among the first to use coded modulation in
multilevel memories, but they did not consider outer ECC or
sensing quantization [3]. Sun et al. demonstrated that TCM can
improve the performance of a flash coding system, but they
focused on short Hamming and convolutional codes [4].
Concatenation of TCM and BCH coding has also been proposed,
considering the design of both the coded modulation and the
outer code [5]. However, the Viterbi algorithm was chosen to
perform TCM decoding, which still results in hard decisions. In
addition, the 5-level MLC flash memory considered in that work
is not readily available in the current market.
This paper investigates concatenated TCM and LDPC codes
for flash memory that is modeled as pulse-amplitude modulation
(PAM) plus Gaussian noise. We demonstrate that, with coded
modulation, the storage reliability can be increased at the same
signal-to-noise ratio (SNR). Furthermore, by performing maximum
a posteriori probability (MAP) decoding, we obtain soft
information from the TCM demodulator that can be utilized for
LDPC decoding. Compared to flash memory with BCH coding, the
concatenated system shows a significant performance improvement.
The rest of this paper is organized as follows. Section II
summarises the background needed for the later sections. In
Section III, the proposed LDPC-TCM coding scheme is described
in detail. The theoretically achievable asymptotic coding gain
of the concatenated system is analysed in Section IV. Section V
provides simulation results demonstrating the benefits of the
proposed mechanism. Conclusions and directions for further work
are given in Section VI.
II. BACKGROUND
In this section, we briefly present the basics of NAND flash
memory, trellis coded modulation, and LDPC codes based
on related works [2, 5–9]. Readers are referred to these
works and the references therein for detailed discussions of
flash memory structures and related LDPC coding techniques.
A. NAND flash memory structure
Each NAND flash memory cell comprises a metal-oxide
semiconductor field effect transistor (MOSFET) with a floating
gate [2]. The amount of charge stored in the floating gate
during writing/programming is quantized to Ω levels
to express $\log_2 \Omega$ bits of information.
The probability density function of the threshold voltage
variation in an MLC flash memory cell is usually modeled by
a Gaussian distribution. In this work, we assume an i.i.d.
(independent and identically distributed) Gaussian threshold
voltage for each level of the memory cell [1, 3]. Therefore an
m-level flash cell is equivalent to an m-PAM communication
system with additive white Gaussian noise. As an example,
the threshold voltage distribution of a 2 bits per cell flash
memory is depicted in Fig. 1, which shows four distributions
representing the memory levels, with mean values $PV_i$ for
$i \in \{0, 1, 2, 3\}$ and the same standard deviation σ.
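[Figure 1: four Gaussian threshold voltage distributions centred at PV0, PV1, PV2 and PV3, separated by the reference voltages R1, R2 and R3; horizontal axis: threshold voltage, up to Vmax.]
Fig. 1. The approximate flash memory cell threshold voltage distribution
model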
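As a rough illustration of this channel model, the following Python sketch treats a 4-level cell as 4-PAM with Gaussian noise and reads it back by hard decision. The evenly spaced levels, the placement of the reference voltages midway between adjacent levels, the noise value 0.08 and the names `program_levels`, `noisy_read` and `read_hard` are assumptions made for illustration; they are not taken from the simulator used in this paper.

```python
import random

def program_levels(m, v_max=1.0):
    """Mean threshold voltages PV_0..PV_{m-1}, assumed evenly spaced on [0, v_max]."""
    return [v_max * i / (m - 1) for i in range(m)]

def noisy_read(level_index, levels, sigma, rng):
    """Sensed threshold voltage: programmed mean plus Gaussian variation."""
    return levels[level_index] + rng.gauss(0.0, sigma)

def read_hard(voltage, levels):
    """Hard decision: count how many reference voltages lie below the sensed value.
    References are assumed to sit midway between adjacent levels."""
    refs = [(a + b) / 2 for a, b in zip(levels, levels[1:])]
    return sum(voltage > r for r in refs)

rng = random.Random(0)              # fixed seed so the sketch is repeatable
levels = program_levels(4)          # 2 bits per cell
written = [rng.randrange(4) for _ in range(10000)]
read_back = [read_hard(noisy_read(w, levels, 0.08, rng), levels) for w in written]
raw_symbol_error_rate = sum(w != r for w, r in zip(written, read_back)) / len(written)
```

Only the level index survives the hard read; the distance of the sensed voltage from the reference voltages, which is what a soft decoder would exploit, is discarded.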
B. Low Density Parity Check Codes
LDPC codes were introduced by Gallager [6] in the early 1960s.
An LDPC code is defined as
the null space of a parity-check matrix H with the following
structural properties: (1) each row consists of ρ “ones”; (2)
each column consists of η “ones”; (3) the number of “ones”
in common between any two columns, denoted ζ, is no greater
than 1; (4) both ρ and η are small compared to the length of
the code and the number of rows in H. Since ρ and η are
small, H has a small density of “ones” and hence is a sparse
matrix.
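Properties (1) to (3) can be checked mechanically on a small matrix. The sketch below builds a toy H from an m-by-m grid, where each bit is checked once by its row and once by its column, and measures the row weight ρ, the column weight η and the largest column overlap ζ. The grid construction and the function names are hypothetical and chosen only because they satisfy the properties by design; the matrix is far too small to satisfy property (4) and is not the code used in this paper.

```python
def grid_parity_check(m):
    """Toy H with 2m rows and m*m columns: bit (i, j) of an m-by-m grid is
    checked by row-check i and column-check j (hypothetical example)."""
    H = [[0] * (m * m) for _ in range(2 * m)]
    for i in range(m):
        for j in range(m):
            H[i][i * m + j] = 1        # row-check i covers bit (i, j)
            H[m + j][i * m + j] = 1    # column-check j covers bit (i, j)
    return H

def structure(H):
    """Return (set of row weights, set of column weights, max column overlap)."""
    rows, cols = len(H), len(H[0])
    row_weights = {sum(r) for r in H}
    columns = [[H[r][c] for r in range(rows)] for c in range(cols)]
    col_weights = {sum(col) for col in columns}
    max_overlap = max(
        sum(a & b for a, b in zip(columns[p], columns[q]))
        for p in range(cols) for q in range(p + 1, cols)
    )
    return row_weights, col_weights, max_overlap

rho, eta, zeta = structure(grid_parity_check(4))
```

A single value in each weight set means the matrix is regular, and a maximum overlap of 1 is equivalent to the absence of length-4 cycles in the corresponding bipartite graph.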
LDPC codes have held the attention of coding theorists over
the past decade, not only because of their near-capacity
performance on data transmission and storage channels, but also
because their decoders can be implemented with manageable
complexity. In addition to introducing LDPC codes, Gallager
also provided a decoding algorithm that is typically near
optimal. Since then, other researchers have independently
discovered several related algorithms, albeit sometimes
for different applications. These decoding algorithms
are collectively termed “message passing” algorithms, since
their operation can be explained by the passing of messages
in a graph-based model of the LDPC code. Generally, message
passing decoders use soft reliabilities of the received bits;
quantizing the received information, or reducing it to hard
decisions, can degrade the performance of an LDPC code.
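To make the role of soft reliabilities concrete, here is a minimal sketch of one check-node update in the min-sum variant of message passing (the variant named later in the simulation setup). It is not Gallager's original algorithm and not this paper's decoder. It uses the common convention that a positive message favours bit 0, so LLRs defined as in (1) would be negated before use; the function name is an assumption.

```python
def min_sum_check_update(incoming):
    """Messages a check node sends back along each edge (min-sum rule):
    sign = product of the other edges' signs, magnitude = their minimum.
    Convention assumed here: positive message means bit 0 is more likely."""
    outgoing = []
    for k in range(len(incoming)):
        others = incoming[:k] + incoming[k + 1:]
        sign = 1.0
        for v in others:
            if v < 0:
                sign = -sign
        outgoing.append(sign * min(abs(v) for v in others))
    return outgoing

soft = min_sum_check_update([2.0, -0.5, 1.5])   # graded reliabilities
hard = min_sum_check_update([1.0, -1.0, 1.0])   # hard decisions only
```

With soft inputs the weakly received middle bit (magnitude 0.5) gets a confident correction (magnitude 1.5) while it lowers the confidence of its neighbours; with hard inputs every outgoing message has the same magnitude, so the decoder cannot tell reliable bits from unreliable ones.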
C. Trellis Coded Modulation
In digital communication, trellis coded modulation is an
attractive solution for improving band-limited communication
systems. The technique dates from the end of the 1970s, when
Ungerboeck [7] addressed the issue of bandwidth expansion
by combining coding and modulation. In his approach,
“redundancy” is provided by using an expanded signal set,
and the coding is done directly on the signal sequences. TCM
is highly efficient, i.e., it offers high coding gain with
little reduction in data rate, and is especially efficient for
multilevel modulation.
The output signal sequences of TCM systems are usually
decoded by the Viterbi algorithm (VA) [8], which computes
the most likely input sequence (i.e., performs hard-output
detection). For a concatenated LDPC-TCM system, however,
soft decisions are required. Among the approaches that
can generate soft information, the BCJR algorithm achieves
optimum soft-output performance while being well suited
to hardware implementation. Thus, we consider the BCJR
algorithm in our design.
[Figure 2: information bits b enter the outer LDPC encoder; of each group of n−1 encoded bits, n−2 bits pass uncoded and 1 bit feeds a K = 7, rate-1/2 convolutional code, giving n bits for trellis coded modulation; the symbols are stored in the 2^n levels/cell flash memory array; the sensed values y pass through demodulation and the MAP decoder, whose LLRs feed the outer LDPC decoder.]
Fig. 2. Block diagram of TCM LDPC coding system
III. CONCATENATED LDPC-TCM CODING SYSTEM
This section provides a detailed description of the proposed
concatenated LDPC-TCM coding system illustrated in Fig. 2.
The analysis and verification are presented in Sections IV
and V. First, redundancy is added to the information
bit stream b by the LDPC encoder, and the encoded bit stream
is passed to the TCM module after serial/parallel conversion.
After being processed by the TCM module, the coded data bits
are programmed into the flash memory array. The threshold
voltage of each MLC cell is sensed and quantized during
the reading process, and the quantized value y is used in
TCM demodulation and LDPC decoding.
For ease of practical implementation, we adopt the
industry-standard pragmatic approach to TCM [8] (referred to as
the pragmatic TCM in the rest of the paper). In this design,
$n-1$ encoded bits are to be stored in one memory cell, of
which $n-2$ bits are stored directly while 1 bit is fed to a
constraint-length 7 (64 states), rate 1/2 convolutional code.
The resulting $n$ bits are stored in $2^n$ levels; as a result,
the number of storage levels per cell must increase from
$2^{n-1}$ to $2^n$.
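The bit split described above can be sketched in a few lines of Python. The text specifies only the constraint length (7) and rate (1/2); the generator polynomials 171 and 133 (octal) used below are the ones commonly associated with this code and are an assumption here, as are the function names.

```python
G0, G1 = 0o171, 0o133  # assumed generator polynomials (octal), K = 7

def parity(x):
    """Parity (XOR) of the set bits of x."""
    return bin(x).count("1") % 2

def conv_encode(bits):
    """Rate-1/2, constraint-length-7 encoder: one input bit -> two coded bits."""
    state = 0  # six previous input bits (64 states), newest in the top position
    coded = []
    for u in bits:
        reg = (u << 6) | state              # current bit followed by the memory
        coded.append((parity(reg & G0), parity(reg & G1)))
        state = reg >> 1                    # shift: drop the oldest bit
    return coded

def tcm_label(uncoded_bits, coded_pair):
    """n-bit cell label: the n-2 uncoded bits followed by the 2 coded bits."""
    return tuple(uncoded_bits) + tuple(coded_pair)

impulse_response = conv_encode([1, 0, 0, 0, 0, 0, 0])
label = tcm_label((1,), impulse_response[0])   # n = 3: one uncoded bit
```

Feeding a single 1 followed by zeros reads out the two generator polynomials bit by bit, which is a convenient sanity check of the shift-register wiring.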
Let $X_l$ denote the trellis state at time $l$; the 2 coded bits
are determined by the state transition from $X_l$ to $X_{l+1}$.
In the pragmatic TCM mapping, the 2 coded bits choose a cell
voltage level within a subconstellation according to the Gray
code, and the $n-2$ uncoded bits choose the subconstellation
lexicographically; there are thus $2^{n-2}$ parallel transitions
for a given pair of coded bits, as shown in Fig. 3. For the
pragmatic TCM on 8-level MLC, for example, “111”, “110”,
“100”, “101”, “011”, “010”, “000” and “001” are mapped to
the voltage levels PV0, PV1, PV2, PV3, PV4, PV5, PV6
and PV7, respectively.
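A short sketch reproduces this labelling. The Gray order 11, 10, 00, 01 within a subconstellation is read off the 8-level example; the rule that the all-ones uncoded prefix selects the lowest subconstellation is also inferred from that example and is an assumption for other values of n. The names `GRAY_ORDER` and `level_labels` are hypothetical.

```python
GRAY_ORDER = ["11", "10", "00", "01"]  # coded-bit pairs, lowest level first

def level_labels(n):
    """Bit labels of PV_0 .. PV_{2^n - 1} for n bits per cell (n >= 2)."""
    n_uncoded = n - 2
    labels = []
    for sub in range(2 ** n_uncoded):
        # Assumed ordering: the all-ones prefix picks the lowest subconstellation.
        value = (2 ** n_uncoded - 1) - sub
        prefix = format(value, "0{}b".format(n_uncoded)) if n_uncoded > 0 else ""
        for coded in GRAY_ORDER:
            labels.append(prefix + coded)
    return labels

labels_8_level = level_labels(3)
level_of = {bits: index for index, bits in enumerate(labels_8_level)}
```

For n = 3, `labels_8_level` lists the labels of PV0 to PV7 in the order quoted in the text, and `level_of` gives the inverse lookup from a 3-bit label to its level index.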
Because the LDPC decoder operates on soft information, a
demodulation module that can provide soft decisions is required.
The Viterbi algorithm, which is generally used to perform TCM
decoding by maximum likelihood sequence estimation (MLSE),
yields only hard decisions. In this paper, the BCJR algorithm [9]
is therefore employed in the MAP decoder, which passes soft
information on each bit, in terms of log-likelihood ratios
(LLRs), to the following LDPC decoder.
[Figure 3: trellis section from state X_l at time l to state X_{l+1} at time l+1, with 2^{n-2} parallel transitions between the two states.]
Fig. 3. State transitions in pragmatic TCM
Suppose the demodulator receives $y = (y_0, y_1, \ldots, y_{L-1})$
from each page of the memory block, where $y_l$ is the quantized
voltage sensed from one memory cell, and let $c_l$ denote the
expected n-bit symbol associated with the transition from time
$l$ to time $l+1$. For each bit $c_{l,k}$, $k = 1, 2, \ldots, n$,
the LLR is defined as
$$\mathrm{LLR}(c_{l,k}) = \ln\left(\frac{\Pr(c_{l,k} = 1 \mid y)}{\Pr(c_{l,k} = 0 \mid y)}\right) \tag{1}$$
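As a noisy value $y_l$ is received from the flash channel with
AWGN of variance $\sigma^2$, the likelihood function becomes
$$p(y_l \mid X_l, X_{l+1}) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{|y_l - c_l|^2}{2\sigma^2}\right) \tag{2}$$
where $c_l$ is understood as the voltage level of the symbol
attached to the transition.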
The BCJR algorithm finds the a posteriori probabilities
using the forward and backward recursions, respectively, as:
$$\alpha(X_{l+1}) = \sum_{X_l} \alpha(X_l)\,\gamma(X_l \to X_{l+1}) \tag{3}$$
$$\beta(X_l) = \sum_{X_{l+1}} \beta(X_{l+1})\,\gamma(X_l \to X_{l+1}) \tag{4}$$
where the values of $\alpha(X_0)$ and $\beta(X_L)$ are determined
by the boundary conditions of the trellis, and
$\gamma(X_l \to X_{l+1})$ is the branch metric given by
$$\gamma(X_l \to X_{l+1}) = \begin{cases} \Pr(X_{l+1} \mid X_l)\, p(y_l \mid X_l, X_{l+1}), & \text{valid transition;} \\ 0, & \text{invalid transition.} \end{cases} \tag{5}$$
The first term in the above equation corresponds to the a priori
probability of the transition $(X_l \to X_{l+1})$, which is known
at the TCM encoder. The a posteriori probability of $c_l$ is
given, up to a normalization constant that does not depend on
$c_l$, by
$$\Pr(c_l \mid y) \propto \sum_{\Lambda_l(c_l)} \alpha(X_l)\,\gamma(X_l \to X_{l+1})\,\beta(X_{l+1}) \tag{6}$$
where $\Lambda_l(c_l)$ denotes the set of state transitions
$(X_l \to X_{l+1})$ for the given $c_l$. With the calculated
probabilities of all the symbols in the trellis, we can further
obtain the a posteriori probability of each bit and the
corresponding LLRs.
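The forward-backward computation can be sketched on a deliberately tiny trellis. The example below uses a hypothetical 2-state trellis with one input bit per step and a made-up level assignment, not the 64-state pragmatic TCM trellis; it assumes equiprobable transitions, a known start state and an unterminated trellis, and rescales the forward and backward metrics at each step (which leaves the LLRs unchanged).

```python
import math

# Hypothetical toy trellis: (state, input bit) -> stored level. Not the paper's code.
LEVELS = {(0, 0): 0.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 1.0}

def next_state(state, bit):
    """One-bit shift register: the new state is the bit just written."""
    return bit

def branch_metric(y, state, bit, sigma):
    """A priori probability 1/2 times the Gaussian likelihood of the branch level."""
    c = LEVELS[(state, bit)]
    lik = math.exp(-(y - c) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
    return 0.5 * lik

def bcjr_llrs(y, sigma):
    """Per-bit LLRs ln(Pr(bit=1|y) / Pr(bit=0|y)) by forward-backward recursion."""
    L = len(y)
    alpha = [[0.0, 0.0] for _ in range(L + 1)]
    beta = [[0.0, 0.0] for _ in range(L + 1)]
    alpha[0] = [1.0, 0.0]   # assumed: encoder starts in state 0
    beta[L] = [1.0, 1.0]    # assumed: trellis is not terminated
    for l in range(L):      # forward recursion
        for s in (0, 1):
            for b in (0, 1):
                alpha[l + 1][next_state(s, b)] += alpha[l][s] * branch_metric(y[l], s, b, sigma)
        total = sum(alpha[l + 1])
        alpha[l + 1] = [a / total for a in alpha[l + 1]]
    for l in range(L - 1, -1, -1):   # backward recursion
        for s in (0, 1):
            beta[l][s] = sum(beta[l + 1][next_state(s, b)] * branch_metric(y[l], s, b, sigma)
                             for b in (0, 1))
        total = sum(beta[l])
        beta[l] = [v / total for v in beta[l]]
    llrs = []
    for l in range(L):      # combine alpha, gamma and beta over each bit's transitions
        p = [0.0, 0.0]
        for s in (0, 1):
            for b in (0, 1):
                p[b] += alpha[l][s] * branch_metric(y[l], s, b, sigma) * beta[l + 1][next_state(s, b)]
        llrs.append(math.log(p[1] / p[0]))
    return llrs

sensed = [2.1, 2.9, 1.8, 1.2, 3.1]   # noisy reads of the levels for bits 1, 0, 1, 1, 0
llrs = bcjr_llrs(sensed, sigma=0.5)
```

The sign of each LLR gives the hard decision and its magnitude the reliability, which is the quantity a hard-output Viterbi decoder does not provide.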
IV. THEORETICAL ANALYSIS OF CODED MODULATION IN
FLASH MEMORY
It is well known that the error performance of TCM systems
in terms of SNR is governed by the free Euclidean distance,
where the SNR is defined as the ratio between the average signal
power, $E_s$, and the average noise power, $2\sigma^2$. Assuming
the mean value of the threshold voltage at the erase level (PV0)
is 0, the programming voltage at the highest level becomes
$V_{max}$ (see Fig. 1), and the peak power $E_p$ equals
$V_{max}^2$. In this work, both the TCM and non-TCM systems use
the same peak power for fair comparison.
Let M represent the number of levels in an MLC flash
memory. The minimum squared Euclidean distance (MSED),
which equals the square of the programming voltage difference
between two adjacent levels, and the mean value of each level
can be expressed as
$$d_{min}^2 = \frac{V_{max}^2}{(M-1)^2} \tag{7}$$
and
$$PV_i = \frac{V_{max}}{M-1}\, i, \quad i = 0, 1, 2, \ldots, M-1. \tag{8}$$
The average power $E_s$ is calculated as
$$E_s = \frac{1}{M} \sum_{i=0}^{M-1} (PV_i)^2. \tag{9}$$
Substituting (7) and (8) into the above equation, we obtain
$$E_s = \frac{V_{max}^2 (2M-1)}{6(M-1)} = \frac{(M-1)(2M-1)}{6}\, d_{min}^2 \tag{10}$$
With the flash channel model presented in Section II-A, the
SNR is given by
$$\mathrm{SNR(dB)} = 10 \log_{10} \frac{E_s}{2\sigma^2} \tag{11}$$
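The power and SNR definitions can be checked numerically. The sketch below compares the direct average of (9) with the closed form of (10) and inverts (11) to obtain the noise standard deviation for a target SNR; the function names and the normalization $V_{max} = 1$ are assumptions made for illustration.

```python
import math

def average_power(m, v_max=1.0):
    """E_s by direct averaging of PV_i^2 over the m levels, as in (9)."""
    return sum((v_max * i / (m - 1)) ** 2 for i in range(m)) / m

def average_power_closed_form(m, v_max=1.0):
    """E_s from the closed form in (10)."""
    return v_max ** 2 * (2 * m - 1) / (6 * (m - 1))

def sigma_from_snr_db(snr_db, m, v_max=1.0):
    """Noise standard deviation that gives the requested SNR under (11)."""
    es = average_power_closed_form(m, v_max)
    return math.sqrt(es / (2 * 10 ** (snr_db / 10)))

sigma_4_level = sigma_from_snr_db(20.0, 4)
sigma_8_level = sigma_from_snr_db(20.0, 8)
```

Because the peak power rather than the average power is held fixed, the 8-level cell has a slightly lower $E_s$ than the 4-level cell, so the same SNR corresponds to a slightly smaller σ.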
As for the TCM system, the free squared Euclidean distance
$d_{free}^2$ [8] is given by
$$d_{free}^2 = 2(2 d_{min})^2 + (d_f - 4)(d_{min})^2 = d_{min}^2 (d_f + 4) \tag{12}$$
where $d_f$ is a factor that depends on the convolutional code
used, and is equal to 10 for the constraint-length 7 (64 states)
convolutional code of the pragmatic TCM [8]. Substituting (7) into
the above equation, $d_{free}^2$ can be rewritten as
$$d_{free}^2 = \frac{14 V_{max}^2}{(M-1)^2} \tag{13}$$
The asymptotic coding gain (ACG) [7] (achieved at high
SNR) of the proposed system is computed as
$$\mathrm{ACG} = 10 \log_{10}\left(\frac{d_{free}^2\, E_{s/non\text{-}tcm}}{d_{min}^2\, E_{s/tcm}}\right) \tag{14}$$
Substituting (7), (10) and (13) into the above equation, and
noting that the values of M for the non-TCM and TCM systems
are $2^{n-1}$ and $2^n$ respectively, we obtain
$$\mathrm{ACG} = 10 \log_{10}\left(\frac{14(2^{n-1} - 1)}{2^{n+1} - 1}\right) \tag{15}$$
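The substitution can be spelled out as follows. This is a sketch of the intermediate algebra using only (7), (10), (13) and (14), with the same peak voltage $V_{max}$ in both systems and M written out for each.

```latex
\begin{align*}
\frac{d_{min}^2}{E_{s/non\text{-}tcm}}
  &= \left.\frac{6}{(M-1)(2M-1)}\right|_{M=2^{n-1}}
   = \frac{6}{(2^{n-1}-1)(2^{n}-1)}, \\
\frac{d_{free}^2}{E_{s/tcm}}
  &= \left.\frac{84}{(M-1)(2M-1)}\right|_{M=2^{n}}
   = \frac{84}{(2^{n}-1)(2^{n+1}-1)}, \\
\frac{d_{free}^2\,E_{s/non\text{-}tcm}}{d_{min}^2\,E_{s/tcm}}
  &= \frac{84}{6}\cdot\frac{(2^{n-1}-1)(2^{n}-1)}{(2^{n}-1)(2^{n+1}-1)}
   = \frac{14\,(2^{n-1}-1)}{2^{n+1}-1}.
\end{align*}
```

For n = 3 (4-level non-TCM against 8-level TCM) the ratio is 14 · 3 / 15 = 2.8, i.e. about 4.47 dB.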
Increasing the number of flash memory cell levels reduces
the Euclidean distance between adjacent levels; nevertheless,
trellis coded modulation offers a coding gain that overcomes
this disadvantage and improves performance over the non-TCM
system. In the next section, we observe this error performance
improvement as a function of the SNR derived above.
V. PERFORMANCE EVALUATION
For the purpose of comparison, we briefly present the
asymptotic coding gain achieved for M-PAM modulation [8], which
is generally used in digital communication with a model similar
to that of Fig. 1:
$$\mathrm{ACG} = 10 \log_{10}\left(4 \times \frac{M^2 - 4}{M^2 - 1} \times \frac{7}{8}\right) = 10 \log_{10}\left(\frac{7(2^{2n-1} - 2)}{2^{2n} - 1}\right) \tag{16}$$
where n is the number of bits per transmitted symbol,
corresponding to the bits per cell in MLC flash memory. Fig.
4 shows that the ACG increases with n and that, for a given
value of n, M-PAM modulation achieves a higher gain than flash
memory. This is because the average energies are normalized to
unity in the transmitter of a digital communication system,
whereas in flash memories the peak energies rather than the
average energies are normalized. In the current market, 4-level
and 8-level MLC are the prevalent flash storage media, so these
two types were chosen for the following simulations. As shown
in Fig. 4, for the TCM system on 8-level MLC, the asymptotic
coding gain is about 4.5 dB in SNR.
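The two gain expressions are easy to tabulate. The following sketch evaluates (15) and (16) for n = 2 to 10, the range plotted in Fig. 4; the function names are assumptions.

```python
import math

def acg_flash_db(n):
    """ACG of (15): peak-power-normalized MLC flash, 2^(n-1) -> 2^n levels."""
    return 10 * math.log10(14 * (2 ** (n - 1) - 1) / (2 ** (n + 1) - 1))

def acg_pam_db(n):
    """ACG of (16): average-power-normalized M-PAM with M = 2^n."""
    return 10 * math.log10(7 * (2 ** (2 * n - 1) - 2) / (2 ** (2 * n) - 1))

# (n, flash ACG in dB, M-PAM ACG in dB), rounded to two decimals
table = [(n, round(acg_flash_db(n), 2), round(acg_pam_db(n), 2)) for n in range(2, 11)]
```

Both expressions tend to 10 log10(3.5), about 5.44 dB, as n grows, with the flash curve approaching the limit more slowly; the entry for n = 3 corresponds to the 8-level MLC case quoted above.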
To evaluate the performance improvement due to the LDPC-TCM
concatenated coding of Fig. 2, we consider two flash memory
systems that have the same page length of 4K bytes but use
4-level and 8-level MLC, respectively. BCH coding and standard
Gray mapping are applied in the first system, while LDPC coding
and the pragmatic TCM mapping are used in the second. We store
the same number of randomly generated information bits in the
two memory systems and compare the bit error rate (BER) after
reading and decoding.
For the error correction codes, we first design a rate-0.927
(17664, 16384) structured LDPC code [10] whose parity-check
matrix has a triangular plus dual-diagonal form to lower the
error floor and the encoding complexity. Additionally, the
bipartite graph of the LDPC code has been constructed with the
bit-filling algorithm to be free of cycles of length 4. The
min-sum algorithm is used for LDPC decoding. For the purpose of
comparison, we also consider an [n, k, t] = [16383, 15200, 85]
binary BCH code with the same rate of 0.927.
[Figure 4: ACG (dB), from 3 to 5.5, versus n, from 2 to 10; curves: MLC flash and M-PAM modulation.]
Fig. 4. Asymptotic Coding Gain of TCM for MLC flash memories and M-PAM
modulation
Fig. 5 shows the raw BER performance of the flash systems
without ECC. Compared with the flash system using 4-level
MLC, the coding gain (CG) of the TCM flash system using 8-level
MLC is about 3.4 dB at a bit error probability of $10^{-6}$. At
very low bit error probabilities (say, less than $10^{-9}$), the
coding gain will approach the asymptotic coding gain of 4.5 dB
derived earlier. In practical flash memories, the stored
information (associated with cell threshold voltages) is usually
sensed and quantized during reading, so we assume two uniform
sensing quantization schemes in our simulations: 16 levels and
32 levels, labelled TCM-16Q and TCM-32Q, respectively. A
performance loss is observed after applying quantization to the
TCM flash system. However, the quantized system still shows a
substantial performance improvement over the raw BER of the
4-level flash system, as shown in Fig. 5.
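A uniform sensing quantizer of this kind can be sketched as follows. The sensing range [0, 1], the clipping of out-of-range voltages and the use of bin centres as output values are assumptions for illustration; the text specifies only that the quantization is uniform with 16 or 32 levels.

```python
def uniform_quantize(voltage, num_levels, v_min=0.0, v_max=1.0):
    """Map a sensed voltage to the centre of one of num_levels equal-width bins.
    The sensing range [v_min, v_max] is an assumption; values outside are clipped."""
    step = (v_max - v_min) / num_levels
    index = int((voltage - v_min) / step)
    index = max(0, min(num_levels - 1, index))   # clip to the first/last bin
    return v_min + (index + 0.5) * step

q16 = uniform_quantize(0.40, 16)   # coarser sensing, as in TCM-16Q
q32 = uniform_quantize(0.40, 32)   # finer sensing, as in TCM-32Q
```

Inside the sensing range the quantization error is at most half a bin width, so doubling the number of levels halves the worst-case error in the value passed to the demodulator.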
Fig. 6 shows the BER performance of the flash systems
with outer ECC. BCH coding is adopted in the flash system
using 4-level MLC because only hard decisions are available
there. As shown, the proposed LDPC-TCM coding provides a clear
gain in the error correction of flash systems (about 0.75
dB improvement over BCH coding at a bit error probability
of $10^{-6}$). As before, some performance loss is introduced
by the quantization.
The outer ECC of the proposed design can be replaced with
BCH coding if the TCM decoder simply outputs hard decisions.
For performance evaluation, we also compared our design with
such a BCH-TCM coding system. Fig. 7 illustrates the simulation
results of these two schemes under the same memory parameters.
The curves show that the LDPC code outperforms the BCH code
under the same TCM parameters and quantization. Additionally,
compared to the results shown in Fig. 6, BCH coding is also
observed to benefit from the use of TCM if the quantization is
not too coarse.
[Figure 5: bit error rate ($10^{-7}$ to $10^{-1}$) versus signal-to-noise ratio (15 to 22 dB); curves: 4Level-RAW, TCM-16Q, TCM-32Q, TCM-Inf.]
Fig. 5. Simulation results for flash systems without outer ECC
[Figure 6: bit error rate ($10^{-7}$ to $10^{-1}$) versus signal-to-noise ratio (15 to 20 dB); curves: 4Level-Raw, BCH-4Lev, LDPC-TCM-32Q, LDPC-TCM-Inf.]
Fig. 6. Performance comparisons of concatenated LDPC-TCM and BCH
coding systems ((17664, 16384) LDPC code and (16383, 15200) BCH code)
VI. CONCLUSIONS
An error-correction scheme of concatenated LDPC-TCM
coding for MLC flash memory is proposed. Compared to a
flash coding system that provides only hard decisions and
employs BCH codes, the results show remarkable BER performance
improvements for the system equipped with the (17664, 16384)
LDPC code and the industry-standard pragmatic TCM. We have also
derived mathematical formulations to quantitatively analyse the
asymptotic coding gain achieved in flash channels. The BCJR
algorithm applied to TCM decoding has been demonstrated to be
a way of obtaining soft decisions in place of hard decisions.
To further improve this work, better quantization schemes for
memory sensing should be considered when designing the error
correction system.
[Figure 7: bit error rate ($10^{-7}$ to $10^{-1}$) versus signal-to-noise ratio (15 to 17.5 dB); curves: BCH-TCM-16Q, LDPC-TCM-16Q, LDPC-TCM-32Q, BCH-TCM-32Q.]
Fig. 7. Performance comparisons of concatenated LDPC-TCM and BCH-TCM
coding systems
REFERENCES
[1] J. Wang, T. Courtade, H. Shankar, and R. Wesel, “Soft information for LDPC decoding in flash: mutual-information optimized quantization,” in IEEE Globecom, Dec. 2011, pp. 1–6.
[2] G. Dong, N. Xie, and T. Zhang, “On the Use of Soft-Decision Error-Correction Codes in NAND Flash Memory,” IEEE Trans. on Circuits and Systems I: Regular Papers, vol. 58, no. 2, pp. 429–439, 2011.
[3] H.-L. Lou and C.-E. Sundberg, “Coded modulation to increase storage capacity of multilevel memories,” in IEEE Globecom, 1998, pp. 3379–3384.
[4] F. Sun, S. Devarajan, K. Rose, and T. Zhang, “Multilevel flash memory on-chip error correction based on trellis coded modulation,” in 2006 IEEE Int. Symp. on Circuits and Systems (ISCAS 2006), May 2006.
[5] S. Li and T. Zhang, “Improving multi-level NAND flash memory storage reliability using concatenated BCH-TCM coding,” IEEE Trans. on VLSI Systems, vol. 18, no. 10, pp. 1412–1420, Oct. 2010.
[6] R. Gallager, “Low-density parity check codes,” IRE Trans. Information Theory, pp. 21–28, Jan. 1962.
[7] G. Ungerboeck, “Trellis-coded modulation with redundant signal sets Part I: Introduction,” IEEE Com. Mag., vol. 25, no. 2, pp. 5–11, 1987.
[8] A. Viterbi, J. Wolf, E. Zehavi, and R. Padovani, “A pragmatic approach to trellis-coded modulation,” IEEE Com. Mag., vol. 27, no. 7, pp. 11–19, July 1989.
[9] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear codes for minimizing symbol error rate,” IEEE Trans. on Info. Theory, vol. 20, no. 2, pp. 284–287, Mar. 1974.
[10] Z. He, P. Fortier, and S. Roy, “A class of irregular LDPC codes with low error floor and low encoding complexity,” IEEE Com. Letters, vol. 10, no. 5, pp. 372–374, 2006.
