ABSTRACT A protograph-based quasi-cyclic (QC) low-density parity-check (LDPC) code for multi-level cell (MLC) NAND flash memories is proposed in this paper. In this design approach, the quantized voltage signals are measured for soft decoding because the exact voltage values are unavailable. Flash memory channels are asymmetric; therefore, existing LDPC codes optimized for symmetric additive white Gaussian noise (AWGN)-like channels are not optimal for them. Mutual information (MI) between the input and output of flash memory is used to model the quantized log-likelihood ratio (LLR) messages. The base matrix of the code is constructed according to the degree sequences optimized by the modified extrinsic information transfer (EXIT) chart method for flash memories. The designed protograph-based codes have a low-complexity QC encoder structure with a readily parallelizable decoder structure. In addition, a rate-adaptive polar code design based on Bhattacharyya parameters of the memory cell bits is proposed to further improve the storage efficiency of MLC NAND flash memories. The code design takes advantage of the inherent characteristics of the MLC flash memory channel to iteratively calculate the Bhattacharyya parameter of each memory cell bit, and the punctured positions are selected by choosing the bits with higher Bhattacharyya parameters to construct rate-adaptive polar codes. The simulation results confirm the benefits of the proposed coding schemes.
I. INTRODUCTION
In order to enhance the capacity of data storage systems, high-density NAND flash memories using the multi-level cell (MLC) or triple-level cell (TLC) technique have recently been proposed [1], [2]. These techniques store 2 or 3 logical bits in a single physical memory cell and have received a lot of interest due to their higher storage capacity. Important characteristics of NAND flash memories include low bit cost and high reliability. However, the scaling of NAND flash cells and noise affect these characteristics adversely. Examples of noise sources include random telegraph noise (RTN), read/program disturb, retention noise (RN), and cell-to-cell interference (CCI) [3], [4].
The associate editor coordinating the review of this manuscript and approving it for publication was Pietro Savazzi.
Channel coding can effectively improve storage reliability and extend storage lifetime. Traditionally, Reed-Solomon (RS) codes and Bose-Chaudhuri-Hocquenghem (BCH) codes are commonly used in reliable storage [5], [6]. However, these traditional error correction codes (ECCs) are limited by their error correction capability and cannot meet the requirements for reliable data storage in high-density NAND flash memory. Low-density parity-check (LDPC) codes, which approach the Shannon limit, and the capacity-achieving polar codes are essential for NAND flash based applications because of the scaling and the MLC technique.
In [7] and [8], LDPC codes employing soft-decision decoding have been used to solve the reliability issues of data storage. LDPC codes with the best degree distributions for decoding at the different precision levels made available by multiple reads are suggested in [9]. Additionally, flash memory channels are asymmetric. LDPC codes optimized for symmetric AWGN-like channels have been used for flash applications without considering the error characteristics of flash memories. LDPC degree distributions can be optimized by the reciprocal channel approximation (RCA) based extrinsic information transfer (EXIT) function analysis. Because this solution approximates the density evolution (DE) algorithm, its computational complexity is very high compared with that of the EXIT chart. Hareedy et al. [11] propose a theoretical framework for the design and analysis of non-binary LDPC (NB-LDPC) codes over asymmetric data storage channels based on new combinatorial definitions and linear algebraic tools. This technique manipulates the edge weights in the graph representation of NB-LDPC codes. Output extrinsic log-likelihood ratio (LLR) messages cannot be approximated as a Gaussian distribution when the common EXIT chart approach is employed. This presents a major hurdle to the application of LDPC codes in flash memory channels, as a method must be devised to describe the quantized message in closed form. LDPC codes optimized for two-dimensional intersymbol interference (2-D-ISI) channels exhibit better performance than those optimized for AWGN channels, as demonstrated in [12]. There is therefore a need to investigate a low-complexity optimization approach to the design of LDPC codes, because an LDPC code degree distribution designed for a full-precision Gaussian channel may not be optimal in the quantized setting.
In addition, a new channel coding scheme, named polar codes, based on the theory of channel polarization, was proposed by Arıkan [13] in 2009. Polar codes are known to be the first linear block codes that can be rigorously proven to achieve the capacity of binary-input discrete memoryless channels (B-DMCs) with efficient construction and low encoding and decoding complexity. The core theory of polar codes is channel polarization, which entails channel combining and channel splitting [14]. After channel polarization, bit channels can be divided into noisy channels and noiseless channels. In a communication system, the noiseless sub-channels are used to transmit information bits, while the noisy sub-channels are used to transmit frozen bits. Puncturing is widely used to design rate-compatible (RC) polar codes [15]-[17]. The application of polar codes in MLC NAND flash memory has gained prominence because of their excellent error correction performance [18], [19]. Reference [18] focuses on adapting polar codes to flash pages of any size and proposes techniques to shorten polar codes in order to construct length-adapted codewords. A multi-strategy ECC scheme for MLC NAND flash memory is proposed in [19], which discusses three decoding mechanisms for polar codes in flash memory: hard decoding, quantized-soft decoding, and pure-soft decoding.
In this paper, protograph based quasi-cyclic (QC) LDPC codes applicable to MLC flash memories are proposed. The design takes advantage of the mutual information between the input and output of the flash memory to model the quantized LLR messages. Given a code rate and length, the base matrix can be determined according to the degree sequences for the specific flash channels. Furthermore, we deeply investigate the characteristics of MLC NAND flash memory channel and propose a method for rate-adaptive polar codes based on memory cell bits Bhattacharyya parameters, where the Bhattacharyya parameters of each memory cell bit are iteratively calculated according to its threshold voltage distribution, and the bits with higher Bhattacharyya parameters are selected as punctured bits to construct rate-adaptive polar codes to meet the high-efficiency storage requirements of flash memory.
The rest of this paper is organized as follows. Section II introduces the MLC NAND flash channel model. Then in Section III, two ECC schemes (protograph based QC-LDPC codes and rate-adaptive polar codes) are proposed for the MLC NAND flash channel. The proposed schemes are verified in Section IV. Finally, Section V concludes the paper.
II. MLC NAND FLASH CHANNEL MODEL
MLC allows for storing two bits in a single memory transistor, thereby providing twice as much data storage in the same area as the single-level cell (SLC) technique. However, the charge must be placed accurately in one of four charge states, each of which is associated with a two-bit data pattern. The data write process requires erasure of the flash memory cell, which is done by removing the charges in the floating gate; this sets its threshold voltage to the lowest voltage window. As the proposed LDPC code design is generic, a simplified channel model for MLC NAND is used in this study.
The threshold voltage of the erased memory cells tends to have a wide Gaussian-like distribution [20] as a result of unavoidable process variability. We use the MLC flash model in [20] and assume that the other three cell threshold voltages follow the Gaussian distribution given in equation (1), where $\mu_{XX}$ and $\delta_{XX}$ are the mean and standard deviation of the cell threshold voltage, respectively. The subscript XX in the mean and standard deviation stands for a two-bit data pattern with Gray mapping (see Fig. 1).
Equation (2) gives the relationship among the standard deviations; the means of the two inner distributions are located so as to minimize the raw BER. The maximum voltage difference between the means of the two outer distributions is defined by $v_{max} = \mu_{01} - \mu_{11}$. Fig. 1 shows the threshold voltage probability distribution of the MLC NAND flash memory channel, where $\mu_{11} = 0$, $\mu_{10} = 3.25$, $\mu_{00} = 4.55$, $\mu_{01} = 6.5$, and $\delta_{00} = \delta_{10} = 0.28$, $\delta_{01} = 0.56$, $\delta_{11} = 1.12$.
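$$p_{XX}(v) = \frac{1}{\delta_{XX}\sqrt{2\pi}} \exp\left(-\frac{(v-\mu_{XX})^2}{2\delta_{XX}^2}\right), \quad XX \in \{11, 10, 00, 01\} \tag{1}$$

$$\delta_{00} = \delta_{10} = \delta, \quad \delta_{01} = 2\delta, \quad \delta_{11} = 4\delta \tag{2}$$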
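As an illustration of this channel model, the following minimal sketch draws threshold voltages from the four Gaussian states using the means and standard deviations quoted for Fig. 1. The function names `program_cell` and `sample_voltages` are hypothetical and introduced only for this example; treating the erased state "11" as exactly Gaussian is the simplifying assumption stated in the text.

```python
import random

# Means and standard deviations quoted in the text for Fig. 1 (Gray-mapped states).
MEANS = {"11": 0.0, "10": 3.25, "00": 4.55, "01": 6.5}
STDS = {"11": 1.12, "10": 0.28, "00": 0.28, "01": 0.56}


def program_cell(bits, rng):
    """Draw one threshold voltage for a cell holding the two-bit pattern `bits`."""
    return rng.gauss(MEANS[bits], STDS[bits])


def sample_voltages(patterns, seed=0):
    """Draw one threshold voltage per two-bit pattern (hypothetical helper)."""
    rng = random.Random(seed)
    return [program_cell(bits, rng) for bits in patterns]


if __name__ == "__main__":
    voltages = sample_voltages(["11", "10", "00", "01"])
    for bits, v in zip(["11", "10", "00", "01"], voltages):
        print(bits, round(v, 3))
```

Such samples can feed a Monte Carlo estimate of the raw BER or of the quantized LLR histograms used later in the design.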
Decoders for error control codes in flash memory applications read the same sense-amp comparator multiple times using different word-line voltages to obtain soft information. This multiple readout operation is necessary because the sense-amp comparator can provide at most one bit of information about the threshold voltage. For a 4-level MLC flash memory with four distinct quantization regions, each cell is compared to 3 word-line voltages. A practical MLC channel model is obtained by extending the quantization approach in [9], which maximizes the mutual information (MI) between the input and output of a QPSK read channel. In the next section, an ECC design which uses the maximum mutual information (MMI) based on the quantized channel signal is discussed.
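The readout described above can be sketched as follows: three word-line voltages split the voltage axis into four regions, and each region is mapped to a per-bit LLR from the Gaussian state model. The reference voltages in `READ_REFS` are hypothetical placeholders, not the MMI-optimized values of the paper, and the helper names are assumptions made for this example.

```python
import math

MEANS = {"11": 0.0, "10": 3.25, "00": 4.55, "01": 6.5}
STDS = {"11": 1.12, "10": 0.28, "00": 0.28, "01": 0.56}
READ_REFS = (2.4, 3.9, 5.3)  # hypothetical word-line voltages, not from the paper
EPS = 1e-15  # floor that keeps the logarithm finite for very unlikely regions


def gaussian_cdf(v, mu, sigma):
    return 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))


def quantize(v, refs=READ_REFS):
    """Region index 0..3: the number of word-line voltages below v."""
    return sum(v > r for r in refs)


def region_probs(state, refs=READ_REFS):
    """P(region | programmed state) for each of the four regions."""
    edges = (-math.inf,) + tuple(refs) + (math.inf,)
    mu, sigma = MEANS[state], STDS[state]
    return [gaussian_cdf(hi, mu, sigma) - gaussian_cdf(lo, mu, sigma)
            for lo, hi in zip(edges, edges[1:])]


def bit_llr(region, bit_index, refs=READ_REFS):
    """ln P(region | bit = 0) / P(region | bit = 1), assuming equiprobable states."""
    p0 = sum(region_probs(s, refs)[region] for s in MEANS if s[bit_index] == "0")
    p1 = sum(region_probs(s, refs)[region] for s in MEANS if s[bit_index] == "1")
    return math.log(max(p0, EPS) / max(p1, EPS))


if __name__ == "__main__":
    for region in range(4):
        print(region, [round(bit_llr(region, b), 2) for b in (0, 1)])
```

Because only four regions exist, each bit sees at most four distinct LLR values, which is why the channel LLR distribution cannot be Gaussian.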
Note that the proposed ECC design method is not sensitive to the specific distribution of the channel modeling. Thus, the proposed approach can be easily extended to other flash memory channels.
III. ECC DESIGN FOR MLC CHANNEL
When the LLR distribution of the channel output is non-Gaussian, the traditional ECC design method cannot be used directly because of the non-linearity of the quantizer. The design of LDPC codes and polar codes for the MLC NAND flash memory channel is presented in this section. We use the mutual information between the input and output of the flash memory to model the quantized LLR messages, and thereby connect ECC optimization for flash channels with that for AWGN channels.
A. PROTOGRAPH BASED QC-LDPC CODE DESIGN
An important requirement of the proposed protograph QC-LDPC codes is the ability to improve the BER performance in flash memory channels. An ensemble of LDPC codes is defined by its variable and check node degree sequences
$$\lambda(x) = \sum_i \lambda_i x^{i-1}, \qquad \rho(x) = \sum_j \rho_j x^{j-1},$$
where $\lambda_i$ and $\rho_j$ denote the fraction of edges connected to variable nodes of degree $i$ and check nodes of degree $j$, respectively. In order to optimize the degree sequence of LDPC codes, the $\lambda$ and $\rho$ sequences are carefully chosen to ensure that the check node detector (CND) EXIT curve and the variable node detector (VND) EXIT curve are as close as possible but do not meet.
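To make the edge-perspective degree sequences concrete, the sketch below computes the design rate of an ensemble and converts edge fractions to node fractions, using the standard relations $R = 1 - \left(\sum_j \rho_j/j\right) / \left(\sum_i \lambda_i/i\right)$ and node fraction $\propto \lambda_i/i$. The function names are hypothetical, and the degree sequences in the example are illustrative values rather than the optimized sequences of the paper.

```python
def design_rate(lam, rho):
    """Design rate of an LDPC ensemble from edge-perspective degree fractions.

    lam, rho: dicts mapping node degree -> fraction of edges attached to
    variable (lam) or check (rho) nodes of that degree.
    """
    int_lam = sum(frac / deg for deg, frac in lam.items())
    int_rho = sum(frac / deg for deg, frac in rho.items())
    return 1.0 - int_rho / int_lam


def edge_to_node_fractions(edge_fracs):
    """Convert edge-perspective fractions to node-perspective fractions."""
    weights = {deg: frac / deg for deg, frac in edge_fracs.items()}
    total = sum(weights.values())
    return {deg: w / total for deg, w in weights.items()}


if __name__ == "__main__":
    # Illustrative regular (3, 6) ensemble.
    print(design_rate({3: 1.0}, {6: 1.0}))
    print(edge_to_node_fractions({2: 0.5, 4: 0.5}))
```

A check of this kind is a quick way to confirm that a candidate degree sequence actually meets the target code rate before building the base matrix.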
For the MLC NAND flash memory channel, the quantized voltage signal is measured for BP decoding because the exact voltage signal is unavailable. Maximizing the MI between the input and output of the flash memory determines the quantization levels. For EXIT charts, the output extrinsic LLR messages $L_{ch}$ from the channel are assumed to be approximately Gaussian distributed. However, this assumption does not hold for the MLC NAND flash memory channel, which makes the traditional EXIT chart method unsuitable for direct application in flash memory channels. Additionally, we cannot apply curve fitting to obtain a Gaussian-like distribution as in AWGN channels, where the variance of $L_{ch}$ conditioned on the channel input $X$ is easily obtained. Fig. 2 shows that the LLR distribution of the channel output is non-Gaussian. This is a result of the nonlinear quantizer, and consequently the quantized message cannot be described in closed form. The mutual information can instead be measured by conducting a large number of Monte Carlo simulations, and the unknown distributions are then determined by applying the ergodicity theorem. In this process, we first track the quantized LLR distribution by $I(\zeta, X)$, which is measured by estimating $p(\zeta | x = 0)$ and $p(\zeta | x = 1)$ from a histogram of the extrinsic information values, where $\zeta$ denotes an element of the LLRs. Given $p(\zeta | x = 0)$ and $p(\zeta | x = 1)$, the mutual information of the quantized channel is calculated by equation (3).
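The histogram-based measurement can be sketched as below. The code uses the standard expression for the mutual information between an equiprobable binary input and a discrete output, $I(\zeta; X) = \frac{1}{2}\sum_{x}\sum_{\zeta} p(\zeta|x)\log_2\frac{2p(\zeta|x)}{p(\zeta|0)+p(\zeta|1)}$; that this matches the paper's equation (3) term by term is an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter


def empirical_pmf(samples, levels):
    """Histogram estimate of p(zeta | x) over the given quantized LLR levels."""
    counts = Counter(samples)
    return [counts[level] / len(samples) for level in levels]


def mutual_information(p_given_0, p_given_1):
    """I(zeta; X) in bits for equiprobable X, from two conditional pmfs."""
    mi = 0.0
    for p0, p1 in zip(p_given_0, p_given_1):
        mix = 0.5 * (p0 + p1)
        for p in (p0, p1):
            if p > 0.0:  # terms with p = 0 contribute nothing
                mi += 0.5 * p * math.log2(p / mix)
    return mi


if __name__ == "__main__":
    levels = [-2, -1, 1, 2]
    llrs_x0 = [2, 2, 1, 2, -1, 2, 1, 2]      # toy LLR samples observed when x = 0
    llrs_x1 = [-2, -1, -2, -2, 1, -2, -1, -2]  # toy LLR samples observed when x = 1
    p0 = empirical_pmf(llrs_x0, levels)
    p1 = empirical_pmf(llrs_x1, levels)
    print(round(mutual_information(p0, p1), 4))
```

In practice the sample lists would come from a long Monte Carlo run of the quantized channel rather than from the toy values shown here.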
The EXIT chart method used to monitor the extrinsic information through curve-fitting is not sensitive to the LLR values. The mutual information builds a new bridge between the flash memory channel and the AWGN-like channel, so that the classic code design methods can be re-used. Thus, the quantized LLRs are forced to follow a Gaussian distribution with variance $\delta_{mlc}^2$ obtained from $I(\zeta, X)$. Combining the flash channel output signal with the a priori information, the VND EXIT function of a degree-$d_v^i$ variable node is expressed by
where the functions $J(\cdot)$ and $J^{-1}(\cdot)$ are given in [21]. This design problem is equivalent to solving the following linear programming problem
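A minimal numerical sketch of these quantities follows. It evaluates $J(\sigma)$ by direct integration instead of the closed-form approximation of [21], inverts it by bisection, and uses the standard VND EXIT form $I_{E,VND} = J\big(\sqrt{(d_v - 1)[J^{-1}(I_A)]^2 + \sigma_{ch}^2}\big)$. Identifying $\sigma_{ch}$ with $\delta_{mlc}$, and assuming that this standard form is the expression intended in the text, are assumptions of this example.

```python
import math


def J(sigma, steps=400):
    """MI carried by a Gaussian LLR with mean sigma^2/2 and variance sigma^2."""
    if sigma <= 0.0:
        return 0.0
    mean = sigma * sigma / 2.0
    lo, hi = mean - 10.0 * sigma, mean + 10.0 * sigma
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        l = lo + k * h
        pdf = math.exp(-(l - mean) ** 2 / (2.0 * sigma * sigma))
        pdf /= sigma * math.sqrt(2.0 * math.pi)
        # log2(1 + exp(-l)), written so that exp() never overflows
        loss = (max(-l, 0.0) + math.log1p(math.exp(-abs(l)))) / math.log(2.0)
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoidal rule end points
        total += weight * pdf * loss
    return 1.0 - h * total


def J_inv(info, tol=1e-6):
    """Invert J by bisection on sigma in [0, 40]."""
    if info <= 0.0:
        return 0.0
    lo, hi = 0.0, 40.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if J(mid) < info:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def vnd_exit(i_a, d_v, sigma_ch):
    """Standard VND EXIT function; sigma_ch plays the role of delta_mlc (assumed)."""
    return J(math.sqrt((d_v - 1) * J_inv(i_a) ** 2 + sigma_ch ** 2))


if __name__ == "__main__":
    for i_a in (0.0, 0.25, 0.5, 0.75):
        print(i_a, round(vnd_exit(i_a, d_v=3, sigma_ch=2.0), 4))
```

Sweeping `i_a` from 0 to 1 for each candidate degree gives the VND curves that the linear program combines so that the mixture stays above the inverted CND curve.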
where $N_\kappa$ is the total number of samples, $\delta_\kappa$ denotes the step between samples $N_{\kappa-1}$ and $N_\kappa$ in the optimization, and $\varsigma_i$ is the fraction of nodes incident to variable nodes of degree $d_v^i$. Then the base matrix is constructed according to the degree sequences optimized by the modified EXIT chart method described above for the specific MLC NAND flash memory channel. To maintain the benefits of the QC structure, we modify the progressive edge-growth (PEG) algorithm [12] by maximizing the girth of the graph and imposing additional QC constraints. The procedure for designing protograph based QC-LDPC codes is summarized in Algorithm 1, where $N_{d_i}$ is the number of connections of the check nodes connected with the variable node $v_i$ of the base matrix $H_b$, and $L_j^l$ is the $l$-th location of the check node corresponding to the $j$-th replication.
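The QC expansion step can be illustrated with a short sketch that replaces every nonzero entry of a base matrix with a $p \times p$ circulant permutation matrix. The base matrix and shift values below are hypothetical toy values; in the procedure described above the edge positions are chosen by the modified PEG algorithm to maximize girth, which this sketch does not attempt. Base-matrix entries are assumed to be 0 or 1 (no parallel edges).

```python
def circulant(p, shift):
    """p x p identity matrix with its columns cyclically shifted by `shift`."""
    return [[1 if (r + shift) % p == c else 0 for c in range(p)] for r in range(p)]


def lift(base, shifts, p):
    """Expand an m x n base matrix into an (m*p) x (n*p) QC parity-check matrix."""
    m, n = len(base), len(base[0])
    H = [[0] * (n * p) for _ in range(m * p)]
    for i in range(m):
        for j in range(n):
            if base[i][j]:
                block = circulant(p, shifts[i][j])
                for r in range(p):
                    for c in range(p):
                        H[i * p + r][j * p + c] = block[r][c]
    return H


if __name__ == "__main__":
    base = [[1, 1, 0],
            [0, 1, 1]]       # toy base matrix, not the one in Fig. 6
    shifts = [[0, 1, 0],
              [0, 2, 1]]     # toy circulant shifts
    for row in lift(base, shifts, p=3):
        print("".join(str(x) for x in row))
```

Because every block is a circulant, only one shift value per nonzero base-matrix entry has to be stored, which is the source of the low storage requirement of QC codes.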
B. RATE-ADAPTIVE POLAR CODE DESIGN
This subsection presents the design of a rate-adaptive polar coding scheme that meets the storage requirements over the flash memory lifetime; the block diagram of the scheme is shown in Fig. 3. The mother polar code is first optimized based on the characteristics of the MLC NAND flash memory channel, and the rate-adaptive coding scheme is then constructed according to the Bhattacharyya parameters of the memory cell bits.
Motivated by the LDPC code design approach in Section III-A, we calculate the MI of the quantized LLR-value messages $I(\zeta, X)$ using equation (3), and model the quantized LLR distribution as a Gaussian distribution with variance $\delta_{mlc}^2$ from $I(\zeta, X)$.

Algorithm 1 Protograph based QC-LDPC code design (steps 8 to 18)
 8:   for j = 1 to p do
 9:     if (j == 1) then
10:       Select the check nodes with larger local girth based on the modified PEG algorithm
11:     else
12:       for l = 1 to $N_{d_i}$ do
13:         // Permute the edges while keeping QC structure
14:         ...
15:       end for
16:     end if
17:   end for
18: end for
Then, the new parameter $\vartheta_{polar}$ for polar construction in an MLC flash channel is proposed as
where $\vartheta_{polar}$ specifically characterizes the channel LLR distribution of the MLC NAND flash memory. Once $\vartheta_{polar}$ is obtained, the polar code can be optimized and punctured by iteratively calculating the Bhattacharyya parameters of the memory cell bits. This polar code design problem is equivalent to calculating the Bhattacharyya parameters of the memory cell bits in the specific MLC flash channel. The Bhattacharyya parameter, which is an important criterion for measuring the reliability of each polarized memory cell bit, is defined as
The smaller the Bhattacharyya parameter $Z(W)$, the more reliable the bit channel. The Bhattacharyya parameters of the polarized bit channels can be described specifically by
Based on equations (8) and (9), the Bhattacharyya parameter of each cell bit is calculated as
An initial value, which is required when iteratively calculating the Bhattacharyya parameters of the cell bits, can be expressed as
where $C(\vartheta_{polar})$ represents the capacity of the memory cell bit. The capacity is obtained by employing the Ungerboeck set partitioning method, which is the best decision criterion in the AWGN channel and maximizes the minimum Euclidean distance of the subsets after each partition. Given the code rate $R$ and length $N$, we can construct the mother polar codes according to $C(\vartheta_{polar})$ in the specific MLC flash memory channel.
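The iterative calculation can be sketched with Arıkan's well-known recursion, in which each bit channel with parameter $Z$ splits into a degraded channel bounded by $2Z - Z^2$ (exact for the binary erasure channel) and an upgraded channel with $Z^2$. The initial value `z0` is a hypothetical input here; in the design described above it would be derived from $C(\vartheta_{polar})$, a mapping this sketch does not reproduce.

```python
def bhattacharyya_parameters(n_levels, z0):
    """Z values of the 2**n_levels polarized bit channels in natural order."""
    z = [z0]
    for _ in range(n_levels):
        nxt = []
        for v in z:
            nxt.append(2.0 * v - v * v)  # degraded ("minus") channel, upper bound
            nxt.append(v * v)            # upgraded ("plus") channel
        z = nxt
    return z


def information_set(z, k):
    """Indices of the k most reliable bit channels (smallest Z)."""
    order = sorted(range(len(z)), key=lambda i: z[i])
    return sorted(order[:k])


if __name__ == "__main__":
    z = bhattacharyya_parameters(n_levels=3, z0=0.5)  # N = 8, illustrative z0
    print([round(v, 4) for v in z])
    print(information_set(z, k=4))
```

For a mother code of length $N = 2^n$ and rate $R$, the information set holds the $NR$ indices with the smallest parameters, and the remaining positions are frozen.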
Punctured positions affect the performance of punctured polar codes. The general puncturing methods include random puncturing, which selects the puncturing positions at random, and stopping-tree puncturing [22], which counts the number of occurrences of each bit in the stopping tree and selects the bits with fewer appearances as punctured bits. The existing polar puncturing schemes do not consider the effect of channel characteristics on the selection of punctured positions. In this paper, a puncturing method based on the Bhattacharyya parameters of the memory cell bits is proposed to capture the characteristics of the MLC flash memory channel, as shown in Fig. 4. Considering the puncturing rate R , the number of punctured bits K can be calculated by
where N and K represent the length and rate of the mother code, respectively. Bits of memory cells with higher Bhattacharyya parameters are selected as punctured bits, and the remaining bits are transmitted.

Algorithm 2 (fragment)
    ... (10) and (11).
    // Rate-adaptive polar coding scheme
 8: Determine the punctured bits K according to the puncturing rate R based on equation (12)
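The selection rule can be sketched as follows. The sketch assumes that the puncturing rate is the code rate after puncturing, so that the number of punctured bits is the mother code length minus the number of information bits divided by the target rate; this interpretation, the rounding, and the function names are assumptions of the example and are not taken from equation (12).

```python
def num_punctured(n, k, target_rate):
    """Bits to puncture so that k information bits are sent at about target_rate.

    Assumed interpretation: transmitted length = k / target_rate, rounded.
    """
    return n - round(k / target_rate)


def select_punctured(z_cell_bits, count):
    """Indices of the `count` cell bits with the largest Bhattacharyya parameters."""
    order = sorted(range(len(z_cell_bits)), key=lambda i: z_cell_bits[i], reverse=True)
    return sorted(order[:count])


if __name__ == "__main__":
    # Mother code (1024, 768) from the simulation section, target rate 0.8.
    print(num_punctured(1024, 768, 0.8))
    # Toy Bhattacharyya parameters for four cell bits.
    print(select_punctured([0.9, 0.1, 0.5, 0.7], 2))
```

Puncturing the least reliable cell bits first means that the positions removed from transmission are those that would have contributed the least information to the decoder.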
IV. SIMULATION RESULTS
In this section, simulation results are presented to evaluate the BER and FER performance of the proposed coding schemes in an MLC NAND flash memory channel.
A. LDPC CODES RESULTS
VOLUME 7, 2019
In the simulations, we set the maximum number of BP iterations in the LDPC decoder to 50. Also, a 3-bit-level quantizer is used. The SNR in decibels is defined as
where $v_{max} = \mu_{01} - \mu_{11} = 6.5$ V and $\delta$ is the standard deviation of the two inner distributions. Fig. 5 compares the BER performance of our proposed protograph codes (Scheme 1: QC-structure and random-structure) with the optimized random LDPC code (Scheme 2) and the EG-QC LDPC code (Scheme 3) [23] in the specific MLC NAND flash memory channel. The base matrix of the proposed protograph LDPC code is obtained by Algorithm 1 and is shown in Fig. 6 with $n = 4$. All codes have a rate of 0.81. The code length of the designed codes is 4096 with replication factor $p = 256$, similar to the code length of the EG-QC LDPC code ($N_c = 4032$). It can be seen from Fig. 5 that the proposed Scheme 1 provides a 0.7 dB gain over Scheme 3 at a BER of $10^{-6}$. Additionally, the proposed protograph based QC-LDPC code in Scheme 1 exhibits BER performance similar to that of the random-structure protograph code at a BER of $10^{-6}$.
In the next set of simulations, we consider the frame error rate (FER) performance of the three schemes with the same code parameters. From Fig. 7 it can be seen that our proposed protograph LDPC codes for the specific MLC NAND flash memory channel achieve better performance than the EG-QC LDPC codes. This conclusion agrees with the results in Fig. 5. In addition, our proposed protograph based QC-LDPC code maintains the low hardware storage requirements of EG-QC LDPC codes due to the QC structure.
In Fig. 8, we compare the BER performance of our proposed protograph LDPC codes with the EG-QC LDPC code [23] in an AWGN channel. Fig. 8 shows that the proposed protograph based QC-LDPC code has a performance gain of about 1.2 dB over the EG-QC LDPC code at a BER of $10^{-6}$ in an AWGN channel. We can also see that the designed codes suffer from a sudden drop in the slope of the BER curve because they are not optimized for the AWGN channel.
Based on the nested mother base matrix in Fig. 6, codes with different rates and lengths can be generated. In Fig. 9, we compare the punctured protograph based QC-LDPC code and the protograph LDPC code ($p = 85$, $N_c = 1020$, puncturing rate $R = 0.75$) in the MLC NAND flash memory channel of [27], where the channel model is directly determined by the number of program-and-erase (PE) cycles. Note that the base matrix of the punctured protograph based code is obtained by deleting the last column of the original base matrix with $n = 3$. From Fig. 9 we can see that the simulation results coincide with those of the mother codes in Fig. 5 and Fig. 7, and that our designed codes still work well in the other flash channel model.
B. POLAR CODES RESULTS
Simulations are conducted to evaluate the properties and BER performance of the proposed rate-adaptive polar codes which are designed for the specific MLC NAND flash memory channel, where the BP decoding algorithm is employed [24] , [25] , and the maximum number of iterations is set to 60. A (1024,768) polar code is proposed first and considered as the mother code in the following simulation, where the puncturing rates R = 0.8, R = 0.85, and R = 0.9 are employed.
In Fig. 10, we compare the BER performance of our proposed polar code with that of codes constructed by the Bhattacharyya parameter method [13], the Monte-Carlo method [13], Gaussian approximation [26], and the method of [24], which gives optimal performance under the AWGN channel, for the same code length in the specific MLC NAND flash memory channel; the proposed polar code is constructed by Algorithm 2. Fig. 10 shows that in the specific MLC NAND flash memory channel, the polar code constructed by the Gaussian approximation method has the worst performance, and the performance of the code constructed by the Monte-Carlo method is close to that of the optimal polar code of [24]. However, there is a certain gap between the performance of the polar code constructed by the Bhattacharyya parameter method and that of [24] in the high SNR region. It can also be seen that the BER curves of the polar codes constructed by the four methods designed for the AWGN channel decrease only gradually in the MLC NAND flash memory channel. The proposed polar code has better error correction performance than the polar codes constructed by the classical methods designed for the AWGN channel. The proposed polar code provides a gain of 1.8 dB over the code constructed by [24] at a BER of $10^{-6}$. When SNR = 21.6 dB, the proposed polar code improves the BER by nearly 2 orders of magnitude compared with the code constructed by the Bhattacharyya parameter method, and by 5 orders of magnitude compared with the code constructed by the Gaussian approximation method, at a code rate of 0.75 in the specific MLC NAND flash memory channel. In addition, it can be seen that the punctured protograph based QC-LDPC code ($N_c = 1020$) performs better than the polar code at the same code rate, where the code lengths are approximately equal. Fig. 11, Fig. 12, and Fig. 13 show the BER performance curves of polar codes punctured by different methods in an MLC NAND flash memory channel with puncturing rates $R = 0.8$, $R = 0.85$, and $R = 0.9$, respectively.
From Fig. 11 we can see that polar codes constructed by stopping-tree puncturing and random puncturing have the worst BER performance, and that the error performance of the code punctured by the frozen bits random method is close to that of the proposed puncturing method. The proposed punctured polar code provides gains of 1 dB and 5 dB over the codes constructed by random puncturing and stopping-tree puncturing, respectively, at a BER of $10^{-4}$.
In Fig. 12, it can also be seen that the proposed punctured polar code yields BER performance similar to that of the codes constructed by the other three puncturing methods at a code rate of $R = 0.85$. The difference is that stopping-tree puncturing performs slightly better than random puncturing at SNR = 20.3 dB, and that the BER curve of the randomly punctured code has a steep slope as the SNR increases. Fig. 13 shows that the randomly punctured code has the worst performance among the puncturing methods and exhibits a visible error floor. At the high code rate, the proposed puncturing method can still construct a polar code with better BER performance, which coincides with the results in Fig. 11 and Fig. 12.
In Fig. 14, we compare the BER performance of the four punctured polar codes at different puncturing rates in an MLC NAND flash memory channel decoded by the BP decoder. It can be seen that all the constructed polar codes yield good BER performance. As the SNR increases, the gap between the performance curves corresponding to different puncturing rates widens further: a higher puncturing rate increases the uncertainty in the decoding process, which degrades the performance of the punctured polar codes.
V. CONCLUSION
In this paper, novel protograph based QC-LDPC codes and rate-adaptive polar codes are proposed for MLC NAND flash channels by exploiting their inherent characteristics. These codes are also applicable to other high-density flash memory channels. We use Monte Carlo simulations to show that the proposed protograph based QC-LDPC code has performance comparable to that of random-structure protograph codes, and performs better than the EG-QC LDPC codes. Another important benefit of the proposed approach is the appreciable simplification of the MLC NAND system, because the protograph based QC structure guarantees a significant reduction in the LDPC encoding and decoding complexity. The results also indicate that our rate-adaptive polar codes for the MLC NAND flash memory channel are superior to codes obtained by other puncturing methods. The proposed polar code also improves the BER by nearly 2 and 5 orders of magnitude compared with the Bhattacharyya parameter and Gaussian approximation methods, respectively, at a code rate of 0.75 in an MLC NAND flash memory channel. In addition, it is worth mentioning that the proposed ECC construction approach can be readily extended to TLC NAND flash memory.
