Channel Models for Multi-Level Cell Flash Memories Based on Empirical Error Analysis
Veeresh Taranalli, Student Member, IEEE, Hironori Uchikawa, Member, IEEE, Paul H. Siegel, Fellow, IEEE

Abstract-We propose binary discrete parametric channel models for multi-level cell (MLC) flash memories that provide accurate ECC performance estimation by modeling the empirically observed error characteristics under program/erase (P/E) cycling stress. Through a detailed empirical error characterization of 1X-nm and 2Y-nm MLC flash memory chips from two different vendors, we observe and characterize the overdispersion phenomenon in the number of bit errors per ECC frame. A well-studied channel model such as the binary asymmetric channel (BAC) model is unable to provide accurate ECC performance estimation. Hence we propose a channel model based on the beta-binomial probability distribution (2-BBM channel model), which is a good fit for the overdispersed empirical error characteristics, and show through statistical tests and simulation results for BCH, LDPC and polar codes that the 2-BBM channel model provides accurate ECC performance estimation in MLC flash memories.
Index Terms-Flash memory, multi-level cell, channel model, error correcting codes, P/E cycling.
I. INTRODUCTION
CHANNEL modeling for NAND flash memories is a developing research area with applications to better signal processing and coding techniques. A channel model for a flash memory can be viewed as a simplified representation of the underlying physical mechanisms which induce errors in stored data. For NAND flash memories, the major error mechanisms are program disturb and cell wear, which occur during program/erase cycling; charge loss, which occurs during data retention; and inter-cell interference (ICI) [1]-[3]. The main applications of a flash memory channel model are improved design, decoding and performance evaluation of error-correcting codes (ECCs) and error-mitigating codes. Other applications include information theoretic studies that analyze the capacity of flash memories [4], as well as insights for the development of new coding techniques. In this paper, we focus on the development of parametric channel models for multi-level cell (MLC) flash memories which are based on empirical error characterization and enable accurate ECC frame error rate (FER) performance estimation/prediction.

V. Taranalli and P. H. Siegel are with the University of California, San Diego, La Jolla, CA 92093-0401, USA (e-mail: vtaranalli, psiegel@ucsd.edu).
H. Uchikawa is with Toshiba Corporation, Japan (e-mail: hironori.uchikawa@toshiba.co.jp).
A. Overview of the Problem
Efficient evaluation of ECC FER performance is important for storage system design and optimization. One approach to ECC FER performance estimation is to experimentally collect error data for use in Monte-Carlo simulations of the ECC decoder, but this can be impractical because of the large amount of error data required when estimating low frame error rates. Another approach is to analytically predict the performance of a code based upon a measured average raw bit error rate. While this is feasible for algebraic codes with bounded-distance decoders, it is difficult for low-density parity-check (LDPC) codes and polar codes that use probabilistic decoders based upon message passing or successive cancellation. Moreover, the implicit assumption of independent, symmetric bit errors may not be justified.
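For the analytic route, a minimal sketch under exactly the independence assumption questioned above: if every bit fails independently with the same raw bit error rate r, the number of bit errors K in an N-bit frame is Binomial(N, r), and an idealized bounded-distance decoder correcting up to t errors fails exactly when K > t. The function name is hypothetical, and the overdispersion in measured error counts is precisely what this calculation cannot capture.

```python
import math


def bdd_frame_error_rate(n: int, t: int, rber: float) -> float:
    """P(K > t) for K ~ Binomial(n, rber): the FER of an idealized
    bounded-distance decoder that corrects up to t bit errors per n-bit frame,
    assuming independent bit errors with a common raw bit error rate."""
    if not 0.0 < rber < 1.0:
        raise ValueError("rber must lie strictly between 0 and 1")
    log_r, log_1mr = math.log(rber), math.log1p(-rber)
    log_n_fact = math.lgamma(n + 1)
    fer = 0.0
    # Sum the upper tail (k = t+1..n) directly, avoiding the cancellation in
    # 1 - P(K <= t) when the FER is tiny; each term is built in log space.
    for k in range(t + 1, n + 1):
        log_term = (log_n_fact - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * log_r + (n - k) * log_1mr)
        fer += math.exp(log_term)
    return min(fer, 1.0)


# Example: frame length and t of the (8191, 7683, t = 39) BCH code of
# Section VI, at a hypothetical raw BER of 2e-3.
print(bdd_frame_error_rate(8191, 39, 2e-3))
```

Because this prediction depends only on the average raw BER, two channels with the same raw BER but different per-frame error variability receive the same FER estimate.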
Previously proposed parametric channel models for MLC flash memories [5], [6] were obtained by using well-known probability distributions to model the empirical cell threshold voltage distributions. In [5] a Gaussian distribution, and in [6] a Normal-Laplace mixture model, were shown to be a good fit for the experimentally observed cell threshold voltage distributions in MLC flash memories. Such models can be used to reliably predict/estimate the experimentally observed raw bit error rate (RBER) of the flash memory. However, in this paper we show through empirical error characterization that the RBER is not necessarily a good indicator of the ECC FER performance, and that this is due to the overdispersion phenomenon in the number of bit errors per frame in MLC flash memories. Overdispersion refers to greater variability in empirical data than a statistical model predicts, e.g., the binomial distribution typically used to model count data. Therefore, a memoryless channel model such as the binary asymmetric channel (BAC) model provides an optimistic estimate of the ECC FER performance when compared to the actual ECC FER performance estimated from empirical data.
B. Summary of Contributions
We present a detailed empirical characterization of errors in MLC flash memories at the bit, cell and page granularity levels for 1X-nm and 2Y-nm feature size MLC flash memory chips from two different vendors, referred to as vendor-A and vendor-B, respectively. We study the asymmetry of bit errors in the lower and upper pages of MLC flash memories with a focus on the number of bit errors per frame parameter. We observe that the empirical probability distributions of the number of bit errors per frame parameter are overdispersed when compared to a binomial distribution typically used to model count data.

arXiv:1602.07743v2 [cs.IT] 17 May 2016
Based on the empirical error analysis, we study the per-page binary asymmetric channel (BAC) model, referred to as the 2-BAC model, for MLC flash memories. Using statistical analysis, we show that the 2-BAC model does not provide a good fit for the empirical error data and hence is inadequate for accurate ECC frame error rate (FER) performance estimation. Therefore, we propose a channel model based on the beta-binomial probability distribution, referred to as the 2-Beta-Binomial (2-BBM) channel model. We show that it is a good fit for the observed overdispersed empirical error data and performs well for ECC FER performance estimation. We also propose normal and Poisson approximation based channel models for MLC flash memories.
Through quantitative evaluation of the proposed channel models using the statistical Kolmogorov-Smirnov (K-S) Two Sample goodness of fit test and using Monte-Carlo simulation results of FER performance for BCH, LDPC and polar codes, we show that the 2-Beta-Binomial channel model is an accurate channel model to represent the overdispersed nature of bit errors in MLC flash memories.
C. Organization of the Paper
The rest of the paper is organized as follows. Section II presents a brief introduction to flash memories with a focus on the structure of MLC flash memories. In Section III we describe the P/E cycling experiment procedure. Section IV provides a detailed empirical characterization of errors in MLC flash memories, the results of which are utilized for design and evaluation of the proposed channel models. Section V describes the proposed channel models for MLC flash memories and provides statistical analysis results. In Section VI, quantitative results for statistical goodness of fit tests and BCH, LDPC and polar code FER performance are presented to evaluate the proposed channel models. Section VII provides the concluding remarks.
II. FLASH MEMORY STRUCTURE
The fundamental data storing unit in NAND flash memories is a floating-gate transistor commonly referred to as a cell. A cell can be programmed to hold different levels of charge, and these charge levels represent the data bits stored in a cell. The most commonly used cells in today's flash memories are capable of holding 2, 4 and 8 distinct charge levels (1, 2 and 3 bits/cell, respectively) and are referred to as single-level cell (SLC), multi-level cell (MLC) and triple-level cell (TLC), respectively. These flash memory cells are organized into a rectangular array interconnected through horizontal wordlines (WL) and vertical bitlines (BL) to form a flash memory "block" [1]. A collection of such blocks makes up the flash memory chip. A schematic of the block structure of MLC flash memories is shown in Fig. 1.
Fig. 1. Cell level to bit mapping and block schematic in MLC flash memories. In the block schematic, the rectangles depict the MLC flash memory cells connected to horizontal wordlines (WL) and vertical bitlines (BL).

The two bits belonging to an MLC flash memory cell are separately mapped to logical units of programming, called pages. A page is also the smallest unit for program and read operations, whereas a block is the smallest unit for the erase operation. The most significant bit (MSB) is mapped to the lower page while the least significant bit (LSB) is mapped to the upper page. The lower page bit of a cell always precedes the corresponding upper page bit in the programming order. We represent the four charge levels in MLC flash memory as 0, 1, 2, 3 in increasing order of charge level. The corresponding 2-bit patterns written to the lower (MSB) and upper (LSB) pages are '11', '10', '00' and '01', respectively, as shown in Fig. 1.
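The level-to-bit mapping above is a Gray code: adjacent charge levels differ in exactly one bit, so a shift by one level corrupts only one of the two pages. A small illustrative sketch (the names `LEVEL_TO_BITS` and `page_bit_errors` are hypothetical) that encodes the mapping and reports which page bit, if any, a given cell error flips:

```python
# Charge level -> (lower-page/MSB bit, upper-page/LSB bit), per Fig. 1.
LEVEL_TO_BITS = {0: (1, 1), 1: (1, 0), 2: (0, 0), 3: (0, 1)}


def page_bit_errors(written: int, read: int) -> dict:
    """Map a cell error (written level -> read level) to page bit errors.

    Returns {"lower": ..., "upper": ...}; each value is "0->1", "1->0",
    or None if that page's bit is unaffected.
    """
    errors = {}
    for page, idx in (("lower", 0), ("upper", 1)):
        w, r = LEVEL_TO_BITS[written][idx], LEVEL_TO_BITS[read][idx]
        errors[page] = None if w == r else f"{w}->{r}"
    return errors


# Gray-code property: adjacent levels differ in exactly one bit.
for level in range(3):
    a, b = LEVEL_TO_BITS[level], LEVEL_TO_BITS[level + 1]
    assert sum(x != y for x, y in zip(a, b)) == 1

print(page_bit_errors(1, 2))  # -> {'lower': '1->0', 'upper': None}
```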
III. EXPERIMENT PROCEDURE
To characterize and quantify the number and types of errors observed, we perform program/erase (P/E) cycling of the MLC flash memory chip under test, which consists of repeated application of the following steps:
1) Erase the MLC flash memory blocks under test.
2) Program the MLC flash memory pages (of the blocks under test) with pseudo-random (PR) data generated using a Mersenne-Twister pseudo-random number generator. The pseudo-random number generator is initialized with a randomly generated seed for every page in every P/E cycle.
3) Starting with the first cycle, perform a read operation on the MLC flash memory block(s) at intervals of every 100th cycle. Record bit errors and their locations in the block.

We arbitrarily choose 4 contiguous blocks in an MLC flash memory chip for our experiments. The MLC flash memory blocks are P/E cycled up to 10,000 P/E cycles, and the experiments are performed at room temperature in a continuous manner with no extra wait time between the erase/program/read operations.
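Steps 2 and 3 can be sketched in Python, whose standard `random.Random` class is a Mersenne-Twister (MT19937) generator. The function names, the 32-bit seed width, and drawing seeds with `secrets` are assumptions made for illustration; the experimental setup is not described beyond the steps above.

```python
import random
import secrets


def pseudo_random_page(n_bits: int, seed: int) -> list[int]:
    """Page data from a Mersenne-Twister PRNG seeded with `seed`."""
    rng = random.Random(seed)          # CPython's random.Random uses MT19937
    word = rng.getrandbits(n_bits)
    return [(word >> i) & 1 for i in range(n_bits)]


def error_locations(written: list[int], read: list[int]) -> list[int]:
    """Bit positions whose read-back value differs from the written value."""
    return [i for i, (w, r) in enumerate(zip(written, read)) if w != r]


# A fresh random seed for every page in every P/E cycle (step 2).
seed = secrets.randbits(32)
page = pseudo_random_page(8192, seed)
# Keeping the seed lets the written data be regenerated at read time (an
# assumed bookkeeping choice), so step 3 reduces to comparing bit vectors.
assert pseudo_random_page(8192, seed) == page
```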
IV. CHARACTERIZATION OF ERRORS IN MLC FLASH MEMORIES
The first step in the error characterization of a flash memory chip is to study its raw bit error rate (BER) performance when all the pages in all the blocks under test are programmed with pseudo-random data. This closely resembles the most common use in practice, where random data are stored and retrieved. Fig. 2 shows the average raw BER across the P/E cycles when all pages in each block are programmed, for both the vendor-A and vendor-B flash memory chips. The raw BER is averaged over the 4 blocks tested. Fig. 2 also shows the average raw BER separately for the lower and upper pages of the MLC flash memory. Although the lower page is expected to have a smaller BER than the upper page [7], we observe that this is only the case up to a certain number of P/E cycles; as the P/E cycle count increases, the lower page begins to show a larger number of errors than the upper page. This observation is consistent across both the vendor-A and vendor-B flash memory chips. Using empirical data from 20 blocks of the same flash memory chip, we have also observed consistent measured average raw BER estimates across all the P/E cycles.
We also record the specific cell (symbol) errors corresponding to all the bit errors observed. Table I shows the frequencies of all possible cell errors as a percentage of the total number of cell errors observed across all the blocks in all the P/E cycles. The corresponding average cell error probabilities across all P/E cycles are ∼4.16 × 10^−3 and ∼2.71 × 10^−3 for the vendor-A and vendor-B chips, respectively. We observe that the level 1 to level 2 cell error "10 (1) → 00 (2)" is the most dominant for both vendor-A and vendor-B chips. Since this cell error flips the lower page bit (1 → 0) while leaving the upper page bit unchanged, it explains why the lower page average raw BER is worse than the upper page average raw BER, as shown earlier in Fig. 2. We also note that the three adjacent-level cell errors "10 (1) → 00 (2)", "11 (0) → 10 (1)" and "00 (2) → 01 (3)" are the most frequent and together make up about 96% and 94% of all the cell errors observed for the vendor-A and vendor-B chips, respectively. Such knowledge about dominant cell errors can be very useful in utilizing ECC redundancy more effectively. This was demonstrated in [8], where the authors designed two BCH codes with different error correction capabilities for the lower and upper pages of an MLC flash memory and proposed a stagewise combined decoding algorithm for both pages. Their scheme gave better results than using a single BCH code independently for all pages.
A. Asymmetry of Bit Errors in MLC Flash Memories
Fig. 3 shows the asymmetry of bit errors in MLC flash memories. We present the average raw BERs corresponding to the specific types of bit errors, i.e., 0 → 1 and 1 → 0 bit errors, in the lower and upper pages of both the vendor-A and vendor-B MLC flash memory chips. While there is a high degree of asymmetry in the lower page bit errors throughout the P/E cycle range, the degree of asymmetry in the upper page bit errors is much lower. This agrees well with the observations in Table I, where the dominant cell errors imply a large proportion of 1 → 0 bit errors in the lower page and comparable proportions of 0 → 1 and 1 → 0 bit errors in the upper page. This asymmetry in bit errors in both the lower and upper pages also reflects the dominance of data-dependent inter-cell interference (ICI) errors, i.e., the middle cells in the cell level data patterns 303, 313 and 323 across wordlines are highly susceptible to errors [9].
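The per-direction rates compared here normalize each error type by the number of bits that could undergo it: 0 → 1 errors by the number of written zeros and 1 → 0 errors by the number of written ones. A sketch, assuming the written and read-back bits of a page are available as equal-length 0/1 sequences (the function name is hypothetical). These two averages are also how the BAC parameters p and q are estimated in Section V.

```python
def directional_error_rates(written, read):
    """Return (0->1 error rate, 1->0 error rate) for one page."""
    zeros = ones = err01 = err10 = 0
    for w, r in zip(written, read):
        if w == 0:
            zeros += 1
            err01 += r == 1        # a written 0 read back as 1
        else:
            ones += 1
            err10 += r == 0        # a written 1 read back as 0
    p01 = err01 / zeros if zeros else float("nan")
    p10 = err10 / ones if ones else float("nan")
    return p01, p10


print(directional_error_rates([0, 0, 0, 0, 1, 1, 1, 1],
                              [0, 1, 0, 0, 0, 0, 1, 1]))  # -> (0.25, 0.5)
```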
B. Characterization of Number of Bit Errors per Frame
As we want to develop parametric channel models for MLC flash memories which provide an accurate representation of the empirically observed bit errors and enable accurate ECC FER performance estimation, we study the distribution of the number of bit errors per frame parameter. This is the key factor in determining the FER performance of an ECC with a specified error correction capability of t bit errors per frame.
From the error data collected during P/E cycling experiments, we obtain the sample counts of the number of bit errors per frame for 0 → 1 and 1 → 0 bit errors in both the lower and upper pages by choosing a fixed frame length of N = 8192 bits. This choice of frame length is representative of the large ECC frame lengths used in practice, while still being small enough to ensure that sufficient empirical data can be collected easily. Commonly used ECC frame lengths range from 8192 to 32768 bits, and in practice multiple ECC frames are written to a single flash memory page. The sample mean and variance statistics of the number of bit errors per frame are computed using the sample counts and are shown in Table II for both vendor-A and vendor-B chips. We also plot two-dimensional (2D) maps showing the number of bit errors for every frame in a single block of MLC flash memory at 8,000 P/E cycles in Fig. 4. The 2D maps are obtained by horizontally stacking the bit error counts in frames belonging to a page, and then vertically stacking all the pages belonging to a single block. From Table II and Fig. 4, we clearly observe that the variance in the number of bit errors per frame is much larger than the mean, i.e., the experimental data is overdispersed with respect to a binomial distribution, Binomial(n, p), typically used to model count data, whose mean np and variance np(1 − p) are approximately equal when p is small.
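The per-frame counts, the 2D map, and the overdispersion check can be computed as in the sketch below, which assumes each page's error pattern is a 0/1 list whose length is a multiple of the frame length N (function names hypothetical). It reports the ratio of sample variance to sample mean: for a Binomial(n, p) count this ratio is 1 − p, just below 1 for small p, so values well above 1 indicate overdispersion. `statistics.variance` uses the n − 1 (sample) normalization.

```python
import statistics


def frame_error_map(page_errors, frame_len):
    """Rows = pages of a block, columns = frames within a page;
    each entry is the number of bit errors in that frame."""
    rows = []
    for page in page_errors:
        if len(page) % frame_len:
            raise ValueError("page length must be a multiple of frame_len")
        rows.append([sum(page[i:i + frame_len])
                     for i in range(0, len(page), frame_len)])
    return rows


def dispersion_summary(error_map):
    """Sample mean, sample variance, and variance-to-mean ratio of the counts."""
    counts = [c for row in error_map for c in row]
    mean = statistics.mean(counts)
    var = statistics.variance(counts)   # n - 1 denominator
    return mean, var, var / mean


# Toy block of two 4-bit pages split into 2-bit frames (illustrative only;
# the measurements above use N = 8192).
m = frame_error_map([[1, 0, 0, 1], [0, 0, 0, 0]], frame_len=2)
print(m, dispersion_summary(m))
```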
V. CHANNEL MODELS FOR MLC FLASH MEMORIES
In this section, we first study the suitability of well-known discrete memoryless channel (DMC) models, such as the 4-ary DMC, the BSC and the BAC, to represent the bit errors observed in the MLC flash memory channel. Among the DMC models, a per-page BAC (2-BAC) model appears to align well with our empirical error characterization results. However, we show through analysis as well as empirical results that the per-page BAC model is unable to fit the empirical distribution of the number of bit errors per frame and is not a good model for ECC FER performance estimation. This is due to the interdependence of the mean and variance statistics of the number of bit errors per frame for a BAC, where the numbers of 0 → 1 and 1 → 0 errors are modeled as binomial random variables. For a fixed number of trials, the binomial distribution has a single parameter (degree of freedom), hence its mean and variance cannot be chosen independently. Thus the binomial distribution is unable to accurately model the overdispersed empirical error data described in the previous section. A natural next choice is to consider the normal approximation to the binomial distribution, which provides two parameters (degrees of freedom) for modeling the observed mean and variance statistics independently. However, we observe that the normal approximation based channel model does not accurately fit the shape of the empirical data distribution. Another commonly used probability distribution to model data that are overdispersed with respect to a binomial distribution is the beta-binomial distribution [10], [11]. Hence we propose a discrete channel model based on the beta-binomial distribution for the lower and upper pages, referred to as the 2-BBM channel model. We show that this model fits the empirical distribution of the number of bit errors per frame and provides accurate ECC FER performance estimation. We also present simple approximations of the 2-BAC model based on the normal and Poisson probability distributions.
Although these approximations are able to fit the empirical distribution of the number of bit errors per frame better than the 2-BAC model, they are not as good a fit as the proposed 2-BBM channel model.
A. Definitions and Notation
Let K represent the total number of bit errors in a frame of length N bits. Let $K_m$ be the total number of bit errors in a frame of N bits which consists of m zeros and N − m ones. The relationship between the probability distributions of K and $K_m$ is given by

$$P(K = k) = \sum_{m=0}^{N} \frac{\binom{N}{m}}{2^N}\, P(K_m = k),$$

where $\binom{N}{m}/2^N$ represents the probability of observing exactly m zeros in a frame of N bits. $K_m$ can be represented as the sum of the numbers of 0 → 1 and 1 → 0 bit errors as

$$K_m = K_m^{(0)} + K_{N-m}^{(1)},$$

where $K_m^{(0)}$ and $K_{N-m}^{(1)}$ denote the number of 0 → 1 and 1 → 0 bit errors, respectively. K can also be represented as the sum of the total numbers of 0 → 1 and 1 → 0 bit errors as

$$K = K^{(0)} + K^{(1)},$$

where

$$P(K^{(u)} = k) = \sum_{m=0}^{N} \frac{\binom{N}{m}}{2^N}\, P(K_l^{(u)} = k).$$

Note that u ∈ {0, 1} and l = m + (N − 2m)u, i.e., l = m for the 0 → 1 errors (u = 0) and l = N − m for the 1 → 0 errors (u = 1). We use E[X] and Var[X] to denote the expected value (mean) and the variance of a random variable X, respectively. We use X | Y to denote "X given Y".
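The mixture formula can be checked numerically. As an illustration, take a BAC (Section V-C) with $K_l^{(0)} \sim$ Binomial(l, p) and $K_l^{(1)} \sim$ Binomial(l, q); averaging over the Binomial(N, 1/2) number of zeros should give $K^{(0)} \sim$ Binomial(N, p/2) and $K^{(1)} \sim$ Binomial(N, q/2), because each bit is independently a 0 (or 1) that is flipped with probability p/2 (or q/2). A pure-Python sketch with hypothetical helper names that evaluates the sum directly, using l = m + (N − 2m)u:

```python
from math import comb


def binom_pmf(n: int, p: float, k: int) -> float:
    """Binomial(n, p) probability mass at k."""
    return comb(n, k) * p**k * (1 - p) ** (n - k) if 0 <= k <= n else 0.0


def mixture_pmf(n: int, k: int, u: int, pmf_given_l) -> float:
    """P(K^(u) = k) = sum over m of C(N, m) / 2^N * P(K_l^(u) = k),
    with l = m + (N - 2m) u (l = m for u = 0, l = N - m for u = 1)."""
    total = 0.0
    for m in range(n + 1):
        l = m + (n - 2 * m) * u
        total += comb(n, m) / 2**n * pmf_given_l(l, k)
    return total


# BAC illustration: the mixtures should collapse to Binomial(N, p/2) and
# Binomial(N, q/2).
n, p, q = 6, 0.3, 0.1
pmf0 = [mixture_pmf(n, k, 0, lambda l, j: binom_pmf(l, p, j)) for k in range(n + 1)]
pmf1 = [mixture_pmf(n, k, 1, lambda l, j: binom_pmf(l, q, j)) for k in range(n + 1)]
print(max(abs(a - binom_pmf(n, p / 2, k)) for k, a in enumerate(pmf0)))
```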
B. Candidate Discrete Memoryless Channel (DMC) Models
The primary error mechanism in MLC flash memories is at the cell level, and hence the 4-ary DMC model with 4 inputs and 4 outputs can naturally account for all the cell level errors. This 4-ary DMC model requires 16 parameters, the cell level transition probabilities, only 12 of which are independent since each row of transition probabilities sums to one; these parameters can be easily estimated from experimental data such as that shown in Table I. However, the 4-ary DMC model is not useful in practice, as the logical unit of program/read operations in current MLC flash memory applications is a binary page. Hence any practically applicable channel model would have to treat the errors in the lower and upper pages of the MLC flash memory independently, even though it is clear that the errors occur at the cell level and hence the lower and upper page bit errors are not independent.
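Because each row of the 4 × 4 transition matrix must sum to one, only 4 × 3 = 12 of its entries are free. A minimal sketch of estimating the matrix from paired written and read-back cell levels by row-normalized counts (pure Python; the function name is hypothetical):

```python
def estimate_transition_matrix(written_levels, read_levels, num_levels=4):
    """P_hat[i][j] = (# cells written at level i and read at level j)
    / (# cells written at level i)."""
    counts = [[0] * num_levels for _ in range(num_levels)]
    for w, r in zip(written_levels, read_levels):
        counts[w][r] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows for levels that were never written stay undefined (NaN).
        matrix.append([c / total if total else float("nan") for c in row])
    return matrix


for row in estimate_transition_matrix([0, 0, 1, 1, 2, 3], [0, 1, 1, 2, 2, 3]):
    print(row)
```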
A simpler, more commonly used DMC model is the 2-BSC model, where two independent BSCs are used to represent the bit errors occurring in the lower and upper pages. The advantage of using the BSC model for each page independently is that it is simple and well studied, with a variety of error correction coding (ECC) techniques available for transmission over the BSC. However, based on our error characterization results in Section IV, the bit errors in MLC flash memories during P/E cycling are mostly asymmetric in nature. Therefore, the BSC is clearly not an accurate model to represent the bit errors in MLC flash memories. A numerical comparison of the estimated capacities of the 4-ary DMC model and the 2-BSC model was presented in [9], where it was observed that the 4-ary DMC model provides a significant capacity gain compared to the 2-BSC model for MLC flash memories.
C. The 2-Binary Asymmetric Channel (2-BAC) Model
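Based on the asymmetry of bit errors observed in MLC flash memories (Section IV), we propose a per-page BAC model called the 2-BAC model, where two independent BAC models are used to represent the bit errors occurring in the lower and upper pages.

[Table II]

We consider a BAC as shown in Fig. 5, where p is the probability of a 0 → 1 bit error and q is the probability of a 1 → 0 bit error. The random variables $K_m^{(0)}$ and $K_{N-m}^{(1)}$ are distributed according to the binomial probability distribution and are independent, i.e.,

$$K_m^{(0)} \sim \mathrm{Binomial}(m, p), \qquad K_{N-m}^{(1)} \sim \mathrm{Binomial}(N-m, q).$$

The mean and the variance of $K_m^{(0)}$ are given by

$$E[K_m^{(0)}] = mp, \qquad Var[K_m^{(0)}] = mp(1-p),$$

and those of $K_{N-m}^{(1)}$ are given by

$$E[K_{N-m}^{(1)}] = (N-m)q, \qquad Var[K_{N-m}^{(1)}] = (N-m)q(1-q).$$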
Proposition 1: The mean and the variance of K for a BAC model are given by

$$E[K] = \frac{N(p+q)}{2}, \qquad Var[K] = \frac{N(p+q)}{2}\left(1 - \frac{p+q}{2}\right).$$

Proof: See Appendix A. The parameters p and q of the BAC model are estimated as the average 0 → 1 and 1 → 0 bit error rates obtained from experimental data corresponding to a particular P/E cycle point in the flash memory lifetime. An algorithmic description of the BAC model is presented in Algorithm 1.
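One way to see Proposition 1, sketched under the assumption (consistent with the pseudo-random data written here) that the N input bits are i.i.d. and equiprobable, so the number of zeros is m ∼ Binomial(N, 1/2), and using the law of total variance conditioned on m. This is a worked check and need not follow the route of Appendix A.

```latex
\begin{aligned}
E[K] &= E\big[mp + (N-m)q\big] = \tfrac{N}{2}(p+q),\\
Var[K] &= E\big[Var[K \mid m]\big] + Var\big[E[K \mid m]\big]\\
       &= \tfrac{N}{2}\big[p(1-p) + q(1-q)\big] + \tfrac{N}{4}(p-q)^2\\
       &= \tfrac{N(p+q)}{2} - \tfrac{N(p+q)^2}{4}
        = \tfrac{N(p+q)}{2}\Big(1 - \tfrac{p+q}{2}\Big).
\end{aligned}
```

The second line uses Var[m] = N/4 and the independence of $K_m^{(0)}$ and $K_{N-m}^{(1)}$ given m. For p = q the result reduces to the BSC variance Np(1 − p), as expected.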
Algorithm 1 BAC Model Implementation
Input: Input frame x of length N, BAC model parameters (p, q).
Output: Data frame with errors y.
1: for $x_i \in x$ do
2:   Generate random sample u ∼ Uniform[0, 1].
3:   if $x_i = 0$ then t = p else t = q.
4:   if u ≤ t then $e_i = 1$ else $e_i = 0$.
5:   $y_i = x_i \oplus e_i$.
6: end for

Since Var[K] in Proposition 1 is much smaller than the observed sample variance of the number of bit errors per frame (Table II), the BAC model is not a good fit for the observed empirical probability distribution of K, as shown in Fig. 8 and Fig. 9 for the vendor-A and vendor-B flash memory chips, respectively. For the same reason, the 2-BAC model for MLC flash memory is expected to provide an optimistic estimate of the ECC FER performance when compared to the actual performance. We discuss this in more detail in Section VI. However, note that the 2-BAC model does provide an accurate estimate of the average raw BER, which is given by

$$\frac{E[K]}{N} = \frac{p+q}{2}.$$

This shows that the ability to accurately estimate/predict the average raw BER is not the sole criterion for a good MLC flash memory channel model.
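Algorithm 1 vectorizes naturally over a batch of frames. The sketch below uses NumPy (an assumption, as are the function names and parameter values, which are illustrative rather than measured) and compares the empirical mean and variance of K with Proposition 1, evaluated for equiprobable i.i.d. input bits, for which each bit is in error independently with probability (p + q)/2.

```python
import numpy as np


def bac_channel(x: np.ndarray, p: float, q: float,
                rng: np.random.Generator) -> np.ndarray:
    """Algorithm 1, vectorized: each 0 flips w.p. p, each 1 flips w.p. q."""
    u = rng.random(x.shape)             # step 2: u ~ Uniform[0, 1)
    t = np.where(x == 0, p, q)          # step 3: crossover probability per bit
    e = (u <= t).astype(x.dtype)        # step 4: error indicator e_i
    return x ^ e                        # step 5: y_i = x_i XOR e_i


def bac_moments(n: int, p: float, q: float) -> tuple[float, float]:
    """Proposition 1: E[K] and Var[K] with r = (p + q) / 2."""
    r = (p + q) / 2
    return n * r, n * r * (1 - r)


rng = np.random.default_rng(0)
n, frames, p, q = 1024, 4000, 0.01, 0.03              # illustrative values
x = rng.integers(0, 2, size=(frames, n), dtype=np.uint8)  # pseudo-random data
y = bac_channel(x, p, q, rng)
k = np.count_nonzero(x != y, axis=1)                  # bit errors per frame
print(k.mean(), k.var(ddof=1), bac_moments(n, p, q))
```

With measured per-frame counts in place of `k`, a sample variance far above `bac_moments(...)[1]` is the overdispersion that the BAC cannot reproduce.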
D. The 2-Beta-Binomial (2-BBM) Channel Model
As mentioned in Section IV, the empirically observed sample mean and variance estimates show that the number of bit errors per frame data is overdispersed with respect to the binomial distribution. This is the major reason for the poor fit of the 2-BAC model discussed in the previous subsection. To account for the overdispersion, we propose a channel model for MLC flash memories based on the beta-binomial probability distribution, called the 2-Beta-Binomial (2-BBM) channel model.
The beta-binomial probability distribution was first proposed in [10] as the probability distribution for counts resulting from a binomial distribution when the probability of success varies according to the beta distribution between sets of trials. Using empirical data, it was also shown in [10] that the beta-binomial probability distribution is a good fit for overdispersed binomial data. Lindsey et al. [11] studied a beta-binomial probability distribution based model for fitting overdispersed data on the human sex ratio in families and found it to be a good fit. Stapper et al. [12] developed a yield prediction model for semiconductor memory chips by modeling the overdispersed distribution of the number of faults per chip using the gamma-Poisson distribution, which is closely related to the beta-binomial distribution.
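For the beta-binomial channel model, we model the variables $K_m^{(0)}$ and $K_{N-m}^{(1)}$ as binomial random variables whose success probabilities are themselves beta distributed:

$$K_m^{(0)} \mid p \sim \mathrm{Binomial}(m, p),\ p \sim \mathrm{Beta}(a, b), \qquad K_{N-m}^{(1)} \mid q \sim \mathrm{Binomial}(N-m, q),\ q \sim \mathrm{Beta}(c, d),$$

where (a, b) and (c, d) correspond to the parameters of a beta probability distribution defined as

$$f(x; \alpha, \beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}, \qquad 0 < x < 1,$$

where B(α, β) represents the beta function. Thus the Beta-Binomial (BBM) channel model is derived from a BAC model where the bit error probabilities p and q are random variables which vary from frame to frame and are distributed according to the beta distribution. The BBM channel model is a 4-parameter model (compared to the 2-parameter BAC), and hence the 2-BBM channel model for MLC flash memories is an 8-parameter model. The beta-binomial probability distributions of $K_m^{(0)}$ and $K_{N-m}^{(1)}$ are

$$P(K_m^{(0)} = k) = \binom{m}{k}\frac{B(k+a,\, m-k+b)}{B(a, b)}, \qquad P(K_{N-m}^{(1)} = k) = \binom{N-m}{k}\frac{B(k+c,\, N-m-k+d)}{B(c, d)}.$$

The mean and the variance of $K_m^{(0)}$ and $K_{N-m}^{(1)}$ are

$$E[K_m^{(0)}] = \frac{ma}{a+b}, \qquad Var[K_m^{(0)}] = \frac{mab(a+b+m)}{(a+b)^2(a+b+1)},$$

$$E[K_{N-m}^{(1)}] = \frac{(N-m)c}{c+d}, \qquad Var[K_{N-m}^{(1)}] = \frac{(N-m)cd(c+d+N-m)}{(c+d)^2(c+d+1)}.$$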
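Proposition 2: The mean and the variance of K for a BBM channel model are given by

$$E[K] = N\bar{r}, \qquad Var[K] = N\bar{r}(1-\bar{r}) + \frac{N(N-1)}{4}\left[\frac{ab}{(a+b)^2(a+b+1)} + \frac{cd}{(c+d)^2(c+d+1)}\right],$$

where $\bar{r} = \frac{1}{2}\left(\frac{a}{a+b} + \frac{c}{c+d}\right)$.

Proof: See Appendix B.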
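Proposition 3: The mean and the second moment of $K^{(0)}$ and $K^{(1)}$ for a BBM channel model are given by

$$E[K^{(0)}] = \frac{Na}{2(a+b)}, \qquad E[(K^{(0)})^2] = \frac{Na}{2(a+b)} + \frac{N(N-1)a(a+1)}{4(a+b)(a+b+1)},$$

$$E[K^{(1)}] = \frac{Nc}{2(c+d)}, \qquad E[(K^{(1)})^2] = \frac{Nc}{2(c+d)} + \frac{N(N-1)c(c+1)}{4(c+d)(c+d+1)}.$$

Proof: See Appendix C. The parameters a, b, c, d of the BBM channel model are estimated from the sample moments of $K^{(0)}$ and $K^{(1)}$ using the method of moments [10]. From P/E cycling experiment data, we obtain the sample mean and sample second moment estimates of the random variables $K^{(0)}$ and $K^{(1)}$, which represent the total numbers of 0 → 1 and 1 → 0 bit errors per frame. Let $\mu_1, \mu_2$ represent the first and second moment estimates of $K^{(0)}$, and $\mu_3, \mu_4$ represent the first and second moment estimates of $K^{(1)}$. Solving the equations in Proposition 3 for a, b, c, d, we have the parameter estimates

$$\hat{a} = \frac{\mu_1\left[(N+1)\mu_1 - 2\mu_2\right]}{N(\mu_2 - \mu_1) - (N-1)\mu_1^2}, \qquad \hat{b} = \frac{(N - 2\mu_1)\left[(N+1)\mu_1 - 2\mu_2\right]}{2\left[N(\mu_2 - \mu_1) - (N-1)\mu_1^2\right]},$$

$$\hat{c} = \frac{\mu_3\left[(N+1)\mu_3 - 2\mu_4\right]}{N(\mu_4 - \mu_3) - (N-1)\mu_3^2}, \qquad \hat{d} = \frac{(N - 2\mu_3)\left[(N+1)\mu_3 - 2\mu_4\right]}{2\left[N(\mu_4 - \mu_3) - (N-1)\mu_3^2\right]}.$$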
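A sketch of the method-of-moments step. It assumes the moment expressions E[K^(0)] = Na/(2(a+b)) and E[(K^(0))²] = Na/(2(a+b)) + N(N−1)a(a+1)/(4(a+b)(a+b+1)), and likewise for K^(1) with (c, d), which follow from K^(0) | p ∼ Binomial(N, p/2) for equiprobable i.i.d. input bits. Solving the two equations gives the closed form implemented below (function names hypothetical). The round-trip check feeds exact model moments back into the estimator.

```python
def bbm_moments(n: int, a: float, b: float) -> tuple[float, float]:
    """First and second moments of K^(0) (use (c, d) for K^(1))."""
    mu1 = n * a / (2 * (a + b))
    mu2 = mu1 + n * (n - 1) * a * (a + 1) / (4 * (a + b) * (a + b + 1))
    return mu1, mu2


def bbm_mom_estimate(n: int, mu1: float, mu2: float) -> tuple[float, float]:
    """Method-of-moments estimates (a_hat, b_hat) from the first two moments."""
    denom = n * (mu2 - mu1) - (n - 1) * mu1**2   # > 0 only if overdispersed
    numer = (n + 1) * mu1 - 2 * mu2
    if denom <= 0 or numer <= 0:
        raise ValueError("sample moments do not admit a beta-binomial fit")
    return mu1 * numer / denom, (n - 2 * mu1) * numer / (2 * denom)


a_hat, b_hat = bbm_mom_estimate(8192, *bbm_moments(8192, 2.0, 500.0))
print(a_hat, b_hat)  # approximately 2.0 and 500.0
```

The explicit check on `denom` reflects a property of the estimator: when the sample variance does not exceed the binomial variance, no beta mixing distribution can reproduce the moments.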
An algorithmic description of the BBM channel model is presented in Algorithm 2.
Algorithm 2 BBM Channel Model Implementation
Input: Input frame x of length N, BBM channel model parameters (a, b, c, d).
Output: Data frame with errors y.
1: Generate two independent random samples, p ∼ Beta(a, b) and q ∼ Beta(c, d).
2: Pass x through the BAC model of Algorithm 1 with parameters (p, q) to obtain y.

The resulting BBM channel model parameter estimates are shown in Table III. We also observe remarkable consistency in the parameter estimates of the BBM channel model across different blocks of the same MLC flash memory chip. Fig. 6 shows the empirical parameter estimates corresponding to the upper page BBM channel models for the vendor-A chip using data collected from 3 different sets of 4 contiguous blocks of the MLC flash memory chip. Fig. 7 shows the empirical parameter estimates corresponding to the upper page BBM channel models for the vendor-B chip obtained using different frame sizes. Although not shown (due to lack of space), we also observe similar consistency in the lower page parameter estimates for both vendor chips using different sets of blocks on the same chip and different frame sizes. We also note that the estimates for the lower page parameters a and b will be noisy because the 0 → 1 bit error rate in the lower page is extremely small. This consistency suggests that we may be able to model every flash memory chip with just the 8 parameters of the 2-BBM channel model for accurate ECC FER performance estimation.
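A NumPy sketch of Algorithm 2 on a batch of frames, assuming the frame-specific (p, q) of step 1 are applied bit by bit exactly as in Algorithm 1. It checks that the simulated number of bit errors per frame is overdispersed, i.e., its sample variance is well above its sample mean. Function name and parameter values are illustrative assumptions, not the estimates in Table III.

```python
import numpy as np


def bbm_channel(x, a, b, c, d, rng):
    """Algorithm 2 on a batch of frames (rows of x): per frame, draw
    p ~ Beta(a, b) and q ~ Beta(c, d), then apply the BAC bit by bit."""
    frames = x.shape[0]
    p = rng.beta(a, b, size=(frames, 1))   # one p per frame, broadcast over bits
    q = rng.beta(c, d, size=(frames, 1))   # one q per frame
    t = np.where(x == 0, p, q)             # crossover probability for each bit
    e = (rng.random(x.shape) <= t).astype(x.dtype)
    return x ^ e


rng = np.random.default_rng(1)
n, frames = 1024, 3000
x = rng.integers(0, 2, size=(frames, n), dtype=np.uint8)
y = bbm_channel(x, 2.0, 400.0, 3.0, 300.0, rng)
k = np.count_nonzero(x != y, axis=1)       # bit errors per frame
print(k.mean(), k.var(ddof=1))             # variance well above the mean
```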
E. Normal and Poisson Approximation Channel Models
To model the overdispersed number of bit errors per frame empirical data, an alternative approach from a statistical viewpoint is to consider approximations to the binomial probability distribution which retain the general shape of the binomial distribution and whose mean and variance can be controlled independently. We propose two such channel models for MLC flash memories based on the normal and Poisson probability distributions called the 2-Normal Approximation to the BAC (2-NA-BAC) model and the 2-Poisson Approximation to the BAC (2-PA-BAC) model respectively. Similar to the 2-BAC and 2-BBM channel models, the 2-NA-BAC (resp., 2-PA-BAC) model consists of two independent NA-BAC (resp., PA-BAC) models for the lower and upper pages of MLC flash memories. The design goal for the NA-BAC and PA-BAC models is to ensure a match between the mean and variance statistics of the data from the model and the observed sample mean and sample variance. Based on this, we define rules for the normal and Poisson approximation as follows.
Let $\mu_0$ and $\sigma_0^2$ denote the sample mean and sample variance of $K^{(0)}$, and $\mu_1$ and $\sigma_1^2$ denote the sample mean and sample variance of $K^{(1)}$. Let $\mathcal{N}(\mu, \sigma^2)$ denote a normal distribution with mean µ and variance $\sigma^2$, and let $\mathcal{P}(\lambda)$ denote a Poisson distribution with rate parameter λ. Let $g_0$ and $g_1$ represent the sampled numbers of 0 → 1 and 1 → 0 bit errors per frame.
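Definition 1: The normal approximation rules for the NA-BAC model are given by

$$g_0 = [X_0],\ X_0 \sim \mathcal{N}(\mu_0, \sigma_0^2), \qquad g_1 = [X_1],\ X_1 \sim \mathcal{N}(\mu_1, \sigma_1^2),$$

where [·] denotes the round to nearest integer operator.

Definition 2: The Poisson approximation rules for the PA-BAC model are given by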
Based on these rules, an algorithmic description of the NA-BAC and PA-BAC models is presented in Algorithm 3.
The normal probability distribution is a continuous distribution with infinite support whereas the variables K (0) and K (1) being modeled have finite support and are discrete (integers). Hence we require the round to nearest integer function in Definition 1. The Poisson probability distribution is a discrete distribution with an infinite support set. Using goodness of fit tests in Section VI, we show that the 2-NA-BAC and 
2-PA-BAC models are a better fit than the 2-BAC model for the observed empirical data. However, the 2-NA-BAC and the 2-PA-BAC models are not as good a fit as the 2-BBM model to describe the bit errors in MLC flash memories.
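A sketch of one NA-BAC frame built from Definition 1. It rests on two assumptions that are not stated in the text: the sampled count is clipped to the range [0, number of eligible bits], and the errors are placed at positions drawn uniformly without replacement among the zeros (for $g_0$) or ones (for $g_1$) of the frame. NumPy's `rint` rounds halves to even. The function name is hypothetical.

```python
import numpy as np


def na_bac_frame(x, mu0, sigma0, mu1, sigma1, rng):
    """Inject g0 0->1 and g1 1->0 errors into one frame x (1-D 0/1 array),
    with g_u = [X_u], X_u ~ N(mu_u, sigma_u^2) as in Definition 1."""
    y = x.copy()
    for bit, mu, sigma in ((0, mu0, sigma0), (1, mu1, sigma1)):
        positions = np.flatnonzero(x == bit)
        g = int(np.rint(rng.normal(mu, sigma)))   # round to nearest integer
        g = min(max(g, 0), positions.size)        # ASSUMPTION: clip to valid count
        # ASSUMPTION: error positions uniform among the eligible bits
        flip = rng.choice(positions, size=g, replace=False)
        y[flip] ^= 1
    return y


rng = np.random.default_rng(0)
frame = rng.integers(0, 2, size=8192, dtype=np.uint8)
noisy = na_bac_frame(frame, 3.0, 4.0, 20.0, 40.0, rng)  # illustrative moments
print(np.count_nonzero(frame != noisy))
```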
VI. SIMULATION RESULTS AND EVALUATION OF CHANNEL MODELS
In this section, we provide a quantitative evaluation of the proposed channel models for MLC flash memories from two viewpoints. The first is purely statistical: we perform the Kolmogorov-Smirnov (K-S) Two Sample test [13] to evaluate the goodness of fit of the proposed channel models to the empirical data. The second evaluates the proposed channel models for their application in ECC FER performance estimation. We place greater emphasis on the latter evaluation, as accurate ECC FER performance estimation has been the main driving factor in the design of the proposed channel models.
A. Statistical Goodness of Fit Tests
The Kolmogorov-Smirnov (K-S) Two Sample test is a commonly used statistical test for determining whether two sets of data samples are drawn from the same probability distribution. The K-S test is very general in that it is non-parametric: it makes no assumptions about the underlying probability distributions of the input data samples [13]. This makes it suitable for our purpose, as we have a varied set of underlying probability distributions of the number of bit errors per frame corresponding to the proposed channel models. The BAC and BBM model distributions do not match any well-known probability distributions exactly, although they are close to the binomial distribution, and the NA-BAC and PA-BAC model distributions are approximately normal and Poisson, respectively.
We perform K-S Two Sample tests comparing the number of bit errors per frame data samples from the proposed channel models to the empirical data obtained from P/E cycling experiments. The empirical data sample sizes, i.e., the numbers of frames for each page, are 8704 for vendor-A and 4096 for vendor-B. For the BAC, BBM, NA-BAC and PA-BAC models, we simulate 10000 frames. The beta random variates used to simulate the BBM channel model and the K-S Two Sample test statistic values are computed using the SciPy library [14]. The test statistic values are shown in Tables IV and V for 8,000 and 4,000 P/E cycles, respectively. The null hypothesis is that the data samples from a proposed channel model and the empirical data belong to the same underlying probability distribution. The test statistic is the maximum absolute difference between the cumulative distribution functions (CDFs) obtained from the two input data samples, so smaller values indicate a closer match between the underlying probability distributions. From Table IV, we see that the test statistic values are consistently low for the BBM channel model, thus indicating that it provides the best fit to the empirical data among all the proposed channel models. The p-values recorded (not shown) for all the K-S Two Sample tests in Tables IV and V are smaller than 0.01, so the null hypothesis is rejected at the 1% significance level for every model; the models are therefore compared by the relative magnitudes of their test statistic values. Fig. 8 and Fig. 9 provide a visual comparison of these CDFs for the vendor-A and vendor-B chips, respectively.
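The two-sample K-S statistic is the largest absolute gap between the two empirical CDFs. For integer-valued data such as bit error counts, both CDFs are right-continuous step functions that jump only at observed values, so it suffices to compare them there. A dependency-free sketch (hypothetical function name); it is intended to compute the same quantity as the `statistic` field returned by SciPy's `scipy.stats.ks_2samp`, which additionally supplies the p-value.

```python
from bisect import bisect_right


def ks_2samp_statistic(sample1, sample2) -> float:
    """sup_x |F1(x) - F2(x)| for the empirical CDFs of two samples."""
    a, b = sorted(sample1), sorted(sample2)
    d = 0.0
    for x in set(a) | set(b):          # the CDFs only change at observed values
        f1 = bisect_right(a, x) / len(a)
        f2 = bisect_right(b, x) / len(b)
        d = max(d, abs(f1 - f2))
    return d


print(ks_2samp_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # -> 0.5
```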
B. ECC FER Performance Estimation
We evaluate the proposed channel models for their accuracy in ECC FER performance estimation using binary BCH, LDPC, and polar codes. These ECCs were chosen because BCH and LDPC codes are already used in practical flash memory applications, while polar codes are a promising candidate for the future. The baseline ECC FER performance estimates are obtained from the empirical error data collected from MLC flash memory chips during P/E cycling experiments. Since pseudo-random data was written to the flash memory chips during the P/E cycling experiments, for ECC decoding we assume that the all-zero codeword was transmitted and apply the error vector obtained from the empirical error data. This assumption is valid because all the ECCs considered are linear codes. To estimate the ECC FER performance using the proposed channel models, we use Monte-Carlo simulations in which pseudo-random codewords of the ECC are generated, transmitted through the appropriate channel model, and decoded. At least 400 frame errors are recorded for each FER estimate. The FER performance of a (N = 8191, k = 7683, t = 39) BCH code using empirical data and the proposed channel models is shown in Fig. 10. Fig. 11 shows the FER performance of a (N = 8192, k = 7683) regular quasi-cyclic LDPC (QC-LDPC) code with d_c = 64 and d_v = 4, where d_c and d_v denote the check node and variable node degrees, respectively, of the parity check matrix. The parity check matrix of the QC-LDPC code is constructed from 128 × 128 circulant permutation matrices, and the design rate is 1 − d_v/d_c = 0.9375. To ensure the required variable node degree d_v, exactly d_v circulant permutation matrices are stacked vertically in every block column of the parity check matrix. Zero matrices of size 128 × 128 are used to fill up any remaining block rows.
The placement of the circulant permutation matrices is determined using the progressive edge growth (PEG) algorithm [15] to avoid short cycles. Note that although the specified design rate corresponds to a code dimension of 7680, we get k = 7683 because the resulting parity check matrix contains three linearly dependent parity checks. A sum-product belief propagation decoder with a maximum of 50 iterations and early termination is used to decode the QC-LDPC code. Fig. 12 shows additional results comparing the FER performance of the QC-LDPC code obtained using empirical data and simulation data from the BAC and BBM channel models, separately for the lower and upper pages of the vendor-A chip and the lower page of the vendor-B chip. The lowest FER performance estimates from empirical data were obtained by P/E cycling 44 and 24 blocks of the vendor-A and vendor-B chips, respectively. These lowest empirical FER estimates are based on a total of 6 and 4 frame errors for the lower and upper pages of the vendor-A chip, respectively, and on 4 frame errors for the lower page of the vendor-B chip. Results for the upper page of the vendor-B chip are not shown, as we did not observe any frame errors in its empirical data. We also note that the additional results in Fig. 12 were obtained from a different vendor-B chip than the one used in the rest of the paper. Fig. 13 compares the FER performance of a (N = 8192, k = 7684) polar code using empirical data and the proposed channel models. The polar code is optimized for a binary symmetric channel (BSC) with bit error probability p = 0.001 using the construction technique proposed in [16]. The successive cancellation list (SC-List) decoder proposed in [17] is used for decoding the polar code.
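The three dependent parity checks are consistent with the structure of this particular matrix: with 512/128 = 4 block rows and d_v = 4, every block column contains a circulant permutation matrix in every block row, so the 128 rows of each block row add up over GF(2) to the all-ones vector. All four block rows therefore have the same sum, which gives three linear dependencies, rank(H) ≤ 509 and k ≥ 8192 − 509 = 7683. The sketch below checks this numerically. It is an illustration under stated assumptions: the circulant shifts are drawn at random rather than placed by the PEG algorithm [15], and the function names are hypothetical.

```python
import random


def qc_parity_rows(shifts, z):
    """Expand a base matrix of circulant shifts into parity-check rows.

    shifts[r][c] is the shift s of the z x z circulant permutation matrix
    in block row r, block column c (None would mean a zero block). Each
    row of H is returned as a Python int whose bit i is H[row][i].
    """
    rows = []
    for block_row in shifts:
        for j in range(z):
            row = 0
            for c, s in enumerate(block_row):
                if s is not None:
                    row |= 1 << (c * z + (j + s) % z)
            rows.append(row)
    return rows


def gf2_rank(rows):
    """Rank over GF(2) via an XOR basis keyed by each vector's leading bit."""
    basis = {}
    for v in rows:
        while v:
            lead = v.bit_length() - 1
            if lead not in basis:
                basis[lead] = v
                break
            v ^= basis[lead]  # clears the leading bit, so the loop ends
    return len(basis)


if __name__ == "__main__":
    z, dv, dc = 128, 4, 64  # circulant size and node degrees from the text
    n = dc * z              # 8192 code bits, dv * z = 512 parity checks
    rng = random.Random(0)
    # Random shifts stand in for the PEG placement of [15] (assumption).
    shifts = [[rng.randrange(z) for _ in range(dc)] for _ in range(dv)]
    rank = gf2_rank(qc_parity_rows(shifts, z))
    print("checks:", dv * z, "rank:", rank, "k:", n - rank)
```

Random shifts only guarantee the bound rank ≤ 509; the exact rank, and hence k = 7683 for the code in Fig. 11, depends on the actual shift values.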
For all the ECCs considered and using data from both vendor chips, we observe that the 2-BAC model provides an optimistic estimate of the FER performance compared to the empirically observed FER performance. This is mainly due to the inability of the 2-BAC model to capture the high variance in the number of bit errors per frame observed empirically. The gap between the ECC FER performance estimates from the 2-BAC model and from the empirical data increases as the FER decreases; for the BCH code in Fig. 10, it is about an order of magnitude for the vendor-A chip at 6,500 P/E cycles and greater than an order of magnitude for the vendor-B chip at 7,000 P/E cycles. Such a gap at low FERs is problematic because an optimistic FER estimate leads to an overestimate of the endurance (lifetime) of a flash memory chip. From the results shown in Fig. 12 for the QC-LDPC code, we observe that the BBM channel model estimates the FER performance accurately even at FERs as low as about 10^-4 for the upper page of the vendor-A chip and the lower page of the vendor-B chip. For the lower page of the vendor-A chip, the FER performance estimates obtained using the BBM channel model are better than those obtained using the BAC channel model; however, we observe a small mismatch between the BBM channel model FER estimates and the empirical FER estimates at lower FERs. This mismatch is due to the inability of the BBM channel model to fit the larger proportion of frames with a small number of bit errors observed in the lower tail of the empirical error histograms for the lower page of the vendor-A chip. This appears to be a vendor-specific effect, as it was not observed in the empirical error histograms for the lower page of the vendor-B chip. Overall, the 2-BBM model matches the empirical ECC FER performance estimates accurately, while the estimates obtained using the 2-NA-BAC model lie between those of the 2-BAC and 2-BBM models.
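The optimism of a fixed-probability model at low FER can be seen directly for a bounded-distance decoder. Assuming the BCH decoder fails exactly when a frame contains more than t bit errors (miscorrections counted as frame errors), FER = P(K > t) = 1 − Σ_{k=0}^{t} P(K = k). The sketch below evaluates this tail for a binomial model and a beta-binomial model with the same mean raw bit error rate. It uses n = 8191 and t = 39 from the BCH code above, but the beta parameters a = 4, b = 1329 (mean BER about 0.003) are hypothetical and not fitted to the paper's data, and a single-component model stands in for the 2-BAC and 2-BBM models.

```python
from math import exp, lgamma, log


def log_comb(n, k):
    """log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def log_beta(x, y):
    """log of the beta function B(x, y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def binomial_pmf(k, n, p):
    return exp(log_comb(n, k) + k * log(p) + (n - k) * log(1 - p))


def beta_binomial_pmf(k, n, a, b):
    # P(K = k) = C(n, k) * B(k + a, n - k + b) / B(a, b)
    return exp(log_comb(n, k) + log_beta(k + a, n - k + b) - log_beta(a, b))


def bdd_fer(pmf, t):
    """FER of a decoder assumed to fail iff a frame has more than t errors."""
    return 1.0 - sum(pmf(k) for k in range(t + 1))


if __name__ == "__main__":
    n, t = 8191, 39     # BCH length and correction capability from the text
    a, b = 4.0, 1329.0  # hypothetical beta parameters (mean BER ~ 0.003)
    p = a / (a + b)     # binomial model with the same mean BER
    print("binomial FER:     ", bdd_fer(lambda k: binomial_pmf(k, n, p), t))
    print("beta-binomial FER:", bdd_fer(lambda k: beta_binomial_pmf(k, n, a, b), t))
```

With these hypothetical parameters the beta-binomial tail is more than an order of magnitude larger than the binomial one, even though the average raw bit error rate is identical; only the variance of K differs.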
The ECC FER performance estimates using the 2-PA-BAC model are the same as those using the 2-NA-BAC model and are omitted. From these results it is clear that the proposed 2-BBM channel model is able to accurately describe the nature of the number of bit errors per frame in MLC flash memories and hence provides accurate estimates of the ECC FER performance.
VII. CONCLUSION
We studied the feasibility of using well-known discrete memoryless channel models to model the MLC flash memory channel. Based on empirical error analysis and ECC FER performance estimation for BCH, LDPC, and polar codes, we observe that the 2-BAC model with parameter estimates derived from empirical error data suffices to produce an accurate estimate of the average raw bit error rate, but it provides an overly optimistic estimate of the ECC FER performance compared to the empirically observed ECC FER performance. This is mainly because the number of bit errors per frame in the empirical data is overdispersed, which the 2-BAC model does not capture. We proposed the 2-Beta-Binomial (2-BBM) channel model based on the beta-binomial probability distribution and, using statistical analysis, goodness-of-fit tests and ECC FER performance results, showed that the 2-BBM channel model accurately describes the number of bit errors per frame in MLC flash memories. We also note that the BBM channel model can be shown to be equivalent to an urn-based channel model [18] and hence has memory. Although not presented in this paper, our preliminary experimental results for combined data retention and P/E cycling stress show evidence of overdispersion in the error statistics and support the suitability of the proposed 2-BBM channel model. We leave a detailed examination of this as future work. Although the proposed channel models are for MLC flash memories, the proposed empirical design approach is generic and can easily be extended to triple-level cell (TLC) flash memories.
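One standard urn construction with the beta-binomial property is the Pólya urn; whether it coincides exactly with the urn-based model of [18] is an assumption here. The urn starts with weight a on "error" and b on "correct"; each drawn ball is returned together with one extra ball of the same colour, so the number of errors in n draws is beta-binomial BB(n, a, b), and an error raises the probability that the next bit is in error from a/(a+b) to (a+1)/(a+b+1). A minimal sketch (function name and parameters hypothetical):

```python
import random
import statistics


def polya_urn_errors(n, a, b, rng):
    """Return a length-n 0/1 error pattern drawn from a Polya urn.

    The urn starts with weight a on 'error' and b on 'correct'. After each
    draw, one ball of the drawn colour is added, so past errors make future
    errors more likely: the resulting channel has memory.
    """
    err, ok = a, b
    pattern = []
    for _ in range(n):
        is_err = rng.random() < err / (err + ok)
        pattern.append(int(is_err))
        if is_err:
            err += 1
        else:
            ok += 1
    return pattern


if __name__ == "__main__":
    rng = random.Random(5)
    n, a, b = 100, 2, 18  # hypothetical urn parameters
    counts = [sum(polya_urn_errors(n, a, b, rng)) for _ in range(3000)]
    # Beta-binomial BB(n, a, b): mean n*a/(a+b) = 10, variance ~51.4
    print(statistics.mean(counts), statistics.variance(counts))
```

The per-frame error counts reproduce the beta-binomial mean and variance, while within a frame the errors are dependent, which is the sense in which the BBM channel has memory.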
From (2) and (7), we have 
Therefore, E[K 
Note that we have used the combinatorial identities 
Therefore we can obtain Var[K] from (40) and (41) as
APPENDIX B
PROOF OF PROPOSITION 2
We take the same approach as in the proof of Proposition 1. From (2) and (16)
We have used the combinatorial identities (42) and (43). From 
We have used the combinatorial identities (42) and (43), as well as the fact that the second moment of a beta-binomial random variable K with n trials and shape parameters a and b is E[K^2] = na(n(a+1)+b)/((a+b)(a+b+1)). The expressions for E[K^(1)] and Var[K^(1)] can be derived similarly.
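For reference, the beta-binomial moments combine into the variance as follows. The symbols n (number of trials) and a, b (beta shape parameters) are assumed here and may differ from the paper's own notation.

```latex
K \sim \mathrm{BB}(n, a, b), \qquad
\mathrm{E}[K] = \frac{na}{a+b}, \qquad
\mathrm{E}[K^2] = \frac{na\,\bigl(n(a+1)+b\bigr)}{(a+b)(a+b+1)},

\mathrm{Var}[K] = \mathrm{E}[K^2] - \mathrm{E}[K]^2
  = \frac{na}{a+b}\cdot
    \frac{(a+b)\bigl(n(a+1)+b\bigr) - na(a+b+1)}{(a+b)(a+b+1)}
  = \frac{nab\,(a+b+n)}{(a+b)^2\,(a+b+1)}
  = np(1-p)\,\frac{a+b+n}{a+b+1}, \qquad p = \frac{a}{a+b}.
```

Since (a+b+n)/(a+b+1) > 1 whenever n > 1, the beta-binomial variance always exceeds that of a binomial with the same mean, which is the overdispersion the BBM model is designed to capture.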
