Abstract-In the write process of multilevel cell (MLC) flash memories, an iterative approach is used to mitigate the monotonicity problem. Monotonicity in programming is considered to be the major restriction in MLC flash. To address this issue, an iterative approach called incremental step pulse programming (ISPP) is used to concurrently program many cells in small steps. In this paper, we derive a mathematical model for iterative programming using the framework of renewal theory. We obtain a closed-form approximation for the probability distribution of the number of steps required in the ISPP process. We also bound the maximal error between the true distribution and our approximation. Moreover, the results obtained help to accurately analyze the effect of inter-cell interference in this type of memory. Finally, we devise an adaptive step size approach for the write process to strike a balance between latency and lifetime under fixed bit error rate constraints or information rate constraints.
I. INTRODUCTION

EVERY storage medium can be modeled as a communication channel which contains a program (write) phase and a recovery (read) phase. During the program phase of a NAND flash storage system, an $n$-bit symbol is mapped to one of $2^n$ non-overlapping voltage partitions, and the voltage of a cell (CMOS transistor) is adjusted to fall within the corresponding partition. The stored data is recovered by comparing the voltage of the cell with some predetermined read thresholds during the recovery phase.
The monotonic property of programming is considered the major restriction of MLC flash memory, which distinguishes it from all other storage channels [2]. In this type of memory, the reduction of the stored voltage in a cell is a very costly operation. In fact, it is not possible to individually reduce the voltage of each cell. The reduction process is done simultaneously for a very large group of cells, called a block, and the voltages of all the cells in a block return to an original state, called the "erased state". To deal with the monotonic programming restriction, an iterative strategy, called incremental step pulse programming (ISPP), is used to concurrently program the target data in a very long array of cells, called the "word-line" [3], [4]. In each ISPP iteration, the voltage of the cell is increased by a small amount during an incremental step, and a verification step is then executed to check whether the cell has reached the target voltage. Note that although it takes longer to program a single cell using ISPP, the overall write delay is decreased due to the concurrent programming of multiple cells. Also, overcharging is greatly reduced by the use of small incremental steps. Although a common program voltage is applied to all cells during ISPP, the amount of voltage increment is random due to the injection hardness property [5]. The charge injection for each cell is intrinsically different, resulting in a varied amount of voltage increment during programming among all the word-line cells.
Several studies have modeled and analyzed the write process in the past. In [6] and [7], Jiang et al. primarily focused on obtaining the storage capacity and optimizing the expected programming precision for a single cell. Note that the results obtained are only valid if individual programming of each cell were possible. If this approach is used in ISPP, it makes the verification step very complicated and time consuming. Moreover, the initial random voltage in the erased state was not considered. In [5], ISPP programming of a flash memory was studied, and an algorithm was developed for parallel programming of flash memory when the cell hardness information is available. In [8], a similar analysis was made for the case in which the increments are exponentially distributed. Although the results are helpful in understanding the capacity of parallel programming, they disregard the existing erased-state randomness, which makes them inapplicable in practice.
In this paper, we introduce a mathematical model for ISPP using the renewal theory framework. A renewal process is a stochastic counting process that models the number of steps required for a random sum to pass a specific threshold [9], [10]. We show that it is possible to model the ISPP in flash as a renewal process with a random starting point, which we call the "ISPP renewal process". To the best of our knowledge, this is the first attempt to describe the relationship between ISPP in flash and renewal theory. We first derive the connection between the ISPP renewal process in the finite threshold regime and the asymptotic classical renewal process. Next, we obtain a close approximation for the probability mass function (PMF) of the number of steps required in the ISPP renewal process. We also bound the maximal error between the true distribution and our approximation, showing extremely good agreement in realistic scenarios. Finally, we use results from renewal theory to obtain the distribution of the cell voltage after passing the threshold, called the "overshoot", and analyze the ICI using the obtained overshoot distribution. After having determined the statistics of the write process, our next goal is to use these statistics to describe and optimize trade-offs between speed, accuracy and device lifetime. Using the analytic and semi-analytic tools developed in this paper, we devise an adaptive approach to design the step size in order to improve the flash lifetime.
0733-8716 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
II. FLASH MEMORY BASICS
A NAND flash memory is comprised of arrays of floating gate transistors (cells). Each MLC transistor can store multiple bits (usually 2-4 bits). By injecting a writing charge on the floating gate of a cell, the cell voltage is increased to one of the multiple partitions, and the symbol is stored. Figure 1 shows the mapping to the stored voltage of a 2-bit MLC flash.
In order to increase the speed of programming (writing) and recovering (reading) data in flash memory chips, all memory cells are hierarchically organized in blocks and arrays (called word-lines). Each flash memory chip usually contains thousands of blocks, and each block is composed of 32 to 128 word-lines. Each word-line contains 2K to 64K cells which form a very long array. In this paper we assume the flash has the "all-bit-line" programming structure. In this structure, we call the target bits of a word-line that need to be stored in the same logical location, a logical "page". In other words, each word-line in an m-bit MLC flash contains m pages of logical information. Note that the smallest unit that can be simultaneously accessed for programming or reading is a page. Figure 2 illustrates the page definition for a 2-bit NAND flash memory in the write/read process-namely the least significant bit (LSB) page and the most significant bit (MSB) page.
A. Fundamental Operations
MLC flash memories basically support three major processes, which we briefly describe as follows:
1) Erase Process: In this process, the charged voltages (stored in the floating gates of all the cells in a block) are tunneled out through the Fowler-Nordheim mechanism [11]. After the erase process, the voltages of all the cells in a block will fall into the erased state (e.g., state "00" in Figure 1). The smallest unit that can be erased is a block. This limitation makes cell programming a one-way operation because it is not possible to erase a specific cell separately from other cells in a block. Furthermore, it is necessary to erase a memory cell before being able to program it. The distribution of the cell voltage in the erased state, denoted by $V_0$, tends to be Gaussian with mean $\mu_0$ and variance $\sigma_0^2$, i.e., $V_0 \sim \mathcal{N}(\mu_0, \sigma_0^2)$ [12].
2) Write Process: Due to the monotonic programming restriction, it is very important to accurately program each page such that the voltages of all cells in the page fall in their intended voltage ranges. Thus, the ISPP approach is used to program each page. This approach provides a verification step right after each short incremental programming step [2]. Figure 4 shows the voltage changes during the ISPP steps of programming. Let $w$ be the threshold voltage for programming a page. Those cells with the target symbol $B = 1$ (by convention) should pass the threshold voltage $w$. In the $i$-th step of programming, the cell voltage $\tilde{V}_{i-1}$ is increased by $\Delta V_i$ for each cell whose voltage has not reached the target value $w$. After the $i$-th programming pulse, the voltages of all the cells are compared with the target threshold voltage $w$ during the verification phase. In any particular cell, the ISPP is stopped when $\tilde{V}_i$ of that cell is greater than the desired threshold voltage $w$. The program pulse $\Delta V_i$ is accurately modeled as a random variable with positive support.
3) Read Process: During the read process, the stored symbols of all cells in the same page are read concurrently. The read process is done by comparing the voltage of a cell with some predetermined thresholds $r \in \{r_{m_1}, r_l, r_{m_2}\}$, called "read thresholds". Note that the read thresholds $r$ need not equal the write thresholds $w$ shown in Figure 1. In general, the optimal read thresholds $r$ must be obtained by using either a trellis-based Viterbi detector (Maximum Likelihood (ML) detection) or a trellis-based BCJR detector (Maximum a Posteriori (MAP) detection). In practice, since the input sequence is usually considered to be i.i.d., optimal detection can be executed on a symbol-by-symbol basis without implementing a trellis [13], and simple thresholds suffice.
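The program-and-verify loop of the write process in item 2) can be sketched as follows. This is a minimal illustration, not the authors' simulator: the function name `ispp_program_page`, the uniform increment model $U[a, b]$, and the default values $\mu_0 = -1.0$ V and $\sigma_0 = 0.5$ V are assumptions chosen for illustration (the latter two match the simulation setup quoted later in the paper).

```python
import random

def ispp_program_page(targets, w, a, b, mu0=-1.0, sigma0=0.5,
                      max_steps=15, rng=None):
    """Program one page with ISPP; return (final voltages, pulses used).

    targets : list of target bits B (1 = must pass threshold w, 0 = stay erased)
    a, b    : hypothetical bounds of a uniform per-pulse increment
    """
    rng = rng or random.Random(0)
    # Every cell starts in the erased state V_0 ~ N(mu0, sigma0^2).
    volts = [rng.gauss(mu0, sigma0) for _ in targets]
    pulses = 0
    while pulses < max_steps:
        # Verify phase: cells with B = 1 that are still at or below w
        # need another pulse.
        pending = [i for i, bit in enumerate(targets)
                   if bit == 1 and volts[i] <= w]
        if not pending:
            break
        # Program phase: one common pulse is applied, but each pending
        # cell receives its own random increment.
        for i in pending:
            volts[i] += rng.uniform(a, b)
        pulses += 1
    return volts, pulses

bits = [1, 0, 1, 1, 0, 1, 0, 1]
volts, pulses = ispp_program_page(bits, w=2.0, a=0.3, b=0.42, max_steps=50)
```

Cells that have passed $w$ are excluded from later pulses, so each programmed cell ends at most one increment above the threshold, which is the monotonic, one-way behavior described above.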
B. Degradation Sources
There are two major sources of performance degradation that affect the voltage in each cell during the write process:
1) Program/Erase (P/E) Cycling Effect: The P/E cycling distorts the final threshold voltage of a transistor due to the trapping and detrapping ability of the interface at the transistor gate, which leads to higher fluctuation of the final voltage of the cell. Note that P/E cycling has a higher effect on the voltages of those cells whose target state is the erased state (state "00" in 2-bit case). Moreover, as the flash gets older (i.e., the number of P/E processes in a flash increases), the variance of V i during ISPP gets larger. 
2) Inter-Cell Interference (ICI):
Due to the capacitance coupling effects of the neighboring cells, the change in the threshold voltage of one cell during programming (charging) affects the final voltages of all the other cells (especially those cells that were already programmed) [14] . ICI is a degradation source that grows with density. As cells are packed closer to each other, the influence of threshold voltages from neighboring cells increases. For a cell in location
As shown in Figure 3, the set of aggressors $A^{(k,\ell)}$ in an all-bit-line structure for cell $(k, \ell)$ is typically the set of nearest neighbors that get programmed after the victim cell:
III. NOTATION
Our notation is geared towards a 2-bit MLC flash memory with an all-bit-line structure. All random variables are denoted by capital letters. Let $X = B_m B_l$ denote the 2-bit channel input of a cell, where $B_l$ and $B_m$ are binary random variables that represent the LSB and MSB of the input $X$, respectively. Let $Y$ be the 2-bit channel output that is obtained after the read process. $X$ and $Y$ take values from the set $S = \{00, 10, 01, 11\}$. We use the superscript $(k, \ell)$ for the binary symbol $B$ and the input $X$ to associate them with the cell at location $(k, \ell)$. For notational simplicity, we remove the superscript $(k, \ell)$ whenever the location of the cell is unambiguous, and similarly we drop the subscripts $l$ and $m$ when the significance of the bit is clear. All the write threshold voltages are denoted by $w \in \{w_{ml}, w_l, w_{mh}\}$, where $w_l$ denotes the write threshold for the LSB, and $w_{ml}$ and $w_{mh}$ denote the low and high write thresholds for the MSB, respectively (Figure 1). Similarly, the read thresholds are denoted by $r$, where $r \in \{r_{ml}, r_l, r_{mh}\}$. By convention, if $B = 1$ for a cell in a page, the program voltage of the cell needs to pass the threshold $w$. Due to the equality of the distances between vertical aggressors and their victim cells, the vertical capacitance coupling coefficient, denoted by $c_v$ ($c_v = c^{(1,0)}$), is assumed to be equal for all cells. Finally, we denote by $n_{PE}$ the P/E cycling number of the flash.
IV. ISPP RENEWAL PROCESS
In this section we first model the ISPP page programming using known concepts in renewal theory [9], and explain the relationship between the ISPP process and the classical renewal process. Note that ISPP only increases the voltage of those cells whose target bit is $B = 1$. The cells with $B = 0$ are already in the final state and their voltages do not need to change during the ISPP renewal process. Let $V_j$ be the sum of voltage increments up to the $j$-th step, i.e.,
$$V_j = \sum_{i=1}^{j} \Delta V_i,$$
where $\Delta V_i$ denotes the increment during the $i$-th step of ISPP.
Remark 1: Since several degradation sources such as random telegraph noise (RTN) and program disturb affect the probability distribution of the increments, we do not focus on deriving the exact pdf of $\Delta V_i$. More details can be found in [15] and [16]. Note that unless specified otherwise, most of the results presented in this work are valid for any probability distribution of the $\Delta V_i$'s with positive support.
Let $\tilde{V}_j$ denote the cell voltage after programming step $j$. Given that the process starts in the erased state $V_0$, the cell voltage $\tilde{V}_j$ is $\tilde{V}_j = V_0 + V_j$.
We denote the cumulative distribution function (CDF) of
Note that the write voltage applied to the control gate (20-25 V [11]) is larger than the stored voltage (< 5 V), and it is much larger than the increments ($\Delta V_i < 0.5$ V in practice). Thus, to simplify the analysis and make the process mathematically tractable, we assume that the increments $\Delta V_i$ and $\Delta V_j$ are independent for $i \neq j$. Note that the voltage $V_0$ is also assumed to be independent of all $\Delta V_i$'s.
For a write threshold value $w \geq 0$, define the classical counting process as $N(w) \triangleq \max\{n : n \in \{0\} \cup \mathbb{N} \text{ and } V_n \leq w\}$. $N(w)$ represents the number of voltage increments that occurred prior to passing the threshold $w$. The counting process $\{N(w), w \geq 0\}$ is called the renewal process.
Definition 1 (ISPP renewal process):
The ISPP renewal process $\tilde{N}(w)$ for a threshold voltage $w \geq 0$ is defined as
Note that if $V_0 = 0$, then $\tilde{N}(w) = N(w)$. For a particular $w$, the random variable $\tilde{N}(w)$ represents the number of ISPP steps taken while the cell voltage stays at or below $w$. Note that there exists another way to formulate the ISPP renewal process, namely as the number of steps required to pass the threshold $w$. Let
Then, clearly $\tilde{L}(w) = \tilde{N}(w) + 1$. Although formulating ISPP using $\tilde{L}(w)$ seems natural and very simple to understand, we will continue to use $\tilde{N}(w)$ because it is consistent with the classical renewal process formulation [10], thus simplifying the use of known results.
Remark 3: Note that $\tilde{N}(w)$ depends on the input $B$. The event $\tilde{N}(w) = -1$ happens either when $B = 0$, or when $B = 1$ and $V_0 \geq w$, i.e., when the cell voltage is already above the threshold $w$ in the erased state and the cell need not be programmed ($\tilde{L}(w) = 0$).
Remark 4: Note that if $V_0 = 0$, the ISPP renewal process $\tilde{N}(w)$ reduces to the classical renewal process $N(w)$. The increment $\Delta V_i$ represents the inter-occurrence random variable. In practice, the ISPP renewal process $\tilde{N}(w)$ has a starting voltage $V_0$ that tends to be a Gaussian variable, $V_0 \sim \mathcal{N}(\mu_0, \sigma_0^2)$ [13]. Thus, the ISPP renewal process is a renewal process with a random starting voltage. It belongs to the class of processes known as "delayed renewal" processes [10].
Observation 1: Using the renewal process formulation [10]:
1) $\tilde{V}_{\tilde{N}(w)+1}$ is the voltage of the cell immediately after the threshold voltage $w$ is exceeded.
2) $\tilde{N}(w) + 1$ is the number of incremental steps required to pass the threshold voltage $w$.
3) $\tilde{V}_{\tilde{N}(w)+1} - w$ is the excess programmed voltage after passing the threshold $w$.
4) $\tilde{N}(w) \geq n$ if and only if $\tilde{V}_n \leq w$.
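The quantities in Observation 1 can be illustrated with a short single-cell simulation. This is a sketch under stated assumptions: the function name `ispp_renewal_sample` is hypothetical, and the increments are taken to be uniform on $[a, b]$ (the paper's results hold for more general positive increments).

```python
import random

def ispp_renewal_sample(w, a, b, mu0=-1.0, sigma0=0.5, rng=random):
    """Return (N_tilde, L_tilde, overshoot) for one cell with target bit B = 1."""
    v = rng.gauss(mu0, sigma0)      # erased-state voltage V_0
    steps = 0
    while v <= w:                   # verify: still at or below the threshold
        v += rng.uniform(a, b)      # random increment of one ISPP pulse
        steps += 1
    # `steps` pulses were needed to pass w, so the last voltage at or below w
    # was reached after steps - 1 pulses (this is -1 if V_0 already exceeds w).
    return steps - 1, steps, v - w

rng = random.Random(1)
n_tilde, l_tilde, overshoot = ispp_renewal_sample(2.0, 0.3, 0.42, rng=rng)
```

By construction the returned values satisfy $\tilde{L}(w) = \tilde{N}(w) + 1$, and the excess voltage is positive and no larger than one increment whenever at least one pulse was applied.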
V. DISTRIBUTION OF $\tilde{N}(w)$
While there exist several asymptotic results for the classical renewal process $N(w)$, there are not many results in classical renewal theory when $w$ is finite. To analyze the ISPP process $\tilde{N}(w)$, however, we are interested in obtaining a good PMF approximation of $\tilde{N}(w)$ when $w$ is finite ($0\,\mathrm{V} \leq w \leq 5\,\mathrm{V}$).
In this section, we first derive the connection between the ISPP renewal process $\tilde{N}(w)$ in the finite threshold regime and the classical renewal process $N(w)$. Then, we use some existing asymptotic results to simplify the analysis of the ISPP process in the finite threshold regime. To be precise, we obtain a close approximation for the PMF of $\tilde{N}(w)$, and linear approximations for the mean and variance of $\tilde{N}(w)$. We also bound the maximal error between the true distribution and our approximation. Finally, we closely approximate the distribution of the overshoot.
A. Convergence
In this section our goal is to characterize $\tilde{N}(w)$. Precisely, we seek to find a relationship between $\tilde{N}(w)$ and $N(w)$. To that end, for any $\zeta \in \mathbb{N}$, we define a new counting process $N_\zeta(w)$, related to $N(w)$ and the two constants $\mu_0$ and $\mu$ defined above, as
Next, we show how $\tilde{N}(w)$ is related to $N_\zeta(w)$.
where
Proof: It follows from the proof of Proposition 1.
Remark 5: Note that for the ISPP renewal process $\tilde{N}(w)$, it is known that $V_0$ usually has a negative mean ($\mu_0 < 0$) and a large variance, $\sigma_0 \gg \sigma$ [12] (e.g., if $\sigma_0 = 0.5$ and $\Delta V_i \sim U[0.1, 0.15]$, then $\zeta > 1000$). Thus, the process $\tilde{N}(w)$ can be viewed as a shifted version of the classical renewal process in which the starting point $V_0$ is a large negative voltage. In this case, the PMF of $\tilde{N}(w)$ is similar to the asymptotic distribution of the classical renewal process $N(w)$ when $w \to \infty$.
B. Gaussian Approximation
We next quantify the deviation between the PMF of $\tilde{N}(w)$ and its Gaussian approximation when $w$ and $\zeta$ are finite. For $n = 1, 2, \cdots$, let $F_n(w)$ denote the CDF of the ISPP voltage $\tilde{V}_n$, i.e., $F_n(w) \triangleq P\{\tilde{V}_n \leq w\}$, $w \geq 0$.
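To make the idea of a Gaussian approximation concrete, recall from item 4 of Observation 1 that $P\{\tilde{N}(w) \geq n\} = P\{\tilde{V}_n \leq w\} = F_n(w)$. Since $\tilde{V}_n$ is the sum of a Gaussian $V_0$ and $n$ independent increments, a plain central-limit argument suggests $F_n(w) \approx \Phi\big((w - \mu_0 - n\mu)/\sqrt{\sigma_0^2 + n\sigma^2}\big)$, where $\mu$ and $\sigma^2$ are the mean and variance of one increment. The sketch below compares this expression with a Monte Carlo estimate. It is only an illustration of the principle, not the paper's construction with $N_\zeta(w)$ and its error bound; the function names and the uniform increment model are assumptions.

```python
import math
import random

def gaussian_tail_approx(n, w, a, b, mu0=-1.0, sigma0=0.5):
    """CLT approximation of P{N_tilde(w) >= n} = P{V_tilde_n <= w}."""
    mu = (a + b) / 2.0              # mean of a U[a, b] increment
    var = (b - a) ** 2 / 12.0       # variance of a U[a, b] increment
    z = (w - mu0 - n * mu) / math.sqrt(sigma0 ** 2 + n * var)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF

def empirical_tail(n, w, a, b, mu0=-1.0, sigma0=0.5, trials=20000, seed=0):
    """Monte Carlo estimate of P{V_tilde_n <= w}."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        v = rng.gauss(mu0, sigma0) + sum(rng.uniform(a, b) for _ in range(n))
        hits += v <= w
    return hits / trials

for n in (4, 8, 12):
    print(n, gaussian_tail_approx(n, 2.0, 0.3, 0.42),
          empirical_tail(n, 2.0, 0.3, 0.42))
```

Because the Gaussian erased-state voltage dominates the variance for small $n$, the two columns are expected to agree closely in this parameter range.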
Using Observation 1-d, it is easy to verify that
Denote g(n) = Q(
). Let us define a new random processÑ ζ (w) whose CDF is Gaussian, i.e.,
Proof : It follows from the proof of Proposition 1.
and for any finite ζ > 0,
Proof : We only sketch for n ≥ 0 (n = −1 is clear).
where (a) holds because when n ≥ 0, the condition N ζ (w) ≥ n is equivalent to N(w − μ 0 + ζ μ ) ≥ n + ζ using (3) and (b) holds because of the CLT. The rest of the proof follows using (9) and the Berry-Esseen Theorem [17] . Bound (8) holds for all possible distributions of the random variable V i . However, when { V i } ∼ U [a, b], simulations suggest that the bound could be much tighter than o(
). This observation is proved in the following corollary.
, and denoting δ n as
then,
Proof : Similar to the proof of Proposition 2 and using the Uspensky Theorem [18] .
, it is easy to verify that when n + ζ ≥ 10, the following bound holds:
.
C. Distribution of the Overshoot
Another key factor in page programming is the voltage overshoot distribution for the cells which pass the target threshold w. Given B = 1, let˜ (w) denote the amount of overshoot of a cell voltage after passing threshold w.
Remark 7: Note that the programming precision is also directly related to the distribution of the increments $\Delta V_i$ during the ISPP process. The larger the increment step size is, the wider the probability density function (pdf) of the overshoot becomes, and the less accurate the programming.
Proposition 3:
Proof: See [1] for the complete proof. Simulation results show that the CDF of the overshoot for finite $\zeta \gg 1$ is very close to the asymptotic result (11). Therefore, in this paper, for any $w > 0$ and for a finite value $\zeta$, we approximate the overshoot distribution as
Remark 8: Note that equation (11) explains why the overshoot distribution is independent of the threshold voltage $w$ and only depends on the statistical properties of the step size $\Delta V_i$.
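The threshold independence noted in Remark 8 can be checked numerically. The sketch below assumes that the asymptotic result has the classical excess-life form from renewal theory, $P\{\text{overshoot} \leq x\} = \frac{1}{\mu}\int_0^x (1 - F(y))\,dy$, where $F$ and $\mu$ are the CDF and mean of one increment; whether this is exactly equation (11) cannot be confirmed from the text above, so it is labeled here as an assumption. Increments are taken as $U[a, b]$ and all function names are hypothetical.

```python
import random

def equilibrium_cdf_uniform(x, a, b):
    """(1/mu) * integral_0^x (1 - F(y)) dy for increments ~ U[a, b], a < b."""
    mu = (a + b) / 2.0
    if x <= 0:
        return 0.0
    if x <= a:
        return x / mu               # 1 - F(y) = 1 on [0, a]
    if x >= b:
        return 1.0
    # On [a, x], 1 - F(y) = (b - y) / (b - a).
    return (a + ((b - a) ** 2 - (b - x) ** 2) / (2.0 * (b - a))) / mu

def simulate_overshoots(w, a, b, mu0=-1.0, sigma0=0.5, trials=20000, seed=0):
    """Overshoot samples V_tilde - w at the first step where the cell passes w."""
    rng = random.Random(seed)
    out = []
    for _ in range(trials):
        v = rng.gauss(mu0, sigma0)
        while v <= w:
            v += rng.uniform(a, b)
        out.append(v - w)
    return out

for w in (1.0, 2.0, 3.0):
    overs = simulate_overshoots(w, 0.3, 0.42)
    emp = sum(o <= 0.2 for o in overs) / len(overs)
    print(w, emp, equilibrium_cdf_uniform(0.2, 0.3, 0.42))
```

Running the loop for several thresholds gives a quick visual check that the empirical CDF value barely moves with $w$, as the remark states.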
VI. INTER-CELL INTERFERENCE (ICI)
Using the obtained distribution for the ISPP overshoot 
A. ICI Effect on the LSB Page
In the writing process, there might be a case that only the LSB page is used to store the data. In other words, some of the word-lines might be used to program only a single page rather than multiple pages. Thus, we need to separately analyze the ICI for the LSB page. Also, the ICI effect on the LSB page helps to better estimate the unknown hidden states which were explained in detail in [19] .
Let B 
and Figure 5 shows the pdfs of the voltages of the cells in each stage during page programming. Figure 5(a) shows the pdf of the cell voltage before LSB page programming (each cell is in the erased state). Figure 5(b) shows the pdf after ISPP for the LSB page with threshold $w_l$. Finally, the ICI effect on the cell due to the LSB writing of the vertical aggressor is shown in Figure 5(c).
Remark 9: Note that it is common practice in industry to mitigate the ICI effect using a two-pass programming procedure [12]. This dramatically affects the initial voltage distribution for MSB page programming, and makes the ICI analysis for the MSB page more complicated. While we believe that it is possible to analyze ICI for the MSB page, the exact calculation for this case is left for further research.
B. ICI Analysis
In this section, we provide simulation results to show the effects of ICI and analyze the effects of step size variation on the overall performance. The simulation results are obtained from a 4MB, 2-bit MLC flash memory block with the all-bit-line structure. We assume that the target input is an i.i.d. process. The write thresholds are set to $W = \{1.0\,\mathrm{V}, 2.0\,\mathrm{V}, 3.0\,\mathrm{V}\}$. The mean and standard deviation of $V_0$ are assumed to be $\mu_0 = -1.0$ V and $\sigma_0 = 0.5$ V, respectively [19]. We assume $c_v = 0.06$ as introduced in [19], [20] and the references therein. The read thresholds $r$ are numerically computed using the optimal MAP detector approach designed in [13]. For simplicity, we assume that the $\Delta V_i$'s are uniformly distributed ($\Delta V_i \sim U[a, b]$) and vary the parameters $a$ and $b$ in the ISPP process. The minimum step size $a$ varies in the range $[0.05, 1]$, and $b = 1.4 \times a$. Also, we run ISPP until the number of steps reaches a pre-specified maximum number of iterations $\eta$, and we assume $\eta = 15$.
Remark 10: The MLC flash channel belongs to the class of channels with memory, and the ICI effect is considered to be the major source of channel memory. The information rate for this class of channels can be numerically computed using a forward sum-product recursion of the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm [21].
There are two possible ways to compute an information rate measure for MLC flash channels: (i) the symbol-by-symbol mutual information, i.e., compute $I_1 = I(X^{(k,\ell)}; Y^{(k,\ell)})$; (ii) the correct information rate for MLC flash using the BCJR algorithm [21]. Let $K$ and $L$ denote the number of word-lines and bit-lines in the block, respectively. Let $X^{(K,L)}_{(1,1)}$ and $Y^{(K,L)}_{(1,1)}$ denote the input sequence and output sequence of the whole block. Then, the MLC flash information rate obtained from [21], denoted by $I_2$, is computed as
$$I_2 = \frac{1}{KL}\, I\big(X^{(K,L)}_{(1,1)};\, Y^{(K,L)}_{(1,1)}\big).$$
Remark 11: To compute $I_2$, we simplify the problem by ignoring the diagonal aggressor ($E[\gamma_{xy}] = 0.006$), and thereby $A^{(k,\ell)} = \{(k+1, \ell)\}$. The channel model is similar to [13]. The final voltage of the aggressor, $V^{(k,\ell+1)}$, is considered to be the state of the victim cell $V^{(k,\ell)}$. Since the voltage is continuous, we quantize the state. Figure 9 shows the results when the number of quantization levels is $|S| = 20$. We run the sum-product recursion mentioned in [21] to compute the information rate $I_2$. Note that we reorder the read data from a block to obtain the long sequences $x^n$, $y^n$, $s^n$ needed in [21].
Note that for any i.i.d. input process, $I_1$ is always a lower bound for $I_2$, and the difference $I_2 - I_1$ is attributable to the ICI effect of MLC flash. Figure 6 shows the result of varying the step size $a$ for both information rates $I_1$ (solid curve) and $I_2$ (dashed curve). Both $I_1$ and $I_2$ were computed using the Monte Carlo method. In Figure 6, we exclude the aging effect on the performance by setting the program/erase cycling number $n_{PE} = 1$, and we disregard the retention effect by reading the stored data immediately after writing it. Figure 6 suggests that the flash performance is optimal when the step size parameter $a$ is chosen from the range $[0.2, 0.6]$, and $b = 1.4a$.
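A Monte Carlo estimate of the symbol-by-symbol rate $I_1 = I(X; Y)$ can be obtained from paired samples of written and read symbols with a plug-in estimator, sketched below. The function name `mutual_information_bits` is hypothetical, and this is only the generic empirical-frequency estimator; it does not reproduce the BCJR-based computation of $I_2$ from [21].

```python
import math
from collections import Counter

def mutual_information_bits(pairs):
    """Plug-in estimate of I(X; Y) in bits from a list of (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    info = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) p(y)) ), written with raw counts
        info += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return info

symbols = ["00", "10", "01", "11"]
noiseless = [(s, s) for s in symbols] * 100
print(mutual_information_bits(noiseless))
```

For a 2-bit cell the estimate is bounded by 2 bits per cell, which is attained by an error-free read of equiprobable symbols; read errors caused by overshoot, undershoot, or ICI pull the value below that bound.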
Remark 12: The performance drop on the right side of both curves in Figure 6 is due to the large step size of ISPP programming, which leads to the over-injection problem (large overshoot). The performance drop on the left side, however, is due to the undershoot problem in ISPP and is caused by the relation between the maximum allowed number of steps $\eta$ and the increment step size. The undershoot error happens when some of the cells cannot reach their target threshold because the maximum number of steps or the step size is too small. We discuss this error in more detail in Section VII.
Remark 13: To compare our proposed results with the best known ICI handling methods [20]-[22], we encourage the reader to see our previous work [13]. Simulation results in [13] show that at a soft information quality (SIQ) of 1.8 bits per cell, our proposed MAP detector outperforms the best known detectors, such as post-compensation and pre-distortion detectors [20], [22], by 0.35 dB. We believe that the MAP detector performance can be improved further if the detector is modified using our renewal process results.
VII. WRITE LATENCY
Simulations show that most of the cells in the page pass the threshold much earlier than the lagging few. Thus, it is not efficient to associate the write latency with the last cell that passes the threshold. In practice, the ISPP page programming operation ends when either all the cells with target bit B = 1 in the page reach their target threshold w or the number of steps reaches a predetermined maximum number. For a page, the predetermined maximum allowed number of steps in the ISPP process is called the "maximal delay", and is denoted by η [2] . Hence, for a 2-bit MLC word-line, we denote the maximal delay corresponding to the LSB and MSB pages by η l and η m , respectively.
A. LSB Page Latency
Let m be the length of the word-line. The LSB write latency is related to the number of steps needed for all the cells with target bit B 0 = 1 to pass w l . LetT Let k 1 be the index of the cell with largest valueÑ (w l ) + 1 in the j -th word-line. In other words, cell ( j, k 1 ) is the most latent cell during LSB page programming. If η l < N ( j,k 1 ) (w l ) + 1, there exists at least one cell that does not pass the threshold w l , and causes the undershoot write error when trying to store its corresponding bit B 0 = 1. Note that in this caseT Figure 7 shows the histogram of cell voltage after LSB page programming. The shaded part on the left side of w l is because of the undershoot error; the non-shaded part corresponds to the overshoot error.
Remark 14: When designing a flash memory, care should be taken so that the amount of undershoot error is always much smaller than the available ECC capability. In other words, we can allow a small percentage of undershoot errors to obtain some write latency reduction, and handle the rare undershoot errors using the flash ECC module.
Definition 2 (ISPP Undershoot Error Rate):
The allowed undershoot error rate, denoted by α, is the maximum allowed probability that a cell does not reach its target threshold after an ISPP page programming.
Proposition 4 (Setting
To guarantee a pre-specified allowed ISPP undershoot error rate α, it suffices to choose the number of allowed steps η l as the smallest integer that satisfies
where δ ζ +η l is given by (9) . Proof : See Appendix B.
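As a numerical counterpart to the closed-form condition, the smallest maximal delay that meets a given per-cell undershoot rate $\alpha$ can also be found by simulation. The sketch below is an assumption-laden illustration: it replaces inequality (16) by a Monte Carlo estimate, uses uniform increments, and takes $w = 2.0$ V only as an illustrative threshold; the function names are hypothetical.

```python
import random

def undershoot_rate(eta, w, a, b, mu0=-1.0, sigma0=0.5, trials=20000, seed=0):
    """Fraction of B = 1 cells still at or below w after at most eta pulses."""
    rng = random.Random(seed)
    fails = 0
    for _ in range(trials):
        v = rng.gauss(mu0, sigma0)
        steps = 0
        while v <= w and steps < eta:
            v += rng.uniform(a, b)
            steps += 1
        fails += v <= w
    return fails / trials

def smallest_eta(alpha, w, a, b, eta_cap=200, **kw):
    """Smallest eta whose simulated undershoot rate is at most alpha."""
    for eta in range(eta_cap + 1):
        if undershoot_rate(eta, w, a, b, **kw) <= alpha:
            return eta
    raise ValueError("no eta <= eta_cap meets the target undershoot rate")

eta_l = smallest_eta(alpha=0.01, w=2.0, a=0.3, b=0.42)
```

A simulation-based search like this is mainly useful as a sanity check on the analytical choice of $\eta_l$; for very small $\alpha$ the number of trials must grow accordingly, which is where a closed-form bound becomes preferable.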
B. Confidence Interval for η
Inequality (16) is useful for setting the maximum number of ISPP steps ($\eta$) when the allowed undershoot error rate $\alpha$ is known. Note that $\alpha$ is the probability that a single cell does not reach its target threshold $w$ after $\eta$ steps, and in practice $\alpha$ is not part of the device specification; instead, it is desirable to determine the relation between $\eta$ and the page quality loss factor (which is typically a design criterion).
Definition 3 (Page Quality Loss): Let $m$ denote the size of a page. The page quality loss is denoted by $100q\%$, where $q \triangleq \frac{\#\text{ of bit errors}}{m}$.
Theorem 4 (Confidence Interval for $\eta$): Let $\eta$ be the maximum number of steps in ISPP page programming. Given the page quality loss $100q\%$, an approximate $100(1 - \beta)\%$ confidence interval for $\eta$ is the set of integers that satisfy where
VIII. STEP SIZE DESIGN
As Jiang et al. mentioned in [2], finding the trade-off between programming precision and the total programming latency is considered to be one of the fundamental problems of monotonic memory channels. In general, the worst-case write latency is proportional to the maximum number of steps $\eta$. Hence, we measure the write latency by $\eta$. We also use the mutual information between the target input $X$ and the recovered output $Y$ as the overall accuracy measure.
A. Latency Distortion Theory
In this section, we assume that the MLC flash is in its early life. Thus, the P/E cycling number is very small and it is possible to disregard the wear-out noise. We are interested in obtaining the best step size V i that minimizes the latency while a particular information rate is achieved.
Remark 15: To the best of our knowledge, there are no studies that address the ISPP latency by changing the step size. In practice, however, the number of ISPP steps should be smaller than a specific predetermined value, denoted by $\eta_{\max}$. For example, Suh et al. in their seminal work [4] set $\eta_{\max}$ equal to 10 steps.
Here we assume that there exists a maximum bound $b_{\max}$ for the step size $\Delta V$. Clearly, the step size cannot be unbounded. In order to set a proper $b_{\max}$, it is necessary to take into account the number of voltage levels in the cell and the required error margin for other degradation sources such as ICI and wear-out noise. Given that $b_{\max}$ is known, the problem is formulated as follows: (17)
Note that it is very difficult to solve the above optimization problem analytically because the large number of noise sources makes the problem extremely complex. We show a numerical procedure to obtain the best step size $\Delta V \sim U[a^*, b^*]$ in the following example.
Example 1: Figure 8 shows the write latency η for all possible
The maximum number of steps $\eta$ is obtained from simulation. It is clear that $\eta$ is a decreasing function of $a$ and $b$. Therefore, the optimal step size belongs to the boundary of the feasible set. Figure 9 shows the step size regions for various $\eta$ given that the information rate $I(X, Y) \geq 1.9$ is attained for both the LSB page and the MSB page. The solid line shows the feasible set, and the dashed lines show the contour set for all possible $4 \leq \eta \leq 30$. As shown, if $\eta$ is set to be large, it is possible to use smaller step sizes. To be more concrete, as shown in Figure 9, both $\Delta V_i \sim U[0.5, 0.7]$ and $\Delta V_i \sim U[0.4, 0.6]$ are valid step sizes to attain the desired rate $R \geq 1.9$ with $\eta = 8$ and $\eta = 10$, respectively.
Fig. 9. The contour graph of $I(X, Y) \geq 1.9$ (solid line) versus step size parameters $a$, $b$ and the contour of $\eta$ (dashed line) versus step size parameters $a$, $b$. The shaded region is the valid step size to be used in practice based on Suh's paper [4].
B. Adaptive Regulation of Step Size
The lifetime of the flash memory for a specific ECC is the minimum P/E cycling number before the ECC fails to guarantee an uncorrectable bit error rate (UBER) $< 10^{-15}$ (the acceptable reliability in the flash storage industry) for a certain retention period. We analyze the effect of a step size change on the lifetime of a flash memory for a fixed storage time. As explained in [23], P/E cycles widen the final voltage distribution and move the mean to the right. Simulation results in [12] show that an exponential random variable can model the characteristics of the P/E cycling effect with 95% accuracy.
The final state distributions in flash memories degrade as the number of program/erase (P/E) cycles increases (aging effect). Thus, in the early life of a flash device, the ECC can correct more programming errors than at the end of its lifetime. In other words, it is possible to program faster and reduce the write latency as long as the resulting write errors can be handled by the flash ECC module.
As a result, the channel is time-variant. Namely, a flash memory can be programmed faster, with a higher acceptable error rate, early in the product's life cycle than late in the life cycle. In other words, it is possible to use a large step size V_i at the early stage of the flash lifetime and attain a sufficient information rate in fewer programming steps. As the flash ages, however, smaller and more accurate step sizes V_i are needed to still attain the target information rate. As long as the error correction code (ECC) can handle the relaxed distributions (due either to the larger step size at the early age of the flash or to natural wear-out degradation in the later stages of the life cycle), the effect is compensated for and the data is fully recoverable.
Figure 10-a shows the raw bit error rate (RBER) versus P/E cycles for a 2-bit MLC flash block programmed with a large step size V_i ∼ U[0.6, 0.7]. As shown, programming is very fast (η = 8 (1.0X)); however, the device becomes unreliable (RBER ≥ 2.6 × 10^−3) after a small number of P/E cycles, i.e., the lifetime is short. Similarly, Figure 10-b shows the RBER versus P/E cycles when the device is programmed with a small step size V_i ∼ U[0.1, 0.2]. As shown, programming is very slow (η = 32 (4.0X)); however, the device becomes unreliable (RBER ≥ 2.6 × 10^−3) only after a large number of P/E cycles, i.e., the lifetime is long.
Figure 10-c shows a compromise, namely the RBER for a 2-bit MLC flash block versus P/E cycles for step sizes that are adaptively tuned to keep the RBER below the level correctable by the ECC (black curve). Moreover, the blue curve shows the corresponding latency η, which gradually increases as the P/E cycle count grows. We consider a simple 32 Kbit BCH code which needs RBER ≤ 2.6 × 10^−3 (horizontal dashed red line). To simplify the step size design, we assume that b − a = 0.1 V (V ∼ U[a, b]).
Thus, we start with a large step size (a = 0.6), and as long as the obtained RBER is under the desired threshold, the flash block is programmed and erased immediately. Note that the step size a = 0.6 V is valid for ISPP as long as the P/E cycle count is below 4K. The corresponding maximum number of steps is η = 8. As shown in Figure 10-c, when the P/E cycle count passes 4K, we reduce the step size to a = 0.5, and thereby the RBER is maintained under the threshold. By tuning the step size properly, the lifetime of the flash can be extended to 17K P/E cycles (when a = 0.1). Note that the adaptive tuning of the step size increases the delay only gradually while extending the lifetime of the device. As shown in Figure 10-c, the maximum number of steps increases from η = 8 for a = 0.6 V to η = 32 for a = 0.1, but the average delay over the lifetime is relatively small (average η = 15).
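The adaptive regulation described above can be sketched as a simple threshold rule: keep the largest step size whose RBER still meets the ECC limit, and shrink it once the limit is exceeded. The sketch below is illustrative only. The function `toy_rber` is a hypothetical model (not the simulated RBER of Figure 10), the checkpoint spacing `pe_step` is an assumption, and the rule presumes that RBER grows with both the step size and the P/E cycle count.

```python
RBER_LIMIT = 2.6e-3  # ECC threshold quoted in the text


def adaptive_schedule(rber, a_values, pe_max, pe_step=1000, limit=RBER_LIMIT):
    """Return a list of (pe_cycle, a) pairs.

    a_values must be sorted from the largest to the smallest step size.
    At each P/E checkpoint the current a is kept while it meets the RBER
    limit and is reduced otherwise; the schedule stops when even the
    smallest a fails (end of life).
    """
    schedule = []
    idx = 0
    for pe in range(0, pe_max + 1, pe_step):
        while idx < len(a_values) and rber(a_values[idx], pe) > limit:
            idx += 1  # step size too coarse at this age: shrink it
        if idx == len(a_values):
            break  # no admissible step size is left
        schedule.append((pe, a_values[idx]))
    return schedule


def toy_rber(a, pe):
    # Hypothetical model, NOT from the paper: RBER grows with a and with wear.
    return 1e-3 * a * (1.0 + pe / 1000.0)


schedule = adaptive_schedule(toy_rber, [0.6, 0.5, 0.4, 0.3, 0.2, 0.1], pe_max=30000)
```

Each entry of `schedule` gives the step size in force at that P/E checkpoint; the last entry marks the end of the toy device's usable life.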
IX. CONCLUSION
In this paper, we use renewal theory to model ISPP, an iterative technique for programming data in MLC flash memories. We show that the number of steps required in ISPP is a renewal process with a random starting point. This model provides a mathematical tool to effectively analyze important features such as overshoot, undershoot, inter-cell interference (ICI), and write latency. We have found good approximations for the distribution functions of the programmed cell levels and have quantified the approximation error. Moreover, we compute the channel bit error rate (BER) and information rate (IR) resulting from the model and use them to devise an adaptive method for selecting the step size that balances latency and life expectancy under fixed BER constraints and IR constraints.
To prove (4), we bifurcate the proof into two cases: i) n ≥ 0:
where (a) holds because when n ≥ 0, the condition N (w) ≥ 0 is equivalent to N(w − μ 0 + μ ) ≥ , (b) and (e) follow from Observation 1-d, (c) is the triangle inequality and (d) follows from (19) . Note that
Thus, using (21), (22) and the triangle inequality it is easy to verify that for > L, P{N (w) = n} − P{Ñ (w) = n} < .
ii) n = −1: This case follows directly by combining the Gaussian property of the erased state with Observation 1-d.
APPENDIX B PROOF OF PROPOSITION 4
Proof: As discussed earlier, Ñ(w_l) + 1 is the number of steps required for a cell to pass the threshold w_l during LSB page programming given that B = 1. Let E_{η_l} represent the event that a cell does not reach the threshold after η_l steps. Clearly, E_{η_l} happens if and only if Ñ(w_l) + 1 > η_l. Hence,
Consequently, α must be chosen to satisfy
Using Corollary 3 it is straightforward to show that
The rest of the proof follows by combining equations (16) and (24) .
APPENDIX C PROOF OF THEOREM 4
Proof: Let N_E be the number of undershoot errors after programming a page. Note that N_E is a binomial random variable with parameters m and p = α (i.e., N_E ∼ B(m, α)). We assume the undershoot error probability α is unknown. Then, an approximate 100(1 − β)% confidence interval for α is the set [24, Th. 5.3.1].
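As a rough illustration of an approximate confidence interval for a binomial proportion, the sketch below computes the standard normal-approximation (Wald) interval. This is an assumption made for illustration: the exact form of the interval in [24, Th. 5.3.1] is not reproduced here and may differ. The function name `wald_interval` and the sample numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(n_errors, m, beta=0.05):
    """Approximate 100(1 - beta)% interval for alpha from N_E = n_errors.

    Uses the normal approximation to the binomial B(m, alpha); this is an
    assumed form, not necessarily the one used in the cited theorem.
    """
    if m <= 0 or not 0 <= n_errors <= m:
        raise ValueError("need m > 0 and 0 <= n_errors <= m")
    alpha_hat = n_errors / m                      # point estimate of alpha
    z = NormalDist().inv_cdf(1.0 - beta / 2.0)    # two-sided normal quantile
    half = z * sqrt(alpha_hat * (1.0 - alpha_hat) / m)
    return max(0.0, alpha_hat - half), min(1.0, alpha_hat + half)


# Hypothetical page: 50 undershoot errors among m = 1000 cells.
low, high = wald_interval(50, 1000)
```

The interval is clipped to [0, 1]; for very small error counts the normal approximation becomes poor and a different interval may be preferable.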
Recall that Ñ(w) + 1 is the number of steps needed for a cell to pass the threshold w if the target bit is B = 1. Let E_η be the event that a cell does not reach its target voltage w after η steps. Then,
where (a) holds because the input distribution is assumed to be i.i.d. and independent of the programming process. To guarantee programming with a quality loss under 100q%, it is clear that
Since we assumed that P(B = 1) = 1/2, to guarantee (26), η must be chosen such that
As discussed, the actual probability P{Ñ(w) ≥ η} is unknown. Using Corollary 3, it is straightforward to show that
Hence, in order to satisfy (26), η must be chosen to satisfy
i.e., η should belong to the set of integers which satisfy (29).
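The selection of η can also be checked empirically. The sketch below is a Monte Carlo stand-in, not the closed-form bound obtained from Corollary 3: it simulates the number of pulses a cell needs to pass a threshold w, with increments V_i ∼ U[a, b] and a Gaussian erased state, and returns the smallest η whose empirical undershoot probability is at most a given target. The erased-state parameters `mu0` and `sigma0`, the trial count, and all numeric values are hypothetical.

```python
import random


def steps_to_pass(w, a, b, mu0, sigma0, rng):
    """Count ISPP pulses until the cell voltage first reaches w."""
    x = rng.gauss(mu0, sigma0)      # erased-state voltage (assumed Gaussian)
    steps = 0
    while x < w:
        x += rng.uniform(a, b)      # random increment V_i ~ U[a, b]
        steps += 1
    return steps


def smallest_eta(w, a, b, mu0, sigma0, target, trials=20000, seed=0):
    """Smallest eta with empirical P{steps > eta} <= target."""
    if not 0 < a <= b:
        raise ValueError("need 0 < a <= b so that every pulse adds charge")
    rng = random.Random(seed)
    counts = [steps_to_pass(w, a, b, mu0, sigma0, rng) for _ in range(trials)]
    for eta in range(max(counts) + 1):
        tail = sum(c > eta for c in counts) / trials
        if tail <= target:
            return eta
    return max(counts)


# Hypothetical setting: deterministic start at 0, threshold 2.0, V_i ~ U[0.5, 0.6].
eta = smallest_eta(2.0, 0.5, 0.6, mu0=0.0, sigma0=0.0, target=0.01, trials=2000)
```

In this toy setting three pulses add at most 1.8 and four pulses add at least 2.0, so every cell needs exactly four pulses and the routine returns 4.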
