Abstract-Flash memory is a promising new storage technology. To fully utilize future multi-level cell Flash memories, it is necessary to develop error correction coding schemes attuned to the underlying physical characteristics of Flash. Based on a careful inspection of fine-grained, experimentally-collected error patterns of TLC (three bits per cell) Flash, we propose a mathematical model that captures the intracell variability, which is manifested by certain patterns of bit-errors. Error correction codes are constructed for this model based upon generalized tensor product codes. For fixed levels of redundancy, these codes are shown to exhibit substantially lower bit error rates than existing error correction schemes.
I. INTRODUCTION
Flash memory devices can be found almost everywhere today. They are lighter, faster and more shock resistant than traditional magnetic hard drives. As this technology scales and the storage density increases, data errors become more prevalent, making error correction coding critical for maintaining data integrity.
The storage density of a Flash memory device is dependent on the number of discrete voltage levels the floating gate cell is capable of representing. In early generations, every memory cell could represent two voltage levels and thus store a single bit (SLC). The demand for increased storage capacity has created the need to store more than a single bit per cell by simply representing more than two voltage levels. In this work, we follow the commonly adopted nomenclature and assume that multiple level cell (MLC) chips store two bits per cell, and that triple level cell (TLC) chips store three bits per cell.
Recently, the subject of error-correction coding for Flash memory has received significant attention. In [5], trellis coded modulation techniques were applied to Flash memory. In [8], the use of non-binary LDPC codes was investigated. In [6], algebraic error-correction codes were used for rewriting as well as correcting errors. In [1], [4], codes that correct limited magnitude asymmetric errors were constructed. In [12], this model was extended to correct graded error patterns.
The error model in this work is motivated by data collected from a TLC Flash device. As observed in [13], if the information from each Flash cell is interpreted as a triple-bit word, then the errors largely cause only a single bit in each word to change. From this observation, we suggest the use of a new class of codes derived from tensor product codes [7], [11] in the context of Flash memory. This work generalizes the result of [13] to correct errors that mostly have only a small number of bits in error for each cell-error. The technique used to address the problem is based on the generalized tensor product (GTP) scheme proposed in [7].
Tensor product codes were first introduced in [11] and were generalized to produce efficient binary codes in [7]. More recently, tensor product codes were used in the context of magnetic recording [2], [3]. In a concatenated coding scheme, the use of a tensor product parity code as the inner code was shown to offer the performance advantages of a short length parity code but without the associated rate penalty. In this work, it is shown that generalized tensor product codes can be used to efficiently correct the errors that occur within a TLC Flash device, and in turn extend the lifetime of a memory system. The main contributions are construction methods for codes that correct up to $t_1$ symbol errors with up to $\ell_1$ bit errors and $t_2$ symbol errors with up to $\ell_2$ bit errors.
In Section II, the data collected from a TLC Flash chip is summarized. In Section III, the error model, motivated by the experimental data, is proposed. In Section IV, code constructions for this model are given. In Section V, these constructions are shown through simulation to be superior to commonly used storage codes. Section VI concludes the paper.
II. STRUCTURE AND ERROR CHARACTERIZATION OF TLC FLASH
In this section, we report on the observed errors measured from a TLC chip provided by an anonymous vendor. A TLC chip is divided into multiple planes. Each plane is divided into a set of blocks and these blocks are further decomposed into pages. For the particular TLC chip measured, there are 384 pages within a block and 8 kilobytes (KB) within a page. The eight discrete voltage levels from the cell are represented as a triple-bit word. We refer to the first bit in the word as the most significant bit (MSB), the second bit in the word as the center significant bit (CSB), and the third bit in the word as the least significant bit (LSB). For more details on the structure of a TLC chip, see [13].
The errors were measured from sixteen blocks evenly divided across two planes. The following testing procedure was repeatedly performed. On the first cycle of every 100 program/erase (P/E) cycles, a block was erased, and random data was then written and finally read back for errors. On the other 99 cycles, the block was simply erased and the memory was programmed to simulate the aging of the device.
In Figure 1, the Bit Error Rate (BER) is illustrated for the TLC chip tested over the course of its lifetime. It can be seen that over time, the BER increases dramatically but at different rates depending on which bit is programmed. The 'Symbol Error Rate' plot refers to the symbol error rate when each cell is represented as a symbol over $GF(8)$.
The dominant trend from Figure 1 is that the 'Symbol Error Rate' appears to be roughly the sum of the individual BERs of the MSB, CSB, and LSB. This suggests that whenever a cell-error occurs, with high probability only one of the three bits in the cell errs. More specifically, 96.17% of cell-errors only had a single bit in error. This is a result of the special programming property of the bits where the three bits are not programmed all at once. More details on this phenomenon are reported in [13]. Note that this error model is considerably different from the one of asymmetric limited magnitude errors, studied in many previous works, e.g., [1] and [4].
The new codes introduced in this paper correct errors that mostly affect a single bit within each cell-error. In addition, these new codes also have the special property that they can correct the remaining few cell-errors with two or three bit-errors.
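The relation between the symbol error rate and the per-bit BERs can be checked on any list of per-cell error patterns. The following Python sketch is illustrative only: the function name `error_rates` and the synthetic counts in `sample` are assumptions and are not the measurements reported above. It shows that when almost all cell-errors flip a single bit, the sum of the three BERs slightly overestimates the symbol error rate, with the gap coming only from multi-bit cell-errors.

```python
def error_rates(cell_errors):
    """Return (per-bit BERs, symbol error rate, fraction of single-bit cell-errors).

    cell_errors: one (msb, csb, lsb) tuple of 0/1 error indicators per cell.
    """
    n = len(cell_errors)
    # Each page holds one bit per cell, so each BER is normalized by the cell count.
    ber = [sum(e[k] for e in cell_errors) / n for k in range(3)]
    erroneous = [e for e in cell_errors if any(e)]
    ser = len(erroneous) / n
    single = sum(1 for e in erroneous if sum(e) == 1) / len(erroneous) if erroneous else 0.0
    return ber, ser, single


# Synthetic example (hypothetical counts): 48 single-bit and 2 double-bit cell-errors.
sample = (
    [(1, 0, 0)] * 10 + [(0, 1, 0)] * 20 + [(0, 0, 1)] * 18
    + [(1, 1, 0), (0, 1, 1)] + [(0, 0, 0)] * 950
)
ber, ser, single = error_rates(sample)
print(sum(ber), ser, single)
```

In this synthetic sample the BERs sum to 52 bit-errors per 1000 cells while only 50 cells are in error; the two double-bit cell-errors account for the difference.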
III. MODEL AND DEFINITIONS
In this section, the relevant error models as well as code definitions are given. Accordingly, every cell-error is represented as a length-$m$ vector $e_i$. For a fixed $\ell$, if $\mathrm{wt}(e_i) \le \ell$ then such an error is called an $\ell$-bit-cell-error, where the Hamming weight of a vector $x$ is denoted by $\mathrm{wt}(x)$. Motivated by the nature of the errors observed, it is useful to define the following class of error-vectors and codes. Definition 2. Given the parameters $t$ and $\ell$, an error-vector $e = (e_0, e_1, \ldots,$
From the data collected from the TLC flash device, it was observed that while most cell-errors suffered a single bit-error, only a small number of cells had double or triple bit-errors. Therefore, to correct all observed errors, it is useful to define the following refined error-vectors and corresponding codes.
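To make the refined error class concrete, the sketch below tests whether an error-vector fits the graded pattern described in the introduction: up to $t_1$ cell-errors with up to $\ell_1$ bit-errors and up to $t_2$ cell-errors with up to $\ell_2$ bit-errors. The function name and the exact counting rule (at most $t_1 + t_2$ nonzero cells in total, of which at most $t_2$ exceed weight $\ell_1$, none exceeding $\ell_2$) are assumptions based on that description, not a transcription of the formal definition.

```python
def is_graded_error_vector(e, t1, t2, l1, l2):
    """Check a list of per-cell bit tuples against the assumed graded error pattern."""
    weights = [sum(cell) for cell in e]          # Hamming weight of each cell-error
    if any(w > l2 for w in weights):
        return False                              # no cell may exceed l2 bit-errors
    heavy = sum(1 for w in weights if w > l1)     # cells with more than l1 bit-errors
    nonzero = sum(1 for w in weights if w > 0)    # all cells in error
    return heavy <= t2 and nonzero <= t1 + t2


# Hypothetical TLC example: two single-bit cell-errors and one double-bit cell-error.
e = [(0, 0, 0), (1, 0, 0), (0, 1, 1), (0, 0, 1), (0, 0, 0)]
print(is_graded_error_vector(e, t1=2, t2=1, l1=1, l2=2))
```

Under this reading, a cell-error of weight at most $\ell_1$ may also occupy one of the $t_2$ slots, which is why the check bounds the total count by $t_1 + t_2$ rather than bounding the light cell-errors by $t_1$ alone.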
The next definition is useful in determining the parity-check matrices of bit-error-correcting codes.
Then the tensor product of A and B is defined as the matrix
Furthermore, rank(A ⊗ B) = rank(A) · rank(B).
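The rank identity can be checked numerically. The sketch below assumes that the tensor product is the standard Kronecker product (the block matrix whose $(i,j)$ block is $a_{i,j} B$) and works over $GF(2)$; the helper names `kron_gf2` and `rank_gf2` and the two example matrices are illustrative choices.

```python
def kron_gf2(a, b):
    """Kronecker product of two 0/1 matrices (lists of rows) over GF(2)."""
    return [[x & y for x in row_a for y in row_b] for row_a in a for row_b in b]


def rank_gf2(matrix):
    """Rank over GF(2) by Gaussian elimination on rows packed into integers."""
    rows = [int("".join(map(str, r)), 2) for r in matrix]
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot                  # lowest set bit is the pivot column
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank


A = [[1, 0, 1],
     [0, 1, 1]]                                   # rank 2
B = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]                                   # rank 2: third row is the sum of the first two
print(rank_gf2(A), rank_gf2(B), rank_gf2(kron_gf2(A, B)))
```

Here the product matrix has 6 rows and 9 columns, yet its rank is only $2 \cdot 2 = 4$, which is the property used later to count the redundancy of the constructed codes.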
IV. CODE CONSTRUCTIONS
In this section, code constructions are given for bit-error-correcting codes. The section begins by revisiting a result from [11]. In [11], it was shown that the tensor product of two parity check matrices results in a code that can correct a prescribed number of errors of a pre-defined type. For example, suppose a code with a parity check matrix $H_1 \in GF(2)^{r_1 \times m}$ corrects all burst errors of length 2 and a code with a parity check matrix $H_2 \in (GF(2)^{r_1})^{r_2 \times n}$ corrects any 3 symbol errors. Then $H_2 \otimes H_1$ is a parity check matrix of a code of length $nm$ bits, partitioned into $n$ $m$-bit blocks. This code corrects any 3 block-errors assuming each block-error is a burst of length 2. In Construction A, this result is stated more formally.
A. Construction A
We start by presenting a construction of $[t; \ell]_{2^m}$-bit-error-correcting codes. Construction A. (see first [11] 
code with a parity check matrix $H_2$. Then, the code $C_A$ with the parity check matrix
$$H_A = H_2 \otimes H_1$$
is a $[t; \ell]_{2^m}$-bit-error-correcting code of length $n$.
The correctness of the error-correction capability was proved in [11]. Furthermore, since the parity check matrix of the code $C_A$ is the tensor product of the matrices $H_1$ and $H_2$, and $\mathrm{rank}(H_2 \otimes H_1) = \mathrm{rank}(H_2) \cdot \mathrm{rank}(H_1)$, we get that the redundancy of the code $C_A$ is $r_1 r_2$, where $r_1 = m - k_1$ and $r_2 = n - k_2$. An example of the encoding of such codes was given in [10]. Suppose $c \in C_A$, where
where $h_{i,j}$ represents the symbol in row $i$, column $j$ of $H_2$, and the code $C_A$ can be expressed as follows:
n be the decoders of the codes $C_1$, $C_2$, respectively. Here, and henceforth, we assume that the input to the decoders of the constituent codes is the syndrome of the received vector and the output is the detected error vector. We also assume that if the code can correct $t$ errors, then the weight of the output error vector is at most $t$. If the decoder finds an error vector of weight greater than $t$, then the all-zero vector is returned as an output.
n of the code $C_A$ gets as an input a word of the form $y = c + e$, where $c \in C_A$ and $e \in (GF(2)^m)^n$ is a $[t; \ell]_{2^m}$-bit-error-vector. The output of the decoder is the estimate of the error vector: 
and since $s_i = H_1 \cdot e_i^T$ and the weight of $e_i$ is at most $\ell$, we get that $D_1(s_i) = e_i$, that is, $\hat{e} = e$.
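As a concrete illustration of this two-stage decoding, the following Python sketch instantiates a tensor product code for TLC cells ($m = 3$) with $t = 1$ and $\ell = 1$. The specific matrices are assumptions chosen for illustration and are not taken from the paper: $H_1$ is a $2 \times 3$ binary matrix whose columns are the three nonzero 2-bit vectors, and $H_2$ is the parity check matrix of a length-5 Hamming code over $GF(4)$. All function names are hypothetical.

```python
# GF(4) = {0, 1, a, a^2} encoded as the integers 0..3; addition is XOR.
GF4_MUL = [
    [0, 0, 0, 0],
    [0, 1, 2, 3],
    [0, 2, 3, 1],
    [0, 3, 1, 2],
]

M, N = 3, 5                                   # bits per cell, cells per codeword
H1_COLS = [1, 2, 3]                           # columns of H_1 (2x3 over GF(2)) as 2-bit integers
H2_COLS = [(1, 0), (0, 1), (1, 1), (1, 2), (1, 3)]  # columns of H_2 (2x5 over GF(4))


def cell_syndrome(cell):
    """s_i = H_1 * cell^T, returned as a GF(4) symbol."""
    s = 0
    for bit, col in zip(cell, H1_COLS):
        if bit:
            s ^= col
    return s


def syndrome(word):
    """Syndrome under H_2 (x) H_1: apply H_2 to the vector of cell syndromes."""
    s0 = s1 = 0
    for cell, (h0, h1) in zip(word, H2_COLS):
        s = cell_syndrome(cell)
        s0 ^= GF4_MUL[h0][s]
        s1 ^= GF4_MUL[h1][s]
    return (s0, s1)


def decode_d2(synd):
    """D_2: cell-syndrome vector of weight <= 1 matching synd (all-zero if none)."""
    est = [0] * N
    for j, (h0, h1) in enumerate(H2_COLS):
        for v in (1, 2, 3):
            if (GF4_MUL[h0][v], GF4_MUL[h1][v]) == synd:
                est[j] = v
                return est
    return est


def decode_d1(s):
    """D_1: the 3-bit error of weight <= 1 whose H_1-syndrome is s."""
    e = [0] * M
    if s:
        e[H1_COLS.index(s)] = 1
    return tuple(e)


def decode_a(word):
    """Estimate the error vector of a received word y = c + e from its syndrome."""
    return [decode_d1(s) for s in decode_d2(syndrome(word))]


# All-zero codeword with the CSB of cell 2 flipped.
received = [(0, 0, 0), (0, 0, 0), (0, 1, 0), (0, 0, 0), (0, 0, 0)]
estimated_error = decode_a(received)
```

With these choices the code has length $nm = 15$ bits and redundancy $r_1 r_2 = 4$ bits, and the decoder recovers every error pattern that flips one bit in one cell. Because decoding depends only on the syndrome, exercising it on the all-zero codeword is sufficient to check the error estimates.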
B. Construction B
The codes given in Construction A correct error patterns according to the maximum number of bit-errors in every cell (or $m$-bit symbol). Construction B extends this idea so that, while most cells suffer a small number of bit-errors, relatively few cell-errors may occur with a larger number of bit-errors. We capture this property in the following construction of [13]. Note that $c \in C_B$ if and only if
Hence, the code $C_B$ can be expressed as
and its redundancy is at most r r 2 + r r 3 .
Let us denote
to be the decoder of the code C 1 , C 1 , C 2 , C 3 , respectively. As before, the input to all these decoders is the syndrome and the output is the error vector whose weight is no greater than the guaranteed error-correction capability of the corresponding code. Before presenting the decoder's steps, let us explain the idea behind this construction and its decoding procedure. We start in a similar fashion to the decoder in Construction A, where at most $t_1 + t_2$ cell-errors, each of weight at most $\ell_1$, are found. Clearly, it may not be possible to correct all cell-errors this way. If a cell-error has weight at most $\ell_1$, then it is corrected. Otherwise, it is miscorrected to a cell-error vector with weight at most $\ell_1 + \ell_2$, since the weight of each miscorrection has been restricted to be at most $\ell_1$. This, in turn, guarantees that the new cell-error vector is not a codeword in $C_1$, since its minimum distance is at least $2\ell_2 + 1$. Thus, the next step is to detect the cells that were miscorrected. For cell-errors with more than $\ell_1$ bits in error, the remaining part of the syndrome according to the code $C_1$ is recovered. The decoder $D_1$ is then used to recover the remaining errors.
The decoder
n of the code $C_B$ gets as an input a word of the form $y = c + e$, where $c \in C_B$ and e ∈ (GF (2)
Proof: Let $y = c + e$ be the received word to the decoder $D_B$, where $c \in C_B$ and $e \in (GF(2)^m)^n$ is a $[t_1, t_2; \ell_1, \ell_2]_{2^m}$-bit-error-vector. Then according to the definition of the code
now has weight at most $t_1 + t_2$, and since the code $C_2$ can correct this number of errors we get that (s
At step 2 since for every 
Steps 4 and 5 compute the syndrome using y as input. Since the minimum distance of the code $C_1$ is at least $2\ell_2 + 1 > \ell_1 + \ell_2$, we get that for all $0 \le i \le n - 1$, if a miscorrection occurred, then $e^*_i$ is not a codeword in $C_1$. Therefore, (s i , s i ) ≠ (0, 0), and in step 6 the set $I$ is the set of all $0 \le i \le n - 1$ such that $\ell_1 < \mathrm{wt}(e_i) \le \ell_2$. In step 7, the word y is the word of y after removing all cell-errors of weight at most $\ell_1$.
In step 8, the remaining portion of the syndrome is recovered for all cell-errors with more than $\ell_1$ bits in error. Lastly, in step 9, for every cell-error at position $i$, if $\ell_1$ or fewer bit-errors occurred then $e^*_i$ is its corresponding cell-error vector, and if more bit-errors occurred then the decoder $D_1$ is used. Since 
C. Construction C
Construction C extends Construction B by using a combination of codes whose abilities are to correct errors, correct erasures, and detect errors. In particular, the code $C_1$ in Construction B is modified such that it corrects $\ell_1$ errors and detects when there are between $\ell_1 + 1$ and $\ell_2$ errors. Accordingly, the code $C_3$ in Construction B need only correct $t_2$ erasures instead of $t_2$ errors.
Construction C. Let $C_C$ be a code with the following modifications with respect to the code construction of $C_B$:
1) 
The proof of the decoder correctness of Construction C is omitted due to a lack of space. It can be shown that Construction C requires less redundancy than Construction A approximately when log n log m < ( 
V. PERFORMANCE AND RESULTS
In this section, the performance of various linear error-correcting codes with guaranteed error-correction capability is evaluated for the TLC Flash device. The results of these simulations are shown in Figure 2. All the known codes used were the best known linear codes according to [9] of the longest block length. The rate 0.904 code labeled 'Scheme A' consists of a non-binary $[256, 227, 5]_4$ code applied to the LSB and CSB bits of each Flash memory cell. Next, an independent binary $[256, 240, 2]_2$ code was used to protect the remaining bit of information from each cell. This scheme was designed to target the property observed in Section II where the CSB and LSB were more likely to err than the MSB.
The bit-error code of length 256 and rate 0.83 (constructed using one tensor product operation). From Fig. 2, this particular tensor product code has the ability to delay the appearance of any errors in the system by a factor of 4 over the naive $GF(8)$ code. In addition, the proposed tensor product code offers a 1.6x lifetime improvement over the popular BCH codes.
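The stated rate of Scheme A follows directly from the parameters of its two component codes. The short Python check below assumes that both codes span the same 256 cells, with each $GF(4)$ symbol carrying the two bits (LSB and CSB) of one cell and each binary symbol carrying the remaining bit; the function name is hypothetical.

```python
from fractions import Fraction


def scheme_a_rate():
    """Overall rate of Scheme A from the component code parameters."""
    cells = 256
    info_bits = 227 * 2 + 240 * 1     # GF(4) code: 2 bits per symbol; binary code: 1 bit
    stored_bits = cells * 3           # three bits per TLC cell
    return Fraction(info_bits, stored_bits)


rate = scheme_a_rate()
print(rate, round(float(rate), 3))    # 347/384, which rounds to 0.904
```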
VI. CONCLUSION
In this work, data from a TLC Flash device demonstrated that when errors occur within a Flash cell, the vast majority of such errors only affect one of the 3 bits of information. This observation was used to motivate a new error-correction model for Flash memory. Error-correcting code constructions based upon generalized tensor product codes were provided that were analytically and empirically shown to offer a potentially valuable component for future coding schemes in the context of Flash memory.
