2015 15th Non-Volatile Memory Technology Symposium (NVMTS)

# Reliability and Hardware Implementation of Rank Modulation Flash Memory

Yanjun Ma, Invention Development Fund, Intellectual Ventures, Bellevue, WA 98005

Yue Li, Department of Electrical Engineering, California Institute of Technology, Pasadena, CA 91125

Edwin Chihchuan Kan, Department of Electrical Engineering, Cornell University, Ithaca, NY 14850

Jehoshua Bruck, Department of Electrical Engineering, California Institute of Technology, Pasadena, CA 91125

Abstract - We review a novel data representation scheme for NAND flash memory named rank modulation (RM) and discuss its hardware implementation. We show that, under normal threshold voltage ($V_{th}$) variations, RM has an intrinsic read reliability advantage over conventional multi-level cells (MLC). Test results demonstrating superior reliability using commercial flash chips are reviewed and discussed. We then present a read method based on relative sensing time, which can obtain the rank of all cells in a group in one read cycle. The improvements in reliability and read speed enable a program-and-verify time in RM similar to that of conventional MLC flash.

Keywords - flash memory, rank modulation, reliability

#### I. INTRODUCTION

Rank modulation (RM) has been proposed as a new data representation scheme for flash memory [1]. In this scheme, information is represented by the relative threshold voltage ($V_{th}$) values among a group of floating-gate transistor cells rather than by absolute $V_{th}$ bins. Namely, a group of N cells (a set) induces a permutation through the relative $V_{th}$ ranking, which is then mapped to binary data. For a given N, the total number of permutations is N!, which may exceed the capacity of the same cells storing binary data when N is large. The RM scheme has other benefits, including greater tolerance of inhomogeneous noise and cell-level $V_{th}$ drift, as only the relative order of the cell $V_{th}$ values affects data integrity. Error correction and rewrite codes can also be implemented efficiently [2-4]. However, read operations in RM are often regarded as problematic and slow, as N-1 read operations are normally needed to rank N cells [5]. In this paper, we review the improvement in reliability of RM under the same $V_{th}$ variations, present test data from commercially available flash chips to validate this conclusion, and then present an efficient read method based on relative sensing time, which can read out the entire rank information in one read cycle. An accelerated program method that uses this read scheme in the verification step is then presented. Lastly, we discuss the implementation of RM in current state-of-the-art NAND flash circuits.
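As an illustration of how a set of cells induces a permutation that is then mapped to binary data, the sketch below ranks hypothetical $V_{th}$ values and converts the resulting permutation to an integer in [0, N!). The Lehmer-code mapping, the function names, and the voltage values are assumptions made for illustration; the text above does not specify which mapping is used.

```python
from math import factorial


def rank_permutation(vth):
    """Return the cell indices ordered from lowest to highest V_th."""
    return sorted(range(len(vth)), key=lambda i: vth[i])


def permutation_to_index(perm):
    """Map a permutation of 0..N-1 to an integer in [0, N!) via its Lehmer code."""
    n = len(perm)
    index = 0
    for pos, value in enumerate(perm):
        # Count the later entries that are smaller than the current one.
        smaller = sum(1 for later in perm[pos + 1:] if later < value)
        index += smaller * factorial(n - 1 - pos)
    return index


vth = [0.3, 1.9, 1.1, 2.6]           # hypothetical V_th values in volts
perm = rank_permutation(vth)          # lowest to highest: cells 0, 2, 1, 3
word = permutation_to_index(perm)     # integer that can be written out in binary
print(perm, word)
```

Because only the ordering enters `rank_permutation`, adding the same offset to every entry of `vth` leaves both the permutation and the decoded integer unchanged.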

#### II. RM AND ITS ADVANTAGE

An extension of the RM method is to allow multiple cells to occupy the same rank. In this RM with multiset permutation scheme, each rank may contain multiple cells. For a set of N cells with q ranks, where rank i has multiplicity $n_i$, the normalized capacity is $\log_2\left(N!/(n_1!\,n_2!\cdots n_q!)\right)/N$ bits/cell. For the balanced RM (BRM) scheme, where each rank has the same size, i.e., $n_1 = n_2 = \cdots = n_q = N/q$, the hardware that reads cells of different ranks can be reused, which reduces the peripheral read circuit overhead. For large N, the capacity approaches $\log_2(q)$ bits/cell, the same as for conventional q-level cells, as shown in Figure 1. Thus, for the same number of circuit-differentiable levels, RM offers no capacity advantage over conventional MLC flash. However, as demonstrated below, RM has an intrinsic reliability advantage when the $V_{th}$ distributions shift under typical noise in flash.
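The capacity expression can be evaluated directly. The following minimal sketch computes the normalized capacity from a list of rank multiplicities, using the log-gamma function to avoid very large factorials; the function name `rm_capacity` and the chosen set sizes are illustrative assumptions.

```python
from math import lgamma, log


def rm_capacity(multiplicities):
    """Normalized capacity in bits/cell: log2(N! / prod(n_i!)) / N."""
    n_total = sum(multiplicities)
    # ln(N!) - sum(ln(n_i!)), using lgamma(k + 1) = ln(k!)
    ln_count = lgamma(n_total + 1) - sum(lgamma(n + 1) for n in multiplicities)
    return ln_count / log(2) / n_total


# Balanced RM with q = 4 ranks and a growing number of cells per rank.
for cells_per_rank in (1, 4, 16, 64, 1024):
    capacity = rm_capacity([cells_per_rank] * 4)
    print(f"{cells_per_rank:5d} cells/rank: {capacity:.4f} bits/cell")
```

For q = 4 the printed values increase with the number of cells per rank and stay below $\log_2(4) = 2$ bits/cell, in line with the limit stated above.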



Figure 1. The capacities of different rank modulation schemes as a function of the number of cells in each rank and the number of ranks in each set.

This is illustrated by a simple two-level system in Figure 2, which shows the $V_{th}$ distributions of many cells in two levels. In conventional flash, the data stored in a cell becomes faulty when its $V_{th}$ crosses the read threshold, usually set at the middle of the two distributions. Several schemes using dynamically adjusted read thresholds have been proposed [7-9] to reduce read errors. In RM, however, the data remain correct as long as the relative order with respect to the lower-ranked cells is maintained. Therefore, even when the $V_{th}$ of a higher-rank cell crosses the conventional $\mu/2$ read threshold, the RM information remains valid as long as the $V_{th}$ of the lower-rank cells stays below that of the higher-rank cells.



Figure 2. The failure point of the RM scheme (dotted) is lower than that of conventional flash (dashed).
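The difference between the two failure conditions can be illustrated with a minimal sketch. It compares a fixed-threshold read with a relative read for two cells whose $V_{th}$ values drift downward by the same amount. All voltages and function names are hypothetical and chosen only to make the effect visible.

```python
def read_conventional(vth, threshold):
    """Threshold read: 1 if the cell V_th is above the fixed reference, else 0."""
    return [1 if v > threshold else 0 for v in vth]


def read_rank(vth):
    """Relative read: cell indices ordered from lowest to highest V_th."""
    return sorted(range(len(vth)), key=lambda i: vth[i])


mu = 2.0                               # hypothetical level spacing (V)
stored = [0.1, 2.2]                    # cell 0 at the low level, cell 1 at the high level
drifted = [v - 1.3 for v in stored]    # both cells drift down by the same amount

bits_before = read_conventional(stored, mu / 2)
bits_after = read_conventional(drifted, mu / 2)
rank_before = read_rank(stored)
rank_after = read_rank(drifted)

print("threshold read:", bits_before, "->", bits_after)
print("rank read:     ", rank_before, "->", rank_after)
```

In this example the high-level cell falls below the $\mu/2$ reference, so the threshold read changes, while the order of the two cells, and hence the rank read, does not.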

Mathematically, the raw bit error rate (RBER) $R_0$ for conventional flash can be estimated by (see, e.g., [7])

[Equation (1) is missing in the source.]

where $V_{r,i}$ is the read threshold and $p_i(v)$ is the probability density of $V_{th}$ for the *i*-th level. For RM, assuming each rank has the same number of cells, the raw cell failure rate (RCER, related to the RBER by RBER = RCER/$\log_2(q)$) $R_1$ is given by

[Equation (2) is missing in the source.]

The RCER measures the probability of adjacent rank swaps for RM, and the probability that the bits stored in a cell have at least one error. Normally only nearest-neighbor distributions have a significant chance of crossing, so the RCER is approximately

[Equation (3) is missing in the source.]

Note that this method is similar to the formalism used for evaluating the bit error rate of the dynamic threshold schemes [8, 9] proposed for conventional flash memory. Of special interest is that Eq. (3) is *proportional to the level's occupation* as well as its neighbor's, in contrast to conventional flash memory, where the occupation of a neighboring level has no bearing on the cell failure rate. So for RM there is a tradeoff between capacity, which favors larger N/q, and reliability, which favors smaller N/q.

For illustration, we assume a two-level system (N=q=2) with the two $V_{th}$ distributions centered at 0 and $\mu$, where all $p_i$'s are exponential distributions with standard deviation $\sigma$ [expression missing in the source]. For conventional flash, the read threshold is set at $\mu/2$. A simple calculation shows that the RBER ratio between RM and conventional flash is given by [equation missing in the source].
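The calculation can be sketched under one explicit assumption, which may differ from the authors' own expression: "exponential distribution" is read here as a two-sided exponential (Laplace) density with standard deviation $\sigma$. With $R_0$ denoting the error probability of the $\mu/2$ threshold read and $R_1$ the probability that the two cells swap order, direct integration gives:

```latex
% Assumption: p_0 and p_1 are Laplace densities with standard deviation \sigma,
% centered at 0 and \mu. v_0 and v_1 are independent draws from p_0 and p_1.
\begin{align*}
p_0(v) &= \frac{1}{\sqrt{2}\,\sigma}\exp\!\left(-\frac{\sqrt{2}\,|v|}{\sigma}\right),
\qquad p_1(v) = p_0(v-\mu), \\
R_0 &= \int_{\mu/2}^{\infty} p_0(v)\,dv
     = \frac{1}{2}\exp\!\left(-\frac{\mu}{\sqrt{2}\,\sigma}\right), \\
R_1 &= \Pr[v_0 > v_1]
     = \frac{1}{4}\left(2+\frac{\sqrt{2}\,\mu}{\sigma}\right)
       \exp\!\left(-\frac{\sqrt{2}\,\mu}{\sigma}\right), \\
\frac{R_1}{R_0} &= \left(1+\frac{\mu}{\sqrt{2}\,\sigma}\right)
       \exp\!\left(-\frac{\mu}{\sqrt{2}\,\sigma}\right).
\end{align*}
```

Under this assumption the RM failure probability is smaller than the conventional one by a factor that shrinks roughly exponentially in $\mu/\sigma$; for $\mu/\sigma = 6$, $R_1/R_0$ is about 0.075.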

The width of the $V_{th}$ distribution, as measured by $\sigma$, is typically a characteristic of a given physical NAND flash array. It depends on process variation, random telegraph noise (RTN), inter-cell interference, the number of program/erase cycles, and retention time differences, and cannot be easily reduced. The spacing between levels, $\mu$, is a design parameter that determines the bit capacity in MLC. Normally, the entire $V_{th}$ window W is fixed by operating and reliability constraints, and $\mu \sim W/q$. Under normal circumstances, keeping $R_0$ reasonably small in Eq. (1) requires $\mu/\sigma \gg 1$. Therefore, the RM scheme can offer a much lower failure rate.

For the more realistic Gaussian distribution, Eq. (2) does not have a closed form and can only be evaluated numerically. Since Gaussian distributions fall off even more rapidly than the exponential example above, we expect the reliability advantage of RM to be even larger. Figure 3 shows the ratio of RBERs for conventional flash vs RM for the exponential and the Gaussian distributions.
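A comparison of this kind can be reproduced numerically for the two-level case. The sketch below is not the authors' calculation: it assumes two levels of equal width $\sigma$ separated by $\mu$, a conventional read threshold at $\mu/2$, and, for the exponential case, a two-sided exponential (Laplace) density. For the Gaussian case the error probabilities are expressed through the complementary error function; for the Laplace case they follow from integrating the assumed density.

```python
from math import erfc, exp, sqrt


def q_function(x):
    """Gaussian tail probability Pr[Z > x] for a standard normal Z."""
    return 0.5 * erfc(x / sqrt(2))


def gaussian_rates(mu_over_sigma):
    """(R_0, R_1) for two Gaussian levels with equal sigma and spacing mu."""
    r0 = q_function(mu_over_sigma / 2)          # one cell crosses the mu/2 threshold
    r1 = q_function(mu_over_sigma / sqrt(2))    # two cells swap; difference has std sqrt(2)*sigma
    return r0, r1


def laplace_rates(mu_over_sigma):
    """(R_0, R_1) for two Laplace levels with equal sigma and spacing mu."""
    t = sqrt(2) * mu_over_sigma                 # mu / b, with scale b = sigma / sqrt(2)
    r0 = 0.5 * exp(-t / 2)
    r1 = 0.25 * (2 + t) * exp(-t)
    return r0, r1


for x in (2, 4, 6, 8):
    g0, g1 = gaussian_rates(x)
    l0, l1 = laplace_rates(x)
    print(f"mu/sigma={x}: R0/R1 Gaussian={g0 / g1:.3g}, Laplace={l0 / l1:.3g}")
```

Under these assumptions the conventional-to-RM ratio $R_0/R_1$ exceeds 1 for every $\mu/\sigma$ value in the loop and is larger for the Gaussian case than for the Laplace case.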

We can see that for reasonable $\mu/\sigma$ values, the RM scheme has a much lower RBER than the conventional sensing scheme. More importantly, the relative ranking method provides intrinsic protection against retention failures, where both distributions tend to shift in the same direction.

Alternatively, this increased reliability may be exploited to increase the capacity by reducing $\mu$, or increasing q, at the same RBER.



Figure 3. Bit error rate ratio for relative sensing vs conventional sensing for exponential and Gaussian  $V_{th}$  distributions.

The above discussion assumes near-ideal, intrinsic distributions. As the distributions broaden, for example through tail bits caused by program/erase cycling, the reliability advantage of RM is expected to decrease. This can be addressed by error correction coding (ECC). A number of ECC schemes for RM have been proposed [2-4].

#### III. VALIDATION USING EXISTING FLASH MEMORY CHIPS

We characterized the failure rate of the RM scheme using commercial NAND flash chips and benchmarked its RBER against the conventional MLC scheme. Our experiments used a commercial 20 nm NAND flash tester and three kinds of MLC flash on the market from two different vendors. The details of the method are presented elsewhere [10].

In our experiments we used the Read Retry (RR) feature to rank the cells and recover the RM-coded words. The multiple reference voltages provided by RR divide the whole threshold voltage window W into many bins. Each of the pages sharing the cells is read multiple times with different reference voltages. The read results are combined to determine the bin of each cell, and we rank the cells by comparing their bin indices.
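The binning procedure can be sketched as follows. This is a simplified model, not the tester software: `read_with_reference` stands in for one page read at a given read-retry reference voltage, and the reference voltages and cell $V_{th}$ values are hypothetical. A real chip exposes only the read results, not the $V_{th}$ values themselves.

```python
def read_with_reference(vth, reference):
    """One page read: True where the cell V_th is above the reference voltage."""
    return [v > reference for v in vth]


def bin_indices(vth, references):
    """Combine reads at several reference voltages into one bin index per cell."""
    bins = [0] * len(vth)
    for reference in references:
        for cell, above in enumerate(read_with_reference(vth, reference)):
            bins[cell] += int(above)    # bin = number of references below the cell
    return bins


def rank_by_bins(bins):
    """Group cell indices by bin, lowest bin first; cells sharing a bin are tied."""
    groups = {}
    for cell, b in enumerate(bins):
        groups.setdefault(b, []).append(cell)
    return [groups[b] for b in sorted(groups)]


references = [0.5, 1.5, 2.5]              # hypothetical read-retry voltages (V)
vth = [0.2, 2.9, 1.1, 1.8, 0.4, 2.7]      # hypothetical cell V_th values (V)
bins = bin_indices(vth, references)
ranks = rank_by_bins(bins)
print(bins, ranks)
```

Cells that fall into the same bin cannot be ordered relative to each other with this method, so its resolution is limited by the number of available reference voltages.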

All the chips provide 8 RR options. Due to space limitations, we present the results for only one of the MLC chips. Figure 4 shows the average RCERs of RM and conventional MLC as a function of endurance (P/E) cycles. Here we used BRM with 4 ranks and 64 cells per rank. All the results suggest that RM provides higher reliability when (1) errors are strongly asymmetric, and (2) cells have undergone a small number of P/E cycles. On the other hand, RM is at least as reliable as MLC at larger P/E cycle counts.

The improvement seen in Figure 4 is smaller than predicted by Figure 3. One reason is a limitation of the experimental method: for the RM tests we used four read retry voltages to group the cells into bins and ranked the cells according to their bins. This is less accurate than the true relative ranking that the RM scheme calls for (but which is not available in present commercial flash chips). The MLC results were collected using the eight available read retry voltages.



Figure 4. Cell error rates in the RM scheme and the conventional MLC.

#### IV. RM ARRAY ARCHITECTURE AND READ CIRCUITS

One of the main objections to the RM scheme is the difficulty of ranking a large number of cells compared with the conventional flash read scheme. We show in this section that a relative sensing time scheme can be implemented using the existing NAND array structure with a similar read time. The only additional components are the encoder and decoder circuits that convert between binary data and the RM representation as data shift in and out of the buffers. Note that these functions can alternatively be performed in software in the memory controller or processor. The latter approach saves area on the NAND die at the cost of a slower data transfer rate.

The sensing circuit is designed to rank one set in one read cycle [10]. The read scheme relies on the fact that only the relative order matters in RM: there is no need to determine global voltage or charge levels, or to measure timing accurately.

The read circuit for a set of RM cells is shown in Figure 5, together with the timing diagram. The read process pre-charges all bit lines in a set and then controls the discharging of these bit lines by ramping the control gate voltage of the selected row. By recording the relative order in which the sense amplifiers are triggered, we obtain the ranking of these cells. Note that the method is similar to [11], where a different implementation using a ramp current and current comparators was presented.

In Figure 5, we use a set of four cells in $Col_{0-3}$ to illustrate the timing sequence. After the bit lines are pre-charged, the selected row starts to ramp and all the bit lines start to discharge. The rates of discharge depend on the ramp rate of $V_{read}$ and the relative order of $V_{th}$ in the cells. In the example, the cell $V_{th}$ values are assumed to be in the order, from low to high, $Col_0$, $Col_2$, $Col_1$, and $Col_3$. Thus the bit line of $Col_0$ discharges the fastest and is the first to trigger its sense amplifier, once the bit line voltage reaches a reference voltage $V_{ref}$. The triggering of a sense amplifier causes the state of the bit lines to be latched, buffered, and streamed out through the shift registers. This sequence, illustrated by the binary codes in Figure 5, contains the ranking of the cells at this moment. Meanwhile, the remaining bit lines continue to discharge until the next bit line reaches $V_{ref}$, triggering another round of latching and buffering, and so on until all cells have been ranked.
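The timing sequence can be mimicked with a simple behavioral model. The sketch below makes strong simplifying assumptions: a bit line is taken to trigger its sense amplifier as soon as the ramped $V_{read}$ reaches the cell $V_{th}$ (discharge dynamics and $V_{ref}$ are not modeled), and the latched state is encoded as one bit per column with 1 meaning "already triggered". The encoding, function names, and voltages are hypothetical and need not match the binary codes of Figure 5.

```python
def ramp_read(vth, v_start=0.0, v_stop=4.0, step=0.01):
    """Simulate one ramped read; return the latched bit-line states in trigger order."""
    n = len(vth)
    triggered = [0] * n
    snapshots = []
    steps = int(round((v_stop - v_start) / step))
    for k in range(steps + 1):
        v_read = v_start + k * step
        newly = [i for i in range(n) if not triggered[i] and v_read >= vth[i]]
        if newly:
            for i in newly:
                triggered[i] = 1
            # Latch the state of all bit lines whenever a sense amplifier fires.
            snapshots.append("".join(str(b) for b in triggered))
        if all(triggered):
            break
    return snapshots


def ranking_from_snapshots(snapshots):
    """Recover the rank order: each snapshot adds the columns that just triggered."""
    if not snapshots:
        return []
    order, previous = [], "0" * len(snapshots[0])
    for snap in snapshots:
        order.append([i for i, (a, b) in enumerate(zip(previous, snap)) if a != b])
        previous = snap
    return order


vth = [0.5, 2.1, 1.3, 3.0]    # hypothetical V_th: Col0 < Col2 < Col1 < Col3
snapshots = ramp_read(vth)
ranking = ranking_from_snapshots(snapshots)
print(snapshots, ranking)
```

Two cells whose $V_{th}$ values fall within the same ramp step trigger together and appear as a tie, which is why a slower or finer ramp helps resolve closely spaced cells in this model.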

Provided that the digital circuits are fast enough to latch and stream out the data as fast as the read proceeds, the ranking takes place within one read cycle of conventional NAND flash memory. Since digital circuitry in submicron technology can easily achieve sub-nanosecond resolution, the speed of latching, buffering, or shifting is typically not an issue. It is possible to add a feedback mechanism that adaptively slows down the ramp rate to resolve closely spaced cells and/or to avoid overflow of the data buffer. The rank data can also be compressed or encoded to speed up the streaming process. Note that the ramp rate or shape of $V_{read}$ controls the streaming speed and can be used to enhance the rank reading reliability.

#### V. PROGRAMMING OF RANK MODULATION

The standard Incremental Step Pulse Programming (ISPP) method may be adapted to program the cells in RM, using a modified program-and-verify process. Since only the relative order is important in RM, the verification step involves comparing cells to each other rather than to absolute reference levels, as is the case in the conventional MLC NAND flash memory.

A modification of the ISPP process flow is shown in Figure 6. We describe the process using the example of a set of 4 cells to be programmed in the order shown in Figure 5. Assuming we start from erased cells, there is no need to program the cell that is already at the lowest rank, $Col_0$. The bitwise AND step on the addresses of the cells produces the data for the column decoder, [0111] in this case, meaning that during the programming pulse only the cells of the selected row in columns 1, 2, and 3 are programmed. After the first program pulse, the verification step checks whether the $V_{th}$ of the next-rank column ($Col_2$ in this example) is higher than that of $Col_0$. If not, another tunneling pulse is applied and verified again. If yes, the next round of programming starts with the remaining cells, where the AND step produces the state [0101], which means that only $Col_1$ and $Col_3$ are programmed during this step. This process repeats until all cells are programmed to the correct order.
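The relative program-and-verify loop can be sketched as a behavioral simulation. The model is an assumption-laden simplification: each program pulse raises the $V_{th}$ of every masked column by a fixed, column-specific step, the verify step is a direct comparison of two $V_{th}$ values, and the loop verifies before each pulse rather than after it. The function name, starting voltages, and step sizes are hypothetical.

```python
def program_rank_order(vth, target_order, step, margin=0.0, max_pulses=1000):
    """Relative program-and-verify; returns the column mask used for each pulse.

    vth          : list of cell V_th values, modified in place
    target_order : column indices from lowest to highest rank
    step         : per-column V_th increase for one program pulse
    """
    n = len(vth)
    masks = []
    for rank in range(1, n):
        prev_col, next_col = target_order[rank - 1], target_order[rank]
        remaining = set(target_order[rank:])    # columns that still need to be raised
        mask = "".join("1" if c in remaining else "0" for c in range(n))
        # Verify against the neighbouring lower-rank cell, not an absolute level.
        while vth[next_col] <= vth[prev_col] + margin:
            if len(masks) >= max_pulses:
                raise RuntimeError("pulse budget exceeded")
            for c in remaining:
                vth[c] += step[c]
            masks.append(mask)
    return masks


vth = [0.0, 0.05, -0.02, 0.03]      # hypothetical erased-state V_th values (V)
step = [0.30, 0.25, 0.35, 0.22]     # hypothetical V_th gain per pulse (V)
masks = program_rank_order(vth, [0, 2, 1, 3], step)
final_order = sorted(range(len(vth)), key=lambda i: vth[i])
print(masks, final_order)
```

With these values the masks follow the sequence described in the text: [0111] for the first round, [0101] for the second, and a final round in which only $Col_3$ is pulsed.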



Figure 5. Sensing circuit and associated timing diagram.

Note that in the RM process the total tunneling time should be about the same as in programming conventional flash memory, because the highest $V_{th}$ shifts will be comparable. The main difference is the time spent in the verification steps. In conventional NAND flash, only a few levels (e.g., 3 for MLC) need to be verified, but in RM memory this number increases to N-1. This would normally not be a significant problem: even for N = 20, about 20 verification steps take just $20 \times 25\,\mu s = 0.5$ ms, which is still only a fraction of the programming time of conventional MLC NAND flash.

#### VI. DISCUSSION AND CONCLUSIONS

The relative program-and-verify method can be implemented with very simple, mostly digital, modifications to current flash memory chips. State-of-the-art NAND flash chips pre-charge the bit lines and monitor the discharge, or step the selected word line through each desired read voltage, with dedicated capacitors on all bit lines [12]. The readout data are then latched into a page buffer, typically built from fast static RAM cells. These functions can be simplified to enable the read operation of RM data through the following steps: (1) changing the staircase voltage ramp to a finer step or a continuous ramp; (2) removing the precision voltage references and the corresponding digital-to-analog converters; (3) latching and streaming data whenever any bit line sense amplifier triggers.

The RM readout method senses only relative timing, so it is much simpler: it needs no precision voltage references for binning, DACs and ADCs, temperature compensation circuits, or reference cells, and it should also be more energy efficient. The method is less susceptible to noise, e.g., source line noise, and does not need source line noise cancellation techniques [13, 14]. In short, the methods presented in this paper will make it easier to test and adopt the RM scheme in commercial NAND products.

In conclusion, we reviewed the rank modulation scheme and its advantages for flash memory. Hardware implementation methods for both reading and programming flash memory arrays were discussed and shown to be readily adaptable to the existing flash architecture. We presented a scheme for estimating the failure rate in RM, together with experimental data from commercial flash chips, which suggest that RM has an intrinsic reliability advantage over conventional flash memory and should be explored further.



Figure 6. Algorithm for programming RM memory.

#### REFERENCES

- [1] A. Jiang, R. Mateescu, M. Schwartz, and J. Bruck, "Rank modulation for flash memories," *IEEE Transactions on Information Theory*, Vol. 55, No. 6, pp. 2659–2673, June 2009.
- [2] H. Zhou, M. Schwartz, A. Jiang, and J. Bruck, "Systematic Error-Correcting Codes for Rank Modulation," *IEEE Transactions on Information Theory*, Vol. 61, No. 1, pp. 17–32, Jan. 2015.
- [3] A. Barg and A. Mazumdar, "Codes in Permutations and Error Correction for Rank Modulation," *IEEE Transactions on Information Theory*, Vol. 56, No. 7, pp. 3158–3165, July 2010.
- [4] A. Jiang, M. Schwartz, and J. Bruck, "Error-correcting codes for rank modulation," *IEEE International Symposium on Information Theory*, pp. 1736–1740, July 2008.
- [5] M. Kim et al., "Rank determination algorithm by current comparing for rank modulation flash memories," in *Midwest Symposium on Circuits and Systems*, August 2013, pp. 1354–1357.
- [6] Y. Cai, E. F. Haratsch, O. Mutlu, and K. Mai, "Threshold Voltage Distribution in MLC NAND Flash Memory: Characterization, Analysis and Modeling," DATE, 2013.
- [7] Y. Kim et al., "Verify level control criteria for multi-level cell flash memories and their applications," *EURASIP Journal on Advances in Signal Processing*, 2012, 2012:196.
- [8] H. Zhou, A. Jiang, and J. Bruck, "Error-correcting schemes with dynamic thresholds in non-volatile memories," *IEEE International Symposium on Information Theory*, 2011.
- [9] F. Sala, R. Gabrys, and L. Dolecek, "Dynamic threshold schemes for multi-level non-volatile memories," *IEEE Transactions on Communications*, Vol. 61, pp. 2624–2634, 2013.
- [10] Y. Li, Y. Ma, E. En Gad, M. Kim, A. Jiang, and J. Bruck, "Implementing Rank Modulation," Nonvolatile Memory Workshop, March 2015.
- [11] Y. Ma and E. C. Kan, US Patent pending, 2014.
- [12] M. Kim, M. Shaterian, and C. Twigg, "Rank determination algorithm by current comparing for rank modulation flash memories," in IEEE Int. Midwest Symp. on Circuits and Systems, Aug. 2013, pp. 1354–1357.
- [13] G. Naso et al., Micron 20 nm NAND, ISSCC 2013.
- [14] Sarin et al., "Sensing memory cells," US Patent 7,948,802.