I. INTRODUCTION

Emerging portable mass-storage applications, such as digital still cameras, digital audio players, personal digital assistants, and electronic books, have accelerated the development of high-density flash memories, and these applications now require high-performance flash memories as well. From this viewpoint, NAND flash memories have great potential because they have very small unit cells and operate in multibyte units, which leads to low bit cost and high data throughput. The multilevel program cell (MLC) technique drastically reduces the bit cost, doubling or even tripling the memory density for the same chip size compared with NAND flash memory using the single-level program cell (SLC) technique. However, the tight threshold voltage ($V_{th}$) control that MLC requires decreases the program performance. Therefore, MLC NAND flash memories are suited to low-bit-cost, high-density applications such as digital audio players, which require CD-quality music recording but can tolerate relatively low program throughput. SLC NAND flash memories are suited to high-performance applications such as digital still cameras, which require high program throughput.
Though MLC NAND flash memories have very low bit cost, they have lost market share to SLC NAND flash memories even in the low-performance market. This is because the time-to-market gap between MLC and SLC NAND flash memories of the same density is small. The density of SLC NAND flash memory currently doubles every year, while the chip size remains the same from generation to generation. Therefore, MLC NAND flash memories based on SLC NAND technology should be developed at the same time to take advantage of the higher density at the same chip size. In this paper, we report a dual-mode NAND flash memory having 1-Gb MLC and 512-Mb SLC modes, with a chip size only slightly larger than that of a 512-Mb SLC-only NAND flash memory. The mode selection is not available for general use; it is provided for the simultaneous development of MLC and SLC NAND flash memories.
It is well known that the incremental step pulse programming (ISPP) scheme can effectively control a cell's $V_{th}$, because the $V_{th}$ shift of a cell follows the ISPP step pulse [1]. Therefore, the cell $V_{th}$ distribution can be tightly controlled by decreasing the ISPP step pulse at the expense of program performance. Also, the local self-boosting (LSB) scheme, which keeps the wordlines adjacent to the selected wordline at 0 V, reduces the widening of the cell $V_{th}$ distribution due to program disturbance more effectively than the self-boosting scheme [2]. However, the LSB scheme restricts the programming order to ascending order. The key to the dual-mode NAND flash memory is not to sacrifice one mode for the other. The SLC mode uses a 0.5-V ISPP step pulse to achieve high program throughput and the self-boosting scheme to allow random-order page programming, while the MLC mode uses a 0.15-V ISPP step pulse to tightly control the distribution and the LSB scheme to reduce the program disturbance.
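The step-size tradeoff between the two modes can be illustrated with a deliberately idealized program-verify loop. This sketch assumes that, after each pulse, a cell's $V_{th}$ becomes $V_{pgm}$ minus a fixed per-cell offset (so the $V_{th}$ shift tracks the ISPP step), and that a verify read inhibits every cell that has reached the verify level. The offsets, start voltage, and verify level are hypothetical numbers, not device data.

```python
import random


def ispp_program(offsets, v_start, step, v_verify, max_pulses=500):
    """Idealized ISPP with program-verify.

    Assumed model (not device data): after a pulse at v_pgm, an uninhibited
    cell's Vth becomes max(Vth, v_pgm - offset), so dVth == dVpgm.
    Returns the final Vth list and the number of pulses applied.
    """
    vth = [-2.0] * len(offsets)          # erased cells, assumed well below verify
    inhibited = [False] * len(offsets)
    v_pgm = v_start
    pulses = 0
    while not all(inhibited) and pulses < max_pulses:
        pulses += 1
        for i, off in enumerate(offsets):
            if not inhibited[i]:
                vth[i] = max(vth[i], v_pgm - off)
        for i, v in enumerate(vth):      # verify read after every pulse
            if v >= v_verify:
                inhibited[i] = True
        v_pgm += step                    # next ISPP step
    return vth, pulses


rng = random.Random(0)
# Hypothetical spread of cell program speeds (offset between Vpgm and Vth), in volts.
offsets = [rng.uniform(14.5, 16.0) for _ in range(1000)]

for step in (0.5, 0.15):
    vth, pulses = ispp_program(offsets, v_start=14.0, step=step, v_verify=0.7)
    print(f"step={step:.2f} V  pulses={pulses}  width={max(vth) - min(vth):.3f} V")
```

Under these assumptions the final width stays below one step, so the 0.15-V step gives a much narrower distribution but needs several times as many pulses as the 0.5-V step.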
The memory is fabricated in a 0.15-μm CMOS process with a die size of 116.7 mm². The effective cell size is 0.14 μm². The chip architecture is chosen to minimize the chip size, the skew of the page-buffer control signals, and the noise distribution; the detailed device organization is given in Section II. Because the cell transistors are so small, a bitline precharge-and-sense scheme [3], which senses the bitline after a specified development time, is adopted. To improve program throughput, a ×4 program mode (×4 PGM) is adopted by dividing the cell array into four banks. However, ×4 PGM operation generates a large peak current that must be reduced, and a two-step bitline setup scheme reduces it. In addition, a wordline-ramping technique suppresses the wordline coupling that causes serious program disturbance in a memory with a small wordline pitch. The bitline precharge-and-sense scheme, the two-step bitline setup scheme, and the wordline-ramping technique are presented in Section III. The widening of the MLC $V_{th}$ distribution due to floating-gate (FG) coupling has not yet been reported, but it would limit the scaling of MLC NAND flash memory cell pitches. Optimizing the MLC $V_{th}$ distribution with FG coupling taken into account is the key to an MLC design with high program speed; this is described in Section IV.
II. CHIP ARCHITECTURE
For the SLC mode, each 1-b sense latch (SL) unit is shared by two adjacent bitlines, which divides a row of 8 k cells into two pages. One half of the 8 k bitlines is shielded during program and read operations, and the erase unit is a block of 16 kB.
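The SLC-mode organization implies a simple bitline-to-page mapping and a block size that can be checked arithmetically. This sketch assumes 8192 bitlines per row (the "8 k" above, ignoring any spare columns), even/odd interleaving of the two pages, and 16 cells per NAND string; the helper name `slc_page_and_latch` is hypothetical.

```python
BITLINES_PER_ROW = 8 * 1024   # "8 k" cells per row (spare columns ignored, assumed)
WORDLINES_PER_BLOCK = 16      # cells per NAND string (assumed)
PAGES_PER_ROW_SLC = 2         # one sense latch (SL) shared by two adjacent bitlines


def slc_page_and_latch(bitline: int) -> tuple[int, int]:
    """Map a bitline to (page within the row, sense-latch index).

    Assumes even bitlines form page 0 and odd bitlines page 1; while one
    page is accessed, the other half of the bitlines is shielded.
    """
    if not 0 <= bitline < BITLINES_PER_ROW:
        raise ValueError("bitline out of range")
    return bitline % 2, bitline // 2


page_bits = BITLINES_PER_ROW // PAGES_PER_ROW_SLC      # 4096 bits per page
page_bytes = page_bits // 8                            # 512 bytes per page
block_bytes = page_bytes * PAGES_PER_ROW_SLC * WORDLINES_PER_BLOCK
print(page_bytes, block_bytes)   # 512 16384
```

Under these assumptions a page holds 4096 bits, matching the 4 k bitlines that operate simultaneously in the SLC mode, and 32 such pages make up the 16-kB erase block.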
III. ISSUES
A load-current sensing method is no longer practical because the small cell size reduces the worst on-cell current to only 0.5 μA, which is insufficient to sense the bitline within the given time. Instead, the bitline precharge-and-sense scheme is adopted [3]. During a read operation, the unselected bitline(s) between adjacent selected bitlines are shielded at ground level. The selected bitlines are rapidly precharged to 1.0 V by fully turning on the PRE transistors and setting the corresponding SBL level to 1.6 V, while the unselected SBL level is 0 V. Then the selected SBL transistors are turned off, and the bitline levels split according to whether the selected cell is on or off. Finally, the small split of the selected bitline level is amplified by setting the selected SBL level to 1.0 V and is reflected on the sense-out (SO) node as $V_{CC}$ or 0 V: the SBL transistors coupled to on-cell transistors are turned on, and those coupled to off-cell transistors remain off. Consequently, the SLs can sense and latch the on- and off-cell states with sufficient sense margin.
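The precharge-and-sense sequence reduces to a first-order calculation. The numbers above imply an SBL threshold voltage of about 0.6 V (a 1.6-V gate precharges the bitline to 1.0 V), so with the SBL gate at 1.0 V the transistor conducts only if the bitline has fallen below about 0.4 V. This sketch assumes an ideal source-follower clamp, a constant cell current, and a hypothetical bitline capacitance of 2 pF; the function names are illustrative.

```python
def clamp_level(v_gate: float, vt: float) -> float:
    """Level an nMOS clamp (SBL) passes to the bitline: V_gate - Vt (ideal source follower)."""
    return v_gate - vt


def develop_time(i_cell: float, c_bl: float, dv: float) -> float:
    """Time for a constant cell current to discharge the bitline by dv (t = C*dV/I)."""
    return c_bl * dv / i_cell


def sense(v_bl: float, v_gate_sense: float, vt: float) -> str:
    """SBL turns on (SO node pulled low) only if V_bl < V_gate - Vt."""
    return "on-cell" if v_bl < v_gate_sense - vt else "off-cell"


VT_SBL = 1.6 - 1.0      # inferred from 1.6-V gate -> 1.0-V precharge
C_BL = 2e-12            # hypothetical bitline capacitance (F)
I_ON_WORST = 0.5e-6     # worst on-cell current from the text (A)

v_pre = clamp_level(1.6, VT_SBL)     # 1.0-V precharge level
trip = clamp_level(1.0, VT_SBL)      # ~0.4-V sensing trip point
t_dev = develop_time(I_ON_WORST, C_BL, v_pre - trip)
print(f"precharge {v_pre:.2f} V, trip {trip:.2f} V, develop {t_dev * 1e6:.1f} us")
```

With these assumed values the worst on-cell needs about 2.4 μs to pull its bitline below the trip point; a larger bitline capacitance lengthens this proportionally.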
Bitlines should be set to $V_{CC}$ or 0 V before the program voltage is applied to the selected wordline. The worst case occurs in ×4 PGM for the SLC mode when all selected bitlines are to be programmed. In that case, the inhibited bitlines adjacent to the selected bitlines should be charged to $V_{CC}$, whereas the selected bitlines remain at 0 V. A long bitline setup degrades program performance because it is performed at every ISPP step; a short bitline setup, on the other hand, generates a large peak current. We therefore adopt the two-step bitline setup scheme. For the first 2 μs, all bitlines are charged up to $V_{CC}$ with a constant current supplied by the pMOS transistors coupled to the IHB transistors. Then, for the next 2 μs, the selected bitlines are selectively discharged to ground level according to the latched data, and the peak discharge current is suppressed by stepping the SEL level. Finally, another 1 μs is needed to stabilize the bitline setup. The bitline setup sequences are schematically shown in Fig. 3.

During a program operation that uses the LSB scheme, the SSL level of the selected block is set to $V_{CC}$ and the program-inhibit bitlines are set to $V_{CC}$, so that the program-inhibit strings are precharged to $V_{CC} - V_{th,SSL}$ and the corresponding SSL transistors are shut off. The channel underneath the selected wordline is localized once a program pass voltage $V_{pass}$ is applied to all wordlines except the selected wordline and the wordlines adjacent to it. After $V_{pass}$ has stabilized, a program voltage $V_{pgm}$ applied to the selected wordline boosts only the localized channel. Therefore, it is important to decouple the localized channel from the others to increase the efficiency of LSB. The worst case occurs when the wordline adjacent to the SSL is selected. When $V_{pgm}$ is applied to that wordline, the SSL is capacitively coupled to it, and the SSL level is boosted by around 1.4 V unless an effort is made to reduce the $V_{pgm}$ slope.
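Why stretching the bitline charge-up lowers the peak current follows from $I = C\,\Delta V/t$. This sketch compares the constant current needed to charge all bitlines within the 2-μs first step against a hypothetical 0.2-μs abrupt charge-up, and totals the 5-μs setup that the scheme adds to every ISPP step. The bitline count, bitline capacitance, supply level, and number of ISPP steps are hypothetical values, not device data.

```python
T_CHARGE = 2e-6      # step 1: constant-current charge of all bitlines (from text)
T_DISCHARGE = 2e-6   # step 2: selective discharge of selected bitlines (from text)
T_STABLE = 1e-6      # final stabilization (from text)

N_BL = 16 * 1024     # hypothetical number of bitlines charged at once
C_BL = 1e-12         # hypothetical bitline capacitance (F)
V_CC = 3.3           # hypothetical supply level (V)


def constant_charge_current(n_bl: int, c_bl: float, v: float, t: float) -> float:
    """Current that charges n_bl bitlines of c_bl farads to v volts in t seconds (I = N*C*V/t)."""
    return n_bl * c_bl * v / t


i_two_step = constant_charge_current(N_BL, C_BL, V_CC, T_CHARGE)
i_abrupt = constant_charge_current(N_BL, C_BL, V_CC, 0.2e-6)  # hypothetical fast setup
setup_per_step = T_CHARGE + T_DISCHARGE + T_STABLE

n_ispp_steps = 20    # hypothetical number of ISPP steps in one program operation
print(f"two-step: {i_two_step * 1e3:.1f} mA, abrupt (average): {i_abrupt * 1e3:.1f} mA")
print(f"setup overhead: {setup_per_step * 1e6:.0f} us/step, "
      f"{n_ispp_steps * setup_per_step * 1e6:.0f} us total")
```

With a constant-current source the peak equals this average, whereas an abrupt setup draws at least ten times more on average in this example; the cost is a fixed setup overhead that accumulates over every ISPP step.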
Even though the string select transistor is in the subthreshold regime, the localized channel loses its charge almost instantly because the channel capacitance is very small, about 0.5 fF. Therefore, the program-inhibited cells are disturbed by $V_{pgm}$. The problem becomes more serious at low $V_{CC}$, because the lower bitline precharge level reduces the body effect of the SSL transistor and hence its effective threshold voltage. Fig. 5 shows the I–V characteristics of an SSL transistor for various $V_{BL}$ (effective body bias) levels. For $V_{BL}$ equal to 2.0 V, about 200 nA sweeps charge out of the localized channel when the SSL is boosted by about 1.4 V, which lowers the localized channel potential by 5 V in 30 ns. Therefore, without a countermeasure, cells cannot be inhibited by the LSB scheme.
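The sensitivity of the localized channel to SSL coupling can be estimated with an exponential subthreshold model. Starting from the roughly 200 nA quoted for 1.4 V of coupling, and assuming a subthreshold swing of 200 mV/decade (an assumed value, not from the text), lowering the coupling to 0.4 V cuts the leakage by five decades. The 0.5-fF channel capacitance then gives the time for that leakage to remove 1 V from the boosted channel.

```python
I_AT_1P4 = 200e-9    # leakage at 1.4-V SSL coupling (from text, A)
SWING = 0.2          # assumed subthreshold swing (V/decade)
C_CH = 0.5e-15       # localized channel capacitance (from text, F)


def subthreshold_current(coupling: float, ref_coupling: float = 1.4,
                         i_ref: float = I_AT_1P4, swing: float = SWING) -> float:
    """Exponential subthreshold model: current changes one decade per `swing` volts."""
    return i_ref * 10 ** ((coupling - ref_coupling) / swing)


def time_to_lose(dv: float, i_leak: float, c: float = C_CH) -> float:
    """Time for a constant leakage to drop the channel potential by dv (t = C*dV/I)."""
    return c * dv / i_leak


for coupling in (1.4, 0.4):
    i = subthreshold_current(coupling)
    t = time_to_lose(1.0, i)
    print(f"coupling {coupling:.1f} V: leakage {i:.2e} A, 1-V loss in {t:.2e} s")
```

With this assumed swing, the 0.4-V case leaks about 2 pA, the same order as the few picoamperes cited for the ramped wordline. Treating the leakage as constant ignores its decline as the channel discharges, so this is only an order-of-magnitude check.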
A voltage-ramping technique is used to suppress the wordline coupling. However, there is a tradeoff between program efficiency and the efficiency of wordline-coupling suppression. Fig. 6(a) shows the peak wordline-to-SSL coupling level versus the rising time of $V_{pgm}$, assuming an ideal ramp output. As shown in Fig. 6(a), the suppression efficiency increases rapidly with rising time up to 5 μs. We adopt a staircase waveform generator instead of an ideal ramping circuit because the layout overhead of the ideal ramping circuit is large. Fig. 6(b) shows the wordline-to-SSL coupling level versus the number of steps for a given rising time of 5 μs; the ramping efficiency saturates after eight steps. Based on these simulation results, we designed the wordline-ramping circuit to have eight steps in 5 μs. The ramping circuit consists of a timer, a decoder, an 8-b digital-to-analog (D/A) converter, a comparator, and a high-voltage switch pump (HVSP), as shown in Fig. 7. The D/A converter divides $V_{pgm}$ by a ratio set for the current step, and the HVSP raises $V_{pgm}$ until the D/A converter output equals the reference voltage $V_{ref}$. At the next step, the new input code lowers the D/A converter output below $V_{ref}$, and the HVSP raises $V_{pgm}$ again; this repeats until $V_{pgm}$ reaches its target value. As a result, the wordline-to-SSL coupling is reduced to 0.4 V, which keeps the subthreshold current of the SSL transistor below a few picoamperes and drastically reduces the program disturbance caused by wordline coupling.

Ideally, the cell $V_{th}$ distribution width would equal the ISPP step if the cell program speeds were the same for all cells. Unfortunately, process variation widens the cell $V_{th}$ distribution even without any noise, as schematically shown in Fig. 9. The cell $V_{th}$ shift follows the ISPP step only after the floating-gate charging of the cell transistor has saturated.
Therefore, the start program voltage should be lower than the critical voltage at which the fastest cell starts to program, so that the fastest cell saturates before its $V_{th}$ reaches the verify level. A low start program voltage, however, degrades the program performance. In this process, an ISPP step of 0.15 V is used to narrow the ideal cell $V_{th}$ distribution for the MLC mode at the expense of program performance compared with the previous design [2], while an ISPP step of 0.5 V is used for the SLC mode to achieve high program performance. The cell $V_{th}$ shift due to the nonuniformity of cell program speeds and the system noise is about 0.1 V.
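The role of the start program voltage can be seen in an idealized ISPP model in which each pulse moves a cell straight to the steady-state line $V_{th} = V_{pgm} - \text{offset}$. If the first pulse already lies well above the fastest cell's critical voltage, that cell overshoots the verify level and the distribution becomes wider than the step. The offsets, start voltages, and verify level below are hypothetical.

```python
def final_vth(offsets, v_start, step, v_verify, v_erase=-2.0, max_pulses=500):
    """Idealized ISPP: each pulse sets Vth = max(Vth, Vpgm - offset) for cells not yet verified."""
    vth = [v_erase] * len(offsets)
    done = [False] * len(offsets)
    v_pgm = v_start
    for _ in range(max_pulses):
        if all(done):
            break
        for i, off in enumerate(offsets):
            if not done[i]:
                vth[i] = max(vth[i], v_pgm - off)
                done[i] = vth[i] >= v_verify     # verify immediately after the pulse
        v_pgm += step
    return vth


offsets = [14.5 + 0.01 * k for k in range(151)]  # hypothetical program-speed spread (V)
for v_start in (14.0, 16.0):
    vth = final_vth(offsets, v_start, step=0.15, v_verify=0.7)
    print(f"start {v_start:.1f} V -> width {max(vth) - min(vth):.2f} V")
```

In this model the 14.0-V start keeps the width below the 0.15-V step, whereas the 16.0-V start lets the fastest cells land far above the verify level, widening the distribution severalfold.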
IV. THRESHOLD VOLTAGE DISTRIBUTION
The FG coupling arises from the parasitic capacitance between two adjacent floating gates in a string and in a wordline, as shown in Fig. 10(a) and (b). If we assume that the cell transistor being read is fully turned on, the FG coupling ratios in a string and in a wordline can approximately be expressed as

$$\gamma_y \approx \frac{C_{FGy}}{C_{ONO} + C_{TUN} + 2C_{FGx} + 2C_{FGy}} \qquad (1a)$$

$$\gamma_x \approx \frac{C_{FGx}}{C_{ONO} + C_{TUN} + 2C_{FGx} + 2C_{FGy}} \qquad (1b)$$

where $C_{ONO}$ is the capacitance between a control gate and a floating gate, $C_{TUN}$ is the capacitance between a floating gate and the channel, and $C_{FGy}$ and $C_{FGx}$ are the capacitances between two adjacent floating gates in a string and in a wordline, respectively. The order of page programming is restricted to ascending order within a block, because random-order page programming significantly shifts the cell $V_{th}$ when the LSB scheme is used [2]. Therefore, in the worst case a cell is coupled to five neighbor cells: the right, left, upper, upper-right, and upper-left cells. If we ignore the effect of the upper-right and upper-left cells, the change of the cell $V_{th}$ read from the control gate after the neighbor cells are programmed can be written as

$$\Delta V_{th}(BL_i, WL_j) = \gamma_x\left[\Delta V_{th}(BL_{i-1}, WL_j) + \Delta V_{th}(BL_{i+1}, WL_j)\right] + \gamma_y\,\Delta V_{th}(BL_i, WL_{j+1}) \qquad (2)$$

where $BL_i$ and $WL_j$ denote the $i$th bitline and the $j$th wordline, respectively. In the worst case, each neighbor's $V_{th}$ shifts from the erase state ("11") to the highest program state ("00") by $\Delta V_{th,max}$, and the FG coupling for the MLC mode simplifies to

$$\Delta V_{th,FG} = (2\gamma_x + \gamma_y)\,\Delta V_{th,max}. \qquad (3)$$
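The worst-case estimate can be evaluated numerically. This sketch assumes the usual capacitive form in which each coupling ratio is a floating-gate-to-floating-gate capacitance divided by the total floating-gate capacitance, and sums the contributions of the left, right, and upper neighbors (diagonal neighbors ignored), as described above. The capacitances and the 4.0-V worst-case neighbor shift are hypothetical, chosen only so the result lands near the 0.2 V quoted for this process.

```python
def coupling_ratios(c_ono, c_tun, c_fgx, c_fgy):
    """Return (gamma_x, gamma_y): FG-FG capacitance over total FG capacitance (assumed form)."""
    c_total = c_ono + c_tun + 2 * c_fgx + 2 * c_fgy
    return c_fgx / c_total, c_fgy / c_total


def fg_shift(gamma_x, gamma_y, dv_left, dv_right, dv_up):
    """Victim Vth shift from left/right (same wordline) and upper (same string) neighbors."""
    return gamma_x * (dv_left + dv_right) + gamma_y * dv_up


# Hypothetical capacitances in attofarads; only their ratios matter.
gx, gy = coupling_ratios(c_ono=600, c_tun=300, c_fgx=15, c_fgy=20)
DV_MAX = 4.0   # hypothetical worst-case neighbor shift from "11" to "00" (V)
worst = fg_shift(gx, gy, DV_MAX, DV_MAX, DV_MAX)
print(f"gamma_x={gx:.4f} gamma_y={gy:.4f} worst shift={worst:.3f} V")
```

Because the worst case applies the same shift to all three neighbors, the result reduces to $(2\gamma_x + \gamma_y)\Delta V_{th,max}$, about 0.206 V with these made-up numbers.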
In this process, the worst-case FG coupling is about 0.2 V. The FG coupling can be reduced by lowering the height of the floating gates and the thickness of the field oxide facing the floating gates. Lowering the floating-gate height, however, degrades the program performance because it lowers the control-gate-to-floating-gate capacitance and hence the program coupling ratio. The thickness of the field oxide affects the isolation between two adjacent cells. Therefore, the FG coupling, program performance, and cell isolation must be carefully co-optimized. For the SLC mode, which uses the self-boosting scheme, the worst-case total coupling must be modified because random-order page programming is permitted, so that neighbors on both adjacent wordlines can be programmed after a given cell. The resulting worst-case FG coupling for the SLC mode is also about 0.2 V.

The differences in noise levels, such as pocket p-well (p-pwell) noise, common source line (CSL) noise, and power noise, between the program-verify operation and the read operation cause an underprogram of 0.1 V. In an early ISPP step, only a few cells (fast cells) are programmed. In that case, almost all bitlines are discharged from the precharged level to ground level, except the few bitlines coupled to the programmed cells. The CSL level rises because this discharge current flows through the CSL resistance, which effectively applies a positive body bias to the fast cells. Consequently, the CSL noise reduces the sensing current of the fast cells. The CSL noise would not be a problem if the same amount of CSL noise existed during the read operation, because the cells would then see the same body bias. However, there is no CSL noise after all cells in a page are programmed. Thus, the SLs sense a larger current during the read operation than during the program-verify operation for the same selected wordline level, which causes the underprogram problem. Therefore, the selected wordline level must be lower during the read operation than during the program-verify operation; this difference is the underprogram margin.
Decreasing the CSL resistance reduces the CSL noise at the expense of chip size. Undershoot of the p-pwell, caused by capacitive coupling between the bitlines and the p-pwell, adds to the CSL noise; p-pwell straps in the cell array reduce this p-pwell noise [4]. Together with the CSL and p-pwell noise, the larger power noise of the program-verify operation causes the underprogram. The underprogram is slightly larger for the SLC mode because 4 k bitlines operate simultaneously. However, the underprogram problem is less serious for the SLC mode than for the MLC mode because the SLC mode has a sufficient underprogram margin.
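The CSL contribution to underprogram can be gauged with Ohm's law: the CSL rises by the total bitline discharge current times the CSL resistance, and that rise acts as a source (body) bias on the cells being verified. This sketch takes the worst case in which every cell of a 4-k-bitline SLC page is to be programmed, uses the 0.5-μA worst on-cell current from Section III as a lower-bound per-bitline current, and assumes a hypothetical lumped CSL resistance of 20 Ω. It is an order-of-magnitude illustration, not a model of the device's distributed CSL.

```python
I_CELL = 0.5e-6      # per-bitline discharge current (A); worst on-cell value from the text
R_CSL = 20.0         # hypothetical lumped CSL resistance (ohms)
N_PAGE = 4 * 1024    # bitlines sensed simultaneously in SLC mode


def csl_rise(n_conducting: int, i_cell: float = I_CELL, r_csl: float = R_CSL) -> float:
    """IR rise of the common source line when n_conducting bitlines discharge through it."""
    return n_conducting * i_cell * r_csl


fast_cells = 10                                  # hypothetical: only a few cells pass early verify
v_verify_early = csl_rise(N_PAGE - fast_cells)   # most cells still conduct during early verify
v_read_after = csl_rise(0)                       # every cell programmed: no on-cells at read
print(f"CSL rise at early verify: {v_verify_early * 1e3:.1f} mV, "
      f"at read: {v_read_after * 1e3:.1f} mV")
```

The fast cells are verified under a source bias that is absent at read time, so they read with more current later; halving the CSL resistance would halve this mismatch in the model.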
Another important effect is the interplay between the maximum cell $V_{th}$ and the read pass voltage $V_{read}$ applied to the unselected wordlines during a read operation. The higher the $V_{read}$, the smaller the background pattern dependency (BPD) [2]. However, retention failure limits the maximum allowable $V_{read}$. Therefore, the gaps between adjacent program states must be optimized to lower the maximum cell $V_{th}$ of the highest state. In this device, the worst-case $V_{th}$ shift due to BPD is about 0.05 V for a $V_{read}$ of 6.0 V, with a maximum cell $V_{th}$ of around 3.0 V [2]. The $V_{th}$ distribution is shown in Fig. 11(a), and the measured distributions of the three program states are shown in Fig. 11(b). Only sequential programming is permitted for the MLC mode. Therefore, the cells under the lower fifteen wordlines are affected by FG coupling, whereas the cells under the top wordline are free from it. Thus, each state distribution in Fig. 11(b) has two peaks after all cells are programmed: the first peak corresponds to the cells under the top wordline, and the second peak is the sum of the distributions of the other cells.
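The two-peak shape follows from the programming order alone. This sketch gives every cell in a 16-wordline block an intrinsic $V_{th}$ drawn from the same distribution, adds a fixed FG-coupling shift to the cells on the lower fifteen wordlines (whose upper neighbors are programmed later), and leaves the top wordline unshifted. The intrinsic width, verify level, and 0.2-V shift are illustrative values, and a real coupling shift varies with the neighbors' data.

```python
import random

WORDLINES = 16      # cells per string: 15 lower wordlines plus the top wordline
FG_SHIFT = 0.2      # illustrative worst-case FG-coupling shift (V)


def block_vth(n_bitlines, v_verify, width, seed=0):
    """Return (wordline, Vth) pairs for one program state after sequential programming."""
    rng = random.Random(seed)
    cells = []
    for wl in range(WORDLINES):
        for _ in range(n_bitlines):
            vth = v_verify + rng.uniform(0.0, width)   # intrinsic ISPP distribution
            if wl < WORDLINES - 1:                     # upper neighbor programmed later
                vth += FG_SHIFT
            cells.append((wl, vth))
    return cells


cells = block_vth(n_bitlines=256, v_verify=1.0, width=0.15)
top = [v for wl, v in cells if wl == WORDLINES - 1]
rest = [v for wl, v in cells if wl < WORDLINES - 1]
print(f"top-WL peak: {min(top):.2f}-{max(top):.2f} V ({len(top)} cells)")
print(f"other WLs:   {min(rest):.2f}-{max(rest):.2f} V ({len(rest)} cells)")
```

The top-wordline population forms the lower, smaller peak holding one sixteenth of the cells, while the coupled cells form the larger peak shifted upward.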
V. CONCLUSION
The dual-mode NAND flash memory has been fabricated in a 0.15-μm CMOS technology, resulting in an effective cell size of 0.14 μm² and a chip size of 116.7 mm². Fusing changes the mode from the 1-Gb MLC mode to the high-performance 512-Mb SLC mode. Program throughputs of 1.6 MB/s and 6.9 MB/s are achieved with ×4 PGM operation for the MLC and SLC modes, respectively. The two-step bitline setup scheme suppresses the peak current below 60 mA, and the wordline-ramping technique reduces the wordline coupling below 0.4 V. The LSB scheme effectively reduces the program disturbance for the MLC mode but restricts the page program order to ascending order. The SLC mode, on the other hand, uses the self-boosting scheme to allow random-order page programming, because its program disturbance is relatively small. The ISPP with a 0.15-V step pulse keeps the distribution width at 0.6 V for the MLC mode, while the ISPP with a 0.5-V step pulse achieves high program throughput for the SLC mode. FG coupling is found to be the most serious parasitic effect, widening the cell $V_{th}$ distribution by 0.2 V. The MLC cell distributions are optimized so as not to degrade program performance, which results in nonuniform distributions. The device parameters and key technologies are summarized in Tables I and II.

After joining Samsung Electronics, Kiheung, Korea, in 1998, he worked on the development of NAND flash memory as a product engineer. He then joined the flash memory design team, where he has been working on the design and development of high-density NAND flash memories, including multilevel NAND flash memories.

He joined the Memory Division of Samsung Electronics Corporation, Kiheung, Korea, in 1991, where he has been working on the design of EEPROMs and high-density NAND flash and multilevel NAND flash memories.
Wook-Ghee Han
