In this paper, a design method for stacked type NAND MRAM is newly proposed. It achieves an ultra low bit cost compared with that of previously proposed stacked type NAND MRAM, together with performance competitive with 1 layered NAND flash memory. To realize the ultra low bit cost, a process technology that achieves the minimum number of process steps is assumed. With the newly proposed scheme, a bit cost as small as 1/100 of that of 1 layered NAND flash memory (1/10 of that of previously proposed stacked type NAND MRAM) can be realized using the same design rule. Therefore, the newly proposed scheme can replace not only 1 layered NAND flash memory but also high end and middle range HDD.
Introduction
As high speed memories competitive with DRAM, stacked type MRAMs with a NAND structured cell (NAND MRAM) have been proposed [1] [2]. These candidates can realize not only high speed characteristics competitive with DRAM but also a lower cost than presently available 1 layered NAND flash memory. Using number of
Shoto Tamai and Shigeyoshi Watanabe
layers of 64-128, a low bit cost (process cost for one bit) as small as 1/10 of that of 1 layered NAND flash memory can be realized without sacrificing high speed characteristics competitive with DRAM [3]. However, the reduction of bit cost in ref [3] is limited, because ref [3] assumes both a process technology available at present or in the near future and high speed characteristics competitive with DRAM. In this paper, a design method for stacked type NAND MRAM is newly proposed that achieves an ultra low bit cost compared with that of ref [3] and performance competitive with 1 layered NAND flash memory. To realize the ultra low bit cost, a process technology that achieves the minimum number of process steps is assumed. With the newly proposed scheme, a bit cost as small as 1/100 of that of 1 layered NAND flash memory (1/10 of that of ref [3]) can be realized using the same design rule. Therefore, the newly proposed scheme can replace not only 1 layered NAND flash memory but also a large part of HDD [4]. At present, 1 layered NAND flash memory is reaching its scaling limit [5]. With the newly proposed scheme there is a possibility of realizing low cost non-volatile semiconductor memory with a smaller design rule than that of 1 layered NAND flash memory.

This paper is organized as follows. Section 2 describes the process technology that realizes the minimum number of process steps and the performance needed for replacing 1 layered NAND flash memory. Section 3 presents the analysis of the bit cost of the memory cell array and the optimized number of layers for realizing the ultra low bit cost with the process technology described in Section 2. Section 4 describes the design method of the row decoder and the number of memory cells connected to one WL for realizing the optimized number of layers. Section 5 presents the design method of the BLs and the voltages applied to WLs and BLs for realizing the optimized number of layers. Section 6 describes the configuration of the memory with the newly proposed method. Finally, a conclusion of this work is provided in Section 7.
Process technology with minimum number of process steps and performance for replacing NAND flash memory
Fabricating 1 layered NAND flash memory requires about 500 process steps and 25-50 masks [6], as shown in Fig.1. In this paper it is assumed for simplicity that 50 masks are required. Therefore, about 500/50=10 process steps are required per mask. To fabricate one layer of stacked type NAND MRAM with BiCS [7] [8] structure, an insulating layer and a gate electrode (WL) must be formed, as shown in Fig.2 [1] and Fig.3 [2]. In the previous work [3] it is assumed that 10 process steps are required for each of the insulating layer and the gate electrode (WL), as shown in Fig.4 (A). This value of 10 corresponds to a process technology available at present or in the near future, because it equals the number of process steps per mask of presently available 1 layered NAND flash memory. With this technology one layer of stacked NAND MRAM can be fabricated with 20 process steps, which is 20/500=4% of the steps for 1 layered NAND flash memory. On the other hand, to realize a lower bit cost than Fig.4 (A), this paper assumes a technology with the minimum number of process steps, as shown in Fig.4 (B). The insulating layer and the gate electrode (WL) are each assumed to be formed with only one process step, which is the smallest possible value. With the newly proposed technology one layer of stacked NAND MRAM can be fabricated with only 2 process steps, which is 2/500=0.4% of the steps for 1 layered NAND flash memory.
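The step counting above can be reproduced with a few lines of Python. This is only a bookkeeping sketch of the numbers quoted in the text; the function name `layer_step_fraction` and the constant names are illustrative and do not come from the paper.

```python
# Process-step bookkeeping for one stacked layer, relative to the
# 1 layered NAND flash process (about 500 steps, 50 masks assumed in the text).
NAND_STEPS = 500
NAND_MASKS = 50


def layer_step_fraction(steps_per_film: int, films_per_layer: int = 2) -> float:
    """Fraction of the NAND flash process needed for one stacked layer.

    films_per_layer = 2 covers the insulating layer and the gate electrode (WL).
    """
    return steps_per_film * films_per_layer / NAND_STEPS


steps_per_mask = NAND_STEPS // NAND_MASKS            # 500/50
conventional = layer_step_fraction(steps_per_mask)   # 20 steps per layer
proposed = layer_step_fraction(1)                    # 2 steps per layer

print(f"steps per mask: {steps_per_mask}")
print(f"conventional scheme: {conventional:.1%} per layer")
print(f"proposed scheme:     {proposed:.1%} per layer")
```

The two fractions are the "4%/layer" and "0.4%/layer" labels used for the conventional and proposed schemes in the rest of the paper.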
For replacing DRAM, an access time of 50ns must be realized. This leads to a maximum WL and BL delay time of 5ns. The previous works on stacked type NAND MRAM with BiCS structure [1] [2] [3] were designed using this maximum value of 5ns to realize high speed operation competitive with DRAM. On the other hand, for replacing presently available 1 layered NAND flash memory, the access time can be relaxed to the microsecond level. Therefore, the maximum WL and BL delay time is set to 1us in this paper.

Using Fig.1-Fig.4, the bit cost of the memory cell array for stacked type NAND MRAM with BiCS structure has been compared analytically between the newly proposed minimum number of process steps technology (0.4%/layer) and the conventional process technology [3] (4%/layer). In this analysis the bit cost of 1 layer NAND flash memory is used as a reference. The bit cost of these structures is estimated using the formulas shown in Fig.5, 6 [3] [9]. In these figures the number of layers is N. The memory cell area per bit of these structures is 5F^2 for the without WBL case and 9F^2 for the with WBL case (F is the feature size). Therefore, using the number of memory cells per layer, M, the memory cell array area of these structures becomes M*5F^2 and M*9F^2. The yield of 1 layer NAND flash memory is assumed to be Y. The yield related to process steps of stacked type NAND MRAM can be estimated with the formula Yield(process steps) = Y^((number of process steps)/500). The yield related to pattern area is estimated with the formula Yield(pattern area) = EXP(-(defect density)*(pattern area)), and is included in F(type1) and F(type2) in Fig.5, 6. The cost of the memory cell array, that is, its fabrication cost, is proportional to the memory cell array area and the number of process steps, and inversely proportional to the yield. Therefore, using a constant of proportionality, k, the cost of the memory cell array of these structures can be estimated (Fig.5, 6). The bit cost, the cost for one bit, is inversely proportional to the number of layers. Therefore, the bit cost can be estimated from the cost of the memory cell array and the number of layers.

For a large yield of 95%, without WBL, the minimum bit cost of the newly proposed 0.4%/layer scheme is one order of magnitude smaller than that of the conventional 4%/layer scheme, with a larger optimized number of layers of 1024 (Fig.7). The obtained minimum bit cost with the 0.4%/layer scheme is 0.0089 with 1024 layers. Compared with the minimum bit cost of 0.0909 with 128 layers for the 4%/layer scheme, the minimum bit cost is one order of magnitude smaller and the number of layers is 8 times larger. From this result we can expect that an ultra low bit cost non-volatile semiconductor memory can be realized with the newly proposed 0.4%/layer scheme.
The minimum bit cost is about 1/100 of that of 1 layered NAND flash memory and about 1/10 of that of the conventional 4%/layer stacked type NAND MRAM with BiCS structure [3]. To realize this ultra low bit cost, the design of core circuits such as the row decoder, WL, BL, and the voltages applied to WL/BL should be improved from the conventional architecture [1] [2] [3]. The newly proposed 0.4%/layer scheme is very effective for realizing a lower bit cost not only for the without WBL case but also for the with WBL case, as shown in Fig.7. For the with WBL case, the obtained minimum bit cost with the 0.4%/layer scheme is 0.0206 with 512 layers. Compared with the minimum bit cost of 0.202 with 64 layers for the 4%/layer scheme, the minimum bit cost is one order of magnitude smaller and the number of layers is 8 times larger. This ratio is almost the same as for the without WBL scheme. These results show that the proposed scheme is effective for realizing an ultra low bit cost both without and with WBL. Furthermore, the effectiveness of the newly proposed 0.4%/layer scheme is also obtained for smaller yields of 90% and 70%. With the newly proposed scheme, a minimum bit cost one order of magnitude smaller is obtained with 8-16 times as many layers as in the conventional scheme, as shown in Fig.8 and Fig.9. These results are summarized in Fig.10.
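The trade-off behind the optimum number of layers can be illustrated with a simplified cost model. The sketch below is not the formula of Fig.5, 6: it keeps only the area, process-step and process-yield terms described in the text and omits the pattern-area yield terms F(type1) and F(type2), so its absolute values differ from the figures quoted above. It also assumes that the stacked process consists of the 500 base steps plus N times the per-layer steps, and that the reference NAND flash cell is 4F^2; both are assumptions, as are all function and variable names.

```python
# Simplified relative bit cost of a stacked array versus 1 layered NAND flash.
# Assumptions (not taken from the paper's figures):
#   - total steps = 500 base steps + N * (per-layer steps)
#   - reference NAND flash cell area = 4F^2
#   - pattern-area yield is ignored
NAND_CELL_AREA_F2 = 4.0


def relative_bit_cost(n_layers: int, step_fraction: float,
                      cell_area_f2: float, y: float) -> float:
    """Bit cost normalized to 1 layered NAND flash (simplified model)."""
    steps_ratio = 1.0 + step_fraction * n_layers   # (number of steps) / 500
    process_yield = y ** steps_ratio               # Y^((number of steps)/500)
    stacked = cell_area_f2 * steps_ratio / process_yield / n_layers
    reference = NAND_CELL_AREA_F2 / y              # one layer, 500 steps, yield Y
    return stacked / reference


def best_layer_count(step_fraction: float, cell_area_f2: float, y: float):
    """Search power-of-two layer counts for the minimum bit cost."""
    candidates = [2 ** i for i in range(4, 13)]    # 16 ... 4096 layers
    costs = {n: relative_bit_cost(n, step_fraction, cell_area_f2, y)
             for n in candidates}
    best = min(costs, key=costs.get)
    return best, costs[best]


# Without WBL (5F^2 cell), yield Y = 95%
n_conv, cost_conv = best_layer_count(0.04, 5.0, 0.95)
n_prop, cost_prop = best_layer_count(0.004, 5.0, 0.95)
print(n_conv, cost_conv)
print(n_prop, cost_prop)
```

Even in this reduced form the model shows why the optimum moves to a larger layer count when the per-layer step fraction drops: the 1/N gain is eventually outweighed by the yield loss Y^((number of steps)/500), and that loss grows ten times more slowly per layer in the 0.4%/layer scheme.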
Design of row decoder and WLs
As described in Section 3, realizing the ultra low bit cost with the newly proposed 0.4%/layer scheme requires 512 layers for the with WBL case and 1024 layers for the without WBL case. To realize this large number of layers, the design of the row decoder and WLs should be improved from the conventional scheme. The number of memory cells connected to one WL is the key issue. This value is very important for estimating the pattern area of the row decoder, the WL delay time, and the WL voltage applied during the write operation. The pattern area of the row decoder vs. the number of stacked layers, with the number of memory cells connected to one WL as a parameter, for stacked type NAND MRAM with BiCS structure is shown in Fig.11 (without WBL) and Fig.12 (with WBL). The pattern area of the row decoder is proportional to the number of stacked layers. For the previously reported conventional 4%/layer scheme without WBL, 128 layers with a 46% area penalty caused by the row decoder have been adopted to realize a low bit cost [3]. In this case 8K cells are connected to one WL, with a WL delay time of 4.84ns and a WL voltage of 2V for write, as shown in Fig.11. To reduce the area penalty of the row decoder it is effective to increase the number of memory cells connected to one WL. However, this increases the WL delay time, so that high speed competitive with DRAM cannot be realized. Therefore, 8K cells connected to one WL is the optimized design for realizing high speed competitive with DRAM [3]. For the newly proposed 0.4%/layer scheme, 1024 layers must be adopted within the 46% area penalty to realize the ultra low bit cost. The 8K cells per WL scheme adopted in the conventional scheme cannot be used in this case because of its larger area penalty. Therefore, a scheme with 64K cells connected to one WL is adopted in this case, as shown in Fig.11.

Figure 11: The pattern area of memory cell array + row decoder vs. number of stacked layers for the without WBL scheme [2].
For the scheme with 64K cells connected to one WL, the WL delay time becomes as large as 309.8ns. However, this value is small enough to realize performance competitive with 1 layer flash memory, which is the goal of this paper. The larger 128K scheme cannot be employed, not because of a performance limitation but because of the large applied voltage of 32V, which is larger than that of presently available 1 layered NAND flash memory.
For the with WBL case the estimation is as follows. For the conventional scheme, 64 layers with a 15% area penalty and 4K memory cells connected to one WL are adopted, as shown in Fig.12 [3]. For the newly proposed scheme, 32K memory cells connected to one WL are adopted, with an area penalty of 15%, a WL delay time of 178.6ns, and a WL voltage of 16V for write. Fig.11 and Fig.12 indicate that the large number of layers of the newly proposed scheme can be achieved by optimizing the number of memory cells connected to one WL, without increasing the area penalty compared with the conventional scheme. As the number of memory cells connected to one WL increases, the WL delay time increases. However, the resulting delay is still small enough to be competitive with 1 layered NAND flash memory.
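The WL numbers quoted in this section are consistent with a simple scaling rule, sketched below in Python. The rule itself is an assumption inferred from those numbers rather than a formula stated in the paper: the WL delay is taken to grow with the square of the number of cells per WL (both the resistance and the capacitance of the line grow with its length), and the WL write voltage is taken to grow linearly. The function name `scale_wl` is illustrative.

```python
def scale_wl(cells_old_k: float, cells_new_k: float,
             delay_old_ns: float, v_write_old: float):
    """Scale WL delay and WL write voltage with the number of cells per WL.

    Assumed scaling (inferred from the quoted values, not from the paper):
      delay   ~ (cells per WL)^2   (line R and C both grow with length)
      voltage ~ (cells per WL)
    """
    ratio = cells_new_k / cells_old_k
    delay_new_ns = delay_old_ns * ratio ** 2
    v_write_new = v_write_old * ratio
    return delay_new_ns, v_write_new


# Without WBL: 8K cells per WL (4.84 ns, 2 V) scaled to 64K and 128K cells
delay_64k, v_64k = scale_wl(8, 64, 4.84, 2.0)
delay_128k, v_128k = scale_wl(8, 128, 4.84, 2.0)

print(f"64K cells:  {delay_64k:.2f} ns, {v_64k:.0f} V")
print(f"128K cells: {delay_128k:.2f} ns, {v_128k:.0f} V")
```

Under this rule the 64K case gives about 309.8ns, matching the value quoted above, and the 128K case reaches the 32V write voltage that rules it out, while its delay would still remain below the 1us target set in Section 2.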
Design of BLs and applied voltages to WL/BL
As described in Sections 3 and 4, realizing the ultra low bit cost with the newly proposed 0.4%/layer scheme requires 512 layers for the with WBL case and 1024 layers for the without WBL case. To realize this large number of layers, the design of the BLs and the voltages applied to WL/BL should be improved from the conventional scheme [1] [2] [3]. For designing the BLs and the voltages applied to WL/BL, the read current of the bit line (IR), the write current (IW), and the resistance of the selected/pass cell transistors must be determined first. After these values are determined, the voltages applied to WL/BL and the BL delay time can be estimated, as shown in Fig.13 and Fig.15.
Design method of stacked type NAND MRAM

The sequence for estimating the BL delay time and the voltages applied to WL/BL with the WBL scheme is shown in Fig.13. The values for the conventional scheme of 64 stages are shown in the figure [1]. To realize the proposed scheme of 512 stages, which is 8 times larger than the conventional scheme, these values must be changed to the values indicated by the arrows. The write currents of WL and BL, I_WW and I_WBL, are not changed, because these values depend not on the number of layers but on the design rule and material. On the other hand, the read current of BL, IR, must be reduced. This is because keeping IR at 10uA in the proposed scheme would require larger BL and WL voltages for read, which would degrade the reliability of the memory cell transistor. Therefore, IR must be reduced to 10/8=1.25uA (about 1.2uA). This reduces the drain to source voltage of the memory cell transistors, which operate within the linear region, and increases the resistance of the memory cell transistors of the 512 stages. The resistance of the pass transistors of 511 stages increases from 35KΩ to 35*8=280KΩ. For a NAND structure, the resistance of the selected transistor must be larger than that of the pass transistors of 511 stages, 280KΩ, for stable operation [10]. Therefore, the resistance of the selected transistor is designed to be 280KΩ. As a result, the resistance of the NAND structure of 512 stages becomes 280KΩ+280KΩ=560KΩ. To increase the resistance of the selected cell from 35KΩ to 280KΩ, the gate voltage must be changed. Because this selected cell transistor operates within the saturation region, the gate voltage must be changed from 0.25V, 50mV larger than VT, to 0.218V. 0.218V is only 18mV larger than VT. To realize stable operation with this small signal, a design that takes into account interference noise within the memory cell array region will be the key issue. The BL voltage for write must be changed from 2V to 2*8=16V due to the increase in the number of layers to 512. The number of memory cells connected to one WL must be changed from 4K to 32K, as described in Section 4. This increases the WL voltage for write from 2V to 2*8=16V. The BL delay time can be estimated as (capacitance of BL)*(resistance of NAND structure). This value with the proposed scheme is 8*8=64 times larger than that with the conventional scheme, 1.29ns. As a result, the BL delay time with the proposed scheme becomes 1.29*64=82.56ns. This value is fast enough to replace 1 layered NAND flash memory. The obtained value from ...

The sequence for estimating the BL delay time and the voltages applied to WL/BL without the WBL scheme is shown in Fig.15. The values for the conventional scheme of 128 stages are shown in the figure [2]. To realize the proposed scheme of 1024 stages, which is 8 times larger than the conventional scheme, these values must be changed to the values indicated by the arrows. The write current of WL, I_WW, is not changed, the same as in the with WBL case. The read current of BL, IR, must be reduced from 10uA to 1.2uA, the same as in the with WBL case. This increases the resistance of the pass transistors of 1023 stages, of the selected transistor, and of the NAND structure of 1024 stages, the same as in the with WBL case. The gate voltage must also be changed from 0.25V to 0.218V. For the without WBL case, the write operation uses a thermally assisted mechanism [11]. Therefore, the power consumption, (IW)^2 * R_selected cell, must be a
constant value that is independent of the number of layers. Because R_selected cell becomes 8 times larger, IW is reduced from 40uA to 40uA/2.83=14.1uA, where 2.83 is the square root of 8. As a result, the BL voltage for write changes from 2.8V to (resistance of NAND structure)*IW = 560KΩ*14.1uA = 7.9V. The WL voltage for write becomes 7.9V+VT = 7.9V+0.2V = 8.1V. The BL delay time can be estimated as (capacitance of BL)*(resistance of NAND structure). This value with the proposed scheme is 8*8=64 times larger than that with the conventional scheme, 1.21ns. As a result, the BL delay time with the proposed scheme becomes 1.21ns*64=77.44ns. This value is fast enough to replace 1 layered NAND flash memory. From these estimations it is found that the upper limit of the number of stacked layers is determined not by the BL delay time but by the upper limit of the voltage applied to WL/BL. The values obtained from Fig.15 are summarized in Fig.16 and Fig.17. The value on the left side of each arrow is for the conventional scheme, and the value on the right side is for the proposed scheme.

Fig.18 shows the configuration of the newly proposed stacked type MRAM. As shown in the figure, a capacity as large as 512Gb with WBL and 2Tb without WBL can be realized using only one memory cell array mat. The features and targets of the newly proposed stacked type NAND MRAM are shown in Fig.19. Using a 39nm
design rule, a small chip size of about 25mm^2 can be realized. The estimated access time of 3us is fast enough for replacing 1 layered NAND flash memory. Using the newly proposed scheme, a bit cost of about 1/100 of that of 1 layered NAND flash memory can be realized. Therefore, stacked type NAND MRAM with the newly proposed scheme is a promising candidate for replacing not only 1 layered NAND flash memory, due to its high speed characteristics and low bit cost, but also high end and middle range HDD, due to its ultra low bit cost. The features and targets of the conventional stacked type NAND MRAM [3] are also shown in Fig.20 as a reference. Due to its fast access time of 50ns, competitive with DRAM, and its low bit cost compared with that of 1 layered NAND flash memory, stacked type NAND MRAM with the conventional scheme is a promising candidate for replacing both DRAM and 1 layered NAND flash memory.
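The BL design arithmetic of Section 5 can be retraced with the short Python sketch below, for a stage count increased by a factor of 8. It uses only values quoted in the text (IR = 10uA, 35KΩ, 50mV overdrive, VT = 0.2V, 1.29ns and 1.21ns BL delays, IW = 40uA). Two points are assumptions rather than statements from the paper: the gate overdrive of the selected transistor is scaled with a square-law saturation model (current proportional to overdrive squared), which reproduces the quoted 18mV, and all function and variable names are illustrative.

```python
import math

VT = 0.2  # threshold voltage in volts, as quoted in the text


def scale_read_path(k, ir_ua, r_pass_kohm, overdrive_mv, bl_delay_ns):
    """Scale the read path of one NAND string when the stage count grows by k."""
    ir = ir_ua / k                              # read current reduced by k
    r_pass = r_pass_kohm * k                    # pass transistors in series
    r_selected = r_pass                         # designed equal to r_pass in the text
    r_nand = r_pass + r_selected
    overdrive = overdrive_mv / math.sqrt(k)     # assumed square-law: I ~ overdrive^2
    gate_v = VT + overdrive / 1000.0
    delay = bl_delay_ns * k * k                 # BL capacitance and NAND resistance each grow by k
    return {"ir_ua": ir, "r_nand_kohm": r_nand,
            "gate_v": gate_v, "bl_delay_ns": delay}


def thermal_write(k, iw_ua, r_nand_kohm):
    """Thermally assisted write: keep (IW)^2 * R_selected cell constant."""
    iw = iw_ua / math.sqrt(k)                   # R grows by k, so IW shrinks by sqrt(k)
    v_bl = r_nand_kohm * 1e3 * iw * 1e-6        # ohms * amperes = volts
    v_wl = v_bl + VT
    return iw, v_bl, v_wl


with_wbl = scale_read_path(8, 10.0, 35.0, 50.0, 1.29)      # 64 -> 512 stages
without_wbl = scale_read_path(8, 10.0, 35.0, 50.0, 1.21)   # 128 -> 1024 stages
iw, v_bl, v_wl = thermal_write(8, 40.0, without_wbl["r_nand_kohm"])

print(with_wbl)
print(without_wbl)
print(f"IW = {iw:.1f} uA, BL write = {v_bl:.1f} V, WL write = {v_wl:.1f} V")
```

The delays stay well below the 1us target, whereas the write voltages rise with the stage count, which is in line with the observation in Section 5 that the applied voltage, not the BL delay, sets the upper limit on the number of stacked layers.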
Conclusion
In this paper, a design method for stacked type NAND MRAM has been newly proposed. It achieves an ultra low bit cost compared with that of previously proposed stacked type NAND MRAM, together with performance competitive with 1 layered NAND flash memory. To realize the ultra low bit cost, a process technology that achieves the minimum number of process steps is assumed. With the newly proposed scheme, a bit cost as small as 1/100 of that of 1 layered NAND flash memory (1/10 of that of previously proposed stacked type NAND MRAM) can be realized using the same design rule. Therefore, the newly proposed scheme can replace not only 1 layered NAND flash memory but also high end and middle range HDD. At present, 1 layered NAND flash memory is reaching its scaling limit. With the newly proposed scheme there is a possibility of realizing low cost non-volatile semiconductor memory with a smaller design rule than that of 1 layered NAND flash memory.
