A stacked-type MRAM with a NAND-structured cell is newly proposed. It offers high-speed operation competitive with DRAM, non-volatility, and a lower bit cost than NAND flash memory. By using a spin transistor as the memory cell, with an SGT-type structure that uses three sidewalls as the channel, a small memory cell size of 9F² and the conventional magnetic-field writing scheme can both be realized. The feasibility of the stacked-type MRAM with NAND-structured cell is verified through the design of a 1 Tbit NAND MRAM with a 39 nm design rule. The core-circuit design shows that the 1 Tbit NAND MRAM has the potential to realize high-speed operation competitive with DRAM, non-volatility, and a lower bit cost than NAND flash memory.
Shoto Tamai and Shigeyoshi Watanabe
Introduction
DRAM is widely used as the main memory of personal computers because of its high speed. On the other hand, NAND flash memory, which offers non-volatility and low bit cost, is widely used as the storage device for multimedia data. However, a universal memory that combines the features of both DRAM and NAND flash memory has not been reported. In this paper, a stacked-type MRAM with a NAND-structured cell, which has the features of a universal memory, is newly proposed. This stacked-type MRAM introduces a spin transistor [1] as the memory cell element. It is slower than the conventional 1 transistor + 1 MTJ (Magnetic Tunnel Junction) type MRAM [2] with NOR structure. However, its target specification is high-speed operation competitive with DRAM, non-volatility, and a lower bit cost than NAND flash memory (Fig.1). This paper is organized as follows. Section 2 describes the cell structure of the newly proposed stacked-type MRAM with NAND-structured cell. Section 3 presents the optimization of the cell structure. Section 4 describes the configuration of the stacked-type MRAM with 1 Tbit memory density. Section 5 presents the read and write operation. Section 6 describes the design of the core circuits, such as the row/column decoder circuits. Section 7 describes the estimated performance of the 1 Tbit stacked-type MRAM. Finally, the conclusion of this work is given in Section 8.
Cell structure of stacked type MRAM
The cell structure of the stacked-type MRAM is shown in Fig.2. Each memory cell is composed of one spin transistor. The source and drain electrodes of the spin transistor are used as the fixed layers [3], and their magnetization points in the same direction (the ↑ direction in Fig.2(A)). The cell data is stored in the free layer located in the body portion of the spin transistor (the ↓ or ↑ direction in Fig.2(A)). Connecting these memory cells in series yields an MRAM with a NAND-structured cell. In Fig.2(A), four memory cells are connected in series for simplicity. Above the four memory cells, a conventional
MOS transistor is connected in series. Its gate is connected to the block select signal and its drain is connected to the read bit line (RBL). For the write operation, a write bit line (WBL) is introduced, which runs parallel to the four series-connected memory cells.
In the newly proposed stacked-type MRAM with NAND-structured cell, this NAND structure is fabricated by stacking the memory cells in the Z direction. As shown in Fig.2(B), this stacked structure leads to a lower bit cost than conventional NAND flash memory [4] [5]. The spin transistor has almost the same structure as an SGT (Surrounding Gate Transistor), which features a gate surrounding the silicon channel [6]. The silicon channel in turn surrounds the write bit line, separated from it by an insulating layer.
The top view of the spin transistor is shown in Fig.2(C). The spin transistor uses three sidewalls as the channel. The remaining sidewall is used, via an insulating layer, as isolation from the adjacent spin transistor (that is, isolation between adjacent word lines (WL)). With this structure the memory cell size can be reduced to 3F × 3F = 9F² (F is the design rule; the cross-section of the write bit line is F × F, and the width of the insulating layer is F).
Optimization of cell structure
As described in Section 2, the cell structure of the stacked-type MRAM newly introduces a spin transistor with a three-sidewall channel that surrounds the write bit line. The reason is as follows. Conventional MRAM uses the magnetic-field writing scheme [7] [8]. In this scheme the MTJ polarization, that is, the magnetization direction, is switched by a magnetic field induced by the write current. The stacked-type MRAM also uses this magnetic-field writing scheme. In the conventional 1 transistor + 1 MTJ type MRAM with NOR structure, magnetic-field writing is easy to implement, because the write current flowing in the word line and the write current flowing in the write bit line can easily be made to cross each other, generating a magnetic field high enough to write a cell. In the stacked-type MRAM, however, the two currents flowing in the word line and the write bit line cannot cross each other, because the word line and the write bit line are far apart, as shown in Fig.3(A). To overcome this problem, a write bit line that runs parallel to the four series-connected memory cells is newly introduced, as shown in Fig.3(B). To realize this structure, a conventional SGT structure that surrounds the write bit line, as shown in Fig.3(C), can readily be considered.
In order to verify the feasibility of the newly proposed stacked-type MRAM, a 1 Tbit stacked MRAM has been designed. A 16 Gbit/32 Gbit NAND flash memory has been developed with a 43 nm design rule [9]. Therefore, almost the same design rule, 39 nm, is adopted for designing the 1 Tbit stacked-type MRAM. The configuration of the 1 Tbit chip is shown in Fig.6. The 1 Tbit chip is composed of 2048 512 Mbit MRAM blocks. Each 512 Mbit MRAM block is composed of a 64-stage stacked structure of 8 Mbit (2K rows × 4K columns) memory arrays. The configuration of the 512 Mbit MRAM block is shown in Fig.7. The row decoder for read/write operation is placed at the left side of the 8 Mbit memory cell array.
The current sink circuit used during the write operation is placed at the right side. During the write operation, the write current for the word line flows in one direction, from the row decoder to the current sink. On the other hand, the write current for the write bit line is bidirectional, in order to store "0" or "1" data. To realize this bidirectional writing, column decoders for the write operation are placed both above and below the memory cell array, as shown in Fig.7. The write operation is realized as follows. For the "1" write operation, the output of the upper column decoder is set to the high voltage and the output of the lower column decoder is set to ground level. For the "0" write operation, the output of the upper column decoder is set to ground level and the output of the lower column decoder is set to the high voltage. The column decoder for the read operation is placed above the memory cell array, as shown in Fig.7.
A more detailed configuration of the 512 Mbit MRAM block is shown in Fig.8. Four stages of the stacked structure are shown for simplicity. For the Nth memory cell, the WL runs in the horizontal direction, and the RBL (Read Bit Line) and WBL (Write Bit Line) run in the vertical direction. To fabricate this stacked structure, BiCS (Bit Cost Scalable) technology [4] [5], which features low bit cost, should be adopted.
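The chip organization described above can be checked with a few lines of arithmetic. The following Python sketch only restates the numbers given in the text (2K rows, 4K columns, 64 stages, 2048 blocks); the constant and function names are illustrative and do not come from the paper.

```python
# Capacity bookkeeping for the 1 Tbit chip organization described in the text.
# All names are illustrative; only the numeric values come from the paper.
ROWS_PER_ARRAY = 2 * 1024    # 2K rows per 8 Mbit memory array
COLS_PER_ARRAY = 4 * 1024    # 4K columns per 8 Mbit memory array
STAGES_PER_BLOCK = 64        # stacked stages per 512 Mbit block
BLOCKS_PER_CHIP = 2048       # 512 Mbit blocks per chip

MBIT = 2 ** 20
TBIT = 2 ** 40


def array_bits():
    """Bits in one planar memory array (one stage)."""
    return ROWS_PER_ARRAY * COLS_PER_ARRAY


def block_bits():
    """Bits in one stacked MRAM block."""
    return array_bits() * STAGES_PER_BLOCK


def chip_bits():
    """Bits in the whole chip."""
    return block_bits() * BLOCKS_PER_CHIP


if __name__ == "__main__":
    print(array_bits() // MBIT, "Mbit per array")
    print(block_bits() // MBIT, "Mbit per block")
    print(chip_bits() // TBIT, "Tbit per chip")
```

With these values one array holds 8 Mbit, one block 512 Mbit, and the chip 1 Tbit, which matches the configuration stated in the text.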
Read and write operation
The read and write operation of the stacked-type MRAM is shown in Fig.9. For a random read of the selected cell, a low voltage of 0.25 V is applied to the selected WL (WL4) and a high voltage of 1 V is applied to the pass WLs (WL1, WL2, WL3) [3]. This read scheme enables stable reading of the selected cell independent of the data stored in the passed cells. For a random write of the selected cell, the left edge of the selected WL is set to 2 V by the row decoder and the right edge of the selected WL is set to 0 V by the current sink. As a result, the write current flows from the left edge to the right edge of the selected WL (WL4). During this operation both the left and right edges of the pass WLs (WL1, WL2, WL3) are set to 0 V. For the "1" write operation, the top of the WBL is set to 2 V by the upper column decoder and the bottom of the WBL is set to 0 V by the lower column decoder. As a result, the write current flows from top to bottom in the WBL.
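The bias conditions above can be summarized as a small lookup. The following Python sketch is only an illustration of the voltages quoted in the text for a four-cell string; the function names `wl_bias` and `wbl_bias` are hypothetical, and the 2 V high level for the "0" write is taken from the V_HIGHC setting given later in the paper.

```python
# Bias conditions quoted in the text, in volts. Names are illustrative only.
READ_SELECTED_V = 0.25    # selected WL during read
READ_PASS_V = 1.0         # pass WLs during read
WRITE_SELECTED_V = 2.0    # row-decoder edge of selected WL (sink edge is 0 V)
WRITE_PASS_V = 0.0        # pass WLs during write
WBL_HIGH_V = 2.0          # high level applied to one end of the WBL


def wl_bias(operation, selected, num_wl=4):
    """Return the voltage driven onto each WL by the row decoder."""
    if operation == "read":
        sel_v, pass_v = READ_SELECTED_V, READ_PASS_V
    elif operation == "write":
        sel_v, pass_v = WRITE_SELECTED_V, WRITE_PASS_V
    else:
        raise ValueError("operation must be 'read' or 'write'")
    if not 1 <= selected <= num_wl:
        raise ValueError("selected WL index out of range")
    return {
        f"WL{i}": (sel_v if i == selected else pass_v)
        for i in range(1, num_wl + 1)
    }


def wbl_bias(data):
    """Return (top, bottom) WBL voltages for writing data 1 or 0."""
    if data == 1:
        return (WBL_HIGH_V, 0.0)   # current flows top to bottom
    if data == 0:
        return (0.0, WBL_HIGH_V)   # current flows bottom to top
    raise ValueError("data must be 0 or 1")


if __name__ == "__main__":
    print(wl_bias("read", selected=4))
    print(wl_bias("write", selected=4))
    print(wbl_bias(1), wbl_bias(0))
```

For example, `wl_bias("read", 4)` gives 0.25 V on WL4 and 1 V on WL1 to WL3, as in the read operation of Fig.9.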
Design of core circuit: Row/Column decoder circuit
Circuit design of row decoder
The row decoder circuit and its drive circuit are shown in Fig.10 and Fig.11. The sink circuit is also shown in Fig.10, at the right side of the memory cell array. Four stages of the stacked structure are considered for simplicity. The row decoder is composed of a pre-charge type NOR circuit and a WL driver circuit, a configuration widely used in high-density DRAM [10]. X1-X4 and φ1-φ4 are the section-decoded address signals generated from the row address inputs. The WL driver circuit is designed with the minimum number of transistors needed to generate the WL voltages of the selected and passed memory cells during read and write operation. The WL is controlled by the V_HIGH or V_LOW signal. With this row decoder and its drive circuit, during the read operation the WL of the selected cell is charged to 0.25 V through the transistor whose drain is connected to V_LOW, while the WL of a passed cell is charged to 1 V through the transistor whose drain is connected to V_HIGH. During the write operation, the WL of the selected cell is charged to 2 V through the transistor whose drain is connected to V_HIGH, and the WL of a passed cell is set to 0 V through the transistor whose drain is connected to V_LOW. The voltages specific to the stacked-type MRAM, 0.25 V, 1 V, and 2 V, are generated by the row decoder drive circuit (Fig.11). This drive circuit is placed beside the row decoder. The WL clock timing diagram for read and write operation is shown in Fig.12. In Fig.12 it is assumed that WL1 is selected.
Pattern design of row decoder
The row decoder circuit must fit within the memory cell pitch of 3F to realize a high-density layout. It is very difficult to place a row decoder built from conventional planar transistors within 3F. Therefore the SGT [11] [12] [13], which features higher packing density and smaller pattern size than the conventional planar transistor, is introduced for the row decoder circuit. The pattern layout of the row decoder with SGT is shown in Fig.13, which gives the top views, Fig.13(A)(C), and the cross-sectional views, Fig.13(B)(D). The channel width of all transistors is assumed to be 4F. The design rules for the pattern layout are as follows: the size of the silicon pillar is F × F, the size of a contact is F × F, and the thickness of the gate electrode is 0.25F. The output signal of the NOR circuit, BS, runs in the horizontal direction of the figure, parallel to the WL. VDD, VSS, and X1-X4 run perpendicular to the WL, as shown in Fig.13. With SGT the row decoder has been successfully laid out within a small pitch of 3F. Furthermore, the lateral length of the row decoder can be reduced to the minimum value. For the 64-stage 1 Tbit stacked MRAM, one NOR circuit (lateral length 25F), 32 inverters (lateral length 12.5F per inverter), and 64 WL drivers (lateral length 22.5F per WL driver) must be included in one row decoder. Therefore, the lateral length of the row decoder becomes 25F + 32 × 12.5F + 64 × 22.5F = 1865F.
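The lateral-length sum above can be written as a small function of the number of stages. The following Python sketch uses the per-circuit lengths quoted in the text; treating the inverter count as half the number of stages is an assumption extrapolated from the single 64-stage data point (32 inverters), not something the paper states, and the function names are illustrative.

```python
# Lateral lengths quoted in the text, in units of the design rule F.
NOR_LEN_F = 25.0          # one pre-charge NOR circuit
INVERTER_LEN_F = 12.5     # per inverter
WL_DRIVER_LEN_F = 22.5    # per WL driver
DESIGN_RULE_NM = 39.0     # F for the 1 Tbit design


def row_decoder_length_f(stages):
    """Lateral length of one row decoder in units of F.

    Assumption: one inverter per two stages (32 inverters for 64 stages).
    """
    inverters = stages // 2
    return NOR_LEN_F + inverters * INVERTER_LEN_F + stages * WL_DRIVER_LEN_F


def row_decoder_length_um(stages):
    """Same length converted to micrometres for F = 39 nm."""
    return row_decoder_length_f(stages) * DESIGN_RULE_NM / 1000.0


if __name__ == "__main__":
    for n in (64, 128):
        print(n, "stages:", row_decoder_length_f(n), "F =",
              round(row_decoder_length_um(n), 2), "um")
```

For 64 stages this reproduces the 1865F of the text. Because the WL driver term grows linearly with the number of stages, the decoder length roughly doubles at 128 stages under the stated assumption.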
Circuit design of column decoder
The column decoder circuits for read and write operation are shown in Fig.14 and Fig.15. The column decoder drive circuit is shown in Fig.16. The layout of the column decoder and memory cell array is shown in Fig.17. The column decoder is composed of a pre-charge type NOR circuit and a CSL/WBL driver circuit, similar to the row decoder. Y1-Y4 and φ1-φ4 are the section-decoded address signals generated from the column address inputs. During the read operation, the selected CSL is charged to 1 V by the column decoder for read operation, through the transistor whose drain is connected to V_HIGHC, as shown in Fig.14. As a result, the selected RBL is connected to Output, as shown in Fig.17. In the upper column decoder for write operation the input data is applied to the NOR circuit, and in the lower column decoder the inverted input data is applied to the NOR circuit, as shown in Fig.15. For a "1" data write, the selected WBL is charged to 2 V by the upper column decoder for write operation, while the same WBL is set to 0 V by the lower column decoder circuit. Therefore, the write current flows in the selected WBL from top to bottom. The BL clock timing diagram for read and write operation is shown in Fig.18. In Fig.18 it is assumed that CSL1 and WBL1 are selected. The column decoder circuit must also fit within the memory cell pitch of 3F to realize a high-density layout. For this purpose SGT is adopted, as in the row decoder. The pattern layout of the column decoder with SGT is shown in Fig.19 (read operation) and Fig.20 (write operation). With SGT the column decoder has been successfully laid out within a small pitch of 3F. Furthermore, the lateral length of the column decoder can be reduced to the minimum value. For the 64-stage 1 Tbit stacked MRAM, one NOR circuit (lateral length 25F), 2 inverters (lateral length 12.5F per inverter), and one CSL driver (lateral length 15F) must be included in one column decoder for read operation.
Therefore, the lateral length of the column decoder for read operation becomes 25F + 2 × 12.5F + 15F = 65F (67.5F for the column decoder for write operation).
To realize a lower bit cost than NAND flash memory, the following three conditions should be satisfied. (1) The cell occupied ratio should be larger than 60%, as in high-density DRAM. (2) The number of stacked layers should be as large as possible. (3) The applied voltage during read/write operation should be limited to 2-3 V to ensure high reliability of the transistors and wiring.
Estimation about WL
To estimate the delay time of the WL, the capacitance and resistance of the WL are estimated using the top view of the memory cell (Fig.21). The gate capacitance per memory cell is 0.00045 pF, using an SGT channel width of 5F, a gate length of F, a gate oxide thickness of 0.7 nm, and a design rule F of 39 nm. Assuming that 4K memory cells are connected to one WL and that only the gate capacitance contributes to the WL capacitance, the capacitance of one WL is 1.49 pF. On the other hand, the resistance of the WL per memory cell is 0.47 Ω, using the WL pattern shown in Fig.21 and a sheet resistance of 0.1 Ω/□. It is assumed that a low-resistance metal material is newly introduced. The resistance of one WL connected to 4K memory cells is 1.86 kΩ. As a result, the delay time of one WL, estimated as 1.49 pF × 1.86 kΩ, becomes 2.79 ns, which is smaller than 10% of the DRAM access time of 50 ns. From these estimates, a WL connected to 4K memory cells is found to be the optimized design for realizing high-speed operation competitive with DRAM. If 8K memory cells are connected to one WL, the delay time of the WL becomes (2 × 1.49) pF × (2 × 1.86) kΩ = 11.19 ns, which is larger than 10% of the DRAM access time of 50 ns.
Fig. 21 Top view of memory cell
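The WL estimate above is a lumped RC product in which both the capacitance and the resistance scale with the number of cells on the line, so the delay grows with the square of the WL length. The following Python sketch uses the per-WL values quoted in the text for 4K cells (1.49 pF and 1.86 kΩ); the function names and the 10% budget helper are illustrative. The plain products come out near the 2.79 ns and 11.19 ns quoted in the text, with small differences presumably due to rounding of the intermediate values.

```python
# Lumped RC estimate of the WL delay, using values quoted in the text.
WL_CAP_4K_PF = 1.49        # capacitance of one WL with 4K cells (pF)
WL_RES_4K_KOHM = 1.86      # resistance of one WL with 4K cells (kOhm)
CELLS_REF = 4096           # reference number of cells per WL (4K)
DRAM_ACCESS_NS = 50.0      # DRAM access time used as the reference
BUDGET_FRACTION = 0.10     # the text compares against 10% of access time


def rc_delay_ns(c_pf, r_kohm):
    """RC product; pF times kOhm gives nanoseconds directly."""
    return c_pf * r_kohm


def wl_delay_ns(cells):
    """WL delay when C and R both scale linearly with the cell count."""
    scale = cells / CELLS_REF
    return rc_delay_ns(WL_CAP_4K_PF * scale, WL_RES_4K_KOHM * scale)


def within_budget(delay_ns):
    """True if the delay is below 10% of the DRAM access time."""
    return delay_ns < BUDGET_FRACTION * DRAM_ACCESS_NS


if __name__ == "__main__":
    for cells in (4096, 8192):
        d = wl_delay_ns(cells)
        print(cells, "cells:", round(d, 2), "ns, within budget:",
              within_budget(d))
```

Doubling the cells per WL from 4K to 8K quadruples the delay, which is why the 8K case exceeds the 5 ns budget while the 4K case does not.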
For the write operation, 1 mA [14] is required for the WL. The IR drop of the WL is 1.86 kΩ × 1 mA = 1.86 V ≈ 2 V, which corresponds to V_HIGH of Fig.10. This value satisfies condition (3) for realizing high reliability of the transistors and wiring.
Estimation about RBL
To estimate the delay time of the RBL, the capacitance and resistance of the RBL are estimated using the cross-sectional view of the memory cell (Fig.22). The distance between adjacent WLs is F in the vertical direction. The capacitance of the RBL per stage is 0.00048 pF (= Capacitance1 + Capacitance2). Assuming that 64 stages of memory cells are adopted, the capacitance of the RBL is 0.03 pF. The resistance of the RBL is estimated using the equivalent resistance of the spin transistor [3]. In the stacked-type MRAM with NAND structure, the passed spin transistors and the selected spin transistor are connected in series. Therefore, the total resistance of the passed spin transistors must be smaller than the resistance of the selected spin transistor for stable operation [3]. The resistance of the selected spin transistor is 21 kΩ. It is assumed that the β value of the spin transistor (gate length of 10 nm, channel width of 30 nm, drain voltage of 0.05 V, gate voltage of 0.25 V, and threshold voltage of 0.2 V) can be used for this estimation (design rule of 39 nm, gate length of 39 nm, and channel width of 188 nm). Therefore, the resistance of the RBL, which equals (the resistance of the selected spin transistor) + (the total resistance of the passed spin transistors), is 21 kΩ + 21 kΩ = 42 kΩ. This value is independent of the number of stages of the stacked cell. As a result, the delay time of the RBL for 64 stages becomes 0.03 pF × 42 kΩ = 1.29 ns. This value is about 1/4 of 10% of the DRAM access time. If only the delay time of the RBL is considered, 128 stages would be acceptable (1.29 ns × 2 = 2.58 ns). However, 128 stages cannot be used because of condition (1) on the cell occupied ratio, as follows.
To realize a cell occupied ratio of 60%, the pattern area of the main core circuit (in this case the row decoder) should be limited to 25% of the chip area [11]. For 64 stages the pattern area of the row decoder is smaller than 25% of the chip area. For 128 stages, however, the pattern area of the row decoder becomes larger than 25%. Therefore, the maximum number of stages of memory cells (64 stages) is limited not by the delay time of the RBL but by the pattern area of the row decoder, which is proportional to the number of stages.
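The RBL delay estimate can be sketched in the same way as the WL estimate. Here only the capacitance scales with the number of stages, since the text takes the RBL resistance as a fixed 42 kΩ. The following Python sketch uses the values quoted in the text; the function names are illustrative.

```python
# Lumped RC estimate of the RBL delay, using values quoted in the text.
RBL_CAP_PER_STAGE_PF = 0.00048   # Capacitance1 + Capacitance2 per stage (pF)
SELECTED_RES_KOHM = 21.0         # selected spin transistor (kOhm)
PASSED_RES_KOHM = 21.0           # total of passed spin transistors (kOhm)


def rbl_resistance_kohm():
    """Series resistance of the selected and passed spin transistors."""
    return SELECTED_RES_KOHM + PASSED_RES_KOHM


def rbl_delay_ns(stages):
    """RBL delay; pF times kOhm gives nanoseconds directly."""
    capacitance_pf = RBL_CAP_PER_STAGE_PF * stages
    return capacitance_pf * rbl_resistance_kohm()


if __name__ == "__main__":
    for stages in (64, 128):
        print(stages, "stages:", round(rbl_delay_ns(stages), 2), "ns")
```

Because the resistance is taken as independent of the number of stages, the RBL delay grows only linearly, from about 1.29 ns at 64 stages to about 2.58 ns at 128 stages.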
Estimation about WBL
The capacitance of the WBL consists of Capacitance 2 of Fig.22. This value is smaller than the 0.00048 pF of the RBL. The resistance of the WBL for 64 stages is 10 Ω × 64 = 640 Ω if conventional polycide (sheet resistance = 5 Ω/□) is adopted. This value is smaller than the 42 kΩ of the RBL. Therefore, the delay time of the WBL is negligibly small compared with that of the RBL. For the write operation, 1 mA [14] is required for the WBL. The IR drop of the WBL is 0.64 kΩ × 1 mA = 0.64 V, which corresponds to V_HIGHC of Fig.15. In this design V_HIGHC is set to 2 V to allow a larger write current.
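The write-current IR drops quoted for the WL and the WBL follow directly from Ohm's law. The following Python sketch restates them with the resistances and the 1 mA write current given in the text; the function names are illustrative.

```python
# IR drop along the write paths, using values quoted in the text.
WBL_RES_PER_STAGE_OHM = 10.0   # WBL resistance per stage (polycide)
WL_RES_OHM = 1860.0            # one WL connected to 4K cells
WRITE_CURRENT_MA = 1.0         # required write current
V_HIGHC = 2.0                  # WBL drive voltage chosen in the design


def wbl_resistance_ohm(stages):
    """Total WBL resistance for a given number of stacked stages."""
    return WBL_RES_PER_STAGE_OHM * stages


def ir_drop_v(resistance_ohm, current_ma):
    """Voltage drop in volts for a resistance in ohms and current in mA."""
    return resistance_ohm * current_ma / 1000.0


if __name__ == "__main__":
    wbl_drop = ir_drop_v(wbl_resistance_ohm(64), WRITE_CURRENT_MA)
    wl_drop = ir_drop_v(WL_RES_OHM, WRITE_CURRENT_MA)
    print("WBL drop:", wbl_drop, "V of", V_HIGHC, "V available")
    print("WL drop:", wl_drop, "V")
```

The 0.64 V drop on the WBL is well below the 2 V drive level, which is consistent with the text's choice of V_HIGHC = 2 V to allow a larger write current.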
Conclusion
A stacked-type MRAM with a NAND-structured cell, which offers high-speed operation competitive with DRAM, non-volatility, and a lower bit cost than NAND flash memory, has been newly proposed. By using a spin transistor as the memory cell, with an SGT-type structure that uses three sidewalls as the channel, a small memory cell size of 9F² can be realized and the conventional magnetic-field writing scheme can be used. The feasibility of the stacked-type MRAM with NAND-structured cell is verified through the design of a 1 Tbit NAND MRAM with a 39 nm design rule. The core-circuit design shows that the 1 Tbit NAND MRAM has the potential to realize high-speed operation competitive with DRAM, non-volatility, and a lower bit cost than NAND flash memory. The 64-stage stacked memory cell and a WL connected to 4K memory cells are the key technologies for realizing these characteristics. This stacked-type MRAM with NAND-structured cell is one of the promising candidates for realizing the universal memory.
