The usage of this PDF file must comply with the IEICE Provisions on Copyright. The author(s) can distribute this PDF file for research and educational (nonprofit) purposes only. Distribution by anyone other than the author(s) is prohibited. 
Introduction
The capacity of embedded memory on a chip has been increasing. In fact, the ITRS predicts that the leakage power in embedded memory will account for 40% of all power consumption by 2024 [1]. Spin transfer torque magnetoresistive random access memory (STT-MRAM), which stores data as magnetic resistance states, is a promising non-volatile memory for reducing this leakage power. It is useful as embedded memory because it can function at low voltages and has a lifetime of over $10^{16}$ write cycles [2]. In addition, an STT-MRAM bitcell is smaller than an SRAM bitcell, which makes STT-MRAM suitable for high-density products [3]-[7].

Figure 1 shows a schematic of a 1T1MTJ STT-MRAM bitcell, which comprises one transistor and one magnetic tunnel junction (MTJ). The MTJ is a magnetoresistive device that has a pinned layer and a free layer separated by an insulating tunnel barrier (MgO barrier). The MTJ has two states, parallel and antiparallel, and the magnetization direction of the free layer determines which state it is in. This magnetization direction, which corresponds to the datum stored in the bitcell, can be switched by the current flowing through the MTJ. The MTJ resistance is low in the parallel state and high in the antiparallel state. In the read operation, the stored datum is read out as the difference in the current flowing through the cell.

Although the MTJ has the potential to operate at less than 0.4 V [8], such low-voltage operation has not been demonstrated to date for an STT-MRAM macro because the design of the peripheral circuitry is difficult. Neither a pMOS load sense amplifier [8] nor a sense amplifier with an op-amp for a replica bias [9] functions at such a low voltage. Figure 2 shows a conventional read circuit [8] and its bias condition at a supply voltage of 0.4 V. This readout circuit supplies a read current $I_{\mathrm{load}}$ to the STT-MRAM cell.
The voltage of node "S" is determined by the cell datum because the resistance of the STT-MRAM cell depends on the state of the MTJ. Specifically, the voltage at node "S" is set by the balance between $I_{\mathrm{load}}$ and the cell current ($I_{\mathrm{P}}$ or $I_{\mathrm{AP}}$), as shown in the figure. At a typical process corner, for instance, the sense amplifier can distinguish the datum because the voltage difference between the parallel and antiparallel states is sufficiently larger than the offset voltage of the sense amplifier. Figure 3 shows the bias condition at the FS corner. There, the conventional read circuit cannot operate at 0.4 V because the voltage difference at node "S" shrinks to 40 mV, which is insufficient to operate the sense amplifier.
Copyright © 2014 The Institute of Electronics, Information and Communication Engineers

The operating voltage in previous studies [7], [9]-[15] was 1.0 V or more, which indicates that it is difficult to realize peripheral circuits that can operate at low voltages. Herein, we present an STT-MRAM operating at a single 0.4 V supply voltage. Our proposed sense amplifier functions effectively below 0.4 V at any process corner, as a result of assistance from a charge pump circuit.

8-Mb STT-MRAM Design

Figure 5 shows a macro-block diagram of the proposed 8 Mb STT-MRAM. A Dickson charge pump circuit provides a boosted voltage to eight 1 Mb STT-MRAM macros. A schematic of the charge pump circuit is depicted in Fig. 6. A double boosted clock (DBC) generator doubles the 0.4 V clock swing to a 0.8 V amplitude, and the boosted clock is then forwarded to a charge pump capacitor. Because the clock amplitude from the DBC generator is 0.8 V, the potential output voltage of this Dickson charge pump is $5\,(2\,\mathrm{VDD} - V_{\mathrm{thn}}) = 5\,(0.8 - 0.2) = 3.0$ V. In the macro, the supply voltage from the charge pump ($\mathrm{VDD_B}$) is regulated to 1.6 V so as not to damage the transistors.

Figure 7 shows the block diagram and the voltage domains of the 1 Mb STT-MRAM macro. The macro comprises four 256 kb blocks, each of which consists of 512 bits × 512 words. The supply voltage (VDD) is 0.4 V. Figure 8 shows the bitcell array and the peripheral circuits. To minimize the voltage drop along a bit line, a column selector using a transmission gate is adopted; the gate of the transmission gate is driven with the boosted voltage ($\mathrm{VDD_B}$) of 1.6 V. In actual macro designs, the transmission gates connected to the global bit line and the global source line are located on opposite sides of the array, so that the sum of the wiring lengths of the bit line and the source line is the same for any STT-MRAM bitcell. This balances the current path and the voltage drop, and therefore the parasitic resistance seen by a bitcell is the same regardless of its position.

Figure 9 presents the operating waveforms. To write the datum "1", the bit line (BL) is raised to 0.4 V; to write the datum "0", the source line (SL) is raised to 0.4 V instead. $\mathrm{VDD_B}$ is applied as the word line voltage, which suppresses the variation of the cell current caused by variations in the access transistors. Figures 10 and 11 show SEM micrographs of a CoFeB-based MTJ and the STT-MRAM bitcell layout, respectively. The dimensions of the MTJ are 59 × 59 nm². The STT-MRAM process is the same as that described in earlier reports [2], [16].
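The charge pump estimate and the array organization quoted above are simple arithmetic, and the following minimal Python sketch reproduces them. It uses only the ideal no-load expression given in the text; the function name `dickson_ideal_output` and the reading of the factor 5 as a stage count are assumptions made for illustration, not details confirmed by the paper.

```python
def dickson_ideal_output(stages: int, clock_amplitude: float, vth: float) -> float:
    """Ideal no-load estimate used in the text: stages * (clock amplitude - Vth)."""
    return stages * (clock_amplitude - vth)


VDD = 0.4      # supply voltage in volts (from the text)
V_THN = 0.2    # nMOS threshold voltage in volts (from the text's equation)
STAGES = 5     # factor 5 in the text's equation, read here as a stage count (assumption)

boosted_clock = 2 * VDD  # the DBC generator doubles the 0.4 V swing to 0.8 V
v_out = dickson_ideal_output(STAGES, boosted_clock, V_THN)

# Array organization: 512 bits x 512 words per block, 4 blocks per macro, 8 macros.
bits_per_block = 512 * 512
bits_per_macro = 4 * bits_per_block
bits_total = 8 * bits_per_macro

print(f"ideal pump output: {v_out:.1f} V")
print(f"block: {bits_per_block // 1024} kb, "
      f"macro: {bits_per_macro // 2**20} Mb, "
      f"total: {bits_total // 2**20} Mb")
```

The ideal value of 3.0 V is well above the 1.6 V to which the boosted supply is regulated, which is consistent with the text's remark that the output is limited to protect the transistors.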
A detailed schematic of the proposed sense amplifier is shown in Fig. 12. The voltage of node "S", which reflects the bitcell datum, is input to the sense amplifier. Figure 13 shows details of the current flowing through the proposed circuit. In the initial state, the initializing switch grounds node "S". This cuts off the leakage current through $M_{\mathrm{p0}}$ ($I_{\mathrm{neg0}}$) in the current mirror of the negative-resistance pMOS load. In the read state, the "Read enable" signal becomes high and the nMOS load transistor ($M_{\mathrm{n0}}$) turns on, so that a load current ($I_{\mathrm{load}}$) flows from VDD. The voltage of node "S" rises in the early phase of the read operation because the current leaving node "S" through the clamp transistor and the MRAM cell is smaller than the current $I_{\mathrm{load}}$ entering it. When the voltage of node "S" exceeds $V_{\mathrm{th}}$ of $M_{\mathrm{n1}}$, $M_{\mathrm{p1}}$ drives a current from VDD. Both readout currents, $I_{\mathrm{load}}$ and $I_{\mathrm{neg1}}$, are drawn from VDD, so the circuit operates from the 0.4 V supply. The boosted voltage $\mathrm{VDD_B}$ is used only for the gates of the nMOS load transistor ($M_{\mathrm{n0}}$) and the initializing switch in the read circuit.

Figure 14 shows operating curves of the load circuits at a typical process corner (TT: pMOS = typical, nMOS = typical). The resistances of the MTJ are 3.5 kΩ and 7 kΩ in the parallel and antiparallel states, respectively. The total load current $I_{\mathrm{cell}} = I_{\mathrm{load}} + I_{\mathrm{neg1}}$ is a function of the voltage of node "S". The operating point is the intersection of the load current curve with $I_{\mathrm{P}}$ ("L") or $I_{\mathrm{AP}}$ ("H"), which gives $I_{\mathrm{cell}}$. The voltage difference between "L" and "H" is greater than 250 mV, which is much larger than that of a conventional pMOS load circuit [3]; VDD/2 is therefore effective as a reference voltage ($V_{\mathrm{REF}}$). Because the boosted gate voltage keeps the nMOS load $M_{\mathrm{n0}}$ in the linear region, the load current is sufficient even with a small transistor. The size of $M_{\mathrm{n0}}$ can thus be reduced, which also reduces its standby leakage.
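The idea that the node "S" voltage is the intersection of a load curve with the cell current can be illustrated with a toy load-line calculation. The sketch below is not the authors' circuit model: apart from the 0.4 V supply and the 3.5 kΩ and 7 kΩ MTJ resistances, every element value (`R_ACCESS`, `R_LOAD`, `V_TH`, `G_NEG`) is a hypothetical choice, the load transistor is approximated as a resistor, and the negative-resistance branch is approximated as a current that grows linearly with the node voltage above a threshold. It is not expected to reproduce the 250 mV separation reported in the text.

```python
VDD = 0.4                   # supply voltage in volts (from the text)
R_P, R_AP = 3.5e3, 7.0e3    # MTJ resistances in ohms (from the text)
R_ACCESS = 1.0e3            # access and clamp path resistance in ohms (assumed)
R_LOAD = 5.0e3              # boosted nMOS load modelled as a resistor (assumed)
V_TH = 0.2                  # turn-on voltage of the feedback branch (assumed)
G_NEG = 1.0e-4              # feedback branch conductance in siemens (assumed)


def load_current(v_s: float, with_feedback: bool) -> float:
    """Current pushed into node S: resistive load plus optional feedback branch."""
    i_load = (VDD - v_s) / R_LOAD
    i_neg = G_NEG * max(0.0, v_s - V_TH) if with_feedback else 0.0
    return i_load + i_neg


def cell_current(v_s: float, r_mtj: float) -> float:
    """Current drawn out of node S by the bitcell."""
    return v_s / (r_mtj + R_ACCESS)


def node_voltage(r_mtj: float, with_feedback: bool, iters: int = 60) -> float:
    """Bisection for the voltage at which load current equals cell current."""
    lo, hi = 0.0, VDD
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if load_current(mid, with_feedback) > cell_current(mid, r_mtj):
            lo = mid   # net current still charges node S, so the root is higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


for feedback in (False, True):
    v_l = node_voltage(R_P, feedback)    # parallel state, "L"
    v_h = node_voltage(R_AP, feedback)   # antiparallel state, "H"
    print(f"feedback={feedback}: L={v_l:.3f} V, H={v_h:.3f} V, "
          f"difference={1e3 * (v_h - v_l):.0f} mV")
```

With these assumed values the "L" point stays below the feedback threshold and is unchanged, while the "H" point is pushed higher when the feedback branch is enabled, so the separation between the two widens. This is the qualitative effect that the negative-resistance load is meant to provide.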
The proposed sense amplifier is tolerant to process variations, as shown in Figs. 15(a)-15(d). Even at the FF, FS, SF, and SS corners, the proposed sense amplifier can distinguish the parallel state from the antiparallel state. Figure 16 shows the results of a Monte Carlo simulation of the proposed circuit with $10^6$ trials. The simulation was performed by varying the MTJ and the access transistor at the TT corner. The figure shows the minimum and maximum values of $I_{\mathrm{P}}$ and $I_{\mathrm{AP}}$. Figure 17 shows histograms of the voltage of node "S" over the $10^6$ Monte Carlo trials, taken at the intersections of $I_{\mathrm{load}}$ with $I_{\mathrm{P}}$ and $I_{\mathrm{AP}}$. The proposed circuit is tolerant to variations in both the MTJ and the access transistor.
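The Monte Carlo procedure described above, in which the MTJ and the access transistor are varied and the extreme cell currents are recorded, can be sketched in a few lines of Python. This is only an illustration of the method: the Gaussian distributions, the 5% relative sigmas, the access resistance, the fixed cell bias, and the reduced trial count are all assumptions, and only the nominal 3.5 kΩ and 7 kΩ resistances come from the text.

```python
import random

R_P_NOM, R_AP_NOM = 3.5e3, 7.0e3   # nominal MTJ resistances in ohms (from the text)
R_ACC_NOM = 1.0e3                  # nominal access-transistor resistance (assumed)
SIGMA_MTJ = 0.05                   # relative sigma of the MTJ resistance (assumed)
SIGMA_ACC = 0.05                   # relative sigma of the access resistance (assumed)
V_BIAS = 0.1                       # fixed voltage across the cell in volts (assumed)
TRIALS = 10_000                    # far fewer than the 10^6 trials in the text


def sample_currents(rng: random.Random) -> tuple[float, float]:
    """Draw one random cell and return its (I_P, I_AP) at the fixed bias."""
    r_acc = rng.gauss(R_ACC_NOM, SIGMA_ACC * R_ACC_NOM)
    r_p = rng.gauss(R_P_NOM, SIGMA_MTJ * R_P_NOM)
    r_ap = rng.gauss(R_AP_NOM, SIGMA_MTJ * R_AP_NOM)
    return V_BIAS / (r_p + r_acc), V_BIAS / (r_ap + r_acc)


def run(trials: int = TRIALS, seed: int = 0) -> tuple[float, float, float, float]:
    """Return (min I_P, max I_P, min I_AP, max I_AP) over all trials."""
    rng = random.Random(seed)
    i_p, i_ap = zip(*(sample_currents(rng) for _ in range(trials)))
    return min(i_p), max(i_p), min(i_ap), max(i_ap)


i_p_min, i_p_max, i_ap_min, i_ap_max = run()
print(f"I_P : {1e6 * i_p_min:.1f} to {1e6 * i_p_max:.1f} uA")
print(f"I_AP: {1e6 * i_ap_min:.1f} to {1e6 * i_ap_max:.1f} uA")
print("distributions separated:", i_p_min > i_ap_max)
```

The quantity of interest is whether the smallest parallel-state current stays above the largest antiparallel-state current, since the sense amplifier can only tell the two states apart if the two distributions do not overlap.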
Chip Implementation and Measurement Results
We fabricated a 65 nm test chip at the TT process corner, as shown in Fig. 18, to evaluate the low-voltage and low-leakage operation. The detailed fabrication process of the MTJ device used in the test chip is presented in references [2], [16]. The macro size is 2.2 × 2.9 mm². Figure 19 shows a Shmoo plot of the test chip. We confirmed operation at 0.38 V for a cycle time of 1.9 µs (an operating frequency of 0.526 MHz); under these conditions the operating power is 1.70 µW. At this low voltage, the read operation is achieved using the proposed sense amplifier, and the write operation is carried out by applying a long write pulse with a small write current.

Figure 20 shows the energy consumption of the proposed STT-MRAM and a low-voltage SRAM [17] fabricated with the same process technology. Both sets of results are measured values. The ratio of read to write accesses is 50:50. At an operating voltage of 0.5 V, the energy consumed by the STT-MRAM is 3.03 times larger than that consumed by the SRAM. Figure 21 presents a breakdown of the energy components. The ratios of active energy ($E_{\mathrm{active}}$) to total energy ($E_{\mathrm{active}} + E_{\mathrm{leak}}$) are 96.7% and 15.4% in the STT-MRAM and SRAM, respectively. Figure 22 shows a comparison of the energy when the utilization of the memory bandwidth is changed. The STT-MRAM is superior to the SRAM in terms of energy consumption if the utilization of the memory bandwidth is 14% or less, which means that the STT-MRAM is suitable for less active applications such as healthcare systems and sensor networks. Table 1 shows the characteristics of the test chip.
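The measured figures above can be cross-checked with a short calculation. The sketch below derives the operating frequency from the cycle time and the leakage share of the total energy from the reported active-energy ratios. The energy-per-cycle value is not stated in the text; it is computed here under the assumption that the reported 1.70 µW is the average power over one 1.9 µs cycle.

```python
CYCLE_TIME_S = 1.9e-6    # measured cycle time at 0.38 V (from the text)
POWER_W = 1.70e-6        # measured operating power at 0.38 V (from the text)

# Active-energy share of total energy at 0.5 V (from the text).
ACTIVE_SHARE = {"STT-MRAM": 0.967, "SRAM": 0.154}

freq_mhz = 1.0 / CYCLE_TIME_S / 1e6

# Derived quantity, assuming the reported power is the average over one cycle.
energy_per_cycle_pj = POWER_W * CYCLE_TIME_S * 1e12

# Whatever is not active energy is leakage energy.
leak_share = {name: 1.0 - share for name, share in ACTIVE_SHARE.items()}

print(f"operating frequency: {freq_mhz:.3f} MHz")
print(f"energy per cycle (derived): {energy_per_cycle_pj:.2f} pJ")
for name, share in leak_share.items():
    print(f"{name}: leakage share of total energy = {100 * share:.1f}%")
```

The leakage shares make the trade-off explicit: most of the SRAM energy is leakage, whereas almost all of the STT-MRAM energy is spent on accesses, which is why the STT-MRAM becomes the better choice only at low bandwidth utilization.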
Conclusion
We presented a new sense amplifier with tolerance to process variations for an STT-MRAM operating at low voltages. The proposed sense amplifier can distinguish between parallel and antiparallel states at all process corners. We fabricated an 8 Mb STT-MRAM using a 65 nm process technology. The test chip exhibits 0.38 V operation at a frequency of 0.526 MHz, at which the power consumption is 1.70 µW. The proposed STT-MRAM operates at a lower energy than an SRAM when the utilization of the memory bandwidth is 14% or less.
Yohei Umeki was born on December 20, 1985. He received a B.E. degree in Computer and Systems Engineering from Kobe University, Hyogo, Japan, in 2012. He is currently a master's course student at Kobe University. His current research is on low-power SRAM and low-voltage MRAM designs.
Koji Yanagida received a B.E. degree in Computer and Systems Engineering from Kobe University, Hyogo, Japan, in 2011. He is currently a master's course student at Kobe University. His current research is on low-voltage MRAM designs and low-power FeRAM designs.
