In this paper, we propose a counter-based read circuit to realize 0.4-V operation of STT-MRAM. The proposed read circuit tolerates process variation and temperature fluctuation by dynamically changing the load curve over time during the read operation. HSPICE simulations confirm that the proposed read circuit operates at five process corners (TT, FF, FS, SF, and SS) and three temperatures (−20 °C, 25 °C, and 100 °C). At the TT corner and 25 °C, the read time of the proposed circuit is 271 ns, and the energy consumption is 1.05 pJ for a "1" read operation and 1.23 pJ for a "0" read operation.
Introduction
The capacity of embedded memory on a chip has kept increasing, so reducing the leakage power of embedded memory is important for low-power LSIs. In fact, the ITRS predicts that the leakage power in embedded memory will account for 40% of all power consumption by 2024 [1]. Spin-transfer torque magnetoresistive random access memory (STT-MRAM) is a promising non-volatile memory for reducing the leakage power. It is useful because it can function at low voltages and has a lifetime of over 10^16 write cycles [2]. In addition, STT-MRAM is suitable for use in high-density products [3], [4], [5], [6], [7].
STT-MRAM uses a magnetic tunnel junction (MTJ) device. The MTJ has a magnetoresistance whose value depends on the MTJ's state. The state is determined by the magnetization direction of the free layer, which is one of the layers that make up the MTJ. The MTJ has two states: parallel and anti-parallel. In the parallel state, the resistance of the MTJ is lower than in the anti-parallel state.
Figure 1 shows the schematic of the conventional read circuit [8]. Node "S" is the input of the sense amplifier. The voltage of node "S" is determined by the balance between the load current (I_load) and the readout current (I_P or I_AP). The resistance of the bit cell depends on the stored datum, so the cell current takes one of two values. At the TT corner, this conventional circuit produces a 130-mV difference between the parallel and anti-parallel states.
Conventional Read Circuit
However, at the FS corner, the difference between the two states is only 40 mV. This voltage difference is too small for a sense amplifier to read out. Moreover, it is difficult to generate an appropriate reference voltage for every process corner. This is why the conventional circuit cannot operate correctly in the low-voltage region.
Figure 3 shows the 1-Mb STT-MRAM macro with the proposed bitline digitize circuit. The proposed read circuit converts a bitline voltage, which depends on the target cell datum, to a digital value. For comparison with the cell data, two reference cell columns are added. All cells in the reference "0" column store "0", whereas all cells in the reference "1" column store "1". In a read operation, both reference columns are read out at the same time and converted to digital values in the proposed read circuit. The reference value is then determined as the average of their digital values. After that, the target cell's bitline voltage is converted to a digital value and compared with the reference value to decide the output data.
Figure 4 shows the dynamic load circuit, which consists of a negative resistance circuit and four boosting nMOSes. The four boosting nMOSes have different current drive capabilities: when the minimum is normalized to 1, their ratio is 1:2:4:8. The load curve of the dynamic load circuit changes dynamically over time, so the voltage of node "S", which is the input of the ring oscillator, also changes dynamically. Because the frequency of the ring oscillator depends on the voltage of node "S", the bitline voltage can be digitized by counting the number of oscillations. Therefore, the datum of the target cell can be determined by combining the dynamic load circuit with the bitline digitize circuit.
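The reference scheme described above (average the two reference counts, then compare the target count with that average) can be sketched in a few lines of Python. This is only an illustrative behavioral model, not the circuit's actual logic: the function name `decide_bit`, its arguments, and the example counts are hypothetical, and the tie-breaking rule is an arbitrary assumption.

```python
def decide_bit(target_count: int, ref0_count: int, ref1_count: int) -> int:
    """Return 0 or 1 by comparing the target count with the reference midpoint.

    All names and the tie-breaking rule are assumptions made for illustration.
    """
    if ref0_count == ref1_count:
        raise ValueError("reference counts are identical; cannot decide")

    # Reference value: average of the two reference-column counts.
    midpoint = (ref0_count + ref1_count) / 2

    # The target is read as "0" if it lies on the same side of the
    # midpoint as the reference "0" column, otherwise as "1".
    # A count exactly at the midpoint falls through to "1" (arbitrary).
    same_side_as_ref0 = (target_count - midpoint) * (ref0_count - midpoint) > 0
    return 0 if same_side_as_ref0 else 1


# Hypothetical counts: the "0" reference oscillates longer than the "1" reference.
print(decide_bit(target_count=40, ref0_count=42, ref1_count=8))
print(decide_bit(target_count=5, ref0_count=42, ref1_count=8))
```

Because the decision is made relative to counts obtained from reference cells, this sketch does not need an absolute threshold.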
Proposed Read Circuit
The left side of Fig. 5 shows the current characteristics for LE<3:0>="1001". In the proposed circuit, the supply current from the negative resistance circuit (I_neg) does not decrease monotonically: as the voltage of node "S" increases, I_neg first increases and then decreases. By summing the currents from the negative resistance circuit and the nMOS load circuit, the proposed circuit produces a larger voltage difference between the low and high states. The proposed circuit accommodates variation in device characteristics by sequentially switching the LE signal from LE<3:0>="0000" to "1111". The right side of Fig. 5 shows the dynamic load current characteristics at all steps.
Figure 6 shows transient simulation results of the proposed circuit at the TT corner and 25 °C. In this simulation, the LE signal is stepped from "0000" to "1111" every 100 ns. If the target cell datum is 1 (AP state), the oscillator stops at LE<3:0>="0000" because the input voltage becomes higher than the threshold voltage of the oscillator. If the target cell datum is 0 (P state), the oscillation continues until LE<3:0>="1011". Thus, the data can be distinguished by counting the number of oscillations.
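To make the LE stepping concrete, the following Python sketch lists the 16 LE codes with their start times and the relative drive strength of the enabled boosting nMOSes. It assumes that LE<0> enables the weight-1 device and LE<3> the weight-8 device, and that the codes are stepped in binary counting order; neither mapping is stated in the text. The names `WEIGHTS`, `relative_drive`, and `le_schedule` are hypothetical.

```python
# Relative current drive of the four boosting nMOSes (ratio 1:2:4:8).
# Assumption: index 0 corresponds to LE<0>, index 3 to LE<3>.
WEIGHTS = (1, 2, 4, 8)


def relative_drive(le_code: int) -> int:
    """Sum the weights of the boosting nMOSes enabled by a 4-bit LE code."""
    return sum(w for bit, w in enumerate(WEIGHTS) if (le_code >> bit) & 1)


def le_schedule(step_ns: float = 100.0):
    """List (start time in ns, LE<3:0> string, relative drive) for 0000..1111."""
    return [
        (code * step_ns, format(code, "04b"), relative_drive(code))
        for code in range(16)
    ]


for start_ns, le_bits, drive in le_schedule():
    print(f"t = {start_ns:6.0f} ns  LE<3:0> = {le_bits}  relative drive = {drive}")
```

Under these assumptions, the binary weighting gives 16 evenly spaced drive levels, so each LE step shifts the load curve by one unit of the smallest device.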
Simulation Results
We simulated the proposed circuit at all process corners (TT, FF, FS, SF, and SS) and at temperatures of −20 °C, 25 °C, and 100 °C. The operating voltage is 0.4 V. For the MTJ, we set the MR ratio to 100%, with a resistance of 3.5 kΩ in the parallel state and 7 kΩ in the anti-parallel state. We evaluated the bitline digitize circuit with the dynamic load circuit for accuracy, readout time, and energy consumption.
Table 1 shows the count of the ring oscillator in each condition. This result shows that the proposed circuit can distinguish between the P and AP states at a VDD of 0.4 V. However, particularly at the SS corner, the count difference between the P and AP states is small. This implies that the proposed circuit needs an LE switching time of more than 100 ns.
© 2016 Information Processing Society of Japan
Table 2 shows the energy consumption and cycle time of the read operation. In this simulation, we assumed that the read operation finishes when the difference in count between the two states reaches 10. At the TT corner and 25 °C, the readout time is 271 ns, and the energy consumption is 1.23 pJ for a "0" read and 1.05 pJ for a "1" read. At −20 °C, the readout time is longer than at the other temperatures, which increases the energy consumption relative to the other temperature conditions.
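As a consistency check on the MTJ parameters used in these simulations, the stated resistances agree with the stated MR ratio under the usual definition of the MR ratio. The symbols R_P and R_AP are introduced here for the parallel and anti-parallel resistances and do not appear in the original text.

```latex
\mathrm{MR} = \frac{R_{AP} - R_{P}}{R_{P}}
            = \frac{7\,\mathrm{k\Omega} - 3.5\,\mathrm{k\Omega}}{3.5\,\mathrm{k\Omega}}
            = 1 = 100\%
```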
Chip Implementation and Conclusion
We fabricated a 4-Mb STT-MRAM using a 65-nm process technology. The left side of Fig. 7 shows the layout of the test chip, and the right side shows a TEM micrograph of the MTJ. The area of the proposed circuit is 180 µm², which corresponds to an area overhead of 0.53% for the 1-Mb macro. In this test chip, a charge pump circuit provides a 1.6-V boosted voltage to the gate of the access transistor. This suppresses the effects of threshold voltage variation in the access transistor and of cell current variation, and thus draws more readout current. In the TEM micrograph, the thin white area is the tunnel insulating film. The free layer and pinned layer are composed almost entirely of CoFeB.
In this paper, we proposed a counter-based read circuit for STT-MRAM operating at 0.4 V. Simulations confirm that the proposed circuit operates at all process corners and three temperature conditions at a VDD of 0.4 V. At the TT corner and 25 °C, the cycle time is 271 ns, and the energy consumption is 1.23 pJ for a "0" read operation and 1.05 pJ for a "1" read operation.

