Introduction
The implementation of a nonvolatile field-programmable gate array (NV-FPGA) using spin-transfer-torque magnetic tunnel junction (STT-MTJ) devices is a promising approach to ultra-low-power FPGAs, and we have previously proposed a compact STT-MTJ-based nonvolatile lookup table (NV-LUT) circuit [1] [2] [3]. The only drawback of the STT-MTJ-based LUT circuit is its somewhat complicated circuit design. This stems from the two-terminal structure of the STT-MTJ device, in which the read current and the write current pass through the same path [4]. This paper presents a new design paradigm for spintronics-based nonvolatile FPGAs using three-terminal MTJ (3T-MTJ) devices [5] [6]. Since the read and write paths of a 3T-MTJ device are electrically separated, its use not only expands the design space for both read and write operations but also simplifies the nonvolatile logic circuitry. In fact, the proposed 6-input NV-LUT circuit achieves a 65% reduction in transistor count and a 40% reduction in active power compared with a conventional CMOS-based implementation in which the volatile SRAM cells are replaced by MTJ-based nonvolatile SRAM cells [7].
Nonvolatile LUT Circuit with Single-Ended Structure
Figure 1 shows a typical 3T-MTJ device that utilizes domain-wall motion [5]. The resistance $R_{\text{MTJ}}$ depends on the spin direction of the sense layer, taking either a low value $R_{\text{P}}$ or a high value $R_{\text{AP}}$, and it can be changed by applying a write current $I_{\text{AP-P}}$ or $I_{\text{P-AP}}$. Thus, the 3T-MTJ device can be regarded as a variable resistor whose read-current path and write-current path are separated. The relative difference between $R_{\text{P}}$ and $R_{\text{AP}}$, called the TMR ratio, is defined as $\mathrm{TMR} = (R_{\text{AP}} - R_{\text{P}})/R_{\text{P}}$, and TMR ratios exceeding 6.0 have been reported [8]. The most important point is that such a high TMR ratio can be fully exploited in the 3T-MTJ device, because the write current, which flows in a path separate from the read path, is not affected by $R_{\text{MTJ}}$.
Figure 2 (a) shows a block diagram of an NV-LUT circuit using STT-MTJ devices [1] [2] [3]. Since its read current $I_{\text{RD}}$ and TMR ratio are limited by the write current $I_{\text{WR}}$ [4], this NV-LUT circuit is implemented using differential-pair-based circuitry that strongly amplifies the small difference between $I_{\text{RD}}$ and a reference current $I_{\text{REF}}$. In contrast, the proposed LUT circuit shown in Fig. 2 (b) directly uses the voltage division between a pull-up resistor and an NMOS/MTJ tree for logic operation. Since the read and write paths of a 3T-MTJ device are separated, $I_{\text{RD}}$ and the TMR ratio are not limited by $I_{\text{WR}}$, so a sufficient sense margin can be achieved by increasing them. Because no reference tree is needed and the sense amplifier is simply two cascaded inverters, the proposed LUT circuit requires fewer transistors in total than the differential-pair-based one.
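As a rough illustration of the single-ended read in Fig. 2 (b), the sketch below treats the read path as an ideal voltage divider and computes how the difference in $V_{\text{RD}}$ between the P and AP states grows with the TMR ratio. It is a simplification under stated assumptions: the pull-up is modeled as a linear resistor (the proposed circuit uses a PMOS pull-up), the NMOS selector tree is lumped into a fixed series resistance, and all component values are hypothetical rather than taken from this work.

```python
# Minimal sketch: single-ended read modeled as an ideal resistive divider.
# Hypothetical assumptions (not from the paper): linear pull-up resistance,
# NMOS selector tree lumped into a fixed series on-resistance, and
# illustrative component values only.


def tmr_ratio(r_p: float, r_ap: float) -> float:
    """TMR ratio = (R_AP - R_P) / R_P."""
    return (r_ap - r_p) / r_p


def v_rd(vdd: float, r_pu: float, r_sel: float, r_mtj: float) -> float:
    """Divider output: pull-up to VDD on top, selector + MTJ to ground below."""
    r_tree = r_sel + r_mtj
    return vdd * r_tree / (r_pu + r_tree)


def read_swing(vdd: float, r_pu: float, r_sel: float, r_p: float, tmr: float) -> float:
    """Difference in V_RD between the AP (high-R) and P (low-R) states."""
    r_ap = r_p * (1.0 + tmr)
    return v_rd(vdd, r_pu, r_sel, r_ap) - v_rd(vdd, r_pu, r_sel, r_p)


if __name__ == "__main__":
    VDD, R_PU, R_SEL, R_P = 1.0, 10e3, 2e3, 5e3  # hypothetical values (V, ohm)
    for tmr in (1.0, 2.0, 6.0):
        swing = read_swing(VDD, R_PU, R_SEL, R_P, tmr)
        print(f"TMR = {tmr:.1f}: V_RD swing = {swing:.3f} V")
```

With these placeholder values the swing grows monotonically from TMR = 1.0 to 6.0, which is the qualitative trend the argument relies on; a real design would replace the linear pull-up with the PMOS characteristic.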
There are two key requirements for the proposed single-ended LUT circuit design shown in Fig. 3 (a). One is to reduce the active power dissipated by the DC read current $I_{\text{RD}}$. The other is to maximize the difference in the output voltage $V_{\text{RD}}$, produced by resistive division between a pull-up resistor and an NMOS/MTJ tree, between the low- and high-resistance states, since this difference affects both the sense margin and the switching delay of the sense amplifier.
Figure 3 (b) shows the proposed technique for active power reduction. Two cascaded inverters (INV1, INV2) are used not only to amplify $V_{\text{RD}}$ but also, by forming a feedback loop from INV2 to INV1, as a latch that holds the output voltage level. Once the output voltage level is held by the cross-coupled inverters, the read current $I_{\text{RD}}$ can be completely cut off by the current-control switch. As a result, the read path is active only during $T_{\text{EN}}$, which depends on the switching delay of the sense amplifier, and the active power dissipation is reduced.
Figure 3 (c) shows the proposed resistive-division scheme for enlarging the $V_{\text{RD}}$ difference. A 3T-MTJ device is connected to the source side of an NMOS transistor so that the total impedance of the NMOS/MTJ tree is boosted by the source degeneration effect [9]. Moreover, using a PMOS transistor as the pull-up resistor further enlarges the $V_{\text{RD}}$ difference compared with a fixed pull-up resistor.
Figure 4 shows a circuit diagram and a layout of the proposed 2-input LUT circuit. A level-sensitive latch is used both for sense amplification and for holding the output voltage level, as described above. The truth table of a two-input logic function is stored in four MTJ devices ($R_0$, $R_1$, $R_2$, $R_3$) in the 2-input selector tree. In write operation, the word lines (WLs) and bit lines (BLs) are activated and a bidirectional write current $I_{\text{WR}}$ is applied to the corresponding MTJ device. Logic operation is performed by driving EN high and applying the logic inputs ($X_0$, $X_1$).
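The read behavior of the 2-input LUT in Fig. 4 can be summarized behaviorally: the logic inputs select one of the four stored resistances $R_0$ to $R_3$, the selected resistance sets $V_{\text{RD}}$ through the divider, and the cascaded inverters resolve $V_{\text{RD}}$ to a logic level that the latch then holds so that the read current can be cut off. The sketch below is not a circuit-level model and rests on hypothetical assumptions: a stored 1 is written as the AP state, the selector picks index $2X_1 + X_0$, the pull-up and selector are linear resistors, INV1 switches at $V_{\text{DD}}/2$, and the EN timing is reduced to a single flag.

```python
from typing import Sequence


class SingleEndedLUT2:
    """Behavioral model of a 2-input single-ended NV-LUT read (illustrative only).

    Hypothetical assumptions: stored 1 -> AP (high-R) state, selector picks
    R[2*X1 + X0], linear pull-up and selector resistances, INV1 threshold VDD/2.
    """

    def __init__(self, r_p: float = 5e3, r_ap: float = 35e3,
                 r_pu: float = 10e3, r_sel: float = 2e3, vdd: float = 1.0) -> None:
        self.r_p = r_p          # parallel (low) resistance, ohm
        self.r_ap = r_ap        # antiparallel (high) resistance, ohm (TMR = 6.0)
        self.r_pu = r_pu        # pull-up resistance, ohm (linear approximation)
        self.r_sel = r_sel      # lumped selector-tree on-resistance, ohm
        self.vdd = vdd          # supply voltage, V
        self.cells = [r_p] * 4  # R0..R3, all initially in the P state
        self.out = 0            # value held by the INV1/INV2 latch

    def write(self, truth_table: Sequence[int]) -> None:
        """Program R0..R3 from a 4-entry truth table (assumed: 1 -> AP, 0 -> P)."""
        if len(truth_table) != 4:
            raise ValueError("a 2-input LUT stores exactly 4 bits")
        self.cells = [self.r_ap if bit else self.r_p for bit in truth_table]

    def read(self, x0: int, x1: int, en: bool = True) -> int:
        """With EN high, evaluate and latch; with EN low, return the held value."""
        if en:
            r_tree = self.r_sel + self.cells[2 * x1 + x0]
            v_rd = self.vdd * r_tree / (self.r_pu + r_tree)
            # INV1 followed by INV2 is non-inverting: high V_RD -> output 1.
            self.out = 1 if v_rd > self.vdd / 2 else 0
        # Once latched, the read current can be cut off; the output is held.
        return self.out


if __name__ == "__main__":
    lut = SingleEndedLUT2()
    lut.write([0, 1, 1, 0])  # XOR, indexed by 2*X1 + X0
    print([lut.read(x0, x1) for x1 in (0, 1) for x0 in (0, 1)])  # prints [0, 1, 1, 0]
```

Keeping the latch state in the object mirrors the idea that the output survives after the read current is switched off; the threshold comparison stands in for the sense amplifier.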
Since the MTJ devices are stacked over the CMOS plane, the effective area of the LUT circuit is small (31.1 μm² in a 90 nm CMOS/MTJ technology).
Evaluations
Table 1 summarizes the performance comparisons of three 6-input LUT circuits: a conventional CMOS-based one, a differential-pair-based one, and the proposed one. Note that the TMR ratio of the STT-MTJ device is set to 2.0 because it is limited by the write current [4], whereas that of the 3T-MTJ device is not limited in this way and can be as high as 6.0. Since only one simple sense amplifier is required, the proposed LUT circuit has the smallest transistor count. Its active power is also small because the wasted DC read current is completely cut off. As shown in Fig. 5, the advantages of the proposed circuit over the CMOS-based one are maintained even when $V_{\text{DD}}$ is reduced.
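The effect of cutting off the DC read current after $T_{\text{EN}}$ can be estimated with simple arithmetic: if $I_{\text{RD}}$ flows only for $T_{\text{EN}}$ instead of the whole evaluation period, the static read energy scales by $T_{\text{EN}}/T_{\text{CLK}}$. In the sketch below, $T_{\text{CLK}}$ (a hypothetical evaluation period) and every numeric value are placeholders, not the data summarized in Table 1, and dynamic switching power is ignored.

```python
def read_energy(vdd: float, i_rd: float, t_on: float) -> float:
    """Energy (J) drawn from VDD by a DC read current I_RD flowing for t_on seconds."""
    return vdd * i_rd * t_on


def static_read_power(vdd: float, i_rd: float, t_en: float, t_clk: float,
                      cut_off: bool = True) -> float:
    """Average static read power (W) over one evaluation period T_CLK.

    With cut-off, I_RD flows only during T_EN (the latch then holds the output);
    without it, I_RD flows for the whole period. Dynamic power is not modeled.
    """
    t_on = min(t_en, t_clk) if cut_off else t_clk
    return read_energy(vdd, i_rd, t_on) / t_clk


if __name__ == "__main__":
    # Hypothetical values: 1.0 V supply, 20 uA read current, 1 ns / 10 ns.
    VDD, I_RD, T_EN, T_CLK = 1.0, 20e-6, 1e-9, 10e-9
    p_cont = static_read_power(VDD, I_RD, T_EN, T_CLK, cut_off=False)
    p_cut = static_read_power(VDD, I_RD, T_EN, T_CLK)
    print(f"continuous: {p_cont * 1e6:.1f} uW, with cut-off: {p_cut * 1e6:.1f} uW")
```

For example, with $V_{\text{DD}} = 1.0$ V, $I_{\text{RD}} = 20$ µA, $T_{\text{EN}} = 1$ ns and $T_{\text{CLK}} = 10$ ns, the static read power falls from 20 µW to 2 µW.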

Conclusions
In this paper, a new LUT circuit that combines a single-ended structure with 3T-MTJ devices has been presented. As a future prospect, it is important to consider how the benefits of the proposed circuit can also be exploited in write operation.
Figure 3 (c): Resistive-division scheme using a PMOS pull-up resistor.
Table 1. Performance comparisons of 6-input LUT circuits.
