For the conventional spin-transfer torque random access memory, tradeoffs exist between read margin and write energy because both read and write currents pass through the same magnetic tunnel junction. To improve the read/write performance and reduce the read disturb rate, three-terminal memory cell structures are investigated and the tradeoffs among read and write performance metrics are explored. A uniform memory array-level benchmarking is performed to compare various spintronic write mechanisms, including spin diffusion, spin Hall effect, domain wall motion, and magnetoelectric (ME) effect. Results show that three-terminal memory cells have the advantage of a small write energy dissipation, and up to two orders of magnitude reduction in the energy-delay product is projected for the domain wall and ME-based memory cells.
I. INTRODUCTION
SPIN-TRANSFER torque random access memory (STT-RAM) is one of the promising candidates for nonvolatile memory [1]-[3]. An STT-RAM cell stores data in the free magnet layer of a magnetic tunnel junction (MTJ). If the magnetization of the free magnet in the MTJ points in the same direction as that of the fixed magnet, the cell is in the parallel configuration and has a small resistance; if the magnetization of the free magnet points in the opposite direction (antiparallel configuration), the cell has a large resistance. The resistance difference can be sensed by passing a bias current through the MTJ and comparing the resulting voltage with that of a reference MTJ through a sense amplifier [4]. Several proposals have shown the implementation and benefits of using STT-RAM as an on-chip cache replacement thanks to its high density, low standby power, and competitive latency and endurance [5], [6]. Other studies also show its application as main memory with small energy dissipation [7]. The nonvolatility of STT-RAM enables a wide range of applications in nonvolatile computing [8]. Minimizing the read and write access energy is crucial in energy-harvesting applications, which have a limited power supply [9]. One of the challenges of STT-RAM is the large critical current required to switch the magnetization of a magnet. Since both read and write currents pass through the same MTJ, tradeoffs exist between the read disturb rate and the write energy. Previous experimental work shows that the tunneling magnetoresistance (TMR) ratio of an MTJ increases with its tunneling oxide thickness [10]. Therefore, for good array-level performance, the oxide must be sufficiently thick to achieve: 1) a large TMR ratio that creates a voltage swing detectable by a sense amplifier and 2) a small read current that lowers the read disturb rate. However, as the oxide thickness increases, the MTJ resistance rises.
Hence, during the write operation, the write energy associated with the Joule heating in the MTJ and the access transistor increases significantly.
To improve the read and write performance and reduce the read disturb rate, three-terminal memory cells based on the spin Hall effect (SHE), domain wall motion, and the magnetostrictive effect have been studied [5], [6], [11]-[13]. In this paper, for the first time, a variety of spintronic memory arrays with different current- and voltage-based writing mechanisms, covering spin diffusion (SD), SHE, domain wall (DW) motion, and the magnetoelectric (ME) effect, are investigated and compared in a uniform way. Compact analytical models are developed to optimize the array-level read and write performance efficiently. Optimal design parameters, such as the tunneling oxide thickness of the MTJs and the write voltage, are obtained to achieve the minimum energy-delay product (EDP) during read and write operations under constraints including the minimum sensing swing and the read disturb rate. Various TMR ratios are also studied to quantify the potential read performance improvement. In addition, the three-terminal spintronic memory and the conventional STT-RAM are compared in terms of array-level performance metrics, including delay and energy per read/write operation, read disturb rate, memory density, and fabrication complexity. The impact of MTJ process variation on the read and write performance is also investigated.
The rest of the paper is organized as follows. Section II illustrates the structure and layout of the three-terminal spintronic memory cell. Section III describes the read and write performance modeling approaches. Section IV compares and optimizes the read and write operations in terms of delay, energy, and read disturb rate for a variety of spintronic writing mechanisms. Conclusions are summarized in Section V.
II. THREE-TERMINAL MEMORY CELL IMPLEMENTATION
Fig. 1 shows schematics of spintronic memory cells based on four writing mechanisms: SD, SHE, DW motion, and ME effect. Separating the read and write paths reduces the write energy because the write current no longer passes through the resistive MTJ. In addition, an MTJ with a thick oxide can be used to reduce the read current and read disturb rate at a given read voltage without increasing the write energy. Furthermore, a thick tunneling oxide leads to a large TMR and resistance-area product, and potentially improves the sensing margin because the total resistance is dominated by the MTJ rather than by the interconnects and the pass transistor. These benefits come from the extra terminal dedicated to the write operation. For the current-driven spintronic memory cells, an additional transistor is required to isolate the read and write current paths and avoid sneak paths. For the voltage-controlled ME memory cell, the insulating antiferromagnet layer separates the read and write current paths; therefore, a single access transistor is adequate. As a result, the footprint area of the current-driven memory cells is 50% larger than that of the conventional STT-RAM and voltage-controlled ME memory cells.
Fig. 2 shows the layout and cross-sectional views of various types of spintronic memories, where F equals the half metal pitch of 30 nm. For the conventional STT-RAM and the voltage-controlled spintronic memory cell, a two-finger transistor design is used to provide a large driving capability. Fig. 2(d) and (e) are cross-sectional views at dashed lines A and B for the voltage-controlled memory cell in Fig. 2(c). For the current-driven spintronic memory cell shown in Fig. 2(b), the cross-sectional views at dashed line A for different writing mechanisms are shown in Fig. 2(e)-(g). The extra write terminal increases the fabrication complexity. For instance, with 193 nm lithography, a two-terminal STT-RAM cell needs three lithography-etch steps and three masks to form the vias to the transistor, the MTJ stacks, and the sourceline and bitline interconnects, whereas a three-terminal cell requires six lithography-etch steps and five masks. Adding the two masks that create the active layers and gates of the transistors, the total number of masks increases from five to seven. From the layout views in Fig. 2(a)-(c), the via densities of the conventional STT-RAM, the three-terminal current-driven memory, and the three-terminal voltage-driven memory are 208.3, 231.5, and 277.8 µm⁻², respectively.
III. MODELING APPROACHES
A. READ PERFORMANCE MODELING
The read operations are identical for all types of memory cells. The MTJ read circuitry is adopted from the default sensing circuitry presented in [14] .
The read delay can be estimated as
$$t_{\mathrm{read}} = t_{\mathrm{WL}} + t_{\mathrm{sense}} \qquad (1)$$
where $t_{\mathrm{WL}} = 0.7 R_{\mathrm{drive}} C_{\mathrm{WL}} + 0.4 R_{\mathrm{WL}} C_{\mathrm{WL}}$ is the wordline delay, assuming the wordline is driven by a 5× minimum-sized inverter (W = 20F) with output resistance $R_{\mathrm{drive}}$; $R_{\mathrm{WL}}$ is the wordline interconnect resistance, which accounts for size effects on Cu resistivity with a grain boundary reflectivity of 0.15 and a surface specularity of 0 [15]; $C_{\mathrm{WL}}$ is the wordline interconnect capacitance; and $t_{\mathrm{sense}}$ is the delay associated with charging or discharging the bitline plus the sense amplifier delay, obtained from SPICE simulation. The bitline capacitance in the simulation is $C_{\mathrm{BL}} = c_w l_{\mathrm{BL}} + C_{\mathrm{tran}} N_{\mathrm{bit}}$, which includes both the bitline interconnect capacitance and the capacitance of the transistors attached to the bitline. The minimum required sensing voltage swing at the input of the sense amplifier is assumed to be 50 mV.
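The two closed-form expressions above can be evaluated directly. The sketch below is illustrative only: the function names are hypothetical, and every numeric input in the usage lines (driver resistance, wire RC, per-unit-length capacitance, bitline length) is a placeholder, not a value from this paper. The 0.7 factor weights the lumped driver RC and the 0.4 factor the distributed wire RC, as in the expression for $t_{\mathrm{WL}}$.

```python
# Sketch of t_WL and C_BL from Sec. III-A (hypothetical helper names and inputs).

def wordline_delay(r_drive: float, r_wl: float, c_wl: float) -> float:
    """t_WL = 0.7*R_drive*C_WL + 0.4*R_WL*C_WL, in seconds.

    0.7 weights the lumped driver RC term, 0.4 the distributed wire RC term.
    """
    return 0.7 * r_drive * c_wl + 0.4 * r_wl * c_wl


def bitline_capacitance(c_w: float, l_bl: float, c_tran: float, n_bit: int) -> float:
    """C_BL = c_w*l_BL + C_tran*N_bit, in farads (c_w is capacitance per unit length)."""
    return c_w * l_bl + c_tran * n_bit


if __name__ == "__main__":
    # Placeholder inputs: 2 kOhm driver, 500 Ohm / 20 fF wordline.
    t_wl = wordline_delay(r_drive=2e3, r_wl=500.0, c_wl=20e-15)
    # Placeholder inputs: 0.2 fF/um wire, 10 um bitline, 0.1 fF per cell, 256 cells.
    c_bl = bitline_capacitance(c_w=0.2e-15 / 1e-6, l_bl=10e-6, c_tran=0.1e-15, n_bit=256)
    print(f"t_WL = {t_wl * 1e12:.1f} ps, C_BL = {c_bl * 1e15:.2f} fF")
```

With these placeholder numbers the transistor term dominates $C_{\mathrm{BL}}$, which illustrates why $N_{\mathrm{bit}}$ enters the bitline capacitance explicitly.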
The sense amplifier circuit is adopted from previous work, and the CMOS device model follows the 16 nm ASU Predictive Technology Model (PTM) [16], [17]. The read energy is written as
where $E_{\mathrm{WL}} = 0.5\,(C_{\mathrm{WL}}/N_{\mathrm{bit}} + C_{\mathrm{tran}})\,V_{\mathrm{read}}^2$ is the per-cell switching energy associated with the wordline and pass transistor, $N_{\mathrm{bit}}$ is the number of bits in a row/column, $V_{\mathrm{read}}$ is the read voltage of 1 V, $I_{\mathrm{bias}}$ is the average read current from SPICE simulation, $C_{\mathrm{RBL}}$ is the read bitline capacitance, and $E_{\mathrm{SA}} = P_{\mathrm{SA}} t_{\mathrm{read}}$ is the sense amplifier energy based on SPICE simulation.
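The two sub-terms defined above, $E_{\mathrm{WL}}$ and $E_{\mathrm{SA}}$, can be sketched as follows. This covers only those two terms; the remaining terms of the read-energy expression are not reproduced here. Function names and all numeric inputs in the usage lines are hypothetical placeholders.

```python
# Sketch of two read-energy sub-terms (hypothetical helper names and inputs).

def wordline_energy_per_cell(c_wl: float, n_bit: int, c_tran: float,
                             v_read: float = 1.0) -> float:
    """E_WL = 0.5 * (C_WL / N_bit + C_tran) * V_read**2, in joules.

    The wordline capacitance is shared by the N_bit cells on the row,
    so each cell is charged C_WL / N_bit plus its own pass-transistor load.
    """
    return 0.5 * (c_wl / n_bit + c_tran) * v_read ** 2


def sense_amp_energy(p_sa: float, t_read: float) -> float:
    """E_SA = P_SA * t_read, in joules (average sense-amplifier power times read delay)."""
    return p_sa * t_read


if __name__ == "__main__":
    # Placeholder inputs: 20 fF wordline, 256 bits, 0.1 fF per transistor, V_read = 1 V.
    print(wordline_energy_per_cell(20e-15, 256, 0.1e-15))
    # Placeholder inputs: 10 uW sense-amplifier power, 1 ns read delay.
    print(sense_amp_energy(10e-6, 1e-9))
```

Because $E_{\mathrm{SA}}$ scales with $t_{\mathrm{read}}$, any increase in read delay shows up directly in the read energy, which is the effect discussed in Section IV-A for thick oxides.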
The read disturb rate is calculated as [18]
$$P_{\mathrm{RD}} = 1 - \exp\left(-\frac{t_{\mathrm{read}}}{\tau}\right) \qquad (3)$$
where
$$\tau = \tau_0 \exp\left[\Delta\left(1 - \frac{I_{\mathrm{read}}}{I_c}\right)\right],$$
$\tau_0$ is the attempt period of 1 ns [18], $\Delta$ is the thermal stability factor, $I_{\mathrm{read}}$ is the read bias current, $t_{\mathrm{read}}$ is the read delay, and $I_c$ is the critical charge current to switch the magnet, which is estimated following Nikonov's work [19]. Table I lists the simulation parameters and assumptions, including properties of the in-plane magnetic anisotropy (IMA) and perpendicular magnetic anisotropy (PMA) materials. The resistance-area product and the baseline TMR ratio for MTJs with various oxide thicknesses are taken from previous experimental work [10]. Since the maximum TMR observed experimentally at room temperature is 604% [20], MTJs with up to 4× the baseline TMR (maximum ∼150% [10]) are investigated in this paper.
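A minimal sketch of the read disturb calculation, assuming the standard thermal-activation form $P = 1 - \exp(-t_{\mathrm{read}}/\tau)$ with $\tau = \tau_0 \exp[\Delta(1 - I_{\mathrm{read}}/I_c)]$, which uses exactly the quantities defined above. The function name and the operating point in the usage lines ($I_c$ = 100 µA, $\Delta$ = 60, $t_{\mathrm{read}}$ = 2 ns) are hypothetical; only $\tau_0$ = 1 ns comes from the text.

```python
import math


def read_disturb_rate(i_read: float, i_c: float, delta: float, t_read: float,
                      tau0: float = 1e-9) -> float:
    """Probability that a single read of duration t_read flips the free magnet.

    Assumed thermal-activation model:
        tau = tau0 * exp(delta * (1 - I_read / I_c))
        P   = 1 - exp(-t_read / tau)
    """
    if not 0.0 <= i_read < i_c:
        raise ValueError("the model assumes 0 <= I_read < I_c")
    tau = tau0 * math.exp(delta * (1.0 - i_read / i_c))
    # -expm1(-x) evaluates 1 - exp(-x) accurately when x is tiny.
    return -math.expm1(-t_read / tau)


if __name__ == "__main__":
    # Hypothetical operating point: I_c = 100 uA, Delta = 60, t_read = 2 ns.
    for i_read in (20e-6, 50e-6):
        print(f"I_read = {i_read * 1e6:.0f} uA -> P = {read_disturb_rate(i_read, 100e-6, 60.0, 2e-9):.3e}")
```

The exponential dependence on $I_{\mathrm{read}}/I_c$ explains why a thicker oxide (smaller read current) lowers the disturb rate, while the linear dependence on $t_{\mathrm{read}}$ explains why the benefit saturates once the read delay grows.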
B. WRITE PERFORMANCE MODELING
1) CONVENTIONAL STT MEMORY
The write delay is dominated by the wordline delay and the magnet switching time, and is written as
$$t_{\mathrm{write}} = t_{\mathrm{WL}} + t_{\mathrm{BL}} + t_{\mathrm{mag}} \qquad (4)$$
where $t_{\mathrm{WL}}$ is the wordline delay used in the read delay (1), $t_{\mathrm{BL}}$ is the bitline delay, and $t_{\mathrm{mag}}$ is the magnet switching time following Nikonov's work [19], which is written as:
where $M_s$ is the saturation magnetization, $V$ is the volume of the magnet, $e$ is the elementary charge, $\mu_B$ is the Bohr magneton, $E_b$ is the thermal barrier, $I_c$ is the critical switching current of a magnet based on [19], $I_s = \beta I_{\mathrm{write}}$ is the spin-polarized current,
is the write charge current, $V_{\mathrm{write}}$ is the write voltage applied on the write bitline at the edge of the memory array, $\beta$ is the spin injection coefficient, and $R_{\mathrm{MTJ}}$ is the average resistance of the parallel and antiparallel configurations, obtained from experimental data [10]. Here, a PMA magnet is used based on the assumptions listed in Table I. The write energy is expressed as
2) SPIN DIFFUSION-BASED MEMORY
The basic writing mechanism is adopted from the all-spin logic proposed in [21]. Applying a write voltage to the input magnet on the left polarizes the current passing through it, generating spin-polarized currents that diffuse toward the ground and toward the free magnet on the right. These two spin-polarized currents satisfy the SD equations $\partial^2 \mu_s/\partial x^2 = \mu_s/l_{sf}^2$ and $J_s = (\sigma/e)\,\partial \mu_s/\partial x$, where $\mu_s$ is the spin accumulation, $J_s$ is the spin-polarized current density, $l_{sf}$ is the spin relaxation length of 400 nm, $\sigma$ is the interconnect conductivity, and $e$ is the elementary charge. The boundary conditions are as follows: the spin accumulation at the output magnet is zero, and the input spin-polarized current is $\beta I_{\mathrm{write}}$, where $I_{\mathrm{write}}$ is the charge current and $\beta$ is the spin injection coefficient. Solving the equations with these boundary conditions, the spin-polarized current received at the output magnet is derived as
where $l_c$ is the channel length of 4F and $l_g$ is the length of the ground path. Two types of magnets are studied: IMA and PMA. The critical switching current and the corresponding switching delay are adopted from Nikonov's work [19], and the key assumptions are listed in Table I. Since the write current flows only through the magnet rather than through the cell MTJ, $R_{\mathrm{MTJ}}$ in the write energy (6) is replaced by the ferromagnet resistance $R_f$, which is much smaller than the MTJ resistance and leads to a major energy saving.
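To illustrate how the spin diffusion equation produces an attenuation factor, the sketch below solves a simplified two-branch problem. It is our own derivation under stated assumptions, not necessarily the expression used in this paper: the spin current $\beta I_{\mathrm{write}}$ is injected at one node, a channel of length $l_c$ leads to the output magnet and a ground path of length $l_g$ leads to ground, both branches share the same cross-section and conductivity, and $\mu_s = 0$ at both far ends. Solving $\partial^2 \mu_s/\partial x^2 = \mu_s/l_{sf}^2$ in each branch then gives $I_{\mathrm{out}} = \beta I_{\mathrm{write}} / [\cosh(l_c/l_{sf}) + \sinh(l_c/l_{sf})\coth(l_g/l_{sf})]$. The value of $\beta$ and the ground-path length in the usage lines are hypothetical.

```python
import math


def sd_output_spin_current(i_write: float, beta: float, l_c: float, l_g: float,
                           l_sf: float = 400e-9) -> float:
    """Spin current reaching the output magnet in a simplified 1D two-branch model.

    Assumptions (ours): equal cross-section and conductivity in both branches,
    mu_s = 0 at the output magnet and at ground, injection of beta*I_write at
    the junction node. Each branch carries mu_s ~ sinh((L - x) / l_sf).
    """
    x_c = l_c / l_sf
    x_g = l_g / l_sf
    injected = beta * i_write
    return injected / (math.cosh(x_c) + math.sinh(x_c) / math.tanh(x_g))


if __name__ == "__main__":
    feature = 30e-9           # F = 30 nm (half metal pitch)
    l_c = 4 * feature         # channel length of 4F, as in the text
    l_g = 120e-9              # hypothetical ground-path length
    beta = 0.5                # hypothetical spin injection coefficient
    i_out = sd_output_spin_current(100e-6, beta, l_c, l_g)
    print(f"fraction of I_write reaching the output magnet: {i_out / 100e-6:.3f}")
```

Two limits serve as sanity checks: for a very long ground path the result reduces to $\beta I_{\mathrm{write}} e^{-l_c/l_{sf}}$, and for $l_g = l_c$ the injected spin current splits evenly and each half decays by $1/\cosh(l_c/l_{sf})$.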
3) SPIN HALL EFFECT-BASED MEMORY
The basic writing mechanism is based on the SHE, which converts charge currents into spin currents through spin-orbit coupling [22]. The spin-polarized current density is $J_s = \theta J_c$, where $\theta$ is the spin Hall angle of 0.3 [23] and $J_c$ is the charge current density. For a given write voltage $V_{\mathrm{write}}$, the spin-polarized current $I_s$ is obtained, and the magnet switching time $t_{\mathrm{mag}}$ is estimated by (5). Equation (6) is used to calculate the write energy by replacing the MTJ resistance $R_{\mathrm{MTJ}}$ with $R_{\mathrm{SHE}}$, the SHE material resistance of 150 Ω. Here, IMA magnets are used, whose parameters are listed in Table I. Note that spin-orbit torque switching has only been demonstrated experimentally in relatively large devices. The predictive model used in this paper assumes that the device is scalable, so that its potential performance can be benchmarked against other spintronic memory devices.
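The relation $J_s = \theta J_c$ converts a charge current density into a spin current density, so the spin current delivered to the magnet depends on geometry. The sketch below assumes a hypothetical geometry that the text does not specify: a 30 nm wide, 3 nm thick SHE strip under a 30 × 90 nm² magnet footprint, with the charge current set by $V_{\mathrm{write}}$ across $R_{\mathrm{SHE}}$ = 150 Ω plus an optional series resistance. Only $\theta$ = 0.3 and $R_{\mathrm{SHE}}$ = 150 Ω come from the text.

```python
def she_spin_current(v_write: float, r_she: float = 150.0, r_series: float = 0.0,
                     theta: float = 0.3,
                     strip_width: float = 30e-9, strip_thickness: float = 3e-9,
                     magnet_width: float = 30e-9, magnet_length: float = 90e-9) -> float:
    """Spin current injected into the free magnet by the SHE strip (hypothetical geometry).

    I_c = V_write / (R_SHE + R_series)          charge current in the strip
    J_c = I_c / (strip_width * strip_thickness) charge current density
    J_s = theta * J_c                           relation stated in the text
    I_s = J_s * magnet_width * magnet_length    spin current over the magnet footprint
    """
    i_charge = v_write / (r_she + r_series)
    j_charge = i_charge / (strip_width * strip_thickness)
    j_spin = theta * j_charge
    return j_spin * magnet_width * magnet_length


if __name__ == "__main__":
    v = 0.15  # hypothetical write voltage across the strip only
    i_c = v / 150.0
    i_s = she_spin_current(v)
    print(f"I_c = {i_c * 1e3:.2f} mA, I_s = {i_s * 1e3:.2f} mA, gain = {i_s / i_c:.1f}")
```

When the magnet is as wide as the strip, the ratio $I_s/I_c$ reduces to $\theta\, l_{\mathrm{magnet}}/t_{\mathrm{SHE}}$, which is 9 for the assumed 90 nm magnet on a 3 nm strip; this geometric gain is why SHE writing can work with a modest charge current.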
4) DOMAIN WALL MOTION-BASED MEMORY
The basic writing mechanism is adopted from the mLogic in [24]. The data stored in the memory cell depends on the DW position, which is set according to the input voltage. The write delay follows (4) with the magnet switching time $t_{\mathrm{mag}}$ replaced by $6F/c_{\mathrm{dw}}$, where F is the minimum feature size and $c_{\mathrm{dw}}$ is the DW speed. The relation between the DW speed $c_{\mathrm{dw}}$ and the input current density is adopted from [24]. The write energy is obtained from (6) by replacing the MTJ resistance $R_{\mathrm{MTJ}}$ with the DW resistance $R_{\mathrm{dw}}$ of 200 Ω.
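With F = 30 nm, the DW travels 6F = 180 nm, so the switching term is simply that distance divided by the DW speed. The sketch below applies the substitution into (4); the function name is hypothetical, and the 100 m/s DW speed in the usage line is a placeholder rather than a value from [24].

```python
def dw_write_delay(t_wl: float, t_bl: float, c_dw: float,
                   feature_size: float = 30e-9) -> float:
    """Eq. (4) with t_mag replaced by the DW transit time 6F / c_dw, in seconds.

    c_dw is the DW speed in m/s; feature_size is F (30 nm half metal pitch).
    """
    t_mag = 6.0 * feature_size / c_dw
    return t_wl + t_bl + t_mag


if __name__ == "__main__":
    # Hypothetical DW speed of 100 m/s, ignoring wordline and bitline delays.
    print(f"t_mag = {dw_write_delay(0.0, 0.0, 100.0) * 1e9:.2f} ns")
```

Since $t_{\mathrm{mag}}$ is inversely proportional to $c_{\mathrm{dw}}$, the current-density dependence of the DW speed directly sets the write delay and, through (6), the write energy.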
5) MAGNETOELECTRIC-BASED MEMORY
Instead of using STT as the writing mechanism, ME-based spintronic devices are promising candidates for memory applications because of their voltage-control property [25]. For a given write voltage, the electric field is $E_{\mathrm{me}} = V_{\mathrm{write}}/t_{\mathrm{me}}$, where $t_{\mathrm{me}}$ is the thickness of the ME material, 2 nm. The corresponding magnetic field applied on the magnet due to the exchange bias effect follows Nikonov's work [19] and is estimated as $H_{\mathrm{app}} = \left[B_{\mathrm{me}}/(E_c \mu_0)\right] E_{\mathrm{me}}$, where the critical field is $E_c$ = 2.6 MV/m and the ME exchange bias field is $B_{\mathrm{me}}$ = 9 mT, giving an ME coefficient of 3.26 ns/m. The magnet switching time $t_{\mathrm{mag}}$ is obtained by solving the Landau-Lifshitz-Gilbert equation for a free magnet of dimensions 90 × 30 × 2 nm³. The write delay is written as
$$t_{\mathrm{write}} = t_{\mathrm{WL}} + t_{\mathrm{BL}} + 0.7\,(R_{\mathrm{BL}} + R_{\mathrm{tran}})\,C_{\mathrm{AFM}} + t_{\mathrm{mag}} \qquad (8)$$
where $C_{\mathrm{AFM}}$ is the capacitance of the antiferromagnet material, which has a dielectric constant of 13 [19]. The energy dissipation is calculated as
write . (9) VOLUME 3, 2017
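The field and delay expressions of the ME subsection ($E_{\mathrm{me}} = V_{\mathrm{write}}/t_{\mathrm{me}}$, $H_{\mathrm{app}} = [B_{\mathrm{me}}/(E_c\mu_0)]E_{\mathrm{me}}$, and the write delay (8)) can be sketched as below. The parallel-plate estimate of $C_{\mathrm{AFM}}$ over the 90 × 30 nm² magnet footprint is our assumption; the text only gives the dielectric constant of 13. Function names and the resistances in the usage lines are hypothetical, and $t_{\mathrm{mag}}$, which the text obtains from an LLG simulation, is passed in as a placeholder.

```python
import math

MU0 = 4e-7 * math.pi          # vacuum permeability [H/m] (classical value)
EPS0 = 8.8541878128e-12       # vacuum permittivity [F/m]


def me_applied_field(v_write: float, t_me: float = 2e-9,
                     b_me: float = 9e-3, e_c: float = 2.6e6) -> float:
    """H_app = B_me / (E_c * mu0) * E_me with E_me = V_write / t_me, in A/m."""
    e_me = v_write / t_me
    return b_me / (e_c * MU0) * e_me


def afm_capacitance(area: float, t_me: float = 2e-9, eps_r: float = 13.0) -> float:
    """Parallel-plate estimate of C_AFM in farads (our assumption)."""
    return EPS0 * eps_r * area / t_me


def me_write_delay(t_wl: float, t_bl: float, r_bl: float, r_tran: float,
                   c_afm: float, t_mag: float) -> float:
    """Eq. (8): t_WL + t_BL + 0.7 (R_BL + R_tran) C_AFM + t_mag, in seconds."""
    return t_wl + t_bl + 0.7 * (r_bl + r_tran) * c_afm + t_mag


if __name__ == "__main__":
    h = me_applied_field(0.1)                 # 100 mV minimum write voltage
    c = afm_capacitance(90e-9 * 30e-9)        # magnet footprint as plate area
    # Hypothetical 1 kOhm bitline, 1 kOhm transistor, placeholder t_mag = 1 ns.
    t = me_write_delay(0.0, 0.0, 1e3, 1e3, c, 1e-9)
    print(f"H_app = {h / 1e3:.1f} kA/m, C_AFM = {c * 1e15:.3f} fF, t_write = {t * 1e9:.4f} ns")
```

Under these assumptions the RC charging term is only a fraction of a picosecond, so the ME write delay is dominated by $t_{\mathrm{mag}}$, consistent with the delay breakdown discussed in Section IV-B.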
IV. SIMULATION RESULTS
A. READ PERFORMANCE ANALYSIS
Using the modeling approach developed in Section III, various performance metrics are calculated versus the tunneling oxide thickness, as shown in Fig. 3. The baseline TMR ratio and three other TMR ratios are investigated to quantify the performance benefits of MTJs with larger TMR ratios. In Fig. 3(a), the sensing voltage swing ΔV increases with the tunneling oxide thickness for two reasons: 1) the TMR ratio increases, so the ON and OFF states of the MTJ become more distinguishable, and 2) the larger MTJ resistance reduces the bias current and suppresses the impact of the parasitic interconnect and transistor resistances.
For a small oxide thickness, the read disturb rate, shown in Fig. 3(b), is high because of the large read bias current and the small sensing voltage swing, which lengthens the read time. As the oxide thickness increases, the read disturb rate decreases because the larger MTJ resistance reduces the read current. Beyond a certain oxide thickness, the read disturb rate saturates because the significantly longer read time offsets the benefit of the lower read current. Fig. 3(c) shows that optimal oxide thicknesses exist that minimize the read delay. When the oxide is thin, increasing its thickness improves the sensing voltage swing and reduces the RC delay required to reach the minimum sensing voltage of 50 mV. Beyond a certain thickness, however, the TMR saturates, and the large MTJ resistance dominates and increases the overall read delay. Another observation is that the memory with a large TMR (green curves) provides a small delay for an MTJ with a thin oxide but a large delay for one with a thick oxide. This is because with a thin oxide the sensing voltage swing is small and a large TMR helps reduce the read delay, whereas with a thick oxide the sensing voltage swing is already large enough, and the large MTJ resistance in the OFF state increases the worst-case read delay.
For the read energy shown in Fig. 3(d), optimal tunneling oxide thicknesses exist that minimize the energy. If the oxide is thin, a large energy is dissipated because of: 1) the large bias current resulting from a small MTJ resistance and 2) the long read delay resulting from a limited sensing voltage swing; if the oxide is thick, the significant increase in delay causes a sharp rise in the leakage energy of the sense amplifier. Compared to the minimum-delay design point, a thicker tunneling oxide is preferred to minimize the read energy. Fig. 4 shows the tradeoffs between read delay and read energy for memory cells using PMA magnets. Contours of the read disturb rate are shown as purple dashed lines; the rate decreases as the oxide thickness increases. Optimal oxide thicknesses exist that minimize the EDP for the different TMR assumptions. A major design advantage of the three-terminal memory cell is that the performance tradeoffs made for the read operation do not affect the write energy or delay, because the read and write current paths are separate.
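Selecting the minimum-EDP oxide thickness under the sensing-swing and read-disturb constraints amounts to a filtered minimization over a swept set of design points. The sketch below shows that selection step only. The `ReadDesignPoint` type and `best_edp` function are hypothetical, the 1e-9 read-disturb limit is a placeholder (the paper does not state one here), and the sample points are made-up numbers, not data from Fig. 4; only the 50 mV minimum swing comes from the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReadDesignPoint:
    t_ox_nm: float       # tunneling oxide thickness
    delay_s: float       # read delay
    energy_j: float      # read energy
    swing_v: float       # sensing voltage swing
    disturb_rate: float  # read disturb probability per access


def best_edp(points: List[ReadDesignPoint], min_swing: float = 0.05,
             max_disturb: float = 1e-9) -> Optional[ReadDesignPoint]:
    """Return the feasible point with the smallest delay*energy, or None if none is feasible."""
    feasible = [p for p in points
                if p.swing_v >= min_swing and p.disturb_rate <= max_disturb]
    if not feasible:
        return None
    return min(feasible, key=lambda p: p.delay_s * p.energy_j)


if __name__ == "__main__":
    # Made-up sweep for illustration only.
    sweep = [
        ReadDesignPoint(0.90, 1.0e-9, 1.0e-14, 0.02, 1e-6),
        ReadDesignPoint(1.00, 2.0e-9, 1.0e-13, 0.03, 1e-6),
        ReadDesignPoint(1.25, 1.5e-9, 8.0e-14, 0.08, 1e-10),
        ReadDesignPoint(1.50, 2.5e-9, 7.0e-14, 0.12, 1e-12),
    ]
    print(best_edp(sweep))
```

Filtering before minimizing matters: in this made-up sweep the thinnest oxide has the lowest raw EDP but fails the 50 mV swing requirement, so the selected point moves to a thicker oxide.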
To investigate the impact of process variation on memory performance, Monte Carlo simulations are performed at the minimum-EDP design points from Fig. 4, with a 3σ deviation of 10% assumed for both the tunneling oxide thickness and the MTJ area. Fig. 5 shows the normalized probability density function of the read EDP for three TMR assumptions. Because of its small sensing margin, a low-TMR memory cell is more sensitive to process variation, leading to a large variability in the overall EDP. This applies to both the conventional STT-RAM and the three-terminal memory cells, since their read operations are identical.
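The variation setup (3σ = 10% on oxide thickness and MTJ area) can be sampled as below. This sketch only generates the varied MTJ parameters and maps them to resistance; it does not reproduce the paper's read-EDP model. The exponential RA-versus-thickness law, its 0.1 nm decay length, the 10 Ω·µm² nominal RA, and the independence of the two Gaussian variables are all hypothetical assumptions for illustration; the 1.25 nm nominal thickness and 30 × 90 nm² area are likewise placeholders.

```python
import math
import random
import statistics
from typing import List, Tuple


def sample_mtj_variation(n: int, t_ox_nom: float, area_nom: float,
                         three_sigma_frac: float = 0.10,
                         seed: int = 0) -> List[Tuple[float, float]]:
    """Draw (t_ox, area) pairs with independent Gaussian variation.

    sigma = (3-sigma fraction / 3) * nominal, applied to both parameters.
    """
    rng = random.Random(seed)
    sigma_frac = three_sigma_frac / 3.0
    return [(rng.gauss(t_ox_nom, sigma_frac * t_ox_nom),
             rng.gauss(area_nom, sigma_frac * area_nom))
            for _ in range(n)]


def mtj_resistance(t_ox: float, area: float, ra_nom: float,
                   t_ox_nom: float, decay_len: float) -> float:
    """Hypothetical RA(t_ox) = RA_nom * exp((t_ox - t_ox_nom) / decay_len); R = RA / area."""
    return ra_nom * math.exp((t_ox - t_ox_nom) / decay_len) / area


if __name__ == "__main__":
    # Units: t_ox in nm, area in um^2, RA in Ohm*um^2 (all placeholder values).
    samples = sample_mtj_variation(20000, t_ox_nom=1.25, area_nom=0.030 * 0.090)
    res = [mtj_resistance(t, a, ra_nom=10.0, t_ox_nom=1.25, decay_len=0.1)
           for t, a in samples]
    spread = statistics.stdev(res) / statistics.mean(res)
    print(f"mean R = {statistics.mean(res):.0f} Ohm, relative spread = {spread:.2%}")
```

Each sampled resistance would then be fed through the read-delay and read-energy models to build an EDP distribution like the one in Fig. 5; the exponential thickness dependence is what makes the resistance spread much wider than the 3.3% input sigma.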
B. WRITE PERFORMANCE ANALYSIS
Based on the assumptions and configurations described in Section III, Fig. 6 shows the total write access energy and the corresponding write voltage versus the total write access time. Note that the write voltage is the voltage applied on the write bitline at the edge of the memory array; the actual voltage across the MTJ, magnets, or SHE material is only a fraction of the write voltage because of the voltage drop across the select transistor and the long bitline. The oxide thickness of the MTJ is set to 1.25 nm for the STT-RAM. Sweeping the write voltage trades write energy against write delay. Optimal write voltages exist that minimize the EDP because: 1) if the write voltage is high, the large write current dramatically increases the Joule heating energy and 2) if the write voltage is low, the write current approaches the critical switching current and the delay increases significantly. Fig. 7 shows the optimal write energy versus the write time at the optimal EDP design points. The detailed design parameters and performance metrics are listed in Table II, where the fifth column accounts for the variation of the thermal barrier with a 3σ deviation of 10%. The DW-based memory cell provides the best EDP because, rather than switching the entire magnet, only the DW position inside the magnet moves according to the direction of the input current; therefore, it requires a lower write voltage and a lower current density than switching a whole magnet. ME-based memory cells also provide relatively small energy consumption thanks to their low-voltage, voltage-driven operation. The minimum write voltage is set to 100 mV because of supply noise, which limits the performance of the DW-motion and ME-based memory cells.
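The two competing effects that create an optimal write voltage can be reproduced with a deliberately simple toy model. This is not the paper's write model (Eqs. (4)-(6) with Nikonov's switching time); it is a hypothetical stand-in that assumes $I = V_{\mathrm{write}}/R_{\mathrm{total}}$, a switching time $t_{\mathrm{mag}} = \tau I_c/(I - I_c)$ that diverges at the critical current, a fixed peripheral delay and energy, and Joule heating over the whole write. All parameter values are placeholders; only the 100 mV voltage floor comes from the text.

```python
from typing import Optional, Sequence, Tuple


def toy_write_metrics(v_write: float, r_total: float, i_c: float, tau: float,
                      t_fixed: float, e_fixed: float) -> Optional[Tuple[float, float]]:
    """Return (delay, energy) for one write in a hypothetical toy model, or None below I_c.

    I = V / R_total, t_mag = tau * I_c / (I - I_c),
    delay = t_fixed + t_mag, energy = e_fixed + I**2 * R_total * delay.
    """
    i_write = v_write / r_total
    if i_write <= i_c:
        return None  # the magnet never switches in this toy model
    t_mag = tau * i_c / (i_write - i_c)
    delay = t_fixed + t_mag
    energy = e_fixed + i_write ** 2 * r_total * delay
    return delay, energy


def min_edp_voltage(voltages: Sequence[float], **model) -> Optional[Tuple[float, float, float]]:
    """Sweep the write voltage; return (V_opt, delay, energy) at minimum delay*energy."""
    best = None
    for v in voltages:
        metrics = toy_write_metrics(v, **model)
        if metrics is None:
            continue
        delay, energy = metrics
        if best is None or delay * energy < best[1] * best[2]:
            best = (v, delay, energy)
    return best


if __name__ == "__main__":
    params = dict(r_total=3e3, i_c=50e-6, tau=1e-9, t_fixed=0.5e-9, e_fixed=1e-15)
    sweep = [0.10 + 0.01 * k for k in range(191)]  # 0.10 V to 2.00 V, 100 mV floor
    v_opt, delay, energy = min_edp_voltage(sweep, **params)
    print(f"V_opt = {v_opt:.2f} V, delay = {delay * 1e9:.2f} ns, energy = {energy * 1e15:.1f} fJ")
```

Near $I_c$ the diverging switching time makes the EDP blow up, while at high voltage the delay saturates at the fixed term and the Joule energy keeps growing as $V^2$, so the minimum necessarily lies between the two, mirroring the argument in the paragraph above.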
For reliability, the maximum current density reported in [26] is higher than the current densities of most of the memory cells in Table II; the exception is the SD-based memory cell with IMA magnets, which requires a large charge current to generate a spin-polarized current sufficient to switch the magnet. Note that the actual current density limits also depend on the duty cycle and material considerations. For memory applications, the average current density is expected to be smaller for memory blocks that are accessed less frequently. Here, the ITRS projection for the current limit is listed only as a reference. Table III shows the delay and energy breakdown for each spintronic memory technology, where $E_{\mathrm{joule}}$ is the Joule heating energy of the interconnects and select transistors (for the ME-based memory, $E_{\mathrm{joule}}$ is the dynamic switching energy of the antiferromagnet material), and $E_{\mathrm{BL}}$ and $E_{\mathrm{WL}}$ are the switching energies of the bitlines and wordlines. From Table III, most of the energy of the current-driven spintronic memories comes from Joule heating, whereas for the voltage-controlled ME-based memory, the energy is dominated by the bitline switching energy. For all the spintronic memories, most of the delay comes from the magnet switching time.
To further improve spintronic memory performance in terms of read/write delay and energy, Table IV lists the processing requirements and critical parameters for each memory technology. Since the read and write operations of the conventional STT-RAM share the same MTJ, an MTJ with a large TMR is preferred, so that the oxide thickness can be reduced to achieve a small RA, which in turn lowers the read and write energy. For the three-terminal spintronic memory, the write operation is decoupled from the read operation and can be improved as described in Table IV.
V. CONCLUSION
This paper studies three-terminal nonvolatile spintronic memory cells. Read and write performance is examined at the array level for a variety of current- and voltage-based writing mechanisms, including SD, SHE, DW motion, and the ME effect. Compact memory layouts and structures are developed, and tradeoffs are explored among key performance metrics, such as read/write energy, read/write delay, and read disturb rate. Compared to the conventional STT-RAM, three-terminal cells dissipate less energy because of their separate read and write current paths. The DW-based memory cell provides the smallest EDP because of its lowest critical current.
