Spin-orbit torque (SOT) from the spin Hall effect (SHE) [1-8] in heavy metals (HMs) can rapidly and reliably switch an adjacent ferromagnet (FM) free layer of a nanoscale magnetic tunnel junction (MTJ) in a three-terminal configuration (3T-MTJ). This effect provides the strategy for a new generation of fast, current- and energy-efficient cache magnetic memory [9-15]. The separate read and write channels in the 3T-MTJ geometry offer additional advantages: faster read-out without read disturbance and lower write energy. While the development of SOT switching has focused primarily on nanoscale perpendicularly magnetized (PM) MTJs, their SOT effective-field switching requires much higher currents than can be provided by a reasonably scaled CMOS transistor (current densities in the spin Hall channel are ≥ 1.4 × 10⁸ A/cm²) [16], and fast, low write error rate (WER) switching has not yet been demonstrated. SOT switching of a PM MTJ also requires an in-plane bias field to obtain deterministic reversal, although strategies have recently been demonstrated in which an antiferromagnetic pinning layer or an electric field provides this bias field [5, 17, 18]. Here we report a dramatic performance improvement for in-plane-magnetized 3T-MTJs wherein the strong SOT arising from nano-channels of beta-phase W is combined with two recently discovered effects of Hf atomic layer modifications of the FM-MgO and HM-FM interfaces that, respectively, enhance the interfacial perpendicular magnetic anisotropy energy density [19] and reduce interfacial spin memory loss [20]. The result is an anti-damping SOT switching current density of just 5.4 × 10⁶ A/cm². We also achieve reliable (WER ≈ 10⁻⁶) switching with 2 ns pulses, which we tentatively attribute, at least in part, to the beneficial assistance of the field-like SOT arising from the spin current generated by the W spin Hall effect.
In Fig. 1a we show a schematic of the W-based 3T-MTJ device structure along with (inset) a scanning electron microscope (SEM) image of a typical elliptical nano-pillar MTJ on top of the W SHE channel after it has been defined by electron-beam lithography and argon ion milling.
We demonstrate the potential of these W-based in-plane-magnetized (IPM) 3T-MTJ devices by reporting in detail on the representative performance of a high-aspect-ratio, 190 nm × 30 nm, MTJ device fabricated on a 480 nm wide W channel. This device was annealed in an air furnace at 240 °C for 1 hour after patterning to increase the tunneling magnetoresistance (TMR) of the MTJ and also reduce the switching current as discussed below. In the inset to Fig. 1b we first show the minor magnetic loop response of the MTJ resistance as an in-plane magnetic field H ext is applied along the long axis of the MTJ device and ramped over ± 300 Oe, which is sufficient to reverse the orientation of the thin bottom free layer (FL) of the MTJ from being parallel (P) to anti-parallel (AP) to the thicker FeCoB reference layer, but not strong enough to reverse the orientation of the reference layer due to its stronger shape anisotropy. The horizontal offset of the minor loop (~ −50 Oe) is due to the dipole field from the reference layer. All subsequent SOT measurements are taken when this offset is canceled by an appropriate H ext [21] .
In the main part of Fig. 1b we show the characteristic DC SOT hysteretic switching behavior of the IPM 3T-MTJ as the bias current in the W channel is ramped quasi-statically. The switching polarity is consistent with the negative spin Hall sign of β-W in comparison to that of platinum [4, 22] . For nanoscale MTJs thermal fluctuations assist the reversal during slow current ramps. Within the macrospin or rigid monodomain model the critical current I c that is observed is dependent on the current ramp rate [23] ,
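The ramp-rate dependence referred to here is commonly written, in the thermally activated macrospin picture, in the form below. This is a sketch that assumes the standard expression for this model rather than a verbatim reproduction of the authors' equation; the symbols match those defined in the next paragraph, with \dot{I} denoting the current ramp rate.

```latex
I_c = I_{c0}\left[\,1 + \frac{1}{\Delta}\,
      \ln\!\left(\frac{\tau_0\,\Delta\,|\dot{I}|}{|I_{c0}|}\right)\right]
```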
Here I c0 is the critical current in the absence of thermal fluctuations, İ (= dI/dt) is the current ramp rate, Δ is the thermal stability factor that represents the normalized magnetic energy barrier for reversal between the P and AP states, and τ 0 is the thermal attempt time, which we assumed to be 1 ns. To characterize the SOT behavior of this device we measured the mean switching current for İ varying from 10⁻⁷ A/s to 10
In Table 1 we compare the critical switching current density J c0 achieved in various in-plane and out-of-plane 3T structures. The different types of SOT devices have different minimum sizes as determined by thermal stability requirements, which in turn will set the current amplitude required for switching or domain wall motion. For the PMA SOT nanodot devices currently a 40 nm diameter is required [16], which would necessitate a minimum current of approximately 300 µA for reversal using a 40 nm wide, 4 nm thick Pt-based IPM 3T-MTJs with low WER [22], but a high current pulse amplitude was required, 2-3 I c0, with I c0 > 500 µA. To characterize the performance of the W-based 3T-MTJ device in the short pulse regime, we separately measured the switching phase diagram for the two cases, P → AP and AP → P, using a fast pulse measurement method [21]. Results are shown in Fig. 2a and 2b, where each data point is the statistical result of 1000 switching attempts, with the scale bar on the right showing the switching probability. Although micromagnetic modeling indicates that for strong short pulses these 3T-MTJ devices do not reverse simply as a rigid domain [22], we can still utilize the macrospin model [24] as an approximation to characterize the short pulse response by fitting the 50% switching probability boundary between the switching and non-switching regions with,
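In the macrospin approximation, the 50% switching boundary for short pulses is usually described by the relation below. This is a sketch that assumes the standard form, with t denoting the pulse duration (a symbol introduced here for illustration). Note that τ 0 in this expression is the characteristic switching time extracted from the fit, not the thermal attempt time used in the ramp-rate analysis.

```latex
V = V_0\left(1 + \frac{\tau_0}{t}\right)
```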
The results shown in the solid curves provide a reasonable fit to the data despite the simplifying macrospin assumption. From these fits we extract the characteristic switching times τ 0 and critical switching voltages V 0 to be 0.76 ns and 0.48 V for P → AP and 1.20 ns and 0.44 V for AP → P. The short pulse critical switching current (current density) as calculated from V 0 and the channel resistance R ≈ 3.6 kΩ is I c0 ≈ 120 μA (J c0 ≈ 5.9 × 10⁶ A/cm²), consistent with the ramp rate results.
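The conversion from fitted critical voltage to critical current is simple arithmetic and can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, and the second helper assumes the current flows uniformly through a rectangular channel cross-section, which may not hold in the real device.

```python
def critical_current_a(v0_volts, r_channel_ohms):
    """Ohm's law: channel current at the fitted critical voltage V0."""
    return v0_volts / r_channel_ohms


def implied_thickness_nm(current_a, j_a_per_cm2, width_nm):
    """Channel thickness implied by I = J * width * thickness.

    Assumes a uniform current density over a rectangular cross-section
    (an illustrative assumption, not a statement from the text).
    """
    width_cm = width_nm * 1e-7  # 1 nm = 1e-7 cm
    thickness_cm = current_a / (j_a_per_cm2 * width_cm)
    return thickness_cm * 1e7   # back to nm


# Fitted V0 values and channel resistance quoted in the text.
for label, v0 in (("P -> AP", 0.48), ("AP -> P", 0.44)):
    i_c0 = critical_current_a(v0, 3.6e3)
    print(f"{label}: I_c0 = {i_c0 * 1e6:.1f} uA")

# Cross-check of the quoted I_c0 and J_c0 against the 480 nm channel width.
t_nm = implied_thickness_nm(120e-6, 5.9e6, 480.0)
print(f"implied effective channel thickness = {t_nm:.2f} nm")
```

Both fitted voltages give currents close to the quoted I c0 ≈ 120 μA, and the quoted current and current density together imply an effective conducting thickness of a few nanometers for the 480 nm wide channel.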
For cache memory, SOT reversal has to be both fast and highly reliable, and in this latter regard our results with this W-based IPM 3T-MTJ approach offer encouraging prospects, as indicated by Fig. 2c, where we show WER results measured with 2 ns pulses on the same device. We applied square switching pulses of increasing voltage to the W channel and recorded the state of the device after each switching pulse. For every voltage level we repeated the switching attempts 10⁶ times and calculated the WER from the switching probability, WER = 1 − P switch. At 2 ns, a WER of close to 10⁻⁶ is achieved for both polarities, P → AP and AP → P, which indicates the potential of this approach for high reliability. Note that our current results were limited to 10⁻⁶ WER (V ≤ 3.5 V 0) due to the constraint on the highest pulse voltage we can apply to the channel, imposed by a less than optimal electrode design (spreading resistance) and a poor quality field insulator.
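The WER estimate described above, WER = 1 − P switch from repeated attempts, can be sketched as follows. This is a minimal illustration with hypothetical function names. The zero-failure bound is a standard binomial result added here as an assumption about how one might quote a limit when no errors are observed; it is not a procedure described in the text.

```python
import math


def write_error_rate(n_attempts, n_switched):
    """WER = 1 - P_switch, estimated from raw counts at one pulse voltage."""
    if n_attempts <= 0 or not 0 <= n_switched <= n_attempts:
        raise ValueError("need 0 <= n_switched <= n_attempts and n_attempts > 0")
    return (n_attempts - n_switched) / n_attempts


def zero_failure_upper_bound(n_attempts, confidence=0.95):
    """Upper bound on the true WER when no failures were observed.

    Solves (1 - p)**N = 1 - confidence for p, using log1p/expm1 so the
    result stays accurate for very small p.
    """
    if n_attempts <= 0 or not 0.0 < confidence < 1.0:
        raise ValueError("need n_attempts > 0 and 0 < confidence < 1")
    return -math.expm1(math.log1p(-confidence) / n_attempts)


n = 10**6  # attempts per voltage level, as in the text
print(write_error_rate(n, n - 1))   # one failure in 10^6 attempts
print(zero_failure_upper_bound(n))  # bound when zero failures are seen
```

With 10⁶ attempts, a single failure corresponds to a WER of 10⁻⁶. If no failure is seen, the data only bound the WER at roughly 3/N, so resolving lower error rates requires proportionally more attempts.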
Straightforward improvements in both should lower V 0 and enable measurements with
The observed anti-damping SOT reversal on the τ 0 ≤ 1 ns timescale is much faster than predicted by the rigid-domain macrospin model. A key conclusion in the initial report on fast switching with Pt-based IPM 3T-MTJs was that the in-plane Oersted field H Oe generated by the pulsed current is advantageous in promoting fast, reliable switching because it opposes the anisotropy field H c of the FL at the beginning of the reversal [22, 25]. Due to the opposite sign of the SHE for W-based 3T-MTJs the pulsed . This field-like torque efficiency corresponds to an effective field of −6.68 × 10⁻¹¹ Oe/(A/m²) in the MTJ structure with a 1.8 nm free layer that is oriented in opposition to the Oersted field generated by the electric current, as previously reported for W devices [4], and approximately three times larger. Thus the net transverse field is in opposition to the free layer in-plane anisotropy field at the beginning of the reversal and hence may be playing an important role in the fast, reliable W-based 3T-MTJ results reported here.
In addition to utilizing the high spin torque efficiency of β-W we have employed two other materials enhancements, the sub-monolayer "dusting" and monolayer "spacer" of Hf that were inserted respectively between the FL and the MgO and between the W and the FL, to achieve this exceptionally low pulse current (density) switching performance.
For 3T-MTJs the SOT switching current density, within the macrospin model, is predicted to vary as [26, 27] 
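The macrospin prediction referred to here is usually written in the form below. This is a sketch that assumes the standard anti-damping switching expression for an in-plane free layer rather than a verbatim reproduction of the authors' equation. α is the damping, H c the in-plane anisotropy field and M eff the effective demagnetization field, all of which are discussed in the text; ξ DL, the damping-like spin-torque efficiency, is a symbol assumed here.

```latex
J_{c0} = \frac{2e}{\hbar}\,\mu_0 M_s\, t_{FM}\,\alpha\,
         \frac{H_c + M_{eff}/2}{\xi_{DL}}
```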
where e is the electron charge, ℏ is the reduced Planck constant, μ 0 is the permeability of free space, M s is the saturation magnetization of the FL and t FM is the FL's effective magnetic thickness, which were measured to be 1. area microstrip of the same heterostructure composition [21]. This difference may be due to an increase in damping due to side-wall oxidation of the nanopillar in the lithography process, which can be addressed by in-situ passivation in the future [28]. 1d ). We have confirmed with switching experiments that this reduction of M eff indeed significantly reduces the critical switching current [21]. The additional reduction to M eff = 2110 Oe for the sample with the added Hf spacer can be attributed to some of that Hf diffusing through the FeCoB to the MgO interface during the anneal [20, 29]. Another benefit of the Hf spacer is that its insertion decreases α very substantially, from 0.018 to 0.012 (Fig. 1e), which we attribute to a passivation of the W surface that suppresses reaction between the W and FeCoB that would otherwise result in interfacial spin memory loss [30]. While there is some spin current attenuation from the use of the Hf spacer [20, 31], its effectiveness in lowering both the effective damping and M eff substantially outweighs that cost.
Integration of MRAM with CMOS usually requires thermal processing above 240 °C. Annealing at higher temperatures can also be beneficial in producing higher TMR. The 190 nm × 30 nm free layers analyzed above became thermally unstable due to further decrease in M eff after annealing at 300 °C, but it is important to note that the Hf dusting technique itself becomes even more effective after processing at T ≥ 300 °C. We performed FMR measurements on an un-patterned section of the wafer with only the 0.1 nm Hf dusting layer after it was annealed at 300 °C for 1 hour. As illustrated in Fig. 3a raising the annealing temperature from 240 °C to 300 °C resulted in approximately a 2.5 × reduction in M eff from 4300 Oe to 1550 Oe, while there was no material effect on M s [21] , a compelling demonstration of the effectiveness of Hf dusting in enhancing K s .
To examine the SOT switching behavior of devices with such low M eff we patterned larger 390 nm × 100 nm, and hence more thermally stable, MTJs from the wafer and annealed two of them at the two different temperatures, 240 °C and 300 °C respectively.
Consistent with the M eff change, we see clean SOT switching with a much lower critical current, I c0 = 155 μA, after annealing at 300 °C than after annealing at 240 °C, where I c0 = 335 μA.
In summary, we have achieved nanosecond-scale, reliable, low-amplitude pulse current switching in W-based IPM 3T-MTJs by utilizing a partial atomic layer of Hf dusting between the FL and the MgO which very effectively reduces M eff of the FL, while a further reduction in the required pulse amplitude is achieved by inserting approximately one Hf monolayer between HM and FM which significantly reduces interfacial spin memory loss. This ability to achieve a low M eff with a relatively thick free layer through use of the particularly strong interfacial anisotropy effect of Hf-O-Fe bonds [19] has enabled us to minimize the detrimental effect of interfacial enhancement of magnetic damping, and, due to the thicker free layer, arguably also hinders the formation of localized non-uniformities during the fast reversal that would otherwise result in write errors.
Further decreases in I c, to well below 100 µA, should be quite straightforward with refinements in device design. For example, to ensure successful fabrication, the major axis of our elliptical MTJ nanopillars is less than 50% of the width of the spin Hall channel, so that up to a factor of two reduction in I c can be expected simply with more aggressive, industry-level lithography. Smaller nanopillars on even narrower channels, ≤ 100 nm, should be possible through the use of slightly thicker FLs (~ 2 nm) to promote thermal stability, with the robust interfacial magnetic anisotropy effect of the Hf dusting technique providing the means to still achieve a low M eff. We anticipate that these approaches, in conjunction with an improved device geometry that substantially reduces the spreading resistance, should lower the pulse write current for fast, reliable switching to ≈ 20 µA and the write energy to the ≤ 10 fJ scale.
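The scale of the projected write energy can be checked with a simple Joule-heating estimate, E = I²Rt. The sketch below is a rough order-of-magnitude illustration under stated assumptions, not the authors' projection. It assumes a rectangular pulse, dissipation only in the channel, and, for lack of a stated value for the improved geometry, reuses the present channel resistance of 3.6 kΩ and the 2 ns pulse length. The function name is hypothetical.

```python
def joule_write_energy_j(current_a, resistance_ohm, pulse_s):
    """Energy dissipated by a rectangular current pulse in a resistor: I^2 * R * t."""
    return current_a ** 2 * resistance_ohm * pulse_s


R_CHANNEL = 3.6e3  # ohm, present device (assumed unchanged for this estimate)
T_PULSE = 2e-9     # s, pulse length used in the WER measurements

# Present critical current versus the projected write current.
for label, current in (("present, 120 uA", 120e-6), ("projected, 20 uA", 20e-6)):
    energy_fj = joule_write_energy_j(current, R_CHANNEL, T_PULSE) * 1e15
    print(f"{label}: {energy_fj:.2f} fJ")

# Largest resistance compatible with 10 fJ at 20 uA and 2 ns.
r_max = 10e-15 / ((20e-6) ** 2 * T_PULSE)
print(f"R_max for 10 fJ: {r_max / 1e3:.1f} kOhm")
```

Under these assumptions a 20 µA, 2 ns pulse dissipates a few femtojoules, consistent with the ≤ 10 fJ scale, whereas a pulse at the present I c0 dissipates on the order of 100 fJ. Pulses applied at a multiple of I c0 to reach low WER would scale the energy by the square of that multiple.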
