Abstract: We report experimental results on write error rate and read disturbance as a function of read/write pulse width and amplitude in electric-field-controlled magnetic tunnel junctions (MTJs). Results are shown for 50 nm perpendicular MTJs. We also design and simulate the performance of a 256 kilobit (Kbit) magneto-electric random-access memory (MeRAM) macro in a 28 nm complementary metal-oxide-semiconductor (CMOS) process, based on the measured MTJ device data. The results show that existing electric-field-controlled MTJs are capable of delivering write error rates below $10^{-9}$ for 10 ns total write and verify time and read disturbance below $10^{-16}$ for 2 ns read time in a 256 Kbit MeRAM array.
I. INTRODUCTION
Magnetic random-access memory (MRAM) based on the spin-transfer torque (STT) effect is gaining increased traction within the semiconductor industry [Lu 2015, Nowak 2016]. For standalone applications, it can replace low- to medium-density dynamic random-access memory (DRAM) while eliminating the need for refresh. When embedded on chip, it offers a nonvolatile memory (NVM) option with speed and endurance improvements over embedded Flash. It also offers a density advantage over embedded static random-access memory (SRAM), although achieving competitive read and write speeds remains challenging under product-level write error rate (WER) and read disturbance rate (RDR) requirements. Overall, the current-controlled operation of STT-MRAM leads to three challenges: (i) Density: to provide the write current needed for switching, the access transistor must be made wider, resulting in a large bit area; (ii) Write speed: the density limitation is exacerbated for faster write times, since the write current increases with write speed; and (iii) Read speed: even where a small write current is achieved, the read current needs to be ∼4-5 times smaller than the write current to keep the RDR low, resulting in slow read performance. Building on the advances and adoption of STT-MRAM, a number of device proposals have emerged to overcome its limitations and expand the application areas of MRAM. Among these, magneto-electric random-access memory (MeRAM) utilizing voltage control of the magnetic anisotropy (VCMA) offers a significant reduction of the write power consumption and enhanced bit density compared to existing STT-MRAM [Shiota 2012, Grezes 2016a, Kanai 2016, Shiota 2016], while retaining other desirable characteristics such as high endurance.
In VCMA-based writing, in principle no current flow is required through the magnetic tunnel junction (MTJ), reducing the bit-level write energy to the sub-fJ level and eliminating current-drive-based size restrictions on the access transistors. Instead, voltages of the correct polarity applied across the MTJ reduce the magnetic anisotropy, thus lowering the energy barrier $E_b$ between the two stable states of the free layer. Switching is achieved by precessional reorientation of the magnetic moment using unipolar voltage pulses with an amplitude sufficient to remove the energy barrier and a duration of approximately half the precession period. In this writing process, improper write pulse duration and thermal activation are the sources of write errors. Unlike STT-MRAM, in which the pulse polarity determines the switching direction, the write process in MeRAM is non-deterministic in the sense that the pulse alone does not set the final state: a single write pulse amplitude and width are used for both 0-to-1 and 1-to-0 switching. Hence, a pre-read step is required to determine the initial state (0 or 1) before writing. On the other hand, because the barrier is voltage-dependent, the RDR induced by thermal activation over the barrier during the read operation can be nearly eliminated by reading with a reverse-polarity voltage, so faster reads can be achieved with low RDR. Fig. 1 shows the data program flow in a MeRAM circuit, including the pre-read, compare/verify, write, and iteration-bound check steps. When a write command is received, the sensing circuitry reads out the MTJ state (S) and stores it in an S latch. This value is then compared with the external input to decide whether a write voltage pulse needs to be generated. After completion of the write pulse, the MTJ state is verified.
If the data matches, a flag signal indicates the end of the programming sequence; otherwise, the write-verify operation iterates until the MTJ stores the desired data or the iteration count reaches a bound n, chosen such that the overall write error rate is below an acceptable value. In this work, we examine write and read performance in individual electric-field-controlled MTJ devices and present measured curves of the WER dependence on write pulse width and amplitude, along with the RDR dependence on read pulse parameters. We demonstrate that present-day bit-level results are suitable for WER < $10^{-9}$ for 10 ns total write and verify time and RDR < $10^{-16}$ for 2 ns read time in a 256 Kbit MeRAM array.

Fig. 1. MeRAM data program flow. The circuit first compares the initial MTJ state with the data to be written, and a write pulse is generated if they differ. Once the data has been successfully written, or the number of write iterations reaches a limit, a completion flag is generated.
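The program flow of Fig. 1 can be illustrated with a small behavioral model. This is a minimal sketch under stated assumptions, not the authors' circuit: each write pulse is assumed to fail to toggle the bit independently with a fixed single-pulse probability `p_fail`, the verify read is assumed error-free, and the helpers `program_bit` and `estimate_total_wer` are hypothetical names.

```python
from __future__ import annotations

import random


def program_bit(stored: int, target: int, p_fail: float, n_max: int,
                rng: random.Random) -> tuple[int, int, bool]:
    """Behavioral sketch of the Fig. 1 flow: pre-read, compare, write, verify.

    stored -- MTJ state before the write command (0 or 1)
    target -- data on the external input
    p_fail -- assumed probability that one write pulse fails to toggle the bit
    n_max  -- iteration bound n
    Returns (final_state, write_pulses_used, completion_flag).
    """
    state = stored                  # pre-read: latch the current state S
    pulses = 0
    while state != target:          # compare (and, after a pulse, verify)
        if pulses == n_max:         # iteration bound reached: stop
            return state, pulses, False
        pulses += 1
        # A unipolar precessional pulse toggles the bit unless a write error occurs.
        if rng.random() >= p_fail:
            state ^= 1
    return state, pulses, True      # flag: programming sequence complete


def estimate_total_wer(p_fail: float, n_max: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo fraction of bits still holding wrong data after n_max pulses."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        stored = rng.randint(0, 1)
        target = 1 - stored         # only count bits that actually need switching
        _, _, done = program_bit(stored, target, p_fail, n_max, rng)
        if not done:
            failures += 1
    return failures / trials
```

Because attempts are modeled as independent, the estimated total WER approaches `p_fail**n_max` (about 0.09 for `p_fail = 0.3` and `n_max = 2`), which is why a modest single-pulse WER can still yield a low total WER after a few iterations.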
II. BIT-LEVEL RESULTS

A. Device Test Array
Voltage-controlled perpendicular MTJ arrays with bit sizes from 50 to 90 nm were fabricated on silicon wafers for write error rate and read disturbance measurements. A schematic of the individual devices is shown in Fig. 2(a). The stacks consist of a bottom CoFeB free layer, an MgO barrier, and a top CoFeB reference layer with a synthetic antiferromagnetic pinning layer. Both CoFeB layers have perpendicular easy axes, and the MgO layer is thick, resulting in a high resistance-area product (RA = 225-650 Ω·μm²) that ensures negligible current-induced torque contributions during voltage application. Fig. 2(b) shows the tunnel magnetoresistance (TMR) curves under −1.2, +0.01, and +1.2 V DC bias for a representative 90 nm device, showing a TMR of 116% and a sizeable effect of the voltage on the coercivity, which indicates that the magnetic anisotropy of the free layer can be controlled by voltage.
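For orientation, RA and TMR translate into junction resistance and current density as sketched below. The sketch assumes a circular junction (the bit shape is not stated here) and bias-independent RA and TMR, which is a simplification; `mtj_electrical` is a hypothetical helper, and the example combines the 50 nm size and RA = 225 Ω·μm² with the 116% TMR measured on the 90 nm device.

```python
import math


def mtj_electrical(diameter_nm: float, ra_ohm_um2: float, tmr: float,
                   voltage_v: float) -> dict:
    """Estimate MTJ resistances and current density from RA and TMR.

    Assumes a circular junction and bias-independent RA and TMR
    (in practice TMR typically drops under bias).
    """
    area_um2 = math.pi * (diameter_nm * 1e-3 / 2) ** 2   # diameter nm -> um
    r_p = ra_ohm_um2 / area_um2                          # parallel-state resistance (ohm)
    r_ap = r_p * (1 + tmr)                               # TMR = (R_AP - R_P) / R_P
    area_cm2 = area_um2 * 1e-8                           # 1 um^2 = 1e-8 cm^2
    return {
        "area_um2": area_um2,
        "R_P_kohm": r_p / 1e3,
        "R_AP_kohm": r_ap / 1e3,
        # Parallel state has the lowest resistance, hence the highest current density.
        "J_P_A_per_cm2": voltage_v / r_p / area_cm2,
    }


print(mtj_electrical(diameter_nm=50, ra_ohm_um2=225, tmr=1.16, voltage_v=2.1))
```

For these inputs the sketch gives R_P ≈ 115 kΩ, R_AP ≈ 248 kΩ, and a parallel-state current density of about $9.3\times10^{5}$ A/cm² at 2.1 V. The current density depends only on V/RA, so a higher RA lowers it directly, independent of bit size.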
B. Write Voltage, Energy, Speed, and Error Rate
Write performance was measured at room temperature on a representative subset of 50 nm devices in the array. An example of successful bidirectional switching between the P and AP states by application of write pulses is shown in Fig. 3(a). The pulses are applied at a 10 kHz repetition rate, enabling $10^4$ write trials per second. A constant external in-plane magnetic field of 70 mT is applied to ensure a single in-plane precessional axis [Grezes 2016b]. Fig. 3(b) shows the measured write error rates for P-to-AP (AP-to-P) switching as a function of the write pulse width for a 2.1 V write voltage; the oscillatory dependence reflects the precessional nature of the write process. Bidirectional voltage-induced switching with WER below $6\times10^{-3}$ is achieved by applying a single unipolar voltage pulse of 2.1 V and 0.5 ns duration. We refer to this as the "single-pulse" WER, to distinguish it from the total WER achieved after multiple attempts, as indicated in Fig. 1. Only the worse of the two switching directions (P-to-AP) is shown in the following. As seen in Fig. 4, the single-pulse WER decreases monotonically with increasing write voltage. The slope is ∼0.5 decades/V below 1.5 V and gradually approaches ∼9 decades/V for write voltages above 2 V. This change in slope indicates a gradual transition from the thermally activated ($V_\text{pulse} < 1.5$ V) to the precessional ($V_\text{pulse} > 2$ V) switching regime, in which switching errors induced by thermal activation decrease significantly. The lowest single-pulse WER achieved here is limited by the maximum voltage of 2.1 V that can be applied before dielectric breakdown of the MgO barrier. Hence, the WER is expected to improve if the VCMA-induced critical voltage $V_c$, at which the precessional switching regime appears, is decreased.

An important consideration for integration in CMOS processes is the robustness of the WER to process-induced pulse width variations. Fig. 5 shows the measured single-pulse WER as a function of pulse width variation for a 0.5 ns, 2.1 V write pulse. The WER varies by less than $7\times10^{-2}$ decades within a ±50 ps variation range. Based on simulations of a 256 Kbit MeRAM in a 28 nm CMOS process, we estimate the effect of the fast-fast (FF) and slow-slow (SS) process corners on the pulse width generated by the write drivers to be within ±15 ps. This indicates that the devices of this work can be integrated with CMOS with a negligible effect of process variation on the WER. The bit-level energy per write (2.1 V, 0.5 ns) for the 50 nm device shown in Figs. 3-5 is ∼6.9 fJ/bit. The write energy varies across devices by about ±1 fJ, as shown in Fig. 6 for 11 devices of 50 nm size. The dependence of the WER on process and temperature variation is the subject of a separate study.
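The decade-based figures above can be turned into rough numbers. The sketch below is a hypothetical illustration: it linearly extrapolates $\log_{10}$(WER) with the ∼9 decades/V high-voltage slope from the 2.1 V, $6\times10^{-3}$ operating point (an assumption the data do not establish beyond 2.1 V), and it converts the $7\times10^{-2}$-decade pulse-width sensitivity into a relative WER change. The function names are hypothetical.

```python
import math


def voltage_for_target_wer(wer_now: float, v_now: float, wer_target: float,
                           slope_dec_per_v: float) -> float:
    """Linear extrapolation of log10(WER) versus write voltage.

    Assumes the slope (decades per volt) seen at the top of the measured
    range stays constant beyond it, which is not established by the data.
    """
    decades_needed = math.log10(wer_now / wer_target)
    return v_now + decades_needed / slope_dec_per_v


def relative_change(decades: float) -> float:
    """Fractional WER change corresponding to a shift of `decades` in log10(WER)."""
    return 10 ** decades - 1


print(voltage_for_target_wer(6e-3, 2.1, 1e-9, 9.0))  # hypothetical extrapolated voltage
print(relative_change(0.07))                          # pulse-width sensitivity as a fraction
```

Under this extrapolation a single-pulse WER of $10^{-9}$ would require about 2.85 V, above the 2.1 V limit set by MgO breakdown, which illustrates why lowering $V_c$ rather than raising the pulse amplitude is the route to lower WER. A 0.07-decade shift corresponds to a relative WER change of roughly 17%.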
C. Thermal Stability, Retention Time, and Read Disturbance
Read disturbance and thermal stability ($\Delta = E_b/k_B T$, where $k_B$ is the Boltzmann constant and $T$ is the temperature) were characterized in the same subset of 50 nm devices.

Fig. 8. (b) RDR as a function of sensing voltage for read pulses of 1, 10, 100, and 1000 ms. Solid lines are calculated using (1). Note that a 70 mT in-plane field is applied to the device in order to bring its retention time into a measurable range ($10^{-6}$ to $10^{6}$ s, shown in the top graph), enabling high-speed repeated measurements.
Because the thermal stability is strongly affected by voltage, this dependence can be used to reduce RDR in VCMA-based memory. A source line sensing (SLS) scheme has been developed for MeRAM that reads with the opposite polarity compared to the write process, taking advantage of the VCMA effect to reduce read disturbance through enhanced thermal stability during the read operation. Here we demonstrate this experimentally in a 50 nm device. To do so, $10^5$ read pulses are applied, and the MTJ state is monitored by measuring the real-time voltage across the device [see Fig. 8(a)]. The RDR is determined as the probability of MTJ switching by thermal activation during a read pulse. Fig. 8(b) shows the measured RDR as a function of sensing voltage for read pulses of 1, 10, 100, and 1000 ms duration, applied using an external signal generator. The RDR converges to 100% at large positive sensing voltage, where the retention time is shorter than the read pulse duration, and decreases toward negative sensing voltage. The reduction of the read disturbance with decreasing sensing voltage $V_\text{read}$ is correlated with the increase of the thermal stability (retention time $\tau$) due to the VCMA effect, as expressed by (1).
Fig. 9. (b) Critical path from the BL driver and the sense amplifier to a unit cell. To generate a ∼0.5 ns write pulse with 100 ps rise and fall times, the sizes of the driver and the MUX are optimized for a given bit line capacitive load.

The results show good agreement with simulations using a VCMA-MTJ macrospin compact model. Note that millisecond-range read pulses were chosen for this experiment to obtain measurable read disturbance within $10^5$ read repetitions. RDR results can be extrapolated to shorter read pulse durations $t_\text{read}$ using (1) for evaluating MeRAM read performance.
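Counting statistics explain the choice of millisecond read pulses. With N independent read trials, an RDR well below about 1/N cannot be resolved; if no switching event is observed, only the exact one-sided binomial upper bound $1-(1-c)^{1/N}$ (≈ 3/N at confidence c = 95%) can be stated. The helper `rdr_from_counts` below is a hypothetical sketch assuming independent trials.

```python
def rdr_from_counts(n_switch: int, n_reads: int, confidence: float = 0.95) -> tuple:
    """Return (point_estimate, upper_bound_if_zero) for a read-disturb measurement.

    point_estimate is the observed switching fraction k / N. When no switching
    is observed (k = 0), the exact one-sided binomial upper bound
    1 - (1 - c) ** (1 / N) is returned (about 3 / N for c = 0.95, the
    "rule of three"); otherwise the second element is None.
    """
    if n_reads <= 0 or not 0 <= n_switch <= n_reads:
        raise ValueError("need n_reads > 0 and 0 <= n_switch <= n_reads")
    point = n_switch / n_reads
    upper0 = 1 - (1 - confidence) ** (1 / n_reads) if n_switch == 0 else None
    return point, upper0


print(rdr_from_counts(0, 100_000))  # zero events in 1e5 reads
```

For the $10^5$ repetitions used here, zero observed events only bound the RDR below about $3\times10^{-5}$, far above the levels of interest for a memory macro, so short-pulse RDR values must come from extrapolation with (1) rather than from direct counting.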
III. 256 Kbit MeRAM WRITE ERROR RATE AND READ DISTURBANCE
This section presents write and read error rates for a 256 Kbit MeRAM full memory block (macro) in a 28 nm CMOS process. Fig. 9(a) shows the 256 Kbit MeRAM architecture with 16-bit input/output (IO), consisting of the main cell array, multiplexer (MUX), bit line (BL) drivers, word line (WL) drivers, and digital controller. The array performance was simulated using a 28 nm CMOS process design kit (PDK) combined with the measured device-level data presented in this work. Applying a well-defined write pulse to the BL or WL is critical in MeRAM chip design. To reduce the WER, the control (enable) signal of the BL driver is adjusted with 20 ps resolution based on an external digital code, which in turn sets the write pulse width on the BL. Furthermore, the BL driver and the MUX are sized to achieve a sufficiently fast slew rate (rise and fall time) for a given BL capacitive load, as shown in Fig. 9(b). The total write access time is the sum of the actual write pulse width, the delay of the peripheral circuits (e.g., the digital controller), and any pre-read and write verification steps (see Fig. 1). For the read operation, the SLS scheme is realized by connecting the selected source line (SL) to the sense amplifier via the MUX for a given address. The read time can be evaluated as the sum of the array delay needed to develop a sufficient read margin and the delay of the peripheral circuits, as described in [Azizi-Mazreah 2008]. The evaluation using the measured device parameters of Section II (RA = 225 Ω·μm² and TMR = 116%) yields a 2 ns read time in a 256 × 1024 block. Fig. 10(a) shows the dependence of the total write error rate on write voltage and total write time for the 256 Kbit MeRAM, based on the measured single-pulse write error rates in Fig. 4. A write error floor of $10^{-9}$ is obtained with a 10 ns total write time at 2.1 V. For a total write time of 20 ns, a WER below $10^{-17}$ is achieved.
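The link between single-pulse WER and total write time can be sketched with a simple model. Assuming independent write attempts, the residual error after n write-verify iterations is $p^n$ for a single-pulse WER $p$, and the worst-case write time is one pre-read plus n write-plus-verify iterations. This is a hypothetical illustration, not the simulation behind Fig. 10(a); the timing values are user-supplied inputs, and the function names are hypothetical.

```python
def attempts_needed(p_single: float, wer_target: float) -> int:
    """Smallest n with p_single**n <= wer_target, assuming independent attempts."""
    if not 0.0 < p_single < 1.0 or wer_target <= 0.0:
        raise ValueError("need 0 < p_single < 1 and wer_target > 0")
    n = 1
    while p_single ** n > wer_target:
        n += 1
    return n


def total_write_time_ns(n_attempts: int, t_pre_ns: float, t_write_ns: float,
                        t_verify_ns: float, t_overhead_ns: float = 0.0) -> float:
    """Worst-case time: one pre-read, n write+verify iterations, plus overhead."""
    return t_pre_ns + n_attempts * (t_write_ns + t_verify_ns) + t_overhead_ns


for target in (1e-9, 1e-17):
    print(target, attempts_needed(6e-3, target))
```

With the measured single-pulse bound of $6\times10^{-3}$ at 2.1 V, five attempts suffice for a total WER of $10^{-9}$ and eight for $10^{-17}$; the corresponding write time then follows from the per-iteration timing of the macro.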
The read disturbance rates are also evaluated based on the measured thermal stability parameters in Fig. 7. The results are shown in Fig. 10(b), where (1) has been evaluated for a 2 ns read time. A read disturbance floor of $10^{-16}$ is achieved for a 0.2 V reverse sensing voltage, and the RDR decreases further as the reverse sensing voltage on the source line is increased. This demonstrates that MeRAM using the SLS scheme can ensure ultra-low read disturbance in memory macros.
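Equation (1) is not reproduced in this text. As a hedged sketch, the RDR can be modeled with the standard Néel-Arrhenius form $\mathrm{RDR} = 1 - \exp(-t_\text{read}/\tau)$, with $\tau = \tau_0 \exp(\Delta(V))$ and a linear VCMA dependence $\Delta(V) = \Delta_0 (1 - V/V_c)$, which is consistent with the definition of $V_c$ in Section IV. The attempt time $\tau_0 = 1$ ns, the linear form, and the function names are assumptions, not the paper's fitted model; positive V denotes the write polarity.

```python
import math


def thermal_stability(v: float, delta0: float, v_c: float) -> float:
    """Delta(V) = Delta0 * (1 - V / V_c), assuming a linear VCMA barrier change.

    Positive V has the write polarity (lowers the barrier); negative V is the
    reverse-polarity read used by the SLS scheme (raises the barrier).
    """
    return delta0 * (1.0 - v / v_c)


def read_disturb_rate(t_read_s: float, v_read: float, delta0: float,
                      v_c: float, tau0_s: float = 1e-9) -> float:
    """RDR = 1 - exp(-t_read / tau), with tau = tau0 * exp(Delta(V))."""
    tau = tau0_s * math.exp(thermal_stability(v_read, delta0, v_c))
    # expm1 keeps precision when t_read / tau is far below machine epsilon.
    return -math.expm1(-t_read_s / tau)


# Illustrative inputs only: Delta0 and V_c here are assumed values.
print(read_disturb_rate(2e-9, 0.0, delta0=34.5, v_c=2.0))
print(read_disturb_rate(2e-9, -0.2, delta0=34.5, v_c=2.0))
```

`math.expm1` is used because $t_\text{read}/\tau$ can be far below machine epsilon, where `1 - math.exp(-x)` would round to zero. In the small-RDR limit, a reverse read at $-|V|$ lowers the RDR by a factor of about $\exp(\Delta_0 |V|/V_c)$ relative to zero bias.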
IV. DISCUSSION AND MATERIALS DEVELOPMENT FOR IMPROVED PERFORMANCE
Lower write voltage, together with lower WER and write power consumption, can be achieved by reducing the precessional critical voltage $V_c$. The critical voltage, defined as the voltage required to eliminate the energy barrier $E_b$, is given by $V_c = \dfrac{d\,k_B T}{\xi A}\,\Delta(V=0)$, where $d$ is the thickness of the MgO layer, $A$ is the bit area, $\xi$ is the magnetoelectric VCMA coefficient, and $\Delta(V=0)$ is the zero-bias thermal stability. Reducing $V_c$ while maintaining constant thermal stability, bit size, and barrier thickness therefore calls for an enhanced VCMA coefficient. We have developed different material stacks that increase the VCMA coefficient while improving thermal tolerance (see Fig. 11) for compatibility with advanced logic processes. MTJ-C shows $\xi$ = 76 fJ/(V·m), together with thermal tolerance at 430 °C. For comparison, the devices used in this study ($\Delta$ = 34.5, bit size = 50 nm) show $\xi$ = 31 fJ/(V·m). Because $V_c$ scales as $1/\xi$ at fixed $d$, $A$, and $\Delta$, these results indicate that the write voltage could be reduced from 2.1 V to about 0.85 V (≈ 2.1 V × 31/76) with these improved MTJ stacks.
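The expected voltage reduction follows from the $V_c$ expression above. The sketch evaluates $V_c = d\,k_B T\,\Delta(V=0)/(\xi A)$ in SI units, together with the $1/\xi$ scaling at fixed $d$, $A$, $T$, and $\Delta$. It assumes the operating write voltage scales in proportion to $V_c$; since $d$ is not given here, any thickness passed as `d_m` is a user-supplied assumption, and the function names are hypothetical.

```python
K_B = 1.380649e-23  # Boltzmann constant, J/K


def critical_voltage(d_m: float, delta0: float, xi_j_per_vm: float,
                     area_m2: float, temperature_k: float = 300.0) -> float:
    """V_c = d * k_B * T * Delta(V=0) / (xi * A), all quantities in SI units.

    xi in J/(V*m): 31 fJ/(V*m) is 31e-15. d_m is the MgO thickness in metres.
    """
    return d_m * K_B * temperature_k * delta0 / (xi_j_per_vm * area_m2)


def scaled_write_voltage(v_old: float, xi_old: float, xi_new: float) -> float:
    """Scale a write voltage as 1/xi, assuming it tracks V_c with d, A, T, Delta fixed."""
    return v_old * xi_old / xi_new


print(scaled_write_voltage(2.1, 31.0, 76.0))  # 31 -> 76 fJ/(V*m)
```

The absolute $V_c$ returned by `critical_voltage` depends on the assumed $d$, but the ratio does not: 2.1 V × 31/76 ≈ 0.86 V, in line with the ∼0.85 V estimate above.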
