### MODELING AND DESIGN FOR ENERGY-EFFICIENT SPINTRONIC LOGIC DEVICES AND CIRCUITS

A Dissertation Presented to The Academic Faculty

by

Rouhollah Mousavi Iraei

In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the School of Electrical and Computer Engineering

Georgia Institute of Technology

August 2018

### COPYRIGHT © 2018 BY ROUHOLLAH MOUSAVI IRAEI

# MODELING AND DESIGN FOR ENERGY-EFFICIENT SPINTRONIC LOGIC DEVICES AND CIRCUITS

Approved by:

Dr. Azad Naeemi, Advisor, School of Electrical and Computer Engineering, *Georgia Institute of Technology*

Dr. Phillip First, School of Physics, *Georgia Institute of Technology*

Dr. James Kenney, School of Electrical and Computer Engineering, *Georgia Institute of Technology*

Dr. Zhigang Jiang, School of Physics, *Georgia Institute of Technology*

Dr. Jeffrey Davis, School of Electrical and Computer Engineering, *Georgia Institute of Technology*

Date Approved: May 3, 2018

For my family and friends

#### ACKNOWLEDGEMENTS

I would like to thank Dr. A. Naeemi, my advisor, for giving me the opportunity to join his group and pursue research in the field of nanoelectronics and spintronics. His guidance helped me develop into the researcher and engineer I am today. Moreover, I enjoyed exploring new science under the supervision of two knowledgeable and intelligent physicists, Dr. Z. Jiang and Dr. P. First, who taught me how to be a scientist and a researcher. Also, I would like to express my gratitude to Dr. J. Kenney and Dr. B. Ferri for their support and mentoring, which helped me develop my teaching skills. Moreover, I would like to thank Dr. R. Poproski, Dr. K. Williams, and Dr. D. Lawrence at the Center for Teaching and Learning (CTL) at Georgia Tech, who taught me how to be a good mentor and instructor.

My special thanks go to Dr. I. Young, Dr. D. Nikonov, and S. Manipatruni at Intel Components Research for their insightful guidance and discussions. In addition, I would like to express my gratitude to Dr. E. Afshari, Dr. H. Aghasi, and J. Heron from the University of Michigan for their collaboration and support. Moreover, I enjoyed mentoring five brilliant undergraduate students (W. Scott, J. Geoffrey, B. Heard, V. Tanguturi, Y. Zheng) at the School of Electrical and Computer Engineering of Georgia Tech, and I would like to thank them for their hard work and accomplishments. Furthermore, I would like to thank my previous and current group members at the Nanoelectronic Research Lab of Georgia Tech for their support.

Finally, I would like to thank my family and friends for their understanding, patience, and all the love they have given me during all these years.

### **TABLE OF CONTENTS**

| ACKNOWLEDGEMENTS                                                            | iv          |
|-----------------------------------------------------------------------------|-------------|
| LIST OF TABLES                                                              | viii        |
| LIST OF FIGURES                                                             | ix          |
| I. Spintronic Devices: Applications and Challenges                          | 1           |
| 1.1 Motivations                                                             | 1           |
| 1.1.1 Boolean Logic Applications                                            | 1           |
| 1.1.2 CMOS-Spintronic Transducers, Spintronic Interconnects, and Me… Applications | 4 |
| 1.1.3 Non-Boolean Logic Applications                                        | 5           |
| 1.2 The Operation of Spintronic Devices                                     | 5           |
| 1.2.1 Spin Current: Generation and Transport                                | 5           |
| 1.2.2 Magnetization Switching                                               | 8           |
| 1.3 Thesis Overview                                                         | 11          |
| II. The All-Spin Logic Device, Its Performance Analysis and Adder and Coupled Oscillator Implementation | 17 |
| 2.1 All-Spin Logic Device: Applications and Challenges                      | 17          |
| 2.2 Modeling and Benchmarking of All-Spin Logic                             | 20          |
| 2.2.1 The Operation of All-Spin Logic                                       | 20          |
| 2.2.2 The Modelling of the Thermal Noise of Magnets                         | 22          |
| 2.2.3 Size Effects                                                          | 24          |
| 2.3 Applications                                                            | 29          |
| 2.3.1 ASL Adders                                                            | 29          |
| 2.3.2 ASL Oscillators and Coupled-Oscillators                               | 31          |
| 2.4 Conclusions                                                             | 37          |
| III. Image Recognition Circuit Using All-Spin Logic Devices                 | 39          |
| 3.1 Non-Boolean Applications of All-Spin Logic Device                       | 39          |
| 3.2 All-Spin Logic Majority Gate                                            | 41          |
| 3.3 Pattern Recognition Scheme                                              | 44          |
| 3.3.1 Mainly Similar Images                                                 | 44          |
| 3.3.2 Majority Training and Decision Making                                 | 46          |
| 3.4 Proposed Structure and Design Considerations                            | 46          |
| 3.4.1 Memory+Logic Comparator                                               | 47          |
| 3.4.2 Construction of the mean pixel                                        | 49          |
| 3.4.3 Single Pixel Comparator                                               | 50          |
| 3.4.4 Non-Boolean Row Decision-Maker                                        | 55          |
| 3.5 Simulation Results                                                      | 57          |
| 3.5.1 Non-Boolean Hamming Distance Identifier of $3 \times 3$ Pixel Pattern and Input Image | 57 |
| 3.5.2 Non-Boolean Similarity Comparison of a $9 \times 9$ Pixel Image and a Set of Three Pattern Images | 59 |
| 3.6 Conclusions                                                                 | 64       |
| IV. Electrical-Spin Transduction and Long-Range Spintronic Interconnects        | 66       |
| 4.1. Signal Transduction and Transfer for Spintronic and Magnetic Circuits      | 66       |
| 4.2. CMOS- to Spintronic-Signal Transduction                                    | 67       |
| 4.3. Spintronic- to CMOS-Signal Transduction                                    | 70       |
| 4.4. ASL Transducer for Long-Spintronic Interconnects                           | 74       |
| 4.4.1 Performance Analysis of ASL Repeaters                                     | 74       |
| 4.4.2 Proposed Long-Range Spintronic Interconnect                               | 82       |
| 4.5. Conclusions                                                                | 88       |
| V. Magnetostriction-Assisted All-Spin Logic Device                              | 90       |
| 5.1. Magnetostriction-Assisted All-Spin Logic (MA-ASL) Device Proposal          | 90       |
| 5.2. Device Operation                                                           | 92       |
| 5.3. 32-Bit MA-ASL ALU                                                          | 98       |
| 5.4. Clocked MA-ASL                                                             | 101      |
| 5.5 Material Analysis of MA-ASL Device                                          | 105      |
| 5.5. Conclusions                                                                | 108      |
| VI. Hybrid Piezoelectric-Magnetic Neurons: A Proposal for Energy-Efficient Machine Learning | 110 |
| 6.1 Spintronic Artificial Neural Networks                                       | 110      |
| 6.2 Spin Neuron Proposal                                                        | 112      |
| 6.2.1 Neuron Functionality                                                      | 112      |
| 6.2.2 The Transient Response of the Neuron                                      | 115      |
| 6.2.3 Integration into Neural Network                                           | 116      |
| 6.3 Benchmarking Against Competing Technologies                                 | 117      |
| 6.4 Conclusions                                                                 | 119      |
| VII. Magnetostriction-Assisted Spin-Orbit Device                                | 120      |
| 7.1.1 Motivations                                                               | 120      |
| 7.1.2 Rashba Effect                                                             | 122      |
| 7.1.3 Spin-Hall Effect                                                          | 124      |
| 7.2 Magnetostriction Assisted Spin-Orbit (MASO) Logic Device Proposal           |          |
| 7.2.1 Device Proposal                                                           | 127      |
| 7.2.2 The Modelling of the MASO Device                                          | 131      |
| 7.2.3 The Transient Response of the MASO Device                                 | 133      |
| 7.3 Optimizing the Performance of the Device                                    | 135      |
| 7.4 Performance Analysis of the Device                                          | 140      |
| 7.4.1 Using the Device as an Interconnect                                       | 140      |
| 7.4.2 Benchmarking the Performance of the MASO Device Against CMOS and Spintronic Alternatives | 141 |
| 7.5 Conclusions                                                                 | 143      |
| VIII. Conclusion and Outlook                                                    | 145      |

| 8.1 Conclusion                                            | 145 |
|-----------------------------------------------------------|-----|
| 8.2 Future Works                                          | 149 |
| 8.2.1 Non-Boolean Logic Application of Spintronic Devices | 149 |
| 8.2.2 Signal Transduction and Long-Range Interconnects    | 151 |
| 8.2.3 The MASO Device                                     | 152 |
| REFERENCES                                                | 154 |

### LIST OF TABLES

| Table 1: Performance comparison of the ASL ring oscillator with CMOS oscillators | 33 |
|---|---|
| Table 2: Performance Comparison with Existing CMOS Systems | 64 |
| Table 3: Simulation Parameters for the long-range spintronic interconnect. | 81 |
| Table 4: Simulation Parameter for the MA-ASL Device. | 95 |
| Table 5: Performance Comparison of MA-ASL Neuron against its CMOS and Spintronic Counterparts. | 118 |
| Table 6: The spin to charge conversion efficiency of various materials are compared by comparing $\lambda_{IREE}$ and $\Theta_{SHE}\lambda_{sf}$ | |
| Table 7: Comparison of the transferred strain to the magnet and spin-hall angle for Pt, Ta, and W. | 137 |
| Table 8: Resistivity and $\Theta_{SHE}$ for various heavy metallic elements, topological insulators (TIs), magnets, and nonmagnetic metals | |

## LIST OF FIGURES

| Figure 1: To improve the performance of FET transistors, various technologies are investigated over the past five decades                                                                                                                                                                                                                               |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Figure 2: Electronic spins, pseudo-spins in graphene, and excitons are some of the novel state variables studied for beyond-CMOS logic devices                                                                                                                                                                                                          |
| Figure 3: Several spintronic devices are proposed, such as the all spin logic (ASL) device, the composite-input magnetoelectric-based logic technology (COMET), the domain wall magnetic logic (mLogic), and the magnetoelectric spin-orbit device                                                                                                      |
| Figure 4: Illustration of spin-dependent Hall effects used in spin current generation and detection. In the SHE, an unpolarized charge current generates a transverse spin current, while in the ISHE, a spin current generates a transverse charge current. In the anomalous Hall effect (AHE), a charge current generates a transverse charge current |
| Figure 5: Binary information is stored as the magnetization orientation along the two stable directions along the easy axis of the magnet                                                                                                                                                                                                               |
| Figure 6: Spin-transfer torque explained in a spin valve                                                                                                                                                                                                                                                                                                |
| Figure 7: (a) 90° magnetostrictive switching is experimentally demonstrated. (b) Compared to 180° magnetization reversal via applying STT, 90° switching of a magnet via STT from the saddle-point of energy profile is demonstrated to be two orders of magnitude more delay and energy efficient |
| Figure 8: To model the physics of the all-spin logic device, magnetization dynamics, spin mixing conductance, and spin-drift diffusion in the channel are taken into account [81]. 18 |
| Figure 9: ASL circuit model 19                                                                                                                                                                                                                                                                                                                          |
| Figure 10: ASL consists of two magnets connected by a non-magnetic channel. Injected spin current from Input magnet to the channel diffuses along the channel and applies a torque to the output magnet, which if strong enough, switches the output magnet                                                                                             |
| Figure 11: Transient Response of an all-spin logic device. In this simulation, the input magnet is assumed to be oriented in the $+X$ direction. By applying a negative voltage, the device acts as a buffer, while by applying a positive voltage, the device acts as an inverter. 22                                                                  |
| Figure 12: LLG equation describes the magnetization dynamics. The corresponding vectors to the terms of LLG equation are represented in this figure                                                                                                                                                                                                     |

| Figure 13: To validate the SPICE model of the thermal noise, the time-averaged value of the thermal noise derived from the SPICE model is compared to that of the analytical solution. The SPICE results match the analytical results |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Figure 14: Delay and energy dissipation vs channel length of an ASL gate                                                                                                                                                                                                                                                                                                                                                              |
| Figure 15: Spin Relaxation versus channel Width. Size effects cause the spin relaxation length to decrease with decreasing channel width. For the no size effect case, spin relaxation length is independent of channel width                                                                                                                                                                                                         |
| Figure 16: (a) Delay versus specularity parameter, P, for an 80 nm long channel. Grain boundary scattering parameter, R, is assumed to be 0.2. (b) Delay versus grain boundary reflection probability for an 80 nm long channel. The specularity parameter, P, is assumed to be 0 |
| Figure 17: Delay versus channel width for 80 nm and 400 nm long channel                                                                                                                                                                                                                                                                                                                                                               |
| Figure 18: Implementation of a full adder using two majority-not gates                                                                                                                                                                                                                                                                                                                                                                |
| Figure 19: (a) Schematics and (b) layout of the ASL full adder. (c) The transient response of an ASL full-adder. The blue color represents the magnetization orientation in the x-direction and the green and red colors are representing magnetization in the y and z directions                                                                                                                                                     |
| Figure 20: Oscillation frequency versus supply voltage. The oscillation frequency increases linearly with increasing the supply voltage                                                                                                                                                                                                                                                                                               |
| Figure 21: (a) Two metallic channels are connecting two ASL ring oscillators to form the ASL coupled oscillator. The oscillation of the two rings will be coupled to each other if the oscillation frequencies of two oscillation loops are close to each other                                                                                                                                                                       |
| Figure 22: The voltage applied to the magnets of the ring 1 is 35 mV. In (a), the voltage applied to the magnets of the ring 2 is 10 mV higher than the voltage applied to the ring 1, while in (b), the voltage applied to the ring 2 is 20 mV higher than the voltage applied to the ring 1                                                                                                                                         |
| Figure 23: (a) The supply voltage applied to the ASL connector 2 is changed from 12.5 mV to 20 mV. As a result, the phase shift of the two rings is changed from $\Delta\phi_1$ to $\Delta\phi_2$ in which $\Delta\phi_1 > \Delta\phi_2$ . (b) Locking range of the ASL coupled oscillator. The shaded region shows the region where the supply voltages of rings are different, but the two rings will show a locked oscillation. 36 |
| Figure 24: An ASL Majority gate with three inputs. The three input magnets, M1, M2 and M3 are connected to the output magnet, MO, using three metallic channels                                                                                                                                                                                                                                                                       |
| Figure 25: (a) Switching transient response for different scenarios of input magnetization in a majority gate with 5 inputs. (b) Switching transition comparison of majority gates with three and five inputs. In this comparison, the input magnetization of magnets of three input gates are the same. For the gate with five inputs, four inputs have similar magnetization and the net spin current is equal to the other gate. The applied voltage on magnets in these simulations is $-5$ mV. |
| Figure 26: Switching delay variation versus the supply voltage. Each voltage is simulated three times to verify the results                                                                                                                                                                                                                                                                |
| Figure 27: The two images are mainly similar (along the rows), however, the Hamming distance between the third columns is 3 which does not imply a similarity along the columns                                                                                                                                                                                                            |
| Figure 28: 1-bit full adder used as XNOR. In the 2D implementation of this work, X and Y wires are in-plane metal wires and connections along the Z axis are vias                                                                                                                                                                                                                          |
| Figure 29: Simulated output waveforms of XNOR gate 49                                                                                                                                                                                                                                                                                                                                      |
| Figure 30: (a) Standard single pixel detector schematic. (b) The truth table with the detailed operation of the circuit                                                                                                                                                                                                                                                                    |
| Figure 31: (a) Comparator-first pixel detector schematic. (b) The truth table with the detailed operation of the circuit                                                                                                                                                                                                                                                                   |
| Figure 32: Structure of the unit smart detector cell                                                                                                                                                                                                                                                                                                                                       |
| Figure 33: Using a single smart detector cell, we can compare these $3 \times 3$ pixel images. The waveforms of the comparators and majority gates (bottom) |
| Figure 34: Training set for the $9 \times 9$ pixel images                                                                                                                                                                                                                                                                                                                                  |
| Figure 35: The input image (left) and the representation of the mean image (right). The mean image is not a direct output of the circuit                                                                                                                                                                                                                                                   |
| Figure 36: Due to fan-in considerations, the circuit consists of 9 smart detector cells. The corresponding breakdowns of the mean image and the input image are shown here. 62 |
| Figure 37: The switching delay of output magnetization in last stage represents the similarity of input data and pattern data                                                                                                                                                                                                                                                              |
| Figure 38: For the cases of mismatch in clusters, magnets will not switch and the initial magnetization does not change. The y-axis shows a range from -1.002 to -0.998 in contrast with Figure 37 in which the range is from -1 to 1                                                                                                                                                      |
| Figure 39: Electrical signal to spin signal transducer: (a) schematics of the transducer. (b) Input signal ($V_{DATA}$), which switches the polarity of the voltage applied to the fixed magnet, which switches the output magnet accordingly. The orientation of the output magnet follows the input signal with a delay of 1.6 ns and 2.0 ns for high-to-low and low-to-high switching |

| Figure 40: Spin signal to electrical signal transducer: (a) schematics of the transducer. (b) The changes in the orientation of the free magnet are translated into changes in electrical voltage on the node $V_N$. The inverter provides a full swing between the ground and supply voltages accordingly. Simulations are done for two TMR values of 131% (low TMR) and 355% (high TMR). 71 |
|---|
| Figure 41: (a) Layout of an ASL gate that transfers spin signals through a metallic interconnect, (b) layout of ASL gates in a cascaded structure, which is a solution to transfer spin signals in long interconnects. The meaning of colors is defined in [33]. 73 |
| Figure 42: ASL gate modeled by a star network of resistors for calculating the electrical current passing through the input magnet. 74 |
| Figure 43: Delay of ASL repeaters is compared to that of electrical communication of spin information through transduction. For long lengths, the delay of ASL gates increases exponentially with length as predicted by the analytical equation. Meanwhile, for short lengths, the delay increases linearly because the linear terms of the Taylor expansion of delay are dominant. Similarly, the delay of ASL repeaters increases linearly with $L_{Int}$ for short lengths and exponentially for long lengths. Although with multiple ASLs, the linear region is extended, the delay of the electrical interconnect is still shorter than that |
| Figure 44: Proposed long spintronic interconnect. (a) First, the spin signals are converted to electrical signals using a spin to CMOS signal transducer (SCT); then, the electrical signal is transmitted through a long electrical interconnect and converted back to spin signals using a CMOS to spin signal transducer (CST). (b) The magnetization orientation of the output magnet is the inverse of the magnetization orientation of the input magnet with a delay of 1.6 ns. (c) The layout consists of two transducers with |
| Figure 45: Clocking schemes used to minimize the energy dissipation of the ASL repeaters. Clocks are on $\alpha T$ before and after the mean switching time to cancel the |
| Figure 46: Energy dissipation per unit length of the ASL repeaters is compared to that of the proposed spintronic interconnect. The dissipated energy of the proposed interconnect is lower than that of repeaters even for $L_{Int}$ as small as 1.25 µm. Although the energy dissipation of repeaters increases as the number of cascaded ASLs for short interconnects increases, the repeaters with more cascaded ASLs dissipate lower power for long |
| Figure 47: Area-delay-power product (ADPP) is a measure that takes delay, power dissipation, and area into account. Although the proposed interconnect has larger area overhead, its advantage in terms of energy enables it to outperform ASL repeaters for |

| Figure 56: (a) Definition of pulse skew. (b) The required pulse width to reach an error rate of $10^{-3}$ for various pulse skews in an MA-ASL inverter. Positive pulse skew has more impact on increasing pulse width than negative pulse skew. Error rate vs pulse width for various (c) temperatures and (d) rise times. (e) The pulse width and energy to reach an error rate of $10^{-3}$ for various rise times. The pulse skew is assumed to be -5 ps in Figures (c), (d), and (e) |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Figure 57: (a) voltage and (b) energy required to transfer a net strain of 1200 ppm to the magnetic layer. Simulations are done for various magnetic materials, in which Terfenol-D demonstrated lowest required voltage and energy dissipation for transferring strain. Moreover, the figure compares the transferred strain for various piezoelectric thickness values. 106                                                                                                                                                                                                             |
| Figure 58: Transferred strain to the magnet vs the thickness of the Pt layer between the piezoelectric and magnetic layers                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| Figure 59: Analysis on the transferred strain vs geometrical dimensions of the piezoelectric layer. (a) definition of piezoelectric dimensions. (b) Transferred strain at a constant $V_{PIEZO}$ voltage of 100 mV. The required (c) voltage and (d) energy to transfer a net strain of 1200 ppm                                                                                                                                                                                                                                                                                          |
| Figure 60: One of the most popular applications of machine learning algorithms is image classification and facial recognition                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| Figure 61: (a) Transferred strain to the magnet is 800 ppm, lower than that to an MA-ASL magnet; however, the transferred strain is enough to rotate the magnetization. (b) shows the path of applied spin current through the output magnet to ground                                                                                                                                                                                                                                                                                                                                    |
| Figure 62: (a) Proposed MA-ASL neuron, shown with six inputs. The net spin current in the interconnect applies STT to the free layer of the neuron MTJ in timing with the piezoelectric clock, switching the orientation of the neuron output. (b) Biological neural network [164]                                                                                                                                                                                                                                                                                                        |
| Figure 63: Transient response of the MA-ASL device. In the first phase of operation, $V_{PIEZO}$ turns on for 1 ns as shown in the first graph. The second graph illustrates the second phase of operation, in which STT is applied to the output magnet through the injected net spin current (in blue) from three input magnets (shown with dotted lines), applied after $V_{PIEZO}$ turns off. The third graph shows the magnetization of the output magnet (x, y, and z axes shown in blue, red, and green, respectively) and how it is affected by $V_{PIEZO}$ and the spin currents |
| Figure 64: Memristive cross-bar network. The cross-bar array sums together the input currents, abbreviating the number of magnets needed for the output neurons                                                                                                                                                                                                                                                                                                                                                                                                                           |
| Figure 65: Spin-orbit interactions are very strong in heavy metallic elements 121                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| Figure 66: A Rashba interface comprised of a hybrid NiFe/Ag/Bi structure 122                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |

Figure 72: First, the orientation of Magnet 2 rotates by 90° from $m_i$ to $m_m$ using magnetostrictive switching; then, it reorients by 90° from $m_m$ to $m_f$ using SHE.

Figure 73: Transferred strain to the magnet is simulated using COMSOL, and the results are shown for the cross-section of the magnet.

Figure 76: Write error statistics of the MASO device vs. the ASL device, the MA-ASL device, and the STT-MRAM.

Figure 78: Energy versus delay per memory association operation using CNN for a variety of charge- and spin-based devices, where the red star indicates the preferred corner.

Figure 79: Bit-cell for STT-MRAM and SOT-MRAM.

## I. SPINTRONIC DEVICES: APPLICATIONS AND CHALLENGES

#### 1.1 Motivations

#### 1.1.1 Boolean Logic Applications

Over the past half-century, the computational throughput and the memory storage of integrated electronic circuits have improved exponentially, mainly through the downscaling of the geometrical dimensions of field-effect transistors (FETs) [1]. Sustaining this trend is becoming increasingly challenging as CMOS devices approach their scaling limits [2], [3]. To address this challenge, researchers have investigated various materials, including high-k dielectrics such as hafnium silicate ( $HfO_4Si$ ), hafnium dioxide ( $HfO_2$ ), zirconium silicate ( $ZrO_4Si$ ), and zirconium dioxide ( $ZrO_2$ ) [4]–[17], and have developed various FET technologies such as the fin field-effect transistor (FinFET) [18]–[20] (Figure 1). Moreover, researchers are investigating beyond-CMOS devices that use state variables other than the electric charge of electrons, such as the spin of electrons, pseudo-spins, and excitons [1], [3], [21]–[23] (Figure 2). By employing electronic spin as the binary state variable, spintronic devices have gained special attention thanks to their potential advantages in terms of non-volatility and low operating voltage [24]–[26]. Several spin-based logic devices have been proposed, including the all spin logic (ASL) device [27], the composite-input magnetoelectric-based logic technology (COMET) [28], the domain wall magnetic logic (mLogic) [29], the magnetoelectric spin-orbit device [30], and the magnetoelectric magnetic tunnel junction (MEMTJ) [31]. Moreover, the energy efficiency, computational speed, and chip area of these logic devices have been studied [32]–[39]. Spintronic logic devices excel at implementing logic functions with fewer devices because of their efficient implementation of majority gates. However, compared to their CMOS counterparts, these devices are slower and less energy efficient because of inefficiencies in magnetization switching and in spin current generation and detection. Thus, more research is needed to develop novel spintronic devices with enhanced energy efficiency and operational speed that take advantage of non-volatility and offer new and enhanced applications.



Figure 1: To improve the performance of FETs, various technologies have been investigated over the past five decades.



Figure 2: Electronic spins, pseudo-spins in graphene, and excitons are some of the novel state variables studied for beyond-CMOS logic devices [40], [41].



Figure 3: Several spintronic devices have been proposed, such as the all spin logic (ASL) device [27], the composite-input magnetoelectric-based logic technology (COMET) [28], the domain wall magnetic logic (mLogic) [29], and the magnetoelectric spin-orbit device [30].

#### 1.1.2 CMOS-Spintronic Transducers, Spintronic Interconnects, and Memory Applications

Like CMOS logic devices, CMOS-based dynamic random access memories (DRAMs) face limitations in maintaining a significant growth rate [42]. As these devices are scaled down, their power consumption increases because of the increase in charge leakage. To lower energy consumption, non-volatile memories that do not consume static power are studied. Because they offer non-volatile data storage, magnetic random-access memories (MRAMs) are widely studied as replacements for purely semiconductor-based memory technologies. Moreover, spin-transfer torque MRAMs (STT-MRAMs) [43]–[46] are used in embedded memories. Thus, enhancing the performance of hybrid systems of CMOS devices and magnetic memories requires energy-efficient and fast CMOS-spintronic interface circuits that can write binary CMOS data into magnets and read the binary data stored as the magnetization orientation. Energy-efficient interface circuits are also required to improve the performance of hybrid CMOS-spintronic logic circuits. Furthermore, interface circuits might improve the performance of large spintronic logic circuits by providing a more energy-efficient long-range interconnect scheme that converts spin signals into electrical signals, transfers them through electrical interconnects, and converts them back into spin signals. Such interconnects benefit from the conservation of electrical charge, unlike spintronic interconnects [47], [48], which lose data because of spin relaxation. Researchers have examined CMOS-spintronic interface circuits and proposed read and write circuits for STT-MRAMs as well as sense amplifiers to read MRAMs, but these circuits are mostly suitable when a large, complicated circuit reads many magnetic tunnel junctions (MTJs). Thus, more studies are needed for signal transduction cases in which using sense amplifiers causes prohibitive energy and area overhead.

#### 1.1.3 Non-Boolean Logic Applications

In addition to Boolean logic, memory, and interconnection applications [49]–[52], spintronic devices have been studied for applications such as non-Boolean computing, machine learning circuits, image recognition [53], [54], and cellular neural networks (CNNs) [55], and they have shown lower energy consumption and simpler implementations compared to their CMOS counterparts. Furthermore, because they are non-volatile, spintronic devices do not require additional memory circuits to store patterns in pattern recognition systems or synaptic weights for communicating neurons. Various spintronic neuron implementations have been proposed that use tunnel magnetoresistance (TMR) in MTJs coupled with other spintronic phenomena such as domain-wall (DW) motion, STT, and the spin-Hall effect (SHE). However, the impact of advances in magnetic materials and spintronic devices on non-Boolean logic computations must still be studied by researchers. In the next section, the physical phenomena and formalisms governing the operation of spintronic devices are investigated.

#### 1.2 The Operation of Spintronic Devices

#### 1.2.1 Spin Current: Generation and Transport

As discussed in the previous sections, spintronic devices rely on the spin of electrons to represent binary information. In these devices, the current due to spin-polarized electrons is used to transfer information among magnets that store the binary information. Spin generation in magnetic metals is due to the different mobilities and the density of states at the Fermi level for spin-up and spin-down electrons. The degree of spin polarization (DSP) varies among magnetic materials. The polarization, P, can be defined as

$$P = \frac{N_{\uparrow} - N_{\downarrow}}{N_{\uparrow} + N_{\downarrow}},\tag{1}$$

in which  $N_{\uparrow(\downarrow)}$  is the density of states (DOS) of electrons at the Fermi level [56],

$$N_i = \frac{1}{(2\pi)^3} \Sigma_{\alpha} \int \delta(E_{k\alpha i}) d^3 k = \frac{1}{(2\pi)^3} \Sigma_{\alpha} \int \frac{dS_F}{\mathbf{v}_{k\alpha i}},\tag{2}$$

in which  $E_{k\alpha i}$  is the energy of an electron with spin  $i(\uparrow \text{ or } \downarrow)$  and wave vector k in the band  $\alpha$  [56]. The current of spin-polarized electrons can be injected into non-magnetic metals as well. Spin current at paramagnetic-magnetic interfaces can be measured using schemes based on the Johnson-Silsbee experiment [57]. In nonmagnetic metals, spin accumulation decays exponentially with a characteristic length called the spin relaxation length  $\lambda_{sf}$ . The generation of spin-polarized current is not limited to magnetic materials. Because of spin-orbit interaction, scattering of unpolarized electrons by an unpolarized target yields a spatial separation of spin-up and spin-down electrons [58]. Thus, a net spin current is generated due to the spin Hall effect (SHE). Spin-orbit interactions are strong in heavy metallic elements, topological insulators, and 2D materials such as graphene.
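As a rough numerical illustration of Eq. (1) and of the exponential decay of spin accumulation in a nonmagnetic metal, the following sketch may be helpful. The function names and all numerical values are illustrative assumptions and are not data from this work; the decay is assumed to follow a simple $\exp(-x/\lambda_{sf})$ law.

```python
import math


def spin_polarization(n_up: float, n_down: float) -> float:
    """Eq. (1): P = (N_up - N_down) / (N_up + N_down)."""
    total = n_up + n_down
    if total <= 0:
        raise ValueError("total density of states must be positive")
    return (n_up - n_down) / total


def spin_accumulation(x: float, mu0: float, lambda_sf: float) -> float:
    """Spin accumulation at distance x from the injection point.

    Assumes a simple exponential decay mu0 * exp(-x / lambda_sf);
    x and lambda_sf must use the same length unit.
    """
    return mu0 * math.exp(-x / lambda_sf)


if __name__ == "__main__":
    # Hypothetical spin-resolved DOS values in arbitrary units.
    print("P =", spin_polarization(0.7, 0.3))
    # Hypothetical spin relaxation length of 400 nm.
    for x_nm in (100, 400, 1000):
        remaining = spin_accumulation(x_nm, 1.0, 400.0)
        print(f"x = {x_nm} nm: {remaining:.3f} of the injected accumulation")
```

Under these assumptions, the accumulation falls to about 37% of its injected value after one spin relaxation length, which is why channel length is a primary design constraint for spin interconnects.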



Figure 4: Illustration of spin-dependent Hall effects used in spin current generation and detection. In the SHE, an unpolarized charge current generates a transverse spin current, while in the ISHE, a spin current generates a transverse charge current. In the anomalous Hall effect (AHE), a charge current generates a transverse charge current [58].

Various spintronic devices use spin current to transfer signals in Al and Cu interconnects. Like charge current transport in CMOS devices, spin current transport is impacted by scattering and dimensional scaling in nanowires. In large CMOS circuits such as microprocessors, more than half of the dynamic power dissipation might occur in interconnects [59]. Thus, studying spin current transport in metallic interconnects is expected to be crucially important. To provide effective design tools and insights for electronics engineers, circuit and SPICE models must be developed that precisely account for the spin current transport in metallic nanowires.

#### 1.2.2 Magnetization Switching

#### 1.2.2.1 Spin-Transfer Torque and Spin-Orbit Torque Switching

In spintronic devices, magnets are widely used to store binary information. Magnets possess an easy axis, along which the energy of the magnet is minimized (Figure 5). Thus, free magnetic layers reorient themselves along either of the two opposing directions of the easy axis to minimize their energy. These two directions are the stable states of the magnet and can represent the binary logic values 0 and 1. To switch a magnet from one stable direction to the other, a spin torque must be applied. STT-MRAMs use STT for magnetization reversal. The STT generated in a spin valve is explained in Figure 6. Electrons pass through Ferromagnet 1, in which the spin of the electrons precesses in the exchange field of the magnet and aligns with the orientation of the magnet. The spin-polarized current is then injected into a non-magnetic spacer layer. Considering the relatively narrow width of the spacer layer, the spin of the electrons does not change in this region. As the electrons pass through Ferromagnet 2, their spin tends to align with the orientation of that magnet. Thus, a spin-transfer torque is applied to the magnet because of the conservation of angular momentum. Therefore, the spin torque can be explained as the net flux of non-equilibrium spin current passing through the magnet. If the STT is strong enough, it can fully switch a magnet by  $180^{\circ}$ .



Figure 5: Binary information is stored as the magnetization orientation along the two stable directions along the easy axis of the magnet.



Figure 6: Spin-transfer torque explained in a spin valve [60].

As explained in Subsection 1.2.1, spin current is generated in non-magnetic materials due to spin-orbit interactions. Like the spin current generated by the interaction of electrons with the exchange field in a magnetic material, the spin current generated by the spin-orbit interaction applies a torque to a magnetic layer, described as the spin-orbit torque (SOT). Recently, researchers have been widely studying the applications of SOT in the design of SOT-MRAMs [61]–[63] because a large write current does not have to pass through a tunnel junction; hence, the tunneling layer can last longer. Moreover, read and write lines can be separated, and the spin torque can be applied more efficiently. Thus, STT and SOT are active fields of research and are promising for novel spintronic logic and memory applications.

#### 1.2.2.2 Strain-Mediated Magnetization Switching

Magnetostrictive switching is an energy-efficient and experimentally demonstrated magnetization reorientation mechanism [64]. In this mechanism, a change in the magnetoelastic energy rotates the easy axis of the magnet; thus, the magnetization orientation rotates accordingly. Figure 7a shows an experimental setup for magnetostrictive switching. In this experiment, a hybrid structure of magnetic and piezoelectric layers is fabricated. By applying a voltage along the thickness of the piezoelectric layer, an anisotropic strain is generated along the y axis, which transfers to the Ni layer on top. The strain changes the energy profile of the magnetic layer, as Ni is a material with strong magnetostrictive properties. As the strain increases, the energy profile changes such that the  $\mathbf{y}$  axis becomes the easy axis of the magnet. Thus, the magnet reorients by  $90^{\circ}$  to align itself with the new easy axis. Furthermore, researchers have combined this mechanism with STT to fully switch a magnet. In such a scenario, the magnetization first reorients by  $90^{\circ}$  using magnetostrictive switching. Second, the voltage applied to the piezoelectric layer is turned off; thus, the easy axis rotates back to the  $\mathbf{x}$  direction. The magnet is therefore left at the saddle point of its energy profile and is equally likely to rotate to either of the two stable directions, +x and -x. Hence, by applying an STT, the magnet can be deterministically switched to one of the stable directions. Compared to  $180^{\circ}$  switching of a magnet using STT, switching a magnet from the saddle point of its energy profile is shown to be not only more robust to thermal noise but also two orders of magnitude more energy efficient [65]. Considering the efficiency and the robustness to thermal noise, modeling magnetostrictive switching is a promising area of research.
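The rotation of the easy axis described above can be illustrated with a simplified single-domain energy model. The sketch below is not the model developed in this work; it assumes an in-plane energy density $E(\theta) = (K_u - K_\sigma)\sin^2\theta$, in which $\theta$ is the magnetization angle measured from the x axis, $K_u$ is a uniaxial anisotropy constant that favors the x axis, and $K_\sigma$ is a hypothetical lumped stress-induced anisotropy constant that favors the y axis. All names and values are illustrative.

```python
import math


def energy_density(theta_deg: float, k_u: float, k_sigma: float) -> float:
    """In-plane energy density E = (K_u - K_sigma) * sin^2(theta).

    theta is measured from the x axis; k_u favors x, k_sigma favors y.
    Both constants are in the same (arbitrary) energy-density unit.
    """
    s = math.sin(math.radians(theta_deg))
    return (k_u - k_sigma) * s * s


def easy_axis_angle_deg(k_u: float, k_sigma: float) -> int:
    """Return the angle (0..180 degrees, 1-degree grid) that minimizes the energy."""
    return min(range(0, 181), key=lambda t: energy_density(t, k_u, k_sigma))


if __name__ == "__main__":
    k_u = 1.0  # hypothetical anisotropy constant, arbitrary units
    for k_sigma in (0.0, 0.5, 2.0):
        angle = easy_axis_angle_deg(k_u, k_sigma)
        print(f"K_sigma = {k_sigma}: minimum-energy angle = {angle} degrees")
```

In this simplified picture, the minimum stays on the x axis until $K_\sigma$ exceeds $K_u$, at which point it moves to $90^{\circ}$. When the stress is removed while the magnetization is still at $90^{\circ}$, that angle becomes an energy maximum with zero torque, which corresponds to the saddle point from which a small STT can select either stable direction.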



Figure 7: (a) 90° magnetostrictive switching is experimentally demonstrated [64]. (b) Compared to 180° magnetization reversal via applying STT, 90° switching of a magnet via STT from the saddle point of the energy profile is demonstrated to be two orders of magnitude more delay- and energy-efficient [65].

#### 1.3 Thesis Overview

The objective of this research is to model the physical formalisms of common materials and phenomena in spintronic and magnetic devices and circuits and to design novel spintronic devices for various applications such as interconnection, Boolean logic computation, non-Boolean computation, image/pattern recognition, neural networks, and interface circuits for reading and writing magnets. Various novel spintronic devices and circuits have been proposed in the past decade that employ STT and SOT as well as strain for magnetization switching; Al and Cu metallic interconnects for transferring spin current; MTJs for storing data; and SHE, ISHE, and IREE for converting charge currents to spin currents and vice versa. Therefore, there is a growing demand to investigate these physical phenomena and to design novel spintronic devices and circuits with enhanced functionalities and performance. Thus, the following tasks are undertaken in this research:

1. Designing circuit models for magnetization dynamics, thermal noise, and metallic interconnects widely used in the design of spintronic devices.
2. Analyzing the operation and performance of the all-spin logic device as a building block for Boolean logic and coupled-oscillator applications.
3. Designing spintronic pattern/image recognition circuits with non-volatile memory for storing patterns.
4. Studying and designing read and write CMOS-magnetic interface circuits using MTJs and the ASL device as well as their applications in long-range interconnection schemes.
5. Investigating strain-mediated magnetization switching and designing novel spintronic devices and circuits that work based on both magnetostrictive and STT switching for logic and neural network applications.
6. Investigating spin-orbit interactions and designing novel spintronic devices that work based on both magnetostrictive and SOT switching for logic applications.

A brief description of the above tasks is given below.

*Task I and Task II*: In Chapter II, based on the physical formalisms governing common magnetic and non-magnetic materials used in spintronic devices, models are developed that capture the magnetization dynamics and the impact of thermal noise on the switching of magnets as well as spin current transport in Al and Cu nanowires. Using these models, the operation of the ASL device is explained and simulated. Moreover, the impact of size effects on the operation of the ASL device is studied. Furthermore, the operation of ASL full adders, as building blocks for more complicated Boolean logic gates, is studied. Finally, an ASL coupled oscillator is proposed, and its tuning range is studied.

*Task III*: In Chapter III, a novel circuit for non-Boolean recognition of binary images is proposed. Employing all-spin logic (ASL) devices, logic comparators and non-Boolean decision blocks for compact and efficient computation are proposed. Furthermore, the extension of the work to larger training sets or larger images through the manipulation of the fan-in number in different stages of the circuit is studied. Finally, the proposed circuit is compared with existing CMOS pattern recognition circuits in terms of footprint, power consumption, decision time, and operational voltage.

*Task IV*: In Chapter IV, first, an electrical-to-spin signal transducer is proposed. The proposed circuit can be used to write binary information into magnetic memories using STT. Then, a simple yet efficient circuit for converting the orientation of a magnet to a CMOS binary voltage is proposed, which provides an energy-efficient and fast interface circuit to read magnetic memories. Using the proposed transducers, a long-range spintronic interconnect is proposed that works based on converting spin signals into electrical signals, transferring signals in electrical interconnects, and finally converting the signals back into spin signals. Moreover, an analytical study of the delay, the area-delay-power product (ADPP), and the energy per bit per unit length of spintronic interconnects with ASL repeaters is presented. Finally, the two methods, spintronic repeaters and the electrical transmission of spin signals using the proposed interconnect, are benchmarked in terms of delay, energy dissipation, and area-delay-power product.

*Task V*: Magnetostrictive switching combined with STT has resulted in faster operational speed, higher energy efficiency, and more robustness to thermal noise in magnetization reversal. In Chapter V, the physical formalism of magnetostrictive switching is investigated and modeled. Moreover, by combining this switching mechanism with STT, a novel spintronic device is proposed, named the magnetostriction-assisted all-spin logic (MA-ASL) device. The device consists of a heterostructure of magnetostrictive and piezoelectric layers. The operation of the device is modeled and simulated using the developed SPICE models. Moreover, the strain transferred to the structure is simulated to ensure the correct functionality of the device. Furthermore, the impact of the pulse skew and the rise time on the operation of the device is studied, and design recommendations to counter these impacts are provided. Using the developed models and simulations, the energy, the error rate, and the delay performance of the device are studied. Moreover, to enhance the performance of the device, a material analysis is done to identify the best candidate materials to implement the MA-ASL device. The magnetic materials must exhibit strong magnetostrictive properties and low resistivity.

In this task, the applications of the MA-ASL device are further investigated for the implementation of Boolean logic and neural network circuits. In Chapter V, the performance of a 32-bit MA-ASL arithmetic-logic unit (ALU), as a large Boolean logic circuit, is studied. The proposed ALU is compared to both spintronic and CMOS ALUs. Moreover, in Chapter VI, the applications of the MA-ASL device in neural network circuits are investigated. An MA-ASL neuron is proposed, which consists of an MA-ASL majority gate and an MTJ. The performance of the proposed neuron is studied and benchmarked against its CMOS and spintronic counterparts in terms of energy dissipation, operational speed, and thermal noise.

*Task VI*: In Chapter VII, spin-orbit interactions, SHE, and ISHE are investigated. Moreover, the Rashba effect and the inverse Rashba-Edelstein effect (IREE) in 2D materials and topological insulators are studied. Furthermore, these mechanisms are modeled using circuit models. In addition, by combining SHE, ISHE, IREE, and magnetostrictive switching, a novel device is proposed, named the magnetostriction-assisted spin-orbit (MASO) device. Unlike the ASL and the MA-ASL devices, the MASO device uses the charge current instead of the spin current to transfer data from the input magnet to the output magnet. The operation of the device is modeled and simulated using SPICE models. To optimize the performance of the device, the energy dissipation, the switching speed, and the robustness to thermal noise are studied. Moreover, a materials analysis is performed to find promising magnetic and heavy metallic materials as well as topological insulators for the implementation of the device. Using the findings of this analysis, the performance of the device in the implementation of ALUs is studied and benchmarked against its spintronic and CMOS counterparts. The findings of this benchmarking help to understand the potential applications of the MASO device in the implementation of large Boolean logic circuits.

The circuit models developed for the magnetization dynamics, tunnel junctions, and spin-orbit interactions as well as the novel proposed spintronic devices, neurons, image/pattern recognition circuits, spintronic interconnection schemes, and CMOS-spintronic interface circuits will serve to guide future research in the field of novel beyond-CMOS devices, memories, and circuits by examining the potential and the challenges of spintronic and magnetic devices and circuits.

# II. THE ALL-SPIN LOGIC DEVICE, ITS PERFORMANCE ANALYSIS AND ADDER AND COUPLED OSCILLATOR IMPLEMENTATION

#### 2.1 All-Spin Logic Device: Applications and Challenges

The ASL device was proposed as a building block for various spintronic devices and circuits [27]. Thus, the energy and the delay of the ASL device and ASL-based circuits have been widely studied [26], [27], [49]–[51], [66]–[76]. The ASL device consists of two magnets connected via a channel in a non-local spin valve structure. Improving the performance of the device relies on efficient spin current transport throughout the device, spin current injection at the magnetic/non-magnetic interface, and magnetization switching, as shown in Figure 8. Thus, to optimize the performance of the ASL device, the delay and the energy of the device are studied for various geometrical dimensions, supply voltage values, and channel materials. To account for the spin current transport in the metallic channel, size effects, i.e., surface and grain boundary scattering, and dimensional scaling, e.g., variations of the length and the width of the channel, must be studied [75]. Spin relaxation caused by excess scattering at the grain boundaries and surfaces of metallic channels is dominated by the Elliott-Yafet (EY) mechanism [77]: when electrons scatter to a new state in the conduction band, there is a probability that they couple to a different spin state because electron states are not pure spin states. Therefore, the spin relaxation rate becomes proportional to the scattering rate [77]. Moreover, the loss of spin information depends on the spin relaxation length of the channel. Spin signals decay exponentially in channels longer than the spin relaxation length, which imposes geometrical constraints on the design of ASL devices, such as limiting the maximum allowable length of channels to a few hundred nanometers [75].



Figure 8: To model the physics of the all-spin logic device, magnetization dynamics, spin mixing conductance, and spin-drift diffusion in the channel are taken into account [78].

Channels with various geometrical dimensions are modeled using the circuit models presented in [50], shown in Figure 9. In this model, the spin current transport is modeled using a distributed T-model, which accounts for the conductance of the channel as well as the spin relaxation mechanism. Moreover, these models capture the dynamics of magnetization reversal. Furthermore, in [50], a circuit model is developed that precisely captures the impact of thermal noise and is validated by analytical derivations presented in [52]. Based on the findings, the switching of magnets at room temperature is significantly impacted by thermal noise; as an example, the switching delay of magnets may vary by 30% at room temperature. Because of the efficient implementation of majority gates and the lower device count, ASL devices are studied for various applications [39], [70], [79]–[81]. As an example, a majority gate with a fan-out of four implemented by ASL requires four magnets [82], while that of CMOS requires 14 transistors. Similarly, a majority-based full adder implemented by ASL requires five magnets [83], while that of CMOS requires 28 transistors. Lower device count and fabrication area are two advantages of ASL devices in implementing more complicated Boolean logic applications such as 32-bit adders and arithmetic logic units (ALUs) [33]. In this chapter, an ASL full adder as the building block for ASL ALUs is analyzed, and its performance is studied.
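The idea behind a distributed T-model can be sketched with a generic resistive ladder. The code below is not the circuit model of [50]; it is a minimal illustration that assumes each channel segment is a symmetric T-section with a series spin resistance $\rho\,\Delta x / A$ split into two halves and a shunt conductance $A\,\Delta x / (\rho \lambda_{sf}^2)$ that represents spin relaxation. It also assumes that the receiving end behaves as an ideal spin sink. The function names, the material values, and the segment count are hypothetical.

```python
import math


def mat_mul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [
        [a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
        [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]],
    ]


def spin_current_transfer(length, lambda_sf, rho=2e-8, area=1e-15, segments=200):
    """Ratio of spin current reaching the far end to the injected spin current.

    The channel is a cascade of identical T-sections described by ABCD
    matrices. With the far end treated as an ideal spin sink (zero spin
    accumulation), the current ratio equals 1 / D of the total matrix.
    rho (ohm*m) and area (m^2) are illustrative values only.
    """
    dx = length / segments
    r_series = rho * dx / area                      # series resistance per segment
    g_shunt = area * dx / (rho * lambda_sf ** 2)    # spin-relaxation shunt per segment
    half_series = [[1.0, r_series / 2.0], [0.0, 1.0]]
    shunt = [[1.0, 0.0], [g_shunt, 1.0]]
    t_section = mat_mul(mat_mul(half_series, shunt), half_series)
    total = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(segments):
        total = mat_mul(total, t_section)
    return 1.0 / total[1][1]


if __name__ == "__main__":
    lam = 400e-9  # hypothetical spin relaxation length
    for length in (100e-9, 400e-9, 1600e-9):
        ladder = spin_current_transfer(length, lam)
        closed_form = 1.0 / math.cosh(length / lam)
        print(f"L = {length * 1e9:.0f} nm: ladder {ladder:.4f}, 1/cosh(L/lambda) {closed_form:.4f}")
```

Under these assumptions, the ladder converges to the closed-form ratio $1/\cosh(L/\lambda_{sf})$, which approaches $2\exp(-L/\lambda_{sf})$ for channels much longer than the spin relaxation length. This is the exponential loss of spin signal that limits the channel length.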



Figure 9: ASL circuit model [78].

In addition to a lower device count, the ASL device offers advantages such as a delay that is tunable over a large range by changing the supply voltage; changing the input voltage from 10 mV to 35 mV changes the delay from 350 ps to 100 ps. Hence, by implementing ring oscillators using ASL devices [84], we expect the oscillation frequency to be tunable over a large range. Oscillators are one of the essential blocks in analog and digital electronics and communication systems. CMOS oscillators are widely studied and designed, and their phase noise, frequency tuning, and power consumption have improved over the last two decades [85]–[90]. However, ring oscillators normally suffer from poor phase noise performance, due to the asymmetric nature of the time-domain signal [91], compared to more symmetric topologies such as LC and Colpitts oscillators. The tuning of CMOS ring oscillators usually requires extra tuning components such as varactors, which add to the current path loss. As a result, lower output power and higher phase noise are inevitable in a tunable ring oscillator. Therefore, to generate wideband, low-phase-noise oscillation, coupled oscillators are introduced to achieve lower phase noise, a wider tuning range, and higher output power. Moreover, networks of coupled oscillators can implement certain applications such as non-Boolean logic computation circuits [92]. In this chapter, an ASL coupled-oscillator scheme is proposed and investigated.
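The link between a tunable stage delay and a tunable oscillation frequency can be illustrated with the textbook first-order ring-oscillator estimate, f ≈ 1/(2·N·t_d). Applying this CMOS-style estimate to an ASL ring is an assumption made here purely for illustration; the function name and the choice of N = 3 stages are likewise illustrative, and the resulting numbers are not simulation results.

```python
def ring_frequency_hz(stage_delay_s: float, n_stages: int = 3) -> float:
    """First-order ring-oscillator estimate: f = 1 / (2 * N * t_d).

    Assumption: every stage contributes the same delay t_d and the
    oscillation period equals two traversals of the ring.
    """
    if stage_delay_s <= 0 or n_stages < 1:
        raise ValueError("stage delay must be positive and n_stages >= 1")
    return 1.0 / (2.0 * n_stages * stage_delay_s)


# Stage delays quoted in the text for 10 mV and 35 mV supply voltages.
f_slow = ring_frequency_hz(350e-12)
f_fast = ring_frequency_hz(100e-12)

# Under this estimate the frequency ratio equals the inverse delay ratio.
tuning_ratio = f_fast / f_slow
print(f"{f_slow / 1e9:.2f} GHz -> {f_fast / 1e9:.2f} GHz (x{tuning_ratio:.2f})")
```

Under this simplified picture, the frequency scales inversely with the stage delay, so any knob that changes the delay (here, the supply voltage) tunes the frequency by the same factor.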

#### 2.2 Modeling and Benchmarking of All-Spin Logic

#### 2.2.1 The Operation of All-Spin Logic

In the ASL device shown in Figure 10, electrical current flows from the supply voltage to ground through the input magnet and the non-magnetic metal underneath it. The current passing through a magnet becomes spin polarized, with the majority electrons' magnetic moment aligned with its magnetization. The spin-polarized electrons injected (or extracted) by the input magnet increase (or decrease) the density of the electrons whose spin orientation is aligned with the input magnet inside the channel. The concentration gradients for electrons with parallel and anti-parallel spin orientations inside the channel create a spin current towards the output magnet through diffusion. This spin current applies a torque to the output magnet that, if strong enough, can flip it to align it with the spin orientation of the majority electrons. Thus, the device is capable of operating both as an inverter and as a buffer depending on the polarity of the supply voltage, which determines whether spin current is injected or extracted.



Figure 10: ASL consists of two magnets connected by a non-magnetic channel. The spin current injected from the input magnet diffuses along the channel and applies a torque to the output magnet, which, if strong enough, switches the output magnet.

Major parameters that determine the performance and the energy dissipation of this device include channel and interface resistances, spin diffusion length, and the thermal noise of magnets. In the next subsections, these parameters are investigated.



Figure 11: Transient Response of an all-spin logic device. In this simulation, the input magnet is assumed to be oriented in the +X direction. By applying a negative voltage, the device acts as a buffer, while by applying a positive voltage, the device acts as an inverter.

#### 2.2.2 The Modeling of the Thermal Noise of Magnets

The magnetization dynamics of magnets is described by the Landau-Lifshitz-Gilbert (LLG) equation,

$$\frac{d\vec{m}}{dt} = -\gamma \mu_0 \left[ \vec{m} \times \vec{H}_{eff} \right] + \alpha \left[ \vec{m} \times \frac{d\vec{m}}{dt} \right] + \frac{\vec{I}_{s,\perp}}{qN_s},\tag{3}$$

in which $\vec{m}$, $\vec{I}_{s,\perp}$, $N_s$, $\mu_0$, $\alpha$, and $\gamma$ represent the magnetic orientation, the perpendicular spin current, the number of spins in the magnet, the free-space permeability, the Gilbert damping coefficient, and the gyromagnetic ratio, respectively, and $q$ is the electron charge [50], [93] (Figure 12). The net magnetic field, $\vec{H}_{eff}$, comprises the uniaxial anisotropy field, $\vec{H}_U = -\frac{1}{\mu_0 M_S} \frac{\partial E}{\partial \vec{m}}$, and the demagnetization field, $\vec{H}_{demag} = M_S \bar{N}_d \vec{m}$. The net magnetic field can be modified to include thermal noise. Thermal noise is caused by the random thermal motion of electrons in the magnet [33] and can be modeled by the thermal field, $\vec{H}_{Thermal}$, which represents the statistical thermal motion of the electrons [50],

$$\vec{H}_{eff} = \vec{H}_U + \vec{H}_{demag} + \vec{H}_{Thermal}.\tag{4}$$
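To make the structure of the LLG equation concrete, the following is a minimal macrospin sketch in Python, not the SPICE model used in this work. It rests on several assumptions that are not taken from the source: the spin-current term and the thermal field are omitted, the effective field contains only a uniaxial anisotropy term along x with a hypothetical strength `h_k`, the damping value is illustrative, and a simple forward-Euler step with renormalization is used. The equation is integrated in its explicit (Landau-Lifshitz) form, obtained by solving Eq. (3) for $d\vec{m}/dt$.

```python
import math

GAMMA_MU0 = 2.21e5  # gamma * mu_0 in m/(A*s), approximate free-electron value


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def llg_rhs(m, h_eff, alpha):
    """dm/dt = -gamma*mu0/(1+alpha^2) * [m x H + alpha * m x (m x H)].

    This is Eq. (3) solved for dm/dt, with the spin-current term dropped.
    """
    mxh = cross(m, h_eff)
    mxmxh = cross(m, mxh)
    pref = -GAMMA_MU0 / (1.0 + alpha ** 2)
    return tuple(pref * (p + alpha * d) for p, d in zip(mxh, mxmxh))


def relax(m0, h_k=1.0e5, alpha=0.1, dt=1.0e-13, steps=30000):
    """Forward-Euler integration; |m| is renormalized to 1 after each step.

    h_k (A/m), alpha, dt (s) and steps are hypothetical illustration values.
    """
    m = m0
    for _ in range(steps):
        h_eff = (h_k * m[0], 0.0, 0.0)  # uniaxial anisotropy along x only
        dm = llg_rhs(m, h_eff, alpha)
        m = tuple(mi + dt * di for mi, di in zip(m, dm))
        norm = math.sqrt(sum(c * c for c in m))
        m = tuple(c / norm for c in m)
    return m


# Start tilted 60 degrees away from the +x easy axis and let damping act.
m_final = relax((0.5, math.sqrt(0.75), 0.0))
print(m_final)
```

With only precession and damping present, the magnetization spirals toward the nearest easy-axis direction; a spin-torque term of sufficient strength would be needed to drive it across the barrier.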

The model is implemented in SPICE and the results are validated using the analytical solution for the steady-state precession angle,  $\theta_0$ , [50] as a function of temperature

$$\langle \theta_0^2 \rangle = \frac{k_b T}{E_b},\tag{5}$$

in which $E_b$ represents the energy barrier of the magnet, $k_b$ is the Boltzmann constant, and $T$ is the temperature. As Figure 13 shows, the SPICE results agree with the analytical results to within 5%.
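As a quick numerical illustration of Eq. (5), the sketch below evaluates the root-mean-square precession angle for a given temperature and energy barrier. The barrier of 40 $k_bT$ at 300 K is a hypothetical value chosen only for illustration and is not taken from this work; the function name is likewise illustrative.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def rms_precession_angle_rad(temperature_k: float, barrier_j: float) -> float:
    """Square root of Eq. (5): sqrt(<theta_0^2>) = sqrt(k_B * T / E_b)."""
    if temperature_k <= 0 or barrier_j <= 0:
        raise ValueError("temperature and barrier must be positive")
    return math.sqrt(K_B * temperature_k / barrier_j)


T = 300.0                    # room temperature in K
e_b = 40.0 * K_B * T         # hypothetical barrier of 40 k_B T
theta_rms = rms_precession_angle_rad(T, e_b)
print(f"{theta_rms:.4f} rad = {math.degrees(theta_rms):.2f} deg")
```

Because the mean-square angle scales as $1/E_b$, a larger energy barrier yields a smaller thermal spread of the initial angle.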



Figure 12: LLG equation describes the magnetization dynamics. The corresponding vectors to the terms of LLG equation are represented in this figure.



Figure 13: To validate the SPICE model of the thermal noise, the time-averaged value of the thermal noise obtained from the SPICE model is compared with that of the analytical solution. The SPICE results match the analytical results.

#### 2.2.3 Size Effects

Size effects caused by extra scattering at surfaces and grain boundaries affect several important parameters of ASL channels, including resistivity, diffusion coefficient, and spin relaxation length. Among these, the spin relaxation length is the most important, since the signal attenuates exponentially as the channel becomes longer than the spin relaxation length (Figure 14). In metals, the dominant spin relaxation mechanism is the Elliott-Yafet (EY) mechanism, in which every time an electron is scattered, there is a certain probability that it loses its spin information [17]. Hence, the spin relaxation time is proportional to the momentum relaxation time, which gets shorter as the channel cross-sectional dimensions become smaller due to size effects. The models for spin relaxation time and spin diffusion length are presented in [94]. Figure 15 shows how the spin relaxation length decreases as channel dimensions scale. The three important parameters of concern are the sidewall specularity, P, the grain boundary reflectivity, R, and the average grain size. As a rule of thumb, the average grain size in a channel fabricated by the dual damascene process is equal to the width or the thickness, whichever is smaller [95].
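The exponential attenuation discussed above can be sketched with a single-exponential decay, $\exp(-L/\lambda_s)$. This is a deliberate simplification and not the distributed circuit model used in this work; the relaxation lengths below are hypothetical values chosen only to show the trend, while the channel lengths of 80 nm and 400 nm are the ones considered in this chapter.

```python
import math


def spin_signal_fraction(channel_length_nm: float,
                         relaxation_length_nm: float) -> float:
    """Fraction of the spin signal left after a channel of the given length,
    assuming a pure exp(-L / lambda_s) decay (simplifying assumption)."""
    if channel_length_nm < 0 or relaxation_length_nm <= 0:
        raise ValueError("length must be >= 0 and relaxation length > 0")
    return math.exp(-channel_length_nm / relaxation_length_nm)


# Hypothetical relaxation lengths: one without and one with size effects.
for lam in (400.0, 200.0):
    short = spin_signal_fraction(80.0, lam)
    long_ = spin_signal_fraction(400.0, lam)
    print(f"lambda_s = {lam:.0f} nm: 80 nm -> {short:.2f}, 400 nm -> {long_:.2f}")
```

In this simplified picture, halving the relaxation length squares the surviving fraction, which is why size effects penalize long channels far more than short ones.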



Figure 14: Delay and energy dissipation versus channel length of an ASL gate.



Figure 15: Spin relaxation length versus channel width [2]. Size effects cause the spin relaxation length to decrease with decreasing channel width. For the case with no size effects, the spin relaxation length is independent of channel width.

The delay and energy per bit are plotted versus channel length in Figure 14, assuming a channel width of 37.8 nm, equal to the width of the magnets, and a width-to-thickness aspect ratio of 2. To observe the impact of size effects, a hypothetical case in which size effects are absent is also considered (labeled ideal Cu). The size effect parameters are assumed to be R = 0.2 and P = 0.0 for the typical case, and R = 0 and P = 1.0 for the ideal case. The physical parameters of the Cu channel are calculated as $\sigma = 41.549$ ($\mu\Omega$ m)<sup>-1</sup> and D = 0.014 m<sup>2</sup>/s for the typical case. To demonstrate the effect of thermal noise in magnets, each simulation is repeated three times at room temperature.



Figure 16: (a) Delay versus specularity parameter, P, for an 80 nm long channel. The grain boundary scattering parameter, R, is assumed to be 0.2. (b) Delay versus grain boundary reflection probability, R, for an 80 nm long channel. The specularity parameter, P, is assumed to be 0 [96]–[99].

To see how improving the channel fabrication process can improve channel performance and energy dissipation, Figure 16 plots delay versus the surface specularity parameter, P, and the grain boundary scattering parameter, R. Both Cu and Al are considered here, as they offer different tradeoffs. To avoid busy plots, thermal noise has been turned off, and its effect is considered only in setting the initial angles of the magnets. As Figure 15 shows, the spin relaxation length in Al is longer than that in Cu. Furthermore, since the mean free path in Al is shorter than that in Cu, size effects are less severe in Al than in Cu. However, Cu offers a lower resistivity unless the cross-sectional dimensions become so small that size effects become too prominent. The spin injection coefficients for the Co/Cu and Co/Al interfaces are assumed to be the same [73].



Figure 17: Delay versus channel width for 80 nm and 400 nm long channels.

To quantify the impact of dimensional scaling, a channel width analysis is presented in Figure 17. The magnet width is assumed to be 37.8 nm in all cases to ensure adequate magnet stability and non-volatility. Size effects become more pronounced at smaller dimensions. The aspect ratio of the channel is kept constant in these simulations. For channel widths smaller than the magnet width, the interface area decreases, which further increases delay and energy. For the channel width analysis, two channel lengths of 80 nm and 400 nm are considered. For the ideal cases (no size effects), both lengths are shorter than the spin relaxation lengths in Cu and Al, and Cu is the better choice since it offers a lower resistivity. However, size effects shorten the spin relaxation length, and Al channels become faster and dissipate less energy than Cu channels, especially at small widths. Moreover, the delay and energy penalties associated with size effects increase drastically as wire dimensions scale down.

#### 2.3 Applications

#### 2.3.1 ASL Adders

Majority gates can be used to implement full adders with a lower device count, as shown in Figure 18. An ASL full-adder implementation that cascades two ASL majority-not gates is shown in Figure 19a, and the layout is shown in Figure 19b. The carry-out bit, $\overline{C_{OUT}}$, is the majority-not of *A*, *B*, and the carry-in (*C*<sub>IN</sub>) bit; hence, the $\overline{Sum}$ bit can be produced as the majority-not of the *A*, *B*, *C*<sub>IN</sub>, $\overline{C_{OUT}}$, and $\overline{C_{OUT}}$ bits, implemented using a 5-input majority gate. The proposed structure is simulated, and the results are shown in Figure 19c. By cascading the proposed full adder, a 32-bit ASL ripple-carry adder is formed. Although the proposed 32-bit ASL adder benefits from a lower device count, the CMOS 32-bit adder is two orders of magnitude more energy and time efficient, even without considering driver circuits for ASL adders [36]. The significant difference in energy efficiency is due to the higher energy efficiency of CMOS transistors. Hence, ASL devices cannot compete against CMOS devices in terms of delay and energy for the implementation of Boolean applications. However, due to the efficient control of delay and magnetization waveform, ASL devices are studied for other applications; one example, an ASL coupled oscillator, is demonstrated in the following subsection, and another example, an ASL image-recognition circuit, is demonstrated in Chapter III.
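The logic of the two cascaded majority-not gates described above can be checked at the Boolean level with a short script. This is only a truth-table sketch of the gate logic, not a device simulation; the function names are illustrative.

```python
def majority(*bits: int) -> int:
    """Return 1 if more than half of the (odd number of) bits are 1."""
    return int(sum(bits) > len(bits) // 2)


def full_adder(a: int, b: int, c_in: int):
    """Full adder built from two majority-not gates, as described in the text."""
    c_out_bar = 1 - majority(a, b, c_in)                       # 3-input majority-not
    sum_bar = 1 - majority(a, b, c_in, c_out_bar, c_out_bar)   # 5-input majority-not
    return 1 - sum_bar, 1 - c_out_bar                          # (sum, carry-out)


def ripple_carry_add(x: int, y: int, width: int = 32):
    """Cascade `width` full adders; returns (result mod 2**width, final carry)."""
    carry, result = 0, 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


# Exhaustive check of the single-bit adder against ordinary arithmetic.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            assert full_adder(a, b, c) == ((a + b + c) % 2, (a + b + c) // 2)

print(ripple_carry_add(123456789, 987654321))
```

Feeding the inverted carry into the 5-input gate twice gives it enough weight to override the three primary inputs exactly when the sum bit requires it.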



Figure 18: Implementation of a full adder using two majority-not gates.





Figure 19: (a) Schematic and (b) layout of the ASL full adder. (c) The transient response of an ASL full adder. The blue color represents the magnetization orientation in the x-direction, and the green and red colors represent the magnetization in the y- and z-directions.

#### 2.3.2 ASL Oscillators and Coupled-Oscillators

Oscillators are one of the essential building blocks in analog and digital electronics and communication systems. One of the most commonly used CMOS oscillator topologies is the ring oscillator. Compared to their bulky LC counterparts, GHz-range ring oscillators are a more practical candidate for integrated circuits, as they promise a more compact implementation and enable the use of digital inverters. An ASL ring oscillator is realized using a ring of three ASL inverters [84]. The oscillation frequency of the device is highly tunable, as it changes from 1.8 GHz to 6.8 GHz when the supply voltage changes from 30 mV to 70 mV, as shown in Figure 20. However, the device suffers from poor phase noise performance. The figure of merit of the ASL device is limited to 150 to 160 dBc/Hz, while that of CMOS devices can reach 189 dBc/Hz [100]–[102]. High phase noise is inevitable in a highly tunable ring-oscillator structure, as in CMOS circuits, as discussed before. Therefore, to improve the generation of wideband, low-phase-noise oscillation, researchers have proposed a myriad of design techniques for CMOS and other technologies. One proposed technique is the coupled oscillator, which generates lower phase noise and has a wider tuning range and higher output power. In addition, networks of coupled oscillators can be used in certain applications such as non-Boolean logic computation.



Figure 20: Oscillation frequency versus supply voltage. The oscillation frequency increases linearly with increasing the supply voltage.

Table 1: Performance comparison of the ASL ring oscillator with CMOS oscillators. The figure of merit (FMO) of an oscillator is defined as  $FMO = 10 \log_{10} \left[ \left( \frac{f_0}{\Delta f} \right)^2 \cdot \frac{1}{L \{\Delta f\} \cdot P_{dc}} \right].$ 

|                 | Unit   | ASL  | 0.18 µm CMOS [100] | 0.18 µm CMOS [102] | 0.18 µm CMOS [101] |
|-----------------|--------|------|--------------------|--------------------|--------------------|
| Frequency       | GHz    | 6.0  | 5.6                | 5.8                | 4.8                |
| Tuning range    | %      | 100  | 6.4                | 8.9                | 4.3                |
| V<sub>DD</sub>  | mV     | 20   | 400                | 600                | 1500               |
| DC power        | mW     | 0.25 | 1.1                | 0.7                | 3.0                |
| FMO             | dBc/Hz | 150  | 189                | 174                | 189                |
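The figure-of-merit definition in the caption of Table 1 can be evaluated with a few lines of Python. The sketch assumes, as is common practice but not stated in the source, that the phase noise $L\{\Delta f\}$ is supplied in dBc/Hz and converted to a linear ratio, and that $P_{dc}$ is expressed in mW. The numerical inputs are hypothetical and do not correspond to any column of Table 1.

```python
import math


def fmo_db(f0_hz: float, offset_hz: float,
           phase_noise_dbc_hz: float, p_dc_mw: float) -> float:
    """FMO = 10*log10[(f0/df)^2 / (L{df} * P_dc)].

    Assumptions: L{df} is given in dBc/Hz and converted to a linear ratio;
    P_dc is in mW.
    """
    l_linear = 10.0 ** (phase_noise_dbc_hz / 10.0)
    return 10.0 * math.log10((f0_hz / offset_hz) ** 2 / (l_linear * p_dc_mw))


# Hypothetical oscillator: 5 GHz carrier, -110 dBc/Hz at 1 MHz offset, 2 mW.
fmo_example = fmo_db(5e9, 1e6, -110.0, 2.0)

# Equivalent form with every term expressed in dB.
fmo_from_db_terms = 110.0 + 20.0 * math.log10(5e9 / 1e6) - 10.0 * math.log10(2.0)
print(f"{fmo_example:.2f} dBc/Hz, {fmo_from_db_terms:.2f} dBc/Hz")
```

The dB form makes the tradeoff explicit: each 10 dB of phase-noise improvement, or each tenfold reduction in DC power, raises the figure of merit by 10 dB.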



Figure 21: (a) Two metallic channels are connecting two ASL ring oscillators to form the ASL coupled oscillator. The oscillation of the two rings will be coupled to each other if the oscillation frequencies of two oscillation loops are close to each other.

An ASL coupled oscillator is proposed in this thesis, as shown in Figure 21. The proposed structure consists of two ASL ring oscillators connected to each other using two connector ASL gates, which are responsible for the injection-locking mechanism. The simulation results are shown in Figure 22. The voltage applied to Ring 1 is 35 mV. In Figure 22a, the voltage applied to Ring 2 is 10 mV larger than that of Ring 1. Different voltages applied to the rings result in different oscillation frequencies; however, because of the injection-locking mechanism and the closeness of the oscillation frequencies, the oscillations of the two rings couple and proceed at the same frequency with $\sim 180^{\circ}$ of phase shift. In Figure 22b, the difference between the voltages applied to the two rings increases to 20 mV; thus, the difference between the oscillation frequencies increases, and the two rings can no longer sustain their coupled oscillation. To quantify the range of coupled oscillation, simulations are done with different supply voltage values, and the results are shown in Figure 23b. Moreover, the phase shift of the two rings is controllable by changing the supply voltage of the connector ASLs. As shown in Figure 23a, the phase shift changes from $\sim 180^{\circ}$ to $130^{\circ}$ when one of the connector voltages changes from 12.5 mV to 20 mV. The easy manipulation of the phase shift is a desirable feature for implementing a phase-locked loop (PLL) system based on the proposed ASL coupled oscillator. The phase noise and the figure of merit of the proposed coupled oscillator have not been investigated yet. However, considering the low power dissipation of the device, the structure might be promising for the design of image and pattern recognition systems based on coupled oscillators [92].
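The qualitative behavior described above, locking for a small frequency mismatch and loss of locking for a large one, can be reproduced with a generic two-oscillator phase model (an Adler/Kuramoto-type equation). This is a textbook abstraction swapped in for illustration, not the ASL SPICE model of this work. The coupling strength and frequency detunings are dimensionless, hypothetical values, and the model does not attempt to reproduce the $\sim 180^{\circ}$ locked phase observed in the simulations.

```python
import math


def final_phase_difference(delta_omega: float, coupling: float,
                           dt: float = 1e-3, steps: int = 20000) -> float:
    """Integrate d(phi)/dt = delta_omega - 2*K*sin(phi) with forward Euler.

    phi is the phase difference between the two oscillators; delta_omega is
    their free-running frequency difference and K the coupling strength.
    """
    phi = 0.0
    for _ in range(steps):
        phi += dt * (delta_omega - 2.0 * coupling * math.sin(phi))
    return phi


def is_locked(delta_omega: float, coupling: float) -> bool:
    """A stationary phase difference exists only if |delta_omega| <= 2*K."""
    return abs(delta_omega) <= 2.0 * coupling


# Small detuning: the phase difference settles to a constant (locked).
phi_locked = final_phase_difference(delta_omega=1.0, coupling=1.0)
# Large detuning: the phase difference keeps growing (unlocked).
phi_unlocked = final_phase_difference(delta_omega=3.0, coupling=1.0)
print(is_locked(1.0, 1.0), phi_locked, is_locked(3.0, 1.0), phi_unlocked)
```

In this abstraction, the locked phase difference depends on the ratio of detuning to coupling, which is at least consistent with the observation that the phase shift can be steered through the connector supply voltage.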



Figure 22: The voltage applied to the magnets of Ring 1 is 35 mV. In (a), the voltage applied to the magnets of Ring 2 is 10 mV higher than the voltage applied to Ring 1, while in (b), the voltage applied to Ring 2 is 20 mV higher than the voltage applied to Ring 1.



Figure 23: (a) The supply voltage applied to the ASL connector 2 is changed from 12.5 mV to 20 mV. As a result, the phase shift of the two rings is changed from $\Delta\phi_1$ to $\Delta\phi_2$, in which $\Delta\phi_1 > \Delta\phi_2$. (b) Locking range of the ASL coupled oscillator. The shaded region shows the region where the supply voltages of the rings are different, but the two rings still show a locked oscillation.

#### 2.4 Conclusions

Beyond-CMOS devices are being studied to potentially augment conventional CMOS logic. Spintronic devices are potential candidates, as they offer new features such as non-volatility. In this chapter, the potential performance of ASL is modeled, and the impacts of size effects and dimensional scaling are quantified. It is predicted that ASL devices will suffer from size effects even more seriously than their electrical counterparts. This is due to the exponential drop in spin signal as the spin relaxation length degrades due to size effects. Thereby, any improvement in Cu interconnect technology, such as an increase in average grain size or wire surface quality, will have an even bigger impact on ASL interconnects. Al wires offer a larger spin relaxation length and less pronounced size effects than Cu wires. However, they are more resistive, except for narrow wires. Thereby, Al ASL interconnects outperform Cu ASL interconnects when they are relatively long and narrow. To transfer spin signals over distances longer than 1 $\mu m$, other spintronic structures must be proposed; one such novel design is proposed in Chapter IV.

Two examples of the applications of the ASL device are demonstrated in this chapter. First, the ASL full adder, an example of a Boolean logic device, was proposed. The layout and the operation of the device were shown. Although implementing a 32-bit adder using ASL requires a lower device count than CMOS, it cannot compete against CMOS in terms of energy efficiency, given the energy efficiency of CMOS transistors. Second, an ASL coupled oscillator was proposed. The device is highly tunable over a wide range of frequency and supply voltage. The proposed device is promising for coupled-oscillator-based image and pattern recognition systems.

# III. IMAGE RECOGNITION CIRCUIT USING ALL-SPIN LOGIC DEVICES

#### 3.1 Non-Boolean Applications of All-Spin Logic Device

Pattern recognition and, in particular, image recognition techniques have been widely studied in machine learning and image processing [103]–[105]. Researchers have widely studied the hardware demonstration of computation units for pattern recognition, a challenging problem in terms of chip size, power consumption, computational complexity, and decision speed. Among various solid-state technologies, CMOS provides a low-cost, highly integrated, and low-power implementation platform for pattern recognition [92], [106], [107] and processing [108] systems. For Boolean logic systems, CMOS gates exhibit processing speeds up to a few GHz and can be designed to have a low static power. However, the dynamic power consumption of a large system with a GHz clock frequency can still limit the scalability. Fan-in and fan-out considerations for CMOS devices also impact the speed, power consumption, and size of devices. Besides Boolean systems, some novel non-Boolean techniques have been developed to overcome these issues. In non-Boolean systems, logic gates are no longer the key building block, and analog/mixed-signal circuits are used. In [92], the authors propose a technique for non-Boolean training and detection of image pixels using a network of coupled oscillators. This structure has the capability to detect any scaled or rotated version of a desired image. On the other hand, this method suffers from high computational complexity and large area and power consumption, which limit its application to large image arrays. The long convergence time is another limitation. Other proposed CMOS systems have demonstrated artificial neural networks (ANNs) by designing circuits emulating neurons and synapses [106], [107]. The larger computation demand in these systems leaves the search open for new solutions.

To overcome the limitations of CMOS devices, other technologies are being investigated for pattern recognition applications. Spintronic devices have received attention recently because of some unique properties, e.g., low-voltage operation and non-volatility. In [69], all-spin logic (ASL) and charge-spin logic (CSL) devices are shown to be capable of Boolean and non-Boolean operations, which makes them an attractive choice for building fundamental blocks such as ring oscillators. The majority gate operation of ASL devices has been previously introduced in some Boolean logic systems [27], [109]. This feature can overcome the fan-in and fan-out limitations of large integrated systems. Besides, the inverting and non-inverting operation modes of ASL devices can be the key to designing many logic circuits, e.g., full-adder circuits, as discussed in Chapter II. The time-domain transient behavior of the magnetization in these devices also provides another degree of freedom for demonstrating non-Boolean operations. These features combined enable us to design a compact, all-spin-logic, non-Boolean structure with low power consumption and low computational complexity.

In this section we propose a novel pattern recognition circuit that takes advantage of novel features of spintronic devices such as non-volatility, efficient implementation of majority gates and XOR functions, and the ability to distinguish strong and weak majorities. The non-volatility of the devices enables storing large sets of training images within the logic with no standby power dissipation. This feature also enables "instant-on" operation and saves on energy and delay penalties imposed by loading training images from a main memory.



Figure 24: An ASL Majority gate with three inputs. The three input magnets, M1, M2 and M3 are connected to the output magnet, MO, using three metallic channels.

#### 3.2 All-Spin Logic Majority Gate

As mentioned earlier, the ASL device supports a majority operation, as shown in Figure 24. This feature is achieved because the net spin current to the output magnet is determined by the sum of the spin currents from all input devices. In principle, this system can be designed for many inputs. As a trade-off, as the number of input devices in a majority gate increases, the uncorrelated thermal noise of these devices adds up and impacts the transient magnetization of the output magnet. Depending on the device properties, this phenomenon sets a practical limit on the maximum number of input devices for a majority gate. In our simulations, for the three- and five-input cases, the transient output magnetization is less impacted by the thermal noise than for higher fan-in numbers. We must clarify that the orientation of the output magnet depends on the sign of the voltage applied to the magnets. If a negative voltage is applied to the magnets, the output magnetization orientation will be the majority of the input magnetizations. On the other hand, if the applied voltage is positive, the steady-state value of the output magnet will be the complement of the majority of the input magnetizations.
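The majority operation and its dependence on the sign of the applied voltage can be summarized in a small behavioral sketch. It assumes, for illustration only, that each input magnet contributes a spin current of equal magnitude, so that the net spin current is proportional to the signed sum of the input magnetizations (+1 for +X, -1 for -X). The function names are illustrative.

```python
def net_spin_margin(inputs) -> int:
    """Signed sum of +1/-1 input magnetizations (proxy for net spin current)."""
    if any(m not in (1, -1) for m in inputs):
        raise ValueError("each input magnetization must be +1 or -1")
    return sum(inputs)


def asl_majority(inputs, supply_voltage: float) -> int:
    """Steady-state output magnetization (+1 or -1) of an ASL majority gate.

    Negative voltage -> majority of the inputs; positive voltage -> its
    complement, following the polarity rule stated in the text.
    """
    net = net_spin_margin(inputs)
    if net == 0:
        raise ValueError("an odd number of inputs is required")
    maj = 1 if net > 0 else -1
    return maj if supply_voltage < 0 else -maj


print(asl_majority([1, 1, -1], -5e-3))   # majority (non-inverting)
print(asl_majority([1, 1, -1], +5e-3))   # complement (inverting)
# Four-of-five aligned inputs give the same margin as three-of-three.
print(net_spin_margin([1, 1, 1, 1, -1]), net_spin_margin([1, 1, 1]))
```

The margin returned by `net_spin_margin` is a simple way to express the difference between a strong and a weak majority at the behavioral level.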



Figure 25: (a) Switching transient response for different scenarios of input magnetization in a majority gate with five inputs. (b) Comparison of the switching transitions of majority gates with three and five inputs. In this comparison, all input magnets of the three-input gate have the same magnetization. For the five-input gate, four inputs have the same magnetization, so the net spin current is equal to that of the three-input gate. The voltage applied to the magnets in these simulations is -5 mV.



Figure 26: Switching delay variation versus the supply voltage. Each voltage is simulated three times to verify the results.

The switching of the output magnet of an ASL majority gate depends on the number of input magnets, because the transferred spin torque increases when there are more magnets with magnetization in the same direction. Figure 25 shows different scenarios of transient output magnetization in majority gates with three and five inputs. As shown in Figure 25a, in a majority gate with five inputs, the switching of the output magnetization becomes faster when there are more inputs with the same magnetization direction. As the number of magnets with the same magnetization decreases, the switching becomes slower and the thermal noise adds up. In Figure 25b, the switching transitions of two majority gates with three and five inputs are compared. The gate with three inputs is affected less by thermal noise than the gate with five inputs. Based on (24), the equation for the switching time of a magnet, the switching delay decreases as the injected spin current increases. Moreover, as shown in [110], the channel in this device can be approximated as an RC network; hence, the injected spin current and the supply voltage are directly correlated. Therefore, the switching delay is inversely proportional to the supply voltage. This result is shown in Figure 26.
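The inverse proportionality between switching delay and supply voltage can be checked against the two delay values quoted earlier in Chapter II for a single ASL device (350 ps at 10 mV and 100 ps at 35 mV). The sketch below fits the single constant of a $t = k/V$ model from the first point and predicts the second. Treating these two values as points on one ideal $1/V$ curve is an assumption made for illustration, and the function names are illustrative.

```python
def delay_constant(v_ref_mv: float, t_ref_ps: float) -> float:
    """Constant k of the model t = k / V, fitted from one reference point."""
    return v_ref_mv * t_ref_ps


def predicted_delay_ps(v_mv: float, k_mv_ps: float) -> float:
    """Delay predicted by t = k / V for a supply voltage in mV."""
    if v_mv <= 0:
        raise ValueError("voltage magnitude must be positive")
    return k_mv_ps / v_mv


k = delay_constant(10.0, 350.0)      # fit at 10 mV, 350 ps
t35 = predicted_delay_ps(35.0, k)    # prediction at 35 mV
print(f"k = {k:.0f} mV*ps, predicted delay at 35 mV = {t35:.1f} ps")
```

The prediction at 35 mV reproduces the quoted 100 ps, so the two quoted operating points are consistent with a delay-voltage product that stays constant.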

#### 3.3 Pattern Recognition Scheme

Like any recognition system, this work considers two major phases of operation. The first phase is the learning phase, in which the desired pattern is stored in the memory. In the second phase, the detection phase, the circuit identifies the similarity between the input data and the stored pattern with respect to the decision-making criteria. In the learning phase, the circuit can receive a single image or a training set. The training set includes multiple training images from different users. In this section, we propose a new technique using all-spin logic devices and establish a fully spin-based operation. By illustrating several examples, we verify the performance for various image sizes.

#### 3.3.1 Mainly Similar Images

We first provide the mathematical definition of main similarity and then illustrate how it can help the training of the circuit. In our simulations, all the images are binary-valued matrices, with 0 and 1 representing white and black pixels, respectively. In this circuit, we assume that binary "0" corresponds to the magnetization orientation in the -X direction and binary "1" corresponds to the magnetization orientation in the +X direction.

For a given pair of binary vectors x and y with equal length L, the *Hamming distance* is defined as

$$d(x, y) = \sum_{i=1}^{L} (1 - \delta_{x_i y_i}),\tag{6}$$

where $x_i$ and $y_i$ denote the $i^{th}$ components of x and y, respectively, and $\delta$ is the Kronecker delta function. Subsequently, we can exploit this quantity as a measure of similarity between two images.
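Equation (6) translates directly into a few lines of Python. The sketch below is a plain software illustration of the definition, with an illustrative function name; it is not a model of the spintronic circuit.

```python
def hamming_distance(x, y) -> int:
    """Number of positions at which two equal-length binary vectors differ,
    i.e. the sum over i of (1 - delta(x_i, y_i)) in Eq. (6)."""
    if len(x) != len(y):
        raise ValueError("vectors must have equal length")
    return sum(1 for xi, yi in zip(x, y) if xi != yi)


row_a = [1, 0, 1, 1]
row_b = [1, 1, 1, 0]
print(hamming_distance(row_a, row_b))
```

A distance of zero means the two vectors are identical, and a distance equal to the vector length means that every position differs.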

**Definition 1** Two binary images $B, B' \in \{0,1\}^{m \times n}$ are called mainly similar if the majority of pixels in every pair of corresponding rows are identical. More specifically,

$$\forall k \in \{1, \cdots, m\}: d(B_{k,:}, B_{k,:}') < \lfloor \frac{n}{2} \rfloor, \tag{7}$$

where $B_{k,:}$ denotes the $k^{th}$ row of B and $\lfloor a \rfloor$ represents the floor operation on a (i.e., the largest integer not greater than a) [76].



Figure 27: The two images are mainly similar (along the rows), however, the Hamming distance between the third columns is 3 which does not imply a similarity along the columns

By this comparison, we ensure that the two images have almost similar pixels along the corresponding rows. In this work, we consider the comparison along the rows, although a column-wise comparison can be established with no loss of generality. As illustrated in Figure 27, being mainly similar along the rows, does not imply being similar along the columns.
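Definition 1 and the row-versus-column distinction can be checked with a small script. The two 3 × 5 images below are hypothetical examples constructed for this sketch (they are not the images of Figure 27): every row differs in exactly one pixel, but all of the differing pixels lie in the same column.

```python
def hamming_distance(x, y) -> int:
    """Number of positions at which two equal-length binary vectors differ."""
    if len(x) != len(y):
        raise ValueError("vectors must have equal length")
    return sum(1 for xi, yi in zip(x, y) if xi != yi)


def mainly_similar(b, b_prime) -> bool:
    """Definition 1: every row pair satisfies d(row, row') < floor(n / 2)."""
    if len(b) != len(b_prime):
        raise ValueError("images must have the same number of rows")
    n = len(b[0])
    return all(hamming_distance(r, rp) < n // 2 for r, rp in zip(b, b_prime))


def transpose(image):
    """Swap rows and columns so that the same test can be applied column-wise."""
    return [list(col) for col in zip(*image)]


B = [[1, 0, 1, 0, 1],
     [0, 1, 1, 1, 0],
     [1, 1, 0, 0, 1]]
B_prime = [[1, 0, 0, 0, 1],   # same as B with the third column flipped
           [0, 1, 0, 1, 0],
           [1, 1, 1, 0, 1]]

print(mainly_similar(B, B_prime))                        # row-wise test
print(mainly_similar(transpose(B), transpose(B_prime)))  # column-wise test
```

The row-wise test passes because each row distance is 1, which is below ⌊5/2⌋ = 2, whereas the column-wise test fails because the third column has a distance of 3.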

#### 3.3.2 Majority Training and Decision Making

In the learning phase, we train the circuit by providing a number of mainly similar images. These images could be different representations of a target image (say a character or a certain binary pattern). We build up a representative of the given similar images by constructing a so-called *mean image*.

**Definition 2** For a set of P binary images $B_1, B_2, B_3, \dots, B_P \in \{0,1\}^{m \times n}$, the corresponding mean image, denoted as $\overline{B}$, is a binary image with entries [76]

$$\overline{B}(i,j) = \operatorname{nint}\left(\frac{1}{P} \sum_{k=1}^{P} B_{k}(i,j)\right).\tag{8}$$

In this equation, *nint* denotes the nearest integer function. In our circuit, the mean image represents the desired pattern by the users and is utilized as a reference. Since this matrix is constructed using all-spin majority gates, the number of training images, *P*, is considered to be odd and upper bounded by the maximum number of inputs to a majority gate as discussed in the previous subsection.
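For an odd number of training images, rounding the pixel-wise average to the nearest integer is the same as taking a pixel-wise majority vote, which is why a majority gate can construct the mean image. The sketch below illustrates this in software; the three 2 × 2 training images are hypothetical, and the function name is illustrative.

```python
def mean_image(images):
    """Pixel-wise mean image of Definition 2 for an odd number P of images.

    For odd P the average is never exactly 0.5, so nint(average) equals
    1 exactly when more than half of the P pixels are 1 (majority vote).
    """
    p = len(images)
    if p % 2 == 0:
        raise ValueError("the number of training images P must be odd")
    rows, cols = len(images[0]), len(images[0][0])
    return [[int(2 * sum(img[i][j] for img in images) > p)
             for j in range(cols)]
            for i in range(rows)]


training = [
    [[1, 0], [1, 1]],
    [[1, 1], [0, 1]],
    [[0, 0], [1, 1]],
]
print(mean_image(training))
```

Each output pixel depends only on the P input pixels at the same position, so one majority gate per pixel suffices in this picture.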

After the training data is stored and the mean image is constructed, we make a row-wise comparison between the input image and the mean image. As we will see in the next section, depending on the initial value of the output magnetization, the non-Boolean row decision maker can return the total count of matches or mismatches between the compared rows of the input image and the mean image.
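At the behavioral level, the row-wise comparison amounts to counting, for each row, the pixels on which the input image and the mean image agree or disagree. The sketch below shows only this counting step in software, with illustrative function names and hypothetical images; it does not model how the spintronic row decision maker produces the count.

```python
def row_match_counts(input_image, mean_img):
    """Number of matching pixels in each pair of corresponding rows."""
    if len(input_image) != len(mean_img):
        raise ValueError("images must have the same number of rows")
    return [sum(int(a == b) for a, b in zip(row_in, row_mean))
            for row_in, row_mean in zip(input_image, mean_img)]


def row_mismatch_counts(input_image, mean_img):
    """Number of mismatching pixels per row (row Hamming distances)."""
    matches = row_match_counts(input_image, mean_img)
    return [len(row) - m for row, m in zip(mean_img, matches)]


mean_img = [[1, 0, 1],
            [0, 1, 1]]
test_img = [[1, 1, 1],
            [0, 1, 0]]
print(row_match_counts(test_img, mean_img))
print(row_mismatch_counts(test_img, mean_img))
```

The mismatch count per row is the row Hamming distance of Eq. (6), so it can be compared directly against the bound in Definition 1.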

#### 3.4 Proposed Structure and Design Considerations

Based on the pattern recognition scheme shown in the previous section, we study two different implementations of the circuit. By comparing the performance of the two versions of the *single pixel comparator* unit, we choose the one with more capabilities, at the expense of slightly more power consumption and occupied area. In the single pixel comparator, the circuit receives the training pixels from *P* different users and the mean image is constructed. The value of the mean pixel is then compared with the corresponding value in the input image, and the steady-state magnetization of the *Pixel* magnet stores this information. Both versions of this unit operate based on the idea of training the circuit with a set of mainly similar images and comparing single pixels of the input image with the mean image. The single pixel comparator requires a memory to store the training data, a logic comparator, and a circuit to construct the mean pixel. As previously mentioned, the mean pixel can be constructed by an all-spin majority gate; for the memory and the comparator, however, we propose a new circuit in the following subsection.

#### 3.4.1 Memory+Logic Comparator

1-bit full adder structures with a total of five nanomagnets have been discussed in Chapter II. By properly setting the adder circuit, we use it as an area- and power-efficient comparator (XNOR) block, as shown in Figure 28. The two inputs to this block (A and B) come from distinct sources. One of the inputs comes from the input image, synchronized with the control voltage, and the other input is given to the circuit during the learning phase. Compared to a CMOS counterpart, this structure exhibits important advantages. First, it requires five magnets, whereas the CMOS version requires at least eight transistors for an XNOR implementation. Second, this circuit can store the training information without extra static power consumption, whereas in CMOS, excess power is consumed to store this data [111]. Taking advantage of the non-volatile storage in ASL devices, the input magnets of this circuit can store the binary data, and later the stored information is used to determine the magnetization direction of the next stages. Figure 29 shows the simulated output waveform (*sum* magnet) of the XNOR block for different scenarios of input magnetization. As it is important to consider breakdown current effects [71], we choose a 5 mV supply voltage in our simulations to ensure that the current density is safely below the breakdown value. For channels with higher breakdown current densities, higher voltages can be applied, which increases the operation speed. The control voltage is applied to the magnets at t = 0. The total power consumption of the XNOR gate is 11 $\mu W$ and the estimated area is less than 0.3 $\mu m^2$. When the control voltage is applied to the XNOR gate, the output magnetization remains in the -X orientation (the initial condition of the magnetization in this simulation) if the pixel values are different. If the inputs to this gate are the same, the output magnetization switches to the +X direction, as shown in Figure 29. We must clarify that the initial condition of the output magnet does not change the final magnetization orientation.

Figure 28: 1-bit full adder used as XNOR. In the 2D implementation of this work, X and Y wires are in-plane metal wires and connections along the Z axis are vias.



Figure 29: Simulated output waveforms of XNOR gate

#### *3.4.2 Construction of the mean pixel*

As a reliable and simple way to extract the information from the training set, we construct the mean image as discussed in the previous subsection. The ASL majority gate, with the schematic shown in Figure 24, provides a low-power and efficient implementation of the mean image. The inputs to this majority gate come from P different users. In addition, the images that the system receives during the learning phase are constrained to be mainly similar along the rows. When the control voltage is applied to the magnets of this gate, the output magnetization either switches to the other value or remains in the same orientation. If the applied control voltage is negative, the final output magnetization orientation is the majority of the input magnetizations. In the case of a positive control voltage, the output magnetization settles to the complementary majority value of the input magnetizations. For this system, since we apply unified positive voltages, the majority gates settle to the complementary majority value. To extract more information from the operation of the majority gates in this circuit, we assume a unified initial magnetization orientation on the output magnets of each stage of majority gates. This enables us to recognize the total count of matches or mismatches between the input magnetizations to each majority gate, as we will discuss later. The total power consumption of each majority gate in this circuit is 3.75 $\mu W$ and the corresponding estimated area is less than 0.2 $\mu m^2$.
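The dependence of the settled output on the control-voltage polarity can be summarized with a purely Boolean sketch. This is only an abstraction of the steady state, not a model of the magnetization dynamics; the function name `asl_majority`, the flag `control_voltage_positive`, and the encoding of +X as 1 and -X as 0 are assumptions made for illustration.

```python
def asl_majority(inputs, control_voltage_positive=True):
    """Steady-state Boolean abstraction of an ASL majority gate.

    inputs: list of bits (assumed encoding: 1 for +X, 0 for -X).
    A negative control voltage yields the majority of the inputs,
    a positive control voltage yields the complementary majority.
    """
    if len(inputs) % 2 == 0:
        raise ValueError("use an odd number of inputs so that no tie occurs")
    majority = 1 if 2 * sum(inputs) > len(inputs) else 0
    return 1 - majority if control_voltage_positive else majority


# Three users write the same pixel; one of them disagrees.
pixel_from_users = [1, 1, 0]
print(asl_majority(pixel_from_users, control_voltage_positive=False))
print(asl_majority(pixel_from_users, control_voltage_positive=True))
```

The sketch deliberately ignores the initial magnetization and the switching delay, which carry the non-Boolean information discussed in the text.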

#### 3.4.3 Single Pixel Comparator

By having the required blocks, we propose two different versions of the single pixel comparator.

##### 3.4.3.1 Standard implementation

The schematics of this implementation and the table with the detailed operation are shown in Figure 30. This circuit operates in the same order discussed before. The first stage of the circuit is a majority gate with inputs coming from the *P* users in the learning phase. The output of this majority gate settles to the corresponding mean pixel value. The output of this gate is connected to a comparator circuit which has the other input coming from the input image. The connection is through a short metallic channel to minimize the delay. When the learning phase is finished, and the detection phase starts, by applying the control voltage across the magnets of the comparator circuit, the "Pixel" magnet settles to the comparison value of the mean pixel and the input pixel. It is noteworthy that the input pixel can be applied on Magnet  $Q_{ij}$  after the  $\overline{P}_{ij}$  magnetization settles to the mean pixel; hence, no extra memory circuit is required to store the value of  $\overline{P}_{ij}$ .



Figure 30: (a) Standard single pixel detector schematic. (b) The truth table with the detailed operation of the circuit.

##### 3.4.3.2 Comparator-First implementation

In this version, there are the same number of comparator circuits as the total number of training images at the input side. The comparators have the input image pixel,  $Q_{ij}$ , in common and differ in their other input that comes from their corresponding training image. The output magnets of the comparators are connected to the "Pixel" magnet through metallic channels in a majority gate configuration. During the learning phase, the pattern pixels are stored in the corresponding input magnets. By applying the control voltage on the magnets of the circuit, the detection phase starts and the "Pixel" magnetization settles to the comparison value of the mean pixel and the input pixel. The schematic of the circuit and the detailed operation table are shown in Figure 31.



Figure 31: (a) Comparator-first pixel detector schematic. (b) The truth table with the detailed operation of the circuit.

As it can be verified by comparing the last columns of Figure 30b and Figure 31b, the "Pixel" steady state value is identical in the two versions. To verify the identical output result from the two different versions of the implementation in a more general case, we must prove that the majority operation and the comparison (XOR/XNOR) operation are interchangeable, i.e.,

**Proposition 1** Given x,  $y_1$ ,  $y_2$ ,  $\cdots$ ,  $y_P$  as binary variables and P as an odd integer number,

$$x \oplus \operatorname{nint}\left(\frac{1}{P} \sum_{k=1}^{P} y_k\right) = \operatorname{nint}\left(\frac{1}{P} \sum_{k=1}^{P} (x \oplus y_k)\right), \tag{9}$$

where $\oplus$ denotes the XOR operation and $\operatorname{nint}(\cdot)$ denotes rounding to the nearest integer. The mathematical proof of this proposition is shown in [76].
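Proposition 1 can also be checked numerically by brute force for small odd P. The sketch below is not the proof from [76]; it simply enumerates all input combinations. The function names are hypothetical, and the nearest-integer rounding of the mean is implemented as a strict majority test, which is equivalent when P is odd.

```python
from itertools import product


def nint_mean(bits):
    """Nearest integer of the mean of the bits (strict majority test)."""
    return 1 if 2 * sum(bits) > len(bits) else 0


def standard(x, ys):
    """Left-hand side of (9): take the majority first, then compare."""
    return x ^ nint_mean(ys)


def comparator_first(x, ys):
    """Right-hand side of (9): compare first, then take the majority."""
    return nint_mean([x ^ y for y in ys])


def proposition_holds(P):
    """Exhaustively check (9) for every x and every y_1..y_P."""
    return all(
        standard(x, ys) == comparator_first(x, ys)
        for x in (0, 1)
        for ys in product((0, 1), repeat=P)
    )


for P in (1, 3, 5, 7):
    print(P, proposition_holds(P))
```

For even P the mean can equal exactly 0.5, so the result depends on how the tie is broken; this is why the proposition is stated for odd P only.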

Although the standard implementation has the advantage of slightly lower power consumption (lower device count) and a smaller area, we select the comparator-first design as the unit cell of this circuit because its output magnetization transient provides more information on the similarity of the training pixels and the input pixel. Based on Figure 30b and Figure 31b, the final values of the output magnetization in the two cases are identical. However, the comparator-first output magnetization comes from a majority gate and switches when most pattern pixels have the same value as the input pixel. If the majority gate at the output of the comparator-first circuit has a low fan-in (e.g., $\leq$ 5), the switching transient will be less sensitive to the accumulated thermal noise and will provide information on the number of training pixels with identical values. On the other hand, in the standard implementation, the output magnetization comes from the XNOR circuit; as Figure 30b indicates, its transient conveys no information on the number of similar pattern pixels. This is particularly important when the user in the detection phase tracks the total count of pattern pixels with the same value.

#### 3.4.4 Non-Boolean Row Decision-Maker

The last stage of the proposed circuit uses the ASL majority gate as a means to quickly decide on the mainly similarity of the input image and the mean image along the rows. The inputs to this majority gate are the "Pixel" magnets of the pixels along the same row of the image. The connection is through short channels to minimize the delay. As mentioned before, the spin torque transferred from the input magnets to the output magnet in the ASL majority gate is determined by the magnetization of the input devices. As the number of devices with similar magnetization orientations increases, the transferred spin torque increases; hence, the output magnetization switches faster according to (6). By properly selecting the control voltage timing and the dimensions of the magnets and channels in this gate, reliable decision-making based on the transient behavior of the output magnetization is achieved. This final majority gate is sensitive to the uncorrelated thermal noise of the input magnets; hence, an intentionally low fan-in ( $\leq 5$ ) must be selected. In our simulations, three magnets from the previous pixel stages are connected to this gate and, as the simulation results will show, reliable decision-making is achieved.

The complete circuit for the full image comparison consists of two stages: the unit pixel comparator and the row majority gate. The structure consisting of the comparator-first circuits and the Row majority gates is called the "Smart Detector Cell". This naming convention helps the discussion of operation in the next subsection. We call these detector cells smart because they can perform the multiple tasks of "storage", "Boolean computation", and "non-Boolean decision-making" in a time-efficient manner. The schematic of this circuit is shown in Figure 32. The total power consumption of this circuit is 115 $\mu W$ and the occupied area is less than 0.5 $\mu m^2$.

To feed the input data, spin-polarized currents are used to initialize the magnetization of the input magnets based on the training images, similar to [112]. The number of write units is equal to the number of pixels, while there is only one output, which translates into a small overhead. The decision data is in the form of a time delay and can be stored on a capacitor, where the delay determines the amount of stored charge. Another possibility for extracting the output data is to use MTJ devices, as mentioned in [113].



Figure 32: Structure of the unit smart detector cell.

## **3.5 Simulation Results**

In this subsection, we provide two examples to show the reliable performance of smart detector cells.

#### 3.5.1 Non-Boolean Hamming Distance Identifier of 3 × 3 Pixel Pattern and Input Image

In this example, we have only one training image and one input image. To compare these two images, we need nine XNOR gates to identify the similarity of corresponding pixels in the two images and three majority gates with a fan-in of three to decide on the similarity of the corresponding rows. The smart detector cell in Figure 32 has three comparator-first circuits and a Row majority gate. The mainly similarity of the rows can be determined by the Pixel majority gates. The last majority gate in this case settles to +X magnetization if at least two rows are mainly similar. The initial magnetizations of the comparators and the majority gate outputs are set to 1. Figure 33 shows the two images as well as the transient magnetization of various magnets. The Pixel waveforms overlap in some cases; thus, only three pixels are shown in this figure. As expected, the comparator outputs switch for the $P_{21}$ and $P_{22}$ pixels since the values in the input image and the pattern image are different. For the rest of the pixels, the comparator output is +X magnetization and does not switch. Subsequently, Row 1 and Row 3 both exhibit perfect similarity and the outputs of the corresponding majority gates switch within the shortest time. On the other hand, Row 2 exhibits a mismatch and therefore cannot switch to the -X magnetization orientation. The control voltage of 5 mV is applied to all the magnets at t=0 and the circuit compares the two images in less than 0.6 ns. Compared to CMOS circuits, this corresponds to a significantly lower operating voltage and decision time.
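The Boolean part of this example (pixel-wise XNOR, row majority, final majority) can be sketched as follows. The two 3 × 3 images below are hypothetical and are chosen only so that pixels $P_{21}$ and $P_{22}$ differ, as in the text; they are not the images of Figure 33. The sketch also ignores the control-voltage polarity and the switching transients, so it reproduces only the settled logical decisions.

```python
def xnor(a, b):
    """1 if the two pixels match, 0 otherwise."""
    return 1 if a == b else 0


def majority(bits):
    """Majority of an odd number of bits."""
    return 1 if 2 * sum(bits) > len(bits) else 0


def compare_images(pattern, image):
    """Return per-pixel matches, per-row decisions, and the final decision."""
    matches = [
        [xnor(p, q) for p, q in zip(pattern_row, image_row)]
        for pattern_row, image_row in zip(pattern, image)
    ]
    rows = [majority(row) for row in matches]
    return matches, rows, majority(rows)


# Hypothetical images that differ only at P21 and P22.
pattern = [[1, 0, 1],
           [0, 1, 0],
           [1, 0, 1]]
image = [[1, 0, 1],
         [1, 0, 0],
         [1, 0, 1]]

matches, rows, decision = compare_images(pattern, image)
hamming_distance = sum(row.count(0) for row in matches)
print(rows, decision, hamming_distance)
```

With these inputs, Rows 1 and 3 are mainly similar, Row 2 is not, the Hamming distance is 2, and the final majority reports that at least two rows are mainly similar.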



Figure 33: Using a single smart detector cell, we can compare these  $3 \times 3$  pixel images. The waveforms of the comparators and majority gates (bottom).

#### 3.5.2 Non-Boolean Similarity Comparison of a $9 \times 9$ Pixel Image and a Set of Three Pattern Images

To incorporate the smart detector cells for larger images, we need an accurate design of the cells. Here, we develop a circuit for training with $9 \times 9$ pixel images and perform a non-Boolean comparison between the constructed mean image and the $9 \times 9$ pixel input image. In this simulation, three different users write the word "Spin" with their own choice of pixels. The three pattern images are shown in Figure 34.



Figure 34: Training set for the  $9 \times 9$  pixel images.

In the detection phase, a new user of the circuit chooses an arbitrary image of interest as the input. As an example, in this simulation, the user chooses the word "swim", as shown in Figure 35 (left). The circuit should compare this image with the mean image constructed from the training set.



Figure 35: The input image (left) and the representation of the mean image (right). The mean image is not a direct output of the circuit.

The mean image of the training set is also shown in Figure 35 (right). One advantage of constructing the mean image is that pixels mistakenly valued by a single user in the learning phase (e.g., $P_{26}$ and $P_{49}$ in Figure 35) are automatically corrected when the mean image is constructed. This is especially useful when users train the system with multiple versions of an image during the learning phase to make sure that the mean image represents their desired pattern. The mistaken values could be due to any source of error or distortion. In an ideal case where the thermal noise effect can be ignored, the circuit can compare these two large images by changing the fan-in of the different stages in the smart detector cell. However, in our simulations, as we model the thermal noise accurately, fan-in considerations become prominent. Based on these considerations, we break the $9 \times 9$ images into smaller $3 \times 3$ sub-images, each of which can be compared by a single smart detector cell. The nine smart detector cells can operate in parallel and the circuit configuration can be determined by the user. This breakdown also provides more information on the pixels, as we can check the mainly similarity for smaller blocks of the original image. The breakdowns of the mean image (squares on the right) and the input image (squares on the left) into $3 \times 3$ partitions are shown in Figure 36.
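The error-correcting property of the mean image can be illustrated with a pixel-wise majority vote over the training images. This is a minimal logical sketch; the function name `mean_image` and the small 2 × 3 training images are made up for illustration and are not taken from Figure 34.

```python
def mean_image(training_images):
    """Pixel-wise majority over an odd number of equally sized binary images."""
    count = len(training_images)
    if count % 2 == 0:
        raise ValueError("use an odd number of training images to avoid ties")
    n_rows = len(training_images[0])
    n_cols = len(training_images[0][0])
    return [
        [
            1 if 2 * sum(img[i][j] for img in training_images) > count else 0
            for j in range(n_cols)
        ]
        for i in range(n_rows)
    ]


# Hypothetical training set: user_c mistakenly sets one pixel.
user_a = [[1, 0, 1], [0, 1, 0]]
user_b = [[1, 0, 1], [0, 1, 0]]
user_c = [[1, 1, 1], [0, 1, 0]]

corrected = mean_image([user_a, user_b, user_c])
print(corrected)
```

The single deviating pixel written by `user_c` is outvoted by the other two users, so the mean image equals the pattern the majority intended.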



Figure 36: Due to fan-in considerations, the circuit consists of 9 smart detector cells. The corresponding breakdowns of the mean image and the input image are shown here.

To distinguish the different rows of the smaller blocks, we use the notation of $C_{ij}$ clusters, where $C_{ij}$ represents the elements of the *i*<sup>th</sup> row from Column 3j - 2 to Column 3j. The magnetization waveforms in Figure 37 and Figure 38 show the output magnetizations of the smart detector cells for various clusters. The unified initial condition of the output magnet in this simulation is the -X orientation. In Figure 37, the switching delays of the output magnetizations for the clusters with a perfect match ( $C_{11}, C_{22}$, and $C_{41}$ ) and those with one mismatch ( $C_{52}, C_{42}$, and $C_{32}$ ) can be easily distinguished. This phenomenon was previously described as the unique feature of ASL majority gates and helps users identify the number of mismatches along different rows. At the same time, the output magnetizations of the clusters with the same level of similarity are very close in the time domain, which makes this non-Boolean decision-making a reliable metric.
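The $C_{ij}$ indexing can be made concrete with a short sketch that extracts a cluster from a $9 \times 9$ image and counts its mismatches against the mean image. The helper names and the test images are hypothetical; indices follow the 1-based convention of the text (row i, Columns 3j - 2 to 3j).

```python
def cluster(image, i, j):
    """Return C_ij: the elements of row i from Column 3j-2 to Column 3j (1-based)."""
    return image[i - 1][3 * j - 3 : 3 * j]


def cluster_mismatches(mean_img, input_img, i, j):
    """Number of differing pixels between the two images inside cluster C_ij."""
    pairs = zip(cluster(mean_img, i, j), cluster(input_img, i, j))
    return sum(a != b for a, b in pairs)


# Hypothetical 9 x 9 images: all zeros, with one flipped pixel in the input.
mean_img = [[0] * 9 for _ in range(9)]
input_img = [row[:] for row in mean_img]
input_img[4][3] = 1  # row 5, column 4 lies in cluster C_52

print(cluster_mismatches(mean_img, input_img, 5, 2))
print(cluster_mismatches(mean_img, input_img, 1, 1))
```

In the circuit this count is not read out as a number; it is reflected in the switching delay of the corresponding Row majority gate.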



Figure 37: The switching delay of output magnetization in last stage represents the similarity of input data and pattern data.

On the other hand, in Figure 38, the output magnetization cannot switch for clusters with mismatches ( $C_{43}$ and $C_{72}$ ). The level of precession is not the same for different mismatch levels because of the difference in the spin torques provided in these two cases. If the user studies the output magnetization at very high resolution, this can help to identify the number of mismatches; however, the switching transient is a more reliable metric, and the same information can be extracted by repeating the simulation with the output magnet initial condition set to -X magnetization.



Figure 38: For the cases of mismatch in clusters, magnets will not switch and the initial magnetization does not change. The y-axis shows a range from -1.002 to -0.998 in contrast with Figure 37 in which the range is from -1 to 1.

As can be seen in all the simulation results, this circuit decides in 1 ns for a $9 \times 9$ pixel image, whereas in CMOS, this decision time cannot be less than a few nanoseconds. For a detailed comparison between the two technologies, the performance of this circuit is compared with that of two existing CMOS circuits in Table 2.

| Reference     | [108]          | [112]         | This Work       |
|---------------|----------------|---------------|-----------------|
| Decision Time | 30 ns          | N.A.          | 1 ns            |
| Image Size    | $32 \times 32$ | 86 neurons    | $9 \times 9$    |
| DC Power      | N.A.           | 2.2 mW        | 990 µW          |
| Area          | N.A.           | $0.018\ mm^2$ | $< 1\ \mu m^2$  |
| Technology    | CMOS           | Spin-CMOS     | All-Spin        |

Table 2: Performance Comparison with Existing CMOS Systems

### **3.6 Conclusions**

We have presented a novel non-Boolean image recognition circuit based on all-spin logic devices. The introduced circuit can perform all the phases of a non-Boolean pattern recognition for binary images. Taking advantage of the non-volatility of ASL devices, the learning phase operation is performed incorporating no additional memory devices. By introducing the mainly similarity scheme, two different implementations of the circuit are proposed. As verified by simulation results, this circuit can recognize various sizes of binary image patterns faster than existing CMOS counterparts and consumes less power with an operational voltage of 5 mV. Since comparisons in this circuit are based on ASL majority gates, the computational complexity of the operation is less than existing circuits. The proposed circuit has applications in fast and low power image recognition for security, medical imaging, and sensing.

## IV. ELECTRICAL-SPIN TRANSDUCTION AND LONG-RANGE SPINTRONIC INTERCONNECTS

#### 4.1. Signal Transduction and Transfer for Spintronic and Magnetic Circuits

Hybrid CMOS-spintronic circuits are expected to provide new and enhanced memory and computational functionalities [24]–[26]. Passing information back and forth between spintronic and CMOS devices therefore requires efficient transduction. Several studies have examined CMOS-spintronic interface circuits, which write to and read from magnetoresistive random-access memory (MRAM) and spin-transfer torque magnetic random-access memory (STT-RAM) [114], [115], and sense amplifiers that read from these magnetic memories [116], [117]. These interface circuits are suitable for large memory arrays, in which a single large, complicated sense amplifier reads many magnetic tunnel junctions (MTJs). However, in the case of signal transduction, the use of sense amplifiers creates prohibitive energy and circuit area overhead. The data signal of spin-based devices can be transferred by spin-polarized currents through interconnects [50]. Several studies [49], [51], [52] have analyzed the transmission delay and the energy dissipation of short ASL interconnects for metallic, silicon, and graphene interconnects, respectively. The amplitudes of spin signals attenuate exponentially over lengths comparable to the spin relaxation length ( $L_{SRL}$ ), which is generally shorter than 1 µm for metals [94]. This length becomes even shorter in nanoscale wires because of sidewall and grain boundary scattering in metallic wires [77]. Thus, spin signals must be amplified multiple times to pass through longer interconnects. These repeaters add to the power dissipation and the wafer area, which has led to a demand for novel long interconnect designs that efficiently carry spin signals over long ranges in microchips. In contrast to the amplitude of spin signals, that of electrical signals does not attenuate exponentially with interconnect length ( $L_{Int}$ ). Using CMOS-spintronic transducers and electrical interconnects, we propose a structure that transmits spin signals in long metallic interconnects. The proposed structure outperforms ASL repeaters for interconnects longer than 1.6 µm.

This study proposes compact, energy-efficient transducers for converting back and forth between magnetic states in all-spin logic (ASL) and CMOS binary signals. In this section, we propose two CMOS-spintronic interface circuits with simple structures based on MTJs and ASL gates for the transduction of electrical signals and spin signals. These transducers work under a wide range of supply voltages and TMR values.

#### 4.2. CMOS- to Spintronic-Signal Transduction

The transduction of CMOS data in the form of electrical voltage to spintronic data in the form of the magnetization orientation of magnets can be achieved by using the properties of ASL gates. The polarity of the electrical voltage applied to ASL gates controls copy and invert operations [118]. Employing this property, an ASL-based CMOS to spintronic transducer is shown in Figure 39. In this device, the direction of the electrical current passing through the fixed magnet determines the polarity of the spin accumulation of electrons underneath it. If the electrons are injected by the fixed magnet into the channel, a majority of spins underneath the magnet will have magnetic moments aligned with the magnetic orientation of the fixed magnet. Conversely, if the direction of the current is reversed and the electrons are extracted by the fixed magnet, most of the electrons will have magnetic moments antiparallel to that of the fixed magnet. In both cases, the accumulated spins diffuse inside the non-magnetic channel towards the output magnet and apply a torque on the output magnet, based on the spin-transfer torque (STT) phenomenon, changing the orientation of the output magnetization. In summary, the direction of the electrical current passing through the fixed magnet determines whether the orientation of the output magnet becomes parallel or antiparallel to that of the fixed magnet. In Figure 39a, when the input signal ( $V_{DATA}$ ) is 1, transistors MN2 and MN3 are ON; when $V_{DATA}$ is 0, transistors MN1, MP1, MN4, and MN5 are ON. The direction of the electrical current passing through the fixed magnet is designated by either a blue arrow for 1 or a purple arrow for 0.

The transducer either inverts or copies the magnetization orientation of the fixed magnet to the output magnet according to whether $V_{DATA}$ is high or low. Hence, the gate converts the electrical voltage ( $V_{DATA}$ ) to the orientation of the output magnet. The proposed circuit is simulated using SPICE models, which account for magnetization dynamics and spin transport mechanisms and are calibrated with the experimental results presented in [94]. The simulation results in Figure 39b show that as $V_{DATA}$ switches between 800 mV (bit "1") and 0 mV (bit "0"), the orientation of the output magnet follows, settling to the +X direction (bit "1") and then to the -X direction (bit "0"). This transducer copies the logic value of $V_{DATA}$ into the output magnet with a delay of 1.6 ns for high-to-low switching and a delay of 2.0 ns for low-to-high switching. The delay decreases as $V_{FM}$ increases, but to ensure that the current density is safely below the breakdown value [71], we choose 150 mV as the largest simulated $V_{FM}$ value. For the maximum $V_{FM}$ value, the current density in the copper channel from the input magnet to the ground node is less than $10^7\ A/cm^2$, whereas the breakdown current density of the channel, determined by its length and width, is close to $10^8\ A/cm^2$ [71]. As Figure 39b shows, the current passing through the fixed magnet ( $I_{ASL}$ ) does not exceed 200 µA, which is less than the conventional write currents of MTJs [119].



Figure 39: Electrical signal to spin signal transducer: (a) schematics of the transducer. (b) Input signal (V<sub>DATA</sub>), which switches the polarity of the voltage applied to the fixed magnet, which switches the output magnet accordingly. The orientation of the output magnet follows the input signal with a delay of 1.6 ns and 2.0 ns for high-to-low and low-to-high switching.

#### 4.3. Spintronic- to CMOS-Signal Transduction

To implement a spintronic to CMOS signal transducer, Figure 40a employs a magnetic tunnel junction (MTJ)-based circuit that relies on the spin-transfer torque (STT) mechanism for switching. An MTJ consists of two magnets encompassing an oxide layer in which the electrical conductance across the gate is determined by the relative difference between the magnetization orientations of the two magnets [69] as

$$G_{MTJ} = \frac{G}{2} + \frac{\Delta G}{2}\,\hat{m}_1 \cdot \hat{m}_2, \tag{10}$$

where $G = G_P + G_{AP}$, $\Delta G = G_P - G_{AP}$, and $\hat{m}_1$ and $\hat{m}_2$ represent the orientations of the two magnets [69]. Under the assumption that Magnet 2 ( $\hat{m}_2$ ) is a fixed magnet in the +*X* direction, the resistance across the MTJ is

$$R_{MTJ} = \frac{1+P}{G_P (1+P\hat{m}_{1,X})}, \tag{11}$$

in which the polarization factor P is defined as

$$P = \Delta G / G = TMR / (TMR + 2). \tag{12}$$
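Equations (11) and (12) can be evaluated numerically to see how the TMR sets the resistance contrast of the MTJ. The sketch below is illustrative only: the parallel conductance `G_P` (corresponding to 100 kΩ) is an assumed placeholder, not a value taken from the simulations, and TMR is expressed as a fraction (1.31 for 131%).

```python
def polarization(tmr):
    """Equation (12): P = TMR / (TMR + 2), with TMR given as a fraction."""
    return tmr / (tmr + 2.0)


def r_mtj(g_p, tmr, m1x):
    """Equation (11): MTJ resistance for free-magnet X-projection m1x in [-1, 1]."""
    p = polarization(tmr)
    return (1.0 + p) / (g_p * (1.0 + p * m1x))


G_P = 1.0 / 100e3  # assumed parallel conductance (100 kOhm), placeholder value

for tmr in (1.31, 3.55):  # the two TMR values simulated in the text
    r_parallel = r_mtj(G_P, tmr, +1.0)
    r_antiparallel = r_mtj(G_P, tmr, -1.0)
    print(f"TMR={tmr:.2f}  P={polarization(tmr):.3f}  "
          f"R_P={r_parallel:.3e} ohm  R_AP={r_antiparallel:.3e} ohm")
```

As a consistency check, (11) gives $R_P = 1/G_P$ for $\hat{m}_{1,X} = +1$ and $R_{AP}/R_P = (1+P)/(1-P) = 1 + TMR$ for $\hat{m}_{1,X} = -1$, which is the usual definition of the TMR ratio.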



Figure 40: Spin signal to electrical signal transducer: (a) schematics of the transducer. (b) The changes in the orientation of the free magnet are translated into changes in the electrical voltage at node $V_N$. The inverter accordingly provides a full swing between the ground and supply voltages at its output. Simulations are done for two TMR values of 131% (low TMR) and 355% (high TMR).

While the top layer of the MTJ of Figure 40a is a magnetic fixed layer oriented along the +X direction, the bottom layer is a free magnet receiving spin currents from the input magnet through a metallic channel. Through the STT mechanism, initiated by receiving spin currents, the magnetization orientation of the free magnet switches from antiparallel to parallel with the direction of the magnetic fixed layer. As the change in direction alters the resistance across the MTJ, the voltage at node $V_N$ also changes. In Figure 40a, resistor $R_1$, composed of an MTJ consisting of two fixed magnets, has a fixed resistance value of $\frac{1}{G_P(1-P)}$. The resistances across both $R_1$ and the MTJ depend on the thickness of their oxide layers. In the simulations of this work, the TMR and resistance per-area values are based on the values reported in [119]–[129]. Furthermore, the inverter captures the voltage changes at $V_N$ and provides a full voltage swing between 0 and $V_{DD}$ at the output.

It is important to note that in an STT-RAM, an electrical current must pass through the MTJs for both read and write operations. Thus, the oxide thickness has to be sufficiently small so that the required write voltage does not become too large. As a result, the voltage swing across the low resistance of an MTJ is too small to drive a CMOS inverter, requiring a more complicated sense amplifier. However, in the case of the proposed transducer, the write operation takes place via the spin current provided by the driving ASL gate. Hence, we can choose a large enough oxide thickness for the MTJ, which produces a large enough voltage swing to directly drive a CMOS inverter.

The large thickness of the oxide layer offers four more advantages: it 1) lowers the read current, reducing the dissipated power; 2) drastically decreases the read disturb rate of the MTJ; 3) increases the TMR [119]–[129]; and 4) lowers the magnitude of the parasitic spin current injected from the fixed to the free magnet. Hence, the transducer can employ MTJs with TMR values as large as 300 to 400% and resistances as large as a few hundred kilo-ohms, while STT-RAM read/write circuits rely on MTJs with TMR values of 100 to 200% and resistances of one to two kilo-ohms. Simulation results of the transducer with two TMR values of 355% and 131% are presented in Figure 40b. The results show that increasing the TMR increases the voltage swing at $V_N$; in the case of the TMR value of 355%, the inverter is able to provide a full voltage swing (0 to $V_{DD}$) at its output. The negligible parasitic spin flux from the fixed to the free magnet, 1000X smaller than the spin current injected from the input to the free magnet, is accounted for in the simulations.



Figure 41: (a) Layout of an ASL gate that transfers spin signals through a metallic interconnect, (b) layout of ASL gates in a cascaded structure, which is a solution to transfer spin signals in long interconnects. The meaning of colors is defined in [33].



Figure 42: ASL gate modeled by a star network of resistors for calculating the electrical current passing through the input magnet.

#### 4.4. ASL Transducer for Long-Spintronic Interconnects

Spin signals in metallic interconnects attenuate exponentially with  $L_{Int}$ , so propagating signals along ASL interconnects longer than 1 µm is impossible. One potential solution is to use multiple ASL repeaters that amplify a spin signal along the interconnect, illustrated in Figure 41b. We analytically study ASL gates as the building blocks for short metallic ASL interconnects and repeaters and present an approximate solution describing their switching delay and energy dissipation in subsection 4.4.1. Then we introduce a new transducer-based interconnect in subsection 4.4.2 and compare the potential performance of the two approaches.

#### 4.4.1 Performance Analysis of ASL Repeaters

An ASL repeater consists of a cascade of ASL gates, shown in Figure 41b. Figure 41a illustrates an ASL gate in which the electrical current passing through the input magnet becomes spin-polarized at the interface of the magnet with the interconnect. We define the polarization factor ( $\eta$ ) as  $\eta = \frac{I_{SX}}{I_C}$ ;  $I_C$  denotes the electrical current passing through the magnet, and  $I_{SX}$  denotes the spin-polarized current at the interface.

Generalized Ohm's law, including spin currents for the interface, is [130]

$$\begin{bmatrix} I_C \\ I_{SX} \\ I_{SY} \\ I_{SZ} \end{bmatrix} = \begin{bmatrix} G_{\uparrow\uparrow} + G_{\downarrow\downarrow} & G_{\uparrow\uparrow} - G_{\downarrow\downarrow} & 0 & 0 \\ G_{\uparrow\uparrow} - G_{\downarrow\downarrow} & G_{\uparrow\uparrow} + G_{\downarrow\downarrow} & 0 & 0 \\ 0 & 0 & 2\operatorname{Re}G_{\uparrow\downarrow} & 2\operatorname{Im}G_{\uparrow\downarrow} \\ 0 & 0 & -2\operatorname{Im}G_{\uparrow\downarrow} & 2\operatorname{Re}G_{\uparrow\downarrow} \end{bmatrix} \begin{bmatrix} V_N - V_F \\ V_{SX} \\ V_{SY} \\ V_{SZ} \end{bmatrix}, \tag{13}$$

where $G_{\uparrow\uparrow}$, $G_{\downarrow\downarrow}$, and $G_{\uparrow\downarrow}$ are matrix elements derived from spin scattering at the magnet-interconnect interface [130]. Thus, by defining $G_u = G_{\uparrow\uparrow} + G_{\downarrow\downarrow}$ and $G_d = G_{\uparrow\uparrow} - G_{\downarrow\downarrow}$, the polarization factor is

$$\eta = G_d + \frac{G_u}{G_d} (1 - R_1 G_u), \tag{14}$$

The resistances $R_1$, $R_2$, and $R_G$ are shown in Figure 42, where $R_1$ is predominantly the interface resistance between the metallic channel and the input magnet, $R_2$ is the interface resistance between the metallic channel and the output magnet plus the resistance of the metallic channel between the input and the output magnets, and $R_G$ is the resistance to ground. To ensure non-reciprocity (i.e., the magnetization of the input magnet determines that of the output magnet and not the other way around), $R_1$ must be smaller than $R_2$. For an ASL device with a short channel (interconnect) length of ~150 nm, $R_1 = 2.6\ \Omega$ and $R_2 = 8.2\ \Omega$. However, for an ASL device with a longer interconnect length of 600 nm, $R_1 = 2.6\ \Omega$ and $R_2 = 16\ \Omega$. By applying KCL to the star network of Figure 42 and connecting both magnets to the same supply voltage level, $V_{FM1} = V_{FM2} = V_{FM}$, the voltage of node e becomes $V_e = V_{FM}(R_1R_G + R_2R_G)/(R_1R_2 + R_2R_G + R_GR_1)$. Thus, we derive the electrical current passing through the input magnet as $I_{In} = (V_{FM} - V_e)/R_1 = V_{FM}R_2/\Delta$, in which $\Delta = R_1R_2 + R_2R_G + R_GR_1$; hence, the spin-polarized current at the interface of the interconnect and the input magnet ( $I_{S,In}$ ) is derived as

$$I_{S,In} = \eta I_{In} = \eta \frac{V_{FM}R_2}{\Delta},\tag{15}$$
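The node-voltage and input-current expressions above can be checked numerically. The following is a minimal sketch, not the simulation setup used in this work: $R_1$ and $R_2$ are the values quoted for the ~150 nm channel, while `V_FM` and `R_G` are assumed, illustrative values, and the function name is hypothetical.

```python
def star_network(v_fm, r1, r2, r_g):
    """Return (V_e, I_in) for the three-resistor star network.

    Both magnets sit at v_fm; node e connects to ground through r_g.
    """
    delta = r1 * r2 + r2 * r_g + r_g * r1
    v_e = v_fm * (r1 + r2) * r_g / delta   # KCL at node e
    i_in = (v_fm - v_e) / r1               # current through the input magnet
    return v_e, i_in


# R_1 and R_2 are the values quoted for the ~150 nm channel.
# V_FM and R_G are assumptions chosen only for illustration.
V_FM = 10e-3   # V (assumed)
R_G = 5.0      # ohm (assumed)
R_1, R_2 = 2.6, 8.2

v_e, i_in = star_network(V_FM, R_1, R_2, R_G)
delta = R_1 * R_2 + R_2 * R_G + R_G * R_1
closed_form = V_FM * R_2 / delta           # I_In = V_FM * R_2 / Delta

print(f"V_e = {v_e * 1e3:.3f} mV, I_In = {i_in * 1e3:.4f} mA")
print(f"closed form I_In = {closed_form * 1e3:.4f} mA")
```

The two printed currents agree, which confirms that $(V_{FM} - V_e)/R_1$ reduces to $V_{FM}R_2/\Delta$ for any positive resistances.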

The spin current diffuses along the interconnect and experiences exponential attenuation because of the spin relaxation mechanisms. Hence, the spin-polarized current at the interface of the interconnect with the output magnet ($I_{S,Out}$) will be

$$I_{S,Out} = I_{S,In} e^{-\frac{L_{Int}}{L_{SRL}}} = \eta \frac{V_{FM}R_2}{\Delta} e^{-\frac{L_{Int}}{L_{SRL}}}. \tag{16}$$

The spin current applied to a magnet exerts a torque on the magnet, which, if strong enough, switches the magnetization orientation of the magnet. The minimum current needed to switch a magnet, that is, the critical current ($I_{Critical}$), is defined as [131]

$$I_{Critical} = \frac{4e\alpha E_b}{\eta \hbar} \left( 1 + \frac{\overline{H_S}}{2\overline{H_U}} \right), \tag{17}$$

where $\overline{H_S}$ and $\overline{H_U}$ represent the Z-projections of the demagnetization field and the uniaxial anisotropy field, respectively. In CGS units, $\overline{H_S} = 4\pi M_S N_Z$, where the demagnetization tensor $N$ is determined by the geometrical shape of the magnets and $M_S$ is the saturation magnetization of the magnets. The perpendicular uniaxial anisotropy field resulting from the crystal structure of the magnets is specified as $H_U = H_k m_z \hat{z}$, in which $H_k$ is the Stoner–Wohlfarth field, which is related to the energy density, $K$, of the magnets [131]; thus, the magnitude of the Z-projection of the anisotropy field is $\frac{2K}{\mu_0 M_S}$. Because the magnet switches faster as the spin-polarized current reaching the output magnet increases, we define an overdrive factor ($\sigma$) as

$$\sigma = \frac{I_{S,Out}}{I_{Critical}} = \frac{\eta V_{FM}R_2}{\Delta I_{Critical}} e^{-\frac{L_{Int}}{L_{SRL}}}, \tag{18}$$

Figure 43: Delay of ASL repeaters is compared to that of electrical communication of spin information through transduction. For long lengths, the delay of ASL gates increases exponentially with length, as predicted by the analytical equation. Meanwhile, for short lengths, the delay increases linearly because the linear terms of the Taylor expansion of delay are dominant. Similarly, the delay of ASL repeaters increases linearly with $L_{Int}$ for short lengths and exponentially for long lengths. Although the linear region is extended with multiple ASLs, the delay of the electrical interconnect is still shorter than that of the repeaters even for $L_{Int}$ as small as 1.25 $\mu$m.

The spin-polarized current applied to a magnet determines the switching delay of the magnet as [93]

$$\tau = \frac{\tau_0 \ln\left(\frac{\pi}{\sqrt{\langle \phi_0^2 \rangle}}\right)}{\frac{I_{S,Out}}{I_{Critical}} - 1}, \tag{19}$$

where  $\phi_0$  is the initial angle of switching and  $\tau_0$  is a fitting parameter. The stochastic thermal motion of electrons of a magnet generates thermal noise modeled as white Gaussian noise. In the presence of the uniaxial anisotropy field, the demagnetization field, and thermal noise, thermal fluctuations obey:

$$\langle \phi_0^2 \rangle = \frac{k_b T}{\mu_0 M_S V(H_U - M_S(N_X - N_Y))} = \frac{k_b T}{\mu_0 M_S V H} = \frac{1}{\mu_0 M_S V H \beta}, \tag{20}$$

in which V is the volume of the magnet,  $\beta = \frac{1}{k_b T}$  is the thermodynamic beta, and H is  $H_U - M_S(N_X - N_Y)$ . Thus, the switching delay of ASL gates is derived as

$$\tau_{SW} = \frac{\tau_0}{2} \frac{\ln(\pi^2 \mu_0 M_S V H \beta)}{\sigma - 1},\tag{21}$$

The equation simplifies when $\sigma \gg 1$, in which case

$$\tau_{SW} = \frac{\tau_0 \ln(\pi^2 \mu_0 M_S V H \beta)}{2\sigma} = \frac{\tau_0 \Delta I_{Critical} \ln(\pi^2 \mu_0 M_S V H \beta)}{2\eta V_{FM} R_2} e^{\frac{L_{Int}}{L_{SRL}}}, \tag{22}$$
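To illustrate how (18), (21), and (22) fit together, the sketch below evaluates the overdrive factor and compares the exact delay with its large-overdrive approximation at a few interconnect lengths. Every numerical value is an assumption chosen for illustration ($\eta$, $V_{FM}$, $R_G$, $I_{Critical}$, $L_{SRL}$, $\tau_0$, and the logarithmic term are not taken from the simulations in this work), $R_2$ is held fixed even though it grows with channel length, and the function names are hypothetical.

```python
import math


def overdrive(eta, v_fm, r1, r2, r_g, i_crit, l_int, l_srl):
    """Overdrive factor sigma from (18)."""
    delta = r1 * r2 + r2 * r_g + r_g * r1
    return eta * v_fm * r2 / (delta * i_crit) * math.exp(-l_int / l_srl)


def tau_exact(sigma, tau0, log_term):
    """Switching delay from (21); only defined for sigma > 1."""
    if sigma <= 1.0:
        raise ValueError("sigma must exceed 1 for the magnet to switch")
    return 0.5 * tau0 * log_term / (sigma - 1.0)


def tau_approx(sigma, tau0, log_term):
    """Large-overdrive approximation from (22)."""
    return 0.5 * tau0 * log_term / sigma


# All numbers below are illustrative assumptions, not values from the text.
params = dict(eta=0.5, v_fm=10e-3, r1=2.6, r2=8.2, r_g=5.0,
              i_crit=10e-6, l_srl=400e-9)
TAU0 = 1e-9        # s, fitting parameter tau_0 (assumed)
LOG_TERM = 10.0    # ln(pi^2 mu_0 M_S V H beta) (assumed)

for l_int in (150e-9, 400e-9, 800e-9):
    s = overdrive(l_int=l_int, **params)
    print(f"L_Int = {l_int * 1e9:.0f} nm  sigma = {s:.1f}  "
          f"exact = {tau_exact(s, TAU0, LOG_TERM) * 1e12:.1f} ps  "
          f"approx = {tau_approx(s, TAU0, LOG_TERM) * 1e12:.1f} ps")
```

With these assumed inputs the approximation always underestimates the exact delay, and the gap widens as $\sigma$ falls toward 1 at longer lengths.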

For the magnet described in [131], we require $V_{FM} \gg 30 \,\mu\text{V}$, which yields $\sigma \gg 1$. Moreover, (22) shows that $\tau_{SW}$ depends exponentially on $L_{Int}$ and increases sharply for $L_{Int} > L_{SRL}$ [118]. The delay calculated from (22) is compared to the results of rigorous SPICE simulations in Figure 43. Simulations are repeated 100 times for each data point to capture the effect of thermal noise; the error bars represent one standard deviation above and below the mean value of each data point. Furthermore, the switching delay, $\tau_{SW}$, is inversely proportional to $V_{FM}$. Hence, to transfer bits at a faster rate, $V_{FM}$ must increase proportionally.





Figure 44: Proposed long spintronic interconnect. (a) First, the spin signals are converted to electrical signals using a spin-to-CMOS signal transducer (SCT); then, the electrical signal is transmitted through a long electrical interconnect and converted back to spin signals using a CMOS-to-spin signal transducer (CST). (b) The magnetization orientation of the output magnet is the inverse of the magnetization orientation of the input magnet with a delay of 1.6 ns. (c) The layout consists of two transducers with minimum feature sizes connected to an electrical interconnect.

By taking the calculated delay and the power dissipation into account, we can derive the energy dissipation of ASL gates. The power dissipation of the ASLs is $P = \Sigma R_i I_i^2$, in which $I_1$, $I_2$, and $I_G$, the electrical currents passing through the resistors $R_1$, $R_2$, and $R_G$, are $(V_{FM} - V_e)/R_1$, $(V_{FM} - V_e)/R_2$, and $V_e/R_G$, respectively. Thus, the power dissipation of ASLs is $P = V_{FM}^2 \frac{R_1 + R_2}{\Delta}$. Hence, the energy dissipation per transferred bit is

$$E = P\tau_{SW} = V_{FM} \frac{\tau_0 (R_1 + R_2) I_{Critical} \ln(\pi^2 \mu_0 M_S V H \beta)}{2\eta R_2} e^{\frac{L_{Int}}{L_{SRL}}}, \tag{23}$$

Energy dissipation *E* shows the same dependency on $L_{Int}$ as $\tau_{SW}$. Moreover, *E* is linearly proportional to $V_{FM}$, which confirms the tradeoff between the bit transfer rate and $V_{FM}$ discussed above. The power dissipation further increases because of the nonideal ground contact and supply voltage wires, which are accounted for in simulations under the assumption that 300 $\Omega$ of resistance has been added to the supply path [38]; that is, $R_1$ and $R_2$ are replaced by $R_1' = R_1 + 300\ \Omega$ and $R_2' = R_2 + 300\ \Omega$ in (23). For a repeater composed of N ASLs, we can approximate the delay by $\tau_{Repeater} = N\tau_{SW}$ and the fabrication area by the layout shown in Figure 41.
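The closed-form power expression used above can be verified by summing the ohmic losses of the three resistors directly. This is a minimal sketch under assumed values: `V_FM` and `R_G` are illustrative, $R_1$ and $R_2$ are the short-channel values quoted earlier, and the function name is hypothetical. The primed resistances $R_1'$ and $R_2'$ can be passed in the same way.

```python
import math


def asl_power(v_fm, r1, r2, r_g):
    """Return (summed, closed_form) ohmic power of the star network.

    summed      : sum of R_i * I_i^2 over R_1, R_2, and R_G
    closed_form : V_FM^2 * (R_1 + R_2) / Delta
    """
    delta = r1 * r2 + r2 * r_g + r_g * r1
    v_e = v_fm * (r1 + r2) * r_g / delta
    i_1 = (v_fm - v_e) / r1
    i_2 = (v_fm - v_e) / r2
    i_g = v_e / r_g
    summed = r1 * i_1**2 + r2 * i_2**2 + r_g * i_g**2
    closed_form = v_fm**2 * (r1 + r2) / delta
    return summed, closed_form


# V_FM and R_G are assumed, illustrative values.
summed, closed_form = asl_power(v_fm=10e-3, r1=2.6, r2=8.2, r_g=5.0)
print(f"sum of R*I^2 = {summed * 1e6:.3f} uW, closed form = {closed_form * 1e6:.3f} uW")
```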

| Interface Parameters (Co/Cu)       |                             |                                  |
|------------------------------------|-----------------------------|----------------------------------|
| Majority Spin Conductance          | $G_{\uparrow}$              | 0.375 1/Ω                        |
| Minority Spin Conductance          | $G_{\downarrow}$            | 0.125 1/Ω                        |
| Real Spin-Mixing Conductance       | Re $G_{\uparrow\downarrow}$ | 3.43751 1/Ω                      |
| Imaginary Spin-Mixing Conductance  | Im $G_{\uparrow\downarrow}$ | $9.37 \times 10^{-3} \ 1/\Omega$ |
| Magnets (Co)                       |                             |                                  |
| Magnet Length                      | $L_X$                       | 66 nm                            |
| Magnet Width                       | $L_Y$                       | 22 nm                            |
| Magnet Height                      | $L_Z$                       | 3 nm                             |
| Gilbert Damping Coefficient        | $\alpha$                    | 0.005                            |
| Saturation Magnetization           | $M_S$                       | $1.45 \times 10^{6}$ A/m         |
| Demagnetization Tensor Coefficient | $N_X$                       | 0.0443                           |
| Demagnetization Tensor Coefficient | $N_Y$                       | 0.1390                           |
| Demagnetization Tensor Coefficient | $N_Z$                       | 0.8166                           |
| Magnet Barrier                     | $\Delta/k_BT$               | 40                               |
| Channels (Cu)                      |                             |                                  |
| Channel Length                     | $L_{Int}$                   | 142 nm                           |
| Channel Width                      | $W_{Int}$                   | 44 nm                            |
| Aspect Ratio                       | AR                          | 1                                |

Table 3: Simulation Parameters for the long-range spintronic interconnect.

| Transistors [41]  |         |       |
|-------------------|---------|-------|
| Half Pitch Size   | F       | 22 nm |
| Length            | $L_X/F$ | 1     |
| Width (Inverters) | $W_X/F$ | 5     |
| Width (Drivers)   | $W_X/F$ | 30    |

#### 4.4.2 Proposed Long-Range Spintronic Interconnect

Fast transfer of spin signals in long-range interconnects requires an increase in the number of cascaded ASLs, N; however, the power dissipation of ASL repeaters increases proportionally with N. Figure 44a shows the proposed transducer-based interconnect for the electrical transmission of spin information. The interconnect converts the spin signals into electrical signals, which travel along an electrical interconnect, a more efficient way to communicate signals over long distances. The electrical signals are then converted back into spin signals. Figure 44b shows the simulation results of the interconnect. The delay of the proposed interconnect is compared to that of the ASL repeaters in Figure 43. As the figure illustrates, the switching delay of ASL gates can be approximated as a linear function of $L_{Int}$ for lengths shorter than $L_{SRL}$, but it exhibits an exponential dependence on $L_{Int}$ for $L_{Int} \gg L_{SRL}$. The slope of the delay in the linear region is $\frac{\tau_0 \Delta I_{Critical} \ln(\pi^2 \mu_0 M_S V H \beta)}{2\eta V_{FM} R_2 L_{SRL}}$. For a repeater composed of N cascaded ASLs, the linear region extends in proportion to N, which is consistent with the simulation results of Figure 43. The figure shows that the switching delay of the proposed interconnect is lower than that of the ASL repeaters even for a length of 1.25 µm, the shortest possible length of the interconnect using transducers. As illustrated in the layout in Figure 44c, the interconnect cannot be shorter than 57 F, where F is the half-pitch size; for F = 22 nm (Table 3), the shortest possible length is therefore 1.25 $\mu$m.
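The linear region and its slope follow from a first-order Taylor expansion of (22). As a sketch, let $A$ be shorthand (introduced here for illustration only) for the length-independent prefactor of (22), and treat $\Delta$ and $R_2$ as constants even though $R_2$ grows with the channel length:

$$\tau_{SW} = A\,e^{\frac{L_{Int}}{L_{SRL}}} \approx A\left(1 + \frac{L_{Int}}{L_{SRL}}\right) \quad \text{for } L_{Int} \ll L_{SRL}, \qquad A = \frac{\tau_0 \Delta I_{Critical} \ln(\pi^2 \mu_0 M_S V H \beta)}{2\eta V_{FM} R_2},$$

so the slope is $A/L_{SRL}$. For a repeater of $N$ identical stages, each spanning $L_{Int}/N$, with $\tau_{Repeater} = N\tau_{SW}$,

$$\tau_{Repeater} = N A\,e^{\frac{L_{Int}}{N L_{SRL}}} \approx N A + A\,\frac{L_{Int}}{L_{SRL}} \quad \text{for } L_{Int} \ll N L_{SRL}.$$

Under these assumptions the slope is unchanged while the range over which the linear approximation holds grows by a factor of $N$, at the cost of a fixed offset $NA$.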



Figure 45: Clocking schemes used to minimize the energy dissipation of the ASL repeaters. Clocks are on  $\alpha$ T before and after the mean switching time to cancel the potential impact of thermal noise.  $\alpha$  is assumed to be 25% in the simulations.



Figure 46: Energy dissipation per unit length of the ASL repeaters is compared to that of the proposed spintronic interconnect. The dissipated energy of the proposed interconnect is lower than that of repeaters even for $L_{Int}$ as small as 1.25 µm. Although the energy dissipation of repeaters increases with the number of cascaded ASLs for short interconnects, repeaters with more cascaded ASLs dissipate less power for long interconnects.

The delay of the transducer-based interconnect increases with $L_{Int}$ because the parasitic resistances and capacitances of electrical interconnects increase with $L_{Int}$; however, the rate of increase is far smaller than that of the linear region of ASL repeaters. In these devices, the supply voltage is turned on only when the signals are passing through the gate. Thus, the supply voltage clocking, shown in Figure 45, reduces energy dissipation by turning off the device once the data has been transmitted along the interconnect [132]. We account for the energy dissipation in the driving transistors, which comes in two forms: 1) the energy dissipation due to the drain-source current $(I_D V_{DS} \tau)$, and 2) the energy dissipation due to charging and discharging the transistor capacitance $(CV^2)$. Because of the relatively large current amplitude and pulse width needed to switch a magnet, the second component is more than 100X smaller than the first one and can be ignored. The power dissipation associated with clock generation and distribution has not been incorporated in this work. While the proposed interconnect scheme requires only two transistors for supply clocking, the ASL repeater requires N + 1 transistors, where N is the number of ASL stages in the repeater. Hence, a simpler clocking circuit is another advantage of the proposed transducer-based interconnect. In Figure 45, to counter the impact of thermal noise, clocks are on $\alpha T$ before and after the mean switching time. The energy dissipated by the supply voltages is shown in Figure 46. As $L_{Int}$ increases, the energy per unit length remains constant in the linear region and increases exponentially afterwards. Hence, repeaters composed of two, three, and four ASLs minimize the energy dissipation of interconnects longer than 1.3 µm, 2.5 µm, and 3.7 µm, respectively. The transducer-based interconnect dissipates less energy than ASL repeaters, even for interconnects as short as 1.25 $\mu$m. In contrast to the energy per unit length of the ASL repeaters, that of the proposed electrical interconnect decreases as the length of the interconnect increases because its energy dissipation, which mostly takes place in the transduction of signals, grows far more slowly with length. Despite the advantage of transmitting signals using transducers over using ASL repeaters in terms of switching delay and energy dissipation, ASL repeaters have the advantage of a smaller footprint area. Taking all these factors into account, we show the area-delay-power product (ADPP) metric [133], [134], [135] for both interconnect schemes in Figure 47, which shows that the proposed transduction-based scheme, utilizing electrical transmission, has an advantage in terms of the ADPP for interconnects as short as 1.6 $\mu$m. Although the proposed scheme shows a significant improvement over the ASL repeater scheme in terms of delay, energy, and ADPP, the proposed structure cannot compete with electrical interconnects used in purely CMOS circuits.



Figure 47: Area-delay-power product (ADPP) is a measure that takes delay, power dissipation, and area into account. Although the proposed interconnect has larger area overhead, its advantage in terms of energy enables it to outperform ASL repeaters for lengths longer than  $1.6 \mu m$ .

Figure 48 depicts the delay and energy dissipation of signal transduction and transmission under various supply voltage $(V_{DD})$ and magnet voltage $(V_{FM})$ values. In Figure 48a, $V_{DD}$ is fixed at 650 mV while $V_{FM}$ changes from 80 mV to 150 mV. In Figure 48b, $V_{FM}$ remains fixed at 120 mV while $V_{DD}$ changes from 300 mV to 950 mV. Figure 48c exhibits the energy-delay product (EDP), which decreases by 49% when $V_{DD}$ is lowered from 950 mV to 300 mV; Figure 48d shows that the EDP decreases by 31% when $V_{FM}$ is raised from 80 mV to 150 mV. Hence, we minimize EDP by operating the proposed device under lower $V_{DD}$ and higher $V_{FM}$ voltage values. The thickness of the oxide layer potentially changes the delay and the energy dissipation of the proposed interconnect. The oxide thickness, subject to variations by various fabrication processes, changes both the TMR and the resistance of MTJs. To capture potential variations, Figure 49 illustrates how the switching delay changes with the TMR. The simulations use the relationship between the TMR and the oxide thickness from [129]. The figure shows that increasing the TMR from 125% to 450% decreases the switching delay by less than 10%. Although the voltage swing $\frac{2P}{4-P^2}V_{DD}$ at $V_N$ becomes larger as the TMR increases, the voltage swing is already large enough for the inverter, even for TMR values as low as 125%. In these simulations, the TMR and resistance per-area values are based on the values mentioned in [119]–[129].



Figure 48: Delay and energy dissipation variations vs. (a) the voltage applied to the magnets $(V_{FM})$ and (b) the supply voltage $(V_{DD})$. Energy-delay product (EDP) variations vs. (c) $V_{FM}$ and (d) $V_{DD}$. The interconnect must operate at the lowest $V_{DD}$ and the highest $V_{FM}$ voltage values without reaching its breakdown current density to minimize the energy-delay product. The error bars represent variations in the delay and energy dissipation generated by the stochastic thermal noise of magnets.



Figure 49: As TMR increases, the voltage swing at node  $V_N$  of Figure 43 becomes larger; hence, the delay becomes smaller. However, the voltage swing is relatively large even for TMR values as low as 125%, and the improvement in switching delay by increasing TMR is limited to less than 10%.

#### 4.5. Conclusions

This section proposes two simple yet efficient CMOS-spintronic transducer circuits that convert back and forth between spin signals and electrical signals in hybrid CMOS-spintronic circuits, which must efficiently transmit spin signals over both short and long ranges. Unlike electrical signals, spin signals suffer from exponential decay of their amplitudes as their interconnect lengths increase. Amplifying spin signals through long-range interconnects using repeaters is an inefficient method of transmitting spin signals. Thus, using the proposed transducer circuits, we propose a new scheme for long-range spintronic interconnects. Although the transducers add circuitry and area overhead, the proposed spintronic interconnect outperforms all-spin-based repeaters in terms of transmission delay, energy dissipation per bit per unit length, and area-delay-power product (ADPP) for interconnects longer than 1.6 $\mu$m.

## V. MAGNETOSTRICTION-ASSISTED ALL-SPIN LOGIC DEVICE

#### 5.1. Magnetostriction-Assisted All-Spin Logic (MA-ASL) Device Proposal

To improve the performance of the all-spin logic device, two limiting factors must be addressed. First, the ASL proposal is based on a non-local spin valve (NLSV), in which a pure spin current applies a spin-transfer torque (STT) on a free magnet and flips its orientation. Most of the spin current, however, is shunted to ground and wasted. Moreover, the experimental evidence for the operation of the NLSV is limited to only one report [136]. Second, the reliable $180^{\circ}$ switching of a magnet using STT is known to be quite slow (~ few nanoseconds) and requires large current densities [72]. Providing large currents through driver transistors for such long periods of time results in a prohibitively large energy per binary switching operation. As a result, even when supply clocking is used [132], a 32-bit arithmetic-logic unit (ALU) based on ASL is projected to dissipate more than four orders of magnitude more energy than its CMOS counterpart [33]. However, recent theoretical predictions show that the reliable switching of magnets initialized at their energy saddle point requires significantly lower energy. Moreover, $90^{\circ}$ magnetization switching through magnetostriction has been experimentally demonstrated and shown to be more energy efficient than STT. Thus, by utilizing magnetostrictive switching and STT switching from the saddle point of the energy profile, I propose the magnetostriction-assisted all-spin logic (MA-ASL) device. In contrast to ASL, the STT is created by a conventional spin valve (CSV) structure, a well-understood and experimentally demonstrated structure [137]-[140]. With appropriate clocking for STT and magnetostriction, the device can be cascaded in a domino logic scheme; thus, the overall delay and the energy dissipation of a more complicated circuit like a 32-bit ALU further improve. In this section, the impact of pulse skew and amplitude, rise time, fall time, and temperature on the delay and the energy performance of the proposed device is investigated. Moreover, the performance of a 32-bit ALU designed based on the proposed device is benchmarked against various spintronic and CMOS counterparts. Finally, the distribution of strain in the hybrid piezoelectric-magnetic structure is investigated.



Figure 50: (a) Schematics of the proposed device and driver circuit. (b) By applying the pulse, $V_{PIEZO, OUT}$, at $T_0$, Magnet 2 reorients from a stable state, the +x direction, to the meta-stable state, the y direction. Then, $V_{PIEZO, OUT}$ is released, and the current provided by the driver circuit, $I_{PW}$, is turned on. The current becomes spin polarized when passing through Magnet 1, oriented in the +x direction, and applies a torque to Magnet 2, which reorients from the y direction to the -x direction, the other stable state. (c) First, the orientation of Magnet 2 rotates by 90° from $\vec{m}_i$ to $\vec{m}_m$ using magnetostrictive switching; then, it reorients by 90° from $\vec{m}_m$ to $\vec{m}_f$ using the spin-transfer torque.

#### 5.2. Device Operation

A single-stage MA-ASL device is composed of a transmitter, Magnet 1, and a receiver, Magnet 2, connected via a metallic (Cu) channel forming a CSV structure (Figure 50a). First, the receiver magnet is reoriented via magnetostrictive switching due to the voltage applied to the piezoelectric layer at the receiver side, $V_{PIEZO, OUT}$, from $T_0$ to $T_1$, as shown in Figure 50b. The applied voltage generates an anisotropic strain that couples to Magnet 2 and alters the magnetoelastic energy of the magnet; therefore, the easy axis rotates from the x to the y direction. As a result, Magnet 2 rotates to the y direction (Figure 50c). The magnet may fall randomly into either the +y or the -y direction with equal probability because of the symmetry of the structure with respect to the xz plane. However, both directions represent symmetrical, metastable saddle points of the device and correspond to neither logic 0 nor logic 1. By switching off $V_{PIEZO,OUT}$ at $T_1$, the easy axis rotates back to the x direction; thus, the magnet is placed at the meta-stable saddle point of its energy profile and is equally likely to fall into the +x or the -x direction. To break the symmetry, a current with spin polarization opposite to the magnetic orientation of Magnet 1 is applied to Magnet 2 from $T_1$ to $T_2$, forcing the magnet to rotate to the -**x** direction (opposite to Magnet 1), as shown in Figure 50c. The proposed structure therefore acts as an inverter. Providing a large pulse current at a very low voltage requires a large portion of the energy to be dissipated in the driver transistors; hence, the driver circuit shown in Figure 50a is proposed to generate the required pulse current more efficiently. To reduce the energy dissipation, the voltage drop on the driver transistor is limited to the $V_{DS}$ voltage of a CMOS transistor.

To model the operation of the driver transistors, predictive technology models are used, while SPICE models are developed to model the operation of the spintronic parts, following an approach similar to that of [50]. To account for the physics of the device, we need to self-consistently solve the equations governing the dynamics of the magnetization, spin transport in the metallic channel, and magnetostriction. For the MA-ASL device, the anisotropic field, $\vec{H}_U$, is due to the variations in the magnetoelastic energy, $E_{ME}$ [64], [141], [142],

$$E_{ME} = -\frac{3}{2}\lambda Y \left[ \left( m_x^2 - \frac{1}{3} \right) \epsilon_{xx} + \left( m_y^2 - \frac{1}{3} \right) \epsilon_{yy} + \left( m_z^2 - \frac{1}{3} \right) \epsilon_{zz} \right], \tag{24}$$

in which $\lambda$ and $Y$ represent the magnetostrictive coefficient and the Young's modulus, respectively, and $m_x$, $m_y$, and $m_z$ are the components of the magnetic orientation along the **x**, the **y**, and the **z** directions, respectively. In the equation, $\epsilon_{xx}$, $\epsilon_{yy}$, and $\epsilon_{zz}$ are the components of the strain matrix; hence, the anisotropic field is derived as

$$\vec{H}_U = -\frac{1}{\mu_0 M_S} \frac{\partial E_{ME}}{\partial \vec{m}} = \frac{2K}{\mu_0 M_S} m_y \hat{y}, \tag{25}$$

where $\mu_0$ and $M_S$ represent the permeability of free space and the saturation magnetization, respectively. In (25), the energy density, *K*, due to the magnetostriction is proportional to the strain [143],

$$K = \frac{3}{2}\lambda(C_{11} - C_{12})(\epsilon_{yy} - \epsilon_{xx}), \qquad (26)$$

where $C_{11}$ and $C_{12}$ are elastic stiffness constants [143], and the strain components are

$$\epsilon_{xx} = \epsilon_0 + d_{31} \frac{V_{PIEZO}}{t}, \tag{27}$$

$$\epsilon_{yy} = \epsilon_0 + d_{32} \frac{V_{PIEZO}}{t}, \tag{28}$$

in which $d_{31}$ and $d_{32}$ are piezoelectric constants and *t* is the thickness of the piezoelectric layer. The anisotropic strain transferred to the magnet is investigated using COMSOL based on PMN-PT material parameters (Table 4) [144]. As Figure 51 shows, a large net anisotropic strain ($\epsilon_{yy} - \epsilon_{xx}$) of 1200 ppm is transferred to the magnet when $V_{PIEZO,2}$ is 100 mV. The generated strain is large enough to reorient magnets by 90°.
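As a rough plausibility check of this last statement, (24) gives a magnetoelastic energy difference between magnetization along x and along y of $\tfrac{3}{2}\lambda Y |\epsilon_{yy} - \epsilon_{xx}|$ per unit volume, which can be compared with the 40 $k_BT$ barrier listed in Table 4. The sketch below is only an order-of-magnitude estimate: it assumes the 1200 ppm net strain is uniform across the magnet, uses the Table 4 values for $\lambda$, $Y$, the magnet dimensions, and $T$, and the function name is hypothetical.

```python
K_B = 1.380649e-23  # J/K, Boltzmann constant


def magnetoelastic_energy_kbt(lam, youngs, net_strain, volume, temperature):
    """Energy difference between m along x and m along y from (24), in units of k_B*T.

    Assumes a uniform strain over the whole magnet volume.
    """
    energy_density = 1.5 * lam * youngs * abs(net_strain)   # J/m^3
    return energy_density * volume / (K_B * temperature)


# Table 4 values; uniform strain is an assumption of this sketch.
volume = 60e-9 * 30e-9 * 3e-9          # m^3, L_X * L_Y * L_Z
ratio = magnetoelastic_energy_kbt(
    lam=1200e-6,         # magnetostrictive coefficient
    youngs=90e9,         # Pa, Young's modulus
    net_strain=1200e-6,  # epsilon_yy - epsilon_xx from the COMSOL result
    volume=volume,
    temperature=300.0,   # K
)
print(f"magnetoelastic energy difference: about {ratio:.0f} k_B*T (barrier: 40 k_B*T)")
```

Under these assumptions the strain-induced energy difference is several times the 40 $k_BT$ barrier, which is consistent with the strain being sufficient to rotate the easy axis.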

The spin transport equation for the metallic channel, the magnetostriction equations, and the stochastic LLG equation are solved self-consistently using SPICE simulations. The simulation parameters are shown in Table 4. Simulations are done for a 3-stage cascaded inverter chain of magnets, as illustrated in Figure 52a. First, Magnet 2 and Magnet 3 are initialized in the y direction by applying piezoelectric voltage pulses (Figure 52b). Then, the voltage pulses are turned off sequentially, and current pulses are applied to perform the second 90° switching. As shown in Figure 52c, the first 90° of magnetization switching, which takes ~1 ns, is more than 10 times longer than the second 90° switching; however, because the magnets are initialized simultaneously, this time is shared between the two magnets, and the overall delay is improved. The benefit of this approach grows as the logic depth (the number of cascaded gates) increases.

| Piezoelectric (PMN-PT) [144]          |             |                                |
|---------------------------------------|-------------|--------------------------------|
| Piezoelectric Constant                | $d_{31}$    | 813 pC/N                       |
| Piezoelectric Constant                | $d_{32}$    | -2116 pC/N                     |
| Piezoelectric Height                  | $t$         | 40 nm                          |
| Magnets (Terfenol-D) [145], [146]     |             |                                |
| Magnet Length                         | $L_X$       | 60 nm                          |
| Magnet Width                          | $L_Y$       | 30 nm                          |
| Magnet Height                         | $L_Z$       | 3 nm                           |
| Saturation Magnetization              | $M_S$       | 1.0 T                          |
| Magnet Barrier                        | $\Delta/k_BT$ | 40                           |
| Damping Factor                        | $\alpha$    | 0.1                            |
| Spin Polarization                     | $P$         | 0.8                            |
| Resistivity                           | $\rho$      | $60 \times 10^{-8}\ \Omega\cdot\mathrm{m}$ |
| Magnetostrictive Coefficient          | $\lambda$   | 1200 ppm                       |
| Young's Modulus                       | $Y$         | 90 GPa                         |
| Temperature                           | $T$         | 300 K                          |
| Channels (Cu) [131]                   |             |                                |
| Channel Length                        | $L_{Int}$   | 132 nm                         |
| Channel Width                         | $W_{Int}$   | 44 nm                          |
| Aspect Ratio                          | AR          | 1                              |
| Conductivity                          | $\sigma$    | 27.4 1/μΩm²                    |
| Grain Boundary Reflection Probability | $R$         | 0.2                            |
| Specularity Parameter                 | $P$         | 0.0                            |
| Spin Relaxation Time                  | $\tau_S$    | 8.92 ps                        |
| Electron Mobility                     | $\mu$       | 0.002 m<sup>2</sup>/Vs         |

Table 4: Simulation Parameters for the MA-ASL Device.



Figure 51: Generated strain is simulated using a COMSOL model developed from the piezoelectric parameters of PMN-PT [144]. Results shown for (a) the device and (b) the cross-section of the magnet demonstrate that the net strain of 1200 ppm will be transferred to the magnet.



Figure 52: (a) Three magnets cascaded in a domino logic scheme. (b) Piezoelectric pulses are applied simultaneously at $T_0$ and released sequentially at $T_1$ and $T_2$ to perform the first 90° of the switching. The current pulses, provided by the clocking circuit, are applied at $T_1$ and $T_2$ to perform the second 90° of the switching via STT. (c) The overall delay of the inverter chain is significantly reduced by simultaneously performing the first 90° of the switching for all magnets of a chain.

#### 5.3. 32-Bit MA-ASL ALU

At the heart of an arithmetic logic unit (ALU) is the arithmetic operations (AO) block, which performs operations such as addition, subtraction, NAND, and NOR. For a 32-bit ALU, operations are done on two 32-bit input numbers, A and B. As Figure 53 illustrates, the addition of A and B can be done by a 32-bit ripple-carry adder whose result is S. The addition operation requires propagating the carry bits, $C_i$, along the critical path from one bit to the next. Therefore, the propagation delay of the carry bits across the 32-bit adder dominates the delay of an ALU. For a 32-bit MA-ASL ripple-carry adder, the critical path comprises 32 full adders, each contributing two magnets to the path. In a cascade scheme like that of Figure 52, all 64 magnets are initialized simultaneously; thus, the overall delay is

$$\tau_{32-Bit-Adder} = \tau_{ME} + 64\tau_{STT},\tag{29}$$

where $\tau_{32-Bit-Adder}$ is the delay of the 32-bit adder, $\tau_{ME}$ is the initialization time of 1 ns due to magnetostrictive switching, and $\tau_{STT}$ is the delay of switching from the saddle point due to STT, about 35 ps for an error rate below $10^{-3}$. Thus, the overall delay of the 32-bit adder will be 3.3 ns. By accounting for the 32-bit adder, repeaters every 300 nm, NAND and NOR gates, and other gates of a 32-bit MA-ASL ALU, the delay and the energy dissipation will be 11.8 ns and 5.2 pJ, respectively. Compared to the delay and the energy dissipation of a 32-bit ASL ALU, those of the MA-ASL ALU show 21x and 27x improvement, respectively (Figure 54). However, the delay and the energy of the device are still larger than those of TFETs and CMOS devices. Although spintronic devices cannot compete against CMOS devices in Boolean applications, such as 32-bit adders and ALUs, these devices may compete against CMOS in non-Boolean applications because of the efficient implementation of majority gates in spin-based devices. Furthermore, given the significant improvement of the energy and the delay of the MA-ASL compared to those of ASL, the device may become competitive against CMOS for non-Boolean applications. Even in the case of Boolean computations, by taking advantage of pipelining in the design of complicated systems such as 32-bit ALUs, slow and low-energy devices may become more competitive. In a pipelining scheme, the output magnet of $FA_{i+1}$ will be initialized right after the $C_i$ bit is generated by $FA_i$. Thus, $FA_i$ can immediately operate on the next bit in line without waiting for the previous bit to propagate to the last magnet in the line, which represents $C_{32}$. In this scheme, the delay to generate the last bit, $C_{32}$, is $32\tau_{ME} + 64\tau_{STT}$, larger than the delay of the domino MA-ASL adder explained above. However, a new result is generated every $\tau_{ME} + 2\tau_{STT}$ instead of every $\tau_{ME} + 64\tau_{STT}$. Thus, the throughput of the 32-bit adder, the heart of an ALU block, further increases.
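The domino and pipelined timing expressions above can be evaluated directly. The sketch below uses the quoted $\tau_{ME}$ of 1 ns and the approximate $\tau_{STT}$ of 35 ps; the function names are illustrative. With the rounded 35 ps value, the domino adder delay evaluates to about 3.2 ns, close to the 3.3 ns quoted above.

```python
TAU_ME = 1e-9      # s, magnetostrictive initialization time
TAU_STT = 35e-12   # s, approximate STT switching time from the saddle point


def domino_latency(n_magnets, tau_me=TAU_ME, tau_stt=TAU_STT):
    """All magnets are initialized at once, then switch one after another: (29)."""
    return tau_me + n_magnets * tau_stt


def pipelined_latency(n_adders, magnets_per_adder=2, tau_me=TAU_ME, tau_stt=TAU_STT):
    """Each full adder is initialized only after the previous carry is ready."""
    return n_adders * (tau_me + magnets_per_adder * tau_stt)


def pipelined_interval(magnets_per_adder=2, tau_me=TAU_ME, tau_stt=TAU_STT):
    """Time between successive results once the pipeline is full."""
    return tau_me + magnets_per_adder * tau_stt


print(f"domino latency    : {domino_latency(64) * 1e9:.2f} ns")
print(f"pipelined latency : {pipelined_latency(32) * 1e9:.2f} ns")
print(f"pipelined interval: {pipelined_interval() * 1e9:.2f} ns per result")
```

The pipelined adder therefore trades a roughly tenfold longer latency for a result interval about three times shorter than the domino adder's delay.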



Figure 53: 32-bit ripple carry adder consisting of 32 full adders (FAs). Inside each full adder, two MA-ASLs form the critical path. Inside the 32-bit adder, carry bits must propagate through the critical path, which comprises 64 magnets.



Figure 54: Energy dissipation and delay of various spintronic, CMOS, and TFET technologies for implementation of a 32-bit ALU. Compared to the ASL ALU, the MA-ASL ALU shows 21x and 27x improvements in delay and energy dissipation, respectively. The benchmark setup is explained in [147].

## 5.4. Clocked MA-ASL

Operating the MA-ASL device either as domino logic or as a pipeline requires a clocking scheme that precisely accounts for the time required to perform the first half of the switching, done through magnetostrictive switching, and the second half of the switching, done by applying STT. In an MA-ASL inverter, the 90° magnetic rotation time under STT is inversely proportional to the amplitude of the pulse current [93]. Increasing the amplitude of the pulse current therefore decreases the delay and, with it, the required pulse width, so less energy must be dissipated to reach a given error rate (Figure 55). However, the amplitude of the pulse current is limited by the maximum current that can be applied without causing electromigration. The energy dissipation of an MA-ASL inverter is mainly associated with three parts: (1) the transistors of the driver circuit, illustrated in Figure 50a, about 1.8 fJ; (2) ohmic energy dissipation inside the MA-ASL device during the STT switching, about 0.2 fJ; and (3) the $CV^2$ energy required to provide the pulse voltages of the piezoelectric, in the range of a few aJ. We have accounted for these factors in calculating the total energy dissipation of the device.



Figure 55: Increasing the amplitude of the pulse current lowers the pulse width required to reach the same error rate in an MA-ASL inverter; thus, as the pulse amplitude increases, less energy is required to reach the same error rate. However, the amplitude of the pulse current cannot exceed a maximum limit set by electromigration.

The proper operation of the proposed domino logic and pipelining schemes depends on turning off the piezoelectric pulses and applying the spin current pulses simultaneously at $T_1$ (Figure 50b). However, due to factors including the inaccuracies and limitations of the fabrication processes, the two clocks operate with a negative or a positive pulse skew, defined as $T'_1 - T_1$ in Figure 56a. At $T_1$, Magnet 2 is placed at the metastable saddle point of the energy profile and is equally likely to fall into either of the stable directions, $+\mathbf{x}$ or $-\mathbf{x}$. In the absence of STT due to a positive pulse skew, the magnet starts to randomly switch toward one of the stable directions; hence, when STT is applied at $T'_1$, the magnetization has deviated from the y axis, the metastable point of the energy profile, by an angle $\theta'_m$, and a longer pulse width is expected (Figure 56b). Increasing the positive pulse skew increases the angle $\theta'_m$; hence, the pulse width must be increased. Although a negative pulse skew may also contribute to a non-zero $\theta'_m$, the deviation for a negative pulse skew is smaller than that for a positive pulse skew because of the presence of the magnetoelastic energy. Thus, designing an MA-ASL-based circuit with a small embedded negative pulse skew of about 5 ps offsets probable undesired positive skews due to fabrication inconsistencies. Moreover, the deviation from the y axis, $\theta'_m$, is larger at higher temperatures; thus, a larger pulse width is required to reach the error rate of $10^{-3}$, as demonstrated in Figure 56b and Figure 56c. In the simulations of Figure 56, a pulse amplitude $I_{PW}$ of 450 $\mu A$ is assumed.





Figure 56: (a) Definition of pulse skew. (b) The pulse width required to reach an error rate of $10^{-3}$ for various pulse skews in an MA-ASL inverter. A positive pulse skew increases the pulse width more than a negative pulse skew does. Error rate vs pulse width for various (c) temperatures and (d) rise times. (e) The pulse width and energy required to reach an error rate of $10^{-3}$ for various rise times. The pulse skew is assumed to be -5 ps in Figures (c), (d), and (e) [148].

Clocked circuits rely on the switching of clock signals from low to high and from high to low levels. Because of parasitic resistances and capacitances, switching does not happen instantly; rather, it occurs gradually over a time interval: the rise time, $T_r$, for the low-to-high transition and the fall time, $T_f$, for the high-to-low transition. The impact of rise time and fall time on the error rate, pulse width, and energy is demonstrated in Figure 56d and Figure 56e. In these simulations, the rise time and fall time are assumed to be equal, and the pulse skew is assumed to be -5 ps. As the rise time increases, the STT applied to the metastable magnet becomes weaker; thus, $\theta'_m$ increases. As a result, the error rate increases, as illustrated in Figure 56d. Therefore, to reach the error rate of $10^{-3}$ for longer rise times, the pulse width must increase to apply more STT to the magnet. Consequently, larger energy dissipation is expected, as demonstrated in Figure 56e.

## 5.5 Material Analysis of MA-ASL Device

To improve the performance of the ASL device, researchers have studied the target magnetic materials [72]. In the case of the MA-ASL device, however, the magnetostrictive coupling of the magnets and the piezoelectric must be studied as well. Improving the performance of the device requires transferring the maximum strain at the lowest energy dissipation. In Figure 57a, the voltage required to transfer a net strain of 1200 ppm to the magnetic layer is studied for various magnetostrictive materials. In these simulations, Terfenol-D demonstrated the lowest required voltage and energy dissipation to transfer a net strain of 1200 ppm to the magnetostrictive material. Moreover, the simulations show the voltage and energy values required to transfer the strain for piezoelectric layer thickness values of 0, 2 nm, and 5 nm. Increasing the thickness of the piezoelectric layer prevents some of the strain generated in the piezoelectric layer from being transferred to the magnet, as demonstrated in Figure 58; however, the transferred strain will be more uniform.



Figure 57: (a) Voltage and (b) energy required to transfer a net strain of 1200 ppm to the magnetic layer. Simulations are done for various magnetic materials, among which Terfenol-D demonstrated the lowest required voltage and energy dissipation for transferring strain. Moreover, the figure compares the transferred strain for various piezoelectric thickness values.



Figure 58: Transferred strain to the magnet vs the thickness of the Pt layer between the piezoelectric and magnetic layers.

The strain transferred to the magnetic layer depends not only on the magnetostrictive material but also on the magnitude of the strain generated in the piezoelectric layer, which in turn depends on the geometrical dimensions of the piezoelectric layer and the applied voltage, as shown in Figure 59. Figure 59a shows the definitions used for the piezoelectric length and width. In Figure 59b, the transferred strain at a constant applied voltage is studied. The maximum transferred strain is achieved when the piezoelectric length is shorter than 150 nm and the piezoelectric width is between 60 nm and 90 nm. In these simulations, the magnet length is assumed to be 60 nm, and the magnet width is assumed to be 30 nm. As Figure 59c shows, the same range of piezoelectric dimensions results in the lowest applied voltage required to transfer a net strain of 1200 ppm to the magnetostrictive layer. However, the same range of geometrical dimensions does not result in the lowest energy dissipation, because of the capacitive nature of energy dissipation in the piezoelectric layer, in which

$$E_{PIEZO} = \frac{1}{2} C_{PIEZO} V^2. \tag{30}$$

In this equation, $C_{PIEZO}$ is proportional to the piezoelectric length and width. Thus, making the piezoelectric layer smaller decreases the energy dissipation (Figure 59d), even if the piezoelectric width becomes smaller than 60 nm.
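The area scaling behind equation (30) can be sketched numerically. The example below assumes a simple parallel-plate capacitance for the piezoelectric layer; the relative permittivity and the layer thickness are hypothetical placeholders, not values from the simulations, and only the scaling with length, width, and voltage is meant to be illustrative.

```python
EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m


def piezo_capacitance(length_m, width_m, thickness_m, eps_r):
    """Parallel-plate estimate (assumption): C = eps0 * eps_r * L * W / t."""
    return EPS0 * eps_r * length_m * width_m / thickness_m


def piezo_energy(capacitance_f, voltage_v):
    """Equation (30): E_PIEZO = 0.5 * C_PIEZO * V^2."""
    return 0.5 * capacitance_f * voltage_v ** 2


# Hypothetical material and thickness; 100 mV is the voltage used in Figure 59b.
EPS_R = 1000.0
THICKNESS = 30e-9
V_PIEZO = 0.1

wide = piezo_energy(piezo_capacitance(150e-9, 60e-9, THICKNESS, EPS_R), V_PIEZO)
narrow = piezo_energy(piezo_capacitance(150e-9, 30e-9, THICKNESS, EPS_R), V_PIEZO)
print(f"energy at 60 nm width: {wide * 1e18:.1f} aJ")
print(f"energy at 30 nm width: {narrow * 1e18:.1f} aJ")
```

At a fixed voltage, halving the width halves the energy. In the device, a narrower layer may need a higher voltage to reach the same strain, which is why the voltage and energy optima in Figure 59 need not coincide.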



Figure 59: Analysis of the transferred strain vs geometrical dimensions of the piezoelectric layer. (a) Definition of piezoelectric dimensions. (b) Transferred strain at a constant $V_{PIEZO}$ voltage of 100 mV. The required (c) voltage and (d) energy to transfer a net strain of 1200 ppm.

## 5.6. Conclusions

Studies have examined ASL devices for various Boolean and non-Boolean applications owing to their efficient implementation of majority gates, low-voltage operation, and non-volatile memory. This section proposes an ASL-based heterostructure of magnets and piezoelectric that employs both magnetostriction and STT to perform magnetization reversal. The proposed device excels in domino logic and pipelining schemes using the driver circuit proposed in this section. The performance of the device is benchmarked against ASL, TFET, and CMOS technologies. This work illustrates that the delay and the energy dissipation of a 32-bit ALU designed with the MA-ASL device show 21x and 27x improvements, respectively, over those of the ASL device. Although the device cannot compete against CMOS devices in implementing Boolean functions, the device, augmented by advances in piezoelectric and magnetic materials, may become competitive against CMOS in implementing non-Boolean functions.

# VI. HYBRID PIEZOELECTRIC-MAGNETIC NEURONS: A PROPOSAL FOR ENERGY-EFFICIENT MACHINE LEARNING

#### 6.1 Spintronic Artificial Neural Networks

Deep learning enabled by developments in artificial neural networks (ANNs) has attracted special attention in recent years [149]. Cognitive learning researchers have used ANNs to simulate the natural learning process of the brain and improve the precision of speech recognition, the accuracy of pattern finding, and the reliability of self-driving cars [150]–[154]. Modern computer architectures struggle to emulate an ANN, even when processing on highly parallelized GPU architectures [155], [156]. To circumvent this challenge, researchers have turned to investigating how to integrate neural networks directly into hardware. Implementing ANNs as conventional CMOS hardware reduces the power consumption by three orders of magnitude [157]. Even with these improvements, CMOS neuron implementations are inefficient in energy consumption and die area, leading to increasing interest in beyond-CMOS devices for implementing neurons. Most notably, spin-based devices have been proposed as artificial neurons with simpler structure and lower energy consumption than their CMOS counterparts [157], [158]. These spintronic devices have been shown to holistically mimic properties of neurons, providing advantages in circuit simplicity, adaptability, and energy efficiency [158]. Moreover, spintronic devices inherently offer non-volatile memory [159], [160]. ANNs need stored information for synaptic weights between communicating neurons; thus, having memory coupled with the circuit reduces energy dissipation and memory bandwidth, helping circumvent the von Neumann bottleneck.



Figure 60: One of the most popular applications of machine learning algorithms is image classification and facial recognition.

Several spin-based neurons are implemented using tunnel magnetoresistance (TMR) in magnetic tunnel junctions (MTJs) [129] coupled with various phenomena such as domain-wall (DW) motion [161], [162], spin transfer torque (STT) generated by lateral spin valves (LSVs) [65], [158], and spin-Hall effect (SHE) [151], [156]. While these devices are proven to mimic neural properties, some of their inherent drawbacks must be addressed. The slow switching speed of DW-based neurons prohibits them from being an ideal candidate for the fast implementation of a neuron. To provide non-reciprocity for the LSV neuron, the output magnet is preset by  $90^{\circ}$  reorientation to its saddle point of energy profile using the STT, generated by preset spin currents. However, the large required current yields substantial energy dissipation in the device. Recent studies on the magnetostriction-assisted all-spin logic (MA-ASL) device, a novel spin valve proposal made of a hybrid structure of magnets and piezoelectrics, have shown the reduction of switching energy by two orders of magnitude [163], [164], as discussed in Section V. Moreover, the switching energy can be reduced in an MA-ASL device by employing a 90° magnetostrictive switching, experimentally demonstrated in [64] and shown to be more robust to thermal noise [65]. Using these recent advances, this work proposes a spin-based neuron based on an MA-ASL device and an MTJ. The proposed structure integrates the advantages of previously proposed spintronic neurons with those of MA-ASL creating a structure that can be implemented into large-scale ANNs.

#### **6.2 Spin Neuron Proposal**

#### 6.2.1 Neuron Functionality

The proposed neuron, shown in Figure 62, is a modified MA-ASL structure whose output magnet is the free magnetic layer of an MTJ. Similar to an MA-ASL device, first, the magnetization is rotated by 90° using magnetostrictive switching. As Figure 61a shows, 800 ppm of strain is transferred to the output magnet, large enough to switch the output magnet. In the second phase of operation, the input voltages, shown for six inputs $(IN_1-IN_6)$ in Figure 62 as an example, produce charge currents that flow through the corresponding input magnets and become spin-polarized at the interfaces with the metallic channel. These spin-polarized currents combine below the output magnet according to the sum, as shown in Figure 61b,

$$I_{s,out} = \sum_{j} I_{s,j} \approx \sum_{j} \eta_{j} e^{\frac{-L_{j}}{L_{SRL,j}}} I_{c,j}, \tag{31}$$

where  $I_{s,out}$  is the spin current injected into the output magnet,  $I_{s,j}$ 's are the spin current contributions from each magnet *j*, and  $I_{c,j}$ 's are the input charge currents [165]. The distance between each input magnet and the output magnet is represented by  $L_j$ . The spin polarization at the interface of each magnet and channel is represented by  $\eta_j$ . The spin relaxation length,  $L_{SRL,j}$ , is affected by the grain boundary and sidewall scattering due to size effects and material properties of the metallic channel [118].
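Equation (31) can be written as a short function. The sketch below is illustrative only: the polarization, distances, spin relaxation lengths, and currents are hypothetical placeholders, and the sign of each charge current is used to encode the logic value of that input.

```python
import math


def net_spin_current(inputs):
    """Equation (31): sum of eta_j * exp(-L_j / L_SRL_j) * I_c_j over all inputs.

    `inputs` is an iterable of (eta, distance, relaxation_length, charge_current)
    tuples; the distance and the relaxation length must share the same unit.
    """
    return sum(
        eta * math.exp(-distance / l_srl) * i_c
        for eta, distance, l_srl, i_c in inputs
    )


# Three hypothetical inputs: two positive and one negative charge current (A),
# each 100 nm from the output magnet, in a channel with L_SRL = 300 nm.
example_inputs = [
    (0.5, 100e-9, 300e-9, +100e-6),
    (0.5, 100e-9, 300e-9, +100e-6),
    (0.5, 100e-9, 300e-9, -100e-6),
]
i_s_out = net_spin_current(example_inputs)
print(f"net spin current: {i_s_out * 1e6:.1f} uA")
```

With identical geometry for every input, the sign of the result follows the majority of the input signs, which is the majority-gate behavior the neuron builds on.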



Figure 61: (a) The strain transferred to the magnet is 800 ppm, lower than that transferred to an MA-ASL magnet; however, it is enough to rotate the magnetization. (b) The path of the applied spin current through the output magnet to ground.



Figure 62: (a) Proposed MA-ASL neuron, shown with six inputs. The net spin current in the interconnect applies STT to the free layer of the neuron MTJ in timing with the piezoelectric clock, switching the orientation of the neuron output [166]. (b) Biological neural network [162].

The net injected spin current, $I_{s,out}$, applies an STT to the output magnet. If strong enough, the STT will rotate the output magnetization, $\hat{m}_{out}$. The output magnet is in contact with an MgO layer that separates it from a magnet fixed in the +**x** direction, forming a three-layer MTJ. As the output magnetization changes, the resistance across the MTJ also changes, following the equation,

$$R_{MTJ} = \frac{1+P}{G_P(1+P\hat{m}_{out,X})},\tag{32}$$

where  $R_{MTJ}$  is the resistance of the MTJ,  $\hat{m}_{out,X}$  is the x-component of  $\hat{m}_{out}$ , and  $G_P$  is the conductance of the MTJ in its low-resistance state, the +x direction [165]. The polarization factor, P, is

$$P = \frac{G_P - G_{AP}}{G_P + G_{AP}} = \frac{TMR}{TMR + 2}, \tag{33}$$

where  $G_{AP}$  is the conductance of the MTJ in its high-resistance antiparallel state, the **-x** direction [165]. As shown in Figure 62, the change in the resistance of the MTJ is sensed by connecting the structure to a pull-up resistor connected to  $V_{DD}$ ; then, the voltage above the output neuron follows

$$V_N = \frac{R_{MTJ}}{R_{MTJ} + R_{Pull-up}} V_{DD}, \tag{34}$$

where  $R_{Pull-up}$  is the resistance of the pull-up resistor, implemented with an MTJ with two fixed magnetic layers. The voltage,  $V_N$ , is amplified by a PMOS transistor, forming the axon, where the neuron's output can be transferred to other neurons.
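Equations (32) to (34) form a small read-out chain that is easy to check numerically. The sketch below is a minimal illustration; the TMR, the parallel conductance, the pull-up resistance, and the supply voltage are hypothetical values chosen only to make the arithmetic transparent.

```python
def polarization_from_tmr(tmr):
    """Equation (33): P = TMR / (TMR + 2), with TMR given as a ratio (1.0 = 100%)."""
    return tmr / (tmr + 2.0)


def mtj_resistance(m_x, g_p, p):
    """Equation (32): R_MTJ = (1 + P) / (G_P * (1 + P * m_x))."""
    return (1.0 + p) / (g_p * (1.0 + p * m_x))


def neuron_voltage(r_mtj, r_pullup, v_dd):
    """Equation (34): resistive divider between the MTJ and the pull-up resistor."""
    return r_mtj / (r_mtj + r_pullup) * v_dd


# Hypothetical device values.
TMR = 1.0          # 100% tunnel magnetoresistance
G_P = 1e-3         # parallel-state conductance, S
R_PULLUP = 1.5e3   # pull-up resistance, ohm
V_DD = 0.1         # supply voltage, V

p = polarization_from_tmr(TMR)
for m_x in (+1.0, 0.0, -1.0):
    r = mtj_resistance(m_x, G_P, p)
    print(f"m_x = {m_x:+.0f}: R_MTJ = {r:.0f} ohm, V_N = {neuron_voltage(r, R_PULLUP, V_DD) * 1e3:.1f} mV")
```

The two end points recover $1/G_P$ for the parallel state and $1/G_{AP}$ for the antiparallel state, so the divider output $V_N$ is higher when the output magnet points along -**x**.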

#### 6.2.2 The Transient Response of the Neuron

The transient response of the magnetization is shown in Figure 63 for a neuron with three inputs. In the first phase of device operation,  $V_{PIEZO}$  is pulsed high for a duration of 1 ns, rotating  $\hat{m}_{out}$  to the +**y** or the -**y** direction. When  $V_{PIEZO}$  turns off,  $\hat{m}_{out}$  will be placed at the saddle-point of the energy profile. In the second phase of operation, 10x shorter than the first phase, the input voltages are pulsed for 0.1 ns, applying an STT that tips  $\hat{m}_{out}$  toward +**x** or -**x**. The delay of the final switching is inversely proportional to the magnitude of the net spin current,  $I_{s,out}$ . Compared to an STT-only realignment, this magnetostriction-assisted re-alignment of  $\hat{m}_{out}$  onto the axis requires two orders of magnitude lower energy dissipation.





Figure 63: Transient response of the MA-ASL device. In the first phase of operation,  $V_{PIEZO}$  turns on for 1 ns as shown in the first graph. The second graph illustrates the second phase of operation, in which STT is applied to the output magnet through the injected net spin current (in blue) from three input magnets (shown with dotted lines), applied after  $V_{PIEZO}$  turns off. The third graph shows the magnetization of the output magnet (x, y, and z axes shown in blue, red, and green, respectively) and how it is affected by  $V_{PIEZO}$  and the spin currents [166].

#### 6.2.3 Integration into Neural Network

To connect the proposed device into a neural network with machine learning capabilities, we must first show how it mimics a neuron. In Figure 62, the axon of the neuron uses the voltage from the output MTJ as the gate voltage for a PMOS transistor, creating a charge current output. For the synapses, additional circuitry would be required to correctly weight the input current. One proposed method is with a memristive crossbar network, as shown in Figure 64. This structure places memristors between input and output lines to weight the charge current being passed among neurons [167]. In this setup, each output from the previous layer of neurons connects as an input to the crossbar network, which applies synaptic weights and outputs to the next layer of neurons.
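The weighting performed by a memristive crossbar amounts to a matrix-vector product. The sketch below assumes ideal memristors and output lines held at virtual ground, so that each output current is the conductance-weighted sum of the input voltages; the conductance values are hypothetical and the function name is illustrative.

```python
def crossbar_outputs(voltages, conductances):
    """Output currents of an ideal crossbar.

    conductances[i][j] is the memristor conductance (S) between input line i
    and output line j; output j collects sum_i V_i * G_ij (Kirchhoff's current law).
    """
    n_outputs = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(n_outputs)
    ]


# Two neurons in the previous layer driving two neurons in the next layer.
input_voltages = [1.0, 0.5]            # V
weights = [[1e-3, 2e-3],               # S, hypothetical synaptic weights
           [4e-3, 0.0]]
currents = crossbar_outputs(input_voltages, weights)
print([f"{i * 1e3:.1f} mA" for i in currents])
```

Each output current would then drive one input of a neuron in the next layer, so the synaptic weights are stored in the memristor conductances rather than in separate memory.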



Figure 64: Memristive cross-bar network. The cross-bar array sums together the input currents, reducing the number of magnets needed for the output neurons [166].

## 6.3 Benchmarking Against Competing Technologies

As Figure 63 illustrates, the delay of the MA-ASL neuron is about 1.1 ns, slightly larger than that of the spintronic neuron presented in [162], which claims 1 ns. However, Table 5 shows that the MA-ASL neuron achieves a 70% improvement in energy over the spintronic neuron [50]; the spintronic neuron uses STT to reorient magnets, while the MA-ASL neuron utilizes a combination of STT and magnetostrictive switching, which results in lower overall energy dissipation. When compared with both analog and digital CMOS neurons, the MA-ASL neuron has advantages in terms of energy consumption and overall chip area. These advantages are due to a more efficient implementation of a spintronic neuron that requires a lower device count. CMOS neurons require shift registers, sense amplifiers, DRAM, and SRAM, which all require large numbers of transistors [159], whereas spintronic neurons require one MTJ and one magnet for each input, using two orders of magnitude less area than CMOS [112] and three orders of magnitude less energy. These improvements in area and energy consumption enable the proposed device to excel in mimicking a neural network, providing competition to CMOS and other spintronic neural networks in Boolean and non-Boolean computations.

| Neuron<br>Device | Digital<br>CMOS<br>[169] | Analog<br>CMOS<br>[170] | Spintronic<br>[112] | MA-ASL<br>Neuron |
|------------------|--------------------------|-------------------------|---------------------|------------------|
| Delay            | 10 ns                    | 10 ns                   | 1 ns                | 1.1 ns           |
| Energy           | 832.6 fJ                 | 700 fJ                  | 0.81 fJ             | 0.25 fJ          |

Table 5: Performance Comparison of MA-ASL Neuron against its CMOS and Spintronic Counterparts [166].
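The comparisons drawn from Table 5 can be recomputed directly. The sketch below takes the MA-ASL energy as 0.25 fJ, which is the reading consistent with the 70% improvement stated in the text; the dictionary layout and function names are illustrative.

```python
import math

# Values read from Table 5 (MA-ASL energy taken as 0.25 fJ, an assumption).
NEURONS = {
    "digital_cmos": {"delay_ns": 10.0, "energy_fj": 832.6},
    "analog_cmos": {"delay_ns": 10.0, "energy_fj": 700.0},
    "spintronic": {"delay_ns": 1.0, "energy_fj": 0.81},
    "ma_asl": {"delay_ns": 1.1, "energy_fj": 0.25},
}


def energy_saving_percent(reference, candidate):
    """Relative energy reduction of `candidate` with respect to `reference`."""
    ref = NEURONS[reference]["energy_fj"]
    new = NEURONS[candidate]["energy_fj"]
    return 100.0 * (ref - new) / ref


def energy_orders_of_magnitude(reference, candidate):
    """log10 of the energy ratio between `reference` and `candidate`."""
    return math.log10(NEURONS[reference]["energy_fj"] / NEURONS[candidate]["energy_fj"])


print(f"vs spintronic:   {energy_saving_percent('spintronic', 'ma_asl'):.0f}% less energy")
print(f"vs digital CMOS: {energy_orders_of_magnitude('digital_cmos', 'ma_asl'):.1f} orders of magnitude")
print(f"vs analog CMOS:  {energy_orders_of_magnitude('analog_cmos', 'ma_asl'):.1f} orders of magnitude")
```

These numbers reproduce the roughly 70% saving over the spintronic neuron and the three orders of magnitude saving over the CMOS neurons quoted above.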

The efficiency of the proposed neuron in learning tasks can be tested through network-scale simulations. Moreover, beyond characterizing the transient response of a single MA-ASL neuron, a neural network architecture of multiple MA-ASL neurons must be investigated further. A prime candidate for a neural network implementation is a memristive crossbar network due to the inherent learning capabilities of memristors and the lower device count for the structure, because of the elimination of the circuitry required for backpropagation [167]. As a result, area and power consumption for a neural network will be reduced. The research on MA-ASL neural network topologies may lead to the implementation of network hierarchies usable for processor design or convolutional networks for deep learning [168], [169].

## **6.4 Conclusions**

We proposed a spintronic neuron based on the MA-ASL device and the MTJ. The performance of the neuron is benchmarked against its CMOS and spintronic counterparts in terms of area, delay, and energy dissipation. The MA-ASL neuron operates with less than half the energy of its spintronic counterparts by employing magnetostrictive switching along with STT switching. Magnetostrictive switching is also expected to further enhance the robustness of the neuron's operation to thermal noise. The operation of the device was simulated using SPICE models, and the physics behind the operation of the device is well understood.

## VII. MAGNETOSTRICTION-ASSISTED SPIN-ORBIT DEVICE

#### 7.1 Spin-Orbit Interactions

#### 7.1.1 Motivations

As discussed in the previous chapters, various Boolean and non-Boolean ASL devices and circuits have been proposed. Compared to their CMOS counterparts, these devices suffer from a higher switching energy dissipation, due to the inefficiency of the spin-transfer torque mechanism in magnetization reversal. To overcome this challenge, magnetostrictive switching was incorporated into the switching mechanism so that the role of STT was limited to the second half of switching [143]. Thus, the energy dissipation and the delay of a 32-bit MA-ASL ALU, compared to those of a 32-bit ASL ALU, demonstrated 27× and 21× improvements, respectively. However, compared to a 32-bit CMOS ALU, the 32-bit MA-ASL ALU is two orders of magnitude less energy efficient and slower. Thus, energy-efficient magnetization switching remains a challenge for all-spin Boolean logic devices that work based on STT. Compared to STT, the spin-Hall effect (SHE) is a more efficient switching mechanism for magnetization reversal. Moreover, SHE is utilized in implementing spintronic devices such as the concatenable spin logic (CSL) device [69]. However, the CSL device has not made significant improvements over ASL in terms of energy dissipation and delay, due to its inefficient reading mechanism, which requires spin to charge transduction. As discussed in Section I, the transduction of charge current to spin current and vice versa happens in both magnetic and non-magnetic materials. In magnetic conductors, the generation of spin-polarized current is due to the exchange interaction between conduction electrons and local spins. In an all-spin logic device, the charge current is converted to spin current using a magnetic layer. In non-magnetic materials, spin current generation is feasible due to spin-orbit interactions and is utilized in the implementation of spin-orbitronic devices [171], [172].
In a CSL device, the write mechanism, which requires charge to spin transduction, is due to the spin-orbit interactions at the interface of a magnetic layer and a heavy metallic layer. Spin-orbit interactions are widely studied [58], [173]–[178]; these interactions, which arise from the coupling of the spin angular momentum and the orbital angular momentum of electrons, are very strong in heavy metallic elements, such as W and Pt [175]. The spin magnetic moment, $\mu_S$, and the orbital magnetic moment, $\mu_L$, are described by

$$\mu_S = -\frac{g_S \mu_B}{\hbar} S,\tag{35}$$

$$\mu_L = -\frac{g_L \mu_B}{\hbar} L,\tag{36}$$

in which $\mu_B$, $g_S$, $g_L$, S, and L represent the Bohr magneton, the spin g-factor, the orbital g-factor, the spin angular momentum, and the orbital angular momentum, respectively.
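As a quick numerical check of equations (35) and (36), the sketch below evaluates the z-components of the two moments for an electron. The constants are standard CODATA values; taking $g_L = 1$ and evaluating only the z-projection are simplifying assumptions of this illustration.

```python
MU_B = 9.2740100783e-24   # Bohr magneton, J/T
HBAR = 1.054571817e-34    # reduced Planck constant, J*s
G_S = 2.00231930436       # magnitude of the electron spin g-factor
G_L = 1.0                 # orbital g-factor (assumed exactly 1)


def spin_moment_z(s_z):
    """Equation (35), z-component: mu_S = -g_S * mu_B * S_z / hbar."""
    return -G_S * MU_B * s_z / HBAR


def orbital_moment_z(l_z):
    """Equation (36), z-component: mu_L = -g_L * mu_B * L_z / hbar."""
    return -G_L * MU_B * l_z / HBAR


# Spin-up electron (S_z = +hbar/2) and an orbital state with L_z = +hbar.
print(f"mu_S,z = {spin_moment_z(HBAR / 2) / MU_B:+.5f} mu_B")
print(f"mu_L,z = {orbital_moment_z(HBAR) / MU_B:+.5f} mu_B")
```

Both moments come out at about one Bohr magneton in magnitude and antiparallel to the corresponding angular momentum, which is what the negative signs in the two equations express.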



Figure 65: Spin-orbit interactions are very strong in heavy metallic elements [179].

#### 7.1.2 Rashba Effect

Spin-orbit interactions exist in bulk materials, 2D materials, and topological insulators. The Rashba effect is one such interaction; it arises from the spin-orbit coupling (SOC) in the 2D electron gas (2DEG) that exists at surfaces, at interfaces, and in semiconductor wells. This interaction is described by the following Hamiltonian, $H_R$,

$$H_R = \alpha_R (\vec{k} \times \hat{z}) \cdot \vec{\sigma},\tag{37}$$



Figure 66: A Rashba interface comprised of a hybrid NiFe/Ag/Bi structure [180].

which relates the spin Pauli matrices vector,  $\sigma$ , and the momentum, k. In this equation,  $\hat{z}$  is the unit vector, normal to the interface as shown in Figure 66, and  $\alpha_R$  represents the Rashba coefficient,

$$\alpha_R = \frac{(k_{F+} - k_{F-})\hbar^2}{2m}, \tag{38}$$

where $k_{F-}$ and $k_{F+}$ are the Fermi vectors of the two spin-split bands. By using a simple two Fermi contours model in the Rashba electron gas, the density of spin polarization along the **y**-direction and the charge current density along the **x**-direction are related as [30], [175], [176], [181]

$$\delta s_{y\pm} = \pm \frac{m}{2e\hbar k_{F\pm}} j_{cx\pm}. \tag{39}$$

Thus, the charge current density in a 2D Rashba electron gas, $j_{cx}$, is

$$j_{cx} = \frac{e\alpha_R}{\hbar} \langle \delta s \rangle_y = \frac{\alpha_R \tau_S}{\hbar} j_{sy} = \lambda_{IREE} j_{sy}, \tag{40}$$

in which $\tau_S$ is the spin relaxation time,

$$\tau_S = \frac{\langle \delta s \rangle}{j_{cx}},\tag{41}$$

and the efficiency of spin to charge conversion is represented by $\lambda_{IREE}$. The inverse Rashba-Edelstein effect (IREE) is the generation of a charge current due to a nonzero spin density that is induced by spin injection and carried by the interfacial quantum states of materials [175]. Using (40), the net generated charge current is

$$I_C = \frac{\lambda_{IREE}}{w} (\hat{\sigma} \times I_s), \tag{42}$$

where *w* represents the width of the magnet. The generated charge current is larger for materials with higher spin-orbit coupling. Materials with high spin-orbit coupling coefficients include heavy metallic elements (Bi/Ag, Pt, W) [175], [178], [182], topological materials (Bi<sub>2</sub>Se<sub>3</sub>, ZrTe<sub>5</sub>, Bi-Bi<sub>2</sub>Se<sub>3</sub>) [177], [183], [184], and 2D materials (MoS<sub>2</sub>, MX<sub>2</sub>) [185], [186]. In bulk heavy metallic elements, instead of the Rashba and IREE effects, the spin-Hall effect (SHE) and the inverse spin-Hall effect (ISHE) dominate the spin-orbit interactions, as explained in the next subsection using an example.
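The relations of this subsection can be tied together in a short numerical sketch. It assumes the standard parabolic Rashba dispersion $E = \hbar^2 k^2 / 2m \mp \alpha_R k$ with the free-electron mass, labels the larger Fermi contour as $k_{F+}$ so that equation (38) returns a positive coefficient, and uses hypothetical values for $\alpha_R$, the Fermi energy, and $\tau_S$. None of these numbers come from the text.

```python
import math

HBAR_EV_S = 6.582119569e-16   # reduced Planck constant, eV*s
HBAR2_OVER_2M = 3.80998       # hbar^2 / (2 m_e), eV*Angstrom^2 (free-electron mass assumed)


def fermi_wavevectors(alpha_r, e_fermi, a=HBAR2_OVER_2M):
    """Fermi wave vectors (1/Angstrom) of the two spin-split contours.

    Assumes E = a*k^2 -/+ alpha_r*k; k_plus is the larger contour.
    """
    root = math.sqrt(alpha_r ** 2 + 4.0 * a * e_fermi)
    k_plus = (root + alpha_r) / (2.0 * a)
    k_minus = (root - alpha_r) / (2.0 * a)
    return k_plus, k_minus


def rashba_coefficient(k_plus, k_minus, a=HBAR2_OVER_2M):
    """Equation (38): alpha_R = (k_F+ - k_F-) * hbar^2 / (2m), in eV*Angstrom."""
    return (k_plus - k_minus) * a


def iree_length_angstrom(alpha_r, tau_s):
    """Equation (40): lambda_IREE = alpha_R * tau_S / hbar, in Angstrom."""
    return alpha_r * tau_s / HBAR_EV_S


def iree_charge_current(i_s, lambda_iree, width):
    """Magnitude of equation (42): I_C = (lambda_IREE / w) * I_s."""
    return lambda_iree / width * i_s


# Hypothetical interface: alpha_R = 0.5 eV*Angstrom, E_F = 0.1 eV, tau_S = 5 fs.
k_p, k_m = fermi_wavevectors(0.5, 0.1)
print(f"recovered alpha_R: {rashba_coefficient(k_p, k_m):.3f} eV*Angstrom")
lam = iree_length_angstrom(0.5, 5e-15)
print(f"lambda_IREE: {lam / 10.0:.2f} nm")
print(f"I_C / I_s for w = 30 nm: {iree_charge_current(1.0, lam / 10.0, 30.0):.4f}")
```

The round trip through the two Fermi contours recovers the input Rashba coefficient, and the final line shows how a sub-nanometer $\lambda_{IREE}$ translates into a small charge current per unit spin current for a magnet tens of nanometers wide.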

#### 7.1.3 Spin-Hall Effect

The spin-Hall effect describes the conversion of a charge current flowing through a nonmagnetic bulk material into spin accumulation and a transverse spin current at its surface. As an example, consider a hybrid structure of a magnet on top of a heavy metallic element such as Pt. Passing a current through the Pt layer underneath the magnet induces an effective spin current proportional to the ratio of the magnet length to the thickness of the Pt layer. The ratio of the spin current to the electrical current can be analytically derived as [187]

$$\beta = \frac{I_S}{I_C} = \theta_{SH} \frac{L_{FM}}{t} \Big[ 1 - \operatorname{sech} \left( \frac{t}{\lambda} \right) \Big], \tag{43}$$

in which  $I_S$ ,  $I_C$ ,  $\theta_{SH}$ ,  $L_{FM}$ , t, and  $\lambda$  represent the spin current, the charge current, the spin-Hall angle, the length of the magnet, the thickness of the Pt wire, and the spin-relaxation length of the heavy metal, respectively. The charge to spin current conversion factor,  $\beta$ , can be larger than one, which explains why using SHE will be more energy efficient than STT in spintronic devices.
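Equation (43) is simple enough to evaluate directly. The sketch below is a minimal illustration; the spin-Hall angle, the magnet length, the wire thickness, and the spin-relaxation length are hypothetical inputs chosen for easy arithmetic rather than measured values.

```python
import math


def she_conversion_factor(theta_sh, l_fm, t, lam):
    """Equation (43): beta = theta_SH * (L_FM / t) * (1 - sech(t / lambda)).

    l_fm, t, and lam must share the same length unit.
    """
    sech = 1.0 / math.cosh(t / lam)
    return theta_sh * (l_fm / t) * (1.0 - sech)


# Hypothetical example: theta_SH = 0.1, 60 nm magnet, 2 nm wire, lambda = 2 nm.
beta = she_conversion_factor(0.1, 60.0, 2.0, 2.0)
print(f"beta = {beta:.3f}")
```

For these inputs the geometric factor $L_{FM}/t = 30$ lifts $\beta$ slightly above one even though the assumed spin-Hall angle is only 0.1.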



Figure 67: Spin-Hall effect in a hybrid heavy metallic/magnetic structure [180].

In the same structure, the conversion of spin current into charge current can be explained using ISHE. In this case, the spin to charge conversion efficiency in the metallic layer can be derived as [30]

$$\eta = \frac{\Theta_{SHE}\lambda_{sf}}{w} \tanh\left(\frac{t}{2\lambda_{sf}}\right),\tag{44}$$

in which $\eta$ represents the efficiency of the ISHE mechanism. Thus, the net generated charge current is

$$I_{C} = \frac{1}{w} \Theta_{SHE} \lambda_{sf} \tanh\left(\frac{t}{2\lambda_{sf}}\right) (\hat{\sigma} \times I_{s}), \tag{45}$$

where $\lambda_{sf}$ is the bulk diffusion length. In Figure 68, MATLAB simulations demonstrate the dependency of $\beta$ and $\eta$ on the thickness of the heavy metallic layer.

Figure 68: MATLAB simulations are done to measure the efficiency of the spin to charge and charge to spin conversion in the device. Based on the results, maximum spin to charge conversion is achieved by maximizing the thickness of the metallic layer, while the maximum charge to spin conversion efficiency is achieved by minimizing the thickness of the metallic layer.

In the limit $t \ll \lambda_{sf}$, one can write

$$I_C = \frac{1}{2w} \Theta_{SHE} t I_S, \tag{46}$$

which shows the dependency between the generated charge current and the applied spin current. Moreover, in the limit $t \gg \lambda_{sf}$, one can write

$$I_C = \frac{1}{w} \Theta_{SHE} \lambda_{sf} I_S,\tag{47}$$

which shows that by increasing the thickness, the generated charge current will reach a saturation value. The spin current to charge current conversion in hybrid structures of heavy metals and magnets is primarily due to the ISHE mechanism. To increase the spin current to charge current conversion efficiency, a spin-injection layer (SIL) of Ag can be added to the structure, as shown in Figure 66. In this case, due to interfacial states, IREE contributes to the spin current to charge current conversion as well. The net spin to charge conversion can then be derived by combining the charge currents generated by both ISHE and IREE; thus, the total generated charge current is derived as [30]

$$\vec{I_c} = \frac{1}{w} \left( \lambda_{IREE} + \Theta_{SHE} \lambda_{sf} \tanh\left(\frac{t}{2\lambda_{sf}}\right) \right) \left( \hat{\sigma} \times \hat{I_s} \right) = \frac{1}{w} \lambda'_{ISOC} \left( \hat{\sigma} \times \hat{I_s} \right), \tag{48}$$

in which  $\lambda'_{ISOC}$  shows the net efficiency of the spin to charge conversion using both IREE and ISHE mechanisms,

$$\lambda_{ISOC}' = \lambda_{IREE} + \Theta_{SHE} \lambda_{sf} \tanh\left(\frac{t}{2\lambda_{sf}}\right). \tag{49}$$

Thus, in the limit $t \gg \lambda_{sf}$,

$$\lambda_{ISOC}' = \lambda_{IREE} + \Theta_{SHE} \lambda_{sf}. \tag{50}$$
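The thin-film and thick-film limits of the ISHE term, and the combined length of equation (49), can be verified numerically. In the sketch below the spin-Hall angle, the diffusion length, and $\lambda_{IREE}$ are hypothetical inputs, and lengths are in nanometers.

```python
import math


def ishe_length(theta_she, lam_sf, t):
    """ISHE term of equation (49): Theta_SHE * lambda_sf * tanh(t / (2 * lambda_sf))."""
    return theta_she * lam_sf * math.tanh(t / (2.0 * lam_sf))


def isoc_length(lam_iree, theta_she, lam_sf, t):
    """Equation (49): lambda'_ISOC = lambda_IREE + ISHE term."""
    return lam_iree + ishe_length(theta_she, lam_sf, t)


def charge_current(i_s, width, conversion_length):
    """Magnitude of equation (48): I_c = (lambda'_ISOC / w) * I_s."""
    return conversion_length / width * i_s


THETA, LAM_SF = 0.1, 10.0   # hypothetical spin-Hall angle and diffusion length (nm)

thin = ishe_length(THETA, LAM_SF, 0.01)      # t << lambda_sf
thick = ishe_length(THETA, LAM_SF, 1000.0)   # t >> lambda_sf
print(f"thin limit:  {thin:.6f} nm (Theta*t/2 = {THETA * 0.01 / 2:.6f} nm)")
print(f"thick limit: {thick:.6f} nm (Theta*lambda_sf = {THETA * LAM_SF:.6f} nm)")
print(f"lambda'_ISOC with lambda_IREE = 0.3 nm: {isoc_length(0.3, THETA, LAM_SF, 1000.0):.3f} nm")
```

The thin-film value approaches $\Theta_{SHE} t / 2$ and the thick-film value saturates at $\Theta_{SHE}\lambda_{sf}$, matching the two limits derived above; adding $\lambda_{IREE}$ then gives equation (50).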

However, we expect one of the two mechanisms to be dominant for each material because of the difference in the bulk and surface states. In this case, the efficiency of ISHE and IREE for spin current to charge current conversion is compared by comparing  $\lambda_{IREE}$  and  $\Theta_{SHE}\lambda_{sf}$ . In Table 6, these values are shown for various materials with strong spin-orbit coupling, showing that materials relying on IREE compared to those relying on ISHE are more efficient in spin current to charge conversion, making them more interesting for logic device applications.

Table 6: The spin to charge conversion efficiencies of various materials, compared using $\lambda_{IREE}$ and $\Theta_{SHE} \lambda_{sf}$.

| Material     | $\lambda_{IREE}$ (nm) | Material | $\Theta_{SHE} \lambda_{sf}$ (nm) |
|--------------|-----------------------|----------|----------------------------------|
| NiFe/LAO/STO | 6.4                   | Bi/Ag    | 0.1-0.4                          |
| α-Sn         | 2.1                   | Pt       | 0.2                              |
|              |                       | Ta       | 0.3                              |
|              |                       | W        | 0.43                             |

## 7.2 Magnetostriction Assisted Spin-Orbit (MASO) Logic Device Proposal

## 7.2.1 Device Proposal

Researchers have proposed various spintronic logic devices, such as the all spin logic (ASL) device [27], the composite-input magnetoelectric-based logic technology (COMET) [28], the domain wall magnetic logic (mLogic) [29], the magnetoelectric spin-orbit device [30], and the magnetoelectric magnetic tunnel junction (MEMTJ) [31]. Some of these devices, such as ASL, employ STT for magnetization reversal, while others, such as CSL, employ SHE. However, neither of these two devices is energy efficient because of its switching mechanism. Recently, the magnetoelectric spin-orbit (MESO) logic was proposed, which combines spin-orbit coupling and magnetoelectric switching and demonstrates higher energy efficiency than the ASL and the CSL [30]. However, because of the large capacitance and resistance of the device, it cannot be used for interconnects longer than 2 $\mu m$. Moreover, the delay of the device is always longer than 50 ps because of its switching mechanism. In this section, the magnetostriction-assisted spin-orbit (MASO) logic device is proposed by combining SHE, ISHE, IREE, and magnetostriction. Because spin-orbit torque switching is more energy efficient than STT switching, the proposed structure is expected to outperform previous spintronic devices in terms of delay and energy dissipation. The proposed device is shown in Figure 69. It comprises a hybrid structure of piezoelectric and magnetic layers in contact with two heavy metal or topological insulator layers.



Figure 69: Schematics of proposed magnetostriction-assisted spin-orbit (MASO) logic device. The read and write operations are done in three phases. To write data into the magnet, first, magnetostrictive switching is employed to rotate the magnetization by  $90^{\circ}$ . Second, SHE is applied to perform the second  $90^{\circ}$  of switching. In the third phase of operation, the stored information is converted into the direction of the output charge current that can drive the next stage.

The read and write operations of the device consist of three phases. To write into a magnet, first, a voltage $V_{PIEZO}$ is applied to the magnet, rotating its easy axis and magnetization by $90^{\circ}$; this phase of operation of the MASO device is like that of an MA-ASL device. Second, $V_{PIEZO}$ is turned off so that the magnet is placed at the saddle point of its energy profile. Unlike in an MA-ASL device, an SHE-induced spin torque, which is more energy efficient than STT, is then applied to the magnet by applying the input charge current, $I_{C,IN}$, to accomplish deterministic switching. The input charge current, $I_{C,IN}$, must pass through a channel layer made of materials with strong spin-orbit coupling, such as 2D materials (MoS<sub>2</sub> and graphene), topological insulators, or heavy metallic elements such as Pt and W, which are commonly used in the fabrication of CMOS devices. Each of these materials offers advantages and challenges that are discussed in Section 7.3.

To read a magnet, the current pulse, $I_{Pulse}$, passes through the magnet, as shown by the green arrow in Figure 69, and becomes spin polarized. When this spin-polarized current is applied to the spin injection layer (Ag, Cu) and the channel, a heavy metallic layer, a transverse charge current $I_{C,Out}$ is generated due to the ISHE and/or the IREE. The direction of the charge current is determined using (48), i.e., by the direction of $\hat{\sigma} \times \hat{I}_s$, in which the direction of $\hat{\sigma}$ is set by the orientation of the magnet, and the direction of $\hat{I}_s$ is set by that of the spin current. For this device, the direction of $\hat{I}_s$ is fixed. Thus, the direction of the charge current, either outward from or inward to the magnet, is determined by the orientation of the magnet, either in the +**x** direction or the -**x** direction, respectively. Here, we assume that $\Theta_{SHE}$ is positive; however, $\Theta_{SHE}$ is negative for some materials. In these cases, the direction of the generated charge current is reversed.
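The sign rule of (48) can be illustrated with a minimal Python sketch of the $\hat{\sigma} \times \hat{I}_s$ product. The axis assignment is an assumption made only for illustration: the magnet is taken to lie along ±x and the spin current is assumed to flow along -z. The function names and the `theta_she_sign` argument are hypothetical and do not come from the device model.

```python
def cross(a, b):
    """Cross product of two 3-component vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def output_current_direction(magnet_sign, spin_flow=(0, 0, -1), theta_she_sign=1):
    """Direction of the generated charge current, following sigma x I_s.

    magnet_sign:    +1 for a magnet along +x, -1 for a magnet along -x.
    spin_flow:      assumed fixed flow direction of the spin current.
    theta_she_sign: +1 or -1, the sign of the spin Hall angle.
    """
    sigma = (magnet_sign, 0, 0)  # spin polarization follows the magnet
    direction = cross(sigma, spin_flow)
    return tuple(theta_she_sign * component for component in direction)


for sign in (+1, -1):
    print(f"magnet along {'+x' if sign > 0 else '-x'}: "
          f"charge current along {output_current_direction(sign)}")
```

Reversing the magnet reverses the output current, and a negative spin Hall angle reverses it once more, which matches the behaviour described in the text.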

Using this structure, the orientation of the magnet is converted to the direction of the charge current. The same structure can be implemented using topological insulators or 2D materials as well. In these cases, the spin injection layer is removed; thus, the topological insulator layer is in contact with the magnetic layer. To pass information to another magnet in a chain, the generated charge current is applied to the next magnet, which rotates that magnet using SHE-induced torque.



Figure 70: Two driver circuits are proposed for MASO repeaters in (a) and (b), and their corresponding circuit models are represented in (c) and (d).

# 7.2.2 The Modelling of the MASO Device

To analyze the operation of the device, circuit models are developed, as shown in Figure 70. The figure shows an MASO inverter implemented using two driver circuit schemes, shown in Figure 70a and Figure 70b. Each of these implementations offers its own advantages and disadvantages. For example, the driver circuit shown in Figure 70b requires one driver transistor, while the one shown in Figure 70a requires three transistors. However, unlike the driver circuit shown in Figure 70a, the one shown in Figure 70b requires an additional negative supply voltage $V_{FM}$. The SPICE models of these two configurations are shown in Figure 70c and Figure 70d. The magnetic-non-magnetic interface is modelled using the circuit model proposed in [78], labelled $G_{FM-NM}(\hat{m})$. Like the ground contact model of the spin current for the ASL device, that for the MASO device is modelled using the channel model proposed in [78], labelled $G_{NM}(\hat{m})$. The ISHE and the IREE are modelled using a dependent charge current source whose value follows from (49), $I_{ISOC} = \frac{1}{w} \lambda'_{ISOC} (\hat{\sigma} \times \vec{I}_S) \cdot \hat{x}$, in parallel with a resistor $R_{IREE}$ modelling the resistance of the topological insulator layer [30]. In these figures, the charge current transport between points a and b is modelled using a resistor $R_{IC}$. Moreover, the resistance of the topological insulator/heavy metal layer and that of the magnetic layer (and the spin injection layer) are modelled using resistors $R_{TI/HM}$ and $R_{FM/SIL}$, respectively. The effective spin current applied to the output magnet, $I_{S, FM, OUT}$, is proportional to the current passing through the topological insulator layer below the output magnet, $I_{C, TI}$,

$$I_{S, FM, OUT} = I_{C, TI} \times \theta_{SH} \frac{L_{FM}}{t} \left[ 1 - \operatorname{sech} \left( \frac{t}{\lambda} \right) \right] = I_{C, TI} \times \beta. \tag{51}$$

Furthermore, the driver circuits are modelled using CMOS transistors. These circuits provide the pulse current that passes through the magnets, as well as the supply voltage $V_{FM}$, which is positive for the circuit shown in Figure 70a and negative for the one shown in Figure 70b.
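The charge-to-spin gain $\beta$ defined in (51) can be evaluated with a short Python sketch. The spin Hall angle is the W value listed in Table 7; the magnet length, layer thickness, and spin diffusion length are assumed values chosen only for illustration, and the function names are hypothetical.

```python
import math


def sech(x):
    """Hyperbolic secant."""
    return 1.0 / math.cosh(x)


def charge_to_spin_gain(theta_sh, l_fm_nm, t_nm, lambda_nm):
    """Gain beta of (51): I_S,FM,OUT = beta * I_C,TI."""
    return theta_sh * (l_fm_nm / t_nm) * (1.0 - sech(t_nm / lambda_nm))


def effective_spin_current(i_c_ti_amp, beta):
    """Effective spin current applied to the output magnet, in amperes."""
    return i_c_ti_amp * beta


THETA_SH = 0.3   # W, from Table 7
L_FM_NM = 40.0   # assumed magnet length
T_NM = 3.0       # assumed HM/TI layer thickness
LAMBDA_NM = 1.5  # assumed spin diffusion length

beta = charge_to_spin_gain(THETA_SH, L_FM_NM, T_NM, LAMBDA_NM)
i_s = effective_spin_current(10e-6, beta)  # assumed 10 uA through the layer
print(f"beta = {beta:.3f}, I_S = {i_s * 1e6:.2f} uA")
```

Because of the geometric factor $L_{FM}/t$, $\beta$ can exceed unity even when $\theta_{SH}$ is below one, which is why a spin-orbit torque can deliver more spin current per unit charge current than STT.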



Figure 71: Transient response of the MASO device. The orientations of the input and output magnets are shown in (a) and (b), respectively. The supply current is shown in (c), and the charge current generated beneath the magnetic layer is shown in (d). Some of the generated charge current passes through the TI layer below the output magnet as shown in (e), which applies an effective transverse spin current shown in (f) to the output magnet.

### 7.2.3 The Transient Response of the MASO Device

Using the models described in the previous subsection, the operation of the device is simulated using SPICE, and the results are illustrated in Figure 71 and Figure 72. In this simulation, the input magnet, $m_{IN}$, and the output magnet are initially assumed to be oriented in the -**x** direction, as shown in Figure 71a and Figure 71b, respectively. In the first phase of operation, shaded in red in Figure 71, a voltage pulse $V_{PIEZO,2}$ is applied to the piezoelectric layer on top of the output magnet for 1 ns, rotating its easy axis and its magnetization by 90°, as shown in Figure 71b. When $V_{PIEZO,2}$ is turned off, the output magnet is placed at the saddle point of its energy profile and is ready to switch to either the -**x** or the +**x** direction. To ensure the deterministic switching of the output magnet, an STT or a spin-orbit torque (SOT) must be applied to the magnet. In an MA-ASL device, an STT is applied to the magnet, whereas in an MASO device, an SOT due to SHE is applied. By using SOT instead of STT, we expect this phase of operation of the MASO device to be $\beta$ times more energy efficient than that of the MA-ASL device.



Figure 72: First, the orientation of Magnet 2 rotates by $90^{\circ}$ from $\vec{m}_i$ to $\vec{m}_m$ using magnetostrictive switching; then, it reorients by $90^{\circ}$ from $\vec{m}_m$ to $\vec{m}_f$ using SHE.

To generate SOT, a charge current pulse $I_{C, FM, IN}$ is applied to the input magnet, as shown in Figure 71c. The current becomes spin polarized as it passes through the magnet. The spin current must pass through the topological insulator layer, causing IREE; thus, a charge current $I_{ISOC}$ is generated, as shown in Figure 71d. The charge current passes through the metallic interconnect connecting the input and the output magnets. As the current reaches the output magnet and the topological insulator layer below it, part of the current shunts to ground through the magnet, and the rest of the current, $I_{C, TI}$, passes through the topological insulator layer, as shown in Figure 71e. Because of spin-orbit coupling in the topological insulator layer, an SOT is applied to the output magnet. The equivalent spin current applied to the output magnet, $I_{S, FM, OUT}$, is shown in Figure 71f. If the applied spin current is strong enough, the output magnet switches deterministically, as shown in Figure 71b. As this figure shows, the output magnetization orientation is the inverse of the input magnetization orientation. Moreover, the magnitude of the torque applied to the magnet depends on the geometrical dimensions and on the magnetic, piezoelectric, and topological insulator materials used in the MASO device. These factors are investigated in the next section to optimize the performance of the device.

## 7.3 Optimizing the Performance of the Device

To optimize the operation of the MASO device, the materials used in the device and their geometrical dimensions must be optimized for each of the three phases of operation. The first phase of operation relies on magnetostrictive switching. The optimum performance is achieved when the maximum strain is transferred to the magnet for a given voltage $V_{PIEZO}$; the strain transferred to the magnet is shown in Figure 73. Thus, the thickness of the heavy metallic (HM)/topological insulator (TI) layer must be minimized (or the layer must be removed) to increase the maximum strain, as demonstrated in Figure 58. Moreover, materials with the largest Young's modulus $Y$ must be used. However, the same HM/TI layer is used for applying SOT to the device in the third phase of operation. Thus, the HM/TI layer cannot be removed. Moreover, in the search for the best HM/TI material, one needs to consider both $Y$ and $\Theta_{SHE}$ to guarantee the most efficient SOT switching. For example, the material parameters of Pt, Ta, and W are compared in Table 7; Pt and W are widely used in the fabrication of CMOS devices. Using Pt instead of W in a MASO device results in 23% higher strain transferred to the magnetic layer, as demonstrated in Figure 74, but in 77% lower SOT, because $\Theta_{SHE}$ varies over a larger range than the magnetostriction-related parameters. Moreover, the energy dissipated in the third phase of operation is generally larger than that in the first phase. Thus, in a MASO device, using W is preferred over using Pt.



Figure 73: Transferred strain to the magnet is simulated using COMSOL, and the results are shown for the cross-section of the magnet.

| Materials | Resistivity ($\rho$) | $\Theta_{SHE}$ (to CoFeB) | Gilbert Damping ($\alpha$) | Generated Strain (ppm) |
|-----------|----------------------|---------------------------|----------------------------|------------------------|
| Pt        | 24 μΩ·cm             | 0.07                      | 0.025                      | 1595.6                 |
| Ta        | 190 μΩ·cm            | -0.15                     | 0.008                      | -                      |
| W         | 200 μΩ·cm            | 0.3                       | 0.012                      | 1297.4                 |

Table 7: Comparison of the transferred strain to the magnet and the spin Hall angle for Pt, Ta, and W.
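The 23% and 77% figures quoted for Pt versus W follow directly from the entries of Table 7. The minimal Python sketch below reproduces them under the assumption that, for a given charge current, the SOT scales linearly with $\Theta_{SHE}$; the dictionary layout and the helper name are illustrative only.

```python
# Values copied from Table 7.
TABLE_7 = {
    "Pt": {"theta_she": 0.07, "strain_ppm": 1595.6},
    "W": {"theta_she": 0.3, "strain_ppm": 1297.4},
}


def percent_change(new, reference):
    """Relative change of `new` with respect to `reference`, in percent."""
    return 100.0 * (new - reference) / reference


strain_change = percent_change(TABLE_7["Pt"]["strain_ppm"], TABLE_7["W"]["strain_ppm"])
# Assumption: SOT is proportional to the spin Hall angle at fixed current.
sot_change = percent_change(TABLE_7["Pt"]["theta_she"], TABLE_7["W"]["theta_she"])

print(f"Pt vs W, transferred strain: {strain_change:+.1f}%")
print(f"Pt vs W, spin-orbit torque:  {sot_change:+.1f}%")
```

The gain in strain is much smaller than the loss in torque, which is consistent with the preference for W stated above.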



Figure 74: COMSOL simulations are done to measure the amount of the transferred strain for W as shown in (a) and Pt as shown in (b).

To transfer the largest strain to the structure, the strain generated in the piezoelectric material must be maximized as well, which depends on the geometrical dimensions and material parameters of the piezoelectric layer. In Section 5.5, the impact of geometrical dimensions on the generated strain for an MA-ASL device is investigated. In this section, the magnetic materials are studied only for their magnetostrictive properties. A comprehensive list of materials and their properties is shown in Table 8. In the first phase of operation, the anisotropy field, $\propto \frac{2K}{\mu_0 M_S} \propto \frac{\lambda Y}{M_S}$, must exceed the demagnetization field, $\propto M_S$, to rotate the easy axis from the **x** direction to the **y** direction. Thus, materials with the lowest $\frac{M_S^2}{\lambda Y}$, such as Terfenol-D, are the most suitable for magnetostrictive switching.

Table 8: Resistivity and $\Theta_{SHE}$ for various heavy metallic elements, topological insulators (TIs), magnets, and nonmagnetic metals [64], [142], [144]–[146], [163], [182], [183], [188]–[220].

| Materials                           | Type of Material  | Resistivity ($\rho$) (μΩ·cm) | $\Theta_{SHE}$ (at room temperature) | $\frac{M_S^2}{\lambda Y}$ (1e3) |
|-------------------------------------|-------------------|------------------------------|--------------------------------------|---------------------------------|
| Pt                                  | Heavy Metal       | 24                           | 0.07-0.08                            | -                               |
| β-W                                 | Heavy Metal       | 210                          | 0.4                                  | -                               |
| W                                   | Heavy Metal       | 200                          | 0.3                                  | -                               |
| β-Ta                                | Heavy Metal       | 190                          | -0.15                                | -                               |
| Ta                                  | Heavy Metal       | 190                          | -0.15                                | -                               |
| Bi<sub>2</sub>Se<sub>3</sub>        | TI                | 1750                         | 2-3.5                                | -                               |
| Bi<sub>x</sub>Se<sub>1-x</sub>      | TI                | 12800                        | 18.8                                 | -                               |
| Bi<sub>0.9</sub>Sb<sub>0.1</sub>    | TI                | 400                          | 52                                   | -                               |
| Ni                                  | Magnet            | 6.9                          | -                                    | 34.3                            |
| CoFe<sub>2</sub>O<sub>4</sub>       | Magnet            | 1.00E+15                     | -                                    | 2.5                             |
| CoFe                                | Magnet            | 30                           | -                                    | 16                              |
| CoFeB                               | Magnet            | 165                          | -                                    | 208                             |
| Co                                  | Magnet            | 6.2                          | -                                    | 469                             |
| Fe<sub>3</sub>O<sub>4</sub>         | Magnet            | 4000                         | -                                    | 289                             |
| Terfenol-D                          | Magnet            | 60                           | -                                    | 5                               |
| Galfenol                            | Magnet            | 85                           | -                                    | 53                              |
| Ag                                  | SIL, Interconnect | 1.6                          | -                                    | -                               |
| Cu                                  | SIL, Interconnect | 1.7                          | -                                    | -                               |
| Au                                  | SIL, Interconnect | 2.2                          | -                                    | -                               |
| Al                                  | Interconnect      | 2.8                          | -                                    | -                               |
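The magnet entries of Table 8 can be ranked by the figure of merit $\frac{M_S^2}{\lambda Y}$, where a lower value indicates that the strain-induced anisotropy field overcomes the demagnetization field more easily. The Python sketch below simply sorts the tabulated values (in the table's units of 1e3); the plain-text material labels and variable names are illustrative.

```python
# M_S^2 / (lambda * Y) for the magnets of Table 8, in the table's units (1e3).
MAGNET_FIGURE_OF_MERIT = {
    "Ni": 34.3,
    "CoFe2O4": 2.5,
    "CoFe": 16.0,
    "CoFeB": 208.0,
    "Co": 469.0,
    "Fe3O4": 289.0,
    "Terfenol-D": 5.0,
    "Galfenol": 53.0,
}

# Lower is better for magnetostrictive rotation of the easy axis.
ranked = sorted(MAGNET_FIGURE_OF_MERIT.items(), key=lambda item: item[1])

for rank, (material, value) in enumerate(ranked, start=1):
    print(f"{rank}. {material:<11} {value:>6.1f}")
```

On this metric alone, CoFe<sub>2</sub>O<sub>4</sub> and Terfenol-D rank first and second, while Co ranks last. The ranking ignores resistivity, which Table 8 shows to be very high for CoFe<sub>2</sub>O<sub>4</sub>.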

Optimizing the second and the third phases of operation requires choosing materials with the largest spin-orbit coupling; thus, materials with the largest $\Theta_{SHE}$ are preferred. Hence, considering the very large $\Theta_{SHE}$ of topological insulators, these materials are promising candidates for use in the MASO device. However, most of these materials exhibit poor conductance, as demonstrated in Table 8. Thus, they lead to large energy dissipation because a large current is shunted through the output magnet. This issue prohibits using them in the design of the MASO device. However, researchers have recently studied a BiSb topological insulator [193], which exhibits a very large $\Theta_{SHE}$ of 52 at room temperature and a low resistivity of 400 $\mu\Omega \cdot cm$, making the material an ideal candidate for use in the MASO device. Furthermore, optimizing the second and the third phases of operation requires a comprehensive study, which is not presented in this section. However, we briefly address a tradeoff in optimizing these two phases that is related to the thickness of the HM/TI layer. From (45) and (48), the thickness of the HM/TI layer must be maximized to optimize the read operation, while it must be minimized to optimize the write operation. To solve this problem, we prefer to use separate HM/TI layers for the read and write operations to maximize the efficiency of both operations in the MASO device.

# 7.4 Performance Analysis of the Device

## 7.4.1 Using the Device as an Interconnect

Like the ASL device, the MASO device can be used as an interconnect for transferring information. The delay of an MASO interconnect versus length is plotted in Figure 75. The delay of a 10 $\mu$m long interconnect is only 30% larger than that of a 40 nm long interconnect. Thus, unlike the ASL and the MESO devices, the MASO device is very efficient in transferring signals over long ranges and does not require repeaters to do so. Unlike the ASL device, the MASO device uses charge current to transfer data; thus, it does not suffer from the loss of data caused by spin relaxation. Compared to the MESO device, the MASO device has the advantage of using current instead of voltage in transferring signals. Moreover, the MASO device has lower capacitance and resistance than the MESO device. Unlike the MASO device, the MESO device requires larger resistance values to achieve the lowest energy dissipation [221].



Figure 75: Switching delay vs interconnect length. The increase in delay with length compared to that of an ASL is significantly smaller. Thus, MASO circuits do not require repeaters even for interconnects as long as 10  $\mu m$ .

# 7.4.2 Benchmarking the Performance of the MASO Device Against CMOS and Spintronic Alternatives

The error rate of the MASO device is compared to that of the ASL device, the MA-ASL device, and STT-MRAM in Figure 76. All these devices use current to write into magnets; however, unlike the other devices, the MASO device uses SOT instead of STT. Among all the devices, the MASO device requires the shortest current pulse to reach a given error rate. Moreover, the MASO device also switches at a lower current magnitude. For example, the switching current of the MASO device, which ranges from 1 $\mu A$ to 30 $\mu A$, is about two orders of magnitude smaller than that of the ASL device, which ranges from 200 $\mu A$ to 2 $mA$. Thus, compared to the ASL device, the MASO device requires significantly lower energy for switching. Moreover, unlike ASL devices, multiple MASO devices can share driver transistors because of the small magnitude of the switching current, leading to a further reduction in the switching energy.
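The order-of-magnitude comparison of the switching currents can be checked from the endpoints of the quoted ranges. The short Python sketch below pairs the lower ends and the upper ends of the two ranges, which is an assumption about how the ranges correspond; the names are illustrative.

```python
import math

# Switching current ranges quoted in the text, in amperes.
MASO_CURRENT_A = (1e-6, 30e-6)
ASL_CURRENT_A = (200e-6, 2e-3)


def orders_of_magnitude(larger, smaller):
    """Base-10 logarithm of the ratio of two currents."""
    return math.log10(larger / smaller)


low_end = orders_of_magnitude(ASL_CURRENT_A[0], MASO_CURRENT_A[0])
high_end = orders_of_magnitude(ASL_CURRENT_A[1], MASO_CURRENT_A[1])

print(f"lower ends of the ranges: {low_end:.2f} orders of magnitude")
print(f"upper ends of the ranges: {high_end:.2f} orders of magnitude")
```

Both ratios lie close to two decades, in line with the statement above.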



Figure 76: Write error statistics of the MASO device vs. the ASL device, the MA-ASL device, and the STT-MRAM [30], [72], [130], [222].

The energy dissipation and the delay of a 32-bit ALU implemented using MASO devices are compared to those of ALUs implemented using various CMOS, TFET, and spintronic devices in Figure 77 [30], [72], [130], [222]. The MASO ALU operates faster and dissipates less energy than the other spintronic ALUs. Compared to the MA-ASL ALU, the MASO ALU is 2.2x faster and 250x more energy efficient. Compared to the ASL ALU, the MASO ALU is 46x faster and three orders of magnitude more energy efficient. Unlike the energy-delay product (EDP) of the other spintronic ALUs, that of the MASO ALU is very close to that of CMOS and TFET ALUs. Thus, with advances in magnetic and piezoelectric materials, the MASO device may potentially compete with CMOS in Boolean logic applications.
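The quoted speed and energy ratios can be combined into EDP ratios as a rough worked example. Here $E$ and $\tau$ denote the energy and delay of each ALU (symbols introduced only for this illustration), and "three orders of magnitude" is taken as a factor of $10^{3}$, which is an assumption:

$$\frac{\mathrm{EDP}_{\text{MA-ASL}}}{\mathrm{EDP}_{\text{MASO}}} = \frac{E_{\text{MA-ASL}}}{E_{\text{MASO}}} \cdot \frac{\tau_{\text{MA-ASL}}}{\tau_{\text{MASO}}} \approx 250 \times 2.2 = 550,$$

$$\frac{\mathrm{EDP}_{\text{ASL}}}{\mathrm{EDP}_{\text{MASO}}} = \frac{E_{\text{ASL}}}{E_{\text{MASO}}} \cdot \frac{\tau_{\text{ASL}}}{\tau_{\text{MASO}}} \approx 10^{3} \times 46 = 4.6 \times 10^{4}.$$

Under these assumptions, the EDP of the MASO ALU is roughly two to five orders of magnitude lower than that of the two earlier spintronic ALUs.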



Figure 77: Delay and energy comparison of the MASO device with various spintronic, CMOS, and TFET devices. Compared to MA-ASL, the MASO device operates with $2.2 \times$ lower delay and $250 \times$ lower energy dissipation. Compared to CMOS, the device operates with a lower energy-delay product.

# 7.5 Conclusions

In this chapter, a novel spintronic device is proposed that uses the SHE mechanism instead of STT to enhance the energy efficiency of writing data into magnets. Moreover, to further enhance the energy efficiency of the device, the strain-mediated switching of magnets to the saddle point of the energy profile is employed. Furthermore, unlike the ASL device, the proposed MASO device transfers data from the input magnet to the output magnet using charge current instead of spin current; hence, signals can be transferred over long ranges without using repeaters. The device is highly energy efficient considering the write mechanism (which operates via magnetostrictive switching and SOT switching) and the read mechanism (which operates using the experimentally demonstrated ISHE and IREE mechanisms). The proposed device is expected to be more energy efficient than its CMOS and spintronic counterparts for Boolean logic applications.

# VIII. CONCLUSION AND OUTLOOK

This chapter concludes the dissertation by reviewing the major contributions of the work and providing insights and recommendations about possible extensions of the work in the future.

# 8.1 Conclusion

The objective of this research is to study the modeling and design of fast and energy-efficient spintronic and magnetic devices. Spintronic devices are among the most widely studied beyond-CMOS devices. Unlike CMOS devices, spintronic devices use electron spin to represent binary information, and the information is stored as the orientation of magnets. Unlike the charge-based switching mechanism of CMOS transistors, a torque must be applied to magnets to switch their stable state. Moreover, some spintronic devices employ spin current to transfer information from one magnet to another. Thus, researchers need models based on the physical formalisms governing these devices to analyze their operation and performance. Considering the wide use of circuit models by electrical engineers, this research focuses on developing circuit models for spintronic devices. Using the developed circuit models, engineers will be able to design various spintronic and hybrid CMOS-spintronic devices and circuits.

To benchmark the performance of large spintronic circuits, investigate their advantages and challenges, and design them for various applications, simple spintronic devices that can act as building blocks for larger circuits and systems must be studied first. To this end, the all-spin logic (ASL) device, a basic spintronic device that operates at low voltages and offers non-volatile memory, is studied in this work. The ASL device can serve various applications such as Boolean and non-Boolean logic computation, interconnection, and neural network implementation. Furthermore, the device acts as an interconnect in transferring signals. Moreover, the operation of the device is based on converting an electrical current into a spin current, making the device a potential candidate for the design of CMOS-spintronic interface circuits. In addition, the simple structure of the device can be modified to enhance switching speed and energy efficiency. For example, higher energy efficiencies can be achieved by augmenting or replacing spin-transfer torque (STT) with strain-mediated and spin-orbit torque (SOT) switching. Thus, to accomplish the research goals, the following contributions are identified:

- 1. Studying and designing circuit models for common materials and physical formalisms used in spintronic and magnetic devices.
- 2. Analyzing and benchmarking the performance of the all-spin logic device for interconnection and Boolean logic applications.
- 3. Designing pattern/image recognition circuits using the all-spin logic device.
- 4. Designing CMOS-spintronic interface circuits and long-range interconnects.
- 5. Employing magnetostrictive switching to design hybrid magnetic-piezoelectric logic and neuron devices.
- 6. Employing spin-orbit torque switching to design novel energy-efficient spintronic devices.

In conclusion, 1) the circuit models developed in this work are used to design and benchmark various spintronic devices. These devices offer a wide range of applications such as interconnection, logic gates, image/pattern recognition systems, neuron devices, and CMOS-spintronic interface circuits. For their operation, they utilize STT, SOT, and magnetostrictive switching mechanisms; spin injection/extraction at magnet-non-magnetic metal, tunnel junction, and magnet-heavy metal interfaces; and the transfer of electrical and spintronic signals in metals and topological insulators.

2) Analyzing the performance of ASL shows that size effects and dimensional scaling significantly impact the performance of an ASL device. When the device is used as a spintronic interconnect, it suffers from size effects even more seriously than its electrical counterparts because of the exponential drop in the spin signal as the interconnect becomes longer than the spin relaxation length. Thus, improvements in interconnect technology will have an even bigger impact on ASL interconnects. The applications of the all-spin logic device are studied using two examples. First, the ASL full-adder, an example of Boolean logic devices, is studied. Results demonstrate that the ASL device cannot compete against CMOS devices in terms of delay and energy efficiency. Second, an ASL coupled oscillator is proposed that exhibits a high tuning range and low-voltage operation. ASL coupled oscillators are promising for coupled-oscillator-based image and pattern recognition systems.

3) An ASL image recognition circuit has been proposed that performs all the phases of non-Boolean pattern recognition for binary images. The learning phase is performed without additional memory devices, leading to lower energy dissipation. Furthermore, the proposed circuit operates with lower computational complexity than its CMOS counterparts because it takes advantage of ASL majority gates in its design. Moreover, the proposed circuit recognizes binary image patterns of various sizes faster than existing CMOS counterparts, while consuming less energy and operating at voltages as low as 5 mV.

4) Two simple and efficient CMOS-spintronic transducer circuits have been proposed to act as interface circuits that convert between spin signals and electrical signals in both directions. The proposed circuits have potential applications in hybrid CMOS-spintronic logic and memory read/write circuits that require the efficient transmission of spin signals over both short and long ranges. To overcome the exponential decay of the amplitudes of spin signals in long interconnects, ASL repeaters are studied. Using repeaters is shown to be an inefficient method of transmitting spin signals. To solve this problem, a new scheme for long-range spintronic interconnects is proposed that uses the proposed transducer circuits. The proposed spintronic interconnect transfers signals faster than ASL repeaters and dissipates less energy per bit per unit length for interconnects longer than 1.6 μm.

5) By employing magnetostriction and STT, a novel spintronic device has been proposed. The device, named the magnetostriction-assisted all-spin logic (MA-ASL) device, consists of a heterostructure of magnetic and piezoelectric layers. In a benchmarking analysis, the energy and the delay of a 32-bit MA-ASL ALU have been compared to those of the ASL ALU, showing 21x and 27x improvement, respectively. However, like the ASL device, the MA-ASL device cannot compete against CMOS devices in implementing Boolean functions. The applications of the MA-ASL device are further studied by designing and proposing an MA-ASL neuron. The structure relies on an MA-ASL majority gate and an MTJ for its operation. Compared to its CMOS and spintronic counterparts, the MA-ASL neuron excels in terms of area, delay, and energy dissipation. Moreover, employing magnetostrictive switching further enhances the robustness of the operation of the proposed neuron to thermal noise.

6) By employing SOT and magnetostrictive switching, a novel spintronic device has been proposed. Unlike the ASL and the MA-ASL devices, the proposed device, named the magnetostriction-assisted spin-orbit (MASO) device, uses charge current instead of spin current to transfer data from the input magnet to the output magnet; hence, the device is promising for interconnect applications, as signals can be transferred over long ranges without using repeaters. The write mechanism operates via magnetostrictive switching and SOT switching, and the read mechanism operates using the ISHE and IREE mechanisms. Thus, the device is expected to be highly energy efficient. Compared to its CMOS and spintronic counterparts, the MASO device has demonstrated a lower energy-delay product for the implementation of a 32-bit ALU.

The conducted research is instrumental in pointing out the advantages and challenges of spintronic devices in the implementation of logic devices, interconnects, interface circuits, neural networks, and image/pattern recognition circuits.

## 8.2 Future Works

#### 8.2.1 Non-Boolean Logic Application of Spintronic Devices

Because CMOS transistors switch more efficiently than magnets, CMOS devices generally outperform spintronic devices in implementing Boolean logic applications. On the other hand, some spintronic devices such as the ASL device, the MA-ASL device, and the MASO device are more efficient than CMOS devices in implementing majority gates; thus, these devices can implement certain functionalities with a lower device count. Because of this advantage, some spintronic devices excel in implementing non-Boolean logic applications such as cellular neural networks [32], as shown in Figure 78. Considering the energy efficiency of the devices investigated in this thesis in implementing majority gates, they are potential candidates to be studied for various applications such as cellular neural networks, coupled oscillators, and image/pattern recognition circuits. Moreover, the proposed MA-ASL neuron can be investigated for various machine learning and deep learning applications.
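The device-count argument can be illustrated with a purely logical sketch: a single three-input majority gate realizes AND or OR when one input is tied to a constant, and it directly gives the carry-out of a one-bit full adder. The Python below models only the Boolean behavior, not any device physics, and all function names are hypothetical.

```python
from itertools import product


def majority(a: int, b: int, c: int) -> int:
    """Return the majority vote of three bits (each 0 or 1)."""
    return 1 if a + b + c >= 2 else 0


def and_gate(a: int, b: int) -> int:
    # Tying the third input to 0 turns the majority gate into AND.
    return majority(a, b, 0)


def or_gate(a: int, b: int) -> int:
    # Tying the third input to 1 turns the majority gate into OR.
    return majority(a, b, 1)


def full_adder_carry(a: int, b: int, carry_in: int) -> int:
    # The carry-out of a one-bit full adder is a three-input majority.
    return majority(a, b, carry_in)


# Print the full truth table of the majority function.
for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, "->", majority(a, b, c))
```

A device that evaluates the majority function natively therefore covers several gate types with one structure, which is the sense in which the device count can be lower.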



Figure 78: Energy versus delay per memory association operation using CNN for a variety of charge- and spin-based devices, where the red star indicates the preferred corner [32].



Figure 79: Bit-cell for STT-MRAM and SOT-MRAM [223].

#### 8.2.2 Signal Transduction and Long-Range Interconnects

Because of their non-volatility, magnetic and spintronic devices are widely studied for use as memory cells, as shown in Figure 79. Thus, augmenting CMOS circuits with magnetic memories requires highly energy-efficient and fast interface circuits. In this work, circuits were proposed to efficiently convert magnetization orientation and spin signals into electrical signals and vice versa. In addition to spintronic-CMOS signal transduction, other transductions such as phononic-spintronic and photonic-spintronic transductions must be studied. By manipulating light propagation, photonic systems offer novel devices [224] such as invisibility cloaks [225], field concentrators [226], and perfect black hole absorbers [227]. However, designing spintronic-photonic transducers will be challenging as the trajectory of light is not significantly affected by the presence of a magnetic field. On the other hand, the design of phononic-spintronic transducers is expected to be easier as phononic systems utilize piezoelectric materials and magnetostrictive materials such as Ni, which are widely used in the design of spintronic circuits.

Designing energy-efficient and fast interconnects is a bottleneck in the design of spintronic systems that remains to be further investigated by researchers. Researchers have mostly focused on using devices such as the ASL device and the ASL repeaters as interconnects, but some novel spintronic devices such as the MASO device might be promising candidates for spintronic interconnect design as they use electrical current instead of spin current to transfer signals.

#### 8.2.3 The MASO Device

The MASO device, proposed in Section VII, is a highly energy-efficient and fast spintronic device. The proposed device is a promising candidate for logic applications as its energy-delay product approaches that of CMOS devices. Moreover, like other spintronic devices, the MASO device is expected to excel in non-Boolean logic applications. Further improvements in the design of various MASO-based circuits and systems rely on optimizing the performance of the MASO device. Thus, studies on the impact of geometrical dimensions and of piezoelectric and magnetostrictive materials on the performance of the MASO device are expected to lead to highly energy-efficient and fast spintronic circuits. In investigating novel materials, more studies must be done on the impacts of strain and resistivity on the operation of the device. Changing the strain can significantly change the Rashba coefficient, $\alpha_R$; thus, $\lambda_{IREE}$ changes accordingly, which might lead to an increase (or decrease) in the efficiency of spin-current-to-charge-current conversion in the device. Furthermore, improvements in the resistivity of topological insulators lead to improvements in spin/current transport and transduction as resistivity is inversely proportional to the spin-relaxation time.
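The dependence of the conversion length on the Rashba coefficient can be made concrete with the relation commonly used in the IREE literature, $\lambda_{IREE} = \alpha_R \tau_s / \hbar$, where $\tau_s$ is an effective spin-relaxation time at the interface. This relation is brought in here as an assumption for illustration and is not stated in this chapter; the function name and the numerical values below are hypothetical placeholders, not MASO device parameters.

```python
HBAR_EV_S = 6.582119569e-16  # reduced Planck constant in eV*s


def lambda_iree_nm(alpha_r_ev_angstrom: float, tau_s_fs: float) -> float:
    """IREE length in nm, assuming lambda_IREE = alpha_R * tau_s / hbar.

    alpha_r_ev_angstrom: Rashba coefficient in eV*Angstrom (hypothetical input)
    tau_s_fs: effective spin-relaxation time in femtoseconds (hypothetical input)
    """
    alpha_r_ev_m = alpha_r_ev_angstrom * 1e-10  # eV*Angstrom -> eV*m
    tau_s_s = tau_s_fs * 1e-15                  # fs -> s
    return alpha_r_ev_m * tau_s_s / HBAR_EV_S * 1e9  # m -> nm


# Placeholder values: a 20% strain-induced change in alpha_R at fixed tau_s.
unstrained = lambda_iree_nm(alpha_r_ev_angstrom=0.5, tau_s_fs=5.0)
strained = lambda_iree_nm(alpha_r_ev_angstrom=0.6, tau_s_fs=5.0)
print(f"lambda_IREE without strain: {unstrained:.3f} nm")
print(f"lambda_IREE with strain:    {strained:.3f} nm")
```

Within this simplified picture the conversion length scales linearly with $\alpha_R$ at fixed $\tau_s$, so a strain-induced change in the Rashba coefficient carries over proportionally to $\lambda_{IREE}$.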

# REFERENCES

- [1] I. L. Markov, "Limits on Fundamental Limits to Computation," *Nature*, vol. 512, no. 7513, pp. 147–154, 2014.
- [2] T. N. Theis and P. M. Solomon, "It's Time to Reinvent the Transistor!," *Science*, vol. 327, no. 5973, pp. 1600–1601, 2010.
- [3] S. E. Thompson and S. Parthasarathy, "Moore's law: the future of Si microelectronics," *Mater. Today*, vol. 9, no. 6, pp. 20–25, 2006.
- [4] D. L. Porter, A. G. Evans, and A. H. Heuer, "Transformation-toughening in partially-stabilized zirconia (PSZ)," *Acta Metall.*, vol. 27, no. 10, pp. 1649–1654, 1979.
- [5] A. G. Evans and R. M. Cannon, "Toughening of Brittle Solids By Martensitic Transformations," *Acta Metall.*, vol. 34, no. 5, pp. 761–800, 1986.
- [6] P. Platt, P. Frankel, M. Gass, R. Howells, and M. Preuss, "Finite element analysis of the tetragonal to monoclinic phase transformation during oxidation of zirconium alloys," *J. Nucl. Mater.*, vol. 454, no. 1–3, pp. 290–297, 2014.
- [7] I. Z. Mitrovic *et al.*, "Electrical and structural properties of hafnium silicate thin films," *Microelectron. Reliab.*, vol. 47, no. 4–5 SPEC. ISS., pp. 645–648, 2007.
- [8] J. H. Choi, Y. Mao, and J. P. Chang, "Development of hafnium based high-k materials: A review," *Mater. Sci. Eng. R Reports*, vol. 72, no. 6, pp. 97–136, 2011.
- [9] G. D. Wilk, R. M. Wallace, and J. M. Anthony, "High-κ gate dielectrics: Current status and materials properties considerations," J. Appl. Phys., vol. 89, no. 10, pp. 5243–5275, 2001.
- [10] J. P. Chang, Y.-S. Lin, and K. Chu, "Rapid thermal chemical vapor deposition of zirconium oxide for metal-oxide-semiconductor field effect transistor application," *J. Vac. Sci. Technol. B Microelectron. Nanom. Struct.*, vol. 19, no. 5, p. 1782, 2001.
- [11] K.-L. Lin, T.-H. Hou, J. Shieh, J.-H. Lin, C.-T. Chou, and Y.-J. Lee, "Electrode dependence of filament formation in HfO<sub>2</sub> resistive-switching memory," *J. Appl. Phys.*, vol. 109, no. 8, p. 84104, 2011.

- [12] T. S. Böscke, J. Müller, D. Bräuhaus, U. Schröder, and U. Böttger, "Ferroelectricity in hafnium oxide thin films," *Appl. Phys. Lett.*, vol. 99, no. 10, pp. 0–3, 2011.
- [13] V. Miikkulainen, M. Leskelä, M. Ritala, and R. L. Puurunen, "Crystallinity of inorganic films grown by atomic layer deposition: Overview and general trends," J. Appl. Phys., vol. 113, no. 2, 2013.
- [14] H. Zhu, C. Tang, L. R. C. Fonseca, and R. Ramprasad, "Recent progress in ab initio simulations of hafnia-based gate stacks," *J. Mater. Sci.*, vol. 47, no. 21, pp. 7399–7416, 2012.
- [15] E. Bersch, S. Rangan, R. A. Bartynski, E. Garfunkel, and E. Vescovo, "Band offsets of ultrathin high-κ oxide films with Si," *Phys. Rev. B - Condens. Matter Mater. Phys.*, vol. 78, no. 8, pp. 1–10, 2008.
- [16] T. D. Huan, V. Sharma, G. A. Rossetti, and R. Ramprasad, "Pathways towards ferroelectricity in hafnia," *Phys. Rev. B - Condens. Matter Mater. Phys.*, vol. 90, no. 6, pp. 1–5, 2014.
- [17] P. Papaspyridakos and K. Lal, "Complete arch implant rehabilitation using subtractive rapid prototyping and porcelain fused to zirconia prosthesis: A clinical report," *J. Prosthet. Dent.*, vol. 100, no. 3, pp. 165–172, 2008.
- [18] X. Huang, et al., "Sub-50 nm P-channel FinFET," *IEEE Trans. Electron Devices*, vol. 48, no. 5, pp. 880–886, 2001.
- [19] D. Hisamoto *et al.*, "FinFET-a self-aligned double-gate MOSFET scalable to 20 nm," *IEEE Trans. Electron Devices*, vol. 47, no. 12, pp. 2320–2325, 2000.
- [20] D. Hisamoto, T. Kaga, and E. Takeda, "Impact of the Vertical SOI 'DELTA' Structure on Planar Device Technology," *IEEE Trans. Electron Devices*, vol. 38, no. 6, pp. 1419–1424, 1991.
- [21] T. N. Theis and P. M. Solomon, "It's Time to Reinvent the Transistor!," *Science*, vol. 327, no. 5973, pp. 1600–1601, 2010.

- [22] S. A. Wolf, et al., "Spintronics: A Spin-Based Electronics Vision for the Future," *Science*, vol. 294, no. 5546, pp. 1488–1495, 2001.
- [23] C. J. Xue, Y. Zhang, Y. Chen, G. Sun, J. J. Yang, and H. Li, "Emerging non-volatile memories," *Proc. seventh IEEE/ACM/IFIP Int. Conf. Hardware/software codesign Syst. Synth.*, p. 325, 2011.
- [24] G. W. Burr, B. N. Kurdi, J. C. Scott, C. H. Lam, K. Gopalakrishnan, and R. S. Shenoy, "Overview of candidate device technologies for storage-class memory," *IBM J. Res. Dev.*, vol. 52, no. 4.5, pp. 449–464, 2008.
- [25] K. Galatsis *et al.*, "Alternate state variables for emerging nanoelectronic devices," *IEEE Trans. Nanotechnol.*, vol. 8, no. 1, pp. 66–75, 2009.
- [26] A. Naeemi, A. Ceyhan, V. Kumar, C. Pan, R. Mousavi Iraei, and S. Rakheja, "BEOL Scaling Limits and Next Generation Technology Prospects," *Proceedings of the 51st Annual Design Automation Conference*, San Francisco, pp. 1-6, Jun. 2014.
- [27] B. Behin-Aein, D. Datta, S. Salahuddin, and S. Datta, "Proposal for an all-spin logic device with built-in memory," *Nat. Nanotechnol.*, vol. 5, no. 4, pp. 266–270, 2010.
- [28] M. G. Mankalale, Z. Liang, Z. Zhao, C. H. Kim, J.-P. Wang, and S. S. Sapatnekar, "CoMET: Composite-input magnetoelectric-based logic technology," *IEEE J. Explor. Solid-State Comput. Devices Circuits*, vol. 3, pp. 27–36, 2017.
- [29] D. Morris, D. Bromberg, J. Zhu, and L. Pileggi, "mLogic: Ultra-low voltage nonvolatile logic circuits using STT-MTJ devices," *Proc. 49th ACM/EDAC/IEEE Des. Autom. Conf. (DAC)*, pp. 486–491, 2012.
- [30] S. Manipatruni, D. E. Nikonov, and I. A. Young, "Spin-Orbit Logic with Magnetoelectric Nodes: A Scalable Charge Mediated Nonvolatile Spintronic Logic," *arXiv preprint arXiv:1512.05428*, 2015.
- [31] N. Sharma, A. Marshall, and J. Bird, "VerilogA based Compact model of a three-terminal ME-MTJ device," *Proceedings of the 16th International Conference on Nanotechnology*, pp. 145–148, 2016.
- [32] C. Pan and A. Naeemi, "An Expanded Benchmarking of Beyond-CMOS Devices Based on Boolean and Neuromorphic Representative Circuits," *IEEE J. Explor. Solid-State Comput. Devices Circuits*, vol. 3, pp. 101–110, 2017.

- [33] D. Nikonov and I. Young, "Benchmarking Spintronic Logic Devices," J. Mater. Res., vol. 29, no. 18, pp. 2109–2115, 2014.
- [34] R. Mousavi Iraei, S. Manipatruni, D. E. Nikonov, I. A. Young, and A. Naeemi, "Improving the Performance of All-Spin Logic Devices Using Magnetostriction," Techcon, Austin, Texas, Sep. 2017.
- [35] I. Žutić, J. Fabian, and S. Das Sarma, "Spintronics: Fundamentals and applications," *Rev. Mod. Phys.*, vol. 76, pp. 323–410, Apr. 2004.
- [36] J. Slonczewski, "Current-driven excitation of magnetic multilayers," *J. Magn. Magn. Mater.*, vol. 159, nos. 1–2, pp. L1–L7, 1996.
- [37] E. I. Rashba, "Theory of electrical spin injection: Tunnel contacts as a solution of the conductivity mismatch problem," *Phys. Rev. B*, vol. 62, pp. R16267–R16270, Dec. 2000.
- [38] D. E. Nikonov and I. A. Young, "Overview of beyond-CMOS devices and a uniform methodology for their benchmarking," *Proc. IEEE*, vol. 101, no. 12, pp. 2498–2533, 2013.
- [39] J. Kim *et al.*, "Spin-based computing: Device concepts, current status, and a case study on a high-performance microprocessor," *Proc. IEEE*, vol. 103, no. 1, pp. 106–130, 2015.
- [40] D. Song, *et al.*, "Unveiling pseudospin and angular momentum in photonic graphene," *Nat. Commun.*, vol. 6, no. 6272, pp. 1–7, 2015.
- [41] I. E. Perakis, "Condensed-matter physics: Exciton developments," *Nature*, vol. 417, no. 6884, pp. 33–35, 2002.
- [42] S. Bhatti, R. Sbiaa, A. Hirohata, H. Ohno, S. Fukami, and S. N. Piramanayagam, "Spintronics based random access memory: a review," *Mater. Today*, vol. 20, no. 9, pp. 530–548, 2017.

- [43] A. Fert, "Nobel Lecture: Origin, development, and future of spintronics," *Rev. Mod. Phys.*, vol. 80, no. 4, pp. 1517–1530, 2008.
- [44] A. D. Kent and D. C. Worledge, "A new spin on magnetic memories," *Nat. Nanotechnol.*, vol. 10, no. 3, pp. 187–191, 2015.
- [45] S. A. Wolf, J. Lu, M. R. Stan, E. Chen, and D. M. Treger, "The promise of nanomagnetics and spintronics for future logic and universal memory," *Proc. IEEE*, vol. 98, no. 12, pp. 2155–2168, 2010.
- [46] S. A. Wolf, "Spintronics: A spin-based electronics vision for the future," *Science*, vol. 294, no. 5546, p. 1488, 2001.
- [47] M. Sharad and K. Roy, "Spintronic switches for ultralow energy on-chip and interchip current-mode interconnects," *IEEE Electron Device Lett.*, vol. 34, no. 8, pp. 1068–1070, 2013.
- [48] D. E. Nikonov, S. Manipatruni, and I. A. Young, "Automotion of domain walls for spintronic interconnects," *J. Appl. Phys.*, vol. 115, no. 21, p. 213902, 2014.
- [49] S. C. Chang, R. Mousavi Iraei, S. Manipatruni, D. E. Nikonov, I. A. Young, and A. Naeemi, "Design and Analysis of Copper and Aluminum Interconnects for All-Spin Logic," *IEEE Transactions on Electron Devices*, vol. 61, no. 8, pp. 2905-2911, Aug. 2014.
- [50] P. Bonhomme, S. Manipatruni, R. Mousavi Iraei, S. Rakheja, S. Chang, D. E. Nikonov, I. A. Young, and A. Naeemi, "Circuit Simulation of Magnetization Dynamics and Spin Transport," *IEEE Transactions on Electron Devices*, vol. 61, no. 5, pp. 1553-1560, May 2014.
- [51] L. Su, et al., "Proposal for a graphene-based all-spin logic gate," Appl. Phys. Lett., vol. 106, no. 7, p. 072407, 2015.
- [52] S. C. Chang, S. Manipatruni, D. E. Nikonov, I. A. Young, and A. Naeemi, "Design and analysis of Si interconnects for all-spin logic," *IEEE Trans. Magn.*, vol. 50, no. 9, pp. 1–13, 2014.
- [53] F. Mireles and G. Kirczenow, "From classical to quantum spintronics: Theory of coherent spin injection and spin valve phenomena," *Europhys. Lett.*, vol. 59, no. 1, pp. 107–113, 2002.

- [54] V. Quang Diep, B. Sutton, B. Behin-Aein, and S. Datta, "Spin switches for compact implementation of neuron and synapse," *Appl. Phys. Lett.*, vol. 104, no. 22, p. 222405, 2014.
- [55] C. Pan and A. Naeemi, "A proposal for energy-efficient cellular neural network based on spintronic devices," *IEEE Trans. Nanotechnol.*, vol. 15, no. 5, pp. 820–827, 2016.
- [56] I. I. Mazin, "How to define and calculate the degree of spin polarization in ferromagnets," *Phys. Rev. Lett.*, vol. 83, no. 7, pp. 1427–1430, 1999.
- [57] M. Johnson and R. Silsbee, "Spin-injection experiment," *Phys. Rev. B*, vol. 37, no. 10, p. 5326, 1988.
- [58] J. Sinova, S. O. Valenzuela, J. Wunderlich, C. H. Back, and T. Jungwirth, "Spin Hall effects," *Rev. Mod. Phys.*, vol. 87, no. 4, pp. 1213–1260, 2015.
- [59] N. Magen, et al., "Interconnect-power dissipation in a microprocessor," *Proc. 2004 Int. Work. Syst. Lev. interconnect Predict*, no. 74, p. 7, 2004.
- [60] R. Liu, Luqiao, Shillman, "Spin orbit electronics: From heavy metals to topological insulators," 2016.
- [61] R. Bishnoi, M. Ebrahimi, F. Oboril, and M. B. Tahoori, "Architectural aspects in design and analysis of SOT-based memories," *Proc. Asia South Pacific Des. Autom. Conf. ASP-DAC*, pp. 700–707, 2014.
- [62] A. Makarov, T. Windbacher, V. Sverdlov, and S. Selberherr, "SOT-MRAM based on 1Transistor-1MTJ-cell structure," 2015 15th Non-Volatile Mem. Technol. Symp. NVMTS 2015, pp. 0–3, 2016.
- [63] G. Prenat, et al., "Ultra-Fast and High-Reliability SOT-MRAM: From Cache Replacement to Normally-Off Computing," *IEEE Trans. Multi-Scale Comput. Syst.*, vol. 2, no. 1, pp. 49–60, 2016.
- [64] T. Wu *et al.*, "Electric-poling-induced magnetic anisotropy and electric-field-induced magnetization reorientation in magnetoelectric Ni/(011) [Pb(Mg1/3Nb2/3)O3](1-x)-[PbTiO3]x heterostructure," *J. Appl. Phys.*, vol. 109, no. 7, pp. 2009–2012, 2011.

- [65] N. Kani, J. T. Heron, and A. Naeemi, "Strain-Mediated Magnetization Reversal Through Spin-Transfer Torque," *IEEE Trans. Magn.*, 2017.
- [66] B. Behin-Aein, A. Sarkar, S. Srinivasan, and S. Datta, "Switching energy-delay of all spin logic devices," *Appl. Phys. Lett.*, vol. 98, no. 12, pp. 1–4, 2011.
- [67] A. A. Khajetoorians, J. Wiebe, B. Chilian, and R. Wiesendanger, "Realizing All-Spin," *Science*, vol. 332, p. 1062, 2011.
- [68] B. Behin-Aein, A. Sarkar, S. Srinivasan, and S. Datta, "Switching energy-delay of all-spin logic devices," vol. 98, no. 12, pp. 9–14, 2010.
- [69] S. Datta, S. Salahuddin, and B. Behin-Aein, "Non-volatile spin switch for Boolean and non-Boolean logic," *Appl. Phys. Lett.*, vol. 101, no. 25, p. 252411, 2012.
- [70] B. Behin-Aein and S. Datta, "All-spin logic," *Device Res. Conf.*, p. 266, pp. 41–42, 2010.
- [71] L. Su, *et al.*, "Current-limiting challenges for all-spin logic devices," *Sci. Rep.*, vol. 5, p. 14905, 2015.
- [72] S. Manipatruni, D. E. Nikonov, and I. A. Young, "Material Targets for Scaling All-Spin Logic," *Phys. Rev. Appl.*, vol. 5, no. 1, pp. 1–21, 2016.
- [73] S. Das Sarma, J. Fabian, X. Hu, and I. Zutic, "Theoretical perspectives on spintronics and spin-polarized transport," *IEEE Trans. Magn.*, vol. 36, no. 5, pp. 2821–2826, Sep. 2000.
- [74] I. Zutic, J. Fabian, and S. Erwin, "Bipolar spintronics: Fundamentals and applications," *IBM J. Res. Develop.*, vol. 50, no. 1, pp. 121–139, 2006.
- [75] R. Mousavi Iraei, P. Bonhomme, N. Kani, S. Manipatruni, D. E. Nikonov, I. A. Young, and A. Naeemi, "Impact of Dimensional Scaling and Size Effects on Beyond CMOS All-Spin Logic Interconnects," *Interconnect Technology Conference/Advanced Metallization Conference (IITC)*, San Jose, pp. 353-356, May 2014.

- [76] H. Aghasi, R. Mousavi Iraei, A. Naeemi, and E. Afshari, "Smart Detector Cell: A Scalable All-Spin Circuit for Low Power Non-Boolean Pattern Recognition," *IEEE Transactions on Nanotechnology*, vol. 15, no. 3, pp. 356-366, May 2016.
- [77] J. Fabian and S. Das Sarma, "Spin relaxation of conduction electrons," *J. Vac. Sci. Technol. B Microelectron. Nanom. Struct.*, vol. 17, no. 4, p. 1708, 1999.
- [78] L. D. Landau and E. Lifshitz, "On the theory of the dispersion of magnetic permeability in ferromagnetic bodies," *Phys. Z. Sowjetunion*, vol. 8, no. 153, pp. 101–114, 1935.
- [79] K. Bernstein, R. K. Cavin, W. Porod, A. Seabaugh, and J. Welser, "Device and architecture outlook for beyond CMOS switches," *Proc. IEEE*, vol. 98, no. 12, pp. 2169–2184, 2010.
- [80] C. Augustine, G. Panagopoulos, B. Behin-Aein, S. Srinivasan, A. Sarkar, and K. Roy, "Low-power functionality enhanced computation architecture using spin-based devices," *Proc. 2011 IEEE/ACM Int. Symp. Nanoscale Archit. NANOARCH 2011*, pp. 129–136, 2011.
- [81] K. Roy, M. Sharad, D. Fan, and K. Yogendra, "Beyond Charge-Based Computation: Boolean and Non-Boolean Computing With Spin Torque Devices," *Symp. Low Power Electron. Des.*, pp. 1–4, 2013.
- [82] M. Sharad, K. Yogendra, A. Gaud, K. Kwon, and K. Roy, "Ultra-High Density, High-Performance and Energy-Efficient All Spin Logic," *arXiv*, 2013.
- [83] Q. An, L. Su, J. O. Klein, S. Le Beux, I. O'Connor, and W. Zhao, "Full-adder circuit design based on all-spin logic device," *Proc. 2015 IEEE/ACM Int. Symp. Nanoscale Archit. NANOARCH 2015*, pp. 163–168, 2015.
- [84] S. Srinivasan, A. Sarkar, B. Behin-Aein, and S. Datta, "All-spin logic device with inbuilt nonreciprocity," *IEEE Trans. Magn.*, vol. 47, no. 10, pp. 4026–4032, 2011.
- [85] Razavi, "A 1.8 GHz CMOS voltage-controlled oscillator," *Solid-State Circuits Conf. 1997. Dig. Tech. Pap. 43rd ISSCC., 1997 IEEE Int.*, pp. 388–389, 1997.

- [86] C. C. Boon, M. A. Do, K. S. Yeo, J. G. Ma, and X. L. Zhang, "RF CMOS Low-Phase-Noise LC Oscillator Through Memory Reduction Tail Transistor," *IEEE Trans. Circuits Syst. II Express Briefs*, vol. 51, no. 2, pp. 85–90, 2004.
- [87] B. P. Otis and J. M. Rabaey, "A 300-μW 1.9-GHz CMOS oscillator utilizing micromachined resonators," *IEEE J. Solid-State Circuits*, vol. 38, no. 7, pp. 1271–1274, 2003.
- [88] C. Zuo, J. Van Der Spiegel, and G. Piazza, "1.05-GHz CMOS oscillator based on lateral-field-excited piezoelectric AlN contour-mode MEMS resonators," *IEEE Trans. Ultrason. Ferroelectr. Freq. Control*, vol. 57, no. 1, pp. 82–87, 2010.
- [89] M. Rinaldi, C. Zuo, J. Van Der Spiegel, and G. Piazza, "Reconfigurable CMOS oscillator based on multifrequency AlN contour-mode MEMS resonators," *IEEE Trans. Electron Devices*, vol. 58, no. 5, pp. 1281–1286, 2011.
- [90] M. Babaie and R. B. Staszewski, "A class-F CMOS oscillator," *IEEE J. Solid-State Circuits*, vol. 48, no. 12, pp. 3120–3133, 2013.
- [91] A. Hajimiri and T. H. Lee, "A general theory of phase noise in electrical oscillators," *IEEE J. Solid-State Circuits*, vol. 33, no. 2, pp. 179–194, 1998.
- [92] R. W. Hölzel and K. Krischer, "Pattern recognition with simple oscillating circuits," *New J. Phys.*, vol. 13, no. 7, p. 073031, 2011.
- [93] J. Sun, "Spin-current interaction with a monodomain magnetic body: A model study," *Phys. Rev. B*, vol. 62, no. 1, p. 570, 2000.
- [94] S. Rakheja, S. C. Chang, and A. Naeemi, "Impact of dimensional scaling and size effects on spin transport in copper and aluminum interconnects," *IEEE Trans. Electron Devices*, vol. 60, no. 11, pp. 3913–3919, 2013.
- [95] G. G. Lopez, "The Impact of Interconnect Process Variations and Size Effects for Gigascale Integration," 2009.

- [96] H. Kitada, et al., "The influence of the size effect of copper interconnects on RC delay variability beyond 45nm technology," Int. Interconnect Technol. Conf., pp. 10–12, 2007.
- [97] J. J. Plombon, E. Andideh, V. M. Dubin, and J. Maiz, "Influence of phonon, geometry, impurity, and grain size on Copper line resistivity," *Appl. Phys. Lett.*, vol. 89, no. 11, pp. 2004–2007, 2006.
- [98] M. Shimada, M. Moriyama, K. Ito, S. Tsukimoto, and M. Murakami, "Electrical resistivity of polycrystalline Cu interconnects with nano-scale linewidth," J. Vac. Sci. Technol. B Microelectron. Nanom. Struct., vol. 24, no. 1, p. 190, 2006.
- [99] W. Steinhögl, G. Schindler, G. Steinlesberger, M. Traving, and M. Engelhardt, "Comprehensive study of the resistivity of copper wires with lateral dimensions of 100 nm and smaller," J. Appl. Phys., vol. 97, no. 2, p. 023706, 2005.
- [100] H. Hsieh and L. Lu, "A High-Performance CMOS Voltage-Controlled," *IEEE Transactions on Microwave Theory and Techniques*, vol. 55, no. 3, pp. 467–473, 2007.
- [101] M. Da Tsai, Y. H. Cho, and H. Wang, "A 5-GHz low phase noise differential Colpitts CMOS VCO," *IEEE Microw. Wirel. Components Lett.*, vol. 15, no. 5, pp. 327–329, 2005.
- [102] H.-H. Hsieh, K.-S. Chung, and L.-H. Lu, "Ultra-low-voltage mixer and VCO in 0.18-μm CMOS," *Radio Freq. Integr. Circuits Symp. 2005. Dig. Pap. 2005 IEEE*, pp. 167–170, 2005.
- [103] D. Ballard, "Generalizing the Hough transform to detect arbitrary shapes," *Readings in computer vision*, pp. 714-725, 1987.
- [104] J. Flusser and T. Suk, "Pattern recognition by affine moment invariants," *Pattern Recognit.*, vol. 26, no. 1, pp. 167–174, 1993.
- [105] S. Wold, "Pattern recognition by means of disjoint principal components models," *Pattern Recognit.*, vol. 8, no. 3, pp. 127–139, 1976.
- [106] J. Seo, *et al.*, "A 45nm CMOS neuromorphic chip with a scalable architecture for learning in networks of spiking neurons," pp. 1–4, 2011.

- [107] T. Valet and A. Fert, "Theory of the perpendicular magneto resistance in magnetic multilayers," *Phys. Rev. B*, vol. 48, pp. 7099–7113, Sep. 1993.
- [108] S. P. Levitan, Y. Fang, D. H. Dash, T. Shibata, D. E. Nikonov, and G. I. Bourianoff, "Non-Boolean associative architectures based on nano-oscillators," *Int. Work. Cell. Nanoscale Networks their Appl.*, 2012.
- [109] C. Augustine, X. Fong, B. Behin-Aein, and K. Roy, "Ultra-low power nanomagnet-based computing: A system-level perspective," *IEEE Trans. Nanotechnol.*, vol. 10, no. 4, pp. 778–788, 2011.
- [110] G. Panagopoulos, C. Augustine, and K. Roy, "A framework for simulating hybrid MTJ/CMOS circuits: Atoms to system approach," *Proc. DATE*, pp. 1443–1446, 2012.
- [111] S. Matsunaga *et al.*, "Fabrication of a nonvolatile full adder based on logic-in-memory architecture using magnetic tunnel junctions," *Appl. Phys. Express*, vol. 1, no. 9, pp. 0913011–0913013, 2008.
- [112] M. Sharad, C. Augustine, G. Panagopoulos, and K. Roy, "Spin-based neuron model with domain-wall magnets as synapse," *IEEE Trans. Nanotechnol.*, vol. 11, no. 4, pp. 843–853, 2012.
- [113] J. Wang, H. Meng, and J. P. Wang, "Programmable spintronics logic device based on a magnetic tunnel junction element," *J. Appl. Phys.*, vol. 97, no. 10, 2005.
- [114] X. Dong, C. Xu, N. Jouppi, and Y. Xie, "NVSim: A circuit-level performance, energy, and area model for emerging non-volatile memory," *Emerg. Mem. Technol. Des. Archit. Appl.*, no. 7, pp. 15–50, 2014.
- [115] W. Xu, H. Sun, X. Wang, Y. Chen, and T. Zhang, "Design of last-level on-chip cache using spin-torque transfer RAM (STT RAM)," *IEEE Trans. Very Large Scale Integr. Syst.*, vol. 19, no. 3, pp. 483–493, 2011.
- [116] F. Ren, H. Park, C. K. Ken Yang, and D. Marković, "Reference calibration of body-voltage sensing circuit for high-speed STT-RAMs," *IEEE Trans. Circuits Syst. I Regul. Pap.*, vol. 60, no. 11, pp. 2932–2939, 2013.
- [117] R. Cited, M. View, and P. E. Tran, "Method and System for Deblurring," vol. 2, no. 12, 2010.

- [118] W. Brown, "Thermal fluctuation of fine ferromagnetic particles," *IEEE Trans. Magn.*, vol. 15, no. 5, pp. 1196–1208, Sep. 1979.
- [119] S. Ikeda *et al.*, "Magnetic tunnel junctions for spintronic memories and beyond," *IEEE Trans. Electron Devices*, vol. 54, no. 5, pp. 991–1002, 2007.
- [120] S. Nishioka, et al., "Differential conductance measurements of low-resistance CoFeB/MgO/CoFeB magnetic tunnel junctions," J. Magn. Magn. Mater., vol. 310, no. 2, pp. 2006–2008, 2007.
- [121] J. Hayakawa, S. Ikeda, F. Matsukura, H. Takahashi, and H. Ohno, "Dependence of giant tunnel magnetoresistance of sputtered CoFeB/MgO/CoFeB magnetic tunnel junctions on MgO barrier thickness and annealing temperature," *Japanese J. Appl. Physics Lett.*, vol. 44, no. 16–19, 2005.
- [122] W. Rotjanapittayakul, T. Archer, S. Sanvito, and W. Pijitrojana, "A First Principle Study of the Massive TMR in Magnetic Tunnel Junction Using Fe3Al Heusler Alloy Electrodes and MgO Barrier," *Adv. Mater. Res.*, vol. 1101, pp. 192–197, 2015.
- [123] H. Sugiyama, T. Inokuchi, and Y. Saito, "A novel magnetic tunnel junction structure using the edge of a magnetic film," J. Magn. Magn. Mater., vol. 310, no. 2, pp. 2003–2005, 2007.
- [124] S. Ikeda, J. Hayakawa, Y. M. Lee, F. Matsukura, and H. Ohno, "Dependence of tunnel magnetoresistance on ferromagnetic electrode materials in MgO-barrier magnetic tunnel junctions," *J. Magn. Magn. Mater.*, vol. 310, no. 2, pp. 1937–1939, 2007.
- [125] D. D. Djayaprawira *et al.*, "230% room-temperature magnetoresistance in CoFeB/MgO/CoFeB magnetic tunnel junctions," *Appl. Phys. Lett.*, vol. 86, no. 9, pp. 1–3, 2005.
- [126] N. Tezuka, et al., "Tunnel magnetoresistance in magnetic tunnel junctions with Co2Fe (Al, Si) full-Heusler films," J. Magn. Magn. Mater., vol. 310, no. 2, pp. 1940–1942, 2007.
- [127] J. Hayakawa, S. Ikeda, F. Matsukura, H. Takahashi, and H. Ohno, "Dependence of giant tunnel magnetoresistance of sputtered CoFeB/MgO/CoFeB magnetic tunnel junctions on MgO barrier thickness and annealing temperature," *Japanese J. Appl. Physics Lett.*, vol. 44, no. 16–19, pp. 1–4, 2005.

- [128] S.-F. Lee, et al., "Two-channel analysis of CPP-MR data for Ag/Co and AgSn/Co multilayers," J. Magn. Magn. Mater., vol. 118, nos. 1–2, pp. L1–L5, 1993.
- [129] S. Yuasa, T. Nagahama, A. Fukushima, Y. Suzuki, and K. Ando, "Giant room-temperature magnetoresistance in single-crystal Fe/MgO/Fe magnetic tunnel junctions," *Nat. Mater.*, vol. 3, no. 12, pp. 868–871, 2004.
- [130] S. Manipatruni, D. E. Nikonov, and I. A. Young, "Modeling and design of spintronic integrated circuits," *IEEE Trans. Circuits Syst. I Regul. Pap.*, vol. 59, no. 12, pp. 2801–2814, 2012.
- [131] R. Mousavi Iraei, S. Manipatruni, D. E. Nikonov, I. A. Young, and A. Naeemi, "Electrical-Spin Transduction for CMOS-Spintronic Interface and Long-Range Interconnects," *IEEE Journal on Exploratory Solid-State Computational Devices and Circuits*, vol. 3, pp. 47-55, Dec. 2017.
- [132] V. Calayir, D. E. Nikonov, S. Manipatruni, and I. A. Young, "Static and clocked spintronic circuit design and simulation with performance analysis relative to CMOS," *IEEE Trans. Circuits Syst. I Regul. Pap.*, vol. 61, no. 2, pp. 393–406, 2014.
- [133] O. Zografos, et al., "Design and benchmarking of hybrid CMOS-Spin Wave Device Circuits compared to 10nm CMOS," 15th Int. Conf. Nanotechnol., pp. 686–689, 2015.
- [134] J. Xie, P. K. Meher, and Z. H. Mao, "High-throughput finite field multipliers using redundant basis for FPGA and ASIC implementations," *IEEE Trans. Circuits Syst. I Regul. Pap.*, vol. 62, no. 1, pp. 110–119, 2015.
- [135] S. Dutta, R. Mousavi Iraei, C. Pan, D. E. Nikonov, S. Manipatruni, I. A. Young, and A. Naeemi, "Impact of spintronics transducers on the performance of spin wave logic circuit," *2016 IEEE 16th International Conference on Nanotechnology (IEEE-NANO)*, Sendai, Japan, 2016.
- [136] T. Yang, T. Kimura, and Y. Otani, "Giant spin-accumulation signal and pure spin-current-induced reversible magnetization switching," *Nat. Phys.*, vol. 4, no. 11, pp. 851–854, 2008.

- [137] T. Nomura, K. Ohnishi, and T. Kimura, "Large spin current injection in nano-pillar-based lateral spin valve," *AIP Conf. Proc.*, vol. 1763, no. 1, p. 020011, 2016.
- [138] W. C. Wong, S. M. Ng, H. F. Wong, C. L. Mak, and C. W. Leung, "Spin-Valve Junction with Transfer-Free MoS2 Spacer Prepared by Sputtering," *IEEE Trans. Magn.*, 2017.
- [139] W. Wang *et al.*, "Spin-Valve Effect in NiFe/MoS2/NiFe Junctions," *Nano Lett.*, vol. 15, no. 8, pp. 5261–5267, 2015.
- [140] H. Li, et al., "Stretchable Spin Valve with Stable Magnetic Field Sensitivity by Ribbon-Patterned Periodic Wrinkles," ACS Nano, vol. 10, no. 4, pp. 4403–4409, 2016.
- [141] Y. Shiota, T. Nozaki, F. Bonell, S. Murakami, T. Shinjo, and Y. Suzuki, "Induction of coherent magnetization switching in a few atomic layers of FeCo using voltage pulses," *Nat. Mater.*, vol. 11, no. 1, pp. 39–43, 2011.
- [142] D. Sander, "The correlation between mechanical stress and magnetic anisotropy in ultrathin films," *Reports Prog. Phys.*, vol. 62, no. 5, pp. 809–858, 1999.
- [143] R. Mousavi Iraei, S. Dutta, S. Manipatruni, D. E. Nikonov, I. A. Young, J. T. Heron, and A. Naeemi, "A Proposal for a Magnetostriction-Assisted All-Spin Logic Device," 2017 75th Device Research Conference (DRC), South Bend, pp. 225-226, Jun. 2017.
- [144] R. Shukla, K. K. Rajan, P. Gandhi, and L. C. Lim, "Complete sets of elastic, dielectric, and piezoelectric properties of [001]-poled Pb(Zn1/3Nb2/3)O3-(6-7)%PbTiO3 single crystals of [110]-length cut," *Appl. Phys. Lett.*, vol. 92, no. 21, 2008.
- [145] L. Sandlund, M. Fahlander, T. Cedell, A. E. Clark, J. B. Restorff, and M. Wun-Fogle, "Magnetostriction, elastic moduli, and coupling factors of composite Terfenol-D," J. Appl. Phys., vol. 75, no. 10, pp. 5656–5658, 1994.
- [146] M. S. Fashami, K. Roy, J. Atulasimha, and S. Bandyopadhyay, "Magnetization dynamics, Bennett clocking and associated energy dissipation in multiferroic logic," *Nanotechnology*, vol. 22, no. 15, 2011.

- [147] I. Young, and D. Nikonov, "Beyond-CMOS Benchmarking Workshop," 2014.
- [148] R. Mousavi Iraei, D. Nikonov, I. Young, S. Manipatruni, and A. Naeemi, "Thermally Reliable Magnetostriction-Assisted All-Spin Domino-Logic Device," in APS March Meeting, 2018.
- [149] D. Ramachandram and G. W. Taylor, "Deep Multimodal Learning: A Survey on Recent Advances and Trends," in *IEEE Signal Processing Magazine*, vol. 34, no. 6, pp. 96-108, 2017.
- [150] D. Amodei, *et al.*, "Deep Speech 2: End-to-End Speech Recognition in English and Mandarin," *International Conference on Machine Learning*, pp. 173-182, 2015.
- [151] A. Graves, A. Mohamed, and G. Hinton, "Speech Recognition with Deep Recurrent Neural Networks," 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), no. 3, pp. 6645-6649, 2013.
- [152] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," 2016 IEEE Conf. Comput. Vis. Pattern Recognit., pp. 770–778, 2016.
- [153] M. Bojarski, et al., "End to End Learning for Self-Driving Cars," arXiv, 2016.
- [154] S. Ramos, S. Gehrig, P. Pinggera, U. Franke, and C. Rother, "Detecting unexpected obstacles for self-driving cars: Fusing deep learning and geometric modeling," *IEEE Intell. Veh. Symp. Proc.*, pp. 1025–1032, 2017.
- [155] H. Cui, H. Zhang, G. R. Ganger, P. B. Gibbons, and E. P. Xing, "GeePS: scalable deep learning on distributed GPUs with a GPU-specialized parameter server," *EuroSys*, pp. 1–16, 2016.
- [156] J. Schmidhuber, "Deep Learning in neural networks: An overview," *Neural Networks*, vol. 61, pp. 85–117, 2015.
- [157] P. H. Pham, D. Jelaca, C. Farabet, B. Martini, Y. LeCun, and E. Culurciello, "NeuFlow: Dataflow vision processing system-on-a-chip," *Midwest Symp. Circuits Syst.*, pp. 1044–1047, 2012.
- [158] A. Sengupta and K. Roy, "Encoding neural and synaptic functionalities in electron spin: A pathway to efficient neuromorphic computing," *Appl. Phys. Rev.*, vol. 4, no. 4, p. 041105, 2017.

- [159] J. B. Burr, "Digital neural network implementations," *Neural networks, concepts, applications, and implementations*, vol. 3, pp. 237-285, 1991.
- [160] K. L. Wang, J. G. Alzate, and P. Khalili Amiri, "Low-power non-volatile spintronic memory: STT-RAM and beyond," J. Phys. D. Appl. Phys., vol. 46, no. 7, p. 74003, 2013.
- [161] S. E. Barnes, "Comment on 'Theory of current-driven domain wall motion: Spin transfer versus momentum transfer'," *Phys. Rev. Lett.*, vol. 96, no. 18, pp. 1–4, 2006.
- [162] A. Sengupta, Y. Shim, and K. Roy, "Proposal for an all-spin artificial neural network: Emulating neural and synaptic functionalities through domain wall motion in ferromagnets," *IEEE Trans. Biomed. Circuits Syst.*, vol. 10, no. 6, pp. 1152–1160, 2016.
- [163] K. Xia, P. J. Kelly, G. E. W. Bauer, A. Brataas, and I. Turek, "Spin torques in ferromagnetic/normal-metal structures," *Phys. Rev. B*, vol. 65, no. 22, pp. 220401-1–220401-4, May 2002.
- [164] R. Mousavi Iraei, S. Dutta, N. Kani, S. Manipatruni, D. E. Nikonov, I. A. Young, J. T. Heron, and A. Naeemi, "Clocked Magnetostriction-Assisted Spintronic Device Design and Simulation," *IEEE Transactions on Electron Devices*, vol. 65, no. 5, pp. 2040-2046, May 2018.
- [165] R. Mousavi Iraei, S. Manipatruni, D. E. Nikonov, I. A. Young, and A. Naeemi, "Electrical-Spin Transduction for CMOS-Spintronic Interface and Long-Range Interconnects," *IEEE Journal on Exploratory Solid-State Computational Devices and Circuits*, vol. 3, pp. 47-55, Dec. 2017.
- [166] W. Scott, J. Jeffrey, B. Heard, D. E. Nikonov, I. A. Young, S. Manipatruni, A. Naeemi, and R. Mousavi Iraei, "Hybrid Piezoelectric-Magnetic Neurons: A Proposal for Energy-Efficient Machine Learning," *Annual ACMSE Conference*, Richmond, March 2018.
- [167] S. H. Jo, T. Chang, I. Ebong, B. B. Bhadviya, P. Mazumder, and W. Lu, "Nanoscale memristor device as synapse in neuromorphic systems," *Nano Lett.*, vol. 10, no. 4, pp. 1297–1301, 2010.

- [168] A. Ankit, A. Sengupta, P. Panda, and K. Roy, "RESPARC: A Reconfigurable and Energy-Efficient Architecture with Memristive Crossbars for Deep Spiking Neural Networks," *Proc. of the 54th Annual Des. Aut. Conf. ACM*, 2017.
- [169] S. G. Ramasubramanian, R. Venkatesan, M. Sharad, K. Roy, and A. Raghunathan, "SPINDLE: SPINtronic Deep Learning Engine for Large-scale Neuromorphic Computing," *Proc. 2014 Int. Symp. Low Power Electron. Des.*, pp. 15–20, 2014.
- [170] M. Sharad, D. Fan, and K. Roy, "Spin-neurons: A possible path to energy-efficient neuromorphic computers," *J. Appl. Phys.*, vol. 114, no. 23, pp. 1–15, 2013.
- [171] J. Nitta, F. E. Meijer, and H. Takayanagi, "Spin-interference device," *Appl. Phys. Lett.*, vol. 75, no. 5, pp. 695–697, 1999.
- [172] T. Koga, J. Nitta, H. Takayanagi, and S. Datta, "Spin-Filter Device Based on the Rashba Effect Using a Nonmagnetic Resonant Tunneling Diode," *Phys. Rev. Lett.*, vol. 88, no. 12, p. 4, 2002.
- [173] V. M. Edelstein, "Spin polarization of conduction electrons induced by electric current in two-dimensional asymmetric electron systems," *Solid State Commun.*, vol. 73, no. 3, pp. 233–235, 1990.
- [174] M. I. Dyakonov and V. I. Perel, "Current-induced spin orientation of electrons in semiconductors," *Phys. Lett. A*, vol. 35, no. 6, pp. 459–460, 1971.
- [175] J. Sánchez, *et al.*, "Spin-to-charge conversion using Rashba coupling at the interface between non-magnetic materials," *Nat. Commun.*, vol. 4, p. 2944, 2013.
- [176] K. Shen, G. Vignale, and R. Raimondi, "Microscopic theory of the inverse Edelstein effect," *Phys. Rev. Lett.*, vol. 112, no. 9, pp. 1–5, 2014.
- [177] Y. Shiomi, *et al.*, "Spin-electricity conversion induced by spin injection into topological insulators," *Phys. Rev. Lett.*, vol. 113, no. 19, pp. 7–11, 2014.
- [178] L. Liu, C.-F. Pai, Y. Li, H. W. Tseng, D. C. Ralph, and R. A. Buhrman, "Spin-Torque Switching with the Giant Spin Hall Effect of Tantalum," *Science*, vol. 336, no. 6081, pp. 555–558, 2012.

- [179] D. McQuarrie and J. Simon, "Physical Chemistry: A Molecular Approach," *University Science Books*, 1997.
- [180] J. Fabian, A. Matos-Abiague, C. Ertler, P. Stano, and I. Žutić, "Semiconductor spintronics," *Acta Phys. Slovaca*, vol. 57, pp. 565–907, 2007.
- [181] P. Gambardella and I. M. Miron, "Current-induced spin-orbit torques," *Philos. Trans. R. Soc. A Math. Phys. Eng. Sci.*, vol. 369, no. 1948, pp. 3175–3197, 2011.
- [182] C. F. Pai, et al., "Spin transfer torque devices utilizing the giant spin Hall effect of tungsten," *Appl. Phys. Lett.*, vol. 101, no. 12, pp. 1–5, 2012.
- [183] A. R. Mellnik, *et al.*, "Spin Transfer Torque Generated by the Topological Insulator Bi2Se3," *arXiv*, p. 34, 2014.
- [184] T. Valla, et al., "Topological semimetal in a Bi-Bi2Se3 infinitely adaptive superlattice phase," Phys. Rev. B - Condens. Matter Mater. Phys., vol. 86, no. 24, pp. 3–7, 2012.
- [185] C. Cheng, *et al.*, "Spin to charge conversion in MoS2 monolayer with spin pumping," *arXiv 1510.0345*, pp. 1–15, 2015.
- [186] G. Wang, *et al.*, "Spin-orbit engineering in transition metal dichalcogenide alloy monolayers," *Nat. Commun.*, vol. 6, pp. 1–7, 2015.
- [187] K. Y. Camsari, R. Faria, B. M. Sutton, and S. Datta, "Stochastic p-bits for invertible logic," *Phys. Rev. X*, vol. 7, no. 3, pp. 1–19, 2017.
- [188] J. Fabian and S. Das Sarma, "Spin relaxation of conduction electrons in polyvalent metals: Theory and a realistic calculation," *Phys. Rev. Lett.*, vol. 81, pp. 5624–5627, 1998.
- [189] A. A. Sukhanov and V. A. Sablikov, "Spin current in an electron waveguide tunnel-coupled to a topological insulator," J. Phys. Condens. Matter, vol. 24, no. 40, 2012.
- [190] J. C. Y. Teo, L. Fu, and C. L. Kane, "Surface States of the Topological Insulator Bi1-xSbx," vol. 80, no. 8, pp. 085307, 2008.

- [191] M. DC, *et al.*, "Room-temperature perpendicular magnetization switching through giant spin-orbit torque from sputtered BixSe(1-x) topological insulator material," *arXiv*, 2017.
- [192] M. Jamali, et al., "Giant Spin Pumping and Inverse Spin Hall Effect in the Presence of Surface and Bulk Spin-Orbit Coupling of Topological Insulator Bi2Se3," Nano Lett., vol. 15, no. 10, pp. 7126–7132, 2015.
- [193] N. Huynh, D. Khang, Y. Ueda, and P. N. Hai, "A conductive topological insulator with colossal spin Hall effect for ultra-low power spin-orbit-torque switching," arXiv, 2017.
- [194] P. B. Ndiaye, C. A. Akosa, M. H. Fischer, A. Vaezi, E. A. Kim, and A. Manchon, "Dirac spin-orbit torques and charge pumping at the surface of topological insulators," *Phys. Rev. B*, vol. 96, no. 1, pp. 1–8, 2017.
- [195] J. Xiao, A. Zangwill, and M. D. Stiles, "Macrospin models of spin transfer dynamics," *Phys. Rev. B*, vol. 72, pp. 014446-1–014446-13, 2005.
- [196] R. A. Buhrman, "Spin-Torque Switching with the Giant Spin Hall Effect of Tantalum," *Science*, vol. 336, p. 555, 2012.
- [197] L. Liu, T. Moriyama, D. C. Ralph, and R. A. Buhrman, "Spin-torque ferromagnetic resonance induced by the spin Hall effect," *Phys. Rev. Lett.*, vol. 106, no. 3, pp. 1–4, 2011.
- [198] K. Chen, R. Frömter, S. Rössler, N. Mikuszeit, and H. P. Oepen, "Uniaxial magnetic anisotropy of cobalt films deposited on sputtered MgO(001) substrates," *Phys. Rev. B - Condens. Matter Mater. Phys.*, vol. 86, no. 6, pp. 1–7, 2012.
- [199] C. Chappert and P. Bruno, "Magnetic anisotropy in metallic ultrathin films and related experiments on cobalt films," *Journal of Applied Physics*, vol. 64, no. 10, pp. 5736-5741, 1988.
- [200] E. Barati, M. Cinal, D. M. Edwards, and A. Umerski, "Gilbert damping in magnetic layered systems," *Phys. Rev. B - Condens. Matter Mater. Phys.*, vol. 90, no. 1, pp. 1–16, 2014.
- [201] S. Schnurr, U. Wiedwald, P. Ziemann, V. Y. Verchenko, and A. V. Shevelkov, "Structural and thermoelectric properties of TMGa3 (TM = Fe, Co) thin films," *Beilstein J. Nanotechnol.*, vol. 4, no. 1, pp. 461–466, 2013.

- [202] R. Prakash, R. J. Choudhary, L. S. Sharath Chandra, N. Lakshmi, and D. M. Phase, "Electrical and magnetic transport properties of Fe3O4 thin films on a GaAs(100) substrate," *J. Phys. Condens. Matter*, vol. 19, no. 48, p. 486212, 2007.
- [203] M. Liu, *et al.*, "Electrically induced enormous magnetic anisotropy in Terfenol-D/lead zinc niobate-lead titanate multiferroic heterostructures," *Journal of Applied Physics*, vol. 112, no. 6, p. 063917, 2012.
- [204] B. Y. Yoo, M. Schwartz, and K. Nobe, "Electrodeposition of high magnetic moment CoFe based alloys," 2001.
- [205] J. L. Hockel, T. Wu, and G. P. Carman, "Voltage bias influence on the converse magnetoelectric effect of PZT/terfenol-D/PZT laminates," J. Appl. Phys., vol. 109, no. 6, p. 064106, 2011.
- [206] T. Wu, et al., "Electric-poling-induced magnetic anisotropy and electric-field-induced magnetization reorientation in magnetoelectric Ni/(011) [Pb(Mg1/3Nb2/3)O3](1-x)-[PbTiO3]x heterostructure," J. Appl. Phys., vol. 109, no. 7, pp. 21–24, 2011.
- [207] H. Cao, V. H. Schmidt, R. Zhang, W. Cao, and H. Luo, "Elastic, piezoelectric, and dielectric properties of 0.58Pb(Mg1/3Nb2/3)O3-0.42PbTiO3 single crystal," J. Appl. Phys., vol. 96, no. 1, pp. 549–554, 2004.
- [208] X. Li, Z. Wang, C. He, X. Long, and Z. G. Ye, "Growth and piezo-/ferroelectric properties of PIN-PMN-PT single crystals," J. Appl. Phys., vol. 111, no. 3, p. 0341105, 2012.
- [209] E. Sun, S. Zhang, J. Luo, T. R. Shrout, and W. Cao, "Elastic, dielectric, and piezoelectric constants of Pb(In1/2Nb1/2)O3–Pb(Mg1/3Nb2/3)O3–PbTiO3 single crystal poled along [011]c," *Appl. Phys. Lett.*, vol. 97, no. 3, p. 32902, 2010.
- [210] J. Barbosa, B. Almeida, J. A. Mendes, A. G. Rolo, and J. P. Araújo, "X-ray diffraction and Raman study of nanogranular BaTiO3-CoFe2O4 thin films deposited by laser ablation on Si/Pt substrates," *Phys. Status Solidi Appl. Mater. Sci.*, vol. 204, no. 6, pp. 1731–1737, 2007.

- [211] A. Muhammad, R. Sato-Turtelli, M. Kriegisch, R. Grössinger, F. Kubel, and T. Konegger, "Large enhancement of magnetostriction due to compaction hydrostatic pressure and magnetic annealing in CoFe2O4," J. Appl. Phys., vol. 111, no. 1, p. 013918, 2012.
- [212] F. Wang, L. Luo, D. Zhou, X. Zhao, and H. Luo, "Complete set of elastic, dielectric, and piezoelectric constants of orthorhombic 0.71Pb(Mg1/3Nb2/3)O3–0.29PbTiO3 single crystal," *Appl. Phys. Lett.*, vol. 90, no. 21, p. 212903, 2007.
- [213] R. Zhang, W. Jiang, B. Jiang, and W. Cao, "Elastic, Dielectric and Piezoelectric Coefficients of Domain Engineered 0.70Pb(Mg1/3Nb2/3)O3-0.30PbTiO3 Single Crystal," *AIP Conf. Proc.*, vol. 626, pp. 188–197, 2002.
- [214] F. Li, S. Zhang, Z. Xu, X. Wei, J. Luo, and T. R. Shrout, "Composition and phase dependence of the intrinsic and extrinsic piezoelectric activity of domain engineered (1-x)Pb(Mg1/3Nb2/3)O3-xPbTiO3 crystals," *J. Appl. Phys.*, vol. 108, no. 3, p. 034106, 2010.
- [215] Y. X. Zheng, *et al.*, "Study of uniaxial magnetism and enhanced magnetostriction in magnetic-annealed polycrystalline CoFe2O4," *J. Appl. Phys.*, vol. 110, no. 4, p. 043908, 2011.
- [216] J. Atulasimha and A. B. Flatau, "A review of magnetostrictive iron–gallium alloys," *Smart Mater. Struct.*, vol. 20, no. 4, p. 43001, 2011.
- [217] C. N. Chinnasamy, et al., "Unusually high coercivity and critical single-domain size of nearly monodispersed CoFe2O4 nanoparticles," Appl. Phys. Lett., vol. 83, no. 14, pp. 2862–2864, 2003.
- [218] H. Ding, J. W. Cheah, L. Chen, T. Sritharan, and J. L. Wang, "Electric-field control of magnetic properties of CoFe2O4 films on Pb(Mg1/3Nb2/3)O3-PbTiO3 substrate," *Thin Solid Films*, vol. 522, pp. 420–424, 2012.
- [219] H. Zheng, J. Kreisel, Y. H. Chu, R. Ramesh, and L. Salamanca-Riba, "Heteroepitaxially enhanced magnetic anisotropy in BaTiO3-CoFe2O4 nanostructures," *Appl. Phys. Lett.*, vol. 90, no. 11, pp. 1–4, 2007.
- [220] B. Fu, R. Lu, K. Gao, Y. Yang, and Y. Wang, "Magnetoelectric coupling in multiferroic BaTiO3-CoFe2O4 composite nanofibers via electrospinning," *EPL*, vol. 111, pp. 5–10, 2015.

- [221] S. Manipatruni, D. E. Nikonov, H. Liu, and I. A. Young, "Response to Comment on 'Spin-Orbit Logic with Magnetoelectric Nodes: A Scalable Charge Mediated Nonvolatile Spintronic Logic'," arXiv, pp. 1–25.
- [222] A. K. Nayak, et al., "Design of compensated ferrimagnetic Heusler alloys for giant tunable exchange bias," Nat. Mater., vol. 14, no. 7, pp. 679–684, 2015.
- [223] F. Oboril, R. Bishnoi, M. Ebrahimi, and M. B. Tahoori, "Evaluation of hybrid memory technologies using SOT-MRAM for on-chip cache hierarchy," *IEEE Trans. Comput. Des. Integr. Circuits Syst.*, vol. 34, no. 3, pp. 367–380, 2015.
- [224] Q. Wu, J. P. Turpin, and D. H. Werner, "Integrated photonic systems based on transformation optics enabled gradient index devices," *Light Sci. Appl.*, vol. 1, pp. 1–6, 2012.
- [225] D. Schurig, *et al.*, "Metamaterial electromagnetic cloak at microwave frequencies," *Science*, vol. 314, no. 5801, pp. 977–980, 2006.
- [226] M. Rahm, D. Schurig, D. A. Roberts, S. A. Cummer, D. R. Smith, and J. B. Pendry, "Design of electromagnetic cloaks and concentrators using form-invariant coordinate transformations of Maxwell's equations," *Photonics Nanostructures - Fundam. Appl.*, vol. 6, no. 1, pp. 87–95, 2008.
- [227] E. E. Narimanov and A. V. Kildishev, "Optical black hole: Broadband omnidirectional light absorber," *Appl. Phys. Lett.*, vol. 95, no. 4, pp. 2007–2010, 2009.