Abstract: Spin-transfer torque (STT) mechanisms in vertical and lateral spin valves, together with magnetization reversal and domain wall motion driven by spin-orbit torque (SOT), have opened up new possibilities for efficiently mimicking "neural" and "synaptic" functionalities with much lower area and energy consumption than CMOS implementations. In this paper, we review various STT/SOT devices that can provide a compact, area-efficient implementation of artificial neurons and synapses. We provide a device-circuit-system perspective and envision the design of an All-Spin neuromorphic processor (with different degrees of bio-fidelity) that can be appealing for ultralow power cognitive applications.
I. INTRODUCTION

Recent years have witnessed the unprecedented success of Artificial Intelligence in enabling a plethora of cognitive platforms that achieve human-like performance on a large number of tasks involving some variant of pattern recognition, spanning the spectrum from simple digit recognition [1] to complex face recognition systems [2]. The vast majority of these platforms are based on computing models that derive inspiration not only from the functional units that process information in the human brain but also from the manner in which these units appear to be connected to each other to form neural pathways. However, such "artificial" neural computing models have necessitated a rethinking of traditional computing based on the von-Neumann perspective, because the computational units in such brain-inspired models are highly parallel and require memory and processing to co-exist in the same computing core. This has resulted in several flagship neuromorphic projects, for instance IBM TrueNorth [3], that attempted to develop custom CMOS hardware where memory and computing units are interfaced within the same core. However, huge bottlenecks still exist on the area and power consumption front due to the significant mismatch between the underlying CMOS transistors and such neuromimetic computations. Hence, there has been growing interest in recent years in exploring nanoelectronic devices that can directly mimic neural and synaptic operations. A few words on neural computing models are in order so that the prospects of such post-CMOS technologies, and especially of spintronic devices, as neuromimetic devices may be understood.

The highly parallel computation occurring in a particular layer of a neural network can be implemented efficiently in a crossbar array structure where the synaptic weights at each cross-point modulate the input strength for the corresponding neuron.
1549-8328 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

The basic functional unit of a neural network consists of a neuron and its associated synapses, which serve as junctions for transmitting the inputs to the neuron [Fig. 1(a)]. Each input stimulus to the unit is associated with a corresponding synaptic weight, which encodes the importance of that input for the specific unit. The accumulated weighted synaptic input is then processed by the neuron, and the neuron's output is transmitted to its fan-out neurons. Neural networks consist of layers of such units, where each unit in a particular layer receives and processes all (fully connected network) or a subset (convolutional network) of the total inputs to that layer, leading to a highly parallel architecture. The outputs of the neurons are then fed as inputs to the next layer of neurons. Such a connectivity pattern can be efficiently implemented in a crossbar topology, as shown in Fig. 1(b), where inputs applied along the horizontal rows are scaled by the synaptic weights at each cross-point and accumulated along each column to be processed by the neuron. For a CMOS-based implementation, the synaptic element at each cross-point would be a 6-T/8-T SRAM cell per bit [3]. Although high-precision synaptic weights are typically not required for neuromorphic applications, a minimum of 24 CMOS transistors (four 6-T cells) would be required even for a single 4-bit synapse. Further, the energy expended to fetch the synaptic weights from an SRAM bank to the computing unit, together with the memory leakage, constitutes a significant portion of the total energy consumption in a CMOS-based neuromorphic architecture [4]. This was the main impetus behind the field of in-memory neuromorphic computing architectures [Fig. 1(b)] based on post-CMOS technologies, where it was proposed that single devices like memristors [5], phase change memories [6] and other resistive technologies [7] could directly mimic the functionality of a synapse and simultaneously be arranged in a crossbar fashion to perform the dot-product computing kernel. Since the number of synapses driving a single neuron is typically expected to reach 10,000 or more for large-scale pattern recognition systems, such nanoelectronic synapses were a crucial step towards compact and area-efficient neuro-inspired hardware.
While synapses serve as the memory elements of the network, encoding the significance of the various inputs, the neuron is the processing element, which generates an output depending on the resultant synaptic input it receives. Although initial research on such neuromimetic devices focused mostly on the synapses (note that synapses outnumber neurons by several orders of magnitude), these designs still incurred high power consumption because the large crossbar arrays of resistive synapses had to be operated at a higher voltage in order to be interfaced with analog CMOS neurons. Spintronic devices offer a promising solution to this power bottleneck due to the unique low-power functionalities they provide [8]-[10]. Magneto-metallic spin devices can be operated at ultra-low voltages and switched by ultra-low currents, thereby enabling low-power operation of the synaptic crossbar array.
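The crossbar dot-product kernel described above can be sketched in a few lines of Python. This is a minimal idealized model, assuming ohmic devices with no wire resistance, sneak paths, or neuron loading; the function name and the voltage and conductance values are illustrative, not taken from the paper.

```python
from typing import List, Sequence


def crossbar_column_currents(v_rows: Sequence[float],
                             g: Sequence[Sequence[float]]) -> List[float]:
    """Ideal crossbar: column current I_j = sum_i V_i * G_ij.

    v_rows: voltages applied to the horizontal rows (volts).
    g: g[i][j] is the synaptic conductance at row i, column j (siemens).
    Each column current is the weighted sum delivered to one neuron.
    """
    if len(v_rows) != len(g):
        raise ValueError("need one conductance row per input voltage")
    n_cols = len(g[0])
    currents = [0.0] * n_cols
    for v_i, g_row in zip(v_rows, g):
        if len(g_row) != n_cols:
            raise ValueError("conductance matrix must be rectangular")
        for j, g_ij in enumerate(g_row):
            currents[j] += v_i * g_ij  # Ohm's law per device, summed by Kirchhoff's current law
    return currents


if __name__ == "__main__":
    v = [0.1, 0.05, 0.0]                              # row inputs (V)
    g = [[1e-4, 2e-5], [5e-5, 1e-4], [1e-4, 1e-4]]    # cross-point conductances (S)
    print(crossbar_column_currents(v, g))             # about [1.25e-05, 7e-06] A
```

Every cross-point contributes one multiply (a single device obeying Ohm's law) and every column wire performs the accumulation, which is why a crossbar of single-device synapses replaces the multi-transistor SRAM cell plus separate arithmetic of a CMOS design.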
However, most of these proposals are limited to relatively simple "artificial" neurons, i.e., neurons that generate a binary output depending on the sign of the synaptic input. While such "step" neuron functionalities provide satisfactory results for simple recognition tasks, they may prove insufficient for implementing high-accuracy recognition systems for complex tasks.
Recent research on current-driven domain wall motion in multi-domain magnets [11]-[13] has opened up the possibility of implementing more complex neural functionalities at extremely low power [14]. In addition, spiking neuron functionalities offering a higher degree of bio-fidelity can also be efficiently implemented in such magnetic structures [15]. It is worth mentioning that, in addition to being more biologically realistic, spiking neural architectures allow asynchronous, sparse, low-power event-driven network operation [16], [17]. Further, unsupervised synaptic learning enabled by such event-driven spiking networks might pave the way for adaptive neuromorphic systems [16].
In addition to neural operations, the same device structure can be used as an electronic synapse [14], [15]. The potential benefit offered by such spintronic synapses is their low programming energy compared with memristive or other post-CMOS resistive technologies, which enables energy-efficient on-chip learning. Such spintronic synapses, arranged in a crossbar fashion, can be interfaced with spintronic neurons, leading to a compact and energy-efficient neural system design.
The goal of this paper is to provide a review of emerging physical phenomena observed in spintronic devices that can be highly appealing for neuromorphic applications. We present a general discussion on possible device structures that can potentially serve as synapses as well as neurons with different degrees of bio-fidelity. The design considerations for such neuronal and synaptic devices from the device level to the system level are outlined for an All-Spin neuromorphic processor. We conclude by providing a general overview of the opportunities that such spintronic neuromorphic designs have to offer along with key design challenges that need to be overcome.
II. SPINTRONIC NEURONS
A. Device Preliminaries
In this section we review some of the spin-transfer torque (STT) phenomena that can be engineered to develop spintronic devices suitable for neuromimetic computation. Before discussing the details of the various STT phenomena, it is worth mentioning that the behavior of the magnetization, $\hat{m}$, of a nanomagnet in the presence of an effective magnetic field, $\mathbf{H}_{\mathrm{eff}}$, and a spin current, $\mathbf{I}_s$, may be modeled using the Landau-Lifshitz-Gilbert (LLG) equation with an additional term describing the interaction between the nanomagnet and the spin current [18]:

$$\frac{d\hat{m}}{dt} = -|\gamma|\left(\hat{m}\times\mathbf{H}_{\mathrm{eff}}\right) + \alpha\left(\hat{m}\times\frac{d\hat{m}}{dt}\right) + \frac{1}{qN_s}\,\hat{m}\times\left(\mathbf{I}_s\times\hat{m}\right) \qquad (1)$$

where $\gamma$ is the gyromagnetic ratio, $\alpha$ is the Gilbert damping factor, $q$ is the electronic charge, and $N_s$ is the number of spins comprising the nanomagnet, given by $N_s = M_s V/\mu_B$. $M_s$ and $V$ are the saturation magnetization and volume of the nanomagnet, respectively, and $\mu_B$ is the Bohr magneton. In the absence of any external stimuli, the magnetization of the nanomagnet stabilizes along its easy-axis direction, which can be in-plane (IMA: shape anisotropy dominating the uniaxial anisotropy of the magnet [19]) or out-of-plane (PMA: magnetocrystalline anisotropy dominating the uniaxial anisotropy of the magnet [19]) with respect to the thin-film magnetic layers.
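The LLG equation with the spin-torque term can be made concrete with a minimal macrospin sketch. It assumes the form dm/dt = -γ(m × H_eff) + α(m × dm/dt) + (1/(qN_s)) m × (I_s × m), a single-domain PMA magnet at zero temperature whose only effective field is the uniaxial anisotropy field H_k m_z ẑ, and illustrative parameters (M_s = 10^6 A/m, a 40 nm × 40 nm × 1 nm free layer, α = 0.01, μ0H_k = 0.1 T) that are not from the paper. The equation is solved for dm/dt in explicit form and integrated with RK4. The threshold I_s > q N_s α γ H_k used below is obtained by linearizing this idealized model about the easy axis; it is not a value reported in the text.

```python
import math

Q = 1.602176634e-19        # electron charge [C]
MU_B = 9.2740100783e-24    # Bohr magneton [J/T]
GAMMA = 1.760859e11        # electron gyromagnetic ratio [rad/(s*T)]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def llgs_rhs(m, h_eff, i_s, alpha, n_s):
    """dm/dt for the LLG equation with a spin-torque term, in explicit form.

    m: unit magnetization; h_eff: effective field in tesla;
    i_s: spin-current vector in amperes (direction = spin polarization).
    """
    stt = cross(m, cross(i_s, m))                   # m x (I_s x m)
    prec = cross(m, h_eff)                          # m x H_eff
    a = tuple(-GAMMA * p + s / (Q * n_s) for p, s in zip(prec, stt))
    ma = cross(m, a)
    # Solving the implicit Gilbert form for dm/dt: (1 + alpha^2) dm/dt = A + alpha (m x A)
    return tuple((ai + alpha * mai) / (1.0 + alpha ** 2) for ai, mai in zip(a, ma))


def critical_spin_current(alpha, h_k, n_s):
    """Zero-temperature macrospin threshold of this model: I_s > q N_s alpha gamma H_k."""
    return Q * n_s * alpha * GAMMA * h_k


def simulate(i_spin, n_s, alpha=0.01, h_k=0.1, theta0=0.1, t_end=20e-9, dt=1e-12):
    """RK4 integration of a PMA macrospin (easy axis +z) under spins polarized along -z."""
    m = (math.sin(theta0), 0.0, math.cos(theta0))
    i_s = (0.0, 0.0, -i_spin)

    def f(v):
        h_eff = (0.0, 0.0, h_k * v[2])              # uniaxial anisotropy field only
        return llgs_rhs(v, h_eff, i_s, alpha, n_s)

    def shift(v, k, c):
        return tuple(vi + c * ki for vi, ki in zip(v, k))

    for _ in range(int(round(t_end / dt))):
        k1 = f(m)
        k2 = f(shift(m, k1, 0.5 * dt))
        k3 = f(shift(m, k2, 0.5 * dt))
        k4 = f(shift(m, k3, dt))
        m = tuple(mi + dt / 6.0 * (p1 + 2 * p2 + 2 * p3 + p4)
                  for mi, p1, p2, p3, p4 in zip(m, k1, k2, k3, k4))
        norm = math.sqrt(sum(x * x for x in m))
        m = tuple(x / norm for x in m)              # keep |m| = 1
    return m


if __name__ == "__main__":
    ms, volume = 1e6, 40e-9 * 40e-9 * 1e-9          # illustrative M_s [A/m] and V [m^3]
    n_s = ms * volume / MU_B                        # N_s = M_s V / mu_B
    i_c = critical_spin_current(0.01, 0.1, n_s)
    print(f"I_c (spin current) ~ {i_c:.2e} A")
    print("final m_z, no current:", simulate(0.0, n_s)[2])
    print("final m_z, 5 x I_c:   ", simulate(5 * i_c, n_s)[2])
```

Below the threshold the damping term relaxes the magnet back to its easy axis; above it, the anti-damping spin torque drives the magnetization through the equator into the opposite state. Thermal fluctuations, which the text later identifies as setting the minimum deterministic current, are deliberately left out.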
B. "Step" Neurons
1) Magnetic Tunnel Junction Based Neuron:
Fig. 2. The Magnetic Tunnel Junction (MTJ) consists of a tunneling oxide sandwiched between two nanomagnets. The magnetization of the "pinned" layer is fixed while the magnetization of the "free" layer can be manipulated. Charge current flowing from the "pinned" layer to the "free" layer can switch the MTJ from the parallel (P) to the anti-parallel (AP) state, and vice versa.
The vertical spin valve (VSV) structure [20], depicted in Fig. 2, is the basic building block for the device structures discussed herein. It consists of two ferromagnetic (FM) layers, one of which is magnetically "pinned" and serves as the reference layer. The two magnets are separated by a spacer layer, and the resistance of the spin valve is typically used to read the magnetization of the "free" layer, since the valve resistance varies with the magnetization of the "free" layer relative to the "pinned" layer. In this paper we restrict our discussion to a Magnetic Tunnel Junction (MTJ), in which the spacer is a tunneling oxide barrier. The resistance of the stack is lower (higher) when the spin orientations of the two magnetic layers are in the same (opposite) direction, referred to as the parallel (anti-parallel) orientation. The ease of sensing the difference in resistance is quantified by the tunneling magnetoresistance ratio (TMR),

$$\mathrm{TMR} = \frac{G_P - G_{AP}}{G_{AP}} \qquad (2)$$

where $G_P$ and $G_{AP}$ are the MTJ conductances in the parallel (P) and anti-parallel (AP) orientations. TMR ratios of a few hundred percent are routinely achieved in MTJs available today. In addition to providing a readout mechanism, input current flowing through the MTJ gets spin-polarized and orients the MTJ in the P or AP direction depending on the current direction. If charge current flows from the "free" layer to the "pinned" layer, i.e., electrons flow from the "pinned" layer to the "free" layer, the "free" layer gets oriented in the same direction as the "pinned" layer (P orientation), and vice versa [19]. Note, however, that the generated spin current is limited by the polarization strength of the "pinned" layer. The simple "step" thresholding neural operation maps directly to the switching of an MTJ due to an incoming current [21]: when the synaptic current flows from the "free" layer to the "pinned" layer, the MTJ switches to the P state (and vice versa), provided that the input current exceeds a threshold value. However, to emulate the step-transfer-function neuron, the synaptic resistive crossbar array has to be operated at a higher voltage to ensure that the resultant synaptic current input to the neuron falls outside the hysteresis curve. This constraint, coupled with the fact that the input synaptic current has to flow through the oxide layer, results in a large power consumption overhead, since the entire crossbar array needs to be operated at a higher voltage. Circuit-level techniques can, however, reduce the power consumption of the array and hence of the overall system. Ref. [21] explored a design where the MTJ is always reset to the AP state before operation and supplied with an input bias current equal to the critical current required for switching.
Hence, the synaptic crossbar arrays can now be operated at a lower voltage, since a small current (supplied by the synaptic crossbar array) on either side of the bias current determines the final state of the MTJ, and hence the neuron state. However, the power and energy benefits are still curtailed by the high critical current required for the MTJ "write" operation. There is therefore a need to explore alternate device physics for MTJ switching to achieve energy-efficient neuromorphic hardware, as discussed in the next few subsections.
Fig. 3. Lateral spin valve (LSV) based "step" neuron. Initially the magnet m1 is switched to the "hard axis" by the preset pulse. Subsequently, the excitatory and inhibitory current inputs switch the magnet m1 to one of the stable P or AP states.
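The TMR definition and the reset-and-bias MTJ "step" neuron of Ref. [21] discussed above can be summarized behaviorally as follows. This is a minimal sketch, assuming deterministic switching exactly at the critical current (real MTJs switch stochastically near threshold) and illustrative resistances of 2 kΩ (P) and 6 kΩ (AP); the function names and values are hypothetical.

```python
def tmr(g_p: float, g_ap: float) -> float:
    """TMR = (G_P - G_AP) / G_AP, equivalently (R_AP - R_P) / R_P."""
    return (g_p - g_ap) / g_ap


def biased_mtj_step_neuron(i_syn: float, i_c: float, g_p: float, g_ap: float):
    """Idealized 'step' neuron using a reset-to-AP plus bias scheme.

    The MTJ is reset to AP and biased at I_bias = I_c. A positive i_syn denotes
    synaptic current flowing from the free layer to the pinned layer, the
    polarity that favors P. Thermal (stochastic) switching near threshold is
    ignored. Returns the final state and the conductance seen by the read circuit.
    """
    i_bias = i_c
    state = "P" if i_bias + i_syn > i_c else "AP"
    return state, (g_p if state == "P" else g_ap)


if __name__ == "__main__":
    g_p, g_ap = 1 / 2e3, 1 / 6e3            # illustrative: R_P = 2 kOhm, R_AP = 6 kOhm
    print(f"TMR = {tmr(g_p, g_ap):.0%}")     # 200%
    for i_syn in (+2e-6, -2e-6):
        print(i_syn, biased_mtj_step_neuron(i_syn, i_c=50e-6, g_p=g_p, g_ap=g_ap))
```

Because the bias current already sits at the critical current, only the sign of the small synaptic current matters, which is what lets the crossbar run at low voltage; the large bias current itself is the remaining energy cost the text points out.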
2) Lateral Spin Valve Based Neuron:
A lateral spin valve (LSV) consists of two ferromagnets located on top of a non-magnetic channel [22] . Recent experiments have demonstrated the possibility of switching a detector FM by injecting current through an injecting FM and the underlying channel by non-local STT effect [23] . Such a non-local effect can be explained by the spin drift-diffusion model of electron transport [24] . The input charge current injects spins (aligned with the FM magnetization) into the non-magnetic channel. Hence a spin voltage difference is created in the channel between the regions underlying the injector and detector FMs. The gradient of this spin voltage results in spin-current flow from the injector to the detector FM, which in turn, switches the detector FM. However, due to spin-flip processes, the injected spin current decays exponentially with distance from the injector FM. Hence, proper channel materials for efficient spin transport are still under exploration. Fig. 3 depicts the device structure of a bipolar spin-neuron based on an LSV [8] . Note that the neuron is being referred to as bipolar because excitatory (positive) and inhibitory (negative) synaptic currents are being applied separately as inputs to the neuron. The device structure consists of input ferromagnets m2-m4 and an output magnet, m1. The magnet m1 also forms an MTJ based read port. The magnetization directions of the two input magnets, m2 and m3, lie antiparallel to each other along their "easy-axis." Hence, these two input magnets can provide the excitatory and inhibitory synaptic currents respectively. The preset magnet, m4, shown in Fig. 3 , however, has its "easy-axis" orthogonal to that of m1, and is used to preset the magnet m1 along the "hard axis" at the start of the neuronal operation.
After removal of the preset pulse, the excitatory and inhibitory synaptic current pulses are received through the magnets m2 and m3, respectively. Charge current injected into the channel through m2 and m3 gets spin polarized according to the corresponding polarities of magnets. Each of these two anti-parallel spin polarized currents exerts a spin-transfer torque (STT) on m1. The final state of m1 depends on the difference between the charge currents through m2 and m3, corresponding to the excitatory and inhibitory synaptic currents. The preset stage is utilized to reduce the critical current requirement for deterministic switching since the "hard-axis" is an unstable state for the magnet. The magnitude of the critical current is determined by the thermal noise in the output magnet and imprecision in the preset scheme. The resistance of the read MTJ encodes the final state of the neuron after the thresholding operation.
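The preset-then-decide operation of the bipolar LSV neuron can be captured in a behavioral sketch. It assumes that outside a noise margin the sign of the difference between the excitatory and inhibitory currents fully determines the outcome, and that inside the margin the outcome is random; the margin parameter stands in for the critical current set by thermal noise and preset imprecision, and both it and the coin-flip model are hypothetical simplifications. Which of P or AP corresponds to +1 depends on the magnet orientations and is not fixed here.

```python
import random
from typing import Optional


def lsv_bipolar_neuron(i_exc: float, i_inh: float, i_margin: float,
                       rng: Optional[random.Random] = None) -> int:
    """Behavioral sketch of the preset-then-decide LSV 'step' neuron.

    After the preset pulse parks m1 on its hard axis, the net torque from the
    excitatory (m2) and inhibitory (m3) injectors decides which easy-axis state
    m1 relaxes into. Returns +1 if the excitatory input wins and -1 if the
    inhibitory input wins. i_margin is a hypothetical stand-in for the minimum
    deterministic current; inside it the outcome is modeled as a fair coin flip.
    """
    diff = i_exc - i_inh
    if diff >= i_margin:
        return +1
    if diff <= -i_margin:
        return -1
    rng = rng or random.Random()
    return rng.choice((+1, -1))


if __name__ == "__main__":
    print(lsv_bipolar_neuron(10e-6, 4e-6, 1e-6))                      # 1
    print(lsv_bipolar_neuron(4e-6, 10e-6, 1e-6))                      # -1
    print(lsv_bipolar_neuron(5e-6, 5e-6, 1e-6, random.Random(0)))     # noise-dominated
```

Shrinking the margin (better preset precision, less thermal noise) is what allows smaller synaptic currents to produce a reliable decision.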
3) Spin-Orbit Torque Based Neuron:
Recently, spin-orbit torque (SOT) has emerged as one of the most promising mechanisms for generating spin current to switch a nanomagnet in ferromagnet-heavy metal (FM-HM) multilayer structures, since the efficiency of spin current generation is not limited by the polarization strength of a FM. Such SOT effects have been observed in magnetization switching [25]-[27], domain wall motion [11] and spin-torque oscillations [28]. When a charge current flows through the HM underlayer with high spin-orbit coupling, a transverse spin current is injected at the top and bottom surfaces of the HM (assuming the spin-Hall effect [29] to be the dominant underlying physical phenomenon). Due to repeated scattering of the electrons at the FM-HM interface, multiple units of angular momentum can be transferred to the FM lying on top, thereby leading to efficient spin injection. The input spin current density ($J_s$) is related to the charge current density ($J_q$) flowing through the HM underlayer by $J_s = \theta_{SH} J_q$, or equivalently,

$$I_s = \theta_{SH}\left(\frac{A_{MTJ}}{A_{HM}}\right) I_q \qquad (3)$$

where $I_s$ and $I_q$ are the input spin current and charge current magnitudes, respectively, $\theta_{SH}$ is the spin-Hall angle [29], and $A_{MTJ}$ and $A_{HM}$ are the MTJ and HM cross-sectional areas, respectively.
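The spin-Hall relation $I_s = \theta_{SH}(A_{MTJ}/A_{HM})\,I_q$ explains how "multiple units of angular momentum" per electron become possible: the area ratio can exceed 1/θ_SH. The sketch below evaluates it for a hypothetical geometry (a 40 nm × 40 nm MTJ on a 40 nm wide, 3 nm thick HM) and an assumed θ_SH = 0.3; none of these numbers come from the paper.

```python
def sot_spin_current(i_q: float, theta_sh: float, a_mtj: float, a_hm: float) -> float:
    """I_s = theta_SH * (A_MTJ / A_HM) * I_q, which follows from J_s = theta_SH * J_q."""
    return theta_sh * (a_mtj / a_hm) * i_q


if __name__ == "__main__":
    w, l_mtj, t_hm = 40e-9, 40e-9, 3e-9      # hypothetical geometry (m)
    a_mtj = w * l_mtj                         # FM/HM interface area collecting the spin current
    a_hm = w * t_hm                           # HM cross-section carrying the charge current
    gain = sot_spin_current(1.0, 0.3, a_mtj, a_hm)
    print(f"spin injection efficiency I_s/I_q ~ {gain:.1f}")  # ~4.0
```

The efficiency grows with the MTJ length and shrinks with the HM thickness, so a thin HM under a long free layer can deliver several spins per injected electron, unlike an STT stack where the "pinned" layer polarization caps the ratio below one.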
The HM-based three-terminal device [10] consists of an MTJ whose "free" layer (FM, with Perpendicular Magnetic Anisotropy, PMA) is in contact with the top surface of a HM (Fig. 4). A two-step switching scheme can be used to implement the thresholding operation of a neuron. In the first step, a charge current, $I_{clock}$, flows through the HM and orients the "free" layer along the "hard axis" due to the injection of in-plane directed spins at the FM-HM interface. In the second step, a net synaptic current, $I_s$, flows through the "pinned" layer of the MTJ, and its direction determines the final orientation of the magnet. As discussed in the previous subsection, a small positive or negative synaptic current, just large enough to overcome thermal fluctuations, is sufficient to drive the magnetization of the FM to either the P or the AP configuration.
C. "Non-Step" Neuron
1) Current Induced Domain Wall Motion:
The discussion so far has focused mainly on mono-domain magnets. Since the MTJ exhibits only two stable resistive states, namely P and AP, only "step" artificial neurons can be implemented using such device structures. Interestingly, neurons whose analog output varies with the input magnitude, i.e., "non-step" neurons, can be implemented in an MTJ whose "free" layer is a multi-domain magnetic strip with oppositely polarized magnetic domains separated by a transition region termed a domain wall (DW), where the position of the domain wall encodes the conductance of the MTJ. As shown in Fig. 5(a)-(b), magnetic nanowires with Perpendicular Magnetic Anisotropy (PMA) exhibit two types of domain wall. The domain wall is termed a Néel wall when the magnetization at the wall location rotates in a plane perpendicular to the plane of the wall; this is typically observed for nanowires narrower than 100 nm (owing to shape anisotropy) [30]. For wider nanowires, the wall magnetization rotates in the plane of the wall, which is termed a Bloch wall [30]. Charge current flowing through the magnetic strip can displace the domain wall in the direction of electron flow due to the STT effect. However, the current density required for domain wall movement in such single-layer magnetic structures remains high enough to cause undesired heating effects.
Fig. 4. Spin-orbit torque based "step" neuron. Initially the FM is switched to the "hard axis" by the clock pulse flowing through the HM. Subsequently, the resultant input synaptic current switches the magnet to one of the stable P or AP states.
Recent experiments on FM-HM bilayers have demonstrated a promising mechanism for efficiently controlling DW motion at current densities that can be ∼100× lower than those required for conventional spin-transfer torque driven DW motion [11]-[13]. Additionally, the resistance in the path of the "write" current is reduced by a factor of almost ∼10×, since most of the current flows through the wider HM underlying the FM. Consider the multilayer structures shown in Fig. 5(c)-(d). Input charge current flowing along the y-direction causes injection of x-directed spins at the FM-HM interface. A general principle for determining the direction of DW movement is to calculate the cross product between the injected spin direction at the FM-HM interface and the magnetization direction at the wall location. The direction of this cross product signifies the final magnetization state of the magnet, and hence the direction of DW motion. The wall can be oriented in two ways, namely as a longitudinal wall (parallel to the length of the magnet) or as a transverse wall (perpendicular to the length of the magnet). In both cases, however, the wall magnetization needs to be along the y-axis to achieve any DW movement. This implies that a Bloch wall configuration is required for the longitudinal wall and a Néel wall configuration for the transverse wall. Consider first the longitudinal wall. Shape anisotropy of the magnet (assuming sufficient magnet width, typically above 100 nm) stabilizes a Bloch wall in the FM [31]. However, an in-plane magnetic field is required to retain the stability of the wall in the presence of the spins injected by current flow in the underlying HM [31].
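The cross-product rule of thumb stated above can be checked mechanically. The sketch follows the text's convention (injected spin direction crossed with the wall magnetization) and deliberately ignores sign factors such as the sign of the HM spin-Hall angle, which flips the result between heavy metals; the helper names are hypothetical. The returned vector is the domain orientation that grows, so whether the wall moves left or right depends on which side of the wall that domain sits.

```python
from typing import Tuple

Vec = Tuple[float, float, float]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def favored_domain(spin_dir: Vec, wall_mag: Vec) -> Vec:
    """Rule of thumb: sigma x m_wall points along the domain magnetization that grows.

    A zero vector means the injected spins exert no net drive on the wall.
    Sign conventions (e.g. the sign of the spin-Hall angle) are ignored.
    """
    return cross(spin_dir, wall_mag)


if __name__ == "__main__":
    sigma_x = (1.0, 0.0, 0.0)                           # spins injected along x
    print(favored_domain(sigma_x, (0.0, 1.0, 0.0)))     # (0.0, 0.0, 1.0): +z domain grows
    print(favored_domain(sigma_x, (0.0, -1.0, 0.0)))    # (0.0, 0.0, -1.0): -z domain grows
    print(favored_domain(sigma_x, (1.0, 0.0, 0.0)))     # (0.0, 0.0, 0.0): no motion
```

The last case is the reason the wall magnetization must point along y: a wall whose magnetization is parallel to the injected x-directed spins receives no drive, which is why a Bloch (longitudinal) or Néel (transverse) configuration is needed.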
On the other hand, the Néel wall can be stabilized by the Dzyaloshinskii-Moriya exchange interaction (DMI), which is commonly associated with such FM-HM bilayers due to the spin-orbit coupling and broken inversion symmetry of these magnetic heterostructures [32]-[34]. In fact, the DMI in certain multilayers like CoFe-Pt or CoFe-Ta [33], [34] has been observed to be strong enough to impose a Néel wall configuration even in wider nanomagnets, where conventional magnetostatics would have yielded a Bloch configuration. Note that Bloch wall stabilization in the former case (longitudinal DW) is possible in samples with negligible DMI [31]. The magnetization dynamics of such DMI-stabilized DW motion can be studied by solving (1) at each grid point with an additional effective field accounting for the DMI effect, given by [32]-[34]

$$\mathbf{H}_{DMI} = -\frac{2D}{\mu_0 M_s}\left[\left(\hat{z}\times\nabla\right)\times\hat{m}\right] \qquad (4)$$

where $\mu_0$ is the vacuum permeability and $D$ is the effective DMI constant, which determines the strength of the DMI field in such multilayer structures. The effective DMI field at the wall location is strong enough to stabilize the Néel wall magnetization even in the presence of the in-plane spins injected by current flow through the underlying HM. Hence, no external magnetic field is required for DW propagation in magnetic multilayers with an inherent DMI effect, which makes them more attractive from a scalability point of view. We therefore focus on device structures based on the latter case for the remainder of this text.
2) Domain Wall Motion Based Neuron:
By exploiting the underlying device physics of spin-orbit torque driven DW motion in magnetic multilayers, a "non-step" neuron functionality can be implemented in a three-terminal device structure, as shown in Fig. 6(a) [14], where an MTJ is used to "read" the magnetization state of the magnet, which depends on the position of the DW in the device. The device structure is based on the transverse Néel wall configuration discussed in the previous subsection. A complementary Bloch wall based structure can also be implemented and was investigated in Ref. [35]. Note that MTJs have traditionally been used to "read" the magnetization at its two extreme states, namely the P and AP states. Our proposal, however, exploits the analog variation of the MTJ resistance with DW position expected in multi-domain magnets to implement an "artificial" neuron with a graded analog output.
Let us now discuss the principle of operation of the device, which consists of consecutive "write" and "read" cycles. During the "write" cycle, the resultant input synaptic current flows between terminals T2 and T3 and displaces the DW by an amount proportional to the magnitude of the current. Since most of the current flows through the HM, the spin-orbit torque generated by the HM underlayer is the main mechanism of DW motion. This has also been confirmed experimentally: in CoFe-Pt samples, DW motion is against the direction of electron flow (i.e., in the same direction as current flow) [11], contrary to the STT-driven scenario. During the subsequent "read" cycle, the device terminals T1 and T3 are activated and the resistive divider circuit [Fig. 6(b)] drives a PMOS transistor to generate a corresponding output current. This peripheral circuit performs the function of a biological axon, transmitting the neuron output to the fan-out synapses. The resistive divider consists of the neuronal device interfaced with a reference MTJ whose orientation is always fixed in the AP state. Since the "read" current flowing through the two MTJs in series can be kept sufficiently low by ensuring a proper oxide thickness, it is not expected to change the magnetization state of either the neuron or the reference MTJ. This disturb-free read is a potential benefit unique to the proposed three-terminal neuronal spintronic device in comparison to other post-CMOS resistive technologies. As we explain in the subsequent text, from a network operation perspective, neuronal devices need to be interfaced with resistive crossbar arrays, which requires the input resistance of the neuron to be sufficiently low.
The decoupled "write" and "read" current paths allow us to optimize the two circuits separately. The resistance in the path of the input synaptic current is essentially the HM resistance, which is typically of the order of a few hundred ohms and has no impact on the magnitude of the "read" current.
Fig. 6. … The neuron is interfaced with the "axon" circuit to generate an analog output current that varies with the input current. (c) An Integrate-Fire "spiking" neuron can be implemented using a similar device structure where the MTJ is located at the extreme edge of the FM.
As the magnitude of the "write" current increases, the neuron MTJ resistance decreases because the proportion of the AP domain in the MTJ decreases. Hence, the node voltage $V_G$ decreases, which increases the output current provided by the CMOS transistor. Using a device-circuit co-simulation framework, it can be shown that the output current provided by the transistor varies approximately linearly with the input synaptic current [14]. The device-level simulation framework involves a micromagnetic simulation using Eqs. (1) and (3) to determine the DW location for a given magnitude of the input current through the HM underlayer. The DW location is then used to determine the output current provided by the transistor through circuit-level simulations that model the MTJ resistance with a Non-Equilibrium Green's Function (NEGF) based transport simulation framework [36] (capturing the variation of MTJ resistance with oxide thickness and applied voltage). The neuron is "reset" after every "read" cycle by passing a current through the HM in the opposite direction, which re-initializes the DW at the left edge of the MTJ. This device can also be used as a "step" neuron if the domain wall is sensed only at the two extreme edges of the MTJ [9].
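The transfer chain described above (write current → DW position → MTJ conductance → divider node voltage) can be sketched behaviorally. This is a minimal illustration, not the paper's co-simulation: it assumes DW displacement linear in current with a hypothetical coefficient of 1 nm/μA, saturation at the MTJ edge, conductance interpolated linearly between the AP and P values (P domain to the left of the wall, wall conductance neglected), and a divider topology chosen only to reproduce the stated trend (reference MTJ fixed in AP on the supply side, neuron MTJ to ground), so that $V_G$ falls as the neuron conductance rises. The PMOS stage is not modeled.

```python
def dw_position(i_write: float, k_dw: float, length: float) -> float:
    """DW displacement from the left (AP) reset edge, proportional to the write current."""
    return min(max(k_dw * i_write, 0.0), length)


def neuron_mtj_conductance(x: float, length: float, g_p: float, g_ap: float) -> float:
    """P domain occupies [0, x], AP domain [x, L]; wall conductance neglected."""
    frac_p = x / length
    return frac_p * g_p + (1.0 - frac_p) * g_ap


def divider_node_voltage(g_neuron: float, g_ref: float, v_read: float) -> float:
    """Assumed divider: reference MTJ (AP) from v_read to node, neuron MTJ from node to ground."""
    r_n, r_ref = 1.0 / g_neuron, 1.0 / g_ref
    return v_read * r_n / (r_n + r_ref)


if __name__ == "__main__":
    L, K_DW = 100e-9, 1e-3              # hypothetical: 100 nm MTJ, 1 nm of travel per uA
    G_P, G_AP = 1 / 2e3, 1 / 6e3        # illustrative conductances (S)
    for i_in in (0.0, 25e-6, 50e-6, 100e-6, 150e-6):
        x = dw_position(i_in, K_DW, L)
        v_g = divider_node_voltage(neuron_mtj_conductance(x, L, G_P, G_AP), G_AP, 0.1)
        print(f"I_in = {i_in * 1e6:5.0f} uA  x = {x * 1e9:5.1f} nm  V_G = {v_g * 1e3:5.1f} mV")
```

The output saturates once the wall reaches the far edge, giving a clipped, graded response rather than a step; the reset pulse described in the text corresponds to returning the wall to x = 0 before the next write.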
D. Integrate-Fire Spiking Neuron
With recent research efforts aimed at implementing low-power event-driven Spiking Neural Networks, a rich body of literature has been dedicated to developing algorithms and heuristics for deep fully-connected or convolutional Spiking Neural Networks [37], [38]. Such spiking networks mainly rely on an Integrate-Fire (IF) "spiking" neuron, characterized by a membrane potential that accumulates the synaptic weight upon receipt of a spike at the corresponding input [37]. When the membrane potential crosses a particular threshold, the neuron generates a spike and gets reset. This functionality of an IF neuron maps directly to the current-integrating property of DW motion in the magnetic structures discussed in the previous subsection [15]. During each "write" cycle, the DW is displaced by an amount proportional to the magnitude of the input synaptic current, and it continues integrating the pulses until it reaches the opposite edge of the magnet. The "read" circuit in this scenario can be an MTJ at the opposite edge of the magnet (instead of spanning the entire magnet as in the previous case) that detects whether the DW has reached that edge. Using the same resistive divider structure discussed previously, an output spike is generated by a simple inverter, as shown in Fig. 6(c).
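The integrate-and-fire behavior described above amounts to a position that accumulates input pulses and fires when it reaches the read MTJ at the far edge. The sketch uses normalized units (track length 1.0, displacement per unit input 1.0); clipping at the reset edge for negative (inhibitory) inputs and resetting immediately after a spike are modeling assumptions, and the class name is hypothetical.

```python
class DomainWallIFNeuron:
    """Behavioral IF neuron: the DW position plays the role of the membrane potential."""

    def __init__(self, length: float = 1.0, k_dw: float = 1.0):
        self.length = length   # distance from the reset edge to the read MTJ (normalized)
        self.k_dw = k_dw       # hypothetical DW displacement per unit input current
        self.x = 0.0           # DW position, starting at the reset edge

    def step(self, i_in: float) -> bool:
        """Apply one write pulse; return True if the neuron spikes."""
        self.x = max(0.0, self.x + self.k_dw * i_in)  # integrate; cannot move past the reset edge
        if self.x >= self.length:                     # DW has reached the read MTJ
            self.x = 0.0                              # reset pulse returns the DW
            return True
        return False


if __name__ == "__main__":
    neuron = DomainWallIFNeuron()
    print([neuron.step(0.25) for _ in range(8)])
    # [False, False, False, True, False, False, False, True]
```

The firing threshold is set by geometry (the track length up to the read MTJ) rather than by a comparator circuit, which is what makes the device a compact IF neuron.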
III. SPINTRONIC SYNAPSE
The basic functionality desired from a nanoelectronic synapse is that of a programmable resistor, which maps directly to the operation of the three-terminal device shown in Fig. 6(a). The magnitude of the input charge current flowing between terminals T2 and T3 determines the device resistance measured between terminals T1 and T3. However, the unique property that makes the device suitable both as a neuron and as a synapse, in contrast to traditional two-terminal memristors, is the decoupled "write" and "read" current paths, which enable switching/programming the device at ultra-low terminal voltages. While the low input resistance of the device during the "write" operation is attractive for neural operations (discussed in detail in the next section), the low programming energy during the "write" operation translates into large energy savings when implementing plasticity or learning in nanoelectronic synapses. Note that although memristive devices might be suitable as electronic synapses, they are difficult to use for neuronal operations due to their high input resistance and threshold voltage during "write" operations [5]-[7].
Let us now discuss the design considerations for the spintronic device structure from the synaptic operation perspective. The "read" current flowing through each synapse for a particular neuron is dictated by the current requirement of the neuron. Note that the "read" current flowing from terminal T1 to T3 flows through the "pinned" layer along with some portion of the HM. Since the spin-orbit torque from the spin-Hall effect is much more dominant than the torque induced by the "pinned" layer, the effect of the "read" current on the FM magnetization is expected to be determined by the spin-orbit torque induced by the "read" current flowing through the HM underlayer. Because the critical current for DW movement scales linearly with the magnet width, the synapse FM width can be chosen such that the "read" current does not cause any DW depinning. Further, the length of the FM is determined by the minimum number of discrete levels required for proper network operation (from recognition accuracy considerations). Most of these applications do not require a high level of bit discretization or precision in the synaptic or neural elements, being highly adaptive and error-resilient in nature. For pattern recognition problems, synapses typically require more intermediate programming levels than neurons. For instance, the character recognition problems considered in [14], [15] required 4-bit discretization in the synapses and 2-bit discretization in the neurons to achieve recognition accuracies close to the trained baseline network. Hence, while neuronal devices can be aggressively scaled down, synaptic devices are typically larger in order to achieve sufficient bit discretization while simultaneously ensuring a disturb-free "read."
Fig. 7. The conductance of the spintronic device structure depends on the relative proportion of P and AP domains in the FM. Synaptic functionality can be implemented in the device by appropriately programming the DW position.
As shown in Fig. 7, the device conductance between terminals T1 and T3 is dominated by the MTJ conductance, which varies linearly with the domain wall position. Let us denote the conductance of the device when the FM magnetization is P (AP) to the "pinned" layer as $G_P$ ($G_{AP}$), i.e., when the domain wall is at the extreme right (left) of the FM. Thus, for a domain wall at an intermediate position $x$ from the left edge of the MTJ, the device conductance between terminals T1 and T3 is given by

$$G = \frac{x}{L} G_P + \frac{L - x}{L} G_{AP} + G_{DW}$$
where $L$ denotes the length of the MTJ excluding the domain wall width and $G_{DW}$ represents the conductance of the wall region. For a given duration of the current flowing through the HM, micromagnetic simulations show that the DW displacement, and hence the change in synaptic conductance, is directly proportional to the programming current magnitude [14], [15].
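A minimal sketch of the linear conductance model described above, assuming the P domain occupies the part of the MTJ to the left of the wall (so the device is fully AP with the wall at the left edge and fully P with the wall at the right edge), with the wall conductance $G_{DW}$ added in parallel. The inverse function shows how a target synaptic conductance maps to a DW position, and the programming-current helper encodes the stated proportionality at fixed pulse width; its constant `amps_per_metre` is a hypothetical placeholder, not a value from the text.

```python
def device_conductance(x: float, length: float, g_p: float, g_ap: float, g_dw: float) -> float:
    """T1-T3 conductance for a DW at distance x from the left edge of the MTJ.

    x = 0 gives the all-AP state and x = length the all-P state; the two domains
    conduct in parallel, so conductance is linear in x, plus the wall term g_dw.
    """
    if not 0.0 <= x <= length:
        raise ValueError("DW position must lie within the MTJ")
    frac_p = x / length
    return frac_p * g_p + (1.0 - frac_p) * g_ap + g_dw


def position_for_conductance(g: float, length: float, g_p: float, g_ap: float, g_dw: float) -> float:
    """Invert the linear model: DW position that yields the target conductance g."""
    x = (g - g_dw - g_ap) / (g_p - g_ap) * length
    if not 0.0 <= x <= length:
        raise ValueError("target conductance is outside the programmable range")
    return x


def programming_current(dx: float, amps_per_metre: float) -> float:
    """Current needed for a DW displacement dx at fixed pulse width.

    Linear model from the text; amps_per_metre is a hypothetical device constant.
    The sign of dx only selects the current direction, so the magnitude uses abs().
    """
    return amps_per_metre * abs(dx)
```

For example, with normalized values $G_P = 2$, $G_{AP} = 1$ and $G_{DW} = 0.1$, placing the wall at mid-length gives a conductance of 1.6, and `position_for_conductance(1.6, 1.0, 2.0, 1.0, 0.1)` recovers the mid-length position.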
Another important design consideration for the spintronic synapse is the operating voltage during "read." The TMR, referred to in (2), is a function of the voltage applied across the MTJ and decreases with increasing voltage. Typical experimental results measured in MTJ structures [39], [40] demonstrate that the TMR variation is almost negligible for operating voltages below 100 mV and that the TMR starts decreasing appreciably as the voltage increases further. A reduced TMR limits the ratio of the maximum to the minimum synaptic weight that can be encoded. Further, for "artificial" non-spiking networks, where the input voltages applied to different rows of the crossbar array of synapses (Fig. 1) are analog and can differ in magnitude from one row to another, it is desirable that the synaptic conductance not vary with the input voltage applied to each row. Hence, from a functionality viewpoint, it is imperative to operate the crossbar array of spintronic synapses below 100 mV (in addition to the benefit of reduced power consumption). Such low-voltage operation of the crossbar arrays is enabled by the ultra-low current requirements of magneto-metallic spintronic neurons. The interfacing of spintronic synaptic crossbar arrays with spintronic neurons from a circuit operation perspective will be discussed in detail in the next section.
IV. ALL-SPIN NEUROMORPHIC PROCESSING ELEMENT
In this section we discuss the design of a Neuromorphic Processing Element (NPE) based on spintronic neurons and synapses that can potentially serve as the computing core for implementing deep spiking neural systems. For the purpose of demonstration, we consider the spiking neuron computational models described in [37]. Each NPE receives a set of spike inputs and performs a dot product with the synaptic weights of a particular neuron. The weighted summation of inputs is then processed by the corresponding neuron to generate an output. In this computational model, the post-synaptic voltage (the voltage applied at the synapse due to pre-neuron spiking) assumes a high value in any time-step in which a spike occurs. It is worth noting that more complex post-synaptic voltages with exponential decay, which provide a higher degree of bio-fidelity, can be implemented by interfacing the crossbar array with peripheral circuits that drive the rows of the array with exponentially decaying supply voltages upon the arrival of an input spike/event. The neuron accumulates the input synaptic currents at every time-step and generates an output spike only when the DW has reached the opposite edge of the magnet. Note that the computational model under discussion [37] does not involve a refractory period (the duration after a spike during which a neuron should not spike again). However, such refractoriness can be easily implemented with a global control circuit that disconnects the neuron from the crossbar array during the time-steps that lie within the refractory period.
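A behavioral sketch of the spiking dynamics described in this paragraph: the DW position integrates the input current at every time-step, and the neuron fires when the wall reaches the far edge of the FM. The linear `mobility` factor, the reset of the wall to the starting edge after a spike, and the clamping of the wall at the near edge are assumptions made for illustration. The optional `refractory_steps` stands in for the global control circuit that disconnects the neuron from the crossbar; setting it to 0 matches the model of [37], which has no refractoriness. The class name `DomainWallNeuron` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DomainWallNeuron:
    """Hypothetical behavioral model: the DW position integrates input current."""
    length: float = 1.0          # normalized FM length
    mobility: float = 1.0        # DW displacement per unit current per time-step (assumed linear)
    refractory_steps: int = 0    # 0 reproduces a model without refractoriness
    position: float = 0.0        # current DW position, 0 = starting edge
    _cooldown: int = 0           # time-steps left in the refractory period

    def step(self, current: float) -> bool:
        """Advance one time-step and return True if the neuron spikes."""
        if self._cooldown > 0:
            # Global control keeps the neuron disconnected: the input is ignored.
            self._cooldown -= 1
            return False
        # Integrate, keeping the wall inside the magnet (assumed clamping).
        self.position = min(max(self.position + self.mobility * current, 0.0), self.length)
        if self.position >= self.length:
            self.position = 0.0                   # assumed reset to the starting edge
            self._cooldown = self.refractory_steps
            return True
        return False


neuron = DomainWallNeuron(mobility=0.25)
print([neuron.step(1.0) for _ in range(8)])  # fires every fourth step under constant input
```

With `refractory_steps=2`, the same neuron ignores the two time-steps following each spike, which lowers its maximum firing rate without changing the integration rule.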
Fig. 8. All-Spin Neuromorphic Processing Element where a spintronic synaptic crossbar array is interfaced with spintronic neurons. The input synaptic current supplied to each spintronic neuron, described by Eq. (5), can be derived by applying Kirchhoff's laws to a particular column of the crossbar array, assuming voltage $V_i$ is applied across row i and $G_{i,j}$ is the synaptic conductance connecting row i to column j for the j-th spintronic neuron.
The computation performed in an NPE receiving m inputs and producing n outputs can be directly mapped to a crossbar architecture of size m × n, as shown in Fig. 8, where each horizontal metal line provides an input voltage across each spintronic synapse and each vertical metal line provides an input current to the spintronic neuron situated at the end of the vertical column. The mapping relies on Kirchhoff's current law: the current flowing through each synapse (the input voltage scaled by the conductance of the spintronic synapse) is summed along the column to provide the resultant input current to the spintronic neuron. However, this is only valid under the assumption that the voltage drop across the neuron is very low in comparison to the voltage drop across the synapses, which entails that the input resistance of the neuron has to be low. After the "write" cycle, the "read" terminals of the neuron are activated, and a spike (logic value "1") is generated at the output inverter of each spintronic neuron whose DW has been displaced to the opposite edge of the FM. The inverter outputs can be stored in a latch to enable a pipelined design.
In order to implement bipolar weights, two rows ($V_{i+}$ and $V_{i-}$) are used for each input $V_i$. When $V_i$ assumes a logic value of "0" (no spike), a "0" voltage level is applied to both rows. When $V_i$ assumes a logic value of "1" (spike), a voltage $+V$ (less than 100 mV) is applied to the row corresponding to $V_{i+}$ and $-V$ is applied to the row corresponding to $V_{i-}$. Denote by $G_{i,j}^{+}$ ($G_{i,j}^{-}$) the synaptic conductance connecting row $V_{i+}$ ($V_{i-}$) to column j. If the weight $G_{i,j}$ for the j-th neuron corresponding to input $V_i$ is positive, then $G_{i,j}^{+}$ is programmed to the corresponding weight, while $G_{i,j}^{-}$ is programmed to the high-resistance OFF state, and vice versa. Let us consider the input conductance of the spintronic neuron during the "write" operation (mainly the HM conductance of the neuron) to be $G_s$ and the voltage drop across the neuron to be $V_s$. Equating the current supplied by the resistive synapses to the current flowing through the neuron, we get

$$\sum_i \left[ G_{i,j}^{+} \left( V_{i+} - V_s \right) + G_{i,j}^{-} \left( V_{i-} - V_s \right) \right] = G_s V_s$$

which indicates that the net synaptic current supplied to the spintronic neuron is given by

$$I_j = G_s V_s = \frac{\sum_i \left( G_{i,j}^{+} V_{i+} + G_{i,j}^{-} V_{i-} \right)}{1 + \sum_i \left( G_{i,j}^{+} + G_{i,j}^{-} \right) / G_s}$$
Note that the lower the operating voltage, the higher the synaptic conductances (which can be tuned by an appropriate choice of MTJ oxide thickness) required to supply the critical current for displacing the DW from one edge of the FM of the spintronic neuron to the other. This increases the ratio $\gamma = \sum_i (G_{i,j}^{+} + G_{i,j}^{-})/G_s$, resulting in non-ideal operation of the neuron. In order to ensure that $\gamma \ll 1$ for a given crossbar operating voltage, the duration of the "write" cycle can be adjusted accordingly, since the critical current required to displace the DW from one edge of the FM to the other decreases as the duration of the "write" current increases. Note that although the discussion in this section focused on "spiking" networks, NPE design for "non-spiking" networks would involve interfacing the synaptic crossbar array with "non-step artificial" neurons along with the corresponding "axon" circuit [Fig. 6(a)-(b)] [14].
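A minimal sketch of one crossbar column under the bipolar encoding described above. It solves Kirchhoff's current law at the column node (taking the far terminal of the neuron HM as 0 V) for the neuron voltage $V_s$, and returns the exact neuron current $G_s V_s$, the ideal current that would flow if $V_s$ were zero, and the ratio $\gamma$; the exact current equals the ideal current divided by $1 + \gamma$, so the column approaches ideal summation only when $\gamma \ll 1$. The function names and the OFF conductance `g_off` are hypothetical.

```python
from typing import List, Sequence, Tuple


def encode_bipolar(weights: Sequence[float], g_off: float) -> Tuple[List[float], List[float]]:
    """Split signed weights (in siemens) into G+ and G- conductances.

    A positive weight is programmed on the V_i+ row and a negative one on the V_i- row;
    the unused device sits at the (hypothetical) OFF conductance g_off.
    """
    g_plus = [w if w > 0 else g_off for w in weights]
    g_minus = [-w if w < 0 else g_off for w in weights]
    return g_plus, g_minus


def column_current(spikes: Sequence[int], g_plus: Sequence[float], g_minus: Sequence[float],
                   v: float, g_s: float) -> Tuple[float, float, float]:
    """Return (exact current, ideal current, gamma) for one crossbar column."""
    weighted = 0.0  # sum over rows of G+ * V+ + G- * V-
    g_total = 0.0   # sum over rows of G+ + G-
    for s, gp, gm in zip(spikes, g_plus, g_minus):
        v_plus, v_minus = (v, -v) if s else (0.0, 0.0)  # +V / -V on a spike, else 0 V
        weighted += gp * v_plus + gm * v_minus
        g_total += gp + gm
    v_s = weighted / (g_s + g_total)  # KCL solution for the column node voltage
    exact = g_s * v_s                 # current actually driven through the neuron
    gamma = g_total / g_s
    return exact, weighted, gamma


g_plus, g_minus = encode_bipolar([2e-6, -1e-6, 3e-6], g_off=1e-9)
exact, ideal, gamma = column_current([1, 1, 0], g_plus, g_minus, v=0.1, g_s=1e-2)
print(exact, ideal, gamma)
```

All conductance and voltage values in the usage lines are illustrative only. Note that a non-spiking row still loads the column through its conductances, which is why $\gamma$ sums over every row regardless of the input pattern.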
V. DISCUSSIONS: PROSPECTS AND PERSPECTIVES
In this section we address some of the key design challenges on the roadmap for neuromorphic computing platforms based on such spintronic synapses and neurons. Deterministic domain wall motion due to spin-orbit torque generated by a heavy metal underlayer in magnetic multilayer structures has been demonstrated by several research groups [11]-[13]. However, the granularity at which the domain wall position in the proposed device structure can be programmed, and sensed as the MTJ conductance, has yet to be confirmed by device fabrication and measurements. We believe that our proposal will stimulate proof-of-concept experiments to develop such three-terminal spintronic device structures and to analyze the tradeoff between programming resolution and device dimensions. In fact, this is a limitation of other proposed memristive devices as well, since such devices are expected to have limited programming resolution at aggressively scaled dimensions [41]. However, ~4-5 bit synaptic discretization and ~2 bit neuronal discretization are typically sufficient for achieving good accuracy even in large-scale pattern recognition problems [14], [15]. Further, notches can also be utilized to pin the domain wall at specific locations along the length of the magnet to achieve the necessary bit discretization [42].
Another critical design issue is the extent to which variability and noise in the spintronic devices will impact system-level performance. We note, however, that such neuromorphic systems are highly robust to imprecision arising from device mismatch, variability and noise, owing to the adaptive nature of such computations. Interested readers are directed to [43] for details on the robustness of such systems to noise and on how device mismatch and variability can be exploited to perform more efficient learning. In fact, simulations performed on a large-scale deep learning architecture for a standard digit recognition problem in [15] reveal that the degradation in recognition accuracy is almost negligible even with 25% σ variation in the MTJ conductances of the spintronic crossbar array.
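A study of the kind reported in [15] starts by injecting device-to-device variation into the programmed conductances before evaluating the network. The sketch below shows only that injection step, under the assumption of independent multiplicative Gaussian variation with a 25% relative standard deviation, clipped at zero because a physical conductance cannot be negative. It does not reproduce the accuracy results of [15]; the function names are hypothetical.

```python
import random
from typing import List, Sequence


def perturb_conductances(conductances: Sequence[float], sigma_rel: float = 0.25,
                         seed: int = 0) -> List[float]:
    """Apply independent Gaussian variation with relative std sigma_rel to each conductance.

    Assumed model: G -> G * (1 + N(0, sigma_rel)), clipped at zero.
    """
    rng = random.Random(seed)
    return [max(0.0, g * (1.0 + rng.gauss(0.0, sigma_rel))) for g in conductances]


def dot(inputs: Sequence[float], conductances: Sequence[float]) -> float:
    """Ideal column current for row voltages `inputs` (neuron voltage assumed ~0)."""
    return sum(v * g for v, g in zip(inputs, conductances))


nominal = [1e-5, 2e-5, 3e-5]   # illustrative conductances in siemens
varied = perturb_conductances(nominal)
row_voltages = [0.1, 0.1, 0.0]
print(dot(row_voltages, nominal), dot(row_voltages, varied))
```

Fixing the seed makes each Monte Carlo trial reproducible; averaging the resulting network accuracy over many seeds is what would quantify robustness.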
An additional figure of merit is the TMR ratio available in the spintronic synapses, since it limits the ratio of the maximum to the minimum synaptic weight that can be encoded in the network. The higher this ratio, the better the network performance in terms of recognition accuracy. However, in general, a ratio of 10 is able to provide sufficient accuracy in such deep networks [44]. While a maximum TMR ratio of 600% [45] (corresponding to a maximum to minimum synaptic weight ratio of 7×) has been experimentally demonstrated with present-day technology, it is expected to exceed 1000% within a time frame of ten years [46].
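The correspondence between 600% TMR and a 7× weight ratio follows from TMR = (R_AP - R_P)/R_P, which gives $G_P / G_{AP} = 1 + \mathrm{TMR}$. The short sketch below assumes the wall conductance $G_{DW}$ is negligible; under that assumption a weight ratio of 10 requires a TMR of 900%. The function names are hypothetical.

```python
def weight_ratio_from_tmr(tmr_percent: float) -> float:
    """Max/min synaptic weight ratio G_P / G_AP for a TMR given in percent.

    Uses TMR = (R_AP - R_P) / R_P = G_P / G_AP - 1 and neglects G_DW.
    """
    return 1.0 + tmr_percent / 100.0


def tmr_for_weight_ratio(ratio: float) -> float:
    """TMR (in percent) needed to reach a target G_P / G_AP ratio."""
    return (ratio - 1.0) * 100.0


print(weight_ratio_from_tmr(600.0))   # present-day best case quoted in the text
print(weight_ratio_from_tmr(1000.0))  # projected value quoted in the text
print(tmr_for_weight_ratio(10.0))     # TMR needed for a weight ratio of 10
```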
Finally, let us briefly discuss the performance and energy consumption of the proposed spintronic neural architectures in comparison to baseline CMOS implementations. Table I provides the device parameters and voltage/current requirements of the various spin-neurons (calibrated to experimental measurements), in addition to the average energy consumption of a corresponding CMOS analog/digital neuron implementation at the 45 nm technology node. However, it is worth noting that, unlike logic and memory (von-Neumann computing), benchmarking efforts for neuromorphic hardware require the simultaneous consideration of the activation function and network connectivity, the recognition problem, the input data pattern, the classification accuracy, the neural computing model, and the bit discretization required in neurons and synapses, among others. As an initial benchmarking effort, the performance of the proposed All-Spin neuromorphic architecture was evaluated for a standard deep convolutional network on a standard digit recognition example in [15]. A hybrid device-circuit-algorithm co-simulation framework, calibrated to the experimental results reported in [33], [34], [39], [40], was utilized to estimate the power and energy consumption of the network. Interested readers are referred to [15] for an in-depth discussion of the simulation framework and results. An intuitive understanding of the power benefits potentially offered by such spintronic neural network designs can be obtained from two perspectives: the ultra-low current required to switch magneto-metallic spintronic neurons and, in turn, the ultra-low voltage operation of the spintronic synapse crossbar array that this enables. For instance, micromagnetic simulations reveal that a current of ~10.6 μA is sufficient to displace the DW from one edge of the neuron (dimensions 80 nm × 20 nm) to the other in a duration of 2 ns.
This current flows through the FM-HM bilayer resistance, resulting in an energy consumption of 0.05 fJ ($I^2Rt$ energy consumption). In addition, the spintronic synapses providing input currents to each neuron are operated at ultra-low terminal voltages of 100 mV. System-level simulation studies of the entire network indicate that the proposed spintronic design can potentially achieve a 250× improvement in energy consumption and a 56× improvement in energy-delay product (EDP) over a baseline CMOS implementation in commercial 45 nm technology [15], while achieving a classification accuracy of ~98.5% on the recognition problem.
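The $I^2Rt$ estimate can be checked with a few lines of arithmetic. The current, pulse width and energy below are the values quoted in the text; the FM-HM bilayer resistance is not stated, so the sketch infers the resistance consistent with those numbers (roughly 220 Ω) rather than presenting it as a device parameter. The function names are hypothetical.

```python
def joule_energy(current_a: float, resistance_ohm: float, duration_s: float) -> float:
    """I^2 R t energy dissipated by a constant-current pulse, in joules."""
    return current_a ** 2 * resistance_ohm * duration_s


def implied_resistance(energy_j: float, current_a: float, duration_s: float) -> float:
    """Resistance consistent with a quoted energy, current and pulse width."""
    return energy_j / (current_a ** 2 * duration_s)


i_write = 10.6e-6    # A, quoted DW displacement current
t_write = 2e-9       # s, quoted pulse width
e_write = 0.05e-15   # J, quoted energy (0.05 fJ)

r_bilayer = implied_resistance(e_write, i_write, t_write)  # inferred, about 2.2e2 ohm
print(f"implied FM-HM resistance: {r_bilayer:.0f} ohm")
print(f"energy at that resistance: {joule_energy(i_write, r_bilayer, t_write):.3e} J")
```

Because the energy scales with the square of the current, the ultra-low neuron switching current is the dominant lever here, which is consistent with the two-perspective argument above.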
