Recent years have witnessed growing interest in the field of brain-inspired computing based on neural-network architectures. In order to translate the related algorithmic models into powerful, yet energy-efficient cognitive-computing hardware, computing-devices beyond CMOS may need to be explored. The suitability of such devices to this field of computing would strongly depend upon how closely their physical characteristics match with the essential computing primitives employed in such models. In this work we discuss the rationale of applying emerging spin-torque devices for bio-inspired computing. Recent spin-torque experiments have shown the path to low-current, low-voltage and high-speed magnetization switching in nano-scale magnetic devices. Such magneto-metallic, current-mode spin-torque switches can mimic the analog summing and 'thresholding' operation of an artificial neuron with high energy-efficiency.
I. Introduction
Several neural-network based computing models have been explored in recent years for realizing hardware that can perform brain-like cognitive computing [1]. The fundamental computing units of such systems can be identified as the 'neurons', which connect to each other and to external stimuli through adaptable or programmable connections called 'synapses' [1].
A large number of neurons can be connected in different network topologies to realize different neural-network architectures. For instance, cellular neural networks employ near-neighbor connectivity [2], whereas feed-forward networks employ all-to-all connections between neurons in consecutive network stages (fig. 1a) [3]. Several other network paradigms, like Convolutional Neural Networks (CNN) [4] and Hierarchical Temporal Memory (HTM) [5], may employ pyramidal interconnections in which a larger number of neurons in a lower level of the network connect to fewer neurons at the next higher level. The more recent network models possess higher learning capacities and are capable of performing more complex cognitive-computing tasks [5].
Irrespective of the network topology, the energy efficiency, the performance and the integration density of neuromorphic hardware would be governed by the design of the fundamental computing units, i.e., the neurons. The basic operation of a step-transfer-function neuron can be expressed as a 'sign' or 'threshold' operation given by eq. 1 [1]:
$$ Y = \mathrm{sign}\left(\sum W_i I_i + b_i\right) \tag{1} $$
where $I_i$ denotes the $i$-th input to the neuron, $W_i$ the corresponding synapse weight and $b_i$ the neuron bias. The input weights (which can be positive or negative) can be realized using compact, programmable, non-volatile resistive elements, namely memristors [6]. Several different device techniques, including spintronic memristors [7], have been proposed and demonstrated in the literature [8]. Application of input voltages to such resistive input weights results in analog currents that are summed and compared with a threshold (which can be zero) by the neuron.
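As a rough illustration of eq. 1, the step-transfer-function neuron can be sketched in a few lines of Python. The function name `step_neuron`, the example values and the convention that a sum of exactly zero maps to +1 are assumptions made for this sketch, not part of the original model description.

```python
def step_neuron(inputs, weights, bias=0.0):
    """Return +1 or -1 according to the sign of the weighted sum plus bias (eq. 1)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Assumed convention: a sum of exactly zero is mapped to +1.
    return 1 if total >= 0 else -1


# Illustrative values only (not taken from the paper).
example_inputs = [1, -1, 1]
example_weights = [0.5, 0.25, -0.5]
print(step_neuron(example_inputs, example_weights))  # weighted sum is -0.25
```

In hardware, the weighted sum corresponds to the summed synapse currents and the comparison to the neuron's thresholding action.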
Conventionally, CMOS operational amplifiers have been used in the literature for implementing the analog summation and thresholding operation of neurons [3, 7]. However, such schemes may not lead to scalable and energy-efficient designs. We proposed the application of ultra-low-voltage, current-mode spin-torque switches as neurons in our recent work [9] [10] [11] [12]. In this work we present the basic rationale of using nano-scale spin-torque switches as 'neurons' for the design of highly energy-efficient neural networks [10]. We discuss the specific device characteristics of such spin-torque switches that lend themselves to an efficient mapping of the neuron equation (eq. 1).
We show how the terminal characteristics of spin-neurons can provide more than three orders of magnitude reduction in energy-delay product as compared to the conventional CMOS circuits.
II. Conventional Neuron Circuit
Fig. 1b depicts an ideal circuit-model for a neuron with step transfer-function given by eq. 1.
The synapse weights are implemented using programmable conductance elements $G_i$ (which can potentially have negative values). Input voltages $V_i$ applied to the synapses result in a current $\sum G_i V_i$, which can be either positive or negative, depending upon the set of inputs and the weights. The neuron output, acting as a current-dependent binary voltage source, assumes a high (+1) or a low (-1) value, depending upon the sign of the total current. It is important to note the essential input characteristics provided by the ideal neuron model. The input port provides a fixed potential (in this case, ground potential) and offers a small input impedance (ideally zero). This essentially implies that there is negligible change in the potential at the input port.
Note that any significant deviation of the input potential from the desired value would result in a net current of $\sum G_i (V_i - V_{in})$, where $V_{in}$ is the non-zero input potential. This would cause erroneous network outputs when $V_{in}$ varies randomly for different neurons.
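The sensitivity to the input-node potential can be illustrated numerically. The sketch below evaluates the net current for an ideal (grounded) input node and for a shifted node; the helper name `net_input_current` and all conductance and voltage values are hypothetical and chosen only to show a sign flip.

```python
def net_input_current(conductances, voltages, v_in=0.0):
    """Net current into the input node: sum of G_i * (V_i - V_in)."""
    return sum(g * (v - v_in) for g, v in zip(conductances, voltages))


# Hypothetical values: conductances in siemens, input voltages in volts.
G = [4e-6, 3e-6, 2e-6]
V = [0.5, -0.5, -0.5]

ideal = net_input_current(G, V)               # input node held at 0 V
shifted = net_input_current(G, V, v_in=-0.1)  # input node off by -100 mV

# The sign of the summed current, and hence the neuron output, changes.
print(ideal < 0, shifted > 0)
```

Because the error term scales with the sum of all conductances, even a modest shift in the node potential can dominate a small net current.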
A practical CMOS circuit design to implement the ideal neuron model presented in fig. 1b is given in fig. 1c. An operational amplifier (OPAMP) is used at the first stage of the circuit, which, for a sufficient amplification gain, forces its two inputs to remain close to each other. Thus, by applying a fixed voltage on one of the two inputs (ground potential $V_g$), the other input (which is used as the neuron-input terminal) is also clamped to the same potential.
Assuming $V_g = 0$, the output voltage of the OPAMP can be written as $V_o = (1/G_R) \sum G_i V_i$, which can be positive or negative. The result is compared with zero using a comparator. For an appropriate choice of $G_R$, the output voltage swing can be made sufficiently large so that a simple inverter can be used as a comparator in the second stage.
This example shows that the conventional circuit model of a neuron employs an OPAMP to provide a low-impedance (fixed-voltage) input node for linear summation of input currents, and for transimpedance conversion of the current-mode summation, to yield the neuron output. Thus the energy efficiency and the performance of such a neuron model would be limited by the characteristics of the OPAMP, which is a power- and area-consuming circuit.
The summation term in eq. 1 can be divided into its positive (Σ(G i V i ) p ) and negative (Σ(G i V i ) n ) constituents. The result of the sign operation is determined by the difference between these two terms ( |Σ(
As an example, we obtained the network parameters for a 2-layer feed-forward neural network for character recognition using the method described in [9]. The output layer of the network has 26 neurons, each corresponding to one of the 26 alphabetic characters. Results show that after considering 10% σ variations in the input weights, we are left with less than 3% tolerance for the variation in the input node voltage. For an OPAMP supply as well as binary-input levels of ±0.5 V in 45 nm technology, this would translate to ~30 mV of tolerance. Notably, the random offsets in an OPAMP can be a few tens of millivolts [13]. The sizing and gain of the OPAMP must be large enough to meet the offset requirements. With the aforementioned constraints, we obtained the power consumption, delay, energy (per operation) and energy-delay product for the 25-input CMOS neuron circuit shown in fig. 1c, for different supply voltages (rail to rail). The results are given in fig. 2a-d. At the optimal point, the power consumption and the bandwidth (delay$^{-1}$) were found to be ~70 µW and ~100 MHz, respectively. This provided an optimal energy dissipation of ~0.7 pJ per neuron per cycle. The energy-delay product can be obtained as ~3.5 × 10$^{-21}$ J·s. The maximum current per synapse used for this case was ~3 µA. Notably, variability-related design constraints may become increasingly stringent at lower technology nodes for conventional analog circuits, leading to heightened design challenges.
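The quoted per-cycle energy and voltage tolerance follow from simple arithmetic, sketched below. Two points are assumptions of this sketch rather than statements from the text: that the delay is taken as the reciprocal of the quoted bandwidth, and that the 3% tolerance is measured against the full 1 V span between the -0.5 V and +0.5 V levels.

```python
# Figures quoted in the text for the CMOS neuron at its optimal point.
power_w = 70e-6        # ~70 uW
bandwidth_hz = 100e6   # ~100 MHz

# Assumption: delay is the reciprocal of the bandwidth.
delay_s = 1.0 / bandwidth_hz
energy_j = power_w * delay_s  # energy per neuron per cycle

# Assumption: 3% tolerance applied to the span between -0.5 V and +0.5 V.
tolerance_v = 0.03 * (0.5 - (-0.5))

print(f"{energy_j * 1e12:.2f} pJ")   # 0.70 pJ
print(f"{tolerance_v * 1e3:.0f} mV")  # 30 mV
```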
We next present the design and analysis of a spin-torque based neuron and discuss its energy benefits over the CMOS model discussed above.
III. Spin Torque Neuron
In our recent work, we proposed the application of spin-torque neurons for designing ultra-low-power neural networks. Device structures based on lateral spin valves [9, 11], as well as domain-wall magnets (DWM) [10, 12], were proposed. [14, 16, 17, 18]. Thus, a spin neuron with a 60 nm long free layer and a cross-sectional area of 20×2 nm² may be switched with a current of less than 10 µA within 1 ns [14, 22].
Recently, the application of spin-orbit (SO) coupling in the form of the Spin Hall Effect (SHE) has been proposed for low-current, high-speed domain-wall motion [19, 20, 21]. For a Néel-type DW, the SHE induced from an adjacent metal layer results in an effective magnetic field ($H_{SHE}$) [19] that can be expressed as $H_{SHE} = K(\sigma \times m)$. Here, $m$ denotes the magnetization of the magnetic domains. $\sigma$ is a current-dependent vector defined as $\sigma = j \times z$, where $j$ is the current vector (which can be positive or negative depending upon the direction of current flow) and $z$ is the direction perpendicular to the magnetization plane (along the easy axis). As shown in fig. 3a, $\sigma$ can point into or out of the plane of the figure, depending upon the direction of the current flow. $K$ is a quantity dependent upon the material parameters of the magnet and is proportional to the effective Spin-Hall angle, $\theta_H$ [19]. Notably, $\theta_H$ determines the effectiveness of the Spin-Hall interaction: a larger $\theta_H$ implies a larger effective torque due to the SHE.
For the Néel-type domain wall shown in fig. 3a, the magnetization in the region of the domain wall lies along the length of the magnetic nano-strip [19]. For this configuration, the effective $H_{SHE}$ acting on the domain-wall region can be visualized to be perpendicular to the plane of the magnet. The $H_{SHE}$ assists the non-adiabatic spin torque (which results from the current flow) acting on the domain-wall region. For a $\theta_H$ of 0.2, micromagnetic simulations showed an increase of ~5x in the domain-wall velocity for a given current density, due to the $H_{SHE}$ term (fig. 3d). This effect can be used to achieve a higher switching speed for a given current, or to reduce the required switching current for a given switching time for the free domain in the spin neuron [23].
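The direction of the effective field can be checked with two cross products. The sketch below assumes a coordinate system that is not specified in the text: the nano-strip (and hence the current and the wall magnetization) lies along x, and z is the out-of-plane axis. The helper names `cross` and `h_she_direction` and the choice K = 1 are illustrative only.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as (x, y, z) tuples."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)


def h_she_direction(j, z, m, k=1.0):
    """Evaluate H_SHE = K * (sigma x m) with sigma = j x z."""
    sigma = cross(j, z)
    return tuple(k * c for c in cross(sigma, m))


# Assumed axes: strip along x, out-of-plane direction along z.
z_axis = (0, 0, 1)
m_wall = (1, 0, 0)  # Neel wall magnetization along the strip length

print(h_she_direction((1, 0, 0), z_axis, m_wall))   # current along +x
print(h_she_direction((-1, 0, 0), z_axis, m_wall))  # current along -x
```

Under these assumptions the field has only a z component, perpendicular to the plane of the magnet, and it reverses sign with the current direction.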
In this work, a switching-current threshold of ~2 µA for 1 ns switching speed has been chosen for a neuron with an SHE-assisted free domain of size 20×2×60 nm³, which corresponds to
The energy dissipation for the spin neuron has two components. The first is the switching energy due to the static current flow between the input voltages and the neuron. This component equals the product of the total input current flowing across the synapses, the input voltage level and the neuron switching time. For an average of ~40 µA of current flow across input voltage levels of ±10 mV for a 1 ns switching time, this component evaluates to ~0.4 fJ. Noise considerations in state-of-the-art on-chip supply distribution schemes may limit the minimum input voltage levels that can be used. Even for ±100 mV input levels, which might be more easily achievable, the first energy component is limited to ~4 fJ, which is more than two orders of magnitude less than that obtained for the CMOS neuron. The second component of energy dissipation in the spin neuron can be ascribed to the MTJ-based read operation. A read current of ~0.3 µA (~10% of the neuron switching threshold) was found to be sufficient for 1 ns read speed. For a sensing supply voltage of 0.4 V, this evaluates to ~0.12 fJ. An additional ~0.2 fJ of energy dissipation comes from the inverter's operation. Thus the total energy dissipation in a spin neuron for 1 ns switching speed can be less than 1 fJ. This leads to the possibility of a three to four orders of magnitude improvement in energy-delay product as compared to a conventional CMOS implementation. Apart from ultra-high energy efficiency, another attractive feature of spin neurons is their compactness. In the CMOS layer, a compact CMOS inverter replaces an area-consuming OPAMP. Hence, spin neurons can facilitate higher integration density for neural-network circuits.
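The energy budget above can be reproduced with a short calculation. The sketch below uses only the figures quoted in the text; taking the 1 ns switching time as the delay in the spin neuron's energy-delay product is an assumption of this sketch, and the helper name `static_energy` is illustrative.

```python
def static_energy(current_a, voltage_v, time_s):
    """Energy of a constant current across a fixed voltage for a given time."""
    return current_a * voltage_v * time_s


t_switch = 1e-9  # 1 ns switching / read time

e_synapse = static_energy(40e-6, 10e-3, t_switch)  # ~0.4 fJ (10 mV levels)
e_read = static_energy(0.3e-6, 0.4, t_switch)      # ~0.12 fJ (MTJ read)
e_inverter = 0.2e-15                               # ~0.2 fJ, quoted in the text
e_total = e_synapse + e_read + e_inverter          # ~0.72 fJ

# Assumption: the spin neuron's delay is taken as its 1 ns switching time.
edp_spin = e_total * t_switch
edp_cmos = 3.5e-21  # J*s, quoted in the text for the CMOS neuron

print(f"total energy: {e_total * 1e15:.2f} fJ")
print(f"EDP ratio (CMOS / spin): {edp_cmos / edp_spin:.0f}")  # about 4.9e3
```

Under these assumptions the ratio falls between three and four orders of magnitude, consistent with the estimate given above.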
A 3x3 neural-network circuit using spin neurons is shown in fig. 4b. The network has two conductances (that can be implemented using multi-level spintronic memristors), $G_{i+}$ and $G_{i-}$, for each input $in_i$. When an input is high (logic '1'), voltage signals $+\Delta V$ and $-\Delta V$ are applied to the conductances $G_{i+}$ and $G_{i-}$ respectively, resulting in a proportional current flow into the input terminal of the neuron, as shown in fig. 4b. The net current due to the $i$-th input $in_i$, injected into the $j$-th neuron, can therefore be written as $\Delta V (G_{ij+} - G_{ij-})$. Thus, the input weights needed for the neurons can be obtained by programming $G_{i+}$ and $G_{i-}$ to appropriate states.
The write path of the neuron is connected to ground. Using Kirchhoff's law it can be visualized that the net current flowing into the input node of the neurons is given by the following equation:
This expression is essentially the same as the term within the parentheses in eq. 1. The sign function over the current-mode summation is carried out by the spin neurons, thus realizing the energy-efficient neural-network functionality. At the level of network design, another noticeable advantage of spin neurons is the ultra-low energy dissipation in the cross-bar interconnects of the synapse network shown in fig. 4b. This results from the ultra-low-voltage operation of the entire network, facilitated by the spin neurons.
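The current-mode summation in such a cross-bar can be sketched as follows. This is a minimal illustration under stated assumptions: a low input is taken to apply no voltage (the text only describes the high case), the neuron input nodes are treated as ideal grounds, and the function name `crossbar_currents` and all conductance values are hypothetical.

```python
def crossbar_currents(inputs, g_plus, g_minus, delta_v):
    """Sum, per neuron j, the currents delta_v * (G+[i][j] - G-[i][j]) of all high inputs i."""
    n_neurons = len(g_plus[0])
    currents = [0.0] * n_neurons
    for i, bit in enumerate(inputs):
        if not bit:
            continue  # assumption: a low input contributes no current
        for j in range(n_neurons):
            currents[j] += delta_v * (g_plus[i][j] - g_minus[i][j])
    return currents


# Hypothetical 3x3 conductance matrices in siemens; rows are inputs, columns neurons.
g_plus = [[3e-6, 1e-6, 2e-6], [1e-6, 3e-6, 1e-6], [2e-6, 1e-6, 3e-6]]
g_minus = [[1e-6, 2e-6, 2e-6], [2e-6, 1e-6, 3e-6], [1e-6, 3e-6, 1e-6]]

currents = crossbar_currents([1, 0, 1], g_plus, g_minus, delta_v=10e-3)
outputs = [1 if c >= 0 else -1 for c in currents]  # thresholding by each neuron
print(outputs)  # [1, -1, 1]
```

Each column of the two matrices plays the role of the signed weight vector of one neuron, with the sign set by which of the paired conductances is larger.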
IV. Conclusion
In this article we explained the rationale of using nano-scale spin-torque switches as "neurons" for the design of energy-efficient neuromorphic computers. Using simple device-circuit analysis we showed that spin neurons provide essential terminal characteristics like low input impedance
