We present an artificial neural network design using spin devices that achieves ultralow voltage operation, low power consumption, high speed, and high integration density. We employ spin-torque-switched nanomagnets to model neurons and domain-wall magnets as compact, programmable synapses. The spin-based neuron-synapse units operate locally at an ultralow supply voltage of 30 mV, resulting in low computation power. CMOS-based interneuron communication is employed to realize network-level functionality. We corroborate circuit operation with physics-based models developed for the spin devices. Simulation results for character recognition as a benchmark application show 95% lower power consumption compared to a 45-nm CMOS design.
I. INTRODUCTION
HARDWARE implementation of computation architectures based on artificial neural networks (ANNs) has always been challenging in terms of power consumption, level of integration, and throughput. Prior work in this field involved the development of circuit models for neurons and synapses using CMOS [1]-[4]. Digital ANN designs proposed earlier consume large area and hence limit the level of integration [31]. On the other hand, analog designs, although compact, consume a large amount of power [2], [4].
In order to tap the potential of neural-network-based computation at the hardware level, the device-circuit models for the neuron and the synapse, apart from being compact, should also achieve low power consumption. In this study, we propose the application of spin devices in ANN hardware design that can help achieve these goals.
Recent experiments on lateral spin valves (LSVs) have shown spin-torque-induced switching of nanomagnets using spin-polarized current flow through metal channels [7], [8]. Such magneto-metallic LSVs can operate at ultralow terminal voltages, resulting in low switching energy [10]. A multi-input LSV can perform non-Boolean, analog-mode computation such as majority evaluation [12], [17]. All-spin logic design based on spin majority evaluation using LSVs was proposed previously [9], [11], [12]-[15]. We show that, with an appropriate clocking scheme, a spin majority gate with weighted inputs mimics the neuron-synapse functionality. The programmable spin injection strength of a domain-wall magnet (DWM) can be used to implement a compact synapse. In the proposed neuron-synapse model, charge current flows through a low-resistance path that consists of the nanomagnets and nonmagnetic metal channels. This allows application of ultralow terminal voltages, resulting in low power consumption.
Energy dissipation for spin-mode computation increases steeply with the separation between nanomagnets. This is due to the limited spin-diffusion length of nonmagnetic channels [9], [10]. Hence, spin-mode signaling between two neuron units proves inefficient. Therefore, we employ a CMOS-based charge-mode interneuron signaling scheme to realize network-level functionality. The programmable spin-CMOS hybrid ANN architecture presented in this paper thus combines the benefits of localized, spin-based, low-energy computation and robust charge-mode communication.
The rest of this paper is organized as follows. Section II describes the operation of the spin majority gate based on the LSV. A detailed description of the proposed neuron-synapse model is given in Section III. Section IV discusses system-level integration. The device simulation framework employed in this work is discussed in Section V. The performance of the spin-based ANN design for a benchmark application (character recognition) and its comparison with 45-nm CMOS analog and digital designs are given in Section VI. Summary and conclusions are given in Section VII.
II. MAJORITY GATE BASED ON LSV
Two different methods of current-induced spin-transfer torque (STT) switching of nanomagnets have been proposed in recent years. The first involves injection of spin-polarized charge current into a nanomagnet through another magnet. This phenomenon has been widely explored for magnetic tunnel junction (MTJ)-based memory applications [24], [25]. More recently, a second strategy has been demonstrated which employs pure spin-current injection for flipping a nanomagnet [7], [8]. Fig. 1(a) shows the LSV structure employed in this method. It consists of a transmitting magnet and a receiving magnet connected through a nonmagnetic channel. Electrons flowing into the channel through the transmitting magnet (which possesses "up-spin" polarization) become up-spin polarized when they reach the magnet-channel interface. Spin-polarized charge current is modeled as a four-component quantity: one charge component and three spin components ($I_{sx}$, $I_{sy}$, $I_{sz}$) [9], [10]. The charge component of the input current flows into the ground lead. The output magnet-channel interface absorbs the transverse spin components of the current, which in turn exert spin torque on the output magnet and cause it to flip. Owing to the separation of the spin-diffusion current responsible for nanomagnet switching from the charge current flow, spin transport in the LSV is often termed "nonlocal."

Fig. 1(b) shows the device structure for a five-input spin majority gate based on the LSV. The majority function can be employed to perform non-Boolean computations. A clock-synchronized operation of the spin majority gate can be compared to that of a neuron, if the output magnet's state is restored after every flip. The two spin-polarization states of the input magnets are analogous to bipolar, binary synapse weights with values ±1. In this study, we propose the use of DWMs as input synapses to realize programmable, bipolar, multilevel weights for a spin-based neuron model.
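As a rough illustration of the majority evaluation described above, the following sketch treats each input magnet as a ±1 spin polarity and takes the sign of the sum. The function name `spin_majority` and the integer encoding are assumptions made for illustration only; the physical device sums spin currents in the channel rather than integers.

```python
def spin_majority(polarities):
    """Return +1 or -1, the majority polarity of an odd number of +1/-1 inputs."""
    if len(polarities) % 2 == 0:
        raise ValueError("use an odd number of inputs to avoid ties")
    if any(p not in (-1, 1) for p in polarities):
        raise ValueError("each input must be +1 (up-spin) or -1 (down-spin)")
    # Idealized channel: the net spin polarity is the sign of the summed inputs.
    return 1 if sum(polarities) > 0 else -1


# Five-input example: three up-spin inputs outvote two down-spin inputs.
print(spin_majority([1, 1, -1, 1, -1]))
```

Replacing the ±1 entries with multilevel weights leads to the weighted-sum neuron discussed in Section III.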
To reduce the amount of average current injection per synapse, we incorporate current-mode Bennett clocking in the neuron model [9] . It involves switching the nanomagnet to an intermediate metastable state from which it can be switched back to one of its stable states with a very small current. In the proposed neuron model, the output magnet is switched with nonlocal spin torque, i.e., with pure spin current. It will be shown that this technique is helpful in achieving ultralow voltage operation and hence low power consumption.
III. SPIN-BASED NEURON-SYNAPSE MODEL
In this section, we present the spin-based neuron-synapse model. First, we discuss the application of DWM as a synapse. Following this, the neuron model is described which is based on the LSV structure discussed in Section II.
A. DWM as Synapse
A DWM, shown in Fig. 2(a), consists of two ferromagnetic domains separated by a nonmagnetic region or domain wall (DW). A DW is formed in a magnetic nanostrip due to the balance between the anisotropy and exchange energies present in the nanomagnet [18]. The DW can be moved along a magnetic nanostrip by application of a magnetic field [18] or by injection of charge current along the nanostrip [19]. Fig. 2(b) shows the simulation plot for DW velocity versus injected current density, benchmarked with experimental data in [20].
Application of DWM in the design of nonvolatile memory [21] and logic design [22] has been explored by several authors. In this study, we propose the use of DWM as synapse, where its programmable spin injection strength is used for implementing spin-mode weighting operation. Fig. 3 (a) shows a DWM interfaced with the nonmagnetic channel of a neuron.
In order to write the weight into the DWM, current is injected along the length of the DW as shown in Fig. 3 (a). Under this condition, the channel is kept in a floating state. A thin MgO layer incorporated at the top and bottom surface of the DWM reduces the fringe current passing through the parallel path provided by the floating channel and the input lead, during the write operation. The interface oxide also imparts an effective resistance to the input lead of the DWM that makes it dominate the parasitic resistance of the signal-routing metal lines.
During computation, the input current is injected into the channel through the DW in the vertical direction. Fig. 3 (b) shows the plot for spin polarization of current passing into the channel through the DWM versus DW location for different charge current values. It can be observed that spin polarization strength of the charge current reaching the channel is proportional to the offset of the DW location from the center. For the extreme left location of the DW, the charge current reaching the metal channel is maximally up-spin polarized and vice versa. The net polarization is reduced to zero for the central location of the DW, as equal amount of up and down spin electrons are injected into the channel in this case.
In the simplest case, the two extreme locations of the DW can be employed for implementing programmable binary weights. Neural networks with binary weights can be applied to logic synthesis and pattern recognition applications [28], [29]. However, a network with binary weights may require a larger number of neurons for a given operation than a network with a higher number of weight levels, depending upon the size of the exhaustive training set [29]. A larger number of weight levels can be obtained by employing longer DWM strips that facilitate better quantization of the DW location. It has been shown that incorporation of nanoscale notches in the DWM strips can enhance the stability of the DW at the notch sites [23]. The incorporation of notches along the length of the DWM synapse can help in achieving higher writing accuracy. In this study, we incorporate DWM synapses with a cross-sectional area of 350 nm × 80 nm. Notches etched at 22-nm intervals along the 350-nm-long DWM strip can provide 16 levels of weight. Fig. 4 shows the magnetization state of the DWM at equal time intervals after the application of a 250-ps voltage pulse train.
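The relation between notch positions and weight levels can be sketched as follows. This is a minimal illustration, assuming that the 16 notches are centered on the 350-nm strip and that the weight varies linearly with the DW offset from the center, normalized so that the outermost notches give ±1. The function name and the centering assumption are hypothetical and not taken from the device design.

```python
def notch_weights(strip_length_nm=350.0, pitch_nm=22.0, levels=16):
    """Map each notch position to a normalized weight in [-1, +1]."""
    span = (levels - 1) * pitch_nm            # distance between first and last notch
    if span > strip_length_nm:
        raise ValueError("notches do not fit on the strip")
    margin = (strip_length_nm - span) / 2.0   # assumed: notches centered on the strip
    center = strip_length_nm / 2.0
    weights = []
    for k in range(levels):
        x = margin + k * pitch_nm             # DW position measured from the left end
        # Assumed linear model: leftmost notch -> +1 (up-spin), rightmost -> -1.
        weights.append((center - x) / (span / 2.0))
    return weights


for level, w in enumerate(notch_weights()):
    print(level, round(w, 3))
```

With an even number of levels, no notch sits exactly at the center, so this mapping has no zero-weight level; whether the actual layout includes one is not stated in the text.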
Physics-based device modeling of DW synapse is discussed in Section V.
B. Spin-Based Neuron Model
The transfer function of an "integrate and fire" neuron is given by

$$y = f\left(\sum_i w_i I_i + b\right). \qquad (1)$$

Here, $w_i$ and $I_i$ are the weights and corresponding inputs, and $b$ is the neuron bias. The bias can be chosen to be zero. It does, however, aid training convergence and can be easily implemented by an additional synapse magnet driven by a clock.

Fig. 5. Spin-based neuron model with three inputs (DWM synapses). The free layer of the neuron MTJ is in contact with the channel, and its polarity, after preset, is determined by the spin polarity of the combined input current in the channel region just below it.

The function $f(x)$ is given by (2) and approximates a step transfer function for a sufficiently large $N$.
Here, t denotes the threshold of the neuron. It can be inferred that a higher |t| would require a larger value of |x| to switch the neuron. For a given set of normalized weights W i , this translates to larger levels of the input signals I i . For the spin-based neuron model, this implies larger input current per synapse and hence higher power consumption. Therefore, switching threshold of the output nanomagnet needs to be reduced. We incorporate current-mode Bennett clocking to achieve this.
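The weighted-sum-and-threshold behavior described above can be sketched in a few lines. The exact form of (2) is not reproduced here, so the example uses `tanh(N(x - t))` purely as an assumed stand-in for a symmetric transfer function that approaches a step for large `N`. The function and parameter names are likewise illustrative.

```python
import math


def neuron_output(weights, inputs, bias=0.0, threshold=0.0, steepness=50.0):
    """Weighted sum of inputs followed by an assumed steep, symmetric transfer function."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    x = sum(w * i for w, i in zip(weights, inputs)) + bias
    # Assumed stand-in for (2): tanh approaches a +/-1 step as steepness grows.
    return math.tanh(steepness * (x - threshold))


# Three synapses with unit inputs: a positive net weight drives the output toward +1.
print(neuron_output([0.5, -0.25, 1.0], [1, 1, 1]))
# A larger threshold requires a larger weighted sum before the output goes positive.
print(neuron_output([0.5, -0.25, 1.0], [1, 1, 1], threshold=2.0))
```

The second call shows the point made in the text: raising |t| means a larger |x|, and hence larger input currents, is needed to switch the neuron.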
The device structure for the neuron with three inputs is shown in Fig. 5. The "firing magnet" forms the free layer of an MTJ. The two antiparallel, stable polarization states of a magnet lie along its easy axis (see Fig. 5). The direction orthogonal to the easy axis is an unstable polarization state for the magnet and is referred to as its hard axis [9], [12]. The preset magnet shown in Fig. 5 has its easy axis orthogonal to that of the neuron magnet (the MTJ free layer, which is in contact with the channel). At the beginning of a clock period, a current pulse injected through the preset magnet forces the neuron magnet into the hard-axis configuration (see Fig. 6). As soon as the hard-axis biasing pulse goes low, the free layer makes a transition to the easy-axis polarity governed by the polarity of the net spin polarization of the channel current flowing under it. As a result, the firing magnet, i.e., the free layer of the MTJ, acquires either parallel or antiparallel polarization with respect to the fixed layer. Note that summation of the "spin-weighted" input currents (1), received through multiple DWM synapses, takes place in the metal channel, whereas the symmetric step-transfer function applied to the summed spin current (2) is realized with the help of Bennett clocking of the neuron magnet.
When the clock is low, a CMOS-based detection unit (discussed later) reads the state of the neuron MTJ. For a parallel configuration, it generates a high output, whereas for the antiparallel configuration, it settles to a low value. Hence, the detection unit converts the spin-mode information of the neuron magnet's state into a charge-mode signal. For a particular stage of the network, spin- and charge-mode evaluations occur in alternate clock phases (see Fig. 6). For a multistage, feed-forward neural network, neurons in alternate stages are driven by complementary clock phases. This results in a fully parallel and pipelined network.
In the proposed neuron model, the use of nonlocal STT switching allows a low resistance path for static charge current flow that includes the DWM synapse and the nonmagnetic channel. This allows application of very small voltages, which in turn results in ultralow energy operation for the magnetometallic neuron-synapse unit. The detection scheme, discussed later, involves negligibly small transient current flow through the high-resistance MTJ stack.
Performance metrics of the neuron device, such as spin injection efficiency, switching energy, and switching speed, can be improved by the appropriate choice of magnet parameters, device geometry, and operating conditions. Nonlocal spin injection efficiency in the device can be defined as the ratio of the spin current $I_s$ injected into the output magnet to the net spin-polarized charge current in the channel under the neuron MTJ. As discussed earlier, the spin components of the combined synapse current are divided between the output magnet and the ground lead. Thus, the spin injection efficiency for a given charge current input is enhanced by increasing the resistance of the ground lead [see Fig. 7(a)].
A smaller volume for the output magnet, along with a higher coercive field $H_k$, leads to higher switching speed for a given spin current [see Fig. 7(b)] [10]. It also leads to faster easy-axis restoration [see Fig. 8(a)]. In order to maintain the spin injection efficiency, the resistance of the ground lead needs to be scaled up proportionately.
Hard-axis switching energy is a significant portion of the energy dissipation per neuron, per cycle. Fig. 8(b) shows that the hard-axis switching current increases with switching speed (approximately in direct proportion [10]). Hence, for a given terminal voltage, the switching energy remains almost constant. In this study, the hard-axis biasing current is supplied through a transistor operating across a small terminal voltage. In order to allow a small transistor width and, hence, lower clocking power, it is favorable to choose the smallest possible value for the switching current and hence the maximum possible preset pulse width for a given operating frequency. In this study, we employed a preset current pulse of amplitude 300 μA and pulse width 0.5 ns.
C. Modular Neuron-Synapse Unit
A center-surround layout for a neuron with 12 input synapses is shown in Fig. 9 . Spin-polarized charge current inputs from DWM synapses combine in the channel and flow into the ground lead located near the neuron MTJ. Spin polarization strength of charge current decays exponentially with the distance traveled along the nonmagnetic channel. Thus, the channel length between the synapses and the neuron must be within one to two times spin-flip length λ [9] , [10] . This imposes a limit on the number of input synapses for the structure shown in Fig. 9 . For copper channel (λ ∼ 1 μm) up to ∼32 synapses can be combined directly. For graphene channel (λ ∼ 6 μm), this number can be increased.
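The exponential decay of spin polarization with channel length can be made concrete with a short calculation. The sketch below assumes a simple `exp(-d/λ)` attenuation and uses the spin-flip lengths quoted above for copper and graphene; the function name and the chosen distances are illustrative only.

```python
import math


def spin_polarization_remaining(distance_um, spin_flip_length_um):
    """Fraction of the injected spin polarization left after traveling distance_um."""
    return math.exp(-distance_um / spin_flip_length_um)


# Spin-flip lengths quoted in the text: ~1 um for copper, ~6 um for graphene.
for name, lam_um in (("copper", 1.0), ("graphene", 6.0)):
    for d_um in (0.5, 1.0, 2.0):
        frac = spin_polarization_remaining(d_um, lam_um)
        print(f"{name}: d = {d_um} um, remaining fraction = {frac:.3f}")
```

Under this model, a synapse placed two spin-flip lengths away retains roughly 14% of its injected polarization, which is consistent with restricting the synapse-to-neuron distance to one to two times λ.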
Limited spin-diffusion length also introduces mismatch between the strengths of different DWM synapses, depending upon their location with respect to the neuron magnet. The two synapses S 1 and S 2 depicted in Fig. 9 are the closest and the farthest synapse from the neuron magnet, respectively. For a neuron with 16 input synapses, this effect does not introduce a significant mismatch [see Fig. 10(a) ]. However, for a 32-input neuron, the mismatch is quite prominent [see Fig. 10(b) ]. The mismatch can be mitigated by slightly grading the magnitude of synapse current injection into the DWM synapses so as to equalize all the weights [see Fig. 10 
The DW programming interface is also depicted in Fig. 9, where the contact vias and the path for DWM writing current flow are indicated. Selection of a pair of transmitting and receiving neurons identifies the synapse to be programmed. Thus, only two transistors per neuron (for identifying it as a receiving or transmitting neuron) suffice for programming the whole network. In the case of cellular architectures based on arrays of identical neuron units, the whole array can be programmed in parallel [32]. Fig. 11 depicts the plot for spin potential in the central region of the channel, surrounding the output magnet of a 16-input neuron, under firing and nonfiring conditions. It shows that, in the case of a firing event, the entire channel is dominantly at a positive spin potential and vice versa.
IV. SYSTEM INTEGRATION
Due to the small spin-diffusion length of metal channels, spin-mode signaling cannot be used for network connectivity. Hence, in this study, the spin-based neuron-synapse modules are interconnected through charge-mode signaling using CMOS. The spin-mode "firing" information is converted into a charge-mode signal using the dynamic CMOS latch shown in Fig. 12(a).
It compares the effective resistance of the MTJ units in its two load branches. The firing MTJ of the neuron unit connects to one of the loads, whereas a reference MTJ is connected to the other.
The latch drives a distributed set of current source transistors, which in turn supply charge current to all receiving neurons through the respective input magnets (DWMs) [see Fig. 12(b)]. The source terminal of the current source transistors and the ground lead of the spin-based neuron modules are biased at V+ΔV and V volts, respectively. Hence, the synapse current flows across a small terminal voltage of ΔV. In this study, the values of V and ΔV are chosen to be 800 and 30 mV, respectively. The CMOS units operate between 800 mV and 0 V. Biasing of the spin modules between two relatively high dc levels proves advantageous compared to direct application of a small supply voltage of magnitude ΔV. This is because application of a differential dc supply can mitigate the impact of I-R voltage drop along the supply lines. It can also be exploited to reject the common-mode noise in the dual supply lines. Moreover, generation of clean dc levels below 100 mV is challenging in state-of-the-art CMOS technology, whereas a regulated voltage supply of higher magnitude can be distributed with less than 0.1% fluctuation [30]. For supplying a current of 5 μA per synapse (across a drain-to-source voltage of 30 mV) for 16 receiving neurons, the required source transistor width in 45-nm technology is around 2.5 μm. In order to minimize the impact of synapse current mismatch, distributed source transistors are used.

Fig. 13 depicts the correspondence between the proposed spin-CMOS hybrid ANN and the biological neural network. The spin potential of the 2-D metal channel (which is analogous to the neuron cell body) depicted in Fig. 11 can be related to the electrochemical potential in a biological neuron's cell body [33]. Interneuron communication in the present design is realized using ultralow voltage current transmission, which is somewhat similar to the natural mechanism [33].
However, the aim of the proposed model is not to mimic the biological neural network in terms of functionality, but to evolve a model for ANN suitable for computational hardware.
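The biasing figures quoted in this section lend themselves to a quick arithmetic check. The sketch below computes the total current delivered by one transmitting neuron and the static power dissipated across ΔV, using the 5-μA synapse current, 30-mV ΔV, and fan-out of 16 from the text. It is a back-of-the-envelope estimate under the assumption that all 16 synapses conduct simultaneously, and it does not include the power of the CMOS latch or clocking; the function name is hypothetical.

```python
def synapse_drive(current_per_synapse_a, delta_v, fanout):
    """Return (total current in A, static power in W) for one transmitting neuron."""
    total_current = current_per_synapse_a * fanout
    power = total_current * delta_v      # dissipation across the small terminal voltage
    return total_current, power


total_i, power = synapse_drive(5e-6, 30e-3, 16)
print(f"total current: {total_i * 1e6:.1f} uA")
print(f"power across delta-V: {power * 1e6:.2f} uW")
```

For these values the estimate comes to 80 μA and 2.4 μW per transmitting neuron.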
V. SIMULATION FRAMEWORK
In this section, we describe the physics-based simulation framework used in this study for simulating the spin-based neuron-synapse units.
In order to simulate the neuron model, which is based on the LSV structure shown in Fig. 1(a), we need to self-consistently solve both the transport and the magnet dynamics equations. In our model, the channel spin transport is based on the spin-diffusion model developed by Valet and Fert [26]. The magnet-channel interface is modeled based on the interface model developed by Brataas et al. [27]. Both these models are well established and are used for spin transport in long channels [9]-[11]. The spin-diffusion formulation yields four-component conductance matrices $G_{magnet}$, $G_{lead}$, $G_{int}$, and $G_{ch}$ for the elements of nanomagnets, supply leads, magnet-channel interface, and the nonmagnetic channel, respectively. The four components are the charge and the three spin components.
The nonmagnetic channel and lead elements are modeled as π-conductance matrices with $G_{sh}$ and $G_{se}$ as the shunt and series components, respectively [11].
Here, $g_{sh} = (A/\rho\lambda)\tanh(l/2\lambda)$ and $g_{se} = (A/\rho\lambda)\,\mathrm{csch}(l/\lambda)$, where $l$ is the length of the contact, $A$ is the area of the contact, $\rho$ is the resistivity, and $\lambda$ is the spin-flip length. These conductance matrices are obtained by solving the spin-diffusion equation as shown in [11]. The magnet-channel contact interface can be described through the matrix $G_{int}$,
where $g = 2 - r_l r_l^* - r_r r_r^*$, $gP = r_r r_r^* - r_l r_l^*$, $\Gamma = 1 - r_l r_r^*$, and $P$ is the polarization of the magnet. $r_l$ and $r_r$ are the reflection coefficients corresponding to left and right spin, respectively. The components of the interface matrix depend upon the nanomagnet's magnetization state and are evaluated self-consistently with the magnet dynamics. Note that the elements of $G_{sh}$ are responsible for the decay of spin current along the channel due to spin-flip scattering [10].
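The scalar conductances $g_{sh}$ and $g_{se}$ defined above can be evaluated directly. The sketch below is a minimal illustration of those two formulas only; it does not build the full four-component matrices. The numerical values for area, resistivity, and segment length are hypothetical placeholders, not device parameters from this work.

```python
import math


def pi_spin_conductances(area_m2, resistivity_ohm_m, spin_flip_length_m, length_m):
    """Return (g_sh, g_se) for one pi-section of a nonmagnetic channel segment."""
    g0 = area_m2 / (resistivity_ohm_m * spin_flip_length_m)   # A / (rho * lambda)
    ratio = length_m / spin_flip_length_m                     # l / lambda
    g_sh = g0 * math.tanh(ratio / 2.0)    # shunt term: models spin relaxation
    g_se = g0 / math.sinh(ratio)          # series term: csch(x) = 1 / sinh(x)
    return g_sh, g_se


# Hypothetical 10-nm-long segment with a 100 nm x 10 nm cross section and
# copper-like values (rho = 7e-8 ohm*m and lambda = 1 um are assumed here).
g_sh, g_se = pi_spin_conductances(100e-9 * 10e-9, 7e-8, 1e-6, 10e-9)
print(f"g_sh = {g_sh:.3e} S, g_se = {g_se:.3e} S")
```

For a segment much shorter than λ, $g_{se}$ approaches the ordinary conductance $A/(\rho l)$ and $g_{sh}$ becomes small, which matches the expectation that little spin relaxation occurs over a short section.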
The nanomagnet dynamics is captured by solving the Landau-Lifshitz-Gilbert (LLG) equation (6), self-consistently with spin diffusion
Here, $m$ is the magnetization vector, $\alpha$ is the damping constant, $N_s$ is the number of spins in the magnet, $\gamma$ is the gyromagnetic ratio, $H$ is the effective magnetic field, and $I_s$ is the spin current, which is obtained from the transport framework. This simulation framework has been benchmarked with experimental data on LSVs [10]-[12]. This approach leads to the mapping of a spin device structure, involving nanomagnets interacting through nonlocal spin transport, into an equivalent "spin circuit" [10]. The circuit model for the LSV is shown in Fig. 14. The spin-circuit approach, discussed previously, is extended to the 2-D neuron-synapse model shown in Fig. 9, where the channel is modeled as a 2-D grid of 10 nm × 10 nm sections.
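To illustrate how the magnet dynamics respond to a small spin current after a hard-axis preset, the following single-domain sketch integrates a commonly used form of the LLG equation with a spin-torque term proportional to $m \times (I_s \times m)/(qN_s)$. This form, the uniaxial anisotropy field along z, the Heun integrator, and all parameter values are assumptions made for illustration; they are not taken from the simulation framework described here, and thermal noise and coupling to the transport solver are omitted.

```python
import math

GAMMA = 1.76e11   # gyromagnetic ratio in rad/(s*T), field expressed in tesla
Q = 1.602e-19     # electron charge in C


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def llg_rhs(m, i_s, alpha, b_k, n_s):
    """dm/dt from the explicit form of LLG with an assumed spin-torque term."""
    b_eff = (0.0, 0.0, b_k * m[2])                 # assumed uniaxial anisotropy, easy axis z
    m_x_b = cross(m, b_eff)
    m_x_m_x_b = cross(m, m_x_b)
    tau = tuple(c / (Q * n_s) for c in cross(m, cross(i_s, m)))  # m x (I_s x m) / (q N_s)
    m_x_tau = cross(m, tau)
    scale = 1.0 / (1.0 + alpha * alpha)
    return tuple(scale * (-GAMMA * m_x_b[k] - alpha * GAMMA * m_x_m_x_b[k]
                          + tau[k] + alpha * m_x_tau[k]) for k in range(3))


def relax_from_hard_axis(i_sz, alpha=0.1, b_k=0.1, n_s=2.0e5, dt=2e-12, t_end=5e-9):
    """Start along the hard axis (x) and integrate with Heun's method."""
    m = (1.0, 0.0, 0.0)
    i_s = (0.0, 0.0, i_sz)                         # spin current polarized along the easy axis
    for _ in range(int(round(t_end / dt))):
        k1 = llg_rhs(m, i_s, alpha, b_k, n_s)
        pred = tuple(m[k] + dt * k1[k] for k in range(3))
        k2 = llg_rhs(pred, i_s, alpha, b_k, n_s)
        m = tuple(m[k] + 0.5 * dt * (k1[k] + k2[k]) for k in range(3))
        norm = math.sqrt(sum(c * c for c in m))
        m = tuple(c / norm for c in m)             # keep |m| = 1
    return m


print(relax_from_hard_axis(+2e-6))   # small positive spin current
print(relax_from_hard_axis(-2e-6))   # small negative spin current
```

In this idealized model, the sign of the small spin current alone decides which easy-axis state the magnet settles into, which is the behavior Bennett clocking relies on.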
The device model for the DW structure is derived from the aforementioned spin-diffusion model. It consists of a 2-D grid of nanomagnets obtained by dividing the nanostrip into square grids (10 nm × 10 nm) as depicted in Fig. 15. Each nanomagnet is modeled as a conductance network with shunt and series components $G_{0F}$ and $G_F$ (four-component spin transport model), respectively, using the Valet-Fert diffusion model [26], [27]. The resulting spin circuit is shown in Fig. 15. It yields the spin current components at each lattice point for a given input voltage. These spin currents are used to evaluate LLG at each point to capture the nanomagnet dynamics. The conductance matrices depend upon the magnetization state of the grid points, and hence, the spin-diffusion transport is solved self-consistently with LLG at each grid point. We benchmarked our simulation framework for the DWM with experimental data in [20]. The corresponding plot for DWM velocity as a function of charge current density is shown in Fig. 2(b). The effect of the channel interface on the writing process is incorporated by including the nanomagnet-channel interface conductance matrix in series with the channel conductance matrix at each grid point as shown in Fig. 15. The interface conductance matrix consists of spin-dependent conductance components for MgO [16].

Fig. 14. (a) Fabricated LSV structure in [7]. (b) Depiction of structure in Fig. 1(a). (c) Spin-circuit model based on spin-diffusion model for the device in Fig. 1(a).
As discussed earlier, during computation, the input current is injected into the channel through the DW in the vertical direction. Hence, writing and computation modes are fully decoupled. Therefore, for the computation mode, the DWM synapses can be modeled as two parallel nanomagnets with opposite polarities and area dependent on the DW location, i.e., the weight.
VI. NETWORK SIMULATION
In this section, we describe the network simulation for character recognition as a benchmark application. Impact of process variation upon network performance is assessed. We also compare the performance of the proposed spin-CMOS hybrid ANN with that of a state-of-the-art CMOS ANN design.
A. Benchmark Application
We simulated character recognition as a benchmark application for the proposed spin-CMOS hybrid design. The overall process for character recognition can be divided into two steps, namely, edge extraction and pattern matching. For edge extraction, columnwise pixels from the binary image along four directions (horizontal, vertical, and ±45°) are fed to the first-stage neurons.
These neurons generate a high output if the number of nonzero pixels along a particular column (or equivalently the spin current input $I_{in}$ to the neuron) is higher than the neuron threshold. Note that a desirable threshold for a neuron is set by applying a bias input to it. The horizontal edge extraction process for different input characters is depicted in Fig. 16(a). The solid lines denote the magnetization state of the neuron magnets, whereas the dashed lines indicate the corresponding MTJ evaluation. Fig. 16(b) shows the effect of variation in the handwriting style for the numeral "3" on the horizontal barcode. It shows that significant variations in writing style translate to slight variations in the barcode pattern, which can be tolerated by an ANN. Variation tolerance can be enhanced by training with different styles of input characters. The resultant four binary patterns form a 1-D representation of the input character. This pattern is fed to the output stage of the network for classification. The output neurons correspond to the 36 alphanumeric characters. The output evaluation for numeric characters is shown in Fig. 16(c).
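The edge-extraction step can be sketched as directional pixel counts followed by thresholding. The code below is an illustrative interpretation: it counts nonzero pixels along rows, columns, and the two diagonal directions of a binary image, then applies a per-line threshold to form a binary "barcode". The function names, the indexing of the diagonals, and the tiny test image are assumptions; the actual first-stage neurons perform this comparison in the spin domain.

```python
def directional_projections(image):
    """Count nonzero pixels along rows, columns, and both diagonals of a binary image."""
    rows, cols = len(image), len(image[0])
    horizontal = [sum(1 for v in row if v) for row in image]
    vertical = [sum(1 for r in range(rows) if image[r][c]) for c in range(cols)]
    diag_a = [0] * (rows + cols - 1)   # lines with r + c constant
    diag_b = [0] * (rows + cols - 1)   # lines with r - c constant
    for r in range(rows):
        for c in range(cols):
            if image[r][c]:
                diag_a[r + c] += 1
                diag_b[r - c + cols - 1] += 1
    return horizontal, vertical, diag_a, diag_b


def barcode(counts, threshold):
    """First-stage neuron behavior: output 1 where the pixel count exceeds the threshold."""
    return [1 if n > threshold else 0 for n in counts]


# Tiny 3x3 image resembling the numeral "7".
image = [[1, 1, 1],
         [0, 0, 1],
         [0, 0, 1]]
h, v, da, db = directional_projections(image)
print(barcode(h, 1), barcode(v, 1), barcode(da, 1), barcode(db, 1))
```

Concatenating the four barcodes gives the 1-D representation that is passed to the output stage for classification.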
B. Variation Analysis
As described earlier, variation-aware circuit design techniques, such as the use of distributed and matched current source transistors, can significantly reduce the effect of CMOS process variation upon network performance. The impact of nanomagnet parameter variation upon system performance, however, needs to be assessed while modeling an ANN with nanoscale devices.
The critical DWM parameters that have an impact on computation accuracy can be identified as interface oxide thickness, cross-sectional area, and DW location. Variation in oxide thickness can lead to mismatch in the effective resistance of the DWM input leads. This leads to a difference in charge current injection for different synapses, which in turn introduces errors in weights. However, since the interface oxides are generally grown through atomic layer deposition, their thickness can be precisely controlled. Cross-sectional area variation in the DWM synapse leads to variation in the spin polarization of the input charge current. Inaccuracy in DW programming directly translates to imprecision in synapse weights.
The effect of writing inaccuracy in the DW synapse is captured in the simulation framework by imposing random shifts in DW location [see Fig. 17(a) ]. Impact of process variation like line-edge roughness (LER) is incorporated in terms of random variations in the DWM cross-sectional area [see Fig. 17(a) ]. Fig. 17(b) shows the superimposed effects of inaccurate writing and geometrical imperfection upon DWM weight.
The neuron magnet is highly scaled in order to achieve fast easy-axis restoration and lower switching current. It is therefore expected to be prone to thermal noise and magnet parameter variations. Fig. 18(a) depicts the effect of thermal noise on neuron transfer characteristics. Under very small input spin current, the easy-axis restoration can be nondeterministic due to thermal noise. The impact of the noisy transition zone on overall network performance can be ignored as long as it corresponds to a small fraction (<10%) of the range of spin current injection $I_s$. The range of $I_s$ in turn depends linearly on the average synapse current $I_{in}$ [see Fig. 18(b)]. Hence, noise determines the limit to which the average synapse current can be lowered to reduce the overall power consumption.
Since Bennett clocking places the neuron switching threshold at the origin, irrespective of the magnet parameters, the impact of output-magnet parameter variations upon the device transfer characteristics is significantly mitigated. Parameter variation, however, affects the dynamic switching characteristic of the neuron. The easy-axis relaxation time for the neuron magnet spreads with increased parameter variations, which limits the maximum operating frequency for reliable operation. Fig. 18(c) shows the scatter plot for neuron switching time for two different sizes of the output magnet. The input current has been varied over two orders of magnitude (20-0.5 μA) corresponding to the variation in synapse currents for different input combinations. A 25% 3σ variation has been applied to critical magnet parameters. It is evident that lower volume and higher $H_k$ (for a constant switching energy barrier) result in lower spread and, hence, facilitate higher operating frequency. Fig. 19(a) shows the effect of increasing process variation upon the spin current delivered to the output neurons corresponding to the input character. A negative value of spin current for firing neuron and a positive value of spin current for a

Network simulations show that, among the different device parameters considered, DW location has the maximum impact upon network performance. This is because it bears a direct relation to the synapse weight. As mentioned earlier, incorporation of nanoscale notches along the DWM length can achieve improved programming accuracy. Fig. 19(b) shows the plot for 1000 simulation points for the network under combined 15% 3σ variations for DWM and neuron magnets. Monte Carlo simulation results for a neuron given in Fig. 19(c) show that it retains accuracy up to more than 18% 3σ variation. Note that 18% variation in a 16-level synapse weight implies a programming error of three levels.
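A Monte Carlo check of this kind can be set up as in the sketch below. It perturbs each normalized synapse weight with Gaussian noise whose 3σ spread is a given fraction of the 16-level range and counts how often the sign of the weighted sum is preserved. This models only DW location error, uses an idealized sign-threshold neuron, and the weights, inputs, and function name are hypothetical; it does not reproduce the results in Fig. 19.

```python
import random


def firing_agreement(weights, inputs, three_sigma_fraction, levels=16,
                     trials=1000, seed=0):
    """Fraction of trials in which perturbed weights give the nominal firing decision."""
    rng = random.Random(seed)                 # fixed seed for repeatability
    step = 2.0 / (levels - 1)                 # weight spacing for levels spanning [-1, +1]
    sigma = three_sigma_fraction * levels * step / 3.0   # 3-sigma error expressed in levels
    nominal_fires = sum(w * x for w, x in zip(weights, inputs)) > 0
    agree = 0
    for _ in range(trials):
        total = sum((w + rng.gauss(0.0, sigma)) * x for w, x in zip(weights, inputs))
        if (total > 0) == nominal_fires:
            agree += 1
    return agree / trials


weights = [0.6, -0.2, 0.33, 0.87, -0.47, 0.2]   # hypothetical normalized weights
inputs = [1, 1, 0, 1, 1, 1]                     # hypothetical binary inputs
print("levels of error at 18%:", round(0.18 * 16))
print("agreement at 18% 3-sigma:", firing_agreement(weights, inputs, 0.18))
```

The first printed line reflects the arithmetic in the text: 18% of 16 levels is 2.88, i.e., about three levels.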
C. Design Performance
In order to establish a comparison with state-of-the-art CMOS technology, we implemented the same network architecture in CMOS 45-nm technology in two different ways: digital and analog. For the digital design, programmable latches were used to store synapse weights and full adders were employed to implement the neuron [31]. For the analog design, memristive synapses were employed. Resistance values in the range of 10-200 kΩ were used to emulate memristors. In this design, analog current summers were employed for modeling the neuron [4]. The area was estimated based on the cross-bar architecture for memristive neural networks [4], [5]. Table I compares the two designs with the proposed spin-based neural network. The digital implementation consumes large area as well as power due to bulky neuron and synapse units. Note that a fully parallel implementation for the digital ANN was chosen for the purpose of comparison. Area for the digital design can be reduced through sequential processing using a smaller number of neuron units, but power consumption is expected to remain almost constant for a given throughput. The analog implementation with memristive synapses turns out to be the most inefficient in terms of power. However, it achieves a large improvement in area compared to the digital design due to compact synapses and the cross-bar architecture [4], [5].

TABLE II. SPIN AND CMOS ANN SPECS

TABLE III. DEVICE PARAMETERS
The spin-CMOS hybrid implementation achieves both low power as well as small area, comparable to that of the analog ANN. The power and area benefits of the proposed design can be ascribed to simple and compact spin devices that operate at ultralow supply voltages and mimic the neuron operation. Both low energy consumption as well as compactness is conducive to integration of large number of neurons for programmable computational networks for cognitive and Boolean computation.
