Abstract-Deep spiking neural networks are becoming increasingly powerful tools for cognitive computing platforms. However, most existing studies on such computing models are developed with limited insight into the underlying hardware implementation, resulting in area- and power-expensive designs. Although several neuromimetic devices emulating neural operations have been proposed recently, their functionality has been limited to very simple neural models that may prove inefficient at complex recognition tasks. In this paper, we venture into the relatively unexplored area of utilizing the inherent device stochasticity of such neuromimetic devices to model complex neural functionalities in a probabilistic framework in the time domain. We consider the implementation of a deep spiking neural network capable of performing high-accuracy and low-latency classification tasks, where the neural computing unit is enabled by the stochastic switching behavior of a magnetic tunnel junction. The simulation studies indicate an energy improvement of 20× over a baseline CMOS design in 45-nm technology.
I. INTRODUCTION

Despite the huge success of deep artificial neural networks (ANNs) at complex recognition problems, such as the Canadian Institute for Advanced Research [1] and ImageNet [2] benchmarks, the significant computational costs involved in training and testing such deep nets have inspired researchers to develop alternative computing models. Over the past few years, ANNs have evolved into the more biologically realistic spiking neural nets (SNNs), where information is communicated between the neural nodes as spikes rather than real-valued analog signals. Such spiking networks have resulted in the development of specialized custom hardware implementations [3] that exploit the prospects of event-based computing. However, training such SNNs for recognition problems has been mostly limited to single-layered networks [4], which have been unable to compete with the high recognition performances offered by deep ANN networks. Hence, research efforts have been directed to develop algorithms for converting a fully trained ANN computing model to a corresponding SNN model in order to achieve event-driven hardware implementation [5], [6]. However, such conversion schemes have been developed with little or no regard to the underlying hardware implementation for the neuron or synaptic units.
As a parallel effort, research in neuromorphic computing has been aimed at identifying nanoelectronic devices that can mimic, and thereby offer a compact and energy-efficient implementation for, neural and synaptic units. While much research has been conducted on the implementation of synaptic functionalities, such as spike-timing-dependent plasticity [7]-[10] and short-term plasticity effects [11]-[13] in resistive technologies, research on potential neuristor devices emulating neuronal units is still in its infancy. For instance, spintronic devices have been proposed as promising candidates for implementing such neural functionalities [14]-[16], but have been able to implement only the step transfer function of ANNs (neuron switching state depending on the sign of the input stimulus), while a graded analog ANN transfer function, such as the sigmoid, can be potentially appealing for implementing deep ANNs capable of performing complex recognition tasks. It is worth noting here that although recent proposals have investigated the implementation of analog ANN transfer functions by spintronic devices [17], they require the fabrication of relatively complex device structures based on multidomain nanomagnets.

This paper lies at the juncture of the two parallel research thrusts mentioned earlier. We note that although technologically mature spintronic devices, such as the magnetic tunnel junction (MTJ) based on monodomain magnets, may not be able to exhibit complex analog ANN neural transfer functions (being binary switching devices), they exhibit switching probability characteristics that vary in a fashion similar to the sigmoid function with variation in the magnitude of the input current. Based on this observation, we propose an ANN to SNN conversion scheme by arguing that a fully trained ANN can be converted to an SNN if the neural units are assumed to generate spikes with a probability that follows a function similar to the original ANN transfer function.
We provide a mathematical formulation to justify that such a conversion mechanism is able to approximate the original ANN functionality to a reasonable degree of precision. Our motivation is driven by the fact that, in addition to being an intuitive formulation for ANN-SNN conversion, such an implementation can be enabled by the underlying stochastic device physics of the MTJ. This proposal can potentially pave the way for probabilistic neuromorphic platforms that exploit the device variability and stochasticity inherent in such emerging neuromimetic devices.

0018-9383 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
II. PROBABILISTIC SPIKING NEURAL COMPUTATION: CONVERSION FROM ANN
In this section, we will describe the neural computational unit typically used for ANNs, followed by an intuitive discussion for the basis of our proposed conversion mechanism to SNN. Subsequently, we will illustrate a simple mathematical justification to validate our claims.
Let us consider an ANN neural unit that receives an input $I$ through a synapse of weight $w$. The neuron generates an output $y$ by passing the weighted input through a nonlinearity $f(\cdot)$. In this paper, we take $f(\cdot)$ to be the sigmoid function, $f(x) = 1/(1 + e^{-x})$, due to its popularity in traditional ANNs for achieving high accuracy in complex recognition problems [18], along with the possibility of enabling this functionality by MTJ devices, as will be explained in Section III. Hence, for the ANN neuron, the corresponding output $y$ will be given by

$$y = f(w \cdot I) = \frac{1}{1 + e^{-w \cdot I}} \tag{1}$$
It is worth noting here that the input $I \in [0, 1]$, since it represents the inputs coming from the normalized values of external stimuli (image pixels for image recognition systems) or from other neuron outputs in previous layers (which lie in the range [0, 1] due to the limited range of the sigmoid function).

Next, let us describe the proposed conversion process from ANN to SNN [Fig. 1(a)]. In the spiking mode of communication, the input $I$ can be rate encoded as a Poisson spike train $I(t)$. The train consists of a sufficiently large number of time-steps, $T_N$, where the probability of generating a spike at each time-step equals the input $I$. It can be proved that the resulting process is a homogeneous Poisson process (the probability of spike generation is constant over time-steps), where the average firing rate, i.e., the average number of spikes generated per time-step over the entire train duration, is given by [19]

$$\langle I(t) \rangle = I \tag{2}$$
The spiking neuron processes the input spikes and generates a set of output spikes $y(t)$. The response of the neuron is determined by its average firing activity over the $T_N$ time-steps, $\langle y(t) \rangle$. Note that such input encoding and neuron output measurement schemes are standard norms for SNNs and are not an additional requirement or overhead for our proposal. Our proposal concerns the manner in which the neuron will process and generate the output spike train $y(t)$. In order to achieve near-lossless (with respect to accuracy) conversion from ANN to SNN, $\langle y(t) \rangle$ should approximate $y$ reasonably well.
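The rate-encoding step described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `encode_rate` and `mean_rate`, the seed, and the train length are assumptions introduced here and are not taken from the original work.

```python
import random

def encode_rate(intensity, num_steps, rng):
    """Return a binary spike train; each time-step spikes with probability `intensity`."""
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must lie in [0, 1]")
    return [1 if rng.random() < intensity else 0 for _ in range(num_steps)]

def mean_rate(spike_train):
    """Average number of spikes per time-step over the whole train."""
    return sum(spike_train) / len(spike_train)

rng = random.Random(0)                 # fixed seed, chosen arbitrarily
train = encode_rate(0.3, 20000, rng)   # hypothetical input I = 0.3, T_N = 20000
print(mean_rate(train))                # approaches I as the train gets longer
```

For a long train the measured rate settles near the encoded intensity, which is the property the averaging argument relies on.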
Prior work on such ANN-SNN conversion replaces the original ANN neural unit with an SNN unit that has little correspondence to the original ANN unit being considered and often relies on heuristic mechanisms to achieve the conversion, thereby incurring some accuracy loss in the process. Our conversion mechanism follows from the very intuitive observation that the analog activation output of the ANN neuron in the range [0, 1] can be mapped to the probability of spike generation, $p(t)$, of the spiking neuron at each time-step. Hence, at each time-step $t$, the neuron receives the input spike train, $I(t)$, and generates an output spike with the probability

$$p(t) = f(w \cdot I(t)) = \frac{1}{1 + e^{-w \cdot I(t)}}$$
Now, let us provide a mathematical analysis to justify that such a mapping is able to approximate the original ANN neural unit to a reasonable degree of precision. It follows from (2) that the spike train consists of $I \cdot T_N$ spiking events and $(1 - I) \cdot T_N$ nonspiking events, on average, over the entire duration of $T_N$ time-steps. The output spike train is generated according to an inhomogeneous Poisson process [19] (the spike generation probability varies over time), where the probability of spike generation is equal to $p(t \mid I(t) = 1) = 1/(1 + e^{-w})$ whenever there is an input spike and $p(t \mid I(t) = 0) = 1/(1 + e^{0}) = 1/2$ in the case of no spike. Hence, the inhomogeneous Poisson process can be decomposed into two homogeneous Poisson processes corresponding to spiking events (of duration $I \cdot T_N$ time-steps) and nonspiking events [of duration $(1 - I) \cdot T_N$ time-steps]. The average firing activity of the neuron is therefore given by the sum of the firing activities of the individual Poisson processes averaged over the total number of time-steps $T_N$. Following (2), we can state that the average firing rate of the output spike train, $\langle y(t) \rangle$, is given by

$$\langle y(t) \rangle = I \cdot \frac{1}{1 + e^{-w}} + (1 - I) \cdot \frac{1}{2}$$
Closer inspection of the above equation reveals that $\langle y(t) \rangle$ is a linear approximation of the sigmoid function in the range $I \in [0, 1]$. Fig. 1(b) and (c) shows a plot of the outputs, $y$ (ANN) and $\langle y(t) \rangle$ (SNN), with variation in the input $I$ for synaptic weight magnitudes $w = 1$ and $w = 3$, respectively (3 being the maximum weight for the synapses in our network). Note that the negative range for $I$ represents the case of a negative synaptic weight. As can be concluded from Fig. 1, the error between the functions is almost negligible for $w = 1$ and increases slightly as the magnitude of the weight increases. However, even for the maximum weight $w = 3$, the error remains bounded at reasonably low values over the entire approximation range. This fact is reinforced by Fig. 1(d), which shows a contour plot of the error magnitude between the two expressions $y$ and $\langle y(t) \rangle$ with variation in both $I$ and $w$. Note that since we are trying to encode information in the analog sigmoid output of the neural units, weights obtained as a result of backpropagation training typically remain bounded within values that ensure that the neuron outputs do not fall in the saturation regime of the sigmoid function. As can be observed from Fig. 1(c), for a weight magnitude of 3, almost the entire range of the sigmoid function is being used and hence, it is expected that synaptic weights should converge to such limited ranges after the training process. In addition, neural nets, being inspired by computational mechanisms observed in the biological brain, are characterized by an inherent tolerance to variations in the neural and synaptic units and hence, such minor variation between $y$ (ANN) and $\langle y(t) \rangle$ (SNN) is not expected to impact the network performance.
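The comparison between the ANN output and the average SNN firing rate can be reproduced numerically with a short sketch. The code below evaluates the sigmoid output $f(w \cdot I)$ against the two-process average $I \cdot f(w) + (1 - I)/2$ on a grid of inputs and also estimates the firing rate by direct simulation of the probabilistic neuron. The function names, grid resolution, seed, and number of time-steps are illustrative assumptions, not the settings used for Fig. 1.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def ann_output(w, i):
    """Analog ANN neuron output y = f(w * I)."""
    return sigmoid(w * i)

def snn_expected_rate(w, i):
    """Average firing rate from the spiking and nonspiking portions of the input train."""
    return i * sigmoid(w) + (1.0 - i) * 0.5

def snn_simulated_rate(w, i, num_steps, rng):
    """Monte Carlo estimate: fire with probability f(w * I(t)) at every time-step."""
    fired = 0
    for _ in range(num_steps):
        in_spike = 1 if rng.random() < i else 0
        if rng.random() < sigmoid(w * in_spike):
            fired += 1
    return fired / num_steps

rng = random.Random(1)  # arbitrary seed
for w in (1.0, 3.0):
    worst = max(abs(ann_output(w, k / 100) - snn_expected_rate(w, k / 100))
                for k in range(101))
    sim = snn_simulated_rate(w, 0.5, 20000, rng)
    print(f"w={w}: max |y - <y(t)>| = {worst:.3f}, simulated rate at I=0.5 = {sim:.3f}")
```

On this grid the two expressions agree exactly at $I = 0$ and $I = 1$ and differ most near the middle of the range, with the gap growing as $w$ increases.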
III. MTJ AS A PROBABILISTIC NEURON
The MTJ consists of a thin spacer layer, typically MgO, sandwiched between two nanomagnetic layers. While the magnetization state of one of the layers [free layer (FL)] is switched from one stable state to another, the magnetization of the other nanomagnet [pinned layer (PL)] is fixed. The relative orientation of the FL magnetization with respect to the PL magnetization determines the resistance of the MTJ. The MTJ exhibits a low-resistive parallel (P) state when both FL and PL magnetizations are in the same direction and a high-resistive antiparallel (AP) state otherwise. In this paper, we will consider FL switching induced by spin-orbit torque (SOT) generated by a heavy-metal (HM) underlayer due to the potential of achieving energy-efficient switching along with the possibility of interfacing such a device as a neuron with a synaptic resistive crossbar array. These advantages will be discussed in detail in the subsequent text. The underlying device phenomenon that lends a random or probabilistic character to the switching event of the MTJ is the inherent time-varying thermal noise. As we will show later, the probability of MTJ switching increases in a nonlinear fashion as the magnitude of the input current through the HM is increased.
Let us first illustrate the underlying physical phenomena involved in SOT-induced MTJ switching. The three-terminal device structure under consideration is shown in Fig. 2. When an input charge current density, $J_q$, flows through the HM underlayer from terminal T2 to T3, an input spin current, $J_s$, is injected into the FL, whose spin orientation is perpendicular to both $J_q$ and $J_s$ (assuming the spin-Hall effect [20] to be the dominant mechanism). Hence, the injected spin current, being in-plane polarized, can be utilized to switch an FL with in-plane magnetic anisotropy. Note that such SOT-induced MTJ switching has been confirmed by multiple experiments [21]-[24], and the simulation framework considered in this paper is based on measurements reported in [22].
The input spin current density is related to the charge current density flowing through the HM underlayer by the relationship

$$J_s = \theta_{SH} J_q \;\Rightarrow\; I_s = \theta_{SH} \frac{A_{MTJ}}{A_{HM}} I_q$$

where $I_s$ and $I_q$ are the input spin current and charge current magnitudes, respectively, $\theta_{SH}$ is the spin-Hall angle [20], [22], and $A_{MTJ}$ and $A_{HM}$ are the MTJ and HM cross-sectional areas, respectively. Although $\theta_{SH} < 1$, the spin injection efficiency can be maintained above 100% ($I_s > I_q$) by properly optimizing the device dimensions ($A_{MTJ}$ and $A_{HM}$). In contrast, conventional spin-transfer torque (STT) switching due to charge current flow through the PL is always limited by the spin-polarization strength of the PL (<100%). In addition to the possibility of achieving energy-efficient switching in comparison with the conventional STT mechanism, the input current flows through the HM underlayer, which typically has a resistance of a few hundred ohms. This makes such a three-terminal device a potential candidate to be operated as a neuron interfaced with a resistive crossbar array of synapses. The proper functioning of the array requires the input resistance of the neuron to be sufficiently low in comparison with the synaptic resistances at each cross-point (discussed in detail in later text). In contrast, a standard two-terminal MTJ with STT-induced switching (charge current flowing from terminal T1 to T3 or T3 to T1 through the PL to generate the necessary spin current) would require the input current to flow through the oxide layer, which would typically have considerably higher resistance, thereby leading to nonideal network operation.
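The geometric gain in the spin current can be illustrated with a small numerical sketch. The spin-Hall angle and the device dimensions below are placeholder values chosen only to show the effect of the area ratio; they are assumptions and are not the Table I parameters.

```python
import math

def spin_current(i_charge, theta_sh, area_mtj, area_hm):
    """I_s = theta_SH * (A_MTJ / A_HM) * I_q."""
    return theta_sh * (area_mtj / area_hm) * i_charge

# Placeholder geometry (NOT the Table I values): elliptical FL on top of an HM strip.
THETA_SH = 0.3                              # assumed spin-Hall angle
AREA_MTJ = math.pi / 4 * 100e-9 * 40e-9     # FL footprint, 100 nm x 40 nm ellipse
AREA_HM = 100e-9 * 2e-9                     # HM cross-section, 100 nm wide, 2 nm thick

ratio = spin_current(1.0, THETA_SH, AREA_MTJ, AREA_HM)
print(f"I_s / I_q = {ratio:.2f}")
```

With a thin HM layer, the FL footprint is much larger than the HM cross-section, so the ratio $I_s / I_q$ can exceed unity even though $\theta_{SH} < 1$.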
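Fig. 3 (caption, partially recovered). The data closely resemble a sigmoid probability density function.

The probabilistic switching characteristics of the MTJ can be analyzed by the Landau-Lifshitz-Gilbert (LLG) equation with an additional term to account for the SOT generated by the HM underlayer [25]

$$\frac{d\hat{m}}{dt} = -\gamma \left( \hat{m} \times H_{eff} \right) + \alpha \left( \hat{m} \times \frac{d\hat{m}}{dt} \right) + \frac{1}{q N_s} \left( \hat{m} \times I_s \times \hat{m} \right)$$

where $\hat{m}$ is the unit vector of FL magnetization, $\gamma = 2\mu_B \mu_0 / \hbar$ is the gyromagnetic ratio for the electron, $\alpha$ is Gilbert's damping ratio, $H_{eff}$ is the effective magnetic field including the shape anisotropy field for elliptic disks, $q$ is the electronic charge, $N_s = M_s V / \mu_B$ is the number of spins in an FL of volume $V$ ($M_s$ is the saturation magnetization and $\mu_B$ is the Bohr magneton), and $I_s$ is the spin current generated by the HM underlayer. Thermal noise is included by an additional thermal field [26],

$$H_{thermal} = \sqrt{\frac{\alpha}{1 + \alpha^2} \cdot \frac{2 K_B T_K}{\gamma \mu_0 M_s V \delta_t}} \; G_{0,1}$$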
where $G_{0,1}$ is a Gaussian distribution with zero mean and unit standard deviation, $K_B$ is the Boltzmann constant, $T_K$ is the temperature, and $\delta_t$ is the simulation time-step. The device simulation parameters are outlined in Table I and are based on experimental measurements performed in [22]. A barrier height of $20 K_B T$ was chosen, since the MTJ is being used as a computing element in this application.

Fig. 3(a) shows the switching probability of the MTJ with variation in the magnitude of the input current. The switching probability characteristics undergo more dispersion as the duration of the input write current, $T_w$, decreases. While more dispersion in the characteristics results in increased robustness of the system in the presence of variations, the power consumption of the network increases. These tradeoffs will be discussed in detail in Section IV. In order to map such switching probability characteristics of the MTJ to the sigmoid probability function for spike generation discussed in Section II, the MTJ is considered to be driven by two input currents, namely $I_{bias}$ and $I_{syn}$. The current $I_{bias}$ provides the necessary current to the MTJ to bias it at a switching probability of 0.5. The current $I_{syn}$ is the resultant input synaptic current to the neuron. Hence, in the absence of $I_{syn}$, the MTJ has a 50% probability of switching, similar to the sigmoid characteristics. Fig. 3(b) shows the switching probability characteristics of the MTJ with variation in the input synaptic current, $I_{syn}$ (normalized by a factor, $I_o$, which encodes the degree of dispersion of the MTJ switching probability characteristics). The switching characteristics match the sigmoid variation to a reasonable degree of approximation. In addition, note that such neuromorphic algorithms are highly error-resilient and such small approximations in the neuron output will not cause significant changes in the network performance. We will validate our claims by presenting results for a large-scale deep neural network in Section IV. The mapping of the normalization factor in the input synaptic current, $I_o$, to the hardware implementation of a synaptic crossbar array will be discussed later.

Fig. 4. Hardware mapping of the computing core (weighted synaptic summation of inputs followed by neural processing) to a crossbar array of resistive synapses interfaced with probabilistic spiking MTJ neurons. The additional row of pMOS transistors supplies the necessary input bias current to each of the MTJ neurons.
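At the behavioral level, the biased MTJ can be treated as a Bernoulli unit whose switching probability follows a sigmoid of the normalized synaptic current. The sketch below assumes an ideal sigmoid fit, which the device characteristics only approximate; the function names and the value of `I_O` are hypothetical and are not extracted from the LLG simulations.

```python
import math
import random

def switching_probability(i_syn, i_o):
    """Idealized sigmoid fit of the switching probability versus I_syn / I_o.

    The bias current is assumed to place the device at P = 0.5 when i_syn = 0.
    """
    return 1.0 / (1.0 + math.exp(-i_syn / i_o))

def mtj_neuron_step(i_syn, i_o, rng):
    """Return 1 if the MTJ switches (spike) in this write cycle, else 0."""
    return 1 if rng.random() < switching_probability(i_syn, i_o) else 0

I_O = 10e-6                      # hypothetical normalization current (A)
rng = random.Random(2)           # arbitrary seed
for i_syn in (-30e-6, 0.0, 30e-6):
    spikes = sum(mtj_neuron_step(i_syn, I_O, rng) for _ in range(10000))
    print(f"I_syn = {i_syn * 1e6:+.0f} uA -> firing rate {spikes / 10000:.3f}")
```

A smaller `I_O` corresponds to sharper characteristics (less dispersion), and a larger one to more dispersed characteristics.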
In order to implement a neural network, neurons need to be interfaced with synapses. The basic computing core in any neural network architecture, even for deep networks, consists of a dot product implementation, where each of the neural inputs is first multiplied by a synaptic weight, and the weighted inputs are subsequently processed by the neuron. Such a functionality can be directly mapped to a crossbar architecture, as shown in Fig. 4, where each horizontal metal line provides an input voltage across each resistive synaptic device and each vertical metal line provides an input current to the MTJ situated at the end of the vertical column. In order to implement bipolar weights, two rows ($V_{i+}$ and $V_{i-}$) are used for each input $V_i$. When the input $V_i$ assumes a logic value of 0 (no spike), a voltage level of 0 is applied to both rows. However, when $V_i$ assumes a logic value of 1 (spike), a voltage $V_o$ is applied to the row corresponding to $V_{i+}$, and $-V_o$ is applied to the row corresponding to $V_{i-}$. If the weight $w_{i,j}$ for the $j$th neuron and input $V_i$ is positive, then the conductance corresponding to $V_{i+}$ is programmed to $G_{i,j+} = w_{i,j} \cdot G_o$ ($G_o$ is the mapped conductance for unity weight), while the conductance, $G_{i,j-}$, corresponding to $V_{i-}$ is programmed to the high-resistance OFF state, and vice versa. It is worth noting here that the resistive synapses can be implemented by phase change devices [7], [8], memristive devices [9], or even spintronic synapses [10].
Since synaptic learning is not the focus of this paper (offline learning), a resistor model was considered for the synapses with 4-b discretization in the synaptic levels and a maximum to minimum conductance ratio of 10, which are typical of such resistive memory technologies.
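The weight-to-conductance mapping can be sketched as follows. The sketch makes several assumptions that the text does not specify: the 4-b levels are taken to be uniformly spaced between the minimum and maximum conductance, the maximum conductance is taken to correspond to the maximum weight magnitude of 3, and the OFF state is taken to be the minimum conductance. The names `quantize`, `weight_to_conductances`, and the value of `G_O` are hypothetical.

```python
def quantize(value, lo, hi, bits=4):
    """Clamp `value` to [lo, hi] and snap it to one of 2**bits uniformly spaced levels."""
    levels = 2 ** bits - 1
    value = min(max(value, lo), hi)
    step = (hi - lo) / levels
    return lo + round((value - lo) / step) * step

def weight_to_conductances(w, g_o, w_max=3.0, on_off_ratio=10.0, bits=4):
    """Map a signed weight to the pair (G+, G-) on the V_i+ and V_i- rows."""
    g_max = w_max * g_o               # assumed: largest weight uses the largest conductance
    g_min = g_max / on_off_ratio      # assumed: OFF state equals the minimum conductance
    g_active = quantize(abs(w) * g_o, g_min, g_max, bits)
    if w >= 0:
        return g_active, g_min
    return g_min, g_active

G_O = 1e-5  # hypothetical conductance for unity weight (S)
for w in (-3.0, -0.5, 0.0, 1.0, 3.0):
    g_plus, g_minus = weight_to_conductances(w, G_O)
    print(f"w = {w:+.1f} -> effective weight {(g_plus - g_minus) / G_O:+.3f}")
```

Because the OFF state is not an open circuit under these assumptions, small weights are distorted by the finite conductance ratio; this is one of the nonidealities a resistor model of the synapse would capture.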
Let us consider the conductance in the path of the net synaptic current while flowing through the HM of the spintronic neuron to be $G_s$ and the voltage drop across the neuron to be $V_s$. Equating the current supplied by the resistive synapses along with the input bias current, $I_{bias}$, to the current flowing through the neuron, we get

$$\sum_i \left( G_{i,j+} (V_{i+} - V_s) + G_{i,j-} (V_{i-} - V_s) \right) + I_{bias} = G_s V_s$$

which indicates that the net synaptic current supplied to the spintronic neuron is given by
Note that the resultant weighted synaptic input (with respect to the computational model described in Section II) is scaled by a factor $G_o \cdot V_o$ (in the current domain). Hence, in order to map the functionality to the sigmoid probability characteristics, the scaling factor in the MTJ switching characteristics discussed previously, $I_o$, has to be equal to $G_o \cdot V_o$. In other words, the resultant synaptic current being supplied by the crossbar array needs to be adjusted according to the dispersion of the switching probability characteristics of the MTJ in order to maintain consistency with the computational model described previously.
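The current balance at a crossbar column can be checked numerically with a small sketch. The code solves Kirchhoff's current law at the column node for $V_s$, assuming ideal row voltage sources and an ideal bias current source. The function name `column_currents`, the conductance values, and the spike voltage are hypothetical; the 400 Ω HM resistance and 71 μA bias current are the values quoted in Section IV-D. The variable `loading` is simply the ratio of the total column conductance to $G_s$ and is introduced here only for illustration.

```python
def column_currents(spikes, g_plus, g_minus, v_o, g_s, i_bias):
    """Solve KCL at one column node.

    Returns (V_s, current through the neuron HM, ideal current for V_s -> 0).
    """
    drive = 0.0     # sum of G * V_row over both rows of every input
    g_total = 0.0   # total synaptic conductance tied to the column
    for spike, gp, gm in zip(spikes, g_plus, g_minus):
        v_row = v_o if spike else 0.0
        drive += gp * v_row + gm * (-v_row)
        g_total += gp + gm
    v_s = (drive + i_bias) / (g_s + g_total)
    return v_s, g_s * v_s, drive + i_bias

# Hypothetical column with three inputs.
spikes = [1, 0, 1]
g_plus = [2e-5, 1e-5, 3e-6]
g_minus = [3e-6, 3e-6, 1e-5]
g_s = 1.0 / 400.0                      # HM resistance of 400 ohms
v_s, i_neuron, i_ideal = column_currents(spikes, g_plus, g_minus, 1.0, g_s, 71e-6)
loading = sum(g_plus + g_minus) / g_s  # small when the neuron resistance is low
print(f"V_s = {v_s * 1e3:.2f} mV, I_neuron / I_ideal = {i_neuron / i_ideal:.4f}, loading = {loading:.4f}")
```

When the neuron conductance $G_s$ is much larger than the total synaptic conductance on the column, the current through the neuron approaches the ideal weighted sum plus the bias current.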
Another interesting point to note is the nonideality factor, $\gamma$, in (5). This reiterates the fact that the input resistance of the neuronal device has to be sufficiently low in order to ensure that most of the input voltage drops across the resistive synapses and the voltage drop across the neuron is negligible. Hence, a sufficient value of the spike voltage, $V_o$ (which dictates the value of $G_o$), has to be maintained to ensure that $\gamma \ll 1$. The duration of the input write current also has an impact on the choice of $V_o$ and $G_o$. With a longer duration of the input current and hence less dispersion in the switching characteristics, $I_o$ decreases, resulting in a decrease of $G_o$ and hence $\gamma$. However, the robustness of the system to variations in the bias current and synaptic conductances suffers. These design space explorations will be considered in detail in Section IV.

The operation of each time-step of the SNN takes place through three cycles. In the first cycle, the write cycle, the MTJ neuron receives the bias current and the input synaptic current from the crossbar array and switches probabilistically. Note that the bias current can be provided by an additional row of the crossbar array consisting of pMOS transistors biased in saturation. After the write cycle, the read terminals of the neuron are activated. As shown in Fig. 4, the read circuit consists of a resistive divider network with a reference MTJ (whose state is fixed to the AP state). Hence, a spike (logic value 1) is generated at the output inverter in case the MTJ switches to the P state. In case a spike is generated, the MTJ is switched back to the AP state by passing a sufficiently high magnitude of current through the HM in the opposite direction during a subsequent reset cycle to ensure normal MTJ operation during the next time-step.
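The write, read, and reset sequence of one time-step can be summarized as a small state machine. The sketch below is purely behavioral: the class name `MTJNeuron`, the ideal sigmoid switching probability, and the value of the normalization current are assumptions, and the read divider and output inverter are abstracted into a simple state check.

```python
import math
import random

AP, P = "AP", "P"

class MTJNeuron:
    """Behavioral model of one time-step: write, read, then conditional reset."""

    def __init__(self, i_o, rng):
        self.i_o = i_o      # hypothetical normalization current of the sigmoid fit
        self.rng = rng
        self.state = AP     # the neuron starts every time-step in the AP state

    def write(self, i_syn):
        """Write cycle: switch AP -> P with a sigmoid probability of I_syn / I_o."""
        p_switch = 1.0 / (1.0 + math.exp(-i_syn / self.i_o))
        if self.rng.random() < p_switch:
            self.state = P

    def read(self):
        """Read cycle: a spike is reported when the MTJ is found in the P state."""
        return 1 if self.state == P else 0

    def reset(self):
        """Reset cycle: drive the MTJ back to the AP state."""
        self.state = AP

    def time_step(self, i_syn):
        self.write(i_syn)
        spike = self.read()
        if spike:
            self.reset()
        return spike

neuron = MTJNeuron(i_o=10e-6, rng=random.Random(3))
train = [neuron.time_step(0.0) for _ in range(10000)]
print(sum(train) / len(train))   # near 0.5 when only the bias current is applied
```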
IV. RESULTS AND DISCUSSION
A. Device-Circuit-Algorithm Cosimulation Framework
In order to validate the proposal and explore the design space of the network, a hybrid device-circuit-algorithm cosimulation framework was devised. The probabilistic magnetization switching characteristics of the MTJ were determined by running stochastic LLG simulations for different input current magnitudes and write cycle durations (Fig. 3). This behavioral model of the MTJ switching characteristics was utilized for the subsequent system-level simulations of the network. SPICE simulations (including the Verilog-A model of the MTJ resistance [27]) were performed to assess the power and energy consumption of the crossbar array and the MTJ read circuit.
The performance of the network was assessed for a large-scale deep learning network architecture [18] (28 × 28-6c5-2s-12c5-2s-10o) on a standard digit recognition problem based on the MNIST data set [28]. The network consists of alternate layers of convolutional and subsampling operations. While the convolutional layers constitute the major computationally expensive component of the network and can be mapped to the synaptic crossbar array interfaced with MTJ neurons as described in Section III, the subsampling layer simply performs an averaging operation over the spikes generated by the MTJ neurons over the nonoverlapping windows of the convolution output maps. The dimensions of the input MNIST images are 28 × 28, which are applied as input to the convolutional layer consisting of six convolutional kernels of size 5 × 5. The subsampling kernel is of size 2 × 2, and is followed by another convolutional layer comprising 12 output maps, which, in turn, is followed by another subsampling layer. The final layer consists of ten neurons, each of which represents one of the ten digit classes. The network is trained using 60 000 training samples based on the methodology outlined in [18]. Once the training is accomplished, the learned weights are mapped to the synaptic conductances using the scheme mentioned in Section III. All recognition accuracies mentioned in this text are with respect to the 10 000 test samples in the data set. The baseline ANN achieved an accuracy of 98.56% on the test set. During the operation of the converted SNN, the image pixels are converted to Poisson spike trains, where the average number of spikes generated over a given time window encodes the corresponding pixel intensity.
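The layer dimensions implied by the 28 × 28-6c5-2s-12c5-2s-10o notation can be traced with a short sketch. It assumes unpadded convolutions with unit stride and nonoverlapping subsampling windows, which the text does not state explicitly; the helper names are hypothetical.

```python
def conv_out(size, kernel):
    """Output width of an unpadded, stride-1 convolution (assumed)."""
    return size - kernel + 1

def pool_out(size, window):
    """Output width of nonoverlapping subsampling."""
    return size // window

def trace_shapes(input_size=28):
    """Return (layer, number of maps, map width) for each hidden layer."""
    shapes = []
    size = conv_out(input_size, 5)
    shapes.append(("6c5", 6, size))
    size = pool_out(size, 2)
    shapes.append(("2s", 6, size))
    size = conv_out(size, 5)
    shapes.append(("12c5", 12, size))
    size = pool_out(size, 2)
    shapes.append(("2s", 12, size))
    return shapes

def subsample(spike_map, window=2):
    """Average the spikes over nonoverlapping window x window blocks."""
    rows, cols = len(spike_map), len(spike_map[0])
    return [[sum(spike_map[r + dr][c + dc]
                 for dr in range(window) for dc in range(window)) / window ** 2
             for c in range(0, cols - window + 1, window)]
            for r in range(0, rows - window + 1, window)]

for name, maps, size in trace_shapes():
    print(f"{name}: {maps} maps of {size} x {size}")
```

Under these assumptions the last subsampling layer produces 12 maps of 4 × 4, i.e., 192 values feeding the ten output neurons.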
Note that a deep learning architecture is being used in this paper, since it has achieved high recognition accuracies on a large number of complex data sets. Further, the architecture only dictates the manner in which the neurons and synapses are connected to form the network. Our proposal holds true for any neural network topology, since the basic computational elements and their mapping to crossbar architectures remain equally valid. We would also like to point out that improved training algorithms or network architectures could be employed to enhance the recognition accuracy of the network. However, the goal of this paper is to demonstrate the applicability of the MTJ as a probabilistic spiking neuron that can potentially enable near-lossless (with respect to classification accuracy), low-power, low-latency SNNs converted from the trained ANNs.
B. Impact of MTJ Write Cycle Duration and Crossbar Supply Voltage on Network Performance
Let us first describe the impact of write cycle duration on the performance of the network. With an increase in the duration of the write cycle, the switching probability characteristics become sharper. Hence, the synaptic current requirement from the crossbar array reduces. Further, the bias current magnitude also reduces, since SOT is exerted on the magnet for a longer duration of time. Hence, the power consumption of the network is expected to reduce with an increase in the write cycle duration. However, this occurs at the expense of delay, since the network has to be operated over a number of time-steps, and each time-step duration is directly related to the duration of the write cycle.
However, a decrease in the write cycle duration, i.e., an increase in the dispersion of the switching probability characteristics of the MTJ, will result in an increase of the factor $\gamma$, as discussed previously, thereby leading to nonideal network operation. Fig. 5 shows the classification accuracy as a function of the time-steps of simulation of the SNN with varying write cycle durations ($T_w$), namely 0.2, 0.5, and 1 ns. As expected, for a fixed supply voltage, classification accuracy improves with an increase in the write cycle duration. While the network accuracy reaches 97.6% and 96.4% for $T_w$ = 1 and 0.5 ns, respectively, it saturates at 83% for $T_w$ = 0.2 ns at the end of 500 time-steps. An interesting point to note is the low latency in the performance of the network. The accuracy reaches 96.3% and 93.8% at the end of just 20 time-steps for $T_w$ = 1 and 0.5 ns, respectively. This is a crucial advantage offered by our ANN-SNN conversion scheme, since although SNN implementations are ideal for low-power neural network implementations, they incur a penalty in terms of delay, because the network outputs need to be observed over a number of time-steps to generate sufficient confidence in the inference process. With our proposed conversion scheme, network accuracies close to the original trained ANN baseline can be achieved within only a few tens of time-steps of the spiking network operation.
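The way confidence builds up over time-steps can be illustrated with a simple readout sketch. It assumes that the predicted class at any time-step is the output neuron with the largest accumulated spike count, with ties resolved in favor of the lowest index; this readout rule and the function name `running_decisions` are assumptions made for illustration and are not specified in the text.

```python
def running_decisions(output_spikes):
    """Return the predicted class after every time-step.

    `output_spikes` is a list over time-steps; each entry holds one 0/1 value
    per output neuron. The decision is the neuron with the largest spike count.
    """
    counts = [0] * len(output_spikes[0])
    decisions = []
    for step in output_spikes:
        for k, spike in enumerate(step):
            counts[k] += spike
        decisions.append(max(range(len(counts)), key=counts.__getitem__))
    return decisions

# Toy example with three output neurons observed over three time-steps.
toy = [[0, 1, 0],
       [1, 0, 0],
       [0, 1, 0]]
print(running_decisions(toy))  # prints [1, 0, 1]
```

Early decisions can flip while the counts are still small, which is why the accuracy is reported as a function of the number of time-steps.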
Scaling the supply voltage, in turn, results in the increment of the factor γ , thereby leading to more errors in the network performance. However, it is worth noting here that the drop in recognition accuracy is minimal for sufficiently large durations of the write cycle. For instance, the accuracy drop is insignificant (97.1% and 94.6% for T w = 1 and 0.5 ns, respectively) even with the crossbar supply voltage being scaled down to 0.8 V. The key point we would like to stress from this section is that by maintaining a sufficient duration of the write cycle, it is possible to achieve near-lossless SNN operation with minimal delay coupled with the possibilities of voltage scaling for reduction in power consumption. It is also worth noting here that the analysis performed in this section includes nonidealities arising from the hardware mapping of the SNN to a synaptic resistive crossbar array interfaced with MTJ neurons (including nonideality factor γ and the deviations of MTJ switching probability characteristics from ideal sigmoid function).
C. Variation Analysis
Although an increase in the write cycle duration helps to reduce the nonideality in the network (by reduction of the factor $\gamma$), it is associated with increased performance loss in the presence of random variations due to the sharper switching probability characteristics of the MTJ. In this section, we will investigate the impact of random variations in the synaptic resistances of the crossbar array along with variations in the input bias current of the MTJ (Fig. 6). The average classification accuracy was determined by performing 50 independent Monte Carlo simulations of the network for each of the 10 000 test images in the data set. Fig. 6(a) shows the average classification accuracy of the network with variations in the synaptic resistances of the crossbar array. Since the range of synaptic resistances is adjusted according to the dispersion of the MTJ switching probability characteristics (through the relation $I_o = V_o \cdot G_o$ discussed previously), the impact of synaptic resistance variation is expected to be similar for different write cycle durations. An additional point to note is that, even with σ = 20% variation in the synaptic resistances, only 3% ($T_w$ = 1 ns) and 3.3% ($T_w$ = 0.5 ns) degradation in classification accuracy was observed with respect to the original network (without variations) at the end of 50 time-steps. Such robustness to variations in the input synaptic current can be attributed to the error resiliency of such neuromorphic computing systems.
However, the input bias current of the MTJ is a more critical parameter (with respect to variations) that ensures proper functionality of the network. Variations in the input bias current can skew the probabilistic MTJ operation in one direction, thereby causing degradation in recognition accuracy. Hence, sharper MTJ switching probability characteristics would result in more errors during the recognition process with variations in the input bias current. Fig. 6(b) shows that while a 12.8% reduction in accuracy was observed for σ = 20% over the ideal network at the end of 50 time-steps for $T_w$ = 1 ns, only 7.6% degradation was observed for $T_w$ = 0.5 ns. These results signify that it is crucial to choose an optimal value of the write cycle duration that simultaneously achieves near-lossless SNN conversion and robustness to random variations in the input bias and synaptic currents. Note that a precise value of the input bias current can be maintained by utilizing CMOS reference current generators that would exhibit σ variations much less than 20%. However, the network performance was evaluated under such a high degree of variation to establish that the network is highly error-resilient and that a judicious choice of the write cycle duration can enable robustness of the network even to large variations in the more sensitive MTJ input bias current.
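The sensitivity to bias current variation can be illustrated with the idealized sigmoid model of the switching probability. In the sketch below, a Gaussian error in the bias current shifts the operating point away from a probability of 0.5, and the same error produces a larger skew when the characteristics are sharper (smaller normalization current). The two `i_o` values are hypothetical, and for simplicity the same 71 μA bias is used for both cases, although in practice the bias current changes with the write cycle duration.

```python
import math
import random

def switching_probability(i_syn, i_o, bias_error=0.0):
    """Idealized sigmoid characteristics with an additive bias-current error."""
    return 1.0 / (1.0 + math.exp(-(i_syn + bias_error) / i_o))

def mean_idle_skew(i_bias, sigma, i_o, trials, rng):
    """Average |P - 0.5| at zero synaptic input under Gaussian bias variation."""
    total = 0.0
    for _ in range(trials):
        error = rng.gauss(0.0, sigma * i_bias)
        total += abs(switching_probability(0.0, i_o, error) - 0.5)
    return total / trials

I_BIAS = 71e-6   # bias current quoted for T_w = 0.5 ns
SIGMA = 0.20     # 20% variation, as in the variation analysis
# Same seed for both runs so that both see identical bias errors.
skew_dispersed = mean_idle_skew(I_BIAS, SIGMA, 20e-6, 5000, random.Random(4))
skew_sharp = mean_idle_skew(I_BIAS, SIGMA, 10e-6, 5000, random.Random(4))
print(f"dispersed: {skew_dispersed:.3f}, sharp: {skew_sharp:.3f}")
```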
In addition, we considered the impact of variation in the chip operating temperature by running a worst-case simulation, where all the MTJs in the network were assumed to operate at 400 K instead of the design temperature of 300 K. A recognition accuracy of 96.73% was achieved at the end of 50 time-steps of network operation, thereby confirming that the proposed probabilistic neural computing framework is resilient to temperature variations as well.
D. Power and Energy Benefits
In order to evaluate the energy consumption of the network, SPICE simulations were performed to determine the energy consumption involved in write, read, and reset operations. In addition to providing a compact implementation of a spiking neuron, the MTJ enables low-power operation of the synaptic crossbar array. This is due to the fact that only input current magnitudes of a few tens of microamperes need to be supplied by the crossbar array on either side of the bias current (Fig. 3). Note that the power consumption of the network is dominated by the synaptic crossbar array (since synapses typically outnumber neurons in such deep neural networks by two to three orders of magnitude), and such magnetometallic spintronic neurons enable the low-power operation of the crossbar architectures. For the energy analysis, we considered the optimal write and reset cycle durations to be 0.5 ns due to the possibility of achieving near-lossless SNN conversion along with robustness to input bias current variations. An intuitive insight into the power efficiency of the network can be obtained by considering the fact that only 71 μA of input current is required to bias the MTJ at 50% switching probability (T_w = 0.5 ns). This current, flowing through an HM resistance of 400 Ω, results in an I²Rt energy consumption of ∼1 fJ in the neuron. Considering the resultant energy consumption in the write, read, and reset cycles of the network over a duration of 50 time-steps (since competitive classification accuracy can be obtained at the end of a few tens of time-steps), the total energy consumption of the proposed MTJ-based SNN was evaluated to be 19.5 nJ per image classification.
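The ∼1 fJ estimate can be checked with a short calculation. The sketch below evaluates I²Rt for the quoted bias current and write duration, taking the HM resistance as 400 Ω; the function and variable names are illustrative only.

```python
def joule_heating_energy(current_A, resistance_ohm, duration_s):
    """Resistive energy dissipation E = I^2 * R * t, in joules."""
    return current_A ** 2 * resistance_ohm * duration_s


bias_current_A = 71e-6      # 71 uA bias for 50% switching probability
hm_resistance_ohm = 400.0   # heavy-metal (HM) resistance, taken as 400 ohms
write_duration_s = 0.5e-9   # T_w = 0.5 ns

energy_J = joule_heating_energy(bias_current_A, hm_resistance_ohm,
                                write_duration_s)
energy_fJ = energy_J * 1e15  # convert joules to femtojoules
print(f"{energy_fJ:.2f} fJ")
```

The result is approximately 1.01 fJ, in agreement with the ∼1 fJ figure. It covers only the resistive dissipation of the bias current during one write cycle, not the read or reset operations.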
An interesting point to note is that there is an additional delay overhead involved in the SNN operation. On the other hand, an ANN operation (for instance, a resistive crossbar array driven by analog CMOS neurons) would require a single time-step for recognition. However, the delay overhead (a few tens of time-steps) is much smaller than the corresponding reduction in power consumption due to an event (spike)-driven hardware operation. For example, the average energy consumption of an analog CMOS neuron is estimated to be ∼700 fJ [15], which would still be an order of magnitude greater than the average energy consumption of an MTJ neuron (∼1 fJ) operated over a duration of 50 time-steps.
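The order-of-magnitude claim follows from simple arithmetic, sketched below. It assumes that the MTJ neuron spends roughly 1 fJ in every one of the 50 time-steps, which is a simplification; the constant names are illustrative.

```python
ANALOG_CMOS_NEURON_FJ = 700.0    # analog CMOS neuron estimate from [15]
MTJ_NEURON_PER_STEP_FJ = 1.0     # approximate MTJ neuron energy per time-step
TIME_STEPS = 50                  # SNN latency considered in this paper

# Total MTJ neuron energy if ~1 fJ is spent in every time-step (assumption).
mtj_total_fJ = MTJ_NEURON_PER_STEP_FJ * TIME_STEPS

# Energy advantage of the MTJ neuron despite the 50x latency overhead.
advantage = ANALOG_CMOS_NEURON_FJ / mtj_total_fJ
print(f"MTJ total: {mtj_total_fJ:.0f} fJ, advantage: {advantage:.0f}x")
```

Even after paying the energy cost over 50 time-steps, the MTJ neuron total stays near 50 fJ, about 14× below the single-step analog CMOS estimate.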
In order to compare with a baseline digital CMOS implementation, a deep spiking network consisting of integrate-fire (IF) neurons converted from a corresponding trained ANN was used, based on the methodology proposed in [5], for the same network architecture (28 × 28-6c5-2s-12c5-2s-10o) considered in this paper. The network was synthesized using a standard cell library in 45-nm commercial CMOS technology. The design consisted of digital adders to sum up the synaptic weights in the case of a spiking event (enabled by multiplexers). A comparator was utilized to compare the accumulated synaptic contributions with a specific threshold (IF functionality) and determine the corresponding spiking activity. A pipelined design with power gating (to exploit the advantage of the event-driven operation of the network) was considered with the same bit discretization in the synaptic weights as mentioned previously. The average energy consumption of the network per image classification was evaluated to be 391 nJ (20× the energy consumption of the proposed MTJ-based spiking architecture).
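The behavior of the baseline IF datapath described above (multiplexer-gated adders followed by a comparator) can be sketched in software as follows. This is a behavioral illustration only, not the synthesized design: the function name, the integer weights, and the reset-to-zero policy after a spike are assumptions.

```python
def if_neuron_step(membrane, weights, input_spikes, threshold):
    """One time-step of a digital integrate-fire neuron (behavioral sketch).

    Returns the updated membrane value and whether the neuron fired.
    """
    for weight, spiked in zip(weights, input_spikes):
        # Multiplexer: a weight reaches the adder only on a spiking event.
        if spiked:
            membrane += weight
    # Comparator: fire when the accumulated input reaches the threshold.
    if membrane >= threshold:
        return 0, True  # reset to zero after firing (assumed policy)
    return membrane, False


# Hypothetical discretized weights and threshold for three input synapses.
weights = [3, -1, 2]
threshold = 4

membrane, fired_first = if_neuron_step(0, weights, [1, 0, 0], threshold)
membrane_after_first = membrane
membrane, fired_second = if_neuron_step(membrane, weights, [0, 1, 1], threshold)
print(membrane_after_first, fired_first, membrane, fired_second)
```

In this example the first step accumulates 3 without firing, and the second step adds −1 and 2 to reach the threshold of 4, so the neuron fires and resets. Since weights are added only on spike events, idle inputs cost no additions, which is the property that power gating exploits in the event-driven design.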
V. CONCLUSION
In conclusion, we proposed a probabilistic neural computing platform that exploits the stochastic device physics of the MTJ to model complex neural transfer functions in the time domain. While the stochasticity of MTJ switching has been traditionally viewed as a disadvantage for logic and memory applications, we demonstrated that such probabilistic switching behavior can not only lead to high-accuracy cognitive recognition platforms but also provide energy benefits over conventional CMOS designs.
