We present a compact hardware unit based on low-barrier magnets for analog stochastic neurons and demonstrate its use as a building block for neuromorphic hardware. By coupling circular magnetic tunnel junctions (MTJs) with a CMOS-based analog buffer, we show that these units can act as leaky-integrate-and-fire (LIF) neurons, a model of biological neural networks particularly suited for temporal inferencing and pattern recognition. We demonstrate examples of temporal sequence learning, processing, and prediction tasks in real time, as a proof-of-concept demonstration of scalable and adaptive signal processors. Efficient non-von-Neumann hardware implementation of such processors can open up a pathway for integration of hardware-based cognition in a wide variety of emerging systems such as IoT, industrial controls, bio- and photo-sensors, and Unmanned Autonomous Vehicles.
I. INTRODUCTION
Temporal inferencing and learning form the next frontier in the discipline of Artificial Intelligence. Development of hardware that can implement these tasks in situ can revolutionize the rapidly emerging era of smart sensors, self-driving automotives, Unmanned Autonomous Vehicles (UAVs), and the Internet of Things (IoT) by opening a pathway for self-contained, energy-efficient, highly scalable, and secure machine intelligence. In this work we propose a hybrid unit consisting of a low-barrier magnetic tunnel junction (MTJ) coupled with a conventional CMOS-based analog buffer as a building block for neuromorphic hardware that is particularly suited for temporal-learning-based signal processing tasks.
MTJs lie at the heart of Spin Transfer Torque Magnetic Random Access Memory (STT-MRAM), a rapidly emerging commercial non-volatile memory technology, and can be built with semiconductor fabrication facilities available today [1]. The dynamics of our proposed hardware unit have a one-to-one mapping with the physical behavior of biological neurons, in particular stochastic leaky-integrate-and-fire (LIF) neurons. We further show how networks assembled from these building blocks can successfully learn and reproduce a chaotic signal by building temporal generative models, and work as adaptive filters by inverse modeling a communication channel with non-linear distortions.
The ultra-compact footprint of these building blocks, with built-in neuron-like behavior, enables the design of reconfigurable, large-scale analog neuromorphic hardware that is more energy-efficient and scalable than present-day practice, in which neural networks are emulated as software models on Boolean-algebra-based hardware, typically in the cloud, with the associated inefficiencies as well as concerns about cybersecurity and high communication bandwidth consumption.
a) Electronic mail: sganguly@virginia.edu
b) Electronic mail: kcamsari@purdue.edu
c) Electronic mail: ag7rq@virginia.edu
II. STOCHASTIC NEURAL NETWORK NODES USING MAGNETIC TUNNEL JUNCTIONS
In magnets the state retention time is given by the Arrhenius relation [2]:

$$\tau = \tau_0 \exp\left(\frac{U}{kT}\right) \qquad (1)$$

For memory technology, the energy barrier $U$ targeted for the MTJ free layer is at least $40\,kT$, to maintain a high state retention ($\tau \sim 10$ years) with typical $\tau_0 \sim 0.1$ to $1$ ns.
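The scaling of retention time with barrier height can be checked numerically. The following is a minimal sketch, assuming the standard Arrhenius form $\tau = \tau_0 \exp(U/kT)$ with $\tau_0 = 1$ ns; the function name `retention_time` and the illustrative low-barrier value of $1\,kT$ are assumptions, not values taken from a specific device.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def retention_time(barrier_in_kT, tau0=1e-9):
    """Mean state-retention time tau = tau0 * exp(U / kT), in seconds."""
    return tau0 * math.exp(barrier_in_kT)


# Memory-grade barrier (U = 40 kT) versus an assumed low barrier (U = 1 kT).
tau_memory = retention_time(40.0)
tau_low_barrier = retention_time(1.0)

print(f"U = 40 kT: {tau_memory / SECONDS_PER_YEAR:.1f} years")
print(f"U =  1 kT: {tau_low_barrier * 1e9:.2f} ns")
```

Under these assumptions a 40 kT barrier gives a retention time on the order of years, while a barrier near 1 kT gives nanosecond-scale retention, the regime that makes the magnetization fluctuate fast enough to serve as a noise source.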
The barrier $U$ is determined by material properties such as saturation magnetization ($M_s$), anisotropy field strength ($H_k$), and geometrical volume ($\Omega$). However, in this work we use low-energy-barrier magnets achieved by ultra-scaling [3] (fig. 1a) to enable fast dynamics in the reservoir and to leverage the built-in stochasticity provided by such magnets. In this case, the free layer magnetization $m$ stays randomized between the two energy-minimum states when no current is provided. A large spin-torque from a large driving input current $I_{in}$ biases $m$ towards one of its minimum directions, as shown in fig. 1b. We can utilize this controllable stochastic behavior to build noisy hardware neurons, in both analog and digital versions.

The core of the proposed hardware unit is a 1-MTJ 1-T in a pull-up, pull-down configuration as shown in fig. 1c. We then add a Wilson-current-mirror-based analog buffer to this unit to generate a noisy analog output, while preventing any loading effects on the transfer characteristics of the unit from high fan-out at the output end. In this unit, the MTJ's intrinsic resistance ($RA = 100\ \Omega\cdot\mu\mathrm{m}^2$, area $= \pi \times 50 \times 50\ \mathrm{nm}^2$, $TMR = 100\%$) is chosen to [...] fig. 1d. The response is inherently noisy due to the effect of thermal noise on the free layer magnetization, as described before. Both the averaged signal and its upper and lower bounds show a tanh-like (or logistic-function-like) excitation. The overall response of the unit can be modeled by the following equation:
where the parameters $\alpha$, $\beta$ depend on the particulars of device design. It can be shown that a transistor's turn-off and turn-on depend on the deposition of a critical amount of switching charge on the gate terminal of the transistor [4, 5]. In our design, this switching charge is supplied by the net current from preceding neurons, which flows into the resistive-capacitive metallic interconnects and then into the gate capacitor, where it is automatically weighted, summed, and integrated over time. The use of a low-barrier magnet in this structure inherently introduces volatility due to thermal noise, resulting in leakiness of the input current integration. Therefore, this unit behaves as a stochastic leaky-integrate-and-fire (sLIF) neuron. Additionally, adjusting the barrier height of the magnets ($U$) allows for tuning the dynamical rates of sLIF neural networks built from this unit.
Binary Stochastic Neuron (BSN or p-bit)
A binary stochastic neuron can be built from the same 1-MTJ 1-T unit. However, instead of an analog current mirror, we use a digital CMOS buffer (e.g. two cascaded NOT gates) as the output stage. This makes the output of the unit digital, i.e. $V_{out}$ is either $V^{+}$ or $V^{-}$, with a probability dictated by a tanh law, unlike the analog stochastic neuron (ASN), whose response is continuous between $V^{+}$ and $V^{-}$. The response of the BSN is shown in the inset of fig. 1d and given by:
This BSN design, presented elsewhere and called the "p-bit" [6], has been used in a variety of optimization problems (see refs. [7, 8]). These two related but distinct units, ASN and BSN, could form building blocks for a variety of neural networks, depending on the behavioral requirements. Controllability of behavioral noise through device design and electrical control makes these units particularly useful in cases where stochasticity is an integral feature.

Reservoir Computers (RC) are models of biological neural networks [12, 13] that have been used for various signal processing tasks [14-17]. In these networks, the computation is performed by a collection of randomly coupled non-linear units with a recurrent network topology (fig. 2a). Such networks: (a) provide a huge expansion of the dynamical phase-space, increasing the distance between the signal-class centroids; and (b) give rise to memory states in the network [18], allowing a signal to be temporally correlated, resulting in better signal classification. The nodes of the network are leaky; therefore, the network memory is short-term and fading, a feature critical for avoiding overtraining.
RC Dynamics: Let $x$ be the collective state vector of the reservoir, $u$ the input vector, and $y$ the output vector. Also let $W_{in}$, $W_{self}$, $W_{out}$, $W_{fb}$ be the matrices representing the synaptic connections between the input-reservoir, reservoir-reservoir, reservoir-output, and output-reservoir nodes respectively. The most general form of the RC dynamical equations is given by:

$$\frac{dx}{dt} = -\eta\, x + \kappa\, f_{NL}\left(W_{in} u + W_{self} x + W_{fb} y\right) + \nu \qquad (4)$$
FIG. 2. Hardware implementation of a reservoir computer. (a) The reservoir is a network of randomly connected sLIF neurons that processes an input stream and produces the output stream as a collective response. (b) The reservoir node is built from an ASN, while the synaptic connections are made from controllable resistor networks.
Here $\eta$ and $\kappa$ are system constants representing the leaking rate and the strength of the activation in the reservoir, $f_{NL}$ is a non-linear function, usually tanh, and $\nu$ is the noise. This dynamical equation is equivalent to a model for a network of stochastic LIF neurons whose synaptic strengths are given by the various $W$ matrices.

RC Learning: We use a weighted linear sum on $x$ for time-series pattern learning and classification. The only synaptic weights adjusted during training are the reservoir-output connections $W_{out}$, which are obtained by minimizing the $\ell_2$ norm $\|W_{out}\, x - y\|_2$ (for example, see the Wiener-Hopf method [18]).
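The readout training step can be sketched as a linear least-squares fit. The following is a minimal illustration, assuming the reservoir states and targets have already been collected as arrays; the function name `train_readout`, the small ridge term (added here only for numerical stability), and the synthetic toy data are assumptions and not part of the described setup.

```python
import numpy as np


def train_readout(states, targets, ridge=1e-6):
    """Solve min_W ||states @ W.T - targets||^2 + ridge * ||W||^2.

    states:  (T, N) array, one reservoir state vector x per time step.
    targets: (T, M) array, desired output y per time step.
    Returns W_out with shape (M, N), so that y ~ W_out @ x.
    """
    n_nodes = states.shape[1]
    # Normal equations: (X^T X + ridge * I) W^T = X^T Y
    gram = states.T @ states + ridge * np.eye(n_nodes)
    cross = states.T @ targets
    return np.linalg.solve(gram, cross).T


# Hypothetical toy data: targets are an exact linear function of the states.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
W_true = rng.standard_normal((2, 5))
Y = X @ W_true.T

W_out = train_readout(X, Y)
max_err = float(np.max(np.abs(W_out - W_true)))
print("max |W_out - W_true| =", max_err)
```

Because only $W_{out}$ is fitted, training reduces to a single linear solve; the input, recurrent, and feedback weights stay fixed.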
B. Reservoir Computer Hardware Implementation
The RC dynamical equation (eq. 4), on a discretized temporal grid equispaced by $\delta t = 1$ in normalized time units (i.e., the magnet's dynamical time-scale $\gamma H_k/(1 + \alpha^2)$) and with $f_{NL} \equiv \tanh$, can be written as follows:

$$x[t+1] = \underbrace{(1 - \eta')\, x[t]}_{\text{leak}} + \underbrace{\kappa' \tanh\left(W_{in} u[t] + W_{self} x[t] + W_{fb} y[t]\right)}_{\text{activation}} + \underbrace{\nu'[t]}_{\text{noise}} \qquad (6)$$

where the prime denotes an additional factor of $\Delta t$.

Eq. 6 can be interpreted as describing a black box whose output ($x[t+1]$) is the sum of three terms: (a) a transduction function given by a tanh-type nonlinear activation, (b) the "leaked" past state $x[t]$, where the leakiness arises from the short state-retention times of the ASN, and (c) noise inherent to the unit. The leak and the noise are both naturally provided by the low-barrier magnets. The ASN's electrical response directly corresponds to the behavior described by eq. 6, and therefore it can be used to build compact reservoir computing nodes. For hardware-based reservoir computing proposals using other material systems, please see refs. [19-28].
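The three-term structure of the discrete update (leaked past state, tanh activation, and noise) can be simulated directly. The sketch below is a software stand-in under stated assumptions: the update form is the leak-plus-tanh-plus-noise rule described above, and the numerical values of `leak`, `gain`, `noise_std`, the spectral-radius rescaling, and the sinusoidal input are all illustrative choices rather than device parameters.

```python
import numpy as np


def reservoir_step(x, u, y, W_in, W_self, W_fb, leak, gain, noise_std, rng):
    """One discrete-time update: leaked state + tanh activation + noise."""
    drive = W_in @ u + W_self @ x + W_fb @ y
    noise = noise_std * rng.standard_normal(x.shape)
    return (1.0 - leak) * x + gain * np.tanh(drive) + noise


rng = np.random.default_rng(1)
n_nodes, n_in, n_out = 20, 1, 1
W_in = rng.uniform(-1, 1, (n_nodes, n_in))
W_self = rng.uniform(-1, 1, (n_nodes, n_nodes))
# Assumed heuristic: rescale recurrent weights to spectral radius 0.9.
W_self *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_self)))
W_fb = np.zeros((n_nodes, n_out))  # output feedback disabled in this toy run

x = np.zeros(n_nodes)
collected = []
for t in range(100):
    u = np.array([np.sin(0.1 * t)])  # hypothetical input stream
    x = reservoir_step(x, u, np.zeros(n_out), W_in, W_self, W_fb,
                       leak=0.3, gain=0.3, noise_std=0.01, rng=rng)
    collected.append(x)
states = np.array(collected)
print(states.shape)
```

In hardware, each call to `reservoir_step` corresponds to the natural evolution of the ASN nodes over one time step; the leak and noise terms are not computed but arise from the low-barrier magnet itself.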
C. Programmable Hardware Synapses
A fully hardware-based neural network requires that the physical interconnections, or synaptic weights, be controllable in strength. In the presented hardware unit, the input signal is the net current flowing into the unit, while the output signal is the resulting voltage level. Therefore a resistor network can implement the synaptic weighting using Kirchhoff's current law at each neuron's input, i.e. $I_{in} = \sum_k G_k V_k$, where the interconnect conductances $G_k$ are proportional to the synaptic weights of the interconnections with other neurons (fig. 2b). Optionally, a small p-n junction diode can be introduced to ensure uni-directionality of current flow within the synaptic network, with the added circuit cost of adjusting the bias voltage ranges, since the ASN and BSN require a bipolar voltage range for operation.
These programmable resistor networks could be implemented using a MOSFET in linear mode, whose channel resistance is controlled by the gate voltage. Compact memristor cross-bar arrays [29, 30] might be even better suited for this task, since memristors are non-volatile and therefore more energy-efficient than transistor networks.
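The current-summing rule at each neuron input amounts to a matrix-vector product between the conductances and the pre-synaptic output voltages. The following minimal sketch assumes ideal conductances and an input node held near zero potential, so that each branch current is simply $G_k V_k$; the function name `input_currents` and all numerical values are hypothetical.

```python
def input_currents(conductances, voltages):
    """I_in[j] = sum_k G[j][k] * V[k] for each post-synaptic neuron j."""
    return [sum(g * v for g, v in zip(row, voltages)) for row in conductances]


# Hypothetical values: 3 pre-synaptic outputs (volts), 2 post-synaptic neurons.
V = [0.4, -0.2, 0.1]
G = [
    [1e-5, 2e-5, 0.0],   # conductances in siemens feeding neuron 0
    [5e-6, 0.0, 3e-5],   # conductances in siemens feeding neuron 1
]

I_in = input_currents(G, V)
print(I_in)
```

Since physical conductances are non-negative, the sign of each contribution in this sketch comes from the bipolar output voltages of the preceding neurons.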
IV. SIGNAL PROCESSING USING ANALOG STOCHASTIC NEURON (ASN) NETWORKS

A. Chaotic Time-Series Predictor
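The Mackey-Glass (MG) equation [31] is a time-series generator with periodic but subtly chaotic characteristics. The generating equation is given by:

$$\frac{dx}{dt} = \beta\, \frac{x(t - \tau)}{1 + x(t - \tau)^{n}} - \gamma\, x(t) \qquad (7)$$

where $\beta$, $\gamma$, $n$, and the delay $\tau$ are the parameters of the MG system (not to be confused with the device parameters of the preceding sections).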
We train our ASN network on a chaotic data stream from an MG system, and then test it on a test signal from the same generator. The ASN network learns to reproduce the generator signal purely from its previously self-generated output. We found that for a small number of nodes the network fails to match the MG signal, but it produces a better match for larger networks (see highlighted areas in fig. 3). This happens because of the substantially richer dynamics and phase-space volume possible in a larger network. This task illustrates the possibility of creating temporal sequence predictors and temporal auto-encoders using ASNs. Such temporal predictors and auto-encoders can find applications in temporal data modeling and reconstruction, and in early-warning systems in bio-physical signal monitors that distinguish out-of-the-norm patterns and beats, such as cardiac arrhythmia and seizures.
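A training stream of the kind used for this task can be produced with a few lines of code. The sketch below integrates the standard Mackey-Glass delay equation, $dx/dt = \beta x(t-\tau)/(1 + x(t-\tau)^n) - \gamma x(t)$, with a simple Euler scheme. The parameter values ($\beta = 0.2$, $\gamma = 0.1$, $n = 10$, $\tau = 17$) are common benchmark choices and, like the step size and initial condition, are assumptions rather than the settings used for the results reported here.

```python
def mackey_glass(n_steps, beta=0.2, gamma=0.1, n=10, tau=17.0, dt=0.1, x0=1.2):
    """Euler integration of dx/dt = beta*x(t-tau)/(1 + x(t-tau)**n) - gamma*x(t)."""
    delay_steps = int(round(tau / dt))
    history = [x0] * (delay_steps + 1)  # constant history for t <= 0
    for _ in range(n_steps):
        x_now = history[-1]
        x_delayed = history[-1 - delay_steps]
        dx = beta * x_delayed / (1.0 + x_delayed ** n) - gamma * x_now
        history.append(x_now + dt * dx)
    # Drop the artificial constant history and return only simulated samples.
    return history[delay_steps + 1:]


series = mackey_glass(5000)
print(min(series), max(series))
```

A series generated this way can be split into a training segment, used to fit the readout weights, and a held-out segment, used to compare the free-running network output against the generator.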
B. Filtering Using Learning
We now demonstrate a task at the heart of signal processing and digital wireless communication, i.e. signal filtering and channel equalization. The task is to recover a bitstream after it passes through a medium or channel that introduces non-linear distortions, inter-symbol interference, and noise, which cannot be fully compensated using a linear filter [32]. The principal idea behind our implementation of the channel equalizer is to use an ASN network to reverse the effect of the channel, by learning the inverse of the underlying model of the channel's transfer function.
Let $d(t)$ be the original signal, which passes through a channel (fig. 4a) whose transfer function $p(z)$ produces $u(t) = p(d(t))$ and is given by:

$\ldots\; n] + C_m\,[\mathrm{rnd}(-1, 1)] \qquad (8)$

The function $p(z)$ asymmetrically and non-linearly amplifies $d(t)$ and introduces phase distortions, inter-symbol interference, and random noise to generate $u(t)$. We train the network to learn the function $q(z) = p^{-1}(z)$, so that it can recover $d(t)$ from $u(t)$. After training, we test the network and find that even for small networks ($N = 20$), the signal can be extracted with high fidelity from severely distorted signals (fig. 4b). In the presented simulation, the Symbol Recovery Rate ($\mathrm{SRR} = 1 - \frac{|y(t) - d(t)|}{|u(t) - d(t)|}$) was 94.28%. From multiple simulations on a wide variety of models for $p(z)$, we have found the SRR to lie in the 90 to 100% range. More complex filter designs with stacked networks may help increase the performance of such filters. This task shows the possibility of building highly compact and energy-efficient, dynamically trainable neuro-adaptive filters using ASNs. Such filters can find wide application in SWaP (size, weight, and power) constrained environments such as IoT, sensor networks, self-driving automotives, and UAVs.
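The recovery metric can be evaluated on sampled sequences as follows. This is a minimal sketch of $\mathrm{SRR} = 1 - |y - d| / |u - d|$, assuming that the absolute deviations are summed over the test sequence (the aggregation over time is an assumption, as are the function name `symbol_recovery_rate` and the toy sequences).

```python
def symbol_recovery_rate(recovered, distorted, original):
    """SRR = 1 - sum|y - d| / sum|u - d| over the test sequence."""
    residual = sum(abs(y - d) for y, d in zip(recovered, original))
    distortion = sum(abs(u - d) for u, d in zip(distorted, original))
    return 1.0 - residual / distortion


# Hypothetical toy sequences: original symbols d, channel output u,
# and equalizer output y.
d = [1, -1, 1, 1, -1]
u = [1.5, -0.2, 0.3, 1.4, -1.8]
y = [1.05, -0.9, 0.95, 1.0, -1.1]

srr = symbol_recovery_rate(y, u, d)
print(f"SRR = {100 * srr:.2f}%")
```

An SRR close to 100% means the equalizer output deviates from the original symbols far less than the distorted channel output does; an SRR near 0% means the filter provides no improvement.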
