Abstract - Recent artificial neural network architectures improve performance and power dissipation by leveraging resistive devices to store and multiply synaptic weights with input data. Negative and positive synaptic weights are stored on the memristors of a reconfigurable memristive crossbar array (MCA). Existing MCA-based neural network architectures use power-hungry voltage converters or operational amplifiers to generate the total synaptic current through each column of the crossbar array. This paper presents a low power MCA-based feedforward neural network architecture that uses one spintronic device per pair of columns to generate the synaptic current for each neuron. It is shown experimentally that the proposed architecture dissipates significantly less power than existing feedforward memristive neural network architectures.
Introduction
Artificial neural networks are used in many applications, such as pattern matching, character and speech recognition, and big data management. They consist of an input layer, an output layer, and multiple hidden layers [1, 2, 3, 4]. Each layer consists of several neurons. Each neuron has multiple inputs, typically real numbers, and one output, also typically a real number. Neurons communicate with other neurons through links called synapses, which carry positive or negative weight values. A neuron calculates the sum of all its weighted inputs and maps the sum into an output signal by a transfer function called the activation function [1, 5-8].
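As a point of reference, a minimal Python sketch of this computation (the function names are ours, not from the paper) is:

```python
import numpy as np

def sigmoid(z):
    # Activation function: maps the weighted sum into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w):
    # x: input vector; w: synaptic weight vector with positive or
    # negative entries. The neuron sums its weighted inputs and
    # applies the activation function.
    return sigmoid(np.dot(w, x))

# Example: three inputs in [0, 1] with mixed-sign weights.
y = neuron_output(np.array([0.2, 0.9, 0.5]), np.array([0.7, -1.3, 0.4]))
```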
An emerging artificial neural network paradigm uses a reconfigurable memristive crossbar array (MCA) to perform the needed multiplication and addition operations with low power and high performance [2, 6, 9-13]. MCA-based NN (MNN) architectures require a pair of memristors to store either a positive or a negative synaptic weight. There are different types of MNNs. This paper considers multilayer feedforward MNNs [2], as opposed to spiking and recurrent neural network architectures, among other types. The feedforward MNN in [9] uses a dual column structure, where two adjacent memristors in a row store a synaptic weight. The MCA in [2] uses a dual row structure, where two adjacent memristors in a column store a weight value. Both approaches store a weight value in one of the two memristors and require the other to be in a very high resistive state so that the current through it is negligible. The sign of the weight value determines which memristor is in the high resistive state. Other feedforward memristor-based NN architectures use the Wheatstone bridge [14, 15] instead of the MCA to implement a synaptic weight. Among these approaches, the least power consuming are the dual row MCA architecture in [2], which requires a voltage converter to implement positive and negative weight values, and the dual column architecture in [9], which uses an operational amplifier per column.
Architectures such as those in [14-16] are gaining much attention because the required arithmetic operations can be performed by simple components built from emerging resistive devices. Power dissipation and execution time are drastically lower than in multiprocessor-based systems tailored to neuromorphic calculations [17, 18] or in GPGPU-based architectures [19]. This paper presents a dual column feedforward MNN architecture that avoids the operational amplifier of [9]. Instead, it uses one spintronic device per neuron to compute the total synaptic current through each MCA column pair. SPICE-based simulation in 45 nm technology shows that the proposed architecture dissipates considerably less power than [2] and [9]. Experimental results are presented for the benchmark datasets in [20-22].
The paper is organized as follows. Section 2 describes the proposed architecture, Section 3 presents its experimental evaluation, and Section 4 concludes the paper.
Proposed Architecture
Figure 1 shows the structure of the proposed MCA-based layer in the feedforward MNN. It consists of n rows and 2m columns. Each layer has n inputs x_i, 1 ≤ i ≤ n, that are real numbers in the range [0, 1]. There are m neurons, and each neuron is associated with a pair of MCA columns. At the j-th pair, an interface module (denoted IM) generates the total synaptic current into the activation function module f_j, 1 ≤ j ≤ m. The output y_j of the j-th activation function is also a real number in the range [0, 1]. Each IM is a domain wall (DW) spintronic device. In the current implementation, the activation function is the sigmoid, implemented by the circuit in [23]. It is noted that the architectures in [9] and [14] only implement sigmoid activation functions. In contrast, the proposed architecture, like [2], can accommodate any existing current-based hardware implementation of an activation function, such as the step function in [2].
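In idealized form, the computation performed by one such layer can be summarized with the following Python sketch. It is behavioral only; the conductance matrices g_plus and g_minus and the current-to-activation gain i_to_z are our abstractions, not circuit quantities from the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward(x, g_plus, g_minus, v_read=1.0, i_to_z=1.0):
    # x: n inputs in [0, 1]; g_plus, g_minus: n x m conductance matrices
    # of the positive and negative columns in each of the m column pairs.
    v = v_read * np.asarray(x)
    i_plus = v @ g_plus     # column currents I+_j, one per neuron
    i_minus = v @ g_minus   # column currents I-_j
    # Each IM forms the total synaptic current I_j = I+_j - I-_j, which
    # the activation circuit maps to the neuron output y_j in [0, 1].
    return sigmoid(i_to_z * (i_plus - i_minus))
```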
Each synaptic weight w_ij, linking the i-th neuron and the j-th neuron, consists of two adjacent memristors M+_ij and M-_ij. Exactly one of these two memristors is in the off-state. For instance, considering the j-th column pair, if the weight is positive, M+_ij is programmed to the specific weight value and M-_ij is in the off-state. If the weight is negative, M-_ij is programmed to the weight value and M+_ij is in the off-state. Let I+_j and I-_j denote the synaptic currents for the positive and negative contributions in the j-th column pair.
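A minimal sketch of this sign-dependent programming rule, using the memristance boundaries from Section 3 and our own naming:

```python
R_ON, R_OFF = 5e3, 5e6  # memristance boundaries from Section 3

def split_weight(w, g_max=1.0 / R_ON):
    # Map a signed weight to the (M+_ij, M-_ij) conductance pair.
    # Exactly one memristor carries |w|; the other stays in the off-state.
    g_off = 1.0 / R_OFF
    if w >= 0:
        return abs(w) * g_max, g_off
    return g_off, abs(w) * g_max

# Example: a negative weight programs only the M-_ij device.
g_plus, g_minus = split_weight(-0.5)
```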
Each column in the crossbar array of Figure 1 calculates the partial weighted sum of either the positive or the negative contributions. The IM calculates the difference between the two currents I+_j and I-_j in the j-th column pair; this difference is the j-th total synaptic current. The two columns of the j-th column pair, together with the IM, form the part of the neuron that calculates the total synaptic current for the j-th neuron. Figure 2(a) shows the circuit diagram of the proposed IM, where V denotes a control voltage. I+_j and I-_j are the inputs to the IM and determine the total synaptic current I_j. The DW device is a three-terminal device that consists of a thin nano strip between two anti-parallel fixed magnetic layers (PLs). This nano strip forms the free magnetic layer (FL). The magnetization of the FL determines the resistive state of the device. The transition region in the FL between the two oppositely magnetized domains is called the DW. The DW can be moved by injecting current along the nano strip, which changes the magnetic orientation of the FL. A fixed magnet and the domain wall strip form the Magnetic Tunnel Junction (MTJ) that reads the resistive state of the device [2, 24].
The operation of the IM is described using three non-overlapping clocks, Clk1, Clk2, and Clk3. The duty cycles of the clocks differ because the reset, write, and read times of the DW device differ. Let R_L and R_H denote the low and high resistive states of the device, respectively. When Clk1 is high, the spintronic device is reset, placing the DW at the center position of the nano strip. When Clk2 is high, the spintronic device is programmed using the total synaptic current I_j = I+_j - I-_j. In this way, the difference between the positive and negative total synaptic currents through consecutive columns of the MCA is mapped to a resistive value of the DW spintronic device. When Clk3 is high, the activation circuit is active, and the total synaptic current difference is mapped to a voltage value. The current mirrors in Figure 2(a) ensure that the range of current generated by the interface module falls within the range required by the activation circuit for reliable operation. The externally supplied negative current equals the current through the spintronic device when the DW is at the center of the nano strip. Let I_L and I_H denote the currents through the IM when the spintronic device is in the R_L and R_H resistive states, respectively; the externally supplied current is thus (I_L + I_H)/2. Figure 2(b) shows the timing diagram of the operation of the interface module.
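The following behavioral model of one Clk1/Clk2/Clk3 cycle is a sketch under our own simplifying assumption that the DW displacement is proportional to the programming current; the parameters i_low, i_high, and i_max are illustrative stand-ins for I_L, I_H, and the current that moves the DW fully to a strip edge:

```python
def interface_module(i_plus, i_minus, i_low, i_high, i_max):
    # Behavioral model of the IM over one Clk1/Clk2/Clk3 cycle.
    # i_low / i_high: read currents at the R_L / R_H device states.
    # i_max: programming current that moves the DW fully to one edge.

    # Clk1 (reset): DW returns to the center of the nano strip.
    position = 0.0  # -1.0 .. +1.0, 0.0 = center

    # Clk2 (write): the total synaptic current I_j = I+_j - I-_j moves
    # the DW. We assume the displacement is proportional to I_j,
    # clipped at the strip edges.
    i_j = i_plus - i_minus
    position = max(-1.0, min(1.0, i_j / i_max))

    # Clk3 (read): the device resistance, hence the read current,
    # follows the DW position. The externally supplied current
    # (I_L + I_H) / 2 cancels the center value, so the output is a
    # signed current proportional to I_j.
    i_read = i_low + (position + 1.0) / 2.0 * (i_high - i_low)
    return i_read - (i_low + i_high) / 2.0
```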
Experimental Results
A simulator for the proposed architecture has been developed. We consider TiO2 bipolar metal-oxide memristors and the VTEAM model in [25]. In our simulator, the device length and the R_on and R_off memristance boundaries were set to 5 nm, 5 kΩ, and 5 MΩ, respectively. The other memristor parameters were set as in [26]. The switching time was 100 ns when the applied voltage was ±1 V. Multiple bits of information can be stored in a single cell using different memristance values. Thus, M+_ij and M-_ij were implemented with 5-bit memristive multi-level cells [27-29].
Since the current-voltage relation of a memristor is nonlinear, the memristance level corresponding to each weight value was assigned using the approach presented in [29]. Any weight level can be realized by changing the memristance of the memristor gradually with a precise write control signal [27]. We used five bits to implement 31 distinct weight values. In our simulator, the dimensions of the domain wall strip were 100 × 20 × 2 nm³, the MgO thickness was 1.1 nm, the saturation magnetization was 6.8 × 10⁵ A/m, and the domain wall width was 15 nm. The DW could be moved from one edge of the free layer to the other in 2 ns by applying a 35 µA current.
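As an illustration, a trained weight might be quantized and mapped to a memristance level as sketched below. The uniform level spacing is a simplifying assumption of ours; the simulator assigns levels with the nonlinearity-aware approach of [29]:

```python
import numpy as np

R_ON, R_OFF = 5e3, 5e6  # memristance boundaries used in the simulator
LEVELS = 31             # 5-bit multi-level cell

def quantize_weight(w, w_max=1.0):
    # Snap |w| to the nearest of 31 uniformly spaced levels in [0, w_max].
    level = int(round(abs(w) / w_max * LEVELS))
    return np.sign(w) * level * (w_max / LEVELS)

def level_to_memristance(level):
    # Hypothetical linear level-to-conductance map; the actual
    # assignment follows the nonlinearity-aware approach of [29].
    g = level / LEVELS * (1.0 / R_ON - 1.0 / R_OFF) + 1.0 / R_OFF
    return 1.0 / g
```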
The proposed interface module (IM), the voltage converter of [31], the operational amplifier-based sigmoidal neuron of [9], and the low power analog sigmoidal neuron of [23] were implemented in 45 nm predictive technology. Our experiments showed that the average power dissipated by the circuit in [23] was 8 µW.
Simulators for the MCA-based feedforward architectures in [2] and [9] and for the non-MCA-based feedforward architecture in [14] were also developed in 45 nm predictive technology for experimental comparisons. The simulator for the architecture in [2] was enhanced to implement the analog sigmoidal circuitry in [23]. Table 1 shows the average power dissipated by the various components of the proposed architecture and of the MCA-based feedforward architectures in [2] and [9]. The power dissipated by the proposed interface circuit was almost the same as the power of the voltage converter in the architecture of [2]. Nevertheless, the proposed dual column architecture dissipated less power than the dual row architecture in [2] because the number of voltage converters in a layer equals the number of inputs, whereas the number of IMs equals the number of neurons, and the number of neurons in a layer is always less than the number of inputs. It also dissipated less power than the dual column architecture in [9, 16], which requires a power-hungry operational amplifier per column.

Simulation results for the proposed architecture as well as for [2] and [9] were obtained in Python for the MNIST dataset [20], the American Sign Language (ASL) dataset [21], and the CIFAR10 dataset [22]. MNIST contains 28 × 28 grayscale handwritten images, ASL contains 200 × 200 RGB images, and CIFAR10 contains 32 × 32 RGB images. For the MNIST dataset, the NN had 784 input neurons and 10 output neurons, with three hidden layers of 500, 300, and 128 neurons, respectively. For the ASL dataset, the NN had 400,000 input neurons and 24 output neurons, with three hidden layers of 1,000, 500, and 128 neurons, respectively. For the CIFAR10 dataset, the NN had 1024 input neurons and 10 output neurons, with three hidden layers of 500, 256, and 64 neurons, respectively. Images in the ASL and CIFAR10 datasets were converted to grayscale before being fed into the network.

Table 2 shows the total average power dissipated by the NN architectures in [2] and [9] and by the proposed architecture. The total power of the proposed architecture is the sum of the power consumed by the interface modules and by the sigmoidal activation function in [23]. The total power of the NN architecture in [2] amounts to the power dissipated by the voltage converters and by the sigmoidal activation function component in [23]. The total power of the NN architecture in [9] amounts to the average total power dissipated by its two differential amplifiers. The power savings over [2] were approximately 19%, 41%, and 28% for the MNIST, ASL, and CIFAR10 datasets, respectively. The power savings over the operational amplifier-based architecture in [9] were 56% for all three datasets. These results exclude the power dissipated in the MCA itself, which is common to all architectures.
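The layer-level effect of replacing per-input converters with per-neuron IMs can be illustrated with a back-of-the-envelope accounting sketch; the per-component figures P_IM and P_VC below are placeholders (only the 8 µW figure for the sigmoid circuit of [23] comes from the measurements above):

```python
# Layer sizes for the MNIST network described above: 784 inputs,
# hidden layers of 500, 300, and 128 neurons, and 10 outputs.
LAYERS = [(784, 500), (500, 300), (300, 128), (128, 10)]

P_ACT = 8e-6   # sigmoid circuit of [23], measured at 8 uW (Section 3)
P_IM = 1e-6    # placeholder: interface module power from Table 1
P_VC = 1e-6    # placeholder: voltage converter power, roughly equal to P_IM

def power_proposed():
    # One IM and one activation circuit per neuron in each layer.
    return sum(n_out * (P_IM + P_ACT) for _, n_out in LAYERS)

def power_dual_row():
    # Architecture of [2]: one voltage converter per input of each layer,
    # plus one activation circuit per neuron.
    return sum(n_in * P_VC + n_out * P_ACT for n_in, n_out in LAYERS)

print(power_proposed() < power_dual_row())  # fewer neurons than inputs
```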
We also obtained simulation results for the NN architecture in [14]. It was simulated for a small network consisting of 10 inputs and 4 output neurons. Weights were set between 5 kΩ and 5 MΩ, and inputs were in the range [0, 1] V. Even for this small network, the total power dissipation was 230 mW, and the power dissipated by the sigmoidal neuron alone was 2.06 mW. These results show that [14] is not as power efficient as the proposed architecture.
It is noted that the proposed NN architecture achieved the same classification accuracy as [2, 9] on all benchmarks: 96%, 70%, and 95% for the MNIST, CIFAR10, and ASL datasets, respectively.
Conclusion
A low power spintronic circuit has been introduced to generate the input current for the activation circuit of an MCA-based neuromorphic architecture. The proposed interface circuit uses a domain wall spintronic device. It has been experimentally shown that the proposed neuromorphic architecture dissipates less power than existing architectures based on emerging resistive devices.
