In this work, we present an implementation of spike-timing-dependent plasticity (STDP) both in a high-level simulation and at the circuit level. We verify that the high-level simulation captures the behavior of the circuit implementation. We use the simulation to assess the effectiveness of STDP for online learning and find that STDP enables networks to improve their performance online after training.
INTRODUCTION
Neuromorphic computing has emerged as a new computing paradigm inspired by the human brain. Unlike the traditional von Neumann architecture, in which memory is separate from the processor, neuromorphic architectures store data within neurons that are interconnected in parallel through synapses. Because the data is stored in the processing units, neuromorphic systems do not face the memory bottleneck that von Neumann systems typically do.
One of the main motivations for neuromorphic computing is to replicate the success of the human brain in applications such as pattern recognition, image classification, and control tasks, which current computing paradigms struggle with. In addition to improving performance on these applications, neuromorphic systems also offer the ability to adjust their parameters, such as the synaptic weight, online. These online adjustments of synaptic weights are inspired by learning mechanisms in the human brain, where synapses are potentiated and depressed based on their activity. Thus, neuromorphic systems and their inclusion of biologically inspired features have the potential to realize a system that can enhance its performance online.
ACM acknowledges that this contribution was authored or co-authored by an employee, or contractor of the national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only. Permission to make digital or hard copies for personal or classroom use is granted. Copies must bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. To copy otherwise, distribute, republish, or post, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. NCS '17, July 17-19, 2017.
One of the challenges for neuromorphic architectures has been evaluating their designs. Many of these systems are inspired by the human brain, and some seek to mimic it outright. It is important, however, to consider the impact these brain-inspired features might have on the performance of the architecture. One such brain-inspired feature is spike-timing-dependent plasticity (STDP), which enables online learning [15]. Unlike some models of synaptic potentiation and depression that consider only synapses carrying a charge directly before or after a neuron fires, STDP considers synapses that carry a charge within a time window around the neuronal fire. In our synchronous memristive neuromorphic system, we consider synapses that carry a charge within a programmable number of clock cycles before or after the receiving neuron fires.
Another primary challenge is defining an appropriate network topology for the problem. Typical neural networks define the topology ahead of time and then modify the synaptic weights during training. Our approach uses a genetic algorithm to generate a network with no predefined structure, though the generated networks are typically recurrent [14]. In this work, we evaluate networks generated for the XOR and iris classification tasks using a high-level C++ simulator of a memristive neuromorphic system that accurately captures the behavior of the hardware. These networks are then used to evaluate the impact of STDP in the memristive neuromorphic model.
Memristive devices have been fabricated that exhibit properties similar to those predicted by Chua. Memristors act as programmable non-volatile memory devices and are hence of particular interest in neuromorphic circuits. Synaptic weights can be encoded as memristance values, and incremental memristance changes are vital for on-chip as well as off-chip learning. Different memristor models are used to predict the change in memristance for different devices.
The memristor considered in this work is an HfO₂-based device. The model used for simulation is derived from a model previously developed in [1]. The original model considers unipolar, nonpolar, and bipolar behavior, but the adapted model focuses mainly on bipolar behavior [17]. In this model, the memristance change is proportional to the applied voltage and the time for which this voltage is applied. Once the memristor has reached either the High Resistance State (HRS) or the Low Resistance State (LRS), it remains at that resistance level regardless of any further voltage applied with the same polarity. During a SET operation from HRS to LRS, the change in memristance is given by:
Conversely, during a RESET operation from LRS to HRS, the change in memristance is given by:
where $M$ is the memristance, $\Delta r$ is the absolute difference between the HRS and LRS values, $V(t)$ is the voltage bias applied across the memristor, $t_{pw}$ is the time for which this voltage is applied, $t_{swp}$ ($t_{swn}$) is the time needed to switch from HRS to LRS (LRS to HRS), and $V_{tp}$ ($V_{tn}$) is the threshold voltage under positive (negative) polarity. If the memristor is assumed to have symmetric switching times and threshold voltages, the change in memristance ($\Delta M$) in either direction is given by:
where $t_{sw} = t_{swp} = t_{swn}$ and $V_{th} = V_{tp} = V_{tn}$. The parameters used in this work are based on the HfO₂ device described in [2]. Based on the experimental results reported therein, we conservatively assume a switching time ($t_{sw}$) of about 1 µs and a switching threshold voltage ($V_{th}$) of ±750 mV. We also assume an HRS of 50 kΩ and an LRS of 5 kΩ, giving an HRS/LRS ratio of 10. Fig. 1a shows the circuit symbol for the memristor. Note that a positive voltage applied across it reduces the memristance ($\Delta M < 0$), and vice versa. Fig. 1b shows the pinched hysteresis curve of current versus voltage for these device parameters.
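As a rough illustration of the threshold and saturation behavior described above, the sketch below uses the stated device parameters (1 µs, ±750 mV, 50 kΩ, 5 kΩ). It assumes that, above $V_{th}$, the memristance change grows linearly with the pulse width and with the normalized overdrive $(|V| - V_{th})/V_{th}$; this form is an assumption for illustration and not necessarily the exact form of the model in [1]. The names `delta_m` and `apply_pulse` are hypothetical.

```python
HRS = 50e3      # high resistance state (ohms)
LRS = 5e3       # low resistance state (ohms)
T_SW = 1e-6     # switching time t_sw (seconds)
V_TH = 0.75     # switching threshold magnitude V_th (volts)
DELTA_R = HRS - LRS


def delta_m(v: float, t_pw: float) -> float:
    """Memristance change for a pulse of v volts lasting t_pw seconds.

    Assumed form (illustrative, not the exact model of [1]):
    |dM| = DELTA_R * (t_pw / T_SW) * (|v| - V_TH) / V_TH above threshold, 0 below.
    A positive v lowers M (toward LRS); a negative v raises it (toward HRS).
    """
    overdrive = abs(v) - V_TH
    if overdrive <= 0.0:
        return 0.0  # sub-threshold pulses leave the device unchanged
    magnitude = DELTA_R * (t_pw / T_SW) * overdrive / V_TH
    return -magnitude if v > 0 else magnitude


def apply_pulse(m: float, v: float, t_pw: float) -> float:
    """Apply one pulse; the device saturates once it reaches HRS or LRS."""
    return min(HRS, max(LRS, m + delta_m(v, t_pw)))


print(apply_pulse(20e3, 1.0, 100e-9))   # about 18.5 kOhm
print(apply_pulse(20e3, 0.5, 1e-6))     # below threshold: unchanged
```

Under this assumed form, a 1 V, 100 ns pulse moves a 20 kΩ device to about 18.5 kΩ, while a 0.5 V pulse of any width has no effect because it never crosses the threshold.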
Spike-Timing-Dependent Plasticity
Spike-timing-dependent plasticity is a process in which the strength of the connections between neurons is adjusted as a function of the temporal differences between neuron spiking events. The concept of STDP originates from Hebbian learning [9], which describes a basic mechanism for synaptic plasticity: when a neuron repeatedly or persistently contributes to another neuron's firing, the synaptic connection between them is strengthened. In this work, we refer to the neuron that precedes a synapse as the pre-neuron and the one that follows it as the post-neuron. A pre-neuron can contribute to a post-neuron's fire only if it fires first, so the causality of fires is crucial. STDP extends Hebb's postulate by also specifying how the synaptic weight change depends on the temporal difference between the fires. In general, if the pre-neuron fires within a reasonable time window (up to five clock cycles in our implementation) before the post-neuron fires, long-term potentiation (LTP) takes place. Conversely, if the pre-neuron fires within a reasonable time window after the post-neuron fires, long-term depression (LTD) takes place. The largest change in synaptic weight occurs when the time difference between the pre- and post-neuron fires is small, and the change diminishes exponentially as this difference grows [15]. This STDP behavior can be described by the piecewise exponential relation in equation (4):
$$
\Delta w(\Delta t) =
\begin{cases}
A_{+}\, e^{\Delta t/\tau_{+}}, & \Delta t < 0 \\
-A_{-}\, e^{-\Delta t/\tau_{-}}, & \Delta t > 0
\end{cases}
\tag{4}
$$
where $\Delta t$ is the time difference between the pre-neuron fire and the post-neuron fire ($\Delta t = t_{pre} - t_{post}$), and $\Delta w$ is the resulting change in synaptic weight, expressed as a percentage of a predefined maximum weight value ($w_{max}$). The parameters $\tau_{+}$ and $\tau_{-}$ are time constants that determine the time window over which potentiation or depression may occur. $A_{+}$ and $A_{-}$ are both positive and define the maximum synaptic modification, which occurs as $\Delta t$ approaches zero. In biological systems, the time constants $\tau_{+}$ and $\tau_{-}$ are usually in the range of tens of milliseconds. In some experiments they were roughly equal, leading to identical windows for synaptic weakening and strengthening [3, 10, 18], whereas [6, 7] report that the weakening window can be wider than the strengthening window ($\tau_{-} > \tau_{+}$).
The two cases are illustrated in Fig. 2. Fig. 2a shows anti-symmetric STDP behavior, where $A_{+}\tau_{+} = A_{-}\tau_{-}$. In Fig. 2b, $A_{-}$ is smaller than $A_{+}$ while $\tau_{-}$ is larger than $\tau_{+}$, giving $A_{+}\tau_{+} < A_{-}\tau_{-}$. This replicates the STDP behavior described in some works [5], where weakening is stronger over a longer period of time.
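A minimal Python sketch of a piecewise exponential rule consistent with the description above, assuming the common form $A_{+} e^{\Delta t/\tau_{+}}$ for $\Delta t < 0$ and $-A_{-} e^{-\Delta t/\tau_{-}}$ for $\Delta t > 0$, with $\Delta t = t_{pre} - t_{post}$ measured in clock cycles. The parameter values are illustrative assumptions chosen to reproduce the two regimes of Fig. 2, not values from this work, and treating simultaneous fires ($\Delta t = 0$) as causing no change is also an assumption.

```python
import math


def stdp_dw(dt: float, a_plus: float, a_minus: float,
            tau_plus: float, tau_minus: float) -> float:
    """Weight change (as a fraction of w_max) for dt = t_pre - t_post.

    dt < 0: pre fired before post -> potentiation (LTP)
    dt > 0: pre fired after post  -> depression (LTD)
    dt == 0 is assumed to produce no change.
    """
    if dt < 0:
        return a_plus * math.exp(dt / tau_plus)
    if dt > 0:
        return -a_minus * math.exp(-dt / tau_minus)
    return 0.0


# Anti-symmetric case (as in Fig. 2a): A+ * tau+ == A- * tau-
SYMMETRIC = dict(a_plus=0.01, a_minus=0.01, tau_plus=20.0, tau_minus=20.0)
# Depression-dominated case (as in Fig. 2b): A- < A+, tau- > tau+, A+ * tau+ < A- * tau-
ASYMMETRIC = dict(a_plus=0.01, a_minus=0.0075, tau_plus=20.0, tau_minus=40.0)

for dt in (-20, -5, -1, 1, 5, 20):
    print(f"dt={dt:+3d}  sym={stdp_dw(dt, **SYMMETRIC):+.5f}  "
          f"asym={stdp_dw(dt, **ASYMMETRIC):+.5f}")
```

In the asymmetric set, potentiation dominates for small $|\Delta t|$ because $A_{+} > A_{-}$, but depression dominates for large $|\Delta t|$ because $\tau_{-} > \tau_{+}$, which is the "weakening is stronger over a longer period" behavior.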
NIDA
The Neuroscience-Inspired Dynamic Architecture (NIDA) is a spiking neural network architecture composed of simple leaky integrate-and-fire neurons [13]. Unlike traditional artificial neural networks, NIDA networks include a temporal delay along their synapses. To support this temporal component, NIDA networks are embedded in a 3-dimensional space, and the Euclidean distance between two neurons in this space determines the delay along the synapse connecting them. Besides the delay, each synapse has a weight represented as a floating point value between -1 and 1. Neurons accumulate, or sum, the weights of incoming fires and fire when the accumulated charge crosses a threshold. After a neuron fires, it enters a refractory period during which it cannot accumulate any charge.
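To make the accumulate, threshold, and refractory behavior concrete, here is a minimal clocked sketch of a leaky integrate-and-fire neuron. NIDA itself processes events asynchronously, so this discrete-time version is only an approximation; the class name `LIFNeuron`, the multiplicative leak factor, and the use of `>=` for crossing the threshold are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class LIFNeuron:
    """Hypothetical clocked leaky integrate-and-fire neuron (illustration only)."""
    threshold: float
    refractory: int          # cycles during which incoming charge is ignored
    leak: float = 0.9        # assumed fraction of charge retained each cycle
    charge: float = 0.0
    cooldown: int = 0        # refractory cycles remaining

    def step(self, incoming: float) -> bool:
        """Advance one cycle given the summed incoming weight; return True if it fires."""
        if self.cooldown > 0:
            self.cooldown -= 1   # refractory: nothing is accumulated
            return False
        self.charge = self.charge * self.leak + incoming
        if self.charge >= self.threshold:
            self.charge = 0.0    # reset after firing
            self.cooldown = self.refractory
            return True
        return False


n = LIFNeuron(threshold=1.0, refractory=2, leak=1.0)
print([n.step(w) for w in (0.6, 0.6, 5.0, 5.0, 1.0)])
# [False, True, False, False, True]
```

The two large inputs arrive during the refractory period and are discarded, so the neuron only fires again once the period has elapsed.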
As a model for neuromorphic computing, NIDA has a host of software tools built around it to facilitate its development. NIDA fits into the software framework presented by Plank et al. [12]. Within this framework, NIDA functions as a model that can simulate networks and use evolutionary optimization to train them. Unlike training methods that first fix a network's topology and then search for the optimal parameters, evolutionary optimization allows the topology to evolve along with the parameters. In this way, evolutionary optimization can tailor a network's topology, and not just its weights, to the target application. Although any topology, such as a convolutional network, can in principle be generated, the generated networks are typically recurrent [14].
MEMRISTIVE MODEL
Because NIDA is a proven model and provides a suite of software tools, it was desirable to extend it to a memristive implementation, which required modifications to the NIDA model. The first major modification is changing NIDA's asynchronous processing of events into a clocked, synchronous implementation. Although memristors, as analog devices, can operate asynchronously, a digital approach was judged to simplify the implementation of the circuit's control logic, so a mixed-signal design was chosen. This synchronous approach requires time to be discretized; events in the memristive model are therefore clock based.
The second major modification, and a limitation of the memristive model, is the reduced granularity of the synaptic weights. In NIDA, synaptic weights are floating point values with high granularity. For the memristive model, however, it was determined that programming precise synaptic weights on-chip would be difficult, so the model severely limits the granularity of programmable synaptic weights. High-level simulations have explored granularities of up to 21 programmable integer weights, from -10 to 10, although for some devices even this granularity might be optimistic. Coarse granularities have also been considered to understand how limited programming resolution affects network performance; in particular, the case of three programmable weights (-1, 0, and 1) has been explored. These three weights correspond to programming each memristor only to its HRS or LRS.
Synapse
The bi-memristor synapse consists of two memristors connected with opposite polarity, as shown in Fig. 3. This pair of memristors, which forms the synapse, is interposed between the pre-neuron and the post-neuron. One terminal of each memristor is driven separately by the pre-neuron, while the other two terminals, which connect to the post-neuron, are shorted together to form the post-synaptic node. When the pre-neuron fires, the current flowing into the post-neuron is proportional to the synaptic weight, which is in turn proportional to the effective conductance of the synapse. As shown in Fig. 3, the memristor $M_p$ drives positive current into the post-neuron, while $M_n$ drives negative current. The net current flowing out of the post-synaptic node depends on the relative magnitudes of these two currents. Thus, the memristances $M_p$ and $M_n$ determine the effective conductance as follows:
$$G_{eff} = \frac{1}{M_p} - \frac{1}{M_n} \tag{5}$$
The effective conductance of the synapse is bounded by the HRS and LRS values of the memristors. The maximum conductance, $G_{max} = 1/\mathrm{LRS} - 1/\mathrm{HRS}$, is attained when $M_p$ reaches LRS and $M_n$ reaches HRS. The synapse operates in two phases, namely accumulation and learning, and the control circuit in the neuron provides the appropriate voltage levels to drive the memristors during each phase.
Accumulation is the phase during which the pre-neuron fires. For the duration of the pre-neuron's fire, the control circuit drives opposite-polarity voltages on the nodes $V_p$ and $V_n$, while the post-neuron holds the post-synaptic node at virtual ground. Under these voltages, positive current flows through $M_p$ and negative current flows through $M_n$. These currents are summed at the post-synaptic node and, depending on the effective conductance given by equation (5), the net current can be positive, negative, or zero. Positive current through the synapse accumulates charge in the post-neuron, while negative current dissipates charge from it.
Learning is the phase during which the synapse adjusts its weight according to the STDP learning rule. During this phase, the control circuit in the pre-neuron drives the same voltage on both pre-synaptic terminals, i.e., $V_p = V_n$, and the post-synaptic node is driven by feedback from the post-neuron. The polarity of the voltage across the synapse depends on whether the synapse is being potentiated or depressed. During potentiation, the synaptic weight must increase. To achieve this, the control circuit applies a negative voltage to the pre-synaptic memristor terminals while feedback from the post-neuron holds the post-synaptic node at a positive voltage. The resulting voltage difference across the memristors exceeds their switching threshold.
Because the memristors are connected with reversed polarity, the memristance of one memristor ($M_p$) decreases while that of the other ($M_n$) increases by the same amount ($\Delta M$). This increases the effective conductance, which translates to an increase in synaptic weight. The new conductance is given by:
$$G_{eff}^{new} = \frac{1}{M_p - \Delta M} - \frac{1}{M_n + \Delta M} \tag{6}$$
Depression implies a decrease in the synaptic weight. In this case, the control circuit applies a positive voltage to the pre-synaptic terminals while the post-synaptic node is held at a negative voltage. The resulting voltage difference across the memristors again exceeds their switching threshold, but with the opposite polarity to the potentiation case. Here, the memristance of $M_p$ increases while that of $M_n$ decreases, which decreases the effective conductance and hence the synaptic weight. The new conductance is given by:
$$G_{eff}^{new} = \frac{1}{M_p + \Delta M} - \frac{1}{M_n - \Delta M} \tag{7}$$
In both cases, the amount by which the memristance increases or decreases ($\Delta M$) depends on the timing of the pre- and post-neuron firing events: the further apart they are in time, the smaller $\Delta M$.
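The sketch below expresses the bi-memristor synapse in Python: the effective conductance is taken as $1/M_p - 1/M_n$, potentiation lowers $M_p$ and raises $M_n$ by $\Delta M$, and depression does the opposite, with both memristances saturating at the LRS and HRS values given earlier (5 kΩ and 50 kΩ). Normalizing by $G_{max}$ to obtain a weight in $[-1, 1]$ and all function names are assumptions for illustration. The last lines show why a weight alone is not enough to predict STDP updates: two synapses with the same weight of zero respond very differently to the same potentiation step.

```python
HRS = 50e3  # ohms
LRS = 5e3   # ohms


def g_eff(m_p: float, m_n: float) -> float:
    """Effective conductance: M_p sources positive current, M_n sinks it."""
    return 1.0 / m_p - 1.0 / m_n


G_MAX = g_eff(LRS, HRS)  # M_p at LRS and M_n at HRS


def _clamp(m: float) -> float:
    """Keep a memristance within the device's LRS..HRS range."""
    return min(HRS, max(LRS, m))


def potentiate(m_p: float, m_n: float, dm: float) -> tuple[float, float]:
    """Lower M_p and raise M_n by dm (saturating at the bounds)."""
    return _clamp(m_p - dm), _clamp(m_n + dm)


def depress(m_p: float, m_n: float, dm: float) -> tuple[float, float]:
    """Raise M_p and lower M_n by dm (saturating at the bounds)."""
    return _clamp(m_p + dm), _clamp(m_n - dm)


def weight(m_p: float, m_n: float) -> float:
    """Synaptic weight normalized to [-1, 1] by G_MAX."""
    return g_eff(m_p, m_n) / G_MAX


# Same weight (0) from two different memristance pairs...
low_pair, high_pair = (LRS, LRS), (HRS, HRS)
# ...but the same potentiation step moves them to very different weights.
print(weight(*potentiate(*low_pair, 1e3)))
print(weight(*potentiate(*high_pair, 1e3)))
```

Because the conductance is a reciprocal of memristance, a fixed $\Delta M$ changes the weight far more when the memristors sit near LRS than near HRS.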
Evolutionary Optimization
Training networks in the software framework requires a model to implement three genetic operators: random, mutate, and crossover. The evolutionary optimization component of the framework uses these operators to train networks for a specified application. NIDA's implementations of these operators serve as the basis for those of the memristive model. NIDA's operators rely on its 3-dimensional embedding, since the embedding is what allows them to train the temporal components of a network. For the memristive model, networks are embedded in a 2-dimensional space to facilitate mapping networks to circuits. As in NIDA, the delay along a synapse is determined by the distance between its pre-neuron and post-neuron, so optimizing neuron positions optimizes the delays. Because synaptic delays are discretized in the memristive model, however, distances are rounded to the nearest integer to obtain the delay in cycles.
The random operation in the memristive model is similar to NIDA's, as both models generate a sparse network. Mutations are similar in both models as well: both include mutations that add and delete neurons and synapses, as well as mutations that change the parameters of these components. The memristive model includes an additional mutation that moves a neuron to a different place in the embedding space. This movement preserves the neuron's connections and their weights but changes the delays along those connections. Finally, crossover is performed in both models by dividing the embedding space into two halves. In NIDA this division is made by a random cutting plane, while the memristive model randomly selects a vertical or horizontal line. The division is applied to two networks, and each half of one network is combined with the complementary half of the other.
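The sketch below illustrates the 2-dimensional genome operations described above: integer cycle delays from rounded Euclidean distances, a move mutation that keeps a neuron's connections and weights but changes their delays, and crossover along a randomly chosen vertical or horizontal line. The `Network` class, the minimum delay of one cycle, relabeling neurons by parent, and dropping synapses that would cross the cut are simplifying assumptions, not details taken from the framework.

```python
import math
import random
from collections.abc import Hashable
from dataclasses import dataclass, field

Pos = tuple[float, float]


@dataclass
class Network:
    """Hypothetical minimal genome: 2-D neuron positions and weighted synapses."""
    positions: dict[Hashable, Pos]
    synapses: dict[tuple[Hashable, Hashable], int] = field(default_factory=dict)

    def delay(self, pre: Hashable, post: Hashable) -> int:
        """Synaptic delay in cycles: rounded Euclidean distance (assumed minimum of 1)."""
        return max(1, round(math.dist(self.positions[pre], self.positions[post])))


def move_mutation(net: Network, neuron: Hashable, new_pos: Pos) -> None:
    """Relocate a neuron; its synapses and weights are untouched, only delays change."""
    net.positions[neuron] = new_pos


def _half(parent: Network, tag: str, axis: int, cut: float, low: bool) -> Network:
    """Neurons on one side of the cut (relabeled by parent) and the synapses among them."""
    keep = {(tag, n): p for n, p in parent.positions.items() if (p[axis] < cut) == low}
    syn = {((tag, a), (tag, b)): w for (a, b), w in parent.synapses.items()
           if (tag, a) in keep and (tag, b) in keep}
    return Network(keep, syn)


def crossover(a: Network, b: Network, rng: random.Random) -> tuple[Network, Network]:
    """Cut both parents with one random vertical (x) or horizontal (y) line and swap halves."""
    axis = rng.choice((0, 1))
    coords = [p[axis] for net in (a, b) for p in net.positions.values()]
    cut = rng.uniform(min(coords), max(coords))
    children = []
    for low_parent, high_parent in ((a, b), (b, a)):
        lo = _half(low_parent, "a" if low_parent is a else "b", axis, cut, True)
        hi = _half(high_parent, "a" if high_parent is a else "b", axis, cut, False)
        children.append(Network({**lo.positions, **hi.positions},
                                {**lo.synapses, **hi.synapses}))
    return children[0], children[1]


net = Network({0: (0.0, 0.0), 1: (3.0, 4.0)}, {(0, 1): 2})
print(net.delay(0, 1))          # distance 5.0 -> 5 cycles
move_mutation(net, 1, (0.0, 2.4))
print(net.delay(0, 1), net.synapses[(0, 1)])
```

Every neuron of each parent ends up in exactly one child, so the two children together always contain all of the parents' neurons.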
STDP IN MEMRISTIVE MODEL
Motivation
Many neuromorphic systems include STDP and other biologically inspired mechanisms to improve system performance. STDP, for example, has been shown to be a suitable training mechanism for certain tasks [8]. STDP has also been shown to be effective in combating process variation in neuromorphic systems [4]. Because memristors are particularly prone to process variation [11], such a mechanism would be desirable in our model.
Another motivation for including STDP in the memristive model is to compensate for its limited programming resolution. The programming resolution specifies how many integer weights a synapse can be programmed to. A programming resolution of 21, for example, means a synapse can be programmed to integer weights between -10 and 10, while a programming resolution of 3 means a synapse can be programmed to -1, 0, or 1. In both cases, the maximum and minimum weights correspond to the same effective conductances, but networks with a higher programming resolution can realize more combinations of weights, thus enlarging the potential solution space. While it is possible to program memristors to arbitrary memristance states on-chip, the control logic to support such programming is not only complex but also adds a large area overhead.
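A minimal sketch of what programming resolution means in practice, assuming the levels are evenly spaced and that the extreme levels map to $\pm G_{max}$ at every resolution, as stated above. The function names are hypothetical, and snapping a desired weight to the nearest level only illustrates the resolution loss; it is not a description of the training procedure.

```python
def programmable_levels(resolution: int) -> list[int]:
    """Integer weights available at an (assumed odd) programming resolution."""
    if resolution < 1 or resolution % 2 == 0:
        raise ValueError("resolution is assumed to be a positive odd number")
    half = (resolution - 1) // 2
    return list(range(-half, half + 1))


def quantize(weight: float, resolution: int) -> float:
    """Snap a normalized weight in [-1, 1] to the nearest programmable level.

    The extreme levels are assumed to map to +/-G_max at every resolution,
    so only the spacing between levels changes. Note that Python's round()
    sends exact halves to the nearest even integer.
    """
    half = len(programmable_levels(resolution)) // 2
    if half == 0:
        return 0.0
    clipped = max(-1.0, min(1.0, weight))
    return round(clipped * half) / half


for r in (3, 7, 21):
    print(r, programmable_levels(r)[-1], quantize(0.37, r))
# prints: 3 1 0.0, then 7 3 0.3333333333333333, then 21 10 0.4
```

A desired normalized weight of 0.37 collapses to 0 at a resolution of 3 but is approximated to within 0.03 at a resolution of 21, which is the gap that on-chip potentiation and depression could help close.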
STDP offers a potential solution to this limited on-chip programming resolution. While it is difficult to program memristors to specific memristance states, it is easy to design an efficient on-chip circuit that moves a memristor's memristance toward either the HRS or the LRS. These adjustments can implement the potentiation and depression of synapses. They also allow the memristors to reach memristance states that they cannot be programmed to, letting the synapse take on new synaptic weights. Thus, including a potentiation and depression mechanism such as STDP can increase the effective online resolution of networks, thereby enlarging the potential solution space.
Implementation
In this mixed-signal neuromorphic system, the analog voltage accumulated in a neuron is sampled upon reaching its threshold to produce a digital rectangular spike. Depending on the relative timing of the pre- and post-neuron firing events, a synapse is either potentiated or depressed. Instead of using analog tails, this implementation modulates the duration of the digital rectangular pulses themselves. According to the timing of the pre- and post-neuron fires, a voltage greater than the switching threshold is applied across the memristor for different periods of time. The further apart in time the pre-neuron fire is from the post-neuron fire, the shorter the period for which the voltage is applied, causing weaker potentiation or depression. As the pre-neuron fire gets closer to the post-neuron fire, the voltage is applied for a longer period, causing stronger potentiation or depression.
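The timing-to-duration mapping can be sketched as follows. The text specifies only that closer fires produce a longer above-threshold pulse; the linear taper, the default window of five cycles, the normalized maximum width `t_max`, and the function name `stdp_pulse` are assumptions for illustration.

```python
from typing import Optional


def stdp_pulse(dt: int, window: int = 5, t_max: float = 1.0) -> tuple[Optional[str], float]:
    """Map dt = t_pre - t_post (in cycles) to an update type and above-threshold pulse width.

    Closer fires give longer pulses; fires outside the window (or simultaneous
    fires, by assumption) give no update.
    """
    distance = abs(dt)
    if distance == 0 or distance > window:
        return None, 0.0
    width = t_max * (window - distance + 1) / window  # linear taper (assumed)
    return ("LTP" if dt < 0 else "LTD"), width


for dt in (-5, -3, -1, 1, 3, 5, 6):
    print(dt, stdp_pulse(dt))
```

Because the memristance change grows with the time the voltage stays above threshold, shortening the pulse with distance is enough to make the weight update weaker for fires that are further apart.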
Figs. 4 and 5 give examples of potentiation and depression. Fig. 4 shows the condition for LTP: the pre-neuron fire occurs before the post-neuron fire and hence contributes to it. In Fig. 4a, the pre-neuron fire occurs well before the post-neuron fire, so it contributes only a little to it. This is a weak LTP condition: the voltage across the memristor exceeds the threshold only for a short period, so the conductance increases only slightly. As the pre-neuron fire moves closer to the post-neuron fire (Fig. 4b), the voltage difference exceeds the threshold for a longer period, producing a larger change in conductance. In Fig. 4c, the pre-neuron fire occurs just before the post-neuron fire and contributes directly to it. This is the strongest LTP case: the applied voltage exceeds the threshold for the longest period and causes the largest conductance change.
Fig. 5 shows the condition for LTD. Here, the pre-neuron fire occurs after the post-neuron fire; it therefore did not contribute to that fire, and the synapse must be depressed. The amount of depression is determined by the time difference. In Fig. 5a, the pre-neuron fire occurs well after the post-neuron fire, which is a weak LTD condition: the voltage across the memristor exceeds the threshold only for a short period, so the conductance decreases only slightly. If the pre-neuron fire is closer to the post-neuron fire (Fig. 5b), the voltage difference exceeds the threshold for longer, producing a larger change in conductance. In Fig. 5c, the pre-neuron fire occurs just after the post-neuron fire. This is the strongest LTD case: the applied voltage exceeds the threshold for the longest period and causes the largest conductance decrease.
In this implementation, the firing events must be tracked. Up to 5 firing events before and after the post-synaptic fire are tracked to provide the pulses needed for potentiation or depression. For simulations in Spectre, we used the memristor model discussed in Section 2 for the synapse, together with an integrate-and-fire neuron. In addition to accumulating and firing, the neuron must feed its fire back to the pre-neuron and drive the post-synaptic node during the refractory period. During the refractory period, the neuron's accumulation path is cut off, so it does not accumulate charge. Instead, it drives the post-synaptic node with a digital signal that stays at the positive rail for the first half of the refractory period and at the negative rail for the second half. The first half of the refractory period is used to perform potentiation, and the second half to perform depression.
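The bookkeeping described above, tracking recent fires within a window and splitting the refractory period into a potentiation half and a depression half, can be sketched as follows. The class `FireHistory`, the helper `refractory_phase`, and treating same-cycle fires as neither LTP nor LTD are hypothetical choices for illustration; the window is a parameter because the tracked range is programmable in this design.

```python
from collections import deque
from typing import Optional


class FireHistory:
    """Hypothetical per-synapse record of recent pre-neuron fires (window in cycles)."""

    def __init__(self, window: int):
        self.window = window
        self.pre_fires: deque = deque()
        self.last_post: Optional[int] = None

    def _expire(self, now: int) -> None:
        # Drop pre fires that are too old to matter for any future post fire.
        while self.pre_fires and now - self.pre_fires[0] > self.window:
            self.pre_fires.popleft()

    def post_fire(self, cycle: int) -> list[int]:
        """Record a post fire; return dt = t_pre - t_post (< 0) for pre fires in the LTP window."""
        self.last_post = cycle
        self._expire(cycle)
        return [p - cycle for p in self.pre_fires if 0 < cycle - p <= self.window]

    def pre_fire(self, cycle: int) -> Optional[int]:
        """Record a pre fire; return dt (> 0) if it lands in the LTD window after the last post fire."""
        self.pre_fires.append(cycle)
        self._expire(cycle)
        if self.last_post is not None and 0 < cycle - self.last_post <= self.window:
            return cycle - self.last_post
        return None


def refractory_phase(k: int, refractory: int) -> str:
    """Post-synaptic drive in refractory cycle k: first half potentiates, second half depresses."""
    return "potentiate" if k < refractory / 2 else "depress"


h = FireHistory(window=3)
h.pre_fire(10)
h.pre_fire(12)
print(h.post_fire(13))   # both pre fires fall in the LTP window
print(h.pre_fire(15))    # falls in the LTD window after the post fire
print([refractory_phase(k, 4) for k in range(4)])
```

Each returned $\Delta t$ would then select a pulse width, so the nearer fire (cycle 12) receives a stronger potentiation than the farther one (cycle 10).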
VERIFICATION OF SIMULATORS
To facilitate the design of the memristive model, simulators are used at both the circuit level and the network level. Circuit-level simulations are vital for building and testing hardware implementations of the model, but they are too slow to evaluate networks at the application level, as training requires. Thus, a high-level simulator was designed to accurately capture the behavior of networks at the application level; it runs faster than the circuit-level simulator by sacrificing granularity for speed. One particular challenge was capturing the behavior of STDP in the high-level simulator. Originally, the high-level simulator included only abstract synaptic weight values and assumed that a mapping existed from each abstract weight to memristor states. However, because of the twin-memristor synapse, the change in synaptic weight cannot be modeled accurately from the weight alone: a given synaptic weight can be achieved by multiple combinations of memristance states of the two memristors. Performing potentiation or depression directly on the synaptic weight may therefore not match the hardware. To capture this behavior accurately, the high-level simulator must track the memristor states and map them to a synaptic weight.
With the reduced granularity, it was then necessary to ensure that the high-level simulator accurately models STDP and the other components at the network level. To measure this, the simple network shown in Fig. 6 was implemented in both simulators and tested over 100 cycles, with both input neurons pulsed every 5 cycles. We then checked whether the neurons in both simulators fired on the same cycles; if they do, we can be confident that both simulators behave the same way. The output from both simulators is displayed in Fig. 7. The network was designed so that potentiation and depression were forced to occur. In both simulators, online learning adjusts the synaptic weights so that a single pair of fires from the input neurons causes a fire on the output neurons, whereas two were needed previously. Thus, we can be confident that the high-level simulator accurately captures the behavior of the circuit at the network level.
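The cycle-by-cycle comparison used for verification can be expressed as a small check: given the fire cycles of each neuron from both simulators, report every cycle on which only one of them fired. The neuron names, the dictionary-of-sets format, and starting the input pulses at cycle 0 are assumptions; the 100-cycle run and the 5-cycle input period follow the text.

```python
def input_pulses(period: int = 5, cycles: int = 100) -> list[int]:
    """Cycles on which the input neurons are pulsed (assumed to start at cycle 0)."""
    return list(range(0, cycles, period))


def fire_mismatches(sim_a: dict[str, set[int]],
                    sim_b: dict[str, set[int]]) -> dict[str, set[int]]:
    """Per neuron, the cycles on which exactly one simulator recorded a fire."""
    mismatches = {}
    for name in sim_a.keys() | sim_b.keys():
        diff = sim_a.get(name, set()) ^ sim_b.get(name, set())
        if diff:
            mismatches[name] = diff
    return mismatches


# Hypothetical fire records; an empty result means the simulators agree.
circuit = {"in0": {0, 5, 10}, "in1": {0, 5, 10}, "out": {7, 12}}
high_level = {"in0": {0, 5, 10}, "in1": {0, 5, 10}, "out": {7, 12}}
print(fire_mismatches(circuit, high_level))
```

Comparing exact fire cycles is a strict criterion: any drift in a synaptic weight that changes when a neuron crosses its threshold shows up as a mismatch.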
RESULTS

XOR
To understand the ability of STDP to optimize networks online, we first considered the simple XOR application. A network was trained with STDP enabled and then tested twice. In the first test, STDP was disabled and the synaptic weights stayed at their originally programmed values. In the second, STDP was enabled and the synaptic weights were updated within a 3-cycle STDP window before and after neuron fires. Fig. 8 displays the accuracy of the network over several runs in each test case.
Fig. 8 shows that STDP is vital to this network's success. Without STDP, the network remains stuck at approximately 60% accuracy. With STDP, the network begins at around the same accuracy but eventually converges to a state that performs XOR perfectly. This improvement is distinct from that provided by evolutionary optimization during training; it comes strictly from unsupervised STDP.
These improvements suggest not only that STDP can improve performance online, but also that it might serve as a mechanism for expanding the solution space. Even though the network could be programmed only to a limited number of integer values, STDP enabled it to reach weights that might otherwise have been unattainable. Thus, STDP can effectively increase the granularity of synaptic weights in systems with limited resolution.
Iris Classification
After confirming that STDP could take a suboptimal network to an optimal state for the XOR application, we investigated its performance on other applications. While XOR is a good application for gaining insight into the viability of a design, it is much too simple to quantify the benefits or detriments of that design. Next, the impact of STDP was measured on the iris classification application, which involves classifying iris flowers into one of three species based on four measurements of the flowers' physical properties. Whereas for XOR a network is simply considered a failure if it cannot reliably perform the operation, iris classification lets us quantify the impact of STDP more finely through classification accuracy. The iris task also allows us to measure whether STDP provides any generalization, that is, improved performance on data the network was not trained on.
To study the impact of STDP on the memristive model, two sets of networks were trained: the first without STDP and the second with STDP. Each set was further divided by the programming resolution of the synaptic weights, with networks trained at programming resolutions of 3, 7, and 21. The programming resolution of 3 is of particular interest because it corresponds to the case where each memristor can be programmed only to the HRS or LRS, which drastically reduces the complexity of the control logic needed to support programming.
After the networks were generated, their accuracy was evaluated on a testing set separate from the training set. Networks trained without STDP were also tested without STDP, whereas networks trained with STDP were tested both with STDP on and off, so that the effect of STDP on these specific networks could be observed. Evolutionary optimization might have found good networks in spite of STDP, with STDP doing nothing to improve their performance; testing with STDP both on and off reveals whether STDP actually contributes to these networks.
The results from the iris classification task are displayed in Fig. 9 and Fig. 10. Fig. 9 shows that STDP is important to the success of networks trained with it: when STDP is turned off during testing, accuracy drops on average from 94% to 79%. Thus, as in the XOR task, STDP can take networks that are initially programmed to suboptimal values and improve their fitness by adjusting the synaptic weights online. The inclusion of STDP in the model also improves performance on average. Fig. 10 shows that STDP improves testing-set performance, though modestly, for networks with 3, 7, and 21 programmable levels. Moreover, the improvement from including STDP grows as the resolution decreases, likely because STDP adds the most granularity where the resolution is most limited.
Figure 9: Impact of STDP in networks for the iris classification task.
Figure 10: Improvement in performance of networks trained with STDP over those trained without it.
CONCLUSIONS AND FUTURE WORK
In this work, an implementation of STDP was presented for a memristive neuromorphic circuit. The STDP implementation provides a programmable cycle window before and after a neuron fires to track the synapses to be potentiated or depressed. This implementation was modeled in a high-level simulator, and it was verified that the simulator accurately captures the behavior of the circuit.
The results on the XOR and iris classification tasks show that STDP can be used in the memristive model to improve the performance of networks online. STDP can also increase the number of synaptic weights achievable by memristive networks, which can suffer from limited resolution. Notably, the improvement from STDP is largest when the resolution is most limited.
While STDP has shown promising results in this work, further work remains to determine whether it can improve performance in other models, as well as whether it improves performance under all circumstances in this model. These methods can be repeated to determine whether STDP improves the performance of networks for control applications. Additionally, the methods can be used for classifying more challenging datasets.
As described previously, STDP has also been shown to be a way to combat process variation. Further work can be done to determine whether process variation adversely affects networks generated for our memristive model, and whether STDP can effectively mitigate the effects of process variation.
For all of the results in this work, a window of three cycles before and after a neuron fires was used to track the synapses to potentiate or depress. This window, however, is programmable: it can be as short as one cycle or extended to the maximum number of cycles feasible on-chip. How expanding or contracting this window affects the performance of STDP remains to be determined.
