Abstract-Axonal delays are used in neural computation to implement faithful models of biological neural systems, and in spiking neural network models to solve computationally demanding tasks. While an increasing number of software simulations of spiking neural networks make use of axonal delays, only a small fraction of existing hardware neuromorphic systems support them. In this paper we demonstrate a strategy to implement temporal delays in hardware spiking neural networks distributed across multiple Very Large Scale Integration (VLSI) chips. This is achieved by exploiting the inherent device mismatch present in the analog circuits that implement silicon neurons and synapses inside the chips, and the digital communication infrastructure used to configure the network topology and transmit the spikes across chips. We present an example of a recurrent VLSI spiking neural network that employs axonal delays and demonstrate how the proposed strategy efficiently implements them in hardware.
I. INTRODUCTION
Spiking neural networks are typically characterized by their network topology (e.g. multi-layer, feed-forward, recurrent, etc.) and by their distributions of synaptic weights, while they seldom make use of temporal delays to carry out information processing tasks. However, temporal delays can provide an extra degree of complexity for solving computationally demanding problems, and can be used to implement faithful models of real neural networks, as they account for the spike propagation delays that take place along the neuron's axon. Indeed, axonal delays are often modeled to describe the temporal dynamics of biologically realistic spiking neural networks [1] - [3]. For example, it has been shown that transmission/conductance delays help enhance neural synchrony [4] and that axonal delays provide the anatomical and physiological basis for a neuronal map of inter-aural time differences in the nucleus laminaris of barn owls [5].
While there have been sporadic attempts at implementing axonal delays in hardware spiking neural networks [6] - [9], most VLSI neuromorphic setups do not support them, either at the single VLSI device level or in multi-chip setups. Recent developments in the construction of VLSI spiking neural networks focus increasingly on distributed, multi-chip setups [10] - [12]. These setups typically consist of several multi-neuron chips comprising hybrid analog/digital neuromorphic circuits, interfaced with each other using asynchronous event-based digital communication modules. A common communication protocol used in these setups is the Address Event Representation (AER) [13], [14]. In this representation spikes (address events produced by neurons) are routed from one chip to the other using a specified addressing scheme via custom digital boards, typically comprising one or more Field Programmable Gate Arrays (FPGAs) [15], [16]. In principle, one could therefore exploit the digital domain used in the event-based communication across chips to emulate axonal delays, but this is not an optimal solution, as it requires additional dedicated hardware. For example, in [17] axonal delays are implemented by accumulating address events in pulse packets, time-stamping them, and transmitting them to a dedicated digital network chip. There the events are held, sorted, and buffered until a target delay is reached, after which they are sent to their target destination. While this approach is flexible and accurate, it requires specialized hardware for the computationally intensive real-time event sorting, and loses the efficient representation of time in the AER, where events are transmitted as they happen, and time represents itself.
An alternative approach that does not require explicitly time-stamping each event, and that can reduce these overhead costs, is to exploit both the digital domain used for the inter-chip communication and the analog one used by the silicon neurons and synapses inside the chips [9]. In this paper we follow this approach by making use of the inherent device mismatch present in the analog neuromorphic circuits to implement axonal delays, and exploit the digital AER communication infrastructure to (re)configure the placement of these delays in the neural network.
Device mismatch in neuromorphic multi-neuron chips produces inhomogeneities in the response of the synapses and neurons present in the chip. An example of this effect is evident in Fig. 1 , where we show a raster plot of spiking activity measured from a neuromorphic chip comprising 128 putatively identical silicon neurons [18] . In this example the neurons are stimulated with constant current injection, set by a common global bias. Ideally, all neurons should have the same firing rates, but given that the neuron circuits are analog and that the transistors operate in the weak-inversion regime [19] , their response properties vary substantially. Device mismatch effects in these chips also affect several other neural network properties, such as synaptic weights and time constants.
Device mismatch can be minimized using standard electrical engineering approaches and appropriate analog VLSI design techniques, but this leads to very large transistor sizes and large layout designs, which can significantly reduce the number of neurons and synapses that can be integrated onto a single chip. Rather than attempting to reduce mismatch using brute-force engineering approaches, neuromorphic approaches should try to exploit the adaptation mechanisms and learning strategies that they seek to model and implement in hardware. For example, this has already been a successful strategy in neuromorphic vision sensors that employ adaptation (adaptive photoreceptor circuits) in each individual pixel, rather than the single/global auto-gain mechanisms used in standard imagers [20] - [23]. Learning and plasticity mechanisms [24], [25], as well as homeostatic mechanisms [26], are also very effective at compensating for the effects of device mismatch. Hardware neural networks can also employ population coding approaches and make use of redundancy, exploiting the large number of parallel elements present in these devices [27]. The use of these strategies would allow designers to implement large arrays of compact, redundant, possibly plastic synapses that can carry out robust computation even if they are affected by mismatch. Mismatch can then be used as a feature, rather than being something to minimize.
In the next Section we show how we used mismatch to produce a range of variable response properties that can be exploited for efficiently modeling axonal delays.
II. MATERIALS AND METHODS
Our methodology can be used to realize arbitrary connectivity patterns with axonal delays. To demonstrate our approach we chose a recurrent network architecture of the type shown in Fig. 2 .
A. Network model
The network consists of a population A of 32 recurrently connected leaky Integrate and Fire (I&F) neurons receiving spikes as input, and arranged as shown in Fig. 2. Each neuron projects to its nearest neighbors with excitatory connections, and each projection incurs a propagation time delay ∆t_i that increases linearly with the connection distance i. Equation (1) describes this relationship:

$$\Delta t_i = \frac{i}{v} + \Delta T \qquad (1)$$

where v represents the propagation velocity and ∆T is the minimum possible propagation delay.

Fig. 2. A recurrent neural network with axonal delays. Spiking input is sent to the population of neurons from the bottom. The neurons are recurrently connected with excitatory synapses. The synaptic projections have transmission delays that vary with distance between the source and destination. Projections from only one of the neurons in the population are shown here, with their corresponding transmission delays.
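As a minimal sketch of the linear delay profile described above, the following Python snippet assumes the form delay = distance / velocity + minimum delay. The function name and the numerical values (a velocity of one neuron per millisecond and a 1 ms minimum delay) are illustrative assumptions, not parameters reported for the hardware.

```python
def axonal_delay(distance, velocity, min_delay):
    """Propagation delay for a projection spanning `distance` neurons.

    Assumed linear form: delay = distance / velocity + min_delay.
    """
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    return distance / velocity + min_delay


# Hypothetical values: velocity of 1 neuron/ms, minimum delay of 1 ms.
delays_ms = [axonal_delay(i, velocity=1.0, min_delay=1.0) for i in range(1, 8)]
print(delays_ms)
```

With these assumed values, projections spanning one to seven neurons receive target delays that grow by a constant step per unit distance.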
B. Hardware setup
The hardware setup used to implement the network described in Fig. 2 is outlined in Fig. 3. It consists of two multi-neuron chips connected in daisy-chain to an AER mapper [16]. A workstation is used to inject the input spikes into the network and log the network's output activity. The first multi-neuron chip (CHIP-1) houses 128 leaky I&F neurons, equipped with excitatory, inhibitory, and plastic synapses [18]. The second multi-neuron chip (CHIP-2) houses 2048 leaky I&F neurons, equipped with excitatory and inhibitory synapses. Both chips send and receive spikes using the AER. The chips were fabricated using a standard 0.35 µm CMOS technology and occupy an area of 10 mm² and 15 mm² respectively. The AER mapper is a custom digital FPGA board that routes spikes from a source neuron to its destination synapse with a latency of 0.8 µs, and supports peak event rates of 66 MHz [16].
While this setup can efficiently support the implementation of fairly large networks of hardware neurons, there is no explicit mechanism dedicated to the implementation of axonal delays. In order to implement the network of Fig. 2 with the appropriate delays we have to resort to a second population of neurons, and use them as intermediate delay elements. We call these neurons delay neurons.
C. Delay neurons
A common way of implementing temporal delays in electrical engineering is by using low-pass filters. We follow the same approach and use the low-pass filtering properties of the synapse circuits [28] present on the multi-neuron chip labeled CHIP-2. We configure the synapse parameters such that the integration of a single pulse (the input spike) produces an output spike after a set delay ∆t. The neurons connected to these synapses that have this behavior are the delay neurons. Figure 4 shows the delays measured for all the neurons on CHIP-2. These delays depend on the synapse circuit time constant and the synaptic weight, as well as on the neuron's membrane time constant and firing threshold. These parameters are shared among all delay neurons in the chip and ideally should produce a single common delay. As can be seen from the histogram, the measured behavior is far from this ideal: the neurons exhibit a broad range of delays, due to device mismatch. The distribution of delays can be modified by changing one or more of the four parameters mentioned above. For the set of biases used for this measurement, only a portion of the delay neurons produce usable delays. The rest of the neurons have either too strong or too weak a synaptic efficacy, leading to multiple spikes or no spikes at their output.
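The behavior described above can be illustrated with a simple software model. The sketch below is not the circuit of [28]: it assumes a first-order low-pass synapse driving a leaky I&F neuron with a hard reset, integrated with forward Euler, and it models mismatch as independent log-normal scaling of the four shared parameters. All names, nominal values, and the 20% spread are hypothetical choices made for illustration.

```python
import random


def delay_neuron_response(w, tau_syn, tau_mem, v_thr, dt=2e-5, t_max=0.05):
    """Spike times (s) of a leaky I&F neuron driven by ONE input spike at t=0
    through a first-order low-pass synapse (forward-Euler integration)."""
    i_syn, v, spikes = w, 0.0, []
    for k in range(int(round(t_max / dt))):
        v += dt * (i_syn - v) / tau_mem   # leaky membrane integration
        i_syn -= dt * i_syn / tau_syn     # exponential decay of synaptic current
        if v >= v_thr:                    # threshold crossing: spike and reset
            spikes.append((k + 1) * dt)
            v = 0.0
    return spikes


def mismatch_population(n, sigma=0.2, seed=0):
    """Classify n delay neurons sharing nominal biases but with mismatch.

    Mismatch model (an assumption): each parameter is scaled by an
    independent log-normal factor with spread `sigma`.
    """
    rng = random.Random(seed)
    nominal = (3.0, 5e-3, 5e-3, 1.0)      # weight, tau_syn, tau_mem, threshold
    counts = {"none": 0, "usable": 0, "multiple": 0}
    delays = []
    for _ in range(n):
        w, ts, tm, th = (p * rng.lognormvariate(0.0, sigma) for p in nominal)
        spikes = delay_neuron_response(w, ts, tm, th)
        if not spikes:
            counts["none"] += 1
        elif len(spikes) == 1:
            counts["usable"] += 1         # exactly one output spike per input
            delays.append(spikes[0])
        else:
            counts["multiple"] += 1
    return counts, delays


counts, delays = mismatch_population(200)
print(counts)
```

In this toy model a neuron is usable as a delay element only if a single input spike elicits exactly one output spike; neurons whose effective synaptic efficacy is too weak stay silent, and those with too strong an efficacy fire more than once.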
D. Network Implementation
The population of neurons (population A in Fig. 2) was modeled using 32 silicon neurons of CHIP-1. The recurrent connectivity was implemented via the AER mapper, and transmission delays were obtained by placing a delay neuron from CHIP-2 between each projection of the CHIP-1 I&F neurons. The routing of the address events is as follows: (A_i → delay neuron(∆t_ij) → A_j), thereby producing the desired transmission delays. The desired transmission delay is chosen by indexing the appropriate delay neuron, selected from the histogram of Fig. 4. Every projection therefore passes through a dedicated delay neuron.
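The selection step can be sketched as a table-building routine. The snippet below assumes a greedy nearest-match policy over a pool of measured delays, with each delay neuron used by at most one projection; the function name, the tuple layout, and the synthetic pool are hypothetical and do not describe the actual mapper format.

```python
def build_routes(n_neurons, reach, pool, target_delay):
    """Assign one dedicated delay neuron to every projection A_i -> A_j.

    pool:         {delay_neuron_id: measured_delay_ms}
    target_delay: function mapping connection distance to desired delay (ms)
    Returns a list of (source, delay_neuron_id, destination, obtained_delay).
    """
    free = dict(pool)                     # delay neurons not yet assigned
    routes = []
    for src in range(n_neurons):
        for dist in range(1, reach + 1):
            for dst in (src - dist, src + dist):
                if not 0 <= dst < n_neurons:
                    continue              # no wrap-around at the array edges
                if not free:
                    raise RuntimeError("delay-neuron pool exhausted")
                want = target_delay(dist)
                # Greedy choice: free delay neuron closest to the target delay.
                best = min(free, key=lambda k: abs(free[k] - want))
                routes.append((src, best, dst, free.pop(best)))
    return routes


# Synthetic pool (hypothetical): 40 delay neurons with delays from 1.0 to 8.8 ms.
pool = {k: 1.0 + 0.2 * k for k in range(40)}
routes = build_routes(4, 2, pool, lambda d: d / 1.0 + 1.0)
for src, dn, dst, got in routes:
    print(f"A{src} -> delay neuron {dn} ({got:.1f} ms) -> A{dst}")
```

A greedy assignment is only one possible policy; when the pool is small relative to the number of projections, a global matching would trade some simplicity for a lower overall delay error.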
III. RESULTS
We selected the appropriate set of delay neurons from CHIP-2 to implement the desired axonal delays in the network. We stimulated the neurons of the network with input spikes and measured the time they took to produce an output spike. The delays measured from the network are shown in Fig. 5 . The axonal delay increases linearly with increasing distance between the source and destination neurons, as expected.
The strategy of choosing the delay neurons by programming the AER mapper appropriately can be used to implement arbitrary delay profiles in the network. As an example, Fig. 6 shows the measurements from a recurrent network analogous to that of Fig. 2, but with random axonal delays on its recurrent connections.
In the networks described above we chose axonal delays comparable to the time constants of the delay neurons. Axonal delays longer than the typical time constant of the delay neurons can be achieved by stacking several delay neurons in a sequence. Specifically, by stacking N delay neurons together, it is possible to implement N different axonal delays with lower and upper bounds defined as:

$$\Delta t_{\min} = \min_i \Delta t_i, \qquad \Delta t_{\max} = \sum_{i=1}^{N} \Delta t_i$$

where ∆t_i is the axonal delay of the ith delay neuron. This strategy is used to implement time delays that range up to 125 ms (rather than the 8 ms of the previous example) in the network of Fig. 2. Figure 7 shows the delays measured in this condition. The approach of stacking delay neurons in sequence would allow the implementation of arbitrary delays without necessarily requiring mismatch in the circuits.

Fig. 6. Transmission delays for a recurrent connectivity, as implemented and measured on the hardware setup. The transmission delays between the neurons are randomly chosen from the available pool of delay neurons.

Fig. 7. Silicon neurons are stacked to generate long delays (top): the first neuron A_1 projects to neuron A_2 with a delay of ∆t_1 via a delay neuron. This delay neuron also projects to A_3 with a delay ∆t_2, making the effective delay from A_1 to A_3 equal to ∆t_1 + ∆t_2. This process is repeated for every projection to gain incremental delays. Measurements of transmission delays from a hardware recurrent network that uses such a connectivity are shown on the bottom.

By choosing the right set of delay neurons, one can implement a wide range of neural network architectures with arbitrary axonal delay profiles, provided that a large enough pool of inhomogeneous silicon neurons is available in an AER VLSI setup.
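The stacking scheme described in the caption of Fig. 7 amounts to taking running sums along the chain of delay neurons. The sketch below illustrates this with hypothetical per-neuron delays; the function name and the values are assumptions for illustration only.

```python
from itertools import accumulate


def stacked_delays(chain_ms):
    """Effective delays available at each tap of a chain of delay neurons.

    The k-th tap sees the sum of the first k individual delays.
    """
    return list(accumulate(chain_ms))


# Hypothetical individual delays (ms) of three stacked delay neurons.
chain = [5.0, 7.5, 6.0]
taps = stacked_delays(chain)
print(taps)  # [5.0, 12.5, 18.5]
```

A chain of N delay neurons therefore offers N distinct delays, the last of which equals the sum of all individual delays in the chain.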
A. Limitations
There is a limitation to the approach of using delay neurons for generating axonal delays: as this approach relies on the integration time of the delay neuron input synapse (used as a low-pass filter), there is an upper bound on the maximum input firing rate. If a spike arrives at the delay neuron's input synapse before the delay neuron has finished processing the previous spike (i.e. before the delay neuron produces an output spike), the effective transmission delay is disrupted. In Fig. 8 we plot the measured dependence of the effective delays on the input Inter-Spike Intervals (ISIs). The effective delay decreases with decreasing ISI (increasing firing rate). This happens because residual synaptic currents accumulate after the spike generation of a delay neuron. Coincidentally, a similar relationship between transmission delays and firing rates was observed in physiological recordings of voluntary discharge properties of extensor motor units in humans [29]: fast spiking neurons were reported to exhibit shorter axonal delays, and slower ones longer delays.
Therefore we argue that this technique of generating delays is appropriate in biologically plausible conditions, with the constraint that the pre-synaptic ISI has to be greater than the set delay. This limitation can be overcome by stacking multiple delay neurons, each having a delay shorter than the minimum input ISI, which also allows the generation of longer propagation delays.
Another limitation is the variability of the delays, due to the noise present in the CMOS circuits and in the AER infrastructure. Figure 9 shows the delays generated by neurons on CHIP-2, sorted by delay value. The error bars show the standard deviation of the generated delays over 20 trial measurements, which is approximately 10% of the delay on average. This variability is very hard to overcome but is compatible with the variability observed in biological systems.
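The dependence of the effective delay on the input ISI can be reproduced qualitatively in a toy model. The sketch below assumes a first-order low-pass synapse whose current is not reset by output spikes, feeding a leaky I&F neuron with a hard reset. It is a simplified stand-in for the silicon circuits, and all parameter values and names are hypothetical.

```python
def effective_delay(isi, w=3.0, tau_syn=5e-3, tau_mem=5e-3, v_thr=1.0, dt=1e-5):
    """Delay (s) between the second of two input spikes, `isi` seconds apart,
    and the next output spike. Returns None if no output spike follows."""
    i_syn, v = 0.0, 0.0
    second = int(round(isi / dt))           # time step of the second input spike
    n_steps = second + int(round(0.05 / dt))
    for k in range(n_steps):
        if k == 0 or k == second:
            i_syn += w                      # each input spike adds a current step
        v += dt * (i_syn - v) / tau_mem     # leaky membrane integration
        i_syn -= dt * i_syn / tau_syn       # synaptic current decays, never reset
        if v >= v_thr:
            v = 0.0                         # output spike: membrane reset only
            if k >= second:
                return (k + 1 - second) * dt
    return None


for isi_ms in (10, 20, 50):
    d = effective_delay(isi_ms * 1e-3)
    print(f"ISI {isi_ms:3d} ms -> effective delay {d * 1e3:.2f} ms")
```

In this model the residual synaptic current and membrane potential left over from the first spike give the second spike a head start, so the effective delay shrinks as the ISI approaches the nominal delay and returns to the nominal value for long ISIs.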
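Trial-to-trial variability of this kind is commonly summarized by the coefficient of variation, i.e. the standard deviation divided by the mean delay. The sketch below computes it for one delay neuron; the trial values are made up for illustration and are not measurements from CHIP-2.

```python
from statistics import mean, stdev


def delay_cv(trials_ms):
    """Coefficient of variation (sample std / mean) of repeated delay measurements."""
    return stdev(trials_ms) / mean(trials_ms)


# Hypothetical repeated measurements (ms) of a single delay neuron.
trials = [4.0, 4.4, 3.6, 4.0]
print(f"CV = {delay_cv(trials):.3f}")
```

Averaging this quantity over all usable delay neurons would give a single figure comparable to the roughly 10% variability reported above.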
IV. CONCLUSIONS
We implemented propagation delays using a population of silicon neurons whose time constants were comparable to the desired time delay. We were able to select delay neurons with different time constants by exploiting the mismatch effect in their analog circuit implementations.
Our methodology allows the implementation of arbitrary recurrently connected networks of I&F neurons with axonal delays. The results described in this paper show that this approach is suitable for implementing the architectures used by Izhikevich to demonstrate polychronization [30]. This is a promising approach that can allow the construction of complex multi-chip neural processing systems, and is currently being used to implement a hardware model of an auditory processing system that can learn spectro-temporal correlations in its input stimuli [10].
