Abstract. This paper presents a hardware implementation of a Time Multiplexing Architecture (TMA) that can interconnect arrays of neurons in an Artificial Neural Network (ANN) using a single metal wire. The approach exploits the relatively slow operational speed of the biological system by using fast digital hardware to sample the neurons in a layer sequentially and transmit the associated spikes to neurons in other layers. The motivation for this work is to develop minimal-area inter-neuron communication hardware. An estimate of the on-chip neuron density afforded by this approach is presented. The paper verifies the operation of the TMA and investigates pulse transmission errors as a function of the sampling rate. Simulations using the Xilinx System Generator (XSG) package demonstrate that the effect of these errors on the performance of a Spiking Neural Network (SNN), pre-trained to solve the XOR problem, is negligible if the sampling frequency is sufficiently high.
that existing interconnect technologies will even remotely approach this order of magnitude and thus new approaches need to be explored. This paper presents a novel Time Multiplexing Architecture (TMA) as a possible solution to the interconnect problem for Spiking Neural Networks (SNNs) in hardware. A single bus wire is used to transmit the signals between neuron layers, and timing is guaranteed by a clocking system synchronised to a "global" clock. This implementation removes the requirement for a dedicated metal line for every synaptic pathway and therefore achieves a significant saving in silicon area. Section 2 of this paper discusses the TMA, while section 3 presents results that verify the approach. Errors in spike timing "across the TMA" due to the sampling frequency are investigated in section 4, where a simple SNN is first trained, using a supervised approach, to solve the XOR problem. Using the Xilinx System Generator (XSG) package, the output firing times that result from the TMA are compared with those obtained when conventional metal interconnect is used; the subsequent analysis shows that the sampling frequency must be at least twice the minimum sampling frequency, which is set by the duration of the spike and the number of neurons in the sampled layer. Section 5 presents a quantitative analysis of the scalability of the TMA, and section 6 makes concluding remarks.
Time Multiplexing Architecture (TMA)
This section presents a novel inter-neuron communication architecture in which biologically compatible neuron spikes are represented as digital pulses and connectivity between neuron layers is achieved using the TMA. Figure 1 shows a two-layer neural network fragment containing two input neurons, I_0 and I_1, and one output neuron, O_0. The sampling circuit to the left of the bus wire contains two D-type latches in a daisy-chain configuration, one of which is preset to logic 1 before the clock, CK, is applied. The clock input, CK, therefore rotates a logic 1 between the two latches, switching on transistors M1 and M2 in turn (M1, M2, M3 and M4 are n-channel enhancement-mode MOSFETs). The same sampling arrangement is repeated to the right of the bus wire. Consider the case where the input neuron I_0 fires a spike, {0, 1, 0}, of duration T_P, which forms the input to the drain, D, of M1. The gate terminal, G, of M1 is controlled by the Q output of a D-type latch; when Q is asserted, I_0 is sampled and a logic 1 is placed on the bus wire (the gate of M2 is held at logic 0 while M1 is on). Because both sampling circuits are driven from the same clock input, CK, the bus line is then sampled by M3, ensuring that the pulse from I_0 is directed to the correct synapse, Synapse 1. I_1 is sampled directly after I_0: the sampling circuits turn on M2 and M4, allowing the pulse from I_1 (if I_1 has fired) to reach Synapse 2. The sampling frequency therefore depends on the number of neurons in the sampled layer and on the duration of the spike pulse, since every neuron must be visited at least once during a spike of duration T_P. In a layer of n neurons that are sampled sequentially, the minimum sampling frequency F_S (Hz) is therefore given by

$$F_S = \frac{n}{T_P} \qquad (1)$$
The authors are aware that pulse transmission errors can arise: there is a delay between the time a neuron in one layer fires and the time at which its pulse is observed at the synaptic inputs of the neurons in the subsequent layer. These errors occur because the sampling circuitry operates synchronously, while the sampled neurons fire asynchronously. Pulse transmission errors and their effect on a pre-trained SNN are investigated in section 4.
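The sampling scheme described above can be sketched as a cycle-level Python model. This is a minimal illustration under stated assumptions, not the authors' circuit: one clock period per sampling slot, an ideal (glitch-free) bus, and receive-side latches that hold each synapse's last sampled value, as in the D-latch variant used in section 3. The function names `min_sampling_frequency` and `tma_transfer` are hypothetical.

```python
from typing import List


def min_sampling_frequency(n: int, t_p: float) -> float:
    """Equation (1): each of the n neurons must be visited once per spike of duration t_p (s)."""
    return n / t_p


def tma_transfer(spikes: List[List[int]]) -> List[List[int]]:
    """Cycle-level sketch of the TMA bus (hypothetical model, ideal components).

    spikes[t][i] is 1 if input neuron i is firing during clock cycle t.
    Returns received[t][i], the value held at synapse i after cycle t.
    """
    n = len(spikes[0])
    token = [1] + [0] * (n - 1)      # daisy-chained latches, latch 0 preset to logic 1
    held = [0] * n                   # receive-side latches hold the last sample per synapse
    received = []
    for frame in spikes:
        slot = token.index(1)        # the latch holding the 1 selects one neuron
        bus = frame[slot]            # only the selected neuron drives the single bus wire
        held[slot] = bus             # receive chain, clocked in step, routes the bus to the same synapse
        received.append(list(held))
        token = token[-1:] + token[:-1]  # each clock edge rotates the 1 to the next latch
    return received


if __name__ == "__main__":
    n = 4
    spike_cycles = n                 # clocking at F_S = n / T_P means one spike lasts n cycles
    cycles = 10
    # Neuron I_2 fires at cycle 1 and stays high for T_P (= n cycles).
    spikes = [[1 if (i == 2 and 1 <= t < 1 + spike_cycles) else 0 for i in range(n)]
              for t in range(cycles)]
    for t, row in enumerate(tma_transfer(spikes)):
        print(t, row)
    print(min_sampling_frequency(4, 1e-3))  # four neurons, 1 ms spikes
```

In this toy run, with the clock at exactly F_S, the pulse from I_2 reaches its synapse one cycle late but keeps its full length of n cycles, a small instance of the transmission error described above.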
Simulation Results
The proposed TMA was simulated using the Mentor Graphics mixed-signal simulation package, System Vision Professional. Figure 2 shows the layout used in the simulation, where the SNN has four input neurons, I_0 to I_3, and two output neurons, O_0 and O_1 (this architecture differs from that of figure 1 in that the MOSFETs at the input to each synapse are replaced by D-latches, D13-D20). It will be shown later that, to reduce pulse transmission errors, it is necessary to sample at a rate in excess of the minimum sampling frequency defined by equation (1). However, "gating" these high-frequency pulses using MOSFETs causes glitches at the synapse inputs; the additional layer of D-latches, D13-D20, avoids this problem. In the simulations shown in figures 2 and 3, the pulse length for all neurons, T_P, was set to 1 ms, and since there are four input neurons, the minimum sampling frequency calculated from equation (1) is 4 kHz. Because M1 to M4 are not ideal, the transitions from logic 1 to logic 0, and vice versa, are not instantaneous. Therefore, to avoid any overlap between the turn-on transient of one transistor and the turn-off transient of another, a two-phase clock system is used: clock CK1 drives the sampling circuit to the left of the bus wire and clock CK2 drives the sampling circuit to the right. CK1 and CK2 are in anti-phase but operate at the same frequency (8 kHz), as shown in figure 3(a). Figure 3(b) shows random firing of neurons I_0 to I_3 and the arrival times of their pulses at the appropriate synapses. There is a time error, ∂t_I0, between I_0 firing and the arrival of its pulse at the appropriate synapse. Figure 3(b) shows similar errors for all pulses; therefore, while the TMA provides inter-neuron communication, it also introduces transmission errors.
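The time error ∂t can be estimated with a small timing model. The sketch below assumes, hypothetically, that neuron `slot` is sampled at the start of clock periods slot, slot + n, slot + 2n, ... of a clock at frequency `f_clk`, and that a spike is high on the half-open interval [t_fire, t_fire + T_P); it ignores transistor transients and the two-phase clocking. Under these assumptions the delay is always less than n/f_clk, i.e. less than T_P when f_clk = F_S and less than T_P/2 when f_clk = 2F_S. Exact rational arithmetic (`fractions.Fraction`) avoids rounding at slot boundaries; the function name `arrival_time` is illustrative.

```python
import math
from fractions import Fraction
from typing import Optional


def arrival_time(t_fire: Fraction, slot: int, n: int, f_clk: Fraction,
                 t_p: Fraction) -> Optional[Fraction]:
    """Time at which a spike from neuron `slot` is first sampled onto the bus.

    Hypothetical timing model: neuron `slot` is visited at t = (slot + k*n) / f_clk,
    k = 0, 1, 2, ..., and its spike is high on [t_fire, t_fire + t_p).
    Returns None if no visit falls inside the spike (the pulse is lost).
    """
    period = n / f_clk                   # time between visits to the same neuron
    offset = Fraction(slot) / f_clk      # first visit to this neuron
    k = max(0, math.ceil((t_fire - offset) / period))
    t_sample = offset + k * period       # first visit at or after t_fire
    return t_sample if t_sample < t_fire + t_p else None


if __name__ == "__main__":
    n, t_p = 4, Fraction(1, 1000)        # four input neurons, 1 ms spikes (as in figure 3)
    f_s = n / t_p                        # equation (1): 4 kHz
    t_fire = Fraction(1, 10000)          # I_0 fires 0.1 ms after its first visit (arbitrary)
    for mult in (1, 2):
        t_arr = arrival_time(t_fire, 0, n, mult * f_s, t_p)
        delay_ms = float((t_arr - t_fire) * 1000)
        print(f"f_clk = {mult} x F_S: delay = {delay_ms:.2f} ms")
```

For this particular firing time the delay falls from 0.9 ms at F_S to 0.4 ms at 2F_S; if `f_clk` drops below F_S, the function can return `None`, modelling a spike that ends before its neuron is visited.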
The following section analyses these errors to determine their effect on the dynamics of a pre-trained SNN. 
Xilinx System Generator implementation
To investigate pulse transmission errors, both conventional interconnect and the TMA were used to interconnect neuron layers in an SNN topology that had been pre-trained to solve the benchmark XOR problem [10]. Both topologies were simulated using the XSG toolset from Xilinx [11], as illustrated in figure 4. The SNN was trained off-line by an Evolutionary Strategy (ES) [10]. Table 1 shows three input neurons, where neuron 1 is a biasing neuron [10] that fires at 0 ms, and neurons 2 and 3 provide the two conventional inputs of the XOR truth table. Column 4 gives the post-training firing times of the output neuron when the simulation used conventional metal interconnect. Columns 5 and 6 give the firing times of the same output neuron when the simulation used the TMA. Transmission errors are clearly appreciable if the minimum sampling rate is used (column 5); these are the worst-case firing times. However, if the sampling frequency is increased to 2F_S, satisfactory agreement between column 6 and column 4 is obtained. Therefore, for effective pulse transmission without significant error, the applied sampling frequency, F, must be maintained such that

$$F \geq 2F_S = \frac{2n}{T_P} \qquad (2)$$
TMA Scalability
To demonstrate the scalability of the TMA, consider a network with n input neurons and m output neurons. The number of input neurons, n, afforded by the TMA technique is a function of the maximum possible operating frequency of the global clock, while the theoretical limit on the scale of an n×m network is determined by the physical size of the sampling circuits. To estimate n, consider a 1 ms spike and assume a realistic sampling frequency of 1 GHz [12]. Rearranging equation (1) as n = F_S T_P, the input layer can accommodate approximately one million neurons. Even if we sample at 2F_S to minimise pulse transmission errors, equation (2) predicts an upper limit for n of half a million. This is an improvement over what is currently achievable [13]. It is therefore clear that the scale of an SNN implemented using the proposed TMA is unlikely to be severely limited by the frequency of the global clock; rather, scalability will be limited by the real estate occupied by the circuitry, and this limit is estimated next.
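As a quick check of the clock-limited figures quoted above, equations (1) and (2) can be rearranged for n. A minimal sketch using exact fractions; the helper name `max_layer_size` and its `margin` parameter (1 for equation (1), 2 for equation (2)) are illustrative assumptions, not part of the original design.

```python
import math
from fractions import Fraction


def max_layer_size(f_clk: Fraction, t_p: Fraction, margin: int = 1) -> int:
    """Largest n such that f_clk >= margin * n / t_p.

    margin=1 corresponds to equation (1) (clock at the minimum rate F_S);
    margin=2 corresponds to equation (2) (clock at twice the minimum rate).
    """
    return math.floor(f_clk * t_p / margin)


if __name__ == "__main__":
    f_clk = Fraction(10**9)    # 1 GHz global clock, as assumed in the text [12]
    t_p = Fraction(1, 1000)    # 1 ms spike
    print(max_layer_size(f_clk, t_p))            # 1000000
    print(max_layer_size(f_clk, t_p, margin=2))  # 500000
```

The same helper recovers the small example of section 3: a 4 kHz clock with 1 ms spikes supports n = 4 at the minimum rate.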
Consider again the case of n input neurons, and allow the number of output neurons, m, to increase. For a fully connected feedforward network, the number of synapses grows as the product nm. To calculate the limit on the network size, an estimate of the area consumed by the associated sampling circuits is required. Since the sampling circuitry is dominated by n D-type latches in the transmitting layer and 2nm D-type latches in the receiving layer, the total area, A_T, occupied by the sampling circuitry is given by

$$A_T = (n + 2nm)\,A_D \approx 2nm\,A_D \qquad (3)$$
for large m, where A_D is the area of a D-type latch. It has been reported that, in a 0.18 µm process technology, a D-type latch can be designed to occupy a silicon area of approximately 4 µm² [14]. If the area occupied by sampling circuitry is restricted to 10% of the total chip area (assumed to be 1 cm², i.e. 10^8 µm²), the sampling circuitry may occupy 10^7 µm², or about 2.5 million latches. A simple calculation (taking n = m, so that 2n²A_D ≤ 10^7 µm²) then predicts that the TMA approach permits approximately 1,100 neurons to be fabricated in each layer using a planar submicron process. For a fully connected feedforward NN this equates to approximately 1.25 million synapses. While this is a significant improvement on what is reported elsewhere [13], it will be further enhanced as technology continues to improve [15]. Furthermore, because the proposed TMA substantially reduces the interconnect density, the real estate available for the sampling circuitry is expected to exceed the 10% estimate. The above estimate is therefore viewed as conservative, and the proposed TMA approach is expected to advance synaptic density even further.
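The area estimate can be reproduced from the sampling-circuit latch count, (n + 2nm) latches of area A_D each, using integer arithmetic. This sketch assumes n = m, the 4 µm² latch area of [14], and a 10% budget of a 1 cm² die; it keeps the small n·A_D term rather than using the large-m approximation. The function name `max_square_layer` is hypothetical.

```python
import math

LATCH_AREA_UM2 = 4                 # D-type latch area in a 0.18 um process [14]
CHIP_AREA_UM2 = 10**8              # 1 cm^2 = 10^8 um^2
BUDGET_UM2 = CHIP_AREA_UM2 // 10   # 10% of the chip reserved for sampling circuitry


def max_square_layer(budget_um2: int, latch_um2: int) -> int:
    """Largest n (with m = n) such that A_T = (n + 2*n*m) * A_D fits in the budget.

    With m = n the latch count is 2n^2 + n <= c, where c = budget // latch area.
    Completing the square: (4n + 1)^2 <= 8c + 1, so n = (isqrt(8c + 1) - 1) // 4.
    """
    c = budget_um2 // latch_um2
    return (math.isqrt(8 * c + 1) - 1) // 4


if __name__ == "__main__":
    n = max_square_layer(BUDGET_UM2, LATCH_AREA_UM2)
    print(n, n * n)  # neurons per layer, synapses in a fully connected n x n network
```

Using `math.isqrt` keeps the bound exact, so the returned n is the largest layer size whose latch area does not exceed the budget.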
Conclusion
This paper has proposed a novel time sampling architecture for hardware implementations of SNNs. This work has shown that the required sampling frequency depends on the number of neurons in the sampled layer and the duration of the "digital spikes" they emit. However, with on-chip clock frequencies typically in the GHz range, the limitations placed on this approach by the sampling frequency are negligible. The TMA has been verified using the Mentor System Vision Pro software package, and pulse transmission errors have been investigated using the XSG platform. It has been shown that these errors can be minimised by maintaining the sampling frequency at no less than twice the minimum sampling frequency (2F_S). This paper has demonstrated the potential of the TMA for inter-neuron communication; given the asynchronous firing nature of neurons, the target implementation for this approach is a mixed-signal Application Specific Integrated Circuit (ASIC). Moreover, the authors are confident that, if this approach is optimised in terms of minimal-area circuitry and timing issues are addressed for large implementations, it has the potential to implement well over a million inter-neuron pathways using a very simple and compact sampling architecture. Future work will involve a comparative analysis with alternative interconnect strategies such as Address Event Decoding (AED).
