Abstract-We use a large-scale analog neuromorphic system to encode the hidden-layer activations of a single-layer feed-forward network with random weights. The random activations of the network are implemented using the device mismatch inherent to analog circuits. We show that these activations, produced by analog VLSI implementations of integrate-and-fire neurons, are suited to solve multidimensional, nonlinear regression tasks. Exploiting the device mismatch eliminates the storage requirements for the random network weights.
I. INTRODUCTION
The use of randomness in neural networks has been the subject of considerable work for many decades. The inventors of radial basis function networks (RBFs) [1] were among the first to propose random selection of network parameters, which in their case was the set of RBF centers. This idea was later applied to multilayer perceptrons (MLPs) [2], [3]. These networks contain a single hidden layer of neurons whose weights have been randomly initialized, and the output weights can be computed by solving a set of linear equations. While random hidden layer (RHL) networks later gained popularity under the moniker Extreme Learning Machine (ELM) [4], we will refer to them as RHL-MLPs for clarity. In contrast to traditional neural networks, which are trained by gradient descent and require backpropagation of errors, learning in RHL-MLPs is restricted to a linear read-out layer and can be accomplished by fast and efficient linear regression. There are many benefits of using RHL-MLPs over traditional neural networks [5]: RHL-MLPs learn very quickly, tend to avoid problems of gradient descent such as local minima, and can be used to train networks with non-differentiable activation functions. RHL-MLPs have been successfully used for nonlinear regression and classification tasks.

This paper presents results from multidimensional function approximation when the hidden-layer activations of an RHL-MLP are implemented with neuromorphic hardware. Neuromorphic hardware can be broadly defined as any circuit or system optimized for the simulation or emulation of neurobiology. A subset of neuromorphic hardware focuses on creating analog circuits which implement spiking neural networks [6], [7]. Such systems are attractive because of their low power consumption and their use of spikes for computation, which results in lower-latency processing.
We create the random input weights and the resulting hidden-layer activity in one such system using the device mismatch inherent to all analog hardware. Rate-coded input spike trains are applied to differential pair integrator (DPI) synapses [8], which transmit excitatory postsynaptic currents to leaky integrate-and-fire neurons [9]. All of the circuits are identically biased, but since they are distinct physical devices, their responses differ due to device-to-device mismatch [10]. This mismatch causes the hidden-layer neurons to respond differently to the same input spike train. We exploit the device mismatch to efficiently implement the hidden layer of an RHL-MLP on neuromorphic hardware: no additional storage is required to represent the random network parameters on chip. The alternative approach would be to build a dedicated on-chip memory large enough to store all of these random weights. We record the hidden-layer responses through the Address-Event Representation (AER) protocol [11], and then train the weights between the hidden layer and the output neuron on a PC.
Previous work has shown that this approach yields promising results. The idea to use silicon spiking neurons for RHL-MLPs was originally proposed in [12], and architectural details and simulations were provided in [13]. The architecture of [13] consisted of current mirrors which feed the RHL-MLP inputs into spiking neurons; mismatch of the current mirrors provided the random weights. Simulations of that architecture were completed using a single neuron on a Field Programmable Analog Array, and the randomness was simulated in Matlab. In contrast, our system realizes the entire hidden layer on chip, and we exploit the mismatch present in the neuron and synapse circuits without using additional current mirrors. As a follow-up to [13], [14] recently presented a system consisting of a dedicated chip with a complete hidden layer, interfaced with a microcontroller which implements the output weights.
Another recent result [15] demonstrated that a combination of systematic and random offsets in an analog neuron's transfer function allows for function approximation. As above, the function approximation was based on readings from a single test circuit, rather than an array of devices on a fully realized platform.
Function approximation has previously been demonstrated on an analog neuromorphic platform [16]. In that work, different activation functions were induced by explicitly adding different biases to the neurons; these biases were Gaussian spike trains with randomly chosen mean firing rates supplied as synaptic current. In contrast, we demonstrate that sufficiently mismatched hardware generates activation functions which are varied enough to support function approximation without the need for an additional source of randomness. We also note that [16] implemented the weighted connections from the hidden layer to the output neurons on chip, while we implement them on a standard desktop computer.
The paper is structured as follows. Section II contains a brief overview of the architecture of the RHL-MLP on chip. Section III discusses the device mismatch which enables the randomization of input weights. Section IV presents measured activation functions from the network, as well as results from offline learning tests for 1-and 2-dimensional functions. Section V discusses future work which will extend the capabilities of such hardware.
II. NETWORK ARCHITECTURE
The proposed system implements the following portion of an RHL-MLP on chip:

h_i = g(w_i · x + b_i b),    (1)
where h_i is the activity of the ith hidden-layer neuron, g(a) is the neuron's activation function, w_i and b_i are the randomized weights and bias from the input neurons to the ith hidden-layer neuron, x is the activity of the input neurons, and b is the bias activity. In traditional RHL-MLPs, g(a) = 1/(1 + exp(−a)); we replace this function with the measured activation functions of leaky integrate-and-fire neurons. The basic architecture of this implementation is shown in Fig. 1. All of these terms are implemented on an aVLSI platform (IFSLWTA) with 128 leaky integrate-and-fire neurons [17]. Each neuron is connected to 2 excitatory and 2 inhibitory DPI synapses, as well as 28 excitatory plastic synapses, which are not used in this work; for circuit schematics see [17]. Excitatory synapses source current onto the neuron's membrane capacitance, and inhibitory synapses sink current from it. The number of inhibitory synapses per neuron limits the input dimensionality for the IFSLWTA chip.
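For illustration, a minimal software sketch of the computation in Eq. 1 follows (all names are ours; on chip this computation happens implicitly in the analog circuits, and the logistic g is only a stand-in for the measured neuron responses):

```python
import numpy as np

def hidden_activations(x, W, b_weights, b_activity, g):
    """Compute h_i = g(w_i . x + b_i * b) for all hidden neurons (Eq. 1)."""
    return g(W @ x + b_weights * b_activity)

rng = np.random.default_rng(0)
H, D = 100, 2                         # hidden neurons, input dimensions
W = rng.normal(size=(H, D))           # random input weights w_i
b_w = rng.normal(size=H)              # random bias weights b_i
h = hidden_activations(np.array([0.3, 0.7]), W, b_w, 1.0,
                       lambda a: 1.0 / (1.0 + np.exp(-a)))
```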
In order to attain the widest possible range of w_i values, half of the synapses are inhibitory and half are excitatory, yielding a symmetrical distribution of positive and negative weights. For a two-dimensional input, this gives four possible combinations of synapse types and input sources, as enumerated in the sketch below. We have therefore divided our hidden neurons into four groups, each group having a different combination of inhibitory/excitatory synapses from inputs x_1 and x_2.
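The sign combinations can be enumerated mechanically; a small sketch (names are illustrative) of the group structure:

```python
from itertools import product
import numpy as np

def group_signs(dim_x):
    """All 2**dim_x excitatory (+1) / inhibitory (-1) combinations,
    one row per group of hidden-layer neurons."""
    return np.array(list(product((+1, -1), repeat=dim_x)))

print(group_signs(2))
# [[ 1  1]
#  [ 1 -1]
#  [-1  1]
#  [-1 -1]]
```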
The biases for each group are set by a combination of two neurons firing at a constant rate: neuron b_1 is connected to the hidden neurons via an inhibitory synapse, and neuron b_2 via an excitatory synapse. These two biases enable the network to reach an optimal working point; e.g., b_2 excites one neuron group to a given activity, which can then be suppressed by x to obtain a response that decreases as x increases.

Fig. 1: The input x is broadcast to H = 100 hidden-layer neurons, all located on the IFSLWTA chip (for circuit schematics see [17]). Each neuron receives input from x_1 and x_2 through either an inhibitory or an excitatory synapse, and is also connected to a bias that combines a constant inhibitory input b_1 and a constant excitatory input b_2. To ensure an equal effect of excitatory and inhibitory connections for each dimension, the hidden-layer neurons are split into 2^dim(x) groups, each receiving a different combination of inhibitory and excitatory inputs such that all possible combinations exist. The 1-dimensional setup is similar to the 2-dimensional one but consists of only 2 groups instead of 4. The neuron activations are recorded through an Address-Event Representation (AER) interface and transmitted to a computer, where offline processing is performed.
III. DEVICE MISMATCH ENABLES RANDOM HIDDEN-LAYER WEIGHTS
Variation in the fabrication of CMOS devices yields inevitable mismatch between the characteristics of identically-drawn transistors. In above-threshold operation, this mismatch is typically attributed to variations in the threshold voltage V_T and the current scaling factor β, which are usually modelled as varying independently [18]. Whether V_T and β mismatch measured above threshold transfers directly to sub-threshold current mismatch has been debated in the literature, but the two regimes are generally considered to be affected similarly [18].
The fixed-weight synapses on this chip which implement the weight and bias terms are restricted such that all synaptic weights of a given type (excitatory or inhibitory) have the same nominal value. However, each synapse has some random offset ε_i which is a function of the mismatch. Thus, each w_i and b_i in Eq. 1 can be written as
w_i = w_exc + ε_i  (excitatory)  or  w_i = −w_inh + ε_i  (inhibitory),    (2)

and likewise for b_i, where w_exc and w_inh are constants across the chip and ε_i is unique to each synapse. The exact distribution of ε_i is unknown.
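In software, one might model Eq. 2 as below; note that drawing ε_i from a Gaussian is purely our assumption, since the true on-chip distribution is unknown:

```python
import numpy as np

rng = np.random.default_rng(1)
H = 100                  # hidden-layer neurons
w_exc, w_inh = 1.0, 1.0  # nominal synaptic weights (arbitrary units)
sigma = 0.25             # assumed spread of eps_i; unknown on the real chip

# One fixed random offset eps_i per physical synapse (cf. Eq. 2).
w_excitatory = w_exc + sigma * rng.normal(size=H)
w_inhibitory = -w_inh + sigma * rng.normal(size=H)
```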
Additionally, the leaky integrate-and-fire neurons themselves are subject to mismatch. Over a certain frequency range, we can model them very simply as piecewise-linear activation functions; in such a model, both the slope and the frequency range are subject to mismatch, as quantified below.
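A sketch of such a piecewise-linear neuron model (the parameter names and the saturation behaviour are our assumptions):

```python
import numpy as np

def pwl_activation(a, m_i, a_min, a_max):
    """Piecewise-linear neuron model: silent below a_min, slope m_i over
    [a_min, a_max], saturating above a_max; m_i, a_min, and a_max all
    vary from neuron to neuron due to mismatch."""
    return m_i * (np.clip(a, a_min, a_max) - a_min)
```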
Our system exploits this mismatch to automatically generate the randomized activations required for RHL-MLPs. The mismatch occurs at both the neuron and synapse level. To quantify the relative contributions of each circuit to the overall mismatch, we ran a series of experiments in which we measured neural activity with different circuits connected to the neuron. In each experiment, we computed the mean firing rate of each of the 100 neurons over one second of stimulation, normalized the results, and then computed the standard deviation of the resulting data distribution.
When the only source of input to the neuron was a DC current (I_in) [17] provided by a pFET biased with a gate voltage of 2.78 V, the standard deviation of the distribution of normalized rates was 0.0716. When a regular spike train of 200 Hz was applied to excitatory synapses onto the neurons, the standard deviation was 0.1289. When a constant current (I_in) [17] was again injected by a pFET with a gate voltage of 2.78 V and a regular inhibitory spike train was applied at 700 Hz, the standard deviation was 0.1667. Finally, when the inhibitory (700 Hz) and excitatory (260 Hz) synapses were both active, the standard deviation was 0.2505. All data were normalized by the mean firing rate of the population for each measurement (approximately 340 Hz in all cases). Thus the neuron, the excitatory synapse, and the inhibitory synapse all contribute significantly to the total mismatch.
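The quantification itself reduces to a few lines (a sketch; the spike counts would come from the AER interface):

```python
import numpy as np

def normalized_rate_spread(spike_counts, duration=1.0):
    """Standard deviation of per-neuron firing rates after normalizing
    by the population mean, as used above to quantify mismatch."""
    rates = np.asarray(spike_counts, dtype=float) / duration
    return np.std(rates / rates.mean())
```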
For every excitatory input connection we provided a constant inhibitory bias b_1 (1D: 100 Hz; 2D: 250 Hz), and for every inhibitory input synapse an excitatory bias b_2 (1D: 700 Hz; 2D: 600 Hz) on a second physical synapse, as illustrated in Fig. 1. The bias frequencies were selected to obtain nonlinear responses and a good distribution of responses over the input space. These two biases are constant regular spike trains, identical for all neurons; they are mismatched by the synapses and thus act as additional random offsets to each neuron. Unlike [16], we depend entirely on the internal variability of the physical system, adding no artificial variability to the input or to the biases.
The mismatch in our system can most easily be seen by plotting the firing rate of our hidden layer neurons as a function of the input firing rates. Fig. 2a shows the variability in the 1-dimensional activities, and Fig. 2c shows the variability in the 2-dimensional activities.
The input to the network is a regular spike train for each input dimension, in addition to the two constant bias frequencies. In the 1D case the input is mapped to a frequency between 200 Hz and 1400 Hz; in the 2D case it is mapped to an input frequency between 0 Hz and 1000 Hz. We stimulate the network for 5 s for each input but use only 3 s of data for our calculations, starting from t = 1 s; this time can be increased to gain robustness against environmental noise. For the 1D case we took 50 measurements in linear steps from 0 Hz to 1400 Hz, and for the 2D case 25 linear steps from 0 Hz to 1000 Hz in each dimension, resulting in 625 measurements.
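A sketch of this measurement protocol (stimulate and record_rates are hypothetical stand-ins for the actual AER interface calls):

```python
import numpy as np

T_STIM, T_SKIP, T_USE = 5.0, 1.0, 3.0  # stimulate 5 s, analyze t = 1..4 s

# 1D sweep: 50 linearly spaced input frequencies from 0 Hz to 1400 Hz.
freqs_1d = np.linspace(0.0, 1400.0, 50)

# 2D sweep: 25 x 25 grid from 0 Hz to 1000 Hz per dimension (625 points).
f = np.linspace(0.0, 1000.0, 25)
freqs_2d = np.stack(np.meshgrid(f, f), axis=-1).reshape(-1, 2)

# for f_in in freqs_1d:
#     stimulate(f_in, duration=T_STIM)          # hypothetical AER call
#     h = record_rates(T_SKIP, T_SKIP + T_USE)  # hypothetical AER call
```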
IV. OFFLINE LEARNING FOR NONLINEAR REGRESSION
Once the activations of H hidden-layer neurons have been recorded for a set of N example inputs, we can derive the output weights required to approximate the function y = f(x). This is accomplished with a regularized least-squares solution:

W = (H^T H + λI)^{-1} H^T Y,    (3)
where H is the N × H matrix of hidden-layer activations, λ is a regularization parameter, and Y is the N × 1 vector of desired outputs. Once W has been calculated, we test the network's capability by applying new inputs to the system. The results of learning the 1-dimensional function y = sin(4πx) + n are shown in Fig. 3a, where n is Gaussian-distributed noise with standard deviation 0.1. The noise, combined with the sparse sampling of the function, makes the regression task more difficult. The learning was evaluated over 10 trials, in which the data was randomly split into 38 training samples and 12 test samples. The average mean-squared training error over the 10 trials was 0.0617, while the average mean-squared test error was 0.0682. The results of learning the 2-dimensional function y = sin(2πx_1) · cos(2πx_2) + n are shown in Figs. 3b and 3c. The data was split randomly into 469 training samples and 156 test samples. The average mean-squared error for training was 0.0606, and the average error for testing was 0.075.
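A minimal sketch of this offline learning step, with synthetic activations standing in for the recorded rates and an arbitrary assumed value of λ (the split sizes follow the 1-dimensional experiment):

```python
import numpy as np

def train_readout(H_act, Y, lam=1e-3):
    """Regularized least squares: W = (H^T H + lam I)^-1 H^T Y (Eq. 3)."""
    HtH = H_act.T @ H_act
    return np.linalg.solve(HtH + lam * np.eye(H_act.shape[1]), H_act.T @ Y)

def mse(H_act, Y, W):
    return np.mean((H_act @ W - Y) ** 2)

rng = np.random.default_rng(2)
N, H = 50, 100
H_act = rng.random((N, H))                  # stand-in for measured rates
x = np.linspace(0.0, 1.0, N)
Y = np.sin(4 * np.pi * x) + 0.1 * rng.normal(size=N)

idx = rng.permutation(N)
train, test = idx[:38], idx[38:]            # 38 training / 12 test samples
W = train_readout(H_act[train], Y[train])
print(mse(H_act[train], Y[train], W), mse(H_act[test], Y[test], W))
```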
Both results show that a) the device mismatch provides sufficient variability of the neuron activations for function approximation and that b) good generalization of the learned functions to new inputs can be achieved with the RHL neuron activations from the neuromorphic chip.
V. CONCLUSION
We have presented an implementation of the random weights and hidden-layer neurons of random hidden layer (RHL) networks on a neuromorphic platform. By using device mismatch alone, we eliminate the additional storage circuits and requirements of an RHL. The chip design [17] allows functions with up to 2-dimensional inputs to be computed; this limitation exists only because the chip was not designed for this application. For higher-dimensional inputs, either a multi-chip setup or a new chip design can be devised. Implementing the weights between the hidden layer and the output layer on chip is the logical next step of this work. Given that the synapses are binary, this would require using multiple synapses to represent a single weighted connection, with each synapse encoding an incremental weight change. On-chip learning is another goal.
