In this paper we present the hardware design of a novel and extremely compact digital stochastic neuron that can generate the derivative of its output with respect to an arbitrary input. These derivatives may form the basis of an on-chip gradient descent learning algorithm.
Introduction
An artificial neuron is required to calculate a single output value by applying an 'activation function' to the weighted sum of its inputs. Such neurons are intended to operate in massively parallel networks, often processing real-time data. Conventionally, feedforward networks containing neurons of this type are trained off-line using learning algorithms such as back propagation, but recently some research has focused on building the learning algorithms directly into the neural hardware [1, 2].
In this paper we present the design of an enhanced stochastic bit stream neuron with additional circuitry that allows the real-time calculation of the neuron's output derivative with respect to an arbitrary input. This derivative may then be used as the basis for an 'on-chip' gradient descent learning algorithm. The detailed hardware design and operating principles of standard stochastic bit stream neurons and their networks are given in [3, 4, 5, 7].
The stochastic bit stream neuron
To describe how the output derivative of a stochastic bit stream neuron is calculated, we begin with a brief description of the basic operation of such a neuron.
All signals processed by these neurons are real values represented by stochastic bit streams, in the interval [0, 1] for unsigned values and [-1, 1] for signed values. A neuron has only one physical input and weight connection, but by using time-division multiplexing it may have many logical connections. The core of the neuron is a simple counter, which may be preloaded with a threshold value. Each input bit is weighted by ANDing it (when operating on unsigned values) or XORing it (when operating on signed values) with the corresponding weight bit, so each weighted input contributes 0 or 1 to the counter on each operational cycle. Details of signed and unsigned stochastic bit stream neurons may be found in [4, 8]. The threshold values preloaded into the counter are chosen so that they cause an overflow into the topmost bit when a given input count is reached or exceeded. Thus the top bit of the counter provides the output of the neuron.
The activation function applied by the neuron, which requires no additional circuitry, is formed by the interaction between the probability distribution of the weighted input values and the probability distribution of the chosen threshold values. A sigmoid-like activation function is achieved by using a fixed threshold value, and a linear activation function is obtained by using a uniformly distributed threshold value; see [6] for the precise mathematical definitions.
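The operational cycle described above can be sketched in Python. This is a minimal, hypothetical model of an unsigned neuron only (AND weighting; the XOR path for signed values is omitted). The names `neuron_cycle` and `estimate_output`, the 8-bit counter width, and the convention that the preload makes the top bit set once the count exceeds a threshold t (that is, reaches t + 1) are our assumptions for illustration, not details of the hardware.

```python
import random


def neuron_cycle(input_bits, weight_bits, threshold, counter_bits=8):
    """One operational cycle of a hypothetical unsigned stochastic bit stream neuron.

    Assumed convention: the counter is preloaded so that its top bit becomes set
    once more than `threshold` weighted input bits are 1.
    """
    top = 1 << (counter_bits - 1)
    if not (len(input_bits) < top and 0 <= threshold < top):
        raise ValueError("counter too narrow for these inputs/threshold")
    counter = top - (threshold + 1)  # preload: overflow into the top bit at count threshold + 1
    # Time-division multiplexing: the logical inputs share one physical input in turn.
    for x, w in zip(input_bits, weight_bits):
        counter += x & w  # unsigned weighting: AND the input bit with its weight bit
    return (counter >> (counter_bits - 1)) & 1  # the top bit is the neuron's output bit


def estimate_output(input_probs, weight_probs, threshold, cycles=20_000, seed=0):
    """Fraction of 1s in the output stream when inputs and weights are random bit streams."""
    rng = random.Random(seed)
    ones = 0
    for _ in range(cycles):
        xs = [int(rng.random() < p) for p in input_probs]   # one bit from each input stream
        ws = [int(rng.random() < p) for p in weight_probs]  # one bit from each weight stream
        ones += neuron_cycle(xs, ws, threshold)
    return ones / cycles


if __name__ == "__main__":
    # 15 inputs at 0.5, all weights 1.0, threshold 7.
    print(estimate_output([0.5] * 15, [1.0] * 15, threshold=7))
```

Averaging the top bit over many cycles gives the probability that the output bit is '1', which is the value carried by the neuron's output stream.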
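Under the simplifying assumption that the weighted input bits on each cycle are independent, both activation functions can be computed exactly from the distribution of the weighted-input count. The Python sketch below uses an assumed convention (output '1' when the count exceeds the threshold t) and, for the linear case, assumes t is drawn uniformly from {0, ..., m-1}. Under these assumptions the linear neuron's output is exactly the mean of its weighted inputs, because the sum over t of Pr(count > t) equals the expected count. The function names are hypothetical; [6] gives the authors' own definitions.

```python
def count_distribution(probs):
    """Pr(count == s) for s = 0..m, the number of 1s among independent bits."""
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for s, q in enumerate(dist):
            nxt[s] += q * (1.0 - p)  # this bit is 0: count unchanged
            nxt[s + 1] += q * p      # this bit is 1: count increases by one
        dist = nxt
    return dist


def sigmoid_activation(probs, threshold):
    """Fixed threshold: Pr(count > threshold)."""
    return sum(count_distribution(probs)[threshold + 1:])


def linear_activation(probs):
    """Threshold drawn uniformly from {0, ..., m-1}: average of Pr(count > t) over t."""
    m = len(probs)
    return sum(sigmoid_activation(probs, t) for t in range(m)) / m


if __name__ == "__main__":
    for x in (0.1, 0.3, 0.5, 0.7, 0.9):
        probs = [x] * 15  # 15 equal weighted inputs
        print(x, round(sigmoid_activation(probs, 7), 3), round(linear_activation(probs), 3))
```

With a fixed threshold of 7 the output rises steeply around x = 0.5 and equals 0.5 there, while with the uniform threshold it tracks x exactly.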
Calculating the derivative
The probability $O_n$ of generating a '1' as the output bit on a given operational cycle of a bit stream neuron, with weighted inputs $i_1$ to $i_m$ and preloaded threshold value $t_n$, may be written as shown in equation (1).
This function may be rewritten as equation (2), which is now easily differentiated with respect to the arbitrary input $i_k$, giving equation (3):
$$\frac{\partial O_n}{\partial i_k} = \Pr\left(i_1 + i_2 + i_3 + \dots + i_{k-1} + i_{k+1} + \dots + i_m = t_n\right) \qquad (3)$$
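One way to arrive at equation (3) is sketched below. It assumes that the weighted input bits on a cycle are independent, that (as in equation (3)) $i_j$ denotes both the $j$-th weighted input bit and the probability that it is '1', and that the output bit is '1' when the weighted input count exceeds $t_n$. This last convention is our assumption, chosen so that the result agrees with equation (3). Conditioning on the value of input $k$ separates out a term that is linear in $i_k$:

```latex
\begin{align*}
S_{\bar k} &= i_1 + \dots + i_{k-1} + i_{k+1} + \dots + i_m \\
O_n &= \Pr\left(S_{\bar k} + i_k > t_n\right) \\
    &= (1 - i_k)\,\Pr\left(S_{\bar k} > t_n\right) + i_k\,\Pr\left(S_{\bar k} > t_n - 1\right) \\
    &= \Pr\left(S_{\bar k} > t_n\right) + i_k\,\Pr\left(S_{\bar k} = t_n\right) \\
\frac{\partial O_n}{\partial i_k} &= \Pr\left(S_{\bar k} = t_n\right)
\end{align*}
```

Neither probability in the last two lines depends on $i_k$, so the derivative is simply the probability that the other inputs count to exactly $t_n$. This is the condition the additional circuitry must detect.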
To implement this result for a given neuron, additional hardware is required. This circuitry must, in effect, prevent the input $i_k$ from contributing to the internal count, and must also detect the condition in which the count exactly matches the preloaded threshold value. This is easily arranged, as the preloaded threshold value is chosen such that it sets the most significant bit of the counter when it is reached. So if $i_k$ is '0', the circuitry must detect the counter value '1000...00', and if $i_k$ is '1', it must detect the value '1000...01' (the extra 1 being the contribution of $i_k$). This functionality can be achieved with a simple combinational circuit. The number of counter bits that must now be checked, for a typical bit stream neuron with $m$ inputs, is $2\log_2 m$, which for most applications will be small, approximately 8 bits. Actual circuit details are not given here, as they will depend largely on the final hardware implementation platform, the most efficient being full custom VLSI.
Results
Each activation function is displayed twice in figure 1. In the first instance, all of the inputs to each neuron are distinct 1M-bit streams of the same value, following the linear ramp function $y = x$. The second set of two graphs shows neurons with the same activation functions, but with input 1 set to $\sin 2x$ and the remaining inputs 2 to 15 set to $y = x$ as before. For a neuron with a linear activation function, the derivative of the output with respect to a given input is always constant, irrespective of the inputs. This is illustrated by the two graphs showing the outputs and derivatives of a linear neuron.
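Averaging equation (3) over the threshold distribution shows why the linear neuron's derivative is constant. Assuming independent weighted bits, output '1' when the count exceeds t, and a threshold drawn uniformly from {0, ..., m-1}, the other m - 1 inputs always count to some value in {0, ..., m-1}, so the averaged derivative is exactly 1/m whatever the inputs. The helper names below are hypothetical, and inputs are indexed from 0.

```python
def count_distribution(probs):
    """Pr(count == s) for s = 0..len(probs), the number of 1s among independent bits."""
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for s, q in enumerate(dist):
            nxt[s] += q * (1.0 - p)  # this bit is 0
            nxt[s + 1] += q * p      # this bit is 1
        dist = nxt
    return dist


def derivative(probs, k, threshold):
    """Equation (3): Pr(the inputs other than k count to exactly `threshold`)."""
    dist = count_distribution(probs[:k] + probs[k + 1:])
    return dist[threshold] if 0 <= threshold < len(dist) else 0.0


def linear_derivative(probs, k):
    """Equation (3) averaged over a threshold drawn uniformly from {0, ..., m-1}."""
    m = len(probs)
    return sum(derivative(probs, k, t) for t in range(m)) / m


if __name__ == "__main__":
    equal = [0.2] * 15          # all inputs the same value
    mixed = [0.9] + [0.2] * 14  # input 1 (index 0) differs from the rest
    print(linear_derivative(equal, 3), linear_derivative(mixed, 3), 1 / 15)
```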
In the case of a bit stream sigmoid neuron, the actual activation is a complex function of the inputs [6]. A fully symmetric sigmoidal activation function is only achieved when all the inputs have the same value (each one represented as a distinct bit stream) and the threshold value is chosen as the midpoint of the input range on a given operational cycle, i.e. 7 for a 15-input device. A direct consequence of this is that the derivative of the sigmoidal activation is also a function of the inputs. Two examples of this behaviour can be seen in the corresponding graphs in figure 1.
The next two graphs, shown in figure 2, present the same results as the first two graphs of the previous set, but with the bit streams presented to the neuron's inputs shortened to 10K and 1K bits. The resulting increase in noise, caused by the random variance errors inherent in stochastic bit streams, is readily apparent from the graphs.
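For the sigmoid neuron, the same calculation with a fixed threshold gives a derivative that changes with the inputs. The hypothetical Python sketch below (assuming independent weighted bits and output '1' when the count exceeds the threshold) evaluates equation (3) for a 15-input neuron with threshold 7, and checks it against a finite difference of the exact activation. Inputs are indexed from 0, so input 1 in the text is index 0.

```python
def count_distribution(probs):
    """Pr(count == s) for s = 0..len(probs), the number of 1s among independent bits."""
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for s, q in enumerate(dist):
            nxt[s] += q * (1.0 - p)  # this bit is 0
            nxt[s + 1] += q * p      # this bit is 1
        dist = nxt
    return dist


def sigmoid_activation(probs, threshold):
    """Fixed threshold: Pr(count > threshold)."""
    return sum(count_distribution(probs)[threshold + 1:])


def sigmoid_derivative(probs, k, threshold):
    """Equation (3): Pr(the inputs other than k count to exactly `threshold`)."""
    dist = count_distribution(probs[:k] + probs[k + 1:])
    return dist[threshold] if 0 <= threshold < len(dist) else 0.0


def finite_difference(probs, k, threshold, h=1e-3):
    """Forward difference of the activation with respect to input k."""
    bumped = list(probs)
    bumped[k] += h
    return (sigmoid_activation(bumped, threshold) - sigmoid_activation(probs, threshold)) / h


if __name__ == "__main__":
    for x in (0.3, 0.5, 0.7):
        equal = [x] * 15          # all inputs the same value
        mixed = [0.9] + [x] * 14  # input 1 (index 0) held at a different value
        # Derivative with respect to input 2 (index 1) in both cases.
        print(x, sigmoid_derivative(equal, 1, 7), sigmoid_derivative(mixed, 1, 7),
              finite_difference(equal, 1, 7))
```

Because the activation is linear in each individual input probability, the finite difference agrees with equation (3) up to rounding error. Changing input 1 shifts the derivative with respect to input 2, which is the input dependence described above.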
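The effect of stream length on the derivative can be reproduced with a small Monte Carlo sketch. On each simulated cycle, a hypothetical `derivative_bit` function stands in for the detection circuit: it outputs '1' when the inputs other than k count to exactly the threshold. It is modelled directly on the count rather than on the counter's bit pattern. Averaging over a stream of N cycles estimates equation (3). Assuming independent cycles, the standard error of this estimate is sqrt(d(1 - d)/N) for a true derivative d, so each tenfold reduction in stream length raises the noise by a factor of about sqrt(10), roughly 3.2. The stream lengths below stop at 100K bits rather than 1M to keep pure Python fast.

```python
import math
import random


def derivative_bit(weighted_bits, k, threshold):
    """Hypothetical model of the detection logic for one cycle:
    1 if the inputs other than k count to exactly `threshold`."""
    return int(sum(weighted_bits) - weighted_bits[k] == threshold)


def estimate_derivative(probs, k, threshold, length, seed=0):
    """Fraction of 1s in a derivative bit stream of the given length."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(length):
        bits = [int(rng.random() < p) for p in probs]  # one bit from each weighted input stream
        hits += derivative_bit(bits, k, threshold)
    return hits / length


if __name__ == "__main__":
    probs, k, t = [0.5] * 15, 0, 7
    exact = math.comb(14, 7) / 2 ** 14  # valid only for these all-0.5 inputs
    for n in (1_000, 10_000, 100_000):
        est = estimate_derivative(probs, k, t, n, seed=n)
        std_err = math.sqrt(exact * (1 - exact) / n)
        print(f"N={n:>7}: estimate={est:.4f} exact={exact:.4f} std_err={std_err:.4f}")
```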
Conclusion
In this paper we have demonstrated that it is possible to construct linear or sigmoidal stochastic bit stream neurons that generate real-time output derivatives using only simple digital circuitry. We are currently evaluating an 'on-chip' learning scheme for feedforward networks that is closely modeled on back propagation [9]. In this scheme each bit stream neuron contains additional circuitry to produce the appropriate derivative values. These are then aggregated and passed back through the network to their respective weights by additional layers of simplified linear neurons.
