Abstract - A space-efficient, fully parallel stochastic architecture is described in this paper. It circumvents the main drawback of stochastic implementations of neural networks, the concurrent processing of a large number of weighted input signals, and leads to a simple realization of stochastic summation. An unlimited number of stochastically coded pulse sequences can be added in parallel using only very simple and space-efficient digital circuitry. Any neural network, either recurrent or feedforward, can be implemented using this scheme provided that neurons take discrete values. Design criteria are deduced from the mathematical analysis of the stochastic operations involved. Simulation results are also given.
I. INTRODUCTION
Electronic realizations of neural networks can be approached in different ways. On the one hand, analog approaches are very simple in terms of circuitry and have fast convergence times, especially when compared with digital implementations; on the other hand, their programming flexibility is very low. Digital implementations offer high flexibility and an easy interface with general-purpose computers, but their efficiency in terms of silicon area is very low, as a floating-point multiplier is needed in every neuron to calculate the presynaptic activity. One way to circumvent this problem is to employ stochastic logic [1].
Stochastic logic systems realize pseudoanalog operations using stochastically coded pulse sequences. Multiplication of two stochastic pulse sequences should produce another stochastic stream of pulses whose firing probability is the product of the input firing probabilities. This can be achieved easily if the input sequences are stochastically independent. The circuit that implements this operation is a simple AND gate.
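The AND-gate multiplication can be checked with a short simulation. The following Python sketch is illustrative only: the function names, stream length and firing probabilities are assumptions chosen for the example, and software pseudorandom numbers stand in for the hardware pulse generators.

```python
import random

def bernoulli_stream(p, n_cycles, rng):
    """Return n_cycles pulses (0 or 1), each firing independently with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n_cycles)]

def and_multiply(stream_a, stream_b):
    """Stochastic multiplier: one AND gate applied cycle by cycle."""
    return [a & b for a, b in zip(stream_a, stream_b)]

def density(stream):
    """Fraction of clock cycles in which the stream carries a pulse."""
    return sum(stream) / len(stream)

rng = random.Random(0)
n_cycles = 100_000
a = bernoulli_stream(0.6, n_cycles, rng)
b = bernoulli_stream(0.5, n_cycles, rng)

# Independent inputs: the output density approaches 0.6 * 0.5 = 0.30.
independent_product = density(and_multiply(a, b))
print(independent_product)

# Fully correlated inputs: ANDing a stream with itself returns the stream,
# so the density stays near 0.6 instead of 0.6 * 0.6 = 0.36.
correlated_product = density(and_multiply(a, a))
print(correlated_product)
```

The second case shows why the independence condition matters: the gate multiplies probabilities only when the two input streams are uncorrelated.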
Stochastic summation is a much more difficult operation to perform, especially if the terms to be added are signed. Two types of circuits have been described in the literature: the OR gate [2] and the up/down counter [3]. If two pulse sequences are fed into an OR gate and their pulses do not overlap, the output firing probability is equal to the sum of both firing probabilities. The OR-based add function is thus distorted by pulse overlap. To achieve quasi-linear behaviour, pulse densities must stay very low, especially if many terms are to be added. This technique does not permit the integration of neurons with a very high number of synaptic connections, as it would lead to an extremely low maximum pulse density.
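The distortion introduced by pulse overlap can be quantified for independent streams: the OR output fires unless every input is silent, so its density is one minus the product of the complements. The Python sketch below compares this with the ideal sum; the names and the chosen density of 0.05 are assumptions made for illustration.

```python
import random

def pulse_stream(p, n_cycles, rng):
    """Independent pulses with firing probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n_cycles)]

def or_gate(streams):
    """OR-based adder: output is high whenever any input pulse is high."""
    return [1 if any(bits) else 0 for bits in zip(*streams)]

def expected_or_density(densities):
    """Output density for independent inputs: 1 - prod(1 - p_i)."""
    all_silent = 1.0
    for p in densities:
        all_silent *= 1.0 - p
    return 1.0 - all_silent

rng = random.Random(0)
n_cycles = 100_000
for n_inputs in (2, 16):
    densities = [0.05] * n_inputs
    streams = [pulse_stream(p, n_cycles, rng) for p in densities]
    measured = sum(or_gate(streams)) / n_cycles
    # Columns: number of inputs, ideal sum, predicted OR density, simulated OR density.
    print(n_inputs, sum(densities), expected_or_density(densities), measured)
```

With two inputs the OR output (0.0975) is close to the ideal sum (0.10), while with sixteen inputs the ideal sum of 0.80 is reduced to about 0.56, which illustrates why pulse densities must shrink as the number of terms grows.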
The up/down counter technique, although widely used [4], has a very important drawback: the pulses coming from other neurons have to be multiplexed in time (i.e. serialized), leading to long computation times if the network has many neurons and many synapses per neuron.
We propose a fully parallel stochastic architecture for neural networks whose neuron activities take discrete values (either {-1, 1} or {0, 1}). This architecture permits the integration of highly interconnected neural networks. It can be used for either recurrent or feedforward nets and is very efficient in terms of circuitry. This paper is organized as follows. In Section 2 the accuracy of stochastic multiplication is analyzed. The results obtained justify the scheme proposed in Section 3, where design criteria are given. Section 4 is devoted to some applications of this novel architecture. Finally, conclusions are drawn in Section 5.
II. STOCHASTIC MULTIPLICATION
The smaller this fraction is, the more accurate the approximation will be.
III. STOCHASTIC ARCHITECTURE

A. Basic Concepts.
The transfer function of a discrete neuron requires the application of the sign operator to the summation of weighted input signals. Consider the following identity:
[Equation (5)]
The last term in (5) can be regarded as the comparison of two pulse streams generated by two stochastic multipliers; therefore, no adder is needed.
If the neural network has been nondimensionalized in such a way that all terms to be aggregated take values ranging from zero to a small number close enough to zero, each exponential can be approximated by its first-order expansion, $e^{-x} \approx 1 - x$. (8)

The fully parallel stochastic architecture is shown in Fig. 5. In block M, the synapses are compared with random numbers, producing a set of uncorrelated pulse streams whose densities are proportional to their values. The evaluation of (8) is performed by a simple NOT gate (block E).
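The role of the NOT gate can be checked numerically. The Python sketch below assumes that the approximation referred to as (8) is the first-order expansion of exp(-x), namely 1 - x, so that inverting a pulse stream of density x yields a stream of density 1 - x, and ANDing many inverted streams yields a density close to exp(-sum of the terms). The function names, the term values and the sign convention are assumptions made for this example and are not taken from the paper.

```python
import math
import random

def inverted_product_stream(terms, n_cycles, rng):
    """AND together the NOT of one pulse stream per term.

    Each term x (0 <= x <= 1) drives a stream of density x. The NOT gate
    turns it into density 1 - x, and the AND gate multiplies the densities.
    """
    out = []
    for _ in range(n_cycles):
        bit = 1
        for x in terms:
            pulse = 1 if rng.random() < x else 0
            bit &= 1 - pulse  # NOT gate followed by AND
        out.append(bit)
    return out

def sign_from_products(pos_terms, neg_terms):
    """Estimate sign(sum(pos) - sum(neg)) by comparing two products of (1 - x).

    A larger sum gives a smaller product, because each factor 1 - x is
    close to exp(-x) when x is small.
    """
    prod_pos = math.prod(1 - x for x in pos_terms)
    prod_neg = math.prod(1 - x for x in neg_terms)
    return (prod_neg > prod_pos) - (prod_neg < prod_pos)

pos_terms = [0.02, 0.03, 0.01]  # hypothetical "positive" contributions
neg_terms = [0.01, 0.02]        # hypothetical "negative" contributions

rng = random.Random(1)
stream = inverted_product_stream(pos_terms, 50_000, rng)
measured = sum(stream) / len(stream)
exact_product = math.prod(1 - x for x in pos_terms)

# Simulated density, exact product of (1 - x), and exp(-sum) for comparison.
print(measured, exact_product, math.exp(-sum(pos_terms)))
print(sign_from_products(pos_terms, neg_terms))
```

Because the factors 1 - x only approximate exp(-x), the comparison can give the wrong sign when the two sums are nearly equal; the smaller the individual terms, the narrower that region becomes.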
Block S is a logic block in which pulses are separated into "positive" pulses, if the weight and the neuron value have the same sign, and "negative" pulses, if they do not. The "positive" and "negative" streams are multiplied separately, leading to two different stochastic signals. Two different implementations are suggested for neurons with either {-1,1} or {0,1} saturation states, as shown in Fig. 6 and Fig. 7 respectively. If the "positive" pulse is at high level, the counter is incremented by one, and if the pulse is "negative", the counter is decremented by one. Block L resets the sign bit if a zero crossing takes place. Positive and negative terms are transformed and then multiplied separately, as shown in Fig. 5. Because these terms are close to unity, many terms can be aggregated with a high degree of accuracy, as is clear from Fig. 2 and Fig. 3.

C. Maximum Error Boundary.

Figure 9: Patterns stored in a Hopfield net.

Fig. 10 shows the evolution of the Hamming distance between the neural state vector and the prototype vector (Fig. 9a) for different values of a_max, compared with the evolution of the architecture of Van den Bout [3]. The initial state was a corrupted data vector (Fig. 9d) resembling this stored pattern. For the suggested architecture, the number of clock cycles needed for full convergence is 25 when a_max = 0.0625 and 74 when a_max = 0.125. If a systolic array of parallel neuron processors [3] is used, the evolution is slower, as shown in Fig. 10, needing 100 clock cycles. The number of connections summed in each clock cycle is N in every neuron, while only one is possible in the systolic array.

In order to test the behaviour of this architecture in feedforward networks, the two-layer perceptron needed in [5] to carry out nondestructive evaluations has been implemented.
The aim of this network is to classify a set of input signals into four categories. A network with eight input signals, ten hidden units and two output units was configured using backpropagation [6]. Fig. 11 shows the dynamic behaviour of its two output neurons when one of the previously learnt patterns is applied. In the steady state, the output counters reach limit counts of -128 and 128, corresponding to output neuron states -1 and 1 respectively, which are the target output values associated with the applied pattern. The pseudorandom evolution of the output neuron values is determined by the evolution of the neurons in the hidden layer. All patterns were applied, each leading to its corresponding output vector.

Figure 11: Dynamic behaviour of a two-output, one-hidden-layer perceptron.
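The behaviour of an output counter that settles at a limit count of -128 or 128 can be sketched in a few lines of Python. This is a minimal model under stated assumptions: the class name, the treatment of simultaneous "positive" and "negative" pulses (they cancel), the retention of the previous state at a count of zero, and the pulse probabilities are all hypothetical and are not specified in the text.

```python
import random

class CounterNeuron:
    """Saturating up/down counter whose sign gives a {-1, 1} neuron state."""

    def __init__(self, limit=128):
        self.limit = limit  # counts are clipped to [-limit, limit]
        self.count = 0
        self.state = 1      # assumed initial state

    def step(self, positive_pulse, negative_pulse):
        """Advance one clock cycle given the two comparison pulses (0 or 1)."""
        # Assumption: simultaneous pulses cancel each other.
        self.count += positive_pulse - negative_pulse
        self.count = max(-self.limit, min(self.limit, self.count))
        # Update the state from the sign of the count; keep it at zero.
        if self.count > 0:
            self.state = 1
        elif self.count < 0:
            self.state = -1
        return self.state

rng = random.Random(0)
neuron = CounterNeuron()
for _ in range(2000):
    pos = 1 if rng.random() < 0.6 else 0  # hypothetical pulse probabilities
    neg = 1 if rng.random() < 0.4 else 0
    neuron.step(pos, neg)

# The count drifts upward and then hovers at or just below the limit.
print(neuron.count, neuron.state)
```

With a net upward drift the counter reaches the limit and the state settles at 1; swapping the two probabilities drives it to the negative limit and a state of -1.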
V. CONCLUSIONS
A new approach to the summation of stochastic weighted signals has been presented. The number of concurrent input signals is no longer a limit.
The evaluation of these signals is carried out in every cycle, leading to a fully parallel implementation. Limiting the range of the input signals allows for a very simple implementation. Simulation results validate this model for recurrent and feedforward networks.
