Abstract - A digital architecture which uses stochastic logic for simulating the behavior of Hopfield neural networks is described. This stochastic architecture provides massive parallelism (since stochastic logic is very space efficient), reprogrammability (since synaptic weights are stored in digital shift registers), large dynamic range (by using either fixed or floating-point weights), annealing (by coupling variable neuron gains with noise from stochastic arithmetic), high execution speed ($\approx N \times 10^8$ connections per second), expandability (by cascading of multiple chips to host large networks), and practicality (by building with very conservative MOS device technologies). Results of simulations are given which show the stochastic architecture gives results similar to those found using standard analog neural networks or simulated annealing.
I. INTRODUCTION
Artificial neural networks are a family of massively parallel architectures that solve difficult problems via the cooperation of highly interconnected but simple computing elements. The speed and solution quality obtained when using neural networks for solving specific problems in visual perception [2] and dynamic control [5] make specialized neural network implementations attractive. For instance, the Hopfield network of Fig. 1 can be used as an associative memory [11] or for solving various combinatorial problems [9] by the programming of synaptic weights stored as a conductance matrix.
Analog implementations of the Hopfield network containing up to 512 neurons have been built with matrices of fixed resistors and nonlinear amplifiers fabricated on a single chip [8]. Variable resistors are needed in order to change the problem constraints, but the increased complexity of such interconnections reduces by an order of magnitude the number of neurons that can be built on a chip [14], [18], [16]. Real-world applications will require many more neurons than this, so finding a method of interconnecting these chips to form larger networks will be a primary concern. This task is made difficult by the large number of analog signals which must pass between chips and by the external parasitic capacitances which will distort the charging characteristics of the network and possibly cause erroneous results.
The limitations of analog computing have led researchers of neural networks to rely upon digital simulation. The Hopfield network minimizes objective functions of the form

$$E = -\frac{1}{2} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} G_{ij} v_i v_j - \sum_{j=0}^{N-1} I_j v_j$$

and will converge to a stable solution if $dE/dt < 0$. This can be rewritten as

$$\frac{dE}{dt} = \sum_{j=0}^{N-1} \frac{\partial E}{\partial v_j} \frac{dv_j}{du_j} \frac{du_j}{dt} < 0$$

which can be guaranteed if $dv_j/du_j > 0$ and $\partial E/\partial v_j = -du_j/dt$. The first condition is satisfied by using any strictly increasing transfer function relating $u_j$ to $v_j$. The second condition is satisfied (for a symmetric conductance matrix, $G_{ij} = G_{ji}$) by making

$$\frac{du_j}{dt} = \sum_{i=0}^{N-1} G_{ij} v_i + I_j \qquad (1)$$

which resembles the Hopfield network charging equation minus the capacitive decay current. The current source term, $I_j$, can also be absorbed into the summation by adding a constant bias neuron (i.e., $v_B = 1$) that charges each neuron $j$ through a conductance of $G_{Bj} = I_j$.

Single-chip digital signal processors (DSP's) excel at inner product computations such as those in (1) and have been used to build neural network architectures [17]. Since most DSP's contain only one hardware multiplier, a large and expensive system employing hundreds of DSP chips would be needed in order to utilize all the potential parallelism available in a neural network.

Stochastic systems [6] use binary signals which randomly assume either the value 0 or 1. The average of a stochastic signal can be viewed as an analog value in the range [0, 1] and can be changed by altering the probability distribution function (PDF) of the signal. Neural networks built from stochastic logic elements [7], [15] avoid the problems of the preceding implementations and have a number of advantages.

Manuscript received June 20, 1988; revised October 12, 1988. This paper was recommended by Guest Editors R. W. Newcomb and N. El-Leithy.

During time slice $n$ ($0 \le n < \infty$), the output of amplifier $i = n \bmod N$ is used to send charge to all the capacitors. The charging equation for $u_j$ in this altered network is

$$u_j(n\,\delta t + \delta t) = u_j(n\,\delta t) + G_{ij}\, v_i(n\,\delta t)\, \delta t. \qquad (2)$$
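The relationship between the parallel charging rule (1) and the time-multiplexed rule (2) can be checked numerically. The following is a minimal sketch, not the authors' simulator: the function names, the logistic transfer function, the random symmetric conductance matrix, and the step size are all illustrative assumptions. It compares one sweep of $N$ time slices of (2) against a single Euler step of (1), and it also tracks the energy $E$ while integrating (1).

```python
import math
import random


def transfer(u):
    """Strictly increasing transfer function (logistic; an assumed choice)."""
    return 1.0 / (1.0 + math.exp(-u))


def energy(G, I, v):
    """E = -1/2 * sum_ij G[i][j] v_i v_j - sum_j I[j] v_j."""
    n = len(v)
    quad = sum(G[i][j] * v[i] * v[j] for i in range(n) for j in range(n))
    return -0.5 * quad - sum(I[j] * v[j] for j in range(n))


def euler_step(G, I, u, dt):
    """One parallel Euler step of (1): du_j/dt = sum_i G[i][j] v_i + I[j]."""
    n = len(u)
    v = [transfer(x) for x in u]
    return [u[j] + dt * (sum(G[i][j] * v[i] for i in range(n)) + I[j])
            for j in range(n)]


def multiplexed_sweep(G, u, dt):
    """N time slices of (2): in slice n only neuron i = n mod N sends charge."""
    n = len(u)
    u = list(u)
    for step in range(n):
        i = step % n
        v_i = transfer(u[i])
        for j in range(n):
            u[j] += G[i][j] * v_i * dt
    return u


# Illustrative network: random symmetric conductances, zero diagonal, no bias.
rng = random.Random(0)
N = 6
G = [[0.0] * N for _ in range(N)]
for i in range(N):
    for j in range(i + 1, N):
        G[i][j] = G[j][i] = rng.uniform(-1.0, 1.0)
I = [0.0] * N
u0 = [rng.uniform(-0.5, 0.5) for _ in range(N)]
dt = 1e-3

# Largest difference between one multiplexed sweep and one parallel step.
sweep_gap = max(abs(a - b) for a, b in
                zip(multiplexed_sweep(G, u0, dt), euler_step(G, I, u0, dt)))

# Energy along a trajectory of (1).
u = list(u0)
energies = [energy(G, I, [transfer(x) for x in u])]
for _ in range(1000):
    u = euler_step(G, I, u, dt)
    energies.append(energy(G, I, [transfer(x) for x in u]))
```

In this sketch the two update rules differ only because the neuron outputs drift slightly during a sweep, so `sweep_gap` shrinks as the step size is reduced, and the recorded energies do not increase from step to step.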
Expression (2) behaves similarly to (1) as long as the time slices are much smaller than the main integration period (i.e., $\delta t \ll \Delta t$) so that the capacitor voltages do not change too much during a time slice. No deleterious effects caused by this time-multiplexing have been noted in any of our experiments.

Fig. 3 shows the translation of the above idea into a small, all-digital, stochastic neural architecture containing four neurons ($N = 4$). On the right-hand side of the figure are four counters which are connected as a circular shift register. The counters contain the neural state vector arranged such that the $j$th counter initially contains the $j$th component of the state vector, $u_j$. The 16 synaptic conductance coefficients are stored in four circular shift registers such that the $i$th cell of the $j$th circular shift register initially contains $G_{i,(i+j) \bmod N}$.

Once initialized, the contents of counters $C_0, C_1, \ldots, C_{N-1}$ are continuously rotated such that counter $C_j$ contains $u_{(n+j) \bmod N}$ at clock cycle $n$. Simultaneously, the synaptic shift registers are also continuously rotated such that the output of the $j$th shift register is $G_{n \bmod N,\,(n+j) \bmod N}$. Thus the neural firing signal derived from the output of $C_0$ (which contains $u_{n \bmod N}$) arrives at the same time as the weights through which it influences the other neurons (Fig. 4).
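The index bookkeeping of the rotating counters and shift registers is easy to get wrong, so it may help to trace it in software. The sketch below is an illustrative model only: it stores index labels instead of numeric values, and the `rotate` helper and the direction of rotation are assumptions chosen to match the statement that counter $C_j$ holds $u_{(n+j) \bmod N}$ at clock cycle $n$.

```python
N = 4

# shift_regs[j][i] is the label (source, destination) of the weight initially
# stored in cell i of shift register j, i.e. G_{i,(i+j) mod N}.
shift_regs = [[(i, (i + j) % N) for i in range(N)] for j in range(N)]

# counters[j] is the index of the neuron whose state starts in counter C_j.
counters = list(range(N))


def rotate(seq):
    """Circular shift by one place: position 1 moves to position 0, etc."""
    return seq[1:] + seq[:1]


alignment_ok = True
for n in range(3 * N):                      # trace a few full revolutions
    source = counters[0]                    # neuron firing during cycle n
    if source != n % N:
        alignment_ok = False
    for j in range(N):
        src, dst = shift_regs[j][0]         # weight at the output of register j
        # The weight must lead from the firing neuron to the neuron whose
        # state currently sits in counter C_j.
        if src != source or dst != counters[j] or dst != (n + j) % N:
            alignment_ok = False
    counters = rotate(counters)
    shift_regs = [rotate(reg) for reg in shift_regs]
```

With this layout, `alignment_ok` stays true: at every cycle the weight leaving register $j$ connects the firing neuron $n \bmod N$ to the neuron held in $C_j$.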
Stochastic arithmetic is now used to compute the component of charge to be added to each simulated neural capacitor. During each clock cycle, the output of counter $C_0$, $u_i$, is compared with a random number, $R_v$, uniformly distributed over $[R_{\min}, R_{\max}]$. The output of the comparator will pulse if $R_v < u_i$, thus creating a stochastic neural firing signal $v_i$ whose mean is proportional to $u_i$ provided $u_i \in [R_{\min}, R_{\max}]$. If $u_i$ is outside this range, the mean will saturate at either 0 or 1, providing the needed nonlinearity in the network amplifiers. Similarly, the outputs of the $N$ synaptic shift registers are compared to another uniformly distributed random number, $R_G$, to provide $N$ stochastic signals with means proportional to the corresponding synaptic weights. ANDing the neural firing signal with each synaptic weight signal creates a third set of stochastic signals that pulse with probabilities of $|G_{ij} v_i|$. A pulse from an AND gate will cause the connected counter to either increment or decrement $u_j$ depending upon the sign of $G_{ij}$. So the charging equation for this network is

$$u_j(n+1) = u_j(n) + G_{ij}\, v_i(n) \quad \text{with } i = n \bmod N$$

which is the same as (2).

A fully parallel neural network of $N$ neurons can process all $N^2$ connections in a single cycle, while the pipeline-ring architecture described above requires $N$ clock cycles due to its time-multiplexed nature. The advantage, however, is that each neuron is connected to the rest of the system by only the following signals: two buses for inputting and outputting counter values, a bus for receiving random number $R_G$, and a single wire which carries the neural firing signal. If the synaptic weight shift registers are built to hold $kN_c$ coefficients, where $N_c$ is the number of artificial neurons per chip and $k$ is an integer, then it becomes possible to cascade $k$ chips and construct much larger networks without an increase in the number of I/O pins (Fig. 5).
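The comparator, AND gate, and up/down counter described above can be imitated with a short Monte Carlo sketch. This is a simplified model under stated assumptions, not the hardware design: values are normalized to real numbers instead of the integer counter contents, both random numbers are drawn independently from Python's `random` module, the synaptic random number is taken as uniform on [0, 1), and the function names are hypothetical.

```python
import random


def firing_probability(u, r_min, r_max):
    """Mean of the comparator output (R_v < u) for R_v uniform on
    [r_min, r_max): a linear ramp that saturates at 0 and 1."""
    return min(1.0, max(0.0, (u - r_min) / (r_max - r_min)))


def simulate_increment(u_i, g_ij, cycles, rng, r_min=0.0, r_max=1.0):
    """Average signed counter change per clock cycle for one synapse."""
    count = 0
    for _ in range(cycles):
        fires = rng.uniform(r_min, r_max) < u_i     # stochastic firing signal
        weight_pulse = rng.random() < abs(g_ij)     # stochastic weight signal
        if fires and weight_pulse:                  # AND gate
            count += 1 if g_ij > 0 else -1          # sign selects up or down
    return count / cycles


rng = random.Random(42)
u_i, g_ij = 0.6, -0.5
measured = simulate_increment(u_i, g_ij, 100_000, rng)
predicted = g_ij * firing_probability(u_i, 0.0, 1.0)
```

Under these assumptions `measured` approaches `predicted` as the number of cycles grows, which is the average behaviour that the charging equation expresses. Narrowing `[r_min, r_max)` steepens the ramp in `firing_probability`.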
The use of random numbers also confers significant advantages. For example, the neural firing comparator pulses only when $R_v < u$ and thus has a mean output voltage of

$$\bar{v} = \Pr(R_v < u)$$

which is the definition of the cumulative distribution function (CDF) of $R_v$. Therefore, the neural transfer function can be altered by adjusting the PDF of $R_v$ (Fig. 6). While a uniform PDF gives a linear transfer function with hard limits, a differently shaped PDF gives a correspondingly different transfer function. Also, tightening the interval over which $R_v$ ranges effectively increases the gain of the amplifier. If the capacitive decay current is also simulated, this permits the stochastic architecture to do annealing since increasing the gain acts in a manner analogous to lowering the temperature in the simulated annealing process [9].

The PDF for $R_G$ can also be altered to more efficiently encode a set of synaptic weights. Assuming the stochastic signal generated by the largest synaptic weight, $g_M$, should pulse with a probability of 1, then a weight $g_k$ should generate pulses with a probability of $g_k/g_M$. The following procedure constructs a discrete PDF for $R_G$ based on the differences between these probabilities:

1) Arrange the absolute values of all the synaptic weights plus a weight of zero into ascending order and eliminate all duplicate entries to create a list $G^* = \{0, g_1, \ldots, g_M\}$.
2) Encode each of the synaptic weights by its position within the sorted list plus an additional sign bit.
3) Create a discrete PDF for $R_G$ where

$$\Pr(R_G = k) = \begin{cases} (g_{k+1} - g_k)/g_M, & 0 \le k < M \\ 0, & \text{otherwise} \end{cases}$$

(with $g_0 = 0$; the index range shown assumes that the comparator pulses when $R_G$ is less than the weight's code, as it does for $R_v$).
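The three-step procedure can be sketched in a few lines. The version below is illustrative only: the function names are hypothetical, exact rational arithmetic is used for clarity, and it assumes that the comparator pulses when $R_G$ is strictly less than the weight's code, so that the probability assigned to value $k$ is the difference between adjacent sorted magnitudes divided by the largest magnitude.

```python
from fractions import Fraction


def encode_weights(weights):
    """Apply steps 1-3 to a list of weights (at least one must be nonzero).

    Returns the sorted magnitude list, the (position, sign bit) code of each
    weight, and the discrete PDF of R_G as a list indexed by value.
    """
    # Step 1: sorted, duplicate-free magnitudes including zero.
    levels = sorted({abs(w) for w in weights} | {0})
    # Step 2: code = position in the sorted list, plus a sign bit.
    position = {g: k for k, g in enumerate(levels)}
    codes = [(position[abs(w)], w < 0) for w in weights]
    # Step 3: probability of value k is the gap between adjacent levels,
    # normalized by the largest magnitude.
    g_max = Fraction(levels[-1])
    pdf = [Fraction(levels[k + 1] - levels[k]) / g_max
           for k in range(len(levels) - 1)]
    return levels, codes, pdf


def pulse_probability(code, pdf):
    """Pr(R_G < code): chance that the comparator pulses for this code."""
    return sum(pdf[:code], Fraction(0))


weights = [3, -1, 0, 2, -3, 8]
levels, codes, pdf = encode_weights(weights)
pulse_probs = [pulse_probability(code, pdf) for code, _sign in codes]
```

For this example the largest magnitude is 8, so each weight's pulse probability comes out as its magnitude divided by 8, even though the codes themselves are only small list positions.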
The use of this algorithm to encode floating-point synaptic weights with 2-bit mantissas and exponents is illustrated in Fig. 7. Note that the interpretation of the encoding is a function of the PDF of $R_G$ alone: the circuitry need not be changed in order to support floating-point weights!

III. SIMULATION RESULTS
Fig. 8 compares the evolution of the neural state vector in the stochastic network to that of an analog Hopfield network when both networks are programmed to act as analog-to-digital converters [10] (see also Table I). The stochastic architecture offers a good compromise between the high speed of dedicated analog VLSI networks and the flexibility of a general-purpose computer.
As a second test, the stochastic architecture was used to divide a graph containing $N$ nodes into two subgraphs containing $\approx N/2$ nodes while cutting as few edges as possible. Solving such a bipartitioning problem [12] involves minimizing an objective function whose variables are defined so that $v_i = 0$ if node $i$ is in the first subgraph and $v_i = 1$ otherwise. Individual nodes $i$ and $j$ are attracted into the same subgraph by edges of strength $E_{ij}$ while clustering is discouraged by the amorphous repulsive force, $r$. One hundred bipartitionings of an arbitrary graph containing 84 nodes and 115 unit-weight edges were done using both the stochastic architecture and simulated annealing [3]. Solutions found using simulated annealing had an average of 3.07 cut edges, while solutions generated by the stochastic architecture had an average of 5.85 cut edges (a randomly generated solution would contain 57.5 cut edges, on average). However, simulated annealing required an average of 329 seconds to converge to a solution versus just $4.2 \times 10^{-3}$ s for the stochastic architecture. Obviously, a dedicated analog VLSI network would be even faster (just as it was in the previous example), but no hard data is available.

A comparison of the stochastic architecture to other neural network implementations [13] is given in Fig. 9 in terms of the storage capacity and processing speed (in connections/second). In general, a stochastic architecture of $N$ neurons will have storage for $N^2$ synaptic weights and will process connections at a rate of $N \times f_c$, where $f_c$ is the system clock speed. Even at a modest 10 MHz, the stochastic architecture outperforms the other implementations.
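The random baseline of 57.5 cut edges follows from the fact that, when each node is assigned to a subgraph by an independent fair coin flip, every edge is cut with probability 1/2, so 115 edges give 115/2 cut edges on average. The sketch below checks this by sampling. The graph used in the experiments is not given in the text, so `random_graph` builds a hypothetical stand-in with the same node and edge counts; the function names are likewise illustrative.

```python
import random


def cut_size(edges, side):
    """Number of edges whose endpoints lie in different subgraphs."""
    return sum(1 for a, b in edges if side[a] != side[b])


def random_graph(num_nodes, num_edges, rng):
    """Hypothetical stand-in graph: distinct unit-weight edges, no loops."""
    edges = set()
    while len(edges) < num_edges:
        a, b = rng.sample(range(num_nodes), 2)
        edges.add((min(a, b), max(a, b)))
    return sorted(edges)


rng = random.Random(1)
num_nodes, num_edges = 84, 115
edges = random_graph(num_nodes, num_edges, rng)

# Average cut size over many independent coin-flip assignments.
trials = 5000
total = 0
for _ in range(trials):
    side = [rng.randint(0, 1) for _ in range(num_nodes)]
    total += cut_size(edges, side)
mean_cut = total / trials

expected_cut = num_edges / 2
```

The coin-flip model does not force the two subgraphs to contain exactly $N/2$ nodes each; it is used here because it reproduces the quoted figure of 57.5 for any graph with 115 edges.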
IV. IMPLEMENTATION ISSUES
The stochastic system shown in Figs. 3 and 5 is plagued by signal propagation delays (when transmitting $v$ and $R_G$ to each artificial neuron) and computational delays (caused by the cascaded comparison, logical AND, and increment/decrement operations). The $R_G$ propagation delay can be eliminated once it is realized that each neuron does not have to receive the same random number, only one which has the same characteristics, i.e., the same PDF. Thus multiple random number generators with the same PDF could be placed on the circuit board to minimize the wiring length to each chip. Internal to each chip, interspersed pipeline registers along the $R_G$ bus (the four registers shown in Fig. 10) increase the system speed by reducing the wire length which must be driven during a clock cycle while still allowing each neuron to receive random numbers with the same probability distributions. Pipeline delays can also be introduced on the neural firing signal wire, but the counting and shifting operations of the neural state registers must be separated in order to maintain correct operation. (This exacts a very small penalty since the chip area is dominated by the shift registers which store the synaptic weights.) During operation, the neural state vector stored in the counters is transferred into the shift registers and is then shifted out to create the neural firing signal. The result of each neuron firing passes through the delay line and updates each neural state counter. Once the shift register has been emptied, the new neural state is transferred into the shift register and the process begins again.

Fig. 9. A comparison of various neural network implementations (storage axis in synaptic weights).
The computational delay can be significantly reduced by using bit-level pipelining in the comparator and increment/decrement circuitry. By skewing the storage of the sign bit and $p$ magnitude bits of each weight $G_{ij} = s\,g_{p-1} \cdots g_1 g_0$, the comparison with $R_G$ can be done by a series of single-bit comparators. Single-bit registers exist between each comparator stage to store the intermediate borrow bits between cycles, which permits the processing of $p$ comparisons simultaneously. The final borrow output is ANDed with the firing signal from neuron $i$ and the result is used along with the sign bit to control a bit-level pipelined counter. (Additional logic prevents the counter from overflowing or underflowing during the charge integration process.) The computational delay is now determined solely by the time required to do 1-bit arithmetic. The combination of the changes described above with a modern silicon process should allow system clock speeds of 50-100 MHz.

V. CONCLUSIONS AND FURTHER WORK
A neural network implementation based upon stochastic arithmetic has been described. This stochastic architecture is dynamically reprogrammable, is easily expanded using multiple chips, and uses a constant number of I/O pins regardless of the size of the neural network being simulated. Simulations show that the solutions produced by the stochastic neural net do not suffer any ill effects due to its probabilistic, time-multiplexed nature, yet its speed exceeds those of a wide variety of other neural network implementations by 2-6 orders of magnitude.

A prototype of the stochastic architecture which supports 100 neurons has been designed and is being fabricated. A more advanced version is being designed which accesses synaptic weights stored in external RAM's and is capable of learning by dynamically adjusting these weights.
