Bit-serial neuroprocessor architecture by Tawel, Raoul
(12) United States Patent (10) Patent No.: US 6,199,057 B1




Inventor: Raoul Tawel, Glendale, CA (US)
Assignee: California Institute of Technology, Pasadena, CA (US)
Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.
Appl. No.: 08/956,890 
Filed: Oct. 23, 1997 
Related U.S. Application Data
Provisional application No. 60/029,593, filed on Oct. 23, 1996.
Int. Cl.7 ...................................................... G06F 15/18
U.S. Cl. .................................. 706/30; 706/41; 706/27
Field of Search .................................. 706/26, 42, 35, 706/30, 41, 27
References Cited 
OTHER PUBLICATIONS 
Diaz et al., “A Full-Custom BitSerial Multiplier for Neural Network Algorithms”, IEEE Proceedings of the 7th Mediterranean Electrotechnical Conference, Apr. 1994.*
Johansson, H.O.; Larsson, P.; Larsson-Edefors, P.; Svensson, C.; “A 200-MHz CMOS bit-serial neural network”, ASIC Conference and Exhibit, 1994. Proceedings., Seventh Annual IEEE International, 1994, pp. 312-315, Apr. 1994.*
Han, Gunhee and Sanchez-Sinencio, Edgar; “A General Purpose Discrete-Time Multiplexing Neuron-Array Architecture”; IEEE, 1995; pp. 1320-1323.*
* cited by examiner 
Primary Examiner: George B. Davis
(74) Attorney, Agent, or Firm: Brooks & Kushman P.C.
U.S. PATENT DOCUMENTS
5,093,792 3/1992 Taki et al. .............................. 701/99
5,148,385 * 9/1992 Frazier ................................. 708/426
5,175,858 12/1992 Hammerstrom ....................... 712/22
5,495,415 2/1996 Ribbens et al. ...................... 701/111
5,781,700 * 7/1998 Puskorius et al. ..................... 706/14
5,956,703 * 9/1999 Turner et al. .......................... 706/27
(57) ABSTRACT
A neuroprocessor architecture employs a combination of bit-serial and serial-parallel techniques for implementing the neurons of the neuroprocessor. The neuroprocessor architecture includes a neural module containing a pool of neurons, a global controller, a sigmoid activation ROM look-up-table, a plurality of neuron state registers, and a synaptic weight RAM. The neuroprocessor reduces the number of neurons required to perform the task by time multiplexing groups of neurons from a fixed pool of neurons to achieve the successive hidden layers of a recurrent network topology.
15 Claims, 5 Drawing Sheets
U.S. Patent    Mar. 6, 2001    Sheets 1 to 5 of 5    US 6,199,057 B1
[Drawing sheets 1 to 5, containing FIGS. 1 to 7, are not reproduced in this text.]
This application claims the benefit of U.S. provisional application under 37 CFR 1.53(b)(2), Ser. No. 60/029,593, filed Oct. 23, 1996.
ORIGIN OF INVENTION
The invention described herein was made in the performance of work under a NASA contract, and is subject to the provisions of Public Law 96-517 (35 U.S.C. 202) in which the Contractor has elected to retain title.
TECHNICAL FIELD
This invention relates generally to processor architecture and more particularly to a bit-serial based recurrent neuroprocessor architecture.
BACKGROUND ART
Recently, considerable progress has been achieved in the use of neural network methodologies for both diagnostic and control applications of nonlinear dynamical systems. This progress is due in part to the use of context sensitive neural network architectures (as in recurrent networks) and in part to improved training methodologies (as with multistream training techniques). The bulk of previous efforts used static or feedforward networks, which were plagued by slow adaptation and large error rates. Architecturally, recurrent neural networks are simple extensions of feedforward networks where the network's neuron node outputs are no longer a function only of their current inputs, but also of the recent time history of inputs via time-lagged connections.
In the automotive sector to date, this recurrent neural network formalism has been successfully applied and reported in the literature for several engine subsystems. These include the idle speed problem and the antilock brake problem (control problems) and the misfire detection problem (a diagnostic problem). In each case, the recurrent neuromorphic methodologies developed were trained to detect, identify and/or control improper events in an operating internal combustion engine. In order to utilize information from sensors now in production use, the diagnostic and control operations are based upon the temporal analysis of existing sensor outputs or dynamics. It has been demonstrated that the diagnostic and control tasks can be accomplished by the use of trainable classifiers of suitable capacity. These trainable classifiers, however, are based upon systems which require considerable computational resources and as such require dedicated hardware implementations in order to meet the real-time on-board computational requirements. While there exist a number of commercially available neural hardware implementations, none meet the specific design requirements needed for large scale commercial deployment in the automotive sector.
SUMMARY OF THE INVENTION
In accordance with the present invention a cost-effective hardware realization of an application specific integrated circuit (ASIC) neuroprocessor is provided that will enable the execution of on-board diagnostic and control tasks in real-time in production vehicles. The neuroprocessor architecture and hardware is sufficiently flexible to be able to perform the misfire diagnostic task and still have the capability of performing other diagnostics or control functions in automotive systems (e.g., idle speed control, air/fuel ratio control, etc.). This flexibility is achieved through the combined use of bit-serial design techniques, parallel hardware architecture, high speed design, and time-multiplexing the hardware building blocks to achieve maximum computational performance for the required task.
More specifically, the neuroprocessor architecture comprises a neural module containing a pool of neurons, a global controller, a sigmoid activation Read Only Memory (ROM) look-up-table, a plurality of neuron state Random Access Memory (RAM) registers, and a synaptic weight RAM. The neuroprocessor achieves its compactness by employing a combination of bit-serial and serial-parallel techniques in the implementation of the neurons and minimizing the number of neurons required to perform the task by time multiplexing techniques, where groups of neurons from a fixed pool of neurons are used to configure the successive hidden layers of the recurrent network topology. Sufficient neuron resources are provided to address the most challenging diagnostic and control applications. In fact, for the most demanding of the neural network vehicular applications, the misfire detection problem, a candidate pool of sixteen neurons was deemed to be sufficient. By time multiplexing, the sixteen neurons can be re-utilized on successive layers. This time-multiplexing of layers radically streamlines the architecture by increasing hardware utilization through reuse of resources.
BRIEF DESCRIPTION OF THE DRAWINGS
A more complete understanding of the present invention may be had from the following detailed description which should be read in conjunction with the drawings in which:
FIG. 1 shows the topology of a recurrent neural network;
FIG. 2 is a block diagram of the neuroprocessor of the present invention;
FIG. 3 is a block diagram of the global controller of the neuroprocessor;
FIG. 4 is a diagram of the run-time controller of the global controller;
FIG. 5 is a block diagram of the bit-serial architecture of a neuron used in the recurrent neural network;
FIG. 6 is a schematic diagram of the multiplier of FIG. 5; and
FIG. 7 is a schematic diagram of the accumulator of FIG. 5.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENT
A schematic representation of recurrent network structure in accordance with the present invention is shown in FIG. 1. The basic element in neurocomputation is the neuron, which is a simple processing element. Neurons can be interconnected in various topologies by means of synaptic weights. Typically, neurons are organized into computational layers, i.e. slabs. Though arbitrarily complex network architectures can be constructed with these slabs (architectures with multiple levels of hierarchy and interconnectivity), practical applications intimately link the neural network structure to its proposed functional use. The simplest form of network, the multilayer perceptron, is one having an input layer of source nodes, any number of intermediate hidden layers, and a final layer of output neurons. The main goal for the hidden layer neurons is to enable the network to extract higher-order statistics from the
data set. The output signals of the neurons in the final layer of the network together constitute the overall response to the activation pattern supplied to the source nodes on the input layer.
If the neurons are indexed by the subscript j, then the total input, x_j, to neuron j is a linear function of the outputs, y_i, of all the neurons that are connected to j and of the weights w_{ij} on these connections, i.e.
\[ x_j = \sum_i y_i w_{ij} \tag{1} \]
Neurons can be provided with additional stimuli in the form of a bias by introducing an extra input to each unit which has a value of 1. The weights on this extra unit are called the bias weights and are equivalent to a threshold. Neurons have real-valued outputs, y_j, which are a nonlinear function of their inputs. The exact form of this equation can vary depending on the application at hand. The activation function used in this VLSI architecture, the bipolar sigmoid, is given in equation (2).
\[ y_j = -1 + \frac{2}{1 + e^{-x_j}} \tag{2} \]
The neuroprocessor architecture of the present invention includes a neuronal module comprising a plurality of neurons, each of which performs the neuronal multiply and accumulate operation. The neurons receive as inputs the synaptic weights and activations from input nodes, or from neurons on a previous layer, in a bit serial-parallel fashion. The neurons output the accumulated sum of partial products as given by equation (1). Because of the computational nature of neural networks, where information is sequentially computed a layer at a time, only as many neurons need be physically implemented in actual silicon as are required by the largest layer. In other words, if we denote the number of neurons in layer i of application j by n_{ij}, then the number of neurons implemented in silicon is given by n_s, where
\[ n_s = \max_{i,j} (n_{ij}) \tag{3} \]
In the recurrent neural network applications of internal combustion engine misfire detection, or idle speed control, a candidate pool of sixteen silicon neurons is sufficient. By making use of a time multiplexing of layers approach to neurocomputation, the sixteen neurons can be re-utilized on successive layers. This time-multiplexing of layers radically streamlines the architecture by significantly increasing hardware utilization through reuse of available resources.
The time multiplexing, or sequential processing, issue becomes clearer once the flow of information in the neural network upon initiation of a computation is understood. Consider the network topology for the engine misfire detection problem, shown in FIG. 1. If input sensory data are presented to the neural network's four inputs at time t=0, the only active computations being performed in the network are strictly limited to those neurons receiving stimuli from the input layer neurons, i.e., neurons lying uniquely in the first hidden layer 10. All other neurons remain totally inactive. If the computation time of the neuron is defined by T, then at time t=T, all neurons in the first hidden layer will have computed their activations. Neurons in the first hidden layer can now play a passive role and simply broadcast their activations to neurons in the next layer, i.e., the second hidden layer 12, in a similar fashion. At this time, the only active neurons are those in the second hidden layer. The computation proceeds a layer of neurons at a time until all output neuron activations are finally computed. Thus, computations in a neural network are strictly performed a layer at a time, sequentially progressing through the hierarchy of layers that compose the network architecture. In the example of FIG. 1, the assigned neuron resources (15 of 16) for the first hidden layer at time T0 are indicated at 14, and the assigned neuron resources (7 of 16) for the second hidden layer at time T1 are indicated at 16.
Referring now to FIGS. 2 and 3, a block diagram representation of the single chip stand-alone neuroprocessor of the present invention, generally designated 20, is shown. The chip was designed with the goal of minimizing the size of the neuroprocessor while maintaining the computational accuracy required for automotive diagnostic and control applications. The neuroprocessor architecture comprises a neural module 22, a global controller 24, a sigmoid activation ROM look-up-table 26, neuron state RAM registers 28, and synaptic weight RAM registers 30.
The neural module 22 performs the neuronal multiply and accumulate operations. Each neuron receives, as input, the synaptic weights and activations from input nodes (or from neurons on a previous layer) in a bit-serial and bit-parallel fashion respectively, and outputs the accumulated sum of partial products as given by equation (1). The global controller 24 enables the neurochip to execute its required task of generating necessary control logic as well as orchestrating data movement in the chip. When there are no computations being performed, the global controller remains in an idle state. When a RUN command is issued, the global controller is in charge of providing control signals to the on-chip neurons, the RAM, and the ROM in order to proceed with the desired neurocomputation. Input activations are read out of the neuron state register RAM 28, synaptic weights are read out of the synaptic weight RAM 30, and both are propagated to the bank of neurons 22. The global controller keeps track of intra-layer operations as well as global inter-layer operations. Upon completion of a forward pass through the network architecture, the controller returns to the idle state. A bias term from the bias block 32 is propagated on the data bus to the neurons, in much the same way as the neuron inputs from the neuron storage RAM 28.
With reference to FIG. 3, the global controller 24 is made up of a configuration controller 34, and a run-time controller 36. Configuration of the hardware is performed by the configuration controller and requires the loading of five 16-bit registers that together explicitly define the topology of the current neural network application. The configuration controller 34 accepts as input 16-bit data on bus D, a 3-bit address on bus A, a configuration control signal CFG, a clock C, and a global reset signal R. All signals feeding into the configuration controller are externally accessible. The 3-bit address bus internally selects one of five 16-bit configuration registers as the destination of the 16-bit data source D. By strobing the CFG control line, data can be synchronously loaded into any of the five architecture registers RA-RE. From an implementation perspective, the first four registers, registers RA-RD, uniquely define the topology of each layer in the neural network architecture. Thus, in this embodiment of the architecture there can be at most 4 layers in any recurrent neural network application, i.e., an input layer, an output layer, and two hidden layers. The 16-bit registers RA through RD each contain layer specific bit-fields (such as the number of neurons in the current layer and the number of recurrent connections within the layer) that collectively define the neural topology. Register RE
defines the number of non-input layers in the neural network topology and, since the number of layers is restricted to 4, only the lowest 2 bits are of significance. Once the five configuration registers are loaded, a unique network topology is defined, and the global controller can proceed to the run-time mode.
Once the configuration registers are loaded, control is passed to the run-time controller 36. At this stage, 2's complement binary coded data representing the input quantities that need to be processed by the neural network are loaded into the neuron state RAM module 28 at appropriate memory locations. The controller 36 remains in the idle mode for as long as the RUN line remains low. The low to high transition on the RUN line immediately resets the active-low BUSY flag and initiates the execution of a single forward pass of the control hierarchy using the registers RA through RE as a template that together define the network's topology. The BUSY flag remains low until the neural network has completed the neurocomputation. It subsequently returns high after the contents of the output layer of the neural network have been placed back into appropriate memory locations of the neuron register RAM module 28. Once the BUSY flag goes high, the contents of the neuron RAM module are made available to the external world, and can be retrieved by the appropriate toggling of the RAM control lines. In this fashion, the output of the network can be read out and fresh inputs can be loaded into the hardware. The neuron RAM module 28 is a single port RAM, so once the neural network begins computations, the RAM module is inaccessible.
The run-time global controller 36 is shown in greater detail in FIG. 4. It is made up of four distinct logic blocks including a current layer register selector 40; a finite state machine 42 in charge of sequencing high-level inter-layer operations; an intra-layer propagation controller 44; and an intra-layer specific neuron output data storage controller 46. When the RUN command is issued to the run-time controller 36, state machine 42 begins execution by clearing the BUSY flag, the current layer register selector 40, the propagation controller 44, and the storage controller 46. The current layer register selector 40 has access to all four configuration registers, RA through RD. Upon reset, selector 40 points to the RA register (which defines the input layer topology) and thereby propagates its contents to the propagation and storage controllers, 44 and 46 respectively. The state machine 42 then passes control to the propagation controller 44 by toggling the RUN pin on controller 44 and goes into an idle mode. The role of the propagation controller 44 is to oversee the execution of the neuronal multiply and accumulates. This is achieved by providing the necessary control logic and precise synchronization of data flow out of both the neuron RAM 28 and the synapse RAM 30 into the bit-serial neurons 22. The propagation controller 44 therefore generates (1) the addresses ANRAM[5:0] and control signals CNRAM[2:0] to the neuron RAM 28; and (2) the addresses AWRAM[5:0] and control signals CWRAM to the synaptic weight RAM 30. The propagation controller 44 also generates control signals on the control lines CN[3:0] to the neuron block 22. These control signals include commands to clear the neuron multipliers and accumulators. The OEBIAS signal allows the propagation of a bias term to the neurons 22. The bias term is propagated on the data bus to the neurons 22 in much the same way as the neuron inputs from the neuron storage RAM 28. When the bias term is invoked, the neuron RAM outputs are simply tri-stated.
Upon completion of the propagation controller task, the linear activations for all neurons in the current layer have been calculated, as given by equation (1). The state machine 42 then passes execution to the storage controller 46 by toggling its RUN pin. The responsibility of the storage controller 46 is to calculate the non-linear activations for the neurons whose linear activation was just calculated and subsequently store the resulting quantities in RAM 28. This is achieved by sequentially enabling the linear output of each neuron on that layer, allowing the signal to propagate through the sigmoid activation look-up-table (LUT) 26, and storing the result in an appropriate memory location in RAM 28. Upon completion, the storage controller 46 returns control to the state machine 42. When active, the controller 42 generates the addresses ANRAM[5:0] and control signals CNRAM[2:0] to the neuron RAM 28, sequentially enables output of the active neurons via the OEN control lines, and enables access of the output from the LUT onto the neuron data input bus. When controller 42 completes execution, a full forward pass has been completed for a single layer of the recurrent neural network architecture. The state machine 42 increments internal layer counters, and checks to see if there are any additional layers in the neural network topology that need to be calculated. If there are, the above process is repeated. If all layers have been computed and the neuron outputs stored in RAM 28, the controller sets the BUSY flag, and returns to the idle mode. When the BUSY flag is high, data can be read from all RAM memory locations, and the results of the neurocomputation can be off-loaded to the external world. This completes the execution of the neurocomputation.
Referring now to FIG. 5, each neuron in the pool of neurons 22 includes a serial-parallel multiplier 50, and a bit-serial accumulator 52. The multiplier 50 is used to perform the synaptic multiplications required by the neural network architecture. In operation, the multiplier accepts as input either (1) an input stimulus to the neural network or (2) the activation output from a neuron on a previous layer. It multiplies this quantity by the corresponding synaptic weights. The input stimulus (or activation output) is presented to the multiplier in a fully parallel fashion while the synaptic weights are presented in a bit-serial fashion.
The serial output of the multiplier feeds into the accumulator. The multiplier 50 is shown in greater detail in FIG. 6. Any size multiplier can be formed by cascading the basic multiplier cell. The bit-wise multiplication of the multiplier and multiplicand is performed by the AND gates 60a-60n. At each clock cycle, the bank of AND gates therefore computes the partial product terms of a multiplier Y and the current multiplicand X(t). Two's complement multiplication is achieved by using XOR gates 62a-62n connected with the outputs of the AND gates and providing inputs to full adders 64a-64n. By controlling one of the inputs on the XOR gate, the finite state machine 66 can form the two's complement of selected terms based on its control flow. In general, for an n×n multiplier resulting in a 2n bit product, the multiplier can be formed using 2n basic cells and will perform the multiplication in 2n+2 clock cycles. Successive operations can be pipelined and the latency of the LSB of the product is n+2 cycles. In this implementation, n=16.
The accumulator 52 is shown in greater detail in FIG. 7 and comprises a single bit-serial adder 70 linked to a chain of flip-flops generally indicated at 72. The bit-serial adder is made up of a single full adder and a flip-flop to store the carry bit. The length of the accumulator chain is controlled by the multiplication, which takes 2n+2 clock cycles to perform a complete multiplication. At each clock cycle, the accumulator sums the bit from the input data stream with both the contents of the last flip-flop on the chain as well as the carry bit, if any, generated from the last addition operation a clock cycle before. This value is subsequently stored into the first element of the chain. This creates a circulating chain of data bits in the accumulator. In operation, the adder's flip-flop is reset prior to the accumulation of a sum.
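The neuron datapath described above (serial-parallel multiplier feeding a bit-serial accumulator) can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the gate-level circuit of FIGS. 6 and 7: the names `to_bits`, `from_bits`, `serial_parallel_multiply`, `BitSerialAccumulator` and `neuron_mac` are hypothetical, the multiplier is modelled arithmetically rather than with full-adder cells, each product is assumed to be sign-extended to a 2n+2 bit serial word so that it matches the stated cycle count and chain length, and the carry flip-flop is assumed to be cleared at the start of every word.

```python
def to_bits(value, width):
    """Return `width` two's-complement bits of `value`, LSB first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Decode an LSB-first two's-complement bit list into an int."""
    raw = sum(bit << i for i, bit in enumerate(bits))
    return raw - (1 << len(bits)) if bits[-1] else raw


def serial_parallel_multiply(x, w, n):
    """Multiply parallel input x by an n-bit weight w fed one bit per clock.

    Returns the product as a serial stream of 2n+2 bits, LSB first
    (sign-extended; the stream length is an assumption of this sketch).
    """
    product = 0
    for t, w_bit in enumerate(to_bits(w, n)):
        partial = x if w_bit else 0   # AND gates: x gated by the weight bit
        if t == n - 1:                # sign bit of w carries weight -2**(n-1)
            partial = -partial        # stands in for the XOR-gate negation
        product += partial << t
    return to_bits(product, 2 * n + 2)


class BitSerialAccumulator:
    """One full adder, a carry flip-flop and a circulating flip-flop chain."""

    def __init__(self, length):
        self.chain = [0] * length     # chain[0] is the first element, chain[-1] the last
        self.carry = 0

    def clock(self, bit_in):
        # Full adder: input bit + last flip-flop on the chain + stored carry.
        total = bit_in + self.chain[-1] + self.carry
        self.carry = total >> 1
        # The sum bit is stored in the first element; the chain shifts by one.
        self.chain = [total & 1] + self.chain[:-1]

    def add_word(self, bits):
        self.carry = 0                # modelling assumption: carry cleared per word
        for bit in bits:
            self.clock(bit)

    def value(self):
        # After a whole word, the last flip-flop holds the LSB of the sum.
        return from_bits(self.chain[::-1])


def neuron_mac(inputs, weights, n):
    """Accumulate sum(y_i * w_ij) for one neuron, as in equation (1)."""
    acc = BitSerialAccumulator(2 * n + 2)
    for x, w in zip(inputs, weights):
        acc.add_word(serial_parallel_multiply(x, w, n))
    return acc.value()
```

For example, `neuron_mac([3, -5, 7], [2, 4, -6], n=8)` accumulates 6 - 20 - 42 and returns -56. The demonstration uses n=8 only to keep the numbers small; the described implementation uses n=16.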
The chip architecture of the present invention may be 
used to compute transfer functions not ordinarily associated 
with neural networks, such as FIR and IIR filters. Thus, such 
a chip can also perform multiple functions which have been 
realized by conventional signal processing algorithms in 
real-time systems. 
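As an illustration of why such transfer functions fit the architecture, an FIR filter is a weighted sum of time-lagged inputs, which has the same form as the multiply and accumulate of equation (1) with the filter coefficients taking the place of synaptic weights. The following is a minimal sketch, assuming integer samples and coefficients; the function names `mac` and `fir_filter` are hypothetical and do not come from the patent.

```python
def mac(inputs, weights):
    """Multiply and accumulate, the same form as equation (1)."""
    return sum(y * w for y, w in zip(inputs, weights))


def fir_filter(samples, taps):
    """Apply an FIR filter by feeding a delay line into a single MAC.

    The delay line plays the role of time-lagged neuron inputs and
    `taps` plays the role of the synaptic weights.
    """
    delay_line = [0] * len(taps)
    outputs = []
    for sample in samples:
        # Newest sample enters at the front; the oldest one drops off.
        delay_line = [sample] + delay_line[:-1]
        outputs.append(mac(delay_line, taps))
    return outputs
```

For instance, `fir_filter([1, 2, 3, 4], [2, 1])` returns `[2, 5, 8, 11]`. An IIR filter would additionally feed past outputs back as inputs, which is presumably where the recurrent connections would be used.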
While the best mode for carrying out the present invention 
has been described in detail, those familiar with the art to 
which this invention relates will recognize various alterna- 
tive designs and embodiments for practicing the invention as 
defined by the following claims. 
What is claimed is: 
1. A recurrent neuroprocessor for implementing a recon- 
figurable neural network topology including a plurality of 
hidden layers containing neurons interconnected in a recur- 
rent configuration; said neuroprocessor comprising: 
a neural module including a plurality of bit-serial neurons; 
a global controller for time multiplexing groups of neu- 
rons from the neural module to form first and second 
hidden layers of the network topology, said global 
controller controlling application of an input pattern to 
the first hidden layer and controlling storage of the 
output of said first hidden layer for subsequent appli- 
cation as input to said second hidden layer. 
2. The system defined in claim 1 wherein each neuron 
comprises a bit-serial multiplier for multiplying first and 
second inputs, said global controller sequentially applying to 
said first input of said multiplier one input of said input 
pattern or the activation output of a neuron on a previous 
layer, the controller applying a synaptic weight appropriate 
for the input pattern to said second input of said multiplier, 
each neuron further comprises a bit-serial accumulator for 
accumulating the output of said multiplier. 
3. The system defined in claim 2 wherein the recurrent 
configuration includes unit time delays. 
4. A recurrent neuroprocessor for implementing a recon- 
figurable neural network topology including a plurality of 
hidden layers containing neurons interconnected in a recur- 
rent configuration; said neuroprocessor comprising: 
a neural module including a plurality of bit-serial neurons, 
each neuron comprising a bit-serial multiplier for mul- 
tiplying first and second inputs and further comprising 
a bit-serial accumulator for accumulating the output of 
said multiplier; 
a global controller for time multiplexing groups of neu- 
rons from the neural module to form first and second 
hidden layers of the network topology, said global 
controller controlling application of an input pattern to 
the first hidden layer and controlling storage of the 
output of said first hidden layer for subsequent appli- 
cation as input to said second hidden layer; 
said global controller sequentially applying to said first 
input of said multiplier one input of said input pattern 
or the activation output of a neuron on a previous layer, 
the controller applying a synaptic weight appropriate 
for the input pattern to said second input of said 
multiplier; 
said input neuron activation bits being provided to the 
multiplier in parallel and the input synaptic weight bits 
being provided to the multiplier serially, the multiplier 
computing the product of the two inputs and making 
the results available to the accumulator on a bit-serial 
basis. 
5. The system defined in claim 4 wherein said global
controller comprises a configuration controller, and a run-
time controller,
said configuration controller including a plurality of con- 
figuration registers for storing data that defines the 
topology of each layer of the recurrent neural network 
architecture, 
said run-time controller initiating a neurocomputation 
using the data in said configuration registers. 
6. The system defined in claim 5 wherein said configu- 
ration controller includes a data bus, and an address bus, and 
receives a configuration control signal, a clock signal, and a 
reset signal, the address on the address bus internally select- 
ing a configuration register as the destination of data on the 
data bus. 
7. The system defined in claim 6 wherein said neuropro- 
cessor further includes a neuron state RAM module for 
storing the contents of the output layer of the neural net- 
work. 
8. The system defined in claim 7 wherein the run-time
controller comprises a current layer register selector, a finite
state machine for sequencing high-level inter-layer
operations, an intra-layer propagation controller for controlling
execution of neuronal multiply and accumulates, and an
intra-layer specific neuron output data storage controller for
controlling calculation of non-linear activations for the
neurons according to
\[ y_j = -1 + \frac{2}{1 + e^{-x_j}} \]
and for subsequently storing the resulting quantities in the
neuron state RAM.
9. The system defined in claim 8 wherein the neuropro- 
cessor further includes a sigmoid activation look-up-table 
for performing the non-linear activation function. 
10. The system defined in claim 9 wherein each neuron
comprises a bit-serial multiplier for multiplying first and
second inputs, said global controller sequentially applying
one input of said input pattern or the activation output of a
neuron on a previous layer to said first input of said
multiplier, said global controller applying a synaptic weight
appropriate for the input to said second input of said
multiplier, each neuron further comprises a bit-serial
accumulator for accumulating the output of said multiplier.
11. The system defined in claim 10 wherein the input 
neuron activation bits are provided to the multiplier in 
parallel and the input synaptic weight bits are provided to the 
multiplier serially, the multiplier computing the product of 
the two inputs and making the results available to the 
accumulator on a bit-serial basis. 
12. The system defined in claim 11 wherein the accumulator
comprises a cyclical shift register with an adder at the
input stage.
13. The system defined in claim 12 wherein the neuron
state RAM module is a single port RAM module.
14. The system defined in claim 13 wherein said input
pattern includes a bias input.
15. A recurrent neuroprocessor for implementing a reconfigurable
network topology including a plurality of hidden
layers containing neurons interconnected in a recurrent
configuration; said neuroprocessor comprising:
a neural module including a plurality of bit-serial neurons;
a global controller for time multiplexing groups of neurons
from the neural module to form first and second
hidden layers of the network topology, the controller
controlling application of the input pattern to the first
hidden layer and controlling storage of the output of
said first hidden layer for subsequent application as
input to said second hidden layer;
said global controller comprising a configuration controller;
said configuration controller including a plurality of configuration
registers containing data that explicitly
defines the topology of respective ones of said layers of
the recurrent neural network architecture including the
number of neurons and the number of recurrent connections
within respective layers;
said global controller including a run-time controller for
initiating a neurocomputation using the data in said
configuration registers, and a neuron state RAM module
for storing the contents of the output layer of the
neural network;
said run-time controller comprising a current layer register
selector, a finite state machine for sequencing high-level
inter-layer operations, an intra-layer propagation
controller for controlling execution of neuronal multiply
and accumulates, a sigmoid activation look-up-table
for performing a non-linear activation function
and an intra-layer specific neuron output data storage
controller for controlling calculation of non-linear activations
for the neurons and for subsequently storing the
resulting quantities in said neuron state RAM.
* * * * *
