Neural Network Adaptations to Hardware Implementations by Moerland, Perry & Fiesler, Emile
 
 
Dalle Molle Institute for Perceptive Artificial Intelligence (IDIAP)
Martigny, Valais, Switzerland
email: secretariat@idiap.ch
internet: http://www.idiap.ch
Perry Moerland, Emile Fiesler
IDIAP Research Report (IDIAP-RR), January
Published in: E. Fiesler and R. Beale (editors), Handbook of Neural Computation, Institute of Physics
Publishing and Oxford University Publishing, New York.

email PerryMoerland	idiapch
Abstract. In order to take advantage of the massive parallelism offered by artificial neural networks,
hardware implementations are essential. However, most standard neural network models are not very
suitable for implementation in hardware, and adaptations are needed. This section gives an overview of
the issues encountered when mapping an ideal neural network model onto a compact and reliable hardware
implementation, such as quantization, handling non-uniformities and non-ideal responses, and restraining
computational complexity. Furthermore, a broad range of hardware-friendly learning rules is presented,
which allow for simpler and more reliable hardware implementations. The relevance of these neural
network adaptations to hardware is illustrated by their application in existing hardware implementations.
Acknowledgements The support of the Swiss National Science Foundation Grant SNSF 
	
 
 is gratefully acknowledged 
IDIAPRR   
1 Introduction
Soon after the widespread revival of neural network research in the mid-eighties, it was realized that
hardware implementations are essential to fully profit from the massive parallelism inherent in neural
network models. This has led to a large variety of implementations using digital and analog electronics,
optics, and hybrid techniques. Even though these implementations differ widely, a common denominator is
the mapping of neural network algorithms onto reliable, compact, and fast hardware.
Any hardware implementation has to optimize three main constraints: accuracy, space, and processing
speed. The design of hardware implementations is governed by a balancing of these criteria. An analog
implementation, for example, is very efficient in terms of chip area and processing speed, but this comes
at the price of a limited accuracy of the network components. In general, this amounts to a trade-off
between the accuracy of the implementation and the reliability of its performance. In this section, the
influence of the limitations typical for hardware implementations is outlined. Examples of such
limitations are:
- the quantization of network parameters in digital implementations, in particular of the weights, to
  obtain a far more compact implementation. Its counterpart in analog implementations is a limited
  accuracy of the network parameters due to system noise;
- the fact that computation in analog hardware, be it electronic or optical, is characterized by the
  non-uniformity of its components and that these components are at best approximations of the
  corresponding mathematical operations in the neural network model.
This section provides a thorough review of the experimental and theoretical research that has been
performed on the behaviour of existing learning algorithms under the limitations imposed by hardware.
Furthermore, training algorithms are discussed that offer an improved performance in case of limited
accuracy and that further simplify the hardware implementation of neural networks.
In section 2, the effects of a quantization of the network parameters and weight discretization
algorithms for various neural network models are reviewed. The different approaches are illustrated
with examples from existing neural hardware implementations, and several commonly used schemes are
discussed in more detail. The influence of hardware non-idealities, such as spatial non-uniformity and
non-ideal response, is outlined in section 3. Section 4 contains an overview of hardware-friendly
learning algorithms, which are better suited for hardware implementation and especially for on-chip
learning. Finally, a summary and conclusions are presented in section 5.
 Quantization Eects
The use of very high precision cannot be matched with the goal of developing fast and compact
hardware implementations  While in digital implementations a high numerical precision is too area
consuming it is incompatible with the system noise present in analog implementations  Therefore
hardware implementations of neural networks typically use a representation of the network parameters
with a limited accuracy  For example in Philips LNeuro   architecture which allows the imple
mentation of feedforward networks and onchip backpropagation training bit weights are used
during the training process and only 	bit or bit weights are employed during recall Mauduit 
An example of an analog electronic implementation is Intels Electrically Trainable Analog Neural
Network ETANN which can perform an impressive two billion weight multiplications per second 
The accuracy of its weights and neurons however can be compared with a resolution of only seven
bits Holler 
Since hardware implementations are characterized by a low numerical precision, it is essential to
study its effects on the recall and training of the various neural network models. The need for a
further reduction of the accuracy, while retaining a satisfactory network performance, has also led to
various weight discretization algorithms designed especially for this purpose. Since most research has
been performed on multilayer feedforward networks, these are discussed separately from the other
neural network paradigms. A compact overview of a large variety of results on the effects of limited
precision in neural networks can be found in Tables 1 to 4. These tables list the number of bits
required for satisfactory learning performance and briefly describe the core idea of the algorithms.
To give an indication of the quality of the experimental evaluation in the cited articles, two columns
listing the number of artificial and real-world benchmarks on which the algorithms have been tested
are also included.

Reference   Accuracy    No. of benchmarks         Remarks
            (in bits)   Artificial  Real-world
Holt                                              Finite precision error analysis for the forward
                                                  retrieving pass.
Dundar                                            Statistical model of weight quantization in
                                                  sigmoidal networks.
Piche                                             Statistical analysis of the effects of weight errors
                                                  upon an ensemble of multilayer networks.
Table 1: Weight discretization in multilayer neural networks (off-chip learning).

Reference   Accuracy    No. of benchmarks         Remarks
            (in bits)   Artificial  Real-world
Fiesler,                                          Forward pass with discrete weights, backward pass
Fiesler                                           with continuous weights.
Marchesi                                          Power-of-two weights in the forward pass and an
                                                  adaptive learning rate.
Tang                                              Power-of-two weights and adaptive gain of the
                                                  activation function.
Table 2: Weight discretization in multilayer neural networks (chip-in-the-loop learning).
  Quantization Eects in Multilayer Neural Networks
Most methods deal with the various aspects of limited precision calculation in multilayer networks 
These approaches can be divided into three categories corresponding to the three dierent training
modes for neural network hardware
O chip learning In this case the hardware is not involved in the training process which is
performed on a computer using high precision  The weights resulting from the training process are
quantized and then downloaded on the chip  Only the forward propagation pass in the recall phase is
performed onchip which makes these quantization eects amenable for mathematical analysis using
a statistical model  Some of the results have been summarized in Table  which indicate that the
accuracy needed in the onchip forward pass is around  bits  In Piche
 a comparison between
Heaviside and sigmoidal multilayer networks is given showing that the weight precision required
in a Heaviside network is much higher and even doubles when a layer is added to the network  An
interesting practical example illustrating that low onchip accuracy is sucient when mapping a neural
network trained with a high precision onto a chip is the application of the analog ANNA chip to high
speed character recognition Sackinger  Here a high precision bit oating point network is
mapped on the ANNA chip which uses a bit weight resolution and a bit resolution for the neuron
inputs and outputs  The chips recognition accuracy is only slightly less than the one obtained with
oatingpoint calculations 
Chipintheloop learning In this case the neural network hardware is used during training but
only in forward propagation  The calculation of the new weights is done ochip on a computer which
downloads the updated weights onto the chip after each training iteration  Several learning algorithms
IDIAPRR   
have been proposed that take advantage of the fact that in this way the limited precision only plays a
role in the forward propagation pass and that oatingpoint calculations can be used in the backward
pass Table   One of the rst and perhaps most successful weight discretization techniques is of the
chipintheloop kind Fiesler Fiesler  It is suitable for feedforward neural networks easy to
implement and very exible in that it can handle a large range of discretizations up to the precision
of a few bits only Table   The basic idea is to start with a normal neural network with continuous
valued weights  These weights are discretized using a staircase shaped multiple threshold function and
the so created discrete weights are then used for the forward propagation pass of the learning rule 
The errors obtained which are based on the dierence between the obtained network outputs and
the desired target outputs are subsequently used to update the continuous valued weights during the
backward propagation pass  This scheme is repeated until convergence is obtained  This exible weight
discretization method has been successfully used in the development of the Apple Newton Lyon
and in optical neural networks at Mitsubishi Japan Takahashi and in Switzerland Saxena

Moerland   A similar approach has been applied to design neural networks restricted to single
poweroftwo weights see section   Marchesi Tang 
Onchip learning Here the training of the neural network is done entirely onchip which oers the
possibility of continuous training  This means in specic that at least the weight values are represented
with only a limited precision  Simulations have shown that the popular backpropagation algorithm see
for example Rumelhart is highly sensitive to the use of limited precision weights and that training
fails when the weight accuracy is lower than  bits rst two references in Table   This is mainly
because the weight updates are often smaller than the quantization step which prevents the weights
from changing  In order to reduce the chip area needed for weight storage and to overcome system noise
a further reduction of the number of allowed weight values is desirable  Several weight discretization
algorithms have therefore been designed and an extensive list of them and the attainable reduction
in required precision is given in Table   Some of these weight discretization algorithms have already
proven their usefulness in hardware implementations  Battitis reactive tabu search for example has
been implemented in the TOTEM processor and successfully applied to a triggering problem in high
energy physics with a weight accuracy as low as 	 bits Battiti	  Recently an analog electronic
chip Kakadu has been applied successfully to some classication problems by training it with the
combined search algorithm and semiparallel weight perturbation algorithms using only a bit weight
accuracy Jabri	 Leong
 
   Quantization Eects in Other Neural Network Models
Also for other neural network models the eects of a coarse quantization of the weight values on recall
and learning have been investigated  The small number of weight discretization algorithms proposed
can be partly explained from the fact that the required accuracy for successful learning in these
models is lower than for gradient descent learning in multilayer networks Table 	  An interesting
example of a hardware implementation is Bellcores implementation of a Boltzmann machine and
MeanField learning which allows onchip learning with only 
bit weights Alspector  Recently
a weight discretization algorithm for an associative memory with binary f g weights has been
implemented on a digital VLSI chip Hendrich  The pattern storage capacity that can be obtained
with this learning rule is good  	 times the number of neurons and the algorithm is suited for
onchip learning  Verleysens associative memory training algorithm that uses the Simplex method to
train a network with ternary weights is bestsuited for ochip training Verleysen 
2.3 Some Remarks on Commonly Used Schemes

Reference   Accuracy    No. of benchmarks         Remarks
            (in bits)   Artificial  Real-world
Asanovic                                          Coarse weight quantization in the backpropagation
                                                  algorithm.
Holt                                              An error analysis of backpropagation with finite
                                                  precision.
Grossman                                          Adaptation of both weights and the internal
                                                  representation of the neurons.
Reyneri                                           Batch backpropagation with a near-optimum learning
                                                  rate.
Xie                                               Weight perturbation with gain adaptation.
Xie                                               Combination of weight perturbation and a partial
                                                  random search.
Abramson                                          A slight modification of (Grossman) to train
                                                  sparsely connected Heaviside networks.
Sakaue                                            A weighted error function in the backpropagation
                                                  algorithm, based on an overestimation of the error.
Hollis                                            Weight perturbation with an adaptive gain and
                                                  learning rate.
Jabri                                             Semi-parallel weight perturbation algorithms.
Simard                                            Backpropagation without multiplication (gradients
                                                  and states are powers of two).
Battiti                                           Heuristic method for solving combinatorial
                                                  optimization problems.
Dundar                                            Backpropagation with forced weight updates.
Table 3: Weight discretization in multilayer neural networks (on-chip learning).

A common point of many weight discretization algorithms is the way in which the effects of having
only a limited weight range are treated. Simulations have shown that as soon as the range of the
weights decreases below a certain value, which depends on the problem at hand, training fails to
converge because of the clipping of the weight values (Hoehfeld). This can often be solved by allowing
a dynamic rescaling of the weights, and hence of the weight range, by adapting the gain $\beta$ of the
activation function $\varphi$. The activation value $a_j$ of a neuron in a multilayer network is
namely calculated as follows:

$$a_j = \varphi\Big(\beta \sum_i w_{ij}\, a_i\Big) \qquad (1)$$

Thus a change of the weight range is equivalent to a change of the gain $\beta$ of the activation
function. Various strategies have been proposed to perform this gain adaptation, ranging from
heuristics based on the average value of the incoming connections to a neuron (Hoehfeld, Xie) to
approaches that use some form of gradient descent to train the gains (Tang, Coggins).
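The equivalence between the weight range and the gain in the activation formula of this subsection can be checked numerically. The following sketch assumes a standard sigmoid activation and arbitrary illustrative weights; dividing all weights by a factor and multiplying the gain by the same factor leaves the neuron output unchanged.

```python
import math


def neuron(weights, inputs, gain=1.0):
    """Sigmoid neuron: activation of gain * (weighted sum of the inputs)."""
    s = sum(w * a for w, a in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-gain * s))


# Hypothetical weights that exceed a limited on-chip weight range of [-1, 1].
weights = [2.4, -3.1, 0.8]
inputs = [0.5, 0.25, -1.0]

scale = 4.0                                   # rescaling factor for the weights
rescaled = [w / scale for w in weights]       # all rescaled weights now fit in [-1, 1]

print(neuron(weights, inputs))                # original weights, gain 1
print(neuron(rescaled, inputs, gain=scale))   # rescaled weights, gain 4: same output
```

In this sense a larger gain can stand in for a weight range that the hardware cannot represent, which is what the gain adaptation strategies mentioned above exploit.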
In some training algorithms the weight values have been limited to powers of two (White, Tang,
Marchesi). The main advantage of this technique is that all costly multiplications can be replaced by
easy-to-implement shift operations. This scheme has also been applied to gradient values, activation
values, and learning rates (Hollis, Simard).
Work on limiting the number of weight levels has also been done in the design of Heaviside networks
for the computation of Boolean functions (majority, parity, comparison, addition) and for the
two-spiral problem (Beiu, Beiu). Beiu's concern is to minimize the total number of bits required to
represent the weights of a network, since this is a realistic measure of the complexity of VLSI
implementations. Moreover, it opens up the possibility of comparing results obtained by learning
algorithms with the entropy (number of bits) upper bounds of the data set (Beiu).
Finally, we would like to point out that a comparative benchmarking study of quantization effects on
different neural network models, and of the improvements that can be obtained by weight discretization
algorithms, has not yet been done. The accuracies listed in Tables 1 to 4 are therefore highly biased
by the different benchmarks that were used by the various authors.
IDIAPRR   

Reference Accuracy  of Benchmarks Remarks
in bits Articial Realworld
Selforganizing map see for example Kohonen
Kohonen 	   Quantization of input values during recall 
Rueping	 	   Poweroftwo adaptation factor and quantized
weights 
Thiran	  
   Uses a conical neighbourhood function instead
of a rectangular one 
Associative memory see for example Hopeld
Verleysen    A linear programming learning algorithm for
associative memories 
Johannet    Integer arithmetics for learning in associative
memory 
Hendrich    Associative memory with binary weights and
a good storage capacity 
Boltzmann network Ackley

Balzer    Coarse quantization of the weights during
learning 
Alspector 
   Coarse weight quantization for Boltzmann and
meaneld learning 
Neocognitron Fukushima
White    Uses poweroftwo weights 
Cascade topology Fahlman
Hoehfeld    Coarse weight quantization in the cascade cor
relation algorithm 
Hoehfeld    Cascade correlation with probabilistic round
ing and variable gain 
Campbell
    A constructive algorithm for Heaviside cas
cade networks 
Table 	 Weight discretization in other neural network models 
 Hardware NonIdealities
Both in analog electronic and optical neural network implementations computation suers from draw
backs which do not play an important role in digital hardware  Some characteristic examples of such
nonidealities inherent to analog computation are the spatial nonuniformity of components and non
ideal responses  In this section examples of these nonidealities are presented together with their
eects on the learning behaviour of neural networks 
 Component NonUniformity
Variations between the onchip components such as multipliers Cairns	 and the readout of op
tical weight matrices Robinson are inevitable in analog hardware  These nonuniformities are
particularly troublesome when the training of the network is done ochip without taking these com
ponent variations into account Frye  It is however widely claimed that chipintheloop or on
chip learning can compensate to a considerable extent for these nonuniformities Card  This is
also intuitively clear because the use of the analog circuit in the forward pass incorporates the non
uniformities in the learning process  This has been conrmed by experimental results for example for
onchip learning in backpropagation networks Cairns	 Dolenko
  Their research indicates that
backpropagation learning can adapt to the nonuniformity of multiplier gains which are caused by fab
IDIAPRR   
rication inaccuracies  The occurrence of additive osets in the multiplications and especially in weight
adaptations do pose serious problems which are not easily overcome by onchip learning Dolenko
 
A possible solution is the use of some dedicated hardware in the weight adaptation circuitry which
enables osetcompensation Annema
 
  NonIdeal Response
Computations performed in hardware are approximations of the mathematical operations assumed to
be ideal in neural network models  This aects in particular the analog implementation of a linear
multiplication and the implementation of a nonlinear activation function like the widelyused stand
ard sigmoid  The use of a linear multiplier with a reasonable operating range leads to a large area
penalty in VLSI implementations  Therefore simple nonlinear multipliers are often preferable and are
used in both electronic Lont Hollis	 Reyneri
 and optical implementations Robinson
Neiberg	  The claims on the learning behaviour of a neural network with nonlinear multipliers are
rather contradictory  While in Cairns	 Dolenko
 the straightforward use of nonlinear multipli
ers in simulations of onchip learning in analog backpropagation networks leads to satisfactory results
in Lont the standard backpropagation algorithm fails to converge with nonlinear synapses  In
stead Lont proposes to incorporate nonlinear multipliers in the formulation of the backpropagation
rule which leads to good results  A disadvantage of this approach is that an accurate model of the
onchip multiplier is needed  This can be alleviated by chain rule perturbation learning Hollis	
which only performs a forward pass through a multilayer network and hence incorporates the hard
ware characteristics directly into the training  A solution sometimes applied in optical networks is the
use of an additional weight mask which complements and thereby compensates for the nonlinearities
in the multiplier Neiberg	 
Another problem for analog hardware is the requisite of an activation function that is similar to
the standard sigmoid  The incorporation of a model of a sigmoidlike hardware activation function
in the training algorithm can compensate for some inaccuracy Lont  This is another example
of the opportunism that often plays a role in the design of neural hardware search for the hidden
advantages of apparent drawbacks and try to exploit these instead of trying to approximate the
existing mathematical model as close as possible  Another approach is the use of a simplied activation
function for example the replacement of the Gaussian function in radial basis networks by a triangular
one Dogaru leading to a simplied hardware implementation Additional diculties arise when the
activation functions are implemented by optical hardware as for example liquid crystal light valves 
These optical activation functions are characterized among other nonidealities by a gain   that
diers greatly from the standard value of one as can be seen in gure  where a sigmoid with a gain of
approximately  is depicted Saxena
  While in analog electronics one can try to compensate
for a nonstandard gain by including a gain stage this is not possible in optical implementations 
In theory one could add additional optical components whose aim would be a modication of the
eective gain but this would increase the complexity and cost of the system as well as introduce
new side eects  A nice and simple way to solve this problem is by using an adapted backpropagation
learning rule that is based on a simple and precise relationship between the gain and two other network
parameters Thimm which compensates for a nonstandard gain without any additional hardware
and shows superior results Moerland
 
 HardwareFriendly Learning Algorithms
In this section a variety of learning algorithms that are well suited for hardware implementations of
neural networks are presented  These hardware friendly learning algorithms Moerland  can de
divided into two classes namely
  Adaptations of existing neural network learning rules that facilitate their hardware implement
ation 
IDIAPRR   
  N
or
m
al
ise
d 
U
ni
ts
 1.0
0.5
0
200 400 600 800 1000
Write Light Intensity ( 2µ W / cm )
Figure  Response curve of an LCLV 
 
 j
k ∆ Wjk  =   - η ∆Ε
Wjk
perturbation of Wjk: Wjk
Figure  A schematic of the weight perturb
ation algorithm 
  Learning algorithms that are by their very conception suitable for hardware implementation 
Here the emphasis will be on the rst of these two classes of hardwarefriendly learning algorithms 
An example of the second class are cellular neural networks which are of special interest for VLSI
implementation because of their sparse local connectivity every unit of the network is a simple analog
processor that interacts only with its neighbouring units see Chua for a survey  Another example
is the class of RAMbased networks which can be easily implemented with standard available com
ponents  A recent overview of RAMbased networks and related implementation aspects is given in
Austin	 
Various hardwarefriendlier alternatives have been proposed for several neural network learning rules
especially with the objective to enable on chip learning  The most signicant ones are discussed in
this section with an emphasis on hardwarefriendly alternatives of the backpropagation algorithm for
training multilayer neural networks 
4.1 Perturbation Algorithms
The most popular algorithm for the training of multilayer networks is the backpropagation algorithm
(see for example Rumelhart). However, the realization of large backpropagation networks in analog
hardware poses serious problems because of the need for separate or bidirectional circuitry for the
backward pass of the algorithm. Other problems are the need for an accurate derivative of the
activation function and the cascading of multipliers in the backward pass.
The general idea of perturbation algorithms is to obtain a direct estimate of the gradients by a
slight random perturbation of some network parameters, using the forward pass of the network to
measure the resulting network error. Thus these on-chip training techniques not only eliminate the
complex backward pass but are also likely to be more robust to the non-idealities occurring in
hardware. The two main variants of this class of algorithms are node perturbation, which is based on
the perturbation of the input value of a neuron, as for example in the Madaline III rule (Widrow), and
weight perturbation (see for example Jabri). The basic concept of weight perturbation (figure 2) is
easily explained by the observation that the gradient descent weight update can be approximated by
finite differences ($\delta W_{jk}$ denotes the perturbation of $W_{jk}$, $\Delta E$ the resulting
change of the network error $E$, and $\eta$ the learning rate):

$$\Delta W_{jk} = -\eta\, \frac{\partial E}{\partial W_{jk}} \approx -\eta\, \frac{\Delta E}{\delta W_{jk}} \qquad (2)$$
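The finite-difference update of weight perturbation can be sketched in a few lines of Python. In this illustration a single sigmoid neuron plays the role of the hardware, which is only ever queried through its forward pass; the function names, the OR task, the learning rate, and the perturbation size are all assumptions made for the example.

```python
import math


def forward(weights, x):
    """Forward pass of one sigmoid neuron; treated as a black box by the training loop."""
    s = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-s))


def error(weights, samples):
    """Sum-of-squares error measured with forward passes only."""
    return 0.5 * sum((t - forward(weights, x)) ** 2 for x, t in samples)


def weight_perturbation_epoch(weights, samples, eta=0.5, delta=1e-3):
    """Perturb one weight at a time and update it with the measured error difference."""
    for k in range(len(weights)):             # sequential: one weight after the other
        e0 = error(weights, samples)
        original = weights[k]
        weights[k] = original + delta         # apply the perturbation
        e1 = error(weights, samples)
        weights[k] = original - eta * (e1 - e0) / delta   # finite-difference gradient step
    return weights


# Logical OR with a constant bias input as the first component.
samples = [((1, 0, 0), 0.0), ((1, 0, 1), 1.0), ((1, 1, 0), 1.0), ((1, 1, 1), 1.0)]
weights = [0.0, 0.0, 0.0]
initial_error = error(weights, samples)
for _ in range(200):
    weight_perturbation_epoch(weights, samples)
final_error = error(weights, samples)
print(initial_error, final_error)
```

No derivative of the activation function and no backward pass appear anywhere in the loop; the price is two error measurements per weight and per update, which is the sequential cost discussed in the text.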
 
The Madaline rule is based on an application of the chainrule that is standard in the derivation of
the backpropagation algorithm s
k
denotes the input to neuron k and  s
k
its perturbation
W
jk
   
E
W
jk
   
E
s
k

s
k
W
jk
   
E
 s
k
 a
j
 
IDIAPRR   
The main disadvantage of these perturbation algorithms is their sequential nature as opposed to the
weight update calculation in the backpropagation algorithm which can in principle be performed in
parallel  The main dierences between the Madaline rule and weight perturbation are the simpler
addressing and routing circuitry needed for the latter and the lower computational complexity of
the Madaline rule  As can be seen in table  weight perturbation also has a good performance
with limited precision weights Xie  Moreover it is more robust against nonidealities occurring in
analog hardware nonuniformity nonideal circuit response and noise Cairns	  The reason for this
is that in this algorithm modeling of activation functions and multipliers does not need to be done
since these form an integral part of the training algorithm  It is interesting to note that the derivation
of the Madaline rule does assume the multiplication to be linear which makes possible the reduction
of s
k
W
jk
to a
j
in equation  
The sequential nature of these simple perturbation algorithms has led to more intricate variants,
which perform some of the calculations in parallel. A simultaneous perturbation of all weights is a
promising alternative (Alspector, Cauwenberghs), even though a reliable estimate of the gradient
requires either averaging the results of several perturbations or a very small and accurate
perturbation. Other variants use a semi-parallel perturbation scheme, like chain rule perturbation
(Hollis), fan-out or fan-in-out perturbation (Jabri), and summed weight neuron perturbation (Flower).
These semi-parallel techniques simultaneously perturb all the weights feeding into or leaving one
neuron. An experimental comparison of these perturbation algorithms with an analog multilayer
perceptron chip (Kakadu) in the loop showed that the semi-parallel techniques are best suited for
effective learning when the accuracy is low (Jabri). The fan-in-out technique showed the best
generalization and training convergence results when the weights and weight updates were quantized to
 bits.
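A simultaneous perturbation of all weights can be sketched as follows. This is a generic illustration rather than any of the cited algorithms: every weight is perturbed by the same small amount with a random sign, a single error difference is measured, and each weight is updated with that difference divided by its own perturbation. The single sigmoid neuron, the OR task, and all parameter values are assumptions.

```python
import math
import random


def forward(weights, x):
    """Forward pass of one sigmoid neuron; stands in for the hardware."""
    s = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-s))


def error(weights, samples):
    """Sum-of-squares error measured with forward passes only."""
    return 0.5 * sum((t - forward(weights, x)) ** 2 for x, t in samples)


def simultaneous_step(weights, samples, rng, eta=0.2, delta=1e-3):
    """Perturb all weights at once and update them from one measured error difference."""
    signs = [rng.choice((-1.0, 1.0)) for _ in weights]
    e0 = error(weights, samples)
    perturbed = [w + delta * s for w, s in zip(weights, signs)]
    d_e = error(perturbed, samples) - e0
    # The same error difference is shared by all weights; only the sign differs per weight.
    return [w - eta * d_e / (delta * s) for w, s in zip(weights, signs)]


# Logical OR with a constant bias input as the first component.
samples = [((1, 0, 0), 0.0), ((1, 0, 1), 1.0), ((1, 1, 0), 1.0), ((1, 1, 1), 1.0)]
rng = random.Random(0)                        # fixed seed for reproducibility
weights = [0.0, 0.0, 0.0]
initial_error = error(weights, samples)
for _ in range(2000):
    weights = simultaneous_step(weights, samples, rng)
final_error = error(weights, samples)
print(initial_error, final_error)
```

Each step needs only two error measurements regardless of the number of weights, but the resulting gradient estimate is noisy, which is why averaging over several perturbations is mentioned in the text.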
4.2 Local Learning Algorithms
The implementation of a learning rule can be greatly simplified if it only uses information that is
locally available (Palmieri). This feature minimizes the amount of wiring and communication. Since the
backpropagation algorithm is not local, several local learning algorithms have been designed that
avoid a global backpropagation of error signals. An example is an anti-Hebbian learning algorithm that
is suitable for optical neural networks (Psaltis). The weight updates of a layer in this algorithm
depend only on the input and output of that layer and on one global error signal. Although it is not a
steepest descent rule, it is still guaranteed that the weights are updated in a descent direction.
Another local learning rule has been developed in (Brandt), which uses only the rates of change of the
outgoing weights of a neuron. One of their algorithms is mathematically equivalent to the
backpropagation algorithm, but the measurement of the rates of change of the weights could be hard to
implement. A promising approach is taken in the Alopex algorithm (Venugopal, Unnikrishnan), which is a
stochastic algorithm based on the correlation between individual weight changes and changes in the
network's error measure. The main advantages of this approach are that the weights can be updated
synchronously and that no modeling of the multipliers and activation functions is needed.
 Networks with Heaviside Functions
The design of a compact digital neural network can be simplied considerably when Heaviside functions
are used as activation functions instead of a dierentiable sigmoidal activation function  While training
algorithms for perceptrons with Heaviside functions abound training multilayer networks with non
dierentiable Heaviside functions requires the development of new algorithms  One of the earliest
examples of such a learning rule is the Madaline  rule Widrow which is closely related to the
previously described Madaline  It is also based on a slight perturbation of the input to a neuron but
in this case the training error is minimized by investigating the eect of an inversion of the activation
value of a neuron  If this inversion reduces the Hamming error on the output neurons the incoming
weights of the inverted neuron are adapted with a perceptron training algorithm to reinforce this
IDIAPRR   
inversion 
There is also a large variety of constructive algorithms, which gradually build a Heaviside network by
adding neurons and weights [Smieja]. The basis of these algorithms is often formed by a perceptron
algorithm that is used to adapt the weights of the freshly added neurons. Recently, some digital
and mixed analog-digital architectures have been designed to be suitable for the implementation of a
range of these constructive algorithms [Moreno].
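
As a reminder of the building block that such constructive algorithms rely on, the following sketch trains a single Heaviside neuron with the classical perceptron rule. It shows only the training of one freshly added neuron, not a complete constructive algorithm; the function names, the 0/1 output coding, and the default parameters are assumptions.

```python
def heaviside(z):
    return 1 if z >= 0 else 0


def train_perceptron(samples, n_inputs, lr=1.0, epochs=100):
    """Train one Heaviside neuron; the last weight acts as the bias."""
    w = [0.0] * (n_inputs + 1)
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            xb = list(x) + [1.0]                  # append constant bias input
            y = heaviside(sum(wi * xi for wi, xi in zip(w, xb)))
            if y != target:
                errors += 1
                # Move the weights towards (or away from) the input pattern.
                w = [wi + lr * (target - y) * xi for wi, xi in zip(w, xb)]
        if errors == 0:                           # all samples classified
            break
    return w


def predict(w, x):
    return heaviside(sum(wi * xi for wi, xi in zip(w, list(x) + [1.0])))
```

For linearly separable data, such as the logical AND of two inputs, the loop stops after a finite number of epochs with all samples classified correctly.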
 Robustness
In a previous section, several examples have already been given of the robustness of neural networks to
hardware nonidealities. Some research has also been devoted to the robustness of a network to unreliable
neurons. This unreliability can consist of sign inversions of hidden neuron values [Judd] or destruction
of hidden neurons [Kerlirzin]. While neural networks trained by standard learning algorithms are
not inherently fault-tolerant, the incorporation of the expected faults in the training phase leads to
remarkable improvements. An illustration of this fact is an adaptation of the backpropagation learning
rule that uses only a random subset of hidden neurons for each iteration. The trained network is far
more robust to the destruction of hidden neurons and shows performance comparable to the noiseless
case [Kerlirzin]. This is closely related to the injection of random noise in the weight values during
the training of a multilayer neural network, whose effects have been elaborately discussed by Murray
and Edwards [Murray]. It is demonstrated both analytically and experimentally that this synaptic
noise improves the network's fault tolerance to weight damage, generalization to unseen patterns, and
training time. Similar results have been obtained when injecting additive noise into the weights of
recurrent neural networks [Jim].
 Other HardwareFriendly Neural Network Models
Although the majority of neural hardware is concerned with the implementation of multilayer
networks because of their wide-ranging applicability, most other popular neural network models have
also been implemented in hardware. A few examples of the use of hardware-friendly learning in
self-organizing feature maps and recurrent networks are given here.
Selforganizing maps One of the requisites of a neural network hardware implementation is the
eective use of the processor resources  In general batch processing is an appropriate alternative to
obtain better parallelisation  Kohonens original algorithm however has both an online selection of
the neuron closest to the input pattern the winner neuron and an online weight update  Two pos
sible variants are to have a batch winner selection combined with either a batch or an online weight
update  In Vassilas
 the convergence properties of these two variants are shown to be comparable
with those of the original algorithm 
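
The variant with batch winner selection and batch weight update can be sketched as follows for a one-dimensional map. The sketch is an illustration only and does not reproduce the exact update of [Vassilas]: the rectangular neighbourhood of width radius, the averaging of the accumulated updates over the batch, and the function names are assumptions. The point is that all winners are determined with the same frozen weights, so the searches are independent and can run in parallel.

```python
def squared_distance(w, x):
    return sum((wi - xi) ** 2 for wi, xi in zip(w, x))


def select_winners(weights, batch):
    """Batch winner selection: every pattern sees the same frozen weights."""
    return [min(range(len(weights)),
                key=lambda k: squared_distance(weights[k], x))
            for x in batch]


def batch_som_update(weights, batch, lr=0.5, radius=1):
    """Accumulate the updates of a whole batch and apply them at once."""
    winners = select_winners(weights, batch)
    deltas = [[0.0] * len(w) for w in weights]
    for x, c in zip(batch, winners):
        lo, hi = max(0, c - radius), min(len(weights), c + radius + 1)
        for k in range(lo, hi):           # winner and its map neighbours
            for d, xd in enumerate(x):
                deltas[k][d] += lr * (xd - weights[k][d])
    n = len(batch)
    return [[w + dv / n for w, dv in zip(w_row, d_row)]
            for w_row, d_row in zip(weights, deltas)]
```

Replacing the final averaged update by an immediate update inside the loop, while keeping the winners fixed, gives the other variant mentioned above (batch winner selection with online weight update).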
Recurrent networks. Two widely used paradigms for training recurrent networks are Boltzmann
Machine learning and Mean Field Theory learning. The parallelism of a potential hardware
implementation is seriously hampered by the required asynchronous update of the neurons. Therefore, in both
analog [Pujol] and optical [Peterson] implementations, a synchronous neuron update is used.
Another characteristic of the Boltzmann Machine is the use of simulated annealing to gradually
increase the gain of a neuron's activation function. In Bellcore's implementation of a Boltzmann Machine,
this annealing schedule has been replaced by a gradual decrease of additive noise [Alspector], while
the main idea of Mean Field Theory learning is to replace the annealing strategy by a deterministic
approximation.
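
A synchronous, deterministic neuron update of the mean-field type can be sketched as below. The tanh form with bipolar neuron values, the temperature schedule, and the function names are assumptions for illustration and are not taken from the cited implementations. Every neuron is recomputed from the values of the previous sweep, so all updates can be carried out in parallel, and lowering the temperature corresponds to increasing the gain of the activation function.

```python
import math


def synchronous_mean_field_step(weights, values, temperature):
    """Update all neurons at once from the values of the previous sweep."""
    nets = [sum(w * v for w, v in zip(row, values)) for row in weights]
    return [math.tanh(net / temperature) for net in nets]


def anneal(weights, values, temperatures, sweeps_per_temperature=10):
    """Run synchronous sweeps while the temperature is lowered step by step."""
    for temperature in temperatures:
        for _ in range(sweeps_per_temperature):
            values = synchronous_mean_field_step(weights, values, temperature)
    return values
```

With an asynchronous update, the second neuron would already see the new value of the first one; here both are computed from the same old state.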
 Summary and Conclusions
In this section, an overview has been given of a variety of adaptations of neural network learning to
enable their successful hardware implementation. The problems addressed can be as general as the effects
of a quantization of the network parameters or those of the nonidealities of hardware components. Other
problems are more specific for a certain neural network model, like the complications related to the
implementation of the backward pass of the standard backpropagation algorithm.
The effects of quantization on a range of neural network models have been outlined and weight
discretization algorithms have been reviewed. These estimations of the required accuracy for
well-known learning algorithms and several of the weight discretization algorithms described are already in
use in some large-scale hardware implementations. Designers of digital neurocomputers, for example,
profit from the fact that the required weight accuracy for backpropagation training is around 
bits [Mauduit]. An example of a successful implementation of a weight discretization algorithm is
Battiti's TOTEM chip, which uses a weight accuracy of  bits [Battiti].
Compared to the state of the art in digital neural network implementations, the design of
analog neural network implementations with nonidealities like component nonuniformity, nonideal
responses, and system noise is still in a more experimental state. Implementations have therefore been
limited to small-scale networks [Leong], and it is yet to be shown whether reliable large networks
can be realized in practice by analog techniques. An important step towards this goal could be the
possibility of on-chip learning, since it has been exemplified that neural network models are
remarkably robust to hardware nonidealities when these are incorporated in the training of the network.
The development of hardware-friendly learning rules that form an alternative for algorithms which
are intricate to implement, like the backpropagation algorithm, is therefore essential. The efficacy of
perturbation algorithms illustrates the usefulness of this approach, and the first implementations using
these training algorithms are emerging [Leong].
References

Abramson S Abramson D Saad and E Marom Training a Neural Network with Ternary Weights Using the
CHIR Algorithm IEEE Transactions on Neural Networks vol  no  pp  November 

Ackley D H Ackley G E Hinton and T J Sejnowski A Learning Algorithm for BoltzmannMachines Cognitive
Science vol  pp  

Alspector J Alspector A Jayakumar and S Luma ExperimentalEvaluation of Learning in a Neural Microsystem
Advances in Neural Information Processing Systems  NIPS vol  pp  Morgan Kaufmann San
Mateo 

Alspector J Alspector R Meir B Yuhas and A Jayakumar A Parallel Gradient Descent Method for Learning
in Analog VLSI Neural Networks Advances in Neural Information Processing Systems  NIPS vol  pp
 Morgan Kaufmann San Mateo CA 

Annema A J Annema and H Wallinga Analog Weight Adaptation Hardware Neural Processing Letters vol 
no  pp  

Asanovic K Asanovic and N Morgan Experimental Determination of Precision Requirements for Back
Propagation Training of Articial Neural Networks Proceedings of the Second International Conference Mi
croNeuro pp  U Ramacher U Ruckert and J A Nossek eds Munchen Germany October 

Austin J Austin A Review of RAM Based Neural Networks Proceedings of the Fourth International Conference
on Microelectronics for Neural Networks and Fuzzy Systems pp  Turin Italy September  
ISBN 

Balzer W Balzer M Takahashi J Ohta K Kyuma Weight Quantization in Boltzmann Machines Neural Net
works vol  pp  

Battiti R Battiti and G Tecchiolli TOTEM A Digital Processor for Neural Networks and Reactive Tabu Search
Proceedings of the Fourth International Conference on Microelectronics for Neural Networks and Fuzzy Systems
pp  Turin Italy September   ISBN 

Battiti R Battiti and G Tecchiolli Training Neural Nets with the Reactive Tabu Search IEEE Transactions on
Neural Networks vol  no  pp  September 

Beiu V Beiu VLSI Complexity of Discrete Neural Networks Gordon and Breach New York 

Beiu V Beiu Direct Synthesis of Neural Networks Proceedings of the Fifth International Conference on Mi
croelectronics for Neural Networks and Fuzzy Systems pp  Lausanne Switzerland February 


Beiu V Beiu Entropy Bounds for Classication Algorithms Neural Network World vol  pp  
IDIAPRR   

Brandt R D Brandt and F Lin SupervisedLearning in Neural Networks withoutExplicit Error BackPropagation
Proceedings of the ThirtySecond Allerton Conference on Communication Control and Computing pp 
Monticello Illinois September  

Cairns G Cairns and L Tarassenko Learning with Analogue VLSI MLPs Proceedings of the Fourth International
Conference on Microelectronics for Neural Networks and Fuzzy Systems pp  Turin Italy September
  ISBN 

Campbell C Campbell and C Perez Vincente The Target Switch Algorithm A Constructive Learning Procedure
for FeedForward Neural Networks Neural Computation vol  no  pp  November 

Card H C Card and C R Schneider Analog CMOS Neural Circuits  In Situ Learning International Journal of
Neural Systems vol  no  pp  

Cauwenberghs G Cauwenberghs A Fast Stochastic ErrorDescent Algorithm for Supervised Learning and Optim
ization Advances in Neural Information Processing Systems  NIPS vol  pp  Morgan Kaufmann
San Mateo CA 

Chua L O Chua and L Yang Cellular Neural Networks Theory IEEE Transactions on Circuits and Systems
vol  pp  

Chua L O Chua and T Roska The CNN Paradigm IEEE Transactions on Circuits and SystemsI Fundamental
Theory and Applications volume  no  pp  March 

Coggins R Coggins and M Jabri Wattle A Trainable Gain Analogue VLSI Neural Network Advances in Neural
Information Processing Systems  NIPS	 vol  pp  Morgan Kaufman San Mateo CA 

Dogaru R Dogaru A T Murgan S Ortmann and M Glesner A Modied RBF Neural Network for Ecient
CurrentMode VLSI Implementation Proceedings of the Fifth International Conference on Microelectronics for
Neural Networks and Fuzzy Systems pp  Lausanne Switzerland February  

Dolenko B K Dolenko and H C Card Tolerance to Analog Hardware of OnChip Learning in Backpropagation
Networks IEEE Transactions on Neural Networks vol  no  pp  September 

Dundar G Dundar and K Rose The Eects of Quantization on Multilayer Neural Networks IEEE Transactions
on Neural Networks vol  no  pp  November 

Fahlman S E Fahlman and C Lebiere The CascadeCorrelation Learning Architecture Advances in Neural
Information Processing Systems  NIPS
 vol  pp  Morgan Kaufmann San Mateo CA 

Fiesler E Fiesler A Choudry and H J Cauleld Weight Discretization in Backward Error Propagation Neural
Networks Neural Networks special supplement with Abstracts of the First Annual INNS Meeting vol  p
 

Fiesler E Fiesler A Choudry and H J Cauleld A Weight Discretization paradigm for Optical Neural Networks
Proceedings of the International Congress on Optical Science and Engineering vol SPIE  pp 
SPIE Bellingham Washington  ISBN 

Flower B Flower and M Jabri Summed Weight Neuron Perturbation An ON Improvement over Weight Per
turbation Advances in Neural Information Processing Systems  NIPS vol   Morgan Kaufmann
San Mateo CA 

Frye R C Frye E A Rietman and C C Wong BackPropagation Learning and Nonidealities in Analog Neural
Network Hardware IEEE Transactions on Neural Networks vol  no  pp  January 

Fukushima K Fukushima Neocognitron A SelfOrganizing Neural Network Model for a Mechanism of Pattern
Recognition Unaected by Shift in Position Biological Cybernetics vol  pp  

Grossman T Grossman The CHIR Algorithm for Feedforward Networks with BinaryWeights Advances in Neural
Information Processing Systems  NIPS
 vol  pp  Morgan Kaufmann San Mateo CA 

Hendrich N Hendrich A Scalable Architecture for Binary Couplings Attractor Neural Networks Proceedings of
the Fifth International Conference on Microelectronics for Neural Networks and Fuzzy Systems pp 
Lausanne Switzerland February  IEEE Computer Society Press Los Alamitos CA USA 

Hoehfeld M H Hoehfeld and S Fahlman Learning with Limited Numerical Precision Using the Cascade
Correlation Algorithm IEEE Transactions on Neural Networks vol  no  July 

Holler M Holler S Tam H Castro and R Benson An Electrically Trainable Articial Neural Network ETANN
with  Floating Gate Synapses Proceedings of the International Joint Conference on Neural Networks
 IJCNN
 vol  pp  Washington DC 

Hollis P W Hollis and J J Paulos A Neural Network Learning Algorithm Tailored for VLSI Implementation
IEEE Transactions on Neural Networks vol  no  pp  September 

Holt J L Holt and JN Hwang Finite Error Precision Analysis of Neural Network Hardware Implementations
IEEE Transactions on Computers vol  no  pp  March 

Hopfield J J Hopfield Neural Networks and Physical Systems with Emergent Collective Computational Abilities
Proceedings of the National Academy of Sciences USA vol  no  pp  Washington DC April

IDIAPRR   

Jabri M Jabri and B Flower Weight Perturbation An Optimal Architecture and Learning Technique for Analog
VLSI Feedforward and Recurrent Multilayer Networks IEEE Transactions on Neural Networks vol  no 
pp  January 

Jabri Practical Performance and Credit Assignment Eciency of Analog MultiLayer Perceptron Perturbation
Based Training Algorithms SEDAL Technical Report  Systems Engineering and Design Automation
Laboratory Sydney University Electrical Engineering NSW  Australia 

Jim K Jim C L Giles and B G Horne Synaptic Noise in DynamicallyDriven Recurrent Neural Networks
Convergence and Generalization Technical report UMIACSTR  CSTR Institute for Advanced
Computer Studies University of Maryland College Park MD  USA May 

Johannet A Johannet L PersonnazG Dreyfus JD Gascuel andM Weinfeld Specicationand Implementation
of a Digital HopeldType AssociativeMemory with OnChip Training IEEE Transactions on Neural Networks
volume  number  pp  July 

Judd S Judd and P W Munro Nets with Unreliable Hidden Nodes Learn ErrorCorrecting Codes Advances in
Neural Information Processing Systems  NIPS vol  pp  Morgan Kaufmann San Mateo CA 

Kerlirzin P Kerlirzin and P Refregier Theoretical Investigation of the Robustness of Multilayer Perceptrons
Analysis of the Linear Case and Extension to Nonlinear Networks IEEE Transactions on Neural Networks vol
 no  pp  May 

Kohonen T Kohonen SelfOrganization and Associative Memory rd edition Springer Verlag Berlin 

Kohonen T Kohonen Things You Havent Heard about the SelfOrganizing Map Proceedings of the 	 IEEE
International Conference on Neural Networks vol  pp  San Francisco California March April
  ISBN 

Leong P H W Leong and M A Jabri A LowPower VLSI Arrhythmia Classier IEEE Transactions on Neural
Networks vol  no  pp  November 

Lont J Lont and W Guggenbuhl Analog CMOS Implementation of a Multilayer Perceptron with Nonlinear
Synapses IEEE Transactions on Neural Networks vol  no  pp  May 

Lyon R F Lyon and L S Yaeger OnLine HandPrinting Recognition with Neural Networks Proceedings of
the Fifth International Conference on Microelectronics for Neural Networks and Fuzzy Systems pp 
Lausanne Switzerland February  

Marchesi M Marchesi G Orlandi F Piazza and A Uncini Fast Neural Networks Without Multipliers IEEE
Transactions on Neural Networks vol  no  pp  January 

Mauduit N Mauduit M Duranton J Gobert and JA Sirat Lneuro  A Piece of Hardware LEGO for
Building Neural Network Systems IEEE Transactions on Neural Networks volume  number  pages 
May 

Moerland P Moerland E Fiesler and I Saxena The Eects of Optical Thresholding in Backpropagation Neural
Networks Proceedings of the International Conference on Articial Neural Networks  ICANN vol  pp
 Paris France October  

Moerland P Moerland and E Fiesler HardwareFriendlyLearningAlgorithms for Neural Networks An Overview
Proceedings of the Fifth International Conference on Microelectronics for Neural Networks and Fuzzy Systems
pp  Lausanne Switzerland February  

Moerland P Moerland E Fiesler and I Saxena Discrete AllPositive Multilayer Perceptrons for Optical Imple
mentations IDIAPRR  IDIAP Martigny Switzerland February  accepted for publication in Optical
Engineering

Moreno J M Moreno Arostegui VLSI Architectures for Evolutive Neural Models PhD Thesis Technical Uni
versity of Catalunya Department of Electronics Engineering Barcelona Spain 

Murray A F Murray and P J Edwards EnhancedMLP Performanceand Fault ToleranceResulting from Synaptic
Weight Noise During Training IEEE Transactions on Neural Networks vol  no  pp  September


Neiberg L Neiberg and D Casasent HighCapacity Neural Networks on Nonideal Hardware Applied Optics vol
 no  pp  

Palmieri F Palmieri J Zhu and C Chang AntiHebbian Learning in TopologicallyConstrained Linear Networks
A Tutorial IEEE Transactions on Neural Networks vol  no  pp  September 

Peterson C Peterson S Redeld J D Keeler and E Hartman An Optoelectronic Architecture for Multilayer
Learning in a Single Photorefractive Crystal Neural Computation vol  pp  

Piche S W Piche The Selection of Weight Accuracies for Madalines IEEE Transactions on Neural Networks
vol  no  p  March 

Protzel P W Protzel D L Palumbo and M K Arras Performance and FaultTolerance of Neural Networks for
Optimization IEEE Transactions on Neural Networks vol  no  pp  July 
IDIAPRR   

Psaltis D Psaltis and Y Qiao AdaptiveMultilayerOptical Networks InProgress in Optics editor E Wolf vol 
chapter  pp   Elsevier Science Publishers Amsterdam The Netherlands ISBN 

Pujol H Pujol O Klein E Belhaire and P GardaRA An AnalogNeurocomputer for the SynchronousBoltzmann
MachineProceedings of the Fourth International Conference on Microelectronics for Neural Networks and Fuzzy
Systems pp  Turin Italy September   ISBN 

Reyneri L M Reyneri and E Filippi An Analysis on the Performance of Silicon Implementations of Backpropaga
tion Algorithms for Articial Neural Networks IEEE Transactions on Computers vol  no  pp 
December 

Reyneri L M Reyneri A Performance Analysis of Pulse Stream Neural and Fuzzy Computing Systems IEEE
Transactions on Circuits and SystemsII Analog and Digital Signal Processing vol  no  pp 
October 

Robinson M G Robinson and K M Johnson Noise Analysis of PolarizationBased Optoelectronic Connectionist
Machines Applied Optics vol  no  pp  January 

Rueping S Rueping K Goser and U Rueckert A Chip for SelforganizingFeatureMaps Proceedings of the Fourth
International Conference on Microelectronics for Neural Networks and Fuzzy Systems pp  Turin Italy
September   ISBN 

Rumelhart D Rumelhart G Hinton and R Williams Learning Internal Representations by Error Propagation
Parallel Distributed Processing Explorations in the Microstructure of Cognition vol  Foundations pp 
 MIT Press Cambridge Massachusetts  ISBN 

Sackinger E Sackinger BE Boser J Bromley Y LeCun and L D Jackel Application of the ANNA Neural
Network Chip to HighSpeed Character Recognition IEEE Transactions on Neural Networks vol  no  pp
 May 

Sakaue S Sakaue T Kohda H Yamamoto S Maruno and Y Shimeki Reduction of Required Precision Bits for
Backpropagation Applied to Pattern Recognition IEEE Transactions on Neural Networks vol  no  March


Saxena I Saxena and E Fiesler Adaptive Multilayer Optical Neural Network with Optical Thresholding Optical
Engineering ISSN  special on optics in Switzerland P Rastogi editor vol   no  pp 
August 

Simard P Y Simard and H P Graf Backpropagation without Multiplication Advances in Neural Information
Processing Systems  NIPS	 J D Cowan G Tesauro and J Alspector eds vol  pp  Morgan
Kaufman San Mateo CA 



Smieja F J

Smieja Neural Network Constructive Algorithms Trading Generalization for Learning Eciency 
Circuits Systems and Signal Processing vol  no  pp  

Takahashi M Takahashi M Oita S Tai K Kojima and K Kyuma A Quantized Back Propagation Learning
Rule and its Application to Optical Neural Networks Optical Computing and Processing The Science and
Technology of Optics in Computing Communications Switching and Information Processing vol  no  pp
 

Tang C Z Tang and H K Kwan Multilayer Feedforward Neural Networks with Single PowerofTwo Weights
IEEE Transactions on Signal Processing vol  no  pp  August 

Thimm G Thimm P Moerland and E Fiesler The Interchangeability of Learning Rate and Gain in Backpropaga
tion Neural Networks Neural Computation vol  no  pp  February 

Thiran  P Thiran V Peiris P Heim and B Hochet Quantization Eects in Digitally Behaving Circuit Im
plementations of Kohonen Networks IEEE Transactions on Neural Networks vol  no  pp  May


Unnikrishnan K P Unnikrishnan and K P Venugopal Alopex A CorrelationBased Learning Algorithm for
Feedforward and Recurrent Neural Networks Neural Computation vol  no  pp  May 

Vassilas N Vassilas P Thiran and P Ienne How to Modify Kohonens SelfOrganizing Feature Maps for An E
cient Digital Parallel ImplementationProceeding of the International Conference on Articial Neural Networks
Cambridge June  

Venugopal K P Venugopal and A S Pandya Alopex Algorithm for Training Multilayer Neural Networks Pro
ceedings of the International Joint Conference on Neural Networks  IJCNN Singapore November 

Verleysen M Verleysen B Sirletti A Vandemeulebroecke and P G A Jespers A HighStorageCapacity Content
Addressable Memory and Its Learning Algorithm IEEE Transactions on Circuits and Systems vol  no 
pp  May 

White B A White and M I Elmasry The DigiNeocognitron A Digital Neocognitron Neural Network Model for
VLSI IEEE Transactions on Neural Networks vol  no  pp  January 

Widrow B Widrow and M A Lehr  Years of Adaptive Neural Networks Perceptron Madaline and Back
propagation Proceedings of the IEEE vol  no   September 

Xie Y Xie and M A Jabri Training Limited Precision Feedforward Neural Networks Proceedings of the 	rd
Australian Conference on Neural Networks pp  
