Optimal architectures for designing multiplierless multilayer feedforward neural networks suitable for various combinations of discrete/continuous input to discrete/continuous output mappings are introduced. For illustration, the details of one design for discrete input to discrete output mapping and one design for continuous input to continuous output mapping are presented. For discrete input to discrete output mapping, the network has continuous weights in the first layer and one-powers-of-two weights at the second and higher layers. The simplified sigmoid activation functions are used at the hidden neurons and the step activation functions are used at the output neurons. Consequently, no multiplier is needed for discrete input to discrete output mapping. For continuous input to continuous output mapping, the network adopts the simplified sigmoid activation functions at the output neurons, continuous weights in the second and higher layers, 3-level activation functions (of ±1 and 0 outputs, Quantized Neurons with S=0) at the hidden neurons, and one-powers-of-two weights in the input layer. Consequently, no multiplier is required for continuous input to continuous output mapping. Results show that each of these multiplierless multilayer neural networks can retain nearly identical recall performance of the corresponding continuous-weight network, while having reduced hardware cost and increased computational speed in digital implementation.
Introduction
Multilayer feedforward neural networks (MFNNs) are widely used for building artificial neural network systems. An MFNN has a parallel structure that involves massive computational operations. Its basic operations are taking the sum of weighted inputs and then computing the activation through a nonlinear function. Nonlinear activation function computation and multiplication are the two most expensive items in the digital implementation of an MFNN.
The most popular nonlinear activation function used in MFNNs is the sigmoid activation function (SAF). In [1], it has been shown that an SAF can be implemented by a simplified sigmoid activation function (SSAF) with one binary multiplication, one binary shift, one binary addition, and one comparator. This provides a simple way to implement the SAF digitally, especially when the sum of weighted inputs (one of the binary multiplication inputs) can be adequately represented in a short binary word length format.
Let M and N be, respectively, the word lengths in bits for representing the weight and the activation function output. An MFNN contains many multiplications, and each M-bit × N-bit multiplication has a computational complexity proportional to M×N. The two inputs to a multiplier inside an MFNN are an M-bit weight and an N-bit activation function output. To simplify the M×N multiplication, we can reduce the word length requirement of either the weight or the activation function output by adopting a reduced word length representation format. This idea leads to two approaches for designing digital multiplierless multilayer feedforward neural networks (MMFNNs) in which multiplications are eliminated. Before describing them, we consider the one-powers-of-two (OPOT) digital format, in which a value is restricted to one of the discrete values $\{\pm 1, \pm 2^{-1}, \ldots, \pm 2^{-S}, 0\}$. The OPOT format is especially suitable for reduced word length digital representation because a multiplication reduces to a binary shift that can be implemented directly by a shift register.
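To make the shift-for-multiply idea concrete, the following Python sketch multiplies a fixed-point integer by an OPOT weight using only a right shift and a sign change. The (sign, shift) encoding of the weight and the function name `opot_multiply` are illustrative assumptions, not part of the original design.

```python
def opot_multiply(x_fixed: int, sign: int, shift: int) -> int:
    """Return x_fixed * (sign * 2**-shift) using only a shift and a negation.

    x_fixed is a fixed-point value stored as an integer; sign is -1, 0, or +1;
    shift is the exponent s in 2**-s (0 <= s <= S). Hypothetical encoding.
    """
    if sign == 0:
        return 0                      # a zero weight contributes nothing
    shifted = x_fixed >> shift        # arithmetic right shift divides by 2**shift
    return shifted if sign > 0 else -shifted


# 96 * 0.25 and 96 * (-0.125), computed without a multiplier
print(opot_multiply(96, +1, 2))   # 24
print(opot_multiply(96, -1, 3))   # -12
```

Note that Python's `>>` on a negative integer rounds toward minus infinity, which is one possible rounding convention; a hardware design would fix its own.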
The first approach to eliminating multiplications in an MFNN is to represent each weight in OPOT format so that no multiplication is needed. We shall call this the OPOT approach. OPOT weight designs have been presented in [2] [3] [4] [5] for discrete (represented in binary [0, 1] or bipolar [-1, +1]) input to discrete output mapping applications. In [2] [3], OPOT weights are used to quantize continuous weights; the OPOT weights and the slopes of the activation functions are then adjusted adaptively to reduce the sum of squared output errors to a specified limit for discrete input to discrete output mapping. In [4] [5], in addition to OPOT weights, SSAFs are used at the outputs of hidden and output neurons; the OPOT weights and biases are then adjusted to compensate for the weights' quantization errors. Adapting biases [4] [5] is a much simpler implementation strategy than adapting the slopes of SAFs [2] [3].
The second approach to eliminating multiplications in an MFNN is to represent each activation function by a quantized activation function with OPOT output. We shall call a neuron with such an OPOT quantized activation function a quantized neuron (QN) [6] [7], and we shall call this the QN approach. In so doing, a multiplier is once again reduced to a simple shift register. In [6] [7], a method has been presented for designing MMFNNs using QNs at the hidden layer and continuous weights for discrete input to discrete output mapping.
In real-world applications, the input and output formats can be continuous (represented in fixed-point or floating-point) or discrete (represented in binary or bipolar). By utilizing the properties of the given input to output format of an application, digital arithmetic can be designed in an optimal manner so as to reduce the cost of hardware implementation and to increase the speed of computation. The previous two approaches focus on multiplierless designs that may not yet fully utilize the properties of a given input to output format. In this chapter, we describe a third approach, which makes use of all possible combinations of OPOT weights, QNs, and a given input to output format to arrive at an optimal architecture for any discrete/continuous input and discrete/continuous output mapping application. We shall call this approach the Mixed OPOT or Mixed QN approach. Individual design details of the third approach can be found in [8] [9] [10] [11]. In [8], an optimal design for MMFNNs tailored to discrete input to discrete output applications has been advanced; it is described in Section 13.5. Instead of adopting continuous weights in the first layer and OPOT weights in the second layer as in [8], the study in [9] adopts uniformly quantized sum-of-powers-of-two (SOPT) weights in both layers, with more choices of uniformly quantized SOPT weights given in the first layer. In [10] [11], an optimal MMFNN architecture is introduced for continuous input to continuous output mapping; its details are described in Section 13.6. In this chapter, our focus is on optimal MMFNN architectures that make use of a specified input to output mapping format and in which one of the two inputs to be multiplied is represented in OPOT format.
There are a number of other design techniques for reducing the hardware complexity in the digital implementation of artificial neural networks. A design of recurrent neural networks with [-1, 0, +1] weights, which is a special case of OPOT weights with S=0 (see (13.16)), for word sequence learning is described in [12] [13]. A multiplierless one-layer feedforward neural network for maximum or minimum computation is described in [14]. In [15], weight values of multilayer perceptrons are restricted to powers-of-two or sums of powers-of-two, thus saving chip area and computation time. The design was applied to pattern recognition problems with a binary input to binary output format. In [16] [17], integer weights in the set [-3, -2, -1, 0, 1, 2, 3] are used in multilayer feedforward neural networks for applications to XOR, encoder/decoder, and MONK types of binary input to binary output problems. The results obtained in [17] also suggest that, in many cases, limited weight resolution can be offset by an increase in the size of the hidden layer. In [18] [19], hard-limiter activation functions, integer weights, and integer thresholds are used to facilitate the hardware implementation of multilayer binary neural networks. The design was applied to the circular region, a 4-bit odd-parity function, and a 7-bit random function as binary input to binary output problems. In [20], the VLSI implementation of a multilayer neural network with ternary activation functions and limited integer weights is described, and multiplexing is used to overcome the I/O limitation. The design was applied to a handwritten digit recognition problem. In [21], multiplierless designs for the Hopfield network and the multilayer perceptron are presented. In these designs, one to four shifts and zero to three additions are used to replace a multiplication. In addition, a signed-digit representation of weights is adopted, which leads to a minimal number of non-zero digits needed for the operations. The multiplierless Hopfield network design was applied to estimating motion between successive frames of digital video, and the multiplierless multilayer perceptron design was applied to a binary input to binary output mapping problem.
This chapter is organized as follows. The backpropagation-based adaptive algorithms for the weights and for the biases of an MFNN are summarized in Section 13.2. The equations defining the SSAF and the three-level activation function (3-LAF) are given in Section 13.3. Section 13.4 summarizes the optimal MMFNN architectures for various combinations of discrete/continuous input to discrete/continuous output mappings. An optimal MMFNN architecture for discrete input to discrete output mapping is presented in Section 13.5, together with the design strategy, the design algorithm, the simulation results, and the hardware comparison of the design. In Section 13.6, the design of an optimal MMFNN architecture for continuous input to continuous output mapping is presented, again with the design strategy, the design algorithm, the simulation results, and the hardware comparison. Finally, a conclusion is given in Section 13.7.
Weight and Bias Adaptation

A. Weight Adaptation
In an MFNN, the activation of the $j$th neuron at the $h$th layer during the presentation of the $k$th input pattern $X_k = [x_{1k}, x_{2k}, x_{3k}, \ldots, x_{N_0 k}]$ is computed by applying the activation function $F[\cdot]$ to the sum of the weighted outputs of layer $h-1$ plus the bias of the neuron, for $j = 1$ to $N_h$, $h = 1$ to $H$, and $k = 1$ to $K$. Here $y_{ik}^{[h-1]}$ is the output of neuron $i$ at layer $h-1$; $w_{ij}^{[h]}$ is the weight between neuron $i$ at layer $h-1$ and neuron $j$ at layer $h$; and $N_h$ is the number of neurons at layer $h$.
The sum of the squared output errors (SSE) related to the input pattern k is defined as:
where $t_{jk}$ represents the $j$th element of the target pattern $k$ and $h = H$ refers to the output layer. Based on gradient descent, the change in weights [22] [23] can be expressed as:
where ε is a learning rate parameter for weights. After some derivations, we obtain 
B. Bias Adaptation
Similarly, based on gradient descent, the bias of a neuron can also be adapted [22] [23] as
where $\varepsilon_b$ is a step size for bias adaptation. We can show that
Both weights and biases are updated at the end of each epoch.
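For reference, the relations of this section take the following standard backpropagation form when written in the notation above. This is a sketch under stated assumptions rather than a reproduction of the original equations: the bias symbol $\theta_j^{[h]}$, the symbols $z_{jk}^{[h]}$, $E_k$ and $\delta_{jk}^{[h]}$, the factor $1/2$, and the summation over all $K$ patterns (suggested by the end-of-epoch update) are assumed here.

```latex
\begin{align*}
z_{jk}^{[h]} &= \sum_{i=1}^{N_{h-1}} w_{ij}^{[h]}\, y_{ik}^{[h-1]} + \theta_j^{[h]}, &
y_{jk}^{[h]} &= F\big[z_{jk}^{[h]}\big], \qquad y_{ik}^{[0]} = x_{ik} \\
E_k &= \tfrac{1}{2} \sum_{j=1}^{N_H} \big(t_{jk} - y_{jk}^{[H]}\big)^2 \\
\Delta w_{ij}^{[h]} &= -\varepsilon \sum_{k=1}^{K} \frac{\partial E_k}{\partial w_{ij}^{[h]}}
  = \varepsilon \sum_{k=1}^{K} \delta_{jk}^{[h]}\, y_{ik}^{[h-1]} \\
\Delta \theta_j^{[h]} &= -\varepsilon_b \sum_{k=1}^{K} \frac{\partial E_k}{\partial \theta_j^{[h]}}
  = \varepsilon_b \sum_{k=1}^{K} \delta_{jk}^{[h]} \\
\delta_{jk}^{[H]} &= \big(t_{jk} - y_{jk}^{[H]}\big)\, F'\big[z_{jk}^{[H]}\big] \\
\delta_{jk}^{[h]} &= F'\big[z_{jk}^{[h]}\big] \sum_{m=1}^{N_{h+1}} \delta_{mk}^{[h+1]}\, w_{jm}^{[h+1]}, \qquad h < H
\end{align*}
```

Here $\delta_{jk}^{[h]}$ denotes the negative derivative of $E_k$ with respect to $z_{jk}^{[h]}$, which is why both updates carry a positive sign.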
Activation Functions
A. Simplified Sigmoid Activation Function
A type of activation function commonly used in MFNNs is the SAF in which its bipolar version is of the following form
where g is a scaling factor. The above expression involves an infinite exponential series and is traditionally implemented by a look-up table (LUT). A simplified version of the above sigmoid activation function, with an approximate order of complexity of one binary multiplication, is described in [1]; it provides an attractive method for the digital implementation of an SAF. This SSAF, which saturates to ±1 at and beyond the points ±L, can be expressed as
$$G_s(z) = \begin{cases} 1, & z \ge L \\ H_s(z), & -L < z < L \\ -1, & z \le -L \end{cases}$$
where
$$H_s(z) = \begin{cases} z(\beta - \Theta z), & 0 \le z < L \\ z(\beta + \Theta z), & -L < z < 0 \end{cases}$$
and $\Theta = 1/L^2$, $\beta = 2/L$, and $L$ is an arbitrary positive number determining the saturation points. If an OPOT value of $L$ is used, the activation function $G_s(z)$ defined above can be implemented by one binary multiplication, one binary shift, one binary addition, and one comparator.
To prevent the learning process from being stuck in either saturation region, a small positive value $\sigma$ is assigned to the derivative of $G_s(z)$ at and beyond $\pm L$, hence
$$G'_s(z) = \begin{cases} \sigma, & |z| \ge L \\ H'_s(z), & |z| < L \end{cases}$$
where $G'_s(z)$ and $H'_s(z)$ represent, respectively, the derivatives of $G_s(z)$ and $H_s(z)$ with respect to $z$.
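A minimal Python sketch of the SSAF and its training derivative follows. It assumes the second-order form $z(\beta - \Theta|z|)$ inside $(-L, L)$, which agrees with the stated $\Theta = 1/L^2$, $\beta = 2/L$ and the saturation at $\pm L$. The function names and the default value of `sigma` are illustrative assumptions, and floating-point arithmetic stands in for the binary shift and add of a hardware design.

```python
def ssaf(z: float, L: float = 2.0) -> float:
    """Simplified sigmoid G_s(z): quadratic inside (-L, L), saturating at +/-1."""
    beta, theta = 2.0 / L, 1.0 / (L * L)
    if z >= L:
        return 1.0
    if z <= -L:
        return -1.0
    return z * (beta - theta * abs(z))    # H_s(z); abs() merges both sign branches


def ssaf_derivative(z: float, L: float = 2.0, sigma: float = 0.01) -> float:
    """Derivative used in training: sigma in saturation, H_s'(z) elsewhere."""
    beta, theta = 2.0 / L, 1.0 / (L * L)
    if abs(z) >= L:
        return sigma                      # small positive slope (assumed value)
    return beta - 2.0 * theta * abs(z)


for z in (-3.0, -1.0, 0.0, 1.0, 2.0):
    print(z, ssaf(z), ssaf_derivative(z))
```

With $L = 2$ the function passes through $(1, 0.75)$ and reaches exactly 1 at $z = 2$, where the true slope falls to zero and `sigma` takes over.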
B. Three-level Activation Function
The 3-LAF applied at the output of each hidden neuron is defined as
$$G_3(z) = \begin{cases} 1, & z > t_3 \\ 0, & -t_3 \le z \le t_3 \\ -1, & z < -t_3 \end{cases} \qquad (13.14)$$
where $t_3$ (= 0.33) is the threshold value. Before training, the three intersections of $G_3(z)$ with the scaled bipolar SAF (at $g = 1.1$ in (13.10)) were found. The derivatives of the scaled bipolar SAF at these three intersection points were used, respectively, as the approximate derivatives of $G_3(z)$ in the three regions of (13.14) during training.
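The 3-LAF can be sketched in a few lines of Python. The function name and the treatment of the boundary points $z = \pm t_3$ (mapped to 0 here) are assumptions made for illustration.

```python
def three_level(z: float, t3: float = 0.33) -> int:
    """3-level activation: +1 above t3, -1 below -t3, 0 in between."""
    if z > t3:
        return 1
    if z < -t3:
        return -1
    return 0


print([three_level(z) for z in (-1.2, -0.2, 0.0, 0.33, 0.5)])  # [-1, 0, 0, 0, 1]
```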
Optimal Architectures for MMFNN Designs
As briefly introduced in Section 13.1, there are a total of three approaches to arrive at optimal architectures for MMFNN designs. The first approach is the OPOT approach that adopts only OPOT weights (WOPOT) and SSAFs for continuous input to continuous output mapping. The second approach is the QN approach that adopts QNs (except StepAFs at the output layer) and continuous weights (WC) for discrete input to discrete output mapping. The third approach is the Mixed approach, which can be applied to various combinations of discrete/continuous input to discrete/continuous output mappings. The Mixed approach can be subdivided into the Mixed OPOT approach and the Mixed QN approach. In the Mixed OPOT approach, WC have to be used as weights in the input layer for discrete input, and StepAFs have to be used at the output layer for discrete output. In the Mixed QN approach, OPOT weights (WOPOT) have to be used in the input layer for continuous input, and SSAFs have to be used in the output layer for continuous output. 
For illustration, these optimal architectures are summarized in Tables  13-1 
A. Design Strategy
We shall illustrate the design strategy with a 2-layer feedforward neural network (2FNN) applied to the recognition of the numerals 0 to 9, with a bipolar (±1) input to bipolar (±1) output mapping format (see Fig. 13-2). For bipolar inputs, continuous-valued weights can be used in the first layer to allow maximum flexibility for adaptation without requiring any multiplication, since multiplying a weight by ±1 only keeps or inverts its sign.
To reduce the implementation cost of bipolar SAFs, bipolar SSAFs are used at the output of all the hidden neurons. The outputs of the hidden layer are represented in a continuous format. In order to eliminate multiplications, OPOT weights are used in the second layer such that a multiplication is reduced to just shifting of binary bits. To facilitate a smooth adaptation in learning, bipolar SAFs are used at the outputs of output neurons for learning. After learning, bipolar step activation functions (StepAFs) are used at the outputs of the output neurons to reduce the hardware complexity. Based on the given bipolar input to bipolar output format, the above design strategy optimizes the arithmetic operations such that no multiplication is required in the hardware implementation. Moreover, the inexpensive bipolar SSAFs and the simple bipolar StepAFs are used to replace the expensive bipolar SAFs at the outputs of the respective hidden and output layers. The concept of the design can be extended and generalized to other discrete/continuous input to discrete/continuous output formats, as well as to any M-layer feedforward neural networks with M greater than 2. 
B. Design Algorithm
The objective of the algorithm is to design an optimal multiplierless multilayer feedforward neural network (MMFNN) suitable for bipolar input to bipolar output mapping. The detailed design algorithm is given as follows:
Step 1: Prepare a set of random weights and zero biases, with bipolar SAFs at the output layer and bipolar SSAFs (with an OPOT L) at the hidden layer.
Step 2: Starting with the latest weights and zero biases, adaptively train the weights of the network using the backpropagation algorithm, without adjusting the biases, until the total sum of squared output errors (TSSE) over all $K$ patterns satisfies
$$\mathrm{TSSE} < E \qquad (13.15)$$
where $E$ is a prespecified error level. The network obtained at this point is denoted as Network 1.
Step 3: Find the maximum absolute value $w_{\max}^{[2]}$ among the weights in the output layer and normalize these weights by $w_{\max}^{[2]}$.
Step 4: Adjust the parameter $\alpha$ of the sigmoid activation functions applied at the output neurons to $\alpha w_{\max}^{[2]}$.
Step 5: Quantize all the weights in the output layer to their nearest OPOT values chosen from the following set:
$$\{\pm 1, \pm 2^{-1}, \ldots, \pm 2^{-S}, 0\} \qquad (13.16)$$
where $S$ determines the number of quantization levels. The quantization curve when $S = 4$ is depicted in Fig. 13-1.
Step 6: Calculate the TSSE. If (13.15) is not satisfied, proceed to Step 7; otherwise, go to Step 8.
Step 7: Adapt all continuous weights of the first layer and the biases of neurons at both layers using the backpropagation algorithm until either (13.15) is satisfied or convergence is reached in which no further improvement in TSSE can be obtained.
Step 8: Find the maximum absolute value $w_{\max}^{[1]}$ among the weights in the first layer, and set $w_{\max}^{[1]} = 2^{p^{[1]}}$, where $2^{p^{[1]}}$ is the smallest power of two greater than or equal to $w_{\max}^{[1]}$.
Step 9: Normalize the weights in the first layer by $2^{p^{[1]}}$ and set the parameters $\beta$ and $\Theta$ of the SSAFs applied at the hidden neurons to $2^{p^{[1]}} \beta$ and $2^{2p^{[1]}} \Theta$, respectively, so that they remain in OPOT format.
Step 10: Replace the SAFs at the output layer by the StepAFs. Then stop and denote the network obtained here as Network 2. 
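The weight-processing steps of the algorithm (Steps 3 to 5 and Steps 8 to 9) can be sketched in Python as follows. This is an illustration under assumptions, not the authors' implementation: the function names are hypothetical, weights are plain nested lists with at least one non-zero entry, "nearest" is taken in the linear sense, and the backpropagation passes of Steps 2 and 7 are omitted.

```python
import math


def quantize_opot(w: float, S: int) -> float:
    """Round w to the nearest value in {0, +/-1, +/-2**-1, ..., +/-2**-S}."""
    levels = [0.0] + [2.0 ** -s for s in range(S + 1)]
    magnitude = min(levels, key=lambda level: abs(abs(w) - level))
    return math.copysign(magnitude, w) if magnitude else 0.0


def prepare_output_layer(weights, alpha, S):
    """Steps 3-5: normalize by w_max, rescale alpha, quantize to OPOT values."""
    w_max = max(abs(w) for row in weights for w in row)
    quantized = [[quantize_opot(w / w_max, S) for w in row] for row in weights]
    return quantized, alpha * w_max


def prepare_first_layer(weights, L):
    """Steps 8-9: normalize by 2**p and rescale the SSAF parameters."""
    w_max = max(abs(w) for row in weights for w in row)
    p = math.ceil(math.log2(w_max))           # smallest p with 2**p >= w_max
    scale = 2.0 ** p
    normalized = [[w / scale for w in row] for row in weights]
    beta = scale * (2.0 / L)                  # beta  <- 2**p * beta
    theta = scale * scale * (1.0 / (L * L))   # Theta <- 2**(2p) * Theta
    return normalized, beta, theta


q, alpha = prepare_output_layer([[2.0, -0.6], [0.1, 1.4]], alpha=2.0, S=4)
print(q, alpha)   # [[1.0, -0.25], [0.0625, 0.5]] 4.0
```

Rescaling $\beta$ and $\Theta$ in this way compensates for the division of the first-layer weights by $2^{p^{[1]}}$, because $z(\beta - \Theta|z|)$ is unchanged when $z$ is divided by $2^{p}$ while $\beta$ and $\Theta$ are multiplied by $2^{p}$ and $2^{2p}$.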
C. Simulation Results
Simulations have been conducted to verify the proposed design algorithm. The input patterns used in training were the 10 numerals given in Appendix 13-1, each represented by a 10×10 bipolar binary code. The corresponding desired output patterns were the 4-bit bipolar binary codes given below each input pattern. The network had 100 inputs, 4 outputs, and one hidden layer with various numbers of neurons. After training, 100 noisy versions of each of the 10 input patterns, 1000 in total, were presented to test the recall accuracy of the network obtained. A noisy pattern was constructed by randomly inverting a specified percentage (5% in this simulation) of the elements of the original pattern. The recall accuracy was obtained by taking the average over all 1000 testing patterns. Simulation results are summarized in Tables 13-5 and 13-6. All data given in these tables are averages of five designs, starting with different initial random weights uniformly distributed in [-0.1, +0.1]. For the purpose of comparison, the results of the corresponding continuous-weight MFNN (CMFNN), which had the same topology but continuous weights and SAFs at both layers, were also obtained and included in these tables. The total number of epochs for the MMFNN is the sum of the epochs required to obtain both Network 1 and Network 2. To obtain faster convergence, ±0.9 instead of ±1 was used for the target patterns. The remaining parameters used for simulations were: $\varepsilon = 0.01$, $\varepsilon_b = 0.1$, $E = 0.01$, $\alpha = 2$, $L = 2$, σ=10
. Based on results obtained in Tables 13-5 and 13-6, we can see that convergence was always reached in the training of the MMFNN and there was only slight degradation in the recall performance of the MMFNN as compared to the CMFNN.
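The noisy-recall test described above can be sketched in Python as follows. The helper names, the fixed seed, and the `classify` callable (a stand-in for the trained network followed by a comparison with the target code) are assumptions for illustration only.

```python
import random


def make_noisy(pattern, fraction, rng):
    """Invert a fixed fraction of randomly chosen elements of a bipolar pattern."""
    n_flip = round(fraction * len(pattern))
    flip_at = set(rng.sample(range(len(pattern)), n_flip))
    return [-v if i in flip_at else v for i, v in enumerate(pattern)]


def recall_accuracy(classify, patterns, targets, fraction, trials, seed=0):
    """Fraction of noisy inputs whose classification matches the target."""
    rng = random.Random(seed)
    correct = 0
    for pattern, target in zip(patterns, targets):
        for _ in range(trials):
            correct += classify(make_noisy(pattern, fraction, rng)) == target
    return correct / (trials * len(patterns))


rng = random.Random(1)
clean = [1] * 100
noisy = make_noisy(clean, 0.05, rng)
print(sum(a != b for a, b in zip(clean, noisy)))   # 5
```

With 10 patterns and `trials=100`, `recall_accuracy` averages over 1000 noisy inputs, matching the test size used in the simulations.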
D. Hardware Comparison
A comparison of the digital hardware requirements of the CMFNN and the MMFNN, both with one hidden layer of $N_1$ hidden neurons, $N_0$ inputs, and $N_2$ outputs (with StepAFs at the outputs), is summarized in Table 13 
A. Design Strategy
We shall illustrate the design strategy with a 2FNN applied to continuous input to continuous output mapping. For continuous inputs, OPOT weights are used in the first layer so that no multiplication is required. To reduce the implementation cost of bipolar SAFs, 3-level activation functions (3-LAFs) (of ±1 and 0 outputs, a special case of QNs with S=0) are used at the output of all the hidden neurons. The outputs of the hidden layer are represented in the 3-level discrete format. To allow maximum flexibility for learning adaptation, continuous weights can be used in the output layer without requiring any multiplication, since each hidden output of ±1 or 0 only adds, subtracts, or skips the corresponding weight. Bipolar SSAFs are used at the outputs of output neurons. Based on the given continuous input to continuous output format, the above design strategy optimizes the arithmetic operations such that no expensive multiplication is required in the hardware implementation. The simple 3-LAFs and the inexpensive bipolar SSAFs are used to replace the expensive bipolar SAFs at the outputs of the respective hidden and output layers. The concept of the design can be extended and generalized to other discrete/continuous input to discrete/continuous output formats, as well as to any M-layer feedforward neural networks with M greater than 2.
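The following Python sketch illustrates why neither layer needs a multiplier under this strategy. It works on fixed-point integers and is only an illustration: the (sign, shift) weight encoding, the function names, and the sample values are assumptions, and biases and the output SSAF are left out for brevity.

```python
def hidden_sum(x_fixed, opot_weights):
    """Weighted sum for one hidden neuron: continuous inputs times OPOT weights.

    Each weight is a (sign, shift) pair standing for sign * 2**-shift, so every
    product is a right shift followed by an add or a subtract.
    """
    total = 0
    for x, (sign, shift) in zip(x_fixed, opot_weights):
        if sign > 0:
            total += x >> shift
        elif sign < 0:
            total -= x >> shift
    return total


def three_level(z, t3):
    """3-LAF: +1 above t3, -1 below -t3, otherwise 0."""
    return 1 if z > t3 else (-1 if z < -t3 else 0)


def output_sum(hidden, weights_fixed):
    """Weighted sum for one output neuron: 3-level inputs times continuous weights."""
    total = 0
    for h, w in zip(hidden, weights_fixed):
        if h > 0:
            total += w        # +1 input: add the weight
        elif h < 0:
            total -= w        # -1 input: subtract the weight
    return total              #  0 input: the weight is skipped


z = hidden_sum([64, -32, 16], [(+1, 1), (-1, 0), (+1, 2)])
print(z, three_level(z, 21), output_sum([1, 0, -1], [10, 20, 30]))   # 68 1 -20
```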
B. Design Algorithm
Step 1: Prepare a set of random weights and zero biases.
Step 2: Starting with the latest weights and zero biases, train the network using the backpropagation algorithm with the SSAFs at the output neurons and the 3-LAFs at the hidden neurons. Update the weights without adjusting biases until the TSSE becomes less than a prespecified error level E. The obtained network is denoted as Network 1.
Step 3: Find the maximum absolute value $w_{\max}$ among all the weights in the first layer and normalize these weights by $w_{\max}$.
Step 4: Scale the threshold value $t_3$ of all the 3-LAFs applied to hidden neurons to $t_3 / w_{\max}$.
Step 5: Quantize the normalized weights in the first layer to their nearest OPOT values from the set of {±1, ±2
Step 7: Re-adapt all continuous weights in the second layer using the backpropagation algorithm.
Step 8: Adapt the biases of neurons at both layers.
Step 9: Go to Step 6.
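Steps 3 and 4 scale the first-layer weights and the 3-LAF threshold by the same factor, so with zero biases the hidden outputs are unchanged before quantization. The short Python check below illustrates this; the function names and sample values are hypothetical, and ordinary floating-point products are used only to verify the invariance, not to model the multiplierless hardware.

```python
def three_level(z, t3):
    """3-LAF: +1 above t3, -1 below -t3, otherwise 0."""
    return 1 if z > t3 else (-1 if z < -t3 else 0)


def hidden_outputs(inputs, weights, t3):
    """3-LAF outputs of the hidden neurons with zero biases; weights[j] feeds neuron j."""
    return [three_level(sum(w * x for w, x in zip(row, inputs)), t3)
            for row in weights]


def normalize_first_layer(weights, t3):
    """Steps 3-4: divide the weights and the threshold by the same w_max."""
    w_max = max(abs(w) for row in weights for w in row)
    scaled = [[w / w_max for w in row] for row in weights]
    return scaled, t3 / w_max


weights = [[0.8, -2.0, 0.4], [0.1, 0.2, -0.1], [-1.5, 0.5, 0.3]]
inputs = [0.9, -0.4, 0.7]
scaled, t3_scaled = normalize_first_layer(weights, 0.33)
print(hidden_outputs(inputs, weights, 0.33))        # [1, 0, -1]
print(hidden_outputs(inputs, scaled, t3_scaled))    # [1, 0, -1]
```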
C. Simulation Results
Simulation results are summarized in Tables 13-8 and 13-9. A set of continuous input vectors and a set of continuous output vectors were used for training and recall. Each vector set consists of 10 vectors and each vector consists of 25 continuous real elements, which are generated by using a method described in [24]. These ten continuous input to continuous output pattern-pairs are plotted in Appendix 13-3. The network was used as a pattern associator, which had 25 inputs, 25 outputs, and one hidden layer with a variable number of neurons. After training, 100 noisy versions of each of the 10 input vectors were presented to the network to test the recall performance. The noisy vectors were constructed by adding random noise within the interval of ±R to each element of each input vector, where R represents a percentage of the maximum element value among all the 10 input vectors. In the simulations presented here, R was 10% or 20%. The output vector was identified based on its cross correlations with all ideal output vectors: the ideal output vector with the maximum cross correlation was selected as the recall vector. For comparison, the simulation results of the corresponding CMFNN, which had the same topology but continuous weights and bipolar SAFs at both layers, were also obtained. The values summarized in Tables 13-8 and 13-9 represent the average of five designs, starting with different initial random weights uniformly distributed within ±0.1. The learning rate parameter for weights was $\varepsilon = 0.01$, the step size for bias adjustment was $\varepsilon_b = 0.01$, and the other parameters used were $\delta = 0.01$, $\alpha = 2$, $D = 2$, $M = 4$, and $E_0 = 10^{-6}$. It can be seen that the proposed MFNNs with SSAFs, 3-LAFs, and OPOT weights have recall performance similar to that of the original MFNNs with SAFs and continuous weights, at the cost of additional training epochs.
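The noise model and the recall rule described above can be sketched in Python as follows. Two assumptions are made for illustration: "cross correlation" is taken as the zero-lag, unnormalized inner product, and the noise is uniform within the stated interval. The function names and the small example vectors are hypothetical, and the noisy vector in the example stands in for a network output.

```python
import random


def add_noise(vector, R, max_value, rng):
    """Add uniform noise within +/- R * max_value to every element."""
    bound = R * max_value
    return [v + rng.uniform(-bound, bound) for v in vector]


def recall(output, ideal_outputs):
    """Index of the ideal output vector with the largest cross correlation."""
    scores = [sum(o * t for o, t in zip(output, ideal)) for ideal in ideal_outputs]
    return max(range(len(scores)), key=scores.__getitem__)


ideals = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0], [-1.0, 0.5, 1.0]]
rng = random.Random(0)
noisy = add_noise(ideals[2], R=0.10, max_value=1.0, rng=rng)
print(recall(noisy, ideals))   # 2
```

A normalized correlation would be an equally plausible reading when the ideal vectors differ strongly in magnitude.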
All simulations were written in Fortran and run on a 486-33MHz PC: (a) when the number of hidden neurons was 10, the average training times of the CMFNN, Network 1, and Network 2 were, respectively, 0.211, 0.215, and 0.172 CPU seconds per epoch, and the average recall times for 1000 noisy input vectors of the CMFNN and the MMFNN were, respectively, 5.54 and 3.35 CPU seconds; and (b) when the number of hidden neurons was 30, the corresponding average training times were, respectively, 0.431, 0.479, and 0.334 CPU seconds per epoch, and the corresponding average recall times were, respectively, 12.20 and 5.98 CPU seconds.
D. Hardware Comparison
A comparison of the digital hardware requirements of the CMFNN and the MMFNN, both with one hidden layer of M hidden neurons, N inputs, and N output neurons, is summarized in Table 13-10. The most expensive hardware items are the multipliers: the CMFNN requires a total of 2×N×M multipliers, whereas the MMFNN requires only N×M OPOT shift registers. Assuming a b-bit implementation (where one multiplier is approximately equivalent to b adders), the number of hidden neurons in the MMFNN can be up to about 2b times that of the CMFNN for a similar level of hardware cost. The hardware cost of the N SSAFs [1] and the M 3-LAFs is expected to be lower than that of N+M SAFs. In Table 13 
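As a small worked example of the counts quoted above, the following Python sketch tabulates the multiplier-related hardware for a given network size. The function name and the choice of b = 16 bits are assumptions; the formulas are the ones stated in the text (2×N×M multipliers, one multiplier taken as roughly b adders, and N×M OPOT shift registers).

```python
def multiplier_hardware(n_io: int, m_hidden: int, b_bits: int):
    """Return (CMFNN multipliers, their adder equivalent, MMFNN shift registers)."""
    cmfnn_multipliers = 2 * n_io * m_hidden        # both layers need multipliers
    adder_equivalent = cmfnn_multipliers * b_bits  # one multiplier ~ b adders
    mmfnn_shift_registers = n_io * m_hidden        # first layer only
    return cmfnn_multipliers, adder_equivalent, mmfnn_shift_registers


# 25 inputs and outputs, 10 hidden neurons, assumed 16-bit words
print(multiplier_hardware(25, 10, 16))   # (500, 8000, 250)
```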
Concluding Remarks
Optimal MMFNN architectures suitable for discrete/continuous input to discrete/continuous output mappings, as illustrated by the above two cases of multiplierless 2-layer feedforward neural networks, have been presented. The design strategies utilize a combination of OPOT shift operations to replace multiplications, SSAFs to replace SAFs, as well as the specified input to output format to arrive at optimal design architectures. A designed MMFNN can retain nearly identical recall performance of the corresponding MFNN with continuous weights and SAFs, while having increased computational speed in applications and reduced cost in digital hardware implementation. In general, the same principle of optimal MMFNN architectures illustrated can also be applied to other types of artificial neural networks. Some systolic implementation methods for neural networks can be found in [25] [26] [27] .
