Abstract: An algorithm for compact neural-network hardware implementation is presented, which exploits the special properties of the Boolean functions describing the operation of artificial neurons with step activation function. The algorithm contains three steps: artificial-neural-network (ANN) mathematical model digitization, conversion of the digitized model into a logic-gate structure, and hardware optimization by elimination of redundant logic gates. A set of C++ programs automates algorithm implementation, generating optimized very high speed integrated circuit hardware description language (VHDL) code. This strategy bridges the gap between ANN design software and hardware design packages (Xilinx). Although the method is directly applicable only to neurons with step activation functions, it can be extended to sigmoidal functions.
I. INTRODUCTION
According to a European Network of Excellence report [1], the future implementation of hardware neural networks is shaped in three ways: 1) by developing advanced techniques for mapping neural networks onto field-programmable gate arrays (FPGAs); 2) by developing innovative learning algorithms which are hardware realizable [2]; and 3) by defining high-level descriptions of the neural algorithms in an industry standard to allow full simulations and fabrication and to produce demonstrators of the technology for industry. Such new designs will be of use to industry if the cost of adopting them is sufficiently low. Hardware-based neural networks are important to industry as they offer low power consumption and small size compared with PC software and they can be embedded in a wide range of systems. Software libraries exist for traditional artificial-neural-network (ANN) models (Matlab). However, the industry-standard form is very high speed integrated circuit hardware description language (VHDL) or C++ parameterized modular code, allowing customization.
A range of research papers on ANN-based controllers were published over the last decade [3], [4]. Some recent publications [5]-[8] consider the FPGA as an effective implementation solution of control algorithms for industrial applications. Hardware-implemented ANNs have an important advantage over computer-simulated ones by fully exploiting the parallel operation of the neurons, thereby achieving high speed of information processing [9]. Some very large scale integration algorithms achieve efficient implementation by using a combination of AND gates, OR gates, and threshold gates (TGs) [10].
The algorithm presented in this letter is applicable to both application-specific-integrated-circuit and FPGA implementations of ANNs composed of neurons with step activation functions [10]. Each neuron is treated as a Boolean function, and it is implemented separately, thus minimizing implementation complexity. The most useful property of such a Boolean function is that, if its truth table is constructed as a matrix with as many dimensions as neuron inputs, then the truth table has only one large group of "1s" and one large group of "0s." The solid group of 1s is not visible when the Gray codification is used, and thus, classical Quine-McCluskey algorithms or Karnaugh maps cannot be efficiently used. Our algorithm uses a different approach and generates a multilayer pyramidal hardware structure, where layers of AND gates alternate with layers of OR gates. The bottom layer consists of incomplete NOT gates, a structure to be optimized later by eliminating redundant logic-gate groups. However, the method is effective only when the numbers of inputs and bits on each input are low; otherwise, a classical circuit may be more efficient.
II. IMPLEMENTATION ALGORITHM
Each neuron of the ANN is first converted into a binary equivalent neuron whose inputs are only 1 and 0, in a two-step process. Subsequently, the binary neuron model is iteratively transformed into a logic-gate structure.
A. Digitization of One Neuron Mathematical Model
The binary codification used for neuron inputs is the "two's complement," which is generally used to represent integers, but it can be adapted for real values in the interval [−1, +1). Thus, considering an n-bit
The largest positive number that can be represented on $n$ bits is $2^{n-1} - 1$, and $-2^{n-1}$ is the smallest. Real values between −1.0 and +1.0 can be represented by dividing the corresponding integer value $I_n$ by $2^{n-1}$. Thus, (2) illustrates the complementary code for real numbers
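The scaling described above can be sketched in a few lines of Python. This is a minimal illustration of the standard two's-complement code applied to the interval [−1, +1), not the authors' C++ implementation; the function names `encode` and `decode` and the LSB-first bit order are assumptions made for the example.

```python
def encode(value: float, n: int) -> list[int]:
    """Return the n two's-complement bits (LSB first) of a real value in [-1, +1)."""
    scale = 1 << (n - 1)                # 2^(n-1)
    integer = round(value * scale)      # corresponding integer value I_n
    if not -scale <= integer <= scale - 1:
        raise ValueError("value outside the representable range")
    # Python's >> on negative integers is arithmetic, so masking with 1
    # yields the two's-complement bit pattern directly.
    return [(integer >> p) & 1 for p in range(n)]


def decode(bits: list[int]) -> float:
    """Inverse of encode: the sign bit (last element) has weight -2^(n-1)."""
    n = len(bits)
    integer = sum(b << p for p, b in enumerate(bits[:-1])) - (bits[-1] << (n - 1))
    return integer / (1 << (n - 1))
```

For example, with n = 4 the value −0.5 corresponds to the integer −4, and 0.875 to the largest positive integer 7.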
The analog neuron model is transformed, in two steps, into an appropriate digital model. At each stage, the input weights and the threshold levels of the initial NN are altered carefully, keeping the neuron functionality. This can be achieved by keeping constant the sign of the argument of the activation function
0278-0046/$26.00 © 2010 IEEE
However, for mathematical simplicity, a more restrictive condition is used instead: the argument "net − t" of the activation function is itself kept constant, rather than only its sign
Conversion Stage One: The first step transforms the analog inputs of the neurons into digital inputs expressed as groups of $n_b$ bits. This process is associated with transforming each analog neuron input into an equivalent group of $n_b$ binary inputs. The task is achieved by splitting each input defined by its initial weight $w_{ij}$ into $n_b$ subinputs, whose weights $w_{ijp}^{(1)}$ ($p = 0, 1, \ldots, n_b - 1$) are calculated as follows [11]:
The superscripts "(1)" and "(2)" refer to the respective conversion stage. The initial $m$ inputs are turned into $m$ input clusters, each containing $n_b$ subinputs (Fig. 1). The symbol $w_{ij}$ stands for the weight number $j$ of the neuron $i$ in the network, while $w_{ijp}^{(1)}$ represents the weight of subinput $p$ in cluster $j$ pertaining to neuron $i$. According to the previous considerations, only those neuron parameter changes that maintain the argument $net_i - t_i$ of the activation function constant are allowed. The argument after the first conversion stage is calculated as
where x 
The expression in brackets relates to the complementary code definition given in (2). Then, (6) becomes
where $x_j$ is an analog input value of the initial neuron. This meets the condition expressed by (4). Thus, the codification style based on complementary code has been introduced, and the required parameter modifications have been performed, without changing the neuron's behavior.
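The first conversion stage can be checked numerically. The sketch below assumes the subinput weights that follow directly from the two's-complement definition, namely $w_{ij} \cdot 2^{p-(n_b-1)}$ for the magnitude bits and $-w_{ij}$ for the sign bit, with the threshold left unchanged; this form is derived here for illustration rather than quoted from the paper. The names `split_weight`, `decode`, and `stage_one_preserves_net` are hypothetical, and exact rational arithmetic is used so the comparison is free of rounding error.

```python
from fractions import Fraction
from itertools import product


def split_weight(w: Fraction, n_b: int) -> list[Fraction]:
    """Subinput weights (LSB first) for one analog weight w (assumed form)."""
    sub = [w * Fraction(2) ** (p - (n_b - 1)) for p in range(n_b - 1)]
    sub.append(-w)  # the sign bit carries the full weight, negated
    return sub


def decode(bits) -> Fraction:
    """Two's-complement value in [-1, +1) of bits given LSB first."""
    n = len(bits)
    integer = sum(b << p for p, b in enumerate(bits[:-1])) - (bits[-1] << (n - 1))
    return Fraction(integer, 1 << (n - 1))


def stage_one_preserves_net(weights, n_b) -> bool:
    """Exhaustively compare the analog net with the net of the bit-level neuron."""
    sub = [split_weight(w, n_b) for w in weights]
    for flat in product((0, 1), repeat=len(weights) * n_b):
        clusters = [flat[j * n_b:(j + 1) * n_b] for j in range(len(weights))]
        analog_net = sum(w * decode(c) for w, c in zip(weights, clusters))
        binary_net = sum(s * b for sw, c in zip(sub, clusters) for s, b in zip(sw, c))
        if analog_net != binary_net:
            return False
    return True
```

Since the threshold is untouched in this sketch, equality of the two nets for every bit pattern implies that the argument of the activation function is preserved.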
Conversion Stage Two: The second conversion stage aims to replace the neurons with negative weights resulting from the first stage with equivalent ones having only positive weights, by using the absolute values of the weights: $w_{ijp}^{(2)} = |w_{ijp}^{(1)}|$. This means that supplementary parameter alterations are required in order to counteract the change in neuron behavior caused by changing the sign of some input weights. A simple solution is to invert the affected input bits, which can be implemented in hardware with NOT gates. The relationship between the input bits $x_{ijp}^{(2)}$ and those at stage one ($x_{ijp}^{(1)}$) depends on the sign of the stage-one weight: a bit is passed unchanged when its weight is nonnegative and inverted when its weight is negative.
These two alternatives can be compressed into
The transfer function argument is calculated as [11] net
The arguments of the activation function before and after the second conversion stage have to be equal
Therefore, the threshold level of the stage-two neurons is
The stage-one neuron parameters in (13) depend on the initial parameters of the analog neuron as described by (5). Consequently, substituting (5) in (13)
This expression can be successively transformed [11] into
Therefore, the neuron parameters after stage two can be calculated as a function of initial analog neuron parameters
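The effect of the second conversion stage can be illustrated with a short Python sketch. It assumes that every negative stage-one weight is replaced by its absolute value, that the corresponding input bit is inverted, and that the threshold is raised by the sum of the absolute values of the negative weights; this adjustment follows from requiring the argument of the activation function to stay constant and is a reconstruction for illustration, not the paper's closed-form result. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def to_positive_weights(sub_weights, threshold):
    """Return (|weights|, invert flags, adjusted threshold) for one neuron."""
    positive = [abs(w) for w in sub_weights]
    invert = [w < 0 for w in sub_weights]
    # Each inverted bit contributes |w| * (1 - x) = |w| + w * x, so the
    # threshold must grow by |w| to leave net - t unchanged.
    new_threshold = threshold + sum(-w for w in sub_weights if w < 0)
    return positive, invert, new_threshold


def stage_two_preserves_argument(sub_weights, threshold) -> bool:
    """Exhaustively check that net - t is identical before and after stage two."""
    positive, invert, new_threshold = to_positive_weights(sub_weights, threshold)
    for bits in product((0, 1), repeat=len(sub_weights)):
        before = sum(w * x for w, x in zip(sub_weights, bits)) - threshold
        flipped = [1 - x if inv else x for x, inv in zip(bits, invert)]
        after = sum(w * x for w, x in zip(positive, flipped)) - new_threshold
        if before != after:
            return False
    return True
```

The invert flags indicate where NOT gates would be placed in hardware.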
B. Binary Neuron Implementation and Optimization
The ANN implementation into a hardware structure is done separately for each neuron and requires, at first, that the input weights w A . An iterative conversion procedure is used to analyze the input weights and to generate the logic-gate implementation netlist description. At each step, a larger neuron is split into subneurons. Some of them can be implemented with only a few AND and OR logic gates, while the rest are further decomposed into simpler subneurons until all have been implemented. Several important concepts and definitions are presented in [12], along with the step-by-step iterative implementation procedure, which ends by adding inverters to those inputs corresponding to the initial negative weights at stage one of neural model digitization.
The hardware-implementation netlist obtained has redundancies both inside each neuron and across different neurons. Most are eliminated using a simple procedure: The file is repeatedly analyzed, and when several logic gates of the same type with the same input signals are found, all but one are removed from the netlist, and interconnections are updated; the cycle ends when no more gates can be removed.
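The redundancy-elimination pass can be sketched as follows. The netlist representation (a dictionary mapping each gate output to its type and input signals) and the function name `remove_duplicate_gates` are assumptions made for this example; the authors' C++ tool operates on its own netlist file format, which is not described here.

```python
def remove_duplicate_gates(netlist):
    """Merge gates of the same type driven by the same inputs until none remain.

    netlist maps an output signal name to (gate_type, input_names).
    Returns the reduced netlist and a map from removed signals to survivors.
    """
    # AND/OR are commutative, so inputs are sorted to get a canonical key.
    gates = {out: (kind, tuple(sorted(ins))) for out, (kind, ins) in netlist.items()}
    renamed = {}
    while True:
        seen, alias = {}, {}
        for out in sorted(gates):
            key = gates[out]
            if key in seen:
                alias[out] = seen[key]   # duplicate: point it at the survivor
            else:
                seen[key] = out
        if not alias:
            return gates, renamed
        renamed = {old: alias.get(new, new) for old, new in renamed.items()}
        renamed.update(alias)
        # Drop the duplicates and rewire the remaining gates; merging inputs
        # that became identical is valid for AND and OR (a AND a = a).
        gates = {
            out: (kind, tuple(sorted({alias.get(s, s) for s in ins})))
            for out, (kind, ins) in gates.items()
            if out not in alias
        }
```

Removing one pair of duplicates can expose another pair further up the structure (two AND gates that differed only in which copy of an inverter they used), which is why the pass is repeated until nothing changes.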
C. Neuron Implementation Example
The sample in Fig. 2 shows a neuron with 12 input weights and a positive threshold level. The weights are sorted in descending order, and a recursive implementation starts. The first three weights are larger than the threshold; therefore, inputs 4, 7, and 1 will drive an OR gate along with the subneurons built using the other subgroups [11].
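The idea of splitting a neuron into subneurons can be illustrated with a generic recursive expansion of a positive-weight threshold function. This is not the procedure of [11] or [12], whose details are not reproduced in this letter; it is a textbook-style decomposition, assuming that all weights are positive and sorted in descending order and that the neuron outputs 1 when the weighted sum is greater than or equal to the threshold. The names `synthesize`, `evaluate`, and `matches_threshold` are hypothetical.

```python
from itertools import product


def synthesize(weights, names, threshold):
    """Return True, False, an input name, or a nested ("AND"/"OR", a, b) tuple."""
    if threshold <= 0:
        return True                     # already satisfied
    if sum(weights) < threshold:
        return False                    # cannot be reached with the inputs left
    w0, x0 = weights[0], names[0]
    with_x0 = synthesize(weights[1:], names[1:], threshold - w0)
    without_x0 = synthesize(weights[1:], names[1:], threshold)
    # An input whose weight alone reaches the threshold drives the OR directly.
    term = x0 if with_x0 is True else ("AND", x0, with_x0)
    return term if without_x0 is False else ("OR", term, without_x0)


def evaluate(expr, values):
    """Evaluate a synthesized expression for a dict of input bits."""
    if isinstance(expr, bool):
        return expr
    if isinstance(expr, str):
        return bool(values[expr])
    op, left, right = expr
    a, b = evaluate(left, values), evaluate(right, values)
    return (a and b) if op == "AND" else (a or b)


def matches_threshold(weights, names, threshold) -> bool:
    """Exhaustively compare the gate structure with the original neuron."""
    expr = synthesize(weights, names, threshold)
    for bits in product((0, 1), repeat=len(weights)):
        expected = sum(w * b for w, b in zip(weights, bits)) >= threshold
        if evaluate(expr, dict(zip(names, bits))) != expected:
            return False
    return True
```

For weights (7, 6, 3, 2, 1) and threshold 5, the first two inputs each reach the threshold alone and therefore feed the OR structure directly, while the remaining inputs form a subneuron that reduces to a single AND gate.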
D. Automated Implementation Method
The algorithm was automated using C++ programs that generate a netlist description of the circuit, optimize it, and then generate the VHDL code. In terms of the software, there is no limitation of the ANN size. The characteristics of the ANN are introduced in the C++ program as a matrix text file (.csv format). A feedforward ANN with three subnetworks generating the pulsewidth modulation (PWM) switching pattern for an inverter was designed [9] using this method (Fig. 3). In contrast with training algorithms, constructive ones determine both the network architecture and the neuron weights and are guaranteed to converge in finite time. The numerical values of all neuron weights and thresholds were calculated [11] using a geometric constructive solution known as Voronoi diagrams [7]. For this paper, the complex plane is divided into triangular Voronoi cells. The master program allows user control over main parameters: 1) number of Voronoi cells; 2) number of sectors dividing the 360° interval for argument analysis; 3) number of bits used to code the components of the two complex inputs; and 4) maximum fan-in for the VHDL logic-gate model. The desired performance/complexity ratio is adopted. In this case, 5 b to code each component of the two complex inputs gives enough precision (delays less than 100 ns), resulting in a total number of logic gates of 1329 on 14 + 6 = 20 layers [10], which fits a Xilinx XC4010XL FPGA.
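One way a maximum fan-in constraint could be enforced is by splitting a wide AND or OR gate into a tree of narrower gates of the same type. The sketch below is an assumption about how such a limit might be applied, not a description of the authors' C++ programs; the function name `limit_fan_in` and the naming scheme for intermediate signals are hypothetical.

```python
def limit_fan_in(kind, inputs, max_fan_in, output):
    """Split one wide AND/OR gate into a tree of gates with bounded fan-in.

    Returns a dict mapping each gate output to (kind, input_names); the final
    gate drives the signal called output.
    """
    if max_fan_in < 2:
        raise ValueError("max_fan_in must be at least 2")
    gates = {}
    level = list(inputs)
    counter = 0
    while len(level) > max_fan_in:
        next_level = []
        for i in range(0, len(level), max_fan_in):
            chunk = level[i:i + max_fan_in]
            if len(chunk) == 1:
                next_level.append(chunk[0])      # a lone signal passes through
            else:
                name = f"{output}_{counter}"     # hypothetical intermediate name
                counter += 1
                gates[name] = (kind, tuple(chunk))
                next_level.append(name)
        level = next_level
    gates[output] = (kind, tuple(level))
    return gates
```

A smaller fan-in limit increases the number of layers and therefore the propagation delay, which is one aspect of the performance/complexity tradeoff mentioned above.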
When the number of inputs and bits on each input is low (precision appropriate for drives), this method is more effective than a classical digital circuit design implemented in an FPGA. For a high number of bits/controller inputs, the NN approach can be less effective than a classical circuit. The explanation is that, in the NN approach, the complexity of the resulting circuit rises exponentially with these numbers, whereas in a traditional approach, the complexity increases quadratically. The case study presented in this paper was implemented as part of an induction-motor controller in a 10 000-gate-equivalent FPGA, as opposed to a classical digital vector control circuit, for controlling the same motor, which was commissioned in our research group, using 99% of a 40 000-gate-equivalent FPGA [12].
III. SIMULATION AND VERIFICATION
The ANN operation speed was tested by designing a VHDL test bench (Fig. 3). Input patterns are generated by a 20-b counter and a pseudorandom sequence block. A simulation waveform is shown in Fig. 4, illustrating delay readings of 39.5 and 80.5 ns.
Generally, oscilloscope measurements taken on the XS40 board, containing a Xilinx XC4010XL FPGA, indicate delays not exceeding 100 ns. Thus, the propagation time is less than 1.5 clock cycles, which demonstrates the advantage of higher operating speeds compared with other digital circuits [13].
IV. CONCLUSION
A new digital hardware-implementation strategy for feedforward ANNs with step activation functions has been reported. The novel algorithm treats each neuron as a special case of Boolean functions with properties that can be exploited to achieve compact implementation. This is accomplished by means of reusable VHDL code that can be easily translated into an FPGA implementation, using suitable electronic-design-automation software.
The VHDL programs bridge the gap between the facilities offered by simulation software and software packages specialized in hardware design. This method is most efficient for a low number of inputs/bits on each input; otherwise, a classical circuit may be preferred.
