Abstract: A scheme for efficient hardware implementation of central pattern generators (CPGs) on Field Programmable Gate Arrays (FPGAs) is proposed. A revised distributed-arithmetic (DA) algorithm is applied to the implementation to maximize the utilization of look-up tables (LUTs) in FPGAs. The proposed scheme yields satisfactory experimental results, with correlation coefficients of 0.99 against simulation. Meanwhile, it demonstrates a 74% reduction in LUT consumption, 75% in registers and 100% in embedded multipliers.
Introduction
Central pattern generators (CPGs) are neural circuits capable of generating rhythmic outputs with coordinated patterns in the absence of sensory feedback or higher control commands. They have been extensively applied in robotics to develop robust control systems, and in biological research [1]. In pursuit of a "mimic" approach, hardware implementation of CPGs is attracting attention. Most works use analog devices [2, 3]. Analog circuits can realize nonlinear equations inherently and consume less power. Nevertheless, works based on Very Large Scale Integration (VLSI) circuits need a relatively long design cycle, and they lack the flexibility and compatibility of those based on Field Programmable Gate Arrays (FPGAs). Although FPGAs have been intensively used in hardware implementation of Artificial Neural Networks (ANNs) [4] and intelligent control systems [5], very few FPGA-based CPG implementations have been reported [6]. Moreover, the previous works use straightforward implementation methods that do not make the best use of FPGA resources.
In this paper, we first examine the important issues of implementing CPGs on FPGAs. We consider the conflict between a CPG's need for frequent multiplication and the limited embedded multiplier resources on an FPGA to be a major hindrance for the implementation. Therefore, this work introduces the distributed-arithmetic algorithm to maximize the use of look-up tables (LUTs) and to eliminate the reliance on embedded multipliers. The results of this new approach are compared with those of conventional ones.
The remainder of this paper is organized as follows. In Section II, the mathematical description of the CPG is presented and its implementation issues are discussed. In Section III, a DA-based implementation scheme of CPGs on FPGAs is proposed; in addition, the DA algorithm is revised for direct manipulation of fixed-point data. Section IV presents the experimental results of the FPGA-based CPG, as well as a comparison of hardware resource consumption. Conclusions are drawn in Section V.
Arithmetic models and implementation issues of CPG
CPGs can be modeled at several levels, such as biophysical models, connectionist models and coupled oscillators [1]. Coupled nonlinear oscillators with reciprocal inhibition are the most frequently used. Well-known models of coupled oscillators, such as the Amari-Hopfield, Van der Pol and Matsuoka types, share a similar principle. As an example, the dynamics of an Amari-Hopfield oscillator [2] are given by the following equations:
Here, u and v denote the neuron outputs, and S_u(t) and S_v(t) stand for the external inputs. α_1, α_2, β_1, β_2 and μ are the control parameters, and f_μ(x) is the transfer function. See Fig. 1 (a) [2] for the architecture of an Amari-Hopfield type coupled oscillator. As the equations show, implementation of the oscillator requires several basic operations: addition/subtraction, multiplication, integration/differentiation and (optionally) the transfer function. A single oscillator requires only a few multipliers. However, a larger network needs more reciprocal inhibition paths, which makes the limited embedded multiplier resources a major hindrance for efficient implementation of CPGs. On the other hand, most FPGAs have abundant look-up table (LUT) resources. With the aim of improving multiplication performance, we introduce Distributed Arithmetic (DA), which uses LUTs instead of multipliers, to CPG implementation. Another important issue, the implementation of the transfer function, has been discussed by Tommiska [7].
DA-based CPG
The original DA algorithm requires the input to be purely fractional or purely integral. In FPGA systems, the input signal is normally converted into a fractional number by a binary point cast module, and the output is converted back in the same way. In this paper, the DA algorithm is revised so that no such conversion is needed. Consider a sum of products
A signed fixed-point X_k can be represented in a Q-bit binary form (Q = N + M, with N bits for the integer part and M bits for the fractional part) as
By introducing C_k = A_k × 2^(N−1), X_k can be shifted into a pure fractional form. Accordingly, the value of Y is given by:
L_i is defined as the sum of the products of each C_k with bit i of the corresponding X_k,
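As a sketch of the derivation outlined above, the steps can be written out as follows. This assumes two's-complement encoding in which the sign bit is counted among the N integer bits (which is what the factor 2^(N−1) suggests). The symbols K (number of terms) and b_{k,i} (bit i of X_k, with i = 0 the least significant bit) are notation assumed here for illustration.

```latex
\begin{align}
Y   &= \sum_{k=1}^{K} A_k X_k, \\
X_k &= -b_{k,Q-1}\,2^{N-1} + \sum_{i=0}^{Q-2} b_{k,i}\,2^{\,i-M}
     = 2^{N-1}\Bigl(-b_{k,Q-1} + \sum_{j=1}^{Q-1} b_{k,Q-1-j}\,2^{-j}\Bigr), \\
Y   &= \sum_{k=1}^{K} C_k\Bigl(-b_{k,Q-1} + \sum_{j=1}^{Q-1} b_{k,Q-1-j}\,2^{-j}\Bigr)
     = -L_{Q-1} + \sum_{j=1}^{Q-1} 2^{-j}\,L_{Q-1-j}, \\
L_i &= \sum_{k=1}^{K} C_k\, b_{k,i}, \qquad C_k = A_k \times 2^{N-1}.
\end{align}
```

The second line uses Q = N + M, so that 2^(i−M) = 2^(N−1) × 2^(i−(Q−1)). Each L_i depends only on one bit from every input, which is why it can take at most 2^K distinct values.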
L_i can be pre-computed and stored in a LUT that is addressed by the bits of the inputs X_k. The output of the LUT is then shifted and summed by a scaling accumulator. We now apply the proposed DA algorithm to the implementation of the Amari-Hopfield neuron oscillator. Assuming that all variables of the oscillator are in a (Q = N + M)-bit binary form, we obtain the DA-based neuron oscillator algorithm as follows:
The content of the LUT for neuron u is presented in Fig. 1 (d). The LUT for neuron v takes a similar form. Note that the transfer function is replaced by a saturation function without significantly reducing the quality and generality of the rhythmic patterns [6]. The DA-based Amari-Hopfield neuron architecture is shown in Fig. 1 (c). It consists of a three-input LUT, a scaling accumulator (detailed in Fig. 1 (b)), a parallel adder, a saturation block and an integrator (using the forward Euler method). The entire system is designed with a modular approach using DSP Builder from Altera (System Generator from Xilinx is an alternative) under Matlab/Simulink. It can be tested by Matlab/Simulink simulation and incorporated into a soft-core processor as a custom IP core.
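The LUT-plus-accumulator datapath described above can be illustrated in software. The following is a minimal sketch, not the authors' DSP Builder design. The function names, coefficient values and input values are hypothetical. It also weights bit i by 2^i on integers rather than right-shifting in a scaling accumulator, which is equivalent up to a constant scale factor.

```python
def to_fixed(value, frac_bits):
    """Quantize a real value to a signed integer with frac_bits fractional bits."""
    return int(round(value * (1 << frac_bits)))


def build_lut(coeffs):
    """Precompute every partial sum of the coefficients.

    Entry `addr` holds the sum of coeffs[k] over all k whose bit is set in addr.
    """
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << len(coeffs))
    ]


def da_dot(lut, xs, word_bits):
    """Bit-serial sum of products using only LUT reads, shifts and adds.

    xs are signed integers that fit in word_bits-bit two's complement.
    """
    mask = (1 << word_bits) - 1
    words = [x & mask for x in xs]           # two's-complement bit patterns
    acc = 0
    for i in range(word_bits):
        # Bit i of every input forms the LUT address.
        addr = 0
        for k, w in enumerate(words):
            addr |= ((w >> i) & 1) << k
        partial = lut[addr] << i             # weight the partial sum by 2^i
        # The sign bit carries negative weight in two's complement.
        acc += -partial if i == word_bits - 1 else partial
    return acc


Q, M = 20, 12    # 20-bit words: 8 integer bits and 12 fractional bits
coeffs = [to_fixed(a, M) for a in (-1.0, 0.75, 1.0)]    # illustrative values only
inputs = [to_fixed(x, M) for x in (0.5, -1.25, 0.125)]  # illustrative values only
lut = build_lut(coeffs)                                 # 2^3 = 8 entries
y_da = da_dot(lut, inputs, Q)
y_ref = sum(c * x for c, x in zip(coeffs, inputs))      # multiplier-based reference
print(y_da / (1 << (2 * M)), y_ref / (1 << (2 * M)))
```

With three inputs the table has only eight entries, so the multiplications are traded for one small LUT and Q shift-and-add steps per output sample.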
Experiment
The proposed circuit is implemented on an Altera Cyclone II EP2C8Q208 chip with a maximum frequency of 50 MHz. We adopt a 20-bit fixed-point data representation (with 8 bits for the integer part). The neuron parameters are as follows:
The experimental data are captured with the SignalTap II logic analyzer and analyzed in Matlab. See Fig. 2 (a) for the limit cycle attractor. The experimental neuron outputs are compared with the simulated ones; their slight differences are detailed in Fig. 2 (b). Based on our calculation, the correlation coefficients between the experimental and simulation results are 0.99 for both u and v, indicating a high accuracy of the implementation. Table I presents the resource consumption of the conventional and the DA-based implementations. The proposed method saves up to 74% of the look-up tables (LUTs), 75% of the logic registers and 100% of the embedded multipliers.
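For reference, the comparison metric used above can be sketched as follows. This assumes the correlation coefficient is the usual Pearson coefficient. The two traces are synthetic stand-ins (a sine wave and a copy quantized to 12 fractional bits), not the measured data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Synthetic stand-ins: an ideal trace and a copy rounded to 12 fractional bits.
simulated = [math.sin(2 * math.pi * t / 50) for t in range(200)]
quantized = [round(s * 4096) / 4096 for s in simulated]
r = pearson(simulated, quantized)
print(r)
```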
Conclusion
The Distributed Arithmetic algorithm is introduced to the implementation of Central Pattern Generators on FPGAs. Compared with the conventional multiplier-based approach, the proposed scheme achieves significant savings in hardware resources. Furthermore, this method could be applied to the implementation of Artificial Neural Networks or other control systems.
