important tool in digital signal processing. In this paper, a multiplierless algorithm for computing discrete Hartley transforms is proposed, which can handle input signals of arbitrary length. The proposed algorithm can be implemented using only fixed-point integer additions in binary arithmetic. In addition, an efficient and regular systolic array is designed to implement the proposed method, followed by a complexity analysis. Unlike other fast Hartley transforms, our algorithm can handle signals of arbitrary length and achieves high precision. The proposed method is easy to implement in hardware and well suited to real-time processing.
I. INTRODUCTION
The discrete Hartley transform (DHT) is an important tool for digital signal processing and analysis and for many other engineering applications. A variety of methods for computing the DHT have been proposed [3, 8, 9, 12, 14]. These methods can roughly be classified into two families. One family tries to reduce the number of multiplications and additions, while the other focuses on realizing the DHT with efficient systolic or other parallel hardware architectures. Because the DHT is highly computation-intensive, there is a strong need for dedicated processors that compute the transform coefficients at high speed to meet the requirements of real-time signal processing and digital multimedia communication systems. However, embedded and portable applications continue to impose serious limitations on the amount of hardware involved and the rate of energy consumption. To meet ever-growing computational demand with minimal hardware and power, several attempts have been made at fast implementation of the DHT in bit-level as well as word-level VLSI structures.
Systolic architectures are among the most popular VLSI structures for computation-intensive digital signal processing applications because of their simplicity, modularity and regularity. For the DHT, several CORDIC systolic architectures have been reported in [5, 7, 13]. Owing to its good performance, the field-programmable gate array (FPGA) has become increasingly attractive for implementing the DHT [1, 2]. In addition, the ROM-based structures presented in [9] by Guo, Liu and Jen have proved to be efficient.
However, to achieve the highest efficiency, all the above methods require the signal length N to be highly composite, and they do not consider the error of the DHT. In this paper, based on our work in [4, 15, 16], we propose a multiplierless algorithm for arbitrary-length DHT, which converts the multiplications of our moments-based DHT into additions by means of bit shifts and integer accumulation. Based on the approach to the fast calculation of moments [4], systolic arrays that perform the 1-D DHT are presented, followed by a complexity analysis.
The rest of the paper is organized as follows. Section 2 briefly introduces our previous work on the DHT. Section 3 presents an improved algorithm for the DHT. Section 4 presents the complexity analysis and the design of systolic arrays to compute the DHT. Finally, Section 5 concludes the paper.
II. USING MOMENTS TO COMPUTE DHT
Using a modular mapping, the DHT can be approximated by the sum of a finite sequence of discrete moments [4].
The DHT is defined for a real-valued sequence $x(0), x(1), \ldots, x(N-1)$ as follows:
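As a point of reference, the definition can be evaluated directly in $O(N^2)$ operations. The following Python sketch assumes the common unnormalized convention $X(k) = \sum_{n=0}^{N-1} x(n)\,\mathrm{cas}(2\pi nk/N)$ with $\mathrm{cas}\,\theta = \cos\theta + \sin\theta$; the normalization and the function names `cas` and `dht_direct` are illustrative assumptions, not taken from the paper.

```python
import math


def cas(theta):
    """Hartley kernel: cas(theta) = cos(theta) + sin(theta)."""
    return math.cos(theta) + math.sin(theta)


def dht_direct(x):
    """Direct O(N^2) DHT of a real sequence.

    Assumes the unnormalized convention (no 1/N or 1/sqrt(N) factor).
    Works for any length N, not only highly composite lengths.
    """
    n_points = len(x)
    return [
        sum(x[n] * cas(2.0 * math.pi * n * k / n_points) for n in range(n_points))
        for k in range(n_points)
    ]
```

Under this convention, applying the transform twice returns $N\,x(n)$, which gives a simple correctness check for any faster implementation.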
By using the periodicity of the sine and cosine functions [15], Eq. (1) can be rewritten in an equivalent form for $k = 0, 1, 2, \ldots, N-1$.
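The periodicity argument can be illustrated as follows. Since $\mathrm{cas}(2\pi nk/N)$ depends only on $nk \bmod N$, the inputs that share the same residue $r$ can first be summed, so that each distinct kernel value is used once per output. This Python sketch shows that regrouping only; it is an assumption about the general shape of the modular mapping, and the helper names `grouped_sums` and `dht_by_grouping` are hypothetical.

```python
import math


def grouped_sums(x, k):
    """Return a[r] = sum of x[n] over all n with (n * k) mod N == r.

    This stage uses additions only.
    """
    n_points = len(x)
    a = [0.0] * n_points
    for n in range(n_points):
        a[(n * k) % n_points] += x[n]
    return a


def dht_by_grouping(x):
    """Unnormalized DHT computed as X(k) = sum_r a_r(k) * cas(2*pi*r/N)."""
    n_points = len(x)
    # Kernel table indexed by the residue r; it depends only on N.
    table = [
        math.cos(2.0 * math.pi * r / n_points) + math.sin(2.0 * math.pi * r / n_points)
        for r in range(n_points)
    ]
    out = []
    for k in range(n_points):
        a = grouped_sums(x, k)
        out.append(sum(a[r] * table[r] for r in range(n_points)))
    return out
```

The grouping stage needs no multiplications; the remaining products with the kernel values are the operations that a multiplierless formulation has to eliminate.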
In effect, $[2\log N / \log\log N]$ [4]. In this paper, $[x]$ denotes the integer closest to $x$.
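To illustrate how moments enter, the sketch below expands the kernel in a plain Taylor series about zero, $\mathrm{cas}\,\theta = \sum_{m \ge 0} s_m \theta^m / m!$ with $s_m = +1$ for $m \bmod 4 \in \{0, 1\}$ and $s_m = -1$ otherwise, truncated at order $p$. Each output then becomes a weighted sum of the discrete moments $M_m = \sum_r a_r r^m$ of the grouped sums $a_r$. This is an assumption made for illustration: the expansion used in [4] may differ, the moments are evaluated naively here rather than by an addition-only network, the order `p` is a free parameter, and the names `moments` and `dht_by_moments` are hypothetical.

```python
import math


def grouped_sums(x, k):
    """a[r] = sum of x[n] over all n with (n * k) mod N == r."""
    n_points = len(x)
    a = [0] * n_points
    for n in range(n_points):
        a[(n * k) % n_points] += x[n]
    return a


def moments(a, p):
    """Discrete moments M_m = sum_r a[r] * r**m for m = 0..p (naive evaluation)."""
    return [sum(a_r * r**m for r, a_r in enumerate(a)) for m in range(p + 1)]


def dht_by_moments(x, p):
    """Approximate unnormalized DHT from moments of order 0..p."""
    n_points = len(x)
    # These constants depend only on N and p, so they can be precomputed.
    coeffs = []
    for m in range(p + 1):
        sign = 1.0 if m % 4 in (0, 1) else -1.0
        coeffs.append(sign * (2.0 * math.pi / n_points) ** m / math.factorial(m))
    out = []
    for k in range(n_points):
        mom = moments(grouped_sums(x, k), p)
        out.append(sum(c * m_val for c, m_val in zip(coeffs, mom)))
    return out
```

With integer input samples the moments are exact integers, so the only approximation in this sketch comes from truncating the series and from the floating-point constants.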
Thus, the computation of $X(k)$ using the approximation of Eq. (4) can be represented as sums of distinct powers of 2. Clearly, the larger $N$ is, the smaller the error $R'_p$ becomes, so the accuracy requirements of most applications can be met. Thus, $X(k)$ can be approximately computed as follows:
In this way, all the products of floating-point constants with moments in Eq. (2) are eliminated and replaced by binary shifts and integer accumulations. Eq. (5) can be computed in advance once the signal length $N$ is determined. Next, we analyze the complexity of the algorithm.
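The replacement of constant multiplications by shifts and additions can be sketched as follows. A nonnegative constant $c$ is quantized to a fixed number of fractional bits, and the positions of its nonzero bits give the distinct powers of 2; multiplying an integer moment by $c$ then reduces to shifted additions. The function names, the `frac_bits` parameter, and the plain binary (rather than signed-digit) representation are assumptions for illustration and do not reproduce the paper's Eq. (5).

```python
def power_of_two_terms(c, frac_bits):
    """Quantize a constant c >= 0 to frac_bits fractional bits.

    Returns the bit positions i such that round(c * 2**frac_bits)
    equals the sum of 2**i over the returned positions.
    """
    q = round(c * (1 << frac_bits))
    return [i for i in range(q.bit_length()) if (q >> i) & 1]


def shift_add_multiply(m, terms):
    """Multiply the integer m by the quantized constant using shifts and adds.

    The result is an integer scaled by 2**frac_bits, where frac_bits is the
    value that was used to build `terms`.
    """
    acc = 0
    for i in terms:
        acc += m << i  # one shift and one addition per nonzero bit
    return acc
```

For example, $0.625 = 2^{-1} + 2^{-3}$, so with `frac_bits = 3` the terms are `[0, 2]` and `shift_add_multiply(-3, [0, 2])` yields `-15`, that is, $-3 \times 0.625 = -15/8$. Because the terms depend only on the constant, they can be computed once in advance.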
Step 1 is a preprocessing step that involves equations (2.3) and (3.1). It is noteworthy that [ 2 ] m , we use the $p$-network method presented in [4], which requires less than
Our algorithm appears to require more additions than many fast Hartley transforms, but it can be implemented efficiently with systolic arrays because it involves only integer additions and shifts.
In addition, it has the following advantages. There are no multiplications in our method, which is superior to the
Currently, he leads some major projects, such as the JT-G Green Channel inspection system, the algorithm software for fast evaluation of crude oil funded by RIPP of Sinopec, and a Guangxi Natural Science Fund project on the NIR Internet of Things. He has completed several projects, including one funded by the National Science Foundation of China on modeling of NIR spectra. He has published more than 30 papers in journals and conferences at home and abroad. His main research interests include intelligent information processing, machine learning, data mining, cloud computing, and near-infrared spectroscopy and its applications.
Jianguo Liu was born in

