Abstract This paper presents a novel architecture for a 256-point unified discrete Fourier-Hartley transform (UDHFT) processor for digital signal processing applications. The proposed architecture is based on the UDHFT theory, which can calculate the discrete Fourier and Hartley transforms (DFT, DHT) of types I-IV and the discrete cosine and sine transforms (DCT, DST) of types II-IV. The architecture uses a general, existing fast Fourier transform (FFT) IP core together with pre- and post-processing units to compute all of the UDHFT transforms. We implemented the proposed architecture on a Xilinx Virtex-II Pro FPGA. It computes a 256-point UDHFT transform in 9.25 µs at a 100 MHz operating clock frequency.
I. INTRODUCTION
The discrete sinusoidal transforms are effective tools in digital signal processing (DSP) applications. The discrete Fourier transform (DFT) enables frequency-domain analysis of a time-domain signal [1]. The discrete Hartley transform (DHT) is similar to the DFT, except that it involves only real-valued computation [2]. The discrete cosine transform (DCT) has long been used in image and speech processing [3, 4]; the JPEG standard, for example, uses the DCT as its basis transform. The discrete sine transform (DST) is useful for spectrum analysis, data compression, speech processing, biomedical signal processing and many other applications. These basic transforms are required in almost all phases of image and signal processing, including the various imaging techniques and spectral analyses used for biomedical signals and images.
In this paper, we propose a novel architecture for a 256-point UDHFT processor that computes the DFT, DCT, DHT and DST using the unified discrete Fourier-Hartley transform (UDHFT) theory [5]. The UDHFT is a unified, fast and efficient algorithm for computing the Fourier and Hartley transforms of types I-IV and the discrete cosine and sine transforms of types II-IV by utilizing existing FFT algorithms. The UDHFT builds on FFT algorithms because many commercial FPGA soft IP cores, DSP hardware platforms and software packages already have FFT algorithms built in; it is therefore worthwhile to take advantage of them.
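The idea of reusing an existing Fourier routine for another sinusoidal transform can be illustrated with the well-known identity H(k) = Re F(k) - Im F(k), which gives the DHT of a real sequence from its DFT. The sketch below is only an illustration of that identity, not the UDHFT fast structure of [5]; the function names are hypothetical, and a naive O(N^2) DFT stands in for a real FFT routine so that the example needs only the Python standard library.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT; a stand-in for any existing FFT routine."""
    size = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / size) for n in range(size))
        for k in range(size)
    ]


def dht_via_dft(x):
    """DHT of a real sequence from its DFT: H(k) = Re F(k) - Im F(k)."""
    return [f.real - f.imag for f in dft(x)]


def dht_direct(x):
    """Direct DHT with the kernel cas(t) = cos(t) + sin(t), for comparison."""
    size = len(x)
    out = []
    for k in range(size):
        acc = 0.0
        for n in range(size):
            t = 2 * math.pi * k * n / size
            acc += x[n] * (math.cos(t) + math.sin(t))
        out.append(acc)
    return out


# Compare the two routes on an arbitrary real test vector.
samples = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 4.0]
max_error = max(
    abs(a - b) for a, b in zip(dht_via_dft(samples), dht_direct(samples))
)
```

Here `max_error` measures the largest difference between the DFT-based and the direct DHT; it should be at the level of floating-point rounding.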
As shown in Figure 1, the proposed UDHFT processor, which is designed around the UDHFT fast structure [5], consists of three main processing parts. The first part is a pre-processing unit that handles the dynamic-routing butterfly structure and performs a complex coefficient multiplication. The second part is a general high-speed, small-area 256-point fast Fourier transform (FFT) IP core [6, 7]; the proposed processor uses this existing FFT core to process all of the UDHFT transforms. The third part is a post-processing unit that performs the permutation and a further complex coefficient multiplication. All processing parts are controlled by the processor control unit to process the DFT
Table 1 summarizes the values of these parameters for the cases of DFT and DHT.
A. Modifications for DCT and DST
In this section, we show that the proposed UDHFT can be used to calculate the DCT and DST of types II-IV by simply permuting the output or input data, with possible sign changes. Hence, according to (3), $C^{III} = P_0 T$. The same technique can be applied to the DCT-IV by choosing $A = B = 1/2$, $k_0 = 1/4$ and $n_0 = 1/2$, which gives $C^{IV} = P_1 T$. Since the DCT-II is simply the transpose of the DCT-III, and by the symmetry of the variables k and n, the DCT-II can be included as a special case with $k_0 = 0$ and $n_0 = 1/4$. Similarly, the DST of types II-IV can be found as special cases of the UDHFT as well. Table 2 summarizes the values of A, B, $k_0$ (4) where, for each case, the parameters A, B, $k_0$ and $n_0$ are appropriately chosen.
and $s_n$ is some integer so that $0 \le h \le N-1$. Notice that s. The orthogonality of the transform can also be seen in Figure 2. Since pre- and post-multiplication of the FFT block are orthogonal operations and the FFT itself is an orthogonal matrix, the whole UDHFT is orthogonal if and only if the preprocessing Q whose (k, n) element is given by
IV. THE PROPOSED UDHFT ARCHITECTURE
Figure 1 shows the proposed 256-point UDHFT processor, which is designed around the UDHFT fast structure described in Section 3. The proposed processor uses a general 256-point fast Fourier transform IP core together with pre- and post-processing units to compute the 256-point DFTs, DHTs, DCTs and DSTs; the type of transform is selected by the four UDHFT parameters (A, B, $k_0$ and $n_0$). All of the transform coefficients are pre-calculated and stored in separate pages of the coefficient memories ROM1 and ROM2. The operation of the proposed processor is controlled by the control unit, which has a 7-stage pipelined datapath.
A. Pre-Processing Unit
In the fast structure for the UDHFT described in Section 3, the pre-processing structure is a dynamic-routing butterfly structure. The butterfly routing path changes depending on the UDHFT parameter $n_0$; hence, the pre-processing unit of the proposed processor has a PE1 (Processing Element 1) that handles the dynamic routing path and pre-computes the transforms before sending the pre-computed data to the FFT block. As shown in Figure 5(a), PE1 consists of three 16-bit complex multipliers, one 16-bit complex adder and three coefficient registers, and is designed to pre-process the UDHFT transforms. The PE1 datapath has three pipeline stages. In the first stage, the input data X(n) and X(h), which are fed from the input-buffer dual-port RAM, are multiplied by the coefficient registers R1 and R2, respectively. In the second stage, the first-stage results are added. In the last stage, the coefficient register R3, which is loaded from coefficient ROM1, is multiplied by the second-stage result, and the product is output as x(k). PE1 is designed to process the transform data recursively in order to reduce hardware resources; hence, PE1 runs 256 times for each 256-point transform.
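The three-stage PE1 datapath described above can be sketched as a small behavioral model. This is a floating-point illustration only, not the authors' 16-bit hardware: the function names are hypothetical, the register values R1, R2 and R3 are passed in as plain arguments, and the partner index list `h_index` is an assumed input because the rule that defines h is not reproduced here.

```python
def pe1(x_n, x_h, r1, r2, r3):
    """Behavioral model of one PE1 pass (complex floating point)."""
    # Stage 1: scale the two inputs by coefficient registers R1 and R2.
    scaled_n = x_n * r1
    scaled_h = x_h * r2
    # Stage 2: add the two stage-1 products.
    summed = scaled_n + scaled_h
    # Stage 3: multiply by coefficient register R3.
    return summed * r3


def preprocess(x, h_index, r1, r2, r3_table):
    """Run PE1 once per sample, as the recursive PE1 does in hardware.

    h_index[n] is the (assumed, caller-supplied) partner address for n;
    r3_table[n] is the (assumed) per-sample value loaded into R3.
    """
    return [
        pe1(x[n], x[h_index[n]], r1, r2, r3_table[n])
        for n in range(len(x))
    ]


# Usage: with R1 = 1 and R2 = 0 the partner input is ignored, so the
# data pass straight through apart from the R3 multiplication.
data = [1 + 1j, 2 - 1j, 0.5j, -3 + 0j]
partners = [0, 3, 2, 1]  # arbitrary example pairing
passthrough = preprocess(data, partners, 1, 0, [1, 1, 1, 1])
```

Modeling the registers as arguments keeps the sketch independent of the mode-specific register settings, which the control unit loads in the actual design.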
PE1 is also designed to work in three modes, which cover the four routing-path cases in (7) and depend on the parameters A, B and $k_0$ of the UDHFT theory. In the first mode, the processor control unit sets the three internal registers to R1 = A/A, R2 = BW-Ak°/A and R3 = Wnk, covering cases 1 and 2 of (7). In the second mode, the registers are set to R1 = A+BW-/|0+BW-k°|, R2 = 0 and R3 = Wnko, covering case 3 of (7). In the last mode (bypass mode), the registers are set to R1 = 1, R2 = 0 and R3 = Wnko, covering the last case of (7).
To handle the dynamic butterfly routing path, PE1 reads a pair of complex data values from the input-buffer dual-port RAM at the same time. Input X(n) is read from address n of the dual-port RAM, where n = 0, ..., N-1; input X(h) is read from address h, where h depends on the parameter $n_0$ and is given by (5).
After pre-processing is finished, the transform data are stored in the 256-word FFT input memory (FFT-RAM1) to await the FFT process, which is described in the next section.
B. Fast Fourier Transform Core
On the world market today, many commercial FPGA and DSP hardware and software products already have fast Fourier transform (FFT) algorithms built in; the UDHFT theory therefore utilizes these existing FFT cores. In this paper, however, we use our own existing 256-point high-speed, small-area FFT IP core, which uses a single-memory architecture with a radix-4 algorithm [6] and a memory-bank structure [8]. This FFT IP core works with the pre- and post-processing units to compute the UDHFT transforms by passing the transform data through the 256-word memories (FFT-RAM1 and FFT-RAM2), as shown in Figure 1.
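For readers unfamiliar with the radix-4 algorithm, the following sketch shows a textbook recursive radix-4 decimation-in-time FFT. It is not the single-memory, memory-bank architecture of the IP core in [6] and [8]; it only illustrates the radix-4 butterfly, and the function names are hypothetical. Because 256 = 4^4, a 256-point transform decomposes into four radix-4 stages.

```python
import cmath


def fft_radix4(x):
    """Recursive radix-4 decimation-in-time FFT; len(x) must be a power of 4."""
    size = len(x)
    if size == 1:
        return list(x)
    if size % 4 != 0:
        raise ValueError("length must be a power of 4")
    quarter = size // 4
    # Transform the four interleaved sub-sequences x[r], x[r+4], ...
    subs = [fft_radix4(x[r::4]) for r in range(4)]
    out = [0j] * size
    for k in range(quarter):
        # Apply twiddle factors W^(r*k), with W = exp(-2j*pi/size).
        a = subs[0][k]
        b = subs[1][k] * cmath.exp(-2j * cmath.pi * k / size)
        c = subs[2][k] * cmath.exp(-2j * cmath.pi * 2 * k / size)
        d = subs[3][k] * cmath.exp(-2j * cmath.pi * 3 * k / size)
        # Radix-4 butterfly: a 4-point DFT using only +, - and multiplies by j.
        out[k] = a + b + c + d
        out[k + quarter] = a - 1j * b - c + 1j * d
        out[k + 2 * quarter] = a - b + c - d
        out[k + 3 * quarter] = a + 1j * b - c - 1j * d
    return out


def radix4_stage_count(size):
    """Number of radix-4 stages needed for a power-of-4 transform size."""
    stages = 0
    while size > 1:
        if size % 4 != 0:
            raise ValueError("length must be a power of 4")
        size //= 4
        stages += 1
    return stages


# Usage: transform a 256-point unit impulse.
impulse = [1 + 0j] + [0j] * 255
spectrum = fft_radix4(impulse)
stages_256 = radix4_stage_count(256)
```

The butterfly needs no general multiplications beyond the three twiddle factors, which is one common reason radix-4 structures are chosen for hardware FFTs.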
C. Post-Processing Unit
The post-processing unit of the proposed processor is designed to perform two functions. The first is permuting the output data from the FFT block, and the second is multiplying the transform output by the coefficient values. To permute the transform data from the FFT block, the processor control unit generates permutation address signals that follow the permutation matrices $P_i$ and uses them to address FFT-RAM2; the data read out of FFT-RAM2 are therefore already permuted. This permutation process is enabled only for the DCT and DST transforms (see Section 2.1).
After the permutation process, each data value is multiplied by the coefficient values w n+kn, which are stored in coefficient ROM2. These multiplications are handled by PE2 (Processing Element 2) of the post-processing unit, as shown in Figure 5(b).
Moreover, in the case of the DCT and DST, some of the PE2 input data are multiplied by -1, depending on the index i of the permutation matrices $P_i$ in Section 2.1. For this purpose, the processor control unit generates the S signal, which controls whether the input data pass through the two's-complement block.
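The post-processing steps described above (permuted read-out of FFT-RAM2, optional negation under the S signal, and coefficient multiplication in PE2) can be sketched as the behavioral model below. It is a floating-point illustration under stated assumptions: the function names are hypothetical, the permutation addresses and sign flags are caller-supplied placeholders rather than the actual $P_i$ matrices, and indexing the ROM2 coefficient by the output position is an assumption.

```python
def pe2(value, coeff, s):
    """Behavioral model of PE2: optional negation, then coefficient multiply."""
    # The S signal routes the input through the two's-complement (negate) block.
    signed = -value if s else value
    return signed * coeff


def postprocess(fft_out, coeffs, perm_addr=None, sign_flags=None):
    """Model of the post-processing unit.

    perm_addr[k]  : (assumed) read address into FFT-RAM2 for output k.
    sign_flags[k] : (assumed) value of the S signal for output k.
    Passing None for both models the DFT/DHT case, where the
    permutation and sign change are disabled.
    """
    size = len(fft_out)
    result = []
    for k in range(size):
        addr = perm_addr[k] if perm_addr is not None else k
        s = sign_flags[k] if sign_flags is not None else False
        result.append(pe2(fft_out[addr], coeffs[k], s))
    return result


# Usage with placeholder data: reverse the order and negate odd outputs.
ram2 = [1 + 0j, 2 + 0j, 3 + 0j, 4 + 0j]
unit_coeffs = [1, 1, 1, 1]
permuted = postprocess(ram2, unit_coeffs,
                       perm_addr=[3, 2, 1, 0],
                       sign_flags=[False, True, False, True])
plain = postprocess(ram2, unit_coeffs)
```

Performing the permutation through the read address, as the text describes, means no data are physically moved; only the order in which FFT-RAM2 is read changes.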
V. SYSTEM IMPLEMENTATION AND RESULT
The processor is implemented on the Xilinx XUP-V2Pro development system (XC2VP30 FPGA with 30,816 logic cells), which provides a complete development platform for designing and verifying applications based on the Xilinx Virtex-II Pro FPGA family [9]. After place and route, the resource utilization of the 256-point UDHFT processor is summarized in Table 3.
VI. CONCLUSIONS
This paper described the UDHFT theory and proposed a novel architecture for a 256-point UDHFT processor that utilizes a general FFT IP core. This architecture is used to calculate the DFT, DHT, DST and
In future work, we will improve this UDHFT processor architecture by combining the pre- and post-processing units with the FFT structure to reduce the transform time and hardware resources.
