Abstract-The modular design of the optimal tap-selective maximum-likelihood (TS-ML) channel estimator based on field-programmable gate array (FPGA) technology is studied. A novel range reduction algorithm is included in the natural logarithmic function (NLF) emulator based on the coordinate rotation digital computer (CORDIC) methodology and is integrated into the TS-ML channel estimator system. The low-complexity TS-ML algorithm, which is employed for sparse multipath channel estimation, is proposed for long-range broadband block transmission systems. Furthermore, the proposed range reduction algorithm aims to solve the limited interval problem in the CORDIC algorithm. The modular approach facilitates the reuse of modules.
I. INTRODUCTION
Recently, research on cyclic-prefix (CP) assisted block transmission systems, particularly orthogonal frequency-division multiplexing (OFDM) and single-carrier with frequency-domain equalization (SC-FDE), has attracted considerable attention. Both systems are targeted at broadband applications and have been adopted as IEEE 802.16d PHY standards for long-range fixed wireless transmission through multipath fading channels [1]. However, to fully achieve the performance benefits of these systems, accurate channel estimation is crucial.
The optimal tap-selective maximum-likelihood (TS-ML) channel estimator [2], based on the maximum-likelihood (ML) and minimum description length (MDL) criteria [3], is proposed for long-range broadband block transmission systems over sparse multipath channels. The TS-ML estimator can reduce the noise effect and thereby improve the estimation performance toward the optimum. For this reason, the channel estimator can be used in many high-speed, long-range outdoor wireless transmission applications such as WiMAX [1], DVB-T [4], and HDTV [5] systems.
However, the above-mentioned applications require a hardware implementation. Recent developments in field-programmable gate array (FPGA) technology have changed the conventional methods of hardware implementation. FPGAs have become an alternative solution for the realization of digital systems. They combine high-speed implementation features with the flexibility of a digital platform.
In this paper, a case study is presented in which a modular FPGA-based design approach is applied to design a TS-ML channel estimator. We employ System Generator (SG) [6] to model the DSP projects. Furthermore, the complete system is implemented by dividing the system functions into reconfigurable modules. In addition, the issue of range reduction in natural logarithmic function (NLF) emulation is discussed.
The central contribution of this paper is a modular hardware TS-ML channel estimator that offers a straightforward FPGA implementation. A further contribution is the design and automation of the range reduction selection for a large dynamic input range, which overcomes the limited interval of the CORDIC-based logarithmic function approximation on FPGAs. This paper is organized as follows. In Section 2, an overview of the TS-ML channel estimator algorithm is presented. In Section 3, the design of the components of the FPGA-based modules using the SG platform is discussed. In Section 4, a novel range reduction algorithm and the reconstruction of the NLF are presented. In Section 5, the results of our implementation are reported. The conclusions are presented in Section 6.
Notation:
We use bold uppercase (lowercase) font to denote matrices (column vectors). F is the N×N DFT matrix. We use $(\cdot)^{*}$, $(\cdot)^{-1}$, $(\cdot)^{T}$, $(\cdot)^{H}$, $|\cdot|$, $\|\cdot\|$, $\log(\cdot)$, and $\odot$ to denote the complex conjugate, inverse, transpose, conjugate transpose, absolute value, norm of a vector, natural logarithm, and vector pair-wise multiplication operations, respectively; $E\{\cdot\}$, the expectation operator; and $\mathrm{diag}\{\mathbf{a}\}$, a diagonal matrix with the elements of $\mathbf{a}$ on the diagonal.
II. OVERVIEW OF THE TS-ML CHANNEL ESTIMATOR ALGORITHM
For fixed wireless applications [1], the composite baseband channel can be modeled as a linear time-invariant system within a small segment of time and is characterized by its impulse response [7].
We consider the channel estimation problem with regard to the SC-FDE block transmission system. As shown in Fig. 1, the transmission frame comprises one preamble block and P data payload blocks, where L, N, and D denote the lengths in symbols of the CP, the training sequence, and a single data payload block, respectively.
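We begin by formulating a channel estimation problem for the CP-based single-carrier system. Let $\mathbf{t}$ denote the time-domain N-point training sequence. Assume that the signal is passed through an unknown discrete-time channel $\mathbf{h}_L$, where the maximum channel length is less than the CP length L. At the receiver, after removing the CP and computing the N-point FFT, the received frequency-domain signal block $\mathbf{r}$ is
$$\mathbf{r} = \mathrm{diag}\{\mathbf{F}\mathbf{t}\}\,\mathbf{F}\mathbf{h} + \mathbf{v}, \qquad (2)$$
where $\mathbf{h} = [\mathbf{h}_L^{T}, \mathbf{0}^{T}]^{T}$ is the channel vector of length N with (N-L) appended zeros; $\mathrm{diag}\{\mathbf{F}\mathbf{t}\}$ is a diagonal matrix formed by $\mathbf{F}\mathbf{t}$, i.e., the FFT of $\mathbf{t}$; and $\mathbf{v}$ is an N × 1 complex white Gaussian noise vector with covariance $\sigma^{2}\mathbf{I}_N$.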
In order to facilitate the estimation of the channel response $\mathbf{h}_L$, we consider the Chu sequence [8] as the training sequence because it satisfies the constant modulus property in both the time and frequency domains, i.e., $|t(n)| = 1$ for every n, and the magnitude of its FFT is likewise constant across all frequency bins.
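The constant modulus property can be checked numerically. The following is a minimal sketch, assuming the common even-length Chu definition t(n) = exp(jπ·root·n²/N) and an unnormalized DFT; the function names and the choice root = 1 are illustrative and do not come from the paper.

```python
import cmath
import math

def chu_sequence(n_len, root=1):
    """Even-length Chu sequence: t[n] = exp(j*pi*root*n^2 / n_len) (assumed form)."""
    if n_len % 2 != 0 or math.gcd(root, n_len) != 1:
        raise ValueError("this sketch assumes an even length and a root coprime to it")
    return [cmath.exp(1j * math.pi * root * n * n / n_len) for n in range(n_len)]

def dft(seq):
    """Unnormalized DFT evaluated directly; adequate for small N."""
    n_len = len(seq)
    return [
        sum(seq[n] * cmath.exp(-2j * math.pi * k * n / n_len) for n in range(n_len))
        for k in range(n_len)
    ]

N = 32
t = chu_sequence(N)
time_mags = [abs(v) for v in t]        # expected to be 1 for every sample
freq_mags = [abs(v) for v in dft(t)]   # expected to be sqrt(N) for every bin
```

With an unnormalized DFT the flat spectral magnitude equals sqrt(N); a unitary DFT would give 1 instead.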
By performing a pairwise multiplication of (2) with $(\mathbf{F}\mathbf{t})^{*}/N$, we obtain an equivalent received data vector for the channel estimation:
where the new Gaussian noise vector
h coincides with the least squares (LS) solution: 
. If all the identified channel order and tap positions are correct, the MSE becomes
The block diagram of the overall TS-ML channel estimator is shown in Fig. 2, and the proposed algorithm is summarized in Table I. Compared with the LS estimate over all L taps, the TS-ML channel estimator algorithm can reduce the estimation error in MSE by an improvement factor of L/K. If some taps have small values, the tap-selective process automatically reduces the channel order so that the noise effect is alleviated. Hence, we also regard the TS-ML estimator as an adaptive and robust channel estimation method.
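As a quick worked check of the improvement factor, take the values used for the implementation in Section 3 (L = 16, K = 3) and assume that every estimated tap contributes the same noise variance, so that the MSE scales with the number of estimated taps:

```latex
\frac{\mathrm{MSE}_{\mathrm{LS}}}{\mathrm{MSE}_{\mathrm{TS\text{-}ML}}}
  = \frac{L}{K} = \frac{16}{3} \approx 5.33,
\qquad
10\log_{10}\!\left(\frac{16}{3}\right) \approx 7.27~\mathrm{dB}.
```

Under these assumptions, correct tap selection would lower the estimation MSE by roughly 7 dB relative to estimating all 16 taps.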
III. DESIGN OF FPGA-BASED MODULES FOR THE TS-ML CHANNEL ESTIMATOR
In this section, the modular design of the optimal TS-ML channel estimator based on FPGA technology is implemented by focusing on the computation of the IFFT, the magnitude of the complex signal, data sorting, and NLF emulation. The channel estimator is organized into six modules according to the TS-ML algorithm, and each block of the block diagram corresponds to one module, as shown in Fig. 3. In the following implementation, we set the CP length as L = 16 and use the shortest preamble with N = 32. The sparse channel order is set as K = 3.
The TS-ML channel estimator employs the IFFT to obtain the time-domain components of the signal; thus, the first module covers the implementation of the IFFT algorithm. The fwd_inv port of the input interface is set to zero, which selects the inverse transform in the SG's FFT block; furthermore, the pipelined, streaming input/output implementation mode is employed to allow continuous data processing.
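The role of the first module can be illustrated in software. The sketch below assumes a noise-free received block of the form r = (Ft) ⊙ (Fh) with an unnormalized DFT, de-rotates it by (Ft)*/N, and applies an inverse DFT; the three-tap channel is a hypothetical example, and the helper names are not from the paper.

```python
import cmath
import math

def dft(seq):
    """Unnormalized forward DFT (direct evaluation)."""
    n_len = len(seq)
    return [
        sum(seq[n] * cmath.exp(-2j * math.pi * k * n / n_len) for n in range(n_len))
        for k in range(n_len)
    ]

def idft(spec):
    """Inverse DFT with the 1/N factor, mirroring the IFFT module."""
    n_len = len(spec)
    return [
        sum(spec[k] * cmath.exp(2j * math.pi * k * n / n_len) for k in range(n_len)) / n_len
        for n in range(n_len)
    ]

N, L = 32, 16
t = [cmath.exp(1j * math.pi * n * n / N) for n in range(N)]  # Chu training sequence (assumed form)

h_L = [0j] * L
h_L[0], h_L[5], h_L[11] = 1.0 + 0j, 0.5j, -0.25 + 0j         # hypothetical sparse channel, K = 3
h = h_L + [0j] * (N - L)                                     # zero-pad to length N

Ft, Fh = dft(t), dft(h)
r = [a * b for a, b in zip(Ft, Fh)]                          # noise-free received block (assumption)
x = [a.conjugate() * b / N for a, b in zip(Ft, r)]           # pairwise multiplication by (Ft)^* / N
h_hat_L = idft(x)[:L]                                        # time-domain taps; keep the first L
```

Without noise the recovered taps equal the original channel taps up to rounding, which is a convenient sanity check before noise and tap selection are added.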
In order to compute the magnitude of the complex signals $\hat{\mathbf{h}}_L$ and $\mathbf{x}$, the second module employs the coordinate rotation digital computer (CORDIC) algorithm [9], [10], an iterative algorithm that requires only adder-subtractors and shifters. The magnitude output must be compensated for the CORDIC gain by a constant coefficient multiplier. Finally, the power is generated by a multiplier: the compensated magnitude is connected to both input ports of the multiplier, and the power result is obtained at its output port.
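The magnitude computation of the second module can be sketched with the textbook vectoring-mode CORDIC. This floating-point model is only an illustration: the hardware uses fixed-point shifts, and the iteration count and function name are assumptions rather than the paper's settings.

```python
import math

def cordic_magnitude(re, im, iterations=16):
    """Vectoring-mode CORDIC: rotate (re, im) onto the x-axis using shift-and-add steps."""
    x, y = abs(re), im                     # fold into the right half-plane so that x >= 0
    for i in range(iterations):
        shift = 2.0 ** -i                  # a right shift by i bits in hardware
        if y >= 0:
            x, y = x + y * shift, y - x * shift
        else:
            x, y = x - y * shift, y + x * shift
    # The iterations scale the vector by a fixed gain (about 1.6468); compensate with
    # one constant multiplication, as the constant coefficient multiplier does.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 4.0 ** -i)
    return x * (1.0 / gain)

magnitude = cordic_magnitude(3.0, 4.0)
power = magnitude * magnitude              # the same value drives both multiplier inputs
```

For the input 3 + 4j the sketch returns a magnitude close to 5 and a power close to 25.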
The third module sorts the taps of $\hat{\mathbf{h}}_L$ in descending order of power. For this purpose, we employ a parallel sorting architecture such as that in [11]. In this study, we use 16 data to explain how parallel sorting operates. Figure 4 shows an example of an array of 16 data and 15 stages. Two parallel sorting structures are used:
• The comparator block of the upper parallel sorting block is shown in Fig. 5a. The inputs of this block are (1) datum a and (2) datum b. The outputs of this block are (1) a swp flag signal that indicates whether swapping was performed, (2) datum A, and (3) datum B, whose values depend on whether swapping occurred. Equal data remain unchanged.
• The comparator block of the lower parallel sorting block is shown in Fig. 5b. The inputs of this block are (1) a swp flag signal that indicates whether swapping was performed, (2) datum c, and (3) datum d. The outputs of this block are (1) datum C and (2) datum D. These outputs are swapped depending on the status of the swp flag signal. The datum c and datum d inputs are connected to two fixed constants in order to record the original tap orders for the TS-ML algorithm.
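The two comparator blocks can be modeled in a few lines. The sketch below arranges them as an odd-even transposition network and stops once no comparator raises its swp flag; this arrangement is an assumption chosen for illustration and is not claimed to be the exact network of [11] or Fig. 4.

```python
def compare_swap(a, b):
    """Upper comparator: put (a, b) in descending order; equal data stay in place."""
    swp = a < b
    return (b, a, True) if swp else (a, b, False)

def tag_swap(swp, c, d):
    """Lower comparator: swap the recorded tap indices only when swp is set."""
    return (d, c) if swp else (c, d)

def parallel_sort_desc(powers):
    """Sort powers in descending order and track the original tap positions."""
    data = list(powers)
    tags = list(range(len(data)))          # fixed constants recording the tap order
    while True:
        any_swap = False
        for start in (0, 1):               # even-indexed pairs, then odd-indexed pairs
            for i in range(start, len(data) - 1, 2):
                data[i], data[i + 1], swp = compare_swap(data[i], data[i + 1])
                tags[i], tags[i + 1] = tag_swap(swp, tags[i], tags[i + 1])
                any_swap = any_swap or swp
        if not any_swap:                   # all swp flags are zero: sorting is complete
            return data, tags

sorted_powers, tap_order = parallel_sort_desc([0.1, 0.9, 0.0, 0.4, 0.9])
```

Because equal data are never swapped, ties keep their original relative order, so `tap_order` lists tap 1 before tap 4 for the two equal powers in this example.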
The sorting is completed when all swp flags are zero, i.e., when no swapping was performed at any level of the comparators.
The fourth module evaluates the recursive update equation, which is implemented with an adder-based accumulator architecture as a means of generating the recursion in the input path. In the fifth module, the natural logarithm is computed, as described in Section 4.
The last module comprises three components: a constant coefficient multiplier, an adder-based accumulator, and a parallel sorting block. Because N = 32, we can replace the constant coefficient multiplier with a 5-bit left shift.
IV. A NOVEL NLF EMULATOR IMPLEMENTATION BASED ON THE FPGA MODULE
An NLF evaluator typically comprises range reduction and the actual function approximation over a small interval. Range reduction [12], [13] is crucial because function approximation is rather limited without it and numerous applications have a large dynamic range. Hence, this is the first study that deals with this important issue. Figure 6 shows the overall block diagram of the NLF evaluator based on the FPGA module. To design an NLF evaluator suited to the input range and precision of the channel estimator, the following two procedures are performed: (1) approximation method selection and (2) input range reduction.
A. Approximation Method Selection
The CORDIC algorithm is well suited to computing the natural logarithm in the channel estimator because it is a collection of iterative shift-and-add algorithms that provide an extremely efficient means of computing the logarithmic function.
However, for natural logarithms, the valid input range of the CORDIC processor is limited to the convergence domain [0.5, 1) for the fixed-point format of the SG [6] platform on FPGAs. An equation can be used to assist the computation according to the fundamental property of a logarithmic function. It is known that ( ) = ⋅ . By using this equation, the natural logarithm can be computed by using the CORDIC algorithm and applying an additional constant multiplier. This property is the basis on which the range reduction is designed.
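For reference, a CORDIC-style logarithm on the interval [0.5, 1) can be sketched with the textbook hyperbolic vectoring mode and the identity ln(w) = 2·atanh((w − 1)/(w + 1)). This is a floating-point illustration under stated assumptions, not the internals of the SG block: `math.atanh` stands in for a small ROM of constants, and the names and iteration count are hypothetical.

```python
import math

def hyperbolic_schedule(n_max):
    """Shift indices 1..n_max, repeating 4, 13, 40, ... as hyperbolic CORDIC requires."""
    schedule, repeat = [], 4
    for i in range(1, n_max + 1):
        schedule.append(i)
        if i == repeat:
            schedule.append(i)             # repeated iteration needed for convergence
            repeat = 3 * repeat + 1
    return schedule

def cordic_ln(w, n_max=24):
    """Approximate ln(w) for w in [0.5, 1) with shift-and-add style iterations."""
    x, y, z = w + 1.0, w - 1.0, 0.0
    for i in hyperbolic_schedule(n_max):
        shift = 2.0 ** -i                  # a right shift by i bits in hardware
        if y < 0:                          # drive y toward zero
            x, y, z = x + y * shift, y + x * shift, z - math.atanh(shift)
        else:
            x, y, z = x - y * shift, y - x * shift, z + math.atanh(shift)
    return 2.0 * z                         # z has accumulated atanh((w - 1) / (w + 1))

ln_half = cordic_ln(0.5)
```

The sketch returns a value close to −0.6931 for w = 0.5, and the residual error shrinks as `n_max` grows.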
B. Range Reduction
As an example to illustrate our approach, consider a given range $[s, e)$ and 12 inputs assigned to a 1-output multiplexer for our TS-ML channel estimator based on the CORDIC algorithm; the combination of the bit-shift segments in parallel is shown in Fig. 7.
In Fig. 8, the interval endpoints $e_i$ are restricted to values that can be produced by a simple combination of twelve comparators in parallel. A more detailed comparator block is shown in Fig. 9. The appropriate taps are obtained from the parallel comparators depending on the choice of the segments, and they are added to compute the sel index and the m index with an added constant of −4.
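The comparator-and-multiplexer scheme can be sketched as follows. Every specific here is an assumption made for illustration: the twelve thresholds are taken to be powers of two from 2^-4 to 2^7, the reduction is taken to be u′ = u·2^(−m), and the logarithm is rebuilt as ln(u) = ln(u′) + m·ln 2. `math.log` stands in for the CORDIC core on [0.5, 1).

```python
import math

# Hypothetical segment boundaries: 2^-4, 2^-3, ..., 2^7 (twelve comparators).
THRESHOLDS = [2.0 ** (k - 4) for k in range(12)]

def reduce_range(u):
    """Map u in [2^-4, 2^8) to (u_prime, m) with u_prime in [0.5, 1) and u = u_prime * 2^m."""
    if not (THRESHOLDS[0] <= u < 2.0 * THRESHOLDS[-1]):
        raise ValueError("input outside the range covered by this sketch")
    fired = sum(1 for th in THRESHOLDS if u >= th)          # add the parallel comparator outputs
    sel = fired - 1                                         # multiplexer select index, 0..11
    m = fired - 4                                           # exponent after the added constant
    segments = [u * 2.0 ** -(k - 3) for k in range(12)]     # twelve bit-shifted copies of u
    return segments[sel], m                                 # 12-input, 1-output multiplexer

def ln_reconstruct(u, core_ln=math.log):
    """Rebuild ln(u) from the reduced argument and the exponent m."""
    u_prime, m = reduce_range(u)
    return core_ln(u_prime) + m * math.log(2.0)

u_prime, m = reduce_range(100.0)
```

For u = 100 this gives u′ = 0.78125 and m = 7, since 100 = 0.78125 × 2^7. A different threshold set or offset would change `sel` and `m` but not the structure.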
V. RESULT OF OUR IMPLEMENTATION
The hardware platform for implementing the channel estimator is the Xilinx XtremeDSP DK4 board [14], which hosts a Virtex-4 XC4VSX35 FPGA [15]. We used the top-down design flow based on the Simulink and SG software tools for fast co-simulation. The resource utilization report for this implementation is presented in Table III.
VI. CONCLUSIONS
In this paper, an FPGA-based channel estimation system using an optimal TS-ML algorithm was presented. In addition, a novel method for range reduction and reconstruction based on the CORDIC algorithm, which solves the limited interval problem and is integrated into the TS-ML channel estimator, was presented. Because the system functions are divided into modules, the proposed TS-ML channel estimator reduces the complexity of the FPGA design.
In the future, we will continue to optimize components such as the data sorting and NLF modules for integration into the TS-ML channel estimator. More components will be developed if necessary.
3. Input the combination of twelve bit-shift segments into a 12-input, 1-output multiplexer in parallel.
4. The multiplexer selects the correct interval from the 12 parallel inputs via the sel index output by the segment index encoder.
5. The output of the multiplexer is modified to a more convenient u′ over a smaller interval.
