Abstract-In this article we propose a complete solution for the so-called Inner Receiver of an OFDM-WLAN system based on the IEEE 802.11a standard. We concentrate our investigations on three key components forming the Inner Receiver namely, the Synchronizer, the Channel Estimator and the Digital Timing Loop. The main goal is the joint optimization of the signal processing algorithms along with the implementation friendly VLSI architecture required for these three key components in order to reduce power, area and latency, without compromising the performance excessively. We provide both the mathematical details and extensive computer simulations to validate our design.
I. INTRODUCTION
THE use of the OFDM (Orthogonal Frequency Division Multiplex) transmission technique has gained a lot of interest in recent years due to its spectral efficiency and its capability to overcome multi-path fading. In this paper we concentrate on OFDM-WLAN (Wireless Local Area Network) systems, which are already a reality thanks to the IEEE 802.11a/g standards [1], [2]. The application of OFDM is not restricted to these two standards: new standardization processes already foresee the application of OFDM in future WLAN [3] and UWB (Ultra Wideband) systems [4].
July 17, 2006. The associate editor coordinating the review of this paper and approving it for publication was C. Xiao.
A. Troya was with IHP, Frankfurt (Oder), Germany. He is now with Infineon Technologies AG, COM PS CE ALG, 81726 Munich, Germany (e-mail: alfonso.troya@infineon.com).
K. Maharatna is with the University of Southampton, University Road, Southampton, SO17 1BJ, UK (e-mail: km3@ecs.soton.ac.uk).
M. Krstić, E. Grass, U. Jagdhold, and R. Kraemer are with IHP, Frankfurt (Oder), Germany (e-mail: {krstic, grass, jagdhold, kraemer}@ihp-microelectronics.com).
Digital Object Identifier 10.1109/TWC.2007.05481.
The key property of OFDM is orthogonality. By this property the system uses the input data to modulate a number of mutually orthogonal sub-carriers. This technique facilitates a high data rate transmission system. However, the whole system performance depends on maintaining the orthogonality of the sub-carriers. If the orthogonality property is disturbed, unwanted effects such as Inter-Carrier Interference (ICI) and Inter-Symbol Interference (ISI) will occur during signal reception. In general, the orthogonality of the sub-carriers can be disturbed during the RF up- and down-conversion. On top of that, the characteristics of the transmission channel may also affect the orthogonality condition. A number of authors have addressed the impact of this type of impairment on OFDM signals in the past years [5], [6]. Thus, in order to make the system work efficiently, we need to re-establish the orthogonality condition at the receiver. The so-called Inner Receiver (a term first coined by Heinrich Meyr [7]) is used for this purpose. In essence, there are two main operations carried out inside the Inner Receiver (IRx), namely Signal Acquisition and Channel Correction, as shown in Fig. 1. The acquisition operation is performed by means of a synchronization block, which should be able to perform reliable Frame Detection (FD) and to provide estimates of the Carrier Frequency Offset (CFO) and Symbol Timing Offset (STO). The channel correction operation is needed to estimate and compensate for the Channel Transfer Function (CTF), provided that orthogonality has been restored to a great extent by the synchronizer. The final goal is to supply the decoding and demodulation block with In-phase and Quadrature components that are as similar as possible to the original ones. Though the IRx is an integral part of the OFDM-based WLAN system, its design complexity is frequently underestimated. Unfortunately, the standards generally give no hints on how to implement the IRx; this is left as a task for the developer. In this article we investigate an efficient realization of the IRx for IEEE 802.11a systems from both the algorithm and the VLSI (Very Large Scale Integration) implementation point of view, and provide a complete and practical solution for it. The results developed in this work are applicable to the future standards [3]. In order to develop our solution we start with the algorithm-level formulation of the desired functionality of the IRx. The algorithmic development has been considered strictly in conjunction with the architectural feasibility of an ASIC (Application Specific Integrated Circuit) implementation.
Thus a joint algorithm and architecture optimization has been undertaken using power consumption, silicon area, system latency and overall noise performance as the "quality/efficiency" parameters for the system. The power consumption and silicon area have been considered as two of the main parameters since the system is targeted for mobile and portable applications where saving of battery life as well as the total size of the system are crucial. Latency has been considered from the operation principle of the IEEE 802.11a MAC (Medium Access Control) protocol [1].
Fig. 2. Preamble symbols as defined by the 802.11a standard together with the timing schedule followed inside the Synchronizer.
1536-1276/07$25.00 © 2007 IEEE
Different parts of the present work have been published in short form at several conferences [8], [10], [11], [19]. In this paper we provide a much more detailed and integrated view of the complete IRx solution. The rest of the paper is organized as follows. An efficient synchronizer architecture, whose main structure was presented by the authors in [8], [9], is examined in Section II. Section III is devoted to the analysis of a decision-directed Channel Estimator (CE). Two blocks are the main focus of our investigations, namely the Noise Reduction Filter (NRF) and the Residual Phase Error (RPE) correction block. The proposed timing loop is analyzed in detail in Section IV; it provides a simple method to compensate for the Sampling Clock Frequency Offset (SCFO) based on the RPE estimation supplied by the CE. Section V presents simulation results which show the performance of the proposed solutions. Finally, some important conclusions are drawn in Section VI.
II. THE SYNCHRONIZER
The synchronizer is the block responsible for signal acquisition. This term encompasses a number of operations that need to be performed in a very limited period of time in order to minimize latency. For our purpose, synchronization must be finished within the preamble time, i.e. 16 μs, and the following operations must be performed based on the preamble symbols:
1) Frame detection.
2) Determination of the symbol timing.
3) Carrier frequency offset estimation and correction.
4) Extraction of the reference channel estimation.
The order in which these operations are carried out strongly determines the architecture of the synchronizer. The preamble symbols in the 802.11a standard comprise a number of periodic sequences as shown in Fig. 2. This periodic structure suggests a solution based on autocorrelators [13], [14]. The proposed implementation shown in Fig. 3 contains two autocorrelators. Each one encompasses a delay line (FIFO-type buffer) of length $N_d$, a complex conjugate operation, a complex multiplier, and a moving average of length $N_{avg}$. The moving average is an FIR filter of length $N_{avg}$ with all its coefficients equal to 1. Let us consider the input signal $r(m)$ to be sampled at frequency $f_s$ and affected by a CFO $f_\varepsilon = \varepsilon\,\Delta f$, where $\varepsilon$ stands for the normalized CFO and $\Delta f$ is the sub-carrier spacing in the OFDM signal ($\Delta f = 312.5$ kHz in 802.11a). Hence, the input signal $r(m)$ can be expressed as
$$r(m) = s(m)\,e^{j 2\pi \varepsilon \frac{\Delta f}{f_s} m} + v(m) \tag{1}$$
where $j = \sqrt{-1}$, $s(m)$ is the original time sequence and $v(m)$ represents a zero-mean white Gaussian noise process. According to (1), the autocorrelator's output signal $J_x(k)$ is given by
$$\begin{aligned} J_x(k) &= \sum_{m=0}^{N_{avg}-1} r(k-m)\,r^*(k-m-N_d) \\ &= e^{j 2\pi \varepsilon \frac{\Delta f}{f_s} N_d} \sum_{m=0}^{N_{avg}-1} s(k-m)\,s^*(k-m-N_d) \\ &\quad + \sum_{m=0}^{N_{avg}-1} s(k-m)\,e^{j 2\pi \varepsilon \frac{\Delta f}{f_s}(k-m)}\,v^*(k-m-N_d) \\ &\quad + \sum_{m=0}^{N_{avg}-1} v(k-m)\,s^*(k-m-N_d)\,e^{-j 2\pi \varepsilon \frac{\Delta f}{f_s}(k-m-N_d)} \\ &\quad + \sum_{m=0}^{N_{avg}-1} v(k-m)\,v^*(k-m-N_d) \end{aligned} \tag{2}$$
where the suffix $x$ represents either $F$ or $C$ in Fig. 3. By considering the sequence $s(m)$ to be uncorrelated with the noise sequence $v(m)$, the last three summands in (2) can be neglected for sufficiently large values of $N_{avg}$, yielding
$$J_x(k) \approx e^{j 2\pi \varepsilon \frac{\Delta f}{f_s} N_d} \sum_{m=0}^{N_{avg}-1} \left| s(k-m) \right|^2 \tag{3}$$
where it has been considered that the signal $s(m)$ is periodic with a period of $N_d$ samples, i.e. $s(m) = s(m - N_d)$. Since the sum in (3) is real and positive, it is straightforward to see that the phase of $J_x(k)$ is only due to $\varepsilon$, which can hence be estimated as follows
$$\hat{\varepsilon} = \frac{1}{2\pi N_d}\,\frac{f_s}{\Delta f}\,\arg\{J_x(k)\} \tag{4}$$
However, there is an important factor that destroys the periodicity, making $s(m) \neq s(m - N_d)$, i.e. the Automatic Gain Control (AGC) settling time, whose influence is analyzed through simulation in Section V. If $N_{avg}$ is a multiple of the minimum periodicity in the preambles (16 samples), and since the phase of $J_x(k)$ is only defined within $\pm\pi$, the estimate is bounded by
$$|\hat{\varepsilon}| < \frac{1}{2 N_d}\,\frac{f_s}{\Delta f} \tag{5}$$
Since the ratio $(f_s/\Delta f)$ is a fixed parameter, the range of estimation of $\varepsilon$ will only depend on the selected delay $N_d$ in the autocorrelator. In 802.11a we find that $(f_s/\Delta f) = 64$, resulting in $|\hat{\varepsilon}| < 0.5$ for $N_d = 64$ and $|\hat{\varepsilon}| < 2.0$ for $N_d = 16$.
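The estimator described above can be illustrated with a minimal sketch. It assumes the autocorrelator forms the sum of r(k-m)·conj(r(k-m-N_d)) over N_avg samples and that the normalized CFO follows from the phase of that sum scaled by (f_s/Δf)/(2π·N_d). The function names and the synthetic periodic test signal are hypothetical and only serve the illustration.

```python
import cmath
import math
import random

def autocorr_cfo(r, k, n_d, n_avg, fs_over_df=64):
    """Estimate the normalized CFO from the samples of r ending at index k."""
    if k - (n_avg - 1) - n_d < 0:
        raise ValueError("not enough past samples for this delay and average")
    # Moving-average autocorrelation between r and its copy delayed by n_d.
    j = sum(r[k - m] * r[k - m - n_d].conjugate() for m in range(n_avg))
    # Only the phase of j carries the CFO when the signal is n_d-periodic.
    return cmath.phase(j) * fs_over_df / (2 * math.pi * n_d)

def make_periodic_signal(period, repeats, eps, fs_over_df=64, seed=1):
    """Hypothetical test signal: one random block repeated, rotated by the CFO."""
    rng = random.Random(seed)
    block = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(period)]
    return [block[m % period] * cmath.exp(2j * math.pi * eps * m / fs_over_df)
            for m in range(period * repeats)]

r = make_periodic_signal(period=16, repeats=10, eps=0.3)
coarse = autocorr_cfo(r, k=len(r) - 1, n_d=16, n_avg=16)   # range |eps| < 2.0
fine = autocorr_cfo(r, k=len(r) - 1, n_d=64, n_avg=64)     # range |eps| < 0.5
```

With a normalized offset of 0.3 both delays return the same value. For an offset outside ±0.5, the N_d = 64 estimate wraps around while the N_d = 16 estimate remains unambiguous, which is the range restriction stated in the text.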
A. Frame Detection Mechanism
The first operation to be carried out by the synchronizer is FD. We decided to make use of the particular shape of the signal $|J_F(k)|^2$ in order to derive a simple frame detector.
Consequently, if we are able to detect the plateau in $|J_F(k)|^2$ (see Fig. 4), this will be the indication that a frame is being received. The proposed plateau detector contains two blocks, namely a differentiator and a peak detector, as depicted in Fig. 3. The differentiator should indicate the point where the plateau starts. The differentiated signal $J_{diff}(k)$ is obtained as follows:
$$J_{diff}(k) = |J_F(k)|^2 - |J_F(k - N_{diff})|^2 \tag{6}$$
where $N_{diff}$ simply defines the delay applied by the differentiator. The signal $J_{diff}(k)$ is also shown in Fig. 4.
The autocorrelation block together with the differentiator and the peak detector constantly monitors the channel. When the peak detector identifies an absolute maximum at the output of the differentiator, the synchronizer considers that a new frame has arrived and the CFO estimator is activated. However, due to the noise and, more importantly, due to the Automatic Gain Control, peak detection is not a trivial task and a smart algorithm is necessary in order to distinguish absolute from relative maxima [8], [9]. For this purpose the peak detector is divided into two blocks, namely a group peak detector and an instantaneous peak detector, as shown in Fig. 3. The instantaneous peak detector is basically a combination of a comparator and a counter. The present sample $J_{diff}(k)$ coming out of the differentiator is compared with the last recorded maximum $J_{max}$ ($J_{max} = 0$ at $k = 0$). As long as the sample $J_{diff}(k)$ is bigger than $J_{max}$, the register storing $J_{max}$ is updated with $J_{diff}(k)$ as the new maximum and the counter is reset. If $J_{diff}(k)$ is smaller than or equal to $J_{max}$, the counter increases its count by one. If this situation persists until the counter overflows, the instantaneous peak detector activates a signal stating that a relative peak has been found inside the counting scope of the counter. The group peak detector is used to detect the falling edges in $J_{diff}(k)$, and its main component is also a comparison block. There, the input signal is accumulated in groups of six samples (6-tuples) and the present group is compared with the previous one. If it is smaller, the falling slope has started. If the group peak detector finds a falling edge at the same time as the instantaneous peak detector finds a relative peak, then the detected peak is actually an absolute peak.
In the situation where no AGC is present, the signal $|J_F(k)|^2$ shows a plateau of 32 samples. Consequently, the parameter $N_{diff}$ in (6) was selected to be 16 samples, so that the FD algorithm detects the plateau in $|J_F(k)|^2$ at its middle point [8]. This fact justifies the definition of False Alarm Probability given later in Section V.
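The interplay between the differentiator, the instantaneous peak detector and the group peak detector can be sketched as follows. This is only an illustration under stated assumptions: the differentiator is taken as the difference between the squared autocorrelation magnitude and its copy delayed by N_diff samples, the counter limit of 8 is a hypothetical value (the text does not give the counter size), and the relative-peak flag is assumed to stay active once the counter has reached its limit.

```python
def detect_frame(p, n_diff=16, counter_limit=8, group_len=6):
    """Return the index where an absolute peak of J_diff is declared, or None.

    p holds the samples of |J_F(k)|^2. counter_limit is a hypothetical value.
    """
    j_max = 0.0          # last recorded maximum (0 at start)
    count = 0            # samples elapsed since j_max was last updated
    group_sum = 0.0      # running sum of the current 6-tuple
    prev_group = None    # sum of the previous 6-tuple
    falling = False      # True when the latest group is below the previous one
    for k in range(n_diff, len(p)):
        j_diff = p[k] - p[k - n_diff]
        # Instantaneous peak detector: comparator plus counter.
        if j_diff > j_max:
            j_max, count = j_diff, 0
        else:
            count += 1
        # Group peak detector: compare consecutive group sums.
        group_sum += j_diff
        if (k - n_diff + 1) % group_len == 0:
            falling = prev_group is not None and group_sum < prev_group
            prev_group, group_sum = group_sum, 0.0
        # Absolute peak: a relative peak confirmed by a falling edge.
        if count >= counter_limit and falling:
            return k
    return None

# Synthetic |J_F(k)|^2: the correlation ramps up linearly, then stays flat.
ramp = [min(max((k - 50) / 64, 0.0), 1.0) for k in range(200)]
p = [x * x for x in ramp]
hit = detect_frame(p)
```

In this noise-free example the plateau starts at sample 114 and the detection is declared a few samples later, once both detectors agree. A real input would also contain noise and AGC transients, which is why the two-detector combination is used.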
B. Carrier Frequency Offset Estimation and Correction
According to the specifications in the 802.11a standard [1], all the clocks and carrier signals for the transceiver should be generated from the same crystal oscillator, which should have a maximum relative frequency error of ±20 ppm. Let us consider an example in which a signal is received at the highest possible carrier frequency of 5,805 MHz (operating channel 161 in the U-NII upper band). The frequency deviation during down-conversion is then given by 5,805 MHz × (±20 ppm) = ±116.1 kHz. Since the transmitter contributes the same deviation during up-conversion, the whole transmit-receive process introduces an overall carrier frequency error of $|f_\varepsilon| = 232.2$ kHz. Normalizing this value with respect to the sub-carrier spacing, $\Delta f = 312.5$ kHz, we find the maximum normalized CFO to be $|f_\varepsilon/\Delta f| \approx 0.75$. The present implementation in Fig. 3 considers the frequency offsets to be in the range ±1.5, i.e. twice the maximum value required by the standard. This decision is based on a pessimistic approach and was justified by the fact that functional tests had to be carried out using experimental Analog Front-Ends (AFE), which did not entirely fulfill the specifications.
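The worst-case CFO budget above is simple arithmetic, reproduced here as a short sketch. The function name is hypothetical; the numbers are those quoted in the text.

```python
def worst_case_normalized_cfo(carrier_hz, tolerance_ppm, spacing_hz):
    """Return (one-sided deviation, Tx+Rx deviation, normalized CFO)."""
    one_side = carrier_hz * tolerance_ppm * 1e-6   # deviation of one oscillator
    total = 2 * one_side                           # transmitter plus receiver
    return one_side, total, total / spacing_hz

one_side, total, normalized = worst_case_normalized_cfo(5.805e9, 20, 312.5e3)
# one_side is 116.1 kHz, total is 232.2 kHz, normalized is about 0.743
```

The exact ratio is slightly below the rounded value of 0.75 quoted in the text.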
Two autocorrelators with $N_d = 64$, $N_{avg} = 64$, and $N_d = 16$, $N_{avg} = 16$, respectively, are used. The autocorrelator with $N_d = 64$ is used to get a fine estimation of the CFO ($|\alpha| < 0.5$), whereas the one with $N_d = 16$ is used to obtain a coarse estimation of the CFO ($|\beta| < 2.0$). Note that the definition of fine and coarse is not based on the range, but on the accuracy of the estimation, i.e. the length of the moving average. Hence, although $\alpha$ is bounded more restrictively than $\beta$, it will be less noisy since its moving average is much longer. The final normalized CFO estimation will be a combination of the values obtained for $\alpha$ and $\beta$. Although $\beta$ has a linear dependency throughout the entire range of possible values of the CFO, i.e. ±1.5, this is not the case for $\alpha$. Hence, the final CFO estimation cannot be directly a linear combination of the two estimations $\alpha$ and $\beta$. Instead, $\beta$ will only serve as a range pointer and will provide the integer value of the frequency offset (either −1, +1 or 0), whereas $\alpha$ will provide the fractional part of the estimation. The final value of $\hat{\varepsilon}$ results from a function of $\alpha$ and $\beta$. The estimation of the CFO takes place in one shot exactly at the time instant when the FD detects the incoming frame, since both autocorrelators exhibit a plateau at that particular point of time. An arctangent calculator is necessary to obtain $\alpha$ and $\beta$ from $J_F(k)$ and $J_C(k)$, respectively. The correction of the CFO follows naturally by using a Numerically Controlled Oscillator (NCO) once $\varepsilon$ has been estimated. In our implementation a novel CORDIC rotator is used in its accumulation mode of operation to compute the arctangent, and its rotation mode is used to realize the NCO operation [10], [11], [12].
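The role of β as a range pointer and α as the fractional part can be illustrated with a small sketch. The combination rule used here (rounding β − α to the nearest integer and clipping it to −1, 0 or +1) is an assumption chosen for illustration; it is not necessarily the function used in the implementation, and the name `combine_cfo` is hypothetical.

```python
def combine_cfo(alpha, beta):
    """Combine fine (alpha) and coarse (beta) normalized CFO estimates.

    Assumed rule: beta only selects the integer part, alpha gives the fraction.
    """
    integer_part = round(beta - alpha)            # nearest integer offset
    integer_part = max(-1, min(1, integer_part))  # only -1, 0 or +1 is allowed
    return integer_part + alpha

estimate = combine_cfo(alpha=0.2, beta=1.23)   # noisy coarse value near 1.2
```

Because β is only used to select the integer, its noise does not reach the final estimate as long as the error of β stays below half a sub-carrier spacing.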
C. Symbol Timing Estimation
Unlike CFO estimation, where the periodicity of the short preamble symbols was the main feature exploited by the estimator, the symbol timing estimation is obtained by exploiting the direct knowledge of the long preamble symbols.
The main block in the symbol timing estimator is a crosscorrelator. Its purpose is to compare the input frame with a reference signal, which is directly obtained from the long preamble symbol. The proposed crosscorrelator can only be applied once the samples of the incoming frame have been fully corrected by the NCO and contain no frequency offset.
The fraction of the long preamble symbol selected as the crosscorrelator reference $c_{REF}(m)$ is shown in Fig. 2 and corresponds to the sequence defined as $T_1$. The reference has a length of 32 complex samples, which is the shortest possible length for this reference in order to obtain appropriate results after correlation. From an implementation point of view, the complex crosscorrelator is usually a "weak" point in modern communication circuit designs because of its computational complexity, i.e. it requires a large number of complex multipliers and a large silicon area. With this in mind, in this implementation we applied a simplified scheme for the crosscorrelator, with simple 1-bit XNOR multipliers that substitute for the commonly used complex multipliers. Instead of multiplying b-bit complex numbers, the XNOR multiplier performs only the multiplication of the sign bits of the complex input values, considering the Most Significant Bit (MSB) to be '1' when the sample is positive or zero and '0' when it is negative. Based on this, the reference sequence being used in the crosscorrelator is as follows:
according to the preamble defined in [1] , where ( * ) stands for complex conjugate.
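A sign-bit crosscorrelator of this kind can be sketched as follows. The bit convention follows the text ('1' for positive or zero, '0' for negative); the way the four XNOR products are combined into real and imaginary parts and the squared-magnitude score are assumptions for illustration, and the reference used in the example is random data rather than the actual $T_1$ sequence.

```python
import random

def sign_bits(z):
    """Sign bits of I and Q: 1 when the component is >= 0, else 0."""
    return (1 if z.real >= 0 else 0, 1 if z.imag >= 0 else 0)

def xnor(a, b):
    return 1 - (a ^ b)

def pm(bit):
    """Map an XNOR output bit to the product of two signs (+1 or -1)."""
    return 2 * bit - 1

def hard_crosscorrelator(samples, reference):
    """Slide the sign-bit reference over the samples; one score per offset."""
    ref = [sign_bits(c) for c in reference]
    bits = [sign_bits(s) for s in samples]
    n = len(ref)
    scores = []
    for start in range(len(bits) - n + 1):
        re = im = 0
        for (ri, rq), (ci, cq) in zip(bits[start:start + n], ref):
            # sign(r) times the conjugate of sign(c), each product by XNOR.
            re += pm(xnor(ri, ci)) + pm(xnor(rq, cq))
            im += pm(xnor(rq, ci)) - pm(xnor(ri, cq))
        scores.append(re * re + im * im)
    return scores

rng = random.Random(7)
def noise(n):
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]

reference = noise(32)                      # stand-in for the 32-sample reference
received = noise(40) + reference + noise(20)
scores = hard_crosscorrelator(received, reference)
peak_index = scores.index(max(scores))
```

When the received window matches the reference, every sample contributes +2 to the real part, so the score reaches its maximum of (2·32)² = 4096 at the alignment offset.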
When the preamble symbols go through the crosscorrelator, the output shows two major peaks at instants $m_1$ and $m_2$ (see Fig. 2). Both peaks occur when the portions $T_1$ of the long preamble symbols are inside the crosscorrelator. For our purpose it is enough to detect the first peak by setting a certain threshold at the output of the crosscorrelator. More sophisticated methods based on an active peak search may be used at the expense of increased latency. The 64 samples coming immediately after the first peak, i.e. the sequence $\{T_2, T_1\}$, are fed into the FFT in order to extract the reference CTF. In the 802.11a standard the long preamble symbol is defined as the sequence $\{T_1, T_2\}$, i.e. in our case a cyclic delay of 32 samples is introduced into a sequence of 64 samples. Therefore, the resulting sequence after FFT calculation has to be multiplied by $(-1)^k$, $k = 0, 1, 2, \ldots, 63$, in order to eliminate the remaining linear phase.
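The $(-1)^k$ correction follows from the DFT shift property: a cyclic shift by 32 samples in a 64-sample block multiplies bin $k$ by $e^{j 2\pi k \cdot 32/64} = (-1)^k$. A minimal numerical check, using a plain DFT helper and random data in place of the actual preamble (both hypothetical), is:

```python
import cmath
import math
import random

def dft(x):
    """Direct N-point DFT, adequate for a 64-sample illustration."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]

rng = random.Random(3)
t1 = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(32)]
t2 = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(32)]

standard_order = dft(t1 + t2)     # {T1, T2} as defined in the standard
captured_order = dft(t2 + t1)     # {T2, T1} as delivered to the FFT
corrected = [(-1) ** k * v for k, v in enumerate(captured_order)]
max_error = max(abs(a - b) for a, b in zip(corrected, standard_order))
```

After the sign alternation the two spectra agree to within rounding error, so no complex multiplier is needed for this correction.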
By observing Fig. 2 we see that the preamble contains the sequence $\{T_2, T_1\}$ twice, so averaging these two sequences would reduce the noise power in the reference CTF by 3 dB. Note that in our case, as a measure to reduce the signal processing latency, only one preamble symbol is used to initialize the CE, which implies a penalty of 3 dB in the SNR. This problem will be treated in the next section, when discussing the CE itself.
III. THE CHANNEL ESTIMATOR
The CE deals with the estimation and correction of the filtering affecting the OFDM signal. This filtering is mainly due to the multipath transmission channel found in wireless communications, but several filters located in the transceiver hardware play an important role as well. As a result, the OFDM symbols are extended in time by an amount equal to the sum of the impulse response lengths of all the filters involved in the transmission and reception chain. Such an extension provokes the leakage of a symbol into the successive one, resulting in ISI. One interesting feature of OFDM signals is their capability to overcome the ISI by appending a Cyclic Prefix (CP) of length $N_G$ to each transmitted OFDM symbol. This has two main advantages: on the one hand, the possible leakage from the previous symbol is fully absorbed as long as it is shorter than the cyclic extension. On the other hand, the examination of the OFDM symbols in the frequency domain (after DFT) becomes much more convenient, since the overall filtering now appears inside the OFDM symbols as complex multiplicative factors affecting each of the sub-carriers. In view of this fact, channel correction becomes much easier since it can be realized by means of a complex division in the frequency domain.
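Both advantages of the cyclic prefix can be checked numerically with a small sketch. It assumes N = 64, N_G = 16, QPSK data on every sub-carrier and a hypothetical noise-free 4-tap channel; the helper names are illustrative only.

```python
import cmath
import math
import random

N, N_G = 64, 16

def dft(x, sign=-1):
    n = len(x)
    return [sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n)
                for m in range(n)) for k in range(n)]

def idft(spectrum):
    return [v / len(spectrum) for v in dft(spectrum, sign=+1)]

def ofdm_symbol(spectrum):
    x = idft(spectrum)
    return x[-N_G:] + x            # cyclic prefix followed by the useful part

def convolve(x, h):
    y = [0j] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

rng = random.Random(0)
def qpsk():
    return complex(rng.choice((-1, 1)), rng.choice((-1, 1)))

X_prev = [qpsk() for _ in range(N)]
X = [qpsk() for _ in range(N)]
h = [1.0, 0.5j, -0.3, 0.1]                       # hypothetical 4-tap channel
rx = convolve(ofdm_symbol(X_prev) + ofdm_symbol(X), h)
start = (N + N_G) + N_G                          # skip first symbol and second CP
Y = dft(rx[start:start + N])
H = dft(h + [0.0] * (N - len(h)))                # channel transfer function
X_hat = [y / hk for y, hk in zip(Y, H)]          # one complex division per sub-carrier
max_error = max(abs(a - b) for a, b in zip(X_hat, X))
```

The leakage of the first symbol falls entirely inside the discarded prefix of the second one, and the per-sub-carrier division recovers the transmitted values to within rounding error.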
The proposed CE algorithm is based on the CD3 (Coded Decision-Directed Demodulation) solution given by Mignone and Morello in [15]. The CD3 is a decision-directed method whose main advantage is that pilot sub-carriers are not necessary for channel estimation, thus increasing the amount of information transmitted on each OFDM symbol. However, there are a number of issues not considered in [15] that make pilot sub-carriers truly necessary, as will be seen later. In this section we propose a modification of the CD3 channel estimator in order to accommodate two key blocks that significantly simplify the signal processing required for reliable channel estimation. These two blocks are the Noise Reduction Filter and the Residual Phase Error estimator.
A. Noise Reduction Filter
As was shown in Section II, the synchronizer provides a reference channel estimation that is used to initialize the CE. Due to the selected architecture, the reference is obtained from a single preamble symbol and hence a 3 dB penalty in the initial channel estimation occurs. The NRF should help in compensating this penalty by means of the so-called low-rank approximation. This approach was first proposed in [16], [17] for the case in which pilot tones can be used for channel estimation. In our situation the concept is extended to the case where pseudo-pilots are available, i.e. when the CTF (frequency domain) is estimated based on a previous estimation of the received data. The basic idea hinges on the assumption that the Channel Impulse Response (CIR, time domain) is always shorter than the CP of length $N_G$ found at each OFDM symbol. Hence, if an estimation of the CTF is available in vector $\hat{H}$, this estimation can be improved by forcing the corresponding CIR, i.e. $\hat{h} = \mathrm{IDFT}\{\hat{H}\}$, to be shorter than the CP. This is done by setting to zero all those samples in vector $\hat{h}$ that fall beyond the CP limit, since they are considered to be noise. This is equivalent to eliminating the noise components that are orthogonal to the signal of interest. For a particular OFDM symbol $l$, this operation can be expressed in matrix form as follows
$$\tilde{H}_l = \Theta_{DFT}\,\hat{H}_l \tag{9}$$
with
$$\Theta_{DFT} = F^H\,W\,F \tag{10}$$
where $\hat{H}_l$ is an $N \times 1$ vector with the original CTF estimation for symbol $l$, $\tilde{H}_l$ is the "cleaned" CTF estimation, $F$ is the $N$-point IDFT matrix, $F^H$ is the $N$-point DFT matrix and $(^H)$ stands for Hermitian transpose. The matrix $W$ is an $N \times N$ matrix with the form
$$W = \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix} \tag{11}$$
with $I$ standing for the $N_G \times N_G$ identity matrix ($N_G < N$). The matrix $W$ windows the IDFT of $\hat{H}_l$. The matrix $\Theta_{DFT}$ is referred to as the Noise Reduction Matrix (NRM) with dimension $N \times N$. The problem is in fact more complex than this, since in a real scenario not all $N$ sub-carriers are data-bearing sub-carriers. An example is the 802.11a standard, where only $N_u$ out of $N$ sub-carriers contain information, with $N_u = 52$ and $N = 64$. In this case the NRM cannot be obtained as in (10), since now the vector $\hat{H}_l$ is a column vector with $N_u$ elements, whereas $\Theta_{DFT}$ is an $N \times N$ matrix. A solution for this particular case is provided in [18], yielding an $N_u \times N_u$ matrix $\Theta_{NRM}$ as follows,
where $\gamma$ is a dummy parameter, $0 < \gamma \ll N^{-1}$, used to prevent possible numerical instability in the matrix inversion. The matrices $F_{11}$, $F_{12}$, $F_{21}$, and $F_{22}$ are of dimensions $N_G \times N_u$, $N_G \times (N - N_u)$, $(N - N_G) \times N_u$, and $(N - N_G) \times (N - N_u)$, respectively, and are made of elements of the matrix $F$. The resulting matrix $\Theta_{NRM}$ is shown in Fig. 5(a) for the case $N = 64$, $N_u = 52$, $N_G = 16$. It contains 2,704 complex elements, which must be pre-computed and stored. By means of $\Theta_{NRM}$, a noise reduction factor given by $\upsilon_{dB} = 10 \cdot \log_{10}\left(N^2/(N_G \cdot N_u)\right)$ can be achieved. In the 802.11a case this reduction is as high as 7 dB. It should be noted that the matrix $\Theta_{NRM}$ is fixed once $N$, $N_u$ and $N_G$ have been selected.
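The windowing idea and the quoted noise reduction factor can be reproduced with a short sketch. It covers only the simpler full-band case in which all N sub-carriers are used, with the NRM formed as DFT · window · IDFT using unitary scaling (an assumption); it does not implement the $N_u \times N_u$ matrix from [18]. The function names are hypothetical.

```python
import cmath
import math
import random

def noise_reduction_db(n, n_u, n_g):
    """Noise reduction factor 10*log10(N^2 / (N_G * N_u)) in dB."""
    return 10 * math.log10(n * n / (n_g * n_u))

def theta_dft(n, n_g):
    """Full-band NRM: keep only the first n_g taps of the IDFT of the CTF."""
    return [[sum(cmath.exp(2j * math.pi * t * (b - a) / n) for t in range(n_g)) / n
             for b in range(n)] for a in range(n)]

def apply_matrix(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

N, N_U, N_G = 64, 52, 16
theta = theta_dft(N, N_G)
gain_db = noise_reduction_db(N, N_U, N_G)        # about 6.9 dB

# A CTF whose impulse response fits inside the CP passes through unchanged.
rng = random.Random(5)
h = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N_G)]
H = [sum(h[t] * cmath.exp(-2j * math.pi * t * k / N) for t in range(N_G))
     for k in range(N)]
H_clean = apply_matrix(theta, H)
max_error = max(abs(a - b) for a, b in zip(H_clean, H))
trace = sum(theta[i][i] for i in range(N)).real  # equals N_G for a projection
```

The trace of 16 shows that only 16 of the 64 noise dimensions survive the filter, while a channel shorter than the CP is left untouched.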
The noise reduction concept explained above might be significantly simplified if the NRM is determined not based upon the DFT but on the DCT (Discrete Cosine Transform). Although the DCT is closely related to the DFT, it has a greater ability than the DFT to project energy onto a few transformed coefficients. According to our design premise, $\hat{h}_l = \mathrm{IDFT}\{\hat{H}_l\}$ should have its energy projected onto a few coefficients. As a means to reduce the CTF estimation noise, the pseudo-CIR $\hat{h}_{pseudo,l} = \mathrm{IDCT}\{\hat{H}_l\}$ may be used instead of $\hat{h}_l$. The DCT-based NRM can be written as
$$\Theta_{DCT} = C^H\,W\,C$$
where $C$ stands for the $N_u$-point IDCT matrix, $C^H$ is the $N_u$-point DCT matrix, and $W$ is built as in (11) but now with dimensions $N_u \times N_u$. In Fig. 5(a)/(b) it can be seen that both matrices, $\Theta_{NRM}$ and $\Theta_{DCT}$, have a very similar magnitude shape, with their major coefficients concentrated around the main diagonal. Nevertheless, the matrix $\Theta_{DCT}$ only contains real values whereas $\Theta_{NRM}$ is made of complex coefficients. More interestingly, it is not necessary to pre-calculate the matrix $\Theta_{DCT}$, as is the case for $\Theta_{NRM}$; instead, we may calculate a forward and a reverse $N_u$-point (52-point in the case of the IEEE 802.11a standard) DCT on the vector $\hat{H}_l$ in order to reduce the CTF estimation noise.
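A transform, window, inverse-transform filter of this kind can be sketched without pre-computing any matrix. The sketch below assumes an orthonormal DCT-II as the analysis transform and keeps the first 16 coefficients; which of the two transforms is labeled "DCT" and which "IDCT", and the number of retained coefficients, are assumptions here.

```python
import math
import random

def dct_matrix(n):
    """Orthonormal DCT-II matrix; its transpose is the inverse transform."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        rows.append([scale * math.cos(math.pi * (2 * m + 1) * k / (2 * n))
                     for m in range(n)])
    return rows

def dct_denoise(h_est, n_keep=16):
    """Transform, keep the first n_keep coefficients, transform back."""
    n = len(h_est)
    d = dct_matrix(n)
    coeffs = [sum(d[k][m] * h_est[m] for m in range(n)) for k in range(n_keep)]
    return [sum(d[k][m] * coeffs[k] for k in range(n_keep)) for m in range(n)]

rng = random.Random(11)
noise = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(52)]
filtered = dct_denoise(noise)
energy_in = sum(abs(v) ** 2 for v in noise)
energy_out = sum(abs(v) ** 2 for v in filtered)
```

All matrix entries are real, so real and imaginary parts of the CTF estimate are filtered independently with real multiplications only, which is the hardware advantage pointed out in the text.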
B. Residual Phase Correction
After FFT calculation and channel correction, a residual phase error remains in the modulated data due to several factors: errors in the estimation of the STO and CFO, Phase Noise, and uncorrected SCFO. When applying the CE algorithm it is considered that the transmitted pilots were assigned the values {±1}. Furthermore, the channel is supposed not to change significantly during a period of $L$ OFDM symbols, $L$ being the latency of the CE, so that after channel correction and in the absence of noise the resulting pilots are pure phasors with normalized magnitude given by
and $k$ is the frequency index; $\xi$ is the sampling error (in ppm), $\alpha_l$ is the contribution of the Phase Noise (the so-called Common Phase Error) to symbol $l$, and $c_0$ is the phase derived from a residual CFO. The method we propose [19], [20] assumes that the condition $|\delta \cdot k| \ll 1$ is satisfied $\forall k \in [-26, +26]$. In this case (15) may be simplified by considering a first order approximation of the complex exponential, yielding
In (16) four parameters are of interest, namely $\cos(\theta_l)$, $\sin(\theta_l)$, $\delta \cdot \sin(\theta_l)$, and $\delta \cdot \cos(\theta_l)$. In order to find these four parameters we must solve the linear system of equations derived from (16) when setting $k = -21, -7, +7$ and $+21$, corresponding to the pilot tones. Hence, the parameters $\cos(\theta_l)$ and $\sin(\theta_l)$ can be found straightforwardly as
Regarding the parameters $\delta \cdot \sin(\theta_l)$ and $\delta \cdot \cos(\theta_l)$, the exact expressions are as follows,
These expressions have been modified in order to simplify the scaling by the factor 1/126, yielding
The foregoing method saves a significant amount of hardware, since neither an arctangent block nor an NCO is needed for RPE estimation and correction, respectively.
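How the four parameters can be obtained and applied without an arctangent or an NCO is illustrated below. The sketch assumes a corrected pilot of the form $e^{j(\theta + \delta k)}$, linearized as $(\cos\theta - \delta k \sin\theta) + j(\sin\theta + \delta k \cos\theta)$, and uses a plain least-squares fit over the four pilots. This fit is an assumption chosen for illustration; it is not the set of simplified expressions used in the implementation, and the function names are hypothetical.

```python
import cmath

PILOTS = (-21, -7, 7, 21)

def estimate_rpe(pilots):
    """pilots maps sub-carrier index k to the corrected pilot value.

    Returns estimates of cos(theta), sin(theta), delta*cos(theta), delta*sin(theta).
    """
    n = len(PILOTS)
    k2 = sum(k * k for k in PILOTS)                          # 980 for these pilots
    cos_t = sum(pilots[k].real for k in PILOTS) / n
    sin_t = sum(pilots[k].imag for k in PILOTS) / n
    d_cos = sum(k * pilots[k].imag for k in PILOTS) / k2     # delta * cos(theta)
    d_sin = -sum(k * pilots[k].real for k in PILOTS) / k2    # delta * sin(theta)
    return cos_t, sin_t, d_cos, d_sin

def correct(value, k, est):
    """Remove the estimated phase from sub-carrier k with multiplications only."""
    cos_t, sin_t, d_cos, d_sin = est
    phasor = complex(cos_t - k * d_sin, sin_t + k * d_cos)
    return value * phasor.conjugate()

theta, delta = 0.3, 0.002                                    # hypothetical values
pilots = {k: cmath.exp(1j * (theta + delta * k)) for k in PILOTS}
est = estimate_rpe(pilots)
residual = correct(cmath.exp(1j * (theta + delta * 10)), 10, est)
```

The corrected value at sub-carrier 10 comes out close to 1, with an error of second order in $\delta \cdot k$, which is consistent with the first order approximation described in the text.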
IV. THE DIGITAL TIMING LOOP
The general scheme of the IRx shown in Fig. 1 includes a so-called Digital Timing Loop (DTL). The purpose of the DTL is to estimate and correct the SCFO. Each OFDM symbol is composed of 80 samples, before CP extraction and FFT operation, with a sampling rate $f_s = 20$ MHz. In the case of a sampling oscillator with e.g. 20 ppm frequency error, this turns into $f_s = 20{,}000{,}400$ Hz. Thus 80.0016 samples are obtained for the initial symbol instead of exactly 80, i.e. a timing error of 0.0016 samples. This timing error is not fixed: it will be 0.0032 samples for the second symbol, 0.0048 for the third one and so on. In essence, the SCFO is observed as a dynamic timing error that has to be monitored throughout reception. Considering the case of a 6 Mbps transmission, the 802.11a standard allows a frame length of up to 1,367 data symbols, which means that the last OFDM symbol will be affected by a timing error of about 2.2 samples. In our consideration the total SCFO may be as high as 80 ppm (combining the effects from Tx and Rx), yielding a maximum accumulated timing error of about 8.8 samples. Since the timing error appears as a linear phase after FFT operation, pilots are very well suited to estimate it. The method shown in the foregoing section for RPE estimation and correction is a posteriori, i.e. no attempt is made to correct the main sources causing the phase error, but only the phase error itself. Hence, we need not only a method to estimate the SCFO based on the pilots but also a way to correct for it prior to FFT operation in order to avoid ICI.
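The accumulated timing error quoted above follows from a one-line calculation, shown here as a small sketch with a hypothetical function name.

```python
def accumulated_timing_error(scfo_ppm, n_symbols, samples_per_symbol=80):
    """Timing drift in samples after n_symbols OFDM symbols."""
    return samples_per_symbol * scfo_ppm * 1e-6 * n_symbols

per_symbol = accumulated_timing_error(20, 1)       # 0.0016 samples
at_20_ppm = accumulated_timing_error(20, 1367)     # about 2.19 samples
at_80_ppm = accumulated_timing_error(80, 1367)     # about 8.75 samples
```

The last two values correspond to the figures of about 2.2 and 8.8 samples given in the text.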
A. Timing Error Discriminator
In a first stage, the variable timing error must be estimated. In the estimation we make use of the phase error signal provided by the RPE estimator, i.e. $P^{\phi}_{k,l}$.
where $p$ corresponds to the pilot sub-carrier position and $l$ is the symbol index. The RPE signal $P^{\phi}_{k,l}$ is compared with these two references (early and late) through correlation, thus yielding
where $V_{early}(l)$ and $V_{late}(l)$ are Gaussian noise components. In (20) it has been considered that pilot sub-carriers are at position $p = p_0 + i \cdot \Delta$, with $0 \le i \le P - 1$, $P$ being the total number of pilots per OFDM symbol, and $\Delta$ the pilot distance. The approximation done in (20) applies when $P^{\phi}_{k,l}$ adheres to the approximation in (15). The total timing error (in samples) at symbol $l$ in (20) is
where $t_\theta$ is a residual timing synchronization error ($|t_\theta| < 0.5$ samples), $\hat{t}_{\theta,l}$ is an estimation of $t_\theta$ at symbol $l$, and $\xi$ stands for the SCFO (in ppm). In (20) it is further considered that $|\Delta t_l| \le 0.5$. After low-pass filtering (19), we finally obtain the timing discriminator as follows (sub-index $l$ has been omitted for clarity)
where $P = 4$, $\Delta = 14$ and $N = 64$ in the 802.11a case.
B. Timing Error Correction
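The parameter of interest is the relative error existing between the sampling period $T_s$ at the Analog-to-Digital converters (ADC), and the corrected (ideal) sampling time $T_I$, i.e. $T_I/T_s = 1 + \xi$. These two sampling periods are related as follows,
$$i \cdot T_I + \tau_I \cdot T_I = \left(m_i + \mu_i\right) T_s, \qquad m_i = \left\lfloor \frac{i \cdot T_I + \tau_I \cdot T_I}{T_s} \right\rfloor \tag{23}$$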
where $0 \le \mu_i < 1$ is the fractional delay, $m_i$ is the basepoint, $i$ is the discrete timing variable after timing correction (integer value), whereas $\tau_I$ represents a fractional part of $T_I$. The function $\lfloor x \rfloor$ rounds $x$ to the nearest integer towards minus infinity. The timing error compensation is driven by a control block (see Fig. 1), which contains a control word, $w(l)$. This parameter is updated on a symbol basis, i.e. every 4 μs, and provides the latest estimate of the ratio $T_I/T_s$ as follows,
being a (l) as in (19) . The parameter K e defines the bandwidth of the low-pass filter in (26) and it was selected to be 0.01. The parameter K w is given as
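where $S_{max}$ is the maximum value of $S(\Delta t)$ in (22). The parameters $m_i$ and $\mu_i$ used in the variable interpolator are computed recursively as explained in [7, page 523]. We already expressed $i \cdot T_I + \tau_I \cdot T_I$ as a function of $(m_i, \mu_i)$ in (23). The next sample $(i + 1) \cdot T_I + \tau_I \cdot T_I$ is given by
$$(i + 1) \cdot T_I + \tau_I \cdot T_I = \left(m_i + \mu_i + \frac{T_I}{T_s}\right) T_s \tag{27}$$
By replacing in (27) the unknown ratio $(T_I/T_s)$ by its estimate $w(m_i)$ we obtain
$$\left(m_{i+1} + \mu_{i+1}\right) T_s = \left(m_i + \mu_i + w(m_i)\right) T_s$$
From the previous expression, the recursion for the estimates readily follows,
$$m_{i+1} = m_i + \left\lfloor \mu_i + w(m_i) \right\rfloor, \qquad \mu_{i+1} = \left[\mu_i + w(m_i)\right] \bmod 1$$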
In order to obtain the value for μ i based on the control word w (m i ) we define the function
with $d = 0, 1, 2, \ldots$ At the basepoint $m_i$ the value $\eta(m_i, 0)$ is stored in a b-bit register. At every $T_s$ cycle the value of the register is decremented by 1.
The timing error correction block in Fig. 1 is based on a first order Lagrange polynomial interpolator and makes use of a Farrow structure [7], [22]. Higher order interpolators cannot be used since the DTL becomes unstable. The reason for this instability is related to the considerations made in (16) for the calculation of $P^{\phi}_{k,l}$, which no longer hold when high order interpolators are used.
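The basepoint and fractional-delay bookkeeping combined with a first order (linear) interpolator can be sketched as follows. The sketch uses the standard recursion in which the sum of the fractional delay and the control word is split into its integer part (basepoint advance) and its remainder (new fractional delay). It keeps the control word constant, whereas the loop described in the text updates it once per symbol; the function name is hypothetical and no Farrow structure is modeled.

```python
import math

def resample_linear(x, w, n_out, mu0=0.0):
    """Resample x (taken at T_s) on a grid of period w * T_s.

    w is the control word, i.e. the estimate of T_I / T_s (constant here).
    """
    m, mu, out = 0, mu0, []
    for _ in range(n_out):
        if m + 1 >= len(x):
            break
        # First order Lagrange interpolation between x[m] and x[m + 1].
        out.append(x[m] + mu * (x[m + 1] - x[m]))
        eta = mu + w
        step = math.floor(eta)        # number of T_s periods to advance
        m, mu = m + step, eta - step  # new basepoint and fractional delay
    return out

w = 1.00008                           # 80 ppm sampling clock offset
x = [float(n) for n in range(1200)]   # a ramp makes the sampling instants visible
y = resample_linear(x, w, n_out=1000)
```

Because the input is a ramp, each output equals its own sampling instant, so `y[i]` is `i * w`: the fractional delay grows by 80 ppm per sample and the basepoint skips one extra input sample each time the fractional delay wraps past 1.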
V. SIMULATION RESULTS
This section analyzes the performance of the synchronizer, the CE and the DTL under different transmit conditions through extensive computer simulations. We already mentioned in Section II that the synchronizer is mainly affected by the AGC. Since the attenuation suffered by the transmitted OFDM frame is unknown to the receiver, an AGC able to apply a variable amplification is mandatory prior to the ADC. The AGC should be capable of keeping the signal inside a certain voltage range given by the bias voltage of the ADCs. The frame detector found in the synchronizer should be robust against two main effects caused by the AGC:
1) Since the AGC is not able to distinguish the signal of interest from the noise, in the absence of any signal the noise will be amplified in the worst case to the voltage limits of the ADCs. These high noise levels should not provoke false frame detections.
2) In a high SNR situation, the AGC has to change very quickly from a high amplification level to a lower one when the signal is received. Since the AGC cannot react instantaneously to sudden changes in the input power level, the AGC output signal will be heavily saturated for a certain time.
The simulation results related to the synchronizer are depicted in Fig. 6. Channel model A as given in [23] together with a normalized CFO of +1.2 are used in all cases. This corresponds to a Non-Line-Of-Sight (NLOS) channel with a maximum delay spread of 390 ns (50 ns rms). The results for the False Alarm Probability (FAP) are shown in Fig. 6(a). The model used for the AGC considers that only amplitude distortions (saturation) but no phase distortions are introduced into the signal, since the latter may lead to false frequency estimations. The filter parameters in the feedback loop of the AGC were selected in order to achieve a settling time in which approximately 64 samples (3.2 μs) of the preamble symbols were completely saturated at SNR = 35 dB (worst case settling time).
In the definition of FAP used in our simulations, a frame was considered to be correctly detected when the detected starting point was inside a range of ±16 samples from the "ideal" point, i.e. the point obtained when no AGC and no channel are used. Fig. 6(a) shows that the FAP decreases with increasing SNR until a certain value of SNR is reached. From this point on, the distortion due to saturation becomes the dominant effect on the preambles and the FAP degrades as the SNR increases. Nevertheless, since saturation is easily detectable at the ADC, this effect can be largely mitigated by setting all saturated samples to zero before they are delivered to the frame detector. The obtained standard deviation for the normalized frequency offset estimator shows no dependency on the AGC and has a lower bound of 0.01, i.e. 1% of the sub-carrier spacing. This value helps in determining the number of bits necessary to represent the frequency offset in the arctangent calculator used in the synchronizer. Finally, Fig. 6(b) depicts the Timing Error Probability (TEP) derived from the crosscorrelator. Symbol timing is provided by the position of the first significant peak coming out of the crosscorrelator. The ideal position of the peak, i.e. m_1 in Fig. 2, is known beforehand, and a timing error occurs when the estimated position of the peak differs by more than ±2 samples from the ideal position. Nevertheless, this definition of the timing error only makes sense if the CP of the symbol received immediately after the preamble symbols is considered to be 14 samples long instead of 16, in order to compensate for positive timing errors.
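The mitigation mentioned above, zeroing saturated samples before they reach the frame detector, can be sketched as follows. The sketch assumes that a sample counts as saturated when its I or Q component reaches the ADC full-scale value; the name `blank_saturated` and the `full_scale` parameter are illustrative and do not come from the proposed architecture.

```python
def blank_saturated(samples, full_scale):
    """Replace every complex sample whose I or Q part hits full scale by zero."""
    cleaned = []
    for s in samples:
        saturated = abs(s.real) >= full_scale or abs(s.imag) >= full_scale
        cleaned.append(0j if saturated else s)
    return cleaned
```

Zeroed samples contribute nothing to the correlation sums of the frame detector, so the distorted part of the preamble is ignored rather than misinterpreted.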
Four possible versions of the crosscorrelator have been tested, depending on the length of the reference signal c_REF(m), either 32 or 64 samples, and on the type of multiplier, either 1-bit XNOR-based multipliers (Hard crosscorrelator) or floating-point multipliers (Soft crosscorrelator) with the number of bits determined by the computer on which the simulation is run. The results shown in Fig. 6(b) indicate that a 32-sample Hard crosscorrelator may not be adequate and that the reference length should be increased to 64 samples. Despite these results, our first version of the synchronizer considers only a reference of 32 samples in order to reduce the signal processing latency as much as possible.
Fig. 7 depicts the Mean-Square Error (MSE) performance of the proposed channel estimator considering all data rates defined for the 802.11a standard. Channel models A and D [23] are considered in the simulations. Channel D corresponds to a Line-Of-Sight (LOS) channel with a maximum delay spread of 1,050 ns (140 ns rms). In both cases a Doppler frequency of 58 Hz (v = 3 m/s, F_C = 5,805 MHz) was used. In order to smooth the simulation results, we first tested 20 different seeds and looked for the one representing an average channel. This seed was used afterwards for the MSE estimation. Each point in Fig. 7 is obtained after averaging the MSE over 10 trials, where a frame containing 37 OFDM data symbols is transmitted in each trial. Furthermore, six soft bits were used during demodulation together with a traceback length of 50 bits in the Viterbi decoder. In order to reduce complexity, a hard-output Viterbi decoder was considered. The figures show a substantial improvement in the MSE when a LOS channel is present. The abrupt decrease of the MSE indicates the point beyond which the Viterbi decoder is able to provide fully correct output bits. The correctness of these bits is crucial in order to assure the stability of the CE, especially at the higher transmission rates.
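To make the idea of a Hard crosscorrelator concrete, the sketch below correlates only the sign bits of the received I and Q components against the sign bits of the reference, using XNOR as the 1-bit multiplier. It is a simplified software model under the assumption that both signals are quantized to one bit per component and that the squared magnitude of the complex correlation is used as the decision metric; the function names are hypothetical and the hardware realization in the synchronizer may be organized differently.

```python
def sign_bit(value):
    """1-bit quantizer: 1 for non-negative values, 0 for negative values."""
    return 1 if value >= 0 else 0


def xnor_product(a, b):
    """Multiply two sign bits with XNOR and map the result to +1 / -1."""
    return 2 * (1 - (a ^ b)) - 1


def hard_crosscorrelate(rx, ref):
    """Squared magnitude of the 1-bit correlation of rx with conj(ref), per lag."""
    rx_bits = [(sign_bit(s.real), sign_bit(s.imag)) for s in rx]
    ref_bits = [(sign_bit(c.real), sign_bit(c.imag)) for c in ref]
    length = len(ref_bits)
    metric = []
    for m in range(len(rx_bits) - length + 1):
        acc_re = 0
        acc_im = 0
        for k in range(length):
            ri, rq = rx_bits[m + k]
            ci, cq = ref_bits[k]
            # r * conj(c) = (ri*ci + rq*cq) + j (rq*ci - ri*cq), on sign values
            acc_re += xnor_product(ri, ci) + xnor_product(rq, cq)
            acc_im += xnor_product(rq, ci) - xnor_product(ri, cq)
        metric.append(acc_re * acc_re + acc_im * acc_im)
    return metric
```

When the reference is embedded unchanged in `rx`, the metric reaches its maximum value (2L)^2 at the matching lag, where L is the reference length, so a longer reference raises the peak relative to the off-peak values.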
Furthermore, Fig. 7 also shows that it will be extremely difficult to obtain the maximum data rates (48 and 54 Mbps) in a real wireless channel, even with LOS, since these rates require an SNR well above 30 dB. The 802.11a standard [1] specifies a Packet Error Rate (PER) of 10% measured on 1000-byte frames, which, assuming independent bit errors, corresponds to a BER of approximately 1.25e-5 (0.1 divided by 8000 bits). Fig. 8 shows the results of our Monte Carlo BER simulations based on 1000-byte frames. The same channel seed as in Fig. 7 was used in Fig. 8. It can be seen from Fig. 8 that the higher-order modulation schemes require a very high SNR in order to achieve this target BER, so they can be used only in very limited scenarios.
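The relation between the PER requirement and the quoted BER can be checked with a few lines of arithmetic. The sketch below assumes statistically independent bit errors, which is an idealization (errors at the output of a Viterbi decoder tend to occur in bursts); the function name is hypothetical.

```python
import math


def ber_from_per(per, frame_bytes):
    """Return (exact, approximate) BER for a given PER and frame size.

    Assumes independent bit errors: PER = 1 - (1 - BER) ** n_bits.
    """
    n_bits = 8 * frame_bytes
    # Exact inversion of PER = 1 - (1 - BER)^n_bits, numerically stable form
    exact = -math.expm1(math.log1p(-per) / n_bits)
    # First-order approximation valid for small BER: PER ~ n_bits * BER
    approx = per / n_bits
    return exact, approx
```

For PER = 10% and 1000-byte frames, the first-order approximation gives the value 1.25e-5 used in the text, while the exact inversion is slightly larger (about 1.32e-5).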
Finally, Fig. 9 shows the simulation results for the timing control loop. We simulated only two transmission modes, i.e. 12 and 54 Mbps, and represented the Error Vector Magnitude (EVM) as defined in [1]. Frames with 152 OFDM data symbols were generated in all cases, since this is the maximum number of OFDM data symbols per frame in the 54 Mbps case. The clock error was set to ξ = -80 ppm, which represents a worst-case scenario where the actual sampling frequency is below the reference value. Although an ideal channel estimator was taken into consideration, the effects derived from the processing latency involved in the decision-directed channel estimator are included in the simulation results. Hence, the 12 Mbps case involves a processing latency of three OFDM symbols. For the 54 Mbps case, the processing latency is only one OFDM symbol. As can be seen from Fig. 9, the proposed solution achieves an improvement in terms of EVM in both cases, although this improvement is less significant in the case of a NLOS channel.
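The figure of 152 OFDM data symbols can be reproduced from the frame format of the 802.11a standard: the DATA field carries 16 SERVICE bits, the PSDU and 6 tail bits, padded to a whole number of OFDM symbols, and the 12-bit LENGTH field limits the PSDU to 4095 bytes. The sketch below performs this calculation; the function and table names are chosen for this example only.

```python
# Data bits per OFDM symbol for each 802.11a rate (in Mbps)
N_DBPS = {6: 24, 9: 36, 12: 48, 18: 72, 24: 96, 36: 144, 48: 192, 54: 216}


def num_data_symbols(length_bytes, rate_mbps):
    """Number of OFDM data symbols needed for a PSDU of length_bytes."""
    payload_bits = 16 + 8 * length_bytes + 6  # SERVICE + PSDU + tail bits
    bits_per_symbol = N_DBPS[rate_mbps]
    # Integer ceiling division: pad up to a whole number of symbols
    return -(-payload_bits // bits_per_symbol)
```

With the maximum PSDU length, `num_data_symbols(4095, 54)` evaluates to 152. At 12 Mbps, a frame of the same 152 symbols carries a PSDU of at most 909 bytes.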
VI. CONCLUSION
We have investigated the implementation of the Inner Receiver of an OFDM-WLAN system based on the IEEE 802.11a standard. Solutions for the most critical blocks, i.e. the Synchronizer, the Channel Estimator and the Digital Timing Loop, have been proposed and analyzed under careful consideration of near-realistic transmit conditions. Although our investigations reveal that the Synchronizer is strongly influenced by the gain control, the proposed architecture is shown to be relatively robust against the AGC effects. Regarding the Channel Estimator, a decision-directed architecture has been examined. Two novel solutions have been incorporated into the design in order to improve the performance. Firstly, a novel DCT-based noise reduction filter exploits the energy compression capabilities of the DCT as a means to reduce the channel estimation noise with a moderate computational load. Secondly, the residual phase error is eliminated by means of an innovative estimator that greatly simplifies the traditional solution based on an arctangent plus NCO operation. In order to derive a simple time tracking algorithm, we have made use of concepts already established in the literature. However, the way these concepts are applied to an OFDM receiver is novel in our solution. The proposed solution has proven to be applicable in both LOS and NLOS channels. However, the performance of the DTL is limited by the fact that only first-order Farrow interpolators assure stability of the algorithm.
