Efficient inner-receiver design for OFDM-based WLAN systems: Algorithm and architecture by Troya, Alfonso et al.
1374 IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 6, NO. 4, APRIL 2007
Efﬁcient Inner Receiver Design for OFDM-Based
WLAN Systems: Algorithm and Architecture
Alfonso Troya, Member, IEEE, Koushik Maharatna, Member, IEEE, Miloš Krstić, Eckhard Grass,
Ulrich Jagdhold, and Rolf Kraemer, Member, IEEE
Abstract—In this article we propose a complete solution for the
so-called Inner Receiver of an OFDM-WLAN system based on
the IEEE 802.11a standard. We concentrate our investigations
on three key components forming the Inner Receiver namely,
the Synchronizer, the Channel Estimator and the Digital Timing
Loop. The main goal is the joint optimization of the signal
processing algorithms along with the implementation friendly
VLSI architecture required for these three key components in
order to reduce power, area and latency, without compromising
the performance excessively. We provide both the mathematical
details and extensive computer simulations to validate our design.
Index Terms—Channel estimation, OFDM, synchronization,
wireless LAN.
I. INTRODUCTION
THE use of the OFDM (Orthogonal Frequency Division
Multiplex) transmission technique has gained a lot of
interest in the recent years due to its spectral efﬁciency
and capability to overcome multi-path fading. In this paper
we concentrate on the OFDM-WLAN (Wireless Local Area
Network) systems, which are already a reality thanks to the
IEEE 802.11a/g standards [1], [2]. The application of OFDM
is not restricted to these two standards, but new standardization
processes already foresee the application of OFDM in future
WLAN [3] and UWB (Ultra Wideband) systems [4].
The key property of OFDM is orthogonality. By this prop-
erty the system uses the input data to modulate a number
of mutually orthogonal sub-carriers. This technique facilitates
a high data rate transmission system. However, the whole
system performance depends on maintaining the orthogonality
of the sub-carriers. If the orthogonality property gets disturbed,
unwanted effects such as Inter-Carrier Interference (ICI) and
Inter-Symbol Interference (ISI) will occur during signal recep-
tion. In general, the orthogonality property of the sub-carriers
can be disturbed during the RF Up- and Down-conversion. On
top of that the characteristic of the transmission channel may
also affect the orthogonality condition. A number of authors
Manuscript received July 5, 2005; revised January 11, 2006 and May 29,
2006; accepted July 17, 2006. The associate editor coordinating the review
of this paper and approving it for publication was C. Xiao.
A. Troya was with IHP, Frankfurt (Oder), Germany. He is now with
Inﬁneon Technologies AG, COM PS CE ALG, 81726 Munich, Germany (e-
mail: alfonso.troya@inﬁneon.com).
K. Maharatna is with the University of Southampton, University Road,
Southampton, SO17 1BJ, UK (e-mail: km3@ecs.soton.ac.uk).
M. Krstić, E. Grass, U. Jagdhold, and R. Kraemer are with IHP,
Frankfurt (Oder), Germany (e-mail: {krstic, grass, jagdhold, kraemer}@ihp-
microelectronics.com).
Digital Object Identiﬁer 10.1109/TWC.2007.05481.
Fig. 1. General block diagram of the proposed Inner Receiver.
have addressed the impact of this type of impairments on
OFDM signals in the past years [5], [6]. Thus, in order to make
the system work efﬁciently, we need to re-establish the orthog-
onality condition at the receiver. The so-called Inner Receiver
(this term was first coined by Heinrich Meyr [7]) is used for
this purpose. In essence, there are two main operations carried
out inside the Inner Receiver (IRx) namely Signal Acquisition
and Channel Correction as shown in Fig. 1. The acquisition
operation is performed by means of a synchronization block,
which should be able to perform reliable Frame Detection
(FD), and to provide estimations for the Carrier Frequency
Offset (CFO) and Symbol Timing Offset (STO). The channel
correction operation is needed to estimate and compensate the
Channel Transfer Function (CTF), provided that orthogonality
has been restored to a great extent by the synchronizer. The
ﬁnal goal is to supply the decoding and demodulator block
with In-phase and Quadrature components that are as similar
as possible to the original ones.
Though the IRx is an integrated part of the OFDM-
based WLAN system, its design complexity is frequently
underestimated. Unfortunately the standards do not provide
in general any hints on how to implement the IRx, but it is
left as a developer’s task. In this article we investigate an
efﬁcient realization of the IRx for IEEE 802.11a systems both
from the algorithm and VLSI (Very Large Scale Integration)
implementation point of view, and provide a complete and
practical solution for it. The results developed in this work are
applicable to the future standards [3]. In order to develop our
solution we start with the algorithm level formulation of the
desired functionality of the IRx. The algorithmic development
has been considered strictly in conjunction with the possible
architectural feasibility of an ASIC (Application Speciﬁc
Integrated Circuit) implementation. Thus a joint algorithm and
architecture optimization has been undertaken using power
consumption, silicon area, system latency and overall noise
performance as the “quality/efﬁciency” parameters for the
system. The power consumption and silicon area have been
1536-1276/07$25.00 © 2007 IEEE
Fig. 2. Preamble symbols as deﬁned by the 802.11a standard together with
the timing schedule followed inside the Synchronizer.
considered as two of the main parameters since the system is
targeted for mobile and portable applications where saving of
battery life as well as the total size of the system are crucial.
Latency has been considered from the operation principle of
the IEEE 802.11a MAC (Medium Access Control) protocol
[1].
Parts of the present work have been published in short form
at several conferences [8], [10], [11], [19]. In this paper we
provide a much more detailed and integrated view of the
complete IRx solution. The rest of the paper is organized as
follows. After this introduction, the main components of the
IRx are investigated. An efficient synchronizer architecture,
whose main structure was outlined by the authors in [8], [9],
is examined in Section II.
Section III is devoted to the analysis of a decision-directed
Channel Estimator (CE). Two blocks are the main focus of
our investigations, namely the Noise Reduction Filter (NRF)
and the Residual Phase Error (RPE) correction block. The
proposed timing loop is analyzed in detail in Section IV and
provides a simple method to compensate for the Sampling
Clock Frequency Offset (SCFO) based on the RPE estimation
supplied by the CE. Section V presents simulation results
which show the performance features of the proposed solu-
tions. Finally, in Section VI, some important conclusions are
derived.
II. THE SYNCHRONIZER
The synchronizer is the block responsible for signal acquisi-
tion. This term encompasses a number of operations that need
to be performed in a very limited period of time in order to
minimize latency. For our purpose, synchronization must be
ﬁnished within the preamble time, i.e. 16 μs, and the following
operations must be performed based on the preamble symbols:
1) Frame detection.
2) Determination of the symbol timing.
3) Carrier frequency offset estimation and correction.
4) Extraction of the reference channel estimation.
The order in which these operations are carried out strongly
determines the architecture of the synchronizer. The preamble
symbols in the 802.11a standard comprise a number of peri-
odic sequences as shown in Fig. 2. This periodic structure
suggests a solution based on autocorrelators [13], [14].
The proposed implementation shown in Fig. 3 contains two
autocorrelators. Each one encompasses a delay line (FIFO-
type buffer) of length Nd, a complex conjugate operation, a
complex multiplier, and a moving average of length Navg.
The moving average is an FIR ﬁlter of length Navg with all
its coefficients being 1. Let us consider the input signal $r(m)$ to
be sampled at frequency $f_s$ and affected by a CFO $f_\varepsilon = \varepsilon \cdot \Delta f$,
where $\varepsilon$ stands for the normalized CFO, and $\Delta f$ is the
sub-carrier spacing in the OFDM signal ($\Delta f$ = 312.5 kHz in the
802.11a). Hence, the input signal $r(m)$ can be expressed as
\[
r(m) = s(m) \cdot e^{j 2\pi \varepsilon \frac{\Delta f}{f_s} m} + v(m), \tag{1}
\]
where $j = \sqrt{-1}$, $s(m)$ is the original time sequence and
$v(m)$ represents a zero-mean white Gaussian noise process.
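According to (1), the autocorrelator's output signal $J_x(k)$ is
given by
\[
\begin{aligned}
J_x(k) &= \sum_{l=0}^{N_{avg}-1} r^*(l-k) \cdot r(l-k-N_d) \\
&= e^{-j 2\pi \varepsilon \frac{\Delta f}{f_s} N_d} \cdot \sum_{l=0}^{N_{avg}-1} s^*(l-k) \cdot s(l-k-N_d) \\
&\quad + \sum_{l=0}^{N_{avg}-1} s^*(l-k) \cdot v(l-k-N_d) \cdot e^{-j 2\pi \varepsilon \frac{\Delta f}{f_s} (l-k)} \\
&\quad + \sum_{l=0}^{N_{avg}-1} v^*(l-k) \cdot s(l-k-N_d) \cdot e^{j 2\pi \varepsilon \frac{\Delta f}{f_s} (l-k-N_d)} \\
&\quad + \sum_{l=0}^{N_{avg}-1} v^*(l-k) \cdot v(l-k-N_d),
\end{aligned} \tag{2}
\]
where the suffix $x$ represents either F or C in Fig. 3.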
By considering the sequence s(m) to be uncorrelated with
the noise sequence v(m), the last three summands in (2) can
be neglected for sufﬁciently large values of Navg, yielding
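\[
\begin{aligned}
J_x(k) &\approx e^{-j 2\pi \varepsilon \frac{\Delta f}{f_s} N_d} \cdot \sum_{l=0}^{N_{avg}-1} s^*(l-k) \cdot s(l-k-N_d) \\
&= e^{-j 2\pi \varepsilon \frac{\Delta f}{f_s} N_d} \cdot \sum_{l=0}^{N_{avg}-1} |s(l-k)|^2 ,
\end{aligned} \tag{3}
\]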
where it has been considered that the signal $s(m)$ is periodic
with a period of $N_d$ samples, i.e. $s(m) = s(m - N_d)$. From (3) it
is straightforward to see that the phase of $J_x(k)$ is only due
to $\varepsilon$, and hence $\varepsilon$ could be estimated as follows
\[
\hat{\varepsilon} = \frac{f_s}{2\pi \cdot N_d \cdot \Delta f} \cdot \tan^{-1}\left(J_x^*(k)\right). \tag{4}
\]
However, there is an important factor that destroys the
periodicity, making $s(m) \neq s(m - N_d)$, namely the Automatic
Gain Control (AGC) settling time, whose influence is analyzed
through simulation in Section V. If $N_{avg}$ is a multiple of the
minimum periodicity in the preambles (16 samples in case of
the 802.11a, Fig. 2) the signal $|J_x(k)|^2$ shows a plateau in the
region where the phase of $J_x(k)$ only depends on the CFO,
as shown in Fig. 4 for $|J_F(k)|^2$. The arctangent operation in
(4) is bounded in the range $[-\pi, +\pi)$. This means that (4) is
also bounded as follows:
Fig. 3. General scheme of the proposed synchronizer for the IEEE 802.11a standard. The operation tg−1(x) represents the arctangent of x.
Fig. 4. Signals involved in the Frame Detection algorithm.
\[
|\hat{\varepsilon}| < \frac{f_s}{2 \cdot N_d \cdot \Delta f}. \tag{5}
\]
Since the ratio $(f_s/\Delta f)$ is a fixed parameter, the range of
estimation of $\varepsilon$ will only depend on the selected delay $N_d$ in
the autocorrelator. In the 802.11a we find that $(f_s/\Delta f) = 64$,
resulting in $|\hat{\varepsilon}| < 0.5$ for $N_d = 64$ and $|\hat{\varepsilon}| < 2.0$ for $N_d = 16$.
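The estimator in (4) and the two ranges in (5) can be illustrated with a short noise-free sketch. It is only an illustration under stated assumptions: the test sequence, the window placement (the last N_avg samples ending at index k) and all function names are hypothetical and are not taken from the architecture in Fig. 3. Only the ratio f_s/Δf = 64 and the delays N_d = 16 and N_d = 64 come from the text.

```python
import cmath
import math

FS_OVER_DF = 64  # f_s / delta_f for 802.11a (20 MHz / 312.5 kHz)

def autocorrelate(r, k, n_d, n_avg):
    """Sum of conj(r[m]) * r[m - n_d] over the n_avg samples ending at k.

    Assumption: a causal window; the paper's index convention may differ.
    """
    return sum(r[m].conjugate() * r[m - n_d]
               for m in range(k - n_avg + 1, k + 1))

def estimate_cfo(r, k, n_d, n_avg):
    """Normalized CFO estimate following (4): scaled phase of J*."""
    j_x = autocorrelate(r, k, n_d, n_avg)
    return FS_OVER_DF / (2 * math.pi * n_d) * cmath.phase(j_x.conjugate())

def make_test_signal(eps, length=160, period=16):
    """Hypothetical periodic sequence rotated by a normalized CFO eps."""
    base = [cmath.exp(1j * 0.7 * i * i) for i in range(period)]
    return [base[m % period] * cmath.exp(2j * math.pi * eps * m / FS_OVER_DF)
            for m in range(length)]

r = make_test_signal(0.3)
coarse = estimate_cfo(r, k=100, n_d=16, n_avg=16)  # usable range |eps| < 2.0
fine = estimate_cfo(r, k=159, n_d=64, n_avg=64)    # usable range |eps| < 0.5
```

With an offset of 0.8 instead of 0.3, the N_d = 64 estimate wraps to -0.2 while the N_d = 16 estimate still returns 0.8, which is the ambiguity the two autocorrelators have to resolve together.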
A. Frame Detection Mechanism
The first operation to be carried out by the synchronizer is
FD. We decided to make use of the particular shape of the
signal $|J_F(k)|^2$ in order to derive a simple frame detector.
Consequently, if we are able to detect the plateau in $|J_F(k)|^2$
(see Fig. 4), this will be the indication that a frame is being
received. The proposed plateau detector contains two blocks
namely, a differentiator and a peak detector, as depicted in
Fig. 3.
The differentiator should indicate the point where the
plateau starts. The differentiated signal $J_{diff}(k)$ is obtained
as follows:
\[
J_{diff}(k) = |J_F(k)|^2 - |J_F(k - N_{diff})|^2 , \tag{6}
\]
where $N_{diff}$ simply defines the delay applied by the
differentiator. The signal $J_{diff}(k)$ is also shown in Fig. 4 with
$N_{diff} = 32$.
The autocorrelation block together with the differentiator
and the peak detector constantly “peer” the channel. When the
peak detector identiﬁes an absolute maximum at the output of
the differentiator, the synchronizer will consider that a new
frame has arrived and the CFO estimator will be activated.
However, due to the noise and more importantly, due to
the Automatic Gain Control, the peak detection will not be
a trivial task and a smart algorithm will be necessary in
order to distinguish absolute from relative maxima [8], [9].
For this purpose the peak detector is also divided into two
blocks, namely group peak detector and instantaneous peak
detector, as shown in Fig. 3. The instantaneous peak detector
is basically a combination of a comparator and a counter. The
present sample $J_{diff}(k)$ coming out from the differentiator is
compared with the last recorded maximum $J_{max}$ ($J_{max} = 0$
at $k = 0$). As long as the sample $J_{diff}(k)$ is bigger than $J_{max}$,
the register storing $J_{max}$ will be updated with the new sample
$J_{diff}(k)$ as the new encountered maximum and the counter
will be reset. If $J_{diff}(k)$ is smaller than or equal to $J_{max}$, the
counter will be triggered and it will increase its count by
one. If this situation persists until the counter overflows, the
instantaneous peak detector will activate a signal stating that
a relative peak has been found inside the counting scope of
the counter. The group peak detector is used to detect the
falling edges in Jdiff(k), and its main component is also
a comparison block. There, the input signal is accumulated
in groups of six samples (6-tuples) and the present group is
compared with the previous one. If it is smaller, it means
that the falling slope has started. If the group peak detector
ﬁnds a falling edge at the same time as the instantaneous peak
detector ﬁnds a relative peak, then the detected peak is actually
an absolute peak. In the situation where no AGC is present, the
signal $|J_F(k)|^2$ shows a plateau of 32 samples. Consequently,
the parameter $N_{diff}$ in (6) was selected to be 16 samples, thus
making the FD algorithm detect the plateau in $|J_F(k)|^2$ at
its middle point [8]. This fact justifies the definition of False
Alarm Probability given later in Section V.
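The two peak-detector blocks described above can be sketched in software as follows. This is a behavioural illustration only, not the hardware of Fig. 3: the function names are hypothetical, and `count_limit` is an assumed stand-in for the counter overflow value, which the text does not specify. The group size of six samples is taken from the text.

```python
def instantaneous_peak(j_diff, count_limit):
    """Return (peak_index, detect_index) of the first relative peak, or None.

    A register tracks the running maximum (starting at 0). A counter counts
    consecutive samples that do not exceed it and flags a relative peak when
    it reaches count_limit (hypothetical stand-in for the counter overflow).
    """
    j_max, peak_index, count = 0.0, None, 0
    for k, sample in enumerate(j_diff):
        if sample > j_max:
            # New maximum: update the register and reset the counter.
            j_max, peak_index, count = sample, k, 0
        else:
            count += 1
            if count >= count_limit and peak_index is not None:
                return peak_index, k
    return None

def falling_groups(j_diff, group_size=6):
    """Indices of groups whose accumulated value is below the previous group's."""
    usable = len(j_diff) - len(j_diff) % group_size
    sums = [sum(j_diff[i:i + group_size]) for i in range(0, usable, group_size)]
    return [g for g in range(1, len(sums)) if sums[g] < sums[g - 1]]
```

For example, `instantaneous_peak([1, 2, 3, 5, 4, 3, 2, 1], 4)` reports the maximum at index 3 once four non-increasing samples have followed it. An absolute peak would then be declared only if a falling group is reported at the same time.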
B. Carrier Frequency Offset Estimation and Correction
According to the speciﬁcations in the 802.11a standard [1],
all the clocks and carrier signals for the transceiver should
be generated from the same crystal oscillator, which should
have a maximum relative frequency error of ±20 ppm. Let us
consider an example in which a signal is received at the
highest possible carrier frequency of 5,805 MHz (operating
channel 161 in the U-NII upper band). The total frequency
deviation during down-conversion is then given by 5,805 MHz ·
(±20 ppm) = ±116.1 kHz. The whole transmit-receive process
introduces an overall carrier frequency error of up to $|f_\varepsilon|$ = 232.2
kHz. Normalizing this value with respect to the sub-carrier
spacing, $\Delta f$ = 312.5 kHz, we find the maximum normalized
CFO to be $|f_\varepsilon/\Delta f| \approx 0.75$. The present implementation in
Fig. 3 considers the frequency offsets to be in the range
±1.5, i.e. twice the maximum value required by the standard.
This decision is based on a pessimistic approach and was
justiﬁed by the fact that functional tests had to be carried out
using experimental Analog Front-Ends (AFE), which were not
entirely fulﬁlling the speciﬁcations.
Two autocorrelators with $N_d = 64$, $N_{avg} = 64$, and $N_d = 16$,
$N_{avg} = 16$, respectively, are used. The autocorrelator with
$N_d = 64$ is used to get a fine estimation of the CFO ($|\alpha| < 0.5$),
whereas the latter is used to obtain a coarse estimation of the
CFO ($|\beta| < 2.0$). Note that the definition of fine and coarse is
not based on the range, but on the accuracy of the estimation,
i.e. the length of the moving average. Hence, although $\alpha$ is
bounded more restrictively compared to $\beta$, it will be less noisy
since its moving average is much longer. The final normalized
CFO estimation $\varepsilon$ will be a combination of the values obtained
for $\alpha$ and $\beta$. Although $\beta$ has a linear dependency throughout
the entire range of possible values of the CFO, i.e. ±1.5, this is
not the case for $\alpha$. Hence, the final CFO estimation cannot be
directly a linear combination of the two estimations $\alpha$ and $\beta$.
Instead, $\beta$ will only serve as a range pointer and will provide
the integer value of the frequency offset (either −1, +1 or 0),
whereas $\alpha$ will provide the fractional part of the estimation.
The final value of $\varepsilon$ results from the following function,
\[
\varepsilon =
\begin{cases}
\alpha, & \text{if } -0.25 \le \beta \le 0.25, \\
 & \text{or } (\alpha \ge 0 \text{ and } 0.25 < \beta < 0.75), \\
 & \text{or } (\alpha < 0 \text{ and } -0.75 < \beta < -0.25), \\
1 + \alpha, & \text{if } \beta \ge 0.75, \\
 & \text{or } (\alpha < 0 \text{ and } 0.25 < \beta < 0.75), \\
-1 + \alpha, & \text{if } \beta \le -0.75, \\
 & \text{or } (\alpha \ge 0 \text{ and } -0.75 < \beta < -0.25).
\end{cases} \tag{7}
\]
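The decision rule in (7) maps directly onto a small function. The sketch below is a plain software transcription for illustration; the function name is hypothetical and the inputs are assumed to be the normalized estimates α (fine) and β (coarse) already produced by the arctangent calculator.

```python
def combine_cfo(alpha, beta):
    """Combine the fine (alpha) and coarse (beta) estimates following (7)."""
    if beta >= 0.75 or (alpha < 0 and 0.25 < beta < 0.75):
        return 1 + alpha
    if beta <= -0.75 or (alpha >= 0 and -0.75 < beta < -0.25):
        return -1 + alpha
    # Remaining cases: |beta| <= 0.25, or alpha and beta agree in sign.
    return alpha
```

As an example, a true normalized offset of 0.6 wraps the fine estimate to α = −0.4 while the coarse estimate stays near β = 0.6, so the function returns 1 + α = 0.6.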
The estimation of the CFO will take place in one shot
exactly at the time instant when the FD detects the incoming
frame, since both autocorrelators exhibit a plateau at that
particular point of time. An arctangent calculator is necessary
to obtain α and β from JF(k) and JC(k), respectively.
The correction of the CFO will follow naturally by using
a Numerically Controlled Oscillator (NCO) once $\varepsilon$ has been
estimated. In our implementation a novel CORDIC rotator is
used in its accumulation mode of operation to compute the
arctangent and its rotation mode is used to realize the NCO
operation [10], [11], [12].
C. Symbol Timing Estimation
Unlike CFO estimation, where the
periodicity of the short preamble symbols was the main feature
exploited by the estimator, the symbol timing estimation is
obtained by exploiting the direct knowledge of the long
preamble symbols.
The main block in the symbol timing estimator is a cross-
correlator. Its purpose is to compare the input frame with
a reference signal, which is directly obtained from the long
preamble symbol. The proposed crosscorrelator can only be
applied once the samples of the incoming frame have been
fully corrected by the NCO and contain no frequency offset.
The fraction of the long preamble symbol selected as the
crosscorrelator reference $c_{REF}(m)$ is shown in Fig. 2 and
corresponds to the sequence defined as T1. The reference
has a length of 32 complex samples, which is the shortest
possible length for this reference in order to obtain appropriate
results after correlation. From an implementation point of
view, the complex crosscorrelator is usually a “weak” point in
modern communication circuit designs because of its
computational complexity, i.e. it requires a large number of complex
multipliers and needs large silicon area. Having this in mind,
in this implementation we applied a simpliﬁed scheme for
the crosscorrelator, with simple XNOR 1-bit multipliers that
substitute the commonly used complex multipliers. Instead
of multiplying b-bit complex numbers, the XNOR multiplier
performs only the multiplication of the sign bits of the
complex input values, considering the Most Signiﬁcant Bit
(MSB) to be ‘1’ when the sample is positive or zero and ‘0’
when it is negative. Based on this, the reference sequence
being used in the crosscorrelator is as follows:
\[
\begin{aligned}
c_{REF}(31:0)^* = \{ & 1, 1, 0, 0, 1+j, j, j, j, \\
& j, 0, 1, 1, 0, 0, 1, 1+j, \\
& 1+j, 0, 1+j, 1+j, j, 1+j, 1, 1+j, \\
& j, j, 1+j, 1, 1, 1+j, j, 1 \},
\end{aligned} \tag{8}
\]
according to the preamble defined in [1], where (∗) stands for
complex conjugate.
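The sign-bit correlation can be sketched in software as shown below. This is one possible reading of the scheme, not the circuit itself: it assumes that each entry of (8) encodes the pair (real sign bit, imaginary sign bit) of the conjugated reference, that the taps are applied in the listed order, and that a full complex product is formed from four XNOR products per tap. All names are hypothetical.

```python
# Sign bits of c_REF* from (8) as (real_bit, imag_bit) pairs.
# Assumed encoding: '1' -> (1, 0), 'j' -> (0, 1), '1+j' -> (1, 1), '0' -> (0, 0).
_SYMBOLS = {"0": (0, 0), "1": (1, 0), "j": (0, 1), "1+j": (1, 1)}
C_REF_CONJ = [_SYMBOLS[s] for s in (
    "1 1 0 0 1+j j j j "
    "j 0 1 1 0 0 1 1+j "
    "1+j 0 1+j 1+j j 1+j 1 1+j "
    "j j 1+j 1 1 1+j j 1").split()]

def sign_bit(x):
    """MSB convention from the text: 1 for positive or zero, 0 for negative."""
    return 1 if x >= 0 else 0

def xnor_product(a, b):
    """1-bit multiply: XNOR of two sign bits, mapped to +1 / -1."""
    return 1 if a == b else -1

def sign_correlate(window):
    """Correlate 32 complex samples with the sign-bit reference."""
    real = imag = 0
    for sample, (cr, ci) in zip(window, C_REF_CONJ):
        xr, xi = sign_bit(sample.real), sign_bit(sample.imag)
        # (xr + j*xi) * (cr + j*ci), evaluated on +/-1 sign values.
        real += xnor_product(xr, cr) - xnor_product(xi, ci)
        imag += xnor_product(xr, ci) + xnor_product(xi, cr)
    return complex(real, imag)
```

Under these assumptions a window whose sign pattern matches the reference yields the maximum output of 64, so a fixed threshold below that value could play the role of the peak detector described next.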
When the preamble symbols go through the crosscorrelator,
the output shows two major peaks at instants m1 and m2,
Fig. 2. Both peaks will occur when the portions T1 of the
long preamble symbols are inside the crosscorrelator. For our
purpose it is enough to detect the ﬁrst peak by setting a
certain threshold at the output of the crosscorrelator. More
sophisticated methods based on an active peak search may
be used at the expense of increased latency. The 64 samples
coming immediately after the ﬁrst peak, i.e. the sequence {T2,
T1} will be fed into the FFT in order to extract the reference
CTF. In the 802.11a standard the long preamble symbol is
defined as the sequence {T1, T2}, i.e. in our case a cyclic delay
of 32 samples is introduced into a sequence of 64 samples.
Therefore, the resulting sequence after FFT calculation has to
be multiplied by $(-1)^k$, $k$ = 0, 1, 2, ... 63, in order to eliminate
the remaining linear phase.
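The $(-1)^k$ factor follows from the DFT shift property: a cyclic delay of N/2 samples multiplies bin k by $e^{j\pi k}$. A small numerical check, using a direct DFT and an arbitrary test sequence (both hypothetical, chosen only for illustration), is sketched below.

```python
import cmath
import math

def dft(x):
    """Direct N-point DFT (O(N^2)), sufficient for a 64-sample check."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]

def undo_half_shift(spectrum):
    """Multiply bin k by (-1)^k to undo a cyclic delay of N/2 samples."""
    return [value if k % 2 == 0 else -value
            for k, value in enumerate(spectrum)]

# Hypothetical 64-sample symbol split into halves T1 and T2.
symbol = [complex(math.cos(0.3 * m * m), math.sin(0.11 * m)) for m in range(64)]
t1, t2 = symbol[:32], symbol[32:]
corrected = undo_half_shift(dft(t2 + t1))  # what the receiver computes
reference = dft(t1 + t2)                   # spectrum of the symbol as defined
```

After the sign alternation, `corrected` matches `reference` to within floating-point error.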
By observing Fig. 2 we see that the preamble contains
the sequence {T2, T1} twice, i.e. by averaging these two
sequences one may reduce the noise power by 3 dB in the
reference CTF. Note that in our case, as a measure to reduce
the signal processing latency, only one preamble symbol is
used to initialize the CE, which implies a penalty of 3 dB
in the SNR. This problem will be treated in the next section,
when discussing the CE itself.
III. THE CHANNEL ESTIMATOR
The CE deals with the estimation and correction of the
ﬁltering affecting the OFDM signal. This ﬁltering is mainly
due to the multipath transmission channel found in wireless
communications, but several ﬁlters located in the transceiver
hardware play an important role as well. As a result, the
OFDM symbols are extended in time by an amount equal to
the summation of the impulse response lengths of all the ﬁlters
involved in the transmission and reception chain. Such an
extension provokes the leakage of a symbol into the successive
one, resulting in ISI. One interesting feature of OFDM signals
is their capability to overcome the ISI when appending a
Cyclic Preﬁx (CP) of length NG to each transmitted OFDM
symbol. This has two main advantages: on one hand, the
possible leakage from the previous symbol is fully absorbed
as long as it is shorter than the cyclic extension. On the other
hand, the examination of the OFDM symbols in the frequency
domain (after DFT) becomes much more convenient since
now the overall ﬁltering appears inside the OFDM symbols
as complex multiplicative factors affecting each of the sub-
carriers. In view of this fact, channel correction becomes much
easier since it can be realized by means of a complex division
in the frequency domain.
The proposed CE algorithm is based on the CD3 (Coded
Decision-Directed Demodulation) solution given by Mignone
and Morello in [15]. The CD3 is a decision-directed method,
whose main advantage is based on the fact that pilot sub-
carriers are not necessary for channel estimation, thus increas-
ing the amount of information transmitted on each OFDM
symbol. However, there are a number of issues not considered
in [15] that make pilot sub-carriers truly necessary, as it will
be seen later. In this section we propose the modiﬁcation of
the CD3 channel estimator in order to accommodate two key
blocks that will signiﬁcantly simplify the signal processing
required for reliable channel estimation. These two blocks
are the Noise Reduction Filter and the Residual Phase Error
estimator.
A. Noise Reduction Filter
As it was shown in Section II, the synchronizer provides
a reference channel estimation that is used to initialize the
CE. Due to the selected architecture, the reference is obtained
from a single preamble symbol and hence, a 3 dB penalty in
the initial channel estimation occurs. The NRF should help
in compensating this penalty by means of the so-called Low-
Rank approximation. This approach was first proposed in
[16], [17] for the case in which pilot tones can be used for
channel estimation. In our situation the concept is extended
to the case where pseudo-pilots are available, i.e. when the
CTF (frequency domain) is estimated based on a previous
estimation of the received data. The basic idea hinges on the
assumption that the Channel Impulse Response (CIR, time
domain) is always shorter than the CP of length NG found
at each OFDM symbol. Hence, if an estimation of the CTF
is available in vector $\hat{\mathbf{H}}$, this estimation can be improved
by forcing the corresponding CIR, i.e. $\hat{\mathbf{h}} = \mathrm{IDFT}\{\hat{\mathbf{H}}\}$, to be
shorter than the CP. This is done by setting to zero all those
samples in vector $\hat{\mathbf{h}}$ that fall beyond the CP limit since they are
considered to be noise. This is equivalent to eliminating the noise
components that are orthogonal to the signal of interest. For
a particular OFDM symbol $l$, this operation can be expressed
in matrix form as follows
\[
\tilde{\mathbf{H}}_l = \mathbf{\Theta}_{DFT} \cdot \hat{\mathbf{H}}_l, \tag{9}
\]
with
\[
\mathbf{\Theta}_{DFT} = \mathbf{F}^H \cdot \mathbf{W} \cdot \mathbf{F}, \tag{10}
\]
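where $\hat{\mathbf{H}}_l$ is an $N \times 1$ vector with the original CTF estimation
for symbol $l$, $\tilde{\mathbf{H}}_l$ is the “cleaned” CTF estimation, $\mathbf{F}$ is the
$N$-point IDFT matrix, $\mathbf{F}^H$ is the $N$-point DFT matrix and
$(\cdot)^H$ stands for Hermitian transpose. The matrix $\mathbf{W}$ is an $N \times N$
matrix with the form,
\[
\mathbf{W} =
\begin{pmatrix}
\mathbf{I} & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0
\end{pmatrix} \tag{11}
\]
with $\mathbf{I}$ standing for the $N_G \times N_G$ identity matrix ($N_G < N$).
The matrix $\mathbf{W}$ windows the IDFT of $\hat{\mathbf{H}}_l$. The matrix $\mathbf{\Theta}_{DFT}$
is referred to as the Noise Reduction Matrix (NRM) with
dimension $N \times N$.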
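For the full-band case of (9)-(11), applying the NRM is equivalent to an IDFT, a truncation to the first N_G taps and a DFT. The sketch below illustrates this with direct transforms. It is a minimal illustration under assumptions: all N sub-carriers are treated as available, the 1/N scaling is placed on the inverse transform rather than using the unitary scaling of the text (the product is the same), and the function names and test channel are hypothetical.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]

def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * m / n)
                for k in range(n)) / n
            for m in range(n)]

def reduce_noise_dft(h_est, n_g):
    """Window the CIR to its first n_g taps and return the cleaned CTF."""
    cir = idft(h_est)
    windowed = cir[:n_g] + [0j] * (len(cir) - n_g)
    return dft(windowed)

# Hypothetical 3-tap channel (shorter than the CP) plus one spurious
# component outside the CP window, standing in for estimation noise.
N, N_G = 64, 16
clean_cir = [1 + 0j, 0.5j, -0.25 + 0j] + [0j] * (N - 3)
noisy_cir = list(clean_cir)
noisy_cir[40] = 0.3 + 0j
cleaned = reduce_noise_dft(dft(noisy_cir), N_G)
```

In this example `cleaned` equals the CTF of the 3-tap channel, because the spurious component at tap 40 lies beyond the CP limit and is removed by the window.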
The problem in fact is more complex than this, since in
a real scenario not all $N$ sub-carriers are data-bearing
sub-carriers. An example is the 802.11a standard, where only $N_u$
out of $N$ sub-carriers contain information, with $N_u = 52$ and
$N = 64$. In this case the NRM cannot be obtained as in (10),
since now the vector $\hat{\mathbf{H}}_l$ is a column vector with $N_u$ elements,
whereas $\mathbf{\Theta}_{DFT}$ is an $N \times N$ matrix. A solution for this particular
case is provided in [18], yielding an $N_u \times N_u$ matrix $\mathbf{\Theta}_{NRM}$ as
follows,
\[
\mathbf{\Theta}_{NRM} = \mathbf{F}_{11}^H \cdot \left( \mathbf{F}_{11} - \mathbf{F}_{12} \cdot \mathbf{F}_{22}^{+} \cdot \mathbf{F}_{21} \right), \tag{12}
\]
with
\[
\mathbf{F}_{22}^{+} = \left( \mathbf{F}_{22}^H \cdot \mathbf{F}_{22} + \gamma^2 \right)^{-1} \cdot \mathbf{F}_{22}^H, \tag{13}
\]
where $\gamma$ is a dummy parameter, $0 < \gamma \ll N - 1$, used to
prevent possible numerical instability in the matrix inversion. The
matrices $\mathbf{F}_{11}$, $\mathbf{F}_{12}$, $\mathbf{F}_{21}$, and $\mathbf{F}_{22}$ are of dimensions $N_G \times N_u$,
$N_G \times (N - N_u)$, $(N - N_G) \times N_u$, and $(N - N_G) \times (N - N_u)$,
respectively, and are made of elements $W_N^{nk} = N^{-1/2} \cdot
\exp\{j(2\pi/N) \cdot n \cdot k\}$.
(a)
(b)
Fig. 5. Noise reduction matrices being considered: (a) Reordered 52×52
DFT-based (ΘNRM); (b) 52×52 DCT-based (ΘDCT).
The matrix $\mathbf{F}_{11}$ corresponds to $n \in [0, N_G - 1]$ (rows 1 to
$N_G$) and $k \in [N - (N_u/2), N - 1] \cup [1, N_u/2]$ (columns 1
to $N_u$). $\mathbf{F}_{12}$ corresponds to $n \in [0, N_G - 1]$ (rows 1 to $N_G$)
and $k \in [0] \cup [(N_u/2) + 1, N - (N_u/2) - 1]$ (columns 1 to
$N - N_u$). $\mathbf{F}_{21}$ corresponds to $n \in [N_G, N - 1]$ (rows 1 to
$N - N_G$) and $k \in [N - (N_u/2), N - 1] \cup [1, N_u/2]$ (columns
1 to $N_u$). Finally, $\mathbf{F}_{22}$ corresponds to $n \in [N_G, N - 1]$ (rows
1 to $N - N_G$) and $k \in [0] \cup [(N_u/2) + 1, N - (N_u/2) - 1]$
(columns 1 to $N - N_u$). The resulting matrix $\mathbf{\Theta}_{NRM}$ is shown
in Fig. 5(a) for the case $N = 64$, $N_u = 52$, $N_G = 16$. It contains
2,704 complex elements, which must be pre-computed and
stored. By means of $\mathbf{\Theta}_{NRM}$, a noise reduction factor given
by $\upsilon_{dB} = 10 \cdot \log_{10}(N^2/(N_G \cdot N_u))$ can be achieved. In the
802.11a case this reduction is as high as 7 dB. It should be
noted that the matrix $\mathbf{\Theta}_{NRM}$ is fixed once $N$, $N_u$ and $N_G$
have been selected.
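As a quick check of the two figures quoted above, the storage count and the noise reduction factor for the 802.11a parameters ($N = 64$, $N_u = 52$, $N_G = 16$) can be worked out as follows; the rounding in the last step is ours.

```latex
\[
N_u \times N_u = 52 \times 52 = 2704 \ \text{elements},
\qquad
\upsilon_{dB} = 10 \log_{10}\!\left(\frac{64^2}{16 \cdot 52}\right)
             = 10 \log_{10}\!\left(\frac{4096}{832}\right)
             \approx 10 \log_{10}(4.92)
             \approx 6.9\ \text{dB}.
\]
```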
The noise reduction concept explained above might be
signiﬁcantly simpliﬁed if the NRM is determined not based
upon the DFT but on the DCT (Discrete Cosine Transform).
Although the DCT is closely related to the DFT, it has a greater
ability to concentrate energy in a few transformed coefficients
than the DFT has. Nevertheless, according to our design
premise, $\hat{\mathbf{h}}_l = \mathrm{IDFT}\{\hat{\mathbf{H}}_l\}$ should have its energy projected onto
a few coefficients. As a means to reduce the CTF estimation
noise, the pseudo-CIR $\hat{\mathbf{h}}_{pseudo,l} = \mathrm{IDCT}\{\hat{\mathbf{H}}_l\}$ may be used
instead of $\hat{\mathbf{h}}_l$. The DCT-based NRM can be written as
\[
\mathbf{\Theta}_{DCT} = \mathbf{C}^H \cdot \mathbf{W} \cdot \mathbf{C}, \tag{14}
\]
where $\mathbf{C}$ stands for the $N_u$-point IDCT matrix, $\mathbf{C}^H$ is the
$N_u$-point DCT matrix, and $\mathbf{W}$ is built as in (11) but now with
dimensions $N_u \times N_u$. In Fig. 5(a)/(b) it can be seen that both
matrices, $\mathbf{\Theta}_{NRM}$ and $\mathbf{\Theta}_{DCT}$, have a very similar magnitude
shape, with their major coefficients concentrated around the
main diagonal. Nevertheless, the matrix $\mathbf{\Theta}_{DCT}$ only contains
real values whereas $\mathbf{\Theta}_{NRM}$ is made of complex coefficients.
More interestingly, it is not necessary to pre-calculate the
matrix $\mathbf{\Theta}_{DCT}$, as is the case for $\mathbf{\Theta}_{NRM}$; instead, we might
calculate a forward and reverse $N_u$-point (52-point in case of
the IEEE 802.11a standard) DCT on the vector $\hat{\mathbf{H}}_l$ in order
to reduce the CTF estimation noise.
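The forward and reverse DCT can be sketched as follows. The text does not state the DCT type, its normalization, or which member of the transform pair is labelled IDCT, so this sketch makes explicit assumptions: an orthonormal DCT-II is applied first (chosen because it compacts smooth inputs), the first `n_keep` coefficients are retained as in the window of (11), and an orthonormal DCT-III brings the result back. Function names are hypothetical.

```python
import math

def _scale(k, n):
    """Orthonormal scaling factor for coefficient k of an n-point DCT."""
    return math.sqrt((1 if k == 0 else 2) / n)

def dct(x):
    """Orthonormal DCT-II (assumed forward transform); accepts complex input."""
    n = len(x)
    return [_scale(k, n) * sum(x[m] * math.cos(math.pi * (2 * m + 1) * k / (2 * n))
                               for m in range(n))
            for k in range(n)]

def idct(c):
    """Orthonormal DCT-III, the inverse of dct() above."""
    n = len(c)
    return [sum(_scale(k, n) * c[k] * math.cos(math.pi * (2 * m + 1) * k / (2 * n))
                for k in range(n))
            for m in range(n)]

def reduce_noise_dct(h_est, n_keep):
    """Keep the first n_keep transform coefficients of the CTF estimate."""
    coeffs = dct(h_est)
    windowed = coeffs[:n_keep] + [0.0] * (len(coeffs) - n_keep)
    return idct(windowed)
```

Because the transform is real-valued and linear, it acts on the real and imaginary parts of the CTF estimate independently, which is consistent with the observation that the DCT-based matrix contains only real values.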
B. Residual Phase Correction
After FFT calculation and channel correction, a residual
phase error remains in the modulated data due to several
factors: errors in the estimation of the STO and CFO, Phase
Noise, and uncorrected SCFO. When applying the CE algo-
rithm it is considered that the transmitted pilots were assigned
the values {±1}. Furthermore, the channel is supposed not to
change signiﬁcantly during a period of L OFDM symbols, L
being the latency of the CE, so that after channel correction
and in the absence of noise the resulting pilots are pure phasors
with normalized magnitude given by
\[
P^{\phi}_{k,l} \approx e^{j(\delta \cdot k + \theta_l)}, \tag{15}
\]
where $\delta \propto L \cdot \xi$, $\theta_l = L \cdot c_0 + (\alpha_l - \alpha_{l-L})$ and $k$ is the
frequency index; $\xi$ is the sampling error (in ppm), $\alpha_l$ is the
contribution of the Phase Noise (the so-called Common Phase
Error) to symbol $l$, and $c_0$ is the phase derived from a residual
CFO. The method we propose [19], [20] assumes the condition
$|\delta \cdot k| \ll 1$ be satisfied $\forall k \in [-26, +26]$. In this case (15)
may be simplified by considering a first order approximation
of the complex exponential, yielding
\[
\begin{aligned}
P^{\phi}_{k,l} &= \Re\{P^{\phi}_{k,l}\} + j \cdot \Im\{P^{\phi}_{k,l}\} \\
&= \cos(\theta_l) - \delta \cdot k \cdot \sin(\theta_l)
 + j \left( \sin(\theta_l) + \delta \cdot k \cdot \cos(\theta_l) \right).
\end{aligned} \tag{16}
\]
In (16) four parameters are of interest namely, $\cos(\theta_l)$,
$\sin(\theta_l)$, $\delta \cdot \sin(\theta_l)$, and $\delta \cdot \cos(\theta_l)$. In order to find these
four parameters we must solve the linear system of equations
derived from (16) when setting $k$ = −21, −7, +7 and +21,
corresponding to the pilot tones. Hence, the parameters $\cos(\theta_l)$
and $\sin(\theta_l)$ can be found straightforwardly as
\[
\begin{aligned}
\cos(\theta_l) &= (1/4) \cdot \sum_{i = -21, -7, +7, +21} \Re\{P^{\phi}_{i,l}\}, \\
\sin(\theta_l) &= (1/4) \cdot \sum_{i = -21, -7, +7, +21} \Im\{P^{\phi}_{i,l}\}.
\end{aligned} \tag{17a}
\]
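Regarding the parameters $\delta \cdot \sin(\theta_l)$ and $\delta \cdot \cos(\theta_l)$, the exact
expressions are as follows,
\[
\begin{aligned}
\delta \cdot \sin(\theta_l) &= \tfrac{2}{126} \Re\{P^{\phi}_{-21,l}\} + \tfrac{3}{126} \Re\{P^{\phi}_{-7,l}\}
 - \tfrac{3}{126} \Re\{P^{\phi}_{+7,l}\} - \tfrac{2}{126} \Re\{P^{\phi}_{+21,l}\}, \\
\delta \cdot \cos(\theta_l) &= \tfrac{2}{126} \Im\{P^{\phi}_{+21,l}\} + \tfrac{3}{126} \Im\{P^{\phi}_{+7,l}\}
 - \tfrac{3}{126} \Im\{P^{\phi}_{-7,l}\} - \tfrac{2}{126} \Im\{P^{\phi}_{-21,l}\},
\end{aligned}
\]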
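which have been modified in order to simplify the scaling by
the factor 1/126 (the divisor 126 is replaced by 128 = 2^7, so
that the scaling reduces to a bit shift), yielding
\[
\begin{aligned}
\delta \cdot \sin(\theta_l) &\approx \tfrac{2}{128} \Re\{P^{\phi}_{-21,l}\} + \tfrac{3}{128} \Re\{P^{\phi}_{-7,l}\}
 - \tfrac{3}{128} \Re\{P^{\phi}_{+7,l}\} - \tfrac{2}{128} \Re\{P^{\phi}_{+21,l}\}, \\
\delta \cdot \cos(\theta_l) &\approx \tfrac{2}{128} \Im\{P^{\phi}_{+21,l}\} + \tfrac{3}{128} \Im\{P^{\phi}_{+7,l}\}
 - \tfrac{3}{128} \Im\{P^{\phi}_{-7,l}\} - \tfrac{2}{128} \Im\{P^{\phi}_{-21,l}\}.
\end{aligned} \tag{17b}
\]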
The foregoing method saves a signiﬁcant amount of hard-
ware, since neither an arctangent block nor an NCO is needed
for RPE estimation and correction, respectively.
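The pilot-based estimates in (17a) and (17b) can be written as a few additions and one scaling per parameter. The sketch below is a floating-point illustration only; the function name, the dictionary layout of the pilots and the `divisor` argument (126 for the exact expressions, 128 for the simplified ones) are assumptions made for this example.

```python
import math

PILOT_INDICES = (-21, -7, 7, 21)

def rpe_parameters(pilots, divisor=128):
    """Estimate cos(theta), sin(theta), delta*sin(theta), delta*cos(theta).

    pilots maps each pilot index to its channel-corrected complex value.
    divisor=126 gives the exact expressions, divisor=128 gives (17b).
    """
    cos_t = sum(pilots[k].real for k in PILOT_INDICES) / 4
    sin_t = sum(pilots[k].imag for k in PILOT_INDICES) / 4
    d_sin = (2 * pilots[-21].real + 3 * pilots[-7].real
             - 3 * pilots[7].real - 2 * pilots[21].real) / divisor
    d_cos = (2 * pilots[21].imag + 3 * pilots[7].imag
             - 3 * pilots[-7].imag - 2 * pilots[-21].imag) / divisor
    return cos_t, sin_t, d_sin, d_cos
```

With pilots generated from the first-order model (16), a divisor of 126 recovers δ·sin(θ) and δ·cos(θ) exactly, while 128 scales both by 126/128, i.e. an error of about 1.6 %.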
IV. THE DIGITAL TIMING LOOP
The general scheme of the IRx shown in Fig. 1 includes
a so-called Digital Timing Loop (DTL). The purpose of the
DTL is to estimate and correct the SCFO. Each OFDM symbol
is composed of 80 samples, before CP extraction and FFT
operation, with a sampling rate $f_s$ = 20 MHz. In the case
of a sampling oscillator with e.g. 20 ppm frequency error,
this turns into fs = 20,000,400 Hz. Thus 80.0016 samples
are obtained for the initial symbol instead of exactly 80, i.e.
a timing error of 0.0016 samples. This timing error is not
ﬁxed, but it will be 0.0032 samples for the second symbol,
0.0048 for the third one and so on. In essence, the SCFO
will be observed as a dynamic timing error that has to be
monitored throughout reception. Considering the case of a
6 Mbps transmission, the 802.11a standard allows a frame
length of up to 1,367 data symbols, which means that the last
OFDM symbol will be affected by a timing error of about 2.2
samples. In our consideration the total SCFO may be as high
as 80 ppm (combining the effects from Tx and Rx), yielding
a maximum accumulated timing error of 8.8 samples. Since
the timing error appears as a linear phase after FFT operation,
pilots are very well suited to estimate it. The method shown in
the foregoing section for RPE estimation and correction is a
posteriori, i.e. no attempt is made to correct the main sources
causing the phase error, but only the phase error itself. Hence,
we need not only a method to estimate the SCFO based on the
pilots but also a way to correct for it prior to FFT operation
in order to avoid ICI.
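The drift figures quoted above follow from a single multiplication. The helper below reproduces them; the function name is hypothetical, and the 80 samples per symbol are those stated in the text.

```python
SAMPLES_PER_SYMBOL = 80  # N + N_G = 64 + 16, before CP extraction

def accumulated_timing_error(ppm, n_symbols):
    """Timing error in samples after n_symbols for a clock offset in ppm."""
    return SAMPLES_PER_SYMBOL * ppm * 1e-6 * n_symbols

first_symbol = accumulated_timing_error(20, 1)     # about 0.0016 samples
last_symbol = accumulated_timing_error(20, 1367)   # about 2.2 samples
worst_case = accumulated_timing_error(80, 1367)    # about 8.7 samples
```

The unrounded worst-case value is 8.7488 samples, which the text quotes as 8.8 (four times the rounded 2.2).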
A. Timing Error Discriminator
In a ﬁrst stage, the variable timing error must be estimated.
In the estimation we make use of the phase error signal
provided by the RPE estimator, i.e. $P^{\phi}_{k,l}$ in (15). The estimator
is based on a solution proposed by Yang in [21], in which two
reference sequences are defined, namely
\[
C^{early}_{p,l} \triangleq e^{j \frac{\pi}{N} \cdot p}, \qquad
C^{late}_{p,l} \triangleq e^{-j \frac{\pi}{N} \cdot p}, \tag{18}
\]
where p corresponds to the pilot sub-carrier position and l is
the symbol index.
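The RPE signal $P^{\phi}_{k,l}$ is compared with these two references
through correlation, thus yielding

$$a(l) = \left|R_{\mathrm{late}}(l)\right|^{2} - \left|R_{\mathrm{early}}(l)\right|^{2}, \tag{19}$$

with

$$\begin{aligned}
R_{\mathrm{early}}(l) &= \sum_{i=0}^{P-1} P^{\phi}_{p_0+i\Delta,\,l} \cdot \left(C^{\mathrm{early}}_{p_0+i\Delta,\,l}\right)^{*} \approx \sum_{i=0}^{P-1} e^{\,j\frac{2\pi}{N}(p_0+i\Delta)\left(\Delta t_l-\frac{1}{2}\right)} + V_{\mathrm{early}}(l),\\
R_{\mathrm{late}}(l) &= \sum_{i=0}^{P-1} P^{\phi}_{p_0+i\Delta,\,l} \cdot \left(C^{\mathrm{late}}_{p_0+i\Delta,\,l}\right)^{*} \approx \sum_{i=0}^{P-1} e^{\,j\frac{2\pi}{N}(p_0+i\Delta)\left(\Delta t_l+\frac{1}{2}\right)} + V_{\mathrm{late}}(l),
\end{aligned} \tag{20}$$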
where $V_{\mathrm{early}}(l)$ and $V_{\mathrm{late}}(l)$ are Gaussian noise components.
In (20) it has been considered that pilot sub-carriers are at
position $p = p_0 + i\cdot\Delta$, with $0 \le i \le P-1$, $P$ being the total
number of pilots per OFDM symbol, and $\Delta$ the pilot distance.
The approximation made in (20) applies when $P^{\phi}_{k,l}$ adheres to
the approximation in (15). The total timing error (in samples)
at symbol $l$ in (20) is

$$\Delta t_l = \left(t_\theta - \hat{t}_{\theta,l}\right) + \xi \cdot l \cdot (N + N_G), \tag{21}$$

where $t_\theta$ is a residual timing synchronization error ($|t_\theta| < 0.5$
samples), $\hat{t}_{\theta,l}$ is an estimate of $t_\theta$ at symbol $l$, and $\xi$ stands
for the SCFO (in ppm). In (20) it is further considered that
$|\Delta t_l| \le 0.5$. After low-pass filtering (19), we finally obtain the
timing discriminator as follows (sub-index $l$ has been omitted
for clarity)
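$$S(\Delta t) = \left|\frac{\sin\!\left(\frac{\pi}{N}\cdot P\cdot\Delta\cdot\left(\Delta t+\frac{1}{2}\right)\right)}{\sin\!\left(\frac{\pi}{N}\cdot\Delta\cdot\left(\Delta t+\frac{1}{2}\right)\right)}\right|^{2} - \left|\frac{\sin\!\left(\frac{\pi}{N}\cdot P\cdot\Delta\cdot\left(\Delta t-\frac{1}{2}\right)\right)}{\sin\!\left(\frac{\pi}{N}\cdot\Delta\cdot\left(\Delta t-\frac{1}{2}\right)\right)}\right|^{2}, \tag{22}$$

where $P = 4$, $\Delta = 14$ and $N = 64$ in the 802.11a case.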
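As a plausibility check of (19), (20) and (22), the noise-free correlator outputs can be evaluated numerically and compared with the closed-form discriminator. The sketch below is illustrative only. It assumes an idealized RPE signal of the form exp(j·2π·p·Δt/N) on the pilots (the form implied by the approximation in (20)), uses the 802.11a pilot positions −21, −7, 7, 21 (i.e. p0 = −21), and omits both the noise terms and the low-pass filter. All function names are hypothetical.

```python
import numpy as np

N, P, DELTA, P0 = 64, 4, 14, -21   # 802.11a pilots at -21, -7, 7, 21


def correlator_outputs(dt):
    """Noise-free R_early and R_late of (20) for a timing error dt (samples)."""
    p = P0 + DELTA * np.arange(P)                 # pilot positions p0 + i*Delta
    rpe = np.exp(2j * np.pi * p * dt / N)         # assumed ideal RPE signal
    c_early = np.exp(1j * np.pi * p / N)          # reference sequences of (18)
    c_late = np.exp(-1j * np.pi * p / N)
    r_early = np.sum(rpe * np.conj(c_early))
    r_late = np.sum(rpe * np.conj(c_late))
    return r_early, r_late


def discriminator_direct(dt):
    """a(l) of (19), computed from the correlator outputs."""
    r_early, r_late = correlator_outputs(dt)
    return abs(r_late) ** 2 - abs(r_early) ** 2


def discriminator_closed_form(dt):
    """S(dt) of (22); not defined exactly at dt = +/-0.5 (0/0 in one term)."""
    def kernel(x):
        num = np.sin(np.pi * P * DELTA * x / N)
        den = np.sin(np.pi * DELTA * x / N)
        return (num / den) ** 2
    return kernel(dt + 0.5) - kernel(dt - 0.5)


if __name__ == "__main__":
    for dt in np.linspace(-0.4, 0.4, 9):
        print(f"{dt:+.1f}  {discriminator_direct(dt):+.4f}  "
              f"{discriminator_closed_form(dt):+.4f}")
```

Under these assumptions both columns agree, the curve is odd in Δt, and it vanishes at Δt = 0, which is the behaviour expected of a timing error discriminator.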
B. Timing Error Correction
The parameter of interest is the relative error existing
between the sampling period $T_s$ at the Analog-to-Digital
converters (ADC), and the corrected (ideal) sampling time $T_I$,
i.e. $T_I/T_s = 1+\xi$. These two sampling periods are related as
follows,

$$i \cdot T_I + \tau_I \cdot T_I = m_i \cdot T_s + \mu_i \cdot T_s, \tag{23}$$

with

$$m_i = \left\lfloor i \cdot T_I + \tau_I \cdot T_I \right\rfloor, \tag{24}$$

where $0 \le \mu_i < 1$ is the fractional delay, $m_i$ is the basepoint,
$i$ is the discrete timing variable after timing correction (integer
value), whereas $\tau_I$ represents a fractional part of $T_I$. The
function $\lfloor x \rfloor$ rounds $x$ to the nearest integer towards minus
infinity. The timing error compensation is driven by a control
block (see Fig. 1), which contains a control word, $w(l)$. This
parameter is updated on a symbol basis, i.e. every 4 μs, and
provides the latest estimate of the ratio $T_I/T_s$ as follows,

$$w(l+1) = w(l) + K_w \cdot e(l), \tag{25}$$

with

$$e(l+1) = e(l) + K_e \cdot a(l), \tag{26}$$

where $a(l)$ is given by (19). The parameter $K_e$ defines the bandwidth
of the low-pass filter in (26) and was selected to be 0.01. The
parameter $K_w$ is given as $K_w = \left(2 \cdot S_{\max} \cdot (N + N_G)\right)^{-1}$,
where $S_{\max}$ is the maximum value of $S(\Delta t)$ in (22).
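The symbol-rate update in (25) and (26) amounts to two accumulators. A minimal sketch follows, in which the function names and the initial values used in the example (w = 1, e = 0) are assumptions made for illustration. The value of S_max is left as a parameter because it is not quoted numerically in the text.

```python
N, N_G = 64, 16      # FFT length and guard interval: 80 samples per symbol
K_E = 0.01           # filter constant selected in the text


def loop_gain(s_max):
    """K_w = 1 / (2 * S_max * (N + N_G))."""
    return 1.0 / (2.0 * s_max * (N + N_G))


def update(w, e, a, k_w, k_e=K_E):
    """One symbol-rate step: returns (w(l+1), e(l+1)) from w(l), e(l), a(l).

    Note that w(l+1) uses the *current* e(l), as written in (25).
    """
    w_next = w + k_w * e        # (25)
    e_next = e + k_e * a        # (26)
    return w_next, e_next


if __name__ == "__main__":
    k_w = loop_gain(s_max=1.0)  # placeholder S_max, for illustration only
    w, e = 1.0, 0.0             # assumed initial state
    for l in range(5):
        w, e = update(w, e, a=0.5, k_w=k_w)
        print(l + 1, w, e)
```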
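The parameters $m_i$ and $\mu_i$ used in the variable interpolator
are computed recursively as explained in [7, page 523].
We already expressed $i \cdot T_I + \tau_I \cdot T_I$ as a function of $(m_i, \mu_i)$
in (23). The next sample $(i+1) \cdot T_I + \tau_I \cdot T_I$ is given by

$$(i+1) \cdot T_I + \tau_I \cdot T_I = m_i \cdot T_s + \left(\mu_i + \frac{T_I}{T_s}\right) \cdot T_s. \tag{27}$$

By replacing in (27) the unknown ratio $T_I/T_s$ by its
estimate $w(m_i)$ we obtain

$$(i+1) \cdot T_I + \tau_I \cdot T_I = m_i \cdot T_s + \left\lfloor \mu_i + w(m_i) \right\rfloor \cdot T_s + \left[\mu_i + w(m_i)\right]_{\mathrm{mod}\,1} \cdot T_s. \tag{28}$$

From (28), the recursion for the estimates readily follows,

$$\begin{aligned}
m_{i+1} &= m_i + \left\lfloor \mu_i + w(m_i) \right\rfloor,\\
\mu_{i+1} &= \left[\mu_i + w(m_i)\right]_{\mathrm{mod}\,1}.
\end{aligned} \tag{29}$$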
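The recursion (29) can be exercised directly. The sketch below is illustrative and treats the control word as a constant, whereas in the loop it is updated once per OFDM symbol. The function name is hypothetical.

```python
import math


def next_basepoint(m, mu, w):
    """Apply (29): returns (m_{i+1}, mu_{i+1}) from (m_i, mu_i) and w(m_i)."""
    total = mu + w
    step = math.floor(total)          # integer number of T_s periods to advance
    return m + step, total - step     # fractional part becomes the new mu


if __name__ == "__main__":
    # With w = 1 + 80 ppm the fractional delay creeps upwards; roughly every
    # 1 / 80e-6 = 12,500 output samples one input sample is skipped (step == 2).
    m, mu, w = 0, 0.0, 1.0 + 80e-6
    skips = 0
    for _ in range(30000):
        m_next, mu = next_basepoint(m, mu, w)
        skips += (m_next - m == 2)
        m = m_next
    print(m, mu, skips)
```

In this example the basepoint normally advances by one input sample per output sample, with an occasional advance by two when the fractional delay wraps around.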
In order to obtain the value for $\mu_i$ based on the control word
$w(m_i)$ we define the function

$$\eta(m_i, d) \triangleq \mu_i + w(m_i) - d, \tag{30}$$

with $d = 0, 1, 2, \ldots$
At the basepoint $m_i$ the value $\eta(m_i, 0)$ is stored in a b-bit
register. At every $T_s$ cycle the value of the register is
decremented by 1, i.e.

$$\eta(m_i, d+1) = \eta(m_i, d) - 1. \tag{31}$$

As long as $\eta(m_i, d) > 1$, there obviously exists an integer
$\lfloor \mu_i + w(m_i) \rfloor > m_i + d$. The criterion to obtain the next
basepoint $m_{i+1}$ is $\eta(m_i, d_{\min}) < 1$, where $d_{\min}$ is the smallest
integer for which the condition is fulfilled. Thus, the decrement
operation is continued until the condition $\eta(m_i, d_{\min}) < 1$
is detected. By definition, the register content $\eta(m_i, d_{\min})$
equals $\mu_{i+1}$. Afterwards, the operations are continued for
$m_{i+1}$ with the initial value

$$\eta(m_{i+1}, 0) = \eta(m_i, d_{\min}) + w(m_{i+1}). \tag{32}$$
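The register procedure in (30) to (32) reaches the same result as the floor and modulo operations of (29) using only a decrement and a comparison per cycle. A behavioural sketch is given below under two assumptions that are not taken from the text: the b-bit register is modelled with a floating-point number, and the boundary case where the register equals exactly 1 is treated as "keep decrementing". The function name is hypothetical.

```python
import math


def countdown_to_next_basepoint(mu, w):
    """Emulate the register of (30)-(32) for one basepoint interval.

    Returns (d_min, mu_next): the number of T_s cycles until the next
    basepoint and the register content at that point.
    """
    eta = mu + w            # eta(m_i, 0), i.e. (30) with d = 0
    d = 0
    while eta >= 1.0:       # (31): decrement once per T_s cycle
        eta -= 1.0
        d += 1
    return d, eta           # eta(m_i, d_min) equals mu_{i+1}


if __name__ == "__main__":
    mu, m = 0.0, 0
    for _ in range(4):
        d_min, mu = countdown_to_next_basepoint(mu, 1.25)
        m += d_min          # next basepoint is d_min cycles later
        print(m, mu)
```

For the next interval the register is reloaded with the remaining content plus the current control word, as in (32), which is what the example loop does implicitly by passing `mu` back in.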
The timing error correction block in Fig. 1 is based on a
first order Lagrange polynomial interpolator and makes use of
a Farrow structure [7], [22]. Higher order interpolators cannot
be used since the DTL becomes unstable. The reason for this
instability is related to the considerations made in (16) for
calculation of $P^{\phi}_{k,l}$, which no longer hold when high order
interpolators are used.
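For reference, a first order Lagrange interpolator is plain linear interpolation between the two samples surrounding the desired instant, and in Farrow form it needs a single multiplication by the fractional delay. The sketch below illustrates that generic operation and is not the authors' hardware architecture. The function name and the indexing convention (interpolating between `x[m]` and `x[m + 1]`) are assumptions.

```python
import numpy as np


def linear_interpolate(x, m, mu):
    """First order Lagrange interpolation at time (m + mu) * T_s.

    Farrow form: y = c0 + mu * c1 with c0 = x[m] and c1 = x[m+1] - x[m],
    so only one multiplier, driven by mu, is required.
    """
    c0 = x[m]
    c1 = x[m + 1] - x[m]
    return c0 + mu * c1


if __name__ == "__main__":
    # Complex baseband example: a slowly rotating phasor sampled at T_s
    n = np.arange(16)
    x = np.exp(2j * np.pi * 0.01 * n)
    print(linear_interpolate(x, 5, 0.3))
    print(np.exp(2j * np.pi * 0.01 * 5.3))   # exact value for comparison
```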
V. SIMULATION RESULTS
This section analyzes the performance of the synchronizer,
the CE and the DTL under different transmit conditions
through extensive computer simulations. We already men-
tioned in Section II that the synchronizer was mainly affected
by the AGC. Since the attenuation suffered by the transmitted
OFDM frame is unknown to the receiver, an AGC able to
apply a variable ampliﬁcation is mandatory prior to ADC.
The AGC should be capable of keeping the signal inside a
certain voltage range given by the bias voltage of the ADCs.
The frame detector found in the synchronizer should be robust
against two main effects caused by the AGC:
1) Since the AGC is not able to distinguish the signal of
interest from the noise, in the absence of any signal the
noise will be ampliﬁed in the worst case to the voltage
limits of the ADCs. These high noise levels should not
provoke false frame detections.
2) In a high SNR situation, the AGC has to change very
quickly from a high ampliﬁcation level to a lower one
when the signal is received. Since the AGC cannot react
instantaneously to sudden changes in the input power
level, the AGC output signal will be heavily saturated
for a certain time.
The simulation results related to the synchronizer are de-
picted in Fig. 6. A channel model A as given in [23] together
with a normalized CFO of +1.2 are used in all cases. This
corresponds to a Non-Line-Of-Sight (NLOS) channel with a
maximum delay spread of 390 ns (50 ns rms). The results
for the False Alarm Probability (FAP) are shown in Fig. 6(a).
The model used for the AGC considers that only amplitude
distortions (saturation) but no phase distortions are introduced
into the signal, since these may lead to false frequency
estimations. The ﬁlter parameters in the feedback loop of the
AGC were selected in order to achieve a settling time in
which approximately 64 samples (3.2 μs) of the preamble
symbols were completely saturated at SNR = 35 dB (worst
case settling time). In the deﬁnition of FAP used in our
simulations, a frame was considered to be correctly detected
when the detected starting point was inside a range of ±16
samples from the “ideal” point, i.e. when no AGC and no
channel are used. Fig. 6(a) shows that the FAP decreases with
increasing SNR until a certain value of SNR is reached. From
this point on, the distortion due to saturation becomes the
dominant effect on the preambles and the FAP degrades as
the SNR increases. Nevertheless, since saturation is easily
detectable at the ADC, the previous effect can be largely
mitigated by setting all saturated samples to zero before
they are delivered to the frame detector. The obtained standard
deviation for the normalized frequency offset estimator shows
no dependency on the AGC and has a minimum bound of
0.01, i.e. 1% of the sub-carrier spacing. This value helps
in determining the number of bits necessary to represent
the frequency offset in the arctangent calculator used in
the synchronizer. Finally, Fig. 6(b) depicts the Timing Error
Probability (TEP) derived from the crosscorrelator. Symbol
timing is provided by the position of the first significant peak
coming out from the crosscorrelator. The ideal position of the
peak, i.e. m1 in Fig. 2, is known beforehand and a timing
error occurs when the estimated position of the peak differs
by more than ±2 samples from the ideal position. Nevertheless,
this definition of the timing error only makes sense if the CP
of the symbol being received immediately after the preamble
symbols is considered to be 14 samples long instead of 16
in order to compensate positive timing errors. Four possible
versions of the crosscorrelator have been tested depending on
the length of the reference signal cREF(m), either 32 or 64
samples, and the type of multiplier, either 1-bit XNOR-based
multipliers (Hard crosscorrelator) or floating point multipliers
(Soft crosscorrelator) with the number of bits determined by
the computer on which the simulation is being run. Results
shown in Fig. 6(b) indicate that a 32-sample Hard
crosscorrelator may not be appropriate and that the reference
length should be increased to 64 samples. Despite these
results, our first version of the synchronizer considers only a
reference of 32 samples in order to reduce the signal
processing latency as much as possible.

Fig. 6. Simulation results for the synchronizer (channel A, normalized CFO = 1.2): (a) Simulated FAP for the frame detector; (b) Simulated TEP at the output of the crosscorrelator.
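To illustrate why a 1-bit multiplier can be realized with an XNOR gate, the sketch below correlates only the sign bits of two real-valued sequences. It is a simplified illustration and not the authors' synchronizer: the actual design operates on complex baseband samples, and its details are not reproduced here. The function names and the test signal are hypothetical.

```python
import numpy as np


def hard_correlation(rx, ref):
    """1-bit correlation: XNOR the sign bits, then count agreements.

    With samples mapped to +1/-1, each product is +1 when the signs agree
    and -1 otherwise, so the sum equals 2 * agreements - length.
    """
    rx_bits = rx >= 0
    ref_bits = ref >= 0
    agreements = np.count_nonzero(~(rx_bits ^ ref_bits))   # XNOR = NOT XOR
    return 2 * agreements - len(ref)


def soft_correlation(rx, ref):
    """Full-precision (floating point) correlation for comparison."""
    return float(np.dot(rx, ref))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.standard_normal(32)                  # 32-sample reference
    rx = ref + 0.5 * rng.standard_normal(32)       # noisy copy of the reference
    print(hard_correlation(rx, ref), soft_correlation(rx, ref))
```

The hard version trades amplitude information for much cheaper arithmetic, which is consistent with the performance gap between the Hard and Soft variants discussed above.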
Fig. 7 depicts the Mean-Square Error (MSE) performance
of the proposed channel estimator considering all data rates
deﬁned for the 802.11a standard. Channel models A and D
[23] are considered in the simulations. Channel D corresponds
to a Line-Of-Sight (LOS) channel with a maximum delay
spread of 1,050 ns (140 ns rms). In both cases a Doppler
frequency of 58 Hz (v = 3 m/s, FC = 5,805 MHz) was used.
In order to smooth the simulation results, we ﬁrstly tested
20 different seeds and looked for the one representing an
average channel. This seed was used afterwards for the MSE
estimation. Each point in Fig. 7 is obtained after averaging
the MSE in 10 trials where a frame containing 37 OFDM
data symbols is transmitted at each trial. Furthermore, six soft
bits were used during demodulation together with a traceback
length of 50 bits in the Viterbi decoder. In order to reduce
complexity, a hard-output Viterbi decoder was considered. The
ﬁgures show a substantial improvement in the MSE when
a LOS channel is present. The abrupt decrease of the MSE
indicates the point from which on the Viterbi decoder is able to
provide fully correct output bits. The correctness of these bits
is crucial in order to assure the stability of the CE, especially at
the higher transmission rates. Furthermore, Fig. 7 also shows
that it will be extremely difﬁcult to obtain the maximum data
rates (48 and 54 Mbps) in a real wireless channel, even with
LOS, since these rates require an SNR well above 30 dB. The
standard 802.11a [1] speciﬁes a Packet Error Rate (PER) of
10% measured on 1000-byte frames, which is equivalent to
a BER = 1.25e-5. Fig. 8 shows the results of our Monte-
Carlo BER simulations based on 1000-byte frames. The same
channel seed as in Fig. 7 was used in Fig. 8. It can be seen
from Fig. 8 that the higher modulation schemes require very
high SNR in order to achieve the minimum BER and we may
use them only in very limited scenarios.
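The conversion between the 10% PER requirement and the quoted BER can be checked with a short calculation. The sketch below assumes independent bit errors, which is only an approximation after Viterbi decoding, where errors tend to occur in bursts. The function names are hypothetical.

```python
FRAME_BITS = 1000 * 8     # 1000-byte frames


def per_from_ber(ber, n_bits=FRAME_BITS):
    """PER when each of n_bits fails independently with probability ber."""
    return 1.0 - (1.0 - ber) ** n_bits


def ber_from_per_linear(per, n_bits=FRAME_BITS):
    """First-order approximation PER ~ n_bits * BER, valid for small BER."""
    return per / n_bits


def ber_from_per_exact(per, n_bits=FRAME_BITS):
    """Inverse of per_from_ber under the same independence assumption."""
    return 1.0 - (1.0 - per) ** (1.0 / n_bits)


if __name__ == "__main__":
    print(ber_from_per_linear(0.10))   # the 1.25e-5 figure used in the text
    print(ber_from_per_exact(0.10))    # slightly larger
    print(per_from_ber(1.25e-5))       # slightly below 10%
```

The BER of 1.25e-5 corresponds to the first-order approximation; under the independence assumption the exact relation gives a PER slightly below 10% for that BER.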
Finally, Fig. 9 shows the simulation results for the timing
control loop. We simulated only two transmission modes, i.e.
12 and 54 Mbps, and represented the Error Vector Magnitude
(EVM) as deﬁned in [1]. Frames with 152 OFDM data sym-
bols were generated in all the cases, since this is the maximum
number of OFDM data symbols per frame in the 54 Mbps
case. The clock error was set to ξ = -80 ppm, which represents
a worst case scenario where the actual sampling frequency is
below the reference value. Though an ideal channel estimator
was taken into consideration, the effects derived from the
processing latency involved in the decision-directed channel
estimator are included in the simulation results. Hence, the
12 Mbps case involves a processing latency of three OFDM
symbols. For the 54 Mbps case, the processing latency is
only one OFDM symbol. As can be seen from Fig. 9, the
proposed solution achieves an improvement in terms of EVM
in both cases, although this improvement is less signiﬁcant in
case of an NLOS channel.
Fig. 7. MSE versus SNR (dB) for the proposed modified CD3 channel estimator according to the 802.11a standard, for channels A and D: (a) BPSK modulation schemes (6 and 9 Mbps); (b) QPSK modulation schemes (12 and 18 Mbps); (c) 16-QAM modulation schemes (24 and 36 Mbps); (d) 64-QAM modulation schemes (48 and 54 Mbps).
VI. CONCLUSION
We have investigated the implementation of the Inner
Receiver of an OFDM-WLAN system based on the IEEE
802.11a standard. Solutions for the most critical blocks, i.e.
Synchronizer, Channel Estimator and Digital Timing Loop,
have been proposed and analyzed under careful consideration
of nearly realistic transmit conditions. Hence, although our
investigations reveal that the Synchronizer is strongly inﬂu-
enced by the gain control, the proposed architecture is shown
to be relatively robust against the AGC effects. Regarding the
Channel Estimator, a decision-directed architecture has been
examined. Two novel solutions have been incorporated into
the design in order to improve the performance. Firstly, a
novel DCT-based noise reduction ﬁlter exploits the energy
compression capabilities of the DCT as a means to reduce the
channel estimation noise with a moderate computational load.
Secondly, the residual phase error is eliminated by means of an
innovative estimator that greatly simplifies the traditional
solution based on arctangent plus NCO operation. In order
to derive a simple time tracking algorithm we have made
use of concepts already established in the literature. However,
the way these concepts are applied to an OFDM receiver is
novel in our solution. The proposed solution has proven to
be applicable in both LOS and NLOS channels. However, the
performance of the DTL is limited by the fact that only ﬁrst
order Farrow interpolators assure stability of the algorithm.
Fig. 9. EVM versus SNR considering the proposed variable timing estimator
for 12 Mbps (channel A), and 54 Mbps (channel D) with ξ = −80 ppm.
ACKNOWLEDGMENT
The authors would like to thank Dr. Robert J. Peichocki
from University of Bristol for his insightful comments to this
paper, and Prof. Ulrich Ramacher and Dr. Bertram Gunzel-
mann for their unconditional support in publishing this paper.
We also would like to thank the anonymous reviewers for their
constructive criticism.
Fig. 8. BER versus SNR (dB) for the proposed modified CD3 channel estimator according to the 802.11a standard, for channels A and D, with the PER = 10% level marked in each panel: (a) BPSK modulation schemes (6 and 9 Mbps); (b) QPSK modulation schemes (12 and 18 Mbps); (c) 16-QAM modulation schemes (24 and 36 Mbps); (d) 64-QAM modulation schemes (48 and 54 Mbps).
REFERENCES
[1] “Wireless LAN medium access control (MAC) and physical layer (PHY)
speciﬁcations: High speed physical layer in the 5 GHz band,” IEEE
P802.11a/D7.0, Part II, 1999.
[2] “Wireless LAN medium access control (MAC) and physical layer (PHY)
speciﬁcations. Amendment 4: Further higher data rate extension in the
2.4 GHz band,” IEEE 802.11G. Standard for IT - Telecommunications
and information exchange between systems LAN/MAN - Part II, 2003.
[3] [Online.] Available: www.wigwam-project.com
[4] [Online.] Available: www.multibandofdm.org
[5] M. Speth, S. A. Fechtel, G. Fock, and H. Meyr, “Optimum receiver
design for wireless broad-band systems using OFDM - Part I,” IEEE
Trans. Commun., vol. 47, no. 11, pp. 1668–1677, Nov. 1999.
[6] P. Robertson and S. Kaiser, “Analysis of the effects of phase-noise in
orthogonal frequency division multiplex (OFDM) systems,” in Proc.
IEEE ICC, June 1995, pp. 1652–1657.
[7] H. Meyr, M. Moeneclaey, and S. Fechtel, Digital Communication
Receivers: Synchronization, Channel Estimation, and Signal Processing.
New York: Wiley, 1998.
[8] M. Krstić, A. Troya, K. Maharatna, and E. Grass, “Optimized low-
power synchronizer design for the IEEE 802.11a standard,” in Proc.
IEEE ICASSP, Apr. 2003, vol. 2, pp. 333–336.
[9] A. Troya, K. Maharatna, M. Krstić, and E. Grass, “Method and device
for frame detection and synchronizer.” PCT Patent WO 2004/008706
A2, pending, Jan. 22, 2004.
[10] K. Maharatna, A. Troya, S. Banerjee, and E. Grass, “A CORDIC like
processor for computation of arctangent and absolute magnitude of a
vector,” in Proc. IEEE ISCAS, May 2004, vol. 2, pp. 713–716.
[11] K. Maharatna, A. Troya, S. Banerjee, E. Grass, and M. Krstić, “A 16-bit
CORDIC rotator for high-speed wireless LAN,” in Proc. IEEE PIMRC,
Sep. 2004, vol. 3, pp. 1747–1751.
[12] K. Maharatna, S. Banerjee, E. Grass, M. Krstić, and A. Troya, “Modified
virtually scaling-free CORDIC rotator algorithm and architecture,” IEEE
Trans. Circuits Syst. Video Technol., vol. 15, no. 11, pp. 1463–1474, Nov.
2005.
[13] T. M. Schmidl and D. C. Cox, “Robust frequency and timing synchro-
nization for OFDM,” IEEE Trans. Commun., vol. 45, no. 12, pp. 1613–
1621, Dec. 1997.
[14] B. Stantchev and G. Fettweis, “Burst synchronization for OFDM-based
cellular systems with separate signaling channel,” in Proc. IEEE VTC,
May 1998, pp. 758–762.
[15] V. Mignone and A. Morello, “CD3-OFDM: A novel demodulation
scheme for fixed and mobile receivers,” IEEE Trans. Commun., vol.
44, no. 9, pp. 1144–1151, Sep. 1996.
[16] J.-J. van de Beek, O. Edfors, M. Sandell, S. K. Wilson, and P. O. Bör-
jesson, “On channel estimation in OFDM systems,” in Proc. IEEE VTC,
July 1995, vol. 2, pp. 815–819.
[17] O. Edfors, M. Sandell, J.-J. van de Beek, S. K. Wilson, and P. O. Bör-
jesson, “OFDM channel estimation by singular value decomposition,”
IEEE Trans. Commun., vol. 46, no. 7, pp. 931–939, July 1998.
[18] H. Schmidt, V. Kühn, K.-D. Kammeyer, R. Rueckriem, and S. Fechtel,
“Channel tracking in wireless OFDM systems,” in Proc. 5th World
Multi-Conference on Systemics, Cybernetics and Informatics, July 2001,
vol. 4, pp. 402–406.
[19] A. Troya, M. Krstić, and K. Maharatna, “Simplified residual phase
correction mechanism for the IEEE 802.11a standard,” in Proc. IEEE
VTC-Fall, Oct. 2003, vol. 2, pp. 1137–1141.
[20] A. Troya, K. Maharatna, and M. Krstić, “Verfahren und Vorrichtung zur
Fehlerkorrektur von Multiplex-Signalen.” PCT Patent WO 2004/036863
A1, pending, Apr. 29, 2004.
[21] B. Yang, K. B. Letaief, R. S. Cheng, and Z. Cao, “An improved
combined symbol and sampling clock synchronization method for
OFDM systems,” in Proc. IEEE Wireless Commun. Networking Conf.,
Sep. 1999, vol. 3, pp. 1153–1157.
[22] L. Erup, F. M. Gardner, and R. A. Harris, “Interpolation in digital
modems - Part II: Implementation and performance,” IEEE Trans.
Commun., vol. 41, no. 6, pp. 998–1008, June 1993.
[23] “Criteria for comparison,” ETSI Technical Report 30701F, BRAN WG3
PHY Subgroup, May 1998.
Alfonso Troya (M’95) was born in Barcelona,
Spain, in 1975. He received the M.Sc. degree in
Telecommunications Engineering from the Techni-
cal University of Catalonia, Barcelona, Spain, in
1999, and the Dr.-Ing. degree from Brandenburg
University of Technology, Cottbus, Germany, in
2004.
He joined the IHP (Institute for High Performance
microelectronics), Frankfurt (Oder), Germany, in
1999, as a Research Associate in the Wireless Com-
munication Systems Department, where he worked
on the development and implementation of digital signal processing algo-
rithms for broadband wireless communication systems. In October 2004
he joined Inﬁneon Technologies AG, Munich, Germany, as an Algorithm
Concept Engineer. He is currently involved in the development of OFDM-
based communication systems and their implementation on Software-Deﬁned
Radio architectures.
Dr. Troya is a member of the IEEE Signal Processing Society.
Koushik Maharatna (M’02) received the M.Sc. de-
gree in Electronic Science from Calcutta University,
Calcutta, India, in 1995 and the Ph.D. degree from
Jadavpur University, Calcutta, India, in 2002. From
1996 to 2000, he was involved in projects sponsored
by the Government of India undertaken at the In-
dian Institute of Technology (IIT), Kharagpur, India.
From 2000 to 2003, he was a Research Scientist with
IHP, Frankfurt (Oder), Germany. During this phase,
his main involvement was related to the design of a
single-chip modem for the IEEE 802.11a standard.
In August 2003, he joined the Department of Electrical & Electronics
Engineering, University of Bristol, Bristol, UK as a Lecturer. Since October
2006 he has been with the School of Electronics and Computer Science, University of
Southampton, where he is currently a Senior Lecturer. His research interests
include development of VLSI architectures for the application in DSP and
communication, computer arithmetic, low-power design, and analog signal
processing.
Dr. Maharatna has served as session chair for IEEE ISCAS 2005 and
VLSI design Conference 2006, and also acted as a reviewer for several IEEE
Journals and Conferences. He is currently a member of the Engineering and
Physical Sciences Research Council (EPSRC) college in the UK. He is a member of
the IEEE Circuits and Systems Society.
Miloš Krstić was born in Niš, Serbia, in 1973.
He received the Dipl.-Ing. and the M.Sc. degrees
in Electronics from the University of Niš, Serbia,
in 1997 and 2001, respectively, and the Dr.-Ing.
degree from Brandenburg University of Technology,
Cottbus, Germany, in 2006.
Since 2001 he has been with IHP, Frankfurt
(Oder), Germany, as a Research Associate within
the Wireless Communication Systems Department,
where he is currently working on low-power digi-
tal design techniques for wireless applications and
globally-asynchronous locally-synchronous (GALS) methodologies for digital
systems integration.
Eckhard Grass received his Dr.-Ing. degree in
Electronics from the Humboldt University, Berlin,
Germany, in 1992.
He worked as a Visiting Research Fellow at
Loughborough University (U.K.) from 1993 to 1995
and as a Senior Lecturer in Microelectronics at the
University of Westminster, London, U.K., from 1995
to 1999. He has been with IHP, Frankfurt (Oder),
Germany, since 1999, where he currently leads a
project on the development and implementation of
a wireless broadband communication system in the
60 GHz band. His research interests include data-driven (asynchronous) signal
processing structures and low-power VLSI implementation of communication
systems.
Ulrich Jagdhold received the Diploma in Physics
(M.Sc. degree) from the Technical University of
Dresden, Dresden, Germany, in 1987. From 1987 to
1996, he was with the Technology Integration Group
of the IHP, Frankfurt (Oder), Germany, working
on CMOS, BiCMOS, and SiGe technologies and
device physics. In 1997 he joined the Wireless
Communication Systems Department of IHP, where
he has been working on WLAN system development
projects, focusing on baseband integration issues
and ASIC design, including development of digital
CMOS libraries.
Rolf Kraemer (M’79) received his Diploma and
Ph.D. degrees in Electrical Engineering and Com-
puter Science from the RWTH Aachen, Germany,
in 1979 and 1985, respectively. He has worked for
15 years in R&D of communication and multimedia
systems at Philips Research Laboratories in Hamburg
and Aachen. Since 1998 he has been a professor of
systems at the Brandenburg University of Technol-
ogy, Cottbus, Germany. He also leads the Wireless
Communications Systems Department of the IHP,
where his research focus is on wireless Internet
systems spanning from application down to Systems-on-Chip. He is co-
founder of the startup company lesswire AG, where he holds the position
of the CTO.
Prof. Kraemer has published over 150 conference and journal papers, and
holds 16 international patents. He is a member of the IEEE Computer Society,
the VDE-NTG, and the German Informatics Society.