An OFDM Receiver Implemented on the Coarse-grain Reconfigurable Montium Processor by Rauwerda, Gerard K. et al.
9th International OFDM-Workshop 2004, Dresden 1
An OFDM receiver implemented on the
coarse-grain reconfigurable Montium processor
Gerard K. Rauwerda, Paul M. Heysters, Gerard J.M. Smit
University of Twente, Department of EEMCS
P.O. Box 217, 7500 AE Enschede, the Netherlands
g.k.rauwerda@utwente.nl
Abstract— Future mobile terminals will become multi-mode
communication systems. In order to handle different
standards, we propose to perform baseband processing in
heterogeneous reconfigurable hardware. OFDM is one of
the techniques used in multi-mode communication
systems. As an example, we present the results of
implementing a HiperLAN/2 receiver in reconfigurable
hardware. The receiver can be implemented with small
configuration overhead, and the required performance
can be obtained at low clock frequencies.
I. INTRODUCTION
Future mobile communication systems tend to become
flexible devices capable of handling multiple wireless
communication standards. Furthermore, these flexible
systems will be aware of their environment and adapt
to it. Since mobile devices are battery-powered,
energy efficiency is an important issue. In the
Adaptive Wireless Networking (AWGN) project [8] we aim
at the implementation of adaptive wireless
communication systems in heterogeneous reconfigurable
hardware.
Orthogonal Frequency Division Multiplexing
(OFDM) is a promising candidate for mobile
communication systems. It is a multiple carrier
modulation technique that eliminates the need for
complex equalizers and utilizes the bandwidth
efficiently.
This paper addresses various aspects of implementing
an OFDM receiver in heterogeneous reconfigurable
hardware. Section II presents a hardware platform for
future mobile devices. The HiperLAN/2 standard is used
as an example; its main properties are presented in
section III. Section IV maps the baseband processing
part of the HiperLAN/2 receiver onto reconfigurable
hardware. Section V presents simulations that show the
performance of the implemented receiver. Finally,
section VI concludes and presents directions for
future work.
II. RECONFIGURABLE HARDWARE
Heterogeneous reconfigurable systems might be-
come the future of mobile hardware. The basic idea
behind the use of heterogeneous reconfigurable hard-
ware is that one can match the granularity of algo-
rithms with the granularity of the hardware. Four pro-
cessor types are distinguished: general-purpose pro-
cessor, fine-grained reconfigurable hardware, coarse-
grained reconfigurable hardware and dedicated hard-
ware.
Fig. 1. The proposed System-on-Chip (tiles shown: GPP, DSP, FPGA, ASIC and Montium).
We propose a System-on-Chip (SoC) that consists of the
above-mentioned processor types (Figure 1). The
different processors are interconnected by a
Network-on-Chip (NoC). Both the
SoC and NoC are dynamically reconfigurable, which
means that the programs (running on the reconfig-
urable processors) as well as the communication links
between the processors are defined at run-time. It
is expected that performance and power gains are
achieved by applying dynamically reconfigurable het-
erogeneous architectures [7].
A. The Montium reconfigurable architecture
The MONTIUM is an example of a coarse-grain
reconfigurable processor. The MONTIUM [4] targets the
16-bit digital signal processing (DSP) algorithm
domain. A single MONTIUM processor tile is depicted in
Figure 2. At first glance the MONTIUM architecture
bears a resemblance to a VLIW processor. However, the
control structure of the MONTIUM is very different.
For (energy-) efficiency it is imperative to minimize
the control overhead. This can be accomplished by
statically scheduling instructions as much as possible
at compile time.
Fig. 2. The MONTIUM tile processor (shown: memories M01 to M10, ALU1 to ALU5, the sequencer, the memory, crossbar, register and ALU decoders, and the Communication and Configuration Unit).
The lower part of Figure 2 shows the Communica-
tion and Configuration Unit (CCU) and the upper part
shows the reconfigurable Tile Processor (TP). The
CCU implements the interface for off-tile communi-
cation. The definition of the off-tile interface depends
on the NoC technology that is used in the SoC. The
CCU enables the MONTIUM to run in ’streaming’ as
well as in ’block’ mode.
The TP is the computing part that can be configured
to implement a particular algorithm. The hardware
organization of the tile processor is very regular. The
five identical ALUs in a tile can exploit spatial con-
currency to enhance performance. This parallelism
demands a very high memory bandwidth, which is
obtained by having 10 local memories in parallel. The
small local memories are also motivated by the
locality of reference principle. The data path has a
width of 16 bits and the ALUs support both signed
integer and signed fixed-point arithmetic. The ALU input
registers provide an even more local level of storage.
Locality of reference is one of the guiding principles
applied to obtain energy-efficiency in the MONTIUM.
A relatively simple sequencer controls the entire TP.
The sequencer selects configurable instructions that
are stored in the decoders of Figure 2.
Each local SRAM is 16 bits wide and has a depth of
512 positions, which adds up to a storage capacity of
8 Kbit per local memory. A reconfigurable Address
Generation Unit (AGU) accompanies each memory.
The memory can also be used as a lookup table for
complicated functions that cannot be calculated using
an ALU, such as sine or division (with one constant).
A single ALU has four 16-bit inputs. Each input has a
private input register file that can store up to four
operands. The input register file cannot be bypassed,
i.e. an operand is always read from an input register.
Input registers can be written by various sources via
a flexible interconnect. An ALU has two 16-bit
outputs, which are connected to the interconnect. The
ALU is entirely combinational and consequently there
are no pipeline registers within the ALU. Neighbouring
ALUs can also communicate directly: the West-output of
an ALU connects to the East-input of the ALU
neighbouring on the left. The East-West connection
does not introduce a delay or pipeline, as it is not
registered.
III. HIPERLAN/2 RECEIVER
HiperLAN/2 is a wireless local area network
(WLAN) access technology and is similar to the IEEE
802.11a WLAN standard. HiperLAN/2 operates in
the 5 GHz frequency band and makes use of orthogo-
nal frequency division multiplexing (OFDM) to trans-
mit the analogue signals. The bit rate of HiperLAN/2
at the physical level depends on the modulation type
and is either 12, 24, 48 or 72 Mbit/s.
The basic idea of OFDM is to transmit high data
rate information by dividing the data into several par-
allel bit streams, and let each one of these bit streams
modulate a separate subcarrier. A HiperLAN/2 chan-
nel contains 52 subcarriers and has a channel spacing
of 20 MHz. 48 subcarriers carry actual data and 4
carry pilots.
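The quoted physical-layer bit rates follow from the number of data subcarriers. The sketch below is only an illustration: it assumes that the four rates correspond to BPSK, QPSK, 16-QAM and 64-QAM (1, 2, 4 and 6 bits per subcarrier) and uses the 4 µs symbol time listed in Table I. The function name is hypothetical.

```python
# Raw (uncoded) physical-layer bit rate of one HiperLAN/2 channel.
DATA_SUBCARRIERS = 48   # subcarriers that carry data (from the text)
SYMBOL_TIME_US = 4      # OFDM symbol time in microseconds (Table I)

# Bits per subcarrier. Assumption: these four modulation types are the
# ones behind the four bit rates quoted in the text.
BITS_PER_SUBCARRIER = {"BPSK": 1, "QPSK": 2, "16-QAM": 4, "64-QAM": 6}


def raw_bit_rate_mbit_s(modulation: str) -> float:
    """Bits per OFDM symbol divided by the symbol time, in Mbit/s."""
    bits_per_symbol = DATA_SUBCARRIERS * BITS_PER_SUBCARRIER[modulation]
    return bits_per_symbol / SYMBOL_TIME_US  # bits per microsecond == Mbit/s


rates = {m: raw_bit_rate_mbit_s(m) for m in BITS_PER_SUBCARRIER}
```

Under these assumptions the four results reproduce the 12, 24, 48 and 72 Mbit/s mentioned above.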
Fig. 3. The baseband functions in the HiperLAN/2 receiver (blocks: prefix removal, frequency offset correction, inverse OFDM, equalization, phase offset correction, de-mapping).
The receiver not only performs the inverse of the
transmitter, it also has to correct for all the distortions
that are introduced in the wireless channel. Figure 3
depicts a model of the HiperLAN/2 receiver. In
general, the model can be used for any OFDM-like
system. The different standards for OFDM-like systems,
e.g. HiperLAN/2, DAB and DRM, generally differ in the
number of carriers and the transmission bandwidth.
Table I summarizes the OFDM properties for different
standards.
TABLE I
PROPERTIES OF THE DIFFERENT OFDM STANDARDS.
                   HiperLAN/2  DAB I  DAB II  DAB III  DAB IV  DRM A   DRM B
Bandwidth [MHz]    20          1.54   1.54    1.54     1.54    0.012   0.012
# carriers         52          1536   384     192      768     203     181
Symbol time [µs]   4           1,246  312     156      623     26,667  26,667
Frame time [ms]    2           96     24      24       48      400     400
The synchronization of the receiver is performed
in two steps. Firstly, coarse synchronization is per-
formed in order to synchronize the receiver with the
frame. During coarse-synchronization the received
signal is correlated with known preambles, which
indicate the start of a frame. Secondly, the prefix
information of an OFDM symbol is used for fine-
synchronization. After fine-synchronization, the pre-
fix is removed from the OFDM symbol.
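Coarse synchronization by correlation with a known preamble can be sketched as follows. This is a toy illustration only: the four-sample preamble and the received samples are made up, and the function names are hypothetical. The actual HiperLAN/2 preambles are defined in [3].

```python
def correlate_at(received, preamble, offset):
    """Correlate the known preamble with the received samples at one offset."""
    return sum(received[offset + k] * preamble[k].conjugate()
               for k in range(len(preamble)))


def find_frame_start(received, preamble):
    """Return the offset at which the correlation magnitude peaks."""
    last = len(received) - len(preamble)
    return max(range(last + 1),
               key=lambda off: abs(correlate_at(received, preamble, off)))


# Made-up preamble embedded at offset 3 between low-level noise samples.
preamble = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
received = [0j, 0.1 + 0j, 0j] + preamble + [0.2j, 0j]
start = find_frame_start(received, preamble)
```

The correlation peaks where the received samples line up with the preamble, which marks the start of the frame in this toy signal.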
Differences between the oscillator frequencies of
the transmitter and the receiver result in frequency
offset and cause inter-subcarrier interference. The
HiperLAN/2 receiver can compensate for frequency
offset by multiplying the data samples of an OFDM
symbol with the frequency offset correction coeffi-
cient. The frequency offset correction coefficient can
be determined by using information from the received
preamble sections of the MAC frame.
The inverse OFDM part of the receiver converts the
received signal into received subcarrier values. The
received sub-carrier values may still suffer from dis-
tortions that need to be corrected before de-mapping
them to a bitstream.
The equalizer corrects the distortions caused by
frequency selective fading. The coefficients for the
equalizer can be determined by using information
from the received preamble sections of the MAC
frame. Since the coherence time of a HiperLAN/2
channel is about 20 ms and a burst of a MAC frame
has a duration of 2 ms, the coefficients need to be de-
termined only at the start of the MAC frame [1].
Based on the equalized pilot values, the phase dis-
tortion of the received signal is corrected. The phase
correction coefficient is determined using pilots.
The received complex-number samples are then
translated into a useful received bitstream. The
de-map function assumes that the symbol most likely to
have been transmitted is the one that maps to the
value closest to the received value.
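The nearest-value rule described above can be sketched as a minimum-distance search. The QPSK constellation below is hypothetical and chosen for illustration; the bit-to-symbol mapping and scaling that HiperLAN/2 actually uses are specified in [3].

```python
# Hypothetical QPSK constellation: bit pair -> complex symbol value.
QPSK = {
    (0, 0): -1 - 1j,
    (0, 1): -1 + 1j,
    (1, 0): 1 - 1j,
    (1, 1): 1 + 1j,
}


def demap(sample, constellation):
    """Return the bits of the constellation point closest to the sample."""
    return min(constellation,
               key=lambda bits: abs(sample - constellation[bits]))


def demap_symbol(samples, constellation):
    """De-map a list of received subcarrier values into a flat bit list."""
    bits = []
    for s in samples:
        bits.extend(demap(s, constellation))
    return bits


received = [0.9 + 1.2j, -1.1 - 0.8j, 0.7 - 0.6j]
bits = demap_symbol(received, QPSK)
```

Passing a different constellation dictionary changes the modulation type without changing the search itself.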
IV. IMPLEMENTATION
We have implemented a HiperLAN/2 receiver in
heterogeneous reconfigurable hardware. The baseband
processing part was implemented in a combination of a
general-purpose processor and coarse-grained
reconfigurable hardware. The physical layer of the
HiperLAN/2 receiver [3] has been implemented in three
MONTIUM tiles. Figure 4 shows the functional blocks in
the receiver that are implemented in each MONTIUM
tile. The synchronization part (prefix removal) has
not yet been implemented. Nevertheless, this function,
which consists of correlation operations, can easily
be implemented in the MONTIUM architecture.
Fig. 4. The HiperLAN/2 receiver in MONTIUM tiles (blocks: prefix removal (not implemented), frequency offset correction, inverse OFDM, equalization, phase offset correction and de-mapping, distributed over MONTIUM tiles 1 to 3).
Irregular tasks, which are outside the algorithm
domain of the MONTIUM, are performed in soft-
ware (i.e. on a GPP). The irregular processes in the
HiperLAN/2 receiver are the estimation of frequency
offset and computation of equalization coefficients.
These coefficients have to be determined only once
per MAC frame, i.e. once per 2 ms. Table II shows the
results of partitioning the receiver’s functionality over
the MONTIUM and the general-purpose processor.
TABLE II
PARTITIONING OF THE HIPERLAN/2 FUNCTIONALITY.
Function                            Implemented in  Block size  Multiplies per MAC frame  Additions per MAC frame
Determine frequency offset          software        32          64                        64
Determine equalizer coefficients    software        52          0                         0
Prefix removal                      -               80          -                         -
Frequency offset correction         MONTIUM         64          127,744                   95,309
Inverse OFDM                        MONTIUM         64          383,232                   574,848
Equalizer, Phase offset, De-mapper  MONTIUM         52          203,184                   104,082
The frequency offset correction is implemented
in one MONTIUM tile. During correction every
complex-number sample is multiplied with the fre-
quency offset correction factor. The correction factor
is determined with a Lookup table (LUT) based on
the estimated frequency offset. The frequency offset
is estimated in software by the GPP once per MAC
frame. One OFDM symbol, containing 64 complex-
number samples, can be corrected in 67 clock cycles.
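The per-sample correction can be sketched as a multiplication with a rotating complex factor. This is an illustration under assumptions, not the MONTIUM mapping: the factor is computed directly with `cmath.exp` instead of being read from a LUT, and the 20 MHz sample rate and 50 kHz offset are illustrative numbers.

```python
import cmath
import math


def correct_frequency_offset(samples, offset_hz, sample_rate_hz):
    """Rotate sample n by -2*pi*offset*n/fs to undo a frequency offset."""
    step = -2.0 * math.pi * offset_hz / sample_rate_hz
    return [x * cmath.exp(1j * step * n) for n, x in enumerate(samples)]


# Toy check: a constant signal distorted by a known frequency offset.
# Both numbers are illustrative assumptions.
fs, df = 20e6, 50e3
distorted = [cmath.exp(2j * math.pi * df * n / fs) for n in range(64)]
corrected = correct_frequency_offset(distorted, df, fs)
```

After correction every sample of the toy signal is back at the constant value 1, up to rounding.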
A Fast Fourier Transform (FFT) on a vector of 64
complex-number time samples can perform the in-
verse OFDM function. The 64-FFT can be performed
in 204 clock cycles for one OFDM symbol.
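For reference, a 64-point FFT can be written as a textbook recursive radix-2 routine. This sketch only shows what the inverse OFDM function computes; it says nothing about how the transform is scheduled on the MONTIUM ALUs, and the single-subcarrier test signal is made up.

```python
import cmath


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


# Toy OFDM symbol: one active subcarrier (index 5) over 64 time samples.
N = 64
time_samples = [cmath.exp(2j * cmath.pi * 5 * n / N) for n in range(N)]
subcarriers = fft(time_samples)
```

In the toy symbol all energy ends up in subcarrier 5, which is the received subcarrier value the later receiver stages would work on.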
The equalizer, phase offset correction and de-
mapping functionality are implemented in one MON-
TIUM tile in a pipelined fashion. The coefficients for
equalization are determined once every 2 ms in soft-
ware by the GPP. During equalization, the received
carriers are multiplied with the equalization coeffi-
cients. After equalization the pilot values are used
to determine the phase offset correction factor. The
phase offset correction factor is determined in the
MONTIUM, since the phase offset can vary for ev-
ery OFDM symbol and the correction factor has to be
determined on an OFDM symbol basis (once every
4 µs). Hence, determining the phase offset correc-
tion factor in software (i.e. GPP) would create large
communication overhead between the GPP and the
MONTIUM tile. Phase offset correction also involves a
complex multiplication, like equalization. As a
consequence, the equalizer and phase offset corrector
use the same functionality of the MONTIUM. The
corrected complex-number samples are translated into a
bitstream in a pipelined, parallel manner.
Hard-decision de-mapping is implemented with LUT
functionality. A parametrizable de-mapper has been
implemented, which can be used for QPSK, 16-QAM and
64-QAM modulated signals by changing only the LUT in
the memory of the MONTIUM.
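Equalization and pilot-based phase correction can both be sketched as one complex multiplication per subcarrier. The example is a minimal illustration with made-up values: a flat channel, a common phase rotation of 0.3 rad, and hypothetical pilot values. The way the phase estimate is formed from the pilots is an assumption, not necessarily the method used on the MONTIUM.

```python
import cmath


def equalize(carriers, coefficients):
    """Multiply every received subcarrier by its equalization coefficient."""
    return [c * w for c, w in zip(carriers, coefficients)]


def phase_correction_factor(equalized_pilots, known_pilots):
    """Unit-magnitude factor that undoes the common phase rotation of the pilots."""
    acc = sum(r * k.conjugate() for r, k in zip(equalized_pilots, known_pilots))
    return cmath.exp(-1j * cmath.phase(acc))


def correct_phase(carriers, factor):
    """Apply the same kind of complex multiplication as the equalizer."""
    return [c * factor for c in carriers]


# Made-up test data: flat channel, common phase rotation, hypothetical pilots.
rotation = cmath.exp(0.3j)
channel = 0.5j
known_pilots = [1 + 0j, 1 + 0j, 1 + 0j, -1 + 0j]
data = [1 + 1j, -1 + 1j]
rx_pilots = [channel * rotation * p for p in known_pilots]
rx_data = [channel * rotation * d for d in data]

eq_pilots = equalize(rx_pilots, [1 / channel] * len(rx_pilots))
eq_data = equalize(rx_data, [1 / channel] * len(rx_data))
factor = phase_correction_factor(eq_pilots, known_pilots)
recovered = correct_phase(eq_data, factor)
```

Because both steps are plain complex multiplications, the same multiply routine serves the equalizer and the phase offset corrector in this sketch.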
TABLE III
PROPERTIES OF THE HIPERLAN/2 IMPLEMENTATION.
                                                                        Frequency offset  Inverse  Equalizer, Phase offset,
                                                                        correction        OFDM     De-mapper
Execution time [cycles]                                                 67                204      110
Communication time [cycles]                                             128               116      <100
Minimum system clock with streaming communication [MHz]                 17                51       28
Minimum processor clock with block communication (@ 100 MHz) [MHz]      25                72       37
Configuration size [bytes]                                              274               946      576
Configuration time [cycles]                                             137               473      288
A. Configuration
The total configuration sizes of the MONTIUM are
small for the different functions (as seen in Table III).
MONTIUM tile 2, on which the inverse OFDM is per-
formed, requires the largest configuration size. The
configuration of tile 2 contains less than 1 Kbyte of
data. The configuration data can be written into the
configuration memory of the MONTIUM in about 500
clock cycles, since 2 bytes are written in one clock
cycle. If the MONTIUM runs at a clock frequency of
100 MHz, tile 2 can be (re)configured in 4.73 µs. Note
that the maximum radio turn-around time of the
HiperLAN/2 system is 6 µs [2], so the implemented
HiperLAN/2 receiver can be considered a real-time
dynamically reconfigurable receiver.
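The configuration figures can be reproduced with a few lines of arithmetic. The sketch takes the configuration sizes from Table III, the two bytes per clock cycle and the 100 MHz clock from the text; the function names are illustrative only.

```python
BYTES_PER_CYCLE = 2  # configuration bytes written per clock cycle (from the text)

# Configuration sizes in bytes, taken from Table III.
CONFIG_SIZE_BYTES = {
    "frequency offset correction": 274,
    "inverse OFDM": 946,
    "equalizer, phase offset, de-mapper": 576,
}


def config_cycles(size_bytes: int) -> int:
    """Clock cycles needed to write a configuration, rounded up."""
    return -(-size_bytes // BYTES_PER_CYCLE)


def config_time_us(size_bytes: int, clock_mhz: float) -> float:
    """Configuration time in microseconds at the given clock frequency."""
    return config_cycles(size_bytes) / clock_mhz


cycles = {name: config_cycles(size) for name, size in CONFIG_SIZE_BYTES.items()}
tile2_time_us = config_time_us(CONFIG_SIZE_BYTES["inverse OFDM"], 100.0)
```

The cycle counts match the last row of Table III, and the tile 2 time stays below the 6 µs turn-around time.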
B. Flexible clock frequency
All operations in the physical layer are performed on
OFDM symbols, so one should ensure that a new OFDM
symbol can be processed every 4 µs. When a streaming
on-chip network between the processors is assumed, the
communication time is not a bottleneck. One then only
has to guarantee that, for example, the 67 clock
cycles of data processing for frequency offset
correction are completed within 4 µs. Hence, the
minimum clock frequency of the MONTIUM for frequency
offset correction is 17 MHz when a streaming on-chip
network between the tiles is assumed.
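The streaming-mode clock requirement is the execution time in cycles divided by the 4 µs symbol period, rounded up to a whole MHz. A short sketch using the execution times from Table III (the function name is illustrative):

```python
import math

SYMBOL_TIME_US = 4.0  # a new OFDM symbol arrives every 4 microseconds

# Execution times per OFDM symbol in clock cycles, taken from Table III.
EXECUTION_CYCLES = {
    "frequency offset correction": 67,
    "inverse OFDM": 204,
    "equalizer, phase offset, de-mapper": 110,
}


def min_clock_streaming_mhz(cycles: int) -> int:
    """Smallest whole-MHz clock that finishes the cycles within one symbol time."""
    return math.ceil(cycles / SYMBOL_TIME_US)  # cycles per microsecond == MHz


min_clock = {name: min_clock_streaming_mhz(c)
             for name, c in EXECUTION_CYCLES.items()}
```

The three results agree with the streaming-communication row of Table III.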
The clock frequency of the reconfigurable proces-
sors is important from an energy-efficiency point-of-
view. The dynamic power consumption of the het-
erogeneous tile processor depends on the clock fre-
quency. Hence, the lower the clock frequency, the
lower the supply voltage can be and the lower the
dynamic power consumption will be. Typically, the
clock frequency of the NoC, connecting the reconfig-
urable processors, will be fixed and only the clock fre-
quency of the reconfigurable processor can be varied.
When we assume the clock frequency of the NoC to
be fixed at 100 MHz, then the clock frequency of the
MONTIUM for frequency offset correction has to be
at least 25 MHz (Table III).
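The block-communication figures in Table III can be reproduced with a simple model. The model is an assumption, not stated in the text: communication at the 100 MHz NoC clock and processing are taken to happen one after the other within the same 4 µs symbol period. For the third tile the table only gives "<100" communication cycles, so 100 is used as an upper bound.

```python
import math

SYMBOL_TIME_US = 4.0    # OFDM symbol period in microseconds
NOC_CLOCK_MHZ = 100.0   # fixed NoC clock assumed in the text


def min_clock_block_mhz(exec_cycles: int, comm_cycles: int) -> int:
    """Minimum whole-MHz processor clock when communication and processing
    share one symbol period (assumed model, see lead-in)."""
    comm_time_us = comm_cycles / NOC_CLOCK_MHZ
    remaining_us = SYMBOL_TIME_US - comm_time_us
    return math.ceil(exec_cycles / remaining_us)


freq_offset_mhz = min_clock_block_mhz(67, 128)
inverse_ofdm_mhz = min_clock_block_mhz(204, 116)
equalizer_mhz = min_clock_block_mhz(110, 100)  # 100 is an upper bound for "<100"
```

Under this assumed model the results coincide with the block-communication row of Table III.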
V. SIMULATION
For functional simulation we have used a co-
simulation environment in which the VHDL simula-
tion results of the implemented HiperLAN/2 receiver
are compared with the results of a reference model,
specified in Matlab. During each simulation a down-
link burst, containing 500 OFDM symbols, was re-
ceived. The first two OFDM symbols in each frame,
the so-called Preamble C [3], were used for frequency
offset and channel estimation. The received informa-
tion was 16-QAM modulated, so in each simulation
95,616 bits were received. The receiver was simu-
lated for different realistic channel settings [6].
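The number of received bits per simulation can be checked with a short calculation. It assumes that the two preamble symbols carry no payload and that each of the 48 data subcarriers carries four bits in 16-QAM.

```python
OFDM_SYMBOLS_PER_BURST = 500  # symbols in the simulated downlink burst
PREAMBLE_SYMBOLS = 2          # symbols used for estimation (assumed to carry no data)
DATA_SUBCARRIERS = 48         # data-carrying subcarriers per OFDM symbol
BITS_PER_SUBCARRIER = 4       # 16-QAM

data_symbols = OFDM_SYMBOLS_PER_BURST - PREAMBLE_SYMBOLS
received_bits = data_symbols * DATA_SUBCARRIERS * BITS_PER_SUBCARRIER
```

With these assumptions the calculation gives the 95,616 bits stated in the text.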
The curves in Figure 5 show the performance of the
implemented HiperLAN/2 receiver in the MONTIUM
(’Montium’) and of the reference model in Matlab
(’Reference’). Although the error correction in
HiperLAN/2 was not implemented explicitly, we can
calculate upper bounds on the BER after error
correction [9] from the obtained results. Figure 6
depicts these upper bounds.
Fig. 5. The BER before error correction for different settings (plot: bit error rate vs. signal-to-noise ratio for 16-QAM modulation; x-axis SNR [dB], y-axis BER; curves for the A channel, E channel and AWGN, each for Reference and Montium).
Fig. 6. The BER after error correction for different settings (plot: Montium bit error rate for coded 16-QAM modulation; x-axis SNR [dB], y-axis BER; curves for the A channel, E channel and AWGN channel at code rates R=9/16 and R=3/4, with the 10% PER level marked).
The HiperLAN/2 receiver needs a BER of 2.4·10^-3
in order to reach the minimum defined sensitivity of
10% packet error rate [3]. The upper bounds show
that the minimum sensitivity can be met with the
MONTIUM implementation. The BER performance
of the MONTIUM implementation compares well with
the simulation results in [5].
VI. CONCLUSION AND FUTURE WORK
We proposed heterogeneous reconfigurable hardware as a
future technology for adaptive multi-mode
communication systems. The feasibility of using
heterogeneous hardware is demonstrated by implementing
a HiperLAN/2 receiver. Our experiment showed that the
HiperLAN/2 implementation satisfies the performance
requirements. The required performance can be obtained
at low clock speeds and with low configuration
overhead.
Currently, we aim at the implementation of the
baseband processing part of a UMTS receiver. Moreover,
we have studied the baseband processing in Bluetooth.
Based on the inventory in Table I we will study the
requirements for implementing different OFDM systems
in the same reconfigurable hardware. These
investigations will result in requirements for a
multi-mode receiver in reconfigurable hardware.
ACKNOWLEDGEMENT
This research is supported by the Freeband Knowl-
edge Impulse programme, a joint initiative of the
Dutch Ministry of Economic Affairs, knowledge in-
stitutions and industry.
REFERENCES
[1] A. Berno. Time and Frequency Synchronization Algorithms
for HIPERLAN/2. Master’s thesis, University of Padova,
Italy, Oct. 2001.
[2] ETSI. Broadband Radio Access Networks (BRAN); Hiper-
LAN Type 2; Data Link Control (DLC) Layer Part 1: Basic
Data Transport Functions. ETSI TS 101 761-1 v1.1.1 (2000-
04), Apr. 2000.
[3] ETSI. Broadband Radio Access Networks (BRAN); Hiper-
LAN Type 2; Physical (PHY) Layer. ETSI TS 101 475 v1.2.2
(2001-02), Feb. 2001.
[4] P. M. Heysters, G. J. M. Smit, and E. Molenkamp. A Flexi-
ble and Energy-Efficient Coarse-Grained Reconfigurable Ar-
chitecture for Mobile Systems. Journal of Supercomputing,
26(3):283–308, Nov. 2003.
[5] J. Khun-Jush, P. Schramm, U. Wachsmann, and F. Wenger.
Structure and Performance of the HIPERLAN/2 Physical
Layer. In Proceedings VTC ’99 - Fall, pages 2667 – 2671,
Amsterdam, the Netherlands, Sept. 1999.
[6] J. Medbo and P. Schramm. Channel Models for HIPER-
LAN/2. ETSI/BRAN 3ERI085B, Sept. 1998.
[7] J. Rabaey. Silicon Architectures for Wireless Systems. Tuto-
rial, Hotchips 2001, Aug. 2001.
[8] G. Rauwerda, J. Potman, F. Hoeksema, and G. Smit. Adap-
tive Wireless Networking. In Proceedings of the 4th
PROGRESS Symposium on Embedded System, pages 205 –
211, Nieuwegein, the Netherlands, Oct. 2003.
[9] R. Ziemer and R. Peterson. Introduction to digital
communication. Prentice Hall, USA, second edition, 2001.
