Hardware Aspects Of Fixed Relay Station Design For Ofdm(A) Based  Wireless Relay Networks by Can, Basak et al.
   
 
Aalborg Universitet
Hardware Aspects Of Fixed Relay Station Design For Ofdm(A) Based  Wireless Relay
Networks
Can, Basak; Portalski, Maciej; Le Moullec, Yannick
Published in:
Proceedings of the 21st Canadian Conference on Electrical and Computer
Engineering
DOI (link to publication from Publisher):
10.1109/CCECE.2008.4564556
Publication date:
2008
Document Version
Peer reviewed version
Link to publication from Aalborg University
Citation for published version (APA):
Can, B., Portalski, M., & Le Moullec, Y. (2008). Hardware Aspects Of Fixed Relay Station Design For Ofdm(A)
Based  Wireless Relay Networks. In Proceedings of the 21st Canadian Conference on Electrical and Computer
Engineering (pp. 355-560). IEEE. DOI: 10.1109/CCECE.2008.4564556
HARDWARE ASPECTS OF FIXED RELAY STATION DESIGN FOR OFDM(A) BASED
WIRELESS RELAY NETWORKS
Başak Can1, Maciej Portalski1, Yannick Le Moullec1,2
1Department of Electronic Systems 2Center for Software Defined Radio
Aalborg University, Denmark
Emails: bc@es.aau.dk, mportal1@es.aau.dk, ylm@es.aau.dk
ABSTRACT
Hardware aspects of infrastructure-based (i.e., fixed) relay
station design with a focus on digital baseband processing in
OFDM(A) based systems are investigated. A novel architecture
for the digital baseband processor for fixed relay stations
is proposed to minimize the relay complexity and the
implementation cost. Two methods of signal forwarding, Amplify
and Forward (AF) and Decode and Forward (DF), are developed
and implemented on FPGA, exploiting parallel processing
and a pipelined system architecture. In contrast to common
heuristic conclusions in the literature, we conclude
that the AF based implementation introduces significant
hardware complexity resulting from high memory usage.
On the other hand, the DF implementation requires higher
clocking frequencies, making it more power-consuming as
compared to AF-based relaying.
Index Terms— Fixed relay station, wireless relay net-
work, Implementation, FPGA, Design space exploration
1. INTRODUCTION
Achieving high data rates in future wireless communication
systems presents numerous challenges. Issues such as path-
loss or multipath fading effects result in the need for enhanc-
ing wireless network infrastructure to achieve better perfor-
mance. The increase in data rate causes a proportional de-
crease in bit energy. As a result, maintaining ubiquitous sys-
tem coverage at increased data rates would typically require
larger density of Base Stations (BS) as well as increased trans-
mit power level [1].
The authors would like to thank the following persons and companies
for their support: J.M. Kristensen, M.I. Rahman, Samsung Electronics,
Altera Corporation, and Intel Corporation. This is the authors' version of a
paper published in the proceedings of the 21st IEEE Canadian Conference
on Electrical and Computer Engineering, 2008.
A promising alternative to such a costly solution is to introduce
multi-hop transmission, either by deploying simple fixed
relay nodes or by exploiting subscriber equipment present in
the system as intermediate nodes in wireless transmission.
The latter solution presents problematic issues such as the
need to exploit complex dynamic routing mechanisms [2] and an
increase in power consumption and hardware complexity of
subscriber equipment due to the additional signal processing and
data transmission needed for the forwarding process [3]. An
efficient solution is to integrate multi-hop capability into the
infrastructure of wireless communication systems by means
of fixed Relay Station (RS) deployment. The benefits of in-
corporating fixed relays in wireless communications are: cov-
erage enhancement [1], [4], throughput and capacity enhance-
ments [1], [4], power and cost efficiency [1]. Achieving the
benefits of infrastructure based wireless multi-hop network-
ing requires efficient and low-cost RS hardware design with
consideration of not only performance, but also feasibility,
hardware complexity and power consumption metrics. Al-
though numerous theoretical derivations concerning infras-
tructure based multi-hop wireless networks appear in recent
publications [1], [2], [4], very few consider issues related to
actual hardware implementations of fixed relay stations [5].
Rare examples of infrastructure based relay network testbeds
are either based on low-complexity hardware and simple trans-
mission protocols [5] or exploit IEEE 802.11 protocol imple-
mented on PC based stations [6]. They are, however, not suit-
able for realizing relay support for Metropolitan Area Net-
works (MAN) such as IEEE 802.16j. This presents a need for
research on fixed RS hardware solutions with focus on high
performance and low complexity.
The research described in this paper contributes to this
need by presenting feasibility, performance and complexity
analysis for efficient RS design. Since future wireless sys-
tems are very likely to exploit Orthogonal Frequency Division
Multiplexing / Multiple Access (OFDM(A)) to achieve high
performance at relatively low cost, this work focuses on base-
band processing hardware for OFDM(A) based systems. The
design and evaluation presented in this paper can be useful for
IEEE 802.16j, 802.16e, and 802.16m based networks.
Fig. 1. OFDMA DL frame structure supporting two-hop Time
Division Duplexed (TDD) transmission [7]
2. FUNCTIONAL SPECIFICATION AND INITIAL
ARCHITECTURE
2.1. Functional specification of the system
Figure 1 presents the OFDMA frame structure considered in
this work. The frame consists of blocks such as:
∙ BS preamble - first symbol in the first subframe, used for
frame synchronization and initial estimation of the BS->RS
channel at the RS
∙ Frame Control Header (FCH) - fragment containing control
information concerning the general frame structure, such
as DL MAP length and subframe duration
∙ DL MAP - structure following the FCH containing detailed
information concerning subchannel parameters
and user scheduling. This includes the user<->subchannel
mapping and user burst profiles.
∙ First phase user data bursts - user bit streams transmitted
from BS to RS using a number of subchannels according
to the user allocation scheme transmitted in the
DL MAP
∙ Relay processing gap - minimal time required for signal
processing at the RS in order to produce OFDMA
symbols for RS->MS transmission. The gap also includes
a guard time needed for the radio circuitry to switch
from Rx to Tx mode. The influence of various
forwarding methods and digital hardware architectures
on the duration of the processing gap is one of the main
investigations presented in this work.
∙ Optional RS preamble - first symbol in the second subframe,
used for frame synchronization and initial estimation
of the RS->MS channel at the MS
∙ Second phase user data bursts - user bit streams transmitted
from RS to MS using a number of subchannels
according to the user allocation
The following are considered to maintain system efficiency:
∙ For AF forwarding, the same Modulation and Coding
Scheme (MCS) modes are used in the first and second
transmission phases for each subchannel, because it is not
possible to change the MCS mode between transmission
phases without signal decoding and re-encoding. For
DF forwarding, changing the MCS modes over different
phases [7] is possible. This makes it possible to adjust the MCS
modes for both transmission phases independently.
∙ The length of the second subframe is fixed to 12 OFDMA
data symbols. For AF forwarding, both subframes contain
12 OFDMA symbols because the same MCS levels are used
in both phases. For DF forwarding, the first subframe
duration in each subchannel is determined by the
MCS chosen for that subchannel.
∙ Only one user is allocated to each subchannel. Each
user can be allocated to more than one subchannel according
to a proper scheduling algorithm. Allocation
of spectral resources in the first subframe is based on
maintaining the same user<->subcarrier allocation
in both transmission phases. An equal amount of information
is transmitted in the first and second phases.
For DF forwarding, this can result in unused spectral
resources in the first phase due to different MCS modes
in the two transmission phases, as shown in Figure 1.
∙ To draw complexity measures for the most demanding cases,
we assume that relaying is used over the entire frequency
band.
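The relation between the MCS modes and the first-subframe occupancy under DF forwarding can be illustrated with a small calculation. This is only a sketch under stated assumptions: the function names and the example MCS pairs (uncoded 64QAM on the first hop, rate-1/2 16QAM on the second hop) are hypothetical and are not taken from the paper. It assumes the payload per subcarrier is the second-phase symbol count times the second-phase spectral efficiency, and that the first phase needs just enough symbols to carry the same payload.

```python
from fractions import Fraction
from math import ceil

SECOND_PHASE_SYMBOLS = 12  # fixed second-subframe length from the specification


def bits_per_subcarrier(bits_per_qam_symbol, code_rate):
    """Information bits carried by one subcarrier in one OFDMA symbol."""
    return bits_per_qam_symbol * Fraction(code_rate)


def first_phase_symbols(first_eff, second_eff, second_symbols=SECOND_PHASE_SYMBOLS):
    """OFDMA symbols needed in phase 1 to carry the phase-2 payload."""
    payload = second_symbols * second_eff  # information bits per subcarrier
    return ceil(payload / first_eff)


# Hypothetical DF example: uncoded 64QAM on BS->RS, rate-1/2 16QAM on RS->MS.
df_first = bits_per_subcarrier(6, 1)
df_second = bits_per_subcarrier(4, Fraction(1, 2))
df_symbols = first_phase_symbols(df_first, df_second)
df_unused = SECOND_PHASE_SYMBOLS - df_symbols

# AF: the MCS cannot change between phases, so both subframes have equal length.
af_symbols = first_phase_symbols(df_second, df_second)
```

With these example values the DF first phase occupies 4 of the 12 symbols in that subchannel, which is the kind of unused first-phase resource the last-but-one item describes, while the AF case always needs all 12.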
The OFDMA based RS nodes require a specialized hardware
architecture for baseband signal processing. Such an
architecture is proposed in Subsection 2.2.
2.2. First version of the architecture
The digital part of an OFDM(A) transceiver typically consists
of several dedicated signal processing modules. The RS base-
band processing module integrates functionalities related to
signal reception, processing for forwarding and transmission
to the next network node. Figure 2 shows the proposed struc-
ture of a digital baseband processing module for OFDM(A)
based RS transceivers.
A crucial component of the RS baseband processor is the
amplification/regeneration block which is responsible for the
processing needed for forwarding. This module can be re-
alized as a multicomponent entity with an internal architec-
ture depending on the exploited forwarding method. For AF-
based forwarding, signal processing at the RS is comprised of
multiplying consecutive frequency-domain signal values with
amplification coefficients calculated from SNR value for each
subchannel [7]. Therefore, for AF forwarding a fairly simple
amplification architecture as shown in Figure 3 can be used.
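As a rough illustration of the AF operation described above, the following sketch scales the frequency-domain samples of one OFDM(A) symbol by a per-subchannel gain. The gain values are placeholders: how the coefficients are derived from the SNR is described in [7] and is not reproduced here. The function name and the contiguous grouping of subcarriers into subchannels are assumptions made only for this example.

```python
def amplify_symbol(freq_samples, subchannel_gain, subcarriers_per_subchannel):
    """Multiply each frequency-domain sample by the gain of its subchannel.

    freq_samples: complex FFT outputs of one OFDM(A) symbol
    subchannel_gain: one real amplification coefficient per subchannel
    subcarriers_per_subchannel: assumed contiguous subchannel width
    """
    amplified = []
    for k, sample in enumerate(freq_samples):
        gain = subchannel_gain[k // subcarriers_per_subchannel]
        amplified.append(gain * sample)
    return amplified


# Toy example: 4 subcarriers, 2 subchannels of 2 subcarriers each.
toy_output = amplify_symbol([1 + 1j, 2, 3j, 4], [2.0, 0.5], 2)
```

The per-sample work is a single multiplication, which is why the datapath itself is simple; the cost of AF discussed later comes from buffering these complex samples, not from the arithmetic.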
Fig. 2. The proposed digital baseband processing module for
OFDM(A) based RS transceivers
For DF forwarding, signal processing is significantly more
complex as compared to AF due to the need for complete signal
decoding and re-encoding, which typically requires implementing
components such as an ML detector, a Viterbi decoder,
a Convolutional Code (CC) encoder and a constellation symbol
demapper/mapper. The proposed structure is shown in Figure
4.
Fig. 3. Proposed architecture of the signal amplification mod-
ule for AF based relaying
The presented regeneration module architecture enables
implementing DF forwarding in OFDM(A) based systems with
CC-based Forward Error Correction (FEC). Such a solution
can be considered to be costly in terms of hardware resource
usage.
3. DESIGN SPACE EXPLORATION
3.1. Hardware Platform Selection
Qualitative metrics and criteria considered in selecting the
hardware platform for the design include the following:
∙ Priority of achieving high system performance for real-
time operation
∙ Since fixed RS is considered, low priority is given to
metrics such as chip area and power consumption
∙ Platform support for component-based design and rapid
system prototyping
∙ Availability of OFDM(A) related Intellectual Property
(IP) components supported by the platform
Fig. 4. Proposed architecture of the signal regeneration module
for DF based relaying
Based on these criteria, Field Programmable Gate Array
(FPGA) technology has been selected as the hardware plat-
form for further development.
3.2. Pipelined System Architecture
With FPGA as the hardware platform and the RS modular
baseband processor structure presented in Section 2, an efficient pipelined
system implementation is possible. Figure 5 presents a pipelined
symbol-after-symbol processing scheme with the two-hop OFDM(A)
frame proposed in Section 2. Thanks to the simple data flow
path and constant symbol rate, no complex hardware resource
scheduling algorithms are required for the investigated sys-
tem. Pipeline synchronization can in that case be based solely
on the data flow. This results in relatively low hardware com-
plexity while maintaining high system performance. For static-
scheduled pipelined processing with data-flow synchroniza-
tion and constant input data rate, we suggest two basic re-
quirements for efficient data processing:
∙ Processing throughput of each pipeline stage, defined
as the number of data tokens processed per unit of time,
should not fall below the input data rate
∙ Pipeline stalls caused by external events and data de-
pendencies should be avoided
Fig. 5. Pipelined two-hop OFDM(A) frame processing
scheme
Direct mapping of the component-based system architec-
ture presented in Section 2 into a pipelined structure presents
two basic challenges for maintaining constant data flow in the
pipeline:
∙ Time Division Duplexed (TDD) system operation requires
that processed symbols are generated after the
first phase of transmission is finished. This issue concerns
both AF and DF schemes and results in the need
for data buffering within the processing pipeline, which
provides proper subframe synchronization. The data
buffer can be placed between various stages of the pipeline
depending on design requirements and optimization goals.
∙ Multi-user system operation and MCS utilization re-
quire that FEC encoding is applied independently for
each user data burst and each subchannel. For DF for-
warding this presents a challenge: data needs to be de-
coded to bit level and re-encoded with a proper FEC
scheme for each subchannel separately. Since FEC en-
coders and decoders are state-based devices, this im-
plies maintaining encoder and decoder states for each
subchannel while processing each OFDM(A) symbol.
A solution is to constantly store trellis state of the Viterbi
decoder and CC register state for each subchannel and
to switch the encoder and decoder processing contexts
each time a different subchannel is processed.
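The context-switching idea in the second item can be sketched for the encoder side as follows. This is a minimal illustration, not the authors' hardware: the class and function names are hypothetical, and the rate-1/2, constraint-length-7 code with generator polynomials 171 and 133 (octal) is an assumption chosen for the example. One shared encoder keeps a saved shift-register state per subchannel and restores it whenever that subchannel is processed again.

```python
K = 7                      # assumed constraint length
G0, G1 = 0o171, 0o133      # assumed generator polynomials (rate 1/2)


def parity(value):
    """Return the XOR of all bits in value."""
    return bin(value).count("1") & 1


def encode_bit(state, bit):
    """Shift one input bit into the register; return (new_state, (c0, c1))."""
    window = (bit << (K - 1)) | state      # 7-bit window: new bit plus 6 stored bits
    coded = (parity(window & G0), parity(window & G1))
    return window >> 1, coded              # keep the 6 most recent bits as state


class ContextSwitchedEncoder:
    """One shared CC encoder with a saved register state per subchannel."""

    def __init__(self, num_subchannels):
        self.saved_state = [0] * num_subchannels

    def encode(self, subchannel, bits):
        state = self.saved_state[subchannel]   # restore this subchannel's context
        coded = []
        for bit in bits:
            state, (c0, c1) = encode_bit(state, bit)
            coded += [c0, c1]
        self.saved_state[subchannel] = state   # save context for the next symbol
        return coded


# Two subchannels processed alternately, one chunk per OFDM(A) symbol.
shared = ContextSwitchedEncoder(2)
a1 = shared.encode(0, [1, 0, 1])
b1 = shared.encode(1, [1, 1, 0])
a2 = shared.encode(0, [0, 1, 1])
b2 = shared.encode(1, [0, 0, 1])
```

Because the state is restored before each chunk, the concatenated output for a subchannel is identical to what a dedicated encoder fed with the uninterrupted bit stream would produce. A decoder would need the same treatment for its trellis state, which is considerably larger.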
The TDD operation issue is solved by placing a buffer in
the processing pipeline. For AF forwarding, it is placed at
the end of the pipeline to minimize the processing gap between
subframes. This enables the stored samples to be output
directly to the Tx part without unnecessary delays. For
DF forwarding, buffering can be performed at bit level by
storing user data after the puncturing stage. This minimizes
the buffer size, since storing bits or short bit sequences
requires less memory than storing FFT points
as in the case of the AF scheme.
Fig. 6. Modified architecture of the signal regeneration mod-
ule used for DF forwarding
The FEC processing issue is solved by combining the idea
of parallel subchannel processing with a simplification to the
system structure based on the system model considered in this
work. The Line-Of-Sight (LOS) conditions in the BS->RS
channel and the deployment of the relays at strategic positions
in the cell enable using high modulation levels (e.g.,
64QAM) without the need for FEC [8]. This simplifies the
baseband processing pipeline structure by removing the Viterbi
decoder at the RS.
Additionally, to maintain system synchronization at bit
level, all bit-level components such as the parallel CC encoders,
the puncturer and the multi-level bit buffer are integrated in a single
hardware component. Such solutions enable constant and efficient
data flow in the processing pipeline while maintaining
relatively low hardware complexity. Based on the above statements,
a modified structure of the signal regeneration module
used in case of DF forwarding is proposed and presented in
Figure 6.
4. IMPLEMENTATION RESULTS
A reference design realizing the IEEE 802.16e (Mobile WiMAX)
PHY layer provided by Altera Corporation [9] is used as a
supplementary source of IPs. The overall architecture is shown
in Figure 7. The inner details of the developed blocks can be
found in [10].
Table 1 presents the results of the component cycle count
evaluation. The system operates in two separate clocking
domains. The slow clocking domain is used at the boundary
components of the system, such as cyclic prefix (CP) removal and
CP insertion, with the slow clock frequency matching exactly
the input data rate for proper I/O synchronization. The
fast clocking domain covers all signal processing blocks in
order to increase system performance and efficiency in hardware
resource usage. For this reason, the CP removal and CP insertion
components are evaluated only in terms of the fast clocking
frequency. Cycle count measures are obtained for the most
demanding case, i.e., 64QAM modulation. An evaluation of the
minimal system clock frequency can be performed using the
obtained cycle count values. The terms $f_{suf}$ and $f_{nec}$
represent the sufficient and necessary system clock frequencies,
respectively. In order to prevent data loss or pipeline stalls,
$f_{nec}$ guarantees that consecutive data tokens do not overlap. Due
to the back-to-back OFDM(A) symbol arrangement in the frame,
the data input period equals the OFDM(A) symbol duration,
which is $T_{IN} = 102.8\,\mu s$ [11]. In case of AF forwarding, the
calculated clock frequency values are $f^{AF}_{nec} = 19.9$ MHz and
$f^{AF}_{suf} = 87.2$ MHz. The bottleneck component responsible for the
high $f_{suf}$ value in this case is the signal amplification block,
with 8966 busy cycles within a single symbol processing period.
Fig. 7. RTL RS baseband processor architecture
The architecture is oriented toward coarse-grained data processing.
Therefore, the components operate on entire data packets and
do not process data from the input before a complete output
data packet is produced. The sufficient clock frequency value
in that case is also the necessary minimal system clock frequency,
which is $f^{AF}_{min} = 87.2$ MHz. In case of DF forwarding,
the calculated frequency range is given by $f^{DF}_{nec} = 89.6$ MHz
and $f^{DF}_{suf} = 166.3$ MHz. In the DF implementation,
the bottleneck component is the EQ block which, as in
the case of the signal amplification module, determines the system
clock frequency at the $f_{suf}$ value of $f^{DF}_{min} = 166.3$ MHz.
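The reported frequencies are consistent with dividing the cycle counts of Table 1 by the symbol period. The sketch below assumes that the necessary frequency follows from the largest output cycle count in the chain and the sufficient frequency from the largest non-idle cycle count, both divided by $T_{IN}$. These formulas, and the assignment of components to the AF and DF chains, are inferred from the reported numbers rather than stated explicitly in the text; the dictionary and function names are illustrative.

```python
T_IN = 102.8e-6  # OFDM(A) symbol duration in seconds [11]

# (latency cycles, output cycles) per component, from Table 1; "-" taken as 0.
CYCLES = {
    "CP removal": (0, 2048),
    "FFT/IFFT": (3712, 2048),
    "Pilot extraction (AF)": (1890, 1729),
    "Pilot extraction (DF)": (1890, 1537),
    "EQ": (7878, 9216),
    "Symbol demapper": (24, 9216),
    "Symbol mapper": (12, 9216),
    "Pilot insertion": (9211, 2048),
    "Signal amplification": (6918, 2048),
    "CP insertion": (2048, 0),
}

# Assumed component chains for each forwarding method (not listed in the text).
AF_CHAIN = ["CP removal", "FFT/IFFT", "Pilot extraction (AF)",
            "Signal amplification", "CP insertion"]
DF_CHAIN = ["CP removal", "FFT/IFFT", "Pilot extraction (DF)", "EQ",
            "Symbol demapper", "Symbol mapper", "Pilot insertion",
            "CP insertion"]


def clock_range_mhz(chain):
    """Return (f_nec, f_suf) in MHz, rounded to one decimal place."""
    f_nec = max(CYCLES[name][1] for name in chain) / T_IN
    f_suf = max(sum(CYCLES[name]) for name in chain) / T_IN
    return round(f_nec / 1e6, 1), round(f_suf / 1e6, 1)


def bottleneck(chain):
    """Component with the largest non-idle cycle count in the chain."""
    return max(chain, key=lambda name: sum(CYCLES[name]))


af_range = clock_range_mhz(AF_CHAIN)   # (19.9, 87.2)
df_range = clock_range_mhz(DF_CHAIN)   # (89.6, 166.3)
```

Under these assumptions the bottleneck search also returns the signal amplification block for AF and the EQ block for DF, matching the components named in the text.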
The differences in the $\langle f_{nec}, f_{suf} \rangle$ ranges illustrate the
different clocking frequency requirements of the two designs.
The DF implementation requires approximately two
times higher clocking frequency as compared to AF, which
can significantly affect the system's power consumption. Moreover,
eliminating the bottleneck lying in the internal architecture of the
channel EQualizer (EQ) block (namely a costly division
needed to remove the channel coefficient) can reduce the
clock frequency to a value not smaller than $f^{DF}_{nec} = 89.6$ MHz.
On the other hand, eliminating bottlenecks in
the AF implementation can lead to a clock frequency as low
as $f^{AF}_{nec} = 19.9$ MHz.

| Component name | Latency cycles ($C_{LAT}$) | Output cycles ($C_{OUT}$) | Non-idle cycles ($C_{LAT} + C_{OUT}$) |
|---|---|---|---|
| CP removal | - | 2048 | 2048 |
| FFT/IFFT | 3712 | 2048 | 5760 |
| Pilot extraction (AF) | 1890 | 1729 | 3619 |
| Pilot extraction (DF) | 1890 | 1537 | 3427 |
| EQ | 7878 | 9216 | 17094 |
| Symbol demapper | 24 | 9216 | 9240 |
| Symbol mapper (64QAM case) | 12 | 9216 | 9228 |
| Pilot insertion (64QAM case) | 9211 | 2048 | 11259 |
| Signal amplification | 6918 | 2048 | 8966 |
| CP insertion | 2048 | - | 2048 |

Table 1. Cycle counts obtained from component simulations

| Component name | ALMs ($N_{ALM}$) | RAM bits ($N_{RAM}$) | 9-bit multipliers ($N_{MULT}$) |
|---|---|---|---|
| CP removal | 101 | 65536 | 0 |
| FFT/IFFT | 725 | 311296 | 36 |
| Pilot extraction (AF) | 134 | 55296 | 0 |
| Pilot extraction (DF) | 134 | 55296 | 0 |
| EQ | 1455 | 57600 | 0 |
| Symbol demapper | 430 | 512 | 0 |
| Symbol mapper | 61 | 272 | 0 |
| Pilot insertion | 147 | 49152 | 0 |
| Signal amplification | 1004 | 58176 | 0 |
| CP insertion | 111 | 131072 | 0 |
| Bit-level buffer (DF) (estimated value) | 0 | 110592 | 0 |
| Symbol-level buffer (AF) (estimated value) | 0 | 655360 | 0 |

Table 2. Component synthesis results on Altera Stratix II
platform
The component synthesis results$^{1}$ make it possible to evaluate the
designs in terms of utilization of specific hardware resources
as well as to calculate the overall hardware complexity cost.
Table 2 presents hardware utilization measures of the
implemented components.
For the Altera Stratix II family, hardware complexity is
characterized by the number of Adaptive Logic Module units
($N_{ALM}$), embedded RAM bits ($N_{RAM}$), and dedicated
9-bit multipliers ($N_{MULT}$). In order to evaluate and compare
the hardware complexity of various system components, the
following hardware complexity cost function is used:
$$HW_{COMP} = \alpha N_{ALM} + \beta N_{RAM} + \gamma N_{MULT}$$
The terms $\alpha$, $\beta$ and $\gamma$ are weights reflecting the significance
of each metric. Weight values are selected according to the
number of resources of each type typically available
on a single FPGA chip. Analysis of the Stratix II family
architecture results in applying weights of $\alpha = 100$, $\beta = 1$ and
$\gamma = 10000$ based on average proportions concerning on-chip
resource availability [10].
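The cost function can be evaluated directly on the synthesis figures of Table 2. The sketch below does so with the stated weights; the dictionary layout and function name are illustrative, and the per-component costs it produces are computed here from the table rather than quoted from the paper.

```python
ALPHA, BETA, GAMMA = 100, 1, 10000  # weights stated for the Stratix II family

# (ALMs, RAM bits, 9-bit multipliers) per component, from Table 2.
RESOURCES = {
    "CP removal": (101, 65536, 0),
    "FFT/IFFT": (725, 311296, 36),
    "Pilot extraction (AF)": (134, 55296, 0),
    "Pilot extraction (DF)": (134, 55296, 0),
    "EQ": (1455, 57600, 0),
    "Symbol demapper": (430, 512, 0),
    "Symbol mapper": (61, 272, 0),
    "Pilot insertion": (147, 49152, 0),
    "Signal amplification": (1004, 58176, 0),
    "CP insertion": (111, 131072, 0),
    "Bit-level buffer (DF)": (0, 110592, 0),
    "Symbol-level buffer (AF)": (0, 655360, 0),
}


def hw_cost(n_alm, n_ram, n_mult):
    """Weighted hardware complexity cost of one component."""
    return ALPHA * n_alm + BETA * n_ram + GAMMA * n_mult


COSTS = {name: hw_cost(*usage) for name, usage in RESOURCES.items()}
RANKING = sorted(COSTS, key=COSTS.get, reverse=True)

if __name__ == "__main__":
    for name in RANKING:
        print(f"{name:28s} {COSTS[name]:8d}")
```

With these weights the FFT/IFFT block has the highest cost, followed by the symbol-level buffer used for AF, whose cost is several times that of the bit-level buffer used for DF.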
The results of hardware complexity evaluation are shown
in Figure 8. The evaluation shows two main observations:
∙ FFT / IFFT blocks require significantly more hardware
resources as compared to other system components
∙ Symbol-level buffering utilized in AF based schemes
introduces significant RAM consumption due to the need
of storing complex numbers in contrast to storing short
bit sequences as in case of DF implementation.
5. CONCLUSION
Two signal forwarding methods, AF and DF, were investigated
and evaluated in terms of implementation feasibility,
hardware requirements and processing performance. Both
forwarding methods prove to be feasible in terms of hardware
implementation in OFDM(A) based wireless relay networks.
The implementation models developed meet all the
design constraints for real-time operation at achievable system
clock frequencies. The proposed pipelined processing architecture
provides high processing performance, which makes it
possible to reduce the processing gap to zero.
$^{1}$ Obtained by using the Quartus II environment.
Fig. 8. Hardware complexity evaluation of system compo-
nents synthesized on Stratix II FPGA
In terms of overall hardware complexity, reflecting hardware
resource requirements relative to their availability, the
AF implementation is more complex due to higher memory
usage as compared to DF based systems. Interestingly, this
observation contradicts the assumptions typically found in the
literature. On the other hand, DF baseband processing hardware
requires higher clocking frequencies, which results in
increased power consumption at the RS. Feasibility of both
methods depends on specific application characteristics and
limitations such as availability of power supply or hardware
resources at the RS.
6. REFERENCES
[1] R. Pabst et al., “Relay-based deployment concepts for wireless
and mobile broadband radio,” IEEE Communications Magazine,
vol. 42, pp. 80–88, September 2004.
[2] H. Li and D. Yu, “Comparison of ad hoc and centralized
multihop routing,” October 2002.
[3] A. Lindgren and O. Schelen, “Infrastructured ad hoc
networks,” August 2002.
[4] C. Hoymann et al., “Fireworks. Flexible Relay Wireless
OFDM-Based Networks. Cellular Deployment Concepts for
Relay-Based Systems,” Technical report, Information Society
Technologies, January 2007.
[5] A. Bletsas and A. Lippman, “Implementing cooperative diversity
antenna arrays with commodity hardware,” IEEE Communications
Magazine, vol. 44, pp. 33–40, December 2006.
[6] J. Bicket et al., “Architecture and Evaluation of an Unplanned
802.11b Mesh Network,” August 2005.
[7] B. Can, H. Yomo, and E. De Carvalho, “Link adaptation
and selection method for OFDM based wireless relay networks,”
JCN Journal on MIMO OFDM and its Applications, June
2007.
[8] B. Can, Link Adaptive Transmission Techniques for OFDMA
Based Wireless Relay Networks, Ph.D. Thesis, Aalborg University,
Denmark, 2008.
[9] Altera Corporation, “A Scalable OFDMA Engine for Mobile
WiMAX,” 2006.
[10] M. Portalski, “Hardware aspects of fixed relay station design
for OFDM(A) based wireless relay networks,” 2007.
[11] WiMAX Forum, “Mobile WiMAX - Part I: A Technical
Overview and Performance Evaluation,” Tech. Rep., 2006.
