Energy-efficient design and implementation of turbo codes for wireless sensor network by Li, Liang
University of Southampton Research Repository
ePrints Soton
Copyright © and Moral Rights for this thesis are retained by the author and/or other 
copyright owners. A copy can be downloaded for personal non-commercial 
research or study, without prior permission or charge. This thesis cannot be 
reproduced or quoted extensively from without first obtaining permission in writing 
from the copyright holder/s. The content must not be changed in any way or sold 
commercially in any format or medium without the formal permission of the 
copyright holders.
  
 When referring to this work, full bibliographic details including the author, title, 
awarding institution and date of the thesis must be given e.g.
AUTHOR (year of submission) "Full thesis title", University of Southampton, name 
of the University School or Department, PhD Thesis, pagination
http://eprints.soton.ac.uk
UNIVERSITY OF SOUTHAMPTON
Energy-efficient design and
implementation of turbo codes for
wireless sensor network
by
Liang Li
A thesis submitted in partial fulfilment for the
degree of Doctor of Philosophy
in the
Faculty of Physical and Applied Science
School of Electronics and Computer Science
November 2012
UNIVERSITY OF SOUTHAMPTON
ABSTRACT
FACULTY OF PHYSICAL AND APPLIED SCIENCE
SCHOOL OF ELECTRONICS AND COMPUTER SCIENCE
A thesis submitted in partial fulfilment for the
degree of Doctor of Philosophy
by Liang Li
The objective of this thesis is to apply near Shannon limit Error-Correcting Codes
(ECCs), particularly the turbo-like codes, to energy-constrained wireless devices, for
the purpose of extending their lifetime. Conventionally, sophisticated ECCs are applied
to applications, such as mobile telephone networks or satellite television networks, to
facilitate long range and high throughput wireless communication. For low power
applications, such as Wireless Sensor Networks (WSNs), these ECCs were not conventionally
considered, owing to their high decoder complexities. In particular, the energy efficiency
of the sensor nodes in WSNs is one of the most important factors in their design. The
processing energy consumption required by high complexity ECC decoders is a significant
drawback, which impacts upon the overall energy consumption of the system. However, as
Integrated Circuit (IC) processing technology is scaled down, the processing energy
consumed by hardware resources reduces exponentially. As a result, near Shannon limit
ECCs have recently begun to be considered for use in WSNs to reduce the transmission
energy consumption [1,2]. However, to ensure that the transmission energy consumption
reduction granted by the employed ECC makes a positive improvement on the overall
energy efficiency of the system, the processing energy consumption must still be carefully
considered.
The main subject of this thesis is to optimise the design of turbo codes at both an
algorithmic and a hardware implementation level for WSN scenarios. The communi-
cation requirements of the target WSN applications, such as communication distance,
channel throughput, network scale, transmission frequency, network topology, etc, are
investigated. Those requirements are important factors for designing a channel coding
system. Especially when energy resources are limited, the trade-off between the
requirements placed on different parameters must be carefully considered, in order to minimise
the overall energy consumption. Moreover, based on this investigation, the advantages
of employing near Shannon limit ECCs in WSNs are discussed. Low complexity and
energy-efficient hardware implementations of the ECC decoders are essential for the
target applications.
A systematic approach to the study is employed, by considering the star network topol-
ogy before the more intricate multi-hop topology. The investigation concludes that in
a star network, decoders are not typically required in the sensor nodes. Therefore, the
near Shannon limit coding gain that is offered by a turbo-like code can provide a
significant energy saving for the energy-constrained sensor nodes. By contrast, in the case
of multi-hop networks, decode-and-forward is typically applied, requiring the decoder to
be employed in the sensor nodes. These decoders consume additional energy that erodes
the energy saving provided by the turbo-like code.
To realise a low complexity and a high energy efficiency for a turbo decoder, it is essential
to find the most desirable fixed-point parameterisation for the hardware
implementations. This must be chosen to have the best trade-off between the decoding performance
and the energy consumption. Previous research has shown that the best choice is highly
dependent on a wide range of factors. Therefore, an efficient method to find the
optimal parameterisation is proposed for designing an energy efficient turbo code. More
specifically, a framework using EXtrinsic Information Transfer (EXIT) chart analysis is
developed to investigate desirable fixed-point parameterisation for turbo decoders.
Conventional turbo decoder architectures have been targeted for high throughput appli-
cations, such as the 3rd Generation Partnership Project (3GPP) Long Term Evolution
(LTE) and Digital Video Broadcasting (DVB) standards. However, the communication
requirements of WSNs are different from those of conventional turbo code applications.
Motivated by this difference, a low complexity energy-efficient Look-Up Table based
Logarithmic Bahl-Cocke-Jelinek-Raviv (LUT-Log-BCJR) decoder architecture is proposed.
The proposed architecture employs an order of magnitude fewer gates than the most
recent LUT-Log-BCJR architectures, facilitating a 71% energy consumption reduction,
compared to comparable conventional turbo decoders. Moreover, by considering the
trade-off between the transmission and decoding energy consumption, the proposed
architecture facilitates a 10% reduction in the overall energy consumption at transmission
ranges above 39 m, compared with the other type of widely used turbo decoder, the
Maximum Logarithmic Bahl-Cocke-Jelinek-Raviv (Max-Log-BCJR) decoder.
Finally, based on the studies above, a framework for estimating the decoding energy
consumption of a turbo code at an early design stage is proposed. Furthermore, a holistic
design method for parameterising the turbo code design and optimising the overall
energy efficiency is proposed. This is in contrast to the conventional design method,
which lacks the capability to estimate the decoding energy consumption of a turbo code
during the code design stage. Therefore, computation complexity is conventionally used
to trade off with the decoding performance, without optimising the parameterisations of
a turbo code for the purpose of overall energy saving. In this thesis, an example turbo
code design is developed using the proposed holistic design method, in order to minimise
the overall energy consumption.
Acknowledgements
Numerous people have supported me during my graduate research in diverse ways,
without whom the completion of this thesis would have been impossible. A few words here
cannot adequately capture all my appreciation.
I would like to express my heartfelt gratitude to my supervisors, Prof. Lajos Hanzo,
Prof. Bashir Al-Hashimi and Dr. Rob Maunder, for their exceptional supervision,
insightful guidance and overall for their supreme friendship. Their constant inspiration
and unfailing encouragement have greatly benefited me. Especially, their diligence and
endless energy deserve my sincere respect. I could not have completed this thesis without
the thorough discussions with Dr. Rob Maunder. I am very grateful for his careful
explanation and unreserved teaching.
Many thanks also to my colleagues and the staff of the Communications Group and
ESD Group for all the useful discussions and comments throughout my research. I wish
to specially thank Dr. Soon Xin Ng, Dr. Amit Acharyya, Xin Zuo, Sheng Yang and
Jatin N. Mistry for fruitful discussions as well as for their friendship. Many thanks to
my flatmates Dr. Yang Qin and Ning Wang for their help and friendship throughout
my Ph.D. study.
As always, I would like to express my sincere appreciation to my dear parents for their love,
unconditional support as well as for their cultivation, without whom I would not have
reached where I am.
To my family
List of Publications
1. L. Li, R. G. Maunder, B. M. Al-Hashimi, and L. Hanzo, "An energy-efficient error
correction scheme for IEEE 802.15.4 wireless sensor networks," IEEE Transactions
on Circuits and Systems II, vol. 57, no. 3, pp. 233-237, 2010.
2. L. Li, R. G. Maunder, B. M. Al-Hashimi, and L. Hanzo, "Design of fixed-point
processing based turbo codes using extrinsic information transfer charts," in
Proceedings of the IEEE Vehicular Technology Conference, Ottawa, Canada, 2010, pp. 1-5.
3. L. Li, R. G. Maunder, B. M. Al-Hashimi, and L. Hanzo, "A low-complexity turbo
decoder architecture for energy-efficient wireless sensor networks," IEEE
Transactions on Very Large Scale Integration (VLSI) Systems, in press.
Contents
Acknowledgements
List of Publications
List of Symbols
1 Introduction
1.1 Communications in out-door WSNs
1.1.1 Coverage and communication range
1.1.2 Data rate
1.1.3 Network topology
1.1.4 Frequency bands
1.1.5 Scalability
1.1.6 Lifetime
1.1.7 Energy consumption
1.1.8 Path loss model
1.2 Communications in in-door WSNs
1.2.1 Home automation and smart environment applications
1.2.2 Other applications
1.3 Body area networks
1.3.1 Frequency bands
1.3.2 Data rate
1.3.3 Reliability, accuracy and latency
1.3.4 Path loss model
1.3.5 Energy consumption
1.3.6 Network topology
1.3.7 Security
1.4 Error-correcting code solutions in WSNs
1.4.1 Energy-constrained challenge in WSNs
1.4.2 ECCs for energy efficient wireless communication
1.5 Objectives and organisation of the thesis
1.6 Novel Contributions
2 Turbo codes, EXIT charts and the fixed-point representation
2.1 Turbo-like codes
2.2 Turbo encoder and decoder
2.2.1 UMTS/LTE turbo encoder and decoder schematics
2.2.2 Log-BCJR algorithm
2.3 EXIT chart
2.4 Fixed-point numerical representation
2.5 Chapter Summary
3 A SCCC scheme for star topology WSNs
3.1 Introduction
3.2 Review of the augmented PHYsical layer
3.3 Module design
3.4 Module parametrisation
3.5 Module implementation
3.6 Energy consumption analysis
3.7 Conclusions
4 EXIT chart based fixed-point turbo code parameter design
4.1 Introduction
4.2 EXIT chart analysis of the fixed-point UMTS/LTE turbo Decoder
4.3 Design study
4.3.1 Comparison of different floating-point log-domain methods
4.3.2 Comparison and analysis of fixed-point implementations
4.3.2.1 Wrapping technique
4.3.2.2 Saturation technique
4.3.2.3 Normalisation technique
4.3.2.4 Final validation
4.4 Conclusions
5 A low complexity energy-efficient turbo decoder architecture
5.1 Introduction
5.2 Trade off of employing turbo codes for energy saving
5.3 Conventional architecture
5.4 Proposed LUT-Log-BCJR architecture
5.4.1 Decomposition of the LUT-Log-BCJR algorithm into ACS operations
5.4.2 Proposed architecture
5.4.3 Proposed ACS unit and calculation unit
5.4.4 Optimal number of parallel calculation units
5.5 Turbo decoder complexity and energy analysis
5.6 Conclusions
6 A turbo decoder energy estimation model allowing overall energy optimisation
6.1 Introduction
6.2 Generalised architecture
6.2.1 Register banks
6.2.2 Calculation unit
6.2.3 Memories
6.2.4 Controller design and decoding time consumption
6.3 Energy estimation framework
6.3.1 Calculation unit
6.3.2 Register banks
6.3.3 Datapath
6.3.4 Controller
6.3.5 Energy estimation of LUT-Log-BCJR decoders and results validation
6.3.6 Memories
6.3.7 Interleaver
6.3.8 Energy estimation of the turbo decoders and results validation
6.4 Holistic design method of turbo codes for energy-constrained communication systems
6.4.1 Decoding energy estimation
6.4.2 Transmission energy estimation
6.4.3 Overall energy efficiency analysis
6.5 Conclusions
7 Conclusions and future work
7.1 Conclusions
7.2 Future work
Glossary
Bibliography
Author Index
Subject Index
List of Symbols
General notation
• The notation $\tilde{x}$ denotes an LLR value that pertains to the bit value $x$.
• The superscript $a$ is used to indicate an a priori LLR.
• The superscript $e$ is used to indicate an extrinsic LLR.
• The superscript $p$ is used to indicate an a posteriori LLR.
• $p$ and $q$ are used to represent the operands of a calculation.
Special symbols
$a$  Memory access rate.
$a_i$  A bit sequence having the index $i \in [1, 2, 3, \ldots]$.
$A$  Power amplifier efficiency.
$B$  The number of BCJR operations performed.
$C$  Turbo code complexity.
$b_i$  A bit sequence having the index $i \in [1, 2, 3, \ldots]$.
$d$  Transmission distance.
$D_H^{\min}$  Minimum Hamming distance.
$E_b$  Energy per bit.
$E_{pr}$  Decoding/processing energy consumption.
$E_{tx}$  Transmission energy consumption.
$E_{cyc}$  Energy consumption per clock cycle.
$f$  Frequency.
$G_c$  Coding gain.
$h$  The number of the component encoders in an MCTC.
$I$  The number of decoding iterations performed.
$k$  The number of inputs to each component encoder in a turbo code.
$K$  Constraint length of a convolutional code.
$m$  The number of memory elements (registers) employed in a convolutional code.
$n$  The number of non-systematic outputs generated by each component encoder in a turbo code.
$N$  The block/interleaver length of a turbo code.
$N_{b_i}$  The number of bits in the bit sequence $b_i$.
$N_0$  Noise power spectral density.
$P$  Power consumption.
$P_l$  Path loss.
$p$  Path loss exponent.
$r$  Receiver noise figure.
$R$  Coding rate.
$s$  Decoding throughput of a decoder.
$S$  A state in a turbo decoding trellis.
$S_0$  Minimum received SNR required to achieve the target BER in an uncoded wireless communication system.
$S_c$  Minimum received SNR required to achieve the target BER in a coded wireless communication system.
$t$  Time consumption.
$T_i$  A transition in a turbo decoding trellis.
$V, v$  Supply voltage.
$w_s$  The sliding-window length of the sliding-window turbo decoder employed in the Log-BCJR algorithm.
$w_p$  The pre-backward recursion length of the sliding-window turbo decoder employed in the Log-BCJR algorithm.
$\alpha$  The $\alpha$ values in the Log-BCJR algorithm.
$\beta$  The $\beta$ values in the Log-BCJR algorithm.
$\gamma$  The $\gamma$ values in the Log-BCJR algorithm.
$\delta$  The $\delta$ values in the Log-BCJR algorithm.
Chapter 1
Introduction
A Wireless Sensor Network (WSN) is a network composed of a number of sensor devices,
each having multiple integrated functions [3]. These include sensing and processing the
phenomena that occur in the surrounding environment, as well as the ability to
communicate wirelessly with each other and with higher-level networks. Figure 1.1 illustrates
a typical WSN topology, in which a number of sensor nodes transmit information to the
central node. This may then forward the information to a higher-level network, such
as the Internet or a cellular network.

Figure 1.1: A typical WSN configuration. (Diagram labels: upper level network, sensor node, central node.)

Figure 1.1 includes examples of both single- and
multi-hop communications from the sensor nodes to the central node. A number of
sensor nodes act as relays in the case of multi-hop communications, receiving messages
from the direction of the source sensor node and retransmitting them in the direction of
the central node. In some cases, the network topology may be more complicated than
that shown in Figure 1.1. For example, there could be more than one central node for
collecting the information and forwarding it to the higher-level network. Alternatively,
the communication between a sensor node and a central node could take place over mul-
tiple routes, rather than just a single route. On the other hand, a simpler star network
topology may be employed in some scenarios, where each sensor node uses only a single
hop to transmit directly to the central node. As a result, the topologies of WSNs are
highly flexible, allowing them to be deployed in different environments for various
applications. One of the unique features of WSNs, compared to other wireless communication
applications, is that, typically, information only flows in one direction, from the sensor
nodes to the central node. The sensor nodes are typically only required to receive when
relaying each other's messages. In some applications, the sensor nodes may receive a
small amount of control signaling from the central node, but the amount of information
is so small when compared to the information transmitted from the sensor nodes, that
it can be treated as a much simpler case, employing a much simpler communication
scheme.
WSNs have become a popular technology during the last decade and have been widely
applied in industrial, medical and home applications. With the growing functions of
the sensors, WSNs can be used for continuously monitoring environmental [4] or other
conditions and events for a wide range of applications, from long range geophysical
inspection [5] to short range personal health care [6]. This has been made possible owing to
advances in wireless communications, digital electronics and Micro-Electro-Mechanical
Systems (MEMS), which allow the sensor devices to be implemented with reduced size,
energy consumption and cost [7, 8]. Wireless technology gives significant advantages
compared to conventional wired sensor network solutions, including improved network
coverage, scalability and reliability, as well as reduced installation and maintenance costs. These benefits are
achieved by avoiding the requirement for wired connections between nodes.
In a WSN, each sensor node has processing and communication capabilities. They
typically also include a number of MEMS sensors, which are able to monitor a selection
of parameters and conditions. These include temperature, humidity, movement, lighting
condition, pressure, soil makeup, noise level, the presence or absence of certain kinds of
objects, mechanical stress levels on attached objects, as well as the current characteristics
such as speed, direction, and size of an object [8]. Because the scope of sensor node
applications is vast, there is a wide variety of requirements for their characteristics,
such as their coverage, data rate, sensor functions and lifetime. Therefore,
there is a vast range of hardware and software solutions for sensor node implementations.
However, some common features are shared by all sensor nodes [7,9]. The sensors are
deployed either inside the phenomenon being observed, or very close to it. To avoid
interference with the phenomenon, the sensor nodes are typically required to be small
and lightweight. In addition, the sensors are often deployed in large numbers and may be
disposable, in which case the sensor nodes must be low cost. As a result, sensor nodes are
typically powered by batteries that are small, lightweight and inexpensive, or by a limited
energy harvesting ability. Furthermore, the recharging or replacement of the batteries
is typically difficult and is therefore avoided, since the deployed sensors are often difficult to
re-collect for battery recharging or replacement. Therefore, the fundamental requirement
of sensor nodes is low energy consumption. Maximising the energy efficiency of the
sensor nodes is an important challenge for all WSN implementations. In particular, any
optimisations made for energy saving purposes must not sacrifice the other specifications
of the design that are required to maintain the WSN functionality. For example, a WSN
designed for a particular scenario is required to meet the particular communication
performance requirements, such as range, reliability and data rate. An optimal design
of the communication system achieves a minimal energy consumption while meeting
all the other requirements. In this way, the lifetime of the WSN may be maximised,
without requiring the replacement or recharging of the sensor node batteries. Therefore,
before studying optimisation approaches for minimising the energy consumption of WSN
communication systems, their communication requirements must be considered carefully.
In this chapter, the communication requirements of WSNs are investigated for different
types of WSN application. The real world applications of WSNs are summarised into
three categories, namely out-door applications, in-door applications and Body Area
Networks (BANs). All three categories are discussed in order to give an overall
investigation of the WSN communication requirements. Furthermore, based on the
investigation results, turbo and turbo-like channel codes are proposed for reducing the
communication energy consumption of sensor nodes. The potential advantages and
disadvantages of this approach are discussed in Section 1.4.2. Finally, the outline of the
thesis is given in Section 1.5.
1.1 Communications in out-door wireless sensor networks
As described above, the sensors in WSNs are required to monitor different parameters
and conditions within the phenomenon in which they are deployed. For this reason, the main family of
applications for WSNs is environmental monitoring. In out-door applications, examples
include forest fire detection, water quality monitoring [10,11], infrastructure monitoring
[12], precision agriculture [13], geophysical inspection [5,14,15] and habitat monitoring
for animals [16,17]. The purpose of the sensors could be gathering and transmitting
data to the central nodes for recording, or detecting unusual changes in the target
phenomenon and raising alarms.
In these WSN applications, the sensor nodes are typically required to be deployed over
a large area such as a forest, a mine plant, a suspension bridge, a stretch of river or a farm,
naturally resulting in a relatively long-range communication requirement. In addition,
a long lifetime of the order of months or years is typically required without recharging
or replacement of the sensor node batteries. However, the data rate requirement is
typically low, since the monitored signals often have a low dynamic range and low
frequency. Furthermore, in many cases, data delivery delays of several minutes do not
prevent the WSN from fulfilling its purpose [4]. Table 1.1 summarises the communication
specification of several WSN deployments for out-door environmental monitoring that
have been reported in the literature. In the following sub-sections, the communication
requirements of out-door environmental monitoring WSNs are discussed, based on the
investigation results of previous work.

Pub  | Application                       | Number of sensors | Channel throughput¹ | Transmission frequency | Communication distance | Coverage area
[15] | Landslide detection               | 150               | -                   | 2.4 GHz                | 50 m                   | -
[14] | Volcano monitoring                | 16                | 100 Kb/s            | 2.4 GHz                | 200-400 m              | 3 km linear area
[13] | Precision agriculture             | 27                | 153 Kb/s            | 868 MHz                | 300 m to gateway node  | 5000 m²
[16] | Habitat monitoring                | -                 | 40 Kb/s             | 916 MHz                | 3600 m                 | 959 km²
[4]  | Cattle monitoring                 | 28                | 72 Kb/s             | 433 MHz                | 500 m                  | -
[4]  | Ground water quality monitoring   | 9                 | 72 Kb/s             | 915 MHz                | 800 m                  | 6 km²
[18] | Mountain environmental monitoring | -                 | 76 Kb/s             | 868 MHz                | -                      | 900 m linear area
[17] | Habitat monitoring                | 98                | 128 Kb/s            | 453 MHz                | -                      | 9000 m²

Table 1.1: A summary of the typical communication requirements of a selection of
studies of environmental monitoring WSNs.
1.1.1 Coverage and communication range
As shown in Table 1.1, the coverage and the communication range requirements of WSNs
are diverse, since they depend on the particular scenario. Here, the coverage depends
upon the area of the target forest, farm or factory, for example, while the communication
range additionally depends on the sensor nodes' deployment density that is required
in order to obtain the desired monitoring accuracy when the multi-hop technique is
employed. In many cases, the communication range requirements vary from tens to
hundreds of metres. In some particular cases, the communication range can be thousands
of metres. Figure 1.2 summarises the communication range, computational capacity and
storage capacity of various sensor node devices. The specied communication ranges
agree with the communication range requirements that are summarised in Table 1.1.
Since the small size requirement of the sensor nodes imposes a corresponding limitation
upon the transmitter, antenna and batteries, the typical WSN communication ranges in
out-door scenarios can be considered to be relatively long. Moreover, the path loss
of the transmitted signal between a pair of sensor nodes may grow as quickly as the fourth
power of the distance between them. This is because the antennas of the sensor
nodes are close to the ground and because WSNs are typically deployed in complex
environments [8]. Therefore, maintaining reliable communication across the WSN is a
challenge when designing the communication systems of sensor nodes. In particular,
the energy consumed by transmission is typically a large portion of the overall energy
consumption of the sensor nodes. For example, the investigation in [20] shows that, in a
Rayleigh fading channel with fourth power distance loss, the energy cost of transmitting
1 Kb over a distance of 100 metres is approximately the same as executing 3 million
instructions by a 100 Million Instructions Per Second (MIPS) general-purpose processor.
This is far beyond the signal processing requirement of a typical sensor node.

Figure 1.2: Classification of WSN sensor node devices based on communication range,
computational capacity and storage capacity. © V. Potdar et al. 2009 [19]

¹ Note that the channel throughputs listed in Table 1.1 are the maximum channel throughput
of the employed sensor node devices, not the actual requirements of the applications.
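The effect of the path loss exponent on the transmission energy can be illustrated with a short calculation. The following is a minimal sketch, assuming a simple power-law model in which the required transmission energy scales with the distance raised to the path loss exponent; the function names are hypothetical and fading, antenna and circuit effects are ignored. It also converts the 3 million instructions quoted from [20] into processing time on a 100 MIPS processor.

```python
def relative_tx_energy(distance_m: float, reference_m: float, exponent: float) -> float:
    """Transmission energy at distance_m relative to reference_m.

    Assumes energy scales as distance**exponent (a simplifying assumption,
    not a model taken from the thesis).
    """
    return (distance_m / reference_m) ** exponent


def processing_time_s(instructions: float, mips: float) -> float:
    """Time taken to execute a number of instructions on a processor rated in MIPS."""
    return instructions / (mips * 1e6)


# Doubling the range under a fourth-power loss costs 16 times the energy,
# compared with 4 times under a free-space-like square law.
print(relative_tx_energy(200, 100, 4))  # 16.0
print(relative_tx_energy(200, 100, 2))  # 4.0

# 3 million instructions on a 100 MIPS processor take about 30 ms.
print(processing_time_s(3e6, 100))
```

Under these assumptions, halving the transmission range would reduce the required transmission energy by a factor of 16, which is why the range requirement dominates the energy budget of a sensor node.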
1.1.2 Data rate
Compared with other wireless applications, such as cellular networks and Wireless Local
Area Networks (WLANs), the typical data rate of environmental monitoring WSNs is
relatively low. In Table 1.1, the data rates of the considered schemes range from tens
of Kb/s to hundreds of Kb/s. Moreover, the channel throughput is often significantly
higher than the required data rate, enabling low duty cycle communications between
sensors. In general, WSN applications are expected to have relatively low data rate
requirements compared with conventional wireless communication applications, such as
mobile phones and WPANs. This can be exploited to reduce the energy consumption of
the sensor nodes.
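The relationship between the required data rate and the channel throughput can be expressed as a duty cycle. The sketch below is illustrative only: the function name is hypothetical, and the 1 Kb/s application data rate is an assumed value rather than a figure from the cited deployments, while 72 Kb/s is one of the channel throughputs listed in Table 1.1.

```python
def duty_cycle(required_rate_kbps: float, channel_throughput_kbps: float) -> float:
    """Fraction of time the radio must be active to carry the required data rate."""
    if channel_throughput_kbps <= 0:
        raise ValueError("channel throughput must be positive")
    if required_rate_kbps < 0 or required_rate_kbps > channel_throughput_kbps:
        raise ValueError("required rate must lie between 0 and the channel throughput")
    return required_rate_kbps / channel_throughput_kbps


# Assumed example: a sensor generating 1 Kb/s on a 72 Kb/s channel.
cycle = duty_cycle(1.0, 72.0)
print(f"duty cycle: {cycle:.2%}")
print(f"radio active for about {cycle * 3600:.0f} s in every hour")
```

In this assumed case the transmitter is needed for less than 2% of the time, so it may be switched off for the remainder in order to save energy.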
1.1.3 Network topology
The network topologies of environmental monitoring WSNs vary from application to
application. A simple star network is only suitable for small scale WSNs with limited
numbers of sensor nodes and limited coverage. However, as discussed above, many
applications are required to cover very large areas measured in km². Therefore, multi-
hop technology is generally applied to reduce the communication range requirement
for each transmitting node. Here, sensor nodes situated between the source node and
the central node are employed as relays. The transmitted information is received and
retransmitted by the relays, reducing the communication range of each transmission.
For example, a 46-hop system comprising 64 sensor nodes has been deployed on the
Golden Gate Bridge to monitor the health of its infrastructure [12]. This reduces the
maximum communication range from 4200 feet to just 75 feet. Although the number of
sensor nodes deployed in each of the examples of Table 1.1 is mostly in the tens, these WSNs
were developed for the purpose of research work. In real life applications, the number of
sensor nodes and the network coverage area can be several orders of magnitude higher [8].
More than one central node or gateway node may be included in a WSN for collecting
data from the sensors and communicating with higher-level networks. In addition, in
some cases, a single network may comprise several interconnected sub-networks having
different topologies [21]. Furthermore, for habitat monitoring applications, sensor nodes
may be attached to the target animals, which causes their positions to change with time.
In these cases, the network becomes ad-hoc, requiring self-organisation, since the network
topology can be expected to change over time. Similarly, in other applications, the
network topology can also change owing to sensor movement or temporary inaccessibility.
In these cases, maintaining the network topology becomes a significant challenge.
1.1.4 Frequency bands
The frequency bands used by WSNs may need to be available globally, if they are to
be used worldwide. Furthermore, WSNs must not interfere with other networks, since
they are often deployed in places like factories, hospitals and office buildings, where
other wireless communication services may be used at the same time. As listed in
Table 1.2, Industrial, Scientific and Medical (ISM) bands present an option, since they
are license-free in most countries.
According to [22], certain hardware constraints imposed by the sensors and the tradeoff
between antenna efficiency and power consumption limit the choice of carrier frequency
for WSNs to the ultrahigh frequency range. Indeed, the 433.05 - 434.79 MHz, 902 - 928
MHz and 2400 - 2500 MHz bands are typically employed in environmental monitoring
WSNs, as shown in Table 1.1. In addition, the 868 MHz band, which is approved by the
European Telecommunications Standards Institute (ETSI) in Europe, is also used in
some WSNs. On the other hand, higher transmission frequencies allow smaller antennas
[4]. As a result, the 902 - 928 MHz and 2400 - 2500 MHz bands have been popular
choices in the latest environmental monitoring WSNs.

Frequency bandwidth    Center frequency
30 kHz                 6780 kHz
14 kHz                 13,560 kHz
326 kHz                27,120 kHz
40 kHz                 40.68 MHz
1.74 MHz               433.92 MHz
26 MHz                 915 MHz
100 MHz                2450 MHz
150 MHz                5800 MHz
250 MHz                24.125 GHz
500 MHz                61.25 GHz
1 GHz                  122.5 GHz
2 GHz                  245 GHz
Table 1.2: Frequency bands available for ISM applications.
1.1.5 Scalability
The number of sensor nodes deployed in an environmental monitoring WSN depends
upon the required network coverage and sensor deployment density. This number may
reach thousands of sensor nodes [8]. Indeed, more than 1000 sensors were deployed in the
experimental environmental monitoring WSN of [23] and it was suggested that more
than 10000 sensors would be required for a real-life network having a coverage of 5 km². In
addition, the deployment densities of WSNs are also highly variable. In [24], the sensor
node density was as high as 20 nodes/m². By contrast, in [14], only 16 sensor nodes
were required to cover a 3 km linear area. Therefore, high scalability is required of
WSN protocols and algorithms, in order to deal with the different challenges that arise at
different network scales, for example, the high interference of high-density networks or the
long communication range of low-density networks.
1.1.6 Lifetime
Depending on the application, the required lifetime of an environmental monitoring
WSN may range from some hours to several years. For example, in [4,14,16], the lifetime
specifications of the deployed WSNs are 19 days, nine months and 1.5 years, respectively.
In the general case, a maximal WSN lifetime is desirable during the design, which leads
to a requirement for high energy efficiency and reliability. Due to the small size of the
sensor nodes, maintaining a lifetime of the order of days is a significant challenge. In
cases of lifetimes of the order of years, the sensors must enter a sleep mode for most of
the time, when sensing and transmitting are not required [25].
1.1.7 Energy consumption
As discussed, the WSN sensor nodes may be very small, so that they are cheap to fabricate,
easy to deploy and do not interfere with the phenomenon that they
are intended to monitor. With the latest developments in MEMS, signal processing and
wireless communication technologies, the sensor nodes can be manufactured with a
volume of several cubic centimetres or even smaller. For example, in [4,16], the sizes of
the sensor nodes are reported as 4.9 × 3.7 × 1.2 cm³ and 50 × 60 mm², respectively. These
small sizes render the corresponding WSNs naturally energy-constrained, even though
most of the sensor node volume is occupied by batteries in many cases. In addition,
WSNs are also expected to have a long lifetime without the requirement for battery
replacement while performing reliable communications over the specified communication
distance. Hence, on-board energy harvesting systems have been developed for providing
an energy supplement for WSNs. Potential energy sources include solar, piezoelectric,
vibration, thermoelectric and acoustic noise [26]. However, owing to the limited size of
sensor nodes, current embedded energy harvesting systems only offer limited compensation
for the energy-constrained situation of WSNs. Table 1.3 gives some examples of
the power generation potential of several energy harvesting technologies for WSNs [27].
A summary of some previous works in [19] shows that the power consumptions of the
signal processing systems in WSNs can vary from several milliwatts to several tens of
milliwatts. The transmission power can reach 20 dBm (100 mW) when the communication
distance is of the order of tens of metres and increases rapidly as the distance
increases. As a result, even solar cells do not generate as much power as is consumed by
a typical wireless sensor in environmental WSNs. Therefore, the energy consumption of
the communication system on sensor nodes is a very important issue for WSNs. On the
other hand, in some cases the central node of a WSN has abundant energy resources,
since it can typically afford a larger size than the sensor nodes or has access to mains
power. If not, it is also easier to replace or recharge the batteries, since there are usually
few central nodes in a WSN.

Energy harvesting technology        Power density
Solar cells (outdoors at noon)      15 mW/cm²
Piezoelectric (shoe inserts)        330 µW/cm³
Vibration (small microwave oven)    116 µW/cm³
Acoustic noise (100 dB)             960 nW/cm³
Table 1.3: Power densities of harvesting technologies.
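The comparison between transmission power and harvested power can be made concrete with a short calculation. The sketch below converts the 20 dBm transmission power quoted above into milliwatts and estimates the harvester size that would be needed to supply it continuously, using the solar and vibration power densities of Table 1.3 (the latter taken as 116 µW/cm³). The variable names are illustrative, and the assumptions of continuous transmission and lossless energy conversion are simplifications.

```python
SOLAR_MW_PER_CM2 = 15.0       # Table 1.3: solar cells, outdoors at noon
VIBRATION_MW_PER_CM3 = 0.116  # Table 1.3: 116 uW/cm^3, small microwave oven


def dbm_to_mw(power_dbm: float) -> float:
    """Convert a power level in dBm to milliwatts."""
    return 10.0 ** (power_dbm / 10.0)


# Transmission power quoted in the text (20 dBm).
tx_power_mw = dbm_to_mw(20.0)

# Harvester size needed to supply this power continuously (idealised).
solar_area_cm2 = tx_power_mw / SOLAR_MW_PER_CM2
vibration_volume_cm3 = tx_power_mw / VIBRATION_MW_PER_CM3

print(f"Transmit power: {tx_power_mw:.0f} mW")
print(f"Solar cell area required: {solar_area_cm2:.1f} cm^2")
print(f"Vibration harvester volume required: {vibration_volume_cm3:.0f} cm^3")
```

In practice a duty-cycled transmitter draws far less average power than this, so the figures should be read only as an upper bound on the harvester size under the stated assumptions.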
1.1.8 Path loss model
Path loss has a significant impact on the design of communication systems for WSNs.
Therefore, the WSN communication scheme should exploit the benign channel characteristics
offered by short range communications, but should also cope with the malign
channel characteristics that are associated with long ranges. Depending on the environment
of the target scenario, path loss can be modelled in a number of ways when
using simulations to estimate the performance of the designed system [1,2,28,29]. However,
regardless of the particular target environment, the path loss Pl is described as a
function of the transmission distance d. A model that has been widely used [1,2,28-33]
increases the mean path loss as a power of the transmission distance d, according to
\[ P_l(d) \propto \left( \frac{d}{d_0} \right)^{p}, \tag{1.1} \]
where d0 is a reference distance and p is the path loss exponent, which indicates how fast
the path loss increases with transmission distance d. Typically, the path loss obtained
using Equation 1.1 is expressed in the logarithmic domain, using decibels.
In [29], wireless communications in various environments for WSNs were investigated using
a carrier frequency of 915 MHz. The investigation particularly considered the situation
when sensors lie on or near the ground, as opposed to previously developed models, which
assumed that a person is holding the transceiver at a height of 1.5 m from the ground.
The study also considered the different multipath characteristics of urban environments,
open terrain, woody terrain and woody/hilly terrain separately. The proposed path loss
model is given by
\[ P_l(d)\,[\mathrm{dB}] = X + 10\,p\,\log_{10}\left( \frac{d}{d_0} \right), \tag{1.2} \]
where X models the random shadow fading effect using a zero-mean Gaussian distribution
having a standard deviation σ_X expressed in decibels. The measured path loss
exponents p and shadow fading standard deviations σ_X are summarised in Table 1.4.

Terrain            p       σ_X (dB)
Open               3.41    4.70
Woody              2.35    4.37
Woody and Hilly    2.90    4.17
Table 1.4: Summary of measured field results.
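The model of Equation 1.2 can be sketched in a few lines of Python, using the measured values of Table 1.4. This is a minimal illustration rather than the simulation code used in this work: the function names, the dictionary layout and the reference distance d0 = 1 m are assumptions, since the reference distance used in [29] is not stated here.

```python
import math
import random

# (path loss exponent p, shadow fading standard deviation in dB), from Table 1.4
TERRAIN = {
    "open": (3.41, 4.70),
    "woody": (2.35, 4.37),
    "woody_hilly": (2.90, 4.17),
}


def mean_path_loss_db(d: float, p: float, d0: float = 1.0) -> float:
    """Deterministic part of Equation 1.2: 10 * p * log10(d / d0)."""
    return 10.0 * p * math.log10(d / d0)


def path_loss_db(d: float, terrain: str, d0: float = 1.0, rng=None) -> float:
    """One random realisation of Equation 1.2, including shadow fading X."""
    p, sigma_db = TERRAIN[terrain]
    rng = rng or random
    shadowing = rng.gauss(0.0, sigma_db)  # zero-mean Gaussian, in dB
    return shadowing + mean_path_loss_db(d, p, d0)


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so that runs are repeatable
    for terrain in TERRAIN:
        sample = path_loss_db(50.0, terrain, rng=rng)
        print(f"{terrain}: {sample:.1f} dB at 50 m (one realisation)")
```

Averaging many realisations at a fixed distance recovers the deterministic term, which is why the exponent p dominates the comparison between terrains.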
The most significant factor that can be abstracted from the environment to model the
path loss is the path loss exponent p. During the design of the wireless communication
systems for WSNs, an appropriate path loss model should be employed to evaluate the
performance and the energy efficiency of the system over a range of typical transmission
distances in the target scenario. This is particularly important for WSNs, since both
high quality performance and high energy efficiency must be ensured. The
evaluation of the WSN design using an appropriate path loss model can help to strike a
good balance between these two characteristics of the design.
1.2 Communications in in-door wireless sensor networks
1.2.1 Home automation and smart environment applications
WSNs can also be applied in home applications. For example, home automation technology
uses smart sensor nodes embedded in appliances, such as vacuum cleaners, microwave
ovens and refrigerators, to manage them [34]. Environmental monitoring WSNs
can also be deployed for assisted living. In contrast to out-door WSNs, home-based
WSNs can exploit other available wireless networks such as WLAN or Bluetooth networks,
when the sensor nodes can afford the required size. If these networks cannot be
utilised, the communication requirements are similar to those of out-door WSN applications,
except that the communication distance is much shorter. More specifically, in
the home environment, the communication range varies from several metres to several
tens of metres. As a result, the required transmission power is typically much lower
than in out-door WSNs, but interference becomes a more important issue, since the deployment
density can be high within the relatively small area of a home, and the WSN
must co-exist with other wireless networks.
1.2.2 Other applications
There are numerous other applications for in-door WSNs, such as the environmental
monitoring of office buildings, greenhouses and factories. The characteristics of these
applications lie in between those of out-door and home-based WSNs. For example, the
typical communication distance in these applications is longer than in home-based WSNs
but shorter than in long-range out-door WSNs. Here, the transmission distance varies
from several metres to several hundreds of metres.
In addition, the propagation environment of in-door applications is different from that of
out-door applications. The reflection and absorption of the transmitted signals by walls
and other in-door obstacles must be considered [35].
In [28], indoor wireless communication in multi-floored buildings is investigated for a
carrier frequency of 914 MHz. This work also concluded that the path loss does not
depend significantly on the carrier frequency in the range 900 MHz to 4 GHz. A path
loss model is proposed, which is defined as the path loss from the transmitter to the
reference distance d0, plus the additional path loss described by Equation 1.3 in decibels,
\[ P_l(d)\,[\mathrm{dB}] = P_l(d_0)\,[\mathrm{dB}] + 10\,p\,\log_{10}\left( \frac{d}{d_0} \right). \tag{1.3} \]
A d0 = 1 m reference distance was chosen and Pl(d0) was assumed to be due to free
space propagation. The study considered 14 different scenarios, including transmission
within different types of buildings and through different numbers of floors. The path loss
exponent p was found to vary from 1.81 to 5.04. A similar model is proposed in [1,2],
according to
\[ P_l(d)\,[\mathrm{dB}] = 20\,\log_{10}\left( \frac{4\pi}{\lambda} \right) + 10\,p\,\log_{10}(d), \tag{1.4} \]
where λ is the wavelength of the transmitted signal and d0 is assumed to adopt a default
value of 1 m.
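Equation 1.4 can be evaluated directly once the carrier frequency is known, since the first term is simply the free space loss at the 1 m reference distance. The sketch below is illustrative only: the function name and the choice of example distance are assumptions, while the 914 MHz carrier and the exponent range of 1.81 to 5.04 are the values reported for [28] above.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def indoor_path_loss_db(d_m: float, freq_hz: float, p: float) -> float:
    """Equation 1.4: free space loss at d0 = 1 m plus 10 * p * log10(d)."""
    wavelength_m = SPEED_OF_LIGHT_M_S / freq_hz
    reference_loss_db = 20.0 * math.log10(4.0 * math.pi / wavelength_m)
    return reference_loss_db + 10.0 * p * math.log10(d_m)


if __name__ == "__main__":
    # Smallest and largest exponents reported for the 14 scenarios of [28].
    for p in (1.81, 5.04):
        loss = indoor_path_loss_db(d_m=10.0, freq_hz=914e6, p=p)
        print(f"p = {p}: {loss:.1f} dB at 10 m")
```

At 10 m the two exponents differ by 10 × (5.04 - 1.81) = 32.3 dB, which indicates how strongly the building layout can affect the required transmission power.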
1.3 Body area networks
Body Area Networks (BANs), also referred to as Body Sensor Networks (BSNs), Body
Area Sensor Networks (BASNs) or Wireless Body Area Networks (WBANs), are an
emerging application of short range WSNs in the healthcare industry. These networks
comprise a number of wireless sensors placed on or in the human body for continual
monitoring of physiological parameters such as heart rate, ElectroCardioGraphy (ECG)
data, ElectroEncephaloGraphy (EEG) data, blood pressure, body temperature, motion
and levels of certain chemicals such as sugar, oxygen and medications in the blood [6].
The monitoring of these parameters could be important or even life critical for some
people or patients, such as the ageing population, chronic disease patients, cerebrovas-
cular and cardiovascular disease patients. Long-term monitoring and logging of patients'
physiological parameters could help doctors to provide personalised treatment or dis-
cover risks earlier. For example, the blood sugar level and other related human body
conditions of diabetes patients have to be regularly monitored by using examinations
in hospital and self-examination by the patients. However, this approach is unreliable,
inconvenient and not frequent enough [6]. With BAN solutions, the conditions of the
patients can be monitored and logged continuously, without involving eort from the
patients. For some dangerous diseases, such as coronary heart disease, which could cause
sudden death, BANs are able to monitor and discover the potential danger, raise the
alarm and even contact the hospital before the patient is aware of the onset of the
problem. BANs [36] can also be applied in sports training or gym applications to help
instructors to monitor the physiological parameters of people undertaking exercise. Furthermore,
BANs can be deployed in hospitals or home-based healthcare systems to create
a more comfortable, convenient and economical way to perform patient monitoring.
Over the past few years, advancements in electronic systems and wireless technologies
have enabled the development of small and intelligent medical sensors which can be
attached to or implanted into the human body. The healthcare industry is becoming
increasingly interested in using these technologies to develop practical BANs [37]. This
has also stimulated interest in related research areas, such as energy harvesting, signal
processing and communication. In 2008, the Institute of Electrical and Electronics Engineers
(IEEE) 802.15 working group established the body area network task group (IEEE
802.15.6), to develop guidelines for using wireless technologies for BAN applications in
various healthcare services.
The scenario of BANs has some unique features compared with conventional WSN
applications. Figure 1.3 illustrates a typical BAN scenario, which comprises a number of
sensor nodes attached on or implanted in the human body to perform different functions
for collecting different physiological parameters. A central device such as a smart phone
receives the data from the sensor nodes over a wireless channel, forming a BAN. A BAN
may connect to an external network for communicating with a higher-level system, such
as the hospital, depending on the requirements of different scenarios.

Figure 1.3: A typical BAN configuration, comprising sensor nodes and a central node.

The number of sensors required varies between applications. To monitor one
particular disease, typically only a few (< 3) sensor nodes are required [37]. However,
more sensors might be required in more complicated situations, especially when motion
detection is involved, for example, for people who need help to recover mobility after
treatment. In this case, sensors might be required on every movable part of the body, with
more than one sensor required where accurate motion detection is sought. According
to [6], typically no more than 20 sensors are required for any one person.
BAN applications have a wide variety of characteristics, which must be carefully
considered when developing communication technologies for BANs.
• Firstly, the size of the sensors is required to be as small as possible for comfort
and convenience. Smaller nodes imply limited onboard resources, requiring
the energy consumed by processing, storage and communication to be traded off
against the fidelity, throughput and latency required for the transmitted data [38].
Since one of the purposes of BANs is to create a convenient healthcare system for
patients without support from professionals, the wireless sensors on human bodies
need to have long lifetimes without any need for maintenance. This leads to an
extreme energy efficiency requirement, relying only on battery-stored or harvested
energy.
• Secondly, it is assumed that all the sensors and the central device are always near
to the patient, leading to a very short transmission range requirement for BANs.
Some previous work [6,39,40] suggests that a communication range of 2 - 5 metres is
sufficient for most BAN applications.
• Thirdly, the transmission power must be limited in order to avoid the detrimental
effects of high power transmission on the health of humans within the vicinity.
The differences between the scenarios of BANs and conventional WSNs lead to some
unique communication requirements. Therefore, this section discusses these requirements
in detail.
1.3.1 Frequency bands
As a special case of WSNs, the frequency bands that are appropriate for use in BANs are
similar to those of conventional WSNs. For example, many previous works are based on
the IEEE 802.15.4 standard, which uses the 868/915 MHz and 2.4 GHz frequency bands [41-44].
Similar frequency bands are also chosen by [45-51]. An alternative solution is to use
Ultra-WideBand (UWB) technology, which is authorised for communication between
3.1 GHz and 10.6 GHz [52]. The potential and the advantages of applying
UWB technology to BANs are discussed in [53-57]. The USA Federal Communications
Commission (FCC) is considering several possible frequency bands for use by BANs [58]:
• 2300 - 2305 MHz and 2360 - 2395 MHz bands: The 802.15.TG6 Group and GE Healthcare
(GEHC) propose to use these bands for BANs. However, these bands are currently
used by several other services, including Aeronautical Mobile Telemetry (AMT),
federal radio location and amateur radio users. This could pose interference and
security problems. The FCC is considering the proposed potential use of these
bands by BANs on a coexistence and non-interference basis.
• 2400 - 2483.5 MHz band: This band is used by ISM equipment on a non-licensed
basis under the FCC's rules. The FCC seeks comment on whether BANs could
operate in this band under current rules or whether new rules would be required
to regulate BANs using this band.
• Other frequency bands: The FCC is seeking comment on whether other frequency
bands may be appropriate for BANs, including the 5150 - 5250 MHz band, which is
now allocated for federal and non-federal aeronautical navigation and non-federal
fixed-satellite use, as well as Unlicensed National Information Infrastructure (U-NII)
devices.
According to the latest development of the IEEE 802.15.6 task group [59], three PHYsical
layer (PHY) specifications are included: the NarrowBand (NB) PHY operating in the 402
- 405 MHz, 420 - 450 MHz, 863 - 870 MHz, 902 - 928 MHz, 950 - 956 MHz, 2360 - 2400
MHz and 2400 - 2483.5 MHz bands; the UWB PHY operating at 3993.6 MHz and 7987.2 MHz
with a bandwidth of 499.2 MHz; and the Human Body Communications (HBC) PHY
operating in two frequency bands centred at 16 MHz and 27 MHz, each with a bandwidth
of 4 MHz.
1.3.2 Data rate
BANs typically require real-time low data rate communications. For example, the investigation
results from [37], [39] and [60] are summarised in Table 1.5. Figure 1.4, quoted
from [61], summarises the possible ranges of the data rate requirements of several typical BAN
applications. The investigation of [6] concluded that a 500 Kb/s data rate is sufficient for
BAN applications. Note that for low-rate applications, such as heartbeat, body temperature
and blood pressure, the specific data rate values that were identified in the various
studies are relatively consistent. However, for some high-rate applications, such as ECG
and EEG, the conclusions are quite varied. These discrepancies may be attributed to
the different assumptions of how much pre-processing is applied to the signals before
they are transmitted. For example, if a sensor transmits compressed data rather than
raw data, the data rate requirement could be significantly reduced. In all the previous
investigations into this topic, the highest data rate requirement is given by [60], which
identified that up to 1.536 Mb/s may be required by EMG monitoring. On the other
hand, some studies include video stream transmission in BAN scenarios, increasing the
data rate requirement to 100 Mb/s [6,60]. However, these scenarios are not typically
considered by BAN applications. According to [62], the 802.15.6 standard limits the
maximum data rate to 971.4 Kb/s including the communication payload,
which gives an effective throughput of 674.7 Kb/s.

Healthcare applications    H. Li [39]    B. Zhen [37]    M. Patel [60]
Heartbeat                  <0.1 Kb/s     0.05 Kb/s       -
Body temperature           <0.1 Kb/s     0.05 Kb/s       <10 Kb/s
ECG                        2.5 Kb/s      72 Kb/s         72 Kb/s
EEG                        0.54 Kb/s     131.1 Kb/s      86.4 Kb/s
ElectroMyoGraphy (EMG)     -             1152 Kb/s       1.536 Mb/s
Blood pressure             <0.1 Kb/s     0.05 Kb/s       <10 Kb/s
Blood sugar level          <0.1 Kb/s     -               -
Other blood analysis       -             8.192 Kb/s      -
Table 1.5: Data rate requirements of different applications in BANs.

Figure 1.4: Summaries of data rate requirements of a selection of BAN applications.
© Y.-Q. Zhang et al. 2011 [61]
1.3.3 Reliability, accuracy and latency
The performance of a BAN can be quantified by the delay profile, the information loss
rate, the Bit Error Rate (BER) and the Frame Error Rate (FER), which must be considered
carefully during the design of the communication system. The prevalence of communication
errors should be less than a strictly defined limit in order to avoid disastrous
behaviour [37,63,64], since an erroneous or erased record could affect a doctor's judgement
when making life-critical decisions, for example. On the other hand, transmission
latencies of the order of seconds are not a critical issue in practice, since the reaction of the
hospital, such as sending an ambulance, is associated with a delay that is orders of
magnitude higher.
In addition, when a human body moves, the attached sensors will change positions relative
to each other. Furthermore, when the environment changes, the channel conditions
will also change and affect the network performance. Despite these varying conditions,
BANs are required to maintain reliability. Consideration must be given to the various
different activities that can be conducted by human bodies, such as walking, running
and turning. The human body may also move through various transmission
environments, such as tunnels, subways and parks. Each of these is associated
with different propagation characteristics and different interference patterns
from coexisting BANs and other networks. The design of BANs must be prepared for all
practical scenarios.
1.3.4 Path loss model
Although BANs typically have short communication ranges of 2 - 5 m, the propagation
environment is hostile, since human bodies strongly attenuate RF signals [64,65]. A
characterising feature of BANs is that path loss is highly variable, since
human activities are typically performed nearby. The path loss model for BANs has the
same mathematical expression as that of conventional WSNs, as discussed in Section 1.1.8.
In [30], the path loss exponent is reported to range from 3.2 to 4.9 depending on the
transmission frequency bands. In [31], the study of channel models for Non-Line Of
Sight (NLOS) propagation along the surface of a human body found that the path loss
exponent has a value of about 3. In [32], a path loss exponent of 7 was found in NLOS
situations for propagation around human bodies. In [33], the path loss exponent was
reported to vary from 4.22 to 6.26. As these results demonstrate, the human body is
not a benign environment for wireless communication.
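To give a sense of scale for these exponents, the additional mean path loss incurred between a reference distance of d0 = 1 m (an assumed value, as in Section 1.2.2) and the upper end of the BAN range, d = 5 m, follows from the logarithmic form of Equation 1.1:

\[ \Delta P_l\,[\mathrm{dB}] = 10\,p\,\log_{10}\left( \frac{5}{1} \right) \approx 6.99\,p . \]

For the exponent of about 3 reported in [31] this gives approximately 21.0 dB, whereas for the exponent of 7 reported in [32] it gives approximately 48.9 dB. The difference of about 28.0 dB corresponds to a factor of 5^(7-3) = 625 in linear terms, which suggests how strongly the required transmission power may depend on the placement of the nodes around the body.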
A further challenge of BANs is that all the sensor nodes operate in each other's vicinity,
potentially inducing an interference problem. Therefore, maintaining reliable communication
in BANs despite the uniquely unstable propagation environment is a particular
challenge, even though the transmission range is short.
1.3.5 Energy consumption
The energy consumption of the BAN sensor nodes is a crucial issue owing to their
limited energy resources and the long lifetime requirement. As mentioned before, the
sensors in BANs must rely on energy harvesting or battery operation without recharging
or replacement. Since the sensors are expected to be as small as possible, they cannot
employ high capacity batteries or energy harvesters and so their energy resources are
extremely limited. Therefore, every function of the sensors is required to be energy
efficient, including the wireless communication mechanism. It is widely agreed that the
low power requirement is one of the most challenging issues in developing BANs [6,37,66].
1.3.6 Network topology
Because of the small network scale and the one-way communication assumption, a star
topology is an obvious solution for BANs [67,68]. Here, the sensor nodes transmit directly
to a single central sink node. The advantages of the star topology are its simple
architecture and the high concentration of system complexity upon the central node, which
is assumed to have access to plentiful energy resources [69]. However, it may be desirable
to employ multi-hop relaying in some BAN applications that employ a sufficiently high
number of sensor nodes. This is because the propagation environment near or inside
the human body is not benign for wireless communication and so multi-hop transmission
has the benefit of increasing the communication reliability and reducing the transmission
power [66]. Hence, many recent efforts have focused on multi-hop network solutions for
BANs [70,71].
1.3.7 Security
Security is another important issue in BANs for medical applications [40]. Safety and
privacy should be considered for all involved parties, including doctors, nurses, patients,
administrative personnel and medical service providers. BAN devices also require
authentication for security purposes. The interference imposed by external devices and
intentional attacks made by third parties must be considered. On the other hand, owing
to the limited resources that are available in the sensor nodes and the requirement for
the system to be user friendly, any security measures must be simple. In particular, the
insertion and removal of a node in a BAN must be easy for the user to accomplish.
1.4 Error-correcting code solutions in wireless sensor networks
1.4.1 Energy-constrained challenge in wireless sensor networks
As discussed, the key challenge that applies to all WSNs is the energy-constrained nature
of the sensor nodes. Therefore, the energy consumption of a WSN's communication
system must be considered throughout the whole design process. However, specifications
such as the communication ranges, data rates, network topologies and carrier frequencies
are highly dependent on the targeted scenario. The optimisation of the communication
system's energy efficiency must take these factors into account. Previous research has
considered this subject in different layers of the communication system, including the
Media Access Control (MAC) layer, the physical layer and the hardware architectures.
Table 1.6 reviews the literature in this area.
Table 1.6: Previous contributions on energy-efficient wireless communication systems for WSNs.

Author(s): J. Heidemann and D. Estrin 2002 [72]; T. V. Dam and K. Langendoen 2003 [73]
Contribution: Different scheduling schemes are proposed at the MAC layer to reduce the
energy consumption of the sensor nodes in WSNs by periodically placing the communication
systems of the sensor nodes in sleep mode.

Author(s): E. J. Coyle 2003 [74]; D.-H. Nam and H.-K. Min 2007 [75]; L. Kyounghwa et al.
2010 [76]; M. C. M. Thein and T. Thein 2010 [77]
Contribution: Different clustering techniques are proposed to cluster sensors into groups,
so that sensors communicate information only to clusterheads and then the clusterheads
communicate the aggregated information to the central node. As a result, the total energy
spent in the network for communications is reduced.

Author(s): M. Cardei and M. Tha 2005 [78]
Contribution: A method is proposed that divides randomly deployed sensors into different
sets, which are activated alternately while maintaining the required coverage. This can
reduce the average energy consumption of the entire network and extend the network lifetime.

Author(s): A. Wang and A. Chandrakasan 2002 [79]
Contribution: The challenges of energy-efficient Digital Signal Processors (DSPs) that are
specifically designed for WSNs are investigated.

Author(s): X.-H. Li 2003 [80]
Contribution: A novel communication scheme which uses two transmitting sensors and
space-time block codes to provide transmission diversity in distributed WSNs with neither
antenna arrays nor transmission synchronisation is proposed for the purpose of saving
transmission energy.

Author(s): O. C. Omeni et al. 2007 [81]
Contribution: A novel concept of a wake-up fall-back time is introduced to handle time
slot overlaps, while the Listen-Before-Transmit (LBT) technique is used in BANs to minimise
the overall energy consumption.

Author(s): C. C. Enz et al. 2004 [82]
Contribution: An ultra low-power platform for the implementation of WSNs, which achieves
low-power operation through a careful co-design approach, is developed by the Swiss Centre
for Electronics and Microtechnology.

Author(s): C. Schurgers and M. B. Srivastava 2001 [83]; H. Oh and K. Chae 2007 [84];
M. Zhang et al. 2008 [85]; W. N. W. Muhamad et al. 2009 [86]
Contribution: Different routing algorithms are proposed to optimise the transmission routes
in ad-hoc WSNs for reducing the overall transmission energy consumption.

Author(s): X. Chen et al. 2009 [87]
Contribution: A novel transmission scheme is proposed in which the sensor nodes'
transmissions are ordered according to the magnitude of their measurements. Sensors having
magnitudes smaller than a threshold do not transmit. The results show that the proposed
approach is very energy efficient, with only a negligible measurement error introduced.

Author(s): N. Sadeghi et al. 2006 [2]; S. L. Howard et al. 2006 [1]; M. C. Vuran and
I. F. Akyildiz 2009 [88]; M. E. Pellenz et al. 2009 [89]; A. Brokalakis and
I. Papaefstathiou 2012 [90]
Contribution: Error-correcting coding schemes are considered for employment in WSNs to
reduce the transmission energy consumption.

Author(s): J. Abouei et al. 2011 [91]
Contribution: The energy efficiency of Luby Transform (LT) codes is analysed for the
scenario of WSNs.

Author(s): A. Brokalakis et al. 2011 [92]
Contribution: An innovative platform which combines a standard wireless node with very
low cost reconfigurable hardware is designed in order to evaluate the efficiency of this
pioneering approach. Three different networking and security protocols are implemented in
the presented system: a) Turbo coding, b) Blowfish encryption and c) XMesh routing.

Author(s): J. Singh and D. Pesch 2011 [93]
Contribution: A novel Error-Correcting Code (ECC) based adaptive error control strategy is
proposed for employment with a cascaded fuzzy inference system, in order to combat the
communication unreliability of indoor environments.

Author(s): J. Rahhal 2012 [94]
Contribution: An LDPC code is considered for employment in Multiple-Input and
Multiple-Output (MIMO) WSNs to enhance the transmission and hence reduce the energy
consumed by the sensor.
A key principle for reducing the energy consumption of a WSN communication system is to use the energy efficiently according to the communication requirements. For example, overachieving the communication requirements of the target application, such as the data rate or communication range, may result in unnecessary energy waste. In addition, given an optimised specification based on the communication requirements, an efficient design, employing appropriate coding techniques, algorithms and hardware implementations, is also important for further reducing the system energy consumption.
1.4.2 Error-correcting code for energy efficient wireless communication
There are some existing standards which have been applied in previous WSN applications and research works, such as IEEE 802.15.4 [19, 35, 95] and the Medical Implant Communication Service (MICS) [96]. New standards are also being developed for potential BAN applications, such as IEEE 802.15.6. In contrast to standards for long range or high speed applications, such as WLAN, mobile phones and satellite communications, the WSN standards are designed to be simple and efficient. More specifically, they are intended to have a simple implementation requiring a small amount of hardware resources and having a low energy consumption. As a result, the existing communication protocols tend to employ simple codes that are easy to implement. For example, in the IEEE 802.15.4 protocol, a simple 4-bit symbol to 32-bit chip sequence spreading scheme is used [95].
This is in contrast to more sophisticated ECCs, namely near Shannon limit error correct-
ing codes, such as turbo codes [97] and Low-Density Parity-Check (LDPC) codes [98].
The complex decoders of these sophisticated ECCs require more processing energy con-
sumption Epr by the hardware implementation, but facilitate a lower transmission energy
consumption Etx.
The ECCs that are adopted by existing WSN standards are appropriate for short range communications. For example, the typical transmission range of IEEE 802.15.4 is 10-75 metres [99]. At these communication ranges, the transmission energy consumption Etx is low enough that the processing energy consumption Epr introduced by the coding schemes is not negligible. Considering the overall energy consumption Etx+Epr, simple ECC schemes are motivated for these standards. In general, from the energy efficiency point of view, sophisticated ECCs are more suitable for long range communication, since Etx increases steeply with the communication range, in proportion to d^n for a distance d and a path loss exponent n. In this case, Epr becomes negligible compared to Etx. In these circumstances, a sophisticated ECC can reduce Etx by a significant amount, which is greater than its own processing energy consumption Epr.
Depending on the ECC scheme and its implementation, there is a critical distance, beyond which the ECC scheme reduces the overall energy consumption Etx + Epr and can be considered to be energy efficient. The critical distance is also highly dependent on the particular transmission environment, since the path loss exponent n can be highly variable, as discussed in Sections 1.1.8 and 1.3.4.
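The notion of a critical distance can be illustrated with a minimal sketch. It assumes a simplified model that is not taken from this thesis: the uncoded transmission energy per bit is k·d^n, the ECC divides this by its coding gain and adds a fixed processing energy per bit, and the effect of the coding rate is ignored. The names `critical_distance` and `total_energy`, as well as all numerical values, are hypothetical and chosen only for illustration.

```python
import math

def total_energy(d, k, n, e_pr=0.0, coding_gain_db=0.0):
    """Overall energy per bit, Etx + Epr, under the assumed model.

    Etx = k * d**n / (linear coding gain); Epr is a fixed overhead.
    With the default arguments this is the uncoded scheme.
    """
    linear_gain = 10.0 ** (coding_gain_db / 10.0)
    return k * d ** n / linear_gain + e_pr

def critical_distance(e_pr, coding_gain_db, k, n):
    """Distance at which the coded and uncoded schemes use equal energy."""
    # Fraction of the uncoded Etx that the coding gain saves.
    saving_fraction = 1.0 - 10.0 ** (-coding_gain_db / 10.0)
    # Solve k * d**n * saving_fraction == e_pr for d.
    return (e_pr / (k * saving_fraction)) ** (1.0 / n)

# Hypothetical parameter values, for illustration only.
K, E_PR, GAIN_DB = 1e-12, 5e-9, 6.0
for n in (2.0, 3.0, 4.0):
    d_crit = critical_distance(E_PR, GAIN_DB, K, n)
    print(f"path loss exponent n = {n}: critical distance = {d_crit:.1f}")
```

Under these assumptions, a larger path loss exponent n shortens the critical distance, which is consistent with the observation that the critical distance depends strongly on the transmission environment.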
However, as the Integrated Circuit (IC) process technology is scaled down, the energy consumption Epr of a particular ECC reduces exponentially. This also reduces the critical distance of sophisticated ECCs, motivating their employment in WSN applications. Moreover, as discussed in Section 1.1.1, some WSN applications have relatively long communication ranges, which prevent reliable communication using the short range standards, without employing an excessive transmission energy Etx. As a result, near Shannon limit ECCs have recently been considered to improve the energy efficiency of WSNs [1,2,89,90]. The near Shannon limit performance of sophisticated ECCs facilitates greater coding gains than the simple ECCs that are employed in conventional communication protocols for WSNs. This allows a lower transmission energy to be employed, without increasing the transmission error rate. In [2], a selection of ECC schemes and their implementations were considered and found to give critical distances that vary from several metres to several tens of metres. In addition, near Shannon limit ECCs are attractive in scenarios where the permitted transmission energy is constrained. For example, in BANs, the transmission energy may be limited in order to avoid risks to human health. Furthermore, it is often challenging to achieve reliable communication in BAN scenarios, owing to their high path losses. Therefore, a near Shannon limit ECC is desirable, since it allows communication over hostile channels with a limited transmission energy and very low error rates.
In this thesis, the design and implementation of turbo-like codes for WSNs is studied. Turbo-like codes are a family of near Shannon limit ECCs that has been widely employed in wireless communication systems. However, they have hardly been considered for use in energy-constrained systems such as WSNs before. For this reason, the algorithmic and implementational design of turbo-like codes has previously focused on achieving a high coding gain and throughput, rather than a high energy efficiency. In this thesis, the potential of applying turbo-like codes for the purpose of improving the communication system's energy efficiency is explored. In particular, the holistic design of turbo-like coding algorithms and implementations is considered for improving the energy efficiency of the system.
1.5 Objectives and organisation of the thesis
As discussed in Section 1.4.2, the near Shannon limit performance of turbo-like codes can facilitate low transmission energy consumption Etx in wireless communication schemes. Energy-constrained applications such as WSNs may benefit from this reduction of transmission energy consumption Etx in order to extend the lifetime of the system. However, in realistic scenarios, the additional processing energy consumption Epr introduced by employing the turbo-like code in the system must be considered as an offset against the energy saving. The main objective of this thesis is to explore methods for holistically designing a turbo-like coding algorithm and implementation, in order to reduce the overall energy consumption of the wireless communication system.
The outline of the thesis is shown in Figure 1.5. The chapters are organised as follows.
Figure 1.5: Thesis outline (Chapter 1: Introduction; Chapter 2: Background knowledge; Chapter 3: An energy-efficient SCCC encoder implementation for star WSNs; Chapter 4: Fixed-point specification of LUT-Log-BCJR decoders; Chapter 5: Energy-efficient LUT-Log-BCJR decoder architecture for multi-hop WSNs; Chapter 6: Turbo decoder energy estimation framework and holistic turbo code design method; Chapter 7: Conclusion and future work).
• Chapter 2 is a background chapter, providing the key knowledge related to this thesis. It commences with a full introduction to ECCs and turbo codes in particular. The EXtrinsic Information Transfer (EXIT) chart [100] analysis method for turbo-like codes is presented. The fixed-point implementational issues of hardware design for ECC algorithms are also discussed.
• In Chapter 3, the simple but widely-applied star network scenario of WSNs is investigated as a special case of WSNs. WSN applications having a star network topology have a unique characteristic, namely that only one-way communication is required, from the sensor nodes to the central node. As a result, the employment of turbo-like codes in star WSNs requires only the encoding system to be implemented in the sensor nodes, while the decoding system is only required in the central node. The encoders of turbo-like codes, such as the Serial Concatenated Convolutional Code (SCCC) [101] and the turbo code, typically have very low complexities. In this chapter, the energy saving that can be achieved by employing an SCCC in star sensor networks is explored. In particular, an augmentation of the IEEE 802.15.4 [102] channel coding scheme is implemented for the investigation. The results presented in Chapter 3 show that the proposed SCCC encoder has only an insignificant energy consumption compared to the transmission energy reduction that it affords. On the other hand, the decoder that is deployed on the central node has a high complexity and hence a high energy consumption. Therefore, the result of employing the SCCC channel coding system in a star WSN is that the energy consumption of the system is redistributed from the sensor nodes to the central node. This energy redistribution is ideal for the purpose of extending the lifetime of the WSN. This is because in WSNs, the sensor nodes typically have limited energy resources due to their small size. However, the central node can typically afford to have a larger size and is often integrated into a higher-level system having abundant energy resources.
• Following the star network investigation of Chapter 3, more complicated topologies are considered in Chapters 4, 5 and 6 of this thesis. In many applications, complicated network topologies involving multi-hop communications among the sensor nodes are required for WSNs. For example, this is the case when the WSN is required to cover a large area or include a large number of sensor nodes, as discussed in Section 1.1.1. Alternatively, multi-hop communication is required when the transmission power is limited by other issues, such as human health protection, as discussed in Section 1.4.2. In multi-hop networks, it becomes necessary to deploy high-complexity turbo-like decoders in not only the central node, but also the sensor nodes. As a result, the energy Epr consumed by the decoder may significantly offset the energy saving gained by the reduced transmission energy consumption Etx. According to previous studies on this topic [1,2,89,90], the energy efficiency of the decoder is a key issue for applying turbo-like codes in WSNs in order to reduce the overall energy consumption of the system. Chapter 4 proposes a method for investigating the effect on the decoding performance when using different word length settings for a fixed-point turbo decoder. Since hardware complexity is directly related to the word length setting, determining the minimum word length requirement of a turbo decoder without significantly affecting its decoding performance is an important issue for achieving an energy-efficient hardware implementation.
• In the conventional design procedure of turbo codes, the processing energy consumed by the hardware is not typically considered in detail during the design. This is because turbo-like codes are conventionally applied in long range and high speed communication systems, such as the 3GPP Long Term Evolution (LTE) [103] and Digital Video Broadcasting (DVB) [104]. In these applications, the processing energy consumption is insignificant compared with the transmission energy consumption. As a result, the decoding throughput becomes the highest priority in conventional turbo decoder design. The hardware complexity and energy efficiency are traded off for a high decoding throughput. In Chapter 5, a novel Application-Specific Integrated Circuit (ASIC) architecture is proposed for a widely used type of turbo decoder, namely the Look-Up Table based Logarithmic Bahl-Cocke-Jelinek-Raviv (LUT-Log-BCJR) [105] decoder. The proposed architecture has a low energy consumption and complexity when used in WSN applications.
• Conventionally, the turbo-like decoder's processing energy consumption can only be estimated at the hardware implementation stage of the design process, when it is already too late to make any changes to the code design. In Chapter 6, a holistic turbo code design method is proposed to consider both the transmission energy consumption Etx and the processing energy consumption Epr, aiming for an overall energy optimised design for WSN applications. In order to realise the holistic design method, a framework for estimating the energy consumption of the LUT-Log-BCJR decoder architecture at an early code design stage is proposed. By combining this framework with a relevant path loss model for the transmission energy estimation, a turbo code having an optimised overall energy consumption can be designed. This allows the hardware energy consumption to be considered during the design of the turbo coding algorithm. This overall optimisation of the energy consumption allows the communication system to take full advantage of the turbo code's near Shannon limit performance. A case study is presented, in which the proposed method is used to investigate a previous turbo code design work [106]. The advantages of using the holistic design method from the energy efficiency point of view, compared with the conventional design method, are demonstrated.
• Chapter 7 concludes the thesis and discusses opportunities for future work.
1.6 Novel Contributions
The novel contributions of this thesis are listed below.
• In Chapter 3, an SCCC channel coding scheme based on an augmentation of the IEEE 802.15.4 PHY is proposed by Dr. Rob Maunder [69] for WSN applications. The augmentation achieves a desirable redistribution of the energy consumption in WSNs from the sensor nodes to the central node. An ASIC design of the augmentation is proposed and implemented in this chapter. At the cost of increasing the decoding complexity of the central nodes, this approach significantly reduces the communication energy consumption of the sensor nodes.
• Chapter 4 proposes a novel EXIT chart analysis technique for investigating the trade-off between complexity and performance that is associated with the word length setting of fixed-point turbo code implementations. Conventionally, BER simulations are used for this task. However, the BER result can only provide the performance of a particular fixed-point turbo code implementation. By contrast, EXIT chart analysis provides the convergence behaviour information of the decoder, which offers insights into the reasons for the performance degradation that is caused by the limited word length in fixed-point implementations. These analysis results may help the designer to find desirable parameterisations or adjust the design to have a lower word length requirement for certain applications. In addition, rather than providing results only for one particular number of decoding iterations as a BER simulation does, an EXIT simulation allows an arbitrary number of iterations to be considered. In this way, the proposed approach provides an exhaustive investigation of the candidate turbo code with a significantly lower simulation time than the BER simulation based method.
• In Chapter 5, an energy-efficient LUT-Log-BCJR decoder architecture for ASIC implementation is proposed for WSN applications. Unlike the conventional LUT-Log-BCJR architecture, which uses dedicated modules to perform different tasks and achieve a high throughput, the proposed LUT-Log-BCJR architecture performs a number of Add-Compare-Select (ACS) operations in parallel during each clock cycle, using ACS units that are designed to have a low gate count and a short critical path. As a result, the proposed architecture uses the available hardware resources more efficiently. The low complexity and balanced data paths in the architecture reduce the wastage of energy that is caused by spurious transitions and clock tree distribution. The short critical path of the proposed architecture allows it to operate at a high clock frequency without requiring complicated combinational logic to control the data path lengths. It therefore reduces the energy wastage associated with static power consumption.
• In Chapter 6, the proposed LUT-Log-BCJR architecture is employed to derive a bottom-up energy estimation framework for estimating the turbo decoder energy consumption at an early design stage. The framework is based on generalised analysis and individual post-layout simulations of the sub-modules of the proposed architecture. The EXIT chart analysis method of Chapter 4 for investigating the fixed-point specification of the hardware implementation can be used to provide fixed-point specification information for the framework. As a result, the framework allows a turbo code designer to predict the energy consumption of their turbo decoder at an early design stage, facilitating an overall energy optimised design.
• In Chapter 6, the proposed framework is used as the basis of a holistic design procedure for designing an overall energy optimised turbo code for WSN applications. By considering both the transmission energy consumption Etx and the processing energy consumption Epr, the holistic design procedure allows a turbo code design to be overall energy optimised for a particular scenario, having a particular path loss exponent and transmission range.
Chapter 2
Turbo codes, extrinsic information transfer charts and the fixed-point representation
This chapter introduces all the background information related to this thesis, including the basic principle of turbo-like codes, the encoding and decoding schemes of turbo codes, EXtrinsic Information Transfer (EXIT) chart analysis for turbo-like codes and fixed-point number representation in Application-Specific Integrated Circuit (ASIC) designs.
2.1 Turbo-like codes
The turbo principle is a concept that applies to Error-Correcting Codes (ECCs) employing iterative decoding processes, also referred to as turbo decoding processes, such as serial or parallel concatenated codes [97,101]. A unique feature of turbo-like codes is that they include two or more concatenated component codes. These types of codes were first proposed in [107]. The concatenation between the component codes in the encoding process can be parallel or serial, as shown in Figure 2.1. In a Parallel Concatenated Code (PCC), the inputs of the two encoders come from the same source; in a Serial Concatenated Code (SCC), one encoder's output provides the other encoder's input. An interleaver, which rearranges the data in a non-contiguous and random way, is used between the component encoders. The success of turbo-like codes stems from their iterative decoding process, which approaches the optimal decoding performance. The two decoding schemes corresponding to the two encoding schemes are given in Figure 2.2. An iterative decoding process is performed between the two concatenated decoders by feeding the decoded results back to each other's input. In these schemes, the decoded result improves in each iteration, until the best result is achieved after a certain number of iterations.
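The role of the interleaver and its inverse can be sketched in a few lines. This is only an illustration: the seeded pseudo-random permutation and the function names `make_interleaver`, `interleave` and `deinterleave` are assumptions made here, not the interleaver designs used later in this thesis.

```python
import random

def make_interleaver(length, seed=0):
    """Return a pseudo-random permutation of the indices 0..length-1."""
    rng = random.Random(seed)  # seeded, so that both ends can reproduce it
    pattern = list(range(length))
    rng.shuffle(pattern)
    return pattern

def interleave(seq, pattern):
    """Output element i is taken from input position pattern[i]."""
    return [seq[p] for p in pattern]

def deinterleave(seq, pattern):
    """Undo interleave(): put element i back at position pattern[i]."""
    out = [None] * len(seq)
    for i, p in enumerate(pattern):
        out[p] = seq[i]
    return out

pattern = make_interleaver(8, seed=1)
data = [1, 0, 1, 1, 0, 0, 1, 0]
shuffled = interleave(data, pattern)
print(shuffled, deinterleave(shuffled, pattern) == data)
```

Because the pattern is generated from a known seed, the receiver can construct the same permutation and hence the matching deinterleaver.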
Figure 2.1: Two concatenation ways of turbo-like codes (Parallel Concatenated Code and Serial Concatenated Code).
Figure 2.2: Two decoding schemes of two types of turbo-like codes (Parallel Decoding Scheme and Serial Decoding Scheme).
The first version of concatenated codes was the SCC. A famous example consists of a Reed-Solomon code [108] as the outer code (applied first and removed last), followed by a convolutional code [109] as the inner code (applied last and removed first) [110]. In the early concatenated coding schemes, despite including two or more component codes, there is no iterative decoding process in the decoder. The decoder generates hard decisions (i.e. the determined bit values) directly. In a communication receiver, the demodulator can produce soft decisions in the demodulation process. Instead of giving the decoded bit values, soft decisions are reliability information expressed by the a posteriori probability of each bit. Soft decisions express not just what the most likely value of a bit is, but also how likely it is, while hard decisions only express the former. In the logarithmic domain, the soft decisions become Logarithmic Likelihood Ratios (LLRs), defined as:
$$\tilde{z}_i = \ln\left(\frac{P(z_i = 0)}{P(z_i = 1)}\right), \qquad (2.1)$$
where $\tilde{z}_i$ is the soft decision of the received bit $z_i$. Before the turbo principle was discovered, a typical decoder would utilise the soft decisions in the decoding process and generate hard decisions at its output. This type of decoder is called a Soft-In Hard-Out (SIHO) decoder. Therefore, a straightforward way of decoding SCCs involves the use of a SIHO decoder for the inner decoder and a Hard-In Hard-Out (HIHO) decoder for the outer decoder. If a convolutional encoder is concerned, a Viterbi decoder [111] is used at the corresponding place to give hard decisions. As discussed in [112], the first drawback of such a structure is that the inner decoder generates hard decisions, thus preventing the outer decoder from utilising any ability to accept soft decisions at its input. The second drawback is that if the inner decoder produces a burst of consecutive errors, the outer decoder is typically unable to correct them. The second drawback can be overcome by inserting an interleaver between the outer and the inner encoder and correspondingly a deinterleaver between the inner and the outer decoder. The function of an interleaver is to rearrange the order of a sequence in a pseudo-random way. The function of a deinterleaver, with knowledge of the rearranging method of the corresponding interleaver, is to restore the order of an interleaved sequence. Thus, a burst of errors in the inner decoder's output becomes dispersed in the input to the outer decoder. The transmission scheme is shown in Figure 2.3.
Figure 2.3: Transmission scheme of SCCs (input bits, outer encoder, interleaver, inner encoder, channel, inner decoder, deinterleaver, outer decoder, decoded bits).
However, if errors occur at the output of the outer decoder, these would remain in the final decoded results. A turbo-like code can
be considered to be a refinement of the concatenated encoding schemes, employing an improved decoding process that includes iterative algorithms. The concept of turbo decoding is for a system with two component codes to pass soft decisions from the output of one decoder to the input of the other decoder, and to iterate this process many times in order to produce more reliable decisions. To obtain benefits from an iterative decoding process, it is required that the two decoders feed soft decisions to each other. This is because using hard decisions as an input to a decoder degrades its performance compared with soft decisions [113]. Therefore, turbo decoding requires Soft-In Soft-Out (SISO) decoders for the decoding of each component code. The introduction of turbo codes in [97] was also the first introduction of PCCs. It was reported that the scheme could achieve a Bit Error Rate (BER) of $10^{-5}$ using a rate 1/2 code over an Additive White Gaussian Noise (AWGN) channel and Binary Phase-Shift Keying (BPSK) modulation at an $E_b/N_0$ of 0.7 dB [97,114]. According to the discussion in [97,114], the Shannon limit for BPSK modulation at the throughput of this scheme is $E_b/N_0$ = 0 dB. Hence the performance is 0.7 dB from the Shannon limit. Most importantly, owing to its iterative decoding scheme, the complexity of a turbo decoder is much lower than that of a non-iterative decoder having the same performance. According to [115], the complexity required to allow non-iterative codes to approach the Shannon limit would not be feasible to implement. The discovery of turbo codes revolutionised the field of error correcting codes, since it allowed performance very close to the Shannon limit to be achieved in practice.
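The LLR of Equation (2.1) and the difference between soft and hard decisions can be illustrated with a short sketch. It assumes, purely for illustration, equiprobable bits and a BPSK mapping of bit 0 to +1 and bit 1 to -1 over an AWGN channel, for which the channel LLR takes the well-known form 2y/σ². The mapping and the function names are assumptions made here, not definitions from this thesis.

```python
import math

def llr_from_probability(p0):
    """LLR of Equation (2.1), given P(z = 0) = p0 and P(z = 1) = 1 - p0."""
    return math.log(p0 / (1.0 - p0))

def bpsk_awgn_llr(y, noise_variance):
    """Channel LLR of a received sample y, assuming equiprobable bits
    and the (assumed) mapping bit 0 -> +1, bit 1 -> -1."""
    return 2.0 * y / noise_variance

def hard_decision(llr):
    """The sign of the LLR gives the most likely bit value."""
    return 0 if llr >= 0 else 1

# Two samples giving the same hard decision, but different reliabilities.
for y in (0.1, 1.3):
    llr = bpsk_awgn_llr(y, noise_variance=0.5)
    print(f"y = {y}: LLR = {llr:.2f}, hard decision = {hard_decision(llr)}")
```

Both samples yield the same hard decision, but the magnitude of the LLR shows that the second is far more reliable. This is the information that a SIHO decoder discards at its output and that a SISO decoder passes on.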
To evaluate the performance of a turbo or turbo-like code, a BER chart is a commonly used tool. A typical BER chart of turbo codes is shown in Figure 2.4. Here, the Y axis is the BER of the decoding result after a certain number of decoding iterations and the X axis is $E_b/N_0$, where $E_b$ is the transmission energy per bit and $N_0$ is the noise power spectral density (i.e. noise power in a 1 Hz bandwidth). As shown in Figure 2.4, a typical turbo code can achieve a very low BER when the $E_b/N_0$ exceeds a certain threshold. The region where the BER curve rapidly falls is called the turbo cliff region and the region where the BER curve is flat at a very low BER is called the error floor region.
Figure 2.4: A typical BER chart for turbo codes (BER on a logarithmic Y axis versus $E_b/N_0$ on the X axis, with the threshold $E_b/N_0$, the turbo cliff and the error floor marked).
To understand how the turbo codes outperform the earlier coding schemes, Figure 2.5 is quoted from [115]. This shows simulation results of the original rate R=1/2 turbo code presented in [97] and a maximum free distance (MFD) R=1/2, memory-14 convolutional code with Viterbi Algorithm (VA) decoding. The simulation results show that the turbo code outperforms the convolutional code by 1.7 dB at a BER of $10^{-5}$. The comparison is striking, especially since a detailed complexity analysis reveals that the complexity of the turbo decoder is much lower than that of the Viterbi decoder used for the convolutional code.
Figure 2.5: Performance comparison of a turbo code and a convolutional code. © C. Schlegel et al. 2004 [115]
Based on the turbo principle, a classical turbo encoder is composed of two Recursive Systematic Convolutional (RSC) encoders, as shown in Figure 2.6. In addition to the outputs shown in Figure 2.1, a systematic output is usually included in practice.
Figure 2.6: A classical turbo encoder scheme.
The input information sequence is encoded twice by the two RSC encoders. The first encoder processes the information b1 in its original order, while the second encoder processes the same sequence, but in a different order b3 that is imposed by an interleaver. In this scheme, the systematic bit sequence is also transmitted to the decoder. As shown in the figure, sequences b2 and b4 are the outputs of each encoder. Sequence b1 is the systematic bit sequence and b3 is the interleaved systematic bit sequence. Note that b3 is not transmitted since it can be obtained by an identical interleaver in the decoder.
In the decoding process, as shown in Figure 2.7, two A Posteriori Probability (APP) decoders are used, corresponding to the two convolutional encoders in the encoding scheme. In the figure, $\tilde{b}_1$, $\tilde{b}_2$ and $\tilde{b}_4$ are the LLR sequences corresponding to the bit sequences b1, b2 and b4 in Figure 2.6.
Figure 2.7: A classical turbo decoder scheme.
The purpose of an APP decoder is to compute a posteriori probabilities on either the information bits or the encoded symbols. Its application in turbo-like codes has made it the major representative of the SISO decoders. The algorithm was originally invented by Bahl, Cocke, Jelinek and Raviv in 1972, as the so-called Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm [105]. Its capability of generating soft decisions is well suited to iterative decoding. In Figure 2.7, the two decoders work alternately in an iterative way. To get the correct order of the input sequences, an interleaver identical to the one used in the encoding scheme and a corresponding deinterleaver are used between the decoders. Furthermore, an extra interleaver is used for providing the systematic sequence to both of the decoders. The main advantage of this decoding process compared with using the Viterbi decoders is that
it utilises the ability of the decoders to accept soft decisions at their input. However, in iterative decoding schemes, the information provided to one decoder by the other is extrinsic information instead of a posteriori information. The extrinsic information represents only the new information obtained by a decoder. The reason for using extrinsic information is to prevent the decoding scheme from being a positive feedback amplifier [112]. As shown in Figure 2.7, the a priori information from the systematic sequence is added to the input of the decoders. Since the a posteriori information already includes the a priori information from the previous decoding process of the other decoder, this would create a positive feedback amplifier in the loop. By using extrinsic information instead of a posteriori information, this problem can be solved. Therefore, the output of the decoders in Figure 2.7 is extrinsic information. It can be obtained using a simple subtraction between the a posteriori and the a priori LLRs. Alternatively, it can also be generated directly by a modified BCJR algorithm. By receiving the new extrinsic information from the other decoder, the reliability of the decoding increases in each iteration. The whole decoding process stops when the required reliability is reached or when no further reliability can be gleaned.
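The subtraction that yields the extrinsic information can be sketched as follows. The function name `extrinsic_llrs` and the LLR values are hypothetical and serve only to illustrate the operation described above.

```python
def extrinsic_llrs(a_posteriori, a_priori):
    """Remove the a priori LLRs from the a posteriori LLRs, leaving only
    the new information that the decoder itself has contributed."""
    return [post - prior for post, prior in zip(a_posteriori, a_priori)]

# Hypothetical LLR values for a block of four bits.
a_priori = [0.5, -0.25, 0.0, 1.5]       # received from the other decoder
a_posteriori = [2.5, -1.0, 0.75, 1.0]   # produced by this decoder
print(extrinsic_llrs(a_posteriori, a_priori))
```

If the a posteriori LLRs were fed back unchanged, the a priori part would be counted again in every iteration, which is the positive feedback effect that the use of extrinsic information avoids.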
A further improved version of the BCJR algorithm is called the Logarithmic Bahl-Cocke-Jelinek-Raviv (Log-BCJR) algorithm [115]. This is a version of the BCJR algorithm that has been transferred into the logarithmic domain. Its purpose is to avoid the high number of multiplication operations that are required in the original BCJR algorithm. More importantly, the Log-BCJR algorithm has variables with a much more manageable dynamic range than those of the BCJR algorithm, reducing the memory requirement and allowing fixed-point processing to be used. Since it avoids the complex circuit implementation caused by the many multipliers required by the original BCJR algorithm and requires much less memory, the Log-BCJR algorithm is widely used in practice. Hence, in this thesis, since only the practical applications of the BCJR algorithm in iterative decoding schemes are investigated, the discussed and simulated algorithm is the modified Log-BCJR algorithm that generates the extrinsic information directly. The detail of the algorithm is discussed in the next section.
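As a brief illustration of why the logarithmic domain avoids multiplications, products of probabilities become additions of logarithms, while sums of probabilities can be computed using the Jacobian logarithm, ln(e^a + e^b) = max(a, b) + ln(1 + e^(-|a-b|)). The sketch below shows this identity together with a look-up table approximation of the correction term. The table size and step are hypothetical choices made here and are not the parameters used in this thesis.

```python
import math

def max_star_exact(a, b):
    """Jacobian logarithm: ln(exp(a) + exp(b)) without leaving the log domain."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

# Hypothetical look-up table for the correction term ln(1 + exp(-x)),
# sampled at x = 0, 0.5, 1.0, ..., 3.5.
LUT_STEP = 0.5
LUT = [math.log1p(math.exp(-i * LUT_STEP)) for i in range(8)]

def max_star_lut(a, b):
    """Approximate max* using a max operation plus a table look-up."""
    index = int(abs(a - b) / LUT_STEP)
    # Beyond the table, the correction term is small and is taken as zero.
    correction = LUT[index] if index < len(LUT) else 0.0
    return max(a, b) + correction

for a, b in [(0.0, 0.0), (1.0, -2.0), (-4.0, 6.0)]:
    print(a, b, max_star_exact(a, b), max_star_lut(a, b))
```

The approximate version needs only a comparison, an addition and a small table, which hints at why look-up table based variants are attractive for hardware implementation.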
The turbo principle can also be applied to SCCs, giving another primary category of turbo-like codes, referred to as Serial Concatenated Convolutional Codes (SCCCs) [101]. Instead of using the decoding scheme in Figure 2.3, a scheme similar to the turbo decoder is used, as shown in Figure 2.2. The two hard-output decoders are replaced with SISO decoders and an interleaver and a deinterleaver are required to form the iterative decoding loop. According to [112], serial turbo-like codes perform better than parallel turbo codes in the error floor region. On the other hand, in the turbo cliff region, parallel turbo codes perform better at the same overall coding rate. Moreover, Low-Density Parity-Check (LDPC) codes [98] are another widely applied family of near Shannon limit ECCs that apply the turbo principle using an iterative decoding scheme.
2.2 Turbo encoder and decoder
Turbo-like codes generally have a simple encoding scheme and a relatively complicated decoding scheme, owing to the high computational complexity of the Log-BCJR algorithm. In this section, the turbo code in the 3rd Generation Partnership Project (3GPP) Universal Mobile Telecommunications System (UMTS)/Long Term Evolution (LTE) standards [103,116] is used as an example to introduce the typical turbo coding schemes and the widely used decoding algorithm for turbo codes, the Log-BCJR algorithm. The UMTS/LTE turbo code is also used as an application example throughout this thesis, so the description in this section is frequently referred to in later chapters.
2.2.1 Universal Mobile Telecommunications System (UMTS)/Long Term
Evolution (LTE) turbo encoder and decoder schematics
To simplify the description, BPSK modulation is assumed here, so that each transmitted symbol conveys one bit. For other modulation methods, the transmitted bits here would be replaced by transmitted symbols. According to [103,116], each of the concatenated RSC encoders of the UMTS/LTE turbo code is a rate R=1/2 convolutional encoder having a constraint length of K=4 and a memory of m=3. Two identical RSC encoders of this type form the rate 1/3, 8-state Parallel Concatenated Convolutional Code (PCCC) illustrated in Figure 2.8.
Figure 2.8: Scheme of the UMTS/LTE turbo encoder.
As shown in the figure, each rate R=1/2 encoder gives an encoded output sequence. In addition, the systematic sequence is also given as an output. Since the systematic outputs of both encoders are identical, only one of them needs to be output and the whole encoder has a rate approaching R=1/3. In the RSC encoder, the three memory bits form an 8-state Finite-State Machine (FSM).
Notation $N$ is defined to represent the length of a bit sequence. The length of the uncoded
sequence $b_1$ is $N_{b1}$. Before the encoding of the bit sequence $b_1$ commences, the shift
registers of each concatenated convolutional code are initialised in a state that is known
to the receiver. Typically, the $m=3$ memory elements of each shift register are initialised
with logic-zeros, placing them in what is referred to as the 'all zero' state. However,
following the encoding of the $N_{b1}$ bits in the sequence $b_1$, the shift registers will enter
states that are not inherently known to the receiver. A number of techniques have been
proposed to cope with this [112]:
• No termination: In this case, the decoder treats every state as equally likely at the
end of a block sequence. No information about the final state is provided to the
decoder. The decoding process is then less effective for the last bits of the encoded
data and the performance may be reduced. This degradation is a function of the
block length. However, for some applications the degradation may be acceptable.
• Termination: This method involves the transmission of several extra bits (three
bits in the UMTS/LTE example) at the end of each block sequence to force the
encoder to return to the 'all zero' state. The UMTS/LTE turbo code of Figure 2.8
is an example of this technique. The extra tail bits on $b_2$ and $b_4$, and the extra
sequences $b_5$ and $b_6$, also need to be sent to the decoder. This method overcomes
the uncertain final state issue but introduces two drawbacks. Firstly, extra
redundancy is added to the transmission. Nevertheless, the redundancy is negligible
except for very short blocks and it is useful for error correction. Secondly, for
parallel codes, the tail bits $b_5$ and $b_6$ are not identical for each constituent code,
which means that, in the iterative decoding process, the extrinsic information of
the tail bits cannot be exchanged between the decoders. Hence, the data at the end
of the block sequence will get less benefit from the turbo decoding process. The
SCCC also has a similar problem.
• Tail-biting: [117] introduced a technique which allows any state of the encoder
as the initial state. This method involves a double encoding process. Firstly, a
normal encoding of the sequence starting from the 'all zero' state is performed, but
the output of the encoder is ignored; only the final state of the encoder is stored.
Secondly, the encoding process is performed again in order to actually generate
the output. In this step, the initial state is a function of the final state previously
stored. The result of this process is that the final state of the encoder is equal to its
initial state. The advantage of this method is that no extra bits have to be added and
transmitted. However, the double encoding process is the main drawback of this
method. In addition, it only works for convolutional codes for which the BCJR
algorithm has been specially adapted.
In the UMTS/LTE turbo code, the termination technique is used, as shown in Figure 2.8.
The initial states of the shift registers are all set to zero before starting to encode the
bit sequence $b_1$. Note that once the encoding of $b_1$ is finished, the two switches in
the figure switch down to form a closed loop in the two encoders. Following this is the
termination process. The termination is performed by taking the tail bits from the shift
register feedback after all information bits are encoded. It takes $m$ bits to force the final
state back to the 'all zero' state for each encoder. The output of the turbo encoder is $b_1$,
$b_2$, $b_5$, $b_4$ and $b_6$, where $b_1$ is the systematic bit sequence, $b_2$ and $b_4$ are the encoded
bit sequences from the two encoders, respectively, and $b_5$ and $b_6$ are the termination
sequences of the two encoders, respectively. As a result, in the case where $b_1$ has $N_{b1}$
bits, $b_2$ and $b_4$ will have $N_{b2} = N_{b4} = N_{b1} + m$ bits each, while $b_5$ and $b_6$ will have
$N_{b5} = N_{b6} = m$ bits each. The possible block lengths of the turbo code (i.e. the length
of bit sequence $b_1$) for the UMTS and the LTE standards are $N_{b1} \in [40, 5114]$ and
$N_{b1} \in [40, 6144]$, respectively. For the interleaved sequence $b_3$, the length is $N_{b3} = N_{b1}$.
The termination sequences $b_5$ and $b_6$ have a length equal to the number of memory bits in
the RSC encoders, which is $m = 3$. Consequently, for the encoded sequences $b_2$ and $b_4$,
$N_{b2} = N_{b4} = N_{b1} + 3$. Note that the additional termination bits (the sequences $b_5$ and $b_6$
and the tails of $b_2$ and $b_4$) make the coding rate of the encoder
$R = N_{b1} / (3N_{b1} + 12)$, which is slightly lower than 1/3.
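As a quick numerical check of the rate expression above, the following minimal Python sketch evaluates $R = N_{b1}/(3N_{b1}+12)$ at the block lengths quoted for the two standards. The function name `turbo_code_rate` is an illustrative assumption and does not come from the standards.

```python
from fractions import Fraction

def turbo_code_rate(n_b1, m=3):
    """Coding rate of the terminated rate-1/3 PCCC (illustrative helper).

    Transmitted bits: n_b1 systematic bits, two encoded sequences of
    n_b1 + m bits each, and two termination sequences of m bits each.
    """
    transmitted = n_b1 + 2 * (n_b1 + m) + 2 * m   # equals 3*n_b1 + 12 for m = 3
    return Fraction(n_b1, transmitted)

for n in (40, 5114, 6144):
    r = turbo_code_rate(n)
    print(n, r, float(r))
```

The rate loss is most visible for short blocks: for $N_{b1} = 40$ the rate is 40/132, roughly 0.303, whereas for $N_{b1} = 6144$ it is within 0.1% of 1/3.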
To understand the operation of the FSM in the component encoders, the transitions
between the states can be shown as the transition diagrams in Figure 2.9.
$b_1 = \{b_{1,j}\}_{j=1}^{N_{b1}}$ is the input sequence, while $b_2 = \{b_{2,j}\}_{j=1}^{N_{b1}+3}$ and
$b_5 = \{b_{5,j}\}_{j=N_{b1}+1}^{N_{b1}+3}$ are the output sequences. $M_1$, $M_2$ and $M_3$ are the
current values of the three memory bits in the encoder. $M_1^+$, $M_2^+$ and $M_3^+$ are the
next values of the three memory bits. The transitions of the values and the encoding
results can be expressed by the following equations.
• For encoding bits, where $j \in [1, N_{b1}]$:
$$M_1^+ = b_{1,j} \oplus M_2 \oplus M_3 \tag{2.2}$$
$$M_2^+ = M_1 \tag{2.3}$$
$$M_3^+ = M_2 \tag{2.4}$$
$$b_{2,j} = M_1^+ \oplus M_1 \oplus M_3 \tag{2.5}$$
• For termination bits, where $j \in [N_{b1}+1, N_{b1}+3]$:
$$M_1^+ = 0 \tag{2.6}$$
$$M_2^+ = M_1 \tag{2.7}$$
$$M_3^+ = M_2 \tag{2.8}$$
$$b_{5,j} = M_2 \oplus M_3 \tag{2.9}$$
$$b_{2,j} = 0 \oplus M_1 \oplus M_3 \tag{2.10}$$
Figure 2.9: Scheme of the convolutional encoder and the state transition diagrams.
[Panels: transition trellis for encoding bits; transition trellis for termination bits.]
The eight possible states correspond to State1 to State8, as shown in the figure. The
trellis diagrams give all the possible transitions of the FSM. The left trellis diagram
shows the transitions for the encoding bits in a sequence. The right diagram shows the
transitions for the termination bits in a sequence. Note that the first state in a
transition sequence is 'all zero', which is State1 in the figure. With the termination
technique, the last state in the sequence is forced back to State1. This limits the
possible transitions in the first three steps and the last three steps. A
transition trellis diagram of a transition sequence is shown in Figure 2.10. The input
sequence is $\{b_{1,j}\}$, and the output sequences $\{b_{2,j}\}$ and $\{b_{5,j}\}$ can be obtained by
tracking the state transitions in Figure 2.10. For instance, for a 5-bit input sequence
example $b_1 = [0,1,1,0,1]$, the transitions in the trellis are shown in Figure 2.11. Note
that there are 8 steps in the trellis since $m = 3$ termination bits are included. The
encoded bit sequence would be $b_2 = [0,1,0,0,1,0,1,1]$ and the actually transmitted
systematic bit sequence is $[b_1, b_5] = [0,1,1,0,1,0,0,1]$. The trellis diagram is not only
helpful for understanding the encoding operations of a convolutional code, but also useful
for explaining the Log-BCJR decoding algorithm, as shall be discussed later.
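The encoding and termination rules of Equations (2.2) to (2.10) can be traced in software. The following is a minimal Python sketch of one constituent RSC encoder, written directly from those equations; the function name `rsc_encode` and the list-based interface are assumptions made for illustration only.

```python
def rsc_encode(b1, m=3):
    """Encode b1 with the 8-state RSC code, then run m termination steps.

    Returns (b2, b5): the encoded bits, including the tail, and the
    termination bits that are sent on the systematic output.
    """
    m1 = m2 = m3 = 0                 # 'all zero' initial state (State1)
    b2, b5 = [], []
    for bit in b1:                   # encoding bits
        m1_next = bit ^ m2 ^ m3      # Equation (2.2)
        b2.append(m1_next ^ m1 ^ m3) # Equation (2.5)
        m1, m2, m3 = m1_next, m1, m2 # Equations (2.3) and (2.4)
    for _ in range(m):               # termination bits
        b5.append(m2 ^ m3)           # Equation (2.9)
        b2.append(0 ^ m1 ^ m3)       # Equation (2.10)
        m1, m2, m3 = 0, m1, m2       # Equations (2.6) to (2.8)
    assert (m1, m2, m3) == (0, 0, 0) # the trellis is back in State1
    return b2, b5

b2, b5 = rsc_encode([0, 1, 1, 0, 1])
print("b2 =", b2)
print("b5 =", b5)
```

For the 5-bit example input, this sketch yields the input/output pairs 0/0, 1/1, 1/0, 0/0, 1/1, 0/0, 0/1 and 1/1 over the eight trellis steps.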
Figure 2.10: The trellis diagram of the convolutional code of Figure 2.9.
Figure 2.11: An example transition sequence in the trellis diagram of Figure 2.10.
The architecture of the decoder is shown in Figure 2.12. A feedback loop is formed
between Decoder 1 and Decoder 2 to realise the iterative decoding process. Each
iteration consists of two half iterations, one for each constituent RSC code. The two
decoders operate alternately since the input of one decoder includes the output of the
other decoder from the previous half iteration. The operation of the RSC decoder (i.e.
the Log-BCJR algorithm) is described in Section 2.2.2. In the figure, the input of the
decoding scheme is assumed to be a sequence of LLRs. The five inputs $\tilde{b}^a_1$, $\tilde{b}^a_2$,
$\tilde{b}^a_4$, $\tilde{b}^a_5$ and $\tilde{b}^a_6$ are the input LLR sequences corresponding to the coded outputs
$b_1$, $b_2$, $b_4$, $b_5$ and $b_6$ in the encoding scheme. Each decoder receives two LLR
sequences. One is the LLR sequence corresponding to the encoded sequence, which is
received from the transmission channel directly ($\tilde{b}^a_2$ for Decoder 1 and $\tilde{b}^a_4$ for
Decoder 2). The other is the uncoded a priori LLR sequence from the other decoder
($\tilde{c}^a_1$ for Decoder 1 and $\tilde{c}^a_2$ for Decoder 2), which is simply formed by adding the
extrinsic information provided by the other decoder ($\tilde{b}^{a'}_1$ and $\tilde{b}^{a'}_2$) to the received
systematic information ($\tilde{b}^a_1$). The extrinsic information ($\tilde{b}^e_1$ and $\tilde{b}^e_2$) generated by
the decoders needs to be rearranged by the proper interleaver ($\pi$) or deinterleaver
($\pi^{-1}$). For Decoder 1, the input LLR sequence from the other decoder, $\tilde{c}^a_1$, is the sum
of $\tilde{b}^{a'}_1$ and $\tilde{b}^a_1$, followed by $\tilde{b}^a_5$, as shown in the figure. Because the two encoders
have independent tails, the soft decisions of the tail bits are not passed between the
decoders. Thus the information of the termination bits needs to be considered carefully.
The systematic information of Decoder 1's termination bits, $\tilde{b}^a_5$, needs to be appended
at the end for a complete $\tilde{c}^a_1$. Therefore, $\tilde{c}^a_1 = [\tilde{b}^a_1 + \tilde{b}^{a'}_1 ; \tilde{b}^a_5]$. On the other
hand, for the extrinsic information generated by Decoder 1, $\tilde{c}^e_1$, the information of
the termination bits needs to be removed before it is interleaved and passed to Decoder
2, as shown in the figure. Therefore, the lengths of $\tilde{c}^a_1$ and $\tilde{c}^e_1$ are $N_{c1} = N_{b1} + 3$.
For Decoder 2, correspondingly, the uncoded a priori information is the sum of $\tilde{b}^{a'}_2$
(i.e. the interleaved $\tilde{b}^e_1$) and the interleaved systematic information $\tilde{b}^a_2$. Therefore,
$\tilde{c}^a_2 = [\tilde{b}^a_2 + \tilde{b}^{a'}_2 ; \tilde{b}^a_6]$ and the same processing of the termination bits is applied
to $\tilde{c}^a_2$ and $\tilde{c}^e_2$. The lengths of $\tilde{c}^a_2$ and $\tilde{c}^e_2$ are $N_{c2} = N_{b2} + 3 = N_{b1} + 3$. In
the first iteration, $\tilde{b}^e_2$ is initialised with a sequence of zero-valued LLRs, which implies
that the values of the corresponding bits are completely unknown. $\tilde{b}^a_1$ is the received
systematic information. Note that two identical copies of the interleaver employed in the
encoding scheme and a corresponding deinterleaver are used between the decoders, in
order to give the correct ordering of the input sequences. As discussed before, in the
Log-BCJR algorithm, the extrinsic information is directly generated by the decoding
algorithm, inside the decoder. After all the iterations are completed, the a posteriori
output of the decoding scheme is obtained by adding the final extrinsic output to the
final a priori input of Decoder 1, $\tilde{b}^p_1 = \tilde{b}^a_1 + \tilde{b}^e_1 + \tilde{b}^{a'}_1$, as shown in the figure.
The SISO decoding process is then complete. Based on the soft decisions, hard bit
decisions can be taken to give the final decoding result.
Figure 2.12: Scheme of the UMTS/LTE turbo decoder.
2.2.2 Look-Up Table based Logarithmic Bahl-Cocke-Jelinek-Raviv algorithm
As discussed in Section 2.1, the Log-BCJR algorithm [118] is typically used in practice
because, compared with the original BCJR algorithm [105], it does not require
multiplications and its internal variables have a much lower dynamic range, allowing
a fixed-point number representation to be employed. To reduce the computational
complexity, there are two popular variations of the Log-BCJR algorithm, namely the
Look-Up Table based Logarithmic Bahl-Cocke-Jelinek-Raviv (LUT-Log-BCJR) and the
Maximum Logarithmic Bahl-Cocke-Jelinek-Raviv (Max-Log-BCJR) algorithms. In this
section, the Log-BCJR algorithm is described in detail. The two variations are also
introduced.
The two fundamental operations of the original BCJR algorithm are addition and
multiplication. For $p = \ln(P)$ and $q = \ln(Q)$, multiplication in the normal domain becomes
addition in the logarithmic domain:
$$\ln(PQ) = \ln(e^p e^q) = p + q. \tag{2.11}$$
Addition in the normal domain can be performed in the logarithmic domain using the
Jacobian logarithm, which is defined using max*, as given in Equation 2.12:
$$\ln(P + Q) = \ln(e^p + e^q) = \max(p,q) + \ln(1 + e^{-|p-q|}) = \operatorname{max^*}(p,q). \tag{2.12}$$
The max* function is usually computed by successive pairwise operations when there
are more than two operands, since it is associative. In practice, the function
$f_c = \ln(1 + e^{-|p-q|})$ can be implemented by a Look-Up Table (LUT), so that it is
evaluated by a select operation on the LUT. The LUT-based version of the Log-BCJR
algorithm is called the LUT-Log-BCJR algorithm. In the LUT-Log-BCJR algorithm, the max
operation can be performed by a compare operation between $p$ and $q$. As a result, all
the operations required by the LUT-Log-BCJR algorithm are either 'add', 'compare'
or 'select', namely the Add-Compare-Select (ACS) operations. The other variation
of the Log-BCJR algorithm, the Max-Log-BCJR algorithm, simply replaces the max*
operation with a max operation, which further reduces the computational complexity but
also induces a performance degradation.
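The three ways of evaluating max* described above can be compared with a short Python sketch. The LUT size and step (8 entries spaced 0.5 apart) are hypothetical choices made only for this illustration; they are not the table used in this thesis.

```python
import math

def max_star_exact(p, q):
    """Jacobian logarithm of Equation (2.12), as used by Log-BCJR."""
    return max(p, q) + math.log1p(math.exp(-abs(p - q)))

# Hypothetical LUT for f_c = ln(1 + e^-|p-q|): 8 entries, step 0.5.
LUT_STEP = 0.5
LUT = [math.log1p(math.exp(-i * LUT_STEP)) for i in range(8)]

def max_star_lut(p, q):
    """LUT-Log-BCJR style max*: compare, select a correction, add."""
    index = int(abs(p - q) / LUT_STEP)                  # select
    correction = LUT[index] if index < len(LUT) else 0.0
    return max(p, q) + correction                       # compare and add

def max_star_maxlog(p, q):
    """Max-Log-BCJR approximation: the correction term is dropped."""
    return max(p, q)

for p, q in [(0.0, 0.0), (1.0, 2.0), (-3.0, 4.0)]:
    print(p, q, max_star_exact(p, q), max_star_lut(p, q), max_star_maxlog(p, q))
```

The correction term is largest, $\ln 2$, when $p = q$ and decays quickly as $|p - q|$ grows, which is why a small table can approximate it.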
In order to present the Log-BCJR algorithm, the RSC code employed in the UMTS/LTE
turbo code is taken as an example. Figure 2.13 shows the example trellis diagram
provided in Section 2.2. For Decoder 1 in Figure 2.12, $b_1 = \{b_{1,j}\}_{j=1}^{N_{b1}}$ are the
systematic bits, $b_5 = \{b_{5,j}\}_{j=N_{b1}+1}^{N_{b1}+3}$ are the tail bits and $b_2 = \{b_{2,j}\}$ are the
encoded bits. There are three tail bits used for termination, as shown in the trellis, in
order to drive the encoder back into the 'all zero' state, State1. To simplify the notation,
the sequence $\{b_{1,j}, b_{5,j}\}$ is replaced by $x = \{x_j\}_{j=1}^{N_{b1}+3}$ and the sequence
$b_2 = \{b_{2,j}\}$ is replaced by $y = \{y_j\}_{j=1}^{N_{b1}+3}$. The same decoding trellis can also be
applied to Decoder 2 in Figure 2.12.
Figure 2.13: An example transition sequence of a short terminated trellis.
In each step of the trellis diagram, there are 16 possible transitions, except in the
three initial steps at the start and the three termination steps at the end.
For a certain input systematic sequence $\{x_j\}$ and the corresponding encoded sequence
$\{y_j\}$, only one transition is used in each step of an encoding trellis, as exemplified in
Figure 2.13. The corresponding systematic bit and encoded bit of each transition are also
given in the figure. To represent each state in Figure 2.13, the notation
$\{S_1, S_2, S_3, \ldots, S_{38}\}$ is used to identify the possible states in the trellis, in order
from top to bottom and from left to right, as shown in the figure. Similarly, the notation
$\{T_1, T_2, T_3, \ldots, T_{60}\}$ is used to denote each possible transition in the trellis, in the
same order. In addition, the notation $t_j$ is used to represent the transition employed in
the encoder trellis for the $j$th bit. Similarly, $s_j$ is the state entered by the encoder after
the $j$th bit. Therefore, for the example sequences $x$ and $y$ in Figure 2.13, the traced
transition sequence is $t = \{T_1, T_4, T_{12}, T_{28}, T_{46}, T_{54}, T_{58}, T_{60}\}$ and the traced state
sequence is $s = \{S_1, S_2, S_6, S_{14}, S_{23}, S_{31}, S_{35}, S_{37}, S_{38}\}$. For describing the
algorithm, the following notations are defined.
• $\mathrm{fr}(T)$ is the starting state of the transition $T$. For example, in Figure 2.13,
$\mathrm{fr}(T_1) = S_1$ and $\mathrm{fr}(T_3) = S_2$.
• $\mathrm{to}(T)$ is the ending state of the transition $T$. For example, in Figure 2.13,
$\mathrm{to}(T_1) = S_2$ and $\mathrm{to}(T_2) = S_3$.
• $\mathrm{fr}(S)$ is the set of all transitions that start from state $S$. For example, in
Figure 2.13, $\mathrm{fr}(S_2) = \{T_3, T_4\}$.
• $\mathrm{to}(S)$ is the set of all transitions that end at the state $S$. For example, in
Figure 2.13, $\mathrm{to}(S_{38}) = \{T_{59}, T_{60}\}$.
• $B_x(T)$ is the value for the bit in $x$ that is implied by the transition $T$. For example,
in Figure 2.13, $B_x(T_1) = 0$ since transition $T_1$ implies that $x_1 = 0$. Similarly,
$B_x(T_4) = 1$.
• $B_y(T)$ is the value for the bit in $y$ that is implied by the transition $T$. For example,
in Figure 2.13, $B_y(T_1) = 0$ and $B_y(T_2) = 1$.
• $J(T)$ is the bit index associated with the transition $T$. For example, in Figure 2.13,
$J(T_1) = J(T_2) = 1$ and $J(T_3) = J(T_4) = J(T_5) = J(T_6) = 2$.
Using the notations defined above, the Log-BCJR algorithm is described as follows.
The ultimate purpose of the algorithm is to calculate the extrinsic LLRs of the decoded
sequence $\tilde{x}^e$, based on the a priori LLRs $\tilde{x}^a$ and $\tilde{y}^a$ that are passed to the decoder.
The calculation of the extrinsic LLRs $\tilde{x}^e$ requires the calculation of another three
groups of internal variables, $\gamma$, $\alpha$ and $\beta$.
• The $\gamma$ values are conditional transition probabilities. In this case, the $\gamma$ values are
divided into two sub-groups, the a priori transition probability $\gamma_x$ and the channel
transition probability $\gamma_y$. They correspond to each transition in the trellis. For
each transition in each step, there is a $\gamma_x(T)$ and a $\gamma_y(T)$. For example, for a
traced transition sequence $t$, $\gamma_x(T)$ represents the probability
$\ln[P(t_{J(T)} = T \mid \tilde{x}^a_j)]$, and $\gamma_y(T)$ represents the probability
$\ln[P(t_{J(T)} = T \mid \tilde{y}^a_j)]$.
• The $\alpha$ values correspond to each state in each step in the trellis. $\alpha(S)$ is the
conditional probability that in step $j$ (i.e. the decoding process is working on
the trellis step that corresponds to the received $\tilde{x}^a_j$ and $\tilde{y}^a_j$), the traversed path
reaches a particular state $S$, that is $\alpha(S)$ represents the probability
$\ln[P(S_{J(T)} = S \mid \{\tilde{x}^a_j\}_{j=1}^{J(T)}, \{\tilde{y}^a_j\}_{j=1}^{J(T)})]$.
• The $\beta$ values, on the other hand, are the conditional probabilities of the traversed
path starting from a particular state, that is $\beta(S)$ represents the probability
$\ln[P(S_{J(T)} = S \mid \{\tilde{x}^a_j\}_{j=J(T)+1}^{N_x}, \{\tilde{y}^a_j\}_{j=J(T)+1}^{N_y})]$.
Finally, the three groups of variables can be used to calculate the probability that
the encoder traversed a specific transition $T$ in the trellis. The notation $\delta$ is defined to
represent such a probability. For calculating the extrinsic information, a group of $\delta_x$
is considered here. For a particular $T$, $\delta_x(T)$ represents the probability
$\ln[P(t_{J(T)} = T \mid \{\tilde{x}^a_j\}_{j=1, j \neq J(T)}^{N_x}, \{\tilde{y}^a_j\}_{j=1}^{N_y})]$. It is the joint probability of
the corresponding $\gamma_y$, $\alpha$ and $\beta$ of the transition $T$.
For computing all these internal variables, the Log-BCJR algorithm is composed of the
following steps.
1. $\gamma$ calculation: The values of $\gamma$ depend on the inputs of the Log-BCJR decoder.
There are two inputs, the uncoded LLR input $\tilde{x}$ and the encoded LLR input
$\tilde{y}$. As shown in Figure 2.8, the encoded LLR input is the LLRs of the encoded
sequence received from the channel, $\tilde{y}^a_j$. The uncoded LLR input is $\tilde{x}^a_j$. For a
transition $T$, $\gamma_x$ and $\gamma_y$ can be calculated as:
$$\gamma_x(T) = (1 - B_x(T))\,\tilde{x}^a_{J(T)}, \tag{2.13}$$
$$\gamma_y(T) = (1 - B_y(T))\,\tilde{y}^a_{J(T)}. \tag{2.14}$$
2. $\alpha$ calculation: The values of $\alpha$ depend on the $\gamma$ values and the $\alpha$ values from the
previous step in the trellis. Hence, a forward recursion through the trellis is required to
obtain all the $\alpha$ values. For a state $S$, in step $j$, the function to calculate $\alpha$ is:
$$\alpha(S) = \operatorname*{max^*}_{T \in \mathrm{to}(S)} \big(\gamma_x(T) + \gamma_y(T) + \alpha(\mathrm{fr}(T))\big), \tag{2.15}$$
where $\alpha(S_1) = 0$, in the case of Figure 2.13.
3. $\beta$ calculation: The values of $\beta$ depend on the $\gamma$ values and the $\beta$ values from the next
step in the trellis. Hence, a backward recursion through the trellis is required to obtain
all the $\beta$ values. For a state $S$, in step $j$, the function to calculate $\beta$ is:
$$\beta(S) = \operatorname*{max^*}_{T \in \mathrm{fr}(S)} \big(\gamma_x(T) + \gamma_y(T) + \beta(\mathrm{to}(T))\big), \tag{2.16}$$
where $\beta(S_{38}) = 0$, in the case of Figure 2.13.
4. $\delta_x$ calculation: The values of $\delta_x$ can be calculated according to (2.17).
$$\delta_x(T) = \gamma_y(T) + \alpha(\mathrm{fr}(T)) + \beta(\mathrm{to}(T)) \tag{2.17}$$
5. Finally, the extrinsic information can be calculated based on the $\delta_x$ values. The
extrinsic LLRs of the uncoded bits $\tilde{x}^e$ are:
$$\tilde{x}^e_j = \operatorname*{max^*}_{T \,\mid\, B_x(T)=0,\; J(T)=j} \big[\delta_x(T)\big] - \operatorname*{max^*}_{T \,\mid\, B_x(T)=1,\; J(T)=j} \big[\delta_x(T)\big]. \tag{2.18}$$
This completes the algorithm.
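The recursions of Equations (2.13) to (2.18) can be sketched in Python for the terminated 8-state RSC trellis. This is a floating-point illustration under stated assumptions, not the decoder implemented in this thesis: the function names, the integer state encoding and the noise-free test input are all choices made for the example, and an LLR is taken as $\ln[P(0)/P(1)]$, so a positive value favours bit 0.

```python
import math

NEG_INF = float("-inf")

def max_star(p, q):
    """Jacobian logarithm ln(e^p + e^q) of Equation (2.12)."""
    if p == NEG_INF:
        return q
    if q == NEG_INF:
        return p
    return max(p, q) + math.log1p(math.exp(-abs(p - q)))

def build_trellis(n_info, m=3):
    """Per trellis step, list the transitions as (fr, to, Bx, By) tuples."""
    steps = []
    for j in range(n_info + m):
        step = []
        for s in range(8):                       # s encodes (M1, M2, M3)
            m1, m2, m3 = (s >> 2) & 1, (s >> 1) & 1, s & 1
            # Information steps accept x = 0 or 1; tail steps force x = M2 xor M3.
            for x in ((0, 1) if j < n_info else (m2 ^ m3,)):
                m1_next = x ^ m2 ^ m3            # Equations (2.2) and (2.6)
                y = m1_next ^ m1 ^ m3            # Equations (2.5) and (2.10)
                step.append((s, (m1_next << 2) | (m1 << 1) | m2, x, y))
        steps.append(step)
    return steps

def log_bcjr(xa, ya, n_info):
    """Extrinsic LLRs of x from the a priori LLRs xa and ya."""
    steps = build_trellis(n_info)
    n = len(steps)

    def gamma(j, x, y):                          # Equations (2.13) and (2.14)
        return (1 - x) * xa[j], (1 - y) * ya[j]

    alpha = [[NEG_INF] * 8 for _ in range(n + 1)]
    alpha[0][0] = 0.0                            # the encoder starts in State1
    for j in range(n):                           # forward recursion (2.15)
        for fr, to, x, y in steps[j]:
            gx, gy = gamma(j, x, y)
            alpha[j + 1][to] = max_star(alpha[j + 1][to], gx + gy + alpha[j][fr])

    beta = [[NEG_INF] * 8 for _ in range(n + 1)]
    beta[n][0] = 0.0                             # termination ends in State1
    for j in reversed(range(n)):                 # backward recursion (2.16)
        for fr, to, x, y in steps[j]:
            gx, gy = gamma(j, x, y)
            beta[j][fr] = max_star(beta[j][fr], gx + gy + beta[j + 1][to])

    xe = []
    for j in range(n):
        best = [NEG_INF, NEG_INF]                # max* over Bx(T)=0 and Bx(T)=1
        for fr, to, x, y in steps[j]:
            delta = gamma(j, x, y)[1] + alpha[j][fr] + beta[j + 1][to]  # (2.17)
            best[x] = max_star(best[x], delta)
        xe.append(best[0] - best[1])             # Equation (2.18)
    return xe

# Noise-free check using the encoded sequence of the 5-bit example.
y_bits = [0, 1, 0, 0, 1, 0, 1, 1]
ya = [10.0 if b == 0 else -10.0 for b in y_bits]
xe = log_bcjr([0.0] * 8, ya, n_info=5)
print([0 if llr > 0 else 1 for llr in xe])
```

Accumulating max* one transition at a time is equivalent to the successive pairwise evaluation mentioned earlier, since the operation is associative. Unreachable states keep a metric of minus infinity, which restricts the recursions to the valid part of the terminated trellis without listing the 38 states and 60 transitions explicitly.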
2.3 Extrinsic information transfer chart
As mentioned in Section 2.1, the BER chart is a powerful tool for analysing the
performance of turbo-like codes. However, it is unable to characterise the convergence
behaviour of a turbo-like code, for example at the onset of the turbo cliff. This requires
a different analysis tool, namely the EXIT chart [100]. An EXIT chart uses Mutual
Information (MI) measurement to quantify the quality of the extrinsic information
exchanged between the constituent decoders in an iterative decoding system. The MI
$I(\tilde{b}; b)$ is a scalar in the range $[0, 1]$, where zero is low quality and one is high quality
information. There are a number of different methods of measuring MI. The averaging
method [119] uses the equation:
$$I(\tilde{b}; b) = 1 + \frac{1}{N_b} \sum_{j=1}^{N_b} \sum_{b'=0}^{1} \frac{e^{(1-b')\tilde{b}_j}}{1 + e^{\tilde{b}_j}} \log_2\!\left[\frac{e^{(1-b')\tilde{b}_j}}{1 + e^{\tilde{b}_j}}\right] \tag{2.19}$$
This method has the advantage of not requiring any knowledge of the bit sequence $b$.
This is achieved by assuming that the LLRs in $\tilde{b}$ satisfy the consistency condition [100],
that is the LLRs express neither too much confidence nor too little confidence. Since
the averaging method 'believes' what the LLRs say, it does not need to consider the
true values of the bits in $b$. However, this assumption is only valid if there are no
sub-optimalities in the receiver design. This requires perfect channel estimation, perfect
carrier recovery, perfect synchronisation, perfect equalisation and optimal decoding using
the Log-BCJR algorithm. By contrast, the histogram method [119] of measuring MI
does not make the described assumption and is therefore better suited when a
sub-optimal receiver is employed. This method uses knowledge of the true values of the bits
in $b$ to avoid having to 'believe' what the LLRs say.
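The averaging method can be sketched in a few lines of Python. The sketch assumes that each LLR is defined as $\ln[P(0)/P(1)]$, so that $e^{\tilde{b}_j}/(1 + e^{\tilde{b}_j})$ is the probability of a zero-valued bit; the function name is an illustrative assumption.

```python
import math

def mutual_information_averaging(llrs):
    """Averaging-method MI estimate; each LLR is taken as ln[P(0)/P(1)]."""
    total = 0.0
    for llr in llrs:
        # P(b = 0 | LLR) = e^LLR / (1 + e^LLR), evaluated without overflow.
        if llr >= 0:
            p0 = 1.0 / (1.0 + math.exp(-llr))
        else:
            p0 = math.exp(llr) / (1.0 + math.exp(llr))
        for p in (p0, 1.0 - p0):
            if p > 0.0:                      # 0 * log2(0) is taken as 0
                total += p * math.log2(p)
    return 1.0 + total / len(llrs)

print(mutual_information_averaging([0.0, 0.0, 0.0]))     # uninformative LLRs
print(mutual_information_averaging([50.0, -50.0, 50.0])) # very confident LLRs
```

Zero-valued LLRs give an MI of zero and very large magnitudes give an MI close to one, regardless of whether the signs are actually correct. This illustrates why the method relies on the consistency condition.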
The EXIT chart comprises two curves, one for each of the two decoders in the system.
Each curve plots the MI of the extrinsic LLRs versus the MI of the a priori LLRs of one
decoder in the system. This measures the quality of the output as a function of the
quality of the input. Taking the UMTS/LTE decoding scheme as an example, for the
first decoder, the EXIT curve plots $I(\tilde{b}^e_1; b_1)$ as a function of $I(\tilde{b}^{a'}_1; b_1)$, as shown
in Figure 2.14, where $I(\tilde{b}^{a'}_1; b_1)$ is the MI between $\tilde{b}^{a'}_1$ and $b_1$, while $I(\tilde{b}^e_1; b_1)$
is the MI between $\tilde{b}^e_1$ and $b_1$. For plotting the EXIT curve, a simulator is employed
to generate sequences of a priori LLRs $\tilde{b}^{a'}_1$ having a range of MI ($0 < I(\tilde{b}^{a'}_1; b_1) < 1$).
Using simulations that include the channel model, the modulation model and the BCJR
decoder, the extrinsic output $\tilde{b}^e_1$ can be obtained and measured. The MI is averaged over
many frames. If $I(\tilde{b}^e_1)$ is used to represent $I(\tilde{b}^e_1; b_1)$ and $I(\tilde{b}^{a'}_1)$ to represent
$I(\tilde{b}^{a'}_1; b_1)$, the EXIT function $I(\tilde{b}^e_1) = F(I(\tilde{b}^{a'}_1))$ of the UMTS/LTE turbo code is
shown in Figure 2.15. In the simulation, the exact convolutional code shown in Figure 2.14
is chosen, with BPSK modulation and an AWGN channel. The Signal-to-Noise Ratio (SNR)
is -4 dB. The SNR is defined as:
$$\mathrm{SNR} = \frac{E_s}{N_0}, \tag{2.20}$$
where $E_s$ is the energy per symbol and $N_0$ is the noise power spectral density.
Figure 2.14: Schematic of an EXIT chart simulation.
Figure 2.15: One EXIT curve $I(\tilde{b}^e_1) = F(I(\tilde{b}^{a'}_1))$ of the UMTS/LTE turbo code using
BPSK to transmit over an AWGN channel having an SNR of -4 dB.
Similarly, the EXIT curve for the other decoder can be plotted. For a turbo code, owing
to the symmetry of the two concatenated codes, the EXIT function of the lower decoder
is identical to that of the upper decoder. In an EXIT chart, the second curve is plotted
with inverted axes, that is the horizontal axis plots the MI of the extrinsic output
and the vertical axis plots the MI of the a priori input. The second curve is displayed
with inverted axes because, in the iterative decoding process, the output of one decoder
becomes the input of the other decoder in the next iteration. By putting the input of
one decoder and the output of the other decoder on the same axis, the interaction of
the two concatenated decoders can be predicted on an EXIT chart. The complete EXIT
chart of the UMTS/LTE turbo code is given in Figure 2.16.
Figure 2.16: EXIT chart of the UMTS turbo decoder.
Figure 2.17: Example decoding trajectories in the EXIT chart.
The iterative decoding process of the turbo code can be revealed by plotting decoding
trajectories in the EXIT chart, as shown in Figure 2.17. A decoding trajectory begins at
the (0,0) point in the bottom left corner of the EXIT chart, since no a priori information
is provided by the other decoder at the start of the decoding process. The MI of the output
of the first decoder can be obtained from the upper curve in the EXIT chart and is provided
as the input of the second decoder. Based on the MI provided by the first decoder, the
MI of the output of the second decoder can be obtained from the lower curve in the EXIT
chart. The decoding performance of the next iteration can be obtained in the same way.
Thus, a decoding trajectory is obtained as a staircase that alternately steps upwards
to the upper curve and rightwards to the lower curve. Note that the condition for the
decoding trajectory to have a high probability of reaching the (1,1) point is that the
EXIT chart has an open tunnel, that is the two curves do not cross each other before
they reach the (1,1) point. By reaching the (1,1) point, the BER will be in the error
floor region. However, since the EXIT chart represents the statistical analysis of
a large number of simulated samples, the trajectories will typically deviate from the
EXIT curves in practice. Different input sequences have different trajectories. As
demonstrated in Figure 2.17, the three trajectories are all different and depart from the
EXIT curves. The EXIT chart gives the average convergence behaviour of the investigated
code. An EXIT chart allows the two concatenated codes to be considered in isolation
from each other. Since EXIT charts can predict the iterative interaction of the two codes,
the iterative decoding process does not need to be simulated in order to draw an EXIT
chart. Thus, EXIT charts can be obtained faster than BER/Frame Error Rate (FER)
charts.
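The staircase construction described above can be sketched in Python. The two EXIT functions used here are hypothetical straight lines chosen only to show an open and a closed tunnel; they are not measured curves of the UMTS/LTE turbo code. Because the two constituent codes are symmetric, the same function is used for both decoders.

```python
def decoding_trajectory(f_upper, f_lower, max_iterations=50, tol=1e-3):
    """Predicted staircase as (a priori MI, extrinsic MI) points,
    expressed in the axes of the first decoder."""
    points = [(0.0, 0.0)]
    i_a = 0.0
    for _ in range(max_iterations):
        i_e = f_upper(i_a)            # first decoder: step up to the upper curve
        points.append((i_a, i_e))
        i_a_next = f_lower(i_e)       # second decoder: step right to the lower curve
        points.append((i_a_next, i_e))
        if i_a_next - i_a < tol:      # stuck at a crossing, or arrived at (1, 1)
            break
        i_a = i_a_next
    return points

# Hypothetical EXIT functions, chosen only to illustrate the two cases.
def open_tunnel(i):
    return min(1.0, 0.2 + 0.9 * i)

def closed_tunnel(i):
    return 0.2 + 0.5 * i

print(decoding_trajectory(open_tunnel, open_tunnel)[-1])
print(decoding_trajectory(closed_tunnel, closed_tunnel)[-1])
```

With the open tunnel the staircase climbs to the (1,1) point, whereas with the closed tunnel it stalls near the point where the two curves cross. A trajectory measured from real decoded frames would scatter around this predicted staircase, as noted above.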
2.4 Fixed-point numerical representation
The xed-point representation of numerical values is a widely used technique in ASIC
digital signal processing circuits. For the hardware implementation of the turbo de-
coders, it is a very important technique to keep the hardware complexity low. This
section provides an introduction to xed-point representation in hardware design. Fixed-
point representation, compared with oating-point representation, is easily implemented
in a small memory space and it is fast to execute arithmetic calculations. Therefore, it is
well-suited to real-time or low-power applications. Internally, the computation of xed-
point numbers take the values as integers, but considers the integer part and fraction
part separately with an imaginary point.
Two's complement representation is the most widely used xed-point representation in
practice. A two's complement binary number is divided into three parts, a sign bit, an
integer part and a fraction part. First, consider the two's complement representation of
signed integers before considering the representation of numbers having a fraction part.
The most signicant bit is used as the sign bit, where 0 is used to represent positive
signs and 1 is to represent negative signs. The rest of the bits represent the magnitude
of the number. A negative number is represented by its absolute value complemented
bit by bit and incremented by 1. For example the 3-bit representation of 2 is 010. The
complement of this is 101. Adding 1 to this gives the two's complement representationChapter 2 Turbo codes, EXIT charts and the xed-point representation 47
of -2, namely 110. The complete set of 3-bit two's complement representations is given
in Table 2.1. In addition, another two signed integer representation methods, sign and
absolute value notation and one's complement notation, are also given as examples in
the table for comparison. As shown, compared with the other two methods, the two's
Binary number 000 001 010 011 100 101 110 111
Sign and absolute value +0 +1 +2 +3 -0 -1 -2 -3
One's complement +0 +1 +2 +3 -3 -2 -1 -0
Two's complement +0 +1 +2 +3 -4 -3 -2 -1
Table 2.1: Dierent representation methods for integer numbers
complement representation avoids the double representation of zero. As a consequence,
the range of negative values is greater than the range of positive values by one. The
main advantage of two's complement notation is the ability to perform the addition of
negative numbers, without needing to take the sign of the operands into consideration.
Furthermore, in two's complement notation, subtraction is achieved by complementing the
subtrahend and adding. For example, in the 3-bit representation, 2 - 3 can be performed
by calculating the sum of 2 (010) and -3 (101):
$$2 - 3 = 2 + (-3) = 010 + 101 = 111 = -1 \qquad (2.21)$$
For subtractions in two's complement notation, the result must be allowed to overflow.
Take the following calculation as an example:
$$2 - 2 = 2 + (-2) = 010 + 110 = (1)000 = 000 = 0 \qquad (2.22)$$
When the overflowed bit is discarded, the calculation naturally gives the correct result.
By contrast, subtraction in the other two notations is more complicated, since
complementing and adding does not give the correct result. For example, in one's
complement notation:
$$3 - 2 = 3 + (-2) = 011 + 101 = (1)000 = 000 = 0 \qquad (2.23)$$
Therefore, the addition of operands having different signs needs to be considered
carefully and extra correction is required.
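The arithmetic of (2.21) to (2.23) can be reproduced with a short Python sketch. The helper names below are hypothetical, and the end-around carry used as the one's complement correction is the conventional fix, which the text above does not name explicitly.

```python
BITS = 3
MASK = (1 << BITS) - 1          # 0b111 for the 3-bit examples
SIGN = 1 << (BITS - 1)          # 0b100, the sign bit


def to_twos(value):
    """Encode a signed integer as a 3-bit two's complement pattern."""
    return value & MASK


def from_twos(pattern):
    """Decode a 3-bit two's complement pattern into a signed integer."""
    return (pattern ^ SIGN) - SIGN


def add_twos(a, b):
    """Add two patterns and discard the overflowed carry bit."""
    return (a + b) & MASK


def to_ones(value):
    """Encode a signed integer as a 3-bit one's complement pattern."""
    return value if value >= 0 else (~(-value)) & MASK


def from_ones(pattern):
    """Decode a 3-bit one's complement pattern (111 decodes to -0, i.e. 0)."""
    return pattern if pattern < SIGN else -((~pattern) & MASK)


def add_ones_naive(a, b):
    """Complement-and-add with the carry simply discarded, as in (2.23)."""
    return (a + b) & MASK


def add_ones_end_around(a, b):
    """One's complement addition with the carry added back in (assumed fix)."""
    total = a + b
    return ((total & MASK) + (total >> BITS)) & MASK


print(format(to_twos(-2), "03b"))                              # pattern of -2
print(from_twos(add_twos(to_twos(2), to_twos(-3))))            # 2 - 3
print(from_ones(add_ones_naive(to_ones(3), to_ones(-2))))      # wrong result
print(from_ones(add_ones_end_around(to_ones(3), to_ones(-2)))) # corrected
```

The two's complement adder needs no knowledge of the operand signs, whereas the one's complement adder only gives the right answer once the carry is fed back.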
For a fractional fixed-point number, A, an imaginary binary point is set at a certain position.
A 3-bit two's complement fixed-point number with a 2-bit fraction part is exemplified
in Table 2.2. In the table, the imaginary point is placed after the most significant bit
in the binary representation.

Binary number      0.00   0.01   0.10   0.11   1.00   1.01   1.10   1.11
Two's complement  +0.00  +0.25  +0.50  +0.75  -1.00  -0.75  -0.50  -0.25

Table 2.2: Two's complement representation method for fraction numbers

For an n-bit two's complement representation, the notation Qy.z
is defined to represent the point setting, where y represents the number of bits in the
integer component and z represents the number of bits in the fraction component. The
total number of bits is given by n = y + z + 1. For example, an 8-bit two's complement
number with an imaginary point after the 5th bit is 01100.010. The integer part is 12 and
the fraction part is 0.25, thus the decimal value of 01100.010 is 12.25. The representation
range is given by (2.24) and the resolution r is given by (2.25), where p = y and q = z
are the numbers of integer and fraction bits, respectively.
$$-\frac{2^{p+q}}{2^{q}} \le A \le \frac{2^{p+q} - 1}{2^{q}} \qquad (2.24)$$
$$r = 2^{-q} \qquad (2.25)$$
Owing to the limited representation range and the limited resolution of the fixed-point
representation, errors occur when values fall outside of the range or between the
quantisation levels, causing overflow and underflow, respectively. For example, for a
Q3.2 fixed-point representation, the representation range is $-8 \le A \le 7.75$ and the
resolution is 0.25. To calculate 6 + 7, the binary process is 0110.00 + 0111.00 = 1101.00.
The binary result 1101.00 is -3 in decimal, which is incorrect because of the overflow.
For a value that is finer than the resolution limit, such as 1.23, the fixed-point
representation has to approximate it by the nearest quantised value, which is 1.25. The
0.02 inaccuracy is caused by the underflow.
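The Q3.2 examples above can be checked with a minimal Python sketch. The function names are hypothetical, and rounding to the nearest level followed by two's complement wrap-around is an assumption about how an unprotected datapath behaves, not a description of a specific implementation.

```python
def q_range(y, z):
    """Return (lowest, highest, resolution) of a Qy.z two's complement format."""
    scale = 1 << z
    lowest = -(1 << (y + z)) / scale
    highest = ((1 << (y + z)) - 1) / scale
    return lowest, highest, 1 / scale


def quantise(value, y, z):
    """Round to the nearest Qy.z level, then wrap like an (y+z+1)-bit adder."""
    bits = y + z + 1
    sign_bit = 1 << (bits - 1)
    raw = round(value * (1 << z))        # nearest quantisation level
    raw &= (1 << bits) - 1               # discard bits beyond the word length
    raw = (raw ^ sign_bit) - sign_bit    # reinterpret as two's complement
    return raw / (1 << z)


print(q_range(3, 2))          # representation range and resolution of Q3.2
print(quantise(1.23, 3, 2))   # underflow: nearest representable value
print(quantise(6 + 7, 3, 2))  # overflow: the sum wraps to a negative value
```

A saturating adder would clip 6 + 7 to 7.75 instead of wrapping; the sketch uses wrap-around only because that is the behaviour described in the example.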
2.5 Chapter Summary
This chapter commenced with an introduction to a widely used type of near Shannon
limit ECCs, namely turbo-like codes. Secondly, a powerful analysis tool for investigating
the designs, the performance and the convergence behaviours of turbo-like codes, namely
EXIT charts, was presented. Finally, an important issue that is directly related to the
hardware complexity and energy consumption of turbo-like code implementations,
namely the fixed-point numerical representation, was discussed. The key points of this
background chapter are as follows.

• Section 2.1 introduced typical configurations and the BER performance of the two
types of turbo-like codes, namely SCCCs and PCCCs. By employing SISO component
decoders and an iterative decoding process to exchange extrinsic information
between the component decoders, turbo-like codes are capable of achieving a near
Shannon limit performance.

• In Section 2.2, using the UMTS/LTE turbo code [103] as an example, the typical
turbo encoder and decoder schemes were presented. Furthermore, a widely
used decoding algorithm for turbo codes, namely the LUT-Log-BCJR algorithm, was
described in detail.

• In Section 2.3, the EXIT chart [100] was introduced for visualising the EXIT function
of a LUT-Log-BCJR decoder. The demonstration illustrated how EXIT charts
are capable of characterising the convergence behaviour of turbo-like codes. The
typical simulation scheme for generating EXIT charts and decoding trajectories
was presented.

• In Section 2.4, the fixed-point numerical representation was introduced. The
fixed-point representation has a lower complexity than the floating-point
representation, facilitating practical ASIC implementations of LUT-Log-BCJR
decoders. This section presented three fixed-point representation methods, namely
sign and absolute value, one's complement and two's complement. The advantage
of using the two's complement fixed-point representation instead of the other two
representation methods was discussed.

Chapter 3
A serially concatenated convolutional code scheme for star topology wireless sensor networks¹
3.1 Introduction
As discussed in Chapter 1, turbo-like codes have the capability to reduce the
transmission energy consumption of the wireless communication systems in Wireless Sensor
Networks (WSNs), thanks to their near Shannon limit performance. However, the
additional energy consumption introduced by the encoder and decoder processing must be
considered when determining the overall energy consumption. In this chapter, this
trade-off is investigated for a simple but widely used scenario, namely the star network.
A Serially Concatenated Convolutional Code (SCCC) scheme based on an augmentation of the
PHYsical layer (PHY) of the Institute of Electrical and Electronics Engineers (IEEE)
802.15.4 standard is used for the investigation.
The sensor nodes of a WSN are typically required to maintain sporadic but reliable
data transmissions for extended periods of time. Furthermore, in some applications,
the sensor nodes are required to be small, preventing the use of bulky batteries [3].
Therefore, improvements in the energy efficiency of the sensor nodes are also required
in order for WSNs to find further diverse applications. A star WSN topology is often
employed in low power and low rate applications, with all data frames being transmitted
to a central node, which coordinates the reactions of a higher-level system to the sensed
data. Many previous efforts have focused on the energy consumption of star topology
WSNs [81,120,121]. In these scenarios, the central node typically has abundant energy
resources, owing to its integration into the higher-level system. It is therefore beneficial
to redistribute energy consumption from the sensor nodes to the central node, whenever
possible.

¹ This work was carried out in collaboration with Dr. Rob Maunder. The presented SCCC
scheme was proposed in Dr. Rob Maunder's previous work [69].
In [69], an augmentation to the PHY of the IEEE 802.15.4 standard for WSNs [102]
was introduced in order to redistribute energy consumption in the manner described
above. This was achieved by deploying a sophisticated decoder of the employed
Error-Correcting Code (ECC) within the central node. This reduced the transmission energy
required to achieve a desirable Frame Error Rate (FER) of $10^{-3}$ by 3.22 to 6.75 dB,
depending on the length of the PHY payload [69]. Although additional energy is
consumed by the additional error correction encoding operations that are performed in the
transmitting sensor nodes, the analysis in [69] demonstrated that the proposed approach
can reduce the overall energy consumption by 17.4 to 23.3%. However, this
result was based on the assumption that the additional error correction encoding
operations are performed by software running on the 8051 microprocessor of a Chipcon
CC2430 [122] sensor node. This is a conservative assumption, since the required
functionality is significantly simpler than the capability of an 8051 microcontroller. A sensor
node energy consumption reduction of more than 20% could therefore be expected if
error correction encoding was performed in hardware. This chapter therefore introduces
and characterises a dedicated Application-Specific Integrated Circuit (ASIC) design that
was implemented for this purpose.
The rest of this chapter is organised as follows. Section 3.2 reviews the augmented
IEEE 802.15.4 PHY and the analysis of its energy savings that was detailed in [69]. In
Section 3.3, the novel deterministic interleaver design that was employed to obtain the
results of [69] is detailed for the first time. This interleaver design is suitable for all
possible PHY payload lengths without imposing an excessive memory requirement. A
novel Genetic Algorithm (GA) [123] is employed for parameterising the interleaver in
order to maximise its performance, as detailed for the first time in Section 3.4.
Section 3.5 discusses a novel hardware implementation of the ECC encoder, which employs
parallel 'just-in-time' processing in order to achieve a low processing latency and
energy consumption. This energy consumption is analysed in Section 3.6 and the analysis
shows that it is insignificant compared to the transmission energy saving that it affords.
As a result, the augmented PHY is shown to offer net energy consumption savings of
24.8 to 31.4%, which are significantly greater than those reported in [69] of 17.4 to 23.3%.
Finally, conclusions are offered in Section 3.7.
3.2 Review of the augmented PHYsical layer
As detailed in [69], the proposed augmentation to the IEEE 802.15.4 PHY can be em-
ployed to convey the payloads of IEEE 802.15.4 data frames using a reduced transmission
energy. However, the augmented PHY imposes additional interleaving and rate-1 en-
coding operations upon the transmitting sensor nodes. As shown in the schematic of
Figure 3.1, these operations are performed between the Pseudo Noise (PN) spreading
and Offset Quadrature Phase-Shift Keying (O-QPSK) operations of the standard IEEE
802.15.4 PHY [102].
[Figure 3.1 (block diagram): PN spreader, interleaver, rate-1 encoder and O-QPSK modulator at the transmitter; channel; O-QPSK demodulator, rate-1 decoder, deinterleaver, interleaver and PN despreader at the receiver. Signal labels: a, b, e, f, g and their received counterparts.]
Figure 3.1: Schematic of the augmented IEEE 802.15.4 PHY.
An interleaver and a rate-1 encoder are concatenated to the PN spreader of
the original IEEE 802.15.4 PHY, as shown in the dashed line box in Figure 3.1.
The augmented PHY applies PN spreading to the M-byte PHY payload a, where
$M \in [10, \ldots, 127]$ [102]. This is achieved by decomposing the payload a into sets of k = 4
consecutive bits and mapping these to n = 32-bit PN sequences [102], as in the standard
PHY. These PN sequences are concatenated to obtain the N-bit sequence b, where
$N = 8Mn/k$. The interleaver of the augmented PHY then employs a three-step so-called
Dithered Relative Prime (DRP) process [124] to rearrange the order of the bits in b.
This process is detailed for the first time in Sections 3.3 and 3.4 of this chapter. As
shown in Figure 3.1, the resultant N-bit sequence e is rate-1 encoded [125], as detailed
in Section 3.3. Finally, the encoded bit sequence f is O-QPSK modulated, as in the
standard PHY. As detailed in Section 3.3, the input f of the augmented PHY's O-QPSK
modulator comprises the same number N of bits as the output b of the PN spreader,
as in the standard PHY. For this reason, the PN spreader and O-QPSK modulator
remain completely unchanged when the augmented PHY is employed in the transmitting
sensor nodes.
In the receiving central node, additional rate-1 decoding and deinterleaving operations
are employed by the augmented PHY. This employs iterative decoding [126], which
repeatedly alternates the operation of the PN despreader and the rate-1 decoder, as shown
in Figure 3.1. This is in contrast to the receiver of the standard PHY, which employs
only the 'one-shot' operation of the PN despreader. Since the augmented PHY invests
more decoding complexity in the central node than the standard PHY, it can achieve a
desirable FER using a reduced sensor node transmission energy. This is demonstrated
by the simulation results of Figure 3.2, which considers transmission over a
Line-Of-Sight (LOS) channel in the presence of Additive White Gaussian Noise (AWGN) having
a constant power spectral density $N_0$, in common with [102]. These results show that
when transmitting N = 640-bit payloads, the augmented PHY can achieve a desirable
FER of $10^{-3}$ at a transmission energy per bit $E_c$ that is 3.22 dB lower than that required
by the standard PHY. Furthermore, this gain increases to 6.75 dB when N = 8128-bit
payloads are employed, owing to the augmented PHY's interleaver gain [69], which is
obtained when transmitting longer payloads.
[Figure 3.2 (plot): error rate, from $10^{0}$ down to $10^{-5}$, versus $E_c/N_0$ from -10 dB to 0 dB, for the original IEEE 802.15.4 PHY and the proposed augmented PHY with N = 640, 1216, 2304, 4288 and 8128.]
Figure 3.2: FER performance of the standard and augmented PHYs for payloads
comprising various numbers of bits N, when communicating over LOS AWGN channels,
having a range of values for the SNR per payload bit $E_c/N_0$.
In order to assess the practical sensor node energy saving, the augmentation of the
Chipcon CC2430 PHY [122] was considered in [69]. The energy consumed during
transmission is given by $E^{tx} = I^{tx} \cdot V \cdot t^{tx}$, where $t^{tx} = N/f^{tx}$ is the transmit duration and
the IEEE 802.15.4 transmission rate is $f^{tx} = 2 \times 10^{6}$ bits per second [102]. As may be
expected, the current $I^{tx}$ consumed during the transmission of a data payload depends
on the particular transmit energy per bit $E_c$ employed. In its maximum transmit power
mode of 0.6 dBm, the Chipcon CC2430 consumes $I^{tx}_{std} = 32.4$ mA [122]. At this transmit
power, the amount of energy $E^{tx}_{std} = I^{tx}_{std} V t^{tx}$ consumed by the standard PHY without
augmentation is illustrated in Figure 3.3 for payloads comprising various numbers N of
bits. As described above, the augmented PHY reduces the transmission energy required
to achieve a desirable FER of $10^{-3}$ by 3.22 to 6.75 dB, depending on the length of the
payload N. Corresponding reductions from the Chipcon CC2430's maximum transmit
power of 0.6 dBm allow its current consumption $I^{tx}_{aug}$ to be lowered from 32.4 mA to
21.7 to 23.7 mA [122]. Figure 3.3 shows the amount of transmission energy
$E^{tx}_{aug} = I^{tx}_{aug} V t^{tx}$ consumed by the augmented PHY for various values of N. These
results show that the augmented PHY facilitates gross sensor node energy savings
$(E^{tx}_{std} - E^{tx}_{aug})$ that are 27.0 to 33.0% of $E^{tx}_{std}$, depending on the payload length N.
Figure 3.3: Total energy consumed in the standard Chipcon CC2430 PHY $E^{tx}_{std}$ and
in the software implementation of the augmented PHY $(E^{tx}_{aug} + E^{pr}_{aug})$.
However, in order to determine the net sensor node energy saving
$[E^{tx}_{std} - (E^{tx}_{aug} + E^{pr}_{aug})]$ that is afforded by the augmented PHY, it is necessary to
additionally consider the energy $E^{pr}_{aug}$ consumed during the operation of the interleaver
and rate-1 encoder that are boxed in Figure 3.1. In [69], it was assumed that these
operations were performed by software running on the 8051 processor of a Chipcon CC2430
sensor node. Since the interleaver of Figure 3.1 employs a three-stage process and rate-1
encoding can be completed in a single step, it was assumed that each bit in the N-bit
sequence b can be processed using 4 clock cycles, requiring $C^{pr}_{aug} = 4N$ clock cycles in
total. The duration of this processing is therefore $t^{pr}_{aug} = C^{pr}_{aug}/f^{pr}_{aug}$, where $f^{pr}_{aug}$ is
the system's clock frequency. Here, $f^{pr}_{aug}$ is assumed to be 32 MHz, which is the clock
frequency of the Chipcon CC2430 [122]. The energy consumed by interleaving and rate-1
encoding is given by $E^{pr}_{aug} = I^{pr}_{aug} \cdot V \cdot t^{pr}_{aug}$. Here, a supply voltage V = 3 Volts equal
to that of the Chipcon CC2430 and a current consumption of $I^{pr}_{aug} = 12.3$ mA equal to the
peak consumption of the on-board 8051 processor [122] were conservatively assumed. The
resultant processing energy consumptions $E^{pr}_{aug}$ are shown in Figure 3.3 for a variety
of payload lengths N. Using this software implementation of the augmented PHY, net
energy savings $[E^{tx}_{std} - (E^{tx}_{aug} + E^{pr}_{aug})]$ that correspond to 17.4 to 23.3% of $E^{tx}_{std}$ are
afforded, as reported in [69].
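The energy bookkeeping above can be reproduced with a short Python sketch. The constants are the values quoted in the text; pairing the 23.7 mA current with N = 640 and the 21.7 mA current with N = 8128 is an assumption (the smaller transmit power reduction is taken to correspond to the shorter payload), and the results are only approximate because the quoted currents are rounded.

```python
V = 3.0             # supply voltage in volts
F_TX = 2e6          # IEEE 802.15.4 transmission rate, bits per second
F_PR = 32e6         # 8051 clock frequency, Hz
I_TX_STD = 32.4e-3  # transmit current at 0.6 dBm, amperes
I_PR = 12.3e-3      # peak 8051 current, amperes
CYCLES_PER_BIT = 4  # three interleaver stages plus rate-1 encoding


def tx_energy(current, n_bits):
    """Transmission energy E = I * V * t, with t = N / f_tx."""
    return current * V * n_bits / F_TX


def sw_processing_energy(n_bits):
    """Software interleaving and encoding energy for 4N clock cycles."""
    return I_PR * V * CYCLES_PER_BIT * n_bits / F_PR


def net_saving(i_tx_aug, n_bits):
    """Net saving of the augmented PHY as a fraction of the standard energy."""
    e_std = tx_energy(I_TX_STD, n_bits)
    e_aug = tx_energy(i_tx_aug, n_bits) + sw_processing_energy(n_bits)
    return (e_std - e_aug) / e_std


# Assumed pairing of augmented transmit currents with payload lengths.
for n_bits, i_aug in ((640, 23.7e-3), (8128, 21.7e-3)):
    print(n_bits, f"{100 * net_saving(i_aug, n_bits):.1f}%")
```

Because both energies scale linearly with N, the processing overhead is a fixed fraction of the standard transmission energy (about 9.5% with these constants), which is why the net saving tracks the gross saving with a constant offset.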
As described in Section 3.1, however, a more efficient implementation of the augmented
PHY in a transmitting sensor node would resemble the Chipcon CC2430, but with the
addition of an ASIC module dedicated to performing interleaving and rate-1 encoding.
Since a dedicated module would be much simpler than an 8051 processor and because
it could benefit from parallel processing, this approach would offer a reduced processing
energy consumption $E^{pr2}_{aug}$ and therefore an increased net energy saving. The following
sections detail the design, parametrisation, implementation and characterisation of a
hardware module for this purpose.
3.3 Module design
As described in Section 3.2, the proposed module implements the interleaver and rate-1
encoder that are boxed in Figure 3.1. Here, the standard IEEE 802.15.4 PN spreader
[102] provides the input bit sequence b. As described in Section 3.2, this comprises
$8M/k$ PN sequences of n = 32 bits each [102], where M is the number of bytes in
the data payload a. Since this has 118 possible values $M \in [10, \ldots, 127]$, there are 118
possible lengths $N = 8Mn/k \in \{640, 704, 768, \ldots, 8128\}$ for the bit sequence b [69].
When repositioning the bits in the sequence $b = \{b_i\}_{i=0}^{N-1}$, the interleaver of Figure 3.1
is required to desirably 'randomise' the order of the bits in the resultant sequence
$e = \{e_i\}_{i=0}^{N-1}$. In order to achieve this, the interleaver must fully exploit the degree of
freedom for re-positioning the bits, which increases with the number of bits N. As a result,
different interleaver designs are required for each of the 118 possible values of N and the
associated parameters of the interleaver design must be stored in Read-Only Memory
(ROM).
A naive interleaver design would be parameterised by 118 arrays
$\{\pi_{640}, \pi_{704}, \pi_{768}, \ldots, \pi_{8128}\}$, each of which would comprise N unique integers in the
range $[0, N-1]$. The operation of the naive interleaver can be formally specified as
$e_i = b_{\pi_N[i]}$, where $\pi_N[i]$ is the ith element in the array $\pi_N$ and $i \in [0, N-1]$. However,
this approach would require approximately 800 KB of ROM, which is considered to be
excessive, since memory accesses are typically associated with relatively high energy
consumptions in systems [127].
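The 118 sequence lengths and the order of magnitude of the naive ROM requirement can be checked with a few lines of Python. The storage model is an assumption: each stored index is taken to occupy just enough bits to address the range [0, N - 1], which the text does not state.

```python
K = 4          # payload bits per PN sequence
N_CHIPS = 32   # bits per PN sequence


def sequence_length(m_bytes):
    """Length N = 8 * M * n / k of the spread sequence b for an M-byte payload."""
    return 8 * m_bytes * N_CHIPS // K


lengths = [sequence_length(m) for m in range(10, 128)]


def naive_rom_bits(all_lengths):
    """ROM bits for one explicit N-entry index array per supported length."""
    # (n - 1).bit_length() is the number of bits needed for indices 0..n-1.
    return sum(n * (n - 1).bit_length() for n in all_lengths)


total_bits = naive_rom_bits(lengths)
print(len(lengths), lengths[0], lengths[-1])
print(total_bits // 8, "bytes")
```

Under this storage model the total comes to roughly 0.8 million bytes, which agrees with the approximate figure quoted above.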
This problem was solved in the implementations detailed in [128-130] by employing
deterministic interleaver designs. These require the storage of only a limited number
of parameters, which are employed to compute the elements of the interleaver pattern
in an on-line manner, as and when they are required. However, the designs detailed
in [128-130] are optimised for turbo codes and are not suitable for the augmented PHY.
This is because the interleaver is required to mitigate a higher level of correlation within
the soft information exchanged in the augmented PHY, since the PN despreader of
Figure 3.1 operates on relatively long blocks, comprising n = 32 bits.
For this reason, a deterministic design resembling a DRP interleaver [124] is employed,
which has been shown to effectively 'randomise' the order of the bits in the sequence
b without requiring an excessive amount of ROM. Indeed, only 12 KB of ROM are
required to store the parameters of the interleaver design, as listed in Table 3.1. Like a
DRP interleaver, the proposed design is implemented in three stages, which are referred
to as 'Interleaver 1', 'Interleaver 2' and 'Interleaver 3' in the discussions below. As
exemplified by the criss-crossing arrows in Figure 3.4, these interleavers are employed
to 'randomise' the order of the intermediate bit sequences c, d and e, respectively.
Note that the intermediate bit sequences comprise the same number of bits as the input
sequence b, namely N.
Table 3.1: Parameters of the interleavers shown in Figure 3.4.

Parameters                                        Description
$\{r_0, r_1, r_2, \ldots, r_{253}\}$              Each of the 254 arrays $r_u$ comprises n = 32 unique integers from the range $[0, n-1]$.
$\{s_{640}, s_{704}, s_{768}, \ldots, s_{8128}\}$ Each of the 118 integers $s_N$ has a value from the range $[0, n-1]$.
$\{p_{640}, p_{704}, p_{768}, \ldots, p_{8128}\}$ Each of the 118 integers $p_N$ has a value from the range $[1, N-1]$.
$\{W_{640}, W_{704}, W_{768}, \ldots, W_{8128}\}$ Each of the 118 integers $W_N$ has a value from the range $[1, N]$.
$\{w_{640}, w_{704}, w_{768}, \ldots, w_{8128}\}$ Each of the 118 arrays $w_N$ comprises $W_N$ unique integers from the range $[0, W_N-1]$.
[Figure 3.4 (diagram): the sequences b, c, d, e and f, linked by Interleaver 1, Interleaver 2, Interleaver 3 and the rate-1 encoder, with criss-crossing arrows showing the bit rearrangements. Annotations: R = 32, arrays r_0 to r_19, s_640 = 0, p_640 = 33, W_640 = 64 and w_640.]
Figure 3.4: Example operation of the interleaver and rate-1 encoder of Figure 3.1 for
N = 640.
Interleaver 1. Similarly to the first stage of a DRP interleaver, Interleaver 1 of Figure 3.4
employs a block-based rearrangement of the bits in the input sequence $b = \{b_i\}_{i=0}^{N-1}$ in
order to generate the sequence $c = \{c_i\}_{i=0}^{N-1}$. More specifically, each block of n = 32 bits
in c is provided by rearranging the order of the corresponding n = 32-bit PN sequence
in b. As shown in Figure 3.4, a different rearrangement is employed for each n = 32-bit
PN sequence, as specified by the parameters $\{r_0, r_1, r_2, \ldots, r_{253}\}$ of Table 3.1. Note
that 254 rearrangements are required, because the sequence b comprises $N/n = 254$ PN
sequences when it has a maximal length of N = 8128 bits. The operation of Interleaver
1 can be formally specified as $c_i = b_{j_i}$, where
$$j_i = n \cdot u + r_u[v], \qquad (3.1)$$
$r_u[v]$ is the vth element in the array $r_u$, $v = i \bmod n$, $u = i \,\mathrm{div}\, n$ and $i \in [0, N-1]$. Here,
the 'div' operator indicates integer division, while 'mod' is employed to represent the
modulo operator.
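A minimal Python sketch of (3.1) is given below. The arrays r_u used here are random placeholders generated for illustration only; they are not the parameters selected in Section 3.4.

```python
import random

N_PN = 32  # bits per PN sequence, n in (3.1)


def interleaver1(b, r):
    """Rearrange each 32-bit block of b using its own array r[u], as in (3.1)."""
    c = []
    for i in range(len(b)):
        u, v = divmod(i, N_PN)           # u = i div n, v = i mod n
        c.append(b[N_PN * u + r[u][v]])  # c_i = b_{j_i}
    return c


# Hypothetical parameters: one random arrangement of 0..31 per PN sequence.
rng = random.Random(0)
r = [rng.sample(range(N_PN), N_PN) for _ in range(640 // N_PN)]

b = list(range(640))     # use the bit indices themselves as test data
c = interleaver1(b, r)
print(c[:8])
```

Since every r_u is a permutation of 0 to 31, each bit stays inside its own 32-bit block and no bit is lost or duplicated.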
Interleaver 2. Similarly, the operation of Interleaver 2 from Figure 3.4 can be specified
as $d_i = c_{j_i}$, where
$$j_i = (s_N + p_N \cdot i) \bmod N \qquad (3.2)$$
and $i \in [0, N-1]$. As in the second stage of a DRP interleaver, $s_N$ identifies the index
of the bit in c that provides the first bit in $d = \{d_i\}_{i=0}^{N-1}$, as shown in Figure 3.4. The
subsequent bits in d are provided by employing successive hops of $p_N$ bits (modulo N)
to select the corresponding bit in the sequence c. Here, $p_N$ is required to be a relative
prime of N in order to ensure that each bit in c provides exactly one bit for d. Note
that the particular values that are employed for $s_N$ and $p_N$ depend upon the length N
of the bit sequence, as shown in Table 3.1.
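The hop-based selection of (3.2) can be sketched in Python as follows. The values s = 0 and p = 33 are the ones annotated for N = 640 in Figure 3.4; the function name and the explicit coprimality check are illustrative additions.

```python
from math import gcd


def interleaver2(c, s, p):
    """Select d_i = c_{(s + p*i) mod N}, as in (3.2)."""
    n = len(c)
    if gcd(p, n) != 1:
        # A hop size sharing a factor with N would revisit some bits of c
        # and never reach others, so d would not be a permutation of c.
        raise ValueError("p must be relatively prime to N")
    return [c[(s + p * i) % n] for i in range(n)]


c = list(range(640))            # bit indices as test data
d = interleaver2(c, s=0, p=33)
print(d[:5], d[20])
```

The coprimality requirement is what guarantees that the 640 hops visit 640 distinct positions before the pattern repeats.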
Interleaver 3. Similarly, the parameters employed for Interleaver 3 of Figure 3.4
depend upon the length N of the bit sequence. This interleaver employs a block-based
rearrangement of the bits in d in order to obtain the sequence e, similarly to Interleaver
1. However, in contrast to Interleaver 1, Interleaver 3 employs the same rearrangement
for each block of $W_N$ bits in the sequence d, as shown in Figure 3.4. As seen in
Table 3.1, this rearrangement and the block length are described by the parameters $w_N$
and $W_N$, respectively. Clearly, $W_N$ is required to be a factor of the bit sequence length
N. For example, $W_{640} = 16$, $W_{1216} = 32$, $W_{2304} = 64$, $W_{4288} = 64$ and $W_{8128} = 64$. The
operation of Interleaver 3 can be formally specified as $e_i = d_{j_i}$, where
$$j_i = W_N \cdot u + w_N[v], \qquad (3.3)$$
$w_N[v]$ is the vth element in the array $w_N$, $v = i \bmod W_N$, $u = i \,\mathrm{div}\, W_N$ and $i \in [0, N-1]$.
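Equation (3.3) differs from (3.1) only in that a single array is shared by all blocks, which the following Python sketch makes explicit. The array w used here (a simple reversal of a 16-element block) is a hypothetical placeholder rather than a designed parameter.

```python
def interleaver3(d, w):
    """Apply the same W-element rearrangement w to every block of d, as in (3.3)."""
    block = len(w)
    if len(d) % block != 0:
        raise ValueError("W must be a factor of the sequence length N")
    # e_i = d_{W*u + w[v]} with u = i div W and v = i mod W
    return [d[block * (i // block) + w[i % block]] for i in range(len(d))]


w = list(reversed(range(16)))   # hypothetical w_N with W_N = 16
d = list(range(640))            # bit indices as test data
e = interleaver3(d, w)
print(e[:4], e[16:20])
```

Sharing one array across all blocks is what keeps the storage for this stage down to W_N entries per supported length.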
Rate-1 encoder. Finally, for each bit in the sequence $e = \{e_i\}_{i=0}^{N-1}$, the rate-1 encoder
of Figure 3.1 generates one bit for its output sequence $f = \{f_i\}_{i=0}^{N-1}$. More specifically,
$f_0 = e_0$ and $f_i = e_i \oplus f_{i-1}$ for $i \in [1, N-1]$, as shown in Figure 3.4, in which $\oplus$ indicates
the modulo-2 addition of two binary bits. As a result, the output bit sequence f input
to the standard IEEE 802.15.4 O-QPSK modulator [102] also comprises N bits.
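The recursion above is a running modulo-2 sum, as the following Python sketch shows. The hard-decision inverse is included only to illustrate that the mapping is one-to-one; it is not the iterative soft decoder employed by the receiver of Figure 3.1.

```python
def rate1_encode(e):
    """Accumulate the input bits: f_0 = e_0 and f_i = e_i XOR f_{i-1}."""
    f = []
    previous = 0                 # starting from 0 makes f_0 equal to e_0
    for bit in e:
        previous ^= bit
        f.append(previous)
    return f


def rate1_invert(f):
    """Recover e from an error-free f by differencing adjacent bits."""
    if not f:
        return []
    return [f[0]] + [f[i] ^ f[i - 1] for i in range(1, len(f))]


e = [1, 0, 1, 1, 0]
f = rate1_encode(e)
print(f)
print(rate1_invert(f) == e)
```

One output bit is produced per input bit, so the sequence length N is unchanged.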
3.4 Module parametrisation
This section describes the off-line algorithm employed to design values for the interleaver
parameters of Table 3.1, in order to ensure that the order of the bits in
the sequence b is effectively 'randomised'. Note that the N-bit input sequence b has
$2^{N/8}$ legitimate permutations, since the PN spreader of Figure 3.1 has a coding rate of
$k/n = 1/8$ [102]. As described above, the module detailed in Section 3.3 maps each
of these permutations to a different permutation of the output bit sequence f. The
particular mapping that is employed depends upon the parameters of the interleavers.
The off-line algorithm used for designing these parameters attempts to maximise the
minimum Hamming distance $D^{min}_{H}$ between the legitimate permutations of f, as
detailed below. In this way, the number of bit errors that is required to transform the
transmitted permutation of f into any other legitimate permutation is maximised. This
maximises the probability that transmission errors can be detected and corrected by the
iterative decoder of Figure 3.1, optimising its performance.
Though it is beyond the scope of this chapter, it can be shown that $D^{min}_{H}$ can be as low
as six if the interleaver does not effectively 'randomise' the order of its bits. However,
it can be shown that $D^{min}_{H}$ will increase to at least 24, provided that the interleaver
parametrisation satisfies two conditions. The first condition requires the interleaver to
separate every pair of bits from each n = 32-bit PN sequence in b with at least two bits
from other PN sequences, when they are re-positioned in e. For example, this condition
will not be satisfied, if the bits $b_{66}$ and $b_{98}$ (which are constituent of the same n = 32-bit
PN sequence in b, as shown in Figure 3.4) are interleaved to positions $e_{343}$ and $e_{341}$,
respectively (which are separated by only one other bit position, namely $e_{342}$). The
second condition of achieving $D^{min}_{H} \ge 24$ requires each n = 32-bit PN sequence in b to
have no more than one bit in e that is adjacent to a bit within each of the other PN
sequences. For example, if the bits $b_{32}$ and $b_{34}$ are interleaved to positions $e_{512}$ and $e_{125}$,
respectively, and the bits $b_{610}$ and $b_{639}$ are interleaved to $e_{124}$ and $e_{513}$, respectively, then
the second condition will not be satisfied.
The described conditions for achieving $D^{min}_{H} \ge 24$ motivated the design of a GA [123],
which was employed for selecting beneficial values for the parameters of Table 3.1. The
first goal of the proposed GA was to maximise the minimum positional separation
between any two bits in e that originate from the same n = 32-bit PN sequence in b.
The GA's second goal was to minimise the maximum number of occurrences that any
two n = 32-bit PN sequences in b have bits that are positioned next to each other in
e. Clearly, in order to achieve these goals, the degree of freedom for the interleavers to
re-position the N bits in the sequence b must be fully exploited, as described above.
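The two quantities that the GA targets can be evaluated for any candidate interleaver with a short Python sketch. This is only one possible way of scoring a permutation, written under the assumption that the overall mapping is available as a list `perm` with e_i = b_{perm[i]}; the function names are hypothetical and the GA itself (population, crossover, mutation) is not shown.

```python
from collections import Counter, defaultdict


def min_same_sequence_gap(perm, n=32):
    """Smallest distance in e between two bits of the same PN sequence in b."""
    positions = defaultdict(list)
    for pos, src in enumerate(perm):
        positions[src // n].append(pos)      # positions are in ascending order
    return min(
        (later - earlier
         for plist in positions.values()
         for earlier, later in zip(plist, plist[1:])),
        default=len(perm),
    )


def max_adjacency_count(perm, n=32):
    """Largest number of times any two PN sequences have neighbouring bits in e."""
    counts = Counter()
    for left, right in zip(perm, perm[1:]):
        u, v = left // n, right // n
        if u != v:
            counts[(min(u, v), max(u, v))] += 1
    return max(counts.values(), default=0)


def satisfies_conditions(perm, n=32):
    """First condition: gap of at least 3. Second condition: at most 1 adjacency."""
    return min_same_sequence_gap(perm, n) >= 3 and max_adjacency_count(perm, n) <= 1


toy = [0, 2, 4, 1, 3, 5]   # toy example with 2-bit 'PN sequences'
print(min_same_sequence_gap(toy, n=2), max_adjacency_count(toy, n=2))
```

A gap of at least three positions corresponds to the first condition (two intervening bits), while an adjacency count of at most one corresponds to the second.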
3.5 Module implementation
This section describes an ASIC module that implements the interleaver and rate-1 en-
coder that are boxed in Figure 3.1. The schematic of Figure 3.5 is employed for the
proposed module, which could be integrated between the PN spreader and the O-QPSK
modulator of a standard IEEE 802.15.4 implementation, such as the Chipcon CC2430.
In the following discussions, the proposed module's Input and Output (I/O) interface,
datapath, ROM and controller are detailed.
The proposed module is specifically designed to avoid imposing any changes upon the I/O
interfaces of the standard PN spreader and O-QPSK modulator. These exchange n = 32-bit
PN sequences [102], at a rate of $f^{tx}/n = 62.5 \times 10^{3}$ per second, where $f^{tx} = 2 \times 10^{6}$ bits
per second, as described in Section 3.2. These features motivate the proposed module's
employment of a $f^{pr2}_{aug} = 62.5$ kHz clock, which is supplied using the 'Clk' port shown in
Figure 3.5, as well as the 32-bit I/O ports 'Data in' and 'Data out'.
[Figure 3.5 (block diagram): Control, ROM and Datapath blocks. The datapath comprises the Input Buffer, Interleaver1, Register Bank, Interleaver2, Interleaver3, Rate-1 Encoder and Output Buffer, connected by 128-bit buses and carrying the sequences b, c, d, e and f. Ports: Clk, nReset, En, 7-bit Length, 32-bit Data_in, 32-bit Data_out and Supply.]
Figure 3.5: Schematic of the proposed hardware implementation.
Note that the proposed module has three other ports, as shown in Figure 3.5. The
module begins operating when a logic one is placed upon the 'En' port, allowing its
input port to be synchronised with the PN spreader's output port. As described in
Section 3.3, the module operates in one of 118 different modes, depending on the length
N of the bit sequence b. The value of N can be extracted from the 7-bit 'frame-length
field' employed in the IEEE 802.15.4 PHY header [102], which conveys the number of
bytes in the PHY payload a, namely M. This 'frame-length field' is provided to the
module using its 7-bit 'Length' port. Finally, the 'nReset' port may be used to reset the
registers employed within the proposed module.
A serial structure is employed within the datapath block of the proposed module, as
shown in Figure 3.5. Here, the interleaver of Figure 3.1 is implemented in three stages,
namely Interleaver 1, Interleaver 2 and Interleaver 3, as described in Section 3.3. These
stages interleave multiple bits in parallel [131], in order to process the bits at the same
rate that they are supplied by the PN spreader, as shown in the timing diagram of
Figure 3.6 and detailed below. This approach facilitates a low processing latency and
`just-in-time' processing, which reduces the number of registers that are required to store
intermediate results.
A uniform 128-bit dataflow width is chosen within the datapath block of Figure 3.5. This
uniform width allows Interleaver 2, Interleaver 3 and the rate-1 encoder to be operated
within a single clock cycle, without the need for intermediate registers. However, owing
to its 8128-bit register bank, Interleaver 2 cannot be tightly connected to Interleaver
1. More specifically, this register bank must collect all of the bits from the sequence
c, before Interleaver 2 can commence generating the sequence d, owing to the relative
prime number based hops that are required, as discussed in Section 3.3. The 128-bit
input and output buffers shown in Figure 3.5 are required owing to the different port
widths employed inside and outside of the datapath block.

[Figure 3.6 (timing diagram): from 'Start' to 'End', the stages are 'Receiving Data', 'Reading ROM for Interleaver1', 'Processing in Interleaver1', 'Reading ROM for Interleaver 2 and Interleaver 3', 'Generating parameters for Interleaver 2', 'Processing in Interleaver 2, Interleaver3 and Rate-1 Encoder' and 'Transmitting Results'. Duration annotations: 1 clock cycle, 3 clock cycles, Np/32 clock cycles and Wn + 7 clock cycles, with a total of C = Np/16 + Wn + 11 clock cycles.]
Figure 3.6: Timing diagram of the proposed hardware implementation.
As discussed in Section 3.3, the three interleaving stages are parameterised as described
in Table 3.1. In the proposed module, these parameters are stored in the ROM block
shown in Figure 3.5. Combinational logic is employed to convert these parameters into
the bit indices $j_i$, according to (3.1), (3.2) and (3.3) for Interleaver 1, Interleaver 2 and
Interleaver 3, respectively. Here, the complexity of calculating (3.2) can be reduced, if
it is implemented recursively according to
$$j_i = (j_{i-1} + p_N) \bmod N, \qquad (3.4)$$
where $i \in [1, N-1]$ and $j_0 = s_N$. As shown in Figure 3.5, Interleaver 2 is required
to interleave 128 bits in parallel. Note that 128 adders could be chained together to
perform the required calculations of (3.4). However, the resulting combinational logic
path would be too long to be traversed within a single clock cycle. For this reason,
parallel adder chains are employed, each employing 16 adders to determine a different
subset of the 128 bit indices $j_i$ in a single clock cycle, striking a trade-off between bit
area and speed.
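The recursion of (3.4) and its division among parallel adder chains can be sketched in
software as follows. This is only an illustrative Python model, not the hardware description
of the proposed module: the values chosen for N, s_N and p_N are hypothetical, and seeding
chain c with (s_N + 16 c p_N) mod N is an assumption about how the chains could be started
independently of each other.

```python
def recursive_indices(n, s, p):
    """Bit indices j_0..j_{n-1} from j_i = (j_{i-1} + p) mod n with j_0 = s."""
    indices = [s % n]
    for _ in range(1, n):
        indices.append((indices[-1] + p) % n)
    return indices


def chained_indices(n, s, p, chain_len=16):
    """Same indices, produced by n // chain_len independent chains."""
    indices = []
    for c in range(n // chain_len):
        j = (s + c * chain_len * p) % n   # assumed seed of chain c
        for _ in range(chain_len):
            indices.append(j)
            j = (j + p) % n               # next index within this chain
    return indices


# Hypothetical parameters; p is relatively prime to n, so every index is visited once.
N, S_N, P_N = 128, 5, 37
serial = recursive_indices(N, S_N, P_N)
parallel = chained_indices(N, S_N, P_N)
print(serial == parallel, sorted(serial) == list(range(N)))
```

In this model the short chains reproduce the output of the single long chain, which is the
property that allows the critical path to be shortened without changing the interleaver pattern.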
As described above, 128-bit subsets of the sequence b are processed by Interleaver 1
as and when they are supplied to the input buffer, at a rate of 32 bits per clock cycle.
However, 128 elements from the arrays ru of Table 3.1 are required in order to calculate
(3.1) for each 128-bit subset of b. For this reason, a multiport ROM is employed to
provide these 128 parameters in just three clock cycles, as shown in the timing diagram
of Figure 3.6. In a fourth clock cycle, the 128 bits that have been loaded into the input
buffer during the previous four clock cycles are interleaved and stored in the register
bank of Interleaver 2. Note that during one of the four clock cycles in which Interleaver
1 is operated, it is necessary to both read from and write to the input buffer of Figure 3.5.
When this process completes, the register bank of Interleaver 2 will store the bit sequence
c, as shown in Figure 3.4.
Next, the registers that were used to store the parameters of Interleaver 1 are reused
for storing the parameters of Interleaver 2 and Interleaver 3. Since the register bank of
Interleaver 2 can store the bit sequence c indefinitely, there is no need to use multiport
ROM accesses, when loading the parameters of Interleaver 2 and Interleaver 3. Hence,
as shown in Figure 3.6, $W_N + 7$ clock cycles are used to load the parameters $s_N$, $p_N$ and
$W_N$, as well as the $W_N$ elements of the array $w_N$ of Table 3.1.
Following this, the bits of the sequence c are processed by Interleaver 2, Interleaver 3
and the rate-1 encoder in order to obtain the sequence f, as shown in Figure 3.4. Here,
128 bits are processed and written into the output buffer of Figure 3.5 at a time. As
shown in Figure 3.6, 'just-in-time' processing is employed to populate the output buffer
at the same rate that it supplies bits to the O-QPSK modulator of Figure 3.1, namely
at 32 bits per clock cycle. Hence, one clock cycle is employed to recursively perform the
calculations of (3.4), one clock cycle is employed to process the 128 bits and two idle
clock cycles are employed.
As shown in Figure 3.6, the proposed module employs a total of
$C^{\mathrm{pr}}_{\mathrm{aug2}} = N/16 + W_N + 10$
clock cycles to perform interleaving and rate-1 encoding.
3.6 Energy consumption analysis
In this section, the energy consumption results of the proposed module $P^{\mathrm{pr}}_{\mathrm{aug2}}$ are
investigated within the context of the overall energy consumption
$E^{\mathrm{tx}}_{\mathrm{aug}} + E^{\mathrm{pr}}_{\mathrm{aug2}}$. The effect
of integrating this module into the Chipcon CC2430 hardware [122] is estimated. This
estimation will be compared with that of [69], which considered the implementation of
the interleaver and rate-1 encoder of Figure 3.1 in software running on the Chipcon
CC2430's 8051 processor, as described in Section 3.1.
The Synopsys Design Compiler was employed to synthesise a gate-level implementation
of the module detailed in Section 3.5. The synthesis employs an STMicroelectronics
0.12 µm technology standard cell library, resulting in a 1.6 mm² bit area, including the
ROM. Synopsys PrimeTime was employed to determine the resultant implementation's
average power consumption, which was found to be $P^{\mathrm{pr}}_{\mathrm{aug2}} = 666.9$ µW, where a supply
voltage of V = 3 Volts is assumed. It is assumed that the proposed module is shut down
when it is deactivated by placing a logic zero upon its En port, as shown in Figure 3.5.
Hence, the duration $t^{\mathrm{pr}}_{\mathrm{aug2}}$ for which the proposed module consumes current is given
by $t^{\mathrm{pr}}_{\mathrm{aug2}} = C^{\mathrm{pr}}_{\mathrm{aug2}} / f^{\mathrm{pr}}_{\mathrm{aug2}}$, where
$C^{\mathrm{pr}}_{\mathrm{aug2}} = N/16 + W_N + 10$ and $f^{\mathrm{pr}}_{\mathrm{aug2}} = 62.5$ kHz, as
described in Section 3.5. Finally, the proposed module's energy consumption is given
by $E^{\mathrm{pr}}_{\mathrm{aug2}} = P^{\mathrm{pr}}_{\mathrm{aug2}} \times t^{\mathrm{pr}}_{\mathrm{aug2}}$, as shown in Figure 3.7.
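The three relations above (cycle count, duration and energy) can be chained together as in
the following Python sketch. It is a worked illustration only: the payload size N and the
value of W_N used in the example call are hypothetical, and the default power is taken as
666.9 µW, which is an assumed reading of the unit of the average power reported above.

```python
def module_energy(n_bits, w_n, power_w=666.9e-6, clock_hz=62.5e3):
    """Return (cycles, duration in s, energy in J) for the proposed module.

    power_w assumes the reported average power is in microwatts.
    """
    cycles = n_bits / 16 + w_n + 10   # C = N/16 + W_N + 10
    duration = cycles / clock_hz      # t = C / f
    energy = power_w * duration       # E = P * t
    return cycles, duration, energy


# Hypothetical example values for N and W_N.
cycles, duration, energy = module_energy(n_bits=1024, w_n=6)
print(cycles, duration, energy)
```

Because the energy is linear in the cycle count, any reduction in the number of active clock
cycles translates directly into a proportional energy saving in this model.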
Figure 3.7: Total energy consumed in the standard Chipcon CC2430 PHY $E^{\mathrm{tx}}_{\mathrm{std}}$ and
in the proposed ASIC implementation of the augmented PHY ($E^{\mathrm{tx}}_{\mathrm{aug}} + E^{\mathrm{pr}}_{\mathrm{aug2}}$).
Figure 3.7 provides the energy consumed by the proposed module $E^{\mathrm{pr}}_{\mathrm{aug2}}$ for payloads
comprising various numbers N of bits. These energy consumptions are 76.7–83.4% lower
than the $E^{\mathrm{pr}}_{\mathrm{aug}}$ values estimated in [69] for the case where interleaving and rate-1 encoding
are performed in software running on the 8051 processor of a Chipcon CC2430. As a
result, the energy savings $[E^{\mathrm{tx}}_{\mathrm{std}} - (E^{\mathrm{tx}}_{\mathrm{aug}} + E^{\mathrm{pr}}_{\mathrm{aug2}})]$ afforded by employing the augmented
PHY are increased from 17.4–23.3% to 24.8–31.4% of $E^{\mathrm{tx}}_{\mathrm{std}}$, as shown in Figure 3.3.
Indeed, the energy $E^{\mathrm{pr}}_{\mathrm{aug}}$ consumed by the proposed module only erodes 4.8–8.3% of
the transmission energy reduction $(E^{\mathrm{tx}}_{\mathrm{std}} - E^{\mathrm{tx}}_{\mathrm{aug}})$ that it facilitates, which is considered
to be an attractive engineering trade-off.
3.7 Conclusions
In this chapter, a detailed design of an augmentation to the IEEE 802.15.4 physical layer
of WSN sensor nodes has been introduced. This augmentation significantly reduces the
transmission energy required to achieve a target FER of $10^{-3}$ at the cost of requiring
some additional processing within the sensor nodes. Using a gate-level implementation
of the design, the overall energy consumption of the augmented PHY was estimated.
The results showed that the energy consumed by the additional processing was negligible
compared to the transmission energy saving that it facilitates. Indeed, the augmented
PHY facilitates a significant reduction in the sensor nodes' overall energy consumption
of 26.65–32.78%. Based on the study in this chapter, it can be concluded that with
the proper management of the additional processing energy consumption that is introduced
by a sophisticated ECC, such as turbo-like codes, the overall energy consumption
of sensor nodes in WSNs can be reduced. The energy saving is significant in a star
network because the encoders tend to be simple and only consume an insignificant part
of the overall energy consumption. However, in order to apply the same philosophy in
decode-and-forward multi-hop networks [132], the decoder must be considered as part of
the energy trade-off between the transmission and processing energy consumptions. In
the following chapters, the widely used turbo decoder, the Look-Up Table based Logarithmic
Bahl-Cocke-Jelinek-Raviv (LUT-Log-BCJR) decoder, is considered in this regard.
Its energy-efficient hardware design and its effect on the overall energy consumption in
a WSN are fully investigated.
Chapter 4
Extrinsic information transfer
chart based fixed-point turbo
code parameter design
4.1 Introduction
As discussed in Section 1.4.2, when turbo-like codes are applied in multi-hop Wire-
less Sensor Networks (WSNs), decoding-and-forwarding is required at the sensor nodes.
Hence, additional energy consumption is imposed. As a result, the overall energy consumption
can only be reduced when the transmission distance is sufficiently long, so
that the transmission energy saving achieved by the employed Forward Error Correction
(FEC) code is sufficiently high to overcome the additional energy consumption of
the decoders. This challenging topic was investigated in [1,2] for the sake of extending
the lifetime of WSNs. Therefore, the energy consumption of the decoder's hardware
implementation becomes an important factor in the design of a wireless sensor node. In
this chapter, an important issue that is directly related to the energy consumption of
a decoder's hardware implementation, namely the choice of fixed-point operand-word
lengths, is considered. An EXtrinsic Information Transfer (EXIT) chart based method
is proposed for investigating the trade-offs between the operand-word lengths of the decoder
and the achievable decoding performance. In contrast to the conventional method
of using Bit Error Rate (BER) simulations for solving this design problem, the proposed
method provides deeper insights into the specific causes of the performance degradations
encountered. Additionally, the EXIT chart based method is less time-consuming
than the BER based method, because the latter requires the consideration of a range of
channel Signal-to-Noise Ratios (SNRs) in the associated Monte-Carlo simulations.
The performance of turbo-like codes is usually evaluated based on floating-point
simulation during the design process. However, in practical implementations, for reasons
of energy efficiency, a fixed-point number representation would be preferable for
Digital Signal Processors (DSPs), Field-Programmable Gate Arrays (FPGAs) or for
Application-Specific Integrated Circuit (ASIC) implementations [133]. This is because
compared to floating-point implementations, fixed-point implementations facilitate significant
energy consumption reductions at a modest performance degradation [134].
As discussed in Section 2.1, one of the most important advantages of using the Look-Up
Table based Logarithmic Bahl-Cocke-Jelinek-Raviv (LUT-Log-BCJR) algorithm is the
reduced dynamic range of both its internal variables and of the Logarithmic Likelihood
Ratios (LLRs). In practice, this allows a fixed-point representation to be used. In
fixed-point implementations, the hardware complexity typically increases linearly with
the internal operand-width of the data, since the operand-width determines the size of
all the data paths and of the computing resources in the architecture [112]. Moreover,
the iterative decoding process of turbo-like coding schemes requires a large amount
of memory for storing the LLRs and the internal variables. Using fewer bits for each
LLR and each variable is capable of significantly reducing the memory requirement,
hence reducing the energy consumption of the decoder. Therefore, for a low-power
implementation, minimising the number of bits required for representing the fixed-point
quantities in the algorithm is an important issue. However, the information lost due
to reducing the operand-widths will degrade the performance. Hence, there
is a trade-off between the performance attained and the hardware complexity imposed,
which has to be carefully considered when aiming for a low-power design.
Various previous studies investigated the fixed-point implementation of turbo decoders
by exploring the minimum word lengths of the different quantities to ensure a tolerable
BER/Frame Error Rate (FER) degradation [133–141]. However, no universal conclusion
has been obtained. Even though some of the studies considered the same specifications,
namely those of the Universal Mobile Telecommunications System (UMTS)/Long Term
Evolution (LTE) turbo decoder combined with Binary Phase-Shift Keying (BPSK) for
transmission over an Additive White Gaussian Noise (AWGN) channel, the conclusions
were somewhat different [133, 134, 136, 138, 140, 141]. This is because, in fixed-point
implementations, a range of different issues affect the achievable decoding performance
and diverse techniques were investigated for dealing with these issues.
The performance degradations caused by fixed-point implementations are primarily imposed
by the underflow and overflow phenomena. In the context of underflow, the
operand-fraction accuracy limits the overall computational accuracy of the calculations
in the algorithm. Since the Jacobian logarithm in the LUT-Log-BCJR algorithm [142]
is realised using a piece-wise linear approximation stored in a Look-Up Table (LUT), the
precision of the fixed-point representation is directly linked to the number of entries in
the LUT. As discussed in Section 2.2.2, the max* operator of the Jacobian algorithm is
defined as:
\begin{align}
\max\nolimits^{*}(p, q) &= \ln(e^{p} + e^{q}) \tag{4.1} \\
&= \max(p, q) + \ln(1 + e^{-|p-q|}) \tag{4.2} \\
&= \max(p, q) + f_c(|p-q|), \tag{4.3}
\end{align}
where the function $f_c = \ln(1 + e^{-|p-q|})$ may be readily approximated using a LUT.
The number of entries (the quantised outputs) in the LUT is determined by how many
quantised levels are defined by the specific fixed-point representation for covering the
value range of the function $f_c$, $0 < \ln(1 + e^{-|p-q|}) \le \ln(2)$. The fixed-point quantisation
of $f_c$ is shown in Figure 4.1. For example, using a z = 3-bit fraction-word length gives
a LUT resolution of 0.125, hence the LUT has the following seven elements, {0, 0.125,
0.25, 0.375, 0.5, 0.625, 0.75}. Similarly, a z = 2-bit fraction-word length gives a four-element
LUT and a z = 1-bit fraction-word length defines a two-element LUT, as shown
in Figure 4.1. Hence, the fraction-word length affects not only the width of the databus, the
computing resources and the memory requirements, but also the size of the LUT used.
Figure 4.1: The floating-point correction function $f_c$ and its fixed-point implementations
using different fraction operand-widths.
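The relationship between the fraction-word length z and the LUT size can be reproduced with
the short Python sketch below. It assumes that the correction term is rounded to the nearest
multiple of 2^-z, which is one plausible quantisation rule rather than a rule stated in the
text; the function names are illustrative only.

```python
import math


def correction(d):
    """Exact correction term f_c(d) = ln(1 + exp(-d)) for d = |p - q| >= 0."""
    return math.log1p(math.exp(-d))


def lut_levels(z):
    """Distinct outputs of f_c when rounded to the nearest multiple of 2**-z."""
    step = 2.0 ** -z
    top = round(math.log(2.0) / step)   # f_c peaks at ln(2) when p == q
    return [k * step for k in range(top + 1)]


def max_star_lut(p, q, z):
    """max*(p, q) with f_c rounded to z fraction bits (round-to-nearest assumed)."""
    step = 2.0 ** -z
    fc_quantised = round(correction(abs(p - q)) / step) * step
    return max(p, q) + fc_quantised


for z in (3, 2, 1):
    print(z, lut_levels(z))
```

Under this rounding assumption the sketch yields seven, four and two levels for z = 3, 2 and 1,
matching the LUT sizes quoted above.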
In contrast to the above-mentioned underflow issues, the occurrence of overflow depends
on both the dynamic range of the variables and on the number of bits assigned to
the integer part of the fixed-point representation. In the event of overflow, the lost
information might become fatal for the success of the decoding altogether. However,
the dynamic range of the variables is difficult to predict and sometimes becomes quite
wide, hence requiring a large number of bits in fixed-point representations to guarantee
that the entire range is covered. In LUT-Log-BCJR decoders, there are only three
different operations in the Add-Compare-Select (ACS) operations. The `compare' and
`select' operations cannot induce any overflow. However, the `add' operations might
result in overflows. Consider the LUT-Log-BCJR algorithm of Section 2.2.2. In the
decoding trellis of Figure 2.13, each $\alpha$ is given by the sum of two $\gamma$, an $\alpha$ value from
the previous step and the correction function $f_c$ of Equation 4.3. Since it includes an $\alpha$
from the previous step, the calculation of $\alpha$ is constituted by an accumulation of the
$\gamma$ values in the trellis. Therefore, the values of $\alpha$ would increase without limit as the
coded block-length is increased. The overflow imposed by having a limited word length
is the most significant effect to be considered. The calculation of $\beta$ imposes the same
problem. According to Equation 2.17, the calculation of $\delta$ is constituted by the sum of
$\alpha$, $\beta$ and $\gamma$, which might also result in an overflow. To deal with these issues, a number
of different techniques have been proposed [136,139].
The first approach is to arrange for the saturation of the overflowing data during its
processing. This method is widely used in fixed-point digital filters [143,144]. A disadvantage
of this approach is that it requires some additional hardware within each
computing element that might cause an overflow, such as adders. The simulation results
of Section 4.3 will demonstrate that this technique is unsuitable
for the LUT-Log-BCJR algorithm when applied in isolation, but it can work well in
collaboration with a second technique, namely with normalisation [136].
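A saturating adder of the kind described above can be modelled in a few lines of Python.
The sketch is illustrative only: the 4-bit word length and the metric values are hypothetical,
and the closing observation is a property of this toy example rather than a result taken
from Section 4.3.

```python
def saturate(value, bits):
    """Clip an integer to the signed two's complement range of the given width."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def saturating_add(a, b, bits):
    """Add two integers and clip the sum instead of letting it wrap around."""
    return saturate(a + b, bits)


# Two hypothetical path metrics whose true sums (11 and 9) exceed the 4-bit range.
m1 = saturating_add(6, 5, bits=4)
m2 = saturating_add(6, 3, bits=4)
print(m1, m2, m1 - m2)
```

In this example both metrics are clipped to the same maximum, so the difference between them
is lost even though each individual value remains within range.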
More particularly, normalisation is applied in the context of the LUT-Log-BCJR algorithm
for the sake of dealing with the overflow of the $\alpha$ and $\beta$ internal variables.
It scales down the increasing metrics during each step, in order to prevent them from
increasing without bound. This reduces the probability of overflows and
allows the word length required for representing the variables to be further reduced.
As discussed above, the $\alpha$ and $\beta$ values are accumulated as the
decoding trellis is traversed. For example, a decoding trellis of the UMTS/LTE turbo decoder is
given in Figure 4.2. Based on the algorithm described in Section 2.2.2, $\alpha(S_4)$, $\alpha(S_5)$,
$\alpha(S_6)$ and $\alpha(S_7)$ of Figure 4.2 accumulate from $\alpha(S_2)$ and $\alpha(S_3)$, which in turn
accumulate from $\alpha(S_1)$. This accumulation continues as the forward recursion proceeds,
with subsequent $\alpha$ values increasing without limit and hence potentially resulting in
overflows. However, the decoding results do not depend on the absolute values of $\alpha$;
they only depend on the differences between the $\alpha$ values of the states at the same
trellis stage [139]. For example, the calculation of the extrinsic LLR $x_2$ in Figure 4.2 is
insensitive to the specific values of $\alpha(S_4)$, $\alpha(S_5)$, $\alpha(S_6)$ and $\alpha(S_7)$, but sensitive to the
differences $\alpha(S_4) - \alpha(S_5)$, $\alpha(S_4) - \alpha(S_6)$, $\alpha(S_4) - \alpha(S_7)$ and so on. The same conclusion
can also be applied to the $\beta$ values. Therefore, the above-mentioned normalisation
technique can be applied for controlling the dynamic range of the path metrics without
affecting the decoding performance. This normalisation may be achieved by subtracting
a constant from all the path metrics at a specific trellis stage [136,139,145]. A range
of different approaches have been used in previous studies. In [136], the path metrics
were reduced by an amount corresponding to the minimum path metric at each trellis
stage. By contrast, in [139], the path metrics were reduced by the maximum amongst
them at each trellis stage. Both of these approaches require extra computations for
finding the minimum or the maximum path metric and for performing the subtractions.
In [145], a modified version was advocated, where instead of searching for the smallest
or largest metric at each step, the first trellis state metric is subtracted at each trellis
stage. These approaches led to different word length requirements in the respective studies.
[Figure 4.2 trellis diagram: eight states (State1 to State8, labelled 000 to 111), trellis
stages T1 to T6, transition labels xi/yi, and the highlighted states S1 to S7.]
Figure 4.2: Possible accumulation routes in an example decoding trellis of the
UMTS/LTE turbo decoder.
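The three subtractive normalisation variants described above differ only in the reference
value that is subtracted. The following Python sketch, using hypothetical metric values for
one trellis stage, illustrates that each variant shrinks the magnitudes while leaving every
pairwise difference unchanged.

```python
def normalise(metrics, mode):
    """Subtract a common reference from all path metrics of one trellis stage."""
    if mode == "min":        # minimum metric, as attributed to [136]
        ref = min(metrics)
    elif mode == "max":      # maximum metric, as attributed to [139]
        ref = max(metrics)
    elif mode == "first":    # first state metric, as attributed to [145]
        ref = metrics[0]
    else:
        raise ValueError("unknown normalisation mode")
    return [m - ref for m in metrics]


def pairwise_differences(metrics):
    """All differences between metrics of the same trellis stage."""
    return [a - b for a in metrics for b in metrics]


alphas = [37, 41, 35, 40]    # hypothetical forward metrics at one trellis stage
for mode in ("min", "max", "first"):
    print(mode, normalise(alphas, mode))
```

The "first" variant avoids the search for an extremum, at the cost of producing values of
either sign.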
A third approach proposed in [136] exploits the nature of the two's complement representation,
namely the wrap-around technique. The basis of the wrap-around
technique is that the decoding results of the LUT-Log-BCJR only depend on the differences
between the path metrics, as discussed above. To demonstrate the wrap-around
nature of the two's complement representation, three examples of the difference calculation
are portrayed in Figure 4.3. Note that in the calculations of (1+3)-2 and (2+2+2)-3,
both bracketed sums overflow in a 3-bit two's complement representation,
but the equivalent calculation in two's complement representation still gives the correct
answer, as long as the final result does not overflow. Therefore, even if the calculation
of a path metric overflows, as long as the difference between the two path metrics
involved is still within the dynamic range of the chosen fixed-point parameterisation, the
decoder can still give the correct result. However, this technique has a limited
tolerance of overflow. In the third calculation, the difference in the last calculation
step overflows. In this situation, the two's complement representation fails to provide
the correct result, as shown in Figure 4.3. This technique was first introduced for Viterbi
decoders in [146] and then it was also invoked for Soft-In Soft-Out (SISO) decoders [146].
In the LUT-Log-BCJR algorithm, it can be shown that all possible differences between
the pairs of path metrics are upper bounded [146]. Therefore, as long as the difference
between two metrics is not above the largest possible value that can be represented by
the specified word length in two's complement representation, the subtraction can be
performed correctly using modulo-$2^n$ arithmetic by simply ignoring the overflow of the
operands. The advantage of the wrap-around technique is that no additional hardware
requirements are imposed. According to [136], one or two more bits may be required for
this approach compared to the subtractive normalisation and saturation, because it only
has a limited capability of tolerating overflows, as demonstrated by the third example of
Figure 4.3. The trade-off between introducing extra hardware for the normalisation/saturation
and using the wrap-around technique has to be carefully considered, depending
on the turbo code specifications.
[Figure 4.3 number circle of the 3-bit two's complement codes: 000 = 0, 001 = 1, 010 = 2,
011 = 3, 100 = -4, 101 = -3, 110 = -2, 111 = -1. Worked examples:]
(1+3)-2 = 2: (001+011)-010 = 100-010 = 100+110 = 010 = 2
(2+2+2)-3 = 3: (010+010+010)-011 = 110-011 = 110+101 = 011 = 3
(2+2+2)-1 = 5: (010+010+010)-001 = 110-001 = 110+111 = 101 = -3
Figure 4.3: Example of difference calculation in two's complement representation.
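The three difference calculations of Figure 4.3 can be checked with a small Python model of
modulo-2^n arithmetic. The helper names are illustrative; the sketch simply wraps every
intermediate result into the 3-bit two's complement range, as an adder without overflow
handling would.

```python
def wrap(value, bits=3):
    """Reduce an integer to the signed two's complement range, modulo 2**bits."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):          # sign bit set: interpret as negative
        value -= 1 << bits
    return value


def wrapped_difference(terms, subtrahend, bits=3):
    """Accumulate the terms with wrap-around, then subtract with wrap-around."""
    acc = 0
    for t in terms:
        acc = wrap(acc + t, bits)
    return wrap(acc - subtrahend, bits)


print(wrapped_difference([1, 3], 2))      # true result 2 lies within [-4, 3]
print(wrapped_difference([2, 2, 2], 3))   # true result 3 lies within [-4, 3]
print(wrapped_difference([2, 2, 2], 1))   # true result 5 lies outside [-4, 3]
```

The first two calls recover the true differences despite the overflowing sums, whereas the
third returns -3 instead of 5 because the difference itself cannot be represented in 3 bits.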
In conclusion, when implementing the LUT-Log-BCJR algorithm in a fixed-point representation,
the different techniques have different word length requirements. In the similar previous
contributions of [133–138,140,141], the different environment, design and implementation
configurations led to different conclusions. A brief summary of the configurations
of the above eight papers is provided in Table 4.1. Three similar turbo codes are considered
in these papers, as shown in Figure 4.4. The type-2 configuration corresponds
to the UMTS/LTE turbo encoder discussed in Chapter 2.
[Figure 4.4 encoder schematics: Type-1, Type-2 and Type-3, each built from delay elements (D)
and an interleaver (π).]
Figure 4.4: Three different types of turbo codes investigated in the previous studies
of Table 4.1.
Only a few papers discussed the effects of finite word length based on mathematical techniques
[136,138,139]. However, it was shown in [136,139] that mathematical techniques
are inadequate for deciding the desirable word lengths in practice. More specifically,
mathematical techniques are capable of determining the upper bounds of the path metrics
that will never be exceeded [139]. Nonetheless, practical BER/FER simulation results
show that a further reduction of the word lengths beyond the theoretical bounds can
be tolerated in practice, resulting only in a modest reduction of the achievable performance
[136]. As a result, the appropriate fixed-point parametrisation of a decoder
cannot be carried out purely on the basis of mathematical analysis.

Authors | [135] | [138] | [133] | [136]
Encoder | Type-1 | Type-2 | Type-2 | Type-2
Modulation | BPSK | BPSK | BPSK | BPSK
Channel | AWGN | AWGN | AWGN/Rayleigh | AWGN/Rayleigh
Interleaver | helical | N/A | N/A | 3GPP compliant
Block length (bit) | 216 | 4828 | 600 | 600
Iteration times | 5 | 10 | 5/10 | 5/7/10
Overflow control | Normalisation | Saturation | N/A | Normalisation, Saturation
Look-Up-Table | 16 elements | 22 elements | 7/10 elements | N/A

Authors | [134] | [141] | [137] | [140]
Encoder | Type-2 | Type-2 | Type-3 | Type-2
Modulation | BPSK | BPSK | BPSK | BPSK
Channel | AWGN | AWGN | AWGN/Rayleigh | AWGN/Rayleigh
Interleaver | block prime | N/A | N/A | ideal
Block length (bit) | 1024 | N/A | 640 | 2896
Iteration times | 3/8 | 5/8 | N/A | 7/18 half
Overflow control | Saturation | N/A | N/A | Normalisation
Look-Up-Table | 2 elements | 7 elements | 2/4/8 elements | N/A

Table 4.1: A summary of previous studies on fixed-point parameterisations of turbo
decoders.
Traditional BER/FER Monte-Carlo simulation based methods are time-consuming. Additionally,
the BER/FER curves do not provide any insight into the iterative decoding
convergence of the process. The different types of variables in a decoding scheme tend
to have different desirable word length requirements. Hence, a large number of combinations
have to be tested with the aid of time-consuming simulations to identify the
desirable settings, and considering the effects of different techniques with the aid of
Monte-Carlo simulations is on the verge of being infeasible. In this chapter, an EXIT
chart [100] analysis based technique is proposed for determining the desirable fixed-point
parameterisations of turbo-like decoders. The EXIT chart based investigations are naturally
less time-consuming compared to BER/FER simulations. Moreover, the BER
simulation results only characterise the decoding performance for a particular number
of iterations in the decoding process, while an EXIT chart explicitly characterises the
convergence behaviour of the decoder, hence allowing an arbitrary number of iterations
to be considered.
The results of Section 4.3 will demonstrate that the proposed EXIT chart analysis method
allows not only the decoding performance to be investigated, but also the
reasons for the performance degradation imposed by a specific fixed-point implementation
to be identified. This insightful information assists designers in determining
the appropriate technique for preventing any degradation, hence determining the desired
word lengths more promptly. For presenting the proposed method, the 3rd Generation
Partnership Project (3GPP) UMTS/LTE turbo decoder's [116] desirable word length
settings are investigated. The results are compared to those of previous works. Additionally,
the investigations show that the proposed method may be readily applied to
any turbo code and potentially any turbo-like code, including iterative decoding schemes
which can be analysed by EXIT charts.
As introduced in Section 2.3, the EXIT chart constitutes a powerful tool for analysing
the convergence behaviour of iterative systems, such as turbo-like decoders. Compared to
BER/FER simulations, generating an EXIT chart is less time-consuming, since the
simulation of the interleaver in the decoder and that of the actual iterative decoding
process are not required. Although the effects of a realistic finite-duration sub-optimal
interleaver cannot be explicitly revealed, an interleaver only changes the order of the
information bits, and no information is lost during the interleaving process due to a
fixed-point implementation. Since the objective of the proposed method is to investigate
the performance degradation imposed by fixed-point implementations, the interleaver
would not affect the results of the proposed method. Nonetheless, a disadvantage of the
EXIT chart based method is that it only considers a fixed SNR, while a BER/FER chart
considers a wide range of SNR or $E_b/N_0$ values. The target SNR can be chosen to be that
particular value where the tunnel is narrow and the performance is most sensitive to
the fixed-point representation's limitations. Naturally, it is possible to draw different
EXIT charts for different SNRs if necessary. In summary, EXIT charts constitute a
more efficient tool than BER/FER charts when aiming to determine the desirable
word lengths for a fixed-point decoder.
The effects of different word lengths on the EXIT chart of a turbo
decoder were first reported in [147], albeit no convincing conclusions were given. Hence,
in this chapter, a detailed analysis method of using EXIT charts for determining the
desirable word length setting of fixed-point implementations of turbo-like decoders is
proposed for the first time, by providing comprehensive investigations for the design
example of the UMTS/LTE turbo decoder [116]. In Section 4.2, the proposed method is
applied for investigating the desirable word length setting for the UMTS/LTE turbo decoder
under the comprehensive consideration of fixed-point implementation techniques.
The conclusions are then compared to those of previous treatises in Section 4.3. Section 4.4
concludes the chapter.
4.2 Extrinsic information transfer chart analysis of the
fixed-point UMTS/LTE turbo decoder
The specifications and structure of the UMTS/LTE encoder and decoder pair were
presented in Chapter 2. BPSK modulated transmission over an AWGN channel was
considered, initially assuming SNR = -4 dB. As shown in Section 4.3, at this SNR,
the EXIT chart of the UMTS/LTE turbo code has a moderately open tunnel, hence the
performance degradations imposed by fixed-point operands may be readily observed. An
SNR of -4.83 dB is also used, where the EXIT tunnel is almost closed, because both the
onset of the turbo cliff in the EXIT chart and the BER performance are most sensitive
to the limitations imposed by the fixed-point representation, which allows the desirable
operand-width to be critically appraised. Random bit sequences are fed into the input
of the turbo encoder. An interleaver length of 453 bits is used in the simulations, which is
approximately the geometric mean of the minimum and maximum block lengths in the UMTS standard.
This value is chosen because the performance degradation of turbo codes is proportional
to the block length [148]. The shortest (40-bit) and longest (5114-bit) frame lengths
in the UMTS standard are also used in the simulations for investigating the effect of
the frame length on the achievable performance. Additionally, the conclusions emerging
from previous works [133–138,140,141] are summarised for the sake of demonstrating
the validity of the proposed method.
Firstly, three different versions of the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm are
investigated using floating-point operand representations. These three algorithms are the
LUT-Log-BCJR algorithm relying on the exact calculation of the Jacobian logarithm,
the LUT-Log-BCJR algorithm using the eight-element look-up-table (LUT) based Jacobian
logarithm and the Maximum Logarithmic Bahl-Cocke-Jelinek-Raviv (Max-Log-BCJR)
algorithm [118]. It has been shown in [145] that the performance loss imposed by using
the Jacobian logarithm is less than 0.1 dB relative to the exact log-domain calculation,
which is usually considered acceptable. The performance degradation of the Max-Log-BCJR
UMTS/LTE turbo decoder has also been well documented [149,150]. According
to [150], the $E_b/N_0$ performance degradation observed at a BER of $10^{-5}$ between the
LUT-Log-BCJR and Max-Log-BCJR algorithms is a moderate 0.3 dB for a block length
of 640 bits and 0.54 dB for the 5114-bit block length for transmission over AWGN channels,
and higher in Rayleigh fading channels [149], which is considered significant. The
purpose of this work is to obtain fixed-point EXIT chart results similar to those of the
floating-point LUT-Log-BCJR technique, while considering an EXIT chart similar to
that of the floating-point Max-Log-BCJR unacceptable.
Secondly, the effects of having a limited fraction-part on the overall fixed-point representation
are investigated. Since limiting the fraction-part limits not only
the accuracy, but also the number of elements in the LUT, the effect of the number of
elements in the LUT is also considered.
Thirdly, the effects of a limited integer-part on the fixed-point representation are investigated.
The three overflow control approaches discussed in Section 4.1 are also
investigated.
Finally, based on the analysis to be provided in Section 4.3.2.4, the desirable combinations
of fraction-length and integer-length are investigated under the assumptions of
different interleaver block lengths. The effect of trellis termination techniques will also
be investigated in Section 4.3.2.4. The BER simulation results for the selected specifications
are also provided for validating the proposed method. The conclusions obtained by
the proposed method are then compared to those of previous studies [133–138,140,141].
4.3 Design study
4.3.1 Comparison of different floating-point log-domain methods
Figure 4.5 portrays the EXIT chart of the UMTS/LTE turbo decoder using the above-mentioned
three variations of the log-domain BCJR decoding algorithm, namely the
Logarithmic Bahl-Cocke-Jelinek-Raviv (Log-BCJR) algorithm, the LUT-Log-BCJR algorithm
and the Max-Log-BCJR algorithm. The open EXIT tunnel between the two
curves becomes narrower due to the information loss imposed by the Max-Log-BCJR.
Therefore, given a certain number of decoding iterations, the mutual information associated
with the Max-Log-BCJR remains lower than that of the LUT-Log-BCJR algorithm.
Viewing the associated degradation from a different perspective, in order to
obtain a certain target BER, more decoding iterations might be required by the Max-Log-BCJR
algorithm. Therefore, it can be anticipated that the BER degradation due
to the information lost by the fixed-point implementation can be reflected in the EXIT
chart. Further simulation results will underline this conclusion.
Figure 4.5: The EXIT chart of three variations of the log-domain BCJR decoding
algorithm.
4.3.2 Comparison and analysis of fixed-point implementations
In order to investigate the decoding performance of using fixed-point operand
representations in the UMTS/LTE turbo decoder, fixed-point operands are used for all the
variables in the proposed EXIT chart based investigations. As described in Chapter 2,
the operand-width specification includes those of the integer-part and of the
fraction-part. Additionally, according to the previous studies in [133–138,140,141], the
dynamic ranges of the input LLRs and of the internal variables of a turbo decoder, which
are determined by the operand-width of the integer-part in the fixed-point
representation, have to be specified separately for the sake of obtaining the desirable
operand-widths.
As described in Section 4.1, the gradual increase in the values of the internal variables
is caused by the accumulated additions of the input LLRs, which is the typical cause of
overflow during the decoding process. Therefore, clipping the input LLRs to a smaller
integer operand-width than that of the internal variables is capable of preventing an
overflow. As a result, there are three parameters that have to be determined for
computing the operand-width specifications of a turbo decoder, namely the integer
operand-width of the input LLRs, the integer operand-width of the internal variables and
the fraction operand-width of all variables in the simulation. The two integer
operand-widths dominate the effect of overflow during the decoding process. The fraction
operand-width determines the effect of underflow during the decoding process. The total
number of possible specifications is given by all the combinations of the legitimate
values of these three parameters. In order to reduce the number of specifications that
have to be characterised, the operand-widths of the integer-part and of the fraction-part
are investigated separately.
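The saving obtained by investigating the parameters separately can be illustrated with a short count. The sketch below is only an idealised illustration: the candidate ranges are hypothetical (they are not taken from this chapter), and the "separate" count assumes that each parameter is swept once while the others are held fixed.

```python
from itertools import product

# Hypothetical candidate ranges, chosen purely for illustration.
LLR_INT_WIDTHS = range(2, 9)         # integer operand-width of the input LLRs
INTERNAL_INT_WIDTHS = range(4, 14)   # integer operand-width of the internal variables
FRACTION_WIDTHS = range(0, 4)        # fraction operand-width of all variables


def joint_specifications():
    """All combinations in which the LLR width is below the internal width."""
    return [
        (x, y, z)
        for x, y, z in product(LLR_INT_WIDTHS, INTERNAL_INT_WIDTHS, FRACTION_WIDTHS)
        if x < y
    ]


def separate_evaluations():
    """Idealised cost of sweeping each parameter on its own."""
    return len(LLR_INT_WIDTHS) + len(INTERNAL_INT_WIDTHS) + len(FRACTION_WIDTHS)


print(len(joint_specifications()), separate_evaluations())
```

Under these assumed ranges the joint space is roughly an order of magnitude larger than the sum of the individual sweeps, which is the motivation for treating the integer and fraction parts separately.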
Firstly, a long operand-width of 32 bits is assumed for both the integer operand-widths,
while using a limited fraction operand-width for investigating the performance of a
limited operand-precision. Secondly, the opposite scenario is considered in order to
investigate the degradation imposed by the limited dynamic range. For the sake of
presenting the simulation results, the notation FP(x, y, z)-Log-BCJR is used for
specifying the parameters of the fixed-point approximation in the Log-BCJR decoder. In
this notation, z is the fraction operand-width of the fixed-point representation applied,
while y is the integer operand-width, including the sign bit, of the internal variables.
The parameter x < y identifies the integer operand-width of the input LLRs, when they
are input to the LUT-Log-BCJR decoder. Having x < y is necessary for the sake of
mitigating the effects of overflow. For example, in the FP(3, 4, 2)-Log-BCJR scenario,
an a priori LLR having the value of 1011.11 (-4.25 in decimal) would be clipped to
1100.00, which is -4 in its decimal representation.
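The clipping example for FP(3, 4, 2)-Log-BCJR can be reproduced with a few lines of Python. This is a minimal sketch rather than the simulator used in this chapter: the helper names clip_llr and to_bits are hypothetical, and rounding to the nearest representable value is an assumption.

```python
def clip_llr(value, x, z):
    """Saturate value to a two's complement format having x integer bits
    (sign bit included) and z fraction bits. Rounding to the nearest grid
    point is an assumption made for this illustration."""
    step = 2.0 ** -z
    lowest = -(2.0 ** (x - 1))
    highest = 2.0 ** (x - 1) - step
    quantised = round(value / step) * step
    return max(lowest, min(highest, quantised))


def to_bits(value, y, z):
    """Render value as a two's complement string with y integer bits and
    z fraction bits (z >= 1), e.g. '1011.11'."""
    total = y + z
    raw = int(round(value * 2 ** z)) & ((1 << total) - 1)
    bits = format(raw, "0{}b".format(total))
    return bits[:y] + "." + bits[y:]


llr = -4.25                          # 1011.11 when stored with y = 4, z = 2
clipped = clip_llr(llr, x=3, z=2)    # the input LLRs only have x = 3 integer bits
print(to_bits(llr, 4, 2), "->", to_bits(clipped, 4, 2), "=", clipped)
```

The clipped value is then stored in the wider y-bit format of the internal variables, which is why the result appears with a replicated sign bit.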
Firstly, for an n-bit fraction part representation, up to 2^n elements are used in the
LUT, as described in Section 4.1. The corresponding EXIT chart results are shown in
Figure 4.6. The simulation results show that using a coarse 1-bit fraction-part and two
elements in the LUT imposes an observable degradation on the EXIT chart, while a 2-bit
fraction-part associated with four elements in the LUT gives almost no observable
degradation in the EXIT chart. As shown in Figure 4.6, a 2-bit fraction-length
guarantees a virtually identical result compared to the floating-point result.
Similarly, a 1-bit fraction-length also results in an EXIT chart similar to the
floating-point result. The associated degradation is lower than that imposed by the
floating-point Max-Log-BCJR algorithm. Note that having a 0-bit fraction-length
effectively removes the LUT, transforming the LUT-Log-BCJR into the Max-Log-BCJR.
However, the associated EXIT chart degradation is more severe than that of the
floating-point Max-Log-BCJR, which is a consequence of the low resolution used for the
variables.
Figure 4.6: The EXIT chart of fixed-point LUT-Log-BCJR decoder using different
fraction operand-widths.
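The link between the fraction operand-width and the LUT can be sketched as follows. The exact LUT contents of Section 4.1 are not reproduced here: this sketch assumes that the LUT stores the correction term ln(1 + e^(-d)) truncated to n fraction bits, so the number of entries it produces need not match the element counts quoted above. The function names are hypothetical.

```python
import math


def build_lut(n):
    """Non-zero values of ln(1 + exp(-d)) for d = 0, 2**-n, 2 * 2**-n, ...,
    each truncated to n fraction bits (truncation is an assumption)."""
    step = 2.0 ** -n
    lut = []
    d = 0.0
    while True:
        exact = math.log1p(math.exp(-d))
        quantised = math.floor(exact / step) * step
        if quantised == 0.0:
            return lut
        lut.append(quantised)
        d += step


def max_star(a, b, lut, n):
    """LUT-based approximation of ln(exp(a) + exp(b)) for operands that
    already lie on the n-bit fraction grid."""
    index = int(abs(a - b) * 2 ** n)
    correction = lut[index] if index < len(lut) else 0.0
    return max(a, b) + correction


for n in (0, 1, 2):
    lut = build_lut(n)
    print(n, lut, max_star(1.0, 1.0, lut, n))
```

Under this assumption the table is empty for n = 0, so max_star degenerates to a plain maximum, which agrees with the observation that a 0-bit fraction-length turns the LUT-Log-BCJR into the Max-Log-BCJR.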
Considering the trade-off between the complexity imposed and the performance attained,
relying on a 1-bit or 2-bit fraction-length may be deemed appropriate for most
applications. Further discussions are deferred until the combined simulation results
associated with limited fraction and integer lengths are considered. The BER analysis
of different fraction lengths is given in [134,138,141]. Both [134] and [138] concluded
that a 2-bit fraction-length is capable of approaching the performance of the
floating-point decoder. Nonetheless, the authors of [141] suggested that a 3-bit
fraction-length provides a better performance, which only incurs an SNR penalty of
0.015 dB. The simulation results of [138] demonstrated that a 1-bit fraction length only
causes a loss of 0.1 dB for medium-to-low SNRs, but has no detrimental consequences on
the error floor performance. Of the eight studies considered [133–138,140,141], five
opted for a 2-bit fraction-length as the most desirable choice and three chose a 3-bit
fraction-length.
Numerous authors have investigated the desirable operand-width of the different internal
variables, such as the input LLRs, α, β, γ and δ. However, in practice, it is
inconvenient to store and process different variables using different accuracies
[140,141]. Although the authors of [133] claimed that further operand-width minimisation
applied to the different variables is capable of reducing the switching activity, which
has an influence on the energy consumption, the contribution of the dynamic power
component to the total power consumption becomes smaller and smaller as the integration
density increases. Thus, the benefit of reducing the operand-widths of the different
variables individually is reduced. A further potential disadvantage of using different
operand-widths is that such a strategy requires additional extension and clipping
mechanisms for the data-buses in the datapath, which increases the design complexity.
Therefore, a single operand-width setting is considered for all the internal variables.
Nonetheless, it is necessary to consider the input LLRs and the internal variables of
the SISO decoder separately, because the limited accuracy of the input LLRs directly
affects the dynamic range of the internal variables. According to [138], the possible
differences Δα and Δβ between pairs of path metrics (i.e. the possible differences
between the α values or between the β values in the decoding trellis) play an
influential role in the LUT-Log-BCJR decoder, as discussed in Section 4.1. These
differences are upper-bounded by Δ_MAX, which is the following function of the input
LLRs' dynamic range:
$$\Delta_{\mathrm{MAX}} = \min_{w}\left[\, w \cdot M_u + d_{\min}(w) \cdot M_c \,\right], \tag{4.4}$$
where d_min(w) is the minimum Hamming weight of the bit sequences generated by the
input bit sequences having a weight of w, while M_u and M_c are the dynamic ranges of
the SISO decoder's two inputs, namely of the extrinsic information and of the LLRs output
by the soft demodulator. Hence, d_min(w) depends on the bit sequence considered.
Furthermore, M_u and M_c are simply related to the operand-width of the integer part
of the input LLRs. As discussed in the previous section, it is important to keep the
difference between pairs of metrics within a limited range for the sake of maintaining
an unimpaired performance. However, the different overflow control techniques require
different operand-widths for guaranteeing this condition. Therefore, based on the
discussions in [138], the operand-width of the internal variables typically requires more
bits than that of the input LLRs. As a result, the operand-width specifications of
fixed-point turbo decoders have to be carefully considered on an individual basis for
any specific code, because the specifications derived for one turbo code cannot be
directly applied to another code.
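To make the bound of Equation (4.4) more tangible, it can be evaluated numerically. The sketch below assumes that the minimisation runs over the input weight w, and it uses a made-up d_min(w) table: the values are hypothetical and do not describe the UMTS/LTE component code.

```python
def delta_max(d_min, m_u, m_c):
    """Evaluate the minimum over w of w * M_u + d_min(w) * M_c, where d_min
    maps each input weight w to the corresponding minimum output weight."""
    return min(w * m_u + d * m_c for w, d in d_min.items())


# Hypothetical weight table, for illustration only.
HYPOTHETICAL_D_MIN = {1: 10, 2: 4, 3: 2}

for dynamic_range in (8, 16):
    bound = delta_max(HYPOTHETICAL_D_MIN, m_u=dynamic_range, m_c=dynamic_range)
    print(dynamic_range, bound)
```

Since the expression is linear in M_u and M_c, doubling the dynamic range of both inputs doubles the bound, which is one way of seeing why the internal variables need more integer bits than the input LLRs.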
Furthermore, given the different operand-width settings for the input LLRs and the
internal variables, the conversion between different representations has to be carefully
managed. Extending a shorter operand-width to a longer one does not induce any problem,
since the values remain unaltered. More explicitly, an extension mechanism simply pads
the operand, replicating the sign bit in the integer part, so that its value is
preserved. This may be readily realised in hardware and no extra operations are
required. However, conversion in the other direction may inflict a performance
degradation. Moreover, the Most Significant Bit (MSB) in two's complement representation
determines the sign of the operand. Hence, simply ignoring the extra bits during the
conversion may in fact change the polarity of the operand, which would significantly
affect the success of the decoding process. Hence, a clipping mechanism associated with
saturation is required during the conversion. If the original operand value exceeds the
range of the affordable operand-width, the converted operand must be set to the maximum
legitimate value of the same polarity. This method requires extra hardware in practice
and extra operations in the simulations. Figure 4.7 portrays the schematic of the
clipping operations used in the decoder.
Figure 4.7: The schematic of the UMTS/LTE turbo decoder relying on operand-clipping.
[Schematic omitted: Decoder 1, Decoder 2, interleavers π, deinterleaver π−1 and clip
blocks.]
4.3.2.1 Wrapping technique
As discussed in Section 4.1, the two's complement representation can naturally prevent
the errors caused by overflows in the LUT-Log-BCJR algorithm, since the overflowed
data may be considered as being wrapped in a circle. Hence the difference between two
operands remains the same. The benefit of this wrapping technique is that no extra
hardware is required. Therefore, it is suitable for the scenarios where sufficient
memory is available or a low-dimensional datapath is required. Note however that the
clipping of the input LLRs is still required. The wrapping technique detailed in
Section 4.1 is only suitable for the calculation of the internal variables.
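The wrapping property can be checked with integer arithmetic. The following is a minimal sketch in units of the least significant bit, using hypothetical helper names and a 7-bit operand-width (range -64 to 63) as an illustrative choice.

```python
def wrap(value, bits):
    """Reduce an integer modulo 2**bits and reinterpret the result as a
    two's complement number, mimicking a register that silently overflows."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def wrapped_difference(a, b, bits):
    """Difference of two metrics that were both stored in wrapped form."""
    return wrap(wrap(a, bits) - wrap(b, bits), bits)


BITS = 7
# 70 overflows and is stored as -58, yet the difference to 60 is still 10.
print(wrap(70, BITS), wrapped_difference(70, 60, BITS))
# A true difference of 90 exceeds the 7-bit range, so the result is wrong.
print(wrapped_difference(100, 10, BITS))
```

The second case illustrates the limit of the technique: the subtraction is only correct as long as the true difference between the metrics fits into the operand-width.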
Figure 4.8 shows the associated EXIT chart results for the FP(5, 7, 32)-Log-BCJR,
FP(4, 7, 32)-Log-BCJR and FP(3, 7, 32)-Log-BCJR scenarios. The simulation results of
Figure 4.8 suggest that for the FP(5, 7, 32)-Log-BCJR algorithm almost no degradation
is observed in the EXIT chart compared to the floating-point result. By contrast, the
EXIT chart result of the FP(3, 7, 32)-Log-BCJR scenario indicates that the tunnel closes
before the curves reach the (1,1) point, which suggests that the decoding performance
would be significantly degraded.
Figure 4.8: The EXIT chart for different integer operand-widths, including FP(5, 7,
32)-Log-BCJR, FP(4, 7, 32)-Log-BCJR and FP(3, 7, 32)-Log-BCJR, in conjunction
with the wrapping technique.
The FP(5, 7, 32)-Log-BCJR and FP(4, 7, 32)-Log-BCJR scenarios lead to different
conclusions based on the EXIT chart simulations, which are shown in Figure 4.9. More
specifically, Figure 4.9 is a zoomed version of Figure 4.8. For the FP(5, 7, 32)-Log-BCJR
algorithm, the EXIT function I_e(I_a) reaches a peak value at a certain I_a abscissa
value and then starts decreasing, which results in a closed tunnel. Hence, the BER
performance of the FP(5, 7, 32)-Log-BCJR scenario cannot compete with that of the
FP(4, 7, 32)-Log-BCJR scenario.
Note that in Figure 4.9, the EXIT tunnels of the FP(5, 7, 32)-Log-BCJR and FP(3,
7, 32)-Log-BCJR scenarios become closed in different ways, which reveals the different
reasons for their closures. For FP(3, 7, 32)-Log-BCJR, the closure is caused by the lower
gradient of the curves I_e(I_a) seen in Figure 4.9. Since the only difference between
the FP(3, 7, 32)-Log-BCJR and FP(4, 7, 32)-Log-BCJR schemes is having a 1 bit lower
integer operand-width for the input LLRs, it may be conjectured that the lower I_e(I_a)
is due to the LLR degradation imposed by the reduced operand-width. Thus, a 4-bit
integer operand-width is the minimum for the input LLRs. For the FP(5, 7, 32)-Log-BCJR
scheme, the closure is due to the decay of the curves I_e(I_a) beyond their peak points.
The EXIT chart result shows that the bit-width specification of the
FP(4, 7, 32)-Log-BCJR is adequate for all the variables. Hence it may be inferred that
the performance degradation of FP(5, 7, 32)-Log-BCJR is due to the insufficient integer
operand-width difference between the input LLRs and the internal variables. More
explicitly, when the number of iterations is increased, the mutual information conveyed
by the a priori LLRs is also increased, which means that the average absolute value of
the LLRs is also increased. Due to the add and accumulate operations applied to the
input LLRs in the LUT-Log-BCJR algorithm, having an insufficient difference between the
operand-widths of the input LLRs and the internal variables¹ may cause a serious
overflow problem in the calculations of the internal variables. Hence for abscissa values
above 0.95 the function I_e(I_a) starts to decrease. This reveals that the overflows of
the internal variables are beyond the tolerance limit of the wrapping technique, which
results in the EXIT curve failing to reach the (1,1) point. This effect may be
explicitly observed in Figure 4.10 and Figure 4.11.
Figure 4.9: The EXIT chart's zoomed-in top-right corner extracted from Figure 4.8
for different integer operand-widths, including FP(5, 7, 32)-Log-BCJR, FP(4, 7, 32)-
Log-BCJR and FP(3, 7, 32)-Log-BCJR, in conjunction with the wrapping technique.
¹ The difference may be calculated as y − x for FP(x, y, z)-Log-BCJR.
Specifically, for the FP(5, 7, 32)-Log-BCJR and FP(6, 7, 32)-Log-BCJR scenarios, the
peak points of the curves in Figure 4.10 occur earlier, due to the even more limited
operand-width difference between the input LLRs and the internal variables. As the
integer operand-width of the input LLRs is reduced, the best performance is achieved by
FP(4, 7, 32)-Log-BCJR. On the other hand, as shown in Figure 4.11, when the integer
operand-width of the input LLRs becomes less than 4 bits, the performance starts to
degrade. However, since the difference remains sufficient, no decaying tendency is
observed for the curves of FP(3, 7, 32)-Log-BCJR and FP(2, 7, 32)-Log-BCJR. Only the
positive gradient of the curves is lower due to the insufficient operand-width of the
input LLRs, which causes the closure of the open EXIT tunnel. If the internal variables'
integer operand-width is further reduced to 6 bits, the EXIT tunnel always becomes
closed before reaching the (1,1) point, regardless of the integer operand-width of the
LLRs.
Figure 4.10: The EXIT chart's zoomed-in top-right corner for different integer
operand-widths, including FP(6, 7, 32)-Log-BCJR, FP(5, 7, 32)-Log-BCJR and FP(4,
7, 32)-Log-BCJR, in conjunction with the wrapping technique.
Figure 4.11: The EXIT chart for different integer operand-widths, including FP(4,
7, 32)-Log-BCJR, FP(3, 7, 32)-Log-BCJR and FP(2, 7, 32)-Log-BCJR, in conjunction
with the wrapping technique.
In conclusion, having a 4-bit integer operand-width for the input LLRs is the minimum
acceptable setting for UMTS/LTE turbo decoders. For the wrapping technique, the
minimum difference between the integer operand-widths of the input LLRs and of the
internal variables is 3 bits. Therefore, the desirable integer length setting is
constituted by the FP(4, 7, 32)-Log-BCJR scenario.
4.3.2.2 Saturation technique
The wrapping technique requires no additional operations or hardware for handling the
overflow of the internal variables. Another simple overflow control technique often used
in practice is the saturation technique. As mentioned before, the input LLRs are clipped
due to the reduced operand-width of the internal variables in the specification. The same
technique may also be applied to the internal variables. However, this approach has a
disadvantage when applied to LUT-Log-BCJR decoders. The problem is that the
LUT-Log-BCJR algorithm uses the distance between different groups of the internal
variables for determining the output of the decoder, i.e. the extrinsic LLRs, as
described before. The saturation technique clips all the overflowing data values to
the maximum or minimum value that can be represented by the specified fixed-point
representation. The resultant change imposed on the values of the overflowing data
may change the distance between two internal variables. Under extreme conditions,
when two variables overflow in the same direction, both of the variables are clipped to
the maximum or minimum value of the specific fixed-point representation considered,
hence the distance between them becomes 0. The simulation results of Figure 4.12
demonstrate that this problem makes the results of the saturation technique even worse
than those of the wrapping technique. Figure 4.12 portrays the simulation results for
the saturation technique, which may be contrasted to those of Figure 4.8 for the
wrapping technique. Since the conditions for the input LLRs remain unchanged, their
minimum integer operand-width remains 4 bits. However, the required operand-width
difference between the input LLRs and the internal variables is significantly increased
due to the application of the saturation technique.
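The collapse of the distance between two saturated variables can be shown with the same kind of integer arithmetic. This is a minimal sketch in units of the least significant bit, with a hypothetical helper name and a 7-bit operand-width (range -64 to 63) chosen only for illustration.

```python
def saturate(value, bits):
    """Clip an integer to the range of a two's complement number of the
    given width."""
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    return max(lowest, min(highest, value))


BITS = 7
# Both metrics overflow in the same direction: the true distance of 30 is lost.
print(saturate(100, BITS) - saturate(70, BITS))
# Only one metric overflows: the distance shrinks from 10 to 3.
print(saturate(70, BITS) - saturate(60, BITS))
```

In both cases the difference seen by the decoder is smaller than the true one, and in the first case it vanishes entirely, which is the extreme condition described above.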
Although it may be observed in Figure 4.12 that for the setting of FP(4, 12, 32)-Log-BCJR
the EXIT tunnel is closed before reaching the (1,1) point, the point of closure is close
to the (1,1) point and the EXIT curves exhibit almost no difference with respect to the
floating-point result. Hence, it may be concluded from Figure 4.12 that
FP(4, 12, 32)-Log-BCJR is the desirable integer operand-width setting for the saturation
technique. The corresponding EXIT chart result is almost identical to the floating-point
result in Figure 4.5. For the FP(4, 11, 32)-Log-BCJR scenario the EXIT tunnel becomes
closed far before reaching the (1,1) point.
Figure 4.12: The EXIT chart for different integer operand-widths in conjunction with
the saturation technique.
Note in Figure 4.12 that, in contrast to the results of Figure 4.8 using the wrapping
technique, when the integer operand-width of the internal variables is reduced to 11
bits, the function I_e(I_a) falls to near-zero values soon after reaching its peak
point. As discussed in the context of Figure 4.9, the reason for an EXIT chart curve
(i.e. the function I_e(I_a)) decaying is the insufficient difference of the integer
lengths between the input LLRs and the internal variables. Since the EXIT chart of the
FP(4, 13, 32)-Log-BCJR scenario can reach the (1,1) point in Figure 4.12, having a 4-bit
integer operand-width for the input LLRs is still sufficient for the saturation
technique. Hence, the saturation technique increases the required integer operand-width
difference between the input LLRs and the internal variables, as seen by comparing
Figure 4.12 to Figure 4.9.
As mentioned before, the decoding result only depends on the differences between the
path metrics (i.e. internal variables), but not on their actual values. The saturation
technique clips the overflowing variables to the positive and negative limits. It may
be speculated that when the overflowing internal variables are clipped to their limit
values, the difference between the path metrics becomes 0. Hence no reliable soft
outputs can be obtained. Once a number of internal variables overflow, the EXIT chart
curves rapidly decay to 0, as shown in Figure 4.12 for the FP(4, 11, 32)-Log-BCJR
scenario.
In conclusion, the saturation technique on its own is not suitable for LUT-Log-BCJR
decoders. However, as discussed in the next section, further simulation results not
included here showed that the employment of saturation is necessary in combination with
the normalisation technique for controlling the overflow in LUT-Log-BCJR decoders [145].
Only a delicate combination of the saturation and normalisation techniques is capable of
obtaining the most desirable operand-width setting.
4.3.2.3 Normalisation technique
A limitation of the wrapping technique is that if the difference between the path
metrics exceeds the dynamic range, the associated path metric subtraction does not give
the correct result. The goal of the saturation technique is to eliminate this problem.
However, as demonstrated by the simulation results seen in Figure 4.12, the saturation
technique imposes another problem, namely that many of the overflowing variables
are clipped to the same value. Hence, a normalisation technique is introduced for dealing
with this problem. The simulation results in this section show that given the appropriate
combination of saturation and normalisation, the integer operand-width requirement of
the internal variables can be further reduced. As the variables α and β are gradually
accumulated, the largest one of them is subtracted from them in each step. For example,
in the UMTS/LTE turbo decoder described in Chapter 2, the decoder's trellis has eight
α variables at each trellis stage of Figure 2.10. Their values depend on the specific α
and γ values of the previous trellis stage of Figure 2.10. Therefore, as the α values are
accumulated along the trellis during the decoding process, the eight α values of the
current step are reduced by the largest one of them, before calculating the α values of
the next step. Consequently, the accumulation of the α values is decelerated and the
dynamic range of the α values may be significantly reduced. The same process is also
applied for the calculations of the β values. The corresponding EXIT chart results are
shown in Figure 4.13. The desirable operand-width setting of the integer length is FP(4,
5, 32)-Log-BCJR, which necessitates two bits less for the internal variables.
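The effect of subtracting the largest metric at each step can be illustrated with a toy accumulation. The sketch below is a deliberately simplified stand-in for the forward recursion, not the trellis of Figure 2.10: the per-state increments (PATTERN and COMMON) are hypothetical, and each state simply accumulates its own increment.

```python
PATTERN = [0, 1, 2, 3, 3, 2, 1, 0]   # hypothetical state-dependent increments
COMMON = 10                          # hypothetical growth shared by all states


def accumulate(steps, normalise):
    """Accumulate eight metrics for the given number of steps and return
    the final metrics together with the largest magnitude ever stored."""
    metrics = [0] * len(PATTERN)
    peak = 0
    for k in range(steps):
        metrics = [
            m + COMMON + PATTERN[(i + k) % len(PATTERN)]
            for i, m in enumerate(metrics)
        ]
        if normalise:
            largest = max(metrics)
            metrics = [m - largest for m in metrics]
        peak = max(peak, max(abs(m) for m in metrics))
    return metrics, peak


raw, raw_peak = accumulate(13, normalise=False)
norm, norm_peak = accumulate(13, normalise=True)
print(raw_peak, norm_peak)
print(norm == [m - max(raw) for m in raw])
```

Normalisation only removes a common offset, so the differences between the metrics are unchanged, while the magnitudes that have to be stored stay bounded by the spread between the metrics rather than growing with the number of steps.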
When employing the saturation and normalisation techniques for the UMTS/LTE turbo
decoder of Chapter 2, the operand-width required for the internal variables is two bits
lower than for the wrapping technique of Section 4.3.2.1. The drawback of the saturation
and normalisation approach is that extra operations are required for the decoding
process in addition to the LUT-Log-BCJR algorithm. Hence, extra hardware has to be
employed in practical implementations. Additionally, the extra calculations may also
require extra processing time, which inevitably reduces the decoding speed. As a result,
both approaches have their own benefits and disadvantages, hence they may be suitable
for different turbo decoders used in diverse applications. The trade-offs between
employing the saturation and normalisation approach or the wrapping approach have to be
carefully considered, bearing in mind the different constraints of the specific
application, such as the affordable complexity, energy consumption and decoding speed.
Figure 4.13: The EXIT chart of different integer operand-widths in conjunction with
the normalisation technique.
4.3.2.4 Final validation
In order to determine and validate the desirable operand-width setting for the
fixed-point implementation of the UMTS/LTE turbo code, the proposed EXIT chart based
investigations considered various combinations of integer and fraction lengths. Since the
simulation results of Figure 4.6 were obtained by limiting only the fixed-point fraction
length, both 1-bit and 2-bit fraction lengths are considered in the final validation
simulations of this section. The fraction length settings are combined with the
desirable integer length settings of both the wrapping technique, namely
FP(4, 7, 32)-Log-BCJR, and of the normalisation technique, namely FP(4, 5, 32)-Log-BCJR.
Moreover, for the different settings, different turbo code specifications are
considered, which include the longest 5114-bit block length, the shortest 40-bit block
length and the most critical SNR value of -4.83 dB, where the EXIT tunnel becomes just
open.
When considering the normalisation technique of Section 4.3.2.3, the final validation is
shown in Figure 4.14 for the longest block length, in Figure 4.15 for the shortest block
length and in Figure 4.16 for the most sensitive SNR value of -4.83 dB. According
to the results of Figures 4.14 to 4.16, FP(4, 5, 2)-Log-BCJR attains almost the same
performance as the floating-point solution in all of the situations considered. By
contrast, the FP(4, 5, 1)-Log-BCJR imposes further degradations due to the combined
effects of the limited integer and fraction lengths, although this degradation is not as
severe as for the Max-Log-BCJR. Moreover, according to the simulation results of
Figures 4.14 and 4.15, the block length does not have a significant effect on the EXIT
chart. Consequently, a single compromise specification can perform adequately for any
block length. Therefore, it is concluded for the normalisation technique that
FP(4, 5, 2)-Log-BCJR represents the desirable operand-width setting for the UMTS/LTE
turbo decoder.
Figure 4.14: The EXIT chart of 5114-bit block length relying on fixed-point operand-
width representation, using the normalisation technique in comparison to the floating-
point solution.
Figure 4.15: The EXIT chart of 40-bit block length relying on fixed-point operand-
width representation, using the normalisation technique in comparison to the floating-
point solution.
Figure 4.16: The EXIT chart of SNR = -4.83 dB/453-bit block length relying on
fixed-point operand-width representation, using the normalisation technique in
comparison to the floating-point solution.
The final validation results for the wrapping technique are shown in Figure 4.17 for
the longest block length and in Figure 4.18 for the shortest block length. Finally,
Figure 4.19 portrays the corresponding result for the most sensitive SNR value of
-4.83 dB. It emerges from these results that the 2-bit fraction part represents the best
option in this case. Hence, FP(4, 7, 2)-Log-BCJR is the desirable operand-width setting
of the UMTS/LTE turbo decoder for the wrapping technique.
Figure 4.17: The EXIT chart of 5114-bit block length relying on fixed-point operand-
width representation, using the wrapping technique in comparison to the floating-point
solution.
Figure 4.18: The EXIT chart of 40-bit block length relying on fixed-point operand-
width representation, using the wrapping technique in comparison to the floating-point
solution.
Figure 4.19: The EXIT chart of SNR = -4.83 dB/453-bit block length relying on
fixed-point operand-width representation, using the wrapping technique in comparison
to the floating-point solution.
Finally, a range of selected BER simulation results are provided in Figure 4.20 for
validating that the EXIT chart analysis results are indeed confirmed by the BER
performance results. As shown in the figure, the BER results of both
FP(∞, ∞, 0)-Log-BCJR and of FP(3, ∞, ∞)-Log-BCJR exhibit a performance degradation
compared to the floating-point results, due to having an insufficiently high
operand-width for the fraction part and for the integer part of the LLRs, respectively.
The BER results of FP(4, 6, ∞)-Log-BCJR relying on the wrapping technique and of
FP(4, 4, ∞)-Log-BCJR relying on the normalisation technique exhibit a performance
degradation owing to having an insufficient operand-width difference between the LLRs
and the internal variables. The BER result of Figure 4.20 for FP(4, 7, 2)-Log-BCJR using
the wrapping technique shows almost no performance degradation, which agrees with the
conclusion given by the proposed method.
Figure 4.20: A selection of BER results for a 453-bit block length using different
operand-width specifications and different overflow control techniques for validating
the proposed method.
[Plot omitted: BER versus SNR [dB]. Curves: FP(4,7,2)-Log-BCJR (Wrapping),
FP(4,4,∞)-Log-BCJR (Normalisation), FP(4,6,∞)-Log-BCJR (Wrapping), FP(3,∞,∞)-Log-BCJR,
FP(∞,∞,0)-Log-BCJR and Log-BCJR.]
Note that while the diverse causes of performance degradation detailed above impose
different effects upon the EXIT functions during the analysis, they all have a similarly
detrimental effect upon the BER results shown in Figure 4.20. Therefore, it may be
argued that the EXIT chart analysis offers deeper insights that are not revealed by BER
analysis, hence allowing desirable parameterisations to be found without relying on a
brute-force full search across the entire parameter space.
Comparing the results of this chapter to the selected previous work seen in Table 4.1,
the authors of [133,138,140] claimed that having a 3-bit integer operand-width
is sufficient for the input LLRs. However, they only considered the input LLRs received
from the channel. In the EXIT chart simulations, the input LLRs considered constitute
the a priori input of the concatenated decoders, which includes the channel's output
and the extrinsic information gleaned from the other decoder of Figure 2.14. Since they
are both input to the concatenated decoders, it is reasonable to assume that they have
the same operand-width. Hence, the simulation results in this chapter showed that a
4-bit integer operand-width constitutes a desirable setting for the input LLRs. The
authors of [137] arrived at the same conclusion concerning the integer operand-width of
the input LLRs. By contrast, the authors of [134–136,141] did not consider the input LLRs
separately. For the internal variables, the studies in [135,137] considered all the
different internal variables (i.e. α, β, γ and δ) separately. The authors of [135,137]
considered eight bits for the longest integer operand-width of the internal variables,
the study in [141] concluded that assigning seven bits was the desirable setting, and
the conclusion of [140] was that of using six bits. The investigations of [133,134]
suggested that using five bits was the most desirable setting. The reason for arriving
at diverse conclusions is the choice of different experimental circumstances, as shown
in Table 4.1. For example, the authors of [140] used normalisation in their simulations,
but only at a couple of specific processing steps in the decoding process, not at each
step.
4.4 Conclusions
An EXIT chart based technique was conceived for investigating the desirable
operand-width setting of the fixed-point representation used in a turbo decoder. Using
the appropriate operand-width setting is important for the hardware implementation of
turbo decoders, since it has a direct impact on their complexity, energy consumption and
speed. The desirable codec parameterisation depends on the turbo code's design, on the
specific decoding algorithm invoked and on a range of other factors. The proposed EXIT
chart analysis based technique substantially simplifies the design process in comparison
to conventional BER analysis. The employment of the proposed technique for the design of
the UMTS/LTE turbo decoder demonstrated its advantage over the conventional method based
on time-consuming BER analysis. Furthermore, the EXIT chart based technique reveals the
essential reasons for any specific performance degradation caused by inappropriately
chosen operand-width specifications. The investigation of this chapter using the EXIT
chart based fixed-point turbo code parameter design technique reveals three reasons why
a specific fixed-point implementation may impose a performance degradation, which the
conventional BER analysis fails to provide.
1. When the fraction operand-width is insufficient, the decoding performance degrades
due to the reduced resolution of the LUT in the LUT-Log-BCJR algorithm.
2. When the integer operand-width of the LLRs is insufficient, the LUT-Log-BCJR
decoder fails to receive enough information from the input LLRs and hence the
decoding performance degrades.
3. When the integer operand-width of the internal variables is not sufficiently longer
than the integer operand-width of the LLRs, the overflows during the decoding process
become severe, hence degrading the decoding performance. The required difference between the
integer operand-widths of the internal variables and the LLRs varies depending on
the overflow control technique employed.
As a result, for a specific turbo decoder associated with an appropriately chosen overflow
control technique, the three parameters of the fixed-point implementation should be
investigated individually, in the order of the fraction operand-width of all the variables,
the integer operand-width of the LLRs and the integer operand-width of the internal
variables, as demonstrated in Section 4.3. Furthermore, during the design process of the
parameter specification of a fixed-point turbo decoder, the reasons for the performance
degradation highlighted by the EXIT charts can be used for finding the desirable
parameterisation without a time-consuming brute-force search across a vast design space.
Finally, the conclusions obtained for the UMTS/LTE turbo decoder were compared to
those of previous studies on the same subject for the sake of validating the benefits of
the proposed method.
In conclusion, the findings of this chapter indicate that, for UMTS/LTE turbo decoders
employing BPSK modulation for transmission over an AWGN channel, the FP(4,7,2)-Log-BCJR
setting using the wrapping technique and the FP(4,5,2)-Log-BCJR setting relying
on the normalisation technique employ the lowest possible operand-widths that avoid
a significant performance degradation and therefore offer the most attractive trade-off
between energy consumption and performance. Note that this specific example is used
for demonstrating the proposed method and for comparing it with previous work. In practice,
this method can be generally applied to different turbo codes with different modulations
and channel models.
However, the operand-width setting is not the only important issue in developing a
low-complexity and hence energy-efficient turbo decoder. The next stage of the implementation
process, namely the hardware architecture design of the decoder, is also important.
Since turbo codes have not previously been considered for employment in energy-constrained
applications, such as WSNs, a low-complexity energy-efficient turbo decoder
architecture is proposed in Chapter 5.
Chapter 5
A low complexity energy-efficient
turbo decoder architecture
5.1 Introduction
As discussed in Chapter 1, the near Shannon limit performance of turbo-like codes can
be exploited to reduce the transmission energy consumption in a wireless communication
system. Hence, they are finding applications in Wireless Sensor Networks (WSNs),
where the energy consumption is dominated by wireless communication, and the
communication systems are energy-constrained [1,2,89,90]. Chapter 3 showed that turbo-like
codes can be easily applied in star topology WSNs for reducing transmission power when
only one-way message transmission from the sensor nodes to the central node is required.
This is because the high complexity turbo-like decoder is only required on the central
node, which typically has a sufficient energy supply, and the extra energy consumption
incurred by introducing the required encoder on the sensor nodes is negligible compared with
the transmission energy reduction that it affords owing to its improved Bit Error Rate
(BER) performance. However, some WSN scenarios employ a high number of sensors and a
long average transmission distance. As discussed in Chapter 1, in WSNs
designed for environmental monitoring, a sensor network including hundreds of sensor
nodes could be deployed into a large and variable environment. The sensor nodes are
required to communicate with each other over ranges from several metres to hundreds
of metres, depending on the application, for extended periods of time, while relying on
batteries that are small, lightweight and inexpensive. In these applications, multi-hop
network topologies or two-way communication may be applied, requiring the decoder of
the applied Error-Correcting Code (ECC) to be embedded on the energy-constrained
sensor nodes. However, the computational complexity of turbo-like decoders is typically
significantly higher than that of the corresponding encoders. For the typical transmission
distances of WSNs, the energy consumption of the turbo decoders may not be negligible
compared with the overall energy consumption of the communication systems. Therefore,
in these cases, the transmission energy saving achieved by introducing the turbo code can
be partially or even completely offset by the energy consumption of the decoder. In order
to benefit from employing turbo codes, the trade-off between the different coding gains
and computational complexities of the decoding algorithms, as well as the average
transmission distance of the target application, has to be carefully considered. In addition,
minimising the energy consumption of the decoder at the hardware implementation level
is important for improving the energy saving offered by employing turbo codes.
Compared with conventional turbo code applications, energy-constrained applications
exploit the near Shannon limit performance of turbo codes in a different way.
Conventionally, turbo codes have found application in cellular and broadcast standards, such as
the 3rd Generation Partnership Project (3GPP) Long Term Evolution (LTE) [103] and
Digital Video Broadcasting (DVB) [104] standards. These schemes aim for a high spectral
efficiency (bit/s/Hz) and in particular a high throughput (bit/s) in order to facilitate
high data rate real-time communication. Therefore, previous Application-Specific
Integrated Circuit (ASIC) implementations [151–157] of turbo codes have been designed for
achieving a high processing throughput, at the cost of having a relatively high energy
consumption. On the other hand, in energy-constrained scenarios, instead of increasing
the transmission throughput towards the theoretical upper limit, turbo codes can be
employed to maintain a particular throughput and instead reduce the required
transmission energy towards the corresponding lower limit. The coding gain offered by turbo
codes allows them to achieve a particular transmission spectral efficiency $\eta$ at a lower
Signal-to-Noise Ratio (SNR) per bit $E_b/N_0$. However, in these scenarios only a modest
throughput is required, since relatively low transmission throughputs of less than
1 Mb/s are typical [4,18], particularly if transmission duty-cycling is employed. Therefore,
having a low complexity and low energy decoder becomes a higher priority than having
a high throughput when applying turbo codes for the purpose of energy saving. For
this reason, the trade-off between the energy consumption and processing throughput
of ASIC-based iterative soft decoders designed for these energy-efficient scenarios is different
from that of spectrally efficient applications. Therefore, the previously proposed
Look-Up Table based Logarithmic Bahl-Cocke-Jelinek-Raviv (LUT-Log-BCJR) decoder
architectures [151–157] are not suitable for energy-efficient applications, since they
prioritise throughput over energy efficiency.
In this chapter, the trade-off between the decoding and transmission energy consumption
in WSNs when employing turbo codes in the communication systems is investigated in
Section 5.2. The energy efficiency of the conventional architecture for turbo decoder
hardware implementation is analysed in Section 5.3. Based on this investigation, a
low-complexity energy-efficient turbo decoder architecture is proposed in Section 5.4.
In Section 5.5, an LTE turbo decoder is implemented using the proposed architecture, and
its energy efficiency is analysed and compared with similar previous works. Finally,
Section 5.6 concludes the chapter.
5.2 Trade-off of employing turbo codes for energy saving
As mentioned in Section 5.1, when employing turbo codes in energy-constrained
applications, the transmission energy $E_b^{\mathrm{tx}}$ (measured in nJ/bit) can be reduced owing to the
turbo coding gain. However, the reduction in $E_b^{\mathrm{tx}}$ is partially offset by the additional
energy $E_b^{\mathrm{pr}}$ that is consumed by the turbo decoder. For these reasons, turbo decoders
conceived for energy-constrained scenarios are required to facilitate a low overall energy
consumption of $E_b^{\mathrm{tx}} + E_b^{\mathrm{pr}}$. As discussed in Chapter 2, there are a number of variations of
the turbo decoding algorithm, which provide different coding gains and computational
complexities. The coding gain determines the reduction in transmission energy $E_b^{\mathrm{tx}}$ and
the computational complexity is directly related to the decoding energy consumption
$E_b^{\mathrm{pr}}$. The investigation in this section will show that the differences between
decoding algorithms can have a significant effect on the overall energy consumption,
when employing turbo codes in wireless communication systems. In particular, this
section considers the two most popular variations of the turbo decoding algorithm, the
LUT-Log-BCJR algorithm and the Maximum Logarithmic Bahl-Cocke-Jelinek-Raviv
(Max-Log-BCJR) algorithm. As discussed in Chapter 2, the Max-Log-BCJR algorithm
has a 50% lower computational complexity than the LUT-Log-BCJR algorithm, at the
cost of 0.5 dB less coding gain in an uncorrelated Rayleigh fading channel [149]. The
much lower complexity of the Max-Log-BCJR algorithm has a significant effect on the
hardware implementation. Table 5.1 summarises several state-of-the-art designs for
the LUT-Log-BCJR and Max-Log-BCJR decoders of the LTE standard turbo code. As
shown in Table 5.1, when only considering the energy consumption of the decoder, the
Max-Log-BCJR decoders achieve a much higher throughput and a lower energy
consumption than the LUT-Log-BCJR decoders.
However, the Max-Log-BCJR decoder has a 0.5 dB BER performance degradation compared
to the LUT-Log-BCJR decoder, as the simulation results show in Figure 5.1. This
performance degradation requires a 0.5 dB higher transmission energy consumption, in
order to maintain the same BER in wireless communication. The absolute transmission
energy consumption can be estimated using the transmission power model of [1,2]. Here,
the environmental parameters and WSN system specifications in Table 5.2 are assumed.
In the table, note that $N_0 = 10\log_{10}(kT)$, where $k = 1.3806503 \times 10^{-23}$ J/K is the Boltzmann
constant and $T = 300$ K is room temperature.
The path loss can be estimated by
$$P_l(d)\,[\mathrm{dB}] = 20\log_{10}\left(\frac{4\pi}{\lambda}\right) + 10\,p\,\log_{10}(d), \tag{5.1}$$
Table 5.1: Comparison of the implemented turbo decoders.

                                         LUT-Imp-1  LUT-Imp-2  LUT-Imp-3  Max-Imp-1  Max-Imp-2
Publication                              [154]      [156]      [158]      [159]      [160]
Algorithm                                LUT-Log    LUT-Log    LUT-Log    Max-Log    Max-Log
Block size (bit)                         5114       5114       5114       6144       6144
Technology (nm)                          180        180        180        65         120
Supply voltage (V)                       1.8        1.8        1.8        -          1.2
Area A (mm^2)                            9          14.5       8.2        2.1        3.57
  (scaled for 90 nm)                     (2.25)     (3.63)     (2.05)     (4.0)      (2.0)
Gate count (exclusive of memory)         85k        410k       65k        -          553k
Memory required (Kb)                     239        450        161        -          129
Clock frequency F (MHz)                  111        145        100        300        390.6
Decoding iterations                      10         8          6.5        6          5.5
Throughput T (Mb/s)                      2          10.8       4.17       150        390.6
Power consumption (mW)                   292        956        320        300        788.9
  (scaled for 90 nm)                     (36.5)     (119.4)    (40)       (796.4)    (332.8)
Energy consumption (nJ/bit/iteration)    14.6       11.1       12.7       0.31       0.37
  (scaled for 90 nm)                     (1.8)      (1.4)      (1.59)     (0.81)     (0.16)
E_b^tx + E_b^pr (nJ/bit) when transmitting over the stated distance (5 iterations):
  39 m                                   17.16      15.16      16.06      13.42      10.17
  58 m                                   48.92      46.92      47.82      49.88      46.63
  100 m                                  361.7      359.7      360.6      409.0      405.8
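The energy entries of Table 5.1 can be cross-checked with a little arithmetic. The following sketch assumes that the energy per bit per iteration is the power consumption divided by the throughput and by the number of decoding iterations. This relation and the function name are assumptions made for illustration only, and the relation need not reproduce every tabulated entry to the stated precision.

```python
def energy_per_bit_per_iteration(power_mw, throughput_mbps, iterations):
    """Return nJ/bit/iteration, using 1 mW / (1 Mb/s) = 1 nJ/bit."""
    return power_mw / throughput_mbps / iterations

# Rows taken from Table 5.1: (power in mW, throughput in Mb/s, iterations).
lut_imp_1 = energy_per_bit_per_iteration(292.0, 2.0, 10)
max_imp_2 = energy_per_bit_per_iteration(788.9, 390.6, 5.5)
print(round(lut_imp_1, 1), round(max_imp_2, 2))
```

Under this assumption, the two rows used here agree with the unscaled energy consumption entries of the LUT-Imp-1 and Max-Imp-2 columns.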
[Figure 5.1 plot: BER (from 10^0 down to 10^-6) versus SNR (from -6.2 to -2.2), with curves for the ideal Max-Log-BCJR and the ideal LUT-Log-BCJR decoders.]
Figure 5.1: The BER performance of the LUT-Log-BCJR decoder and the Max-Log-BCJR decoder.
Table 5.2: Environment assumptions and system specification of the estimated WSN.

Transmission frequency (f)                                      5.8 GHz
Power amplifier efficiency (A)                                  33%
Receiver noise figure (r)                                       4
Path loss exponent (p)                                          4
BER target                                                      10^-4
Uncoded system minimum received SNR at the target BER (S_0)     34 dB
Temperature                                                     300 K
Thermal noise (N_0)                                             -203.8 dBm
where $\lambda = c/f$ is the wavelength of the carrier of the transmitted signal, $c = 2.998 \times 10^8$
m/s is the speed of light and $d$ is the transmission distance.
The transmission SNR required to achieve the target BER when no coding is employed
is given by
$$S_u = N_0 + S_0 + r + P_l + A. \tag{5.2}$$
The improved SNR when employing a turbo code having a coding gain $G_c$ is given by
$$S_c = S_u - G_c. \tag{5.3}$$
Finally, the transmission energy in nJ/bit is given by
$$E_b^{\mathrm{tx}} = 10^{S_c/10}. \tag{5.4}$$
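The model of Equations (5.1) to (5.4) can be sketched in a few lines. The sketch below is illustrative only: it assumes that every term of Equation (5.2) is expressed in dB, that the 33% amplifier efficiency enters as $10\log_{10}(1/0.33)$ dB and that the receiver noise figure of 4 is in dB. The function names and the 10 dB coding gain are hypothetical. Because the absolute unit convention is not fully specified here, only ratios of energies are evaluated.

```python
import math

SPEED_OF_LIGHT = 2.998e8  # m/s, as quoted in the text

def path_loss_db(d_m, f_hz=5.8e9, p=4.0):
    """Equation (5.1): path loss in dB at a distance of d_m metres."""
    wavelength = SPEED_OF_LIGHT / f_hz
    return 20.0 * math.log10(4.0 * math.pi / wavelength) + 10.0 * p * math.log10(d_m)

def tx_energy(d_m, coding_gain_db, n0_db=-203.8, s0_db=34.0, r_db=4.0, efficiency=0.33):
    """Equations (5.2)-(5.4) with all terms in dB (an assumption of this sketch)."""
    a_db = 10.0 * math.log10(1.0 / efficiency)  # assumed dB form of the efficiency
    s_u = n0_db + s0_db + r_db + path_loss_db(d_m) + a_db  # Equation (5.2)
    s_c = s_u - coding_gain_db                             # Equation (5.3)
    return 10.0 ** (s_c / 10.0)                            # Equation (5.4)

# Ratios are independent of the absolute unit convention.
penalty = tx_energy(50.0, 10.0) / tx_energy(50.0, 10.5)       # 0.5 dB less coding gain
range_factor = tx_energy(78.0, 10.0) / tx_energy(39.0, 10.0)  # doubling the distance
print(f"{penalty:.3f} {range_factor:.1f}")
```

Under these assumptions, losing 0.5 dB of coding gain raises the transmission energy by a factor of $10^{0.05}$, about 12%, while doubling the distance with $p = 4$ raises it by a factor of 16.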
Based on this model, the transmission energy consumption when using the LUT-Log-BCJR
decoder and the Max-Log-BCJR decoder can be calculated and compared. To
investigate the overall energy consumption of the different turbo decoders, the five
state-of-the-art LUT-Log-BCJR and Max-Log-BCJR decoder implementations of Table 5.1
and their energy consumptions $E_b^{\mathrm{pr}}$ are used as examples. Their transmission energy
consumptions, $E_b^{\mathrm{tx}}$, are estimated using the described model and the simulation results
of Figure 5.1. The overall energy consumptions, $E_b^{\mathrm{tx}} + E_b^{\mathrm{pr}}$, associated with using the
five different turbo decoder implementations are plotted as a function of the transmission
range $d$ in Figure 5.2.
Figure 5.2: The comparison of the overall energy consumptions $E_b^{\mathrm{tx}} + E_b^{\mathrm{pr}}$ of the five
chosen implementations in Table 5.1.
As shown in Figure 5.2, when the transmission distance is relatively short (d < 20 m), the
decoding energy consumption $E_b^{\mathrm{pr}}$ makes the major contribution to the overall energy
consumption. Therefore, there is no significant increase in the overall energy consumption
as the transmission range is increased from 0 to 20 m. In this case, the Max-Log-BCJR
schemes give the lowest overall energy consumption. However, as the transmission
distance is increased above 20 m, the transmission energy consumption becomes the major
contributor to the overall energy consumption. When the transmission distance is
relatively long (d > 75 m), the improved coding gain of the LUT-Log-BCJR decoder allows
it to use a lower transmission energy, which leads to a lower overall energy consumption.
The results in Figure 5.2 show that the LUT-Log-BCJR decoders give at least a 10%
reduction in the overall energy consumption when the transmission distance is d > 100 m.
There are critical distances in the range 25 m < d < 75 m, where the LUT-Log-BCJR
decoders begin to offer lower overall energy consumptions than the Max-Log-BCJR
decoders, as shown in Figure 5.3. Note that these results are based on the analysis of the
Universal Mobile Telecommunications System (UMTS)/LTE turbo code, which is not
specifically designed for energy-constrained applications. As will be demonstrated in
Chapter 6, if a turbo code scheme were designed with consideration of the overall
energy consumption, the resulting decoding energy consumption $E_b^{\mathrm{pr}}$ may be significantly
lower than in the example considered above. Therefore, the difference in $E_b^{\mathrm{pr}}$ between the
LUT-Log-BCJR and Max-Log-BCJR decoders may be smaller in such a scheme, which
implies that the critical distances discussed above may be even shorter in practice.
Figure 5.3: A zoomed-in version of Figure 5.2 around the distances at which the overall energy
consumptions of the LUT-Log-BCJR decoders become lower than those of the Max-Log-BCJR decoders.
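The critical distances discussed above can be estimated directly from the rows of Table 5.1. The sketch below is a rough illustration under stated assumptions: the transmission energy at 39 m is back-calculated by subtracting five iterations of the scaled decoding energy from the tabulated overall energy, and the transmission energy is assumed to grow with $d^p$ for $p = 4$, as implied by Equations (5.1) and (5.4). The function and variable names are hypothetical.

```python
P = 4            # path loss exponent of Table 5.2
D_REF = 39.0     # reference distance (m) of the first overall-energy row of Table 5.1
ITERATIONS = 5   # number of iterations assumed in those rows

def critical_distance(total_a, e_pr_a, total_b, e_pr_b):
    """Distance beyond which decoder A has a lower overall energy than decoder B.

    total_x is the overall energy (nJ/bit) at D_REF and e_pr_x is the scaled
    decoding energy (nJ/bit/iteration). Decoder A is assumed to be the one
    with the lower transmission energy and the higher decoding energy.
    """
    dec_a, dec_b = ITERATIONS * e_pr_a, ITERATIONS * e_pr_b
    tx_a, tx_b = total_a - dec_a, total_b - dec_b  # transmission energy at D_REF
    # Solve tx_a * s + dec_a = tx_b * s + dec_b for s = (d / D_REF) ** P.
    s = (dec_a - dec_b) / (tx_b - tx_a)
    return D_REF * s ** (1.0 / P)

d_1 = critical_distance(17.16, 1.8, 13.42, 0.81)  # LUT-Imp-1 versus Max-Imp-1
d_2 = critical_distance(15.16, 1.4, 10.17, 0.16)  # LUT-Imp-2 versus Max-Imp-2
print(round(d_1, 1), round(d_2, 1))
```

With these inputs, the two estimates come out at roughly 55 m and 59 m, which lie inside the interval 25 m < d < 75 m quoted in the text.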
At the cost of requiring 10% more overall energy consumption, the Max-Log-BCJR
decoders have a low computational complexity. The higher computational complexity of
the LUT-Log-BCJR algorithm means that LUT-Log-BCJR decoders cannot achieve the same high
throughputs, which are required by conventional turbo code applications, such as the LTE
and DVB standards. Therefore, a 10% increase in overall energy consumption is
a reasonable compromise for making the decoders meet the high throughput requirements
of these applications. Since turbo codes have previously only found application in
high throughput schemes, the literature shows that, since 2003, most of the turbo
decoder architecture research has focused on high throughput Max-Log-BCJR decoder
implementations [159–163]. However, for low throughput energy-constrained
applications, the above-mentioned compromise is not necessary. As discussed in Chapter 1,
low throughput energy-constrained applications, such as environmental WSNs, employ
a wide variety of transmission ranges. For most of these possible communication ranges,
the LUT-Log-BCJR decoders are more desirable than the Max-Log-BCJR decoders, as
shown in Figure 5.2. Figure 5.3 shows that, depending on the decoding energy consumption
difference between the implementations of the LUT-Log-BCJR decoder and the
Max-Log-BCJR decoder, there is a critical distance beyond which the LUT-Log-BCJR decoder
yields a lower overall energy consumption than the Max-Log-BCJR decoder. Furthermore,
there are some other variations of the LUT-Log-BCJR decoding algorithm, such as
the constant-Log-BCJR [164,165] and the Enhanced-Max-Log-BCJR [166] algorithms.
These variations provide different trade-offs between the computational complexity and
the BER performance. Based on the principle discussed above, since the LUT-Log-BCJR
decoder gives the best BER performance of all the variations, it can be concluded
that the LUT-Log-BCJR decoder is the most desirable type of Bahl-Cocke-Jelinek-Raviv
(BCJR) decoder for low throughput energy-constrained wireless applications, except
when the transmission range is very short.
5.3 Conventional architecture
The previous section discussed the overall energy efficiency of wireless communication
systems using the LUT-Log-BCJR and the Max-Log-BCJR turbo decoders. The LUT-Log-BCJR
decoder is more desirable for most communication ranges from the overall
energy consumption point of view, owing to its improved BER performance compared
with the Max-Log-BCJR decoder. However, for very short communication ranges, the
Max-Log-BCJR decoder offers better energy efficiency, since the transmission energy
consumption is relatively low and the Max-Log-BCJR decoder tends to have a lower
decoding energy consumption, thanks to its low computational complexity. The simulation
results show that, depending on the difference between the decoding energy consumptions
of the two types of decoder, there is a critical distance beyond which the LUT-Log-BCJR
decoder becomes more energy-efficient. Therefore, if the decoding energy consumption
of the LUT-Log-BCJR decoder can be further reduced, not only can the overall energy
consumption be reduced, but the critical distance can also be shortened. This maximises
the benefits of applying turbo codes in wireless communication systems using the
LUT-Log-BCJR decoder, particularly when the possible communication range of the
target scenario is highly variable, from a very short distance to a very long distance. In
this section, opportunities to improve the energy efficiency of the LUT-Log-BCJR and
the Max-Log-BCJR decoders are discussed. The analysis of the conventional turbo
decoder architecture shows that, due to its relatively high complexity, the LUT-Log-BCJR
decoder's energy consumption can be significantly reduced by introducing a new
architecture.
As described in Section 2.2.1, a turbo encoder [97] comprises a parallel concatenation
of two convolutional encoders, each of which has a structure comprising $m$
memory elements, as exemplified for $m = 3$ in Figure 5.4, which shows the convolutional
code that is used in the LTE standard [103]. Each encoder converts an uncoded bit
sequence $\mathbf{b}_1 = \{b_{1,j}\}_{j=1}^{N}$ into the corresponding encoded bit sequence $\mathbf{b}_2 = \{b_{2,j}\}_{j=1}^{N}$,
where $N$ is the block length of the bit sequences.
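[Figure 5.4 schematic: uncoded input $\mathbf{b}_1$, memory elements M1, M2 and M3, encoded output $\mathbf{b}_2$.]
Figure 5.4: The convolutional encoder scheme employed in the UMTS/LTE turbo
decoder.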
Correspondingly, a turbo decoder [167,168] comprises a parallel concatenation of two
LUT-Log-BCJR decoders. Rather than operating on bits, each LUT-Log-BCJR decoder
processes Logarithmic Likelihood Ratios (LLRs) [97], where each LLR $\tilde{b} = \ln\frac{P(b=0)}{P(b=1)}$
quantifies the decoder's confidence concerning its estimate of a bit $b$ from the bit sequence
$\mathbf{b}_1$ or $\mathbf{b}_2$. Each LUT-Log-BCJR decoder processes two a priori LLR sequences, namely
$\tilde{\mathbf{b}}^a_1 = \{\tilde{b}^a_{1,j}\}_{j=1}^{N}$ and $\tilde{\mathbf{b}}^a_2 = \{\tilde{b}^a_{2,j}\}_{j=1}^{N}$, which are converted into the extrinsic LLR sequence
$\tilde{\mathbf{b}}^e_1 = \{\tilde{b}^e_{1,j}\}_{j=1}^{N}$. This extrinsic LLR sequence is iteratively exchanged with that generated
by the other LUT-Log-BCJR decoder, which is used as the a priori LLR sequence $\tilde{\mathbf{b}}^a_1$
in the next iteration [118].
A LUT-Log-BCJR decoder typically employs the sliding-window technique [169] to
generate the LLR sequence $\tilde{\mathbf{b}}^e_1$ as the concatenation of $w_s$ equal-length
sub-sequences. Each of these windows is generated separately, using a forward, a pre-backward
and a backward recursion, as shown in Figure 5.5. These three different recursions
are performed concurrently for three different windows, as exemplified in Figure 5.5 (b)
for $w_s = 5$. This schedule results in the completion of the windows in their natural order,
starting with that containing the first LLR $\tilde{b}^e_{1,1}$ and ending with the one containing the
last LLR $\tilde{b}^e_{1,N}$.
When the forward recursion is performed for a particular window, one pair of its
corresponding a priori LLRs $\tilde{b}^a_{1,j}$ and $\tilde{b}^a_{2,j}$ is read from Mem 1 of Figure 5.5 (a) and processed
per clock cycle, in ascending order of the bit index $j$. The forward recursion of
the LUT-Log-BCJR algorithm can be performed in two pipelined steps using the
corresponding dedicated hardware components of Figure 5.5 (a). The notations used in this
chapter are described in Chapter 2.
1. Firstly, the $\gamma$ values that correspond to the current window are generated. Here,
each $\gamma$ value $\gamma_i(T)$ corresponding to one transition in the decoding trellis is set
either equal to the corresponding a priori LLR $\tilde{b}^a_{i,J(T)}$ or to zero, depending on
the particular pair of states that the transition $T$ is between and on the Generator
Polynomials (GPs) of the encoder, as discussed in Section 2.2.2.
2. Next, the $\alpha$ values that correspond to the current window are generated. Here,
each $\alpha$ value corresponds to a state $S$ in the current step in the decoding trellis.
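[Figure 5.5 schematic: (a) Mem 1 supplies the a priori LLR sequences $\tilde{\mathbf{b}}^a_1$ and $\tilde{\mathbf{b}}^a_2$ to the forward recursion ($\gamma$ unit and $\alpha$ unit, with Mem 2), the pre-backward recursion ($\gamma$ unit and $\beta$ unit) and the backward recursion ($\gamma$ unit, $\beta$ unit, $\delta$ unit and LLR unit), which outputs $\tilde{\mathbf{b}}^e_1$; (b) the forward, pre-backward and backward recursions of windows 1 to 5, plotted as bit index (0 to N) against time.]
Figure 5.5: (a) The conventional LUT-Log-BCJR architecture. (b) The timing schedule
of the sliding-window technique.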
The calculation of $\alpha$ is given by
$$\alpha(S) = \max^{*}_{T \in \mathrm{to}(S)} \left( \alpha\big(\mathrm{fr}(T)\big) + \sum_{i=1}^{k+n} \gamma_i(T) \right), \tag{5.5}$$
where $T \in \mathrm{to}(S)$ represents the set of transitions $T$ that end at $S$, depending on
the GPs of the encoder. Note that the forward recursion for the first window is
initialised independently. By contrast, the forward recursion for the other windows
is initialised using $\alpha$ values that were obtained during the forward recursion of
the preceding window. It is for this reason that the windows must be processed
in order, as shown in Figure 5.5. The max* operation is used to represent the
Jacobian logarithm detailed in [170,171], which is defined for two parameters $\tilde{p}$
and $\tilde{q}$ as
$$\max{}^{*}(\tilde{p}, \tilde{q}) = \max(\tilde{p}, \tilde{q}) + \ln\left[1 + \exp(-|\tilde{p} - \tilde{q}|)\right] \tag{5.6}$$
and can be extended to three or more parameters using associativity.
One set of $\alpha$ values is written to Mem 2 of Figure 5.5 (a) per clock cycle in ascending
order of the bit index $j$.
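The Jacobian logarithm of Equation (5.6) and its extension to several operands can be illustrated as follows. This is a minimal floating-point sketch rather than a hardware description, and the function names are hypothetical. It checks numerically that max* equals the logarithm of a sum of exponentials, which is the property that allows a multi-operand max* to be built from two-operand ones.

```python
import math
from functools import reduce

def max_star(p, q):
    """Jacobian logarithm of Equation (5.6); equals ln(exp(p) + exp(q))."""
    return max(p, q) + math.log1p(math.exp(-abs(p - q)))

def max_star_n(values):
    """Extend max* to three or more operands by associativity."""
    return reduce(max_star, values)

vals = [1.5, -0.25, 3.0]
exact = math.log(sum(math.exp(v) for v in vals))
print(abs(max_star_n(vals) - exact) < 1e-12)
```

Because the operation is associative, the order in which the operands are combined does not affect the result beyond floating-point rounding.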
When the backward recursion is performed for a particular window, one pair of its
corresponding a priori LLRs $\tilde{b}^a_{1,j}$ and $\tilde{b}^a_{2,j}$ is read from Mem 1 of Figure 5.5 (a) and
processed per clock cycle, in descending order of the bit index $j$. Simultaneously,
the corresponding set of $\alpha$ values is read from Mem 2 and processed per clock cycle.
As a result, a particular window's backward recursion cannot be performed until after
its forward recursion has been completed, as shown in Figure 5.5 (b). The backward
recursion of the LUT-Log-BCJR algorithm can be performed in four pipelined steps
using the corresponding dedicated hardware components of Figure 5.5 (a).
1. Firstly, the $\gamma$ values that correspond to the current window are re-generated, as
described above.
2. Next, the $\beta$ values that correspond to the current window are generated. Here,
each $\beta$ value is given by
$$\beta(S) = \max^{*}_{T \in \mathrm{fr}(S)} \left( \beta\big(\mathrm{to}(T)\big) + \sum_{i=1}^{k+n} \gamma_i(T) \right). \tag{5.7}$$
Note that the backward recursion for the last window is initialised independently.
By contrast, the backward recursions for the other windows are initialised using
$\beta$ values that were previously obtained during the pre-backward recursion of the
next window. This is achieved using steps 1 and 2 of the backward recursion and
initialising the latter independently. It is for this reason that the pre-backward
recursions of Figure 5.5 (b) are performed before the backward recursions of the
preceding windows.
3. Next, the $\delta$ values that correspond to the current window are generated, according
to
$$\delta_i(T) = \alpha\big(\mathrm{fr}(T)\big) + \beta\big(\mathrm{to}(T)\big) + \sum_{i'=1,\, i' \neq i}^{k+n} \gamma_{i'}(T). \tag{5.8}$$
4. Finally, the value of each extrinsic LLR in the current window of the sequence $\tilde{\mathbf{b}}^e_1$
is generated according to
$$\tilde{b}^e_{1,j} = \max^{*}_{\substack{T \mid B_1(T)=0 \\ J(T)=j}} \big[\delta_1(T)\big] - \max^{*}_{\substack{T \mid B_1(T)=1 \\ J(T)=j}} \big[\delta_1(T)\big]. \tag{5.9}$$
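The forward and backward recursions described above can be sketched for a single window in floating point. The sketch below deliberately uses a hypothetical 2-state trellis (an accumulator with $k = n = 1$), not the 8-state LTE trellis, and it omits the sliding-window schedule. It also assumes that $\gamma_i(T)$ equals the a priori LLR when the transition implies a bit value of 0 and is zero otherwise, which matches the LLR definition $\ln[P(b=0)/P(b=1)]$. All names are illustrative.

```python
import math

NEG_INF = -math.inf

def max_star(a, b):
    """Jacobian logarithm of Equation (5.6), guarded for -infinity operands."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

# Hypothetical trellis: each transition is (fr, to, B1, B2),
# with to = fr XOR B1 and B2 = to. This is NOT the LTE trellis.
TRANSITIONS = [(0, 0, 0, 0), (0, 1, 1, 1), (1, 1, 0, 1), (1, 0, 1, 0)]
NUM_STATES = 2

def gammas(t, a1_j, a2_j):
    """Assumed rule: gamma_i(T) is the a priori LLR if B_i(T) = 0, else zero."""
    _, _, b1, b2 = t
    return (a1_j if b1 == 0 else 0.0, a2_j if b2 == 0 else 0.0)

def decode_window(a1, a2):
    """Return the extrinsic LLRs for one window that starts in state 0."""
    n = len(a1)
    alpha = [[0.0] + [NEG_INF] * (NUM_STATES - 1)]  # known start state
    for j in range(n):                              # forward recursion, Eq. (5.5)
        nxt = [NEG_INF] * NUM_STATES
        for t in TRANSITIONS:
            nxt[t[1]] = max_star(nxt[t[1]], alpha[j][t[0]] + sum(gammas(t, a1[j], a2[j])))
        alpha.append(nxt)
    beta = [0.0] * NUM_STATES                       # unterminated trellis
    extrinsic = [0.0] * n
    for j in reversed(range(n)):                    # backward recursion
        metric = {0: NEG_INF, 1: NEG_INF}
        prev = [NEG_INF] * NUM_STATES
        for t in TRANSITIONS:
            fr, to, b1, _ = t
            g1, g2 = gammas(t, a1[j], a2[j])
            delta1 = alpha[j][fr] + beta[to] + g2               # Eq. (5.8), i = 1
            metric[b1] = max_star(metric[b1], delta1)
            prev[fr] = max_star(prev[fr], beta[to] + g1 + g2)   # Eq. (5.7)
        extrinsic[j] = metric[0] - metric[1]                    # Eq. (5.9)
        beta = prev
    return extrinsic

# Bits b1 = [1, 0, 1, 1] give b2 = [1, 1, 0, 1]; an LLR of -5 favours bit value 1.
llrs = decode_window([0.0] * 4, [-5.0, -5.0, 5.0, -5.0])
decisions = [1 if llr < 0 else 0 for llr in llrs]
print(decisions)
```

In this toy case the signs of the extrinsic LLRs recover the uncoded bits from the encoded-bit LLRs alone, since no a priori information about the uncoded bits is supplied.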
A typical conventional LUT-Log-BCJR architecture is based on Figure 5.5 (a). By
implementing each module in the figure individually, the forward, pre-backward and
backward recursions are performed by separate dedicated hardware blocks. This
architecture is able to generate one extrinsic LLR per clock cycle, by following the processing
schedule of Figure 5.5 (b). Moreover, some techniques for improving the throughput of
this architecture have been developed. For example, parallel processing can be used
to achieve a very high decoding throughput, by including more than one instance of the
architecture of Figure 5.5 (a) in the decoder [172]. A radix-4 architecture has been proposed, which
increases the throughput of the LUT-Log-BCJR decoder by executing more decoding
operations per clock cycle [156]. However, since these techniques aim for improving the
throughput and not the energy efficiency, they are not discussed any further in this
section.
As described above, the conventional LUT-Log-BCJR architecture executes as many
operations as possible in one clock cycle by pipelining the forward, pre-backward and
backward recursions according to Figure 5.5 (b). To achieve this, the recursions involve
calculations that must be performed in series, resulting in a very long critical path in
the hardware implementation. As a result, the conventional architecture cannot achieve
a high energy efficiency, for three reasons. Firstly, the lengthening of the critical path
implies a greater variety of data path lengths. The differences between the data path
lengths in the circuit can cause significant energy wastage owing to spurious transitions
(glitches) [173]. Spurious transitions can account for a significant part of the
dynamic power in an ASIC implementation [174]. Reducing spurious transitions requires
the lengths of the paths that converge at each gate in the circuit to be roughly equal.
Secondly, a long critical path prevents the decoder from using a high clock frequency. To
implement the conventional LUT-Log-BCJR architecture at a high clock frequency, it
is necessary to employ additional hardware during the synthesis in order to shorten the
critical path. This is achieved by employing more complicated functional circuits, such
as the `look-ahead adder', to minimise their long critical paths. This increases the size of
the datapath, resulting in a higher energy consumption. On the other hand, operating
at a lower clock frequency would make the utilisation of the circuit lower. This would
cause hardware resources to idle for longer, increasing the static energy consumption.
The energy wasted by the static energy consumption becomes more and more critical
when the process technology is scaled down [175]. Thirdly, the high complexity of the
conventional architecture, due to its dedicated hardware for different tasks, increases the
requirements of the clock tree and the buffers for multiple loaded input signals. Hence,
this may become a significant additional consumer of energy in the decoder. In summary,
the conventional architecture naturally results in energy wastage, since it is not
designed with the optimisation of energy efficiency as a high priority.
5.4 Proposed LUT-Log-BCJR architecture
In this section, a novel LUT-Log-BCJR architecture is proposed for energy-constrained
scenarios, which avoids the wastage of energy that is inherent to the conventional
architecture of Section 5.3. The philosophy of the proposed architecture is to redesign the
timing of the conventional architecture in a manner that allows its components to be
efficiently merged. This produces an architecture comprising only a low number of
low-complexity functional units, which are collectively capable of performing the entire
LUT-Log-BCJR algorithm with a high hardware utilisation. Further wastage is avoided since the
critical paths of the proposed functional units are naturally short and of equal length,
eliminating the requirement for additional hardware to manage them. Furthermore, the
proposed approach naturally results in a low area and a high clock frequency, which
implies a low static energy consumption.
5.4.1 Decomposition of the LUT-Log-BCJR algorithm into Add-Compare-
Select operations
As discussed in Section 2.1, the LUT-Log-BCJR algorithm is attractive in practical hardware
implementations because the low dynamic range of its variables allows them to be
represented using fixed-point binary numbers, which are associated with a low hardware
complexity. It is demonstrated that 9-bit two's complement fixed-point binary
numbers including z = 2 fraction bits are sufficient to approach the LUT-Log-BCJR
decoding performance that is offered by 64-bit floating-point numbers. In addition,
it is unnecessary to prevent overflow in fixed-point implementations of the LUT-Log-BCJR
because it is inherently resilient to the resultant error. Besides its suitability
for fixed-point representation, the LUT-Log-BCJR is attractive in practical implementations
of turbo decoders because it only requires calculations that can be decomposed
into Add-Compare-Select (ACS) operations, which are associated with a low computational
complexity. More specifically, the LUT-Log-BCJR algorithm can be performed
entirely on the basis of three calculations, namely addition, subtraction and the max*
calculation of Equation (5.10).
Note that, in the case where z = 2 fraction bits are used in a fixed-point representation,
the right-hand term of the max* calculation from Equation (5.6) can be approximated
using a Look-Up Table (LUT) comprising $2^z = 4$ entries according to
\[
\max{}^*(\tilde{p}, \tilde{q}) = \max(\tilde{p}, \tilde{q}) +
\begin{cases}
0.75 & \text{if } |\tilde{p} - \tilde{q}| = 0 \\
0.5  & \text{if } |\tilde{p} - \tilde{q}| \in \{0.25, 0.5, 0.75\} \\
0.25 & \text{if } |\tilde{p} - \tilde{q}| \in \{1, 1.25, 1.5, 1.75, 2\} \\
0    & \text{otherwise}
\end{cases}
\tag{5.10}
\]
Note that all of the values used in the LUT are multiples of 0.25, since
this is the lowest non-zero positive value that can be represented using z = 2 fraction
bits.
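As an illustration of Equation (5.10), the following minimal Python sketch compares the LUT-based max* with the exact Jacobian logarithm, max(p, q) + ln(1 + e^(-|p - q|)), which is the standard definition of max* and is assumed here to be what Equation (5.6) expresses. The function names are hypothetical and chosen only for this example. For differences that are multiples of 0.25, the LUT entries coincide with the exact correction term rounded to the nearest multiple of 0.25; this is an observation made by the sketch, not a statement from the text.

```python
import math

def lut_correction(diff_abs):
    """Correction term of Equation (5.10); diff_abs is |p - q|, a multiple of 0.25."""
    if diff_abs == 0:
        return 0.75
    if diff_abs <= 0.75:
        return 0.5
    if diff_abs <= 2.0:
        return 0.25
    return 0.0

def max_star_lut(p, q):
    """LUT-based max* for operands with z = 2 fraction bits."""
    return max(p, q) + lut_correction(abs(p - q))

def max_star_exact(p, q):
    """Exact Jacobian logarithm: ln(exp(p) + exp(q))."""
    return max(p, q) + math.log1p(math.exp(-abs(p - q)))

if __name__ == "__main__":
    # Tabulate both correction terms for a few representable differences.
    for k in range(0, 11):
        d = k / 4
        exact = math.log1p(math.exp(-d))
        print(f"|p-q| = {d:4.2f}  LUT = {lut_correction(d):4.2f}  exact = {exact:6.4f}")
```

Because every LUT entry is within 0.125 of the exact correction, the approximation error per max* operation is bounded by half of the smallest representable step.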
While each addition and subtraction calculation performed in the LUT-Log-BCJR can
be considered to be a single ACS operation, each max* calculation can be considered to
require four ACS operations:
1. one to simultaneously calculate $\max(\tilde{p}, \tilde{q})$ and $|\tilde{p} - \tilde{q}|$;
2. one to determine if $|\tilde{p} - \tilde{q}| > 0.75$;
3. one to determine if $|\tilde{p} - \tilde{q}| > 0$ or $|\tilde{p} - \tilde{q}| > 2$, depending on the outcome of operation
(2);
4. one to add $\max(\tilde{p}, \tilde{q})$ to the value selected from the set {0.75, 0.5, 0.25, 0}.
Note that in the general case where z > 0, a total of z + 2 ACS operations are required
to complete the max* calculation. The LUT-Log-BCJR algorithm is transformed into
the Max-Log-BCJR algorithm in the special case of z = 0, where only the first of the
above listed ACS operations is required.
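The four ACS operations listed above can be sketched as a short decision sequence. The Python function below is only an illustration under stated assumptions: operands are integers in units of 0.25 (so 0.75 is stored as 3 and 2 as 8), and the function name and the operation counter are hypothetical additions for this example.

```python
def max_star_acs(p, q):
    """Return (max*(p, q), number of ACS operations) for z = 2.

    p and q are integers in units of 0.25, for example 1.5 is passed as 6.
    """
    ops = 0

    # ACS 1: one subtraction yields both the maximum and |p - q|.
    diff = p - q
    if diff >= 0:
        larger, magnitude = p, diff
    else:
        larger, magnitude = q, -diff
    ops += 1

    # ACS 2: is |p - q| > 0.75 (3 in units of 0.25)?
    above_075 = magnitude > 3
    ops += 1

    # ACS 3: compare with 2 (stored as 8) if ACS 2 was true, otherwise with 0.
    threshold = 8 if above_075 else 0
    above_threshold = magnitude > threshold
    ops += 1

    # Select the LUT entry from {0.75, 0.5, 0.25, 0}, i.e. {3, 2, 1, 0}.
    if above_075:
        correction = 0 if above_threshold else 1
    else:
        correction = 2 if above_threshold else 3

    # ACS 4: add the selected correction to the maximum.
    ops += 1
    return larger + correction, ops

if __name__ == "__main__":
    # max*(1.0, 1.5): |p - q| = 0.5, so the correction is 0.5 and the result is 2.0.
    result, ops = max_star_acs(4, 6)
    print(result / 4, ops)
```

Only two comparisons are needed to choose among four LUT entries because the second comparison reuses the outcome of the first, which is why z + 2 = 4 operations suffice when z = 2.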
5.4.2 Proposed architecture
While the conventional LUT-Log-BCJR architecture of Section 5.3 employs different
hardware components for different steps in the LUT-Log-BCJR algorithm, this section
proposes an architecture which uses the ACS unit of Section 5.4.3 to perform the whole
decoding process. As shown in Figure 5.6, the proposed architecture employs M
Calculation Units (CUs) in parallel. Each CU includes an ACS unit, three dedicated
registers (R1, R2 and R3) and the interconnection structure between them.
[Figure: block diagram showing parallel CUs (each comprising R1, R2, R3, an interconnection and an ACS unit), Regbank1, Regbank2, the metric memory, the LLR memories, the main memories and the MUXes]
Figure 5.6: The proposed LUT-Log-BCJR architecture.
Furthermore, in order to minimise the number of costly accesses to the main memory,
which consume a large amount of energy, the proposed architecture additionally employs
two register banks, namely Regbank1 and Regbank2. The combined usage of the dedicated
registers and the register banks facilitates an efficient max* calculation and allows
an entire stage of the trellis to be processed without accessing the main memory.
The register banks, Regbank1 and Regbank2, are used to temporarily store the LUT-Log-BCJR
variables from one trellis stage j to the next. This allows an entire step
of the LUT-Log-BCJR algorithm to be processed without further accesses to the main
memory. More specifically, Regbank1 stores the pair of a priori LLRs from the a priori
LLR sequences $\tilde{b}^a_1$ and $\tilde{b}^a_2$ that correspond to the currently considered trellis stage j, as
well as the seven LUT constants of Equation 5.10. Meanwhile, the ,  and  values
are stored in Regbank2. In addition, some internal results for  calculations during the
backward recursion may also be stored in Regbank2. The required number of registers
in Regbank2 is 2m.
While the second register level facilitates the processing of one step j of
the LUT-Log-BCJR algorithm, the main memory facilitates the processing of the entire
algorithm. More specifically, it stores the two a priori LLR sequences $\tilde{b}^a_1$ and $\tilde{b}^a_2$, as well
as the extrinsic LLR sequence $\tilde{b}^e_1$ that is generated in response. Furthermore, following
the completion of the forward recursion, the $\alpha$ values are stored in the main memory
for the duration of the backward recursion. As in the efforts of [151,153,176,177], the
throughput of the proposed architecture can be significantly increased by decomposing
the main memory into M + 2 parallel blocks. More specifically, one memory access port
is employed for each of the M CUs, as well as two ports for loading the two a priori
LLR sequences into cache.
Since the proposed architecture supports a fully parallel arrangement of any number
of CUs, it can be readily applied to any LUT-Log-BCJR decoder, regardless of the
corresponding convolutional encoder parameters that are employed. These parameters
include the number of uncoded bit sequences k and non-systematic encoded bit sequences
n, the constraint length and the generator polynomials. Furthermore,
the proposed architecture can be readily adapted to support any number z of fraction
bits in the two's complement representation, as well as the Max-Log-BCJR algorithm,
as discussed in Section 5.4.1. However, a specialised controller is required for any
particular LUT-Log-BCJR decoder. Furthermore, the gate count of these controllers is
typically higher than that of the controllers in conventional architectures. However, it will be shown
in Section 5.5 that the controller accounts for less than 10% of the proposed architecture's
relatively low gate count. Let us now discuss the parametrisation of the proposed
architecture for the implementation of a specific LUT-Log-BCJR decoder, namely that
of LTE [103].
5.4.3 Proposed ACS unit and calculation unit
Section 5.4.1 demonstrates that the addition, subtraction and max* calculations of the
LUT-Log-BCJR algorithm can all be decomposed into individual ACS operations. In
order to minimise the number of gates required in the LUT-Log-BCJR decoder, this
section proposes the novel ACS unit of Figure 5.7, which can perform one ACS operation
per clock cycle. This ACS unit can produce two types of result, namely the results of
fixed-point addition or subtraction operations from port $\tilde{r}$, or 1-bit comparison results
from the three 1-bit registers $C = \{C_0, C_1, C_2\}$.
[Figure: schematic labelled with the 9-bit operands $\tilde{p}$ and $\tilde{q}$, the 9-bit result $\tilde{r}$, an adder with a carry input, the MSB of the adder's output, the operation code bits O0 to O5, and the 1-bit registers C0, C1 and C2 with their loading signals]
Figure 5.7: The proposed ACS unit.
As shown in Figure 5.7, the addition calculations $\tilde{r} = \tilde{p} + \tilde{q}$ of the LUT-Log-BCJR
algorithm may be performed in the ACS unit by using the operation code
$O = \{O_0, O_1, O_2, O_3, O_4, O_5\} = 000000_2$. Note that a subscript of 2 is applied to the operation codes
and register contents in order to indicate that these are binary numbers. By contrast,
decimal values are used for all other variables in the following discussion. Meanwhile, the
operation code $O = 100000_2$ may be used to perform the LUT-Log-BCJR algorithm's
subtraction calculations $\tilde{r} = \tilde{p} - \tilde{q}$. As shown in Figure 5.7, the variables $\tilde{r}$, $\tilde{p}$ and $\tilde{q}$ have
a 9-bit word length, for the reasons discussed in Chapter 4. The procedure for performing
a max* operation using the proposed ACS unit is described as follows. Three general
purpose registers, R1, R2 and R3, as given in Figure 5.8, are required for providing
inputs or storing internal results of the ACS unit during the procedure.
1. In the first clock cycle, the max* calculation is begun by using the operation code
$O = 101100_2$ and loading the ACS unit operands $\tilde{p}$ and $\tilde{q}$ from the registers R1
and R2, respectively. The result $\tilde{r}$ is stored in the register R3, according to
\[
R_3 =
\begin{cases}
R_1 - R_2 & \text{if } R_1 \geq R_2 \\
(R_2 - R_1) - \underline{0.25} & \text{if } R_1 < R_2
\end{cases}
\tag{5.11}
\]
This operation approximates the $|R_1 - R_2|$ calculation that is implied by Equation
(5.10). Note however, that when $R_1 < R_2$, it is the value of $\overline{R_1 - R_2}$, which is the
one's complement of $R_1 - R_2$, that is calculated and stored in R3. Owing
to the two's complement representation, this result is equivalent to decrementing
the binary representation of $R_2 - R_1$. Since z = 2 fraction bits are employed,
this decrementation is equivalent to subtracting 0.25, which is the lowest non-zero
positive value that can be represented. In order to emphasise that this value of 0.25
is caused by decrementation, it is underlined in Equation (5.11). Note that this
inaccuracy affords a simpler ACS unit than would otherwise be required. Besides,
it can be trivially cancelled in the following clock cycles.
In order to identify $\max(R_1, R_2)$, the Most Significant Bit (MSB) of the adder's
output is stored in the register C0. This gives
\[
C_0 =
\begin{cases}
0_2 & \text{if } R_1 \geq R_2 \\
1_2 & \text{if } R_1 < R_2
\end{cases}
\tag{5.12}
\]
Having approximated $|R_1 - R_2|$ and determined $\max(R_1, R_2)$, the first ACS
operation described in Section 5.4.1 is completed.
2. The LUT comparison that is performed during the second ACS operation is
achieved using the operation code $O = 110010_2$. The value stored in R3 is used for
the ACS unit's operand $\tilde{q}$, while $\tilde{p}$ uses the constant decimal value 0.75 obtained
from Regbank1, which is the second entry of the LUT described in Equation (5.10).
This value is effectively compared with $|R_1 - R_2|$, according to
\[
\tilde{r} =
\begin{cases}
0.75 - R_3 = 0.75 - (R_1 - R_2) & \text{if } C_0 = 0_2 \\
0.75 - R_3 - 0.25 = 0.75 - (R_2 - R_1) & \text{if } C_0 = 1_2
\end{cases}
\tag{5.13}
\]
Note that the ACS unit's operation is dictated by the value stored in C0. In the
case where $C_0 = 1_2$, the ACS unit takes the opportunity to cancel the associated
decrementation that was introduced in the previous clock cycle. This is achieved
by using a carry-in of $0_2$ for the adder, rather than a value of $1_2$, as is
conventional when performing a subtraction. The result of the LUT comparison
is stored in the register C1 by considering the value of the MSB in the adder's output,
according to
\[
C_1 =
\begin{cases}
0_2 & \text{if } \tilde{r} \geq 0 \\
1_2 & \text{if } \tilde{r} < 0
\end{cases}
\tag{5.14}
\]
Following this clock cycle, the register C1 stores the outcome of the test $|R_1 - R_2| >
0.75$, as required by the second ACS operation described in Section 5.4.1.
3. In analogy with the previous clock cycle, the result of the test $|R_1 - R_2| > 0$
or the test $|R_1 - R_2| > 2$ is determined depending on whether it was previously
decided that $|R_1 - R_2| > 0.75$. More specifically, the operation code $O = 110001_2$
is employed, with the value stored in R3 as the ACS unit's operand $\tilde{q}$ and the
constant value 0 or 2 as $\tilde{p}$, as appropriate. As shown in Equation (5.10),
these constant values are the first and third entries of the LUT. The comparison
is achieved according to
\[
\tilde{r} =
\begin{cases}
0 - R_3 = 0 - (R_1 - R_2) & \text{if } C_0 = 0_2, C_1 = 0_2 \\
0 - R_3 - 0.25 = 0 - (R_2 - R_1) & \text{if } C_0 = 1_2, C_1 = 0_2 \\
2 - R_3 = 2 - (R_1 - R_2) & \text{if } C_0 = 0_2, C_1 = 1_2 \\
2 - R_3 - 0.25 = 2 - (R_2 - R_1) & \text{if } C_0 = 1_2, C_1 = 1_2
\end{cases}
\tag{5.15}
\]
The result of the LUT comparison is stored in the register C2 by considering the
value of the MSB in the adder's output, according to
\[
C_2 =
\begin{cases}
0_2 & \text{if } \tilde{r} \geq 0 \\
1_2 & \text{if } \tilde{r} < 0
\end{cases}
\tag{5.16}
\]
4. The max* calculation is completed in the fourth clock cycle by using the operation
code $O = 000000_2$. Here the operand $\tilde{p}$ is provided by the maximum of R1 and
R2, as identified by C0. Meanwhile, a value for the operand $\tilde{q}$ is selected from the
set {0.75, 0.5, 0.25, 0} that is stored in Regbank1, depending on the contents of C1
and C2. With C1 and C2 as defined in Equations (5.14) and (5.16), this gives
\[
\tilde{r} = \max(R_1, R_2) +
\begin{cases}
0.75 & \text{if } C_1 = 0_2, C_2 = 0_2 \\
0.5  & \text{if } C_1 = 0_2, C_2 = 1_2 \\
0.25 & \text{if } C_1 = 1_2, C_2 = 0_2 \\
0    & \text{if } C_1 = 1_2, C_2 = 1_2
\end{cases}
\tag{5.17}
\]
as required.
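The four clock cycles described above can be checked with a small bit-level model. The following Python sketch is an illustration under stated assumptions rather than a description of the synthesised circuit: words are 9-bit two's complement integers in units of 0.25, subtraction is modelled as adding the inverted operand plus a carry-in, the conditional complement of cycle 1 is applied whenever the MSB is set, and the correction in cycle 4 is selected from C1 and C2 as they are defined in Equations (5.14) and (5.16). All function names are hypothetical, and operation codes are not modelled.

```python
MASK = 0x1FF  # 9-bit word
SIGN = 0x100  # MSB of a 9-bit word

def to_word(value):
    """Encode a signed integer (units of 0.25) as a 9-bit two's complement word."""
    return value & MASK

def to_int(word):
    """Decode a 9-bit two's complement word into a signed integer."""
    return word - 0x200 if word & SIGN else word

def adder(p, q, invert_q, carry_in):
    """9-bit adder; inverting q with carry_in = 1 gives the subtraction p - q."""
    if invert_q:
        q = ~q & MASK
    return (p + q + carry_in) & MASK

def max_star_cycles(r1, r2):
    """Four-cycle max* on signed integers in units of 0.25."""
    R1, R2 = to_word(r1), to_word(r2)

    # Cycle 1: R1 - R2; the MSB gives C0. If negative, store the one's
    # complement, which equals (R2 - R1) - 0.25 as in Equation (5.11).
    r = adder(R1, R2, True, 1)
    C0 = 1 if r & SIGN else 0
    R3 = (~r & MASK) if C0 else r

    # Cycle 2: 0.75 - R3. A carry-in of 0 when C0 = 1 cancels the decrement.
    carry = 0 if C0 else 1
    r = adder(to_word(3), R3, True, carry)
    C1 = 1 if r & SIGN else 0

    # Cycle 3: compare with 2 (stored as 8) if C1 = 1, otherwise with 0.
    r = adder(to_word(8 if C1 else 0), R3, True, carry)
    C2 = 1 if r & SIGN else 0

    # Cycle 4: add the selected LUT entry (in units of 0.25) to the maximum.
    correction = {(0, 0): 3, (0, 1): 2, (1, 0): 1, (1, 1): 0}[(C1, C2)]
    r = adder(R2 if C0 else R1, to_word(correction), False, 0)
    return to_int(r)

def max_star_reference(p, q):
    """Direct evaluation of Equation (5.10) in units of 0.25."""
    d = abs(p - q)
    return max(p, q) + (3 if d == 0 else 2 if d <= 3 else 1 if d <= 8 else 0)

if __name__ == "__main__":
    # max*(1.0, 1.5) = 1.5 + 0.5 = 2.0, i.e. 8 in units of 0.25.
    print(max_star_cycles(4, 6) / 4)
```

The operand range in any check should be kept small enough that no intermediate sum leaves the 9-bit range, since the sketch makes no attempt to model the overflow behaviour of the decoder.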
The ACS unit works with the memory components of Figure 5.6, including the main
memory, the register banks and the general purpose registers (R1, R2 and R3). Corresponding
structures are required to connect the ACS unit to the memory
components. The CU of Figure 5.8 is designed for this purpose. As described above,
the ACS unit can perform the max* calculation of Equation (5.10) in four clock cycles,
namely one for each of the ACS operations that are described in Section 5.4.1.
In between these clock cycles, three 9-bit registers R1, R2 and R3 are used to store
intermediate results. The ACS unit takes input from two databuses, 'databus1' and
'databus2', as shown in Figure 5.8. The output 'res' gives the calculation results, which
can be either fed back to the general purpose registers (R1, R2 and R3) or output to the
main memory through 'data out'. The output 'C' gives the 1-bit comparison results,
which can be either transmitted to the multiplexer controlled by 'C0' for determining
the result of a max operation or transmitted to the LUT in Regbank1 to determine the
correction value in a max* operation. As shown in Figure 5.8, the inputs from 'databus1'
and 'databus2' are highly flexible. A group of tri-state buffers provides different possible
combinations of the inputs from all the memory components in the architecture. This
allows the CU to provide the required operands from any memory component to the
ACS unit. This avoids wasting additional clock cycles on moving data between
different memory components.
[Figure: schematic labelled with the registers R1, R2 and R3 and their load signals LoadR1 to LoadR3, the enable signals EnR1 to EnR3, EnMax and EnExt1 to EnExt5, databus1 and databus2 feeding the ACS operands Op1 and Op2, OpCode, the outputs res and C, the selection signals Sop, {C1, C2} and Sel LUT, and the external connections metrics, data in, data out, LLRs and LUT]
Figure 5.8: The proposed CU.
5.4.4 Optimal number of parallel calculation units
Intuitively, it may seem that the energy efficiency of the proposed architecture is
independent of the number of parallel CUs M that it employs. For example, it may seem that
doubling M would double the power consumption, but halve the total processing time,
maintaining a constant energy efficiency. However, the situation is complicated by the
data dependencies of the LUT-Log-BCJR algorithm. More specifically, a high number
of parallel CUs facilitates the just-in-time calculation of the LUT-Log-BCJR variables,
reducing the amount of storage that they require. However, the different steps of the
LUT-Log-BCJR algorithm are suited to different amounts of parallelism. This is because
each stage of the trellis is associated with $2^m$ $\alpha$ and $\beta$ values, but $2^{m+1}$ transition
metrics (one per trellis transition) and only a single extrinsic LLR. Therefore, a high number of parallel CUs cannot be
exploited at all times. This reduced hardware activity diminishes the energy efficiency
owing to the static power that is consumed by inactive gates. Therefore, there exists a
particular number of parallel CUs which optimises this trade-off, maximising the energy
efficiency.
Since the number of transitions in each stage of the trellis is $2^{m+1}$, the candidate
values for M that distribute the calculations of $\alpha$, $\beta$, $\gamma$ and $\delta$ equally are $\{1, 2, 2^2, \ldots, 2^m\}$.
In order to determine the optimal number M of parallel CUs, the components of the
proposed architecture, including the CUs, Regbank1 and Regbank2, were implemented using
the Taiwan Semiconductor Manufacturing Company (TSMC) 90 nm process technology.
The areas a required for the final implementation of the components, which reflect their
hardware complexity, were obtained by post-layout simulation. For a particular M,
the hardware utilisation efficiency can be quantified based on the distribution of
operations among the CUs, the time taken to decode one extrinsic LLR and the
hardware complexity of the components.
For example, when $M = 2^m$ in the LTE turbo code, the operation distribution can be
presented by the timing diagram of Figure 5.9. Note that addition and subtraction
calculations can be performed in one clock cycle, while the max* calculations require four
clock cycles, as described in Section 5.4.3. As described in Section 5.3, the proposed
architecture implements the sliding-window LUT-Log-BCJR decoding algorithm by
repeatedly cycling through three different recursions, namely the forward, pre-backward
and backward recursions. These are applied in turn to $\lfloor N/128 \rfloor$ windows
comprising 128 trellis stages, as well as to a final window comprising the remainder of
the N trellis stages. As shown in Figure 5.9, the forward recursion determines the set
of $\alpha$ values for each of the 128 trellis stages in the current window using seven clock
cycles. More specifically, the M = 8 CUs are employed to calculate the $2^m = 8$ $\alpha$ values
in parallel, with the assistance of Regbank1 and Regbank2. Similarly, the pre-backward
recursion uses seven clock cycles to determine the set of $\beta$ values for each of the 24
trellis stages beyond the right-hand edge of the current window. Finally, the backward
recursion requires 24 clock cycles to calculate the $\beta$, $\delta$ and extrinsic LLR values for each
of the 128 trellis stages in the current window, as shown in Figure 5.9. On average, it
can be calculated that 32.3 clock cycles are required to calculate each extrinsic LLR.
[Figure: timing diagram with rows ACS-1 to ACS-8, REG bank 1 and REG bank 2 against time t (clock cycle); panels for the forward recursion ($\alpha$, 128 repetitions), the pre-backward recursion ($\beta$, 24 repetitions) and the backward recursion ($\beta$, $\delta$ and LLRs, 128 repetitions); legend: addition/subtraction calculation, max* calculation, reading from the registers, writing to the registers]
Figure 5.9: The timing diagram for the operation distribution in the LTE decoder's
components, when $M = 2^m$.
Therefore, the time consumption c in clock cycles per bit can be obtained using the
timing diagram of Figure 5.9. Moreover, the timing diagram shows for what fraction
of time r each of the components is operational. The hardware utilisation rate u of
the decoder is defined as the average of the r values, weighted by the a values of each
component in the decoder, according to
\[
u = \frac{\sum r \times a}{\sum a}. \tag{5.18}
\]
In addition, the total datapath area can be estimated as $A = \sum a$.
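The average of 32.3 clock cycles per extrinsic LLR appears to follow from the per-stage cycle counts quoted above: for each window of 128 trellis stages, the forward recursion takes 7 cycles per stage, the pre-backward recursion takes 7 cycles for each of 24 stages and the backward recursion takes 24 cycles per stage. The Python sketch below reproduces this arithmetic and evaluates Equation (5.18). The function names are hypothetical, and the (r, a) pairs in the usage example are invented placeholder values, not figures from the implementation.

```python
def cycles_per_llr(window=128, pre_backward_stages=24,
                   forward_cycles=7, pre_backward_cycles=7, backward_cycles=24):
    """Average clock cycles per extrinsic LLR for one full sliding window."""
    total = (window * forward_cycles
             + pre_backward_stages * pre_backward_cycles
             + window * backward_cycles)
    return total / window

def utilisation(components):
    """Equation (5.18): area-weighted average of the activity fractions.

    components is a list of (r, a) pairs, where r is the fraction of time a
    component is operational and a is its area.
    """
    total_area = sum(a for _, a in components)
    return sum(r * a for r, a in components) / total_area

if __name__ == "__main__":
    # (128*7 + 24*7 + 128*24) / 128 cycles per LLR, quoted as 32.3 in the text.
    print(cycles_per_llr())
    # Placeholder components: these (r, a) values are illustrative only.
    print(utilisation([(0.9, 1000.0), (0.5, 500.0)]))
```

The pre-backward recursion contributes only 24 x 7 / 128, or roughly 1.3 cycles per LLR, so the backward recursion dominates the time consumption c.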
To evaluate the hardware utilisation efficiency, the metric $u \times c \times A$ is proposed to
quantify the hardware resources that are actually used to decode each extrinsic LLR. A
lower value of $u \times c \times A$ represents a decoder using fewer hardware resources to decode
one extrinsic LLR. For the LTE turbo code, the metric values corresponding to different
values of M are given in Table 5.3.
M                      $2^{m-2} = 2$        $2^{m-1} = 4$        $2^m = 8$
c                      106.3                55.6                 32.3
A                      5564                 8550                 13377
u                      75.21%               77.56%               71.08%
$u \times c \times A$  $4.45 \times 10^5$   $3.69 \times 10^5$   $3.07 \times 10^5$
Table 5.3: Datapath characteristics for various numbers M of parallel CUs.
In the case where fewer than $2^m$ parallel CUs are employed, the calculations of Figure
5.9 are rearranged into fewer rows. This has the benefit of increasing the hardware
activity during the extrinsic LLR calculations, as shown in Table 5.3. However, Table
5.3 shows that a greater number of clock cycles per LLR c is required. Furthermore,
just-in-time processing is prevented, requiring Regbank2 to be enlarged in order to store
additional intermediate variables, as shown in Table 5.3. Table 5.3 shows that $u \times c \times A$
is minimised for $M = 2^m$. For this reason, in Section 5.5, M = 8 is adopted in order to
implement an energy-efficient LUT-Log-BCJR decoder for the turbo code of LTE.
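The metric values of Table 5.3 can be reproduced directly from the tabulated c, A and u. The short Python sketch below does this; the dictionary layout and function name are hypothetical, while the numbers are those listed in Table 5.3.

```python
# Values of Table 5.3, keyed by the number of parallel CUs M.
TABLE_5_3 = {
    2: {"c": 106.3, "A": 5564, "u": 0.7521},
    4: {"c": 55.6, "A": 8550, "u": 0.7756},
    8: {"c": 32.3, "A": 13377, "u": 0.7108},
}

def utilisation_metric(entry):
    """The u x c x A metric for one column of Table 5.3."""
    return entry["u"] * entry["c"] * entry["A"]

def best_m(table):
    """Number of parallel CUs that minimises u x c x A."""
    return min(table, key=lambda m: utilisation_metric(table[m]))

if __name__ == "__main__":
    for m, entry in TABLE_5_3.items():
        print(f"M = {m}: u*c*A = {utilisation_metric(entry):.3e}")
    print("minimum at M =", best_m(TABLE_5_3))
```

Although the area A more than doubles between M = 2 and M = 8, the cycle count c falls by a larger factor, which is why the product is smallest for M = 8.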
Publication                                                                 Proposed  [154]     [156]     [158]     [159]     [160]
Algorithm                                                                   LUT-Log   LUT-Log   LUT-Log   LUT-Log   Max-Log   Max-Log
Block size (bit)                                                            6144      5114      5114      5114      6144      6144
Technology (nm)                                                             90        180       180       180       65        120
Supply voltage (V)                                                          1.0       1.8       1.8       1.8       -         1.2
Area A (mm^2)                                                               0.35      9         14.5      8.2       2.1       3.57
  (Scaled for 90 nm)                                                                  (2.25)    (3.63)    (2.05)    (4.0)     (2.0)
Gate count (exclusive of memory)                                            7.5k      85k       410k      65k       -         553k
Memory required (Kb)                                                        188       239       450       161       -         129
Clock frequency F (MHz)                                                     333       111       145       100       300       390.6
Decoding iterations                                                         5         10        8         6.5       6         5.5
Throughput T (Mb/s)                                                         1.03      2         10.8      4.17      150       390.6
Power consumption (mW)                                                      4.17      292       956       320       300       788.9
  (Scaled for 90 nm)                                                                  (36.5)    (119.4)   (40)      (796.4)   (332.8)
Energy consumption (nJ/bit/iteration)                                       0.4       14.6      11.1      12.7      0.31      0.37
  (Scaled for 90 nm)                                                                  (1.8)     (1.4)     (1.59)    (0.81)    (0.16)
$E^{tx}_b + E^{pr}_b$ (nJ/bit) when transmitting over 39 m (5 iterations)   10.16     17.16     15.16     16.06     13.42     10.17
$E^{tx}_b + E^{pr}_b$ (nJ/bit) when transmitting over 58 m (5 iterations)   41.92     48.92     46.92     47.82     49.88     46.63
$E^{tx}_b + E^{pr}_b$ (nJ/bit) when transmitting over 100 m (5 iterations)  354.7     361.7     359.7     360.6     409.0     405.8
Table 5.4: Comparison of the implemented turbo decoders.
recent Max-Log-BCJR decoders. In addition, because the proposed decoder's datapath
is simple and has a low energy consumption, the simulation results show that 40% of
the energy consumption of the implemented decoder can be attributed to the memory.
Considering that energy-constrained WSNs require shorter packet lengths than the LTE
standard, the energy consumption of the proposed architecture could be further reduced
in these applications, where less memory is required.
[Figure: BER (axis ticks from $10^0$ down to $10^{-6}$) against SNR (axis ticks from -6.2 to -2.2) for the proposed implementation, the ideal Max-Log-BCJR and the ideal LUT-Log-BCJR]
Figure 5.11: BER simulation results, in the case where 5 iterations are employed to
decode a 6144-bit LTE block, which was transmitted over an uncorrelated Rayleigh
fading channel.
To analyse the overall energy consumption $E^{tx}_b + E^{pr}_b$ of the LUT-Log-BCJR and the
Max-Log-BCJR decoders, the BER performance of the proposed architecture and the
ideal performance of the two types of decoder are quantified in Figure 5.11 (see footnote 2). As
discussed in Section 5.2, the high energy efficiency of the Max-Log-BCJR is achieved at
the cost of requiring a 0.5 dB higher transmission energy per bit to achieve a BER of
$10^{-4}$, as shown in Figure 5.11. As a result, the LUT-Log-BCJR algorithm facilitates an
overall energy consumption, including the energy consumed during both transmission
and decoding, that is 10% lower than that of the Max-Log-BCJR at long transmission
ranges, where the energy consumption of the turbo decoder is negligible compared to the
transmission energy required. Indeed, the analysis in [1,2] reveals that a small difference
in BER performance has a significant effect on the overall energy consumption $E^{tx}_b + E^{pr}_b$.
Some estimation results of the overall energy consumption $E^{tx}_b + E^{pr}_b$ for each decoder
in Table 5.4 are provided for a variety of transmission distances. Figure 5.12 shows
the difference in overall energy consumption between the proposed architecture and the
Max-Log-BCJR of [160], $f(d) = (E^{tx}_b + E^{pr}_b)_{\text{Max-Log-BCJR}} - (E^{tx}_b + E^{pr}_b)_{\text{LUT-Log-BCJR}}$.
As shown in Figure 5.12, when the function f(d) has a negative value, the Max-Log-BCJR
decoder offers a greater energy efficiency than the proposed architecture. By
contrast, when f(d) has a positive value, the proposed architecture offers a greater
energy efficiency than the Max-Log-BCJR decoder. The magnitude of the value is the
difference in overall energy consumption between the two schemes. As a result, the
proposed architecture offers an overall energy saving that increases exponentially beyond
a range of 39 m, relative to the state-of-the-art Max-Log-BCJR decoder [160]. Further
calculation shows that, compared with the most energy-efficient design [160] of Table 5.4,
which has an energy consumption of 0.16 nJ/bit/iteration, the proposed LUT-Log-BCJR
decoder achieves more than 10% overall energy saving when the transmission distance
reaches 58 m.
[Figure: transmission energy difference f(d) (nJ/bit), with axis ticks from -10 to 60, against transmission range (m), with axis ticks from 0 to 100]
Figure 5.12: The energy consumption difference between the proposed architecture
and the Max-Log-BCJR decoder in [160] at BER = $10^{-4}$.
Footnote 2: Since different simulation parameters and channel models are used in previous publications, the BER
performance of the proposed architecture is compared with the idealised upper-bound performance of
the various algorithms, which was obtained using floating-point simulation.
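The crossover at 39 m and the saving of more than 10% at 58 m can be checked against the overall energy totals listed in Table 5.4 for the proposed decoder and for the Max-Log-BCJR decoder of [160]. The Python sketch below is a worked example only: it takes f(d) as the Max-Log-BCJR total minus the total of the proposed decoder, so that positive values favour the proposed decoder, as in the discussion of Figure 5.12. The dictionary and function names are hypothetical.

```python
# Overall energy E_tx + E_pr in nJ/bit (5 iterations), as listed in Table 5.4:
# distance in metres -> (proposed LUT-Log-BCJR, Max-Log-BCJR of [160]).
OVERALL_ENERGY = {
    39: (10.16, 10.17),
    58: (41.92, 46.63),
    100: (354.7, 405.8),
}

def energy_difference(distance):
    """f(d): positive when the proposed decoder consumes less overall energy."""
    proposed, max_log = OVERALL_ENERGY[distance]
    return max_log - proposed

def relative_saving(distance):
    """Overall saving of the proposed decoder as a fraction of the Max-Log total."""
    _, max_log = OVERALL_ENERGY[distance]
    return energy_difference(distance) / max_log

if __name__ == "__main__":
    for d in sorted(OVERALL_ENERGY):
        print(f"d = {d:3d} m: f(d) = {energy_difference(d):6.2f} nJ/bit, "
              f"saving = {100 * relative_saving(d):5.2f}%")
```

At 39 m the two totals differ by only 0.01 nJ/bit, which is consistent with this distance being quoted as the break-even range.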
5.6 Conclusions
This chapter commenced by investigating the potential energy saving that can be offered
by employing turbo codes in WSNs. Since turbo decoders have a relatively high
complexity, their energy consumption $E^{pr}_b$ must be considered in these energy-constrained
scenarios. The saving in transmission energy $E^{tx}_b$ that is facilitated by turbo codes
must overcome $E^{pr}_b$ in order to achieve an overall energy saving. As a result, the overall
energy consumption $E^{pr}_b + E^{tx}_b$ is proposed to evaluate the energy efficiency of a turbo
code which is employed in a WSN. An energy-efficient turbo decoder is essential for
minimising $E^{pr}_b + E^{tx}_b$, when employing turbo codes in WSNs.
The evaluation of $E^{pr}_b + E^{tx}_b$ is applied to five state-of-the-art ASIC turbo decoder designs,
employing either the LUT-Log-BCJR or the Max-Log-BCJR algorithm. The study
reveals that Max-Log-BCJR turbo decoders typically have lower $E^{pr}_b$ than LUT-Log-BCJR
turbo decoders because of their lower computational complexity. However, LUT-Log-BCJR
decoders facilitate 10% lower $E^{tx}_b$ than Max-Log-BCJR decoders because they
achieve higher coding gains. As a result, when the transmission distance is sufficiently
short, $E^{pr}_b$ makes a greater contribution than $E^{tx}_b$ to the overall energy consumption
and Max-Log-BCJR decoders are more energy efficient than LUT-Log-BCJR decoders.
However, for any pair of LUT-Log-BCJR and Max-Log-BCJR decoders, there is a critical
transmission distance at which $E^{tx}_b$ is sufficiently high and the LUT-Log-BCJR decoder
becomes beneficial for the overall energy efficiency. Therefore, the key issue for designing
LUT-Log-BCJR decoders is making them more energy efficient over a wider transmission
range by achieving a low $E^{pr}_b$ in the ASIC implementation.
For the purpose of designing a hardware architecture for LUT-Log-BCJR turbo decoders
having a low $E^{pr}_b$, this chapter has also analysed the energy efficiency of conventional
turbo decoder architectures. Based on the analysis, a low-complexity energy-efficient
turbo decoder architecture was proposed specifically for WSNs. The LUT-Log-BCJR
algorithm is decomposed into its fundamental operations, namely the ACS operations.
A novel ACS unit is designed for performing all the operations during the decoding
process. A datapath architecture is proposed to support a fully parallel arrangement
of any number of CUs, with a low hardware complexity. Finally, the proposed
architecture is demonstrated and validated by implementing the LTE turbo decoder. The
implementation results show that the proposed architecture provides an overall energy
saving compared to both the conventional LUT-Log-BCJR and Max-Log-BCJR decoder
architectures, when the transmission range exceeds 39 m. Moreover, the proposed
architecture is highly flexible and can be readily applied to any turbo code.
Chapter 6
A turbo decoder energy
estimation model allowing overall
energy optimisation
6.1 Introduction
As discussed in Chapter 1, turbo codes can be beneficial in Wireless Sensor Networks (WSNs)
by reducing their transmission energy consumption, since the sensor nodes in WSNs
typically require a relatively long lifetime and have constrained energy resources. The
investigation of Chapter 3 concludes that employing a Serial Concatenated Convolutional
Code (SCCC) in a star topology WSN can significantly reduce the sensor nodes' energy
consumption, and hence increase their lifetime. In this case, the only extra device
required on the sensor nodes is the SCCC encoder, which uses forward error correction
to allow a significant transmission power reduction. As discussed in Chapter 1, this is
applicable in WSNs where information only flows in one direction, from the sensor nodes
to the central node. Since the encoder of an SCCC has a very low complexity, it consumes
only a negligible amount of energy compared with the energy saving associated with the
transmission power reduction provided by the employed SCCC. The trade-off is the
high-complexity SCCC decoders that are required on the central node, which have a relatively
large energy consumption. Since WSNs typically have only one or a few central nodes
having abundant energy resources, the described trade-off is desirable.
However, in more complicated network topologies, multi-hop transmission is typically
employed [180] and the sensor nodes need to receive and retransmit the signals. In
decode-and-forward schemes [132], the high complexity decoders are required on the
sensor nodes and will consume extra energy. As a result, whether or not the employed
turbo code can still provide energy saving for the sensor nodes depends on whether or
not the transmission energy saving can overcome the decoder's energy consumption.
The investigation in Chapter 5 shows that when the communication distance is long
enough between two nodes in the WSN, the transmission energy consumption becomes
dominant compared with the energy consumed by the decoders integrated into the sensor
nodes. Therefore, the situation becomes similar to that of the star network, since
the energy consumption of both the encoder and the decoder is negligible compared with the
transmission energy saving granted by the turbo code. As a result, the overall energy
consumption of the sensor nodes can be reduced and the lifetime of the WSN can be
extended. However, this condition limits the applicable minimum transmission range
when adopting a turbo-like code in multi-hop WSNs.
The energy consumed by the turbo-like decoder is an important factor that affects the
minimum transmission range for which an overall energy reduction can be achieved. As
discussed in Chapter 5, there are two significant parts of energy consumption in wireless
communication systems that relate to the use of channel coding schemes, namely the
transmission energy saving $E_b^{tx}$ and the decoding energy consumption $E_b^{pr}$. Therefore,
the overall energy saving of the sensor nodes can be considered to be $E_b^{tx} - E_b^{pr}$.
Consequently, when designing a turbo code for WSNs, both $E_b^{tx}$ and $E_b^{pr}$ are important
specifications that need to be considered carefully. The transmission energy saving $E_b^{tx}$
can be estimated by using Bit Error Rate (BER) analysis and a relevant path loss model,
as demonstrated in Chapter 5.
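The net-saving criterion above can be written as a one-line check. The following is a minimal sketch; the function name and the numerical values are illustrative assumptions and are not results from this work.

```python
def net_energy_saving(e_tx_saving_nj: float, e_pr_nj: float) -> float:
    """Return E_b^tx - E_b^pr in nJ per bit.

    A positive result means the transmission energy saving outweighs
    the decoding energy, so the code reduces the overall consumption.
    """
    return e_tx_saving_nj - e_pr_nj


# Hypothetical figures, chosen only to illustrate the two regimes.
long_hop = net_energy_saving(e_tx_saving_nj=12.0, e_pr_nj=4.5)
short_hop = net_energy_saving(e_tx_saving_nj=2.0, e_pr_nj=4.5)
print(long_hop > 0, short_hop > 0)
```

With these assumed numbers the long hop benefits from the code while the short hop does not, which mirrors the minimum transmission range argument made above.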
However, it is difficult to estimate the decoding energy consumption $E_b^{pr}$ during the code
design stage. Therefore, conventionally, it is not considered in the design process [91].
Instead, the turbo code is designed using simulation (BER, EXtrinsic Information Transfer
(EXIT) simulation) to meet a particular performance requirement, such as a particular
coding gain with a particular computation complexity. Then the hardware implementation
is designed to minimise the energy consumption under the constraint of achieving
the required decoding throughput, cost, area, etc. However, this design procedure is
not suitable for turbo codes designed to be applied in WSNs. By only considering the
performance in the code design stage, the energy consumption of the coding system is
not holistically optimised. The design result may achieve a great reduction in transmission
energy consumption, $E_b^{tx}$, but also impose a high decoding energy consumption,
$E_b^{pr}$, which cancels out the energy saving overall. If the decoding energy consumption,
$E_b^{pr}$, can be considered from the start of the design process, the disadvantage of the
conventional design method can be prevented. Therefore, a methodology providing the
capability to estimate the decoding energy consumption from the start is required.
Energy estimation is a key issue for Application-Specific Integrated Circuit (ASIC) design
procedures. Methodologies and tools can be developed to perform energy estimation
at various stages of the design procedure. A typical ASIC design procedure is shown
in Figure 6.1. Typically, the later the design stage in which the energy estimation is
performed, the higher the accuracy that can be gained, because more information about
the hardware implementation is available [181]. Therefore, conventionally, the energy
[Figure 6.1 placeholder. Design stages, in order: Algorithm Level, Architecture Level, RTL Level, Gate Level (post-synthesis), Post-layout Level, Fabrication. Labelled outcomes: turbo code parametrisation; parallelism, memory usage, etc.; RTL description; gate level netlist; physical layout implementation; chip.]
Figure 6.1: The typical ASIC design procedure and the outcome of each stage.
estimation at the post-synthesis level or post-layout level is widely used in practice due
to its accuracy and generality [182]. However, there is less opportunity to rethink the
design at these design stages. Previously, [183,184] proposed energy estimation methods
based on the computation complexity of a design, such as the number and the type of
arithmetic/Boolean operations in the behavioural description and the number of states
and/or transitions in a controller description. These complexity based models rely on
the assumption that the complexity of a circuit can be estimated using the number of
equivalent gates, which limits the accuracy of the estimation. This is because even with
the same number of equivalent gates, different compositions and behaviours of the gates
have a significant effect on the energy consumption [185,186]. In addition, these methods
require additional analysis of the target design, in order to convert the information from
the algorithm level to hardware complexity. However, this analysis is not required in
the conventional design procedure and so this approach is inconvenient [187]. In this
chapter, the architecture proposed in Chapter 5 is used to derive a bottom-up energy
estimation framework. Using a detailed investigation of the energy consumed by the
sub-modules in the proposed architecture following post-layout simulations, energy models
are produced based on the measurements. These energy models of the sub-modules can
be used during the early design stage, in order to estimate the energy consumption of a
turbo decoder without requiring knowledge from later design stages. The motivation for
using a bottom-up energy estimation framework based on the proposed architecture is
that its high configurability allows the characteristics of the hardware implementation
to be predicted at an early design stage. The objective of this framework is to provide
energy consumption information for the turbo code designer, in order to help determine
the parameters of the codes. In particular, the framework allows the designer to compare
the energy consumption of different turbo code designs.
The remaining sections of this chapter are organised as follows. Section 6.2 presents
a generalisation of the architecture proposed in Chapter 5. In particular, a redesigned
controller is proposed in order to allow the architecture to be readily reconfigured for
different sets of turbo code parameters. In Section 6.3, the above-mentioned energy
estimation framework is proposed. In Section 6.4, a holistic design procedure is
demonstrated, which uses the proposed estimation framework and a path loss model to optimise
the selection of turbo code parameters for the scheme of [106]. In [106], different turbo
code parameterisations are investigated using the conventional BER evaluation method.
The demonstration of Section 6.4 shows that the proposed holistic design procedure
gives an overall optimisation for the turbo code design, which the conventional design
procedure cannot provide. The conclusions are provided in Section 6.5.
6.2 Generalised architecture
As described in Section 5.4, the proposed architecture can be reconfigured to implement
any Look-Up Table based Logarithmic Bahl-Cocke-Jelinek-Raviv (LUT-Log-BCJR) decoder.
This is because the proposed architecture is based on five sub-modules, where
different decoder designs employ different combinations of the sub-modules. The top-level
of the architecture is shown in Figure 6.2 (footnote 1). There are five sub-modules in the
top-level, namely the Calculation Units (CUs), two register banks (Regbank1 and Regbank2),
the Logarithmic Likelihood Ratio (LLR) memories and the metric memory. The
[Figure 6.2 placeholder. Block labels: Main memories, LLR memories, Metric memory, Regbank1, Regbank2, three MUX blocks and three CU blocks, each CU containing R1, R2, R3, Interconnection and ACS.]
Figure 6.2: The configuration of the proposed LUT-Log-BCJR decoder architecture.
generalisation of the architecture proposed in this section requires no change to the
architecture shown in Figure 6.2. However, the generalisation includes a redesigned usage
of Regbank2 and a redesigned controller. Furthermore, for different LUT-Log-BCJR decoders,
the reconfiguration of the proposed architecture may require adjusting the number of
Add-Compare-Select (ACS) units employed in parallel and the number of registers in Regbank1
and Regbank2, as well as redesigning the controller to correctly schedule the operations
that take place in the ACS units and the register banks.
1 This figure is also provided in Section 5.4, Figure 5.6.
[Figure 6.3 placeholder. Block labels: Upper encoder with m memory elements, Lower encoder with m memory elements, Channel, Upper LUT-Log-BCJR decoder, Lower LUT-Log-BCJR decoder, a priori LLR memories 1 to k and k + 1 to k + n for each decoder, extrinsic LLR memories 1 to k, interleavers $\pi$ and deinterleaver $\pi^{-1}$. Signal labels: $b_1, \ldots, b_k$, $b_{k+1}, \ldots, b_{k+n}$, $\tilde{b}^a_1, \ldots, \tilde{b}^a_k$, $\tilde{b}^a_{k+1}, \ldots, \tilde{b}^a_{k+n}$, $\tilde{b}^e_1, \ldots, \tilde{b}^e_k$ and $\tilde{b}^p_1, \ldots, \tilde{b}^p_k$.]
Figure 6.3: The configuration of a typical turbo code scheme.
The configuration of the generalised architecture depends on the parametrisation of the
turbo code design. As shown in Figure 6.3, the parameters of a turbo code include
the number of input sequences k provided to each component encoder, the number
of memory elements m employed in each component encoder and the number of
non-systematic output sequences n provided by each component encoder. Note that
in Figure 6.3 each of the k input sequences corresponds to one of the k systematic
output sequences. Moreover, since the sliding-window technique [169] is recommended
as described in Section 5.2, the sliding-window length $w_s$ and the pre-backward recursion
length $w_p$ are parameters of the specification. In the following sub-sections, each
sub-module of Figure 6.2 is discussed individually. In Section 6.2.4, a controller design that is
more appropriate for general cases is proposed. The study shows that the configuration
of the turbo decoder implementation can be determined in the earliest turbo code design
stage.
6.2.1 Register banks
There are two register banks, Regbank1 and Regbank2, in the architecture, as shown
in Figure 6.2. Regbank1 is composed of both registers and dummy registers. The
dummy registers have hard-wired constant values, which correspond to the Look-Up
Table (LUT) elements. They are considered as registers at the architecture level, in
order to give a concise data organisation and facilitate a simpler controller design for the
decoder. Regbank1 provides four types of input for the CUs, namely the a priori LLRs,
the index values of the LUT, the output values of the LUT and a constant zero value.
The formation of Regbank1 is given in Figure 6.4.
[Figure 6.4 placeholder. Field labels: LLRs ($\tilde{b}^a_{1,j}, \tilde{b}^a_{2,j}, \ldots, \tilde{b}^a_{k+n,j}$), LUT-input, LUT-output, Zero. Constant values shown: 0, 0.75, 2, 0, 0.25, 0.5, 0.75, 0.]
Figure 6.4: The components of Regbank1.
In the figure, $\{\tilde{b}^a_{1,j}, \tilde{b}^a_{2,j}, \ldots, \tilde{b}^a_{k+n,j}\}$
are the a priori LLRs of the current decoding step j of each of the a priori LLR sequences
$\{\tilde{b}^a_1, \tilde{b}^a_2, \ldots, \tilde{b}^a_{k+n}\}$, as shown in Figure 6.3, where
$\tilde{b}^a_i = \{\tilde{b}^a_{i,j}\}_{j=1}^{N}$. Here, N is the block
length, namely, the number of LLRs in each sequence. As shown in Figure 6.4, only the
k + n LLRs require actual registers in the hardware design. The others are
all dummy registers. Note that some of the dummy registers store the same values, but
are kept separate in order to facilitate a concise controller design.
In Chapter 5, the turbo decoder was designed for a special case, where each component
encoder has k = 1 input and n = 1 non-systematic output sequences. Depending on
which part of the LUT-Log-BCJR algorithm is being processed, Regbank2 is used to
store different internal metrics. During the forward recursion, Regbank2 stores the set
of $\alpha(S)$ values (Equation 5.5) that are calculated for a particular step j in the trellis.
Similarly, a set of $\beta(S)$ values (Equation 5.7) is stored during the pre-backward and
backward recursions. On other occasions, Regbank2 stores some internal calculation
results, as discussed in Section 5.4.2. The proposed decoder architecture requires Regbank2
to store a set of metrics corresponding to one step j in the decoding trellis at a
time. Moreover, in this chapter, the proposed architecture is generalised for any
LUT-Log-BCJR decoder. Because Regbank2 is also used for storing the internal results for
the $\delta_i(T)$ calculations (Equation 5.8), these internal results define the register
requirement in Regbank2 in the general case where $k \geq 1$ or $n \geq 1$. The number of registers
required in Regbank2 is given by $2^m(2^k - 1)$, as discussed in Section 6.2.4.
6.2.2 Calculation unit
The CUs perform all the ACS operations of the Logarithmic Bahl-Cocke-Jelinek-Raviv
(Log-BCJR) decoding process. Each CU consists of an ACS unit, three general purpose
registers, the interconnections between them and the Multiplexor (MUX) unit for
providing the input values from the register banks. The connections between the general
purpose registers and the ACS unit are given in Figure 6.5 (footnote 2). The function of the ACS
[Figure 6.5 placeholder. Labels: inputs LLRs, LUT, metrics, data in, data out and Sel LUT; registers R1, R2 and R3; ACS block with Op1, Op2, OpCode, C0, C, {C1, C2} and res; control signals EnR1, EnR2, EnR3, EnMax, LoadR1, LoadR2, LoadR3, EnExt1 to EnExt5 and Sop; data bus 1 and data bus 2.]
Figure 6.5: CU structure.
unit is to perform the fundamental ACS operations, including addition/subtraction,
max* and selection, as described in Section 5.4.3. As discussed in Section 5.4.4, the
optimal number of CUs to employ in the decoder is $2^m$, where m is the number of
memory elements in the component encoder. In the general case, the only modification
required is to place the correct number of CUs in parallel. There is no need to reconfigure
the internal structure of the CUs.
As shown in Figure 6.2, the MUXs are used to connect the register banks and the CUs.
They select the input for the CUs from the values stored in the register banks. The
required number of MUXs is equal to $2^m$, since each of the $2^m$ CUs requires a MUX
to select its input individually. For each MUX, two individual multiplexors are used
respectively for Regbank1 and Regbank2. The number of inputs to each multiplexor
is determined by the number of registers in the corresponding register bank. For
Regbank1, the dummy registers need to be considered as well. As a result, the number of
inputs to the MUX for Regbank1 can be derived from Figure 6.4, namely, k + n + 8. For
Regbank2, a $2^m(2^k - 1)$-input multiplexor is required, as will be explained in Section 6.2.4.
2 This figure is also provided in Section 5.4.3, Figure 5.8.
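The dependence of the datapath on the code parameters can be summarised in a few lines of code. The following is a minimal sketch that only restates the counts given in Sections 6.2.1 and 6.2.2; the class and function names are hypothetical, and m = 3 in the usage example is an assumed value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatapathConfig:
    num_cus: int               # 2^m Calculation Units in parallel
    num_muxes: int             # one MUX unit per CU
    regbank1_mux_inputs: int   # k + n LLR registers plus 8 dummy registers
    regbank2_registers: int    # 2^m * (2^k - 1), also the Regbank2 multiplexor width


def datapath_config(k: int, m: int, n: int) -> DatapathConfig:
    """Derive the datapath resource counts from the turbo code parameters."""
    if k < 1 or m < 1 or n < 0:
        raise ValueError("expected k >= 1, m >= 1 and n >= 0")
    num_cus = 2 ** m
    return DatapathConfig(
        num_cus=num_cus,
        num_muxes=num_cus,
        regbank1_mux_inputs=k + n + 8,
        regbank2_registers=num_cus * (2 ** k - 1),
    )


# Special case k = 1, n = 1 with an assumed m = 3.
print(datapath_config(k=1, m=3, n=1))
```

For k = 1 the Regbank2 register count reduces to $2^m$, which agrees with the special case discussed in Section 6.2.4.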
6.2.3 Memories
As shown in Figure 6.2, the memories required by the proposed architecture may be
divided into two types, namely, the LLR memory blocks and the metric memory block.
Each LLR memory block stores one a priori or extrinsic LLR sequence. For a particular
LUT-Log-BCJR decoder, the required number of extrinsic LLR memory blocks is k. The
required number of a priori LLR memory blocks depends on whether the implemented
turbo code is systematic or not. For a systematic code, k + n a priori LLR memory
blocks are required. By contrast, only n a priori LLR memory blocks are required for a
non-systematic code, since the systematic bits are not transferred to the decoder. The
use of the LLR memories in a turbo decoder is shown in Figure 6.3. All of the LLR
memory blocks have the same size and the same word length, which is determined by
the block length N of the turbo code design and the word length specication of the
input LLRs. Here, the word length specification depends on the fixed-point setting of
the LLRs, as investigated in Chapter 4.
For the metric memory, only one block is used for the proposed architecture. Between
the operation of the forward and backward recursions, the metric memory stores $2^m$
$\alpha(S)$ values from Equation 5.5, for each of the $w_s$ steps in the current window
of the trellis. Since the $\alpha(S)$ values are always accessed in groups of $2^m$ at the same
time, each group is stored in one word of the memory, in order to simplify the design.
Therefore, the word length of the metric memory is required to be long enough to store
$2^m$ $\alpha(S)$ values at once.
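The memory requirements described in this section can be collected into a small helper. This is a sketch under stated assumptions: the function and key names are hypothetical, each LLR block is assumed to hold exactly N words, and the word lengths `llr_bits` and `metric_bits` are free parameters whose values in the usage example are illustrative only.

```python
def memory_config(k, m, n, block_length, ws, llr_bits, metric_bits, systematic=True):
    """Return block counts and sizes (in bits) for the LLR and metric memories."""
    # A systematic code needs k + n a priori blocks, a non-systematic code only n.
    a_priori_blocks = k + n if systematic else n
    # One metric memory word holds all 2^m alpha(S) values of a trellis step.
    metric_word_bits = (2 ** m) * metric_bits
    return {
        "extrinsic_llr_blocks": k,
        "a_priori_llr_blocks": a_priori_blocks,
        "llr_block_bits": block_length * llr_bits,
        "metric_word_bits": metric_word_bits,
        # One word per step of the current window of ws steps.
        "metric_memory_bits": ws * metric_word_bits,
    }


# Illustrative values only.
print(memory_config(k=1, m=3, n=1, block_length=1024, ws=32, llr_bits=4, metric_bits=7))
```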
6.2.4 Controller design and decoding time consumption
In this section, the redesigned controller for the generalised LUT-Log-BCJR architec-
ture is proposed. The decoding time consumption is discussed along with the controller
design, since this information is required for the energy estimation framework in Sec-
tion 6.3.
The controller controls the operations performed during the decoding process. As
discussed in Section 5.4.4, it can be described by a schedule chart like Figure 5.9. The
redesign that is proposed in this section adjusts the calculation of the $\delta$ values and the use
of Regbank2. The purpose of the LUT-Log-BCJR decoder is to calculate the extrinsic
LLRs in the sequence $\tilde{b}^e_{i,j}$, where $i \in [1, 2, \ldots, k+n]$ and $j \in [1, 2, \ldots, N]$. In order to
determine the time required to perform the decoding, the adjusted algorithm and the
number of ACS operations performed during the calculation of the $\gamma_i(T)$, $\alpha(S)$, $\beta(S)$,
$\delta_i(T)$ and $\tilde{b}^e_{i,j}$ values are summarised as follows.
1. $\gamma_i(T)$ values
$$\gamma_i(T) = \begin{cases} 0 & \text{if } B_i(T) = 1 \\ \tilde{b}^a_{i,J(T)} & \text{if } B_i(T) = 0 \end{cases} \qquad (6.1)$$
$\gamma_i(T)$ values are the fundamental components of the $\alpha(S)$ and $\beta(S)$ calculations, as
shown in Equations 6.1, 6.3 and 6.5. There is no calculation required to obtain the
$\gamma_i(T)$ values, since these adopt the value of either $\tilde{b}^a_{i,j}$ or 0, depending on the value
of $b_{i,j}(T)$ in the decoding trellis, as shown in Equation 6.1. Owing to this, half of
the $\gamma_i$ values have a value of zero and do not need to be considered by subsequent
addition operations. Note that there are k + n $\gamma_i(T)$ values for each
transition T, namely one corresponding to each LLR sequence $i \in [1, k+n]$.
2. $\alpha(S)$ values
$$\epsilon(T) = \alpha\big(fr(T)\big) + \sum_{i=1}^{k+n} \gamma_i(T) \qquad (6.2)$$
$$\alpha(S) = \max^{*}_{T \in to(S)} \epsilon(T) \qquad (6.3)$$
Each $\alpha(S)$ value corresponds to a state S in the decoding trellis. In the general
case, the calculation of an $\alpha(S)$ value is given by Equation 6.3. Since $2^k$ transitions
merge into each state in the trellis, a total of $2^k - 1$ max* operations are required to
calculate each $\alpha(S)$ value. The additions required by the $\alpha(S)$ calculations are those
of Equations 6.1 and 6.2. However, since many $\gamma_i(T)$ values have a value of zero,
the number of additions required by each $\alpha(S)$ value is different. Considering that
half of the $\gamma_i(T)$ values are equal to zero, a total of $2^{k-1}(k+n)$ additions are required
to obtain all of the $\alpha(S)$ calculations in each step j of the trellis.
3. $\beta(S)$ values
$$\eta(T) = \beta\big(to(T)\big) + \sum_{i=1}^{k+n} \gamma_i(T) \qquad (6.4)$$
$$\beta(S) = \max^{*}_{T \in fr(S)} \eta(T) \qquad (6.5)$$
The calculation of a $\beta(S)$ value is similar to that of an $\alpha(S)$ value and the required number
of ACS operations is the same.
4. $\delta_i(T)$ values
$$\delta_i(T) = \alpha\big(fr(T)\big) + \eta(T) - \gamma_i(T) \qquad (6.6)$$
For each particular input sequence $i \in [1, k+n]$, there is one $\delta_i(T)$ value
corresponding to each transition T in the trellis. A total of $2^{k+m}$ $\delta_i(T)$ values are
required to calculate each extrinsic LLR. The generalised calculation of a $\delta_i(T)$
value is given in Equation 6.6, which includes two addition/subtraction operations.
Therefore, a total of $2^{k+m+1}$ addition/subtraction operations are required
to perform all the $\delta_i(T)$ calculations in any particular step j of the trellis.
Note that the original $\delta$ calculation, from Equation 5.8, is given in Equation 6.7.
$$\delta_i(T) = \alpha\big(fr(T)\big) + \beta\big(to(T)\big) + \sum_{i'=1,\, i' \neq i}^{k+n} \gamma_{i'}(T) \qquad (6.7)$$
According to Equation 6.7, when k + n becomes a large number, the number of
additions required by the $\delta_i(T)$ calculations increases significantly. To reduce the
number of additions required by the $\delta_i(T)$ calculations, Equation 6.6 is proposed
for the redesigned controller. This introduces the use of the internal variables
$\eta(T)$. As discussed in Chapter 2, the $\beta(S)$ calculation used in each step j of the
decoding trellis includes the calculation of a variable $\eta(T)$ corresponding to each
transition T in the trellis. Each $\delta_i(T)$ value also corresponds to a particular
transition in the trellis. Therefore, using the $\eta(T)$ values, the calculation of the $\delta_i(T)$
values can be simplified according to Equation 6.6. When k > 1 or n > 1, the
number of additions required by Equation 6.6 is significantly reduced compared
with Equation 6.7. As mentioned in Section 6.2.1, the use of Equation 6.6 to
calculate the $\delta_i(T)$ values when k > 1 causes the number of registers required in
Regbank2 to increase. During the backward recursion, all the $\eta(T)$ values calculated
for the current step j in the trellis must be stored for the $\delta_i(T)$ calculations
later. There are $2^{m+k}$ transitions in each step j of the trellis, so $2^{m+k}$ $\eta(T)$ values
are required to be stored at any one time. The controller design proposed in
this section allows each CU to store one $\eta(T)$ value in one of its general purpose
registers. Since there are $2^m$ CUs in the datapath, $2^m$ $\eta(T)$ values can be stored
in the general purpose registers. The remaining $\eta(T)$ values must be stored in
Regbank2. Therefore, $2^m(2^k - 1)$ registers are required in Regbank2. Note that
when k = 1, only $2^m(2^k - 1) = 2^m$ registers are required in Regbank2, which is
the same number that was used in the implementation of Chapter 5. Therefore,
the datapath implementation of k = 1, n = 1 in Chapter 5 is a special case of the
proposed generalised architecture.
5. Extrinsic LLRs
$$\tilde{b}^e_{i,j} = \max^{*}_{T \,\mid\, B_i(T)=0,\ J(T)=j} \delta_i(T) \;-\; \max^{*}_{T \,\mid\, B_i(T)=1,\ J(T)=j} \delta_i(T) \qquad (6.8)$$
The extrinsic LLRs are the final output of the decoder. The calculation of the
extrinsic LLR is given in Equation 6.8. For each extrinsic LLR, $2^{k+m}$ $\delta_i(T)$ values
are divided into two groups, in order to calculate the difference of two max*
results. A total of $2(2^{k+m-1} - 1)$ max* operations and one subtraction operation
are required. In each step j of the trellis, there are k extrinsic LLRs that need to
be calculated.
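The claim that the $\eta(T)$-based form of Equation 6.6 yields the same $\delta_i(T)$ value as the original form of Equation 6.7 can be checked numerically. The following is a minimal sketch; the function names are hypothetical, the random integers merely stand in for fixed-point metrics, and sequence indices are zero-based in the code.

```python
import random


def eta_value(beta_to, gammas):
    """Equation 6.4: eta(T) = beta(to(T)) + sum of all gamma_i(T)."""
    return beta_to + sum(gammas)


def delta_original(alpha_fr, beta_to, gammas, i):
    """Equation 6.7: add every gamma value except the i-th one."""
    others = sum(g for idx, g in enumerate(gammas) if idx != i)
    return alpha_fr + beta_to + others


def delta_via_eta(alpha_fr, eta, gammas, i):
    """Equation 6.6: reuse the stored eta(T) and subtract gamma_i(T)."""
    return alpha_fr + eta - gammas[i]


def check_equivalence(k, n, trials=1000, seed=0):
    """Compare both forms on random integer metrics for one transition."""
    rng = random.Random(seed)
    for _ in range(trials):
        gammas = [rng.randint(-8, 7) for _ in range(k + n)]
        alpha_fr = rng.randint(-64, 63)
        beta_to = rng.randint(-64, 63)
        eta = eta_value(beta_to, gammas)
        for i in range(k + n):
            if delta_original(alpha_fr, beta_to, gammas, i) != delta_via_eta(alpha_fr, eta, gammas, i):
                return False
    return True


print(check_equivalence(k=2, n=3))  # prints True
```

Equation 6.6 always uses two addition/subtraction operations per $\delta_i(T)$ value, independently of k + n, at the cost of keeping the $\eta(T)$ values available.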
As discussed, the decoding process is divided into three recursion processes, namely, the
forward recursion, the pre-backward recursion and the backward recursion. The time
consumption of the redesigned controller in each of the three recursions is summarised
as follows.
1. Forward recursion:
The generalised controller design of the forward recursion can be described as
shown in Figure 6.6 (a). Regbank2 is divided into groups for the controller, where
[Figure 6.6 placeholder. Panel (a), Forward Recursion: Loading LLRs, Calculating $\epsilon(T)$, Calculating $\alpha(S)$. Panel (b), Prebackward Recursion: Loading LLRs, Calculating $\eta(T)$, Calculating $\beta(S)$. Rows in each panel: A Priori LLRs Memories, Extrinsic LLRs Memories, Metrics Memories, Regbank1, Regbank2-Group1, Regbank2-Group2, ..., Regbank2-Group($2^k - 1$), Calculation Units (CU). Legend: addition/subtraction, max*, register update, memory access. Note (*): register update and memory access only at the first loop of each window for initial $\alpha$ values.]
Figure 6.6: The decoding schedules of the forward and prebackward recursions.
each of the $2^k - 1$ groups contains $2^m$ registers. In this way the registers are always
updated or read one group at a time, as shown in Figure 6.6 (a). Each step j of
the forward recursion begins with a clock cycle in which the corresponding a priori
LLRs $\{\tilde{b}^a_{1,j}, \tilde{b}^a_{2,j}, \ldots, \tilde{b}^a_{k+n,j}\}$ are loaded into Regbank1. Next, the $\epsilon(T)$ values are
calculated. Similarly to the $\eta(T)$ calculation, $2^m$ $\epsilon(T)$ values can be stored in the
general purpose registers of the CUs. The remaining $2^m(2^k - 1)$ values are stored
in Regbank2. Since there are $2^m$ CUs in the architecture, the decoder calculates
$2^m$ operands at a time. The results are then stored into one of the subgroups
in Regbank2, as shown in Figure 6.6 (a). Following this, the max* operations
can be performed to calculate the $\alpha(S)$ values, since all the required $\epsilon(T)$ values
are available in the registers. The resultant $\alpha(S)$ values are stored in the metric
memory block of Figure 6.2. Note that for the first step in each window, the set of
$\alpha(S)$ values calculated for the previous step needs to be restored from the metric
memory, in order to initialise the forward recursion.
As shown in Figure 6.6, each add/sub operation requires one clock cycle and each
max* operation requires four clock cycles. For each step j in the forward recursion,
a total of $2^{k-1}(k+n) + 4(2^k - 1)$ clock cycles are required. In addition, one
clock cycle is required to load the corresponding a priori LLRs from the memory.
Therefore, the total number of clock cycles required for each step of the forward
recursion is given by
$$T_{forward} = 2^{k-1}(k+n) + 2^{k+2} - 3. \qquad (6.9)$$
2. Pre-backward recursion:
The pre-backward recursion is shown in Figure 6.6 (b). It is the same as the
forward recursion, except that there is no need to store the $\beta(S)$ values, since only
the results from the last step in the window are required in order to initialise the
backward recursion. Furthermore, there is no need to load initial values for each
pre-backward window, since the initial values of $\beta(S)$ are all zero.
The number of clock cycles required for each step j of the pre-backward recursion
is equal to that of the forward recursion, as shown in Figure 6.6, namely
$T_{prebackward} = T_{forward}$.
3. Backward recursion:
The backward recursion is shown in Figure 6.7. It starts with the $\beta(S)$ calculation,
as in the pre-backward recursion. Following the $\beta(S)$ calculation is a loop for
the extrinsic LLR calculation, including the $\delta_i(T)$ calculation and the extrinsic LLR
calculation, as shown in Figure 6.7. There are k extrinsic LLRs
$\{\tilde{b}^e_{1,j}, \tilde{b}^e_{2,j}, \ldots, \tilde{b}^e_{k,j}\}$
which need to be calculated, so the loop repeats k times.
As discussed in Chapter 5, the use of the CUs for the max* operations in the
extrinsic LLR calculations is different from all the other calculations in the decoding
process. Each set of max* operations is distributed among the CUs using a binary
tree structure, in order to perform Equation 6.8, as shown in Figure 6.8. Figure 6.8
provides a detailed schedule of the CUs for the last stage in Figure 6.7, including
the max* operations and the final subtraction of the extrinsic LLRs, according to
Equation 2.18. In the general case, there are $2^{m+k}$ $\delta_i(T)$ values per trellis step j. The
process to calculate the extrinsic LLR using the $\delta_i(T)$ values can be divided into
three stages, as shown in Figure 6.8. In the first stage, all the CUs perform max*
[Figure 6.7 placeholder. Backward Recursion schedule: Loading LLRs, Calculating $\eta(T)$, Calculating $\beta(S)$, then a Loop containing Loading $\alpha(S)$, Calculating $\delta_i(T)$, max* operations and Calculating extrinsic LLR. Rows: A Priori LLRs Memories, Extrinsic LLRs Memories, Metrics Memories, Regbank1, Regbank2-Group1, Regbank2-Group2, ..., Regbank2-Group($2^k - 1$), Calculation Units (CU). Legend: addition/subtraction, max*, register update, memory access.]
Figure 6.7: The decoding schedule of the backward recursion.
[Figure 6.8 placeholder. Extrinsic LLR calculation schedule with a Loop, divided into first stage, second stage and third stage. Row labels: Calculation Units (CU) and Group2-1 to Group2-N. Legend: addition/subtraction, max*, register update, memory access.]
Figure 6.8: The decoding schedule of the extrinsic LLR calculations.
operations in a loop until there are not enough operands for all of the CUs to work
together. The second stage starts with only half of the CUs performing the max*
operations. The number of working CUs reduces by half after each max* cycle,
until there are only two CUs working at the same time. Then, in the final stage,
the first CU performs a subtraction of the results from the two CUs in the previous
max* cycle, in order to obtain the extrinsic LLR.
In the backward recursion, the $\beta(S)$ calculation requires the same time consumption
as in the forward and pre-backward recursions. For the $\delta_i(T)$ calculations, one clock
cycle is required for loading the required $\alpha(S)$ values from the metric memory.
A further $2^{k+1}$ clock cycles are required to calculate the $\delta_i(T)$ values. A total of
$2^{k+1} + 1$ clock cycles are required. In the first stage of the extrinsic LLR calculations,
the number of max* operands reduces from $2^{m+k}$ to $2^{m+1}$, requiring k max*
cycles. In the second stage, $\log_2 2^{m-1} = m - 1$ max* cycles are required to reduce
the number of max* operands from $2^m$ to 4. In the third stage, one clock cycle is
used to perform the subtraction of the two results from the second stage. There
are k extrinsic LLRs that need to be calculated for one step j in the trellis, which
requires the loop shown in Figure 6.8 to be repeated k times. The time consumption
to calculate all the $\delta_i(T)$ values and the extrinsic LLRs for one step j in the
trellis is therefore $k(2^{k+1} + 4(k+m-1) + 1) = k(2^{k+1} + 4k + 4m - 3)$ clock cycles.
The total time consumption for one loop of the backward recursion is given by
$$T_{backward} = 2^{k-1}(k+n) + 2^{k+2} + k(2^{k+1} + 4k + 4m - 3) - 2. \qquad (6.10)$$
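The binary tree use of the max* operation in Equation 6.8 can be illustrated in software. The sketch below is an assumption-laden illustration: it uses the exact max* (the Jacobian logarithm) rather than the LUT-based approximation of the proposed decoder, it performs a pure pairwise tree reduction rather than the three-stage CU schedule of Figure 6.8, and all function names are hypothetical.

```python
import math


def max_star(a, b):
    """Exact max*: ln(e^a + e^b) = max(a, b) + ln(1 + e^(-|a - b|))."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def tree_max_star(values):
    """Reduce a power-of-two number of operands pairwise, level by level."""
    level = list(values)
    if not level or len(level) & (len(level) - 1):
        raise ValueError("the number of operands must be a power of two")
    while len(level) > 1:
        # Each pair corresponds to one max* operation in the tree.
        level = [max_star(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def extrinsic_llr(deltas_bit0, deltas_bit1):
    """Equation 6.8: max* over transitions with B_i(T) = 0 minus max* over B_i(T) = 1."""
    return tree_max_star(deltas_bit0) - tree_max_star(deltas_bit1)


# Illustrative delta values for a trellis step with 2^(k+m) = 8 transitions.
print(extrinsic_llr([0.5, 1.0, -2.0, 3.0], [0.0, -1.0, 2.5, 1.5]))
```

Each group of $2^{k+m-1}$ operands needs $2^{k+m-1} - 1$ pairwise max* operations in such a tree, which matches the total of $2(2^{k+m-1} - 1)$ stated for Equation 6.8.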
The generalised controller design for the architecture proposed in Chapter 5 can be
adapted to any LUT-Log-BCJR decoder. For the special case of k = 1, n = 1, the
controller is the same as the design of Chapter 5, except for the proposed alternative $\delta_i(T)$
calculation.
In order to obtain the throughput of the LUT-Log-BCJR decoder, it is necessary to
consider the sliding window length $w_s$ and the pre-backward recursion length
$w_p$. In each backward recursion, k extrinsic LLRs are calculated. Therefore,
the average number of clock cycles for calculating each extrinsic LLR can be calculated
as
$$T_e = \frac{w_s (T_{forward} + T_{backward}) + w_p T_{prebackward}}{w_s \cdot k}. \qquad (6.11)$$
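The timing expressions of this section can be evaluated directly for candidate parameter sets. The following is a minimal sketch with hypothetical function names; it assumes that the first term of Equation 6.10 is $2^{k-1}(k+n)$, matching the $\beta(S)$ cost of Equation 6.9, and the parameter values in the usage example are illustrative only.

```python
def t_forward(k, n):
    """Equation 6.9: clock cycles per trellis step of the forward recursion."""
    return 2 ** (k - 1) * (k + n) + 2 ** (k + 2) - 3


def t_prebackward(k, n):
    """The pre-backward recursion takes the same time per step as the forward one."""
    return t_forward(k, n)


def t_backward(k, m, n):
    """Equation 6.10, with the first term assumed to be 2^(k-1) * (k + n)."""
    return (2 ** (k - 1) * (k + n) + 2 ** (k + 2)
            + k * (2 ** (k + 1) + 4 * k + 4 * m - 3) - 2)


def cycles_per_extrinsic_llr(k, m, n, ws, wp):
    """Equation 6.11: average clock cycles per extrinsic LLR."""
    total = ws * (t_forward(k, n) + t_backward(k, m, n)) + wp * t_prebackward(k, n)
    return total / (ws * k)


# Illustrative parameters: k = 1, n = 1, m = 3, ws = wp = 32.
print(t_forward(1, 1), t_backward(1, 3, 1), cycles_per_extrinsic_llr(1, 3, 1, 32, 32))
```

For these assumed parameters the sketch gives 7, 25 and 39.0 clock cycles respectively, so the backward recursion dominates the time spent per extrinsic LLR.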
In summary, the configuration and the hardware structure of the proposed architecture
can be predicted at the early design stage. This conclusion is the foundation of the
proposed energy estimation framework of this chapter. In the next section, the proposed
energy estimation framework is presented based on the discussion in this section.
6.3 Energy estimation framework
In this section, the proposed architecture is employed as the basis of a framework for
estimating the energy consumption of a LUT-Log-BCJR decoder, as well as a turbo
decoder. As mentioned in Section 6.1, the objective of this framework is to provide
quantified energy consumption information of the LUT-Log-BCJR decoder at the turbo
code design stage, in order to assist the designer in making decisions about the
parameters of the code.
In order to obtain quantified energy estimation results, some assumptions from the later
implementation stages are required. The Integrated Circuit (IC) fabrication process
technology [160], the supply voltage and the clock frequency of the implemented circuit
may all have significant impacts on the energy consumption. Therefore, Taiwan
Semiconductor Manufacturing Company (TSMC) 90nm technology is chosen in this work, to
provide typical energy consumption information for the hardware implementation. The
proposed framework can be extended to different process technologies by employing the
triple scaling factor of Table 5.4, which is a widely used method to approximately
estimate the energy consumption of a particular architecture, when using a different process
technology. The input variables of the framework are summarised in Table 6.1. When
Variable        Description
k               the number of inputs of each component encoder
m               the number of memory elements of each component encoder
n               the number of non-systematic outputs of each component encoder
$w_s$           the sliding-window length
$w_p$           the pre-backward recursion length
N               the block length
I               the number of decoding iterations performed
v               the supply voltage
f               the clock frequency
x, y and z      the word length specifications of the decoder, as described in Chapter 4
Table 6.1: Summary of the variables in the energy estimation framework.
using the framework to estimate the LUT-Log-BCJR decoder's energy consumption, the
designer is able to change the values of the variables to investigate their impact on the
energy consumption. Note that the choices of the word length specifications, x, y and
z, are fully investigated in Chapter 4. Therefore, they are not discussed in detail in this
chapter. Instead, the word length specifications identified in Chapter 5, x = 4, y = 7
and z = 2, are assumed during the formation of the framework. However, the option
of estimating the energy consumption for a different x, y and z specification setting is
introduced at the end of the framework.
The proposed architecture of Figure 6.2 is considered as the combination of CUs, Reg-
bank1 and Regbank2. Here the CUs contain all of the combinational logic structures.
Regbank1 and Regbank2 are left with pure register arrays. Since synthesis results are
highly dependent on the data path lengths in the design, the described sub-module
division guarantees that all the data paths are contained in one sub-module, the CU.
Therefore, the described approach minimises the impact of variations imposed by the
synthesis process. The energy consumption of the architecture's datapath can be calcu-
lated as the sum of each sub-module's energy consumption, as detailed in the following
sections.
6.3.1 Calculation unit
In this section, the typical energy consumption of the CU sub-module is modelled as a
function of the parameters k, m, n, ws, wp, v and f. By taking these parameters as
its input, the proposed energy model is able to calculate the typical energy consumption
of the CU in nJ/Clock Cycle, $E_{cyc}^{CU}$.
The CU sub-module includes an ACS unit, three general purpose registers, an interconnection
unit and a MUX unit. The operating clock frequency and the supply voltage
have a direct impact on the energy consumption. As discussed in Section 6.2.2, the
parameters k, m and n have an impact on the MUX unit, which affects the hardware
complexity. The parameters k, m, n, ws and wp determine the decoding throughput,
which is also directly related to the energy consumption. To model the energy consumption
of the CU, three types of energy consumption are investigated, namely the energy
consumed by the CU when performing add/sub operations, when performing max* operations
and when in the idle state. Moreover, the effects of varying the different parameters are
investigated.
As a first step in deriving an energy model for the CU, a fixed set of parameter values
k = 1, m = 3, n = 1 and the typical supply voltage of TSMC 90 nm technology, v = 1.2 V,
are chosen, in order to analyse the energy impact of varying the operating clock frequency
f. For a particular hardware structure, the power consumption increases almost linearly
with the clock frequency f [188], unless the data paths have longer delays than the
clock period, requiring additional optimisation of the hardware implementation by the
synthesis tool. Since the proposed architecture maintains control of the combinational
logic data path lengths, the post-layout simulation results show very good linearity
between the power consumption and the clock frequency in the range 0 to 500 MHz, as
shown in Figure 6.9. Here, P(v) is defined as the power consumption of a circuit when
the supply voltage is v. Since the power consumption is proportional to the square of
the supply voltage, the power consumption for supply voltages other than v = 1.2 V can
be estimated based on the P(1.2) results of Figure 6.9, according to Equation 6.12.

\[
P(v) = \frac{v^2}{1.2^2}\, P(1.2). \tag{6.12}
\]

Figure 6.9: CU's power consumption results versus clock frequency, when k = 1,
m = 3, n = 1 and v = 1.2 V.
However, a reduction of the supply voltage v has a similar impact on the hardware
implementation as increasing the clock frequency. More specifically, this increases the
delays in the circuit. When the delays in the data paths become longer than the clock
period, the synthesis tool will introduce additional circuitry, in order to reduce the data
path lengths, at the cost of an increased hardware complexity. Therefore, Equation 6.12
is only applicable for a certain range of clock frequencies. For TSMC 90 nm technology,
the specified supply voltage range is 0.84 V to 1.32 V [189]. To obtain the applicable
clock frequency range of Equation 6.12 for the proposed architecture, the worst case of
v = 0.84 V is investigated. Figure 6.10 compares the power consumption obtained from
post-layout simulation with the prediction made by Equation 6.12. As shown in the figure,
the power consumption of the actual implementation becomes higher than the
estimate for clock frequencies above f = 400 MHz. Therefore, when the
proposed architecture is implemented using TSMC 90 nm technology, Equation 6.12
can only be used when $f \in (0, 400\,\text{MHz}]$.

Figure 6.10: The comparison of post-layout simulation results and the estimation
results by Equation 6.12, when k = 1, m = 3, n = 1 and v = 0.84 V.
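
The voltage scaling of Equation 6.12 and its range of validity can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `scale_power` and the reference power value in the usage line are hypothetical, and the range checks simply encode the 0.84 V to 1.32 V supply range and the (0, 400 MHz] frequency range quoted above.

```python
V_REF = 1.2                  # supply voltage of the reference results in Figure 6.9 (V)
V_MIN, V_MAX = 0.84, 1.32    # specified supply voltage range for TSMC 90 nm (V)
F_MAX_MHZ = 400.0            # highest clock frequency for which Equation 6.12 held


def scale_power(p_ref, v, f_mhz):
    """Estimate P(v) from P(1.2) using Equation 6.12.

    p_ref is the power obtained at 1.2 V and the same clock frequency;
    the result has the same unit as p_ref.
    """
    if not V_MIN <= v <= V_MAX:
        raise ValueError("supply voltage outside the specified range")
    if not 0.0 < f_mhz <= F_MAX_MHZ:
        raise ValueError("Equation 6.12 is only used for f in (0, 400 MHz]")
    return (v / V_REF) ** 2 * p_ref


# Hypothetical reference power of 1.0 (arbitrary unit), scaled to v = 0.84 V.
print(scale_power(1.0, 0.84, 200.0))
```

At the worst-case supply voltage of 0.84 V the scaling factor is (0.84/1.2)^2 = 0.49, so the estimated power is just under half of the 1.2 V value.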
In this work, the objective of the proposed framework is to estimate the energy consumed
by the LUT-Log-BCJR decoder per bit of information in nJ/bit. In Section 6.2.4, the
throughput of the decoder was determined as a function of the clock period. Here,
the energy consumption per clock cycle Ecyc is fundamental to the proposed energy
estimation framework. This can be obtained by dividing the power consumption P of
Figure 6.9 by the corresponding clock frequency f, according to

\[
E_{cyc} = \frac{P}{f}. \tag{6.13}
\]
The post-layout simulation results show that when the hardware
structure is implemented in TSMC 90 nm technology, Ecyc can be approximately considered
to be a constant value that is independent of f. For example, the results
obtained for four different combinations of the parameters {k + n, m} are shown in
Figure 6.11, namely {2, 3}, {3, 3}, {2, 5} and {7, 3}. As discussed in Section 6.2.2, the
parameters k, m and n determine the size of the MUX used in the CU, which has only
an insignificant effect on the energy consumption results, as shown in Figure 6.11. The
dashed lines in Figure 6.11 are lines of best fit for each group of Ecyc results. The results
show that, for a certain supply voltage v, Ecyc of the CU depends on {k + n, m} and is
approximately independent of f.

Figure 6.11: Ecyc results of the CU with four different combination settings of
{k + n, m}, where v = 1.2 V.
To model the relationship between Ecyc and {k + n, m}, the impact of varying {k + n, m}
on the CU's hardware structure must be considered. As discussed, the parameters
k + n and m each affect the number of inputs required for a different multiplexor in the
MUX unit. Therefore, the effects of k + n and m are independent of each other, and
so these parameters can be investigated individually. The simulation results of the CU's
Ecyc for different values of k + n and m are provided in Figures 6.12 and 6.13. Note
that when one parameter is changed, the other is fixed at its minimum value, namely
k + n = 2 or m = 1.

Figure 6.12: Ecyc results of the CU for different k + n, where m = 1, v = 1.2 V and
f = 200 MHz.

Figure 6.13: Ecyc results of the CU for different m, where k = n = 1, v = 1.2 V and
f = 200 MHz.

The simulation results show that the parameters k + n and m have
only a negligible effect on Ecyc when the CU is idling. Therefore, Ecyc can be considered
to have a constant value which is independent of k + n and m when the CU is idling.
When the CU is performing max* or add/sub operations, the results in Figures 6.12 and
6.13 show that Ecyc increases linearly when one of the parameter values is increased.
Linear fitting results are provided in the figures accordingly. The effect of k + n and m
on the CU's energy consumption can be modelled as the energy consumption observed
for k + n = 2 and m = 1, plus the additional energy consumption caused by the target
k + n and the target m, respectively. Therefore, the linear fitting results given in Figures 6.12
and 6.13 are constrained so that the functions cross the k + n = 2 and m = 1 points.
The energy consumption Ecyc of the CU is modelled as a function $E_{cyc}^{CU}(k, n, m, v)$ of
the parameters k + n, m and v. The slopes of the four linear fitting results in Figures 6.12
and 6.13 and the value of Ecyc(1, 1, 1, 1.2) for the different operations are given in Table 6.2.

Operation                              max*                           add/sub
Parameter                              k + n          m               k + n          m
Slope                                  1.69 × 10^-5   4.73 × 10^-5    1.47 × 10^-5   4.64 × 10^-5
Ecyc(1, 1, 1, 1.2) (nJ/Clock Cycle)    9.32 × 10^-4                   9.015 × 10^-4

Table 6.2: The slope values of the linear fitting results in Figures 6.12 and 6.13 and
Ecyc(1, 1, 1, 1.2) for different operations.
Therefore, the CU's Ecyc can be modelled as

\[
E_{cyc}^{CU,max*}(k, n, m, v) = \frac{v^2}{1.2^2}\left(9.32 \times 10^{-4} + 1.69 \times 10^{-5}(k + n - 2) + 4.73 \times 10^{-5}(m - 1)\right), \tag{6.14}
\]

\[
E_{cyc}^{CU,add}(k, n, m, v) = \frac{v^2}{1.2^2}\left(9.015 \times 10^{-4} + 1.47 \times 10^{-5}(k + n - 2) + 4.64 \times 10^{-5}(m - 1)\right). \tag{6.15}
\]

As discussed above, the only significant effect on Ecyc while the CU is idling is that of
the parameter v. Therefore, it can be modelled by

\[
E_{cyc}^{CU,idle}(v) = \frac{v^2}{1.2^2} \times 0.418 \times 10^{-3}, \tag{6.16}
\]

since $E_{cyc}^{CU,idle}(1.2) = 0.418 \times 10^{-3}$. A comparison between the results estimated using
these models for the examples of Figure 6.11 and the corresponding simulation results
is given in Figure 6.14.
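
For readers who prefer code to formulas, Equations 6.14 to 6.16 can be transcribed directly. The following is a minimal sketch; the function names are illustrative and not part of the original framework, and the constants are those of Table 6.2 and Equation 6.16.

```python
V_REF = 1.2  # supply voltage at which the fitted constants were obtained (V)


def e_cu_max(k, n, m, v=V_REF):
    """Equation 6.14: CU energy per clock cycle during max* operations (nJ)."""
    base = 9.32e-4 + 1.69e-5 * (k + n - 2) + 4.73e-5 * (m - 1)
    return (v / V_REF) ** 2 * base


def e_cu_add(k, n, m, v=V_REF):
    """Equation 6.15: CU energy per clock cycle during add/sub operations (nJ)."""
    base = 9.015e-4 + 1.47e-5 * (k + n - 2) + 4.64e-5 * (m - 1)
    return (v / V_REF) ** 2 * base


def e_cu_idle(v=V_REF):
    """Equation 6.16: CU energy per clock cycle while idling (nJ)."""
    return (v / V_REF) ** 2 * 0.418e-3


# Example: k = 1, n = 1, m = 3 at the typical supply voltage of 1.2 V.
print(e_cu_max(1, 1, 3), e_cu_add(1, 1, 3), e_cu_idle())
```

For k = 1, n = 1 and m = 3, only the m term contributes beyond the baseline, giving 1.0266 × 10^-3 nJ per cycle for max* and 9.943 × 10^-4 nJ per cycle for add/sub.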
Figure 6.14: The comparison of the simulated and estimated Ecyc results of the
CU, for the same examples as in Figure 6.11.

Based on Equations 6.14, 6.15 and 6.16, the energy consumption of the CU in a LUT-Log-BCJR
decoder can be calculated by investigating the three recursions in the decoding
process.
1. CU energy analysis for the forward recursion.
As shown in Figure 6.6, the operations of the CU in the forward recursion can be
divided into three stages, namely the idle stage, the addition stage and the max* stage. As
discussed in Section 6.2.4, one clock cycle is required for the idle stage, $2^{k-1}(k+n)$
clock cycles are required for the addition stage and $4(2^k - 1)$ clock cycles are
required for the max* stage. The average Ecyc of the CU during the forward
recursion can be calculated as

\[
E_{cyc}^{CU,forward} = \frac{E_{cyc}^{CU,idle} + 2^{k-1}(k+n)\,E_{cyc}^{CU,add} + 4(2^k - 1)\,E_{cyc}^{CU,max*}}{T_{forward}}. \tag{6.17}
\]
2. CU energy analysis for the pre-backward recursion.
As shown in Figure 6.6, the operations performed by the CUs during the pre-backward
recursion are the same as those of the forward recursion, so

\[
E_{cyc}^{CU,prebackward} = E_{cyc}^{CU,forward}. \tag{6.18}
\]
3. CU energy analysis for the backward recursion.
As discussed in Section 6.2.4, the backward recursion includes the calculation of
the $\beta(S)$ values, the $\delta_i(T)$ values and the extrinsic LLRs. The $\beta(S)$ calculation
is the same as in the pre-backward recursion. The $\delta_i(T)$ calculation includes one
clock cycle in idle and $2^{k+1}$ clock cycles of add/sub operations. The calculation
of the extrinsic LLRs can be divided into three stages. In the first stage, each
CU performs k max* operations, taking 4k clock cycles. In the second stage, the
operations in each CU are different. A total of $4(m-1)$ clock cycles are required to
perform the $m-1$ max* operations, as shown in Figure 6.8. On average, each CU
spends a fraction $\frac{\sum_{i=1}^{m-1} 2^i}{(m-1)2^m}$ of the time performing max* operations and a fraction
$1 - \frac{\sum_{i=1}^{m-1} 2^i}{(m-1)2^m}$ of the time in idle. In the third stage, only one CU is operated,
performing a subtraction in one clock cycle. On average, each CU spends $\frac{1}{2^m}$ of
the time performing subtractions and $1 - \frac{1}{2^m}$ of the time in idle. The loop in
Figure 6.8 is repeated k times in each backward recursion. Therefore, the average
Ecyc for the backward recursion is given by

\[
E_{cyc}^{CU,backward} = \frac{T_{add}\,E_{cyc}^{CU,add} + T_{max*}\,E_{cyc}^{CU,max*} + T_{idle}\,E_{cyc}^{CU,idle}}{T_{backward}}, \tag{6.19}
\]

where

\[
T_{add} = 2^{k-1}(k+n) + \left(2^{k+1} + \frac{1}{2^m}\right)k,
\]

\[
T_{max*} = 4(2^k - 1) + \left[4k + \left(\frac{\sum_{i=1}^{m-1} 2^i}{(m-1)2^m}\right)4(m-1)\right]k,
\]

\[
T_{idle} = 2 + \left[1 - \frac{1}{2^m} + \left(1 - \frac{\sum_{i=1}^{m-1} 2^i}{(m-1)2^m}\right)4(m-1)\right]k.
\]
Overall, the average Ecyc of one CU is given by

\[
E_{cyc}^{CU} = \frac{w_s\left(E_{cyc}^{CU,forward} + E_{cyc}^{CU,backward}\right) + w_p\,E_{cyc}^{CU,prebackward}}{2w_s + w_p}. \tag{6.20}
\]
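
The averaging of Equations 6.17 to 6.20 can be sketched as follows. Two assumptions are made here that are not confirmed by this section: T_forward and T_backward are taken to be the sums of the stage durations listed above (their definitions belong to Section 6.2.4), and the helper names are purely illustrative. The per-cycle energies in the usage example are the values that Equations 6.14 to 6.16 give for k = 1, n = 1, m = 3 and v = 1.2 V.

```python
def stage_cycles(k, n, m):
    """Clock cycles per operation type in the forward and backward recursions."""
    # Average share of CUs performing max* in the second LLR stage (unused if m = 1).
    frac = sum(2 ** i for i in range(1, m)) / ((m - 1) * 2 ** m) if m > 1 else 0.0
    forward = {"idle": 1, "add": 2 ** (k - 1) * (k + n), "max": 4 * (2 ** k - 1)}
    backward = {
        "add": forward["add"] + (2 ** (k + 1) + 1 / 2 ** m) * k,           # T_add
        "max": forward["max"] + (4 * k + frac * 4 * (m - 1)) * k,          # T_max*
        "idle": 2 + (1 - 1 / 2 ** m + (1 - frac) * 4 * (m - 1)) * k,       # T_idle
    }
    return forward, backward


def weighted_energy(cycles, energy):
    """Cycle-weighted average energy; assumes the total time is the sum of stages."""
    total = sum(cycles.values())
    return sum(cycles[op] * energy[op] for op in cycles) / total


def average_cu_energy(k, n, m, ws, wp, energy):
    forward, backward = stage_cycles(k, n, m)
    e_forward = weighted_energy(forward, energy)      # Equation 6.17
    e_prebackward = e_forward                         # Equation 6.18
    e_backward = weighted_energy(backward, energy)    # Equation 6.19
    return (ws * (e_forward + e_backward) + wp * e_prebackward) / (2 * ws + wp)  # 6.20


# Per-cycle energies (nJ) from Equations 6.14 to 6.16 for k = 1, n = 1, m = 3, v = 1.2 V.
energy = {"idle": 4.18e-4, "add": 9.943e-4, "max": 1.0266e-3}
print(average_cu_energy(1, 1, 3, 128, 24, energy))
```

Under these assumptions the stage durations for k = 1, n = 1 and m = 3 sum to 7 clock cycles for the forward recursion and 25 for the backward recursion, and the average CU energy lies between the idle and max* values, as expected.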
6.3.2 Register banks
In this section, the typical energy consumption of the register banks is estimated. Two
internal variables are introduced for the energy model, namely the update rate u of a
register, in units of updates per clock cycle, and the number of registers r in a
register bank. As discussed in Section 6.2.1, the values of r for Regbank1, r1, and
Regbank2, r2, are given by

\[
r_1 = k + n, \tag{6.21}
\]

\[
r_2 = 2^m(2^k - 1). \tag{6.22}
\]

In this section, the calculation of the values of u for Regbank1 and Regbank2 is discussed.
The energy consumption of a register bank, $E_{cyc}^{Regbank}$, can be calculated by multiplying
the energy consumption of one register, $E_{cyc}^{Reg}$, by the number of registers r it
contains. Here, $E_{cyc}^{Reg}$ is considered to be a function of the parameter u in the energy
model.
For Regbank1, all the registers are updated together, as described in Section 6.2.4. As
shown in Figures 6.6 and 6.7, one update is performed in each loop of all three recursions.
Therefore, u of Regbank1 can be calculated as

\[
u_1 = \frac{2w_s + w_p}{w_s(T_{forward} + T_{backward}) + w_p T_{prebackward}}. \tag{6.23}
\]
As described in Section 6.2.4, Regbank2 is divided into $2^k - 1$ groups of $2^m$ registers.
The registers in each group are always updated together. As shown in Figure 6.6, each
register is updated an average of two times in each of the forward and the pre-backward
recursions, giving the update rates

\[
u_{2,forward} = \frac{k + n}{2T_{forward}}, \tag{6.24}
\]

\[
u_{2,prebackward} = \frac{k + n}{2T_{prebackward}}. \tag{6.25}
\]
For the backward recursion, each group is updated
$\frac{(k+n)(2^k - 1)}{2} + 2^k + k(k + m - 1) - 2$
times in total, according to Figure 6.7. The average update rate is therefore

\[
u_{2,backward} = \frac{\frac{(k+n)(2^k - 1)}{2} + 2^k + k(k + m - 1) - 2}{(2^k - 1)\,T_{backward}}. \tag{6.26}
\]
Finally, the overall u of Regbank2 can be calculated as

\[
u_2 = \frac{w_s u_{2,forward} + w_s u_{2,backward} + w_p u_{2,prebackward}}{2w_s + w_p}. \tag{6.27}
\]
Note that for energy estimation, only real registers are considered. The dummy registers
of Regbank1 need no consideration, since they are hard-wired constant values in the
hardware implementation, as discussed in Section 6.2.1. The energy model of a register
bank can be built based on the energy analysis of a single register. Post-layout simulation
results of the energy consumed by a single register having various update rates u
are shown in Figure 6.15. Similar to the CU's results, the energy consumption of a
register can be considered to be a constant value that is independent of the clock
frequency. The dashed lines in the figure are the average values of the simulation results.
The results show that the energy consumption significantly increases when u increases.

Figure 6.15: Energy consumptions of a single register operating at different clock
frequencies, for v = 1.2 V.

In Figure 6.16, these average values are used to model the energy consumption of a
register as a function of u. The dashed line in the figure provides the linear fitting
result, which demonstrates the linearity of the energy consumption with u, as modelled
by

\[
E_{cyc}^{Reg} = (0.168u + 0.1511) \times 10^{-3}. \tag{6.28}
\]

Figure 6.16: Energy consumptions of a single register with different update rates u.

Note that when u = 0, $E_{cyc}^{Reg}$ represents the energy consumption of the register in idle.
Using the linear fitting result, the energy consumption of a register bank having r
registers can be calculated as

\[
E_{cyc}^{Regbank} = \frac{v^2}{1.2^2}\, r\,(0.168u + 0.1511) \times 10^{-3}. \tag{6.29}
\]
Since Regbank1 has $k + n$ registers and Regbank2 has $2^m(2^k - 1)$ registers, their energy
consumptions are given by

\[
E_{cyc}^{Regbank1} = \frac{v^2}{1.2^2}(k + n)(0.168u_1 + 0.1511) \times 10^{-3}, \tag{6.30}
\]

\[
E_{cyc}^{Regbank2} = \frac{v^2}{1.2^2}\,2^m(2^k - 1)(0.168u_2 + 0.1511) \times 10^{-3}. \tag{6.31}
\]
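
A short sketch of the register bank model of Equations 6.21 to 6.23 and 6.29 is given below. The function names are illustrative only, and the cycle counts passed to `update_rate_regbank1` in the usage example (7, 25 and 7 clock cycles) are assumed values for k = 1, n = 1 and m = 3, since the recursion durations are defined in Section 6.2.4 rather than here.

```python
V_REF = 1.2  # supply voltage of the fitted register results (V)


def regbank_sizes(k, n, m):
    """Equations 6.21 and 6.22: number of registers in Regbank1 and Regbank2."""
    return k + n, 2 ** m * (2 ** k - 1)


def regbank_energy(r, u, v=V_REF):
    """Equation 6.29: energy of a bank of r registers with update rate u (nJ/cycle)."""
    return (v / V_REF) ** 2 * r * (0.168 * u + 0.1511) * 1e-3


def update_rate_regbank1(ws, wp, t_forward, t_backward, t_prebackward):
    """Equation 6.23: one update per loop of each of the three recursions."""
    return (2 * ws + wp) / (ws * (t_forward + t_backward) + wp * t_prebackward)


r1, r2 = regbank_sizes(1, 1, 3)
u1 = update_rate_regbank1(128, 24, 7, 25, 7)  # assumed cycle counts, see lead-in
print(r1, r2, u1, regbank_energy(r1, u1))
```

Setting u = 0 in `regbank_energy` gives the idle energy of the bank, which is simply r times 0.1511 × 10^-3 nJ per cycle at 1.2 V.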
To verify the energy model, an error bar plot comparing the simulation results with
the estimation results for r = 8 and v = 1.2 V is given in Figure 6.17. Here, the error
bars show the range of energy consumption values that result when the clock frequency
is varied in the range of 50 MHz to 400 MHz.

Figure 6.17: The error bar result, r = 8 and v = 1.2 V.
6.3.3 Datapath
There are $2^m$ CUs in the proposed architecture. Based on the energy models of
Sections 6.3.1 and 6.3.2, the energy consumption in nJ/Clock Cycle can be calculated as

\[
E_{cyc}^{Datapath} = 2^m E_{cyc}^{CU} + E_{cyc}^{Regbank1} + E_{cyc}^{Regbank2}. \tag{6.32}
\]

This framework proposes to use the unit nJ/bit to evaluate the energy consumption of a
LUT-Log-BCJR decoder. More specifically, this quantifies the average energy consumed
to decode one extrinsic LLR, $E_e^{Datapath}$. The energy efficiency of a datapath, $E_e^{Datapath}$,
may be obtained by multiplying $E_{cyc}^{Datapath}$ by the number of clock cycles required
to calculate one extrinsic LLR, $T_e$, which can be obtained using Equation 6.11. The
word length of the fixed-point representation used in the datapath is y + z, as defined
in Table 6.1. The results so far have been based on the assumption that y + z = 9.
Therefore, for a datapath with a different fixed-point setting, $E_e^{Datapath}$ can be calculated
according to

\[
E_e^{Datapath} = \frac{y + z}{9}\, E_{cyc}^{Datapath} \times T_e. \tag{6.33}
\]
6.3.4 Controller
In typical ASIC design processes, intricate knowledge of the controller's hardware
implementation cannot be obtained before synthesis. This is because, unlike the datapath
and the memory blocks, the controller design is based on a behavioural model. As a
result, the energy consumption of the controller is difficult to estimate at an early design
stage [185,190].
In this framework, a method to approximately estimate the controller's energy consumption
is proposed. A configurable Register-Transfer Level (RTL) model of the proposed
architecture's controller is designed for investigating the energy consumption of the
controller with different specifications of the design parameters. The parameters that affect
the controller include k, m, n, ws, wp and N. This RTL module is not an actual controller
for any LUT-Log-BCJR decoder, but it is designed to include the state machine,
which can be abstracted from Figures 6.6 and 6.7, and the part of the combinational logic
of the control signals that can be generalised for any decoder. The RTL module can
be easily reconfigured by changing the parameters for the investigation. It represents
most of the energy consumption of an actual decoder. For example, for the Long Term
Evolution (LTE) decoder, simulation results show that the energy consumption of the
RTL module is more than 95% of that of the actual design. The remaining
5% is consumed by the combinational logic that defines the output signals of the controller,
which cannot be generalised, since its design is different for different decoders. For the proposed
architecture, this inaccuracy in the controller's energy estimation is acceptable, since the
controller typically contributes only a small fraction (less than 5%) of the total energy
consumption of the decoder, as discussed in this section.
Using the proposed RTL module, the energy consumption of the proposed architecture's
controller is investigated. In Figure 6.18, the simulated energy consumption of
the controller in nJ/Clock Cycle, $E_{cyc}^{Control}$, is given for k = 1, m = 1 and n = 1, with
different operating clock frequencies f and interleaver lengths N. Here, ws = 128, wp = 24
and the typical supply voltage for TSMC 90 nm technology, v = 1.2 V, are assumed.

Figure 6.18: $E_{cyc}^{Control}$ (nJ/Clock Cycle) with different f and N, when k = 1, m = 1
and n = 1.
[Plot: $E_{cyc}^{Control}$ from 0 to 0.001 nJ/Clock Cycle versus f from 100 to 400 MHz, with
curves for N = 1024, 2048, 3072 and 4096.]

The results show that the variation in energy consumption caused by different clock frequencies f
is insignificant. Therefore, $E_{cyc}^{Control}$ is considered to be independent of f.
In the controller design, the block length N only affects a bit counter in the controller.
This bit counter identifies which bit is being processed and determines the end of a
block, which has a length N. Therefore, the bit counter register must be able to store
values of up to N. According to the binary number representation, the required word
length is $\lceil \log_2(N + 1) \rceil$. As a result, N only causes a noticeable increase in
$E_{cyc}^{Control}$ when $\lceil \log_2(N + 1) \rceil$ increases. To investigate this effect, the $E_{cyc}^{Control}$
values obtained from post-layout simulations with an increasing N are given in Figure 6.19,
together with a linear fitting result.

Figure 6.19: The increment of $E_{cyc}^{Control}$ when N increases.

According to Figure 6.19, when f = 400 MHz, v = 1.2 V,
ws = 128, wp = 24, k = 1, m = 1 and n = 1, $E_{cyc}^{Control}$ can be modelled by Equation 6.34.

\[
E_{cyc,N}^{Control}(N) = \left(0.01788\lceil \log_2(N + 1) \rceil + 0.4293\right) \times 10^{-3}. \tag{6.34}
\]
The proposed architecture recommends always using ws = 128 and wp = 24, except
when $N \le 128$. In this case, the sliding window technique is not required and the
situation is equivalent to ws = N and wp = 0 for the design. However, this exception
does not affect the controller's energy consumption, according to the simulation results.
For example, for N = 240 (WiMax) [191], $E_{cyc}^{Control}$ is 5.925 × 10^-4 nJ/Clock Cycle when
using the sliding window technique and 5.9125 × 10^-4 nJ/Clock Cycle otherwise.
Since the parameter N only affects the bit counter in the controller, its impact on $E_{cyc}^{Control}$
can be considered to be independent of that of the other parameters. However, k, m
and n have a more intricate effect on $E_{cyc}^{Control}$. Unlike N, their effects on the controller's
implementation are not independent of each other. As a result, their combined effect
on the controller's energy consumption cannot be modelled independently. An estimation
method based on simulation results is proposed for estimating $E_{cyc}^{Control}$ as a function of
the parameters k, m and n. The calculation is based on four groups of simulation results.
Here, f = 400 MHz, v = 1.2 V, N = 1024, ws = 128 and wp = 24 are assumed in these
results. Table 6.3 provides the first group of results, which were obtained by increasing
k, while maintaining m = 1 and n = 1. The second group of results, in Table 6.4, shows
the effect of increasing m, while maintaining k = 1 and n = 1. The third group of
results, in Table 6.5, was obtained by increasing n, while maintaining k = 1 and m = 1.
Finally, the fourth group of results, in Table 6.6, was obtained for k = m = n, with all
of them increasing at the same time.

k                 1               2               3               4
Ecyc^Control      6.255 × 10^-4   6.4075 × 10^-4  6.6 × 10^-4     6.9725 × 10^-4
k                 5               6               7               8
Ecyc^Control      7.2475 × 10^-4  7.3625 × 10^-4  7.545 × 10^-4   7.815 × 10^-4

Table 6.3: $E_{cyc}^{Control}$ (nJ/Clock Cycle) simulation results for increasing k, when m = 1
and n = 1.

m                 1               2               3               4
Ecyc^Control      6.255 × 10^-4   6.3275 × 10^-4  6.3725 × 10^-4  6.675 × 10^-4
m                 5               6               7               8
Ecyc^Control      6.21 × 10^-4    6.39 × 10^-4    6.3775 × 10^-4  6.3225 × 10^-4

Table 6.4: $E_{cyc}^{Control}$ (nJ/Clock Cycle) simulation results for increasing m, when k = 1
and n = 1.

n                 1               2               3               4
Ecyc^Control      6.255 × 10^-4   6.205 × 10^-4   6.145 × 10^-4   6.39 × 10^-4
n                 5               6               7               8
Ecyc^Control      6.3325 × 10^-4  6.2925 × 10^-4  6.2825 × 10^-4  6.4025 × 10^-4

Table 6.5: $E_{cyc}^{Control}$ (nJ/Clock Cycle) simulation results for increasing n, when k = 1
and m = 1.

k = m = n         1               2               3               4
Ecyc^Control      6.255 × 10^-4   6.3925 × 10^-4  6.8625 × 10^-4  7.0175 × 10^-4
k = m = n         5               6               7               8
Ecyc^Control      7.465 × 10^-4   7.5925 × 10^-4  7.7225 × 10^-4  7.7795 × 10^-4

Table 6.6: $E_{cyc}^{Control}$ (nJ/Clock Cycle) simulation results for k = m = n.

To estimate $E_{cyc}^{Control}$ for a specific combination
of k, m and n, the functions $E_{cyc,k}^{Control}(k)$, $E_{cyc,m}^{Control}(m)$, $E_{cyc,n}^{Control}(n)$ and $E_{cyc,kmn}^{Control}(kmn)$ are first
defined to denote the results of Tables 6.3 to 6.6, respectively. For a certain specification of {k, m, n},
kmn = min(k, m, n) is defined and $E_{cyc}^{Control}$ is estimated according to Equation 6.35.

\[
\begin{aligned}
E_{cyc,k,m,n}^{Control}(k, m, n) = {} & E_{cyc,kmn}^{Control}(kmn)
 + \left[E_{cyc,k}^{Control}(k) - E_{cyc,k}^{Control}(kmn)\right] \\
 & + \left[E_{cyc,m}^{Control}(m) - E_{cyc,m}^{Control}(kmn)\right]
 + \left[E_{cyc,n}^{Control}(n) - E_{cyc,n}^{Control}(kmn)\right].
\end{aligned} \tag{6.35}
\]
Combining the models for N, k, m, n and v allows $E_{cyc}^{Control}$ to be estimated as

\[
E_{cyc}^{Control} = \frac{v^2}{1.2^2}\left[E_{cyc,k,m,n}^{Control}(k, m, n) + 0.01788\left(\lceil \log_2(N + 1) \rceil - 11\right) \times 10^{-3}\right]. \tag{6.36}
\]

Using $E_{cyc}^{Control}$ and $T_e$, the energy efficiency of the controller in nJ/LLR can be calculated
as

\[
E_e^{Control} = E_{cyc}^{Control} \times T_e. \tag{6.37}
\]
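
The table-based estimation of Equations 6.35 to 6.37 lends itself to a small lookup sketch. The lists below hold the simulated values of Tables 6.3 to 6.6 in nJ/Clock Cycle; the function names are illustrative, and the number of clock cycles per extrinsic LLR, `t_e`, is left as an input because Equation 6.11 is not reproduced in this section. The word length ⌈log2(N + 1)⌉ is obtained with Python's `int.bit_length`, which returns the same value for N ≥ 1.

```python
V_REF = 1.2  # supply voltage of the simulated controller results (V)

# Tables 6.3 to 6.6 (nJ/Clock Cycle); list index i holds the result for value i + 1.
E_K = [6.255e-4, 6.4075e-4, 6.6e-4, 6.9725e-4, 7.2475e-4, 7.3625e-4, 7.545e-4, 7.815e-4]
E_M = [6.255e-4, 6.3275e-4, 6.3725e-4, 6.675e-4, 6.21e-4, 6.39e-4, 6.3775e-4, 6.3225e-4]
E_N = [6.255e-4, 6.205e-4, 6.145e-4, 6.39e-4, 6.3325e-4, 6.2925e-4, 6.2825e-4, 6.4025e-4]
E_KMN = [6.255e-4, 6.3925e-4, 6.8625e-4, 7.0175e-4, 7.465e-4, 7.5925e-4, 7.7225e-4, 7.7795e-4]


def e_control_kmn(k, m, n):
    """Equation 6.35: combine the four groups of results for 1 <= k, m, n <= 8."""
    c = min(k, m, n)
    return (E_KMN[c - 1]
            + (E_K[k - 1] - E_K[c - 1])
            + (E_M[m - 1] - E_M[c - 1])
            + (E_N[n - 1] - E_N[c - 1]))


def e_control_cyc(k, m, n, block_len, v=V_REF):
    """Equation 6.36: add the bit counter term relative to N = 1024 (11 bits)."""
    bits = block_len.bit_length()  # equals ceil(log2(N + 1)) for N >= 1
    return (v / V_REF) ** 2 * (e_control_kmn(k, m, n) + 0.01788e-3 * (bits - 11))


def e_control_per_llr(k, m, n, block_len, t_e, v=V_REF):
    """Equation 6.37: controller energy per extrinsic LLR (nJ/LLR)."""
    return e_control_cyc(k, m, n, block_len, v) * t_e


print(e_control_cyc(1, 3, 1, 6144))
```

By construction, the estimate reproduces the tabulated values whenever only one parameter differs from the others, for example `e_control_kmn(3, 1, 1)` returns the k = 3 entry of Table 6.3.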
To verify the model, the estimation results and the simulation results of $E_e^{Control}$ for four
example applications are given in Table 6.7, where f = 400 MHz and v = 1.2 V.

Application         WiMax [191]   CDMA2000 [192]   LTE [193]   Deep space communication [194]
Simulation result   0.0285        0.0241           0.0224      0.0247
Estimation result   0.0286        0.0243           0.0224      0.0246

Table 6.7: Comparison of the estimation results and the simulation results of $E_e^{Control}$
(nJ/LLR) of example applications.
6.3.5 Energy estimation of LUT-Log-BCJR decoders and results validation
Based on the results in Sections 6.3.3 and 6.3.4, the overall energy efficiency of the
LUT-Log-BCJR decoder, $E_e^{BCJR}$, can be calculated as

\[
E_e^{BCJR} = E_e^{Datapath} + E_e^{Control}. \tag{6.38}
\]

To validate the proposed framework, two LUT-Log-BCJR decoders for the two different
turbo codes of [106] were implemented using the generalised architecture. Post-layout
simulations were performed to obtain the post-layout energy estimation results.
Design-I has the specification of k = 1, m = 3 and n = 1. By contrast, Design-II has the
specification of k = 1, m = 2 and n = 1. Both designs have block lengths of N = 6144.
In addition, x = 4, y = 7, z = 2, ws = 128, wp = 24, f = 400 MHz and v = 1.2 V are
assumed in both cases. The simulated energy consumption and the estimated energy
consumption are compared in Table 6.8. The results show that the estimated results are
within 2% of the post-layout simulation results.

                    Design-I         Design-II
Simulation result   0.3208 nJ/LLR    0.154 nJ/LLR
Estimation result   0.3156 nJ/LLR    0.1516 nJ/LLR

Table 6.8: Comparison of the estimation results and the simulation results of $E_e^{BCJR}$
(nJ/LLR) of the example designs.
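
The agreement claimed for Table 6.8 can be checked with a few lines of arithmetic. The sketch below uses only the four values of the table; the dictionary layout and the function name are illustrative.

```python
# (simulation result, estimation result) in nJ/LLR, taken from Table 6.8.
RESULTS = {"Design-I": (0.3208, 0.3156), "Design-II": (0.154, 0.1516)}


def relative_error(simulated, estimated):
    """Absolute estimation error as a fraction of the simulated value."""
    return abs(estimated - simulated) / simulated


for name, (simulated, estimated) in RESULTS.items():
    print(name, round(100 * relative_error(simulated, estimated), 2), "%")
```

Both designs give a relative error of roughly 1.6%, so each estimate is indeed more than 98% of the corresponding post-layout result.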
6.3.6 Memories
As discussed in Section 6.1, the proposed framework aims to give energy estimation
results that closely predict the post-layout simulation results. For the memory modules,
the databook provided by the standard library developer [195] provides specifications
which allow the energy consumption to be calculated. According to the TSMC 90 nm
databook [195], the power consumption of a particular memory module can be
estimated by considering the accessing rate a in units of accesses per clock cycle, the
clock frequency f and the supply voltage v. According to [195], writing and reading
operations are considered to have the same energy consumption. In the standard cell
library, the power consumption of the Static Random-Access Memory (SRAM) used in
the architecture can be estimated using the reference table of [195]. In the reference
table, the typical accessing power consumption $p_a$ and the leakage current $I_l$ are given
for memory blocks with various word-widths and word-lengths. The power consumption $p_a$
can be used to calculate the dynamic energy consumption when the memory is being
accessed. The leakage current $I_l$ can be used to calculate the static energy consumption
of the memory when it is idle. As is typical for standard cells, the reference table only
provides the reference data for several typical supply voltages. However, the voltage
scaling method used for the previous models can still be applied. In this case, the typical
specifications for TSMC 90 nm SRAM operating at 1.2 V are used.
The memories required by the proposed architecture are divided into three types, namely the
a priori LLR memory, the extrinsic LLR memory and the metric memory, as shown in
Figures 6.2 and 6.3. The accessing rates a of these three memory types are defined as $a_a$,
$a_e$ and $a_m$, respectively. These values can be derived from Figures 6.6
and 6.7 as

\[
a_a = \frac{2w_s + w_p}{w_s(T_{forward} + T_{backward}) + w_p T_{prebackward}}, \tag{6.39}
\]

\[
a_e = \frac{w_s}{w_s(T_{forward} + T_{backward}) + w_p T_{prebackward}}, \tag{6.40}
\]

\[
a_m = \frac{4w_s}{w_s(T_{forward} + T_{backward}) + w_p T_{prebackward}}. \tag{6.41}
\]

As a result, the average power consumption of a memory block can be calculated as

\[
p^M = \frac{v^2}{1.2^2}\left(f v p_a a + v I_l\right) \times 10^{-3}, \tag{6.42}
\]

where $a \in \{a_a, a_e, a_m\}$. The energy consumption per clock cycle is obtained by dividing
this power consumption by the clock frequency, giving

\[
E_{cyc}^M = \frac{p^M}{f} = \frac{v^2}{1.2^2} \cdot \frac{\left(f v p_a a + v I_l\right) \times 10^{-3}}{f}. \tag{6.43}
\]
For example, for the standard 128 × 64 bit memory block of the TSMC 90 nm library
[195], when v = 1.2 V, $p_a$ = 33.65 μA/mW and $I_l$ = 0.887 μA. Figure 6.20 gives both
the simulation results and the estimation results, in order to verify this memory energy
model.

Figure 6.20: The error bar result of the 128 × 64 bit memory, v = 1.2 V.

Similar to the datapath energy efficiency, the memory block energy efficiency
$E_e^M$ can be calculated as

\[
E_e^M = T_e \times E_{cyc}^M. \tag{6.44}
\]

To estimate the memory's energy efficiency in the LUT-Log-BCJR decoder, the memory
specification must be chosen according to the discussion in Section 6.2.3. Then
Equations 6.39 to 6.44 can be used to estimate the energy efficiency.
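
The memory model can be sketched as below, under clearly stated assumptions. The access rates follow Equations 6.39 to 6.41, and the power follows the voltage-scaled form of Equation 6.42, divided by f to obtain the energy per clock cycle. The function names are illustrative, the units of `p_a` and `i_l` are taken to be those of the databook reference table with f expressed in MHz, and the cycle counts in the usage example (7, 25 and 7) are assumed values for k = 1, n = 1 and m = 3 rather than figures quoted from Section 6.2.4.

```python
def access_rates(ws, wp, t_forward, t_backward, t_prebackward):
    """Equations 6.39 to 6.41: accesses per clock cycle of the three memory types."""
    total = ws * (t_forward + t_backward) + wp * t_prebackward
    return {
        "a_priori": (2 * ws + wp) / total,
        "extrinsic": ws / total,
        "metric": 4 * ws / total,
    }


def memory_power(a, f, v, p_a, i_l, v_ref=1.2):
    """Equation 6.42: average power of one memory block with accessing rate a."""
    return (v / v_ref) ** 2 * (f * v * p_a * a + v * i_l) * 1e-3


def memory_energy_per_cycle(a, f, v, p_a, i_l, v_ref=1.2):
    """Energy per clock cycle, obtained as the power divided by the clock frequency."""
    return memory_power(a, f, v, p_a, i_l, v_ref) / f


rates = access_rates(128, 24, 7, 25, 7)  # assumed cycle counts, see lead-in
for name, a in rates.items():
    # Reference values quoted in the text for the 128 x 64 bit block at 1.2 V.
    print(name, memory_energy_per_cycle(a, 400.0, 1.2, 33.65, 0.887))
```

Because the leakage term does not depend on a, memories with a low accessing rate are dominated by static energy, which is why both parameters from the reference table are needed.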
6.3.7 Interleaver
The interleaver is typically designed independently of the turbo code. As a result,
it is not possible to devise a general model for estimating the energy consumption of
the interleaver in a turbo decoder, owing to the many different types of interleaver that
can be used. However, as mentioned in Chapter 5, in the proposed architecture, the
rate at which the interleaver is required to generate addresses is very low. As a result, it is
straightforward to implement a low complexity interleaver, having an insignificant energy
consumption compared to the turbo decoder. Therefore, a less accurate estimation of
the interleaver's energy consumption does not significantly impact the overall estimation
of the proposed framework. To simplify the energy estimation of the interleaver, further
assumptions for the employed framework may be chosen. Firstly, the interleaver may be
limited to supporting only a single length. Secondly, the LTE interleaver design may be
chosen for the estimation. These assumptions allow a relatively simple energy model to
be obtained for the interleaver and are reasonable for WSN applications. The simulation
and estimation results presented in this section will show that, due to the low address
generating speed requirement in the proposed architecture, the energy consumption of
the interleaver is low and insignificant in the overall energy estimation.
The energy consumption of an LTE interleaver supporting only a single length is affected
by the interleaver length N (which is equal to the block length of the turbo code) and
the address generation rate g (which is measured in the number of addresses that are
generated per clock cycle, as determined by the controller design of Section 6.2.4). The
simulation results show that the effect of the interleaver length N on the interleaver's
energy consumption is insignificant. The simulated energy consumptions of interleavers
having different interleaver lengths, including 512, 1024, 2048 and 4096 bits, have
standard deviations of no more than 0.8% of the average results, irrespective of
the address generation rate. Therefore, the proposed energy model for the interleaver's
energy consumption is based on only the address generation rate g. The average values
of the simulated energy consumptions of interleavers having lengths of 512, 1024, 2048
and 4096 bits are shown in Figure 6.21, for various values of g. A linear fitting result
is also shown in the figure.

Figure 6.21: Energy consumptions of the interleaver with different address generation
rates g, where v = 1.2 V and f = 400 MHz.

Similarly to the modelling methods that were proposed for
the register banks and the CU, the energy consumption of the interleaver in nJ/Clock
Cycle can be estimated as

\[
E_{cyc}^{Interleaver} = \frac{v^2}{1.2^2}(0.9382g + 0.4359) \times 10^{-3}. \tag{6.45}
\]
Based on the description of Section 6.2.4, in each forward, pre-backward and backward
recursion loop of the LUT-Log-BCJR decoder, the interleaver is required to generate
one memory address. As a result, the address generation rate can be calculated using
the time consumption of the three recursion loops, according to
$$g = \frac{2w_s + w_p}{w_s\,(T_{\mathrm{forward}} + T_{\mathrm{backward}}) + w_p\,T_{\mathrm{prebackward}}}. \tag{6.46}$$
Finally, using $E_{\mathrm{cyc}}^{\mathrm{Interleaver}}$ and $T_e$, the energy efficiency of the interleaver in nJ per extrinsic
LLR can be calculated as
$$E_e^{\mathrm{Interleaver}} = E_{\mathrm{cyc}}^{\mathrm{Interleaver}} \times T_e. \tag{6.47}$$
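As an illustration, Equations 6.45 to 6.47 can be chained together in a few lines of Python. This is only a minimal sketch: the function and argument names are assumptions introduced here, and the quantities $w_s$, $w_p$, the three recursion loop times and $T_e$ are taken as inputs with the meanings given in Section 6.2.4.

```python
def interleaver_energy_per_cycle(g, v=1.2):
    """Equation 6.45: interleaver energy in nJ per clock cycle.

    g is the address generation rate (addresses per clock cycle),
    v is the supply voltage in volts (the model is fitted at 1.2 V).
    """
    return (v ** 2 / 1.2 ** 2) * (0.9382 * g + 0.4359) * 1e-3


def address_generation_rate(w_s, w_p, t_forward, t_backward, t_prebackward):
    """Equation 6.46: addresses generated per clock cycle."""
    cycles = w_s * (t_forward + t_backward) + w_p * t_prebackward
    return (2 * w_s + w_p) / cycles


def interleaver_energy_per_llr(g, t_e, v=1.2):
    """Equation 6.47: interleaver energy in nJ per extrinsic LLR.

    t_e is the average number of clock cycles per extrinsic LLR.
    """
    return interleaver_energy_per_cycle(g, v) * t_e
```

Because the voltage enters only through the factor $v^2/1.2^2$, halving the supply voltage in this model scales the estimate by one quarter.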
In Table 6.9, the same designs of Table 6.8 are chosen to demonstrate the insignificance
of the interleaver's energy consumption. The estimation results of the energy
consumed per extrinsic LLR by the LUT-Log-BCJR decoders $E_e^{\mathrm{BCJR}}$, the memories $E_e^{\mathrm{Mem}}$
and the interleavers $E_e^{\mathrm{Interleaver}}$ are given.

                               Design-I         Design-II
$E_e^{\mathrm{BCJR}}$          0.3156 nJ/LLR    0.1516 nJ/LLR
$E_e^{\mathrm{Mem}}$           0.2761 nJ/LLR    0.276 nJ/LLR
$E_e^{\mathrm{Interleaver}}$   0.0166 nJ/LLR    0.0148 nJ/LLR

Table 6.9: The estimation results of $E_e^{\mathrm{BCJR}}$, $E_e^{\mathrm{Mem}}$ and $E_e^{\mathrm{Interleaver}}$ (nJ/LLR) of the
example designs.
6.3.8 Energy estimation of the turbo decoders and results validation
Using the energy estimation models for the LUT-Log-BCJR decoder and the memories,
the energy estimation of a complete turbo decoder can be performed. As implied in
Figure 6.3, the two LUT-Log-BCJR decoders employed by a turbo decoder are typically
identical. As a result, the iterative decoding process between them can be performed by
one LUT-Log-BCJR decoder in practice. The energy consumption of a LUT-Log-BCJR
decoder is defined as $E_e^{\mathrm{BCJR}}$ in Section 6.3.5, which is the energy consumed by the
LUT-Log-BCJR decoder for decoding one extrinsic LLR. Therefore, the energy required by
the LUT-Log-BCJR decoder to decode one information bit, $E_b^{\mathrm{BCJR}}$, can be estimated
as
$$E_b^{\mathrm{BCJR}} = 2I \times E_e^{\mathrm{BCJR}}, \tag{6.48}$$
where I is the number of iterations performed in the turbo decoding process. Similarly,
$$E_b^{\mathrm{Interleaver}} = 2I \times E_e^{\mathrm{Interleaver}}. \tag{6.49}$$
To estimate the memories' energy consumption, the LLR memories in the turbo decoding
scheme of Figure 6.3 are divided into three groups. The a priori LLR memories with
indices 1 to k are defined as Group-1. The a priori LLR memories with indices k + 1 to
k + n are defined as Group-2. Finally, the extrinsic LLR memories with indices 1 to k
are defined as Group-3.
As shown in Figure 6.3, Group-1 is shared by the two LUT-Log-BCJR decoders. Regardless
of which one is working, the behaviour of Group-1 memories is exactly the
same. Therefore, the accessing rate of Group-1 memories $a_{g1}$ is the same as $a_a$, as discussed
in Section 6.3.6. This is given by
$$a_{g1} = \frac{2w_s + w_p}{w_s\,(T_{\mathrm{forward}} + T_{\mathrm{backward}}) + w_p\,T_{\mathrm{prebackward}}}. \tag{6.50}$$
Using Equation 6.50, the energy consumed by each of the k memory blocks in Group-1,
$E_e^{g1}$, can be calculated using Equation 6.44. When performing I decoding iterations,
the energy consumed per bit by Group-1 is given by
$$E_b^{g1} = 2I \times k \times E_e^{g1}. \tag{6.51}$$
For Group-2, the memories are only in use when the corresponding LUT-Log-BCJR
decoder is operating, so their accessing rate is only half of $a_a$, according to
$$a_{g2} = \frac{2w_s + w_p}{2\,\bigl(w_s\,(T_{\mathrm{forward}} + T_{\mathrm{backward}}) + w_p\,T_{\mathrm{prebackward}}\bigr)}. \tag{6.52}$$
However, there is another identical group of memories that are employed by the other
LUT-Log-BCJR decoder, as shown in Figure 6.3. The energy consumption per bit of
both groups can be estimated as
$$E_b^{g2} = 4I \times n \times E_e^{g2}. \tag{6.53}$$
The Group-3 memories not only perform as the extrinsic LLR memories for one
LUT-Log-BCJR decoder, but also perform as the a priori LLR memories for the other
LUT-Log-BCJR decoder. Therefore, their accessing rate is the average of $a_a$ and $a_e$, which is
given by
$$a_{g3} = \frac{3w_s + w_p}{2\,\bigl(w_s\,(T_{\mathrm{forward}} + T_{\mathrm{backward}}) + w_p\,T_{\mathrm{prebackward}}\bigr)}. \tag{6.54}$$
Similar to Group-2, there are two identical groups of extrinsic LLR memories corresponding
to each LUT-Log-BCJR decoder, as shown in Figure 6.3. The energy consumption
of both groups can be estimated as
$$E_b^{g3} = 4I \times k \times E_e^{g3}. \tag{6.55}$$
The situation for the metric memory is similar to that of the LUT-Log-BCJR decoder.
The energy consumption of a turbo decoder's metric memory is given by
$$E_b^{m} = 2I \times E_e^{m}, \tag{6.56}$$
where $E_e^{m}$ can be calculated using Equation 6.44.
The total energy consumption of the memories in the turbo decoder $E_b^{\mathrm{Mem}}$ is the sum
of all the memories' energy consumptions, which is given by
$$E_b^{\mathrm{Mem}} = E_b^{g1} + E_b^{g2} + E_b^{g3} + E_b^{m}. \tag{6.57}$$
The turbo decoder's total energy consumption per bit can be calculated as
$$E_b^{\mathrm{Turbo}} = E_b^{\mathrm{BCJR}} + E_b^{\mathrm{Mem}} + E_b^{\mathrm{Interleaver}}. \tag{6.58}$$
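The per-bit bookkeeping of Equations 6.48 to 6.58 can be sketched as follows. This is a minimal illustration rather than the thesis's implementation: the class and function names are assumptions, and the per-LLR energies of the memories are treated as inputs, since they come from Equation 6.44 evaluated at the accessing rates of Equations 6.50, 6.52 and 6.54.

```python
from dataclasses import dataclass


@dataclass
class PerLlrEnergies:
    """Energies in nJ per extrinsic LLR (inputs to Equations 6.48 to 6.58)."""
    bcjr: float         # E_e^BCJR, from Section 6.3.5
    interleaver: float  # E_e^Interleaver, from Equation 6.47
    g1: float           # E_e^g1, one Group-1 memory block
    g2: float           # E_e^g2, one Group-2 memory block
    g3: float           # E_e^g3, one Group-3 memory block
    metric: float       # E_e^m, the metric memory


def memory_access_rates(w_s, w_p, t_forward, t_backward, t_prebackward):
    """Equations 6.50, 6.52 and 6.54: accessing rates of Groups 1 to 3."""
    cycles = w_s * (t_forward + t_backward) + w_p * t_prebackward
    a_g1 = (2 * w_s + w_p) / cycles
    a_g2 = (2 * w_s + w_p) / (2 * cycles)
    a_g3 = (3 * w_s + w_p) / (2 * cycles)
    return a_g1, a_g2, a_g3


def turbo_energy_per_bit(e, k, n, iterations):
    """Equations 6.48, 6.49, 6.51, 6.53 and 6.55 to 6.58, in nJ/bit."""
    i = iterations
    bcjr = 2 * i * e.bcjr                 # (6.48)
    interleaver = 2 * i * e.interleaver   # (6.49)
    g1 = 2 * i * k * e.g1                 # (6.51)
    g2 = 4 * i * n * e.g2                 # (6.53)
    g3 = 4 * i * k * e.g3                 # (6.55)
    metric = 2 * i * e.metric             # (6.56)
    mem = g1 + g2 + g3 + metric           # (6.57)
    return {
        "bcjr": bcjr,
        "mem": mem,
        "interleaver": interleaver,
        "turbo": bcjr + mem + interleaver,  # (6.58)
    }
```

Keeping the three contributions separate in the returned dictionary makes it easy to see which part dominates for a given parametrisation.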
Finally, for the same designs used in Section 6.3.5, the turbo decoders' simulated energy
consumptions and the estimated energy consumptions per information bit (nJ/bit) are
compared in Table 6.10. The results show that the estimated results reach 95.1% and 92.8% of
the post-layout simulation results for Design-I and Design-II, respectively.

                     Design-I        Design-II
Simulation result    6.3955 nJ/bit   4.7686 nJ/bit
Estimation result    6.0826 nJ/bit   4.4244 nJ/bit

Table 6.10: Comparison of the estimation results and the simulation results of the
energy consumptions (nJ/bit) of the example designs.
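The agreement between estimation and simulation can be quantified directly from the four values of Table 6.10. The short sketch below computes the ratio of the estimate to the simulation result for each design; the function and variable names are assumptions introduced for illustration.

```python
def estimate_to_simulation_ratio(estimated, simulated):
    """Ratio of the estimated energy to the simulated energy (1.0 = exact)."""
    return estimated / simulated


# Values in nJ/bit, taken from Table 6.10.
table_6_10 = {
    "Design-I": {"simulation": 6.3955, "estimation": 6.0826},
    "Design-II": {"simulation": 4.7686, "estimation": 4.4244},
}

ratios = {
    name: estimate_to_simulation_ratio(v["estimation"], v["simulation"])
    for name, v in table_6_10.items()
}
```

In both cases the estimate falls below the simulation result, so the framework underestimates the energy consumption for these two designs.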
6.4 Holistic design method of turbo codes for energy-constrained
communication systems
In this section, the proposed energy estimation framework is applied to investigate the
overall energy efficiency of different turbo codes. By considering both the transmission
energy consumption $E_b^{\mathrm{tx}}$ and the decoding process energy consumption $E_b^{\mathrm{pr}}$, a holistic
turbo code design method for energy-constrained applications, such as WSNs, is
demonstrated using the proposed architecture. Compared with the conventional design
method, this method allows the turbo code to be used for the purpose of reducing the
overall energy consumption of a wireless communication system. The proposed method
focuses on the encoder and decoder design of the turbo code.
The design procedure consists of two parts. In the first part, BER simulations and
the transmission power model are used to estimate the required transmission energy
consumption $E_b^{\mathrm{tx}}$. In the second part, the energy estimation framework proposed in
Section 6.3 is used to estimate the required decoding energy consumption $E_b^{\mathrm{pr}}$. Finally,
the overall energy consumption can be obtained by combining $E_b^{\mathrm{tx}}$ and $E_b^{\mathrm{pr}}$ and used to
evaluate the energy efficiency for different turbo codes.
As mentioned in Section 6.1, the objective of the proposed design method is to determine
the particular parametrisation of the turbo code design that optimises the overall
energy consumption of the system. The component encoder of the design is specified by
the parameters k, m and n, as well as the generator polynomial. Further parameters
may be considered when variations of the turbo code are considered for the design. For
example, when multiple transmissions of the encoded bits are introduced, the coding
rate R is another parameter that should be considered individually. Furthermore, when
Multiple-Component Turbo Codes (MCTCs) [106] are considered, the number of parallel
component encoders that are employed becomes a parameter of the scheme. In addition,
the number of decoding iterations performed also affects the decoding energy consumption
$E_b^{\mathrm{pr}}$ and the required minimum transmission energy consumption $E_b^{\mathrm{tx}}$ significantly.
To present the holistic design method, a particular design example [106] is considered in
the following sections. The approach adopted here is in contrast to that of [106] where
an MCTC is designed by comparing different parameterisations of a turbo code scheme.
Using the conventional design method, the decisions in [106] were based on EXIT and
BER performance alone. In this chapter, by using the holistic design method, an overall
energy optimised design is obtained for the same scheme.
6.4.1 Decoding energy estimation
As mentioned above, a selection of different Twin-Component Turbo Codes (TCTCs)
and MCTCs were investigated in [106] using the conventional design method, based on
EXIT chart and BER analysis alone. The MCTC encoding scheme is shown in Figure 6.22.
In contrast to the TCTC, an MCTC encoder employs more than two component
encoders in parallel. As shown in Figure 6.22, in an MCTC employing h component
encoders, the uncoded sequence $a_1$ and its interleaved copies $\{a_2, a_3, \ldots, a_h\}$ are encoded
by Unity Rate Coding (URC) component encoders in order to generate the encoded
outputs $\{a_{h+1}, a_{h+2}, \ldots, a_{2h}\}$. Hence, the coding rate is given by R = 1/h. Figure 6.23
gives the corresponding MCTC decoder schematic, which comprises h corresponding
Log-BCJR decoders.
Figure 6.22: The MCTC encoder schematic [106].
Figure 6.23: The MCTC decoder schematic [106].
Moreover, owing to the lack of an energy estimation measure, computational complexity
is chosen for comparing different code designs in [106], as well as in many similar
previous publications. In [106], the computational complexity C is defined as
$$C = 2^m \times B, \tag{6.59}$$
where m is the number of memory elements in the component encoders. However,
the proposed energy estimation framework shows that the complexity measure considered
in the conventional design method does not offer fair comparisons, since it cannot
accurately represent the energy consumption of different turbo code designs.
In [106], twelve different turbo code schemes were investigated, including four MCTCs,
four systematic TCTCs and four non-systematic TCTCs, as listed in Table 6.11. For
each scheme, three different values of B were considered. Using BER simulations, the
performance of the chosen schemes was compared. An evaluation of the schemes was
then made, based on the BER results and the complexity C. The parametrisation of
the schemes chosen in [106] includes parameters k, m, n, B, the coding rate R and
the generator polynomial. All the schemes have the same block length of N = 2048
bits. Note that normally, for systematic codes, R = k/(n + n), and for non-systematic codes,
R = k/n. However, R can be further reduced by retransmitting some or all of the outputs
generated by the component encoders. Different coding rates R for the same code design
yield different performance and transmission energy consumption. In order to estimate
the decoding energy consumption of the coding schemes when employing the generalised
architecture of Section 6.2, a clock frequency of f = 400 MHz is assumed in order to
obtain the maximum throughput and a supply voltage v = 1.2 V is assumed, since this is
the standard setting of the TSMC 90 nm process technology. The decoding throughput
of the decoder s in Mb/s can be obtained as
$$s = \frac{f}{T_e \times B}, \tag{6.60}$$
where $T_e$ is the average number of clock cycles for a Log-BCJR decoder to calculate
each extrinsic LLR, as discussed in Section 6.2.4. The specifications and the estimation
results of all the chosen schemes are given in Table 6.11.

candidate   k  m  n  R    B   polynomial  E_b^pr (nJ/bit)  s (Mb/s)  C   SNR^3 (dB)
sysTCTC-1   1  3  1  1/3  3   (17,15)_o   1.825            4.003     24  -0.2
sysTCTC-1   1  3  1  1/3  6   (17,15)_o   3.65             2.001     48  -2.1
sysTCTC-1   1  3  1  1/3  12  (17,15)_o   7.299            1.001     96  -2.5
sysTCTC-2   1  3  1  1/4  3   (17,15)_o   1.825            4.003     24  -2
sysTCTC-2   1  3  1  1/4  6   (17,15)_o   3.65             2.001     48  -3.3
sysTCTC-2   1  3  1  1/4  12  (17,15)_o   7.299            1.001     96  -4
sysTCTC-3   1  3  1  1/5  3   (17,15)_o   1.825            4.003     24  -3.3
sysTCTC-3   1  3  1  1/5  6   (17,15)_o   3.65             2.001     48  -4.8
sysTCTC-3   1  3  1  1/5  12  (17,15)_o   7.299            1.001     96  -5.2
sysTCTC-4   1  3  1  1/6  3   (17,15)_o   1.825            4.003     24  -4.1
sysTCTC-4   1  3  1  1/6  6   (17,15)_o   3.65             2.001     48  -5.6
sysTCTC-4   1  3  1  1/6  12  (17,15)_o   7.299            1.001     96  -6.2
TCTC-1      1  3  1  1/3  3   (10,17)_o   1.825            4.003     24  4.1
TCTC-1      1  3  1  1/3  6   (10,17)_o   3.65             2.001     48  0.7
TCTC-1      1  3  1  1/3  12  (10,17)_o   7.299            1.001     96  -0.5
TCTC-2      1  3  1  1/4  3   (10,17)_o   1.825            4.003     24  2.6
TCTC-2      1  3  1  1/4  6   (10,17)_o   3.65             2.001     48  -1
TCTC-2      1  3  1  1/4  12  (10,17)_o   7.299            1.001     96  -1.5
TCTC-3      1  3  1  1/5  3   (10,17)_o   1.825            4.003     24  0.6
TCTC-3      1  3  1  1/5  6   (10,17)_o   3.65             2.001     48  -2.2
TCTC-3      1  3  1  1/5  12  (10,17)_o   7.299            1.001     96  -2.6
TCTC-4      1  3  1  1/6  3   (10,17)_o   1.825            4.003     24  -0.2
TCTC-4      1  3  1  1/6  6   (10,17)_o   3.65             2.001     48  -3.3
TCTC-4      1  3  1  1/6  12  (10,17)_o   7.299            1.001     96  -4.3
MCTC-1      1  2  1  1/3  6   (4,7)_o     2.655            2.274     24  1
MCTC-1      1  2  1  1/3  12  (4,7)_o     5.309            1.137     48  -2
MCTC-1      1  2  1  1/3  24  (4,7)_o     10.619           0.568     96  -3
MCTC-2      1  2  1  1/4  6   (2,3)_o     2.655            2.274     24  -3
MCTC-2      1  2  1  1/4  12  (2,3)_o     5.309            1.137     48  -4
MCTC-2      1  2  1  1/4  24  (2,3)_o     10.619           0.568     96  -4
MCTC-3      1  2  1  1/5  6   (2,3)_o     2.655            2.274     24  -4
MCTC-3      1  2  1  1/5  12  (2,3)_o     5.309            1.137     48  -5
MCTC-3      1  2  1  1/5  24  (2,3)_o     10.619           0.568     96  -6
MCTC-4      1  2  1  1/6  6   (2,3)_o     2.655            2.274     24  -4
MCTC-4      1  2  1  1/6  12  (2,3)_o     5.309            1.137     48  -6
MCTC-4      1  2  1  1/6  24  (2,3)_o     10.619           0.568     96  -7

Table 6.11: The chosen turbo code designs.
3 The required Signal-to-Noise Ratio (SNR) in dB of the chosen schemes at BER = $10^{-5}$.
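The complexity measure of Equation 6.59 and the throughput of Equation 6.60 can be cross-checked against the entries of Table 6.11 with a short sketch. The function names are assumptions. Since the value of $T_e$ for each scheme is not listed in this section, the sketch infers it from one table row by inverting Equation 6.60 and then reuses it for the other values of B.

```python
def complexity(m, b):
    """Equation 6.59: C = 2^m * B."""
    return (2 ** m) * b


def throughput_mbps(f_mhz, t_e, b):
    """Equation 6.60: decoding throughput in Mb/s for a clock in MHz."""
    return f_mhz / (t_e * b)


def cycles_per_llr(f_mhz, s_mbps, b):
    """Equation 6.60 solved for T_e (an inference, not a tabulated value)."""
    return f_mhz / (s_mbps * b)


F_MHZ = 400.0  # clock frequency assumed in Section 6.4.1

# Infer T_e from the B = 3 row of the TCTC schemes (s = 4.003 Mb/s).
t_e_tctc = cycles_per_llr(F_MHZ, 4.003, 3)

# Predict the throughput for B = 6 and B = 12 and compare with the table.
s_b6 = throughput_mbps(F_MHZ, t_e_tctc, 6)
s_b12 = throughput_mbps(F_MHZ, t_e_tctc, 12)
```

The same check shows why C alone is a coarse measure: the TCTC schemes with B = 3 and the MCTC schemes with B = 6 share C = 24, yet their tabulated decoding energies are 1.825 nJ/bit and 2.655 nJ/bit, respectively.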
6.4.2 Transmission energy estimation
In this section, the BER simulation results of the chosen schemes are used to determine
the corresponding transmission energy consumptions $E_b^{\mathrm{tx}}$ using a transmission
power model. The path loss model given in Equations 5.1 to 5.4 is used to calculate
the transmission energy consumption per information bit $E_b^{\mathrm{tx}}$. The same environment
assumptions and system specification of the target scenario of Table 5.2 are applied.
The BER simulation results of the chosen schemes [106] are given in Figure 6.24. To
evaluate the transmission energy, a maximum BER requirement must be specified based
on the target application. Here, a BER of $10^{-5}$ is assumed to be the maximum BER
that can be tolerated. The minimum SNRs required to achieve this BER for each
scheme are summarised in Table 6.11. Using this transmission energy model and the
results of Table 6.11, the transmission energy consumption $E_b^{\mathrm{tx}}$ of the chosen schemes
can be obtained as a function of the transmission distance, as shown in Figure 6.25.
Figure 6.24: The BER simulation results of the chosen schemes. © H. Chen et al.
2010 [106]
Figure 6.25: The transmission energy consumption of the chosen schemes.
6.4.3 Overall energy efficiency analysis
Based on the results of Sections 6.4.1 and 6.4.2, the overall energy consumption $E_b^{\mathrm{tx}} + E_b^{\mathrm{pr}}$
can be obtained. Figures 6.26, 6.27 and 6.28 provide the $E_b^{\mathrm{tx}} + E_b^{\mathrm{pr}}$ estimation results
of the chosen schemes at transmission ranges of d = 20, 30 and 40 m, which are typical
transmission ranges of WSNs that are deployed in buildings or small areas.
Figure 6.26: Overall energy consumption $E_b^{\mathrm{tx}} + E_b^{\mathrm{pr}}$ of the chosen schemes, when
d = 20 m.
In Figures 6.26, 6.27 and 6.28, the chosen schemes are referred to using the `scheme
name/complexity' format. The schemes are arranged in descending order of the SNR
required to achieve BER = $10^{-5}$ from left to right, according to the results given in
Table 6.11. As shown in Figures 6.26, 6.27 and 6.28, neither the required SNR nor
the complexity is correlated with the overall energy consumption. Therefore, the overall
energy efficiency of a turbo code scheme cannot be adequately analysed using the
conventional methods. However, using the proposed energy estimation framework and
path loss model, the overall energy consumption results can be obtained, as shown in
Figures 6.26, 6.27 and 6.28.
Figure 6.27: Overall energy consumption $E_b^{\mathrm{tx}} + E_b^{\mathrm{pr}}$ of the chosen schemes, when
d = 30 m.
Figure 6.28: Overall energy consumption $E_b^{\mathrm{tx}} + E_b^{\mathrm{pr}}$ of the chosen schemes, when
d = 40 m.
As shown in Figure 6.26, `sysTCTC-4/24', `sysTCTC-3/24' and `sysTCTC-2/24' schemes
have the least overall energy consumption among the candidates when d = 20 m. When
d = 30 m, `sysTCTC-4/24' and `sysTCTC-3/24' schemes have the least overall energy
consumption. Finally, `sysTCTC-4/48' and `sysTCTC-3/48' schemes have the least
overall energy consumption when d = 40 m. However, the overall energy consumptions
of the `sysTCTC-4/24' and `sysTCTC-3/24' schemes are only slightly higher than those
of the `sysTCTC-4/48' and `sysTCTC-3/48' schemes. As a result, `sysTCTC-4/24' and
`sysTCTC-3/24' can be considered to be the most energy efficient schemes for transmission
ranges between 20 m and 40 m. According to Equation 6.60, these two schemes
also have high decoding throughputs of 4.003 Mb/s. However, as shown in Figures 6.26,
6.27 and 6.28, the `sysTCTC-4/24' scheme consumes more transmission energy and less
decoding energy than the `sysTCTC-3/24' scheme. In general, this distinction does not
affect the choice between these two schemes. However, it may be helpful when there are
special constraints imposed on transmission energy consumption or onboard processing
energy consumption.
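The comparison described above amounts to ranking the candidates by $E_b^{\mathrm{tx}} + E_b^{\mathrm{pr}}$ at each transmission distance. The following minimal sketch shows that ranking step. The decoding energies are the Table 6.11 values, but the transmission energies are hypothetical placeholders chosen only to illustrate the trade-off, since the values behind Figure 6.25 are not tabulated here; all names are assumptions.

```python
from typing import List, NamedTuple


class Scheme(NamedTuple):
    name: str    # label in the 'scheme name/complexity' format
    e_pr: float  # decoding energy per bit in nJ/bit (Table 6.11)
    e_tx: float  # transmission energy per bit in nJ/bit at one distance


def overall_energy(scheme: Scheme) -> float:
    """Overall energy consumption per bit, E_b^tx + E_b^pr."""
    return scheme.e_tx + scheme.e_pr


def rank_schemes(schemes: List[Scheme]) -> List[Scheme]:
    """Return the schemes sorted from least to most overall energy."""
    return sorted(schemes, key=overall_energy)


# HYPOTHETICAL e_tx values: small at short range, larger at long range.
short_range = [
    Scheme("sysTCTC-4/24", e_pr=1.825, e_tx=2.0),
    Scheme("sysTCTC-4/48", e_pr=3.65, e_tx=1.5),
]
long_range = [
    Scheme("sysTCTC-4/24", e_pr=1.825, e_tx=8.0),
    Scheme("sysTCTC-4/48", e_pr=3.65, e_tx=6.0),
]

best_short = rank_schemes(short_range)[0].name
best_long = rank_schemes(long_range)[0].name
```

With these placeholder numbers the scheme with fewer decoding operations wins at short range, while the scheme with the lower required SNR wins once the transmission energy dominates, which is the qualitative behaviour discussed above.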
The case study of [106] offers a simple example for the purpose of demonstrating the
philosophy of the proposed holistic design method. Many details, such as the assumptions
about the environment and the WSN system specification included in the analysis, are
simplified to avoid distraction from the demonstration. The proposed design method
is capable of helping the designer to optimise a turbo code design in many different
aspects. For example, besides the basic parameters of turbo code schemes that were
considered in the example, the longest block length N of a turbo code determines the
memory requirement of the hardware implementation, which contributes a significant
part of the total decoding energy consumption, as shown in Section 6.3.6. The number
of decoding iterations performed in the decoding process has a significant effect on both
the BER performance and the decoding energy consumption. In addition, the number
of hops employed in a multi-hop network determines the average transmission range
and the sensor densities. All of these aspects directly affect the transmission and the
decoding energy consumption. As a result, the proposed design method can be used to
optimise a wide variety of related specifications for the purpose of improving the system
energy efficiency.
6.5 Conclusions
In this chapter, the LUT-Log-BCJR architecture of Chapter 5 is generalised so that it
can be reconfigured to support any encoder design. A redesigned controller is introduced
which is optimised for the general case. The proposed generalised architecture may be
reconfigured by simply changing the number of the CUs and the number of registers in
the register banks. The behaviour and the decoding time consumption of the architecture
are fully predictable based on the redesigned controller.
Secondly, an energy estimation framework based on the LUT-Log-BCJR architecture is
proposed that allows the designer to estimate the energy consumption of a turbo code
design at an early design stage. The framework fully utilises the reconfigurability and
predictability of the generalised architecture. As a result, the input of the framework is
the parameters of the encoder design of the turbo code. Using a series of equations, the
energy consumption in nJ/bit of the turbo decoder can be calculated.
Finally, using the decoding energy estimation framework and the transmission power
model used in Chapter 5, the two parts of the energy consumption that are related to the
use of turbo codes, namely the transmission energy consumption $E_b^{\mathrm{tx}}$ and the decoding
energy consumption $E_b^{\mathrm{pr}}$, can be estimated during the code design stage. This estimation
cannot be obtained by any conventional design procedure. By taking advantage of this,
a holistic design method for turbo code design in energy constrained communication
systems is presented. The presented method allows the designer to determine the parameters
of a turbo code for the purpose of optimising the overall energy efficiency at
an early design stage. The method is demonstrated by using it to compare the candidate
designs proposed in a conventional turbo code design work [106]. The results show that
the tools used in the conventional design method, including the BER performance and
computation complexity analysis, are not suitable for analysing the energy efficiency of
turbo codes. However, the proposed holistic design method is shown to achieve this
goal. The discussion in this section concludes that the proposed design method allows
the designer to optimise the parametrisation of a turbo code design in many different
aspects for the purpose of improving the system energy efficiency.
Chapter 7
Conclusions and future work
The conclusions provided in this chapter constitute an amalgam of the conclusions drawn
from each previous chapter and their logical connections. Some ideas for the possible
future work are provided thereafter.
7.1 Conclusions
Chapter 1 reviewed the communication requirements of Wireless Sensor Networks (WSNs).
The key issue for these communication systems is the limited energy resources that are
typically available on the sensor nodes. Moreover, other communication requirements,
such as transmission range, data rate, reliability, accuracy and latency, must be considered
when designing an energy efficient communication system. It is challenging
to meet all the communication requirements while maintaining a sufficient lifetime for
the sensor nodes. Turbo-like codes were then proposed for reducing the transmission
energy consumption of the sensor nodes. The advantages and the drawbacks of employing
turbo-like codes in channel coding were discussed. Based on the investigation
of Chapter 1, the objective of this thesis was stated as developing low-complexity and
energy-efficient hardware implementations of turbo codes and designing turbo codes with
the overall energy consumption considered in a holistic way. Previous contributions on
this topic were reviewed. The chapter concluded by outlining the novel contributions
of the thesis.
In Chapter 2, the background knowledge of this thesis was introduced. Firstly, the basic
principle of turbo-like codes was introduced, including the typical encoding and decoding
schemes of Serial Concatenated Convolutional Codes (SCCCs) and turbo codes.
Secondly, the principle of the Logarithmic Bahl-Cocke-Jelinek-Raviv (Log-BCJR) decoding
algorithm was briefly reviewed. The most widely used variations of the algorithm,
namely the Look-Up Table based Logarithmic Bahl-Cocke-Jelinek-Raviv (LUT-Log-BCJR)
and the Maximum Logarithmic Bahl-Cocke-Jelinek-Raviv (Max-Log-BCJR)
algorithms, were presented. Thirdly, the EXtrinsic Information Transfer (EXIT) chart
tool and its use for analysing the performance of turbo-like codes were presented. Finally,
the fixed-point number representation system for hardware design and implementation
was introduced.
In Chapter 3, a SCCC channel coding scheme is proposed for star WSNs, in order to
redistribute the energy consumption from the sensor nodes to the central node for the
purpose of extending the system's lifetime. The SCCC is an augmentation of the Institute
of Electrical and Electronics Engineers (IEEE) 802.15.4 channel coding scheme. It is
designed and synthesised using STMicroelectronics 0.12 µm technology for validation.
Assuming the implementation of the proposed design as a dedicated module in Chipcon
CC2430 sensor nodes, the analysis shows that the overall energy consumption of sensor
node transmissions can be reduced by 26.65 - 32.78%. The investigation results of this
chapter demonstrated that a significant energy saving could be obtained by adopting
sophisticated Error-Correcting Codes (ECCs) in wireless communication systems. Since
ECC encoders typically consume an insignificant amount of energy compared with the
rest of the system, a star network is naturally suited for using sophisticated ECCs to give
an overall energy saving. However, for more complicated network topologies where multi-hop
communication is involved, the situation is different. In these cases, the employment
of high complexity ECC decoders may be required on the energy-constrained sensor
nodes. The trade-off between the energy saving provided by the coding gain of the
employed ECC and the extra energy that is consumed by the ECC decoders must be
carefully investigated. In addition, the design and implementation of energy-efficient
ECC decoders becomes an important consideration when sophisticated ECCs are used
to reduce the overall energy consumption of the wireless communication system. In
the rest of this thesis, a widely used near Shannon limit ECC, the turbo code [97], is
considered for these applications.
As discussed in Chapter 1, to employ turbo codes or turbo-like codes for reducing the
overall energy consumption of a WSN, it is important to have an energy efficient implementation
of the decoder. In fixed-point hardware implementations, the
word length specifications are highly related to the hardware complexity and hence the
energy efficiency. In Chapter 4, a framework based on EXIT chart analysis is proposed to
investigate the trade-off between the word length specifications of the fixed-point representation
used in a turbo decoder and the decoder's Bit Error Rate (BER) performance.
The framework aims to provide a design approach that gives insight into the relationship
between the fixed-point specification and the decoder's performance. This is in contrast
to the conventional BER analysis solution for avoiding the energy wastage caused by employing a
redundant and excessive word length in fixed-point hardware implementations.
In Chapter 5, the turbo decoder's effect on the overall energy efficiency of a wireless
communication system is further investigated. The unsuitability of conventional turbo
decoder architectures for WSNs is discussed. Motivated by this, a low complexity and
energy-efficient LUT-Log-BCJR decoder architecture is proposed, specifically for WSN
scenarios. The implementation results show that the proposed LUT-Log-BCJR architecture
has a significantly lower energy consumption than conventional LUT-Log-BCJR
decoder architectures. Furthermore, the proposed architecture provides greater overall
energy efficiency than either the conventional LUT-Log-BCJR or Max-Log-BCJR
decoder architectures when the transmission range exceeds 39 m.
In Chapter 6, based on the comprehensive investigation of energy-efficient turbo decoder
design and implementation in the previous chapters, a holistic design method for turbo
codes is proposed. This allows the optimisation of the overall energy consumption of the
turbo code from the very start of the design. The aim of this method is to help the designer
to determine the specifications of turbo codes for energy-constrained applications,
such as WSNs. The fundamental approach of this design method is an energy estimation
framework for turbo decoders. By employing the proposed LUT-Log-BCJR architecture
of Chapter 5, a bottom-up energy estimation framework is proposed. Combining the
use of the framework, BER analysis and a path loss model of the transmission power
in wireless communication, the overall energy efficiency of a turbo coding system can
be evaluated at an early design stage. By including this investigation during the code
design procedure, the turbo code can be specifically optimised for a particular scenario
with the purpose of improving the overall energy efficiency. In this thesis, the Universal
Mobile Telecommunications System (UMTS)/Long Term Evolution (LTE) turbo
code is used throughout as an example for demonstrating the proposed analysis
methods and hardware architecture, for the purpose of giving a comparison with typical
previous work. All the proposed techniques can generally be applied to characterise any
turbo code.
7.2 Future work
This thesis has explored different aspects of energy-efficient turbo code design and implementation,
including their parameterisations, hardware architecture and design methodology.
All of the developed techniques are transferable to Low-Density Parity-Check
(LDPC) codes [98]. For example, the EXIT chart analysis based method proposed in
Chapter 4 for parameterising the fixed-point hardware implementation is applied to
LDPC codes in [196]. Since turbo codes and LDPC codes have similar logarithmic
decoding algorithms, the proposed architecture of Chapter 5 and the holistic design
method of Chapter 6 can also be applied to LDPC codes.
An LDPC decoder can be described by a Tanner graph, as shown in Figure 7.1. The
decoding process is an iterative process between the variable nodes and the check nodes.
Each edge between the variable nodes and the check nodes of Figure 7.1 represents a
two-way data exchange. During the first half of an iteration, the variable nodes generate
extrinsic Logarithmic Likelihood Ratios (LLRs) corresponding to each edge in the figure
and pass them to the check nodes. During the second half of an iteration, the check
nodes take those extrinsic LLRs as input and generate new extrinsic LLRs corresponding
to each edge in the Tanner graph. These are then passed to the variable nodes for use in
the next iteration.
Figure 7.1: The Tanner graph of an example LDPC decoder.
The output generated at each port of a variable node is obtained as
the sum of all the inputs provided at all other ports. Likewise, the output generated at
each port of each check node is obtained as the min* of all the inputs provided at all
other ports [98]. The min* operation of a Min-Sum LDPC decoding algorithm is defined
as
$$\min{}^*(\tilde{p}, \tilde{q}) = \mathrm{sign}(\tilde{p})\,\mathrm{sign}(\tilde{q})\,\min(|\tilde{p}|, |\tilde{q}|) + \log\left(1 + e^{-|\tilde{p}+\tilde{q}|}\right) - \log\left(1 + e^{-|\tilde{p}-\tilde{q}|}\right). \tag{7.1}$$
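For reference, Equation 7.1 can be evaluated in floating point as in the sketch below. The function names are assumptions. The second function computes the same quantity through the well-known hyperbolic tangent form of the check node update, $2\tanh^{-1}\left(\tanh(\tilde{p}/2)\tanh(\tilde{q}/2)\right)$, which is not part of the text above and is included only as an independent numerical cross-check.

```python
import math


def min_star(p, q):
    """Equation 7.1: exact min* of two LLRs in floating point."""
    sign = math.copysign(1.0, p) * math.copysign(1.0, q)
    return (sign * min(abs(p), abs(q))
            + math.log1p(math.exp(-abs(p + q)))
            - math.log1p(math.exp(-abs(p - q))))


def check_node_tanh(p, q):
    """Equivalent tanh form of the same operation, used as a cross-check."""
    return 2.0 * math.atanh(math.tanh(p / 2.0) * math.tanh(q / 2.0))
```

The two correction terms are bounded by log(2), so the result never deviates from the plain sign-min term by more than about 0.69.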
The fundamental reason why the proposed LUT-Log-BCJR decoder architecture can
also be transformed to be a Look-Up Table based Min-Sum (LUT-Min-Sum) LDPC
decoder is that both of these decoding algorithms consist of only Add-Compare-Select
(ACS) operations. Similar to the max* operation, the log components of the min*
operation can be realised using a Look-Up Table (LUT). According to [196], a fixed-point
representation with a 2-bit fraction part is recommended. As a result, the two correction
functions having the form of $\log(1 + e^{-\tilde{x}})$ can be implemented by a LUT, according to
\[
\log\left(1 + e^{-\tilde{x}}\right) \approx
\begin{cases}
0.75 & \text{if } \tilde{x} = 0 \\
0.5 & \text{if } \tilde{x} \in \{0.25, 0.5, 0.75\} \\
0.25 & \text{if } \tilde{x} \in \{1, 1.25, 1.5, 1.75, 2\} \\
0 & \text{otherwise}
\end{cases}. \tag{7.2}
\]
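As a quick sanity check, the LUT of Equation 7.2 can be compared against the exact correction function. The sketch below assumes non-negative inputs on a grid of 0.25 (the 2-bit fraction part) and assumes round-to-nearest quantisation of the output; both function names are hypothetical.

```python
import math

FRACTION_BITS = 2                  # 2-bit fraction part
STEP = 1 / (1 << FRACTION_BITS)    # resolution of 0.25


def correction_lut(x):
    """LUT of Equation 7.2 for a non-negative fixed-point input x."""
    if x == 0:
        return 0.75
    if x <= 0.75:
        return 0.5
    if x <= 2:
        return 0.25
    return 0.0


def quantised_exact(x):
    """log(1 + e^-x), rounded to the nearest multiple of STEP."""
    return math.floor(math.log1p(math.exp(-x)) / STEP + 0.5) * STEP


# Compare the two on every representable input from 0 to 8.
grid = [k * STEP for k in range(33)]
mismatches = [x for x in grid if correction_lut(x) != quantised_exact(x)]
```

Under these assumptions the list `mismatches` is empty, so on this grid the four LUT outputs coincide with the exact correction term rounded to the nearest multiple of 0.25.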
According to the discussion given in Chapter 5, it can be concluded that as long as an
algorithm can be decomposed into a sequence of ACS operations, the proposed archi-
tecture is capable of performing it.
[Figure 7.2: ACS unit (signals Op1, Op2, Cin, Res and MSB; load signals LoadC0 to LoadC4; 1-bit registers C0 to C4).]
[Figure 7.3: Calculation unit (data buses 1 and 2; register R1 with LoadR1 and EnR1; ALU; signals A, B, LUT, OpCode, EnM, Res, Cmp and Msb = {MSB(A), MSB(B), C0}).]
Using the same design philosophy as in Chapter 5 for designing a Calculation Unit (CU)
for the max* operation, a new CU can be specifically designed for the min* operation,
as shown in Figures 7.2 and 7.3. This new CU is based on some modifications
of Figure 5.7 and Figure 5.8. Here, MSB(X) is defined as the most significant bit of
signal X. There are four input signals and three output signals for the CU. Signals
A and B are the two operands of the target operation. The signal LUT provides all
the constant values required by the CU, including the input and output elements of
the LUT and a constant zero-valued input. The signal OpCode is the control signal
of the CU, where OpCode = {EnM, Cin, LoadC0, LoadC1, LoadC2, LoadC3, LoadC4}.
The output signal Res gives the calculation results for the addition, subtraction or min*
operation. The output signal Cmp gives the comparison results required by the min*
operation, which are stored in the 1-bit registers C1 to C4. Finally, the output signal
Msb = {C0, MSB(A), MSB(B)} is required by the controller for controlling the CU during
the min* operation, as explained below. Similar to the process of performing the max*
operation in the CU of Chapter 5, the min* operation is decomposed into a sequence of
ACS operations and performed by the proposed CU in nine clock cycles, completing one
ACS operation per clock cycle, as follows.
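1. In the first clock cycle,
\[
\mathrm{OpCode} =
\begin{cases}
1110000_2 & \text{if } \mathrm{MSB}(A) \oplus \mathrm{MSB}(B) = 0, \\
1010000_2 & \text{if } \mathrm{MSB}(A) \oplus \mathrm{MSB}(B) = 1.
\end{cases} \tag{7.3}
\]
The operation result of this clock cycle is:
\[
R1 =
\begin{cases}
A - B & \text{if } \mathrm{MSB}(A) \oplus \mathrm{MSB}(B) = 0, \\
A + B & \text{if } \mathrm{MSB}(A) \oplus \mathrm{MSB}(B) = 1,
\end{cases} \tag{7.4}
\]
\[
C0 =
\begin{cases}
\mathrm{MSB}(A - B) & \text{if } \mathrm{MSB}(A) \oplus \mathrm{MSB}(B) = 0, \\
\mathrm{MSB}(A + B) & \text{if } \mathrm{MSB}(A) \oplus \mathrm{MSB}(B) = 1.
\end{cases} \tag{7.5}
\]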
Based on the results in R1, C0 and the signs of A and B, the further operations in
the CU to perform the min* operation can be determined. Assuming a = |A| and
b = |B|, there are four possible combinations of the signs of A and B, as summarised
in Table 7.1. The three internal results of min*(A, B), namely sign(A)sign(B)min(|A|, |B|),
|A + B| and |A - B|, can be calculated based on the results on the output signal Msb,
as shown in the table. In this way, the absolute value operation can be avoided,
removing the requirement for any dedicated hardware resource for this purpose.

                  |--------- Msb ---------|
A    B    R1      MSB(A)  MSB(B)  C0       sign(A)sign(B)min(|A|,|B|)  |A + B|   |A - B|
----------------------------------------------------------------------------------------
a    b    A - B   0       0       0        b                           a+b       a-b
                                  1        a                           a+b       -(a-b)
a    -b   A + B   0       1       0        b                           a+b       a-b
                                  1        -a                          -(a+b)    a-b
-a   b    A + B   1       0       0        a                           a+b       -(a-b)
                                  1        -b                          -(a+b)    -(a-b)
-a   -b   A - B   1       1       0        -a                          -(a+b)    -(a-b)
                                  1        -b                          -(a+b)    a-b
Table 7.1: Summary of the four different possible situations of signals A and B.
2. From the second to the sixth clock cycle, the task for the CU is to perform the
comparison operations according to Equation 7.2 for the two correction terms
of Equation 7.1. Four comparisons are required in total. Similar to the proposed
LUT-Log-BCJR decoder, the comparison results are stored as 1-bit binary numbers
in the 1-bit registers C1 to C4. Based on Equation 7.1, the two absolute values |A + B|
and |A - B| are compared with the input elements of the LUT from Equation 7.2.
Which of the two is compared first depends on the value currently stored in
R1, according to Equation 7.4. Once the comparisons for the current value in R1
are completed, the other one is calculated, stored in R1 and compared with the
LUT contents.
The method used to compare an absolute value |x| with an LUT input element y
is to calculate y - |x|. The most significant bit of the result is then stored in a 1-bit
register as the comparison result. However, as with the current value in R1, the
CU can only calculate A + B or A - B without performing an absolute value operation.
In order to perform the equivalent calculation of y - |x|, the CU calculates y - x
or y + x, according to the sign of x. The sign of A + B or A - B can be obtained
from the value of Msb, as shown in Table 7.1. The controller takes Msb as an input
and generates the corresponding OpCode for the CU to perform the comparison.
The comparisons required for a 4-output LUT are given in Section 5.4.3. As a
result, two clock cycles are required to perform the comparisons for the current
value in R1. To complete the calculation of the other correction function, one clock
cycle is then used to calculate and store the value of |A + B| or |A - B|, whichever
has not been considered yet. Two more clock cycles are required to perform the
remaining two comparisons. Five clock cycles are therefore required in total to complete
the comparison task of a min* operation.
3. In the seventh clock cycle, the first component of Equation 7.1 is calculated and
stored in R1. The CU performs one of the calculations 0 + A, 0 - A, 0 + B
and 0 - B, according to Table 7.1.
4. Finally, in the eighth and ninth clock cycles, two additions are performed to add
the correction terms in Equation 7.1 to R1, according to the results stored in C1
to C4.
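The nine-cycle schedule above can be modelled in software as a rough behavioural sketch. It is not the RTL of the proposed CU. Values are integers in units of 0.25 (the 2-bit fraction part), all function names are hypothetical, and the order of the two comparisons per correction term is an assumption, since the comparison sequence of Section 5.4.3 is not reproduced here. The selection made in cycle 7 is derived directly from Equations 7.4 and 7.5.

```python
def msb(x):
    """Sign bit of a signed fixed-point value: 1 if negative, otherwise 0."""
    return 1 if x < 0 else 0


def lut_by_comparison(x, sign_of_x):
    """Correction term of Equation 7.2, in units of 0.25, from two comparisons.

    MSB(y - |x|) is evaluated as MSB(y - x) or MSB(y + x) according to the
    sign of x, so no absolute value is computed. The thresholds 3, 8 and 0
    stand for 0.75, 2 and 0; their order is an assumption of this sketch.
    """
    def greater_than(y):
        return msb(y + x) if sign_of_x else msb(y - x)

    if greater_than(3):
        return 0 if greater_than(8) else 1
    return 2 if greater_than(0) else 3


def lut_min_star(a, b):
    """LUT-based min* of two LLRs, following the nine-cycle schedule."""
    same_sign = msb(a) == msb(b)
    # Cycle 1: R1 and C0 according to Equations 7.4 and 7.5.
    r1 = a - b if same_sign else a + b
    c0 = msb(r1)
    # Cycles 2-3: two comparisons for the value in R1, whose sign is C0.
    first = lut_by_comparison(r1, c0)
    # Cycle 4: the other value; its sign always equals MSB(A).
    r1 = a + b if same_sign else a - b
    # Cycles 5-6: two comparisons for the other value.
    second = lut_by_comparison(r1, msb(a))
    corr_sum, corr_diff = (second, first) if same_sign else (first, second)
    # Cycle 7: sign(A)sign(B)min(|A|,|B|) as 0 + A, 0 - A, 0 + B or 0 - B.
    r1 = {
        (0, 0, 0): 0 + b, (0, 0, 1): 0 + a,
        (0, 1, 0): 0 + b, (0, 1, 1): 0 - a,
        (1, 0, 0): 0 + a, (1, 0, 1): 0 - b,
        (1, 1, 0): 0 - a, (1, 1, 1): 0 - b,
    }[(msb(a), msb(b), c0)]
    # Cycles 8-9: apply the two correction terms of Equation 7.1.
    r1 = r1 + corr_sum
    r1 = r1 - corr_diff
    return r1


def reference_min_star(a, b):
    """Direct evaluation of Equation 7.1 with the LUT of Equation 7.2."""
    def lut(n):
        return 3 if n == 0 else 2 if n <= 3 else 1 if n <= 8 else 0

    sign = -1 if msb(a) != msb(b) else 1
    return sign * min(abs(a), abs(b)) + lut(abs(a + b)) - lut(abs(a - b))
```

Comparing `lut_min_star` with `reference_min_star` over a range of operand pairs is a simple way of confirming that the sign-based selections reproduce the absolute values without a dedicated absolute value operation.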
Based on the proposed CU, a highly scalable parallel architecture can be realised, offering
an attractive trade-off between the decoding throughput and the energy efficiency. As
demonstrated above, the proposed CU is capable of performing the functions of both
the variable nodes and the check nodes. Ideally, a fully parallel architecture can be
implemented for the LDPC decoder, where each edge in Figure 7.1 employs a dedicated
CU for calculating the extrinsic LLR. The CUs can operate alternately as variable nodes
and check nodes during the iterative decoding process. However, for an LDPC code with a
long interleaver length, the number of CUs required may be too large, and the hardware
complexity of the decoder may become excessively high. Therefore, the trade-off between
the scale of parallelism of the architecture and the decoding throughput can be explored
in future work.
Glossary
3GPP 3rd Generation Partnership Project.
ACS Add-Compare-Select.
AMT Aeronautical Mobile Telemetry.
APP A Posteriori Probability.
ASIC Application-Specific Integrated Circuit.
AWGN Additive White Gaussian Noise.
BAN Body Area Network.
BCJR Bahl-Cocke-Jelinek-Raviv.
BER Bit Error Rate.
BPSK Binary Phase-Shift Keying.
CU Calculation Unit.
DRP Dithered Relative Prime.
DSP Digital Signal Processor.
DVB Digital Video Broadcasting.
ECC Error-Correcting Code.
ECG ElectroCardioGraphy.
EEG ElectroEncephaloGraphy.
EMG ElectroMyoGraphy.
ETSI European Telecommunications Standards Institute.
EXIT EXtrinsic Information Transfer.
FCC Federal Communications Commission.
FER Frame Error Rate.
FPGA Field-Programmable Gate Array.
FSM Finite-State Machine.
GA Genetic Algorithm.
HBC Human Body Communications.
HIHO Hard-In Hard-Out.
IC Integrated Circuit.
IEEE Institute of Electrical and Electronics Engineers.
ISM Industrial Scientific and Medical.
LBT Listen-Before-Transmit.
LDPC Low-Density Parity-Check.
LLR Logarithmic Likelihood Ratio.
Log-BCJR Logarithmic Bahl-Cocke-Jelinek-Raviv.
LTE Long Term Evolution.
LUT Look-Up Table.
LUT-Log-BCJR Look-Up Table based Logarithmic Bahl-Cocke-Jelinek-Raviv.
MAC Media Access Control.
Max-Log-BCJR Maximum Logarithmic Bahl-Cocke-Jelinek-Raviv.
MCTC Multiple-Component Turbo Code.
MEMS Micro-Electro-Mechanical Systems.
MI Mutual Information.
MICS Medical Implant Communication Service.
MIPS Million Instructions Per Second.
MUX Multiplexor Unit.
NB NarrowBand.
NLOS Non-Line Of Sight.
O-QPSK Offset Quadrature Phase-Shift Keying.
PCC Parallel Concatenated Code.
PCCC Parallel Concatenated Convolutional Code.
PDA Personal Digital Assistant.
PHY PHYsical layer.
PN Pseudo Noise.
ROM Read-Only Memory.
RSC Recursive Systematic Convolutional.
RTL Register-Transfer Level.
SCC Serial Concatenated Code.
SCCC Serial Concatenated Convolutional Code.
SIHO Soft-In Hard-Out.
SISO Soft-In Soft-Out.
SNR Signal-to-Noise Ratio.
SRAM Static Random-Access Memory.
TCTC Twin-Component Turbo Code.
TSMC Taiwan Semiconductor Manufacturing Company.
UMTS Universal Mobile Telecommunications System.
UWB Ultra-WideBand.
VA Viterbi Algorithm.
WLAN Wireless Local Area Network.
WSN Wireless Sensor Network.
Bibliography
[1] N. Sadeghi, S. Howard, S. Kasnavi, K. Iniewski, V. C. Gaudet, and C. Schlegel, "Analysis
of error control code use in ultra-low-power wireless sensor networks," in Proceedings
of International Symposium on Circuits and Systems, Island of Kos, 2006,
pp. 3558-3561.
[2] S. L. Howard, C. Schlegel, and K. Iniewski, "Error Control Coding in Low-Power
Wireless Sensor Networks: When is ECC Energy-Efficient?" EURASIP Journal
of Wireless Communications and Networking, Special Issue: CMOS RF Circuits
for Wireless Applications, vol. 2006, Arti, pp. 1-14, 2006.
[3] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, "A survey on
sensor networks," Communications Magazine, IEEE, vol. 40, no. 8, pp. 102-114,
2002.
[4] P. Corke, T. Wark, R. Jurdak, W. Hu, P. Valencia, and D. Moore, "Environmental
Wireless Sensor Networks," Proceedings of the IEEE, vol. 98, no. 11, pp. 1903-1917,
Nov. 2010.
[5] K. Stone, B. Hoenes, and T. Camp, "Hardware Platform for Wireless Geophysical
Monitoring," in 10th International Conference on Information Processing in
Sensor Networks (IPSN), Chicago, IL, USA, Apr. 2011, pp. 157-158.
[6] S. Drude, "Requirements and Applications Scenarios for Body Area Networks," in
Mobile and Wireless Communications Summit 16th IST, Budapest, 2007, pp. 1-5.
[7] S. P. Kumar, "Sensor networks: Evolution, opportunities, and challenges,"
Proceedings of the IEEE, vol. 91, no. 8, pp. 1247-1256, Aug. 2003.
[8] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, "Wireless sensor
networks: a survey," Computer Networks: The International Journal of Computer
and Telecommunications Networking, vol. 52, pp. 292-422, 2008.
[9] T. Arampatzis, J. Lygeros, and S. Manesis, "A Survey of Applications of Wireless
Sensors and Wireless Sensor Networks," in Proceedings of the IEEE International
Symposium on, Mediterranean Conference on Control and Automation Intelligent
Control. IEEE, 2005, pp. 719-724.
[10] B. O'Flynn, R. Martinez-Catala, S. Harte, C. O'Mathuna, J. Cleary, C. Slater,
F. Regan, D. Diamond, and H. Murphy, "SmartCoast: A Wireless Sensor Network
for Water Quality Monitoring," in 32nd IEEE Conference on Local Computer
Networks. IEEE, Oct. 2007, pp. 815-816.
[11] Z. Rasin and M. R. Abdullah, "Water Quality Monitoring System Using Zigbee
Based Wireless Sensor Network," International Journal of Engineering & Technology
IJET, vol. 9, no. 10, pp. 24-28, 2009.
[12] S. Kim, S. Pakzad, D. Culler, J. Demmel, G. Fenves, S. Glaser, and M. Turon,
"Health monitoring of civil infrastructures using wireless sensor networks," in
Proceedings of the 6th international conference on Information processing in sensor
networks - IPSN '07. New York, USA: ACM Press, Apr. 2007, p. 254.
[13] M. Martinelli, L. Ioriatti, F. Viani, M. Benedetti, and A. Massa, "A WSN-based
solution for precision farm purposes," in IEEE International Geoscience and Remote
Sensing Symposium. IEEE, 2009, pp. 469-472.
[14] G. Werner-Allen, K. Lorincz, J. Johnson, J. Lees, and M. Welsh, "Fidelity and yield
in a volcano monitoring sensor network," in Proceedings of the 7th USENIX
Symposium on Operating Systems Design and Implementation, Berkeley, CA,
USA, 2006, pp. 381-396.
[15] M. V. Ramesh, "Real-Time Wireless Sensor Network for Landslide Detection," in
Third International Conference on Sensor Technologies and Applications.
Athens/Glyfada, Greece: IEEE, June 2009, pp. 405-409.
[16] A. Mainwaring, D. Culler, J. Polastre, R. Szewczyk, and J. Anderson, "Wireless
sensor networks for habitat monitoring," in Proceedings of the 1st ACM international
workshop on Wireless sensor networks and applications - WSNA '02. New
York, USA: ACM Press, Sept. 2002, p. 88.
[17] R. Szewczyk, A. Mainwaring, J. Polastre, J. Anderson, and D. Culler, "An analysis
of a large scale habitat monitoring application," in Proceedings of the 2nd
international conference on Embedded networked sensor systems - SenSys '04, ser.
SenSys '04. New York, New York, USA: ACM Press, 2004, p. 214.
[18] G. Barrenetxea, F. Ingelrest, G. Schaefer, and M. Vetterli, "Wireless Sensor Networks
for Environmental Monitoring: The SensorScope Experience," in IEEE International
Zurich Seminar on Communications, Zurich, 2008, pp. 98-101.
[19] V. Potdar, A. Sharif, and E. Chang, "Wireless Sensor Networks: A Survey," 2009
International Conference on Advanced Information Networking and Applications
Workshops, pp. 636-641, May 2009.
[20] G. J. Pottie and W. J. Kaiser, "Wireless integrated network sensors," Communications
of the ACM, vol. 43, no. 5, pp. 51-58, May 2000.
[21] F. L. Lewis, "Wireless Sensor Networks," in Smart environments: technologies,
protocols, and applications. Wiley-Interscience, 2005, vol. 43, ch. 2.
[22] A.-S. Porret, T. Melly, C. C. Enz, and E. A. Vittoz, "A low-power low-voltage
transceiver architecture suitable for wireless distributed sensors network," in 2000
IEEE International Symposium on Circuits and Systems. Emerging Technologies
for the 21st Century. Proceedings (IEEE Cat No.00CH36353). Presses Polytech.
Univ. Romandes, pp. 56-59.
[23] A. Arora, R. Ramnath, E. Ertin, P. Sinha, S. Bapat, V. Naik, V. Kulathumani,
M. Sridharan, S. Kumar, N. Seddon, C. Anderson, T. Herman, N. Trivedi,
M. Nesterenko, R. Shah, S. Kulkarni, M. Aramugam, M. Gouda, D. Culler,
P. Dutta, C. Sharp, G. Tolle, M. Grimmer, B. Ferriera, and K. Parker, "ExScal:
Elements of an Extreme Scale Wireless Sensor Network," in 11th IEEE International
Conference on Embedded and Real-Time Computing Systems and Applications.
IEEE, 2005, pp. 102-108.
[24] E. Shih, S.-H. Cho, N. Ickes, R. Min, A. Sinha, A. Wang, and A. Chandrakasan,
"Physical layer driven protocol and algorithm design for energy-efficient wireless
sensor networks," in Proceedings of the 7th annual international conference on
Mobile computing and networking - MobiCom '01. New York, New York, USA:
ACM Press, July 2001, pp. 272-287.
[25] Z.-H. Long and M.-J. Gao, "Survey on network lifetime research for wireless sensor
networks," in 2009 2nd IEEE International Conference on Broadband Network &
Multimedia Technology, Oct. 2009, pp. 899-902.
[26] S. Roundy, P. K. Wright, and J. M. Rabaey, Energy Scavenging for Wireless Sensor
Networks. New York: Springer, 2004.
[27] V. Raghunathan, A. Kansal, J. Hsu, J. Friedman, and M. Srivastava, "Design
considerations for solar energy harvesting wireless embedded systems," in 4th
International Symposium on Information Processing in Sensor Networks. IEEE,
2005, pp. 457-462.
[28] S. Y. Seidel and T. S. Rappaport, "914 MHz path loss prediction models for
indoor wireless communications in multifloored buildings," IEEE Transactions on
Antennas and Propagation, vol. 40, no. 2, pp. 207-217, 1992.
[29] A. Fanimokun and J. Frolik, "Effects of natural propagation environments on
wireless sensor network coverage area," in Proceedings of the 35th Southeastern
Symposium on System Theory. Auburn, AL, U.S.: IEEE, 2003, pp. 16-20.
[30] W. Huan, "Wireless body area networks path loss characterization analysis," in
2nd International Conference on Computer Engineering and Technology. IEEE,
2010, pp. 163-164.
[31] E. Reusens, W. Joseph, G. Vermeeren, L. Martens, B. Latre, I. Moerman,
B. Braem, and C. Blondia, "Path loss models for wireless communication channel
along arm and torso: measurements and simulations," in IEEE Antennas and
Propagation International Symposium. IEEE, June 2007, pp. 345-348.
[32] A. Fort, J. Ryckaert, C. Desset, P. De Doncker, P. Wambacq, and L. Van Biesen,
"Ultra-wideband channel model for communication around the human body,"
IEEE Journal on Selected Areas in Communications, vol. 24, no. 4, pp. 927-933,
Apr. 2006.
[33] K. Sayrafian-Pour, W.-B. Yang, J. Hagedorn, J. Terrill, and K. Y. Yazdandoost,
"A statistical path loss model for medical implant communication channels," in
2009 IEEE 20th International Symposium on Personal, Indoor and Mobile Radio
Communications. Tokyo, Japan: IEEE, Sept. 2009, pp. 2995-2999.
[34] V. Z. Groza, D. Makrakis, D. C. Petriu, N. D. Georganas, and E. M. Petriu,
"Sensor-based information appliances," IEEE Instrumentation & Measurement
Magazine, vol. 3, no. 4, pp. 31-35, 2000.
[35] M. Demirbas, "Wireless sensor networks for monitoring of large public buildings,"
University at Buffalo, Tech. Rep, 2005.
[36] T. G. Zimmerman, "Personal Area Networks: Nearfield intrabody communication,"
IBM System Journal, vol. 35, pp. 609-617, 1996.
[37] B. Zhen, H. Li, and R. Kohno, "IEEE Body Area Networks for Medical Applications,"
in Wireless Communication Systems, 2007. ISWCS 2007. 4th International
Symposium on, 2007.
[38] M. A. Hanson, H. C. Powell, A. T. Barth, K. Ringgenberg, B. H. Calhoun, J. H.
Aylor, and J. Lach, "Body Area Sensor Networks: Challenges and Opportunities,"
Computer, vol. 42, no. 1, pp. 58-65, Jan. 2009.
[39] H. Li, K. Takizawa, B. Zhen, and R. Kohno, "Body Area Network and Its
Standardization at IEEE 802.15.MBAN," in Mobile and Wireless Communications
Summit, 2007. 16th IST, 2007, pp. 1-5.
[40] J. A. D. Moutinho, "Wireless Body Area Network," 2011.
[41] J. Ryckaert, P. De Doncker, R. Meys, A. De Le Hoye, and S. Donnay, "Channel
model for wireless communication around human body," Electronics Letters,
vol. 40, no. 9, p. 543, 2004.
[42] N. F. Timmons and W. G. Scanlon, "Analysis of the performance of IEEE 802.15.4
for medical sensor body area networking," in First Annual IEEE Communications
Society Conference on Sensor and Ad Hoc Communications and Networks. IEEE,
2004, pp. 16-24.
[43] J. Rousselot, A. El-Hoiydi, and J.-D. Decotignie, "Performance Evaluation of the
IEEE 802.15.4a UWB Physical Layer for Body Area Networks," in Computers and
Communications, 2007. ISCC 2007. 12th IEEE Symposium on, 2007.
[44] D. Domenicali and M.-G. D. Benedetto, "Performance Analysis for a Body Area
Network Composed of IEEE 802.15.4a Devices," in Proceedings of 4th Workshop
on Positioning, Navigation and Communication 2007 (WPNC'07), Hannover,
Germany, 2007, pp. 273-276.
[45] Q. Zhang, P. Feng, Z. Geng, X. Yan, and N. Wu, "A 2.4-GHz Energy-Efficient
Transmitter for Wireless Medical Applications," IEEE Transactions on Biomedical
Circuits and Systems, vol. 5, no. 1, pp. 39-47, Feb. 2011.
[46] J. Grosinger and M. Fischer, "Indoor on-body channel measurements at 900MHz,"
in 2011 IEEE-APS Topical Conference on Antennas and Propagation in Wireless
Communications. IEEE, Sept. 2011, pp. 1037-1040.
[47] J. Masuch and M. Delgado-Restituto, "A 350 µW 2.3 GHz integer-N frequency
synthesizer for body area network applications," in 2011 IEEE 11th Topical Meeting
on Silicon Monolithic Integrated Circuits in RF Systems. IEEE, Jan. 2011, pp.
105-108.
[48] D. Kurup, W. Joseph, E. Tanghe, G. Vermeeren, and L. Martens, "Extraction
of antenna gain from path loss model for in-body communication," Electronics
Letters, vol. 47, no. 23, p. 1262, 2011.
[49] S. L. Cotton, A. McKernan, A. J. Ali, and W. G. Scanlon, "An experimental study
on the impact of human body shadowing in off-body communications channels
at 2.45 GHz," in Proceedings of the 5th European Conference on Antennas and
Propagation (EUCAP), 2011, pp. 3133-3137.
[50] V. De Santis and M. Feliziani, "Intra-body channel characterization of medical
implant devices," in EMC Europe 2011 York, 2011, pp. 816-819.
[51] D. Kurup, W. Joseph, G. Vermeeren, and L. Martens, "In-body Path Loss Model
for Homogeneous Human Tissues," IEEE Transactions on Electromagnetic
Compatibility, no. 99, pp. 1-9, 2011.
[52] "Revision of part 15 of the commission's rules regarding ultra-wideband
transmission systems: First report and order," Federal Communications Commission,
Washington DC, Tech. Rep., 2002.
[53] J. Ryckaert, C. Desset, V. De Heyn, M. Badaroglu, P. Wambacq, G. V. der Plas,
and B. V. Poucke, "Ultra-WideBand Transmitter for Wireless Body Area Networks,"
in Proceeding on 14th IST Mobile & Wireless Communications Summit,
2005.
[54] T. S. P. See, J. Y. Hee, C. T. Ong, L. C. Ong, and Z. N. Chen, "Inter-body channel
model for UWB communications," in 3rd European Conference on Antennas and
Propagation, 2009, pp. 3519-3522.
[55] Q. Wang, T. Tayamachi, I. Kimura, and J.-Q. Wang, "An On-Body Channel Model
for UWB Body Area Communications for Various Postures," IEEE Transactions
on Antennas and Propagation, vol. 57, no. 4, pp. 991-998, Apr. 2009.
[56] L. Betancur, N. Cardona, A. Navarro, and L. Traver, "A statistical channel model
for on body Area networks in Ultra Wide Band Communications," in 2009 IEEE
Latin-American Conference on Communications. IEEE, Sept. 2009, pp. 1-6.
[57] K. Y. Yazdandoost and K. Hamaguchi, "Very small UWB antenna for WBAN
applications," in 2011 5th International Symposium on Medical Information and
Communication Technology. IEEE, Mar. 2011, pp. 70-73.
[58] M. L. R. Fox, H. Symons, S. Berson, and H. Westphal, "FCC proposes rules for
body area networks (MBAN)," 2009.
[59] K. S. Kwak, S. Ullah, and N. Ullah, "An overview of IEEE 802.15.6 standard," in
2010 3rd International Symposium on Applied Sciences in Biomedical and
Communication Technologies (ISABEL 2010). IEEE, Nov. 2010, pp. 1-6.
[60] M. Patel and J. Wang, "Applications, challenges, and prospective in emerging
body area networking technologies," Wireless Communications, IEEE, vol. 17,
no. 1, pp. 80-88, 2010.
[61] Y.-Q. Zhang, Y. Shakhsheer, A. T. Barth, H. C. P. Jr., S. A. Ridenour, M. A.
Hanson, J. Lach, and B. H. Calhoun, "Energy Efficient Design for Body Sensor
Nodes," Journal of Low Power Electronics and Applications, vol. 1, no. 1, pp.
109-130, 2011.
[62] S. Ullah and K. S. Kwak, "Throughput and delay limits of IEEE 802.15.6," in
2011 IEEE Wireless Communications and Networking Conference. IEEE, Mar.
2011, pp. 174-178.
[63] V. M. Jones, R. G. A. Bults, D. Konstantas, and P. A. M. Vierhout, "Healthcare
PANs: Personal Area Networks for trauma care and home care," in 4th International
Symposium on Wireless Personal Multimedia Communications (WPMC),
2001, pp. 1369-1374.
[64] M. Soini, J. Nummela, P. Oksa, L. Ukkonen, and L. Sydänheimo, "Wireless Body
Area Network for Hip Rehabilitation System," Ubiquitous Computing and
Communication Journal, vol. 3, p. 7, 2008.
[65] L. Huang, M. Ashouei, R. F. Yazicioglu, J. Penders, R. J. M. Vullers, G. Dolmans,
P. Merken, J. Huisken, H. De Groot, C. V. Hoof, and B. Gyselinckx, "Ultra-Low
Power Sensor Design for Wireless Body Area Networks - Challenges, Potential
Solutions, and Applications," Journal of Digital Content Technology and its
Applications, vol. 3, no. 3, pp. 136-148, 2009.
[66] B. Latré, I. Moerman, B. Dhoedt, and P. Demeester, "Networking in wireless body
area networks," in 5th FTW PHD Symposium, Interactive poster session, 2004,
p. 113.
[67] C. K. Singh and A. Kumar, "Performance evaluation of an IEEE 802.15.4 sensor
network with a star topology," Wireless Networks, vol. 14, no. 4, pp. 543-568,
2008.
[68] S. Choi, S. Song, K. Sohn, H. Kim, J. Kim, J. Yoo, and H. Yoo, "A Low-power
Star-topology Body Area Network Controller for Periodic Data Monitoring Around
and Inside the Human Body," in 10th IEEE International Symposium on Wearable
Computers, 2006, pp. 139-140.
[69] R. G. Maunder, A. S. Weddell, G. V. Merrett, B. M. Al-Hashimi, and L. Hanzo,
"Iterative Decoding for Redistributing Energy Consumption in Wireless Sensor
Networks," in Proceedings of the IEEE Int Conf on Computer Communications
and Networks. IEEE, 2008, pp. 623-628.
[70] A. G. Ruzzelli, R. Jurdak, G. M. P. O'Hare, and P. V. D. Stok, "Energy-efficient
multi-hop medical sensor networking," in Proceedings of the 1st ACM SIGMOBILE
international workshop on Systems and networking support for healthcare
and assisted living environments, 2007, pp. 37-42.
[71] B. Latre, B. Braem, I. Moerman, C. Blondia, E. Reusens, W. Joseph, and
P. Demeester, "A Low-delay Protocol for Multihop Wireless Body Area Networks," in 4th
Annual International Conference on Mobile and Ubiquitous Systems: Networking
& Services, 2007, pp. 1-8.
[72] J. Heidemann and D. Estrin, "An energy-efficient MAC protocol for wireless sensor
networks," in Proceedings of Twenty-First Annual Joint Conference of the IEEE
Computer and Communications Societies. IEEE, 2002, pp. 1567-1576.
[73] T. V. Dam and K. Langendoen, "An adaptive energy-efficient MAC protocol for
wireless sensor networks," in Proceedings of the First international conference on
Embedded networked sensor systems - SenSys. New York, New York, USA: ACM
Press, Nov. 2003, p. 171.
[74] E. J. Coyle, "An energy efficient hierarchical clustering algorithm for wireless sensor
networks," in Twenty-second Annual Joint Conference of the IEEE Computer
and Communications Societies, vol. 3. IEEE, 2003, pp. 1713-1723.
[75] D.-H. Nam and H.-K. Min, "An Energy-Efficient Clustering Using a Round-Robin
Method in a Wireless Sensor Network," in 5th ACIS International Conference on
Software Engineering Research, Management & Applications. IEEE, Aug. 2007,
pp. 54-60.
[76] L. Kyounghwa, L. Joohyun, L. Hyeopgeon, and S. Yongtae, "A Density and Distance
based Cluster Head Selection algorithm in Sensor Networks," in The 12th
International Conference on Advanced Communication Technology (ICACT), 2010,
pp. 162-165.
[77] M. C. M. Thein and T. Thein, "An Energy Efficient Cluster-Head Selection for
Wireless Sensor Networks," in International Conference on Intelligent Systems,
Modelling and Simulation. IEEE, Jan. 2010, pp. 287-291.
[78] M. Cardei and M. Thai, "Energy-efficient target coverage in wireless sensor networks,"
in Proceedings of 24th Annual Joint Conference of the IEEE Computer
and Communications Societies., vol. 3. IEEE, 2005, pp. 1976-1984.
[79] A. Wang and A. Chandrakasan, "Energy-efficient DSPs for wireless sensor networks,"
IEEE Signal Processing Magazine, vol. 19, no. 4, pp. 68-78, July 2002.
[80] X.-H. Li, "Energy efficient wireless sensor networks with transmission diversity,"
Electronics Letters, vol. 39, no. 24, p. 1753, 2003.
[81] O. C. Omeni, O. Eljamaly, and A. J. Burdett, "Energy Efficient Medium Access
Protocol for Wireless Medical Body Area Sensor Networks," in 4th IEEE/EMBS
International Summer School and Symposium on Medical Devices and Biosensors.
IEEE, Aug. 2007, pp. 29-32.
[82] C. C. Enz, A. El-Hoiydi, J.-D. Decotignie, and V. Peiris, "WiseNET: an ultralow-power
wireless sensor network solution," Computer, vol. 37, no. 8, pp. 62-70, Aug.
2004.
[83] C. Schurgers and M. B. Srivastava, "Energy efficient routing in wireless sensor networks,"
in Proceedings Communications for Network-Centric Operations: Creating
the Information Force, vol. 1. IEEE, 2001, pp. 357-361.
[84] H. Oh and K. Chae, "An Energy-Efficient Sensor Routing with low latency, scalability
in Wireless Sensor Networks," in International Conference on Multimedia
and Ubiquitous Engineering. IEEE, 2007, pp. 147-152.
[85] M. Zhang, S.-P. Wang, C. Liu, and H.-B. Feng, "An Novel Energy-Efficient Minimum
Routing Algorithm (EEMR) in Wireless Sensor Networks," in 4th International
Conference on Wireless Communications, Networking and Mobile Computing.
IEEE, Oct. 2008, pp. 1-4.
[86] W. N. W. Muhamad, N. F. Naim, N. Hussin, N. Wahab, N. A. Aziz, S. S. Sarnin,
and R. Mohamad, "Maximizing Network Lifetime with Energy Efficient Routing
Protocol for Wireless Sensor Networks," in Fifth International Conference on
MEMS NANO, and Smart Systems. IEEE, 2009, pp. 225-228.
[87] X. Chen, R. Blum, and B. M. Sadler, "A new scheme for energy-efficient estimation
in a sensor network," in 43rd Annual Conference on Information Sciences and
Systems. IEEE, Mar. 2009, pp. 799-804.
[88] M. C. Vuran and I. F. Akyildiz, "Error Control in Wireless Sensor Networks: A
Cross Layer Analysis," IEEE/ACM Transactions on Networking, vol. 17, no. 4,
pp. 1186-1199, Aug. 2009.
[89] M. E. Pellenz, R. D. Souza, and M. S. P. Fonseca, "Error control coding in wireless
sensor networks," Telecommunication Systems, vol. 44, no. 1-2, pp. 61-68, 2009.
[90] A. Brokalakis and I. Papaefstathiou, "Using hardware-based forward error correction
to reduce the overall energy consumption of WSNs," in 2012 IEEE Wireless
Communications and Networking Conference (WCNC). IEEE, Apr. 2012, pp.
2191-2196.
[91] J. Abouei, J. D. Brown, K. N. Plataniotis, and S. Pasupathy, "On the Energy
Efficiency of LT Codes in Proactive Wireless Sensor Networks," IEEE Transactions
on Signal Processing, vol. 59, no. 3, pp. 1116-1127, Mar. 2011.
[92] A. Brokalakis, G.-G. Mplemenos, K. Papadopoulos, and I. Papaefstathiou,
"RESENSE: An Innovative, Reconfigurable, Powerful and Energy Efficient WSN
Node," in 2011 IEEE International Conference on Communications (ICC). IEEE,
June 2011, pp. 1-5.
[93] J. Singh and D. Pesch, "Towards Energy Efficient Adaptive Error Control in Indoor
WSN: A Fuzzy Logic Based Approach," in 2011 IEEE Eighth International
Conference on Mobile Ad-Hoc and Sensor Systems. IEEE, Oct. 2011, pp. 63-68.
[94] J. S. Rahhal, "LDPC coding for MIMO wireless sensor networks with clustering,"
in 2012 Second International Conference on Digital Information and Communication
Technology and it's Applications (DICTAP). IEEE, May 2012, pp. 58-61.
[95] J. Misic, "Enforcing Patient Privacy in Healthcare WSNs Using ECC Implemented
on 802.15.4 Beacon Enabled Clusters," in Pervasive Computing and Communications,
2008. PerCom 2008. Sixth Annual IEEE International Conference on, 2008.
[96] M. R. Yuce, "Implementation of Body Area Networks Based on MICS/WMTS
Medical Bands for Healthcare Systems," in IEEE Engineering in Medicine and
Biology Society Conference (IEEE EMBC08), Aug. 2008, pp. 3417-3421.
[97] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon Limit Error Correcting
Coding and Decoding: Turbo Codes," in Proceedings of the IEEE International
Conference on Communications, vol. 2, Geneva, Switzerland, 1993, pp.
1064-1070.
[98] R. Gallager, "Low-density parity-check codes," IEEE Transactions on Information
Theory, vol. 8, no. 1, pp. 21-28, Jan. 1962.
[99] M. Petrova, J. Riihijarvi, P. Mahonen, and S. Labella, "Performance study of
IEEE 802.15.4 using measurements and simulations," in Proceeding of the IEEE
Wireless Communications and Networking Conference. IEEE, 2006, pp. 487-492.
[100] S. Ten Brink, "Convergence Behavior of Iteratively Decoded Parallel Concatenated
Codes," IEEE Transactions on Communications, vol. 49, no. 10, pp. 1727-1737,
2001.
[101] S. Benedetto and G. Montorsi, "Serial concatenated of block and convolutional
codes," Electronics Letters, vol. 32, no. 10, pp. 887-888, 1996.
[102] "Wireless Medium Access Control (MAC) and Physical Layer (PHY) Specifications
for Low-Rate Wireless Personal Area Networks (WPANs)," 2006.
[103] "3GPP LTE Turbo Reference Design," Altera Corporation, Tech. Rep., 2011.
[104] "Digital Video Broadcasting (DVB); Framing Structure, channel coding and modulation
for Satellite Services to Handheld devices (SH) below 3 GHz," 2010.
[105] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes
for minimizing symbol error rate," IEEE Transactions on Information Theory,
vol. 20, no. 3, pp. 284-287, 1974.
[106] H. Chen, R. G. Maunder, and L. Hanzo, "An Exit-Chart Aided Design Procedure
for Near-Capacity N-Component Parallel Concatenated Codes," in Proceedings of
the IEEE Global Telecommunications Conference GLOBECOM. Miami, Florida,
US: IEEE, Dec. 2010, pp. 1-5.
[107] G. D. J. Forney, "Concatenated Codes," Massachusetts Institute of Technology
Research Lab of Electronics, Tech. Rep., 1966.
[108] I. S. Reed and G. Solomon, "Polynomial Codes Over Certain Finite Fields," SIAM
Journal of Applied Math, vol. 8, pp. 300-304, 1960.
[109] P. Elias, "Coding for noisy channels," in IRE Convention Record Pt. 4, 1955, p. 37.
[110] J. H. Yuen, M. K. Simon, W. Miller, F. Pollara, C. R. Ryan, D. Divsalar, and
J. C. Morakis, "Modulation and coding for satellite and space communications,"
in Proceedings of the IEEE, vol. 78, no. 7, 1990, pp. 1250-1265.
[111] A. J. Viterbi, "Error bounds for convolutional codes and an asymptotically opti-
mum decoding algorithm," IEEE Transactions on Information Theory, vol. IT-13,
pp. 493–497, 1967.
[112] E. Boutillon, C. Douillard, and G. Montorsi, "Iterative Decoding of Concatenated
Convolutional Codes: Implementation Issues," in Proceedings of the IEEE, vol. 95,
no. 6, 2007, pp. 1201–1227.
[113] B. Sklar, Fundamentals of Turbo Codes, Digital Communications: Fundamentals
and Applications, Second Edition. Prentice-Hall, 2001.
[114] C. Berrou and A. Glavieux, "Near Optimum Error Correcting Coding and De-
coding: Turbo-Codes," IEEE Trans. on Communications, vol. 44, no. 10, pp.
1261–1271, 1996.
[115] C. Schlegel and L. Perez, Trellis and Turbo Coding, ser. IEEE Press Series on
Digital & Mobile Communication, J. B. Anderson, Ed. John Wiley & Sons, 2004.
[116] Universal Mobile Telecommunications System (UMTS); Multiplexing and Channel
Coding (FDD), European Telecommunications Standards Institute Std., 1999.
[117] C. Weiss, C. Bettstetter, and S. Riedel, "Code construction and decoding of par-
allel concatenated tail-biting codes," IEEE Transactions on Information Theory,
vol. 47, no. 10, pp. 366–386, 2001.
[118] P. Robertson, E. Villebrun, and P. Hoeher, "A comparison of optimal and Sub-
optimal MAP decoding algorithms operating in the log domain," in Proceedings
of IEEE International Conference of Communication, vol. 2, Seattle, WA, USA,
1995, pp. 1009–1013.
[119] J. Sayir, "Measuring EXIT Charts for Low Complexity Decoders," in International
Symposium on Communication Theory and Applications, Ambleside, UK, 2009.
[120] B. Krishnamachari and C. S. Raghavendra, "Performance evaluation of the IEEE
802.15.4 MAC for low-rate low-power wireless networks," in IEEE International
Conference on Performance, Computing, and Communications. IEEE, 2004, pp.
701–706.
[121] J. Hoert, K. Klues, and O. Orjih, \Conguring the IEEE 802.15.4 MAC Layer
for Single-sinkWireless Sensor Network Applications," Washington University, St.
Louis, Missouri, Tech. Rep., 2005.
[122] "A True System-on-Chip Solution for 2.4 GHz IEEE 802.15.4 / ZigBee(TM)
Datasheet," Chipcon, Tech. Rep., 2007.
[123] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learn-
ing. Reading, MA, USA: Addison-Wesley, Oct. 1989.
[124] S. Crozier and P. Guinand, "High-performance low-memory interleaver banks for
turbo-codes," in Proceedings of IEEE 54th Vehicular Technology Conference. VTC
Fall, vol. 4. IEEE, 2001, pp. 2394–2398.
[125] D. Divsalar, S. Dolinar, and F. Pollara, "Serial Concatenated Trellis Coded Modu-
lation with Rate-1 Inner Code," in Proceedings of GLOBECOM, San Francisco,
2000, pp. 777–782.
[126] S. Benedetto and G. Montorsi, "Iterative decoding of serially concatenated convo-
lutional codes," Electronics Letters, vol. 32, no. 13, pp. 1186–1188, 1996.
[127] T. Okuma, Y. Cao, M. Muroyama, and H. Yasuura, "Reducing access energy
of on-chip data memory considering active data bitwidth," in Proceedings of the
International Symposium on Low Power Electronics and Design. IEEE, 2002, pp.
88–91.
[128] Z.-F. Wang and Q.-W. Li, "Very Low-Complexity Hardware Interleaver for Turbo
Decoding," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 54,
no. 7, pp. 636–640, July 2007.
[129] J.-H. Kim and I.-C. Park, "Double-Binary Circular Turbo Decoding Based on
Border Metric Encoding," IEEE Transactions on Circuits and Systems II: Express
Briefs, vol. 55, no. 1, pp. 79–83, Jan. 2008.
[130] M. Martina, M. Nicola, and G. Masera, "A Flexible UMTS-WiMax Turbo Decoder
Architecture," IEEE Transactions on Circuits and Systems II: Express Briefs,
vol. 55, no. 4, pp. 369–373, Apr. 2008.
[131] R. Dobkin, M. Peleg, and R. Ginosar, "Parallel interleaver design and VLSI archi-
tecture for low-latency MAP turbo decoders," IEEE Transactions on Very Large
Scale Integration (VLSI) Systems, vol. 13, no. 4, pp. 427–438, Apr. 2005.
[132] M. Kakitani, G. Brante, R. D. Souza, and A. Munaretto, "Comparing the energy
efficiency of single-hop, multi-hop and incremental decode-and-forward in multi-
relay wireless sensor networks," in 2011 IEEE 22nd International Symposium on
Personal, Indoor and Mobile Radio Communications. IEEE, Sept. 2011, pp.
970–974.
[133] H. Michel and N. Wehn, "Turbo-Decoder Quantization for UMTS," IEEE Com-
munication Letters, vol. 5, no. 2, pp. 55–57, 2001.
[134] M. A. Castellon, I. J. Fair, and D. G. Elliott, "Fixed-point turbo decoder imple-
mentation suitable for embedded applications," in Proceedings of the Canadian
Conference on Electrical and Computer Engineering, Saskatoon, Canada, 2005,
pp. 1065–1068.
[135] J. Hsu and C. Wang, "On finite-precision implementation of a decoder for turbo
codes," in Proceedings of the IEEE International Symposium on Circuits and Sys-
tems, vol. 4, Orlando, FL, USA, 1999, pp. 423–426.
[136] A. Worm, H. Michel, F. Gilbert, G. Kreiselmaier, M. Thul, and N. Wehn, "Ad-
vanced Implementation Issues of Turbo-Decoders," in Proceedings of the Interna-
tional Symposium on Turbo-Codes and Related Topics, Brest, France, Sept. 2000,
pp. 351–354.
[137] T. K. Blankenship and B. Classon, "Fixed-point performance of low-complexity
turbo decoding algorithms," in Proceedings of the IEEE Vehicular Technology
Conference, vol. 2, Rhodes, Greece, 2001, pp. 1483–1487.
[138] G. Montorsi and S. Benedetto, "Design of Fixed-Point Iterative Decoders for Con-
catenated Codes with Interleavers," in IEEE Journal on Selected Areas in Com-
munications, vol. 2, San Francisco, CA, USA, 2000, pp. 801–806.
[139] Y. Wu, B. D. Woerner, and T. K. Blankenship, "Data width requirements in SISO
decoding with modulo normalization," IEEE Transactions on Communications,
vol. 49, no. 11, pp. 1861–1868, 2001.
[140] R. Hoshyar, A. R. S. Bahai, and R. Tafazolli, "Finite precision turbo decoding," in
Proceedings of the International Symposium on Turbo Codes and Related Topics,
Brest, France, 2003, pp. 483–486.
[141] A. Morales-Cortes, R. Parra-Michel, L. F. Gonzalez-Perez, and T. G. Cervantes,
"Finite Precision Analysis of the 3GPP Standard Turbo Decoder for Fixed-Point
Implementation in FPGA Devices," in Proceedings of the International Conference
on Reconfigurable Computing and FPGAs, Cancun, Mexico, 2008, pp. 43–48.
[142] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, "A Soft-Input Soft-Output
APP Module for Iterative Decoding of Concatenated Codes," IEEE Communica-
tions Letters, vol. 1, no. 1, pp. 22–24, 1997.
[143] V. Singh, "Elimination of overflow oscillations in fixed-point state-space digital
filters using saturation arithmetic," IEEE Transactions on Circuits and Systems,
vol. 37, no. 6, pp. 814–818, 1990.
[144] D. A. Balley and A. A. Beer, "Simulation of filter structures for fixed-point imple-
mentation," in Proceedings of the 28th Southeastern Symposium on System Theory,
Baton Rouge, LA, USA, 1996, pp. 270–274.
[145] G. Masera, Turbo Code Applications: a journey from a paper to realization,
K. Sripimanwat, Ed. Springer Netherlands, 2005.
[146] A. Hekstra, "An Alternative to Metric Rescaling in Viterbi Decoders," IEEE Trans-
actions on Communications, vol. 37, no. 11, pp. 1220–1222, 1989.
[147] B. Riaz and J. Bajcsy, "Impact of Finite Precision Arithmetics on EXIT Chart
Analysis of Turbo Codes," in Proceedings of the IEEE Consumer Communications
and Networking Conference, Las Vegas, NV, USA, 2008.
[148] S. Dolinar, D. Divsalar, and F. Pollara, "Turbo code performance as a function of
code block size," in Information Theory 1998 Proceedings 1998 IEEE International
Symposium on, 1998, pp. 32–.
[149] P. Robertson, P. Hoeher, and E. Villebrun, "Optimal and Sub-Optimal Maximum
A Posteriori Algorithms Suitable for Turbo Decoding," European Transactions on
Telecommunications, vol. 8, no. 2, pp. 119–125, 1997.
[150] M. C. Valenti and J. Sun, "The UMTS turbo Code and an Efficient Decoder
Implementation Suitable for Software-Defined Radios," International Journal of
Wireless Information Networks, vol. 8, no. 4, pp. 203–215, 2001.
[151] G. Masera, G. Piccinini, M. R. Roch, and M. Zamboni, "VLSI Architectures for
Turbo Codes," IEEE Transactions on Very Large Scale Integration (VLSI) Sys-
tems, vol. 7, no. 3, pp. 369–379, 1999.
[152] E. Yeo, P. Pakzad, B. Nikolic, and V. Anantharam, "VLSI Architectures for Itera-
tive Decoders in Magnetic Recording Channels," IEEE Transactions on Magnetics,
vol. 37, no. 2, pp. 748–754, 2001.
[153] T. Miyauchi, K. Yamamoto, T. Yokokawa, M. Kan, Y. Mizutani, and M. Hattori,
"High-performance programmable SISO decoder VLSI implementation for decod-
ing turbo codes," in Global Telecommunications Conference, vol. 1, San Antonio,
TX, USA, 2001, pp. 305–309.
[154] M. A. Bickerstaff, D. Garrett, T. Prokop, C. Thomas, B. Widdup, G. Zhou, L. M.
Davis, G. Woodward, C. Nicol, and R.-H. Yan, "A unified turbo/Viterbi channel
decoder for 3GPP mobile wireless in 0.18-µm CMOS," IEEE Journal of Solid-State
Circuits, vol. 37, no. 11, pp. 1555–1564, 2002.
[155] G. Masera, M. Mazza, G. Piccinini, F. Viglione, and M. Zamboni, "Architectural
Strategies for Low-Power VLSI Turbo Decoders," IEEE Transactions on Very
Large Scale Integration (VLSI) Systems, vol. 10, no. 3, pp. 279–285, 2002.
[156] M. Bickerstaff, L. Davis, C. Thomas, D. Garrett, and C. Nicol, "A 24Mb/s radix-4
Log-MAP turbo decoder for 3GPP-HSDPA mobile wireless," in IEEE Interna-
tional Solid-State Circuits Conference, vol. 1, 2003, pp. 150–484.
[157] Y. Zhang and K. K. Parhi, "High-throughput radix-4 logMAP turbo decoder ar-
chitecture," in Asilomar conference on Signals, System and Computers, Pacific
Grove, CA, USA, 2006, pp. 1711–1715.
[158] F.-M. Li, C.-H. Lin, and A.-Y. Wu, "Unified Convolutional/Turbo Decoder Design
Using Tile-Based Timing Analysis of VA/MAP Kernel," IEEE Transactions on
Very Large Scale Integration (VLSI) Systems, vol. 16, no. 10, pp. 1063–8210, 2008.
[159] M. May, T. Ilnseher, N. Wehn, and W. Raab, "A 150Mbit/s 3GPP LTE Turbo
Code Decoder," in Design, Automation & Test in Europe Conference & Exhibition
(DATE), Dresden, Germany, 2010, pp. 1420–1425.
[160] C. Studer, C. Benkeser, S. Belfanti, and Q. Huang, "Design and Implementation
of a Parallel Turbo-Decoder ASIC for 3GPP-LTE," IEEE Journal of Solid-State
Circuits, vol. 46, pp. 8–17, 2011.
[161] I. Ahmed and T. Arslan, "VLSI Design of Multi Standard Turbo Decoder for 3G
and Beyond," in Asia and South Pacific Design Automation Conference, Yoko-
hama, Japan, 2007, pp. 589–594.
[162] C.-H. Lin, C.-Y. Chen, and A.-Y. Wu, "High-throughput 12-mode CTC decoder
for WiMAX standard," in IEEE International Symposium on VLSI Design, Au-
tomation and Test, Hsinchu, Taiwan, 2008, pp. 216–219.
[163] C. Wong, Y. Lee, and H. Chang, "A 188-size 2.1mm^2 Reconfigurable Turbo De-
coder Chip with Parallel Architecture for 3GPP LTE System," in 2009 Symposium
on VLSI Circuits, Kyoto, Japan, 2009, pp. 288–289.
[164] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, "Soft-Output Decoding
Algorithms in Iterative Decoding of Turbo Codes," TDA Progress Report 42-124,
Tech. Rep., 1996.
[165] W. J. Gross and P. G. Gulak, "Simplified MAP algorithm suitable for implemen-
tation of turbo decoders," Electronics Letters, vol. 34, no. 16, p. 1577, 1998.
[166] J. Vogt and A. Finger, "Improving the Max-Log-MAP turbo decoder," Electronics
Letters, vol. 36, no. 23, p. 1937, 2000.
[167] L. Hanzo, T. H. Liew, B. L. Yeap, R. Tee, and S. X. Ng, Turbo Coding, Turbo
Equalisation and Space-Time Coding. John Wiley & Sons Inc, 2011.
[168] L. Hanzo, J. P. Woodard, and P. Robertson, "Turbo decoding and detection for
wireless applications," in Proceedings of the IEEE, vol. 95, no. 6, 2007, pp. 1178–
1200.
[169] C. Schurgers, F. Catthoor, and M. Engels, "Memory Optimization of MAP Turbo
Decoder Algorithms," IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, vol. 9, no. 2, pp. 305–312, 2001.
[170] J. A. Erfanian, S. Pasupathy, and G. Gulak, "Reduced complexity symbol de-
tectors with parallel structures," in IEEE Global Telecommunications Conference,
San Diego, CA, 1990, p. 704.
[171] A. J. Viterbi, "An Intuitive Justification and a Simplified Implementation of the
MAP Decoder for Convolutional Codes," IEEE Journal on Selected Areas in Com-
munications, vol. 16, no. 2, pp. 162–264, 1998.
[172] R. Dobkin, M. Peleg, and R. Ginosar, "Parallel VLSI architecture for MAP turbo
decoder," in The 13th IEEE International Symposium on Personal, Indoor and
Mobile Radio Communications, vol. 1, 2002, pp. 384–388.
[173] S. Devadas and S. Malik, "A survey of optimization techniques targeting low
power VLSI circuits," in Proceedings of the 32nd ACM/IEEE conference on Design
automation conference - DAC '95. New York, New York, USA: ACM Press, Jan.
1995, pp. 242–247.
[174] A. Ghosh, S. Devadas, K. Keutzer, and J. White, "Estimation of average switching
activity in combinational and sequential circuits," in Proceedings 29th ACM/IEEE
Design Automation Conference. IEEE Comput. Soc. Press, 1992, pp. 253–259.
[175] T. Karnik, S. Borkar, and V. De, "Sub-90nm technologies: challenges and oppor-
tunities for CAD," in Proceedings of the 2002 IEEE/ACM international conference
on Computer-aided design - ICCAD '02. New York, New York, USA: ACM Press,
Nov. 2002, pp. 203–206.
[176] C. Benkeser, A. Burg, T. Cupaiuolo, and Q. Huang, "Design and Optimization of
an HSDPA Turbo Decoder ASIC," IEEE Journal of Solid-State Circuits, vol. 44,
no. 1, pp. 98–106, 2009.
[177] J.-H. Kim and I.-C. Park, "A unified parallel radix-4 turbo decoder for mobile
WiMAX and 3GPP-LTE," in IEEE Custom Integrated Circuits Conference, San
Jose, CA, 2009, pp. 487–490.
[178] S.-G. Lee, C.-H. Wang, and W.-H. Sheen, "Architecture Design of QPP Interleaver
for Parallel Turbo Decoding," in IEEE Vehicular Technology Conference, Taipei,
Taiwan, 2010, pp. 1–5.
[179] Y. Sun and J. R. Cavallaro, "Efficient Hardware Implementation of A Highly-
Parallel 3GPP LTE, LTE-Advance Turbo Decoder," Integration, the VLSI Jour-
nal, vol. 44, no. 1, pp. 1–11, 2010.
[180] K. Nakano, M. Sengoku, K. Mase, and S. Shinoda, "Network structure of multi-
hop radio networks," in 2000 26th Annual Conference of the IEEE Industrial
Electronics Society. IECON 2000. 2000 IEEE International Conference on In-
dustrial Electronics, Control and Instrumentation. 21st Century Technologies and
Industrial Opportunities (Cat. No.00CH37141), vol. 2. IEEE, pp. 1159–1164.
[181] K. M. Buyuksahin and F. N. Najm, "Early power estimation for VLSI circuits,"
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems,
vol. 24, no. 7, pp. 1076–1088, July 2005.
[182] F. N. Najm, "A survey of power estimation techniques in VLSI circuits," IEEE
Transactions on Very Large Scale Integration (VLSI) Systems, vol. 2, no. 4, pp.
446–455, Dec. 1994.
[183] K. D. Muller-Glaser, K. Kirsch, and K. Neusinger, "Estimating essential design
characteristics to support project planning for ASIC design management," in IEEE
International Conference on Computer-Aided Design Digest of Technical Papers.
IEEE Comput. Soc. Press, 1991, pp. 148–151.
[184] M. Nemani and F. N. Najm, "High-level area and power estimation for VLSI
circuits," IEEE Transactions on Computer-Aided Design of Integrated Circuits
and Systems, vol. 18, no. 6, pp. 697–713, June 1999.
[185] A. Raghunathan, S. Dey, and N. K. Jha, "Register-transfer level estimation tech-
niques for switching activity and power consumption," in Proceedings of Interna-
tional Conference on Computer Aided Design. IEEE Comput. Soc. Press, 1996,
pp. 158–165.
[186] A. Raghunathan, S. Dey, and N. Jha, "High-level macro-modeling and estimation
techniques for switching activity and power consumption," IEEE Transactions on
Very Large Scale Integration (VLSI) Systems, vol. 11, no. 4, pp. 538–557, Aug.
2003.
[187] M. Pedram, "Power simulation and estimation in VLSI circuits," in The VLSI
handbook, W.-K. Chen, Ed. Boca Raton, FL, USA: The CRC Press and the
IEEE Press, 1999.
[188] C. Svensson, "Power consumption estimation in CMOS VLSI chips," IEEE Jour-
nal of Solid-State Circuits, vol. 29, no. 6, pp. 663–670, June 1994.
[189] "TSMC 90nm Core Library Databook," 2005.
[190] P. Surti and L.-F. Chao, "Controller power estimation using information from
behavioral description," in 1996 IEEE International Symposium on Circuits and
Systems. Circuits and Systems Connecting the World. ISCAS 96, vol. 4. IEEE,
pp. 679–682.
[191] D.-F. Zhao, Y.-P. Wu, and N.-N. Tong, "The Applied Research of Convolutional
Turbo Code Based on WiMAX Protocol," in 4th International Conference on
Wireless Communications, Networking and Mobile Computing. IEEE, Oct. 2008,
pp. 1–3.
[192] Q. Li and N. S. Ramesh, "Channel coding performance in cdma2000 systems," in
IEEE Emerging Technologies Symposium on Broadband, Wireless Internet Access.
Digest of Papers (Cat. No.00EX414). IEEE, 2000, p. 5.
[193] X.-M. Yu, Y.-M. Kang, and D.-F. Yuan, "Performance analysis of turbo codes in
wireless rician fading channel with low rician factor," in IEEE 12th International
Conference on Communication Technology. IEEE, Nov. 2010, pp. 48–51.
[194] D. Divsalar and F. Pollara, "Turbo Codes for Deep-Space Communications," Tech.
Rep., 1995.
[195] "TSMC 90nm Low Power High Density Synchronous Single Port with Redundancy
SRAM Compiler Databook," 2007.
[196] X. Zuo, R. G. Maunder, and L. Hanzo, "Design of Fixed-Point Processing Based
LDPC Codes Using EXIT Charts," in Proceedings of IEEE Vehicular Technology
Conference, San Francisco, CA, USA, 2011.
Author Index
Abdullah, M. R. 3
Ahmed, I. 97
Akyildiz, I. F. 1, 2, 4, 6, 8, 19, 51
Al-Hashimi, B. M. 17, 19, 52, 54–56, 62, 63
Ali, A. J. 14
Anantharam, V. 92
Anderson, C. 8
Anderson, J. 3, 5, 8
Arampatzis, Th. 2
Aramugam, M. 8
Arora, A. 8
Arslan, T. 97
Ashouei, M. 16
Aylor, J. H. 13
Aziz, N. A. 19
Badaroglu, M. 14
Bahai, A. R. S. 66, 70, 73–75, 89
Bahl, L. R. 23, 32, 39
Bajcsy, J. 72
Balley, D. A. 68
Bapat, S. 8
Barrenetxea, G. 5, 92
Barth, A. T. xiii, 13, 15
Beer, A. A. 68
Belfanti, S. xiv, 94, 97, 113–116
Benedetti, M. 3, 5
Benedetto, M.-G. D. 14
Benedetto, S. 22, 27, 33, 53, 66, 70, 71, 73–77,
89, 98
Benkeser, C. xiv, 94, 97, 105, 113–116
Berrou, C. 20, 27, 29, 31, 99
Berson, S. 14
Betancur, L. 14
Bettstetter, C. 34
Bickersta, M. 92, 94, 102, 114
Bickersta, M. A. 92, 94, 114
Blankenship, T. K. 66, 68{71, 73{75, 89
Blondia, C. 9, 16, 17
Blum, R. 19
Borkar, S. 102
Boutillon, E. 28, 32–34, 66
Braem, B. 9, 16, 17
Brante, G. 64, 117
Bults, R. G. A. 16
Burdett, A. J. 19, 51
Burg, A. 105
Buyuksahin, K. M. 118
Calhoun, B. H. xiii, 13, 15
Camp, T. 2, 3
Cao, Y. 56
Cardei, M. 19
Cardona, N. 14
Castellon, M. A. 66, 70, 73–76, 89
Catthoor, F. 99, 121
Cavallaro, J. R. 112
Cayirci, E. 1, 2, 4, 6, 8, 51
Cervantes, T. G. 66, 70, 73–76, 89
Chae, K. 19
Chandrakasan, A. 8, 19
Chang, E. xiii, 4, 9, 18
Chang, H. 97
Chao, L.-F. 141
Chen, C.-Y. 97
Chen, H. xv, 24, 120, 145, 152–154, 156, 158,
159
Chen, X. 19
Chen, Z. N. 14
Cho, S.-H. 8
Choi, S. 17
Classon, B. 66, 70, 73–75, 89
Cleary, J. 3
Cocke, J. 23, 32, 39
Corke, P. 2, 3, 5, 7, 8, 92
Cotton, S. L. 14
Coyle, E. J. 19
Crozier, S. 53, 56
Culler, D. 3, 5, 6, 8
Cupaiuolo, T. 105
Davis, L. 92, 94, 102, 114
Davis, L. M. 92, 94, 114
De Doncker, P. 9, 14, 16
De Groot, H. 16
De Heyn, V. 14
De Le Hoye, A. 14
De Santis, V. 14
De, V. 102
Decotignie, J.-D. 14, 19
Delgado-Restituto, M. 14
Demeester, P. 17
Demmel, J. 3, 6
der Plas, G. V. 14
Dermibas, M. 11, 18
Desset, C. 9, 14, 16
Devadas, S. 102
Dey, S. 119, 141
Dhoedt, B. 17
Diamond, D. 3
Divsalar, D. 28, 53, 66, 98, 145
Dobkin, R. 60, 102
Dolinar, S 73
Dolmans, G. 16
Domenicali, D. 14
Donnay, S. 14
Douillard, C. 28, 32–34, 66
Drude, S. 2, 12, 13, 15–17
Dutta, P. 8
El-Hoiydi, A. 14, 19
Elias, P. 28
Eljamaly, O. 19, 51
Elliott, D. G. 66, 70, 73–76, 89
Engels, M. 99, 121
Enz, C. C. 7, 19
Erfanian, J. A. 101
Ertin, E. 8
Estrin, D. 19
Fair, I. J. 66, 70, 73–76, 89
Fanimokun, A. 9, 10
Feliziani, M. 14
Feng, H.-B. 19
Feng, Peng 14
Fenves, G. 3, 6
Ferriera, B. 8
Finger, A. 98
Fischer, M. 14
Fonseca, M. S. P. 19, 20, 23
Forney, G. D. Jr 27
Fort, A. 9, 16
Fox, M. L. R. 14
Friedman, J. 9
Frolik, J. 9, 10
Gallager, R. 20, 33, 163
Gao, M.-J. 8
Garrett, D. 92, 94, 102, 114
Gaudet, K. I. V. C. iii, 9, 11, 19, 20, 23, 65,
93, 113
Geng, Zhiqing 14
Georganas, N. D. 10
Ghosh, A. 102
Gilbert, F. 66, 68–71, 73–75
Ginosar, R. 60, 102
Glaser, S. 3, 6
Glavieux, A. 20, 27, 29, 31, 99
Goldberg, D. E. 52, 59
Gonzalez-Perez, L. F. 66, 70, 73–76, 89
Gouda, M. 8
Grimmer, M. 8
Grosinger, J. 14
Gross, W. J. 98
Groza, V. Z. 10
Guinand, P. 53, 56
Gulak, G. 101
Gulak, P. G. 98
Gyselinckx, B. 16
Hagedorn, J. 9, 16
Hamaguchi, K. 14
Hanson, M. A. 13
Hanson, Mark A. xiii, 15
Hanzo, L. xv, 17, 19, 24, 52, 54–56, 62, 63, 99,
120, 145, 152–154, 156, 158, 159, 163, 164
Harte, S. 3
Hattori, M. 92, 105
Hee, J. Y. 14
Heidemann, J. 19
Hekstra, A. 69
Herman, T. 8
Hoeher, P. 39, 73, 93, 99
Hoenes, B. 2, 3
Hoert, J. 51
Hoof, C. V. 16
Hoshyar, R. 66, 70, 73–75, 89
Howard, S. iii, 9, 11, 19, 20, 23, 65, 93, 113
Howard, S. L. iii, 9, 11, 19, 20, 23, 65, 93, 113
Hsu, J. 9, 66, 70, 73–75, 89
Hu, W. 2, 3, 5, 7, 8, 92
Huan, W. 9, 16
Huang, L. 16
Huang, Q. xiv, 94, 97, 105, 113–116
Huisken, J. 16
Hussin, N. 19
Hyeopgeon, L. 19
Ickes, N. 8
Ilnseher, T. 94, 97, 114
Ingelrest, F. 5, 92
Iniewski, K. iii, 9, 11, 19, 20, 23, 65, 93, 113
Ioriatti, L. 3, 5
Jelinek, F. 23, 32, 39
Jha, N. K. 119, 141
Jha, N.K. 119
Johnson, J. 3, 5, 8
Jones, V. M. 16
Joohyun, L. 19
Joseph, W. 9, 14, 16, 17
Jr., H. C. P. xiii, 15
Jurdak, R. 2, 3, 5, 7, 8, 17, 92
Kaiser, W. J. 6
Kakitani, M.T. 64, 117
Kan, M. 92, 105
Kang, Y.-M. 145
Kansal, A. 9
Karnik, T. 102
Kasnavi, S. iii, 9, 11, 19, 20, 23, 65, 93, 113
Keutzer, K. 102
Kim, H. 17
Kim, J. 17
Kim, J.-H. 56, 105
Kim, S. 3, 6
Kimura, I. 14
Kirsch, K. 119
Klues, K. 51
Kohno, R. 12, 13, 15–17
Konstantas, D. 16
Kreiselmaier, G. 66, 68–71, 73–75
Krishnamachari, B. 51
Kulathumani, V. 8
Kulkarni, S. 8
Kumar, A. 17
Kumar, S. 8
Kumar, S. P. 2
Kurup, D. 14
Kwak, K. S. 14, 16
Kyounghwa, L. 19
Labella, S. 20
Lach, J. xiii, 13, 15
Langendoen, K. 19
Latre, B. 9, 16, 17
Lee, S.-G. 112
Lee, Y. 97
Lees, J. 3, 5, 8
Lewis, F. L. 6
Li, F.-M. 94, 114
Li, H. 12, 13, 15–17
Li, L. 19
Li, Q. 145
Li, Q.-W. 56
Li, X.-H. 19
Liew, T. H. 99
Lin, C.-H. 94, 97, 114
Liu, C. 19
Long, Z.-H. 8
Lorincz, K. 3, 5, 8
Lygeros, J. 2
Mahonen, P. 20
Mainwaring, A. 3, 5, 8
Makrakis, D. 10
Malik, S. 102
Manesis, S. 2
Martens, L. 9, 14, 16
Martina, M. 56
Martinelli, M. 3, 5
Martinez-Catala, R. 3
Mase, K. 117
Masera, G. 56, 68, 69, 73, 83, 92, 105
Massa, A. 3, 5
Masuch, J. 14
Maunder, R. G 163, 164
May, M. 94, 97, 114
Mazza, M. 92
McKernan, A. 14
Melly, T. 7
Merken, P. 16
Merrett, G. V. 17, 52, 54–56, 62, 63
Meys, R. 14
Michel, H. 66, 68–71, 73–75, 77, 89
Miller, W. 28
Min, H.-K. 19
Min, R. 8
Misic, J. 18, 20
Miyauchi, T. 92, 105
Mizutani, Y. 92, 105
Moerman, I. 9, 16, 17
Mohamad, R. 19
Montorsi, G. 22, 27, 28, 32–34, 53, 66, 70, 71,
73–77, 89, 98
Moore, D. 2, 3, 5, 7, 8, 92
Morakis, J. C. 28
Morales-Cortes, A. 66, 70, 73–76, 89
Moutinho, J. A. D. 13, 17
Muhamad, W. N. W. 19
Muller-Glaser, K. D. 119
Munaretto, A. 64, 117
Muroyama, M. 56
Murphy, H. 3
Naik, V. 8
Naim, N. F. 19
Najm, F. N. 118, 119
Nakano, K. 117
Nam, D.-H. 19
Navarro, A. 14
Nemani, M. 119
Nesterenko, M. 8
Neusinger, K. 119
Ng, S. X. 99
Nicol, C. 92, 94, 102, 114
Nicola, M. 56
Nikolic, B. 92
Nummela, J. 16
O'Flynn, B. 3
Oh, H. 19
O'Hare, G. M. P. 17
Oksa, P. 16
Okuma, T. 56
O'Mathuna, C. 3
Omeni, O. C. 19, 51
Ong, C. T. 14
Ong, L. C. 14
Orjih, O. 51
Pakzad, P. 92
Pakzad, S. 3, 6
Parhi, K. K. 92
Park, I.-C. 56, 105
Parker, K. 8
Parra-Michel, R. 66, 70, 73–76, 89
Pasupathy, S. 101
Patel, M. 15, 16
Pedram, M. 119
Peiris, V. 19
Peleg, M. 60, 102
Pellenz, M. E. 19, 20, 23
Penders, J. 16
Perez, L. xiii, 29, 30, 32
Petriu, D. C. 10
Petriu, E. M. 10
Petrova, M. 20
Piccinini, G. 92, 105
Polastre, J. 3, 5, 8
Pollara, F. 28, 53, 66, 98, 145
Porret, A.-S. 7
Potdar, V. xiii, 4, 9, 18
Pottie, G. J. 6
Poucke, B. V. 14
Powell, H. C. 13
Prokop, T. 92, 94, 114
Raab, W. 94, 97, 114
Rabaey, J. M. 8
Raghavendra, C. S. 51
Raghunathan, A. 119, 141
Raghunathan, V. 9
Ramesh, M. V. 3, 5
Ramesh, N. S. 145
Ramnath, R. 8
Rappaport, T. S. 9, 11
Rasin, Z. 3
Raviv, J. 23, 32, 39
Reed, I. S. 28
Regan, F. 3
Reusens, E. 9, 16, 17
Riaz, B. 72
Ridenour, S. A. xiii, 15
Riedel, S. 34
Riihijarvi, J. 20
Ringgenberg, K. 13
Robertson, P. 39, 73, 93, 99
Roch, M. R. 92, 105
Roundy, S. 8
Rousselot, J. 14
Ruzzelli, A. G. 17
Ryan, C. R. 28
Ryckaert, J. 9, 14, 16
Sadeghi, N. iii, 9, 11, 19, 20, 23, 65, 93, 113
Sadler, B. M. 19
Sankarasubramaniam, Y. 1, 2, 4, 6, 8, 51
Sarnin, S. S. 19
Sayir, J. 43
Sayraan-Pour, K. 9, 16
Scanlon, W. G. 14
Schaefer, G. 5, 92
Schlegel, C. iii, xiii, 9, 11, 19, 20, 23, 29, 30,
32, 65, 93, 113
Schurgers, C. 99, 121
Seddon, N. 8
See, T. S. P. 14
Seidel, S. Y. 9, 11
Sengoku, M. 117
Shah, R. 8
Shakhsheer, Y. xiii, 15
Sharif, A. xiii, 4, 9, 18
Sharp, C. 8
Sheen, W.-H. 112
Shih, E. 8
Shinoda, S. 117
Simon, M. K. 28
Singh, C. K. 17
Singh, V. 68
Sinha, A. 8
Sinha, P. 8
Sklar, B. 29
Slater, C. 3
Sohn, K. 17
Soini, M. 16
Solomon, G. 28
Song, S. 17
Souza, R. D. 19, 20, 23, 64, 117
Sridharan, M. 8
Srivastava, M. 9
Srivastava, M. B. 19
Stok, P. V. D. 17
Stone, K. 2, 3
Studer, C. xiv, 94, 97, 113–116
Su, W. 1, 2, 4, 6, 8, 51
Sun, J. 73
Sun, Y. 112
Surti, P. 141
Svensson, C. 132
Syd anheimo, L. 16
Symons, H. 14
Szewczyk, R. 3, 5, 8
Tafazolli, R. 66, 70, 73–75, 89
Takizawa, K. 13, 15
Tanghe, E. 14
Tayamachi, T. 14
Tee, R. 99
Ten Brink, S 21, 43, 71
Terrill, J. 9, 16
Thai, M.T. 19
Thein, M. C. M. 19
Thein, T. 19
Thitimajshima, P. 20, 27, 29, 31, 99
Thomas, C. 92, 94, 102, 114
Thul, M. 66, 68–71, 73–75
Timmons, N. F. 14
Tolle, G. 8
Tong, N.-N. 145
Traver, L. 14
Trivedi, N. 8
Turon, M. 3, 6
Ukkonen, L. 16
Ullah, N. 14
Ullah, S. 14, 16
Valencia, P. 2, 3, 5, 7, 8, 92
Valenti, M. C. 73
Van Biesen, L. 9, 16
Van Dam, T. 19
Vermeeren, G. 9, 14, 16
Vetterli, M. 5, 92
Viani, F. 3, 5
Vierhout, P. A. M. 16
Viglione, F. 92
Villebrun, E. 39, 73, 93, 99
Viterbi, A. J. 28, 101
Vittoz, E. A. 7
Vogt, J. 98
Vullers, R. J. M. 16
Vuran, M. C. 19
Wahab, N. 19
Wambacq, P. 9, 14, 16
Wang, A. 8, 19
Wang, C. 66, 70, 73–75, 89
Wang, C.-H. 112
Wang, J. 15, 16
Wang, J.-Q. 14
Wang, Q. 14
Wang, S.-P. 19
Wang, Z.-F. 56
Wark, T. 2, 3, 5, 7, 8, 92
Weddell, A. S. 17, 52, 54–56, 62, 63
Wehn, N. 66, 68–71, 73–75, 77, 89, 94, 97, 114
Weiss, C. 34
Welsh, M. 3, 5, 8
Werner-allen, G. 3, 5, 8
Westphal, H. 14
White, J. 102
Widdup, B. 92, 94, 114
Woerner, B. D. 66, 68, 69, 71
Wong, C. 97
Woodard, J. P. 99
Woodward, G. 92, 94, 114
Worm, A. 66, 68–71, 73–75
Wright, P. K. 8
Wu, A.-Y. 94, 97, 114
Wu, Nanjian 14
Wu, Y. 66, 68, 69, 71
Wu, Y.-P. 145
Yamamoto, K. 92, 105
Yan, R.-H. 92, 94, 114
Yan, Xiaozhou 14
Yang, W.-B. 9, 16
Yasuura, H. 56
Yazdandoost, K. Y. 9, 14, 16
Yazicioglu, R. F. 16
Yeap, B. L. 99
Yeo, E. 92
Yokokawa, T. 92, 105
Yongtae, S. 19
Yoo, H. 17
Yoo, J. 17
Yu, X.-M. 145
Yuan, D.-F. 145
Yuce, M. R. 18
Yuen, J. H. 28
Zamboni, M. 92, 105
Zhang, M. 19
Zhang, Qi 14
Zhang, Y. 92
Zhang, Y.-Q. xiii, 15
Zhao, D.-F. 145
Zhen, B. 12, 13, 15–17
Zhou, G. 92, 94, 114
Zimmerman, T. G. 12
Zuo, X. 163, 164
Index
3GPP, 23, 33, 72, 94
ACS, 25, 39, 105–111, 118, 122, 123, 125,
127, 134, 166, 167
AMT, 13
APP, 31
ASIC, 23, 25, 27, 46, 49, 52, 55, 59, 66,
94, 104, 118, 120, 143
AWGN, 29, 43, 53, 66, 71, 73, 91
BAN, 3, 11–18, 20, 21
BCJR, 31, 32, 34, 39, 43, 73, 74, 100
BER, 15, 24, 25, 29, 30, 42, 46, 48, 65,
66, 70–76, 79, 88–90, 93, 95, 97,
100, 116, 120, 122, 153–155, 157,
158, 160, 161, 164, 165
BPSK, 29, 33, 66, 71, 73, 91
CU, 106, 107, 110–113, 122, 124, 125,
128–130, 132, 134, 136–141, 143,
160, 166–169
DRP, 53, 56–58
DSP, 18, 66
DVB, 23, 94, 99
ECC, 19–22, 27, 32, 48, 52, 64, 93, 164
ECG, 11, 14, 15
EEG, 11, 14, 15
EMG, 14, 15
ETSI, 6
EXIT, 22, 24, 25, 27, 43–46, 48, 49, 65,
71–76, 79, 80, 82–85, 88–91, 120,
154, 164, 165
FCC, 13
FER, 15, 46, 52–54, 63, 66, 70–72
FPGA, 66
FSM, 33, 35, 36
GA, 52, 59
HBC, 14
HIHO, 28
IC, 21, 133
IEEE, 12–14, 20, 22, 24, 51–54, 60, 63,
164
ISM, 6, 13
LBT, 18
LDPC, 20, 32, 165, 166, 169
LLR, 28, 31, 32, 37, 38, 41–43, 66, 75–
77, 79–83, 89–91, 101, 103, 107,
111–114, 122, 124, 126–130, 132,
139, 143, 148, 151, 152, 157, 166,
169
Log-BCJR, 32, 33, 36–39, 41–43, 74, 75,
125, 155, 157, 163
LTE, 23, 33–35, 39, 43, 45, 48, 66, 68–
70, 72–75, 82, 84–87, 90, 91, 94,
95, 98–100, 107, 112–114, 116,
118, 144, 147, 149, 150, 165
LUT, 39, 66, 67, 73, 76, 90, 105, 107,
109, 110, 124, 166–169
LUT-Log-BCJR, 24, 25, 39, 48, 49, 64,
66–70, 73–76, 82–84, 90, 94, 95,
98–101, 103–108, 111–114, 116–
118, 122, 124, 126, 127, 132, 133,
136, 138, 144, 147, 150–152, 160,
163, 165, 166, 168
MAC, 17
Max-Log-BCJR, 39, 73–76, 85, 95, 98–
100, 106, 107, 114, 116–118, 163,
165
MCTC, 154, 155
MEMS, 2, 8
MI, 43–46
MICS, 20
MIPS, 5
MUX, 125, 126, 134, 136
NB, 14
NLOS, 16
O-QPSK, 53, 58, 59, 62
PCC, 27, 29
PCCC, 33, 48
PHY, 14, 24, 51–56, 60, 63
PN, 53, 56–60
ROM, 56, 59, 61, 62
RSC, 31, 33, 36, 37, 39
RTL, 144
SCC, 27, 28, 32
SCCC, 22–24, 32, 34, 48, 51, 119, 163,
164
SIHO, 28, 32
SISO, 29, 31, 32, 38, 48
SNR, 43, 65, 72, 73, 76, 77, 85, 87, 88,
94, 97, 155, 157, 158
SRAM, 114, 148
TCTC, 154, 155
TSMC, 112, 114, 133{136, 144, 148, 157
UMTS, 33–35, 39, 43, 45, 48, 66, 68–70,
72–75, 82, 84–87, 90, 91, 98, 165
UWB, 13, 14
VA, 30
WLAN, 5, 10, 20
WSN, 1–13, 16–25, 51, 52, 63–65, 91,
93–95, 99, 114, 116–120, 149, 153,
158, 160, 163–165