Bit-by-bit iterative decoding expedites the convergence of Repeat Accumulate decoders by Chen, Jinhui et al.
1Bit-by-Bit Iterative Decoding Expedites the
Convergence of Repeat Accumulate Decoders
Jinhui Chen, Wenbo Zhang, Robert G. Maunder and Lajos Hanzo
School of ECS, University of Southampton, SO17 1BJ, United Kingdom.
Email: {jc9e12, wz4g11, rm, lh}@ecs.soton.ac.uk, http://www-mobile.ecs.soton.ac.uk
Abstract—In this paper, we propose bit-by-bit iterative de-
coding for expediting the convergence of Repeat Accumulate
(RA) decoders. In a conventional RA decoder, the repeat and
accumulate component decoders are operated iteratively, in order
to facilitate near-capacity communication. However, whenever
one decoder is activated, the other is kept idle. The outputs
of the active component decoder are stored until its operation
is completed, whereupon the outputs are forwarded to the other
decoder all at once and the activation of the decoders is swapped.
The proposed bit-by-bit RA decoder expedites this process by
allowing both component decoders to operate simultaneously,
continuously exchanging outputs without buffering. We present
both EXtrinsic Information Transfer (EXIT) charts and Bit
Error Ratio (BER) results, which demonstrate that the proposed
bit-by-bit RA decoder requires fewer decoding iterations to
converge, at the cost of a slightly increased complexity per
decoding iteration. Overall, we demonstrate that in a range
of practical scenarios, the proposed bit-by-bit RA offers gains
of up to 0.86 dB, without imposing any additional decoding
complexity and without requiring any additional transmission-
energy, -bandwidth or -duration.
Index Terms—Repeat accumulate code, channel coding, itera-
tive decoding, EXIT chart, decoding convergence.
I. INTRODUCTION
REPEAT Accumulate (RA) codes [1] have been proposedas a class of low-complexity “turbo-like” channel codes,
which facilitate near-capacity communication. An RA encoder
may be considered to be a special case of a self-concatenated
convolutional encoder [2], [3], which comprises a serial con-
catenation of a repetition encoder, a pseudo-random interleaver
[4] and a Unity-rate Recursive Convolutional (URC) encoder
[5], [6]. More specifically, this URC encoder is also known as
an accumulator in the special case, where its output bits are
obtained as a cumulative modulo-2 sum of its input bits [7].
However, regardless of the URC encoder design, the term ‘RA
code’ is used instead of ‘self-concatenated code’ throughout
this paper, owing to its greater familiarity. Accordingly, the
RA decoder comprises a corresponding serial concatenation
of Soft-Input Soft-Output (SISO) repeat and URC decoders
[8]–[10], which iteratively exchange increasingly-reliable soft
information through an interleaver and a deinterleaver [1].
Owing to its relatively low complexity and its suitability
to parallel processing, the operation of the repeat decoder
can typically be completed in significantly less time than
that required for the URC decoder. This is because the URC
decoder employs the trellis-based Logarithmic Maximum A
posteriori Probability (Log-MAP) algorithm [11], which has
The authors wish to gratefully acknowledge the financial support of
the EPSRC, Swindon UK under the auspices of grant EP/J015520/1 and
EP/L010550/1, as well as the TSB, Swindon UK under the auspices of grant
TS/L009390/1.
a relatively high complexity and is not well suited to parallel
processing. A number of techniques have been proposed for
increasing the parallelism of the Log-MAP algorithm, which
could be applied to RA decoders in order to mitigate the
corresponding bottleneck. In [12], the parallelism of the Log-
MAP algorithm was increased by decomposing the trellis
into K number of sub-trellises and performing the Log-MAP
algorithm on each in parallel, hence reducing the correspond-
ing processing time by a factor of 1/K. Other approaches
include pipelining the operations of the Log-MAP algorithm
[13], so that different operations are performed in parallel, in
different parts of the decoding hardware. Similarly, the radix-4
technique can be employed for processing two trellis stages at
a time [13]. Furthermore, some of the Log-MAP calculations
can be replaced with lower-complexity approximations [11],
so that a greater number of these calculations can be performed
in parallel using the same amount of hardware. However, the
increased parallelism and reduced processing time offered by
these techniques is typically achieved at the cost of either
increasing the number of decoding iterations required or
degrading the error correction performance [8], [9].
While the above-mentioned papers have proposed tech-
niques for increasing the parallelism of the Log-MAP algo-
rithm, they all consider the corresponding SISO component
decoder to be activated alternately with the turbo-like code’s
other SISO component decoder. More specifically, although
soft information outputs may be generated throughout the
operation of the active SISO component decoder, these are
buffered for forwarding to the other SISO component decoder
on a frame-by-frame basis, rather than on a bit-by-bit basis.
As a result, this other decoder must remain idle throughout
the operation of the active decoder. The other decoder is only
activated when the operation of the first SISO component
decoder is completed, whereupon it is provided with all of
the soft information within the buffer at once. Likewise, the
first decoder must remain idle, while the second decoder
is active, for the same reason. This motivated the bit-by-
bit exchange of soft information in the work of [14]–[17].
More specifically, [14] proposed a decoder for turbo product
codes, which performs the decoding of the code’s rows in
parallel with the decoding of its columns. Soft information
is exchanged between the two decoding processes following
the completion of each row and each column, rather than
following the completion of all rows and all columns, in a
serial manner. Similarly, [15], [16] proposed a decoder for
turbo convolutional codes, which performs the decoding of
both SISO component decoders in parallel. Again, soft infor-
mation is exchanged between the two components on a bit-by-
bit basis, rather than a frame-by-frame basis. This work was
2further refined in [17], which proposed an interleaver design
that mitigates contention during the bit-by-bit exchange of
soft information. Each of the above-mentioned approaches was
successful in increasing the degree of parallelism of the various
turbo-like codes, hence reducing the associated processing
time required. However, in each case, this was achieved at
the cost of incurring a reduced error correction performance
and/or an increased memory or complexity requirement.
Against this background, this paper proposes a technique for
expediting the iterative decoding convergence of RA decoders.
More specifically, we propose a bit-by-bit iterative decoding
algorithm, which allows the repeat decoder and the URC
decoder to operate in parallel. In contrast to the bit-by-bit and
parallel iterative decoding algorithms mentioned above, ours
actually reduces the number of decoding iterations required
for achieving convergence, as well as improving the associated
error correction performance. This is achieved using a novel
technique, which allows the URC decoder to generate soft
information at every stage of both its Forward Recursion (FR)
and its Backward Recursion (BR), in contrast to the approach
of [15], [16]. Furthermore, we enable the repeat decoder to
process each piece of soft information as soon as it is generated
by the URC decoder, rather than buffering it until the operation
of the URC decoder is completed, as in a conventional frame-
by-frame RA decoder. Likewise, we provide the URC decoder
with each resultant piece of soft information as soon as it is
generated by the repeat decoder, potentially to be used by the
URC decoder within the same FR or BR. A novel technique
is proposed for significantly reducing the complexity of each
decoding iteration, which avoids generating soft information
that will not be used before it is replaced with more reliable
updated soft information. Furthermore, we propose a novel
EXtrinsic Information Transfer (EXIT) chart based analysis
[18] technique, which demonstrates that each decoding iter-
ation of our proposed bit-by-bit RA decoder is equivalent to
two iterations of the conventional frame-by-frame RA decoder.
Note that rather than using the above-described serially
concatenated interpretation of RA decoders [1], many previous
contributions [19]–[21] have used an alternative factor graph
based interpretation, which operates on the basis of message
passing [22]. Furthermore, serial update schedules [23]–[28]
have been proposed for performing URC decoding within the
factor graph using a FR and a BR. This approach has been
shown to reduce the required number of decoding iterations
by up to 50% [26, Figure 6], compared to the parallel update
schedule that is conventionally adopted for message passing.
However, these serial update schedules do not benefit from
the further complexity reductions that are offered by bit-
by-bit decoding, as in the proposed approach. Instead, these
schedules wait until the URC decoder’s FR and BR have been
completed, before operating the repeat decoder on the whole
frame at once, in a frame-by-frame manner. Furthermore,
these schedules have only been applied to the special case
of URC decoders, where the corresponding encoder is a two-
state accumulator. This is because accumulators avoid short
cycles in the factor graph, which severely degrade the message
passing performance [22]. By contrast, the proposed approach
imposes no restrictions on the design of the URC decoder.
The rest of this paper is organized as follows. In Section
II, we contrast the operation of the proposed bit-by-bit RA
decoder to that of a conventional frame-by-frame RA decoder.
Section III proposes a technique for reducing the complexity
of the proposed bit-by-bit RA decoder, although we show that
this still results in a slightly higher decoding complexity per
iteration than the conventional frame-by-frame RA decoder.
Section IV provides our EXIT chart analysis [18] for charac-
terizing the reduced number of decoding iterations required for
achieving convergence in the proposed bit-by-bit RA decoder.
Owing to its reduced number of iterations, we show that
the proposed bit-by-bit RA decoder has a lower complexity
than the conventional frame-by-frame RA decoder, despite
having a slightly higher per-iteration complexity. Overall, we
demonstrate that in a range of practical scenarios, the proposed
bit-by-bit RA offers gains of up to 0.86 dB, without imposing
any additional decoding complexity and without requiring any
additional transmission-energy, -bandwidth or -duration. Fi-
nally, we conclude in Section V that improved RA decoding is
facilitated by our novel bit-by-bit iterative decoding algorithm,
our novel technique for complexity reduction and our novel
EXIT chart analysis technique.
II. BIT-BY-BIT ITERATIVE DECODING
This section commences by providing an overview of RA
encoding and decoding in Section II-A. Following this, our
bit-by-bit RA decoding algorithm is proposed in Section II-B.
A. System Overview
As shown in Figure 1, the RA encoder comprises a repeat
encoder, a pseudo-random interleaver pi [4] and a URC en-
coder [6]. During the encoding process, each of the L number
of bits in the message vector x = [xj ]Lj=1 is repeated N
number of times by the repeat encoder, which has a coding
rate of R = 1/N . This results in the vector of LN bits
y = [yk]
LN
k=1, which are obtained according to yk = xdk/Ne.
The bit vector y is permuted by the interleaver pi = [pik]LNk=1
in order to obtain the bit vector of equal length z = [zk]LNk=1,
where zpik = yk. Following this, a URC encoder employing
M number of states [6] is used for converting the bit vector z
into the bit vector of equal length w = [wk]LNk=1, as shown in
Figure 1. As it will be justified in Section IV-A, three different
RA code parametrizations are considered in this paper, namely
the combinations of R = 1/2-, 1/3- and 1/4-rate repetition
codes with M = 8-, 4- and 2-state URC codes, respectively.
The M = 8-, 4- and 2-state URC codes generate the bits of
w according to wk = wk−3 ⊕ wk−2 ⊕ wk−1 ⊕ zk, wk =
wk−2 ⊕ wk−1 ⊕ zk and wk = wk−1 ⊕ zk respectively, where
⊕ represents modulo-2 addition and we assume wk|k<1 = 0.
Note that these M = 8- and 4-state URC codes would result
in short cycles within a factor graph interpretation of the RA
decoder, resulting in poor message passing performance [22].
Figure 1 demonstrates the combination of the R = 1/3-rate
repetition encoder with the M = 4-state URC code, for the
case of an L = 2-bit message vector and the interleaver
pi = [2, 4, 1, 6, 5, 3].
3pi
pi−1
BPSK
Demod
x1
x2
y1
y6
z1
z6
w1
w6
w˜1
w˜6
z˜a6
z˜a1
z˜e1
z˜e6
Repeat Encoder URC Encoderpi
Repeat Decoder Frame-by-Frame
Equivalent Interleaver Π Bit-by-Bit URC Decoder
(Parametered by pi) (Parametered by pi)
Read or write memory
Concurrent read and write memory
z˜a6
z˜a1
z˜e6
z˜e1
x˜1
x˜2 1 2 3 4 5 6
1 2 3 4 5 6
6 5 4 3 2 1
Modified Log-MAP
Log-MAP
RA Encoder
Frame-by-Frame RA Decoder
Bit-by-Bit RA Decoder
6 5 4 3 2 1x˜1
x˜2
+ + + + ++
1 2 3 4 5 6
+ + + + ++
1 2 3 4 5 6
BPSK
Demod
w˜1
w˜6
+
+
+
+
+
BPSK
Mod
wk = wk−2 ⊕ wk−1 ⊕ zk, wk|k<1 = 0
6 5 4 3 2 1
6 5 4 3 2 1
y˜a1
y˜a6
y˜e1
y˜e6
URC Decoder
Fig. 1. Schematic of an example RA code having a message length of
L = 2 bits, a R = 1/3-rate repetition code and an M = 4-state URC
code. The RA encoder and frame-by-frame decoder employ the interleaver
pi = [2, 4, 1, 6, 5, 3], while the proposed bit-by-bit RA decoder employs the
corresponding equivalent interleaver Π = [2, 1, 5, 1, 3, 3; 4, 4, 6, 2, 6, 5].
We use Binary Phase Shift Keying (BPSK) for modu-
lating the bit vector w onto an Additive White Gaussian
Noise (AWGN) channel. In the receiver, soft decision BPSK
demodulation [6] is employed for recovering the vector of
Logarithmic Likelihood Ratios (LLRs) w˜ = [w˜k]LNk=1, which
pertain to the bits of w. As shown in Figure 1, these LLRs
are provided as inputs to the Log-MAP URC decoder. In both
frame-by-frame and bit-by-bit RA decoders, the Log-MAP
URC decoder and the repeat decoder are operated iteratively.
In each iteration, the Log-MAP URC decoder uses an FR and
a BR for generating the vector of extrinsic LLRs (eLLRs)
z˜e = [z˜ek]
LN
k=1 [5], [9], which pertain to the bits of z. The eLLRs
in this vector are rearranged by the deinterleaver pi−1, in order
to obtain the vector of a priori LLRs (aLLRs) y˜a = [y˜ak ]
LN
k=1,
where y˜ak = z˜
e
pik
. The repeat decoder adds together particular
combinations of the aLLRs in y˜a in order to generate the
vector of eLLRs y˜e = [y˜ek]
LN
k=1, which pertain to the bits of
y. More specifically, the eLLR y˜ek pertaining to a particular
bit yk is obtained as the sum of the aLLRs that pertain to the
other repetitions of that bit, according to
y˜ek =
dk/NeN∑
k′ = (dk/Ne − 1)N + 1
k′ 6= k
y˜ak′ . (1)
The eLLRs in the vector y˜ek are rearranged by the interleaver
pi in order to obtain the vector of aLLRs z˜a = [z˜ak ]
LN
k=1,
where z˜apik = y˜
e
k. As shown in Figure 1, these aLLRs are
then input to the Log-MAP URC decoder. Once the iterative
decoding process has converged, the repeat decoder adds
together particular combinations of the aLLRs to produce the
vector of a posteriori LLRs x˜ = [x˜j ]Lj=1, which pertain to the
bits of x. More specifically, the a posteriori LLR x˜j pertaining
to a particular bit xk is obtained as the sum of the aLLRs that
pertain to all repetitions of that bit, according to
x˜j =
jN∑
k′=(j−1)N+1
y˜ak′ . (2)
Note that in the bit-by-bit decoder, the deinterleaver pi−1, the
repeat decoder and the interleaver pi are amalgamated in order
to form an equivalent interleaver Π, as will be detailed in
Section II-B.
B. Bit-by-Bit RA Decoding Algorithm
In contrast to the alternated operation of the two SISO
component decoders in the frame-by-frame RA decoder, the
proposed bit-by-bit RA decoder enables both component de-
coders to operate simultaneously. This is facilitated by using
concurrent read and write memory, the equivalent interleaver
Π and an bit-by-bit Log-MAP URC decoder, as discussed
in the following subsections. These discussions are aided by
the timing diagrams of Figures 2 and 3, which exemplify
the values that are input, stored, loaded and output by each
component during one iteration of the frame-by-frame and bit-
by-bit RA decoders of Figure 1, respectively. Note that in the
frame-by-frame RA decoder timing diagram of Figure 2, the
Log-MAP URC decoder is active in the first eleven Time Slots
(TSs), while the repeat decoder is active in the twelfth TS,
when it performs all processing of (1) in parallel. By contrast,
in the bit-by-bit RA decoder timing diagram of Figure 3, the
Log-MAP URC decoder and the equivalent interleaver Π are
both active in all ten TSs.
β5
z˜a3 z˜
a
4 z˜
a
5
α2 α3 α4 α5 β3β4 β2
α4
1 2 3 4 5 6 7 8 9 11Time
Store
Load
Output
Input
Output
z˜a4 z˜
a
3 z˜
a
2
w˜2 w˜3 w˜4 w˜5 w˜6 w˜5 w˜4 w˜3 w˜2
z˜e6 z˜
e
5 z˜
e
4 z˜
e
2
F
ra
m
e
-b
y
-f
ra
m
e
U
R
C
R
e
p
e
a
t
α3 α1
α1 β1
β5
α2 α3 α4
β3β4 β2
α1α0
β6
Load
Load
Store
Store
z˜a2 -z˜
a
6
Input z˜
a
1 -z˜
a
6
z˜a2 z˜
a
6 z˜
a
5
z˜e3 z˜
e
1
α2 α0
z˜e6 z˜
e
5 z˜
e
4 z˜
e
2z˜
e
3 z˜
e
1
y˜e1-y˜
e
6
pi
−
1 Input
Output y˜a4 y˜
a
5 y˜
a
2 y˜
a
1y˜
a
6 y˜
a
3
pi
Input
Output
y˜e1-y˜
e
6
z˜a1 -z˜
a
6
10
α5
β1
z˜a1
w˜1
12
y˜a1 -y˜
a
6
y˜a4 y˜
a
5 y˜
a
2 y˜
a
1y˜
a
6 y˜
a
3
y˜a4 y˜
a
5 y˜
a
2 y˜
a
1y˜
a
6 y˜
a
3
z˜a1
w˜1
Fig. 2. Timing diagram for each decoding iteration of the frame-by-frame
RA decoder shown in Figure 1.
1) Concurrent Read and Write Memory: In an RA decoder,
each SISO component decoder uses a memory to store LLRs
until its decoding algorithm is ready to use them. In the frame-
by-frame RA decoder of Figure 1, only one SISO component
decoder is active at a time, as exemplified in Figure 2. During
this time, the active SISO component decoder reads aLLRs
4β5
z˜a3 z˜
a
4 z˜
a
5
α2 α3 α4 α5 β3β4 β2
β2 β3 β5
α5 α4
1 2 3 4 5 6 7 8 9 10Time
Input
Store
Load
Output
Input
Output
z˜a4 z˜
a
3 z˜
a
2
β1
w˜1 w˜2 w˜3 w˜4 w˜5 w˜6 w˜5 w˜4 w˜3 w˜2
z˜e2 z˜
e
3 z˜
e
5 z˜
e
6 z˜
e
5 z˜
e
4 z˜
e
2z˜
e
1
z˜e1 z˜
e
2
z˜e2
z˜a4
z˜e4
z˜a2
z˜e3
z˜e6
z˜a5
z˜e5
z˜a6
z˜e5
z˜e6
z˜e6
z˜a5
z˜e5
z˜e6
z˜a3
z˜e4
z˜e4
z˜a2
z˜e2
z˜e4
z˜a1
B
it
-b
y
-b
it
U
R
C
Π
z˜a4z˜
a
2 z˜
a
5 z˜
a
6 z˜
a
5 z˜
a
3 z˜
a
2z˜
a
1
α3 α1
α1 β1
β5
α2 α3 α4
β3β4 β2
α1α0
β6
Load
Load
z˜a4 z˜
a
5 z˜
a
3 z˜
a
2Store
Store z˜e1 z˜
e
2 z˜
e
3 z˜
e
5 z˜
e
6 z˜
e
5 z˜
e
4 z˜
e
2
z˜a2 z˜
a
6 z˜
a
5z˜
a
1
z˜a2 z˜
a
6 z˜
a
5z˜
a
1
z˜e1 z˜
e
1 z˜
e
3 z˜
e
3 z˜
e
3 z˜
e
5 z˜
e
1 z˜
e
2
Fig. 3. Timing diagram for each decoding iteration of the bit-by-bit RA
decoder shown in Figure 1.
from its memory, as required by its decoding algorithm.
In response, it generates eLLRs, which are interleaved and
written to the memory of the other SISO component decoder,
as shown in Figure 1. Once the operation of the first SISO
component decoder is completed, the other decoder becomes
active and the roles of the memories are swapped. Owing
to this, the memories in the frame-by-frame RA decoder are
never read and written at the same time, allowing read or write
memory to be used. This is exemplified in Figure 2, where the
aLLRs are never stored in a particular component’s memory
during a TS, in which aLLRs are read from that memory.
By contrast, both SISO component decoders are active all
of the time in the bit-by-bit RA decoder of Figure 1, requiring
both decoders to use concurrent read and write memory. More
specifically, throughout the bit-by-bit RA decoding process,
one SISO component decoder reads the aLLRs from its own
memory and writes the eLLRs into the memory of the other
SISO component decoder. Meanwhile, the other SISO compo-
nent decoder will be doing the same, causing both memories
to be read and written concurrently. This is exemplified in
Figure 3, where the aLLRs are frequently stored in a particular
component’s memory during a TS, in which the aLLRs are
read from that memory. Note that at the beginning of iterative
decoding, the concurrent read and write memories of both
SISO component decoders are filled with zero-valued LLRs.
2) Equivalent Interleaver: In the proposed bit-by-bit RA
decoder, the deinterleaver pi−1, the repeat decoder and the
interleaver pi are combined in order to form an equivalent
interleaver Π, as mentioned above. In contrast to a stand-
alone repeat decoder, the equivalent interleaver is aware of
where each aLLR is positioned within the vector z˜a. As it
will be described below, this knowledge may be exploited in
order to reduce the complexity of the equivalent interleaver.
The equivalent interleaver is specifically configured to main-
tain the relationships that are created between the aLLRs
of z˜a and the eLLRs of z˜e, when employing a separate
deinterleaver, repeat decoder and interleaver. More specifically,
each aLLR in the vector z˜a is obtained by deinterleaving,
adding and interleaving (N − 1) of the eLLRs in the vector
z˜e, as exemplified in Figure 1. Note that the particular set
of (N − 1) eLLRs depends on the design of the interleaver
pi. Conversely, each eLLR in the vector z˜e contributes to the
summations that provide (N − 1) of the aLLRs in the vector
z˜a, again depending on the design of the interleaver pi. For
example, both the frame-by-frame and bit-by-bit RA decoders
of Figure 1 are configured for ensuring that the first aLLR z˜a1
is obtained as the sum of the (N − 1) = 2 eLLRs z˜e2 and z˜e4.
Conversely, the first eLLR z˜e1 contributes to the summations
that provide the (N − 1) = 2 aLLRs z˜a2 and z˜a4 .
The configuration of the equivalent interleaver may be
described by a matrix Π, having (N − 1) rows and LN
columns. The kth column of the matrix lists the indices of
the eLLRs that are added together in order to obtain the kth
aLLR z˜ak , according to
z˜ak =
N−1∑
n=1
z˜eΠn,k . (3)
For example, the equivalent interleaver of Figure 1 is config-
ured according to Π =
[
2, 1, 5, 1, 3, 3
4, 4, 6, 2, 6, 5
]
, where the first column
contains the indices 2 and 4 because the first aLLR is obtained
according to z˜a1 = z˜
e
2 + z˜
e
4. Equivalently, the k
th column of the
matrix Π can be considered to list the indices of the aLLRs
that are contributed to by the eLLR z˜ek. For example, the first
column of Π in Figure 1 contains the indices 2 and 4, because
the first eLLR z˜e1 contributes to the aLLRs z˜
a
2 and z˜
a
4 . Note
that the calculation of (3) is reminiscent of the operation of a
variable node in a factor graph interpretation of the RA code.
More specifically, the kth aLLR z˜ak is obtained by summing
all of the relevant eLLRs, except for the corresponding one z˜ek
[22].
In the proposed bit-by-bit RA decoder, the equivalent in-
terleaver Π operates in a reactive manner. More explicitly,
whenever an updated eLLR value z˜ek is received, the equivalent
interleaver stores it in its concurrent read and write memory.
At the same time, the (N − 1) eLLRs with the indices in the
kth column of Π may be loaded from the concurrent read
and write memory. Together with the eLLR z˜ek, these (N −1)
eLLRs allow the computation of new values for the (N − 1)
aLLRs having the indices in the kth column of Π. These
(N − 1) aLLRs may then be input to the Log-MAP URC
decoder and stored in its concurrent read and write memory.
In the example of Figure 1, whenever an updated value is
received for the first eLLR z˜e1, the eLLRs z˜
e
2 and z˜
e
4 may be
loaded from the concurrent read and write memory. These
eLLRs may be used for producing updated values for the
aLLRs z˜a2 = z˜
e
1 + z˜
e
4 and z˜
a
4 = z˜
e
1 + z˜
e
2.
However, in the bit-by-bit RA decoder, the eLLRs z˜ek are
generated in a particular order by the Log-MAP URC decoder,
which is dictated by its FR and BR, as it will be described in
Section II-B3. Likewise, the aLLRs z˜ak are read from the Log-
MAP URC decoder’s concurrent read and write memory in
that same order. Owing to this, when the equivalent interleaver
receives an eLLR, it is unnecessary for it to update all (N−1)
of the corresponding aLLRs, because some of these will be
replaced before they are used by the Log-MAP URC decoder.
5In the example of Figure 3, the Log-MAP URC decoder loads
the aLLR z˜a1 in TS 1, it loads z˜
a
2 in TSs 2 and 10, and it loads
z˜a4 in TSs 4 and 8. Likewise, it outputs an updated value for
the eLLR z˜e1 in TS 1, it outputs z˜
e
2 in TSs 2 and 10, and
it outputs z˜e4 in TS 8, as shown in Figure 3. During TS 1,
the equivalent interleaver uses z˜e1 to generate only z˜
a
2 , without
generating z˜a4 . This is because z˜
a
4 will not be used by the
Log-MAP URC decoder until TS 4 and because in TS 2, the
equivalent interleaver uses z˜e2 to generate a superior value for
z˜a4 . Likewise, z˜
a
1 is not generated during TS 2, because it will
not be used by the Log-MAP URC decoder until TS 1 of the
next decoding iteration. In this way, the complexity associated
with generating unnecessary aLLRs can be eliminated from the
equivalent interleaver. As mentioned above, this complexity
reduction is facilitated by informing the equivalent interleaver
of where each aLLR that it generates is positioned within the
vector z˜a, as well as by informing the equivalent interleaver
of the order in which the aLLRs are used by the Log-MAP
URC decoder.
3) Bit-by-Bit Log-MAP URC Decoder: Similarly to the
Log-MAP URC decoder of a conventional frame-by-frame
RA decoder, the proposed bit-by-bit RA decoder operates on
the basis of an FR and a BR. More specifically, an FR is
performed during the first half of each decoding iteration,
which corresponds to TSs 1 to 5 in the examples of Figures 2
and 3. During this time, the Log-MAP URC decoder loads and
processes a single aLLR z˜ak and a single demodulated LLR w˜k
in each TS, in ascending order of the index k, as exemplified
in Figures 2 and 3. During each TS, a vector αk−1 of M
forward state metrics is loaded from memory and combined
with the LLRs z˜ak and w˜k to produce a new vector αk of
M forward state metrics [11], which is stored in memory.
Note that these data dependencies of αk upon αk−1 dictate
the requirement for the forward state metrics to be calculated
using an FR. Likewise, a BR is performed during the second
half of each decoding iteration, when the LLRs are loaded
and processed in descending order of index k, as exemplified
in TSs 6 to 10 of Figures 2 and 3. Each step of the BR has
to calculate and store the vector βk−1 of M backward state
metrics, which are obtained by combining βk with z˜ak and
w˜k. Note that depending on how the URC code is terminated,
α0 and βLN are initialized with values that remain constant
throughout the iterative decoding process [11].
In the frame-by-frame RA decoder, the Log-MAP URC
decoder is only able to generate and output eLLRs during
the BR, as shown in Figure 2. This is because each eLLR
z˜ek is obtained by loading and combining w˜k, αk−1 and βk
[11], but the backward state metrics βk are not available until
the BR is started. By contrast, the Log-MAP URC decoder
of the proposed bit-by-bit RA decoder is able to generate and
output eLLRs during both the FR and the BR, as shown in
Figure 3. This is made possible in the FR because we propose
to generate the eLLR z˜ek by loading and combining w˜k and
αk−1 with the backward state metrics βk that were generated
and stored during the previous decoding iteration. Note that
before iterative decoding begins, all backward state metrics
are initialized with zero values.
Note that in the proposed bit-by-bit RA decoder, the Log-
MAP URC decoder has the opportunity to generate and output
each eLLR z˜ek during each FR and during each BR. Therefore,
it can be said that the eLLR vector z˜e is updated in each
half-iteration of the bit-by-bit RA decoder, where alternate
half-iterations comprise a FR or a BR. Likewise, since the
equivalent interleaver operates in a reactive manner, generating
aLLRs as soon as it is provided with eLLRs, it can be said
that the aLLR vector z˜a is also updated in each half-iteration.
By contrast, the vectors z˜e and z˜a are only updated once per
iteration in the frame-by-frame RA decoder. Therefore, each
bit-by-bit half-iteration may be deemed to correspond to one
frame-by-frame iteration and hence we surmise that each bit-
by-bit iteration corresponds to two frame-by-frame iterations.
These relationships will be demonstrated with the aid of our
EXIT chart analysis in Section IV-A.
Note that as in the equivalent interleaver Π of Section II-B2,
the Log-MAP URC decoder can facilitate a complexity reduc-
tion by exploiting the knowledge of the interleaver design pi.
More specifically, it is unnecessary for the Log-MAP URC
decoder to generate a particular eLLR z˜ek during its FR,
if it would be superseded by the updated version of that
eLLR z˜ek generated during the BR. This happens when the
corresponding aLLRs generated by the equivalent interleaver
Π in response to z˜ek would not be used by the Log-MAP URC
decoder before they are replaced in response to the generation
of z˜ek during the BR. Likewise, it is unnecessary for the Log-
MAP URC decoder to generate a particular eLLR z˜ek during its
BR, if it would be superseded by the eLLR z˜ek generated during
the FR of the next decoding iteration. For example, in TS 4 of
Figure 3, it is unnecessary for the Log-MAP URC decoder to
generate the eLLR z˜e4. This is because this eLLR would allow
the equivalent interleaver Π to update the aLLRs z˜a2 and z˜
a
1 , as
described in Section II-B2. However, these aLLRs would not
be used by the Log-MAP URC decoder until the end of its BR,
in TS 10 and TS 1 of the next decoding iteration, respectively.
However before this, the Log-MAP URC decoder has a second
opportunity to update z˜e4 in TS 8. It is this version of the eLLR
z˜e4 that is used by the equivalent interleaver Π to update the
aLLRs z˜a1 and z˜
a
2 , ready for TSs 10 and 1. Note that since there
is no need to generate z˜e4 during TS 4, there is also no need
to load β4 for that purpose during that TS. Furthermore, since
no eLLR is output during TS 4, the equivalent interleaver Π is
not operated during that TS and there is no new input for the
Log-MAP URC decoder to store in TS 5, as shown in Figure 3.
Likewise, there is no need for the Log-MAP URC decoder to
generate the eLLR z˜e3 during TS 9 because it is superseded by
the z˜e3 generated during TS 3 of the next decoding iteration.
III. COMPLEXITY ANALYSIS
In this section, we characterize and compare the compu-
tational complexity per iteration of the frame-by-frame and
bit-by-bit RA decoders. This analysis will be used for making
fair comparisons between the frame-by-frame and bit-by-bit
RA decoders in Section IV. As described in Section II-B, a
complexity reduction is facilitated for the proposed bit-by-bit
RA decoder, if the equivalent interleaver Π and Log-MAP
URC decoder are able to exploit knowledge of the interleaver
6design pi. Owing to this, it is not necessary to generate all
aLLRs and eLLRs during the operation of the bit-by-bit RA
decoder. We commence in Section III-A by characterizing the
average number of LLRs are generated during each TS of
each decoding iteration. Then, Section III-B quantifies and
compares the computational complexity of the frame-by-frame
and bit-by-bit RA decoders in terms of the number of Add,
Compare and Select (ACS) operations [9] that they perform
in each decoding iteration.
A. Complexity Reduction
As described in Section II-B2, whenever an eLLR z˜ek is
provided for the equivalent interleaver Π, it has the opportu-
nity to update the (N − 1) aLLRs having the indices in the
kth column of Π. However, it is sufficient for the equivalent
interleaver Π to update at most one of these aLLRs, namely
that specific z˜ak′ which will be used first by the Log-MAP URC
decoder. This is because the updated z˜ak′ will lead to better
opportunities to update the other (N − 2) aLLRs, before they
are used by the Log-MAP URC decoder. More specifically, the
other (N − 2) indices in the kth column of Π also appear in
the k′th column, owing to the symmetry within the equivalent
interleaver. Owing to this, when z˜ak′ is used by the Log-MAP
URC decoder, it will input the corresponding eLLR z˜ek′ to the
equivalent interleaver Π, granting it the opportunity to update
the other (N − 2) aLLRs. Note that owing to the symmetry
of the equivalent interleaver, this approach will never miss
an opportunity to update an aLLR that is used by the Log-
MAP URC decoder and hence it does not compromise the
error correction capability of the bit-by-bit RA decoder. In the
example of Figure 3, the equivalent interleaver Π is provided
with z˜e1 in TS 1, giving it the opportunity to update z˜
a
2 and
z˜a4 . Of these, it is z˜
a
2 that is used first by the Log-MAP URC
decoder, in TS 2. In response, the Log-MAP URC decoder
provides z˜e2 to the equivalent interleaver Π, giving it a better
opportunity to update z˜a4 , ready to be used by the Log-MAP
URC decoder in TS 4. Therefore, even though the opportunity
to update z˜a4 was not exploited in TS 1, the error correction
capability of the bit-by-bit RA decoder is not compromised,
since z˜a4 is updated in TS 2, ready to be used in TS 4.
Like the equivalent interleaver Π, the Log-MAP URC
decoder has the opportunity to generate an eLLR in each TS
of each decoding iteration. However, in some TSs it is not
necessary for the Log-MAP URC decoder to generate any
eLLRs. In these TSs it is therefore also unnecessary for the
equivalent interleaver Π to generate any aLLRs, since there is
no new extrinsic information to base these on. This situation
happens when the Log-MAP URC decoder has the opportunity
to generate an eLLR z˜ek, but all (N−1) of the indices in the kth
column of Π are situated behind the index k in the direction of
the current recursion. In this case, none of the corresponding
(N−1) aLLRs would be used by the Log-MAP URC decoder
before the current recursion is completed. Nor would they be
used before the eLLR z˜ek is updated again during the opposite
recursion, when there is a better opportunity to update these
aLLRs. This explains why the Log-MAP URC decoder does
not exploit the opportunity to update z˜e4 in TS 4 of Figure 3 and
why the equivalent interleaver Π does not update any aLLRs
in this TS, as discussed in Section II-B2. More specifically, this
is because the corresponding aLLRs z˜a1 and z˜
a
2 have indices
that are behind that of z˜e4 during the FR.
Based on these observations, we can quantify the average
number of LLRs that the equivalent interleaver Π and the Log-
MAP URC decoder generates during each TS of each decoding
iteration. Since the Log-MAP URC decoder and the equivalent
interleaver Π will each generate either one or zero LLRs
during each TS, the average number is given by the fraction of
TSs in which one eLLR and one aLLR are generated. During
the FR, the kth column of Π is associated with the generation
of an eLLR and an aLLR, if at least one of its (N−1) elements
has a higher value than k. Therefore, the specific fraction of
columns in Π that are of this type gives the average number
of eLLRs and aLLRs that are generated during each TS of
the FR. When employing a random interleaver design pi, the
expected value for this fraction is given by
Pa =
N − 1
N
(
1− 1
LN
)
, (4)
as derived in Appendix A. Owing to the symmetry of the
situation, the corresponding fraction for the BR will have the
same value. Therefore, Pa quantifies the average number of
eLLRs that the Log-MAP URC decoder generates during each
TS of each decoding iteration, as well as the average number
of aLLRs that the equivalent interleaver Π generates. Note
that we have Pa → N−1N for long messages, where L→∞.
B. Add, Compare and Select Operations
The computational complexity of the frame-by-frame and
bit-by-bit RA decoders can be quantified using the number
of operations that are performed during each decoding itera-
tion. Both decoders operate solely on the basis of additions,
subtractions and max∗ operations [9], where
max ∗(a, b) = max(a, b) + ln(1 + e−|a−b|). (5)
The logarithmic term on the right hand side of (5) may be ap-
proximated using an 8-entry Look-Up-Table (LUT) [9]. In this
case, each max∗ operation can be considered to comprise five
ACS operations [9]. More specifically, a first ACS operation
may be employed for simultaneously calculating max(a, b)
and −|a − b|. Three ACS operations may be employed to
logarithmically decompose the LUT and then to select the
best approximation for ln(1 + e−|a−b|). Finally, a fifth ACS
operation may be employed to add this to max(a, b). Similarly,
each addition and each subtraction employed by the frame-
by-frame and bit-by-bit RA decoder may be considered to
comprise a single ACS operation.
Table I quantifies the computational complexity of the bit-
by-bit and frame-by-frame RA decoders per message bit per
decoding iteration, when employing the particular combina-
tions of R-rate repetition codes and M -state URC codes, as it
will be justified in Section IV. The quantities of Table I were
obtained by counting the number of operations performed per
message bit per iteration in optimized implementations of the
bit-by-bit and frame-by-frame RA decoders. Note that for each
7combination of R and M , the bit-by-bit RA decoder requires
more ACS operations than the frame-by-frame RA decoder.
This is because in the bit-by-bit RA decoder, the equivalent
interleaver Π and the Log-MAP URC decoder each generate
an average of around (N − 1)/N LLRs in each TS, where
N = 1/R. By contrast, in the frame-by-frame RA decoder,
both the repeat and the URC decoder each generate an average
of only about 1/2 an LLR in each TS. This is because around
half of the TSs are dedicated to performing the Log-MAP
URC decoder’s FR, which does not generate any eLLRs, as
exemplified in Figure 2. Despite its higher complexity per
message bit per iteration, we will show in Section IV that the
bit-by-bit RA decoder has a lower complexity overall, since it
requires fewer decoding iterations to converge.
TABLE I
COMPUTATIONAL COMPLEXITY PER MESSAGE BIT PER DECODING
ITERATION FOR BIT-BY-BIT AND FRAME-BY-FRAME RA DECODERS, WHEN
EMPLOYING R-RATE REPETITION CODES AND M -STATE URC CODES.
Decoder R M + or − max∗ ACS
1/2 8 102 60 402
Bit-by-Bit 1/3 4 98 48 338
1/4 2 86 28 226
1/2 8 100 60 400
Frame-by-Frame 1/3 4 81 42 291
1/4 2 64 24 184
IV. RESULTS
In this section, the EXIT charts [6] of frame-by-frame and
bit-by-bit RA decoders are characterized and compared in
Section IV-A. Then, the Bit Error Ratio (BER) performances
of these schemes are compared in Section IV-B.
A. EXIT Charts
EXIT charts [6] may be used for characterizing the iterative
decoding convergence of iterative decoders, considering the
evolution of the Mutual Information (MI) I(z˜a; z) and I(z˜e; z)
of the aLLRs and eLLRs, respectively. The EXIT chart of
the frame-by-frame RA decoder may be obtained in the
conventional manner, by plotting the inverted EXIT function
of the repeat decoder together with the EXIT function of
the Log-MAP URC decoder, as shown in Figures 4(a)-(e).
Here, Figure 4(a) shows that the R = 1/2-rate repetition
code’s EXIT function has a complementary shape to that of
the M = 8-state frame-by-frame URC code described in
Section II-A. This allows the creation of an open EXIT chart
tunnel at low Eb/N0 values, which facilitates iterative decod-
ing convergence to a low BER. Likewise, Figures 4(b) and (d)
show that the R = 1/3- and R = 1/4-rate repetition codes
have EXIT functions that complement those of the M = 4-
and M = 2-state frame-by-frame URC codes, respectively.
Indeed, our experiments revealed that other combinations of
R = 1/2, 1/3, and 1/4 with M = 2, 4 and 8 offered inferior
matches between the repetition and URC EXIT functions,
requiring higher Eb/N0 values to create open EXIT chart
tunnels. This motivates the selected combinations of R and
M , as described in Section II-A. Figures 4(a)-(e) show that
the frame-by-frame RA decoder’s iterative decoding trajectory
offers a perfect match with the corresponding EXIT functions,
for all the parametrizations and Eb/N0 values considered, as
may be expected. Note that these trajectories were obtained
using a message length of L = 106, in order to eliminate the
variation that occurs from frame-to-frame, when employing
shorter message lengths L.
By contrast, the EXIT functions and trajectories of the bit-
by-bit RA decoder cannot be obtained in the conventional
manner, owing to its unique operation. Instead, the bit-by-
bit RA decoder’s iterative decoding trajectory is obtained
by measuring the MIs I(z˜a; z) and I(z˜e; z) simultaneously,
following the completion of the BR at the end of each
decoding iteration. While the iterative decoding trajectory of
the bit-by-bit RA decoder can be seen to offer a perfect
match with the repetition code’s EXIT function, it consistently
and significantly overshoots that of the frame-by-frame URC
code, as shown in Figures 4(a)-(e). This may be explained by
the observation that each bit-by-bit decoding iteration can be
thought of as being approximately equivalent to two frame-
by-frame decoding iterations, as described in Section II-B3.
This observation is supported by the so-called ‘half-iteration’
trajectories, which are also plotted in Figures 4(a)-(e) for
the bit-by-bit RA decoder. These half-iteration trajectories
were obtained by measuring the MIs I(z˜a; z) and I(z˜e; z)
simultaneously, following the completion of not only each
BR, but also each FR. Accordingly, it may be observed that
each pair of steps in the half-iteration trajectory reaches the
same point in the EXIT chart as a single step in the bit-by-
bit RA decoder trajectory. However, it may also be observed
that this half-iteration trajectory offers a good match with the
EXIT functions of both the repetition code and the frame-
by-frame URC code. This confirms that each bit-by-bit half
iteration is equivalent to a single frame-by-frame iteration
and hence that each bit-by-bit iteration is equivalent to two
frame-by-frame iterations. Owing to this, Figures 4(a)-(e)
show that the proposed bit-by-bit RA decoder is capable of
achieving iterative decoding convergence using significantly
fewer decoding iterations than that required by the frame-
by-frame RA benchmarker, in all the scenarios considered.
This reduction in the required number of decoding iterations
more than compensates for the bit-by-bit RA decoder’s slightly
increased complexity per decoding iteration, hence offering
a significantly reduced overall complexity, as described in
Section III-B.
The EXIT function of the bit-by-bit URC code cannot be
plotted in the conventional manner, since this overlooks the
memory that is maintained between the operation of the bit-by-
bit Log-MAP URC decoder in successive decoding iterations.
While the outputs of the conventional frame-by-frame Log-
MAP URC decoder depend only on its inputs, those of the
bit-by-bit Log-MAP URC decoder additionally depend also
on the results calculated during previous decoding iterations
and stored in memory. For example, during the bit-by-bit
Log-MAP URC decoder’s FR, the backward state metrics
calculated during the previous decoding iteration are loaded
from memory, as described in Section II-B3. Figures 4(a)-(e)
provide predictions for the EXIT function of the bit-by-bit RA
8(e) R = 1/4, M = 2, Eb/N0 = 4 dB
I(z˜a; z)
I
(z˜
e
;z
)
10.80.60.40.20
1
0.8
0.6
0.4
0.2
0
(d) R = 1/4, M = 2, Eb/N0 = 2 dB
I(z˜a; z)
I
(z˜
e
;z
)
10.80.60.40.20
1
0.8
0.6
0.4
0.2
0
(c) R = 1/4, M = 2, Eb/N0 = 0 dB
I(z˜a; z)
I
(z˜
e
;z
)
10.80.60.40.20
1
0.8
0.6
0.4
0.2
0
(b) R = 1/3, M = 4, Eb/N0 = 2 dB
I(z˜a; z)
I
(z˜
e
;z
)
10.80.60.40.20
1
0.8
0.6
0.4
0.2
0
Bit-by-bit half-iteration trajectory
Bit-by-bit trajectory
Frame-by-frame trajectory
Bit-by-bit URC EXIT function
Frame-by-frame URC EXIT function
Repetition EXIT function
(a) R = 1/2, M = 8, Eb/N0 = 2 dB
I(z˜a; z)
I
(z˜
e
;z
)
10.80.60.40.20
1
0.8
0.6
0.4
0.2
0
Fig. 4. EXIT charts of frame-by-frame and bit-by-bit RA decoders employing R-rate repetition codes and M -state URC codes, where (a) R = 1/2 and
M = 8, (b) R = 1/3 and M = 4, as well as (c)-(e) R = 1/4 and M = 2. EXIT functions and trajectories are obtained for message lengths of L = 106
bits, when employing BPSK for communication over an AWGN channel having (a), (b) and (d) Eb/N0 = 2 dB, (c) Eb/N0 = 0 dB, as well as (e)
Eb/N0 = 4 dB.
decoder, which attempt to model the effect of this memory.
These are motivated by the observation that each bit-by-bit
decoding iteration can be thought of as being approximately
equivalent to two frame-by-frame decoding iterations. More
specifically, each point in the predicted bit-by-bit Log-MAP
URC decoder’s EXIT function is obtained by modeling two
iterations of the frame-by-frame RA decoder. This is exempli-
fied by the dashed line in Figure 4(a), which shows how to ob-
tain the point in the bit-by-bit Log-MAP URC decoder’s EXIT
function corresponding to I(z˜a; z) = 0.4. The first iteration of
the frame-by-frame RA decoder may be modeled by drawing
the vertical dashed line from [I(z˜a; z), I(z˜e; z)] = [0.4, 0] up
to the frame-by-frame URC code’s EXIT function, which is
met at [I(z˜a; z), I(z˜e; z)] = [0.4, 0.56]. The second iteration is
modeled by extending this line horizontally to meet the repeti-
tion code’s EXIT function at [I(z˜a; z), I(z˜e; z)] = [0.56, 0.56]
and then vertically to meet that of the frame-by-frame URC
code again at [I(z˜a; z), I(z˜e; z)] = [0.56, 0.7]. This models the
operation of the repetition decoder and then the URC decoder
again. Finally, the corresponding point in the predicted bit-
by-bit URC code’s EXIT function is obtained by extending
the line horizontally back to I(z˜a; z) = 0.4, obtaining the
point in the predicted bit-by-bit URC code’s EXIT function
at [I(z˜a; z), I(z˜e; z)] = [0.4, 0.7]. This point represents the
observation that if the repeat decoder produces aLLRs having
the MI I(z˜a; z) = 0.4, then the bit-by-bit URC decoder will
produce eLLRs having the MI I(z˜e; z) = 0.7. This process
may be repeated to obtain the other points in the predicted
bit-by-bit URC code’s EXIT function, corresponding to other
values of I(z˜a; z). As shown in Figures 4(a)-(e), the approach
advocated offers a good, albeit imperfect, match with the bit-
by-bit RA decoder’s iterative decoding trajectory in all the
cases considered. Note that this analysis reveals that bit-by-bit
decoding magnifies the EXIT chart tunnel of the RA decoder.
More specifically, if the frame-by-frame RA decoder has an
open EXIT chart tunnel, then that of the bit-by-bit RA decoder
will be wider, expediting the iterative decoding convergence.
However, if the frame-by-frame RA decoder has a closed
EXIT chart tunnel, then the same is true for the bit-by-bit RA
decoder, as exemplified in Figure 4(c). Owing to this, it will be
shown in Section IV-B that the bit-by-bit and frame-by-frame
RA decoders converge to the same BER performance, although
the bit-by-bit decoder requires fewer decoding iterations to
achieve this.
B. BER performances
Figure 5 contrasts the BER performance of the frame-by-
frame and of the proposed bit-by-bit RA decoders, when
employing the parametrizations of Section IV-A, but with the
9aid of a more practical message length of L = 103 bits.
During the simulation of the frame-by-frame and bit-by-bit
RA decoders, both the BER and the cumulative computational
complexity was recorded following each decoding iteration.
This facilitated a three-dimensional plot of the BER versus
complexity and Eb/N0. The plots of Figure 5 were obtained
by taking slices through this 3-D plot at particular values of
computational complexity and interpolating, which facilitates
fair comparisons. These particular values of computational
complexity were specifically chosen so that they facilitate 3,
6 and 30 iterations of the frame-by-frame RA decoder.
As shown in Figures 5(a)-(c), the proposed bit-by-bit RA
decoder offers superior BER performance to that of the frame-
by-frame benchmarker, for all the parametrizations and for all
the complexity limits considered. In particular, when employ-
ing the lowest considered complexity limit, the bit-by-bit RA
decoder offers 0.72 dB, 0.83 dB and 0.86 dB gain for the
R = 1/2-, 1/3- and 1/4-rate parametrizations, respectively.
Note that this is achieved ‘for free’, since the comparisons are
fair in terms of their overall decoding complexity, as well as
in terms of their transmission energy, bandwidth and duration.
However, the attainable gains diminish as the complexity limit
is increased. This may be expected, because the bit-by-bit RA
decoder’s EXIT chart tunnel opens at the same Eb/N0 value
as that of the frame-by-frame benchmarker, as discussed in
Section IV-A. Owing to this, similar BER performances are
achieved, provided the complexity limit is sufficiently high for
both schemes to achieve iterative decoding convergence.
V. CONCLUSIONS
In this paper, we have proposed a novel bit-by-bit iterative
decoding technique, which was shown to expedite the con-
vergence of RA decoders. In a frame-by-frame RA decoder,
the repeat and accumulate component decoders are operated
iteratively, with one component decoder kept idle, whenever
the other is activated. The proposed bit-by-bit RA decoder
expedites this process by allowing both component decoders
to operate simultaneously, exchanging outputs continuously
without buffering. In contrast to previous bit-by-bit iterative
decoding techniques, ours reduces the number of decoding
iterations required to achieve convergence and improves the
associated error correction capability. Furthermore, we have
proposed a novel technique for reducing the complexity of
our bit-by-bit RA decoder. We have also proposed a novel
EXIT chart analysis technique, which demonstrates that each
decoding iteration of our bit-by-bit RA decoder is equivalent to
two iterations of the conventional frame-by-frame RA decoder.
Overall, we have demonstrated that in a range of practical
scenarios, the proposed bit-by-bit RA decoder offers gains
of up to 0.86 dB, without imposing any additional decoding
complexity and without requiring any additional transmission-
energy, -bandwidth or -duration.
APPENDIX A
DERIVATION OF (4)
We are interested in the situation where at least one of
the (N − 1) elements in the kth column of the equivalent
interleaver Π has a higher value than k. When the interleaver
pi adopts random designs according to the uniform interleaver
distribution [1], the fraction of columns that are of the type
described above will vary from design to design. The expected
value for the fraction of columns that are of the type described
above across all random designs is given by
Pa =
1
LN
LN∑
k=1
Pr
(
N−1⋃
n=1
Πn,k > k
)
, (6)
where Pr
(⋃N−1
n=1 Πn,k > k
)
is the probability that the kth
column of Π is of the type described above. This fraction
may be obtained as one minus the fraction of columns that
are not of the type described above, according to
Pa = 1− 1
LN
LN∑
k=1
Pr
(
N−1⋂
n=1
Πn,k < k
)
, (7)
where Pr
(⋂N−1
n=1 Πn,k < k
)
is the probability that the kth
column of Π is not of the type described above. This proba-
bility is given by
Pr
(
N−1⋂
n=1
Πn,k < k
)
(8)
=
N−1∏
n=1
Pr
(
Πn,k < k
∣∣∣∣∣
n−1⋂
n′=1
Πn′,k < k
)
(9)
For cases where k ≥ N , we have
Pr
(
N−1⋂
n=1
Πn,k < k
)
(10)
=
k − 1
LN
× k − 2
LN − 1 × . . .×
k − (N − 1)
LN − (N − 1) + 1 (11)
=
(
k − 1
N − 1
)
(
LN
N − 1
) . (12)
Meanwhile, for cases where k < N , we have
Pr
(⋂N−1
n=1 Πn,k < k
)
= 0. Substituting these results into (7)
yields
Pa = 1− 1
LN
LN∑
k=N
(
k − 1
N − 1
)
(
LN
N − 1
) . (13)
Substituting m = k − 1 into the index of the summation in
(13) gives
Pa = 1− 1
LN
LN−1∑
m=N−1
(
m
N − 1
)
(
LN
N − 1
) . (14)
10
5520 ACS/bit/it
1104 ACS/bit/it
552 ACS/bit/it
Frame-by-frame
Bit-by-bit
C
ap
ac
it
y
b
ou
n
d
-0
.7
39
d
B
(c) R = 1/4, M = 2
Eb/N0 [dB]
S
E
R
6420-2-4
100
10−1
10−2
10−3
10−4
10−5
8730 ACS/bit/it
1746 ACS/bit/it
873 ACS/bit/it
Frame-by-frame
Bit-by-bit
C
ap
ac
it
y
b
ou
n
d
-0
.4
18
d
B
(b) R = 1/3, M = 4
Eb/N0 [dB]
S
E
R
6420-2-4
100
10−1
10−2
10−3
10−4
10−5
12000 ACS/bit/it
2400 ACS/bit/it
1200 ACS/bit/it
Frame-by-frame
Bit-by-bit
C
ap
ac
it
y
b
ou
n
d
0.
16
0
d
B
(a) R = 1/2, M = 8
Eb/N0 [dB]
S
E
R
6420-2-4
100
10−1
10−2
10−3
10−4
10−5
Fig. 5. BER plots of frame-by-frame and bit-by-bit RA decoders employing R-rate repetition codes and M -state URC codes, where (a) R = 1/2 and
M = 8, (b) R = 1/3 and M = 4, as well as (c) R = 1/4 and M = 2. These plots are obtained for message lengths of L = 103 bits, when employing
BPSK for communication over an AWGN channel having various Eb/N0 values. In each case, plots are provided for three iterative decoding complexity
limits, which are chosen to facilitate 3, 6 and 30 iterations of the frame-by-frame RA decoder.
Applying the identity
n∑
m=k
(
m
k
)
=
(
n+ 1
k + 1
)
to (14) gives
Pa = 1− 1
LN
(
LN
N
)
(
LN
N − 1
) . (15)
Finally, applying the identity
(
n
k
)
=
n!
k!(n− k)! to (15)
and simplifying it yields (4).
REFERENCES
[1] D. Divsalar, H. Jin, and R. J. McEliece, “Coding theorems for ‘turbo-
like’ codes,” in Proc. Allerton Conf. on Communications, Control and
Computing, (Urbana, IL, USA), pp. 201–210, Sept. 1998.
[2] S. Benedetto, G. Montorsi, D. Divsalar, and F. Pollara, “Soft-input soft-
output modules for the construction and distributed iterative decoding
of code networks,” Euro. Trans. Telecommun., vol. 9, pp. 155–172, Mar.
1998.
[3] M. F. U. Butt, S. X. Ng, and L. Hanzo, “Self-concatenated code design
and its application in power-efficient cooperative communications,”
IEEE Commun. Surveys Tutorials, vol. 14, pp. 858–883, Sept. 2012.
[4] S. Dolinar and D. Divsalar, “Weight distributions for turbo codes using
random and nonrandom permutations,” Telecommunications and Data
Acquisition Progress Report, vol. 122, pp. 56–65, Apr. 1995.
[5] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear
codes for minimizing symbol error rate (Corresp.),” IEEE Trans. Inform.
Theory, vol. 20, pp. 284–287, Mar. 1974.
[6] R. G. Maunder and L. Hanzo, “Near-capacity irregular variable length
coding and irregular unity rate coding,” IEEE Trans. Wireless Commun.,
vol. 8, pp. 5500–5507, Nov. 2009.
[7] A. Roumy, S. Guemghar, G. Caire, S. Verdu, “Design methods for
irregular repeat-accumulate codes,” IEEE Trans. Inform. Theory, vol. 50,
pp. 1711–1727, August 2004.
[8] J. Hagenauer, E. Offer, and L. Papke, “Iterative decoding of binary block
and convolutional codes,” IEEE Trans. Inform. Theory, vol. 42, pp. 429–
445, Mar. 1996.
[9] L. Li, R. G. Maunder, B. M. Al-Hashimi, and L. Hanzo, “A low-
complexity turbo decoder architecture for energy-efficient wireless sen-
sor networks,” IEEE Trans. VLSI Syst., vol. 21, pp. 14–22, Jan. 2013.
[10] K. Chung, “Irregular repeat quaternary-accumulate (IRqA) codes and
two-dimensional BCJR decoding,” IET Commun., vol. 6, pp. 1284–1290,
July 2012.
[11] P. Robertson, E. Villebrun, and P. Hoeher, “A comparison of optimal and
sub-optimal MAP decoding algorithms operating in the log domain,” in
Proc. IEEE Int. Conf. on Communications, vol. 2, (Seattle, WA, USA),
pp. 1009–1013, June 1995.
[12] S. Yoon and Y. Bar-Ness, “A parallel MAP algorithm for low latency
turbo decoding,” IEEE Commun. Lett., vol. 6, pp. 288–290, July 2002.
[13] T. Ilnseher, F. Kienle, C. Weis, and N. Wehn, “A 2.15GBit/s turbo code
decoder for LTE Advanced base station applications,” in Proc. Int. Symp.
on Turbo Codes and Iterative Information Processing, (Gothenburg,
Sweden), pp. 21–25, Aug. 2012.
[14] C. Argon and S. W. McLaughlin, “A parallel decoder for low latency
decoding of turbo product codes,” IEEE Commun. Lett., vol. 6, pp. 70–
72, Feb. 2002.
[15] Y.-C. Lu and E.-H. Lu, “A parallel decoder design for low latency turbo
decoding,” in Proc. Conf. Innovative Computing Information Control,
(Kumamoto, Japan), p. 386, Sept. 2007.
[16] Y.-C. Lu, T. C. Chen, and E.-H. Lu, “Low-latency turbo decoder design
by concurrent decoding of component codes,” in Proc. IEEE Int. Conf.
Innovative Computing Information Control, (Dalian, China), p. 533, June
2008.
[17] M. Taskaldiran, R. C. S. Morling, and I. Kale, “Increasing the speed
of parallel decoding of turbo codes,” in Proc. Ph.D. Research in
Microelectronics Electronics, no. 1, (Cork, Ireland), pp. 304–307, July
2009.
[18] S. ten Brink, “Convergence of iterative decoding,” Electron. Lett.,
vol. 35, pp. 806–808, May 1999.
[19] H. Jin, A. Khandekar, and R. McEliece, “Irregular repeat-accumulate
codes,” in Proc. Int. Symp. Turbo Codes Related Topics, (Brest, France),
Sept. 2000.
[20] A. Abbasfar, D. Divsalar, and K. Yao, “Accumulate-repeat-accumulate
codes,” IEEE Trans. Commun., vol. 55, pp. 692–702, Apr. 2007.
[21] G. Wang, I. Land, and A. Grant, “Design of irregular repeat accumulate
codes for finite decoder iterations,” IEEE Trans. Commun., vol. 62,
pp. 3092–3103, Sept. 2014.
[22] H.-A. Loeliger, “An introduction to factor graphs,” IEEE Signal Pro-
cessing Magazine, vol. 21, pp. 28–41, Jan. 2004.
[23] X. Wu, Y. Xue, and H. Xiang, “On concatenated zigzag Codes and their
decoding schemes,” IEEE Commun. Lett., vol. 8, pp. 54–56, Jan. 2004.
[24] J. Li, K. R. Narayanan, and C. N. Georghiades, “Product accumulate
codes: A class of codes with near-capacity performance and low
decoding complexity,” IEEE Trans. Inform. Theory, vol. 50, pp. 31–46,
Jan. 2004.
[25] K. Yen, N. Veselinovic, and T. Matsumoto, “Space-time weighted
nonbinary repeat-accumulate codes in frequency-selective MIMO chan-
nels,” in Proc. IEEE Veh. Technol. Conf., vol. 5, (Stockholm, Sweden),
pp. 3152–3156, May 2005.
[26] A. Wolf, J. Ertel, and A. Finger, “Performance of serial concate-
nated codes under iterative decoding and different update modes,” in
Proc. IEEE Workshop Signal Processing Advances Wireless Commun.,
(Cannes, France), July 2006.
[27] T. Bhatt and V. Stolpman, “Structured interleaves and decoder archi-
tectures for zigzag codes,” in Proc. Asilomar Conf. Signals Systems
Computers, (Pacific Grove, CA, USA), pp. 99–104, Oct. 2006.
[28] M.-C. Chiu, “Bandwidth-efficient modulation codes based on nonbinary
11
irregular repeat-accumulate codes,” IEEE Trans. Inform. Theory, vol. 56,
pp. 152–167, Jan. 2010.
Jinhui Chen Jinhui Chen received her BEng de-
gree in Telecommunication Engineering (first class
honors) in Xi’an Jiao Tong Liverpool University,
Suzhou, China, in 2012. She was awarded a MSc
degree in Wireless Communications (distinction) in
the University of Southampton, Southampton, UK,
in 2013. She is currently working in Panawell &
Partners, LLC, Beijing, China. Her interests include
intellectual property, channel coding, and antennas.
Wenbo Zhang received the M.E. degree in Informa-
tion and Communication Engineering from the Uni-
versity of Beijing University of Posts and Telecom-
munications (BUPT), Beijing, China, in 2011. He
is currently working toward the Ph.D. degree with
the Communications Research Group, Electronics
and Computer Science, University of Southampton,
Southampton, UK. His current research interests
include joint source/channel coding and variable
length coding.
Robert G. Maunder has studied with Electronics
and Computer Science, University of Southampton,
UK, since October 2000. He was awarded a first
class honors BEng in Electronic Engineering in July
2003, as well as a PhD in Wireless Communications
in December 2007. He became a lecturer in 2007
and an Associated Professor in 2013. Rob’s re-
search interests include joint source/channel coding,
iterative decoding, irregular coding and modulation
techniques. For further information on this research,
please refer to http://users.ecs.soton.ac.uk/rm.
Lajos Hanzo (http://www-mobile.ecs.soton.ac.uk)
FREng, FIEEE, FIET, Fellow of EURASIP, DSc
received his degree in electronics in 1976 and
his doctorate in 1983. In 2009 he was awarded
the honorary doctorate “Doctor Honoris Causa” by
the Technical University of Budapest. During his
35-year career in telecommunications he has held
various research and academic posts in Hungary,
Germany and the UK. Since 1986 he has been with
the School of Electronics and Computer Science,
University of Southampton, UK, where he holds
the chair in telecommunications. He has successfully supervised 80 PhD
students, co-authored 20 John Wiley/IEEE Press books on mobile radio
communications totalling in excess of 10 000 pages, coauthored 1300+
research entries at IEEE Xplore, acted both as TPC and General Chair of
IEEE conferences, presented keynote lectures and has been awarded a number
of distinctions. Currently he is directing a 100-strong academic research team,
working on a range of research projects in the field of wireless multimedia
communications sponsored by industry, the Engineering and Physical Sciences
Research Council (EPSRC) UK, the European IST Programme and the Mobile
Virtual Centre of Excellence (VCE), UK. He is an enthusiastic supporter of
industrial and academic liaison and he offers a range of industrial courses. He
is also a Governor of the IEEE VTS. During 2008 - 2012 he was the Editor-in-
Chief of the IEEE Press and a Chaired Professor also at Tsinghua University,
Beijing. His research is funded by the European Research Council’s Senior
Research Fellow Grant. For further information on research in progress and
associated publications please refer to http://www-mobile.ecs.soton.ac.uk
