Simplified Compression of Redundancy Free Trellis Sections in Turbo Decoder by Boutillon, Emmanuel et al.
Simplified Compression of Redundancy Free Trellis
Sections in Turbo Decoder
Emmanuel Boutillon, José-Luis Sanchez-Rojas, Cédric Marchand
To cite this version:
Emmanuel Boutillon, José-Luis Sanchez-Rojas, Cédric Marchand. Simplified Compression of
Redundancy Free Trellis Sections in Turbo Decoder. IEEE Communications Letters, Institute
of Electrical and Electronics Engineers, 2014, 18 (6), pp. 941-944.
DOI: 10.1109/LCOMM.2014.2319257. <hal-00978579>
HAL Id: hal-00978579
https://hal.archives-ouvertes.fr/hal-00978579
Submitted on 14 Apr 2014
Emmanuel Boutillon⊥, Senior Member, IEEE, José-Luis Sanchez-Rojas∗, Cédric Marchand⊥, Member, IEEE
⊥Lab-STICC, UMR 6582, Université de Bretagne Sud, 56100 Lorient, France
∗INICTEL-UNI, av. San Luis 1771, San Borja, Lima 41, Lima, Peru
Abstract—It has been recently shown that a sequence of
R = q(M − 1) redundancy free trellis stages of a recursive
convolutional decoder can be compressed into a sequence of
L = M − 1 trellis stages, where M is the number of states of
the trellis and q is a positive integer. In this paper, we show that
for an M-state Turbo decoder, among the L compressed trellis
stages, only m = 3 or even m = 2 are necessary. The so-called
m-min algorithm can be used to increase the throughput
for decoding a high rate turbo-code and/or to reduce its power
consumption.¹
I. INTRODUCTION
The quality of an error control code design can be evaluated in terms of decoding performance, implementation cost (area, power dissipation) and decoding throughput. A general overview of error control code decoders can be found in [1]. The recent specifications of wireless systems (LTE, HSPA [2]) propose the use of turbo-codes with very high code rates (typically, between 0.8 and 0.98). In contrast to code construction, there are very few papers dedicated to decoders with such high rates. Most of the reported architectures propose the basic structure of a rate 1/3 decoder, with parameters (i.e. window lengths) optimized for high rates to trade off performance against decoding throughput. In [3], a method is proposed to directly exploit the existence of long sequences (of length up to 100) of Redundancy Free Trellis Sections (RFTS, or sequences of bits without redundancy) to reduce the complexity of part of the decoder: during the acquisition process, any RFTS sequence of size R is replaced by a shorter RFTS sequence of size L = M − 1, with additional (R mod L) steps of shuffling. In this paper, we show that, in the context of a Max-Log-MAP decoder [5], among the L steps of the compressed RFTS, only m = 3 or m = 2 are really useful, which allows further architectural optimization.
The remainder of the paper is divided into four sections. Sec. II gives enough information about the trellis compression to make the paper self-contained. Sec. III then presents the sub-optimal 3-min simplification, followed in Sec. IV by a discussion of hardware implementation. Finally, Sec. V concludes the paper.
II. PRINCIPLE OF THE RFTS TRELLIS COMPACTION
In this section, we first recall the problem of acquisition for high rate turbo-codes. Then, we present the principle of trellis compaction at the encoding level before deriving it at the decoding level.
¹ This work has been supported by the GIGADEC project from Brittany region, as well as the CPER project PALMYRE II (Brittany region and FEDER funding).
A. Acquisition for high rate turbo-codes
The implementation of a turbo-code is a well investigated area. The standard implementation uses the Log-MAP algorithm or the Max-Log-MAP algorithm [5] along with the sliding window algorithm [6]. This algorithm consists in dividing the frame of length K into windows of length W and processing the forward-backward steps on a W-sized block instead of a whole K-sized block. To process the p-th window, accurate estimates of the initial forward state metrics $\alpha_{pW}$ and backward state metrics $\beta_{pW+W}$ are required. One possible scheduling is to perform the backward recursion directly from index K down to 1, which naturally yields the initial $\beta_{pW+W}$ states, and to obtain the initial $\alpha_{pW}$ states by an acquisition of size W′, starting from state $\alpha^0_{pW-W'}$ and ending at state $\alpha_{pW}$ (see Fig. 1). This pre-processing is called "acquisition". The initial value $\alpha^0_{pW-W'}$ can be either the all-zero state vector (if there is no a priori knowledge on the initial state) or a forward state vector stored from the previous iteration. This last method is commonly called Next Iteration Initialization (NII) [7].
For high code rates, the acquisition length W′ needs to be large enough to contain some non-RFTS (i.e. trellis sections associated with non-punctured redundancy bits). In fact, starting from the all-zero state vector, if the acquisition processes only RFTS, the final state will also be the all-zero state vector and the acquisition process will thus be useless. By simulation, it is verified that a high value of W′ is required, even if the NII technique is used. This means that the acquisition step consumes a significant portion of the decoder's processing effort, which reduces the efficiency and throughput of the hardware implementation.
Fig. 1. Schematic representation of the sliding-window algorithms, with parameters W = K/4 and W′ = W/2. The x-axis represents the time index and the y-axis the index of the bit.
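A short worked equation may help to see why an RFTS-only acquisition started from the all-zero vector carries no information. The sketch below assumes the RFTS branch metrics used in Sec. II.C ($+\gamma_k$ on the bit-0 branch, $-\gamma_k$ on the bit-1 branch) and that every state has exactly one incoming branch of each kind:

```latex
\alpha_k(s) = 0 \;\; \forall s
\quad\Longrightarrow\quad
\alpha_{k+1}(s) = \max\nolimits^{*}\!\left(0 + \gamma_k,\; 0 - \gamma_k\right) = c_k \;\; \forall s .
```

Since $c_k$ does not depend on s, subtracting it leaves the all-zero vector again, whatever the value of $\gamma_k$.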
B. Compression of RFTS at the encoding level
Let us consider the trellis compression in the case of hard information bits (a decision is made on the value of the received bit, i.e., no soft value is available). For an 8-state convolutional encoder, the state-space representation of [8] gives
$$X_{k+1} = A X_k + B D_k, \tag{1a}$$
$$V_k = C X_k + D D_k, \tag{1b}$$
where $X_k$, $D_k$ and $V_k$ are respectively the state of the encoder, the input information bit and the output vector at time k. The matrices (A, B, C, D) are respectively the state matrix, the entry matrix, the output matrix and the feedforward matrix. In (1a) and (1b), arithmetic is over GF(2). Moreover, for a recursive code, the matrix A verifies $A^L = I_d$, where $I_d$ is the identity matrix. Fig. 2.a shows the structure of the encoder (the generation of output bits, corresponding to (1b), is omitted). Starting from a state $X_0$ and the bit sequence $D_k$, k = 0, 1, ..., R − 1, the final state of the encoder is given by
$$X_R = A^R X_0 + \sum_{k=0}^{R-1} A^k B D_{R-1-k}. \tag{2}$$
Let us replace index k with qL + l, where l = k mod L and q = (k − l)/L. For the sake of simplicity, k mod L will be denoted k|L. Then (2) can be rewritten as
$$X_R = A^R X_0 + \sum_{l=0}^{L-1} \sum_{q=0}^{\lfloor (R-1-l)/L \rfloor} A^{qL+l} B D_{R-1-qL-l}. \tag{3}$$
Since $A^L = I_d$, $A^{qL+l} = A^l$, thus (3) can be expressed as:
$$X_R = A^{R|L} X_0 + \sum_{l=0}^{L-1} A^l B \left( \sum_{q=0}^{\lfloor (R-1-l)/L \rfloor} D_{R-1-qL-l} \right), \tag{4}$$
where R|L = R mod L. We perform the change of variable $p = \lfloor (R-1-l)/L \rfloor - q$ in the last summation term of (4). Let h(l) = (R − 1 − l) mod L. We have
$$\sum_{q=0}^{\lfloor (R-1-l)/L \rfloor} D_{R-1-qL-l} = \sum_{p=0}^{\lfloor (R-1-h(l))/L \rfloor} D_{h(l)+pL}. \tag{5}$$
Let us give an example to illustrate (5) with R = 23, L = 7 and l = 3. The left term of (5) gives $D_{19} + D_{12} + D_5$. Since h(3) = (23 − 1 − 3) mod 7 = 5, the right term of (5) gives $D_5 + D_{12} + D_{19}$ and both terms are equal. Let $D^a_h$ be the "aggregated bit" corresponding to the bits whose index has the same residue h modulo L: for h = 0, 1, ..., L − 1
$$D^a_h = \sum_{p=0}^{\lfloor (R-1-h)/L \rfloor} D_{h+pL}. \tag{6}$$
Fig. 2. (a) 8-state LTE encoder structure, (b) step of the trellis, (c) example of RFTS to compute $\alpha_1$ from $\alpha_0$ and $\gamma_0$ (see Table I).
Note: when h ≥ L, $D^a_h$ is equal to $D^a_{h \bmod L}$. Then, the last summation of (4) can be expressed as $D^a_{h(l)}$. Let $X'_0 = A^{R|L} X_0$. Using the fact that $A^L X'_0 = A^{R|L} X_0$, (4) can be expressed recursively starting from state $X'_0$ to state $X'_L = X_R$ as, for l = 0, 1, ..., L − 1
$$X'_{l+1} = A X'_l + B D^a_{h(L-1-l)}. \tag{7}$$
Since h(L − 1 − l) = (R − 1 − (L − 1 − l)) mod L = (R|L + l) mod L, (7) can be expressed simply as:
$$X'_{l+1} = A X'_l + B D^a_{R|L+l}. \tag{8}$$
Note that (8) has the same structure as (1a). Moreover, the computation of $X'_0 = A^{R|L} X_0$ can be performed in R|L steps of (1a) with dummy null input bits $D^d_k = 0$, k = 0, 1, ..., R|L − 1. In other words, every RFTS sequence of length R (i.e. R trellis stages with no redundancy) can be reduced to an RFTS sequence of length R|L + L. The first R|L RFTS are associated with a null dummy bit $D^d_k = 0$, k = 0, 1, ..., R|L − 1, and the last L trellis stages are associated with the aggregated bits $D^a_l$, l = R|L, ..., L − 1 + R|L. One should note that shifting the position of an aggregated bit by a multiple of L does not affect the result. Swapping the sub-block of the last R|L aggregated bits with the sub-block of the first R|L dummy bits leads to another valid compaction of the trellis. In that case, the last R|L stages are associated with dummy null bits. This solution was originally presented in [3].
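The hard-bit compression of (8) can be checked numerically. The following is a minimal sketch, assuming the feedback polynomial 1 + D^2 + D^3 of the 8-state LTE constituent encoder and a state ordering (newest register first) chosen here for illustration; the function names are hypothetical and not taken from the paper.

```python
import random

L = 7  # L = M - 1 for the 8-state code


def step(state, bit):
    """One step of (1a). state = (s0, s1, s2), s0 being the newest register."""
    s0, s1, s2 = state
    feedback = bit ^ s1 ^ s2  # assumed feedback taps D^2 and D^3
    return (feedback, s0, s1)


def run(state, bits):
    """Feed the bits in order D_0, D_1, ... and return the final state."""
    for b in bits:
        state = step(state, b)
    return state


def compress(bits):
    """R|L dummy zero bits followed by the L aggregated bits used in (8)."""
    agg = [0] * L
    for k, b in enumerate(bits):
        agg[k % L] ^= b  # aggregated bit D^a_h of (6), sum over GF(2)
    r = len(bits) % L
    return [0] * r + [agg[(r + l) % L] for l in range(L)]


def check(trials=200, seed=0):
    """Compare the final state of the full and compressed sequences."""
    rng = random.Random(seed)
    for _ in range(trials):
        R = rng.randint(1, 100)
        bits = [rng.randint(0, 1) for _ in range(R)]
        x0 = tuple(rng.randint(0, 1) for _ in range(3))
        if run(x0, bits) != run(x0, compress(bits)):
            return False
    return True


print(check())
```

The compressed sequence always has R|L + L bits, so a sequence of length 100 is replaced by 2 + 7 = 9 encoder steps in this sketch.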
C. Trellis compaction
Let us recall the classical forward recursion of the MAP decoder. Let $\alpha_k$ be the forward state metrics at time k and $\gamma_k$ the branch metrics at time k. The recursion is given by
$$\alpha_{k+1}(s) = \max\nolimits^{*}\big(\alpha_k(s_0) + \gamma_k(s_0, s),\ \alpha_k(s_1) + \gamma_k(s_1, s)\big), \tag{9}$$
where s is the current state and $(s_0, s)$ is the branch in the trellis associated with the input bit $D_k = 0$ connecting state $s_0$ to state s. Similarly, $(s_1, s)$ is the branch in the trellis associated with the input bit $D_k = 1$ connecting state $s_1$ to state s. The max∗ function is defined as
$$\max\nolimits^{*}(a, b) = \max(a, b) + \ln\left(1 + e^{-|a-b|}\right). \tag{10}$$
When the Max-Log-MAP algorithm is used, the max∗ operator in (9) is simply replaced by the max operator. Fig. 2.b shows the computation of the state metric $\alpha_{k+1}(s)$.
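As a small illustration of the difference between the two operators, the sketch below implements max∗ as the Jacobian logarithm ln(e^a + e^b), which is the usual Log-MAP definition and is assumed here to be the intended one. The function name is hypothetical.

```python
import math


def max_star(a, b):
    """Jacobian logarithm: ln(exp(a) + exp(b)), computed in a stable way."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


# The correction term is at most ln(2) (when a == b) and vanishes quickly
# as |a - b| grows, which is why Max-Log-MAP can simply drop it.
for a, b in [(0.0, 0.0), (3.0, 1.0), (20.0, -5.0)]:
    print(a, b, max(a, b), max_star(a, b))
```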
For an RFTS, there is no associated redundancy bit and the branch metrics are simply $\gamma(s_0, s) = \mathrm{LLR}(D_k)$ and $\gamma(s_1, s) = -\mathrm{LLR}(D_k)$, where $\mathrm{LLR}(D_k)$ is the Log-Likelihood Ratio of the received bit, i.e. $\mathrm{LLR}(D_k) = \ln\left(\frac{P(D_k=0)}{P(D_k=1)}\right)$. In the sequel, for the sake of simplicity, $\mathrm{LLR}(D_k)$ will be denoted $\gamma_k$. Similarly, $\gamma^a_l$ will represent the LLR of the aggregated bit $D^a_l$ and $\gamma^d_l$ the LLR of the dummy null bit ($\gamma^d_l = +\infty$ since $D^d_l = 0$).
As shown in Sec. II.B, at the encoder side, a sequence of length R can be reduced to a sequence of length R|L with dummy zero inputs followed by a sequence of length L of aggregated bits. At the receiver side, this implies that the RFTS sequence can be compressed into R|L steps of shuffling (using the branch metric $\gamma^d_l = +\infty$), followed by L trellis sections of aggregated bits with LLR $\gamma^a_l$. Since the summation in (6) is over GF(2), at the receiver side, $\gamma^a_l = \mathrm{LLR}(D^a_l)$ is given by
$$\gamma^a_l = 2 \tanh^{-1}\left( \prod_{q=0}^{\lfloor (R-1-l)/L \rfloor} \tanh\left(\frac{\gamma_{qL+l}}{2}\right) \right). \tag{11}$$
Thanks to the min-sum approximation, (11) can be simplified by separating the modulus of $\gamma^a_l$ and the sign of $\gamma^a_l$:
$$|\gamma^a_l| = \min\left\{ |\gamma_{qL+l}|,\ q = 0, 1, \ldots, \left\lfloor \frac{R-1-l}{L} \right\rfloor \right\}, \tag{12}$$
$$\mathrm{sign}(\gamma^a_l) = \prod_{q=0}^{\lfloor (R-1-l)/L \rfloor} \mathrm{sign}(\gamma_{qL+l}). \tag{13}$$
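The two aggregation rules can be compared on a concrete LLR sequence. This is a minimal sketch with hypothetical function names; the LLR values are those of the example in Sec. II.D, and the clamping constant in the exact rule is an implementation assumption to keep `atanh` inside its domain.

```python
import math


def aggregate_exact(llrs, L):
    """Eq. (11): tanh rule, one aggregated LLR per residue l = 0..L-1."""
    out = []
    for l in range(L):
        prod = 1.0
        for g in llrs[l::L]:
            prod *= math.tanh(g / 2.0)
        # Clamp to stay strictly inside (-1, 1) before atanh (assumption).
        prod = max(-1.0 + 1e-15, min(1.0 - 1e-15, prod))
        out.append(2.0 * math.atanh(prod))
    return out


def aggregate_min_sum(llrs, L):
    """Eqs. (12)-(13): minimum magnitude and product of signs."""
    out = []
    for l in range(L):
        group = llrs[l::L]
        sign = -1 if sum(g < 0 for g in group) % 2 else 1
        out.append(sign * min(abs(g) for g in group))
    return out


GAMMA = [-14, 31, 24, 12, 31, 20, 6, -31, 19, 15, -19, -8, 15, 12, 5, -11]
L = 7
r = len(GAMMA) % L  # R|L = 2
approx = aggregate_min_sum(GAMMA, L)
exact = aggregate_exact(GAMMA, L)
# Aggregated LLRs in processing order l = R|L, ..., L - 1 + R|L.
print([approx[(r + l) % L] for l in range(L)])
print([round(exact[(r + l) % L], 3) for l in range(L)])
```

For these values the min-sum result stays very close to the exact tanh rule, because each group is dominated by a single low-magnitude LLR.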
D. Example of RFTS compaction
Let us consider the M = 8-state encoder defined in Fig. 2.a and an RFTS sequence of length R = 16 with initial state metrics at time 0 equal to $\alpha_0 = [35, 0, 20, 25, 17, 3, 16, 31]'$. The LLRs of the $D_k$ values are equal to $\{\gamma_k\}_{k=0,1,\ldots,15}$ = {−14, 31, 24, 12, 31, 20, 6, −31, 19, 15, −19, −8, 15, 12, 5, −11}². Table I shows the first forward state metrics and the last forward state metrics of the RFTS sequence. In order to bound the increasing values of the state metrics, at each stage, the minimum value of the state metrics is subtracted out ($\alpha_k = \alpha_k - \min(\alpha_k(i), i = 0, 1, \ldots, 7)$). After this subtraction, the minimum metric is always equal to zero. Fig. 2.c shows the details of the computation of $\alpha_1$ from $\alpha_0$ and $\gamma_0$. Let us compute $\gamma^a_2$. From (12), we obtain $|\gamma^a_2| = \min\{|\gamma_2|, |\gamma_9|\} = \min\{|24|, |15|\} = 15$. From (13), we obtain $\mathrm{sign}(\gamma^a_2) = 1$. The computation for the other values gives $\{\gamma^a_l\}_{l=2,3,\ldots,8}$ = {15, −12, −8, 15, 6, 5, −11}. Table II shows the 9 steps of the compressed trellis. As expected, the last stage of the classical trellis (last column of Table I) and the last stage of the compressed trellis (last column of Table II) are equal. This algorithm implies the use of L minima. It is called the L-min algorithm. In terms of performance, processing acquisition sequences of the sliding window algorithm without or with trellis compression gives identical final results. In the latter case, the number of clock cycles can be significantly reduced, leading to a more efficient architecture, as described briefly in [3]. Now the question arises whether it is possible to further decrease the complexity of the L-min algorithm by trading off complexity against performance.

TABLE I
CLASSICAL TRELLIS FOR R = 16
k        0    1    2   -   11   12   13   14   15   16
γk     -14   31   24   -   -8   15   12    5  -11  ...
αk(0)   35    4    4   -   13   18   18   18    6   16
αk(1)    0   17   13   -   22    0   24   13   16   12
αk(2)   20    0   32   -    4   12    9    0   12    6
αk(3)   25   13   28   -   20   24   13   28    4    2
αk(4)   17   32   17   -   32    9    0   24    1    6
αk(5)    3   22    0   -    0   16   12    9    6    4
αk(6)   16   14   22   -   28   28   16   12    2    1
αk(7)   31   28   14   -   17   13   28   16    0    0

TABLE II
COMPRESSED TRELLIS (L-MIN ALGORITHM)
(columns l = 0, 1: dummy bits γ^d_l; columns l = 2, ..., 8: aggregated bits γ^a_l)
l          0    1    2    3    4    5    6    7    8    9
γ^{d,a}_l +∞   +∞   15  -12   -8   15    6    5  -11  ...
αl(0)     35   35   35   35   13   18   18   18    6   16
αl(1)      0   25   31   16   22    0   24   13   16   12
αl(2)     20   17    0   25    4   12    9    0   12    6
αl(3)     25   31   16    3   14   24   13   28    4    2
αl(4)     17    0   25   31   32    9    0   24    1    6
αl(5)      3   20   17    0    0   10   12    9    6    4
αl(6)     16    3   20   17   28   28   10   12    2    1
αl(7)     31   16    3   20   17   13   28   16    0    0
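The example of this section can be replayed with a few lines of code. This is only a sketch: the trellis labeling (state index 4·s0 + 2·s1 + s2 with s0 the newest register, feedback bit = input ⊕ s1 ⊕ s2) is an assumption made here, chosen because it is consistent with the values of Tables I and II; the function names are hypothetical and this is not the authors' MATLAB code.

```python
INF = float("inf")
ALPHA0 = [35, 0, 20, 25, 17, 3, 16, 31]
GAMMA = [-14, 31, 24, 12, 31, 20, 6, -31, 19, 15, -19, -8, 15, 12, 5, -11]
L = 7


def predecessor(s, bit):
    """State that reaches state s when the input bit is `bit` (assumed labeling)."""
    n0, n1, n2 = (s >> 2) & 1, (s >> 1) & 1, s & 1
    p0, p1 = n1, n2
    p2 = n0 ^ bit ^ p1
    return 4 * p0 + 2 * p1 + p2


def forward_step(alpha, gamma):
    """Max-Log-MAP version of (9) for an RFTS, followed by min subtraction."""
    if gamma in (INF, -INF):  # certain bit: pure shuffling of the metrics
        bit = 0 if gamma > 0 else 1
        return [alpha[predecessor(s, bit)] for s in range(8)]
    new = [max(alpha[predecessor(s, 0)] + gamma,
               alpha[predecessor(s, 1)] - gamma) for s in range(8)]
    lo = min(new)
    return [a - lo for a in new]


def run(alpha, gammas):
    for g in gammas:
        alpha = forward_step(alpha, g)
    return alpha


def compress_llrs(gammas, L):
    """R|L dummy LLRs (+inf) followed by the L min-sum aggregated LLRs."""
    agg = []
    for l in range(L):
        group = gammas[l::L]
        sign = -1 if sum(g < 0 for g in group) % 2 else 1
        agg.append(sign * min(abs(g) for g in group))
    r = len(gammas) % L
    return [INF] * r + [agg[(r + l) % L] for l in range(L)]


classical = run(ALPHA0, GAMMA)                       # 16 trellis stages
compressed = run(ALPHA0, compress_llrs(GAMMA, L))    # 2 + 7 = 9 stages
print(classical)
print(compressed)
```

Both calls return the same normalized state metrics, in 16 and 9 forward steps respectively.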
III. m-MIN^a AND m-MIN^g COMPUTATION
The analysis of the Max-Log-MAP algorithm for a sequence of RFTS shows that a high magnitude of γ simply implies a shuffling of the state metric values α, while a low magnitude of γ significantly decreases the dynamics of the α terms (i.e. a low reliability bit gives more uncertainty on the current state of the encoder). This indicates that it may be sufficient to consider only a small subset of the initial γ values to process the RFTS sequence. Let us define the m-min^a method as an extension of the L-min method where, among the L LLRs of the aggregated bits, the L − m highest $\gamma^a$ moduli are saturated to sign(γ) × ∞. Using this method with the example of the previous section, the 3-min^a method consists of keeping only the 3 smallest magnitude values of the aggregated $\gamma^a$ values {15, −12, −8, 15, 6, 5, −11} and saturating the others, which gives the set {+∞, −∞, −8, +∞, 6, 5, −∞}. Table III shows the corresponding trellis. Compared to Table II, intermediate state metrics can differ between the L-min and the 3-min methods, but finally, both methods lead to the same final state. In the general case, the final state metrics can differ slightly, even using the 4-min or 5-min methods.
² The MATLAB code used for this example can be downloaded at [9].

TABLE III
COMPRESSED TRELLIS (3-MIN ALGORITHM)
(columns l = 0, 1: dummy bits γ^d_l; columns l = 2, ..., 8: aggregated bits γ^a_l)
l          0    1    2    3    4    5    6    7    8    9
γ^{d,a}_l +∞   +∞   +∞   -∞   -8   +∞    6    5   -∞  ...
αl(0)     35   35   35   35   16   24   24   18    6   16
αl(1)      0   25   31   16   25    0   30   13   16   12
αl(2)     20   17    0   25    0   18   15    0   12    6
αl(3)     25   31   16    3   17   30   19   28    4    2
αl(4)     17    0   25   31   35   15    0   24    1    6
αl(5)      3   20   17    0    3   16   18    9    6    4
αl(6)     16    3   20   17   31   34   16   12    2    1
αl(7)     31   16    3   20   20   19   34   16    0    0

TABLE IV
REQUIRED Eb/No TO OBTAIN A FER OF 10^-2. HSPA TURBO-CODE OF LENGTH K = 5144 FOR SEVERAL CODE RATES. FB = FORWARD-BACKWARD, SW = SLIDING WINDOW, ACQUISITION WITH NII AND W = W′
rate       W′ = W   FB        FB-SW     2-min^g   1-min^g
r = 0.8    32       3.89 dB   3.92 dB   3.94 dB   4.05 dB
r = 0.9    64       4.87 dB   4.89 dB   4.91 dB   5.02 dB
r = 0.94   128      5.49 dB   5.51 dB   5.53 dB   5.71 dB
r = 0.98   32       6.73 dB   8.17 dB   8.18 dB   8.20 dB
           64       6.73 dB   7.09 dB   7.10 dB   7.15 dB
           128      6.73 dB   6.83 dB   6.84 dB   6.92 dB
           256      6.73 dB   6.75 dB   6.77 dB   6.87 dB
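The 3-min^a example can be replayed in the same spirit. This is a sketch under the same assumed trellis labeling as before (state index 4·s0 + 2·s1 + s2, feedback bit = input ⊕ s1 ⊕ s2); `saturate_a` and the other names are hypothetical helpers, not part of the paper.

```python
import math

INF = math.inf
ALPHA0 = [35, 0, 20, 25, 17, 3, 16, 31]
# Compressed LLR sequence of the L-min algorithm: 2 dummy + 7 aggregated LLRs.
L_MIN = [INF, INF, 15, -12, -8, 15, 6, 5, -11]


def predecessor(s, bit):
    """State that reaches state s when the input bit is `bit` (assumed labeling)."""
    n0, n1, n2 = (s >> 2) & 1, (s >> 1) & 1, s & 1
    return 4 * n1 + 2 * n2 + (n0 ^ bit ^ n2)


def forward_step(alpha, gamma):
    """Max-Log-MAP step; an infinite LLR reduces to a permutation of alpha."""
    if math.isinf(gamma):
        bit = 0 if gamma > 0 else 1
        return [alpha[predecessor(s, bit)] for s in range(8)]
    new = [max(alpha[predecessor(s, 0)] + gamma,
               alpha[predecessor(s, 1)] - gamma) for s in range(8)]
    lo = min(new)
    return [a - lo for a in new]


def run(alpha, gammas):
    for g in gammas:
        alpha = forward_step(alpha, g)
    return alpha


def saturate_a(compressed, m):
    """m-min^a: keep the m smallest finite magnitudes, saturate the others."""
    finite = [i for i, g in enumerate(compressed) if math.isfinite(g)]
    keep = set(sorted(finite, key=lambda i: abs(compressed[i]))[:m])
    return [g if (i in keep or math.isinf(g)) else math.copysign(INF, g)
            for i, g in enumerate(compressed)]


THREE_MIN = saturate_a(L_MIN, 3)
print(THREE_MIN)
print(run(ALPHA0, L_MIN))
print(run(ALPHA0, THREE_MIN))
```

With only three finite LLRs, six of the nine forward steps reduce to a permutation of the state metrics, which is what makes the hardware simplification attractive.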
In practice, the only pertinent criterion to evaluate an algorithm is the performance loss. To this end, we have run several simulations performing all acquisitions using the m-min^a algorithm. We also tested a modified version of the m-min^a method, performing the saturation before the aggregation of the bits. In this case, the R − m highest magnitude values of the RFTS sequence are saturated prior to the aggregation. This variation of the m-min^a method is called the m-min^g method. Note that the m-min^a and the m-min^g methods lead to the same result if m = 1. For m > 1, the two methods give the same result only if the m minimum values before aggregation all have distinct indices modulo L.
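The difference between the two variants can be made concrete with a short sketch. The function names and the second LLR sequence (`CLASH`) are hypothetical and only serve the illustration; results are indexed by residue l = 0, ..., L − 1.

```python
import math

INF = math.inf


def min_sum(group):
    """Min-sum aggregation (12)-(13) of one residue class."""
    sign = -1 if sum(g < 0 for g in group) % 2 else 1
    return sign * min(abs(g) for g in group)


def saturate(values, m):
    """Keep the m smallest magnitudes, push the others to sign * infinity."""
    keep = set(sorted(range(len(values)), key=lambda i: abs(values[i]))[:m])
    return [v if i in keep else math.copysign(INF, v)
            for i, v in enumerate(values)]


def m_min_a(gammas, L, m):
    """Aggregate first, then saturate the L - m largest aggregated LLRs."""
    return saturate([min_sum(gammas[l::L]) for l in range(L)], m)


def m_min_g(gammas, L, m):
    """Saturate the R - m largest LLRs first, then aggregate."""
    sat = saturate(gammas, m)
    return [min_sum(sat[l::L]) for l in range(L)]


GAMMA = [-14, 31, 24, 12, 31, 20, 6, -31, 19, 15, -19, -8, 15, 12, 5, -11]
# Hypothetical sequence whose two smallest LLRs share the residue 0 mod 7.
CLASH = [1, 10, 11, 12, 13, 14, 15, 2, 20, 21, 22, 23, 24, 25]

print(m_min_a(GAMMA, 7, 3), m_min_g(GAMMA, 7, 3))
print(m_min_a(CLASH, 7, 2), m_min_g(CLASH, 7, 2))
```

In the first sequence the three smallest LLRs fall in distinct residue classes, so both variants agree; in the second, m-min^g keeps a single finite LLR while m-min^a keeps two.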
Table IV shows bit-true simulation results of an HSPA Turbo decoder with K = 5144 using the sliding window technique associated with the NII technique. For r = 0.98, various acquisition lengths W′ are given to illustrate the need for high values of W′ at very high code rates. In all cases, the 3-min algorithm (not shown in the table) does not have significant performance degradation. Relative to the FB-SW column, the 2-min^g and 1-min^g algorithms degrade the performance by at most 0.02 dB and 0.2 dB respectively for all code rates.
IV. ARCHITECTURE OPTIMIZATION
In [3], the authors proposed to perform the bit aggregation "on the fly" during the previous iteration. The time for processing the acquisition is thus reduced and the decoding throughput is increased. In this paper, we present a new application of RFTS compression to reduce the power consumption of the decoder. Let us focus on the 2-min^g aggregation in an RFTS acquisition sequence of length R. The RFTS compression can be done on the fly in three steps. Step one: during the first R − (R|L) − L clock cycles, the acquisition forward unit is frozen (thus saving energy). Step two: during the next R|L clock cycles, permutations of state metrics are done (processing of R|L dummy bits). Step three: during the last L clock cycles, the forward unit processes the aggregated bits computed on the fly by Algorithm 1. For R = 100 (code rate 0.98), this method allows the decoder to freeze the acquisition forward unit about 90% of the time (91 of the 100 clock cycles for L = 7) while the low complexity bit aggregation algorithm works. This method does not increase the throughput but helps to reduce power dissipation.

Initialisation: sign(l) = +1, for l = 0, 1, . . . , L − 1;
(min1, ind1) = (+∞, 0); (min2, ind2) = (+∞, 1)
for k = 0 to R − 1 do
    l = k mod L
    sign(l) = sign(l) × sign(γk)
    if |γk| < min1 then
        min2 = min1, min1 = |γk|
        ind2 = ind1, ind1 = l
    else if |γk| < min2 then
        min2 = |γk|, ind2 = l
    end
end
Algorithm 1: On the fly aggregation of the bits for the 2-min^g algorithm.
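Algorithm 1 translates almost line for line into software. The sketch below is an illustration only: `expand` (which turns the two tracked minima into L aggregated LLRs) and `frozen_cycles` are hypothetical helpers added here, and a zero LLR is treated as positive by assumption.

```python
import math


def aggregate_2min_g(gammas, L):
    """On-the-fly pass of Algorithm 1: per-class signs and two global minima."""
    sign = [1] * L
    min1, ind1 = math.inf, 0
    min2, ind2 = math.inf, 1
    for k, g in enumerate(gammas):
        l = k % L
        if g < 0:
            sign[l] = -sign[l]
        mag = abs(g)
        if mag < min1:
            min2, ind2 = min1, ind1
            min1, ind1 = mag, l
        elif mag < min2:
            min2, ind2 = mag, l
    return sign, (min1, ind1), (min2, ind2)


def expand(sign, first, second, L):
    """Aggregated LLRs per residue class; untracked classes are saturated."""
    mags = [math.inf] * L
    mags[second[1]] = second[0]
    mags[first[1]] = first[0]  # written last: wins if both share a class
    return [s * m for s, m in zip(sign, mags)]


def frozen_cycles(R, L):
    """Clock cycles of step one, during which the forward unit is frozen."""
    return R - (R % L) - L


GAMMA = [-14, 31, 24, 12, 31, 20, 6, -31, 19, 15, -19, -8, 15, 12, 5, -11]
sign, first, second = aggregate_2min_g(GAMMA, 7)
print(first, second)
print(expand(sign, first, second, 7))
print(frozen_cycles(100, 7))
```

Only two comparators and L sign bits are updated per received LLR, which is why this pass can run while the forward unit itself stays idle.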
V. CONCLUSION
In this paper, we showed that the compression of redundancy free trellis stages using the L-min algorithm can be further simplified by considering only 3, or even 2, minima among the L values. Simulations of the 3-min algorithm show no significant performance loss and simulations of the 2-min algorithm show a minor performance loss (around 0.02 dB). The 3-min (or 2-min) algorithm opens new architectural optimizations, either to save power during the acquisition or to increase the overall decoding throughput.
REFERENCES
[1] F. Kienle, N. Wehn, H. Meyr, “On Complexity, Energy- and Implementation-Efficiency of Channel Decoders”, IEEE Trans. Commun., vol. 59, no. 12, pp. 3301-3310, Dec. 2011.
[2] http://www.3gpp.org/ftp/Specs/html-info/36212.htm, version 10.0.0.
[3] E. Boutillon, J.-L. Sanchez-Rojas, C. Marchand, “Compression of redundancy free trellis stages in Turbo-Decoder”, Electronics Letters, vol. 49, no. 7, pp. 460-462, Feb. 2013.
[4] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear codes for minimizing symbol error rate”, IEEE Trans. Inf. Theory, vol. IT-20(2), pp. 284-287, March 1974.
[5] E. Boutillon, C. Douillard, G. Montorsi, “Iterative decoding of concatenated convolutional codes: Implementation Issues”, Proceedings of the IEEE, vol. 95, no. 6, June 2007.
[6] A.J. Viterbi, “An intuitive justification and a simplified implementation of the MAP decoder for convolutional codes”, IEEE J. Sel. Areas Commun., vol. 16, pp. 260-264, Feb. 1998.
[7] J. Dielissen and J. Huisken, “State vector reduction for initialization of sliding window MAP”, in Proc. 2nd Int. Symp. Turbo Codes, pp. 387-390, Sept. 2000.
[8] C. Weiss, C. Bettstetter, S. Riedel, D.J. Costello, “Turbo decoding with tail-biting trellises”, International Symposium on Signal, System and Electronics, pp. 343-348, Pisa, Italy, Sept. 1998.
[9] E. Boutillon, April 2014, http://www-labsticc.univ-ubs.fr/∼boutillon/tc
agglo/tc agglo.html.
