Compression of redundancy free trellis stages in Turbo-Decoder by Boutillon, Emmanuel et al.
To cite this version:
Emmanuel Boutillon, José-Luis Sanchez-Rojas, Cédric Marchand. Compression of redundancy
free trellis stages in Turbo-Decoder. Electronics Letters, IET, 2013, 49 (7), pp.460 - 462.
<hal-00825275>
HAL Id: hal-00825275
https://hal.archives-ouvertes.fr/hal-00825275
Submitted on 23 May 2013
Compression of redundancy free trellis
stages in Turbo-Decoder
E. Boutillon, J. Sánchez-Rojas and C. Marchand
For turbo codes with a coding rate close to one, the high puncturing rate
induces long trellis sequences without redundancy bits. A simplification
technique for computing the final state of a sequence of Redundancy Free
Trellis Stages (RFTS) is presented. It compresses a sequence of RFTS of
length $N$ into a sequence of RFTS of length $m - 1 + (N \bmod (m - 1))$,
where $m$ is the number of states of the trellis. The computation is
reduced accordingly.
Introduction: Turbo codes with coding rates close to one are specified in the
LTE (up to 0.95) and HSPA [1] (up to 0.98) standards, which leads to highly
punctured turbo codes. At such high rates, the trellis of each convolutional
code can be viewed as long sequences of Redundancy Free Trellis Stages
(RFTS), separated by single trellis stages with a redundancy bit, as shown
in Fig. 1. For example, the RFTS sequences have length N = 101 bits or
N = 102 bits for the code rate 0.98 in HSPA. With such puncturing, the
conventional sliding window implementation of the Forward-Backward
algorithm [2] becomes inefficient: the convergence length L required to
accurately estimate the initial state metric values at the window borders
becomes large (L should be large enough to contain a few redundancy bits).
A large value of L degrades the hardware efficiency of the decoder. In [3]
and related references, architectures with values of L up to
128 are reported for the LTE standard. In this letter, we present a method
that shortens the convergence process of an $m$-state trellis code by
reducing every sequence of RFTS of length $N$ to a sequence of RFTS of
length $m' + \varphi_{m'}(N)$, where $m' = m - 1$ and
$\varphi_{m'}(N) = N \bmod m'$. The proposed method can be viewed as a
generalization of the method proposed in [4], replacing hard information
bits by soft information bits.
Fig. 1. Model of trellis with and without redundancy sections
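As a quick numerical check of this formula, take the 8-state encoder of the LTE and HSPA standards ($m = 8$, hence $m' = 7$) and the two HSPA run lengths quoted above. The values below are plain arithmetic on the stated formula, not additional results from the letter:

$$\begin{aligned}
N = 101:&\quad m' + \varphi_{m'}(N) = 7 + (101 \bmod 7) = 7 + 3 = 10 \\
N = 102:&\quad m' + \varphi_{m'}(N) = 7 + (102 \bmod 7) = 7 + 4 = 11
\end{aligned}$$

Each run of about a hundred redundancy-free stages is therefore replaced by roughly ten stages.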
Compression with hard information bits: Let us first consider trellis
compression in the case of hard information bits. For an $m$-state
convolutional encoder, the state-space representation of [5] gives:
$$X_{k+1} = [A] \cdot X_k + [B] \cdot D_k \tag{1a}$$
$$V_k = [C] \cdot X_k + [D] \cdot D_k \tag{1b}$$
where $X_k$, $V_k$ and $D_k$ are respectively the state of the encoder, the
coded vector and the input information bit at time $k$, and all summations
are over GF(2). Moreover, for a recursive code, the matrix $[A]$ satisfies
$[A]^{m'} = Id$, where $Id$ is the identity matrix. Starting from a state
$X_0$ and the bit sequence $D_k$, $k = 0..N-1$, the final state of the
encoder is given by:
$$X_N = [A]^N \cdot X_0 + \sum_{k=0}^{N-1} [A]^k \cdot [B] \cdot D_{(N-1)-k} \tag{2}$$
Since $[A]^{m'} = Id$, we have $[A]^k = [A]^{\varphi_{m'}(k)}$ for any $k$.
Equation (2) can be simplified by regrouping (or agglomerating) the terms
whose index $k$ has the same value modulo $m'$. Let us assume first that
$\varphi_{m'}(N) = 0$. In that case, equation (2) can be rewritten as:
$$X_N = X_0 + \sum_{k=0}^{m'-1} [A]^k \cdot [B] \cdot D^a_{(m'-1)-k} \tag{3}$$
where $D^a_l$ is the $l$-th “agglomerated” bit defined as:
$$D^a_l = \sum_{j=0}^{N/m'-1} D_{m' \cdot j + l}, \quad l = 0..m'-1 \tag{4}$$
When $\varphi_{m'}(N) \neq 0$, $m' - \varphi_{m'}(N)$ dummy bits equal to 0
are appended to the $N$ bits of the original sequence to obtain an extended
sequence of length $N_e$. Since the extra dummy bits are all equal to zero,
(2) gives $X_{N_e} = [A]^{m' - \varphi_{m'}(N)} \cdot X_N$. Multiplying
$X_{N_e}$ by $[A]^{\varphi_{m'}(N)}$, we obtain:
$$[A]^{\varphi_{m'}(N)} \cdot X_{N_e} = [A]^{m'} \cdot X_N = X_N \tag{5}$$
Since $\varphi_{m'}(N_e) = 0$, the agglomeration method (3) can be applied
to compute $X_{N_e}$ in $m'$ trellis stages. Finally, $\varphi_{m'}(N)$
extra trellis stages with input bits equal to 0 are required to obtain
$X_N$ (computation of equation (5)). To summarize, every trellis section of
length $N$ can be compressed into a trellis section of length
$m' + \varphi_{m'}(N)$ thanks to the “agglomeration” modulo $m'$ of the
input bits. One can consider the state of the trellis (one value among $m$
possible) and the input bit (0 or 1) as Dirac distributions. The question
thus arises whether the same method can also be used for the non-Dirac
distributions representing state metrics and branch metrics in the
forward-backward decoding algorithm.
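The hard-bit compression described above can be checked with a minimal sketch. It assumes the 4-state encoder that the letter introduces in equation (6), for which $[A]^3 = Id$ over GF(2). The function names (`step`, `final_state_direct`, `final_state_compressed`) and the tuple representation of the state are illustrative choices, not part of the original text.

```python
# Sketch: final encoder state with and without bit agglomeration (hard bits).
A = ((1, 1), (1, 0))  # state matrix [A] of the 4-state encoder of equation (6)
B = (1, 0)            # input vector [B] of equation (6)
M_PRIME = 3           # m' = m - 1; here [A]^3 = Id over GF(2)


def step(x, d):
    """One encoder step over GF(2): X_{k+1} = [A].X_k + [B].D_k."""
    return tuple((A[i][0] & x[0]) ^ (A[i][1] & x[1]) ^ (B[i] & d) for i in range(2))


def final_state_direct(x0, bits):
    """Run one trellis stage per input bit (N stages)."""
    x = x0
    for d in bits:
        x = step(x, d)
    return x


def final_state_compressed(x0, bits):
    """Same final state in m' + phi stages, using agglomerated bits."""
    phi = len(bits) % M_PRIME
    # Equation (4): XOR together the bits whose index has the same value mod m'.
    # Dummy zero bits need no explicit handling because XOR with 0 is neutral.
    agglomerated = [0] * M_PRIME
    for k, d in enumerate(bits):
        agglomerated[k % M_PRIME] ^= d
    x = final_state_direct(x0, agglomerated)  # m' stages give X_{Ne}
    return final_state_direct(x, [0] * phi)   # phi zero-input stages, equation (5)


bits = [1, 0, 1, 1]  # N = 4, so phi = 1 and D^a_0 = D_0 + D_3
print(final_state_direct((0, 1), bits), final_state_compressed((0, 1), bits))
```

Both functions should print the same state for any starting state and bit sequence; the compressed version always runs $m' + \varphi_{m'}(N)$ stages regardless of $N$.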
Compression with soft information bits: In this section, we replace the
GF(2) summation of bits in equation (4) by the convolution of their
associated probability distributions. For the sake of clarity, let us
restrict ourselves to an $m = 4$ state recursive convolutional encoder
defined by the state equation (6), and let us consider the processing of the
forward recursion on a sequence of RFTS of length $N = 4$, starting from a
state metric $\alpha_0(i)$, $i = 0..3$, where $\alpha_k(i)$ represents the
probability that the encoder is in state $i$ at time $k$. Since there is no
redundancy, the branch metrics are simply given by the information bit
probabilities $(p_k, q_k)$, where $p_k = P(D_k = 0)$ and $q_k = P(D_k = 1)$
at time $k$, computed from both a-priori and channel values. Fig. 2 shows
the 4 sections of the trellis. The branches drawn as solid lines
(respectively dotted lines) are associated with an input bit 0
(respectively 1). The graph also shows in bold lines the 4 paths between
state 0 at time $k = 0$ and state 0 at time $k = 4$.
$$X_{k+1} = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \cdot X_k + \begin{bmatrix} 1 \\ 0 \end{bmatrix} \cdot D_k \tag{6}$$
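From this graph, it is easy to derive $\alpha_4(0)$ from the initial state
metric $\alpha_0$ and the a-priori bit probabilities $(p_k, q_k)_{k=0..3}$ as:
$$\begin{aligned}
\alpha_4(0) = {} & \alpha_0(0) \cdot (p_0 p_1 p_2 p_3 + p_0 q_1 q_2 q_3 + q_0 p_1 p_2 q_3 + q_0 q_1 q_2 p_3) \\
{} + {} & \alpha_0(1) \cdot (p_0 p_1 p_2 q_3 + p_0 q_1 q_2 p_3 + q_0 p_1 p_2 p_3 + q_0 q_1 q_2 q_3) \\
{} + {} & \alpha_0(2) \cdot (p_0 p_1 q_2 p_3 + p_0 q_1 p_2 q_3 + q_0 p_1 q_2 q_3 + q_0 q_1 p_2 p_3) \\
{} + {} & \alpha_0(3) \cdot (p_0 p_1 q_2 q_3 + p_0 q_1 p_2 p_3 + q_0 p_1 q_2 p_3 + q_0 q_1 p_2 q_3)
\end{aligned} \tag{7}$$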
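Factorizing (7), we obtain:
$$\begin{aligned}
\alpha_4(0) = {} & \alpha_0(0) \cdot (p_1 p_2 (p_0 p_3 + q_0 q_3) + q_1 q_2 (p_0 q_3 + q_0 p_3)) \\
{} + {} & \alpha_0(1) \cdot (p_1 p_2 (p_0 q_3 + q_0 p_3) + q_1 q_2 (p_0 p_3 + q_0 q_3)) \\
{} + {} & \alpha_0(2) \cdot (p_1 q_2 (p_0 p_3 + q_0 q_3) + q_1 p_2 (p_0 q_3 + q_0 p_3)) \\
{} + {} & \alpha_0(3) \cdot (p_1 q_2 (p_0 q_3 + q_0 p_3) + q_1 p_2 (p_0 p_3 + q_0 q_3))
\end{aligned} \tag{8}$$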
According to (4), $D^a_0 = D_0 + D_3$. Therefore, the probability
distribution $(p^a_0, q^a_0)$ of $D^a_0$ is equal to
$(p_0 p_3 + q_0 q_3,\ p_0 q_3 + p_3 q_0)$ (soft output of a parity
constraint). Thus, we can reduce the $N = 4$ trellis stages to $m' = 3$
trellis stages by replacing $(p_0 p_3 + q_0 q_3)$ with $p^a_0$ and
$(p_0 q_3 + p_3 q_0)$ with $q^a_0$ in the first trellis section, as shown in
Fig. 2. This factorization is also explicitly given in (8). Since
$\varphi_3(4) = 1$, an extra trellis section of length 1 is required to
re-order the $\alpha_4$ vector using only the 0-branches. This method can be
generalized to any length of RFTS sequence and any value of $m$. In
particular, it could be applied to the 8-state encoder of the LTE and HSPA
standards.
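The soft-bit case can be sketched in the same way, in the probability domain. The sketch assumes the 4-state encoder of equation (6) and labels state $X = (x_1, x_2)$ with the index $2 x_1 + x_2$; this labeling and the function names (`forward`, `parity`, `forward_compressed`) are assumptions made for illustration.

```python
# Sketch: forward recursion over redundancy-free stages, direct vs. compressed.
A = ((1, 1), (1, 0))  # state matrix [A] of equation (6)
B = (1, 0)            # input vector [B] of equation (6)
M_PRIME = 3           # m' = m - 1 for the m = 4 state trellis


def next_state(s, d):
    """Successor of state index s = 2*x1 + x2 for input bit d (assumed labeling)."""
    x = (s >> 1, s & 1)
    y = [(A[i][0] & x[0]) ^ (A[i][1] & x[1]) ^ (B[i] & d) for i in range(2)]
    return (y[0] << 1) | y[1]


def forward(alpha, probs):
    """Forward recursion; probs[k] = (p_k, q_k) = (P(D_k = 0), P(D_k = 1))."""
    for p, q in probs:
        new = [0.0] * 4
        for s in range(4):
            new[next_state(s, 0)] += alpha[s] * p
            new[next_state(s, 1)] += alpha[s] * q
        alpha = new
    return alpha


def parity(a, b):
    """Distribution of the XOR of two independent bits (soft parity output)."""
    return (a[0] * b[0] + a[1] * b[1], a[0] * b[1] + a[1] * b[0])


def forward_compressed(alpha, probs):
    """Same state metrics using m' agglomerated stages plus phi 0-branch stages."""
    phi = len(probs) % M_PRIME
    # A bit that is surely 0, i.e. (1, 0), is neutral for the parity operation,
    # so the dummy padding bits do not need to be added explicitly.
    agglomerated = [(1.0, 0.0)] * M_PRIME
    for k, pq in enumerate(probs):
        agglomerated[k % M_PRIME] = parity(agglomerated[k % M_PRIME], pq)
    alpha = forward(alpha, agglomerated)
    return forward(alpha, [(1.0, 0.0)] * phi)


alpha0 = [0.1, 0.2, 0.3, 0.4]
probs = [(0.9, 0.1), (0.8, 0.2), (0.7, 0.3), (0.6, 0.4)]  # N = 4
print(forward(alpha0, probs))
print(forward_compressed(alpha0, probs))
```

The two printed vectors should agree up to floating-point rounding. The equality holds because the final state depends on the input bits only through the agglomerated bits, which are XORs over disjoint sets of independent bits.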
ELECTRONICS LETTERS 20th December 2012 Vol. 00 No. 00
Fig. 2. Example of the agglomeration method for a 4-state trellis
Fig. 3. Simulation results for HSDPA, K = 5114, R = 0.98 with 4 decoding
iterations (FER versus Eb/No in dB; curves: Reference; L = 128, W = 128;
L = 800, W = 128)
Implementation issues: In the above section, we have derived the
"agglomeration" method in the probability domain. It can also be
implemented in the logarithm domain using either the log-map or the
max-log-map algorithm [2]. In the latter case, the amplitude and sign of
the agglomerated bit are given by:
$$|\mathrm{LLR}(D^a_l)| = \min\{\, |\mathrm{LLR}(D_{m' \cdot j + l})|,\ j = 0..N_e/m' - 1 \,\} \tag{9}$$
$$\mathrm{sign}(\mathrm{LLR}(D^a_l)) = \prod_{j=0}^{N_e/m'-1} \mathrm{sign}(\mathrm{LLR}(D_{m' \cdot j + l})) \tag{10}$$
where $\mathrm{LLR}(D_k)$ is the Log-Likelihood Ratio of bit $D_k$, defined
as $\mathrm{LLR}(D_k) = \ln(p_k/q_k)$. In the context of a turbo code, the
computation
of the agglomerated bits can be done on the fly during the generation of
the extrinsic information of the previous iteration. Using these
properties, and without any change in the trellis structure, the
convergence of the forward (or backward) recursion can be significantly
accelerated. For example, for an ($m = 8$)-state turbo code, a window of
length $L = 64$ is processed in 64 cycles using conventional methods (one
trellis stage per clock cycle). If the window does not contain any
redundancy bit, trellis agglomeration allows it to be processed in
$7 + \varphi_7(64) = 8$ clock cycles. If the window contains a single
section with a redundancy bit in position $u$ (with $7 < u < 56$), then
processing the window takes $7 + \varphi_7(u-1)$ clock cycles to obtain the
state vector $\alpha_{u-1}$, plus one trellis section to obtain $\alpha_u$
(trellis section with redundancy), and finally $7 + \varphi_7(64-u)$ clock
cycles to obtain $\alpha_{64}$. The total number of clock cycles is thus
$15 + \varphi_7(u-1) + \varphi_7(64-u)$, which is at most 22: since
$(u-1) + (64-u) = 63$ is a multiple of 7, the two remainders sum to either
0 or 7. One should note that the last $\varphi_{m'}(N)$ clock cycles are
used only for shuffling the state metrics: with extra hardware (muxes),
this operation can be done in one clock cycle. In that case, the 22 clock
cycles reduce to $15 + 2 = 17$ clock cycles. Fig. 3 shows the performance
of a rate 0.98 HSPA turbo code with payload $K = 5114$ and 4 decoding
iterations (for high coding rates, 4 decoding iterations are almost
optimal). The reference curve is obtained by performing the
forward-backward algorithm exactly. The curves L = 128, W = 128 and
L = 800, W = 128 correspond to classical sliding window implementations
with convergence length $L$ and window size $W$. With trellis compression,
a convergence length of $L = 800$ can be processed in 128 clock cycles (the
RFTS sequences have length $N = 101$ or 102 for this code rate). As shown
in Fig. 3, trellis compression gives optimal performance, while the
classical sliding window implementation significantly degrades performance
(by 2 dB at a Frame Error Rate (FER) of $10^{-2}$).
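The two implementation remarks above, max-log-map agglomeration per equations (9) and (10) and the clock-cycle count, can be illustrated with a short sketch. The function names are hypothetical, and two points are assumptions rather than statements from the letter: dummy bits are given an LLR of $+\infty$ (a bit known to be 0, with LLR defined as $\ln(p/q)$), and no shuffle cycle is counted when the remainder is 0.

```python
import math


def agglomerate_llr_maxlog(llrs, m_prime):
    """Equations (9)-(10): min of magnitudes and product of signs per class l.

    Assumes at least one LLR is supplied.
    """
    pad = (-len(llrs)) % m_prime
    # Assumption: dummy bits are known zeros, i.e. LLR = +infinity, which
    # changes neither the minimum magnitude nor the sign product.
    extended = list(llrs) + [math.inf] * pad
    agglomerated = []
    for l in range(m_prime):
        residue_class = extended[l::m_prime]
        magnitude = min(abs(x) for x in residue_class)
        negatives = sum(1 for x in residue_class if x < 0)
        agglomerated.append(-magnitude if negatives % 2 else magnitude)
    return agglomerated


def rfts_cycles(n, m_prime, one_cycle_shuffle=False):
    """Clock cycles for one RFTS run of length n: m' + phi, or m' + 1 with muxes."""
    phi = n % m_prime
    if one_cycle_shuffle:
        phi = min(phi, 1)  # assumption: no shuffle cycle is needed when phi = 0
    return m_prime + phi


def window_cycles(length, u, m_prime, one_cycle_shuffle=False):
    """Cycles for a window with a single redundancy stage at position u."""
    before = rfts_cycles(u - 1, m_prime, one_cycle_shuffle)
    after = rfts_cycles(length - u, m_prime, one_cycle_shuffle)
    return before + 1 + after


print(agglomerate_llr_maxlog([2.0, -1.0, 3.0, -0.5], 3))
print(rfts_cycles(64, 7), window_cycles(64, 10, 7), window_cycles(64, 10, 7, True))
```

For the window example of the text ($L = 64$, $m' = 7$), a redundancy position such as $u = 10$ gives the 22-cycle and 17-cycle figures, while $u = 8$ gives 15 cycles because both remainders are zero.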
Conclusion: In this letter, we have shown that, for an $m$-state
convolutional decoder, a sequence of RFTS of length $N$ can be reduced to a
sequence of RFTS of length $m' + \varphi_{m'}(N)$ thanks to the bit
agglomeration method ($m' + 1$ if extra muxes are used to perform the final
shuffle). This method opens a new, efficient way to implement sliding
window based algorithms for high rate turbo codes, and it should have an
impact on future architecture developments.
Acknowledgment: This work has been supported by the GIGADEC
project from Région Bretagne, as well as the CPER project PALMYRE
II (Région Bretagne and FEDER funding).
E. Boutillon and C. Marchand (Lab-STICC, UMR 6285, Université de
Bretagne Sud, FRANCE),
J. Sánchez-Rojas (INICTEL-UNI, PERU).
E-mail: emmanuel.boutillon@univ-ubs.fr
References
1 http://www.3gpp.org/ftp/Specs/html-info/36212.htm, version 10.0.0.
2 E. Boutillon, C. Douillard, G. Montorsi, ‘Iterative Decoding of
Concatenated Convolutional Codes: Implementation Issues’, Proceedings
of the IEEE, vol. 95, no. 6, June 2007.
3 M. May, T. Ilnseher, N. Wehn, W. Raab, ‘A 150 Mbit/s 3GPP LTE Turbo
Code Decoder’, Design, Automation & Test in Europe Conference &
Exhibition (DATE), pp. 1420-1425, Dresden, Germany, March 2010.
4 A.S. Barbulescu, S.S. Pietrobon, ‘Terminating the trellis of turbo codes in
the same state’, Electronics Letters, vol. 31, no. 1, pp. 22-23, Jan. 1995.
5 C. Weiss, C. Bettstetter, S. Riedel, D.J. Costello, ‘Turbo decoding with
tail-biting trellises’, International Symposium on Signal, System and
Electronics, pp. 343-348, Pisa, Italy, 1998.
