Trellises and Trellis-Based Decoding Algorithms for Linear Block Codes by Lin, Shu
TRELLISES AND TRELLIS-BASED
DECODING ALGORITHMS FOR
LINEAR BLOCK CODES
Technical Report
to
NASA
Goddard Space Flight Center
Greenbelt, Maryland 20771
Grant Numbers: NAG 5-931
NAG 5-2938
Report Number: 98-003
Principal Investigator: Shu Lin
Department of Electrical Engineering
University of Hawaii at Manoa
2540 Dole Street, Holmes Hall 483
Honolulu, Hawaii 96822
April 20, 1998
https://ntrs.nasa.gov/search.jsp?R=19990009597 2020-06-15T22:04:42+00:00Z

TRELLISES AND TRELLIS-BASED
DECODING ALGORITHMS FOR
LINEAR BLOCK CODES
Part 3
Shu Lin and Marc Fossorier
April 20, 1998

10 THE VITERBI AND DIFFERENTIAL
TRELLIS DECODING ALGORITHMS
Decoding algorithms based on the trellis representation of a code (block or con-
volutional) drastically reduce decoding complexity. The best known and most
commonly used trellis-based decoding algorithm is the Viterbi algorithm [23,
79, 105]. It is a maximum likelihood decoding algorithm. Convolutional codes
with Viterbi decoding have been widely used for error control in digital
communications over the last two decades. This chapter is concerned with
the application of the Viterbi decoding algorithm to linear block codes. First,
the Viterbi algorithm is presented. Then, optimum sectionalization of a trellis
to minimize the computational complexity of a Viterbi decoder is discussed
and an algorithm is presented. Some design issues for IC (integrated circuit)
implementation of a Viterbi decoder are considered and discussed. Finally, a
new decoding algorithm based on the principle of compare-select-add is pre-
sented. This new algorithm can be applied to both block and convolutional
codes and is more efficient than the conventional Viterbi algorithm based on
the add-compare-select principle. This algorithm is particularly efficient for
rate-1/n antipodal convolutional codes and their high-rate punctured codes.
It reduces computational complexity by one-third compared with the Viterbi
algorithm.
10.1 THE VITERBI DECODING ALGORITHM
The Viterbi algorithm is based on the simple idea that among the paths merging
into a state in the code trellis, only the most probable path needs to be saved
for future processing and all the other paths can be eliminated without affecting
decoding optimality. This elimination of the less probable paths from further
consideration drastically reduces decoding complexity. The path being saved is
called the survivor. Therefore, there is a survivor at each state in the trellis
at every level. The survivors at each level of the code trellis are extended to
the next level through the composite branches between the two levels. The
paths that merge into a state at the next level are then compared and the most
probable path is selected as the survivor. This process continues until the end
of the trellis is reached. At the end of the trellis, there is only one state, the
final state $\sigma_f$, and there is only one survivor, which is the most likely codeword.
Decoding is then completed.
Viterbi decoding of a linear block code based on a sectionalized trellis
diagram $T(\{h_0, h_1, \ldots, h_L\})$, with section boundary locations
$0 = h_0 < h_1 < \cdots < h_L = N$, is carried out serially, section by
section, from the initial state $\sigma_0$ to the final state $\sigma_f$.
Suppose the decoder has processed $j$ trellis sections up to time-$h_j$.
There are $|\Sigma_{h_j}(C)|$ survivors, one for each state in
$\Sigma_{h_j}(C)$. These survivors together with their path (or state)
metrics are stored in memory. To process the $(j+1)$-th section, the decoder
executes the following steps:
(1) Each survivor is extended through the composite branches diverging
from it to the next state level at time-$h_{j+1}$.
(2) For each composite branch into a state in $\Sigma_{h_{j+1}}(C)$, find
the single branch with the largest (correlation) metric. The metric
computed is the branch metric.
(3) Replace each composite branch by the branch with the largest metric.
(4) Add the metric of a branch to the metric of the survivor from which the
branch diverges. For each state $\sigma$ in $\Sigma_{h_{j+1}}(C)$, compare
the metrics of the paths converging into it and select the path with the
largest path metric as the survivor terminating at state $\sigma$. This step
is called the add-compare-select (ACS) procedure in the Viterbi algorithm.
The decoder executes the above steps repeatedly, section by section, until it
reaches the final state $\sigma_f$. At this point, there is one and only one
survivor, which is the decoded codeword and the most likely codeword. The
information bits corresponding to this decoded codeword are then delivered to
the user. The decoding window is simply the code length.
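Steps (1) to (4) can be illustrated with a minimal Python sketch. It is not
taken from the report: the data layout (a list of sections, each a list of
composite branches given as source state, destination state, and parallel
labels), the function names, and the toy 2-section trellis for the (4, 3)
even-parity code are all assumptions made for illustration. Correlation with
the bipolar mapping 0 -> -1, 1 -> +1 is used as the metric.

```python
from typing import Dict, List, Sequence, Tuple

Label = Tuple[int, ...]
# Composite branch: (source state, destination state, parallel branch labels).
CompositeBranch = Tuple[int, int, List[Label]]


def correlation(label: Label, r: Sequence[float]) -> float:
    """Correlation of a bit label with a received block (bit u maps to 2u - 1)."""
    return sum((2 * u - 1) * x for u, x in zip(label, r))


def viterbi_decode(sections: List[List[CompositeBranch]],
                   boundaries: List[int],
                   r: Sequence[float]) -> Tuple[Label, float]:
    """Return (decoded codeword, path metric) for a sectionalized trellis."""
    # One survivor per state: state -> (path metric, path label so far).
    survivors: Dict[int, Tuple[float, Label]] = {0: (0.0, ())}
    for j, section in enumerate(sections):
        r_sec = r[boundaries[j]:boundaries[j + 1]]
        candidates: Dict[int, Tuple[float, Label]] = {}
        for src, dst, labels in section:          # step (1): extend survivors
            if src not in survivors:
                continue
            # Steps (2)-(3): keep the best branch of each composite branch.
            best = max(labels, key=lambda lab: correlation(lab, r_sec))
            # Step (4): add, then compare and select at the destination.
            metric = survivors[src][0] + correlation(best, r_sec)
            if dst not in candidates or metric > candidates[dst][0]:
                candidates[dst] = (metric, survivors[src][1] + best)
        survivors = candidates
    metric, codeword = next(iter(survivors.values()))  # single final state
    return codeword, metric


# Toy example (assumed): (4, 3) even-parity code, boundaries {0, 2, 4}.
# The state at time 2 is the parity of the first two bits.
boundaries = [0, 2, 4]
sections = [
    [(0, 0, [(0, 0), (1, 1)]), (0, 1, [(0, 1), (1, 0)])],
    [(0, 0, [(0, 0), (1, 1)]), (1, 0, [(0, 1), (1, 0)])],
]
codeword, metric = viterbi_decode(sections, boundaries, [0.9, -0.2, -1.1, 0.8])
print(codeword, metric)
```

For brevity the sketch stores whole path labels with each survivor; a
hardware decoder would store branch indices and recover the path by
traceback, as discussed in Section 10.3.3.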
Using the above decoding algorithm, the total number of operations (addi-
tions and comparisons) can be computed easily. This number can be reduced
significantly if sectionalization of a trellis is done properly [60]. This will be
discussed in the next section.
10.2 OPTIMUM SECTIONALIZATION OF A CODE TRELLIS:
LAFOURCADE-VARDY ALGORITHM
In decoding a block code with the Viterbi algorithm, the total number of com-
putations depends on the sectionalization of the trellis diagram for the code. A
sectionalization of a code trellis for a code C that gives the smallest total num-
ber of computations is called an optimum sectionalization for C. An optimum
sectionalization is not necessarily unique. In the following, an algorithm for
finding an optimum sectionalization is presented. This algorithm was devised
by Lafourcade and Vardy [60].
The Lafourcade-Vardy (LV) algorithm is based on the following simple fact:
(F) For any integers $x$ and $y$ with $0 \le x < y \le N$, a section from
time-$x$ to time-$y$ in any sectionalized trellis $T(U)$ with $x, y \in U$
and $x+1, x+2, \ldots, y-1 \notin U$ is identical.
Let $\varphi(x,y)$ denote the number of computations required in steps (1) to
(4) of the Viterbi algorithm to process the trellis section from time-$x$ to
time-$y$ in any sectionalized code trellis $T(U)$ with $x, y \in U$ and
$x+1, x+2, \ldots, y-1 \notin U$. It follows from the above simple fact (F)
that $\varphi(x,y)$ is determined only by $x$ and $y$. Let
$\varphi_{\min}(x,y)$ denote the smallest number of computations of steps (1)
to (4) to process the trellis section(s) from time-$x$ to time-$y$ in any
sectionalized code trellis $T(U)$ with $x, y \in U$. The value
$\varphi_{\min}(0,N)$ gives the total number of computations of the Viterbi
algorithm for the code trellis with an optimum sectionalization. Then, it
follows from (F) and the definitions of $\varphi(x,y)$ and
$\varphi_{\min}(x,y)$ that

$$\varphi_{\min}(0,y) =
\begin{cases}
\varphi(0,1), & \text{for } y = 1, \\
\min\bigl\{\varphi(0,y),\ \min_{0<x<y}\{\varphi_{\min}(0,x) + \varphi(x,y)\}\bigr\}, & \text{for } 1 < y \le N.
\end{cases}
\tag{10.1}$$

Table 10.1. Optimum sectionalizations.

Code       Optimum sectionalization U                              Complexity            Complexity
                                                                   $\varphi_{\min}(0,N)$   N-section
RM_{1,6}   {0, 4, 8, 12, 16, 24, 32, 40, 48, 52, 56, 60, 63, 64}   806                   2,825
RM_{2,6}   {0, 8, 16, 32, 48, 56, 61, 63, 64}                      101,786               425,209
RM_{3,6}   {0, 8, 16, 24, 32, 40, 48, 56, 64}                      538,799               773,881
We can compute $\varphi_{\min}(0,y)$ for every $y$ with $0 < y \le N$
efficiently in the following way: The values of $\varphi(x,y)$ for
$0 \le x < y \le N$ are computed using the structure of the trellis section
from time-$x$ to time-$y$. First, the value of $\varphi_{\min}(0,1)$ is
computed. For an integer $y$ with $1 < y \le N$, $\varphi_{\min}(0,y)$ can
be computed from $\varphi_{\min}(0,x)$ and $\varphi(x,y)$ with $0 < x < y$.
By storing the information when the minimum value occurs in the right-hand
side of (10.1), an optimum sectionalization is found from the computation of
$\varphi_{\min}(0,N)$.
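The recursion (10.1) is a standard dynamic program, and a short Python sketch
may make the bookkeeping concrete. The function name `lv_sectionalization`
and the callback `phi` are hypothetical. In practice `phi(x, y)` would be
derived from the structure of the trellis section from time-x to time-y; the
toy cost used below is purely illustrative and is not the cost of any real
code trellis.

```python
from typing import Callable, List, Tuple


def lv_sectionalization(n: int,
                        phi: Callable[[int, int], float]
                        ) -> Tuple[float, List[int]]:
    """Evaluate (10.1); return (phi_min(0, n), one optimum boundary set U)."""
    phi_min = [0.0] * (n + 1)   # phi_min[y] holds phi_min(0, y)
    prev = [0] * (n + 1)        # prev[y]: boundary preceding y in an optimum
    for y in range(1, n + 1):
        best, arg = phi(0, y), 0          # a single section from 0 to y
        for x in range(1, y):             # or an optimum up to x, then (x, y)
            cost = phi_min[x] + phi(x, y)
            if cost < best:
                best, arg = cost, x
        phi_min[y], prev[y] = best, arg
    # Recover the boundaries from the stored minimizing arguments.
    bounds = [n]
    while bounds[-1] != 0:
        bounds.append(prev[bounds[-1]])
    return phi_min[n], bounds[::-1]


def toy_phi(x: int, y: int) -> int:
    """Illustrative cost only: fixed overhead plus growth with section length."""
    return 4 + (y - x) ** 2


total, U = lv_sectionalization(6, toy_phi)
print(total, U)  # 24 [0, 2, 4, 6]
```

The double loop evaluates `phi` for O(N^2) pairs (x, y); ties are broken in
favor of the first minimizer found, which reflects the fact that an optimum
sectionalization need not be unique.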
Example 10.1 Table 10.1 gives the optimum sectionalizations for three RM
codes of length 64 using the LV algorithm.
10.3 SOME DESIGN ISSUES FOR IC IMPLEMENTATION OF
VITERBI DECODERS FOR LINEAR BLOCK CODES
Theoretically, any linear block code can be decoded by applying the Viterbi
algorithm to a trellis for the code. However, practical limitations preclude
the application of this algorithm to many good codes with long block lengths.
The main reasons are the increases in state complexity, state connectivity, and
branch complexity of the trellises for good block codes as the length of the
codes increases. Much of the research on maximum likelihood decoding of lin-
ear block codes with the Viterbi algorithm over a code trellis has focussed on
the minimization of the number of computations required for decoding a re-
ceived sequence. If the actual decoding is intended to be performed using a
stored program approach (a software implementation) that executes the oper-
ations needed to decode a received sequence sequentially, then this approach
will lead to the fastest decoding speed. However, if an IC (hardware) imple-
mentation is intended, then many other factors besides the number of decoding
computations must be considered. We must consider the factors that affect the
circuit requirements, wire-routing within an IC chip, chip size, circuit utiliza-
tion, power consumption, ACS computation speed, and other implementation
issues. As a result, an alternate approach that is more suitable for IC imple-
mentation is desired.
For IC implementation of a Viterbi decoder for a linear block code, besides
the state and branch complexities, other important trellis structural properties
that should be included in the design considerations are state connectivity,
the parallel structure, regularity, and symmetry. Proper use of these structural
properties may result in a simpler decoding circuit and a higher decoding speed.
Optimum sectionalization in terms of minimizing the computational com-
plexity, in general, results in a non-uniformly sectionalized trellis diagram. In
a Viterbi decoder, quantities such as the branch labels, survivor path metrics,
and survivor path labels generally reside in word registers, which are basically
an ordered sequence of bit registers. The same hardware is used to process all
trellis sections. If a register must store a particular variable, such as a branch
metric or a state metric, it must be designed to accommodate the largest value
of the variable over all trellis sections. Since the section lengths for a non-
uniformly sectionalized trellis vary from one section to another, the registers
involved must be designed based on the longest section. This may increase the
relative complexity of an IC Viterbi decoder. Therefore, for IC implementation
of a Viterbi decoder, a uniformly sectionalized trellis is more desirable.
Although a minimal trellis reduces the state and branch complexities, the
states are densely connected. For long codes, this dense connection between
the states causes serious wire-routing (interconnection) problems within an IC
chip for hardware implementation of a Viterbi decoder and requires a large area
of the chip (or a multilayer chip) to accommodate the decoding circuit. Fur-
thermore, interconnections increase internal communications between various
parts of the decoding circuit, which slow down the decoding speed and increase
power consumption. Let $\rho_{\max}(C)$ be the maximum state space dimension
of a minimal trellis for a code $C$. Then the number of registers required to
store the survivor paths and their metrics must be $2^{\rho_{\max}(C)}$. If a
separate ACS circuit is required for processing each state at each trellis
level, then $2^{\rho_{\max}(C)}$ ACS circuits are needed. If the differences
between $\rho_{\max}(C)$ and the state space
dimensions at many section boundary locations are large, then many of the
registers and ACS circuits are not used during the decoding process. This re-
sults in poor hardware utilization efficiency. All the above problems may be
solved or partially solved by using a non-minimal trellis with a proper parallel
decomposition, as discussed in Chapter 7. Regularity among the trellis sections
also helps to overcome the above problems and reduces decoding complexity.
Symmetry structure, such as mirror symmetry, allows bidirectional decoding,
which speeds up the decoding process. Therefore, for hardware implementation
of a Viterbi decoder for a linear block code, a non-minimal trellis may result
in a simpler and faster decoding circuit with a higher hardware utilization effi-
ciency. In design, both minimal and non-minimal trellises should be considered
and the one that results in a simpler circuit and a higher decoding speed should
be used.
In the following, we examine some key factors that affect the decoding com-
plexity and speed of a Viterbi decoder based on a minimal or non-minimal trel-
lis. The non-minimal trellis structure presented in Section 7.1 reduces internal
communications and allows independent parallel processing of the subtrellises,
while decreasing the complexity of an IC Viterbi decoder. It has significant
advantages over the minimal trellis for IC implementation of a Viterbi decoder.
10.3.1 Hardware Utilization Efficiency and Effective Computational
Complexity
Consider an IC Viterbi decoder based on an $L$-section trellis for a linear
$(N, K)$ block code $C$ with section boundary locations at
$h_0 = 0, h_1, h_2, \ldots, h_L = N$. While many VLSI structures have been
described for a Viterbi decoder [10, 38, 100], the most widely implemented
structure is based on the ACS-array architecture, wherein each abstract state
in the trellis manifests itself as a physical ACS circuit on the IC, and the
same ACS circuits are repeatedly used for all levels in the trellis. The ACS
circuits can be labeled ACS-$l$ for $1 \le l \le 2^{\rho_{L,\max}(C)}$, where
$\rho_{L,\max}(C)$ is the maximum state space dimension of the $L$-section
minimal trellis for $C$. We assume that $\rho_{L,\max}(C)$ is fixed no matter
whether a minimal trellis or a non-minimal trellis is used in the decoder
design.
The ACS circuits work as follows. At time-0, the metrics of the ACS circuits
corresponding to the originating states of each parallel subtrellis are
initialized to 0. At time-$h_1$, the ACS-$l$ corresponding to state
$\sigma^{(l)} \in \Sigma_{h_1}(C)$ at the end of section-1 of the trellis,
for $1 \le l \le |\Sigma_{h_1}(C)|$, has the metric of state $\sigma^{(l)}$.
The index of the surviving branch into state $\sigma^{(l)}$ is stored in
ACS-$l$. Continuing in this way, at time-$h_i$, for $1 \le i \le L$, ACS-$l$
corresponding to state $\sigma^{(l)} \in \Sigma_{h_i}(C)$ will have the
metric for $\sigma^{(l)}$ and a sequence of $i$ survivor branch indices
corresponding to the most likely path from the initial state to
$\sigma^{(l)}$.
Whenever the decoder is processing the trellis at a level at which the size
of the state space is smaller than $2^{\rho_{L,\max}(C)}$, a number of ACS
circuits will be idle. If the number of inactive ACS circuits is large and
occurs often during the decoding process, the hardware utilization efficiency
becomes poor. For example, consider the minimal 8-section trellis for the
(64, 42) RM code, RM_{3,6}. This trellis has a state space dimension profile
(0, 7, 10, 13, 10, 13, 10, 7, 0) with $\rho_{8,\max}(C) = 13$. For a Viterbi
decoder designed based on this trellis, at time-$h_1$ and -$h_7$, there are
$2^{13} - 2^{7} = 8{,}064$ inactive ACS circuits. At time-$h_2$, -$h_4$ and
-$h_6$, there are $2^{13} - 2^{10} = 7{,}168$ inactive ACS circuits. Only at
time-$h_3$ and -$h_5$ are all the ACS circuits active. We see that the
hardware utilization efficiency is very poor for a Viterbi decoder for the
RM_{3,6} code based on the minimal 8-section trellis using the ACS-array
architecture.
Hardware utilization efficiency can be improved by a proper parallel decom-
position of a minimal trellis into parallel isomorphic subtrellises. The decom-
position results in a non-minimal trellis with the same maximum state space
dimension $\rho_{L,\max}(C)$. Therefore, the number of ACS circuits in the ACS-array
is still the same, but the number of active ACS circuits is increased at many, if
not all, section boundary locations. We illustrate this with an example. Using
the method presented in Section 7.1, the minimal 8-section trellis for the (64, 42)
RM code, RM3,6, can be decomposed into a non-minimal 8-section trellis with
128 parallel isomorphic subtrellises, each having a state space dimension pro-
file (0, 6, 6, 6, 3, 6, 6, 6, 0). Therefore, the state space dimension profile for the
overall trellis is (0, 13, 13, 13, 10, 13, 13, 13, 0). We see that the maximum state
space dimension is still $\rho_{8,\max}(C) = 13$. However, for a decoder based on this
non-minimal trellis, all 8,192 ACS circuits are active all the time, except at
time-h4. This greatly improves the hardware utilization efficiency.
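The idle-circuit counts quoted for the two trellises follow directly from
their state space dimension profiles. The short Python check below reproduces
them; the function name `inactive_acs` is an illustrative choice, and the
profiles are the ones given in the text for the (64, 42) RM code.

```python
def inactive_acs(profile):
    """Idle ACS circuits at the interior section boundaries h_1, ..., h_{L-1}."""
    rho_max = max(profile)
    return [2 ** rho_max - 2 ** rho for rho in profile[1:-1]]


minimal = (0, 7, 10, 13, 10, 13, 10, 7, 0)
subtrellis = (0, 6, 6, 6, 3, 6, 6, 6, 0)

# 128 = 2**7 parallel subtrellises add 7 to every interior dimension.
decomposed = (0,) + tuple(7 + s for s in subtrellis[1:-1]) + (0,)

print(decomposed)                # (0, 13, 13, 13, 10, 13, 13, 13, 0)
print(inactive_acs(minimal))     # [8064, 7168, 0, 7168, 0, 7168, 8064]
print(inactive_acs(decomposed))  # [0, 0, 0, 7168, 0, 0, 0]
```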
For a trellis (minimal or non-minimal) that consists of parallel subtrellises, all
the subtrellis decoders operate independently in parallel without communica-
tion between them. From the standpoint of speed, the effective computational
complexity of decoding a received sequence is defined as the computational
complexity of a single parallel subtrellis (viz. the minimal trellis for a
subcode $C'$) plus the cost of the final comparison among the survivors presented
by each of the subtrellis decoders. The time required for final comparison is
generally small relative to the time required for processing a subtrellis and this
comparison can be pipelined. Since all the subtrellises are processed in parallel,
the speed of decoding is therefore limited only by the time required to process
one subtrellis. If a minimal trellis does not have enough parallel structure and
decoding speed is critical, parallel decomposition can be used to reduce the
effective computational complexity and thus to gain speed.
10.3.2 Complexity of the ACS Circuit
The converging branch dimension profile (CBDP),
$(\delta_1, \delta_2, \ldots, \delta_L)$, defined in Section 6.2 also affects
decoding speed and complexity. Each component $\delta_i$ is the base-2
logarithm of the number of composite branches converging into a state at a
particular level of the trellis. The number $2^{\delta_i}$ is called a radix
number in the IC literature. At level-$i$ of the trellis, each ACS circuit
has to perform $\delta_i$ stages of a tree-type two-way comparison to find
the best incoming
branch. Hence, a reduction in the values of the components in the CBDP of
a trellis will improve decoding speed and reduce the complexity of each ACS
circuit. If the radix numbers in a minimal trellis are too large, then parallel
decomposition can be used to achieve smaller radix numbers, and hence to
reduce the complexity of each ACS circuit and to increase the decoding speed.
10.3.3 Traceback Complexity
Even though the branch and state metrics are computed and updated in the
Viterbi decoder at every level of the trellis, the best (or most likely) path
through the trellis must be determined. The process of determining the best
path is called traceback in the literature.
Recall that the number of parallel branches in a composite branch in the
$i$-th section of an $L$-section trellis for $C$ with section boundary
locations in $\{h_0, h_1, \ldots, h_L\}$ is $2^{k(C_{h_{i-1},h_i})}$.
For a Viterbi decoder based on the minimal trellis, the ACS-$l$ corresponding
to state $\sigma^{(l)} \in \Sigma_{h_i}(C)$ for $1 \le i \le L$ must store
$\delta_i(C) + k(C_{h_{i-1},h_i})$ bits in order to identify which of the
$2^{\delta_i(C)}$ composite branches converging into state $\sigma^{(l)}$ is
chosen and which of the $2^{k(C_{h_{i-1},h_i})}$ parallel branches survives.
Therefore, each ACS-$l$ needs to store

$$\sum_{i=1}^{L} \bigl(\delta_i(C) + k(C_{h_{i-1},h_i})\bigr) = K \tag{10.2}$$

bits in order to identify the sequence of surviving incoming branches and to
determine the decoded path. If this number is too large, parallel
decomposition can be used to reduce it. Consider a non-minimal trellis with
$2^q$ parallel subtrellises obtained by parallel decomposition of the minimal
$L$-section trellis for $C$ based on a subcode $C'$. If a Viterbi decoder is
designed based on this non-minimal trellis, then the number of bits that must
be stored for each ACS-$l$ is

$$\sum_{i=1}^{L} \bigl(\delta_i(C') + k(C'_{h_{i-1},h_i})\bigr) = \dim(C'). \tag{10.3}$$

Since $\dim(C') = K - q < K$, the ACS circuits based on this non-minimal
trellis design require less storage than for the design based on the minimal
trellis. The total savings in storage in all the $2^{\rho_{L,\max}(C)}$ ACS
circuits are

$$2^{\rho_{L,\max}(C)} \bigl(K - \dim(C')\bigr). \tag{10.4}$$

This is a significant savings.
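As a rough numerical illustration of (10.2) to (10.4), the following Python
lines evaluate the savings for the (64, 42) RM code of Section 10.3.1. The
value q = 7 is an assumption inferred from the decomposition into
128 = 2^7 parallel subtrellises described there; the variable names are
illustrative.

```python
K = 42          # dimension of the (64, 42) RM code
q = 7           # assumed: 2**q = 128 parallel subtrellises
rho_max = 13    # maximum state space dimension of the 8-section trellis

dim_sub = K - q                               # dim(C')
bits_per_acs_minimal = K                      # right-hand side of (10.2)
bits_per_acs_decomposed = dim_sub             # right-hand side of (10.3)
total_savings = 2 ** rho_max * (K - dim_sub)  # (10.4), in bits

print(bits_per_acs_minimal, bits_per_acs_decomposed, total_savings)
# 42 35 57344
```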
10.3.4 ACS-Connectivity
The hardware implementation of a Viterbi decoder is severely affected by the
physical placement of the ACS circuits and the need to route information be-
tween them. The routing complexity should be minimized in a Viterbi decoder
IC design in order to reduce the size of the IC chip.
The basic operations performed by an ACS circuit are: addition of the branch
metrics of the incoming branches to the state metrics of the corresponding orig-
inating states, comparison of the resulting sums to find the best one, selection
of the surviving sum as the new state metric and the corresponding surviving
branch label. The ACS-array architecture is usually dominated by the area
required by the interconnections to transfer the state metrics. For a state
$\sigma^{(l)} \in \Sigma_{h_i}(C)$ with $1 \le l \le |\Sigma_{h_i}(C)|$ and
$0 \le i < L$, let $Q_i(\sigma^{(l)})$ denote the set of states in
$\Sigma_{h_{i+1}}(C)$ that are adjacent to $\sigma^{(l)}$. For
$l > |\Sigma_{h_i}(C)|$, $Q_i(\sigma^{(l)}) = \emptyset$. Then in the
ACS-array implementation of a Viterbi decoder, paths to transfer the state
metrics exist between ACS-$l$ and all the ACS circuits that correspond to the
states in

$$Q_0(\sigma^{(l)}) \cup Q_1(\sigma^{(l)}) \cup \cdots \cup Q_{L-1}(\sigma^{(l)}). \tag{10.5}$$

The above set, denoted $Q^{(l)}$, defines the connectivity of ACS-$l$ in the
ACS-array corresponding to state $\sigma^{(l)}$. We call $|Q^{(l)}|$ and
$\log_2 |Q^{(l)}|$ the connectivity and connectivity dimension of the
ACS-$l$, respectively. The connectivities of
ACS circuits determine the areas on an IC chip needed for wiring [10, 38]. This
area should be kept as small as possible.
The ACS-connectivity can be reduced by using a non-minimal trellis with a
proper number of parallel isomorphic subtrellises. With such a trellis, the ACS
circuits can be divided into blocks such that the ACS circuits corresponding to
states in a single subtrellis form a block. A particular ACS circuit only needs
to transfer its metric to a subset of ACS circuits within its own block. This
greatly reduces the ACS-connectivity and hence the hardware complexity and
the wiring area on the IC chip.
10.3.5 Branch Complexity
The decoding speed of a Viterbi decoder depends on the total number of
branches in the trellis to be processed and how fast they are being
processed. If the processing load is shared by many ACS circuits at any time
instant, then each ACS circuit will carry a small amount of processing load.
This will speed up the decoding process. Therefore, a more meaningful measure
of branch complexity is the number of branches to be processed by an ACS
circuit [73, 101].
As pointed out earlier in this section, the number of active ACS circuits can
be increased by parallel decomposition of a minimal trellis. However,
parallel decomposition, in general, results in an increase in the number of
composite branches in a trellis section. If the rate of increase of active
ACS circuits is larger than the rate of increase of composite branches, then
the number of branches to be processed by each ACS circuit will decrease. The
processing load of an ACS circuit at time-$h_i$ is determined by the number
of composite branches diverging from its corresponding state in
$\Sigma_{h_i}(C)$. Therefore the total number of branches to be processed by
an ACS circuit is determined by the diverging branch dimension profile (DBDP)
of the trellis being used in the design.
Consider the minimal 8-section trellis of the (64, 42) RM code, RM_{3,6}. The
state space dimension profile of this code is
(0, 7, 10, 13, 10, 13, 10, 7, 0) and its DBDP is (7, 6, 6, 3, 6, 3, 3, 0).
Consider section-2 of the trellis. The number of composite branches in this
section is $2^{13}$. However, the number of active ACS circuits corresponding
to the states of the trellis at the end of section-1 is $2^7 = 128$. Since
each state has 64 composite branches diverging from it, each active ACS
circuit must process 64 composite branches. Now consider the parallel
decomposition of this minimal trellis into 128 parallel isomorphic
subtrellises. The resultant non-minimal trellis has a state space dimension
profile (0, 13, 13, 13, 10, 13, 13, 13, 0) and each subtrellis has a state
space dimension profile (0, 6, 6, 6, 3, 6, 6, 6, 0). The DBDP of this
non-minimal trellis is (13, 3, 3, 3, 6, 3, 3, 0). All the components of this
DBDP, except for the first one, are smaller than (or equal to) the
corresponding components of the DBDP
for the minimal 8-section trellis of this code. Consider section-2 of this
non-minimal trellis. The total number of composite branches is now $2^{16}$,
a large increase from $2^{13}$ for the minimal trellis. However, all the
$2^{13}$ ACS circuits at time-$h_1$ are active and they share the processing
load. Each ACS circuit processes only 8 composite branches, compared to 64
for the minimal trellis. It is the same at the other time instants, except at
time-$h_4$, where each active ACS circuit needs to process 64 branches, the
same as for the minimal trellis. Therefore, the number of operations
performed per ACS circuit is smaller for a Viterbi decoder designed based on
the above non-minimal trellis. Reducing the diverging branch profile also
results in a reduction of ACS-connectivity and hence a reduction in
implementation complexity and wiring area on an IC chip.
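The per-circuit loads quoted above can be recomputed from the two profiles.
In the Python sketch below (the function name `section_load` is illustrative),
section-(i+1) starts at time-h_i with 2^rho_i active states, each with 2^d_i
diverging composite branches, where rho_i and d_i are the corresponding
entries of the state space dimension profile and the DBDP.

```python
def section_load(state_profile, dbdp):
    """Per section: (index, composite branches, active ACS, branches per ACS)."""
    rows = []
    for i, (rho, d) in enumerate(zip(state_profile[:-1], dbdp)):
        rows.append((i + 1, 2 ** (rho + d), 2 ** rho, 2 ** d))
    return rows


minimal = section_load((0, 7, 10, 13, 10, 13, 10, 7, 0),
                       (7, 6, 6, 3, 6, 3, 3, 0))
nonminimal = section_load((0, 13, 13, 13, 10, 13, 13, 13, 0),
                          (13, 3, 3, 3, 6, 3, 3, 0))

print(minimal[1])     # (2, 8192, 128, 64)   section-2, minimal trellis
print(nonminimal[1])  # (2, 65536, 8192, 8)  section-2, non-minimal trellis
print(nonminimal[4])  # (5, 65536, 1024, 64) section starting at time-h_4
```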
Based on the above analysis and discussions, we may conclude that in
designing a hardware Viterbi decoder for a specific linear block code, if the
minimal trellis for the code is not desirable, then a non-minimal trellis
with proper structural properties should be considered.
10.4 DIFFERENTIAL TRELLIS DECODING
The Viterbi algorithm was first devised for decoding convolutional codes.
This decoding algorithm is based on the simple principle of
add-compare-select (ACS) to process the code trellis and eliminate the less
probable paths at each trellis level. This simple ACS principle has been used
for implementing Viterbi decoders over the last two decades. However, a
trellis-based decoding algorithm for convolutional codes can be devised based
on a different processing principle, namely compare-select-add (CSA). This
decoding algorithm is devised based on a specific partition of a trellis
section and the CSA processing principle. It is more efficient than the
conventional Viterbi decoding algorithm. This decoding algorithm is called
the differential trellis decoding (DTD) algorithm [32].
Consider a rate-$1/n$ $(n, 1, m)$ convolutional code of memory order $m$. The
encoder of this code has one input and $n$ outputs. Let
$a = (a_0, a_1, \ldots, a_i, \ldots)$ be the input information sequence. The
$n$ corresponding output code sequences are

$$u^{(1)} = (u_0^{(1)}, u_1^{(1)}, \ldots, u_i^{(1)}, \ldots),$$
$$\vdots$$
$$u^{(n)} = (u_0^{(n)}, u_1^{(n)}, \ldots, u_i^{(n)}, \ldots).$$

At time-$i$, the input to the encoder is $a_i$ and the output of the encoder
is a block of $n$ code bits $(u_i^{(1)}, u_i^{(2)}, \ldots, u_i^{(n)})$. The
trellis for this code consists of $2^m$ states with two branches entering and
leaving each state at any time (or level) greater than $m$.
A rate-$1/n$ $(n, 1, m)$ convolutional code is said to be antipodal if, in
the generator matrix of (9.7), $G_0 = G_m = [1\,1 \cdots 1]$. Most of the
best rate-$1/n$ convolutional codes are antipodal. For an antipodal
convolutional code, the two branches entering (or leaving) a state in its
code trellis are one's complements of each other, i.e., if one branch is
labeled with $(u_i^{(1)}, u_i^{(2)}, \ldots, u_i^{(n)})$, then the other
branch is labeled with
$(1 \oplus u_i^{(1)}, 1 \oplus u_i^{(2)}, \ldots, 1 \oplus u_i^{(n)})$, where
$\oplus$ denotes modulo-2 addition.
At time-$i$, the state of the encoder is defined by and labeled with the
information bits $(a_{i-1}, a_{i-2}, \ldots, a_{i-m})$ stored in the input
shift register. Consider the trellis section from time-$i$ to time-$(i+1)$
for $i \ge m$. This section can be partitioned into $2^{m-1}$ two-state fully
connected subtrellises with the following structural properties: (1) the two
states at time-$i$ are labeled with
$(a_{i-1}, a_{i-2}, \ldots, a_{i-m+1}, 0)$ and
$(a_{i-1}, a_{i-2}, \ldots, a_{i-m+1}, 1)$, respectively; (2) the two states
at time-$(i+1)$ are labeled with $(0, a_{i-1}, a_{i-2}, \ldots, a_{i-m+1})$
and $(1, a_{i-1}, a_{i-2}, \ldots, a_{i-m+1})$, respectively; (3) the
branches connecting the state $(a_{i-1}, a_{i-2}, \ldots, a_{i-m+1}, 0)$ to
the states $(0, a_{i-1}, a_{i-2}, \ldots, a_{i-m+1})$ and
$(1, a_{i-1}, a_{i-2}, \ldots, a_{i-m+1})$ are labeled with the code blocks
$(u_i^{(1)}, u_i^{(2)}, \ldots, u_i^{(n)})$ and
$(1 \oplus u_i^{(1)}, 1 \oplus u_i^{(2)}, \ldots, 1 \oplus u_i^{(n)})$,
respectively; and (4) the branches connecting state
$(a_{i-1}, a_{i-2}, \ldots, a_{i-m+1}, 1)$ to the states
$(0, a_{i-1}, a_{i-2}, \ldots, a_{i-m+1})$ and
$(1, a_{i-1}, a_{i-2}, \ldots, a_{i-m+1})$ are labeled with the code blocks
$(1 \oplus u_i^{(1)}, 1 \oplus u_i^{(2)}, \ldots, 1 \oplus u_i^{(n)})$ and
$(u_i^{(1)}, u_i^{(2)}, \ldots, u_i^{(n)})$, respectively. The structure of
such a two-state subtrellis is depicted in Figure 10.1. These $2^{m-1}$ fully
connected subtrellises are commonly called "butterflies".

Figure 10.1. The structure of a 2-state subtrellis.
Based on the above state grouping and trellis partitioning between time-$i$
and time-$(i+1)$, each subtrellis can be labeled by an $(m-1)$-tuple
$\alpha = (a_{i-1}, a_{i-2}, \ldots, a_{i-m+1})$. In each
subtrellis-$\alpha$, the states at time-$i$ and the states at time-$(i+1)$
are represented by $(\alpha, a_{i-m})$ and $(a_i, \alpha)$, respectively,
with $a_{i-m}, a_i \in \{0, 1\}$.
The decoding algorithm to be presented in the following is based on the
above trellis partition. Assume that BPSK is used for transmission and each
BPSK signal has unit energy. A code sequence is mapped into a bipolar signal
sequence for transmission. The $i$-th code block
$(u_i^{(1)}, u_i^{(2)}, \ldots, u_i^{(n)})$ is mapped into the following
bipolar sequence:

$$(2u_i^{(1)} - 1,\ 2u_i^{(2)} - 1,\ \ldots,\ 2u_i^{(n)} - 1). \tag{10.6}$$

Suppose correlation is used as the decoding metric. Let
$r_i = (r_i^{(1)}, r_i^{(2)}, \ldots, r_i^{(n)})$ be the received block in
the interval between time-$i$ and time-$(i+1)$. It follows from properties
(3) and (4) of a butterfly subtrellis given above that the four branch
metrics between time-$i$ and time-$(i+1)$ in subtrellis-$\alpha$ take two
opposite values $\pm N^{\alpha}_{i+1}$, with

$$N^{\alpha}_{i+1} = \sum_{j=1}^{n} \bigl(2u_i^{(j)} - 1\bigr)\, r_i^{(j)}. \tag{10.7}$$

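A small numerical check of (10.6) and (10.7) is sketched below in Python.
The helper names `bpsk` and `branch_metric`, the label (1, 0), and the
received values are assumptions chosen for illustration; the point is only
that complementary branch labels give opposite correlation metrics.

```python
def bpsk(label):
    """Bipolar mapping of (10.6): bit u maps to 2u - 1."""
    return [2 * u - 1 for u in label]


def branch_metric(label, r):
    """Correlation metric of (10.7) for one code block and received block."""
    return sum(s * x for s, x in zip(bpsk(label), r))


r = [0.8, -0.3]                           # assumed received block, n = 2
label = (1, 0)                            # assumed branch label
complement = tuple(1 - u for u in label)  # label of the complementary branch

N = branch_metric(label, r)
print(N, branch_metric(complement, r))    # two opposite values, +N and -N
```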
j=l
Let $M_i(\alpha, 0)$ and $M_i(\alpha, 1)$ denote the cumulative correlation
metrics that have survived at time-$i$ for states $(\alpha, 0)$ and
$(\alpha, 1)$, respectively. Define

$$\Delta^{\alpha}_{i+1}(0,1) \triangleq M_i(\alpha, 0) - M_i(\alpha, 1) \tag{10.8}$$

as the difference between these two metrics computed at time-$(i+1)$. Then
at time-$(i+1)$, the difference between the cumulative metric candidates
corresponding to transitions from states $(\alpha, 0)$ and $(\alpha, 1)$ to
state $(a_i, \alpha)$ is given by

$$D^{\alpha}_{i+1}(a_i) = \Delta^{\alpha}_{i+1}(0,1) - 2(2a_i - 1)N^{\alpha}_{i+1} \tag{10.9}$$

for $a_i \in \{0, 1\}$.
Note that MLD maximizes the correlation metric. Hence, from (10.9), we
conclude that at state (hi, ¢_) of subtrellis-_, we select the branch diverging
from state (_, 0) if A_'+ 1 (0, 1) > 2 (2a_- t)N_%l, and the branch diverging from
state (a, 1) otherwise. Therefore, this decision can be made by first determining
IM ,NI -- max{lA%l(0,1)l,12N?+ll}, (10.10)
and then checking the sign of the value Mz_.A, corresponding to this maximum,
denoted sgn(Mz_,N). Based on the comparison result given in (10.10) and
sgn(Ma,N), the selection of the surviving branches into states (a,,(_) with
a, E {0, 1} is made. All the four selections of surviving branches are shown in
Figure 10.2. The selection rules are given below:
(1) If $|\Delta_{i+1}^{\alpha}(0,1)| > |2N_{i+1}^{\alpha}|$ and $\Delta_{i+1}^{\alpha}(0,1) > 0$, the two branches
diverging from state $(\alpha, 0)$ into states $(0, \alpha)$ and $(1, \alpha)$ are selected as the
surviving branches.
(2) If $|\Delta_{i+1}^{\alpha}(0,1)| > |2N_{i+1}^{\alpha}|$ and $\Delta_{i+1}^{\alpha}(0,1) < 0$, the two branches
diverging from state $(\alpha, 1)$ into states $(0, \alpha)$ and $(1, \alpha)$ are selected as the
surviving branches.
(3) If $|\Delta_{i+1}^{\alpha}(0,1)| < |2N_{i+1}^{\alpha}|$ and $2N_{i+1}^{\alpha} > 0$, the branch diverging from
state $(\alpha, 0)$ into state $(0, \alpha)$ and the branch diverging from state $(\alpha, 1)$
into state $(1, \alpha)$ are selected as the surviving branches.
(4) If $|\Delta_{i+1}^{\alpha}(0,1)| < |2N_{i+1}^{\alpha}|$ and $2N_{i+1}^{\alpha} < 0$, the branch diverging from
state $(\alpha, 0)$ into state $(1, \alpha)$ and the branch diverging from state $(\alpha, 1)$
into state $(0, \alpha)$ are selected as the surviving branches.
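
The four selection rules can be checked against the direct test on the sign of $D_{i+1}^{\alpha}(a_i)$ in (10.9). The following is a minimal sketch, not the authors' implementation; the function names are hypothetical, and a tie ($|\Delta| = |2N|$), which the rules do not cover, is resolved arbitrarily by falling through to rules (3) and (4).

```python
def select_survivors(delta, two_n):
    """Apply selection rules (1)-(4).

    delta : Delta_{i+1}^alpha(0,1) = M_i(alpha,0) - M_i(alpha,1)
    two_n : 2 * N_{i+1}^alpha
    Returns (b_into_0, b_into_1), where b is the last bit of the surviving
    predecessor state (alpha, b) for destinations (0, alpha) and (1, alpha).
    """
    if abs(delta) > abs(two_n):
        # Rules (1) and (2): the sign of delta decides; both survivors
        # come from the same predecessor state.
        b = 0 if delta > 0 else 1
        return b, b
    # Rules (3) and (4): the sign of N decides; the survivors come from
    # different predecessor states. (Ties land here by assumption.)
    if two_n > 0:
        return 0, 1
    return 1, 0


def select_by_direct_comparison(delta, two_n):
    """Reference rule from (10.9): pick (alpha, 0) iff D(a) > 0."""
    preds = []
    for a in (0, 1):
        d = delta - (2 * a - 1) * two_n
        preds.append(0 if d > 0 else 1)
    return tuple(preds)
```

For example, `select_survivors(1.0, -2.0)` falls under rule (4) and returns `(1, 0)`: the branch from $(\alpha, 1)$ survives into $(0, \alpha)$ and the branch from $(\alpha, 0)$ survives into $(1, \alpha)$.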
For each subtrellis-$\alpha$, the decoding process from time-$i$ to time-$(i+1)$ can
be carried out as follows:
[Figure 10.2. Branch selections for subtrellis-$\alpha$: a decision tree that first
tests $|\Delta_{i+1}^{\alpha}(0,1)| > |2N_{i+1}^{\alpha}|$, then the sign of $\Delta_{i+1}^{\alpha}(0,1)$ or of $N_{i+1}^{\alpha}$.]
Step-1 Compute the four possible branch metrics $\pm N_{i+1}^{\alpha}$ (preprocessing) and
scale them by 2.
Step-2 From Step-1, identify $2N_{i+1}^{\alpha}$ and compute the metric difference
$\Delta_{i+1}^{\alpha}(0,1)$.
Step-3 Compare $|\Delta_{i+1}^{\alpha}(0,1)|$ with $|2N_{i+1}^{\alpha}|$.
Step-4 Based on the comparison result of Step-3, determine either
$\mathrm{sgn}(\Delta_{i+1}^{\alpha}(0,1))$ or $\mathrm{sgn}(N_{i+1}^{\alpha})$, and select the surviving branches based
on the selection rule shown in Figure 10.2.
Step-5 For each state at time-$(i+1)$, update the new survivor metric based
on Step-4.
The above decoding algorithm is called the differential CSA-algorithm. The
metric computations in Step-1, which are also performed by the conventional
Viterbi algorithm, can be preprocessed since at most $2^{n-1}$ values must be
computed. Also, if the branch from state $(\alpha, a_{i-m})$ to state $(a_i, \alpha)$ survives,
the surviving metric at state $(a_i, \alpha)$ in Step-5 can be computed as follows:
$$M_{i+1}(a_i, \alpha) = M_i(\alpha, a_{i-m}) + (2a_{i-m} - 1)(2a_i - 1)N_{i+1}^{\alpha}. \qquad (10.11)$$
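
Step-2 through Step-5, together with the update (10.11), can be illustrated for a single butterfly and compared with a conventional add-compare-select update. This is a sketch under stated assumptions rather than the authors' implementation: all function names are hypothetical, the value of $N_{i+1}^{\alpha}$ is taken as already computed in Step-1, and ties are not handled specially.

```python
def branch_metric(b, a, n_val):
    # Correlation metric of the branch (alpha, b) -> (a, alpha), in the
    # form (2b - 1)(2a - 1) N used by (10.11).
    return (2 * b - 1) * (2 * a - 1) * n_val


def csa_butterfly(m0, m1, n_val):
    """Differential compare-select-add update of one butterfly.

    m0, m1 : survivor metrics M_i(alpha,0) and M_i(alpha,1)
    n_val  : N_{i+1}^alpha
    Returns [(new_metric, predecessor_bit)] for destinations a = 0, 1.
    """
    delta = m0 - m1                  # Step-2: one subtraction
    two_n = 2 * n_val                # scaling by 2 (a register shift)
    if abs(delta) > abs(two_n):      # Step-3: one comparison
        b = 0 if delta > 0 else 1    # Step-4: sign check on delta
        preds = (b, b)
    else:
        preds = (0, 1) if n_val > 0 else (1, 0)  # Step-4: sign check on N
    prev = (m0, m1)
    # Step-5: one addition per destination state, as in (10.11).
    return [(prev[b] + branch_metric(b, a, n_val), b)
            for a, b in zip((0, 1), preds)]


def acs_butterfly(m0, m1, n_val):
    """Conventional add-compare-select: four additions, two comparisons."""
    out = []
    for a in (0, 1):
        c0 = m0 + branch_metric(0, a, n_val)
        c1 = m1 + branch_metric(1, a, n_val)
        out.append((c0, 0) if c0 > c1 else (c1, 1))
    return out
```

Away from ties, both functions return the same survivors and metrics; the differential version spends one subtraction, one comparison, and two additions per butterfly instead of four additions and two comparisons.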
Note that the scaling by 2 of the preprocessed values $\pm N_{i+1}^{\alpha}$ at Step-1 and
the sign checks at Step-4 are elementary binary operations (scaling is done by
shifting the register once). The real number operations are performed at Step-2,
Step-3, and Step-5. There are $2^{m-1}$ subtractions at Step-2, $2^{m-1}$ comparisons
at Step-3, and $2^m$ additions at Step-5. Therefore a total of $2 \cdot 2^m$ real number
operations is required to process a trellis section. By contrast, after Step-1,
the conventional Viterbi algorithm requires $2^{m+1}$ additions to evaluate the
cumulative metrics for $2^m$ states and $2^m$ comparisons to determine the $2^m$
survivors. This results in a total of $3 \cdot 2^m$ real number operations to process
a trellis section. As a result, the differential CSA-algorithm requires about
1/3 fewer real number operations than the conventional Viterbi algorithm for
rate-$1/n$ antipodal convolutional codes as well as high-rate punctured codes
obtained from them.
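
The operation counts above can be tabulated with a few lines of arithmetic. The sketch below uses hypothetical helper names and counts only the real number operations per trellis section after Step-1, exactly as enumerated in the text.

```python
def viterbi_ops(m):
    """Conventional Viterbi: 2^(m+1) additions plus 2^m comparisons."""
    additions = 2 ** (m + 1)
    comparisons = 2 ** m
    return additions + comparisons


def differential_csa_ops(m):
    """Differential CSA: 2^(m-1) subtractions, 2^(m-1) comparisons,
    and 2^m additions."""
    subtractions = 2 ** (m - 1)
    comparisons = 2 ** (m - 1)
    additions = 2 ** m
    return subtractions + comparisons + additions
```

For a 64-state trellis ($m = 6$), `viterbi_ops(6)` gives 192 and `differential_csa_ops(6)` gives 128, a saving of 64 operations, or exactly one third.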
Example 10.2 Consider the (2,1,6) convolutional code with generating pattern

[1 1 0 1 1 1 1 1 0 0 1 0 1 1],

which is the most commonly used convolutional code. This code is antipodal.
Its trellis consists of 64 states and can be decomposed into 32 fully connected
2-state subtrellises as shown in Figure 10.1. Since $n = 2$, at time-$i$ there are
four possible branch metrics of the form $\pm(r_{i,1} \pm r_{i,2})$, which are computed
with two real additions and then scaled by 2. For this code, at time-$i$, the
Viterbi algorithm computes 128 cumulative metric candidates and then performs
64 comparisons, so a total of 192 real value operations is required. The
differential CSA-algorithm first computes 32 metric differences at Step-2, and
then performs 32 comparisons at Step-3. Finally, based on the 32 sign checks
of Step-4, 64 surviving cumulative metrics are updated at Step-5. As a result,
only 128 real value operations are executed. Therefore, 64 real value operations
are saved by the differential CSA-algorithm at the expense of 32 sign checks
and 2 scalings by 2.
In practical applications, high-rate convolutional codes are often constructed
from a low-rate $(n, 1, m)$ convolutional code by puncturing. The trellis for the
punctured code has the same structure and state complexity as that of the
original rate-$1/n$ convolutional code, except that the lengths of its sections vary
periodically. As a result, the decoder for the rate-$1/n$ convolutional code can be
used for decoding the punctured code. If the base rate-$1/n$ convolutional code is
antipodal, then any punctured code constructed from it is also antipodal. Each
trellis section for the punctured code can be partitioned into $2^{m-1}$ butterfly
subtrellises in exactly the same manner as described above. The two branches
leaving (or entering) a state in a butterfly subtrellis are one's complements
of each other. Consequently, the differential CSA-algorithm can be used for
decoding the punctured code. All the rate-$k/(k+1)$ punctured convolutional
codes presented in [16] are time-varying antipodal codes. Also, this construction
can be generalized to the case where $k$ rate-$1/n$ base convolutional codes, rather
than only one, are periodically selected with period $k$. Again, if the resulting
time-varying punctured code is antipodal, then the differential CSA-algorithm
can be used.
The application of the differential CSA-algorithm to rate-$k/n$ convolutional
codes with $k > 1$ also saves about 1/3 of the real value computations after proper
pairing of the states in the code trellis [32]. The differential CSA-algorithm
can also be applied efficiently to trellis decoding of block codes. For example,
trellis decoding based on the 4-section trellis diagram for the (16, 5) RM code
requires 59 real value operations for the differential CSA-algorithm and 95 real
value operations for the conventional Viterbi algorithm.

