Good trellises for IC implementation of viterbi decoders for linear block codes by Lin, S. et al.
NASA-CR-201091
GOOD TRELLISES FOR IC
IMPLEMENTATION OF VITERBI
DECODERS FOR LINEAR
BLOCK CODES
Technical Report
to
NASA
Electrical Engineering Division
Goddard Space Flight Center
Greenbelt, Maryland 20771
Grant Number NAG 5-2938
Report No. 96-002
Principal Investigator: Shu Lin
Department of Electrical Engineering
University of Hawaii at Manoa
Honolulu, Hawaii 96822
May 24, 1996
https://ntrs.nasa.gov/search.jsp?R=19960026643 2020-06-16T03:55:46+00:00Z

Good Trellises for IC Implementation of Viterbi
Decoders for Linear Block Codes
Hari T. Moorthy, Shu Lin and Gregory T. Uehara
Department of Electrical Engineering,
University of Hawaii,
Honolulu, Hawaii 96822
Abstract
This paper investigates trellis structures of linear block codes for the IC (integrated
circuit) implementation of Viterbi decoders capable of achieving high decoding speed while
satisfying a constraint on the structural complexity of the trellis in terms of the maximum
number of states at any particular depth. Only uniform sectionalizations of the code trellis
diagram are considered. An upper bound on the number of parallel and structurally
identical (or isomorphic) subtrellises in a proper trellis for a code without exceeding
the maximum state complexity of the minimal trellis of the code is first derived. Parallel
structures of trellises with various section lengths for binary BCH and Reed-Muller (RM)
codes of lengths 32 and 64 are analyzed. Next, the complexity of IC implementation of a
Viterbi decoder based on an L-section trellis diagram for a code is investigated. A structural
property of a Viterbi decoder called ACS-connectivity, which is related to state connectivity,
is introduced. This parameter affects the complexity of wire-routing (interconnections
within the IC). The effect of five parameters on the VLSI complexity of a Viterbi decoder is
investigated: (1) effective computational complexity; (2) complexity of the ACS circuit;
(3) traceback complexity; (4) ACS-connectivity; and (5) branch complexity of a trellis
diagram. It is shown that an IC implementation of a Viterbi decoder based on a non-minimal
trellis requires less area and is capable of operation at higher speed than one based on the
minimal trellis when the commonly used ACS-array architecture is considered.
1. Introduction
Any linear block code can theoretically be decoded by applying the Viterbi algorithm
to a trellis for the code. Trellises for block codes were first described in [1]-[3]. After Forney's
refinement of the structure of these trellises [4], their potential in the practical decoding
of block codes has been realized by many others who have published extensively on various
aspects of the trellis structure of block codes [5]-[25]. In some of the above papers, one goal
was to minimize the maximum number of states in the trellis at any depth by considering
all possible permutations of the code [6]. For some codes such as Reed-Muller codes, this
optimum permutation is known [7]. For most others only bounds are known.
Even when the optimum order of bits is known or a good permutation is known (if the
optimum order is unknown), previous work has focussed on minimization of the number of
computations required for decoding [12, 22, 24]. If the actual decoding is intended to be
performed using a stored program approach that executes the operations needed to decode
a received vector sequentially, then this approach will lead to the fastest decoding speed.
However, if an IC implementation is intended, then an alternative approach is more suit-
able. Given a constraint on the amount of hardware (determined by the number of states
and the complexity of branches) in the decoder, decoding must be done as fast as pos-
sible; not necessarily with as few computations as possible. To achieve this end, we
propose the use of non-minimal trellises with parallel structure in which the maximum state
space dimension is not greater than the maximum state space dimension of the minimal
trellis of a code. In this paper, certain properties concerning the state connectivity and
branch complexity [9] of this non-minimal trellis are derived which demonstrate that the
non-minimal trellis implementation would require less area in an IC implementation
than the corresponding minimal trellis when the ubiquitous ACS array architecture [26]-[28]
is used for implementation. We caution that if a different architecture as proposed in [27]
or [24] is chosen for implementation, then the trellis structure that is best suited will in
general be different from the proposed trellis.
The number of decoding operations required by the standard trellis-based Viterbi decoding
algorithm depends on the sectionalization of the trellis used for decoding. Most previous
work has focussed on uniform sectionalization of a trellis, in which each section consists of
the same number of code symbols. However, Lafourcade and Vardy [22] recently showed that
non-uniform sectionalization of a trellis often requires fewer decoding operations
than uniform sectionalization. They have devised an efficient algorithm for finding the optimal
sectionalization of a trellis, which minimizes the total number of decoding operations required
for maximum-likelihood trellis decoding. Optimal sectionalization of a trellis to minimize
computational complexity is also investigated in [24]. In this paper, we investigate only
good trellises with uniform sectionalization for IC implementation of Viterbi decoders. In
particular, we are concerned with those structures, such as parallel structure, regularity and
state connectivity, that: (1) affect the complexity of wire-routing (interconnections) within
the IC and the chip size; and (2) facilitate parallel and pipelined decoding to achieve high
decoding speed. Since non-uniform sectionalization of a trellis requires fewer decoding
operations, this advantage over uniform sectionalization, along with its other properties,
should be investigated for IC implementation of high-speed Viterbi decoders. That
investigation is beyond the scope of this paper.
Trellises for block codes are often loosely connected. A properly constructed trellis may
consist of many parallel and structurally identical (isomorphic) subtrellises of smaller state
space dimension without cross-connections between them. Consequently, identical Viterbi
decoders of much smaller complexity can be devised to process the subtrellises independently
in parallel without internal communication between them. This not only simplifies the
IC implementation but also speeds up the decoding process. For example, the (32,16,8)
Reed-Muller (RM) code, also an extended BCH code, has a 4-section, 64-state minimal
trellis diagram, which consists of eight parallel and structurally identical 8-state subtrellises
without cross-connections among them. As a result, we can devise eight identical 8-state
Viterbi decoders to process the 8 subtrellises in parallel without communication between
them. At the end, there are 8 survivors (one from each subtrellis) and the best one will
be chosen as the decoded codeword. This reduces the implementation of a 64-state Viterbi
decoder to the implementation of an 8-state decoder and the use of 8 copies of it. This parallel
structure reduces the wire-routing and internal communication within the IC, which reduces chip
size and improves decoding speed. If the state and branch complexities of each subtrellis
are small and the total number of subtrellises is small, all the subtrellis decoders can be put
on a single chip, as for the (32,16,8) RM code [29]. However, if the state and branch
complexities are large, then each subtrellis decoder (or several of them) can be implemented
on a single chip. This provides flexibility in chip planning and decoder architecture.
The two fundamental bottlenecks to Viterbi decoding (decoding speed) are the inter-
nal communications between ACS (add-compare-select) units and comparisons of incoming
branches (radix-profile) at each state [28, 30]. Properly designed parallel structure in a trel-
lis would overcome these obstacles without exceeding the maximum state space dimension
of the minimal trellis. For example, a (64,40,8) RM subcode which is being considered by
NASA for high-speed satellite communications has an 8-section, 2048-state trellis. This trellis
consists of 32 parallel and structurally identical 64-state subtrellises. The last 4 sections of
each subtrellis are a mirror image of the first 4 sections, as shown in Figure 3. As a result,
bidirectional decoding can be performed. Furthermore, the maximum component of the
radix profile for each half subtrellis is only 8. A 64-state subtrellis decoder can be
implemented on a single chip in 0.5 micron CMOS technology which can operate at a decoding
speed of 600 Mbps [31]. Other structural properties of the subtrellises for this (64,40,8) RM
subcode which simplify the IC implementation will be discussed later. Parallel structure
therefore offers simplification, flexibility and higher decoding speed for IC implementation.
We must note that the parallel structure does not reduce the total number of single-state
processors, i.e., number of ACSs.
In this paper, we investigate trellis structures, particularly the parallel structure, of
linear block codes for implementation of Viterbi decoders capable of achieving high decoding
speed while satisfying a constraint on the structural complexity of the trellis in terms of the
maximum number of states at any depth. Only uniform sectionalizations of the code trellis
are considered. The organization of the paper is as follows.
In Section 2, using the theory of L-section minimal trellis diagrams, an upper bound
on the number of parallel isomorphic subtrellises in a proper trellis for a code without
exceeding the maximum state space dimension of the minimal trellis of the code is derived.
In Section 3, we analyze the trellises for all extended BCH and RM codes of lengths 32 and
64. In Section 4, we define parameters related to the complexity of a Viterbi decoder IC
using the ACS-array architecture for linear block codes. Section 5 treats examples and in
Section 6 we use the results of this paper to design a trellis for a (64, 40) RM subcode.
2. Trellises with Parallel Structure for Linear Block
Codes with Constraint on Maximum State Space
Dimension
The objective of this section is to show that we can build a trellis for a linear block code
C which is a disjoint union of a certain desired number of parallel isomorphic subtrellises.
Although this trellis is not minimal, its state space dimension at every depth is less than or
equal to the maximum state space dimension of the minimal trellis. The conditions under
which such a trellis construction is possible and an upper bound on the number of such
parallel subtrellises are derived. In some cases, the minimal trellis itself possesses a parallel
structure. The number of such parallel subtrellises (if any) in the minimal trellis is derived.
2.1. Preliminaries
We consider only binary $(N, K, d_{\min})$ linear block codes. Let $L$, $M$ be positive integers
such that $LM = N$. The minimal (up to graph isomorphism) $L$-section trellis is a well
understood graphical representation of the code [5, 9]. Let the sets of states at the end of each
section be denoted $\{S_0, S_M, S_{2M}, \ldots, S_{(L-1)M}, S_{LM}\}$. We define a sequence
$\{s_0, s_M, \ldots, s_{LM}\}$, called the state complexity profile (SCP) of the trellis, given by
$s_{iM} = \log_2(|S_{iM}|)$ for $0 \le i \le L$. The minimal $L$-section trellis of a code $C$ has the
property that every component of its SCP is less than or equal to the corresponding component
in the SCP of any other proper $L$-section trellis for $C$. The maximum among the $N+1$
components in the SCP of the minimal $N$-section trellis ($L = N$, $M = 1$) for $C$ is denoted
$s_{\max}(C)$, and we will denote the maximum of the components in the SCP of the minimal
$L$-section trellis for $C$ as $s_{\max,L}(C)$.
For a binary $N$-tuple $v = (v_1, \ldots, v_N)$, let $p_{h,h'}[v]$ denote the $(h'-h)$-tuple
$(v_{h+1}, \ldots, v_{h'})$ and let $p_{h,h'}[C] = \{p_{h,h'}[v] : v \in C\}$. Let $C_{h,h'}$ be the linear
subcode of $C$ consisting of all codewords whose components are all zero except for the
$(h'-h)$ components from the $(h+1)$-th bit position to the $h'$-th bit position.
In an $L$-section minimal trellis for a block code, there may be a set of parallel branches
between two adjacent states. In such a case, we call the entire set of parallel branches a
composite branch. Each composite branch in the $i$-th section, $1 \le i \le L$, is made up of
$2^{\rho_i}$ parallel branches, where $\rho_i$ is the dimension of the subcode $C_{(i-1)M,iM}$ [9].
In the $i$-th section of an $L$-section trellis for a linear block code, $1 \le i \le L$, the number
of distinct branch metrics that have to be computed is $2^{D_i}$, where $D_i$ is the dimension of
the code $p_{(i-1)M,iM}[C]$, and this number is much less than the total number of branches.
$D_i$ is the rank of the submatrix formed by the $M$ columns from the $((i-1)M+1)$-th to the
$(iM)$-th column of the generator matrix of the code and is upper bounded by $M$. For
$1 \le i \le L$, let the number of composite branches merging into any state $s \in S_{iM}$ be
$2^{\delta_{iM}}$ (it is the same for any state in $S_{iM}$). For an $L$-section trellis for $C$, we define
the converging branch profile (CBP) as the ordered sequence
$\{\delta_M, \delta_{2M}, \ldots, \delta_{LM}\}$. For $0 \le i < L$, let the number of composite branches
emanating from any state $s \in S_{iM}$ be $2^{\lambda_{iM}}$ (it is the same for any state in $S_{iM}$).
The ordered sequence $\{\lambda_0, \lambda_M, \ldots, \lambda_{(L-1)M}\}$ is called the diverging branch
profile (DBP). Then $\delta$ and $\lambda$ are related as follows:
$$\delta_{iM} = s_{(i-1)M} + \lambda_{(i-1)M} - s_{iM}. \qquad (2.1)$$
Based on the theory of $L$-section trellises [9], it can be shown that
$$\lambda_{(i-1)M} = \dim(C_{(i-1)M,N}) - \dim(C_{iM,N}) - \rho_i \qquad (2.2)$$
which implies that $\lambda_{(i-1)M}$ equals the number of rows of a trellis oriented generator matrix
of $C$ whose leading 1 occurs among the positions $\{(i-1)M, (i-1)M+1, \ldots, iM-1\}$
and whose span is not contained in the $i$-th section. These dimensions can be easily
determined from the trellis oriented generator matrix of the code [9, 16, 20]. The two sequences
$\{\delta_M, \delta_{2M}, \ldots, \delta_{LM}\}$ and $\{\lambda_0, \lambda_M, \ldots, \lambda_{(L-1)M}\}$ provide a measure
of the state connectivity of an $L$-section minimal trellis. In IC implementation of a Viterbi
decoder, $\delta_{iM}$ is called a radix number.
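As a minimal sketch of how these profiles can be read off a trellis oriented generator matrix, the Python fragment below computes the state dimensions, the DBP, the parallel-branch dimensions and the CBP of a uniform L-section trellis. The 8-bit matrix `G` and all function names are assumptions chosen for illustration. The sketch uses 1-based bit positions, so that section i covers bits (i-1)M+1 through iM, and it takes the state dimension at a depth to be the number of rows whose span starts at or before that depth and ends after it.

```python
# Illustrative (8, 4) trellis oriented generator matrix; an assumed example.
G = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 0, 1, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
]

def spans(rows):
    """1-based (first, last) nonzero positions of each row."""
    out = []
    for row in rows:
        nz = [k + 1 for k, bit in enumerate(row) if bit]
        out.append((nz[0], nz[-1]))
    return out

def section_profiles(rows, num_sections):
    """Return (s, lam, rho, delta) for a uniform L-section trellis."""
    n = len(rows[0])
    m = n // num_sections
    sp = spans(rows)
    # s[i]: state space dimension at depth i*M (rows whose span straddles it).
    s = [sum(a <= i * m < b for a, b in sp) for i in range(num_sections + 1)]
    lam, rho, delta = [], [], []
    for i in range(1, num_sections + 1):
        lo, hi = (i - 1) * m + 1, i * m            # bit positions of section i
        starts_here = [(a, b) for a, b in sp if lo <= a <= hi]
        rho_i = sum(b <= hi for _, b in starts_here)  # span inside the section
        lam_i = len(starts_here) - rho_i              # span leaves the section
        rho.append(rho_i)
        lam.append(lam_i)
        delta.append(s[i - 1] + lam_i - s[i])         # equation (2.1)
    return s, lam, rho, delta

s, lam, rho, delta = section_profiles(G, 4)
print("SCP:", s, "DBP:", lam, "rho:", rho, "CBP:", delta)
```

For this small matrix with four sections of two bits each, the CBP comes out as (0, 1, 1, 2), and in both the 4-section and the 2-section case the CBP and parallel-branch dimensions sum to the code dimension 4.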
2.2. Parallel Trellises
Let $G$ be the trellis oriented generator matrix of an $(N, K)$ linear block code $C$ [4]. Let
$r = (r_1, r_2, \ldots, r_N)$ be a typical row of $G$. Then, we define the span of $r$, denoted span($r$),
to be the smallest interval $[i, j]$, $1 \le i \le j \le N$, which contains all the non-zero elements of
$r$. For a row $r$ whose span is $[i, j]$ we also define the active span of $r$, denoted aspan($r$),
as $[i, j-1]$ if $i < j$ and aspan($r$) $= \emptyset$ if $i = j$. The trellis oriented matrix has the following
properties: (1) The leading 1 of every row occurs in an earlier position than the leading 1
of the row below it; (2) The trailing 1 of every row occurs at a different position from the
trailing 1 of every other row. Any other trellis oriented matrix for $C$ has the same set of row
spans although the rows themselves may be different [20]. Let $T$ be the minimal $N$-section
trellis for $C$. Given the trellis oriented generator matrix of a code, the state space dimension
at any position $l$ is just equal to the number of rows whose active spans contain $l$ [20]. For
example, consider the following trellis oriented generator matrix:
$$G = \begin{bmatrix}
1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 1 & 0 & 1 & 0 \\
0 & 0 & 1 & 1 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1
\end{bmatrix}
\begin{matrix} r_1 \\ r_2 \\ r_3 \\ r_4 \end{matrix}$$
for which aspan($r_1$) $= [1,3]$, aspan($r_2$) $= [2,6]$, aspan($r_3$) $= [3,5]$ and aspan($r_4$) $= [5,7]$.
For each $l$, $0 \le l \le 8$, counting the number of rows which are active at that $l$ yields the
state dimension profile (SCP), $\{0, 1, 2, 3, 2, 3, 2, 1, 0\}$. For $0 \le l \le N$, let $s_l(C)$ denote the
dimension of the $l$-th state space of $C$. Let $s_{\max}(C)$ be the maximum among the state space
dimensions. Define the non-empty set,
$$I_{\max}(C) = \{l : s_l(C) = s_{\max}(C)\}. \qquad (2.3)$$
Suppose we choose a subcode $C'$ of $C$ such that $\dim(C') = \dim(C) - 1$ and the set of coset
representatives $[C/C']$ is generated by the single row $r \in G$. From the above statement
about $s_l(C)$, it is clear that $s_l(C') = s_l(C) - 1$ for exactly those $l$ where $r$ is active, i.e.,
$l \in$ aspan($r$). For other positions $l \notin$ aspan($r$) we have $s_l(C') = s_l(C)$. Hence we have the
following proposition.
Lemma 1: If there exists a row $r$ in the trellis oriented generator matrix $G$ for the code $C$
such that aspan($r$) $\supseteq I_{\max}(C)$, then we can form a subcode $C'$ of $C$ generated by $G - \{r\}$
such that $s_{\max}(C') = s_{\max}(C) - 1$ and $I_{\max}(C') \supseteq I_{\max}(C)$. ∎
In fact $I_{\max}(C') = I_{\max}(C) \cup \{l : s_l(C) = s_{\max}(C) - 1,\ l \notin \mathrm{aspan}(r)\}$. Since $G$ is a
trellis oriented generator matrix, $G' = G - \{r\}$ is also trellis oriented. We can apply the above
proposition again to $C'$ if there exists a row $r' \in G'$ with aspan($r'$) $\supseteq I_{\max}(C')$. This yields
a subcode $C''$ with dimension smaller by one and $s_{\max}(C'') = s_{\max}(C') - 1$. If no such row
$r'$ exists, the proposition cannot be applied and the recursion stops. The above proposition
can be generalized.
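The counting rule and Lemma 1 can be checked on the 4-row example matrix given above. The following Python sketch is illustrative only: the function names are assumptions, and rows are indexed from 0, so index 1 is $r_2$ and index 2 is $r_3$.

```python
# Example trellis oriented generator matrix from the text (rows r1..r4).
G = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 0, 1, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
]

def active_span(row):
    """Active span [i, j-1] as a set of 1-based positions (empty if i == j)."""
    nz = [k + 1 for k, bit in enumerate(row) if bit]
    return set(range(nz[0], nz[-1]))

def state_profile(rows):
    """s_l for l = 0..N: number of rows whose active span contains l."""
    n = len(rows[0])
    act = [active_span(r) for r in rows]
    return [sum(l in a for a in act) for l in range(n + 1)]

def argmax_set(profile):
    """I_max: positions where the state dimension is maximal."""
    top = max(profile)
    return {l for l, s in enumerate(profile) if s == top}

profile = state_profile(G)
i_max = argmax_set(profile)

# Rows to which Lemma 1 applies: active span contains all of I_max.
eligible = [k for k, r in enumerate(G) if i_max <= active_span(r)]

for k in eligible:
    sub_profile = state_profile([r for j, r in enumerate(G) if j != k])
    # Removing such a row lowers s_max by one and keeps I_max inside I_max(C').
    assert max(sub_profile) == max(profile) - 1
    assert i_max <= argmax_set(sub_profile)
    print("remove row", k + 1, "->", sub_profile)
```

Only $r_2$ and $r_3$ have active spans containing $I_{\max}(C) = \{3, 5\}$, so these are the only rows whose removal is covered by the lemma for this matrix.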
Let $R(C)$ be the following subset of rows of $G$,
$$R(C) = \{r \in G : \mathrm{aspan}(r) \supseteq I_{\max}(C)\}. \qquad (2.4)$$
Let $\rho = |R(C)|$ where $|Q|$ denotes the cardinality of any finite set $Q$.
Theorem 1: With $R(C)$ defined as above and $\rho = |R(C)|$, let $1 \le \rho' \le \rho$. There exists
a subcode $C'$ of $C$ such that $s_{\max}(C') = s_{\max}(C) - \rho'$ and $\dim(C') = \dim(C) - \rho'$ if and
only if there exists a subset $R' \subseteq R(C)$ consisting of $\rho'$ rows of $R(C)$ such that for every
$l$ satisfying $s_l(C) > s_{\max}(C')$, there exist at least $s_l(C) - s_{\max}(C')$ rows in $R'$ whose active
spans contain $l$. The set of coset representatives $[C/C']$ is generated by $R'$.
Proof: Suppose $R' = \{r'_1, \ldots, r'_{\rho'}\}$ satisfies the conditions in the hypothesis. Since
$R' \subseteq R(C)$, $I_{\max}(C) \subseteq \mathrm{aspan}(r'_i)$ for $1 \le i \le \rho'$. Consider the subcode $C'$ generated
by $G - R'$. For those $l \in I_{\max}(C)$, we can determine $s_l(C')$ by counting the number of rows
$r \in (G - R')$ that are active at the position $l$. This number is less than $s_{\max}(C)$ by exactly
$\rho'$. For $l \notin I_{\max}(C)$ satisfying $s_l(C) > s_{\max}(C')$, we are assured by the hypothesis that
$s_l(C)$ will be reduced by at least $s_l(C) - s_{\max}(C')$, thus guaranteeing that
$s_{\max}(C') = s_{\max}(C) - \rho'$.
To prove the converse, let $C'$ be a subcode of $C$ whose dimension is $\dim(C) - \rho'$ and
satisfying $s_{\max}(C') = s_{\max}(C) - \rho'$. Without loss of generality, we may let $C'$ be generated
by $G - R'$ for some subset $R'$ of the trellis oriented generator matrix $G$ of $C$ with $|R'| = \rho'$.
Let $T$ be the minimal trellis corresponding to $G$. Let $T'$ be the minimal trellis for $C'$. Let
$N_l(R')$ be the number of rows $r'$ in $R'$ such that $l \in \mathrm{aspan}(r')$. Then, at every position
$l$, $0 \le l \le N$, we have
$$s_l(T') + N_l(R') \ge s_l(C) \qquad (2.5)$$
since $s_l(C)$ is the smallest possible state space dimension. Therefore
$$N_l(R') \ge s_l(C) - s_l(C') \ge s_l(C) - s_{\max}(C'). \qquad (2.6)$$
For every $l$, at least $s_l(C) - s_{\max}(C')$ rows of $R'$ are active. Also, for every $l \in I_{\max}(C)$, we
have $N_l(R') \ge s_{\max}(C) - s_{\max}(C') = \rho'$. So all the rows $r' \in R'$ satisfy
$\mathrm{aspan}(r') \supseteq I_{\max}(C)$. Thus $R' \subseteq R(C)$. ∎
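The condition of Theorem 1 lends itself to a direct search over subsets of $R(C)$. The sketch below is a hedged illustration rather than the procedure used to produce the tables: it takes the active spans of a trellis oriented generator matrix as inclusive intervals, enumerates subsets exhaustively (which is only practical for small $|R(C)|$), and uses the active spans of the 4-row example matrix as data. All names are assumptions.

```python
from itertools import combinations

# Active spans [a, b] (1-based, inclusive) of rows r1..r4 of the example
# matrix. An empty active span could be written as (a, a - 1).
ASPANS = [(1, 3), (2, 6), (3, 5), (5, 7)]
N = 8

def state_profile(aspans, n):
    """s_l for l = 0..n: number of rows active at position l."""
    return [sum(a <= l <= b for a, b in aspans) for l in range(n + 1)]

def rows_covering_imax(aspans, n):
    """Indices of rows in R(C): active span contains every l in I_max(C)."""
    prof = state_profile(aspans, n)
    imax = [l for l, s in enumerate(prof) if s == max(prof)]
    return [k for k, (a, b) in enumerate(aspans)
            if all(a <= l <= b for l in imax)]

def valid_subsets(aspans, n, p_prime):
    """All R' of size p_prime inside R(C) meeting the Theorem 1 condition."""
    prof = state_profile(aspans, n)
    target = max(prof) - p_prime          # desired s_max(C')
    found = []
    for subset in combinations(rows_covering_imax(aspans, n), p_prime):
        ok = all(
            sum(aspans[k][0] <= l <= aspans[k][1] for k in subset) >= s - target
            for l, s in enumerate(prof) if s > target
        )
        if ok:
            found.append(subset)
    return found

for p_prime in (1, 2):
    for subset in valid_subsets(ASPANS, N, p_prime):
        rest = [sp for k, sp in enumerate(ASPANS) if k not in subset]
        print(p_prime, subset, state_profile(rest, N))
```

For this toy matrix both rows of $R(C)$ can be removed together, giving a subcode with maximum state dimension 1 and hence four parallel subtrellises within the original bound of 3.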
The utility of the above theorem is that it shows how to choose a subcode $C'$ of $C$
with $s_{\max}(C') = s_{\max}(C) - \dim([C/C'])$, such that one can build a non-minimal trellis $T$
for $C$ with the following properties:
1. The maximum state space dimension of $T$ is $s_{\max}(C)$.
2. $T$ is the union of $2^{\dim([C/C'])}$ parallel isomorphic subtrellises $T_i$ with each $T_i$ being
isomorphic to the minimal trellis for $C'$.
3. Upper Bound on Parallelism: The smallest such subcode has dimension lower
bounded by $\dim(C) - |R(C)|$, i.e., the maximum number of parallel subtrellises one can
obtain with the constraint that the total state space dimension never exceeds $s_{\max}(C)$
is upper bounded by $2^{|R(C)|}$ with $R(C)$ as defined above.
4. Parallelism of the Minimal Trellis: The logarithm to the base 2 of the number of
parallel isomorphic subtrellises in a minimal $L$-section trellis for a binary $(N, K)$ linear
block code is given by the number of rows in its trellis oriented generator matrix whose
active span contains the integers $\{M, 2M, \ldots, (L-1)M\}$ where $N = LM$.
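Property 4 above reduces to a counting rule, sketched here in Python. The function name and the example data (the active spans of the 4-row matrix used earlier) are assumptions for illustration, and the sketch assumes at least two sections so that the set of section boundaries is non-empty.

```python
# Active spans [a, b] (1-based, inclusive) of the rows of a trellis
# oriented generator matrix; here the 4-row example matrix, N = 8.
ASPANS = [(1, 3), (2, 6), (3, 5), (5, 7)]
N = 8

def log2_parallel_subtrellises(aspans, n, num_sections):
    """Number of rows whose active span contains M, 2M, ..., (L-1)M."""
    m = n // num_sections
    boundaries = range(m, n, m)
    return sum(all(a <= l <= b for l in boundaries) for a, b in aspans)

for sections in (2, 4, 8):
    count = 2 ** log2_parallel_subtrellises(ASPANS, N, sections)
    print(sections, "sections:", count, "parallel subtrellis(es)")
```

For this example the 4-section minimal trellis splits into two parallel subtrellises, the 2-section trellis into four, and the 8-section trellis does not split at all.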
As an example, consider the extended and permuted (32, 21, 6) BCH code. A parity check
matrix for this code with an optimum order of bits with respect to trellis state complexity
is shown in Figure 1. The set of spans of any trellis oriented generator matrix for this
code is given in Table 1. The 4-section minimal trellis has the SCP $\{0, 7, 9, 7, 0\}$, giving
$s_{\max,4}(C) = 9$. This trellis has 2 parallel isomorphic subtrellises. $I_{\max}(C) = \{16\}$ and it
can be verified that $|R(C)| = 9$. In an attempt to build a trellis consisting of 64 parallel
subtrellises while satisfying the upper bound of 9 on the maximum state space complexity,
we let $\rho' = 6$, so that $s_{\max}(C') = s_{\max}(C) - \rho' = 3$. The set
$\{l : s_l(C) > s_{\max}(C')\} = \{8, 16, 24\}$.
However, we find that no subset $R'$ of $R(C)$ exists satisfying the conditions in Theorem 1.
Hence we cannot build a trellis consisting of 64 parallel subtrellises for this code without
violating the constraint on the maximum state space dimension. If we choose $\rho' = 5$, then
we can find a subset $R' = \{r_6, r_7, r_8, r_{12}, r_{15}\} \subset R(C)$ that satisfies all the conditions in
Theorem 1. Hence, choosing the subcode $C'$ generated by $G - R'$, we obtain a trellis $T$ for $C$
consisting of 32 parallel isomorphic subtrellises. Each subtrellis is isomorphic to the minimal
trellis for $C'$, which has $s_{\max}(C') = 4$.
For the same code, the 32-section minimal trellis has the SCP that gives $s_{\max,32}(C) = 10$
and $I_{\max}(C) = \{12, 14, 18, 20\}$. Using Table 1, we find that $|R(C)| = 2$. In an attempt to
build a trellis consisting of 4 parallel subtrellises while satisfying the upper bound of 10 on
maximum state space dimension, we let $\rho' = 2$. So $s_{\max}(C') = 8$. The set
$\{l : s_l(C) > s_{\max}(C')\} = \{10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22\}$. We find that the
subset of two rows having spans [8, 25] and [10, 23] satisfies the conditions in Theorem 1.
This decomposition of a trellis into parallel and structurally identical subtrellises of
smaller state complexity without cross-connections between them has significant advantages
for IC implementation of Viterbi decoding. Identical Viterbi decoders of much simpler com-
plexity can be devised to process the subtrellises independently in parallel without internal
communications (or information transfer) between them. Internal information transfer limits
the decoding speed [28, 30]. Furthermore, the number of computations to be carried out per
subtrellis is much smaller than that of a fully connected trellis. As a result, the parallel struc-
ture not only simplifies the decoding complexity but also speeds up the decoding process.
For example, the (32, 16,8) extended and permuted BCH code (also a Reed-Muller code) has
a 4-section trellis diagram of 64 states. It can be decomposed into 8 parallel and structurally
identical 8-state subtrellises without cross-connections between them as shown in Figure 2.
As a result, 8 identical 8-state Viterbi decoders can be devised to process the decoding in
parallel. An IC implementation of a Viterbi decoder for this code using a 0.8 micron CMOS
technology has been recently completed at the University of Hawaii VLSI Design Center.
The decoder is implemented in Xilinx Field Programmable Gate Array (FPGA) chips [29].
The decoder is capable of operating at a speed of 200 Mbps. Custom design of this decoder
using 0.5 micron CMOS technology can achieve a decoding speed of 600 Mbps or higher.
3. Trellises of BCH and RM Codes of Lengths 32, 64
Based on the theory developed in the previous section, an analysis of the parallel structure
of the trellises for Reed-Muller (RM) and extended binary BCH codes was carried out.
The degree of parallelism and the state complexity both depend on the sectionalization of the
trellis. In general, it is known that as the number of sections decreases, the state complexity
also decreases but the branch complexity in each section increases. We consider all possible
uniform sectionalizations in which the number of parallel branches between two connected
states is at most two. For example, we consider only 64-, 32-, 16- and 8-section trellises
for the (64,42,8) RM code because the 4-section trellis has 32 parallel branches between
any two adjacent connected states. The reason is that, from an implementation and
computational viewpoint, more than 2 parallel branches between adjacent connected states
are disadvantageous.¹ The results of this analysis are presented in Table 4. Some surprising
results are observed. A 32-section trellis for the (32,11,12) extended BCH code can be
constructed which consists of 512 parallel 2-state subtrellises. An 8-section trellis for the
(64,42,8) RM code can be constructed which consists of 128 parallel 64-state subtrellises.
These results do not follow from the squaring construction for Reed-Muller codes or other
previously published approaches. They also provide the designer a wide range of choices
for trellises from which to choose. In the tables for each code and for each possible choice
of the number of sections L, the logarithm to base 2 of the maximum number of parallel
subtrellises that can be obtained without exceeding the number of states in the minimal
trellis is denoted $\rho_{\max,L}$. The maximum state space dimension of the L-section subtrellis for
the subcode $C'$ is denoted $s_{\max,L}(C')$. The best known order of bit positions with respect to
state complexity of BCH codes of length 64 presented in [12, 25] was used to produce the
tables.
4. Issues in the IC Implementation of an L-section
Trellis-Based Viterbi Decoder
In this section, five key factors affecting the decoding speed of a Viterbi decoder based
on the minimal and non-minimal trellis are examined. The non-minimal trellis structure
presented in this paper reduces the internal communication and allows independent-parallel
processing of the subtrellises while decreasing the complexity of a Viterbi decoder IC. We
substantiate this claim through analysis in the following subsections.
4.1. Effective Computational Complexity of L-section Trellis
We consider a Viterbi decoder IC based on an L-section trellis with M bits per section
for an $(LM, K, d_{\min})$ block code $C$. While many VLSI structures have been described for
a Viterbi decoder [26, 27, 32], the most widely implemented structure is based on
add-compare-select (ACS) circuits, wherein each abstract state in the trellis diagram manifests
itself as a physical ACS circuit on the IC and the same ACSs are used repeatedly for all
depths in the trellis. The ACSs can be labeled ACS-$i$ for $0 \le i < 2^{s_{\max,L}(C)}$.
Let $\tau_i$ be the time required to process section-$i$ of the trellis. At time $t = 0$, the
metrics of the ACS circuits corresponding to the originating state of each parallel subtrellis
are initialized to 0. After $\tau_1$ units of time, at $t = \tau_1$, the ACS-$i$ corresponding to state $s_i$
at the end of section-1, for $0 \le i < |S_M(C)|$, has the metric of state $s_i \in S_M$. The index
of the surviving branch into $s_i$ is also stored in ACS-$i$. Continuing in this way, at time
$t = \tau_1 + \cdots + \tau_l$, $1 \le l \le L$, ACS-$i$ corresponding to state $s_i$, $0 \le i < |S_{lM}(C)|$, will have
the metric of $s_i \in S_{lM}(C)$ and a sequence of $l$ survivor branch indices corresponding to the
most likely path from the originating state (of the subtrellis to which $s_i$ belongs) to
$s_i \in S_{lM}$.

¹ When there are exactly two parallel branches with complementary labels, the correlation metric for one
branch is the negative of the other and hence can be obtained by a mere sign inversion.
There are as many ACS's as the maximum number of states at any depth in the L-section
trellis for the linear block code. In the minimal trellis, whenever the decoder is processing
the trellis at a depth at which the state size is less than the maximum state size, a number
of ACS circuits are idle and the hardware utilization efficiency is poor. In the non-minimal
L-section trellis, the utilization of the ACS circuits that exist in the IC is improved. Since
all the subtrellis decoders operate independently in parallel, from the standpoint of speed,
the effective computational complexity of decoding a single block (a received vector)
is defined as the computational complexity of a single parallel subtrellis (viz. the minimal
trellis for the subcode C') plus the cost of the final comparison among the choices (survivors)
presented by each of the subtrellises. The time required for the final comparison is small
relative to the time required for decoding a subtrellis and this comparison can be pipelined.
Since subtrellises are processed in parallel, the speed of operation is limited only by the time
required to process a subtrellis.
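The decoding flow described above, independent subtrellis decoders followed by one final comparison among their survivors, can be sketched in Python as follows. This is a structural illustration only: each subtrellis decoder is replaced by an exhaustive search over one coset of $C'$ (a stand-in for a Viterbi pass, not a Viterbi implementation), and the (8, 4) matrix, the choice of coset-representative rows `R_PRIME`, the received vector and all function names are assumptions.

```python
from itertools import product

G = [
    [1, 1, 1, 1, 0, 0, 0, 0],  # r1
    [0, 1, 0, 1, 1, 0, 1, 0],  # r2
    [0, 0, 1, 1, 1, 1, 0, 0],  # r3
    [0, 0, 0, 0, 1, 1, 1, 1],  # r4
]
R_PRIME = [1, 2]  # assumed: rows r2, r3 generate the coset representatives

def combine(rows, bits, n):
    """XOR together the rows selected by bits."""
    word = [0] * n
    for bit, row in zip(bits, rows):
        if bit:
            word = [w ^ r for w, r in zip(word, row)]
    return word

def correlation(word, received):
    """Correlation metric with the mapping 0 -> +1, 1 -> -1."""
    return sum((1 - 2 * c) * y for c, y in zip(word, received))

def subtrellis_survivor(leader, sub_rows, received):
    """Best word of one coset; exhaustive stand-in for a subtrellis decoder."""
    n = len(leader)
    candidates = (
        [a ^ b for a, b in zip(leader, combine(sub_rows, bits, n))]
        for bits in product([0, 1], repeat=len(sub_rows))
    )
    scored = ((correlation(w, received), w) for w in candidates)
    return max(scored, key=lambda t: t[0])

def decode(received):
    n = len(G[0])
    sub_rows = [row for k, row in enumerate(G) if k not in R_PRIME]
    rep_rows = [G[k] for k in R_PRIME]
    # One survivor per subtrellis; these searches are mutually independent.
    survivors = [
        subtrellis_survivor(combine(rep_rows, bits, n), sub_rows, received)
        for bits in product([0, 1], repeat=len(rep_rows))
    ]
    # Final comparison among the survivors presented by the subtrellises.
    return max(survivors, key=lambda t: t[0]), survivors

received = [0.9, -1.1, 0.2, -0.8, 1.0, -0.3, -0.1, -1.2]
(best_metric, best_word), survivors = decode(received)
print(best_word, "chosen from", len(survivors), "survivors")
```

Because the cosets of $C'$ partition $C$, the final comparison returns the same codeword as a search over the whole code; only the last step needs information from more than one subtrellis.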
Note that both the minimal and non-minimal trellises require the same number of ACS
circuits. However, the non-minimal trellis has a larger number of parallel subtrellises as
compared to the minimal trellis (which often has none). Hence decoding using the non-
minimal trellis with proper structure is faster compared to that using the minimal trellis.
Therefore, a system bit rate specification which earlier could be met only by the use of some
P number of Viterbi decoders operating simultaneously in parallel can be met with much
fewer than P Viterbi decoders. In this manner, the effective computational complexity is a
factor affecting the reduction in hardware complexity of an overall decoder.
4.2. Complexity of the ACS circuit
The converging branch profile (CBP), defined as the number of branches merging into a
state at each particular depth, also affects decoding speed and implementation complexity.
This is called the radix in the IC literature. Let $\delta_{iM}(C)$, $1 \le i \le L$, be the CBP of the minimal
trellis for $C$ with trellis oriented generator matrix $G$. At depth $l$, $1 \le l \le L$, the ACS
circuits have to perform at least $\delta_{lM}$ stages of tree-type [33] two-way comparisons to find
the best incoming branch. Hence reduction of the converging branch profile will improve the
speed of decoding and reduce the complexity of each ACS circuit. We now show that none
of the components in the CBP of the non-minimal trellis is increased. As will be shown by
examples in Section 5, most of the components of the CBP are decreased considerably.
Consider a non-minimal trellis for $C$ obtained as the union of 2 parallel subtrellises each
isomorphic to the minimal trellis for $C'$, a subcode of $C$ generated by $G - \{r\}$, $r \in G$. Let
$\delta_{iM}(C')$, $1 \le i \le L$, be the CBP of the minimal trellis for $C'$. Recall that $s_l(C') = s_l(C)$
if $l \notin \mathrm{aspan}(r)$ and $s_l(C') = s_l(C) - 1$ if $l \in \mathrm{aspan}(r)$. By equation (2.2),
$\lambda_{(i-1)M}(C') \in \{\lambda_{(i-1)M}(C),\ \lambda_{(i-1)M}(C) - 1\}$. By equation (2.1),
$\delta_{iM}(C') > \delta_{iM}(C)$ only if $s_{(i-1)M}(C') = s_{(i-1)M}(C)$ and $s_{iM}(C') = s_{iM}(C) - 1$.
But in this case, $(i-1)M \notin \mathrm{aspan}(r)$ and $iM \in \mathrm{aspan}(r)$. So
$\lambda_{(i-1)M}(C') = \lambda_{(i-1)M}(C) - 1$. Therefore, $\delta_{iM}(C') = \delta_{iM}(C)$.
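The claim that no CBP component grows when one row is removed can be spot-checked numerically. The sketch below does this for a small assumed (8, 4) trellis oriented generator matrix with four sections; it is a check on one example under assumed names and 1-based bit positions, not a proof.

```python
# Assumed example: an (8, 4) trellis oriented generator matrix.
G = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 0, 1, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
]

def cbp(rows, num_sections):
    """Converging branch profile of the minimal L-section trellis."""
    n = len(rows[0])
    m = n // num_sections
    sp = []
    for row in rows:
        nz = [k + 1 for k, bit in enumerate(row) if bit]
        sp.append((nz[0], nz[-1]))
    # State dimension at depth i*M: rows whose span straddles that depth.
    s = [sum(a <= i * m < b for a, b in sp) for i in range(num_sections + 1)]
    profile = []
    for i in range(1, num_sections + 1):
        lo, hi = (i - 1) * m + 1, i * m
        # Rows starting in section i whose span is not contained in it.
        lam = sum(lo <= a <= hi < b for a, b in sp)
        profile.append(s[i - 1] + lam - s[i])   # equation (2.1)
    return profile

full = cbp(G, 4)
reduced = {}
for k in range(len(G)):
    reduced[k] = cbp([r for j, r in enumerate(G) if j != k], 4)
    # Component-wise: the subcode's CBP never exceeds that of the code.
    assert all(x <= y for x, y in zip(reduced[k], full))
    print("without row", k + 1, ":", reduced[k], "vs", full)
```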
4.3. Traceback Complexity
Consider the problem of traceback to determine the best path through the trellis. In
the minimal trellis, the ACS-$i$ corresponding to state $s_i \in S_{iM}$ has to store
$\delta_{iM}(C) + \rho_i(C)$ bits in order to identify which of the $2^{\delta_{iM}(C)}$ composite branches
merging into $s_i$ and which of the $2^{\rho_i(C)}$ parallel branches that form a composite branch
survives. Therefore, in the minimal trellis, each ACS-$i$ needs to store
$\sum_{i=1}^{L} (\delta_{iM}(C) + \rho_i(C)) = \dim(C)$ bits in order to identify the sequence of surviving
incoming branches. In the non-minimal trellis, the storage in number of bits required for each
ACS-$i$ is $\sum_{i=1}^{L} (\delta_{iM}(C') + \rho_i(C')) = \dim(C')$ where $C'$ is the subcode of $C$
corresponding to the subtrellis. Since $\dim(C') = \dim(C) - \rho_{\max,L}(C)$, the ACSs in the
non-minimal trellis design require less storage than in the minimal trellis. The
combined savings in storage in all the $2^{s_{\max,L}(C)}$ ACS circuits is significant.
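The storage count equal to $\dim(C)$ follows from equations (2.1) and (2.2) by telescoping. A short derivation is given here, assuming a single initial and a single final state (so the state dimension is zero at depths 0 and $LM = N$) and writing $\rho_i$ for the dimension of $C_{(i-1)M,iM}$:

```latex
\begin{align*}
\sum_{i=1}^{L} \delta_{iM}
  &= \sum_{i=1}^{L} \bigl( s_{(i-1)M} + \lambda_{(i-1)M} - s_{iM} \bigr)
   = s_0 - s_{LM} + \sum_{i=1}^{L} \lambda_{(i-1)M}
   = \sum_{i=1}^{L} \lambda_{(i-1)M}, \\
\sum_{i=1}^{L} \lambda_{(i-1)M}
  &= \sum_{i=1}^{L} \bigl( \dim(C_{(i-1)M,N}) - \dim(C_{iM,N}) - \rho_i \bigr)
   = \dim(C_{0,N}) - \dim(C_{N,N}) - \sum_{i=1}^{L} \rho_i
   = \dim(C) - \sum_{i=1}^{L} \rho_i, \\
\sum_{i=1}^{L} \bigl( \delta_{iM} + \rho_i \bigr) &= \dim(C).
\end{align*}
```

The same steps applied to the subcode $C'$ give the storage $\dim(C')$ for each ACS in the non-minimal trellis.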
4.4. ACS-Connectivity
The basic operations performed by an ACS circuit are: addition of branch metrics of the
incoming branches to the state metrics of the corresponding originating states, comparison
of the resulting sums to find the best, selection of the surviving sum as the new state metric
and the corresponding surviving branch label. The ACS-array architecture is dominated
by the area required by the interconnections to transfer the state metrics [27]. For a state
$s_i \in S_{lM}$, $0 \le i < 2^{s(C)}$, $0 \le l < L$, let $A_l(s_i)$ denote the set of states in $S_{(l+1)M}$ that are
adjacent to $s_i$. Let $A_l(s_i) = \emptyset$ if $i \ge |S_{lM}|$. Then in the ACS-array implementation of the
Viterbi decoder based on the minimal trellis, a path to transfer the state metric must exist
between ACS-i and all ACS circuits that correspond to states in
$$A_0(s_i) \cup A_1(s_i) \cup \cdots \cup A_{L-1}(s_i).$$
The above set defines the connectivity of ACS-i in the ACS array corresponding to state
$s_i \in S_{lM}$. The connectivity of the ACS's corresponding to states in the minimal trellis
results in a large amount of area in the VLSI chip being used for wiring [26, 27]. On the
contrary, in the implementation of a Viterbi decoder based on the non-minimal trellis, the
ACS circuits can be divided into blocks [31] such that the ACS's corresponding to states
in a single subtrellis form a block. A particular ACS-i needs to transfer its metric only to
a subset of ACS's within its own block. This reduced connectivity results in a reduction
of hardware complexity and wiring area. The maximum connectivity of ACS-i is upper
bounded by $2^{s_{\max,L}(C')}$, the maximum number of states in a subtrellis, in the non-minimal
trellis implementation.
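The connectivity definition above can be made concrete with a small sketch. The toy 3-section trellis below is hypothetical (it is not one of the codes studied in this report), and states are identified with the index of the ACS circuit that serves them; the function simply takes the union of the adjacency sets over all sections.

```python
def acs_connectivity(adjacency_per_section, i):
    """Size of A_0(s_i) U A_1(s_i) U ... U A_{L-1}(s_i) for ACS-i.

    adjacency_per_section[l] maps a state index at the start of section l+1
    to the indices of the adjacent states at its end. A missing key plays the
    role of the empty set used when ACS-i has no state at that depth.
    """
    reachable = set()
    for section in adjacency_per_section:
        reachable |= set(section.get(i, ()))
    return len(reachable)

# Hypothetical toy trellis: 1 -> 4 -> 4 -> 1 states.
toy_trellis = [
    {0: [0, 1, 2, 3]},                              # root fans out to 4 states
    {0: [0, 1], 1: [0, 1], 2: [2, 3], 3: [2, 3]},   # two disjoint 2-state blocks
    {0: [0], 1: [0], 2: [0], 3: [0]},               # all states merge into the final state
]

connectivities = [acs_connectivity(toy_trellis, i) for i in range(4)]
print(connectivities)  # [4, 2, 3, 3]
```

In this toy case ACS-0 has the largest connectivity because it also serves the root state, which mirrors the larger connectivity reported for ACS-0 in Table 2.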
4.5. Branch Complexity
The number of distinct branch metrics that have to be computed in section-l of the
trellis is a property of the code and is unaltered by the parallelization of the trellis. Most
IC decoders have a branch metric computational unit where all the branch metrics are
calculated and then transferred to the ACS circuits [26, 27]. Because of the interconnection
of branches between states in the trellis, routing the branch metrics to each of the ACS
circuits requires a large amount of chip area. The trellises we describe show improvement
over the minimal L-section trellis on this count because each subtrellis requires only a subset
of the set of branch metrics in section-l of the trellis.
Parallelization of the minimal trellis as described in Section 2 may lead to a larger number
of total computations being performed in decoding. The number of emanating branches in
section-(l + 1) is $|S_{lM}| 2^{\lambda_{lM}}$, which may be larger than the corresponding product for the
minimal trellis for some values of l, $0 \le l < L$. However, as explained above, the hardware
complexity of the decoder is not affected. We illustrate the reason with an example. The RM
(64,42,8) code has a minimal trellis with the $s_{lM} + \lambda_{lM}$ sequence {7, 13, 16, 16, 16, 16, 13, 7}.
The same sequence for the non-minimal trellis is {13, 16, 16, 16, 16, 16, 16, 13}, which is larger
at positions {0, 1, 6, 7}. Consider the case l = 1 (other cases are similar). In section
2 of the minimal trellis, each of the 128 ACSs corresponding to states at the end of section
1 has 64 branches emanating from it. In section 2 of the non-minimal trellis, each of the
8192 ACSs has 8 branches emanating from it. Hence the number of operations performed
per ACS is smaller in the non-minimal trellis, and larger values of $|S_{lM}| 2^{\lambda_{lM}}$ represent
a larger number of operations performed simultaneously in parallel by all the ACS's in the
non-minimal trellis.
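The l = 1 case of this example can be checked with a few lines of arithmetic. The sketch below only restates the numbers quoted above (128 states with 64 emanating branches each, versus 8192 states with 8 each); the helper name is illustrative.

```python
def section_branches(log2_states: int, log2_out_degree: int):
    """Return (number of states, branches per state, total emanating branches)."""
    states = 2 ** log2_states
    per_state = 2 ** log2_out_degree
    return states, per_state, states * per_state

# Section 2 of the RM (64,42,8) trellises, i.e. l = 1.
minimal = section_branches(7, 6)        # sequence entry 7 + 6 = 13
non_minimal = section_branches(13, 3)   # sequence entry 13 + 3 = 16

print(minimal)      # (128, 64, 8192)
print(non_minimal)  # (8192, 8, 65536)
```

The non-minimal trellis has eight times as many branches in this section, but each ACS handles only 8 of them instead of 64, which is the point made in the text.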
5. Examples
Consider the (32,21,6) extended and permuted BCH code. The minimal 4-section,
8-bits/section trellis has SCP {0,7,9,7,0}. A non-minimal 4-section trellis can be
obtained as the union of 32 parallel isomorphic subtrellises, each having SCP {0,4,4,4,0}.
Thus, Viterbi decoder implementations using the ACS-array architecture for both trellises
will require 512 ACS circuits. However, in the minimal trellis, each ACS will require the
capability of choosing the best among 64 incoming branches whereas the corresponding
number is only 16 in the non-minimal trellis. The problem of routing state metrics is also
much reduced since the connectivity of ACS-0 is 128 and that of ACS-i is at least 64 for
$1 \le i \le 511$, while the maximum connectivity of any ACS in the non-minimal trellis is only
16. The structural parameters of each of these trellises are summarized in Tables 2 and 3.
Assuming each real number to be quantized to 8 bits, the VLSI layouts of a radix-8
ACS and a radix-16 ACS were generated. A modified form of the bit-level pipelined ACS
architecture [33] was used for the ACSs. The area required for the radix-16 ACS was 2.7 times
that required for the radix-8 ACS. Assuming a factor of 2.5 increase in area per doubling
of the radix, we see that 128 ACSs in the minimal trellis have an area 6.25 times larger
than their counterparts in the non-minimal trellis implementation. The remaining ACSs
require the same area. We see that the device area is reduced by adopting the proposed
trellis architecture. Furthermore, the reduction in ACS-connectivity will yield a significant
reduction in wiring area. The savings in hardware complexity and increase in speed due
to the non-minimal trellis approach easily overcome the extra cost of the final comparison
among the 32 choices (1 from each of the subtrellises) to find the best codeword.
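The 6.25 figure follows from two radix doublings (16 to 64) at an assumed growth of 2.5 per doubling, the value consistent with the quoted ratio. The sketch below reproduces that arithmetic and, as a further assumption of ours, aggregates it over the 512-ACS array while ignoring the single radix-128 final state; the function name is illustrative.

```python
from math import log2

def relative_acs_area(radix: int, reference_radix: int,
                      growth_per_doubling: float = 2.5) -> float:
    """Area of a radix-`radix` ACS relative to a radix-`reference_radix` ACS,
    assuming a fixed multiplicative growth for each doubling of the radix."""
    doublings = log2(radix / reference_radix)
    return growth_per_doubling ** doublings

ratio = relative_acs_area(64, 16)      # two doublings
print(ratio)                           # 6.25

# Aggregate ACS area in units of one radix-16 ACS:
# 128 ACSs need radix 64 in the minimal trellis, the other 384 need radix 16.
minimal_area = 128 * ratio + 384 * 1.0
non_minimal_area = 512 * 1.0
print(minimal_area / non_minimal_area)  # 2.3125
```

This estimate covers ACS device area only; the wiring-area reduction from lower ACS-connectivity is not modelled here.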
6. Trellises for a (64, 40, 8) subcode of RM (64, 42, 8)
A (64, 40, 8) subcode of the Reed-Muller (64, 42, 8) code is proposed to NASA for usage
as inner code in a concatenated coding system with the NASA standard (255, 223, 33)
Reed-Solomon code as outer code [31]. This RM subcode achieves a 5.3 dB coding gain over
uncoded BPSK at the bit error rate of $10^{-6}$. The required speed of decoding is $960 \times 10^6$
BPSK symbols per second, which translates to an information bit rate of 600 Mbps. The
coding gain is 0.5 dB less than the coding gain of a similar scheme with the same outer
code but the NASA standard rate-1/2, 64-state convolutional code [34] as the inner code.
However, the (64, 40) RM subcode has a higher rate of 0.625 bits/symbol than that of the
convolutional code and thus requires less bandwidth. More significant is the fact that a
Viterbi decoder for the (64, 40, 8) inner code can be designed to operate at higher data rates
than that for the convolutional code using the parallelism of the trellis of the RM subcode.
The trellis for the NASA standard 64-state convolutional code does not consist of parallel
subtrellises.
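The rate and throughput figures quoted above are related by simple arithmetic, sketched below. The variable names are ours; the numbers are those given in the text.

```python
n, k = 64, 40                      # (64, 40) RM subcode
symbol_rate = 960e6                # BPSK symbols per second

code_rate = k / n                  # information bits per channel symbol
info_bit_rate = symbol_rate * code_rate

conv_rate = 1 / 2                  # NASA standard rate-1/2 convolutional code
print(code_rate)                   # 0.625
print(info_bit_rate / 1e6)         # 600.0 (Mbps)
print(code_rate / conv_rate)       # 1.25
```

At the same symbol rate the (64, 40) inner code therefore carries 25% more information bits than the rate-1/2 convolutional code.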
Let C denote the RM (64,42,8) code and $\tilde{C}$ a (64,40) subcode of C. If the L-section
trellis on which decoding is based is composed of a union of P parallel isomorphic subtrellises,
then the effective computational complexity, denoted $A_{\mathrm{eff}}(L)$, is merely that of a single
subtrellis plus the cost of obtaining the final decision by comparing outputs of each of the
P Viterbi decoders. The value of L which minimizes $A_{\mathrm{eff}}(L)$ with the constraint that the
L-section trellis T have a maximum state complexity not greater than $s_{\max,L}(\tilde{C})$ (different
from $s(\tilde{C})$) is determined. Note that $s_{\max,L}(\tilde{C})$ is a function of the choice of the subcode and
we will choose that subcode which has the least $s_{\max,L}(\tilde{C})$ for each L. The complexity of
each addition, subtraction and comparison is assumed to be equal to one addition equivalent
operation.
In the following, the trellis diagrams of various sectionalizations for this RM subcode
are given. Their effective computational complexities are computed.
6.1. L = 4, M = 16
Let $C_0 = (16, 15, 2)$, $C_1 = (16, 11, 4)$, $C_2 = (16, 5, 8)$ be the corresponding Reed-Muller
codes, $G_i$ a generator matrix of $C_i$ and $G_{i/j}$ a generator matrix for the set of coset
representatives $[C_i/C_j]$. Let $\otimes$ denote the Kronecker product. For L = 4, the RM (64, 42)
code has a minimal trellis corresponding to the 2-level squaring construction with a state
complexity profile (SCP) {0, 10, 10, 10, 0} ($s_{\max,4}(C) = 10$) and trellis oriented generator
matrix
$$G = \begin{pmatrix} 1 & 1 & 1 & 1 \end{pmatrix} \otimes G_{0/1}
 + \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix} \otimes G_{1/2}
 + \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \otimes G_2 \qquad (6.7)$$
In order to obtain a (64, 40) subcode $\tilde{C}$, one can delete any two of the 42 rows above, giving
a generator matrix for $\tilde{C}$. The maximum state space complexity $s_{\max,4}(\tilde{C})$ of the resulting
code depends on which two rows we delete. It is easy to see that in order to have the
least $s_{\max,4}(\tilde{C})$, which equals 8, we must delete any two of the 4 rows among $(1111) \otimes G_{0/1}$,
obtaining an SCP of {0, 8, 8, 8, 0} ($s_{\max,4}(\tilde{C}) = 8$). Using the theory developed earlier, it
can be seen that we can obtain at most 4 parallel subtrellises in any 4-section trellis for $\tilde{C}$
without exceeding the allowable $s_{\max,4}$ of 8. The effective computational complexity may be
computed to give $A_{\mathrm{eff}}(4)$ = 39,682 addition equivalent operations for the 4-section trellis.
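6.2. L = 8, M = 8
Let $C_0 = (8, 8, 1)$, $C_1 = (8, 7, 2)$, $C_2 = (8, 4, 4)$, $C_3 = (8, 1, 8)$ be Reed-Muller codes.
For L = 8, the RM (64,42,8) code has a minimal trellis (with 2 parallel subtrellises)
corresponding to the 3-level squaring construction with an SCP {0, 7, 10, 13, 10, 13, 10, 7, 0}
($s_{\max,8}(C) = 13$) with trellis oriented generator matrix
$$G = \begin{pmatrix} 1&1&1&1&1&1&1&1 \end{pmatrix} \otimes G_{0/1}
 + \begin{pmatrix}
 1&1&1&1&0&0&0&0 \\
 0&1&0&1&1&0&1&0 \\
 0&0&1&1&1&1&0&0 \\
 0&0&0&0&1&1&1&1
 \end{pmatrix} \otimes G_{1/2}
 + \begin{pmatrix}
 1&1&0&0&0&0&0&0 \\
 0&1&1&0&0&0&0&0 \\
 0&0&1&1&0&0&0&0 \\
 0&0&0&1&1&0&0&0 \\
 0&0&0&0&1&1&0&0 \\
 0&0&0&0&0&1&1&0 \\
 0&0&0&0&0&0&1&1
 \end{pmatrix} \otimes G_{2/3}
 + I_8 \otimes G_3 \qquad (6.8)$$
Here $r^0 = (1\,1\,1\,1\,1\,1\,1\,1)$ and $r^1 = (0\,1\,0\,1\,1\,0\,1\,0)$ is the second row of the matrix multiplying $G_{1/2}$.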
The (64, 40) subcode $\tilde{C}$ with the best SCP is obtained by deleting the row $r^0 \otimes G_{0/1}$ and
any one among the three rows $r^1 \otimes G_{1/2}$. This code $\tilde{C}$ has SCP {0, 6, 8, 11, 8, 11, 8, 6, 0}
($s_{\max,8}(\tilde{C}) = 11$). Repeating a similar analysis, it is seen that one can obtain at most 32
parallel subtrellises in any 8-section trellis for $\tilde{C}$ without exceeding the maximum allowable
state space complexity of $s_{\max,8}(\tilde{C}) = 11$. Each subtrellis has the SCP {0, 6, 6, 6, 3, 6, 6, 6, 0}
and from knowledge of its trellis structure the effective complexity is $A_{\mathrm{eff}}(8)$ = 12,822
addition equivalent operations.
6.3. L = 16, M = 4
Let $C_0 = C_1 = (4, 4, 1)$, $C_2 = (4, 3, 2)$, $C_3 = (4, 1, 4)$, $C_4 = (4, 0, \infty)$ be Reed-Muller
codes. For L = 16, the RM (64, 42, 8) code has a trellis oriented generator matrix given by
$$G = G_{RM(16,5,8)} \otimes G_{1/2} + G_{RM(16,11,4)} \otimes G_{2/3} + G_{RM(16,15,2)} \otimes G_{3/4}, \qquad (6.9)$$
where $G_{RM(n,k,d)}$ denotes a trellis oriented generator matrix for the (n, k, d) Reed-Muller
code. For L = 16, the RM (64,42,8) code has a minimal trellis (with
no parallel subtrellises) corresponding to the 4-level squaring construction with an SCP
{0, 4, 7, 10, 10, 13, 13, 13, 10, 13, 13, 13, 10, 10, 7, 4, 0} ($s_{\max,16}(C) = 13$). The (64, 40) subcode
with the best SCP is generated by $\tilde{G} = G - \{r^1 \otimes G_{1/2}, r^2 \otimes G_{1/2}\}$, where $r^1$ and $r^2$ are the
two rows with spans [2, 15] and [3, 14] in the trellis oriented generator matrix for RM(16, 5, 8),
the factor paired with $G_{1/2}$ in (6.9).
The SCP of the minimal trellis for $\tilde{C}$ is {0, 4, 6, 8, 8, 11, 11, 11, 8, 11, 11, 11, 8, 8, 6, 4, 0}
($s_{\max,16}(\tilde{C}) = 11$). By analysis, one can obtain at most 8 parallel subtrellises in any 16-section
trellis for $\tilde{C}$ without exceeding the allowable $s_{\max,16}(\tilde{C})$ of 11. Each subtrellis has
SCP {0, 4, 6, 8, 6, 8, 8, 8, 5, 8, 8, 8, 6, 8, 6, 4, 0}. The resulting effective computational complexity
is $A_{\mathrm{eff}}(16)$ = 23,174 addition equivalent operations.
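As a consistency check on (6.9), the number of rows contributed by each Kronecker term can be counted from the dimensions of the length-4 component codes and the length-16 RM codes. The sketch below is plain bookkeeping; the variable names are ours.

```python
# Dimensions of the length-4 component codes C_1, ..., C_4 of Section 6.3.
component_dim = {1: 4, 2: 3, 3: 1, 4: 0}

def coset_rows(i: int, j: int) -> int:
    """Number of rows of G_{i/j}, a generator of the coset representatives [C_i/C_j]."""
    return component_dim[i] - component_dim[j]

# (dimension of the length-16 RM factor, rows of the matching G_{i/j}) per term of (6.9)
terms = [
    (5, coset_rows(1, 2)),    # G_RM(16,5,8)  (x) G_{1/2}
    (11, coset_rows(2, 3)),   # G_RM(16,11,4) (x) G_{2/3}
    (15, coset_rows(3, 4)),   # G_RM(16,15,2) (x) G_{3/4}
]

total_rows = sum(rm_dim * rows for rm_dim, rows in terms)
code_length = 16 * 4
print(total_rows, code_length)  # 42 64
```

The count of 42 rows of length 64 matches the RM (64, 42, 8) code, and deleting two single rows leaves the 40 rows of the subcode.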
6.4. L = 32, M = 2 and L = 64, M = 1
When L = 32, $s_{\max,32}(\tilde{C}) = 12$. The number of parallel isomorphic subtrellises
possible without exceeding the allowable $s_{\max,32}(\tilde{C}) = 12$ in any 32-section trellis
for the (64,40) subcode $\tilde{C}$ is at most 4. So $A_{\mathrm{eff}}(32) \ge$ 37,476. When L = 64,
$s_{\max,64}(\tilde{C}) = 12$. Furthermore, no parallel subtrellises are possible without exceeding the
allowable $s_{\max,64}(\tilde{C}) = 12$. Hence $A_{\mathrm{eff}}(64)$ = 198,000.
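The effective complexities reported in Sections 6.1 to 6.4 can be compared directly. The short sketch below collects the quoted values (the L = 32 entry is only a lower bound) and picks the sectionalization with the smallest one; the dictionary name is ours.

```python
# Effective complexity in addition-equivalent operations, keyed by number of sections L.
a_eff = {
    4: 39_682,
    8: 12_822,
    16: 23_174,
    32: 37_476,    # lower bound on A_eff(32)
    64: 198_000,
}

best_L = min(a_eff, key=a_eff.get)
print(best_L, a_eff[best_L])  # 8 12822
```

Because the L = 32 figure is a lower bound that already exceeds the L = 8 value, the choice of L = 8 does not depend on the exact value of $A_{\mathrm{eff}}(32)$.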
From the above analysis, we see that the 8-section trellis for the (64,40) RM subcode
results in the least effective complexity. A VLSI implementation of a high-speed decoder
for the (64,40) RM subcode is under way. The decoder is based on the 8-section trellis
which is a union of 32 parallel isomorphic subtrellises with a maximum of 64 states each. A
schematic of the subtrellis is shown in Figure 3. Note that the last 4 sections of the subtrellis
form a mirror image of the first 4 sections. This structure allows us to perform bidirectional
decoding from both ends of the subtrellis simultaneously [10, 31, 35]. Sections 1 through 4
and sections 8 to 5 (in reverse order) are processed at the same time and path information
corresponding to the most likely paths into the center 8 states, which are the destination
states, is stored. The two path metrics (one from each side) at a center state are then
added. This gives path metrics of 8 final survivors and the path with the largest path metric
is the most likely path through the subtrellis. Since the resolution is done at the center of
the subtrellis, the bottleneck of decoding caused by the large radix at the center states is
avoided. This bidirectional decoding can be achieved by either using two identical subtrellis
decoders working from both directions or using only one decoder to process the subtrellis in
a concurrent bidirectional execution sequence as shown in Figure 4. The second approach
simply exploits the use of pipelining in the ACS implementation and the mirror symmetry
of the subtrellis about the center axis. The bidirectional decoding results in advantages in
speed and implementation. A block diagram for the overall decoder is shown in Figure 5. We
further note that sections 2, 3, 6 and 7 of each subtrellis decompose into 8 parallel, 8-state,
fully connected isomorphic sub-subtrellises as depicted in Figure 3. This fact can be used to
further reduce implementation complexity and increase the decoding speed.
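The final resolution step of the bidirectional scheme can be sketched as follows: the forward and backward survivor metrics at each of the 8 center states are added, and the state with the largest sum identifies the most likely path through the subtrellis. The metric values below are made up for illustration, and the function name is ours; how the two metric lists are produced (two decoders, or one pipelined decoder as in Figure 4) is outside this sketch.

```python
def resolve_center(forward_metrics, backward_metrics):
    """Add the two survivor metrics at each center state and pick the largest sum.

    Returns (index of the best center state, combined path metric).
    """
    if len(forward_metrics) != len(backward_metrics):
        raise ValueError("one forward and one backward metric per center state")
    totals = [f + b for f, b in zip(forward_metrics, backward_metrics)]
    best_state = max(range(len(totals)), key=totals.__getitem__)
    return best_state, totals[best_state]

# Made-up survivor metrics for the 8 center states (sections 1-4 and 8-5).
forward = [3.0, 5.5, 1.0, 4.0, 2.5, 6.0, 0.5, 3.5]
backward = [4.0, 1.0, 2.0, 5.5, 3.0, 2.5, 1.5, 4.5]

result = resolve_center(forward, backward)
print(result)  # (3, 9.5)
```

Only 8 sums and a best-of-8 comparison are needed per subtrellis, instead of a radix-64 selection at each center state.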
7. Conclusion
We have presented an approach for decomposing the minimal trellis of a binary linear
block code into a non-minimal trellis composed of parallel components. This approach
allows parallel processing of the subtrellises and does not increase the maximum number of
states. Hence it has a significant speed advantage. In addition, it also reduces the IC area
requirements. Given a linear block code, we have estimated the limits to the benefits of
this approach and its dependence on the uniform sectionalization of the trellis. The branch
complexity of the non-minimal trellis relative to the minimal trellis can be larger in some
sections. However, this does not increase the hardware complexity. Since the application of
this method depends only on the generator matrix of the code, it can be applied to arbitrary
linear block codes.
Acknowledgement
We are grateful to Dr. Toru Fujiwara of Osaka University for providing us with the
generator matrices of some extended BCH codes with the best known order of bit positions
with respect to trellis state complexity. We are extremely grateful to Cecilia W. Chu and Eric
Nakamura of the University of Hawaii for helpful discussions relating to the VLSI aspects
of this paper. We thank the reviewers for their many helpful suggestions and constructive
criticism.
References
[1] L.R. Bahl, J. Cocke, F. Jelinek and J. Raviv, "Optimal Decoding of Linear Block Codes
for Minimizing Symbol Error Rate," IEEE Transactions on Information Theory, Vol.
20, pp. 284-287, 1974.
[2] J.K. Wolf, "Efficient maximum likelihood decoding on linear block codes using a trellis,"
IEEE Transactions on Information Theory, Vol. 24, pp. 76-80, Jan. 1978.
[3] J.L. Massey, "Foundation and methods of channel encoding," Proc. Intl. Conf. Information
Theory and Systems, NTG-Fachberichte, Berlin 1978.
[4] G.D. Forney, Jr., "Coset codes - Part II: Binary lattices and related codes," IEEE
Transactions on Information Theory, Vol. 34, pp. 1152-1187, 1988.
[5] D.J. Muder, "Minimal trellises for block codes," IEEE Transactions on Information
Theory, Vol. 34, pp. 1049-1053, Sept. 1988.
[6] Y. Berger and Y. Be'ery, "Bounds on the trellis size of linear block codes," IEEE
Transactions on Information Theory, Vol. 39, 1993.
[7] T. Kasami, T. Takata, T. Fujiwara and S. Lin, "On the optimum bit orders with respect
to the state complexity of trellis Diagrams of binary linear codes," IEEE Transactions
on Information Theory, Vol. 39, Jan. 1993.
[8] T. Kasami, T. Takata, T. Fujiwara and S. Lin, "On complexity of trellis structure of
linear block codes," IEEE Transactions on Information Theory, Vol. 39, No. 3, pp.
1057-1064, May 1993.
[9] T. Kasami, T. Takata, T. Fujiwara and S. Lin, "On structural complexity of the L-section
minimal trellis diagrams for binary linear block codes," IEICE Transactions
on Fundamentals of Electronics, Communications and Computer Sciences, Vol. E76-A,
No. 9, pp. 1411-1421, Sept. 1993.
[10] T. Kasami, T. Takata, T. Fujiwara and S. Lin, "On branch labels of parallel components
of the L-section minimal trellis diagrams for binary linear block codes," IEICE
Transactions on Fundamentals of Electronics, Communications and Computer Sciences,
Vol. E77-A, No. 6, pp. 1058-1068, June 1994.
[11] G.D. Forney, Jr. and M.D. Trott, "The Dynamics of Group Codes: State Spaces, Trellis
Diagrams and Canonical Encoders," IEEE Transactions on Information Theory, Vol.
39, No. 5, pp. 1491-1513, Sept. 1993.
[12] A. Vardy and Y. Be'ery, "Maximum Likelihood Soft-Decision Decoding of BCH Codes,"
IEEE Transactions on Information Theory, Vol. 40, March 1994.
[13] G.D. Forney, Jr., "Dimension/Length Profiles and Trellis Complexity of Linear Block
Codes," IEEE Transactions on Information Theory, Vol. 40, No. 6, pp. 1741-1751, Nov.
1994.
[14] G.D. Forney, Jr., "Dimension/Length Profiles and Trellis Complexity of Lattices,"
IEEE Transactions on Information Theory, Vol. 40, No. 7, pp. 1753-1772, Nov. 1994.
[15] G.D. Forney, Jr., "Trellises old and new," Communications and Cryptography, Edited
by R.E. Blahut, D.J. Costello, U. Maurer and T. Mittelholzer, pp. 115-128, Kluwer
Academic Publishers, 1994.
[16] Hari T. Moorthy and S. Lin, "On the Labeling of Minimal Trellises for Linear Block
Codes," Proceedings of the International Symposium on Information Theory and Its
Applications 1994, Vol. 1, pp. 33-38, Institution of Engineers, Australia.
[17] A. Lafourcade and A. Vardy, "Asymptotically Good Codes have Infinite Trellis Complexity,"
IEEE Transactions on Information Theory, Vol. 41, No. 2, pp. 555-559, March
1995.
[18] O. Ytrehus, "On the trellis complexity of linear block codes," IEEE Transactions on
Information Theory, Vol. 41, No. 2, pp. 559-560, March 1995.
[19] Y. Berger and Y. Be'ery, "Trellis-Oriented Decomposition and Trellis Complexity of
Composite-Length Cyclic Codes," IEEE Transactions on Information Theory, Vol. 41,
No. 5, pp. 1185-1191, July 1995.
[20] F.R. Kschischang and V. Sorokine, "On the trellis structure of block codes," IEEE
Transactions on Information Theory, Vol. 41, No. 6, pp. 1924-1937, Nov. 1995.
[21] A. Lafourcade and A. Vardy, "Lower bounds on trellis complexity of block codes,"
IEEE Transactions on Information Theory, Vol. 41, No. 6, pp. 1924-1937, Nov. 1995.
[22] A. Lafourcade and A. Vardy, "Optimal Sectionalization of a trellis," submitted to IEEE
Transactions on Information Theory, 1995, to appear.
[23] R.J. McEliece, "On the BCJR Trellis for Linear Block Codes," submitted to IEEE
Transactions on Information Theory, 1995.
[24] T. Fujiwara, H. Yamamoto, T. Kasami and S. Lin, "A Recursive Maximum Likelihood
Decoding Procedure for a Linear Block Code Using a Sectionalized Trellis Diagram and
Its Optimization," (invited paper) Proceedings of The Thirty-Third Annual Allerton
Conference on Communication, Control and Computing, Allerton House, Monticello,
Illinois, Oct. 4-6, 1995, also submitted to IEEE Transactions on Information Theory,
special issue on Codes and Complexity, 1995.
[25] T. Fujiwara, T. Kasami, R.M. Zaragoza and S. Lin, "The State Complexity of Trellis
Diagrams for a Class of Generalized Concatenated Codes," submitted to IEEE
Transactions on Information Theory, 1994 (in revision).
[26] P.J. Black and T.H. Meng, "A 140-Mb/s, 32-State, Radix-4 Viterbi Decoder," IEEE
Journal of Solid-State Circuits, Vol. 27, Dec. 1992.
[27] P.G. Gulak and T. Kailath, "Locally Connected VLSI Architectures for the Viterbi
Algorithm," IEEE Journal on Selected Areas in Communications, Vol. 6, pp. 526-537,
April 1988.
[28] O.M. Collins, "The Subtleties and Intricacies of Building a Constraint Length 15
Convolutional Decoder," IEEE Transactions on Communications, Vol. 40, No. 12, pp.
1810-1819, Dec. 1992.
[29] B.S. Vishwanath, "Soft-Decision Viterbi Decoding of the (32,16) Reed-Muller Code
and Its VLSI Implementation," M.S. Thesis, Department of Electrical Engineering,
University of Hawaii at Manoa, Aug. 1993.
[30] G. Fettweis and H. Meyr, "Parallel Viterbi Algorithm Implementation: Breaking the
ACS-Bottleneck," IEEE Transactions on Communications, Vol. 37, pp. 785-789, Aug.
1989.
[31] S. Lin, G.T. Uehara, E. Nakamura and W.P. Chu, "Circuit Design Approaches for
Implementation of a Subtrellis IC for Reed-Muller subcode," NASA Technical Report
No. 96-001, February 1996.
[32] H. Thapar and J. Cioffi, "A block processing method for designing high-speed Viterbi
detectors," Proceedings of the ICC, Vol. 2, pp. 1096-1100, June 1989.
[33] A.K. Yeung and J.M. Rabaey, "A 210 Mb/s Radix-4 Bit-level Pipelined Viterbi
Decoder," ISSCC 1995 Digest of Technical Papers, San Francisco, CA, Feb. 1995.
[34] S. Lin and D.J. Costello, "Error Control Coding: Fundamentals and Applications,"
Prentice-Hall, 1983.
[35] M. Fossorier and S. Lin, "Coset Codes Viewed as Terminated Convolutional Codes,"
submitted to IEEE Transactions on Communications, February 1995 (revised June
1995).
Table 1: Set of row spans of trellis oriented generator matrix of (32,21,6) extended and
permuted BCH code
row-#   span       row-#   span
 1      [1,8]      12      [12,26]
 2      [2,15]     13      [13,20]
 3      [3,13]     14      [14,22]
 4      [4,14]     15      [15,27]
 5      [5,12]     16      [17,24]
 6      [6,18]     17      [18,31]
 7      [7,21]     18      [19,29]
 8      [8,25]     19      [20,30]
 9      [9,16]     20      [21,28]
10      [10,23]    21      [25,32]
11      [11,19]

Table 2: Parameters of 4-section trellis of (32, 21, 6) extended and permuted BCH code
i      0   1   2   3   4
SCP    0   7   9   7   0
CBP    -   0   4   6   7
EBP    7   6   4   0   -

i = ACS-#    Connectivity of ACS-i
0            128
1 - 511      64

Table 3: Parameters of 4-section trellis of (32, 16) subcode of the (32, 21, 6) extended and
permuted BCH code
i      0   1   2   3   4
SCP    0   4   4   4   0
CBP    -   0   4   4   4
EBP    4   4   4   0   -

i = ACS-#    Connectivity of ACS-i
0 - 511      16
Table 4: Trellises for all RM and BCH codes of lengths 32 and 64
No. of sections L: 64 32 16 8 4 2
Codes:
 1 RM(32,6,16)
 2 BCH(32,11,12)
 3 RM(32,16,8)
 4 BCH(32,21,6)
 5 RM(32,26,4)
 6 RM(64,7,32)
 7 RM(64,10,28)
 8 BCH(64,16,24)
 9 BCH(64,18,22)
10 RM(64,22,16)
11 BCH(64,24,16)
12 BCH(64,30,14)
13 BCH(64,36,12)
14 BCH(64,39,10)
15 RM(64,42,8)
16 BCH(64,45,8)
17 BCH(64,51,6)
18 RM(64,57,4)
Quantities tabulated for each code: $\rho_{\max,L}(T)$ and $s_{\max,L}(C')$
4 4 4 3 4
1 1 1 1 0
9 9 9 7
1 1 1 2
5 4 5 3
4 4 3 3
2 4 4 5
8 6 6 4
1 1 2
4 4 3
5 5 5 5 4 5
1 1 1 1 1 0
10 10 10 10 10 10
0 0 0 0 0 0
14 14 14 12 13 14
1 1 1 2 1 0
16 16 16 14 16 16
1 1 1 2 2 2
9 9 8 9 6
5 5 5 4 4
11 11 10 11 8
5 5 5 4 4
15 13 14 11 14
6 7 6 7 4
10 9 10 9 8
10 10 9 8 8
7 8 9 10 11
13 12 11 9 8
5 6 5 7
9 8 8 6
2 3 4 4
12 11 10 9
1 1 1 2
11 11 11 10
1 1 1 2
5 5 5 4
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 1 1 0 1 0 1 0 0 1 0 1 1 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0
0 0 1 1 1 0 0 1 1 1 0 0 1 0 0 1 0 1 0 1 0 0 0 0 1 0 1 0 0 0 0 0
0 0 0 1 1 1 1 0 0 0 1 0 1 1 0 1 0 1 0 0 0 1 0 0 1 0 0 0 1 0 0 0
0 0 0 0 1 1 1 1 1 0 1 0 1 0 1 0 0 1 0 1 1 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 1 0 0 0 0 0 0 1 1 0 1 0 1 0 1 1 0 0 1 0 1 0 1 1 0 0
0 0 0 0 0 0 1 1 0 1 0 1 0 1 1 0 0 1 1 0 1 0 1 0 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1 0 1 1 0 1 0 0 1 0 1 0 1 0 1 1 1 1 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 1 1 0 1 0 0 1 0 1 0 1 1 0 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Figure 1: Parity check matrix in trellis oriented form of the ex-BCH (32,21,6) code with an
optimum order of bits with respect to trellis state complexity.
Figure 2: A 4-section, 64-state trellis for the (32, 16, 8) RM code composed of 8 parallel and
isomorphic 8-state subtrellises
[Subtrellis diagram running from the source state to the destination state]
Section:         1   2   3   4   5   6   7   8
No. of states:     64  64  64   8  64  64  64
Radix:              8   8   8  64   8   8   8
Figure 3: An 8-section, 64-state subtrellis for the (64, 35, 8) subcode of the (64,40,8) RM
subcode
Sequence for decoding (time runs from left to right):
| Sec. 1 | Sec. 8 | Sec. 2 | Sec. 7 | Sec. 3 | Sec. 6 | Sec. 4 | Sec. 5 | Combine and resolve |
Figure 4: Sequence for decoding using concurrent bi-directional execution sequence
[Block diagram: the received vector feeds 32 sign changers, one for each of coset-0 through
coset-31; each sign changer drives a Viterbi decoder for the subtrellis, which outputs a
metric and a codeword; a best-of-32 comparator and resolver selects the final output codeword.]
Figure 5: Block diagram of overall decoder with 32 Viterbi decoders