A parametric study of the complexity of sequential decoders, volume 1 by Huth, G. K.
Axiomatix
Marina del Rey • California
https://ntrs.nasa.gov/search.jsp?R=19720010543 2020-03-11T18:42:13+00:00Z
A PARAMETRIC STUDY OF THE
COMPLEXITY OF SEQUENTIAL DECODERS
FINAL REPORT
VOLUME I
Contract No.: NAS 9-12091
Prepared by
Gaylord K. Huth
Axiomatix
13900 Panay Way, Suite 110M
Marina del Rey, California 90291
Prepared for
National Aeronautics and Space Administration
Manned Spacecraft Center
Houston, Texas 77058
Axiomatix Report No. R7201-1
27 January 1972
TABLE OF CONTENTS
Page
VOLUME I
LIST OF FIGURES iv
LIST OF TABLES vii
SECTION 1.0 INTRODUCTION 1
SECTION 2.0 CONVOLUTIONAL CODE STRUCTURE 3
SECTION 3.0 VITERBI MAXIMUM LIKELIHOOD DECODER 16
3.1 Description of Decoding Algorithm 21
3.2 Performance of a Transparent Code Versus a Nontransparent Code 36
3.3 Complexity of the Viterbi Decoder 39
3.3.1 Branch Metric Computation 40
3.3.2 Arithmetic Unit Complexity 44
3.3.3 Path Memory Storage 50
3.4 Performance Versus Complexity 52
VOLUME II
SECTION 4.0 SEQUENTIAL DECODING 59
4.1 Sequential Decoding as a Tree Searching Algorithm 59
4.2 Performance Parameters 69
4.2.1 Probability of Undetected Error 73
4.2.2 Computations Distribution 74
4.2.3 Backsearch Distribution 82
4.2.4 Degradation Due to Metric Table Size 89
4.3 Decoding Resynchronization 91
4.4 Complexity of the Sequential Decoder 104
4.4.1 Buffer Complexity 110
4.4.2 Decoding Resynchronization Complexity 111
4.5 Performance Versus Complexity 120
SECTION 5.0 VITERBI AND SEQUENTIAL DECODER PERFORMANCE VERSUS COMPLEXITY 123
SECTION 6.0 AREAS FOR FURTHER STUDY 128
REFERENCES 131
APPENDIX I TRANSFER FUNCTIONS OF CONVOLUTIONAL CODES I-1
APPENDIX II TECHNIQUES OF BRANCH AND REFERENCE PHASE SYNCHRONIZATION II-1
LIST OF FIGURES
Figure Page
VOLUME I
2.1 Example Convolutional Encoder and State Diagram 4
2.2 Tree Diagram of Convolutional Encoder of Figure 2.1 7
2.3 Trellis Diagram for a K=3 Binary Convolutional Code Whose Path Length is T Branches 9
2.4 Rate 1/n Convolutional Code Encoder and State Diagram 13
2.5 Example Encoder and State Diagram 15
3.1 An Example of State Transitions of a Convolutional Code 22
3.2 Flow Chart of Viterbi Decoder Algorithm 26
3.3 State Transition Table for Example 28
3.4 Present and Future Conditions at Startup 28
3.5 Trellis Diagram After One Received Branch 30
3.6 Present and Future Conditions after M=16 Input Bits 31
3.7 Trellis Diagram After Two Received Branches 32
3.8 Present and Future Conditions after Three Input Bits 34
3.9 Trellis Diagram After 16 Received Branches 35
3.10 Modified State Diagram for the Example Convolutional Code 38
3.11 Comparison of Complexity Versus Performance at Output Probability of Error Per Bit of $10^{-4}$ 58
VOLUME II
4.1 Logical Flow Chart of the Sequential Decoding Algorithm 63
4.2 Functional Block Diagram of a Practical Sequential Decoder 70
4.3 Computations Distribution for Rate One-Half Hard Decision Sequential Decoder with K=32 78
4.4 Computations Distribution for Rate One-Half Three-Bit Quantization Sequential Decoder with K=44 79
4.5 Computations Distribution for Rate One-Third Hard Decision Sequential Decoder with K=23 80
4.6 Computations Distribution for Rate One-Third Three-Bit Quantization Sequential Decoder with K=23 81
4.7 Backsearch Distribution for Rate One-Half Hard Decision Sequential Decoder with K=32 85
4.8 Backsearch Distribution for Rate One-Half Three-Bit Quantization Sequential Decoder with K=44 86
4.9 Backsearch Distribution for Rate 1/3 Hard Decision Sequential Decoder with K=23 87
4.10 Backsearch Distribution for Rate 1/3 Three-Bit Quantization Sequential Decoder with K=23 88
4.11 Density of Errors in Initial Hypothesis for a Given Resynchronization 97
4.12 Detailed Sequential Decoder Algorithm Logic Flow Diagram 105
4.13 Sequential Decoder Performance Using Block Resynchronization with Code Rate R and Q Bits of Quantization 113
4.14 Comparison of the Performance of Sequential Decoding Using Block Resynchronization and Statistical Resynchronization 115
5.1 Comparison of Complexity Versus Performance of Viterbi Decoding and Sequential Decoding with Code Rate 1/2 and Output Probability of Error per Bit of $10^{-4}$ 124
5.2 Comparison of Complexity Versus Performance of Viterbi Decoding and Sequential Decoding with Code Rate 1/3 and Output Probability of Error per Bit of $10^{-4}$ 125
II-1 Quadriphase Demodulation II-5
LIST OF TABLES
Table Page
VOLUME I
2.1 Modulo-2 Addition 5
3.1 State Metrics for States V0 and V1 45
3.2 Complexity of Viterbi Maximum Likelihood Decoder for Code Rate 1/2 54
3.3 Complexity of Viterbi Maximum Likelihood Decoder for Code Rate 1/3 55
VOLUME II
4.1 Probability of Quantization Assignment 60
4.2 Example of the Sequential Decoding Algorithm 65
4.3 Sequential Decoder Undetected Error Probability 74
4.4 Measured Computations Distribution Parameters 76
4.5 Measured Backsearch Distribution Parameters 83
4.6 Performance Versus Branch Metric Quantization 90
4.7 Measured Probability of Resynchronization 96
4.8 Buffer Complexity Versus Branch Metric Quantization 112
4.9 Performance of Sequential Decoding Using Block Resynchronization at $P_e = 10^{-4}$ 114
4.10 Performance of Sequential Decoding Using Statistical Resynchronization at $P_e = 10^{-4}$ 117
4.11 Overall Complexity of Sequential Decoding for Various Data Rates at Output Probability of Error per Bit of $10^{-4}$ 121
SECTION 1.0
INTRODUCTION
This study has examined the complexity and performance of
practical sequential decoders. Comparing sequential decoders with
Viterbi decoders, the study found that sequential decoders outperform
Viterbi decoders of the same complexity at low data rates (19.2 to 76.8
kbps). However, at high data rates (9.1 Mbps) with three-bit quantization,
Viterbi decoders are less complex than sequential decoders of the same
performance.
Section 5.0 presents plots of complexity versus the ratio of signal
energy per bit to single-sided noise spectral density ($E_b/N_0$) required
to obtain an output probability of error per bit of $10^{-4}$ for ideal coherent
PSK demodulation. For code rate 1/2, hard decisions on the demodulated
symbols, and the same complexity, the sequential decoder requires
1.3 dB less $E_b/N_0$ than the Viterbi decoder at a 19.2 kbps data rate. For
rate 1/2 and hard decisions, even at the high 9.1 Mbps data rate, the
sequential decoder requires 0.5 dB less $E_b/N_0$ than a Viterbi decoder
of the same complexity. For rate 1/2 and three-bit quantization of
the demodulated symbols, the sequential decoder requires 0.8 dB to
0.65 dB less $E_b/N_0$ than the Viterbi decoder of the same complexity
for data rates from 19.2 kbps to 76.8 kbps. However, at the 9.1 Mbps
data rate, the Viterbi decoder requires only half the complexity to
achieve the same performance as the sequential decoder.
Alternatively, for three-bit quantization, a sequential decoder with
code rate 1/2 requires 0.45 dB to 0.3 dB less $E_b/N_0$ than a Viterbi
decoder of the same complexity with code rate 1/3 for data rates from
19.2 kbps to 76.8 kbps. To obtain the same performance with a rate 1/3
Viterbi decoder, five to three times the complexity is required over the
same range of data rates. However, at this complexity, the sequential
decoder with code rate 1/3 requires 0.5 dB less $E_b/N_0$ than the Viterbi
decoder with code rate 1/3.
To obtain these results, Section 2.0 presents fundamental
properties of convolutional codes. Section 3.0 presents properties of the
Viterbi decoder, beginning with a description of the decoding algorithm,
then finding the complexity of the decoder and comparing it to the
performance results obtained by computer simulation. Sequential decoding
is discussed in Section 4.0, first by describing the algorithm and then by
presenting the important performance and design parameters. The
complexity of the decoder is determined and compared to computer
simulated performance in Section 5.0. Section 6.0 concludes the report
with areas for further study, which deal with the interfaces between the
demodulation and the decoding, and between the decoding and the
reconstruction of data from data compression techniques.
SECTION 2.0
CONVOLUTIONAL CODE STRUCTURE
In the simplest context, a convolutional code is the output of
a finite-state machine consisting of a K-stage shift register (memory
length K) and n linear algebraic functions of the register contents. The
input data to the convolutional encoder is normally binary, but this
is not necessary. The data is shifted into the register v bits at a
time, giving a code rate of v/n. The codes of most interest are those with
v = 1, and most of the discussion will involve these codes. The
optimum decoder for convolutional codes is the Viterbi maximum
likelihood decoder. The Viterbi decoder is simpler than a corresponding
block decoder giving identical performance in terms of probability of
error per bit. However, the complexity of the Viterbi decoder increases
exponentially with the memory length of the convolutional code, while
the probability of error decreases exponentially with the memory length.
Sequential decoding of convolutional codes is sub-optimum. However,
its complexity is almost independent of the memory length of the
convolutional code. Hence, under certain circumstances, the complexity
required to achieve a design probability of error per bit is much less
with a sequential decoder than with a Viterbi maximum likelihood
decoder.
The properties of convolutional codes can be presented best by
an example. Figure 2.1 illustrates an encoder for a convolutional code
of code rate R = 1/2 and memory length K = 3.

[Figure 2.1 Example Convolutional Encoder and State Diagram (state labels legible in the extracted figure: a = 000, d = 011, e = 100, g = 110)]

The summing device
is an "exclusive or" (modulo-2 addition), as defined in Table 2.1. Thus,
zero plus zero equals zero and one plus one equals zero; otherwise,
the result of the addition is one.

Table 2.1 Modulo-2 Addition

 + | 0 | 1
---+---+---
 0 | 0 | 1
 1 | 1 | 0

After each input bit, the output lines
marked 1 and 2 in Figure 2.1 are sampled sequentially and transmitted
successively over a single channel or over separate channels. Hence,
for each input information bit, two binary symbols are output for trans-
mission over the channel. When the output of the second adder is sampled
and transmitted, the content of the encoder is shifted and the input infor-
mation bit is shifted into the encoder with a new information bit being
present on the input sequence line. Hence, the output symbols are a
function of the last K information bits and the present input information
bit on the input sequence line. If the content of the encoder is defined
as the state of the encoder, then there are $2^K$ possible states, and the
input information bit determines the transition between states. For
example, if the state of the encoder is 001 and the input information bit
is 1, then the output of the encoder is 00 and, after the shift of the encoder,
the new encoder state is 011. Therefore, the $2^K$ possible states can
be represented on a state diagram with the transitions corresponding
to an input information bit of 0 or 1. The state diagram corresponding
to the encoder is presented in Figure 2.1. The output symbols are
placed on the transition corresponding to the input information bit
that produces the particular output. Hence, in the example, the
transition b = 001 to d = 011 has output symbols 00, corresponding to
the input information bit of 1 for this transition. In this sense, a
sequence of input bits represents a path in the state diagram, and the
symbols on the path represent the output code word transmitted. For
example, the input sequence 1001, starting with the encoder in the zero
state a, corresponds to the path from a to b to c to e to b and to the output
code word 11110100.
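The encoder of Figure 2.1 can be sketched in a few lines of Python. The extracted figure does not show the adder connections, so the tap masks `G1` and `G2` below are an assumption: they were inferred so that they reproduce the transitions quoted in the text (b = 001 to d = 011 with output 00, and input 1001 giving the code word 11110100). Following the text, each branch output depends on the three stored bits and the present input bit; the function names are hypothetical.

```python
# Minimal sketch of the Figure 2.1 encoder (K = 3, rate 1/2).
# ASSUMPTION: the tap masks are inferred from the transitions quoted in the
# text; they are not read from the original figure.
K = 3
STATE_MASK = (1 << K) - 1
# Bits of the 4-bit window, MSB first: s1 (oldest stored bit), s2, s3, u (present input).
G1 = 0b1011  # adder 1 taps: s1, s3, u
G2 = 0b1111  # adder 2 taps: s1, s2, s3, u
STATE_NAMES = "abcdefgh"  # a = 000, b = 001, ..., h = 111


def parity(x: int) -> int:
    """Modulo-2 sum of the bits of x (the 'exclusive or' of Table 2.1)."""
    return bin(x).count("1") & 1


def step(state: int, u: int) -> tuple[str, int]:
    """Return the two output symbols and the next state for input bit u."""
    window = (state << 1) | u
    symbols = f"{parity(window & G1)}{parity(window & G2)}"
    return symbols, window & STATE_MASK


def encode(bits, state: int = 0):
    """Encode a bit sequence; return the code word and the visited states."""
    code, visited = [], [STATE_NAMES[state]]
    for u in bits:
        symbols, state = step(state, u)
        code.append(symbols)
        visited.append(STATE_NAMES[state])
    return "".join(code), visited


print(step(0b001, 1))        # ('00', 3): b = 001 -> d = 011 with output 00
print(encode([1, 0, 0, 1]))  # ('11110100', ['a', 'b', 'c', 'e', 'b'])
```

Representing the state as an integer makes the shift a single `<<` followed by a mask, and each adder becomes the parity of the masked window.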
An alternative representation of the code words of a convolutional
code is a tree diagram, or code tree. The tree diagram of the encoder
presented in Figure 2.1 is illustrated in Figure 2.2. In the tree diagram,
if the first input bit is 0, then the corresponding output symbols 00 are
shown on the first upper branch, while if the first input bit is 1, then the
corresponding output symbols 11 are shown on the first lower branch.
Similarly, if the second input bit is 0, the output symbols of the code
word correspond to the next upper branch. Hence, each node in the tree
represents a state of the encoder with the upper branch emanating from
the node corresponding to an input bit of 0 and the lower branch
corresponding to an input bit of 1. Therefore, for any input sequence, there
exists a collection of branches called a path, and the symbols on the
branches along the path represent the code word.

[Figure 2.2 Tree Diagram of Convolutional Encoder of Figure 2.1]

For the encoder in
Figure 2.1, the nodes of the tree diagram in Figure 2.2 are labeled
with states of the encoder. By the fourth input bit (K + 1 = 4), all the
states appear twice at that level in the tree diagram. In the state
diagram, by the fourth input bit, there are two paths that begin at the
zero state and end at each state. For example, the input sequences
1011 and 0011 each represent a path that ends at state d = 011. The
first input sequence corresponds to the path from a to b to c to f to d,
while the second input sequence corresponds to the path from a to a to
a to b to d. Furthermore, in the state diagram and the tree diagram,
the extension of either of these two paths after the fourth input bit will
lead to the same output symbols for a given sequence of input bits of
the extension. Therefore, the paths are said to have remerged, which
is easily seen in the state diagram. This observation prompted Forney
to propose the trellis diagram, in which the two identical states of the
tree diagram at the (K+1)th input bit are joined. Figure 2.3 presents the
trellis diagram for the example in Figure 2.1. The trellis diagram
illustrates the fact that, after the Kth input bit, each state has two paths
leading to it.
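The remerging of the paths for 1011 and 0011 can be checked with a short sketch. It assumes generator taps inferred from the transitions quoted for the Figure 2.1 example (an assumption; the extracted figure does not show the adder connections), and `run` is a hypothetical helper. Both paths end in state d = 011, and every common extension produces identical symbols from the fifth branch on.

```python
from itertools import product

# ASSUMPTION: tap masks inferred from the transitions quoted in the text.
K = 3
G = (0b1011, 0b1111)  # taps over (s1, s2, s3, u), MSB = oldest stored bit
STATE_MASK = (1 << K) - 1


def run(bits, state=0):
    """Return (list of branch output symbols, final state), starting at state a = 000."""
    branches = []
    for u in bits:
        window = (state << 1) | u
        branches.append("".join(str(bin(window & g).count("1") & 1) for g in G))
        state = window & STATE_MASK
    return branches, state


_, end1 = run([1, 0, 1, 1])  # path a -> b -> c -> f -> d
_, end2 = run([0, 0, 1, 1])  # path a -> a -> a -> b -> d
print(end1, end2)  # 3 3, i.e. both paths end in state d = 011

# After the merge point, any common extension yields the same output symbols.
remerged = all(
    run([1, 0, 1, 1, *ext])[0][4:] == run([0, 0, 1, 1, *ext])[0][4:]
    for ext in product((0, 1), repeat=4)
)
print(remerged)  # True
```

Because the future outputs depend only on the current state and the future inputs, only one of the two paths entering a state ever needs to be kept, which is the property the Viterbi decoder exploits.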
[Figure 2.3 Trellis Diagram for a K=3 Binary Convolutional Code Whose Path Length is T Branches]

An important subclass of convolutional codes is called "transparent
codes." Transparent codes, in conjunction with differential
coding, eliminate the 180 degree phase ambiguity of biphase modulation
and reduce the fourfold 90 degree phase ambiguity of quadriphase
modulation to a 90 degree phase ambiguity. Also, the degradation in
performance from ideal using this technique is small, typically about 0.1
dB for sequential decoding and 0.2 dB for Viterbi decoding with memory
length K = 3 at an output probability of error per bit of $10^{-4}$. Because
of their importance, the properties of transparent codes are presented here.
Differential coding is commonly used to deal with the 180 degree
phase ambiguity of PSK demodulation. Let $z_i$ be the $i$th binary symbol
out of the differential encoder and let $y_i$ be the $i$th binary symbol into
the differential encoder; then:

$$z_i = y_i + z_{i-1} \qquad (2.1)$$

where the addition is modulo-2. Thus, $z_{i-1}$ is the "reference" for $y_i$.
Let $r_i$ be the received binary symbol; then:

$$r_i = z_i + e_i + \theta_i \qquad (2.2)$$

where $e_i$ is the channel noise, equal to one or zero, and $\theta_i$ represents
the reference phase ambiguity of the demodulator, which is also equal to one
or zero. The differential decoder output is $w_i$ and is given by:

$$w_i = r_i + r_{i-1} \qquad (2.3)$$

The output of the differential decoder, $w_i$, is equal to the corresponding
input to the differential encoder, $y_i$, corrupted by the sum of two noises
and the sum of two phase ambiguities. If no phase transition of the
demodulator occurred at time $i-1$, $\theta_i + \theta_{i-1} = 0$ and the phase ambiguity
is eliminated.
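Substituting (2.2) into (2.3) makes the cancellation explicit. Adding $z_{i-1}$ to both sides of (2.1) gives $z_i + z_{i-1} = y_i$ (modulo-2), so that, with all additions modulo-2:

$$
\begin{aligned}
w_i &= r_i + r_{i-1} \\
    &= (z_i + e_i + \theta_i) + (z_{i-1} + e_{i-1} + \theta_{i-1}) \\
    &= y_i + (e_i + e_{i-1}) + (\theta_i + \theta_{i-1})
\end{aligned}
$$

When no phase transition occurs, $\theta_i = \theta_{i-1}$, the last term vanishes, and the result reduces to (2.4).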
Differential encoding does increase the probability of error. In
the absence of a phase transition:

$$w_i = y_i + (e_i + e_{i-1}) \qquad (2.4)$$

The probability of an error out of the differential decoder, assuming
independent channel errors, is:

$$p' = P[(e_i + e_{i-1}) = 1] = 2p(1-p)$$
$$p' \approx 2p \qquad (2.5)$$

where $p$ is the channel symbol error probability and the approximation
holds for small $p$. If errors occur in bursts, as is true at the output of a
Viterbi or a sequential decoder, then $e_i + e_{i-1} = 0$ when adjacent symbols
are in error, and the probability of error out of the differential decoder is
much less than $2p$.
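The behavior described by (2.1) through (2.5) can be checked with a small simulation. This is a sketch under stated assumptions: a reference symbol of 0 is transmitted first, a constant 180 degree ambiguity is modeled as complementing every received symbol, and the helper names are hypothetical.

```python
import random


def diff_encode(y, reference=0):
    """Differential encoder, eq. (2.1): z_i = y_i + z_{i-1} (modulo-2).
    The reference symbol z_0 is sent first."""
    z = [reference]
    for bit in y:
        z.append(bit ^ z[-1])
    return z


def diff_decode(r):
    """Differential decoder, eq. (2.3): w_i = r_i + r_{i-1} (modulo-2)."""
    return [r[i] ^ r[i - 1] for i in range(1, len(r))]


def decoded_error_rate(p, n=200_000, seed=1):
    """Monte Carlo estimate of p' for independent channel errors of probability p."""
    rng = random.Random(seed)
    y = [rng.getrandbits(1) for _ in range(n)]
    r = [bit ^ (rng.random() < p) for bit in diff_encode(y)]
    return sum(a != b for a, b in zip(diff_decode(r), y)) / n


y = [1, 0, 1, 1, 0, 0, 1, 0]
z = diff_encode(y)


def count_errors(r):
    return sum(a != b for a, b in zip(diff_decode(r), y))


inverted = [bit ^ 1 for bit in z]   # constant 180 degree phase ambiguity
single = z.copy()
single[4] ^= 1                      # one isolated channel error
burst = z.copy()
for i in (3, 4, 5, 6):              # burst of four adjacent channel errors
    burst[i] ^= 1

print(diff_decode(inverted) == y)               # True: the ambiguity cancels
print(count_errors(single), count_errors(burst))  # 2 2
print(decoded_error_rate(0.01))  # close to 2p(1 - p) = 0.0198
```

The isolated error is doubled, as in (2.5), while the burst of four adjacent errors also yields only two decoded errors, because $e_i + e_{i-1} = 0$ inside the burst.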
A code is transparent to phase ambiguity if, whenever
$x = (x_1, x_2, \ldots, x_i, \ldots)$ is a code word, its complement
$\bar{x} = (x_1 + 1, x_2 + 1, \ldots, x_i + 1, \ldots)$ is also a code word. For such a
code, if the received vector $y = (y_1, y_2, \ldots, y_i, \ldots)$ is decoded into $x$,
then the decoder will decode the complement
$\bar{y} = (y_1 + 1, y_2 + 1, \ldots, y_i + 1, \ldots)$ into $\bar{x}$. Now, if this
transparent encoder and decoder are placed between the differential
encoder and decoder, then the phase ambiguities $\theta_{i-1}$ and $\theta_i$ will be
passed through the transparent decoder to the differential decoder,
whereupon the differential decoder can eliminate the ambiguity. The
differential decoder can at most double the error rate out of the
transparent decoder, but this is much less than the channel error rate,
and little degradation results.
In Figure 2.4, a rate 1/2 feed-forward canonical convolutional
encoder of K = 3 stages and its state diagram are presented as an example.
The code symbol assigned to each transition by the $a_i$ generator is the
modulo-2 sum of the $a_i$'s indicated on that transition. Similarly, the
other n-1 generators (the $b_i$'s in this case) of a rate 1/n code assign
code symbols to the transitions. Hence, each transition has an
n-dimensional vector of code symbols assigned to it.
For a code to be transparent, the complement of any code word
must also be a code word. Therefore, since the all-zeroes vector is
a code word of all codes, then the all-ones vector must be a code word
of a transparent code. For the all-ones vector to be a code word, the
self loop around the all-ones state must have a one assigned to it by
each generator. The symbol assigned to the self loop around the all-
ones state is the modulo-2 sum of the taps of the generator, as seen in
Figure 2.4. Thus, each of the n generators of a transparent code must
have an odd number of taps. Finally, an odd number of taps for all the
generators is a sufficient condition for a transparent code, as will be
illustrated with an example, since convolutional codes are group codes.
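The tap-count condition can be checked with a short sketch. It relies on the linearity (group property) of the encoder over its initial state and its input: complementing every input bit and starting from the all-ones state adds the all-ones self-loop output to every branch, and that output is all ones exactly when every generator has an odd number of taps. Both generator pairs below are hypothetical examples, not codes from the report, and the window convention (stored bits followed by the present input) is an assumption.

```python
import random

K = 3  # stored bits; each generator mask covers the window (s1, ..., sK, u)


def encode(bits, generators, state=0):
    """Feed-forward rate 1/n encoder; generator masks are over the (K+1)-bit window."""
    out = []
    for u in bits:
        window = (state << 1) | u
        out.extend(bin(window & g).count("1") & 1 for g in generators)
        state = window & ((1 << K) - 1)
    return out


def has_odd_taps(generators):
    """The all-ones self loop emits a one for every generator iff each tap count is odd."""
    return all(bin(g).count("1") % 2 == 1 for g in generators)


def complement_is_code_word(generators, bits):
    """By linearity, complementing the input and starting in the all-ones state
    adds the all-ones self-loop output to every branch of the code word."""
    word = encode(bits, generators)
    complement = [1 - b for b in word]
    all_ones_state = (1 << K) - 1
    return complement == encode([1 - u for u in bits], generators, all_ones_state)


ODD = (0b1101, 0b1011)   # hypothetical generators with three taps each
EVEN = (0b1101, 0b1111)  # hypothetical: the second generator has four taps
rng = random.Random(0)
bits = [rng.getrandbits(1) for _ in range(50)]

print(has_odd_taps(ODD), complement_is_code_word(ODD, bits))    # True True
print(has_odd_taps(EVEN), complement_is_code_word(EVEN, bits))  # False False
```

Starting the complemented encoding in the all-ones state stands in for a long (or infinite) sequence in which the register has already filled with ones.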
[Figure 2.4 Rate 1/n Convolutional Code Encoder and State Diagram]

To illustrate the structure of a transparent convolutional code
and the transfer function, consider the example with binary input data
and K = 3 presented in Figure 2.5. Suppose the all-zeroes code
word was transmitted and a phase transition occurs so that the received
sequence is ... 00 00 00 11 11 11 .... The closest (least number of
disagreements) possible code word is ... 00 00 00 11 10 00 11 11 11 ...,
which differs from the received sequence in three positions. The input
sequence corresponding to this code word is ... 000111 ... and the output
of the differential decoder is ... 0001000 .... Thus, only one error
occurs in the data at the output of the differential decoder due to the
phase transition. Since convolutional codes are group codes, this is
equally true of any other code word. For example, consider the input
sequence ... 10101010 ... to the convolutional encoder, resulting in the
code word ... 01 10 01 10 01 10 .... Suppose the received sequence was
... 01 10 01 01 10 01 10 01 10 .... The closest code word is ... 01 10 01
01 11 10 10 01 10 ..., which differs in three positions. The input
sequence corresponding to this code word is ... 101101010 ... and the
output of the differential decoder is ... 11101111 .... Thus, the output
of the differential decoder again contains only a single error due to the
phase transition.
[Figure 2.5 Example Encoder and State Diagram (state labels legible in the extracted figure: b = 001, d = 011, e = 100, g = 110)]
SECTION 3.0
VITERBI MAXIMUM LIKELIHOOD DECODER
A scheme for decoding convolutional codes was proposed by
Viterbi and was subsequently shown by Forney to be equivalent to
maximum likelihood decoding. If the input data stream consists of a
sequence of statistically independent, equally likely symbols, then this
is the optimal decoding procedure under the criterion of minimum
probability of error over a memoryless channel.
If the transmission channel is memoryless, then the channel
errors occur independently from channel symbol to channel symbol.
In the case of a binary symmetric channel (BSC) corresponding to hard
decision demodulation, a channel error transforms a channel symbol from
0 to 1 or from 1 to 0, and channel errors occur independently from symbol
to symbol with probability p if the channel is memoryless. For quantized
demodulation of a memoryless channel, the quantized value of the
received symbol is independent of other received symbols. In the
additive white Gaussian noise (AWGN) channel, the probability of a
given quantized value of a received symbol can be computed from the
Gaussian probability density.
If the input (information) sequences to the encoding device of any
code are equally likely, the decoder which minimizes the overall error
probability is one which examines the error-corrupted quantized received
sequence and chooses the data sequence corresponding to the trans-
mitted code sequence which has the greatest probability of being
transmitted (i.e., the most likely sequence). In the case of the BSC,
this corresponds to choosing the data sequence of the transmitted
code sequence closest to the received sequence in the sense of Ham-
ming distance, that is, the transmitted sequence which differs from
the received sequence in the least number of positions. For quantized
demodulation of the AWGN channel, this corresponds to choosing the
data sequence $\{x_i\}$ of the transmitted code sequence $\{y_{ij}\}$ which
minimizes the interproduct:
$$\sum_{i=1}^{P} \sum_{j=1}^{n} \left| r_{ij} - y_{ij} \right| \qquad (3.1)$$
with respect to the quantized received sequence $\{r_{ij}\}$, where the code
rate is 1/n and the message length is P. Each received symbol $r_{ij}$
is quantized to Q bits corresponding to the numbers 0 to $2^Q - 1$, such that
0 is a transmitted zero with the highest probability and $2^Q - 1$ is a
transmitted one with the highest probability. The transmitted code symbol
$y_{ij}$ is represented by Q zeroes for a transmitted zero and Q ones for
a transmitted one. In this case, the digital implementation of equation
(3.1) is to exclusive-OR each of the Q bits of $r_{ij}$ with the Q bits of $y_{ij}$
and to sum the resulting binary numbers to form the interproduct to be
minimized. For example, with three-bit quantization and the quantized
received sequence {000, 101, 011, 100}, the interproduct for the
transmitted code sequence {0, 1, 0, 1} is 0 + 2 + 3 + 3 = 8, which
is the minimum interproduct over all possible four-bit sequences.
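The three-bit quantization example can be reproduced with a short sketch. The exhaustive search over all code sequences is only for illustration (a decoder searches the trellis instead), and the function name is hypothetical.

```python
from itertools import product

Q = 3                      # bits of quantization
ALL_ONES = (1 << Q) - 1    # a transmitted one is represented by Q ones (2^Q - 1)


def interproduct(received, code_bits):
    """EOR each Q-bit received value with Q copies of the code bit and sum the results."""
    return sum(r ^ (ALL_ONES if c else 0) for r, c in zip(received, code_bits))


received = [0b000, 0b101, 0b011, 0b100]
print(interproduct(received, (0, 1, 0, 1)))   # 8

# Exhaustive search over all 2^4 candidate code sequences.
best = min(product((0, 1), repeat=len(received)),
           key=lambda c: interproduct(received, c))
print(best)                                    # (0, 1, 0, 1)
```

Exclusive-ORing with all zeroes leaves the quantized value unchanged, and exclusive-ORing with all ones gives $2^Q - 1 - r_{ij}$, so each term is the distance of the quantized value from the hypothesized transmitted level.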
To define the maximum likelihood decoder for convolutional
codes, the example presented in Figure 2.1 to illustrate the convolutional
code structure will be used. Referring to the trellis diagram
in Figure 2.3, it is noted that, by the (K+1)th input bit, there are two
paths that begin at the all-zeroes state and end at each of the $2^K$
possible states. Of these two paths, only the path that minimizes the
interproduct in equation (3.1) need be retained, for no matter what
the subsequent received symbols may be, they will affect each of these
paths in exactly the same way. Therefore, the paths are said to be
remerged.
After each branch of received symbols, the two paths remerging
at each state are compared to find the path that minimizes the
interproduct in equation (3.1). The two possible paths
were the survivors at the previous input bit for different states. For
example, considering the BSC and the trellis diagram in Figure 2. 3, the
comparison at state "b" after the fifth received branch is between the
survivors at states "a" and "e" after the fourth received branch. After
the fourth received branch, if the received sequence is 11 11 10 00,
then the survivor at state "a" is the code sequence 11 11 01 11 corre-
sponding to the data sequence 1000 and is at Hamming distance 3 from
the received sequence. The survivor at state "e" for this received
sequence is the code sequence 11 00 10 10 corresponding to the data
sequence 1100 and is also at Hamming distance 3 from the received
sequence. If the next branch of the received sequence is 11 so that
the received sequence becomes 11 11 10 00 11 after the fifth received
branch, the surviving path from state "a" and the surviving path from
state "e" merge at state "b". The survivor after the fifth received
branch at state "b" is the path coming from state "a" which remains
at Hamming distance 3 from the received sequence. In this way, the
decoder may proceed through the received sequence. After each
branch has been received, one surviving path and its distance from
the received sequence (more generally called its metric) are
stored. This decision-making process and path elimination procedure
is performed for each state after each branch is received. This is the
untruncated version of the maximum likelihood (Viterbi) decoding
algorithm. If all surviving paths coincide over some branch in the
past, then those bits can be output as the final decision of the decoder
for that branch.
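A compact hard-decision (BSC) version of the untruncated procedure can be sketched as follows. It keeps one survivor and one metric per state, compares the two paths remerging at each state, and breaks ties by keeping the first path examined (a deterministic stand-in for the random choice discussed next). It also anticipates the termination procedure described below by appending K zeroes, so that the final survivor is the one at state a. The generator taps are assumed, inferred from the transitions quoted in Section 2.0 for the Figure 2.1 example, and all function names are hypothetical.

```python
K = 3
STATE_MASK = (1 << K) - 1
G = (0b1011, 0b1111)  # ASSUMPTION: taps inferred for the Figure 2.1 example


def branch_symbols(state, u):
    window = (state << 1) | u
    return [bin(window & g).count("1") & 1 for g in G]


def encode(bits):
    """Encode and terminate with K zeroes so the encoder ends in state a = 000."""
    state, out = 0, []
    for u in list(bits) + [0] * K:
        out.extend(branch_symbols(state, u))
        state = ((state << 1) | u) & STATE_MASK
    return out


def viterbi_decode(symbols):
    """Untruncated hard-decision Viterbi decoding with Hamming-distance metrics."""
    n = len(G)
    inf = float("inf")
    metrics = [0] + [inf] * STATE_MASK          # decoder starts in state a
    paths = [[] for _ in range(STATE_MASK + 1)]
    for i in range(0, len(symbols), n):
        received = symbols[i:i + n]
        new_metrics = [inf] * (STATE_MASK + 1)
        new_paths = [[] for _ in range(STATE_MASK + 1)]
        for state, metric in enumerate(metrics):
            if metric == inf:
                continue                        # state not yet reachable
            for u in (0, 1):
                nxt = ((state << 1) | u) & STATE_MASK
                distance = sum(a != b for a, b in zip(branch_symbols(state, u), received))
                if metric + distance < new_metrics[nxt]:  # keep the better remerging path
                    new_metrics[nxt] = metric + distance
                    new_paths[nxt] = paths[state] + [u]
        metrics, paths = new_metrics, new_paths
    # Terminated code: the ultimate survivor is the one at state a; drop the K dummy zeroes.
    return paths[0][:-K], metrics[0]


message = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
code_word = encode(message)
noisy = code_word.copy()
noisy[7] ^= 1                                   # one channel error
print(viterbi_decode(code_word))  # ([1, 0, 1, 1, 0, 0, 1, 0, 1, 1], 0)
print(viterbi_decode(noisy))      # ([1, 0, 1, 1, 0, 0, 1, 0, 1, 1], 1)
```

Storing whole lists as paths keeps the sketch short; a hardware decoder instead stores a fixed number of bits per state, as discussed below.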
The only difficulty which may arise is the possibility that, in a
given comparison between remerging paths, the distances or metrics
are equal. Then a random selection of one of the equal-distance paths
may be made, as is done for block code words at equal distance from
the received word. If both paths were preserved, further received
symbols would affect both metrics in exactly the same way; therefore,
the random choice would eventually have to be made anyway.
There remains only the question of terminating and truncating
the algorithm and ultimately decoding on one path rather than $2^K$ (equal
to 8 in the example). Termination is easily carried out by forcing the
last K input bits of the input sequence to be zeroes. Then the final state
of the code necessarily is state "a", and consequently the ultimate
survivor is the survivor at state "a" after the insertion into the encoder
of K dummy zeroes and transmission of the resulting nK code symbols.
In terms of the trellis diagram, this means that the number of states is
reduced from $2^K$ to $2^{K-1}$ by the insertion of the first zero, from $2^{K-1}$
to $2^{K-2}$ by the insertion of the second zero, etc., until the Kth zero
reduces the number of states from 2 to 1. This is shown for K = 3 on
the right side of Figure 2.3.
For each of the $2^K$ convolutional encoder states, there is a basic
storage requirement in the decoder. First of all, each state has
associated with it a metric. For a BSC, the metric may simply be the
cumulative Hamming distance on the path leading to that state. The question
then is how many bits of storage are required for each metric. Since
any cumulative metric will eventually overflow the storage provided for
it, some mechanism must be provided for renormalizing the metrics
from time to time. Enough metric storage must be provided so that,
when the largest metric overflows and is renormalized, the lowest
metric does not underflow. The storage required depends strongly
on the code used.
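One possible renormalization scheme, assumed here for illustration rather than taken from the report, keeps every metric in a fixed-width register: once the largest metric reaches the upper half of the register range, half the range is subtracted from every metric. Because survivor selection depends only on differences between metrics, subtracting a common constant changes no decision. The sketch raises an error when the lowest metric would underflow, which is the condition the metric word length must prevent; `renormalize` and `register_bits` are hypothetical names.

```python
def renormalize(metrics, register_bits):
    """Subtract half the register range from every state metric once the largest
    metric reaches the upper half (one possible scheme, assumed for illustration)."""
    half = 1 << (register_bits - 1)
    if max(metrics) < half:
        return list(metrics)
    shifted = [m - half for m in metrics]
    if min(shifted) < 0:
        raise OverflowError("metric spread exceeds half the register range")
    return shifted


print(renormalize([9, 12, 10, 15], 4))   # [1, 4, 2, 7]
print(renormalize([1, 2, 3, 4], 4))      # [1, 2, 3, 4]
try:
    renormalize([3, 12], 4)              # spread of 9 does not fit in half of a 4-bit range
except OverflowError as err:
    print("underflow:", err)
```

With this scheme, the register must be wide enough that the spread between the best and worst state metrics, plus the largest branch metric, stays below half the range; that spread depends on the code, as the text notes.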
In addition to the metric storage, for each encoder state, storage
must be provided to keep track of the bits on the path leading up to
that state. Ideally, in a Viterbi decoder, paths of great length should
be stored, enabling the decoder to delay each bit decision until all state
paths agree on the bit in question. Practically speaking, the shorter
the path storage needed, the cheaper the decoder. The number of
memory lengths of path that must be stored will be illustrated by computer
simulations relating performance to path storage. It is possible that,
for a given path storage length, not all of the paths will agree on the
information bit that the decoder must output. In this case, the bit
corresponding to the path with the best metric is output.
3.1 DESCRIPTION OF DECODING ALGORITHM
To implement a Viterbi decoder, first consider the comparisons
of survivor paths at each state. It is convenient to perform two state
comparisons together because the two states, V0 and V1, both have
as predecessors the states 0V and 1V, as shown in Figure 3.1. Here,
V is any (K-1)-digit binary sequence. For the case of K = 3, letting
V = 10, both states 100 and 101 are preceded by 010 and 110.

[Figure 3.1 An Example of State Transitions of a Convolutional Code. V = any K-1 binary digit sequence; $c_i$ = code symbols corresponding to the state transition; $\bar{c}_i$ = complement of $c_i$]

The Viterbi decoder algorithm uses five tables in memory. Four
of the tables have $2^K$ entries, one for each state. These four tables
are organized such that two tables represent the present condition of
the decoder, and two tables represent the condition that the decoder
will be in after the next received symbols are taken into account. In
the following discussion, all arithmetic will be decimal, and X will be
a decimal index beginning with one rather than the binary digit sequence
V beginning with zero in Figure 3.1. The present condition of the
decoder is represented by METRIC(L,X) and PATH(L,X). METRIC(L,X)
is the accumulated metric (Hamming distance between hypothesis path
and received path) for each state. PATH(L,X) is a table that stores
the actual path leading to each state. Thus, PATH(L,X) must have
enough bits of storage for each state to accommodate the designed path
truncation. The future condition of the decoder is represented by
METRIC(I,X) and PATH(I,X). Now, if L = 1, then I = 2, and METRIC(1,X)
and PATH(1,X) are the present condition while METRIC(2,X) and PATH(2,X)
are the future condition. Next, I is set to 1 and L becomes 2; the
entries in METRIC(1,X) and PATH(1,X) can then be discarded, and these
tables become storage for the future condition of the decoder. There
are two entries in HYPOTH(S,N) for each state, corresponding to the two
possible predecessor states. Each entry in HYPOTH(S,N) holds the
output digits of the encoder for the transition between the two states.
Thus, HYPOTH(S,N) is just a table representation of the state diagram.
The state entries for the five tables are arranged in numerical state
order: the first entry corresponds to state 00...00, the second to
state 00...01, the third to state 00...10, etc.
To begin the decoder algorithm, the tables with L = 1 will be
the present condition. PATH(L,X) is set to zeroes for each entry,
METRIC(L,1) is set to zero, and the entries of METRIC(L,X) for X
not equal to 1 are set to a value of about 2K. This condition represents
the decoder on the all-zeroes path corresponding to the origin of the
tree. The transition from the present condition to the future condition
begins by letting X = 1. The incremental metric for $c_i$ in Figure 3.1
is computed by exclusive-ORing (EOR, digital logic for bit-by-bit
modulo-2 addition) the entry in HYPOTH(1,1) and the received branch
symbols. This incremental metric is added to the cumulative metric
of state X. Likewise, the incremental metric for $\bar{c}_i$ is added to the
metric of state $2^{K-1}+X$. These two sums are compared, and the winning
metric is stored in METRIC(I,2X-1). Suppose the path from state X
won. The path stored in PATH(L,X) is shifted to the left by one position
and stored in PATH(I,2X-1). If the path from state $2^{K-1}+X$ had won,
the path stored in PATH(L,$2^{K-1}+X$) is shifted to the left by one position,
a 1 is added in the rightmost position, and the result is stored in
PATH(I,2X-1). The same adding and comparing is done using $\bar{c}_i$ and $c_i$
to get the new entry 2X for the future condition. Now, X is increased by
one, and another pair of state comparisons is carried out. By the time X
has taken on all of its $2^{K-1}$ values, each state of the future condition
has a new path and metric associated with it. The algorithm next searches
for the best path metric in the present condition. The oldest bit on the
path associated with this metric is the decoder output. Now the entries
in the tables representing the present condition can be deleted. Then
the future condition tables are used for the present condition, and a new
future condition is computed, storing results in the now empty tables.
Figure 3.2 presents a flow chart of a computer program that performs the outlined procedure for Viterbi decoding. To start the algorithm, the input is read, L = 2 and I = 1 are set, and several frequently used constants are defined:

- $Y = 2^{K-1}$. Added to the index X, it gives the corresponding state in the second half of the states.
- Y1 = Y + 1. It indicates when the future condition tables are filled.
- $Y2 = 2^K + 1$. During the search for the best path, it indicates that all the states have been searched.
- YM, a power of two determined by M. It is used to extract the oldest bit of the best path for output.

The value M is the number of bits stored in each path, that is, the design truncation of the paths.

As the algorithm begins actual decoding, X is set to 1, and L and I are interchanged so that the previous future condition becomes the present condition. Temporary metric values are computed next. Each is formed as EOR(HYPOTH(2X-1,N), INPUT) for N = 1, 2 and added to the stored metric of the path coming from state X or from state $2^{K-1}+X$, respectively. The smaller of the two temporary metric values determines which path is shifted and stored in PATH(I,2X-1) and which temporary metric is stored in METRIC(I,2X-1). In terms of decimal arithmetic, the path is shifted to the left by multiplying it by 2, and a 1 is added to the shifted path if TEMP1 is the winning temporary metric value.

This procedure is then repeated for entry 2X of the future condition tables. Finally, X is incremented by 1 and the procedure is repeated for the new value of X. When X equals $2^{K-1}+1$, the future condition tables are filled, and the path associated with the best metric of the present condition is then found. This path is divided by YM, which shifts it to the right and leaves only the oldest bit in the computer word OUTPUT.

Figure 3.2 Flow Chart of Viterbi Decoder Algorithm
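The flow chart can be summarized by the minimal Python sketch below. It makes these assumptions:

- hard-decision (Hamming) metrics;
- 0-based list indices;
- a divisor YM = 2^(M-1) for extracting the oldest bit;
- a row-by-row reading of the transition table of Figure 3.3.

The names `viterbi_step`, `output_bit`, and `HYPOTH` are illustrative and do not come from the report.

```python
# A sketch of the Figure 3.2 flow chart with hard-decision (Hamming) metrics.
# Python lists are 0-based: state X of the text is index X - 1 here.

def hamming(a: str, b: str) -> int:
    """Incremental metric: number of ones in EOR(a, b)."""
    return sum(ca != cb for ca, cb in zip(a, b))


def viterbi_step(metric, path, hypoth, received, m):
    """One present-to-future transition; returns (future_metric, future_path).

    hypoth[s] = (digits from state X, digits from state Y + X) for destination s.
    Paths are M-bit integers; the newest decision is the rightmost bit.
    """
    n_states = len(metric)
    y = n_states // 2                       # Y = 2**(K-1)
    mask = (1 << m) - 1                     # keep only the M newest decisions
    future_metric = [0] * n_states
    future_path = [0] * n_states
    for x in range(y):                      # X = 1 .. 2**(K-1)
        for dest in (2 * x, 2 * x + 1):     # entries 2X-1 and 2X of the text
            temp0 = metric[x] + hamming(hypoth[dest][0], received)
            temp1 = metric[y + x] + hamming(hypoth[dest][1], received)
            if temp0 <= temp1:              # tie: the smaller state index wins
                future_metric[dest] = temp0
                future_path[dest] = (2 * path[x]) & mask
            else:
                future_metric[dest] = temp1
                future_path[dest] = (2 * path[y + x] + 1) & mask
    return future_metric, future_path


def output_bit(metric, path, m):
    """Oldest bit of the best present-condition path (YM assumed 2**(M-1))."""
    best = min(range(len(metric)), key=metric.__getitem__)  # first minimum
    return (path[best] // (1 << (m - 1))) & 1


# Transition table of Figure 3.3, read row by row (an assumption).
HYPOTH = [("00", "11"), ("11", "00"), ("11", "00"), ("00", "11"),
          ("01", "10"), ("10", "01"), ("10", "01"), ("01", "10")]

metric, path = [0] + [6] * 7, [0] * 8      # startup condition, about 2K = 6
history = []
for branch in ("11", "00", "00"):
    bit = output_bit(metric, path, 16)      # output from the present condition
    metric, path = viterbi_step(metric, path, HYPOTH, branch, 16)
    history.append((bit, metric, path))
```

Run on the received branches 11, 00, 00, the sketch produces these metrics:

- after the second branch: 2, 4, 2, 0, 7, 7, 7, 7;
- after the third branch: 2, 4, 6, 4, 3, 3, 1, 1.

Both agree with METRIC(1,X) and METRIC(2,X) of Figure 3.6. Ties are resolved toward the smaller state index, as the text's convention requires.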
An example illustrates how the tables change as the received data are processed. Figure 2.3 shows the trellis diagram for the example, and Figure 3.3 presents the table representation of its state diagram. For X = 1 (state "a"), entry N = 1 holds the transition digits coming from state X (state "a"), and entry N = 2 holds the transition digits coming from state $2^{K-1}+X$ (state "e").

Figure 3.4 shows the present condition in tables METRIC(1,X) and PATH(1,X), together with the future condition that results from processing one input bit (one received branch). At the start, METRIC(1,1) is set to zero, so the decoder treats the all-zeroes path (the origin of the tree) as the best path. METRIC(1,X) is set to 6 for all X ≠ 1; this number is arbitrary but should be about 2K. PATH(1,X) is set to zero for all X.

Refer to the trellis diagram in Figure 2.3, with received digits (11):

- The best path to state 1 ("a") comes from state 1 ("a"), with a metric value of 2.
- The best path to state 2 ("b") comes from state 1 ("a"), with a metric value of 0.
- The best path to state 4 ("d") comes from state 6 ("f"), with a metric value of 6. Because state 6 is state $2^{K-1}+X$, a 1 is shifted into PATH(2,4).

The procedure continues for all the states, filling METRIC(2,X) and PATH(2,X). Figure 3.5 presents the surviving paths in the trellis diagram after one received branch. In that diagram, every path except the one ending at state 4 ("d") has a preceding state with an index of four or less. The path ending on state 4 has state 6 (state $2^{K-1}+X$) as its previous state. All paths originate at the all-zeroes state K+1 branches earlier, corresponding to the startup conditions.

Figure 3.3 State Transition Table for Example

   X   State   HYPOTH(X,1)   HYPOTH(X,2)
   1     a         00            11
   2     b         11            00
   3     c         11            00
   4     d         00            11
   5     e         01            10
   6     f         10            01
   7     g         10            01
   8     h         01            10

Figure 3.4 Present and Future Conditions at Startup (received branch: 11)

   X   State   METRIC(1,X)   METRIC(2,X)
   1     a          0             2
   2     b          6             0
   3     c          6             6
   4     d          6             6
   5     e          6             6
   6     f          6             6
   7     g          6             6
   8     h          6             6

   PATH(1,X): all zeroes for every state.
   PATH(2,X): all zeroes except state 4 ("d"), whose rightmost (newest) bit is 1.
The output digit is a zero, since the present-condition path with the smallest metric has zero as its oldest (leftmost) bit. This output bit, however, corresponds to the startup conditions. The first bit corresponding to the received digits is output only after M+K bits have been received.

[Figure 3.5: trellis diagram of the surviving paths after one received branch]

Figure 3.6 shows the present and future conditions after three input bits have been received. METRIC(1,X) and PATH(1,X) are obtained by processing METRIC(2,X) and PATH(2,X) of Figure 3.4. The trellis diagram of Figure 3.7 shows the surviving paths once METRIC(1,X) and PATH(1,X) have been obtained. Path B is closer to the received path (it has fewer differences). By the algorithm convention, however, when the two paths have equal metrics, the path emanating from the state with the smaller index is chosen. Because of the startup metric weights assigned to each state, path A is therefore chosen for states g and h. In this case, all the paths originate at the all-zeroes state K+2 branches earlier, corresponding to the startup conditions.

Figure 3.6 Present and Future Conditions After Three Input Bits (received branches: 11, 00, 00)

   X   State   METRIC(1,X)   METRIC(2,X)
   1     a          2             2
   2     b          4             4
   3     c          2             6
   4     d          0             4
   5     e          7             3
   6     f          7             3
   7     g          7             1
   8     h          7             1

   PATH(1,X): all zeroes for states 1 through 6; the paths of states 7 ("g") and 8 ("h") end in ...10.
   PATH(2,X): all zeroes for every state.

[Figure 3.7: trellis diagram of the surviving paths for METRIC(1,X) and PATH(1,X) of Figure 3.6, showing paths A and B]
Path A to states g and h, present after only two branches had been received, is discarded after the third branch. By the time the future condition tables in Figure 3.6 are filled, the metrics have moved completely away from their initial values. They now represent the metrics of the surviving paths for the received digits alone.
Figure 3.8 shows the present and future conditions after M = 16 input bits. In the present condition tables METRIC(1,X) and PATH(1,X), states 7 and 8 have paths with a 1 in the position corresponding to the first digit of the received data to be decoded. These states also have metrics very close to that of the correct all-zeroes path. The trellis diagram in Figure 3.9 shows that this path originates at the all-zeroes state 16 branches earlier. Surviving paths of this length illustrate the need for a large path memory. Simulations indicate that M = 4K is necessary for the effects of long surviving paths to be negligible compared with the error correction capability of the code used with a decoder that has infinite path memory. After one more input bit is processed, every path in the future condition has zero as the first bit to be output for the received sequence. The algorithm therefore decodes the first bit correctly, as the first bit of the all-zeroes path.

Figure 3.8 Present and Future Conditions after M = 16 Input Bits

   X   State   METRIC(1,X)   METRIC(2,X)
   1     a          5             5
   2     b          5             6
   3     c          5             6
   4     d          6             5
   5     e          6             6
   6     f          6             6
   7     g          6             7
   8     h          6             7

   RECEIVED (present condition): 11 00 00 00 00 10 00 00 00 00 00 11 00 00 00 00
   RECEIVED (future condition):  00 00 00 00 10 00 00 00 00 00 11 00 00 00 00 00

[Figure 3.9: trellis diagram of the surviving paths after M = 16 input bits]
3.2 PERFORMANCE OF A TRANSPARENT CODE VERSUS A NONTRANSPARENT CODE
Viterbi developed a technique that tightly bounds the performance of Viterbi decoding from the transfer function of the convolutional code used to encode the data. The transfer function T(L,N,D) of a convolutional code relates three quantities: the Hamming distance between code words, the number of input ones corresponding to each code word, and the length of each code word. Convolutional codes are group codes, so the distance structure (the Hamming distances between code words) is found from the Hamming weights of the code words, that is, the number of output ones in each code word.

To obtain the transfer function, the state a = 000 is split and its self loop is removed from the state diagram. Each transition between states is then labeled with three dummy variables:

- D, whose exponent is the number of ones in the output symbols on the transition;
- L, indicating that the path between the two states has length one;
- N, whose exponent is the input bit (zero or one) for the transition.

For example, in the transparent code of Figure 2.4, the transition from state b = 001 to state c = 010 is labeled $LN^0D^2$, or $LD^2$. The transition from state d = 011 to state h = 111 is labeled $LN^1D^1$, or $LND$. The resulting diagram is the modified state diagram of Figure 3.10.
Treating the modified state diagram as a signal flow graph gives the closed-form transfer function, as shown in Appendix I. For biphase modulation with additive white Gaussian noise and ideal coherent demodulation, Viterbi has shown that the probability of error per bit of the decoder is bounded by:

$$P_b \le \tfrac{1}{2}\operatorname{erfc}\!\left(\sqrt{d\,E_s/N_0}\right) e^{d\,E_s/N_0} \left.\frac{\partial T(L,N,D)}{\partial N}\right|_{L=1,\;N=1,\;D=e^{-E_s/N_0}} \tag{3.2}$$

In this bound, erfc is the complementary error function and d is the weight of the smallest-weight path, called the free distance. $E_s/N_0$ is the ratio of the signal energy per symbol to the single-sided noise spectral density. The transparent code in Figure 2.4 has d = 6. From the transfer function in Appendix I, the derivative of this code's transfer function with respect to N is:

$$T'(D) = \left.\frac{\partial T(L,N,D)}{\partial N}\right|_{L=1,\;N=1} = \frac{D^6\left(4-2D^2-3D^4+2D^6\right)}{\left(1-5D\cdots\right)^2} \tag{3.3}$$

For $E_s/N_0$ = 3 dB ($E_b/N_0$ = 6 dB), the probability of error per bit of the transparent code in the example is bounded by $P_b < 2.3\times10^{-\cdots}$.
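The free distance d = 6 quoted for the transparent code can be checked by a shortest-path search over the split state diagram. The search starts from the successors of state a, weights each transition by the number of ones in its output symbols, and stops at the first return to state a. The sketch below does this with Dijkstra's algorithm. It assumes that the transparent code of Figure 2.4 is the example code of Figure 3.3, and it reads that table row by row. The function names are hypothetical.

```python
import heapq

# HYPOTH rows of Figure 3.3 (destination state X = 1..8, entries N = 1, 2),
# read row by row; this reading of the flattened table is an assumption.
HYPOTH = [("00", "11"), ("11", "00"), ("11", "00"), ("00", "11"),
          ("01", "10"), ("10", "01"), ("10", "01"), ("01", "10")]


def edges(hypoth):
    """Yield (from_state, to_state, output_weight), 0-based, Figure 3.2 indexing."""
    half = len(hypoth) // 2
    for dest, (from_low, from_high) in enumerate(hypoth):
        x = dest // 2
        yield x, dest, from_low.count("1")           # from state X
        yield half + x, dest, from_high.count("1")   # from state 2^(K-1) + X


def free_distance(hypoth):
    """Minimum output weight of a path that leaves state 0 and first returns to it."""
    graph = {}
    for src, dst, w in edges(hypoth):
        graph.setdefault(src, []).append((dst, w))
    # Split state 0: start from its successors other than the self loop.
    heap = [(w, dst) for dst, w in graph[0] if dst != 0]
    heapq.heapify(heap)
    settled = set()
    while heap:
        dist, state = heapq.heappop(heap)
        if state == 0:
            return dist                 # first re-entry into the all-zeroes state
        if state in settled:
            continue
        settled.add(state)
        for nxt, w in graph[state]:
            if nxt == 0 or nxt not in settled:
                heapq.heappush(heap, (dist + w, nxt))
    return None


print(free_distance(HYPOTH))   # 6
```

One minimum-weight path is a, b, d, g, e, a, with output weights 2, 0, 1, 1, 2.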
Figure 3.10 Modified State Diagram for the Example Convolutional Code

To compare the transparent code in the example with a good nontransparent code, consider the best code for K = 3, shown in Figure 2.1. For this code, the derivative of the transfer function with respect to N is:

$$T'(D) = \frac{\cdots}{\left(1-2D-D^3\right)^2} \tag{3.4}$$

This code also has a free distance d of 6. For $E_s/N_0$ = 3 dB, its probability of error per bit is bounded by $P_b < 1.66\times10^{-\cdots}$. The error rate of the transparent code is therefore about 1.4 times that of the best nontransparent code. Simulations have shown that differential coding increases the error rate by a factor of about 1.2. Used with differential coding, the transparent code in the example would therefore have an error rate about 1.7 times that of the best nontransparent code at K = 3 with no phase ambiguity.

For the K = 3 transparent code with differential coding, the degradation from ideal in $E_b/N_0$ is about 0.2 dB at an output probability of error per bit of $10^{-\cdots}$. Resolving the phase ambiguity this way is very simple, and the performance loss is small. The combination of a transparent code and differential coding is therefore a very attractive technique.
3.3 COMPLEXITY OF THE VITERBI DECODER
The complexity of a hardware implementation of the Viterbi maximum likelihood decoder is measured in complexity bits. A complexity bit is defined as any one of the following: a bit of storage or a latch, a bit in an addition or comparison, or a switch. This measure allows Viterbi decoders and sequential decoders to be compared for performance versus complexity over various values of memory length K and code rate R.

Each portion of the Viterbi decoder is described separately so that an average cost per complexity bit can be established. As device prices change, a new average cost per complexity bit can be computed, giving the new cost. New MSI circuit arrangements may change the physical number of integrated circuits (ICs) needed for each decoder portion, but the number of complexity bits stays the same. To give a sense of the physical size of the Viterbi decoder, the number of ICs in LINKABIT's design of a 10 Mbps TTL Viterbi decoder is included. That design has memory length K = 3, which is K = 4 in the LINKABIT notation of constraint length.
3.3.1 Branch Metric Computation
The flow diagram of Figure 3.2 computes the branch metrics EOR(HYPOTH, INPUT). To implement this in hardware, first note that in Figure 3.1, $c_j$ and $\bar{c}_j$ (the complement of $c_j$) represent the code symbols for the transition. For rate 1/2, $c_j$ can take the values 00, 01, 10, and 11. For rate 1/3, $c_j$ can take any of the eight values of a three-bit binary number. For code rate 1/n there are therefore $2^n$ branch metrics to compute from the received code symbols. If each received code symbol is quantized to Q bits, then $2^n\lceil\log_2 n(2^Q-1)\rceil$ bits are needed to represent the branch metrics, where $\lceil x\rceil$ is the least integer greater than or equal to x.
For code rate 1/2, let $r_1$ and $r_2$ be the received code symbols quantized to Q bits, labeled from 0 to $2^Q-1$. Denoting the four possible branch metrics as $m_{00}$, $m_{01}$, $m_{10}$, and $m_{11}$:

$$m_{00} = r_1 + r_2,\quad m_{01} = r_1 + (2^Q-1-r_2),\quad m_{10} = (2^Q-1-r_1) + r_2,\quad m_{11} = (2^Q-1-r_1) + (2^Q-1-r_2) \tag{3.5}$$

Rate 1/2 therefore needs four adders of $\lceil\log_2 n(2^Q-1)\rceil$ bits each, and each adder performs only one addition. In TTL, the branch metrics can be computed in 24 nanoseconds for Q = 3. Rate 1/3 needs eight adders: two of the received quantized symbols are added first, and the third symbol is then added to the result. In TTL this takes 60 nanoseconds for Q = 3. With a serial bit stream and a low data rate, however, the first addition can be done while the third symbol is being received. Only 30 nanoseconds are then needed to compute the branch metrics after the third symbol arrives.
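As an illustration of equation (3.5), the sketch below computes all $2^n$ soft-decision branch metrics for one received branch. It assumes the labeling used above: label 0 is the most confident zero, so a transmitted 0 costs r and a transmitted 1 costs $2^Q-1-r$. With Q = 1 the metric reduces to the Hamming distance used in the example. The function name `branch_metrics` is hypothetical.

```python
from itertools import product


def branch_metrics(received, q):
    """Soft-decision branch metrics for every n-bit code symbol pattern.

    received: quantized symbols r_i in 0 .. 2**q - 1 (0 = most confident 0).
    A transmitted 0 costs r_i; a transmitted 1 costs (2**q - 1) - r_i.
    """
    top = 2 ** q - 1
    metrics = {}
    for bits in product("01", repeat=len(received)):
        label = "".join(bits)
        metrics[label] = sum(r if b == "0" else top - r
                             for b, r in zip(label, received))
    return metrics


m = branch_metrics([2, 6], q=3)   # rate 1/2, Q = 3
```

For the received symbols $r_1 = 2$ and $r_2 = 6$ with Q = 3, the sketch gives $m_{00} = 8$, $m_{01} = 3$, $m_{10} = 11$, and $m_{11} = 6$. Complementary patterns always sum to $n(2^Q-1) = 14$.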
When the branch metrics are computed for each received branch of coded digits, the total complexity of the branch metric processor is:

$$\text{branch metric processor} = 2^{n+1}\lceil\log_2 n(2^Q-1)\rceil + nQ \text{ complexity bits} \tag{3.6}$$

The term nQ is the number of bits needed to store the symbols of the received branch as it arrives. The $2^n$ adders account for another $2^n\lceil\log_2 n(2^Q-1)\rceil$ complexity bits. The same number of complexity bits is needed to store the resulting branch metrics for access by the arithmetic unit.

At low data rates, the complexity of the branch metric processor can be reduced further by time-sharing a single adder to compute the $2^n$ branch metrics. The low data rates considered are 19.2 kbps, 38.4 kbps, 57.6 kbps, and 76.8 kbps. At these rates a single adder does not increase the number of processors in the arithmetic unit, with one exception: code rate 1/3, memory length K = 7 or 8, and data rate 76.8 kbps. In that case the number of processors does not increase if four adders are time-shared to compute the branch metrics. With this time-sharing technique, the total complexity of the branch metric processor is:

$$\text{branch metric processor} = 12\lceil\log_2 3(2^Q-1)\rceil + 3Q \text{ complexity bits} \tag{3.7}$$

for code rate 1/3, K = 7 or 8, and data rate 76.8 kbps; otherwise:

$$\text{branch metric processor} = (2^n+1)\lceil\log_2 n(2^Q-1)\rceil + nQ \text{ complexity bits} \tag{3.8}$$
In high data rate Viterbi decoders, it is not feasible to calculate the branch metrics for each received branch of coded digits. A read-only memory (ROM) is therefore used to store the branch metrics for the $2^{nQ}$ possible quantized branch symbols. A ROM also makes a metric compression technique possible. As the flow diagram in Figure 3.2 shows, subtracting the same constant from all the branch metrics does not affect the decoder decisions. If the smallest branch metric in equation (3.5) is subtracted from all the branch metrics, one branch metric becomes zero and all the others are non-negative. For example, if $m_{00}$ is the smallest metric, then:

$$m_{00} = 0,\quad m_{01} = 2^Q-1-2r_2,\quad m_{10} = 2^Q-1-2r_1,\quad m_{11} = 2\left(2^Q-1-r_1-r_2\right) \tag{3.9}$$

In equation (3.9), $m_{01}$ and $m_{10}$ are odd and $m_{00}$ and $m_{11}$ are even. For Q greater than one, the two central quantization intervals can be treated as a single interval. The received code symbols can then be labeled from 0 to $2^Q-2$. The term $2^Q-1$ in equation (3.9) becomes $2^Q-2$, and all the branch metrics are even. The least significant bit can then be discarded, which reduces the number of bits needed to represent the branch metrics. Using one central quantization interval instead of two produces some degradation, but it is quite small. This metric compression technique is also easily shown to apply to rate 1/3. The complexity of the branch metric processor using the ROM is:

$$\text{branch metric processor} = 2^n\left(2^{nQ}+1\right)\lceil\log_2 n(2^{Q-1}-1)\rceil + nQ \text{ complexity bits} \tag{3.10}$$

Because the least significant bit is discarded, each branch metric needs $\lceil\log_2 n(2^{Q-1}-1)\rceil$ bits. The ROM has $2^{nQ}$ words, each with $2^n\lceil\log_2 n(2^{Q-1}-1)\rceil$ bits representing the $2^n$ branch metrics. The term nQ is the storage for the quantized symbols as they arrive. A further $2^n\lceil\log_2 n(2^{Q-1}-1)\rceil$ bits store the ROM's branch metric output for use by the arithmetic unit.
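The metric compression technique and the ROM can be sketched as a lookup table indexed by the pair of raw Q-bit symbols. This illustration assumes that the two central quantization intervals are merged by mapping label r to r - 1 for $r \ge 2^{Q-1}$. The names `merge_central`, `compressed_metrics`, and `ROM` are hypothetical.

```python
from itertools import product


def merge_central(r: int, q: int) -> int:
    """Relabel a Q-bit symbol 0..2**q-1 to 0..2**q-2 by merging the two central intervals."""
    return r if r < 2 ** (q - 1) else r - 1


def compressed_metrics(received, q):
    """Branch metrics after metric compression (Q > 1), generalizing eq. (3.9)."""
    top = 2 ** q - 2                       # 2^Q - 1 replaced by 2^Q - 2
    raw = {}
    for bits in product("01", repeat=len(received)):
        label = "".join(bits)
        raw[label] = sum(r if b == "0" else top - r
                         for b, r in zip(label, received))
    smallest = min(raw.values())
    # Subtracting the smallest metric leaves every decision unchanged, and all
    # differences are even, so the least significant bit is dropped.
    return {label: (v - smallest) // 2 for label, v in raw.items()}


# Hypothetical ROM for rate 1/2, Q = 3: one word per pair of raw 3-bit symbols.
Q = 3
ROM = {(r1, r2): compressed_metrics([merge_central(r1, Q), merge_central(r2, Q)], Q)
       for r1 in range(2 ** Q) for r2 in range(2 ** Q)}
```

Every stored word contains a zero, and no entry exceeds $n(2^{Q-1}-1) = 6$. For rate 1/2 and Q = 3, $\lceil\log_2 6\rceil = 3$ bits per branch metric therefore suffice, and the table has $2^{nQ} = 64$ words.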
3.3.2 Arithmetic Unit Complexity
Minimum decoder complexity requires minimizing the number of bits used to represent the $2^K$ state metrics, $S_1, S_2, \ldots, S_{2^K}$. To do this, note that the minimum Hamming distance along all paths from the all-zeroes state to any other state can be calculated from the trellis or state diagram. The maximum of these Hamming distances is at most nK, because any state can be reached in K or fewer transitions and each transition has at most n ones. The actual value of this maximum varies considerably among codes. Since it contributes to decoder complexity, good-performing codes for which it is small are desirable. The value nK is used here to compute the complexity for a range of memory lengths. For any particular memory length, however, the available codes should be compared for decoder complexity versus performance.

With the value nK, any state differs from any other state by at most nK in Hamming distance, since convolutional codes are group codes. With Q bits of quantization, any state metric therefore differs from any other state metric by at most $(2^Q-1)nK$. With the metric compression technique applied to the branch metrics, the maximum difference is $(2^{Q-1}-1)nK$ for Q greater than one.
Another important observation concerns the maximum increase of the smallest state metric. Consider Figure 3.1 with the following values:

- the state metric of state 0V is zero;
- the state metric of state 1V is y;
- the branch metrics are x and $n(2^Q-1)-x$, where x takes integer values between 0 and $n(2^Q-1)$.

For n = 2 and Q = 3, Table 3.1 then gives the state metrics of states V0 and V1.

Table 3.1 State Metrics for States V0 and V1 (n = 2, Q = 3, y = 6)

   x     0  1  2  3  4  5  6  7  8  9 10 11 12 13 14
   V0    0  1  2  3  4  5  6  7  8  9 10  9  8  7  6
   V1    6  7  8  9 10  9  8  7  6  5  4  3  2  1  0

In this case the smaller of the two state metrics is always at most 7. For n = 2 and Q = 3, the smallest metric can therefore increase by at most 7. Similar reasoning gives a maximum increase of 1 for n = 2 and Q = 1, and 11 for n = 3 and Q = 3. With the metric compression technique, x ranges only from 0 to $n(2^{Q-1}-1)$, and the branch metrics are x and $n(2^{Q-1}-1)-x$. The same argument shows that the smallest metric can then increase by at most 3 for rate 1/2 and Q = 3, and by at most 4 for rate 1/3 and Q = 3. After each received branch, the state metrics can therefore be normalized so that the smallest metric does not exceed this maximum increase. The number of bits needed to represent each state metric is:

$$\text{storage bits per state metric} = \lceil\log_2\left(nK(2^Q-1)+2S_m\right)\rceil \tag{3.11}$$

Here $S_m$ is the maximum value of the smallest normalized metric, and $2^Q$ is replaced by $2^{Q-1}$ if the metric compression technique is used.
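Table 3.1 and the bound on the increase of the smallest metric can be regenerated by brute force. Two parts of this sketch are illustrative assumptions:

- the search over y ≥ 0;
- the values $S_m$ = 8 and $S_m$ = 4 used with equation (3.11), chosen as powers of two at least as large as the increases 7 and 3 found above.

The helper names are hypothetical.

```python
def clog2(v: int) -> int:
    """Least integer >= log2(v) for v >= 1, using exact integer arithmetic."""
    return (v - 1).bit_length()


def butterfly(y: int, x: int, b: int):
    """Metrics of V0 and V1 when S(0V) = 0, S(1V) = y, branch metrics x and b - x."""
    v0 = min(x, y + (b - x))
    v1 = min(b - x, y + x)
    return v0, v1


def table_3_1(n=2, q=3, y=6):
    b = n * (2 ** q - 1)
    pairs = [butterfly(y, x, b) for x in range(b + 1)]
    return [p[0] for p in pairs], [p[1] for p in pairs]


def max_increase(b: int) -> int:
    """Largest value min(V0, V1) can reach, searched over x and y >= 0."""
    return max(min(butterfly(y, x, b))
               for y in range(2 * b + 1) for x in range(b + 1))


def bits_per_state_metric(n, k, q, s_m, compressed=False):
    levels = 2 ** (q - 1) - 1 if compressed else 2 ** q - 1
    return clog2(n * k * levels + 2 * s_m)        # eq. (3.11)


v0, v1 = table_3_1()
print(v0)   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 9, 8, 7, 6]
print(v1)   # [6, 7, 8, 9, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
print(max(min(a, c) for a, c in zip(v0, v1)))   # 7
```

`max_increase` takes the branch metric sum as its argument. It returns:

- 1 for a sum of 2 (n = 2, Q = 1);
- 7 for 14 (n = 2, Q = 3);
- 3 for 6 (compressed, rate 1/2);
- 4 for 9 (compressed, rate 1/3).

For K = 3 and Q = 3, equation (3.11) then gives 6 bits per state metric uncompressed and 5 bits with compression.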
There are $2^K$ state metrics to store. If all operations are performed in parallel, every state metric must be accessible simultaneously. At the end of the arithmetic unit cycle, when all the additions and comparisons have been performed, the state metrics are replaced with their new values. A fully parallel processor therefore stores $2^K$ state metrics. At low data rates a completely serial processor can be used; it also needs only $2^K$ state metrics, but it requires an efficient storage organization.

As Figure 3.1 shows, the state metrics of states V0 and V1 result from adding the branch metrics to the state metrics of states 0V and 1V. LINKABIT proposed an efficient storage organization that keeps the state metrics in two sections, one for states with even parity and one for states with odd parity. For example, if V has even parity, then states 0V and V0 have even parity and states 1V and V1 have odd parity. For each computation, the metrics of states 0V and 1V are read out of storage and the computation is performed. The resulting metrics for V0 and V1 are written back into the locations from which the original metrics were read. In this organization, a given state metric is found in a different location at each computation. The addresses can nevertheless be computed simply by a properly programmed K-bit counter.

When the processor is neither fully parallel nor fully serial, the state metrics need additional temporary storage. If more than two arithmetic units are time-shared, $2^K$ state metrics must be held in temporary storage, which doubles the storage requirement. The required storage for the state metrics is therefore:

$$\begin{aligned}
\text{parallel state metric storage bits} &= 2^K\lceil\log_2(nK(2^Q-1)+2S_m)\rceil\\
\text{serial state metric storage bits} &= 2^K\lceil\log_2(nK(2^Q-1)+2S_m)\rceil\\
\text{time-shared state metric storage bits} &= 2^{K+1}\lceil\log_2(nK(2^Q-1)+2S_m)\rceil
\end{aligned}\tag{3.12}$$

When the metric compression technique is used, $2^Q$ in equation (3.12) is replaced by $2^{Q-1}$.
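The parity property behind the two-bank storage organization can be checked directly. The sketch below only verifies that each butterfly reads one even-parity and one odd-parity state metric and writes one of each. The two results therefore fit back into the two locations just read. The sketch does not reproduce the K-bit address counter that tracks the moving locations. State numbering follows Figure 3.2, with the newest input bit at the least significant position. The helper names are hypothetical.

```python
def parity(s: int) -> int:
    """0 for even parity, 1 for odd parity."""
    return bin(s).count("1") & 1


def butterflies(k: int):
    """Yield (0V, 1V, V0, V1) for K-bit states, newest input bit at the LSB."""
    half = 1 << (k - 1)
    for v in range(half):                 # V ranges over the K-1 shared bits
        yield v, half | v, v << 1, (v << 1) | 1


def bank(s: int) -> str:
    return "even" if parity(s) == 0 else "odd"


def banks_balanced(k: int) -> bool:
    """True if every butterfly reads one metric from each bank and writes one to each."""
    for s0v, s1v, sv0, sv1 in butterflies(k):
        if {bank(s0v), bank(s1v)} != {"even", "odd"}:
            return False
        if {bank(sv0), bank(sv1)} != {"even", "odd"}:
            return False
        if bank(s0v) != bank(sv0):        # 0V and V0 share the parity of V
            return False
    return True
```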
The arithmetic unit adds the appropriate branch metrics to the state metrics of states 0V and 1V. From these sums it determines the new state metric and path memory entry, first for V0 and then for V1, as shown in the flow diagram of Figure 3.2. The arithmetic unit also determines when normalization is necessary, that is, when the smallest metric is greater than or equal to $S_m$ after the computations. When it is, $S_m$ is subtracted from the branch metrics to be used for the next received branch. This subtraction is quite simple and may be hardwired. Normalization can produce negative branch metrics, so the subtraction provides an extra bit to represent the sign. Choosing $S_m$ as a power of two simplifies both the subtraction and the test for normalization. With that choice, the total number of bits to be subtracted is:

- $2^n$ for n = 2, 3 and Q = 1;
- $n2^n$ for n = 2, 3 and Q = 3.

To determine whether normalization is necessary, each new state metric is checked against $S_m$ as it is computed. Normalization is not inhibited while the new state metrics exceed $S_m$. As soon as any new state metric is less than or equal to $S_m$, normalization is inhibited. The comparison requires an OR-ing of the most significant bits, and the normalization inhibit requires one bit of storage. This gives a total of two complexity bits per arithmetic unit. The normalization procedure is equally valid for a fully parallel processor and a fully serial processor.
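The normalization decision can be sketched in a few lines. The sketch assumes $S_m$ is a power of two, so the comparison reduces to OR-ing the high-order bits. It follows the condition stated above that normalization is performed when the smallest new metric is greater than or equal to $S_m$. The function names are hypothetical.

```python
def normalization_allowed(new_metrics, s_m: int) -> bool:
    """True if every new state metric is >= S_m (S_m assumed a power of two).

    With S_m = 2**p, 'metric >= S_m' means some bit at position p or higher
    is set, i.e. an OR of the most significant bits.
    """
    p = s_m.bit_length() - 1
    if s_m <= 0 or s_m != 1 << p:
        raise ValueError("this sketch assumes S_m is a positive power of two")
    inhibit = False                 # the single bit of inhibit storage
    for m in new_metrics:           # new state metrics arrive one at a time
        if (m >> p) == 0:           # no high-order bit set: m < S_m
            inhibit = True
    return not inhibit


def next_branch_metrics(branch_metrics, new_metrics, s_m: int):
    """Subtract S_m from the next branch's metrics when normalization is allowed."""
    if normalization_allowed(new_metrics, s_m):
        # Results may be negative, hence the extra sign bit in hardware.
        return {label: v - s_m for label, v in branch_metrics.items()}
    return dict(branch_metrics)
```

Each new state metric is an old metric plus one branch metric. Subtracting $S_m$ from every branch metric of the next branch therefore lowers every state metric by the same amount and leaves all decisions unchanged.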
The inputs to the arithmetic unit are two state metrics and two branch metrics. The appropriate branch metrics are added to the state metrics, and the results are subtracted to determine which is smaller. The smaller result becomes the new state metric. Each processor of the arithmetic unit thus handles $5\lceil\log_2(nK(2^Q-1)+2S_m)\rceil$ bits that are added, subtracted, switched to the output, or stored for transfer to the state metric storage. The complexity of each processor of the arithmetic unit is therefore:

$$\text{processor} = 5\lceil\log_2(nK(2^Q-1)+2S_m)\rceil + 2 \text{ complexity bits} \tag{3.13}$$

where $2^Q$ is replaced by $2^{Q-1}$ if the metric compression technique is used.

In the LINKABIT 10 Mbps TTL decoder with K = 3, Q = 3, and n = 2, each processor of the arithmetic unit can be built with 10 integrated circuits. Three of these ICs add the branch metrics to the state metrics and subtract the results for comparison, and they are large and relatively expensive.

Including the propagation delay of the state metric storage, an arithmetic operation takes under 100 nanoseconds in TTL for memory lengths K = 3 to 8. At the 9.1 Mbps data rate, a fully parallel arithmetic unit of $2^K$ processors is therefore necessary. In this case, the complexity is:

$$\text{arithmetic unit for 9.1 Mbps} = 2^K\left(5\lceil\log_2(nK(2^Q-1)+2S_m)\rceil + 2\right) \text{ complexity bits} \tag{3.14}$$
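Equations (3.11) and (3.13) through (3.15) can be tabulated with a few helper functions. This sketch makes three assumptions:

- $S_m$ = 8 in the example values is a hypothetical choice;
- the branch metric switch count in equation (3.15) is read as $2^n\lceil\log_2 n(2^Q-1)\rceil$;
- $\lceil\log_2 v\rceil$ is computed with exact integer arithmetic.

The function names are illustrative.

```python
def clog2(v: int) -> int:
    """Least integer >= log2(v) for v >= 1."""
    return (v - 1).bit_length()


def state_metric_bits(n, k, q, s_m, compressed=False):
    levels = 2 ** (q - 1) - 1 if compressed else 2 ** q - 1
    return clog2(n * k * levels + 2 * s_m)                       # eq. (3.11)


def processor(n, k, q, s_m, compressed=False):
    return 5 * state_metric_bits(n, k, q, s_m, compressed) + 2   # eq. (3.13)


def arithmetic_unit_9_1_mbps(n, k, q, s_m, compressed=False):
    return 2 ** k * processor(n, k, q, s_m, compressed)          # eq. (3.14)


def arithmetic_unit_serial(n, k, q, s_m):
    """Eq. (3.15): one serial processor, e.g. at 19.2 kbps."""
    switches = 2 ** n * clog2(n * (2 ** q - 1))   # branch metric switches
    encoder_and_counter = n * (k + 1) + k          # equals n + K(n + 1)
    return processor(n, k, q, s_m) + encoder_and_counter + switches
```

For n = 2, K = 3, Q = 3, and $S_m$ = 8, the functions give:

- 6 bits per state metric;
- 32 complexity bits per processor;
- 256 for the fully parallel 9.1 Mbps arithmetic unit;
- 59 for the serial 19.2 kbps unit.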
At the 19.2 kbps data rate, a fully serial processor is possible for memory lengths K = 3 to 8. The branch metrics are switched to the arithmetic unit by $2^n\lceil\log_2 n(2^Q-1)\rceil$ switches. A convolutional encoder of n(K+1) complexity bits, counting through the states, controls this switching. Addressing the state metric storage requires a counter of K complexity bits. The total complexity of the arithmetic unit at 19.2 kbps is therefore:

$$\text{arithmetic unit for 19.2 kbps} = 5\lceil\log_2(nK(2^Q-1)+2S_m)\rceil + 2 + n + K(n+1) + 2^n\lceil\log_2 n(2^Q-1)\rceil \tag{3.15}$$

At 38.4 kbps and 57.6 kbps, the complexity is the same as at 19.2 kbps and is given by equation (3.15), for memory length K ≤ 8 and K ≤ 7, respectively. For K = 8 at 57.6 kbps, however, the complexity is:

$$\text{arithmetic unit for 57.6 kbps} = 10\lceil\log_2(nK(2^Q-1)+2S_m)\rceil + 4 + n + K(n+1) + 2^n\lceil\log_2 n(2^Q-1)\rceil \tag{3.16}$$

At 76.8 kbps, the complexity is given by equation (3.15) for memory length K ≤ 7 and by equation (3.16) for K = 8.
3.3.3 Path Memory Storage
As each received branch enters the decoder, decisions about the starting point of each path are made and stored in the path memory, as shown in the flow diagram of Figure 3.2 and the accompanying example. In the flow diagram, the decoded bit is the last bit in the path memory of the state with the minimum metric. Simulations have shown, however, that with a path memory of 5(K+1) bits, a simpler rule works: the last bits of the path memories whose state metrics are less than or equal to $S_m$ can be OR-ed together to form the decoded bit. At an output probability of error per bit of $10^{-4}$, this technique causes negligible degradation, and it is significantly less complex than searching all the state metrics for the minimum. For the parallel processor, the technique requires $2^K$ gates. The serial processor instead needs an additional $2^K$ bits of storage to record which state metrics were less than or equal to $S_m$.
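The two output rules can be contrasted in a short sketch. The flow chart rule takes the oldest bit of the minimum-metric path. The simplified rule ORs the oldest bits of all paths whose metric is at most $S_m$. The bit ordering (oldest decision in bit M-1) and the function names are assumptions.

```python
def oldest_bit(path: int, m: int) -> int:
    """Oldest decision of an M-bit path (assumed to be stored in bit M-1)."""
    return (path >> (m - 1)) & 1


def decoded_bit_min_metric(metrics, paths, m):
    """Flow chart rule: oldest bit of the path with the smallest state metric."""
    best = min(range(len(metrics)), key=metrics.__getitem__)
    return oldest_bit(paths[best], m)


def decoded_bit_or_rule(metrics, paths, s_m, m):
    """Simplified rule: OR the oldest bits of all paths with metric <= S_m."""
    bit = 0
    for metric, path in zip(metrics, paths):
        if metric <= s_m:                 # one flag per state
            bit |= oldest_bit(path, m)
    return bit
```

Take metrics 2, 1, 9 with paths 1000, 0000, 1111 (M = 4) and $S_m$ = 4. The OR rule outputs 1, while the minimum-metric rule outputs 0. The text reports that, with a 5(K+1)-bit path memory, the resulting degradation is negligible at $10^{-4}$.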
At the 9. 1 Mbps data rate, when the decisions in the path memory
are to be shifted to include a new decision and the decisions are to be
transferred to an appropriate new location, the transfer must be in
parallel rather than serially. Therefore, associated with each bit of
storage there is a switch to transfer the bit to its new location. The
total complexity of the path memory and the output of the decoded bit
for 9. 1 Mbps is:
Path memory complexity for 9. 1 Mbps = 5(K+1)2K+1 + 2K
complexity bits (3 .17)
For the lower data rates, the transfer of decisions in the path memory may be performed serially. Since the state path memories are transferred simultaneously, only 2^K state path memories are necessary rather than the 2^{K+1} state path memories used in the software implementation of the example. The actual transfer is performed by 2^K two-input multiplexers (switches). Hence, the total complexity of the path memory and the output of the decoded bit for the lower data rates is:

$$\text{Serial path memory complexity} = \bigl(5(K+1)+1\bigr)\,2^{K} + 2^{K+1} \quad \text{complexity bits} \tag{3.18}$$
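As a quick numerical comparison of equations (3.17) and (3.18), the sketch below evaluates both path-memory costs for K = 3 through 8. The function names are hypothetical; the expressions are transcribed directly from the two equations.

```python
def parallel_path_memory(K: int) -> int:
    """Eq. (3.17): 9.1 Mbps path memory plus decoded-bit output, in complexity bits."""
    return 5 * (K + 1) * 2 ** (K + 1) + 2 ** K


def serial_path_memory(K: int) -> int:
    """Eq. (3.18): serial-transfer path memory plus decoded-bit output, in complexity bits."""
    return (5 * (K + 1) + 1) * 2 ** K + 2 ** (K + 1)


for K in range(3, 9):
    p, s = parallel_path_memory(K), serial_path_memory(K)
    print(f"K={K}: parallel={p:6d}  serial={s:6d}  ratio={p / s:.2f}")
```

The ratio rises toward 2 as K grows, reflecting the 2^{K+1} state path memories of the parallel transfer versus 2^K for the serial transfer.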
3.4 PERFORMANCE VERSUS COMPLEXITY
The total complexity of the Viterbi maximum likelihood decoder for various values of the code memory length K and various data rates is described by the following equations, derived from the results of the previous sections (the code rate is 1/n).
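For n = 2, 3 and K ≤ 8 at 19.2 kbps and 38.4 kbps; n = 2, 3 and K ≤ 7 at 57.6 kbps; n = 2 and K ≤ 7 at 76.8 kbps; and n = 3 and K ≤ 6 at 76.8 kbps:

$$\text{Complexity} = (2n+2)\left\lceil \log_2 n(2^Q-1) \right\rceil + (2^K+5)\left\lceil \log_2\!\bigl(nK(2^Q-1) + 2S_m\bigr) \right\rceil + 2^K\bigl(5(K+1)+2\bigr) + 2 + n(Q+2n+1) + (n+1)K \tag{3.19}$$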
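Equation (3.19) can be evaluated directly; the sketch below does so with exact integer arithmetic for the ceiling-logarithm terms. It is an illustration under stated assumptions: the brackets are read as ceilings, and S_m (the state-metric threshold defined earlier in the report) is a caller-supplied parameter. Using s_m = 2 for hard decisions is an assumption borrowed from the constant 4 = 2S_m in equation (3.24); with it, the function reproduces the Table 3.2 entries 257 (Q = 1, K = 3) and 13,381 (Q = 1, K = 8) for code rate 1/2.

```python
def ceil_log2(x: int) -> int:
    """Exact ceil(log2 x) for an integer x >= 1, avoiding floating-point rounding."""
    if x < 1:
        raise ValueError("ceil_log2 needs x >= 1")
    return (x - 1).bit_length()


def viterbi_low_rate(n: int, K: int, Q: int, s_m: int) -> int:
    """Eq. (3.19): total complexity bits for the lower-data-rate cases (code rate 1/n)."""
    q_max = 2 ** Q - 1  # factor (2^Q - 1) as it appears in eq. (3.19)
    return ((2 * n + 2) * ceil_log2(n * q_max)
            + (2 ** K + 5) * ceil_log2(n * K * q_max + 2 * s_m)
            + 2 ** K * (5 * (K + 1) + 2)
            + 2
            + n * (Q + 2 * n + 1)
            + (n + 1) * K)


print(viterbi_low_rate(2, 3, 1, s_m=2), viterbi_low_rate(2, 8, 1, s_m=2))  # 257 13381
```

Using `(x - 1).bit_length()` keeps the ceiling exact at powers of two, where `math.ceil(math.log2(x))` could be thrown off by rounding.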
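For n = 2, 3 and K = 8 at 57.6 kbps, and n = 2 and K = 8 at 76.8 kbps:

$$\text{Complexity} = (2n+2)\left\lceil \log_2 n(2^Q-1) \right\rceil + (2^8+10)\left\lceil \log_2\!\bigl(8n(2^Q-1) + 2S_m\bigr) \right\rceil + 2^8(47) + 4 + n(Q+2n+1) + 8(n+1) \tag{3.20}$$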
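For n = 3 and K = 7 at 76.8 kbps:

$$\text{Complexity} = 13\left\lceil \log_2 3(2^Q-1) \right\rceil + (2^7+5)\left\lceil \log_2\!\bigl(21(2^Q-1) + 32\bigr) \right\rceil + 2^7(42) + 2 + 3(Q+9) + 32 \tag{3.21}$$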
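For n = 3 and K = 8 at 76.8 kbps:

$$\text{Complexity} = 13\left\lceil \log_2 3(2^Q-1) \right\rceil + (2^8+10)\left\lceil \log_2\!\bigl(24(2^Q-1) + 32\bigr) \right\rceil + 2^8(47) + 4 + 3(Q+9) + 32 \tag{3.22}$$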
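Equations (3.21) and (3.22) are written with the constants already substituted. From 21 = 3·7 and 24 = 3·8, they appear to be the rate-1/3 (n = 3) cases at 76.8 kbps for K = 7 and K = 8. The sketch below transcribes them as printed. The reading of the constant 32 inside the logarithm as 2S_m for three-bit quantization is an assumption, so the functions are only exercised here with Q = 3; with that input, equation (3.22) gives 14,297, the Table 3.3 entry for Q = 3, K = 8 at 76.8 kbps.

```python
def ceil_log2(x: int) -> int:
    """Exact ceil(log2 x) for an integer x >= 1."""
    if x < 1:
        raise ValueError("ceil_log2 needs x >= 1")
    return (x - 1).bit_length()


def eq_3_21(Q: int) -> int:
    """Eq. (3.21) as printed (n = 3, K = 7, 76.8 kbps)."""
    q_max = 2 ** Q - 1
    # The 32 inside the logarithm is assumed to be 2*S_m for Q = 3.
    return (13 * ceil_log2(3 * q_max)
            + (2 ** 7 + 5) * ceil_log2(21 * q_max + 32)
            + 2 ** 7 * 42 + 2 + 3 * (Q + 9) + 32)


def eq_3_22(Q: int) -> int:
    """Eq. (3.22) as printed (n = 3, K = 8, 76.8 kbps)."""
    q_max = 2 ** Q - 1
    return (13 * ceil_log2(3 * q_max)
            + (2 ** 8 + 10) * ceil_log2(24 * q_max + 32)
            + 2 ** 8 * 47 + 4 + 3 * (Q + 9) + 32)


print(eq_3_21(3), eq_3_22(3))  # 6575 14297
```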
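At 9.1 Mbps:

$$\text{Complexity} = 2^n\bigl(2^{nQ}+1\bigr)\left\lceil \log_2 n(2^{Q-1}-1) \right\rceil + 2^K\Bigl(6\left\lceil \log_2\!\bigl(nK(2^{Q-1}-1) + 2S_m\bigr) \right\rceil + 13 + 10K\Bigr) + n(Q+2n) \tag{3.23}$$

For K ≤ 8 and Q = 1 at 9.1 Mbps:

$$\text{Complexity} = 2^n\bigl(2^{n}+1\bigr)\left\lceil \log_2 n \right\rceil + 2^K\Bigl(6\left\lceil \log_2 (nK + 4) \right\rceil + 13 + 10K\Bigr) + n(2n+1) \tag{3.24}$$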
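A minimal sketch of the 9.1 Mbps expressions, assuming equation (3.24) is used for hard decisions (Q = 1) and equation (3.23) otherwise, with S_m again a caller-supplied parameter. The function name is hypothetical. For code rate 1/2 with hard decisions it reproduces the Table 3.2 entries 566 (K = 3) and 31,518 (K = 8). The 2^{nQ} factor in the first term grows quickly with Q, which appears consistent with the remark that the branch metric ROM dominates at small K.

```python
from typing import Optional


def ceil_log2(x: int) -> int:
    """Exact ceil(log2 x) for an integer x >= 1."""
    if x < 1:
        raise ValueError("ceil_log2 needs x >= 1")
    return (x - 1).bit_length()


def viterbi_9_1_mbps(n: int, K: int, Q: int, s_m: Optional[int] = None) -> int:
    """Complexity bits at 9.1 Mbps: eq. (3.24) when Q == 1, eq. (3.23) otherwise."""
    if Q == 1:
        # Eq. (3.24): hard decisions; the constant 4 is used as printed.
        return (2 ** n * (2 ** n + 1) * ceil_log2(n)
                + 2 ** K * (6 * ceil_log2(n * K + 4) + 13 + 10 * K)
                + n * (2 * n + 1))
    if s_m is None:
        raise ValueError("eq. (3.23) needs the threshold s_m")
    q_mag = 2 ** (Q - 1) - 1  # factor (2^(Q-1) - 1) as it appears in eq. (3.23)
    return (2 ** n * (2 ** (n * Q) + 1) * ceil_log2(n * q_mag)
            + 2 ** K * (6 * ceil_log2(n * K * q_mag + 2 * s_m) + 13 + 10 * K)
            + n * (Q + 2 * n))


print(viterbi_9_1_mbps(2, 3, 1), viterbi_9_1_mbps(2, 8, 1))  # 566 31518
```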
Table 3.2 presents the complexity of the Viterbi decoder for code rate 1/2. For data rates from 19.2 kbps to 76.8 kbps, the complexity increases only slightly with data rate, and only for memory length K = 8. Also, for three-bit quantization and memory length K = 3, the complexity at the 9.1 Mbps data rate is about four times the complexity at the low data rates, since the ROM used by the branch metric processor dominates the complexity at the high data rate. In the other cases, the complexity at the 9.1 Mbps data rate is about twice the complexity at the other data rates. To indicate the physical size of the Viterbi decoder, LINKABIT designed a TTL decoder for Q = 3, K = 3, and code rate 1/2 to operate at a data rate of 10 Mbps. This design required 180 integrated circuit chips. Using a similar design technique, it is found that, for Q = 3, K = 8, and code rate 1/2, a TTL Viterbi decoder operating at a data rate of 9.1 Mbps would require about 5500 integrated circuit chips. Thus, increasing K from 3 to 8 increases the physical size by a factor of about 30, or approximately 2^5, which is an exponential increase in complexity with K. The complexity measure, by contrast, increases by a factor of only about 21; the smaller increase is due to the branch metric ROM being dominant at K = 3.
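The ratios quoted in the paragraph above can be checked against the code rate 1/2 values of Table 3.2 and the chip counts. This is a minimal arithmetic sketch; the dictionary layout is an illustrative choice, and 5500 is the text's approximate estimate rather than an exact count.

```python
import math

# Table 3.2, code rate 1/2: (Q, K) -> (complexity at 19.2 kbps, complexity at 9.1 Mbps)
table_3_2 = {
    (1, 3): (257, 566),
    (1, 8): (13_381, 31_518),
    (3, 3): (305, 1_378),
    (3, 8): (13_825, 33_818),
}
for (Q, K), (low, high) in table_3_2.items():
    print(f"Q={Q}, K={K}: 9.1 Mbps / 19.2 kbps = {high / low:.2f}")

chips_k3, chips_k8 = 180, 5500  # LINKABIT K = 3 design vs. approximate K = 8 estimate
ratio = chips_k8 / chips_k3
print(f"chip ratio = {ratio:.1f}, log2 = {math.log2(ratio):.2f} (K increased by 5)")
```

The high-rate to low-rate ratios come out between about 2.2 and 2.5, except about 4.5 for Q = 3, K = 3, and the chip ratio is close to 2^5, matching the statements above.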
TABLE 3.2. Complexity of Viterbi Maximum Likelihood Decoder for Code Rate 1/2

| Quantization Q (bits) | Memory Length K | Data Rate (kbps) | Complexity (bits) |
|---|---|---|---|
| 1 | 3 | 19.2 | 257 |
| 1 | 3 | 38.4 | 257 |
| 1 | 3 | 57.6 | 257 |
| 1 | 3 | 76.8 | 257 |
| 1 | 3 | 9.1 × 10^3 | 566 |
| 1 | 8 | 19.2 | 13,381 |
| 1 | 8 | 38.4 | 13,381 |
| 1 | 8 | 57.6 | 13,408 |
| 1 | 8 | 76.8 | 13,408 |
| 1 | 8 | 9.1 × 10^3 | 31,518 |
| 3 | 3 | 19.2 | 305 |
| 3 | 3 | 38.4 | 305 |
| 3 | 3 | 57.6 | 305 |
| 3 | 3 | 76.8 | 305 |
| 3 | 3 | 9.1 × 10^3 | 1,378 |
| 3 | 8 | 19.2 | 13,825 |
| 3 | 8 | 38.4 | 13,825 |
| 3 | 8 | 57.6 | 13,962 |
| 3 | 8 | 76.8 | 13,962 |
| 3 | 8 | 9.1 × 10^3 | 33,818 |

Table 3.3 presents the complexity of the Viterbi decoder for code rate 1/3. For data rates from 19.2 kbps to 76.8 kbps, the complexity is essentially equal at a given memory length and number of quantization bits. For hard decisions, the complexity at the 9.1 Mbps data rate is about twice the complexity at the low data rates. Using TTL at the 9.1 Mbps data rate results in an extremely large ROM in the branch metric processor; hence, for small K, the ROM dominates the complexity. An alternate approach is to use MECL II and compute the branch metrics with 2 adders. To compare the complexity of a design using MECL II with a design using TTL, a weighting factor must be applied to reflect the differences in parts cost, design cost, and packaging cost. The author feels that a reasonable weighting factor is three for MECL II versus TTL. It should be pointed out that this is a subjective judgment on the part of the author and should not be considered exact; as the costs of TTL and MECL II vary and as new MSI circuits are announced, this relationship will change. The weighting factor is applied to the branch metric processor and the arithmetic unit, giving a weighted complexity for the decoder of:

$$\text{MECL II decoder complexity}\Big|_{n=3,\,Q=3} = 273 + 2^K\Bigl(18\left\lceil \log_2\!\bigl(nK(2^Q-1) + 2S_m\bigr) \right\rceil + 17 + 10K\Bigr) \tag{3.25}$$

TABLE 3.3. Complexity of the Viterbi Maximum Likelihood Decoder for Code Rate 1/3

| Quantization Q (bits) | Memory Length K | Data Rate (kbps) | Weighted Complexity Bits, TTL | Weighted Complexity Bits, MECL II |
|---|---|---|---|---|
| 1 | 3 | 19.2 | 292 | -- |
| 1 | 3 | 38.4 | 292 | -- |
| 1 | 3 | 57.6 | 292 | -- |
| 1 | 3 | 76.8 | 292 | -- |
| 1 | 3 | 9.1 × 10^3 | 707 | -- |
| 1 | 8 | 19.2 | 13,421 | -- |
| 1 | 8 | 38.4 | 13,421 | -- |
| 1 | 8 | 57.6 | 13,448 | -- |
| 1 | 8 | 76.8 | 13,454 | -- |
| 1 | 8 | 9.1 × 10^3 | 31,659 | -- |
| 3 | 3 | 19.2 | 367 | -- |
| 3 | 3 | 38.4 | 367 | -- |
| 3 | 3 | 57.6 | 367 | -- |
| 3 | 3 | 76.8 | 367 | -- |
| 3 | 3 | 9.1 × 10^3 | 17,081 | 1,657 |
| 3 | 8 | 19.2 | 14,240 | -- |
| 3 | 8 | 38.4 | 14,240 | -- |
| 3 | 8 | 57.6 | 14,282 | -- |
| 3 | 8 | 76.8 | 14,297 | -- |
| 3 | 8 | 9.1 × 10^3 | 51,009 | 61,969 |
Table 3.3 illustrates the weighted complexity of the MECL II decoder and the TTL decoder at the 9.1 Mbps data rate. There is an order of magnitude decrease in complexity using MECL II rather than TTL for K = 3, but little difference between the two implementations for K = 8; in fact, the TTL decoder would have a relatively smaller cost. At the 9.1 Mbps data rate, the decoder for code rate 1/3 and three-bit quantization has about three times the complexity of the low data rate decoder.
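The MECL II versus TTL comparison can be reproduced from equation (3.25). A minimal sketch, assuming S_m = 16 for n = 3 and Q = 3, which is inferred from the constant 32 = 2S_m in equations (3.21) and (3.22) rather than stated in this section; with that value the function returns the Table 3.3 MECL II entries 1,657 and 61,969. The function name is hypothetical.

```python
def ceil_log2(x: int) -> int:
    """Exact ceil(log2 x) for an integer x >= 1."""
    if x < 1:
        raise ValueError("ceil_log2 needs x >= 1")
    return (x - 1).bit_length()


def mecl_weighted(K: int, s_m: int = 16) -> int:
    """Eq. (3.25): weighted MECL II decoder complexity for n = 3, Q = 3.

    s_m = 16 is an assumed default (2*S_m = 32 in eqs. (3.21)-(3.22)).
    """
    n, Q = 3, 3
    return 273 + 2 ** K * (18 * ceil_log2(n * K * (2 ** Q - 1) + 2 * s_m) + 17 + 10 * K)


ttl_9_1_mbps = {3: 17_081, 8: 51_009}  # Table 3.3, Q = 3, TTL column at 9.1 Mbps
for K, ttl in ttl_9_1_mbps.items():
    mecl = mecl_weighted(K)
    print(f"K={K}: MECL II={mecl}, TTL={ttl}, TTL/MECL II={ttl / mecl:.2f}")
```

The ratio is roughly 10 at K = 3 and below 1 at K = 8, which is the order-of-magnitude advantage and the reversal described above.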
Figure 3.11 compares complexity as a function of the ratio of signal energy per bit to single-sided noise spectral density (E_b/N_0) required to obtain an output probability of error per bit of 10^-4. At K = 8, three-bit quantization gives a gain of about 2 dB in E_b/N_0 over hard decisions on the received symbols for the same complexity. Also, for code rate 1/2, the lower data rates give a gain in E_b/N_0 of about 0.3 dB over the 9.1 Mbps data rate for the same complexity, except for K = 3, where the gain increases to 0.8 dB because of the large ROM needed in the branch metric processor at 9.1 Mbps. For code rate 1/3 and low data rates, there is a gain of about 0.3 dB in E_b/N_0 with K ≥ 4 over code rate 1/2 for the same decoder complexity. This is also true at the 9.1 Mbps data rate with hard decisions on the received symbols. However, for three-bit quantization at the 9.1 Mbps data rate, code rate 1/3 gives a gain of only about 0.1 dB for K = 6 and about 0.2 dB for K = 7 and 8 over code rate 1/2 for the same decoder complexity. Thus, for K < 6, there is little utility in using code rate 1/3 instead of code rate 1/2, since the same performance can be obtained with the same decoder complexity using code rate 1/2.
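The gains in the paragraph above are quoted in decibels. As a small aid to interpretation, the sketch below converts each quoted gain into the linear factor by which the required E_b/N_0 is reduced; the labels are paraphrases of the text, and the conversion is the standard 10^(dB/10) relation.

```python
def db_to_ratio(gain_db: float) -> float:
    """Convert a gain in dB to a linear power ratio."""
    return 10 ** (gain_db / 10)


quoted_gains = [
    ("three-bit vs. hard decisions, K = 8", 2.0),
    ("low rates vs. 9.1 Mbps, K = 3", 0.8),
    ("low rates vs. 9.1 Mbps, rate 1/2", 0.3),
    ("rate 1/3 vs. rate 1/2, three-bit, 9.1 Mbps, K = 7, 8", 0.2),
    ("rate 1/3 vs. rate 1/2, three-bit, 9.1 Mbps, K = 6", 0.1),
]
for label, gain in quoted_gains:
    print(f"{label}: {gain} dB -> factor {db_to_ratio(gain):.3f}")
```

For example, the 2 dB gain from three-bit quantization corresponds to reducing the required E_b/N_0 by a factor of about 1.58.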
[Figure 3.11 plot not reproduced; recoverable labels: panel "Code Rate 1/3", horizontal axis E_b/N_0 (dB) from 2.0 to 7.0.]
Figure 3.11 Comparison of Complexity Versus Performance at Output Probability of Error per Bit of 10^-4
