A software simulation study of the long constraint length VLSI Viterbi decoder by Pollara, F. & Arnold, S.
TDA Progress Report 42-94
N89-10203
April-June 1988
A Software Simulation Study of the Long Constraint
Length VLSI Viterbi Decoder
S. Arnold and F. Pollara
Communications Systems Research Section
A software simulation of long constraint length Viterbi decoders has been developed.
This software closely follows the hardware architecture that has been chosen for the
VLSI implementation. The program is used to validate the design of the decoder and to
generate test vectors for the VLSI circuits.
I. Introduction
Convolutional codes have been used on deep space probes
for several years. During the last few years, TDA Advanced
Systems undertook a research effort [1] to develop advanced
coding techniques capable of gaining an additional 2 dB over
the present performance of deep space missions. Current cod-
ing systems are based on a K = 7, r = 1/2 convolutional code
concatenated with an 8-bit (255,223) Reed-Solomon (RS)
code, where K is the constraint length and r is the code rate.
The main result of this research effort was the discovery of
new convolutional codes with K = 15 and r = 1/6 which exceed
the 2-dB goal when concatenated with a 10-bit (1023,959) RS
code. Recently, the delay imposed on the Galileo mission
introduced the possibility of including a K = 15, r = 1/4 code
in this mission. This experimental code [2] will gain approxi-
mately 1.5 dB over the current NASA-standard code. The
Galileo experiment, together with the potential offered by
these coding gains for future missions, has led to an effort to
build a VLSI-based Viterbi decoder capable of decoding codes
with K up to 15 and r = 1/n, n = 2,3,4,5,6, at speeds approaching
1 Mbit/s.¹
II. Decoder Architecture
The complexity of a Viterbi decoder depends mainly on the
constraint length K, since the number of states is $2^{K-1}$. The
decoder for the new K = 15 codes is approximately 256 times
more complex than the current MCD (Maximum-likelihood
Convolutional Decoder) used in the DSN stations to decode
K = 7 codes. The requirement on the information data rate
forces the use of heavily parallel architectures.
¹ J. Statman, "Preliminary Design Review for Big Viterbi Decoder
(BVD)," JPL internal document, March 31, 1988.
After evaluation of several design alternatives, it was decided
to use a fully parallel architecture consisting of $2^{K-2}$ = 8192
physical butterflies operating in parallel. Each butterfly uses
bit-serial arithmetic to perform the internal operations of
add-compare-select, since this is more suitable to fast VLSI
circuits, and represents the metrics as 16-bit numbers. Each
butterfly contains two states of the decoder and outputs two
decision bits to the trace-back memory. The 8192 butterflies
are organized in identical VLSI chips containing 32 butterflies
each, and in 16 identical boards containing 16 chips each [3].
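The sizing figures quoted in this section follow directly from the constraint length. The short sketch below recomputes them; the function names are hypothetical and chosen only for illustration, while the formulas ($2^{K-1}$ states, $2^{K-2}$ butterflies, 32 butterflies per chip, 16 chips per board) are the ones stated in the text.

```c
#include <stdio.h>

/* Hypothetical helper names; only the formulas come from the text. */

/* Number of trellis states for constraint length K: 2^(K-1). */
unsigned long num_states(int K)
{
    return 1UL << (K - 1);
}

/* Each butterfly holds two states, so there are 2^(K-2) butterflies. */
unsigned long num_butterflies(int K)
{
    return 1UL << (K - 2);
}

/* Chips needed when each chip holds per_chip butterflies. */
unsigned long num_chips(int K, unsigned long per_chip)
{
    return num_butterflies(K) / per_chip;
}

/* Print the sizing figures for a given K (32 butterflies per chip,
   16 chips per board, as stated in Section II). */
void print_sizing(int K)
{
    unsigned long chips = num_chips(K, 32);
    printf("K=%d: %lu states, %lu butterflies, %lu chips, %lu boards\n",
           K, num_states(K), num_butterflies(K), chips, chips / 16);
}
```

Calling print_sizing(15) reproduces the 8192 butterflies and 16 boards given above, and the ratio num_states(15) / num_states(7) gives the factor of 256 in complexity relative to the K = 7 decoder.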
The concern was to develop a software simulation of the
complete decoder so that (1) several new design ideas could be
tested and validated; and (2) test vectors could be generated
for signals at various key points in the decoder and then used
to test the VLSI design. Given the complexity and the cost of
this project, it was necessary to have a complete software de-
coder that closely emulated the hardware architecture and
demonstrated the validity of the design.
III. Software Decoder
The software decoder consists of a program developed on a
SUN 3/260 workstation and written in C. Since the
program runs on a sequential computer, it scans through the
butterflies in sequential order, while the hardware performs all
these operations in parallel. The decoder is based on the hard-
ware design summarized in Figs. 1 and 2.
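As a rough illustration of the sequential scan, the sketch below shows how one butterfly index maps to the two states it reads and the two states it writes. The mapping mirrors the index arithmetic in the Appendix listing (previous states butt and butt + NB, current states 2*butt and 2*butt + 1); the struct and function names are hypothetical.

```c
/* State indices touched by one butterfly (hypothetical type name). */
struct butterfly_states {
    int prev0, prev1;   /* states whose old metrics are read    */
    int next0, next1;   /* states whose new metrics are written */
};

/* Map butterfly number butt (0 <= butt < NB) to its four state indices.
   NB is the number of butterflies, i.e., half the number of states. */
struct butterfly_states butterfly_index(int butt, int NB)
{
    struct butterfly_states s;
    s.prev0 = butt;
    s.prev1 = butt + NB;
    s.next0 = butt << 1;
    s.next1 = (butt << 1) + 1;
    return s;
}
```

Looping butt from 0 to NB - 1 visits every current state exactly once, which is what lets a sequential program reproduce one parallel hardware step.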
The add-compare-select circuit of Fig. 2 takes the branch
metrics p and q just computed and the previously computed
accumulated metrics m_i0 and m_i1 from states i0 and i1
and generates the updated metrics m_j0, m_j1 and the decision
bits bit0 and bit1, which are stored in the trace-back memory.
This memory is organized in three banks of L bits each, where
L is the path truncation length. Decoded bits are given by the
trace-back performed on the bank containing the "oldest"
decision bits. The detailed operation of the add-compare-select
module is shown in the flow diagram of Fig. 5. The test
for overflow is performed on the output accumulated metric
m_j0 of butterfly number zero. Renormalization occurs if the
two most significant bits of m_j0 are both equal to one. In this
case, the most significant bit of all accumulated metrics is reset
to zero to prevent overflow of the metrics. The decoder described
in this article and its future VLSI implementation can
decode any code with connection vectors $G_i = (x_{i0}, x_{i1}, \ldots, x_{i14})$,
where $x_{ij} \in \{0,1\}$ and $x_{i0} = x_{i14} = 1$. Code search results
[1], [2] show that good codes always meet the constraint of
having a leading and trailing "1" in the connection vectors.
Because of this constraint, only two branch metrics, p and q,
need to be computed. When K < 15, this constraint is no
longer met, but it can be observed that in this case m_j1 is
always equal to m_j0, as shown in [3]. This is accomplished
with the switch in Fig. 2 or the test (K < 15) in Fig. 5.
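The add-compare-select step and the renormalization test described above can be sketched as a pair of small functions. This is a minimal illustration that follows the logic of the Appendix routine; the struct, the function names, and the choice to pass values as arguments rather than through globals are assumptions made for readability.

```c
/* Result of one add-compare-select step (hypothetical type name). */
struct acs_result {
    int mj0, mj1;     /* updated accumulated metrics              */
    int bit0, bit1;   /* decision bits sent to trace-back memory  */
};

/* One butterfly update: add the branch metrics p and q to the old
   metrics mi0 and mi1, keep the smaller sum at each output state. */
struct acs_result acs(int mi0, int mi1, int p, int q, int K)
{
    struct acs_result r;
    int s00 = mi0 + p;
    int s10 = mi1 + q;
    int s01 = mi0 + q;
    int s11 = mi1 + p;

    if (s00 < s10) { r.bit0 = 0; r.mj0 = s00; }
    else           { r.bit0 = 1; r.mj0 = s10; }

    if (s01 < s11) { r.bit1 = 0; r.mj1 = s01; }
    else           { r.bit1 = 1; r.mj1 = s11; }

    /* For K < 15 the two outputs coincide (the switch in Fig. 2). */
    if (K < 15) r.mj1 = r.mj0;
    return r;
}

/* Overflow test on a 16-bit metric: both most significant bits set. */
int needs_renorm(int mj0)
{
    return ((mj0 >> 14) & 3) == 3;
}

/* Renormalization: clear the most significant bit of a 16-bit metric. */
int renormalize(int metric)
{
    return metric & 0x7FFF;
}
```

In the decoder, needs_renorm would be evaluated only on the m_j0 output of butterfly zero, and renormalize would then be applied to every accumulated metric before the next step.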
Figure 1 represents the metric computer module present
in each butterfly. It takes the received symbols in sign and
magnitude representation and computes the two branch
metrics, p and q, as two 16-bit numbers. The register denoted
as LABEL_i is initialized at startup time and contains an
appropriate label for the ith butterfly. The value of this label is
provided by the module encoder, whose operation is described by
the flow diagram of Fig. 3. Here NB represents the total number
of butterflies (8192), i is the index of the current butterfly,
and j the index of encoded symbols e_j. First, the n encoded
symbols e_j are computed for the current butterfly. Then
LABEL_i is just given by the decimal equivalent of the binary
array (e_0, e_1, ..., e_5). The other input to the metric computer
module, r_max, is just the sum of the magnitudes of the
received symbols for each information bit time. Notice that the
diagram in Fig. 1 shows six input received symbols, but it can
be used for any code rate r = 1/n, n = 2,3,4,5,6, by setting the
unused symbols to zero. Figure 4 shows a flow diagram
representing the computations taking place in the software. The
variable j counts the received symbols modulo n.
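The label and branch-metric computations can be illustrated with a short sketch. It assumes, as in the Appendix listing, that each received symbol carries its sign in bit 7 and its magnitude in the low seven bits, and that the label bit e_j is the parity of the butterfly state masked by generator G_j. The function names and argument lists are hypothetical.

```c
/* Parity (modulo-2 sum) of the bits of x. */
static int parity(unsigned x)
{
    int b = 0;
    while (x) {
        b ^= (int)(x & 1u);
        x >>= 1;
    }
    return b;
}

/* Label of butterfly butt: bit j is the encoded symbol e_j obtained by
   masking the butterfly state (butt << 1) with generator G[j]. */
int butterfly_label(int butt, const unsigned *G, int n)
{
    int label = 0;
    int j;
    for (j = 0; j < n; j++)
        label |= parity(((unsigned)butt << 1) & G[j]) << j;
    return label;
}

/* Branch metrics p and q for one information bit time.
   r[j] holds a sign-magnitude symbol: bit 7 = sign, bits 0-6 = magnitude. */
void branch_metrics(const int *r, int n, int label, int *p, int *q)
{
    int rmax = 0;   /* sum of all symbol magnitudes */
    int j;
    *p = 0;
    for (j = 0; j < n; j++) {
        int mag  = r[j] & 0x7F;
        int sign = (r[j] >> 7) & 1;
        int e    = (label >> j) & 1;
        rmax += mag;
        if (e ^ sign)       /* label bit disagrees with received sign */
            *p += mag;
    }
    *q = rmax - *p;
}
```

Because q is obtained as r_max - p, only one accumulation per symbol is needed, which is the point of storing r_max alongside the label.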
IV. Operation of the Software Decoder
Testing a large Viterbi decoder is a complex task, since
some programming errors may be revealed only by particular
input sequences or error patterns. This decoder was first tested
against an existing software decoder for K = 7 and r = 1/2,
which has been extensively used in the past. After it was ascer-
tained that the two programs had identical behavior, the new
program was tested with various other codes, and it also per-
formed according to expectations.
To run the program, which is reproduced in the Appendix,
the user must enter K, the inverse n of the rate, the path trun-
cation length L, and the generator polynomials in octal. Also,
the names of the files used to get the input received symbols
and to write the decoded bits must be provided. Currently, the
output consists of the decoded information bits and is written
to a disk file. The test signals to be used for future testing of
the VLSI circuits can be obtained by inserting print statements
anywhere desired in the program.
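As a small illustration of the generator polynomial input, the sketch below parses an octal string and left-aligns it to 15 bits when K < 15, which is the adjustment the Appendix program applies after reading each polynomial. The function name is hypothetical, and the example values 171 and 133 (octal) are used only as sample input for a K = 7, r = 1/2 code.

```c
#include <stdlib.h>

/* Parse a generator polynomial given in octal and, for K < 15,
   shift it left by (15 - K) so that it is aligned to 15 bits. */
unsigned aligned_generator(const char *octal, int K)
{
    unsigned g = (unsigned)strtoul(octal, NULL, 8);
    if (K < 15)
        g <<= (15 - K);
    return g;
}
```

For example, aligned_generator("171", 7) shifts the 7-bit polynomial left by 8 places, so a K = 7 code occupies the upper bits of the 15-bit connection vector.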
References
[1] J. H. Yuen and Q. D. Vo, "In Search of a 2-dB Coding Gain," TDA Progress Report
42-83, vol. July-September 1985, Jet Propulsion Laboratory, Pasadena, California,
pp. 26-33, November 15, 1985.
[2] S. Dolinar, "A New Code for Galileo," TDA Progress Report 42-93, vol. January-
March 1988, Jet Propulsion Laboratory, Pasadena, California, pp. 83-96, May 15,
1988.
[3] O. Collins, "Techniques for Long Constraint Length Viterbi Decoders," Intern. Sym-
posium on Information Theory, Kobe, Japan, p. 28, June 1988.
[Figure: inputs are LABEL_i and the sign bits and magnitudes of the received symbols.]
Fig. 1. Metric computer hardware diagram
[Figure: outputs are m_j0, m_j1, bit0, and bit1. C = comparator. The switch is UP for K < 15 and DOWN for K = 15. The overflow test is used only for butterfly 0.]
Fig. 2. Add-compare-select hardware diagram
[Figure: for each butterfly i and each symbol j, set e_j = 0 and MASK = G_j AND i; while MASK is nonzero, e_j = e_j XOR MASK and MASK is shifted right by one bit; then LABEL_i = sum over j = 0 to n-1 of e_j 2^j.]
Fig. 3. Encoder flow diagram
[Figure: start with p = 0 and r_max = 0; for each symbol j, take the magnitude and SIGN of r_j and e = bit j of LABEL_i, add the magnitude to r_max, and add it to p if e XOR SIGN = 1; finally q = r_max - p.]
Fig. 4. Metric computer flow diagram
[Figure: initialize local variables; s00 = m_i0 + p, s10 = m_i1 + q, s01 = m_i0 + q, s11 = m_i1 + p; select m_j0 = s00 with bit0 = 0 or m_j0 = s10 with bit0 = 1; select m_j1 = s01 with bit1 = 0 or m_j1 = s11 with bit1 = 1; if K < 15, m_j1 = m_j0; on overflow, reset to 0 the MSB of all metrics.]
Fig. 5. Add-compare-select flow diagram
Appendix
#include <stdio.h>
/*********************************** VLSI ***********************************/
/*                                                                          */
/* This program simulates the long constraint length VLSI Viterbi decoder.  */
/* It allows the user to decode convolutional codes with constraint length  */
/* up to 15 and code rate 1/2 to 1/6.                                       */
/*                                                                          */
int n;                      /* rate = 1/n */
int L;                      /* buffer length */
int p, q;                   /* branch metrics */
int n_tb;                   /* traceback addresses */
int time;                   /* traceback time */
int butt;                   /* loop counter */
int NS;                     /* number of states */
int NB;                     /* number of butterflies */
int k;                      /* constraint length */
int dec;                    /* decoding bank */
int mi0, mi1, mj0, mj1;     /* accumulated metrics */
int blk_time;               /* time in traceback */
int bit_no;                 /* number of symbols decoded */
int GP[6];                  /* generator polynomials */
int out[100];               /* temp storage for decoded bits */
int outr[100];              /* storage for decoded bits */
int LABEL[8192];            /* butterfly labels */
int metric[16384], old_metric[16384];   /* accumulated metric storage */
char flag;                  /* renormalization flag */
char bit0, bit1;            /* decision bits */
char RAM[16384][100][3];    /* traceback RAM */
main()
{
    int tb;                         /* traceback (tb) bank */
    int Mo;                         /* parameter to calculate memory size */
    int bank;                       /* loop counter */
    int blk_no;                     /* number of blocks decoded */
    int blk_par;                    /* block parity (0 or 1) */
    int n_dec;                      /* addresses of decoded bits */
    int state;                      /* loop counter */
    int symbol_no;                  /* loop counter */
    int state0, state1;             /* current states of butterfly */
    int prev_state0, prev_state1;   /* previous states of butterfly */
    int recsym[6];                  /* received symbols (8-bit) */
    char decinp[10], decout[10];    /* input and output files of decoder */
    FILE *fp1, *fp2;                /* input/output file pointers */
    printf ("The simulation can decode binary data with a constraint length");
    printf ("k <= 15 and code rate of 1/2 to 1/6");
    printf ("Enter constraint length k");
    scanf ("%d", &k);
    printf ("Enter number of symbols n (2-6)");
    scanf ("%d", &n);
    printf ("Enter length of traceback buffer L");
    scanf ("%d", &L);
    for (symbol_no = 0; symbol_no < n; symbol_no++) {
        printf ("Enter generating polynomial GP[%d] in OCTAL > ", symbol_no);
        scanf (" %o", &GP[symbol_no]);
        if (k < 15) GP[symbol_no] <<= (15 - k);
    }
    printf ("Enter binary input filename");
    scanf (" %s", decinp);
    printf ("Enter output filename that will contain decoded bits");
    scanf (" %s", decout);
    fp1 = fopen (decinp, "r");      /* open file of received symbols */
    fp2 = fopen (decout, "w");      /* open file for decoder output */
    bit_no = 0;                     /* set bit counter to zero */
    symbol_no = 0;                  /* set symbol counter to zero */
    flag = 0;                       /* set renormalization flag to zero */
    n_tb = 0;                       /* set starting tb addr. to zero */
    Mo = 14;
    NS = 01 << Mo;                  /* number of states */
    NB = NS/2;                      /* number of butterflies */
    /* set storage of decoded bits to zero */
    for (time = 0; time < L; time++) out[time] = 0;
    /* initialize metrics, accumulated metrics, and traceback RAM to zero */
    for (state = 0; state < NS; state++) {
        metric[state] = 0;
        old_metric[state] = 0;
        for (time = 0; time < L; time++)
            for (bank = 0; bank < 3; bank++)
                RAM[state][time][bank] = 0;
    }
    /* generate the labels that are assigned to the butterfly */
    encoder();
    /* receive data bits and enter decoder loop */
    while ((recsym[symbol_no] = getc (fp1)) != EOF) {
        symbol_no++;
        if (symbol_no == n) {
            symbol_no = 0;
            /* check value of flag to determine whether to renormalize accumulated metrics */
            if (flag == 0)
                for (state = 0; state < NS; state++)
                    old_metric[state] = metric[state];
            else {
                for (state = 0; state < NS; state++)
                    old_metric[state] = metric[state] & 077777;  /* clear MSB */
                flag = 0;
            }
            blk_time = bit_no % L;
            /* check to see if new traceback must be started */
            if (blk_time == 0) {
                blk_no = bit_no / L;
                blk_par = blk_no % 2;
                tb = blk_no % 3;
                dec = (tb + 1) % 3;
                n_dec = n_tb;
                n_tb = 0;
                for (time = 0; time < L; time++) outr[time] = out[time];
            }
            /* determine whether to move left or right through traceback memory */
            if (blk_par == 0) time = L - blk_time - 1;
            else time = blk_time;
            /* generate the addresses for the decoded bits and traceback */
            n_dec = (n_dec >> 1) | (NB * RAM[n_dec][time][dec]);
            n_tb = (n_tb >> 1) | (NB * RAM[n_tb][time][tb]);
            out[blk_time] = (n_dec >> 5) & 01;   /* extract decoded bits */
            /* Generate branch metrics associated with new received symbol, add to */
            /* existing accumulated metrics, determine smallest accumulated metric */
            /* at current state, and output decision bits to traceback memory */
            for (butt = 0; butt < NB; butt++) {
                /* compute the two current states of butterfly and their associated previous states */
                state0 = butt << 1;
                state1 = state0 + 1;
                prev_state0 = butt;
                prev_state1 = prev_state0 + NB;
                metric_comp (recsym);           /* call metric computer */
                mi0 = old_metric[prev_state0];
                mi1 = old_metric[prev_state1];
                add_comp_select ();             /* call add, compare, and select */
                metric[state0] = mj0;
                metric[state1] = mj1;
                /* write to traceback RAM the bits at corresponding state of butterfly */
                RAM[state0][time][dec] = bit0;
                RAM[state1][time][dec] = bit1;
            }
            fprintf (fp2, "%d", outr[L - blk_time - 1]);   /* output decoded bits */
            fflush (fp2);
            bit_no++;                           /* increment bit counter */
        }
    }
}
/***************************** METRIC COMPUTER ******************************/
/*                                                                          */
/* This subroutine computes the branch metrics from the "n" received        */
/* symbols.                                                                 */
/*                                                                          */
metric_comp(recsym)
int *recsym;
{
    int symbol_no;      /* loop counter */
    int sum_recsym;     /* maximum branch metric */
    int encoded_bit;    /* one bit of branch label */
    int mag_recsym;     /* received symbol magnitude */
    int sign_recsym;    /* sign of received symbol */
    sum_recsym = 0;
    p = 0;              /* set branch metric to zero */
    for (symbol_no = 0; symbol_no < n; symbol_no++) {
        mag_recsym = recsym[symbol_no] & 0177;          /* keep the seven magnitude bits */
        sign_recsym = (recsym[symbol_no] >> 7) & 01;    /* extract sign bit */
        encoded_bit = (LABEL[butt] >> symbol_no) & 01;  /* strip label bits */
        sum_recsym += mag_recsym;       /* sum all the received symbol magnitudes */
        if ((encoded_bit ^ sign_recsym) == 01) p += mag_recsym;
    }
    q = (sum_recsym - p);
}
/************************* ADD, COMPARE, AND SELECT *************************/
/*                                                                          */
/* Add branch metrics to accumulated metrics. The pair of sums at each      */
/* of the states is compared and the smallest is selected. The output       */
/* of each of these decisions is the smallest accumulated metric at each    */
/* state and the decision bits which are sent to the traceback memory.      */
/*                                                                          */
add_comp_select()
{
    int s00, s10, s01, s11;     /* the accumulated metrics */
    /* add branch metric to accumulated metric */
    s00 = (mi0 + p);
    s10 = (mi1 + q);
    s01 = (mi0 + q);
    s11 = (mi1 + p);
    /* determine smallest metric for present two states of butterfly */
    if (s00 < s10) { bit0 = 0; mj0 = s00; }
    else { bit0 = 1; mj0 = s10; }
    if (s01 < s11) { bit1 = 0; mj1 = s01; }
    else { bit1 = 1; mj1 = s11; }
    /* check constraint length and set output accumulated metrics respectively */
    if (k < 15) mj1 = mj0;
    /* determine if accumulated metrics must be renormalized, if so, set flag */
    if (butt == 0 && (mj0 >> 14) == 3) flag = 1;
}
/********************************* ENCODER **********************************/
/*                                                                          */
/* This subroutine generates the labels for each butterfly by utilizing     */
/* the appropriate generating polynomials.                                  */
/*                                                                          */
encoder()
{
    int butt;               /* loop counter */
    int symbol_no;          /* loop counter */
    int encoded[6];         /* encoded symbols */
    unsigned int masked;    /* the masked state */
    /* encode butterfly labels and do appropriate shifting */
    for (butt = 0; butt < NB; butt++) {
        for (symbol_no = 0; symbol_no < n; symbol_no++) {
            encoded[symbol_no] = 0;
            masked = (butt << 1) & GP[symbol_no];   /* mask the butterfly */
            for ( ; masked > 0; masked >>= 1)
                encoded[symbol_no] ^= masked;       /* sum the bits of butterfly */
        }
        LABEL[butt] = 0;
        for (symbol_no = 0; symbol_no < n; symbol_no++)
            LABEL[butt] |= (encoded[symbol_no] & 01) << symbol_no;
    }
}
