Architecture and implementation considerations of a high-speed Viterbi decoder for a Reed-Muller subcode by Uehara, Gregory T. et al.
NASA-CR-2C212?
https://ntrs.nasa.gov/search.jsp?R=19960049767 2020-06-16T03:28:04+00:00Z
SUMMARY OF RESEARCH PERFORMED FOR
NATIONAL AERONAUTICS AND SPACE ADMINISTRATION
Electrical Engineering Division
Goddard Space Flight Center
on the project entitled
ARCHITECTURE AND IMPLEMENTATION
CONSIDERATIONS OF A HIGH-SPEED VITERBI
DECODER FOR A REED-MULLER SUBCODE
(Progress Report)
Grant Number NAG 5-2938
Principal Investigator: Shu Lin
Co-Principal Investigator: Gregory T. Uehara
Department of Electrical Engineering
University of Hawaii at Manoa
2540 Dole Street, Holmes Hall 483
Honolulu, Hawaii 96822
July 12, 1996
PROGRESS REPORT OF RESEARCH PERFORMED FOR
NATIONAL AERONAUTICS AND SPACE ADMINISTRATION
Electrical Engineering Division
Goddard Space Flight Center
on the project entitled
PROGRESS ON IMPLEMENTATION OF THE SUBTRELLIS IC
Shu Lin, Gregory T. Uehara, Eric Nakamura, and Cecilia W. P. Chu
INTRODUCTION
In this research, we have proposed the (64, 40, 8) subcode of the third-order Reed-Muller (RM) code
to NASA for high-speed satellite communications. This RM subcode can be used either alone or as the
inner code of a concatenated coding system with the NASA standard (255, 223, 33) Reed-Solomon (RS)
code as the outer code to achieve high performance (low bit-error rate) with reduced decoding
complexity. It can also be used as a component code in a multilevel bandwidth-efficient coded modulation
system to achieve reliable bandwidth-efficient data transmission.
This report will summarize the key progress we have made toward achieving our eventual goal of
implementing a decoder system based upon this code.
In previous reports [1,6], we described results from our investigations of the complexities of various
sectionalized trellis diagrams for the proposed (64, 40, 8) RM subcode. We found a specific 8-section
trellis diagram for this code which requires the least decoding complexity and has the potential to
achieve a decoding speed of 600 Mbits per second (Mbps). The combination of a large number of states and
a high data rate will be made possible by a high degree of parallelism throughout the
architecture. This trellis diagram was presented and described in detail in [1]. We then investigated circuit
architectures to determine the feasibility of VLSI implementation of a high-speed Viterbi decoder based
on this 8-section trellis diagram. We made detailed design and feasibility examinations of implementation
approaches for the key blocks. Our key results for block level implementation were presented in [6].
This report focuses on our recent progress and plans for developing the prototype subtrellis
integrated circuit (IC), with particular attention to the design methodology.
1. Summary of Previous Results
We will begin this section with a brief discussion of the system block diagram in which the proposed
decoder is assumed to be operating. Next, we will present some of the results from our architecture
development for a sub-trellis IC which will be the basic building block for a decoder system.
System Block Diagram
A simplified block diagram of a receiver in which the proposed decoder may be used is shown in
Fig. 1. The signal enters the receiver via an antenna and is first amplified by a low noise amplifier (LNA)
before being passed to the 2-PSK demodulator. We assume the functions of carrier and timing acquisition
and gain control are properly performed in the demodulator. The output of the demodulator is sampled at
the correct phase at the symbol rate of 960 MHz. The output of the sampler is converted to the digital
domain by the 3-bit analog-to-digital converter (ADC) for decoding by the Viterbi Decoder block which
follows. Our work currently focuses exclusively on the implementation of the Viterbi Decoder.
[Figure 1 diagram: From Antenna -> LNA -> 2-PSK Demodulator -> sampler (960 MHz) -> 3-bit ADC -> Viterbi Decoder -> Output Bits]
Figure 1 Block diagram of a high speed satellite receiver employing 2-PSK signalling and a Viterbi Decoder.
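The 960 MHz symbol rate follows from the 600 Mbps target and the rate of the (64, 40) code. The short sketch below reproduces that arithmetic. It assumes one code bit per 2-PSK symbol; the function name and the derived raw ADC bit rate are illustrative and do not come from the report.

```python
from fractions import Fraction

N_SYMBOLS = 64     # code length n of the (64, 40, 8) RM subcode
K_INFO_BITS = 40   # information bits k per 64-symbol block
ADC_BITS = 3       # ADC resolution shown in Fig. 1


def symbol_rate_mhz(info_rate_mbps):
    """Symbol rate needed to carry info_rate_mbps, assuming one code bit per 2-PSK symbol."""
    code_rate = Fraction(K_INFO_BITS, N_SYMBOLS)  # 40/64 = 5/8
    return Fraction(info_rate_mbps) / code_rate


rate = symbol_rate_mhz(600)
print("symbol rate (MHz):", rate)                      # 960
print("raw ADC output (Mbit/s):", rate * ADC_BITS)     # 2880, a derived figure
```

Exact fractions are used so that the 5/8 code rate does not introduce rounding error.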
Summary of System Level Architecture Design
In our earlier reports [1,6], we described in detail the different ways in which parallelism can be
utilized to decode the (64, 40) RM code. We provide a brief summary of these descriptions in this
section.
There are many issues at different levels of the design that must be considered in order to
decode the (64, 40) RM code at a rate of 600 Mbits/sec. Fig. 2 illustrates the layers of
hierarchy associated with the proposed implementation. First, there are N parallel decoders, each
operating on a different independent block of 64 symbols. Given a decoder which can decode a 64-symbol
block at a certain rate, using N decoders and having each operate on a different block of 64 symbols
allows a throughput N times greater. Second, each decoder is implemented with K parallel isomorphic
subtrellises. As described in [5], the trellis for an RM code can be decomposed into parallel isomorphic
subtrellises that are connected only at the inputs and outputs, as shown conceptually in Fig. 2 with K
parallel subtrellises. This is a tremendous advantage for IC implementation because it minimizes the
amount of routing required within the trellis, which would otherwise be unrealizable at high speed for
applications requiring large numbers of states. This is the key which makes an implementation using
CMOS ICs at such a high rate and complexity possible. Third, there are a number of parameters
associated with the implementation of each of the K subtrellises. The first is the number of sections in the
subtrellis, denoted L. The next is the number of states at the end of each section i (i = 1, 2, ..., L), denoted
|S_i|, which will generally not be the same for every section. Finally, there is the radix of each section,
denoted R_i for section i. As the number of sections L decreases, the complexity of each section and the
number of parallel branches per section increase. These trade-offs are discussed in detail in [1].
[Figure 2 diagram: (a) Viterbi Decoders 1 through N in parallel between Input and Output; (b) Subtrellises 1 through K in parallel, each spanning 64 symbols in Sections 1 through L; (c) a single subtrellis spanning 64 symbols, with number of states S_i and radix R_i in Section i]
Figure 2 Levels of hierarchy in the proposed Viterbi decoder implementation. (a) Parallel Viterbi
decoders operating on different blocks of data. (b) Implementation with K parallel isomorphic
subtrellises. (c) Subtrellis implementation.
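The first level of parallelism in Fig. 2 can be illustrated with a small sketch that splits a symbol stream into 64-symbol blocks and hands them to N decoders. The round-robin assignment and the function name are assumptions made for illustration; the report states only that each decoder operates on a different independent block.

```python
def dispatch_blocks(symbols, n_decoders, block_len=64):
    """Split a symbol stream into blocks and assign them round-robin to decoders.

    Returns one queue per decoder; each entry is (block_index, block_symbols).
    Round-robin ordering is an assumption, not taken from the report.
    """
    if len(symbols) % block_len:
        raise ValueError("stream length must be a multiple of the block length")
    queues = [[] for _ in range(n_decoders)]
    for start in range(0, len(symbols), block_len):
        block_index = start // block_len
        block = symbols[start:start + block_len]
        queues[block_index % n_decoders].append((block_index, block))
    return queues


# Four blocks shared between N = 2 decoders.
stream = list(range(4 * 64))
for decoder_id, queue in enumerate(dispatch_blocks(stream, n_decoders=2)):
    print(decoder_id, [index for index, _ in queue])
```

Because the blocks are independent, each decoder only needs to sustain 1/N of the total throughput.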
After examining various combinations of N, K, L, S, and R, we settled on a solution
with the detailed structure shown in Fig. 3. We call this structure Trellis 2, and each Viterbi Decoder of
Fig. 2b will have K equal to 32. In this solution, our design goal is to meet the speed objectives in a
currently available CMOS technology with N equal to 2.
(Source)                                                  (Destination)
Section:        1     2     3     4     5     6     7     8
No. States:    64    64    64     8    64    64    64     1
Radix:          8     8     8    64     8     8     8    64
Figure 3 Detailed subtrellis structure for Trellis 2.
The key to the implementation of a (64, 40) RM decoder will be the successful implementation of an
IC that realizes the subtrellis shown in Fig. 3.
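To give a rough feel for the add-compare-select workload implied by such a structure, the sketch below counts additions and two-way comparisons per section. The per-section state counts and radices are an assumed reading of Fig. 3, and the cost model (one addition per incoming branch, radix minus one comparisons per state, parallel branches ignored) is a simplification introduced here rather than the report's own complexity measure.

```python
# (states at end of section, radix) per section: an ASSUMED reading of Fig. 3.
TRELLIS_2 = [(64, 8), (64, 8), (64, 8), (8, 64), (64, 8), (64, 8), (64, 8), (1, 64)]
K_SUBTRELLISES = 32  # K from the text


def acs_workload(sections):
    """Return (section, additions, comparisons) under a simple cost model."""
    rows = []
    for index, (states, radix) in enumerate(sections, start=1):
        additions = states * radix          # one add per branch entering a state
        comparisons = states * (radix - 1)  # a radix-way compare as two-way compares
        rows.append((index, additions, comparisons))
    return rows


rows = acs_workload(TRELLIS_2)
for index, additions, comparisons in rows:
    print(f"section {index}: {additions} adds, {comparisons} compares")

adds_per_subtrellis = sum(additions for _, additions, _ in rows)
print("adds per subtrellis per block:", adds_per_subtrellis)
print("adds per decoder per block:", adds_per_subtrellis * K_SUBTRELLISES)
```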
The key objectives of the subtrellis IC implementation are to:
1. Maximize efficiency, measured as the utilization of the hardware (in
other words, minimize the time during which the majority of the hardware is
idle).
2. Use a chip plan which minimizes the area used for routing (routing area is simply an
overhead which should be minimized).
3. Approach the speed of 600 Mbits/sec with 2 parallel decoders.
4. Consider reliability and robustness issues. In particular, use the lowest speed system
clock which still allows high speed operation, in order to reduce the number of
issues which can limit the performance (in this case, clock skew
between chips or race conditions both within and between the different ICs).
5. Consider the board design and the numbers of inputs and outputs to each chip to
facilitate implementation of the final decoder system.
6. Keep the size of the IC on the order of 10 mm per side to facilitate its implementation
and yield for testing.
2. Recent Results
Our recent efforts have focused on the design and development of the prototype IC. The goal of this
portion of the project is to design and lay out the circuits in a computer-aided environment to create a
database which can be used by a fabrication facility to generate the masks needed to fabricate the
prototype IC.
Initial Design Procedure
The design procedure for development of the prototype IC we have used to date is as follows:
1. Block Level Design -- Define the functional performance of the major blocks.
2. Timing Diagram Design -- Define the flow of data through the chip based upon the
design of the major blocks.
3. Circuit Design -- Design circuits first at the gate level and then the transistor level
depending upon the particular transistor logic style (complementary, pass-gate,
dynamic, etc.) to perform the desired functions at the desired speed.
4. Circuit Layout -- Create a full custom layout by hand which defines the location and
size of the geometries which become the mask set for fabricating the chip.
5. Verification -- Verify that the layout performs the intended circuit functions
by extracting the connectivity and transistor geometry information and using the
resulting file as part of the input control file for a circuit simulator, namely SPICE.
6. Full-Chip Layout -- Repeat Steps 3 - 5 outlined above for each of the cells which make
up the sub-blocks which are then connected to make up the major blocks. Following
this, perform these steps again as the major blocks are connected and then verified.
Finally, verify the functionality of the entire layout.
7. Send a layout file for fabrication to an appropriate facility.
Chip Plan -- Block Level Overview
An outline of the overall block plan is shown in Fig. 4. The Clock Generation and Control block will
generate the clock phases needed to clock the chip. Input data will enter the Branch Metric Unit (BMU),
which will generate the branch metrics for the Add-Compare-Select Unit (ACSU). The outputs of the ACSU
include the winning path metrics and the winning branch labels. These are input to the Decoder,
which determines the most likely path through the subtrellis for the 64-symbol block. Pipelining is used
extensively within the BMU, ACSU, and the Decoder. Due to the use of block processing, we currently
plan to run the input clock to the chip at 60 MHz. The design goal is to have each IC process
data at a 480 Msymbol/sec rate (300 Mbps).
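The per-IC figures can be checked with a few lines of arithmetic. The sketch below converts the 480 Msymbol/sec rate to an information rate and computes how many symbols arrive per 60 MHz clock cycle. Reading the result as one 8-symbol trellis section per clock cycle is an assumption made here (it presumes eight equal-length sections); the report does not state it.

```python
from fractions import Fraction


def info_rate_mbps(symbol_rate_msym, n=64, k=40):
    """Information rate carried by a rate-k/n code at the given symbol rate."""
    return Fraction(symbol_rate_msym) * Fraction(k, n)


def symbols_per_clock(symbol_rate_msym, clock_mhz):
    """Average number of symbols each IC must absorb per input clock cycle."""
    return Fraction(symbol_rate_msym, clock_mhz)


print("info rate per IC (Mbps):", info_rate_mbps(480))       # 300
print("symbols per 60 MHz clock:", symbols_per_clock(480, 60))  # 8
# Assumption: 64 symbols / 8 sections = 8 symbols per section.
print("clock cycles per 64-symbol block:", 64 / symbols_per_clock(480, 60))  # 8
```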
[Figure 4 diagram: Input Clock (60 MHz) -> Clock Generation and Control; Input Data -> Branch Metric Unit (BMU) -> Add-Compare-Select Unit (ACSU) -> Decoder -> Output Data (300 Mbits/sec), to further processing]
Figure 4 (a) Block diagram of the IC being developed to implement a subtrellis.
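The data flow from the ACSU to the Decoder can be sketched in software as an add-compare-select pass over the sections followed by a traceback. This is a minimal behavioral sketch, not the circuit described in this report: the data layout, the function names, and the choice to minimize a distance-like metric are all assumptions, and the branch metrics are taken as given rather than computed by a BMU.

```python
def acs_section(path_metrics, branches):
    """One add-compare-select step over a trellis section.

    path_metrics: metrics of the states at the start of the section.
    branches: for each destination state, a list of
              (source_state, branch_label, branch_metric) tuples.
    Returns (new_metrics, survivors), where survivors[d] is the winning
    (source_state, branch_label) for destination state d.
    """
    new_metrics, survivors = [], []
    for incoming in branches:
        # Add: extend every incoming path by its branch metric.
        candidates = [(path_metrics[src] + bm, src, label) for src, label, bm in incoming]
        # Compare and select: keep the smallest accumulated metric (assumed convention).
        best_metric, best_src, best_label = min(candidates, key=lambda c: c[0])
        new_metrics.append(best_metric)
        survivors.append((best_src, best_label))
    return new_metrics, survivors


def decode_block(initial_metrics, sections):
    """Run ACS over all sections, then trace back the winning branch labels."""
    metrics, history = initial_metrics, []
    for branches in sections:
        metrics, survivors = acs_section(metrics, branches)
        history.append(survivors)
    state = min(range(len(metrics)), key=metrics.__getitem__)
    best_metric = metrics[state]
    labels = []
    for survivors in reversed(history):
        state, label = survivors[state]
        labels.append(label)
    labels.reverse()
    return labels, best_metric


# Toy two-section trellis: 1 start state -> 2 states -> 1 final state.
toy = [
    [[(0, "a", 1)], [(0, "b", 3)]],
    [[(0, "c", 5), (1, "d", 1)]],
]
print(decode_block([0], toy))
```

In the toy trellis the path through labels "b" and "d" accumulates a metric of 4 and beats the path through "a" and "c", which accumulates 6.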
Using the Initial Design Procedure outlined above, we came up with an initial functional design for
the entire chip. We completed Steps 1 - 3 for the major blocks shown in Fig. 4. We then developed a layout
for an 8-Way ACS Cell, which will be the basic building block in the ACSU, and for other key building blocks
which will be repeated due to the modularity of the design. This allowed us to develop the estimates for
the chip layout of a subtrellis IC shown in Table 1. Layouts and simulations assumed the use of a 0.6 µm
double-metal CMOS technology. With pads and routing, a conservative estimate of the die size in this
technology is 1.2 mm x 1.2 mm.
Table 1. Estimates Using Initial Design Procedure
Block                          Complexity (transistors)   Layout Size
Clock Generation and Control   1,000                      500 µm x 1,000 µm
Branch Metric Unit                                        1,000 µm x 1,000 µm
Add-Compare-Select Unit                                   8,000 µm x 11,000 µm
Decoder                        175,000                    2,500 µm x 11,000 µm
Limitations of the Initial Design Procedure
The procedure outlined above was very useful for obtaining estimates of the size and complexity
of each of the blocks. However, it underscored a basic limitation of our Initial Design Procedure, namely
the difficulty of adequately verifying the full-chip layout of a chip with nearly 500,000 transistors. This
procedure was used successfully in other university projects in high speed decoders [2 - 4]. However, the
overall chip complexity in our project has turned out to be significantly greater than that of this previous work.
LSI Logic (San Jose, California) Association, Relationship, and Support
LSI Logic is a company based in San Jose, California focused on integrated circuit development for
high performance communication systems. One of the primary products of LSI Logic is a design
methodology which allows customers to design and develop custom integrated circuits in a systematic and
proven manner. Using this LSI Logic design methodology, customers begin with a set of functional
specifications and end with prototype IC devices in near state-of-the-art CMOS technologies.
Our two students involved with the development of the prototype IC, Eric Nakamura
and Cecilia Chu, are spending this summer at LSI Logic as Temporary Employees. As a result of the
combination of coding and VLSI development research work here at the University of Hawaii, LSI Logic
has started what is planned to be a long term support relationship with our University. In this relationship,
LSI Logic will supply their design methodology and chip fabrication services to the University of Hawaii
in return for research updates (as are available to all companies). While there are other benefits to LSI Logic,
such as the potential for student hires through internship experiences, research updates will probably
reach LSI Logic sooner than is typical because of the established relationship with
faculty and students. In the longer term, LSI Logic plans to support increasing amounts of research in
coding and VLSI development here at the University.
Our students this summer are focusing their efforts on learning the LSI Logic methodology in the
context of a development project for LSI Logic. In the coming
fall, they will bring back to the University the LSI Logic methodology, which we plan to use for
development of the prototype subtrellis IC. The advantages for our current project are three-fold.
First, LSI Logic currently has available a 0.35 µm CMOS technology which would result in nearly
a 4 times reduction in the layout size of a given cell. As a result, we would probably tend toward the use of
standard cells from the LSI Logic family as opposed to a full custom hand design as we had planned. This
takes us to the second point. In the LSI Logic design methodology, functional blocks are described using a
Hardware Description Language (HDL). The HDL is first used to describe a given
function, which can then be simulated and verified. The HDL description can then be used to directly
develop a circuit layout consisting of connected standard cells from a cell library that implement the
described function. This will greatly reduce development time as compared with our Initial Design
Procedure. While the chip area is greater when standard cells are used to implement a given function as
compared with a full custom hand design (see footnote 1), the use of a more aggressive technology can
still result in a decrease in the overall chip size. Third and most important, the LSI Logic design
methodology has been used effectively to develop integrated circuits with more than one million
transistors. This proven method will greatly increase the probability of working devices in our initial
design. It will allow us to verify the circuit layout with much greater confidence than the Initial Design
Procedure.
We expect that the design results from the Initial Design Procedure, which provided an initial design
of the prototype IC through to the circuit level, will be extremely beneficial to our development using the
LSI Logic design methodology. We are aware of the critical blocks and now have points of reference from
both the circuit and layout standpoints with which to compare our new designs. This should result in a
better solution than would have been obtained if either of the design approaches were used
exclusively.
1. Hand design signifies a layout drawn by hand on a computer as opposed to one automatically generated on a computer.
3. Summary
In our recent efforts, we completed the majority of the design down to the circuit level using
what we call our Initial Design Procedure. Through this process, we have come to believe that a prototype
IC which can be used to implement the 600 Mbps decoder is achievable in a 0.6 µm CMOS technology using
the approaches described in our previous report. This summer our two students, Eric Nakamura and
Cecilia Chu, are at LSI Logic learning the LSI Logic design methodology, which they plan to bring back to
the University. We believe the LSI Logic design methodology will result in a circuit layout whose
performance can be better verified prior to fabrication than would be possible using our Initial
Design Procedure. Thus, the initial prototype circuits will have a much greater chance of being functional.
LSI Logic intends to be involved with the fabrication of the prototype IC using their 0.35 µm CMOS
technology. This newly developed association between LSI Logic and the University of Hawaii should
prove to be very useful as we progress in our development of the prototype IC on our way to building a
prototype decoder system.
REFERENCES
[1] H. T. Moorthy, S. Lin, and G. T. Uehara, "On the trellis structure of a (64,40,8) subcode of the
(64,42,8) third-order Reed-Muller code," NASA Report, NAG 5-931, Report No. 95-001, March 1,
1995.
[2] A. K. Yeung and J. M. Rabaey, "A 210 Mb/s radix-4 bit-level pipelined Viterbi decoder," ISSCC 1995
Digest of Technical Papers, San Francisco, CA, Feb. 1995.
[3] P. Black and T. Meng, "A 140 Mb/s 32-state radix-4 Viterbi decoder," ISSCC 1992 Digest of
Technical Papers, San Francisco, CA, Feb. 1992.
[4] P. Black, "Algorithms and architectures for high speed Viterbi decoding," Ph.D. Thesis, Stanford
University, May 1993.
[5] T. Kasami, T. Takata, T. Fujiwara, and S. Lin, "On Branch Labels of Parallel Components of the
L-section Minimal Trellis Diagrams for Binary Linear Block Codes," IEICE Transactions on
Fundamentals of Electronics, Communications, and Computer Sciences, Vol. E76-A, No. 9, pp. 1411-1421,
September 1993.
[6] S. Lin, G. T. Uehara, E. Nakamura, and C. W. P. Chu, "Circuit design approaches for implementation
of a subtrellis IC for a Reed-Muller Subcode," NASA Report, February 20, 1996.
