Method and apparatus for implementing a traceback maximum-likelihood decoder in a hypercube network by Pollara-Bozzola, Fabrizio
United States Patent 
Pollara-Bozzola 
[11] Patent Number: 4,868,830
[45] Date of Patent: Sep. 19, 1989 
[54] METHOD AND APPARATUS FOR
IMPLEMENTING A TRACEBACK
MAXIMUM-LIKELIHOOD DECODER IN A
HYPERCUBE NETWORK
[75] Inventor: Fabrizio Pollara-Bozzola, Los
Angeles, Calif.
[73] Assignee: California Institute of Technology, 
Pasadena, Calif. 
[21] Appl. No.: 86,710 
[22] Filed: Aug. 18, 1987 
Related U.S. Application Data 
[63] Continuation-in-part of Ser. No. 781,224, Sep. 27, 1985, 
Pat. No. 4,730,322. 
[51] Int. Cl.4 .............................................. G06F 11/10
[52] U.S. Cl. ...................................................... 371/43 
[58] Field of Search ....................... 371/43, 37, 44, 38, 
371/45, 39,40; 364/200, 900 
[56] References Cited
U.S. PATENT DOCUMENTS 
4,015,238 3/1977 Davis .................................... 371/43 
4,247,892 1/1981 Lawrence ........................... 364/200 
4,493,082 1/1985 Cumberton ........................... 371/43 
4,500,994 2/1985 McCallister ........................... 371/43 
4,545,054 10/1985 Davis .................................... 371/43 
4,730,322 3/1988 Pollara-Bozzola ................... 371/43
Primary Examiner-Michael R. Fleming 
Attorney, Agent, or Firm-Jackson & Jones 
[57] ABSTRACT
A method and a structure to implement maximum- 
likelihood decoding of convolutional codes on a net- 
work of microprocessors interconnected as an n-dimen- 
sional cube (hypercube). By proper reordering of states 
in the decoder, only communication between adjacent 
processors is required. Communication time is limited 
to that required for communication only of the accumu-
lated metrics and not the survivor parameters of a
Viterbi decoding algorithm. The survivor parameters 
are stored at a local processor's memory and a trace- 
back method is employed to ascertain the decoding 
result. Faster and more efficient operation is enabled, 
and decoding of large constraint length codes is feasible 
using standard VLSI technology. 
18 Claims, 8 Drawing Sheets 
[Cover figure: block diagram of one processor. Recoverable labels: input device and output device, each connected to n neighbors; input sequence; local metric; update metric; accumulated metric from neighbor; compare metrics; selected path; trace-back; output signal to a user. Reference numerals 60, 65, 70, 71, 72, 75, 76.]
https://ntrs.nasa.gov/search.jsp?R=20080012293 2019-08-30T03:54:29+00:00Z
U.S. Patent Sep. 19, 1989 Sheets 1 through 8 of 8 4,868,830
[Drawing sheets. Recoverable labels only:
Sheet 3: first, second, and third stages; decoded output.
Sheet 5: FIGS. 6a and 6b; processors P0 through P7 with S = 1, Nt = 3, No = 3; processors P0,1 through P6,7 with S = 2, Nt = 4, No = 6; S = 4, Nt = 4, No = 12.
Sheet 6: FIG. 7a; states 0 through 7 on processors P0 through P3.
Sheet 7: FIG. 8b; received sequence 00 11 10 11 11 01; decoded sequence 0 1 0 0 1 1.
Sheet 8: FIG. 9; cube 0,Cn and cube 1,Cn (n = 3).]
METHOD AND APPARATUS FOR IMPLEMENTING A TRACEBACK MAXIMUM-LIKELIHOOD DECODER IN A HYPERCUBE NETWORK
BACKGROUND OF THE INVENTION
The invention described herein was made in the performance of work under a NASA Contract and is subject to the provisions of Public Law 96-517 (35 USC 202) in which the contractor has elected not to retain title.
1. Origin of the Invention
This application is a continuation-in-part of an application entitled "Method and Apparatus for Implementing a Maximum-Likelihood Decoder in a Hypercube Network", filed on Sept. 27, 1985, Ser. No. 781,224, now issued as U.S. Pat. No. 4,730,322, and assigned to the same assignee.
2. Field of the Invention
The present invention is concerned with a method for maximum-likelihood decoding of convolutional codes on a network of microprocessors, and apparatus for executing this method.
3. Brief Description of the Prior Art
A concurrent computing system in which [...] computers, each with a computational processor and a message-handling processor, are connected as a cube is described and claimed in an application entitled "Concurrent Computing System [...] Asynchronous Communication Channels," Ser. No. 754,828, filed on [...] 12, and [...] to California Institute of Technology. As there described, N nodes (numbered 0, 1, . . . N-1) are connected together in a binary (or Boolean) n-dimensional cube in which N = 2^n or n = log2 N. The above-identified application depicts one representative hypercube connection of a network of processors in which the processors are located at the vertices of an n-dimensional cube and communicate with each other by bidirectional communication only along the edges of the cube. One manner of transmitting data and representative examples of microprocessor hardware and software suitable for performing the data transfer feature of this invention is fully disclosed in the above-identified application.
Maximum-likelihood decoding of convolutional codes is also well known. The Viterbi decoding algorithm is commonly used for such decoding, and many textbooks such as "Theory and Practice of Error Control Codes" by Richard E. Blahut, Copyright 1983 by Addison-Wesley Publishing Company, Inc. describe the encoding/decoding of convolutional codes. A Viterbi decoding algorithm is conceptualized in the text and a succinct summary of a trellis diagram, stages, states and common steps to obtain accumulated metrics and survivors is described at pages 350 through 353 and 377 through 382 of the text.
Convolutional codes are widely used in digital communication systems to decrease the probability of error on a given noisy channel (coding gain). They are characterized by the constraint length K and by the code rate given by the ratio k0/n0, where n0 channel symbols are generated by the encoder for each k0 input data bits. Details can be found, for example, in the above-noted "Theory and Practice of Error Control Codes," by R. E. Blahut.
Convolutional codes can achieve large coding gain if they use large memory m, or equivalently, large constraint length K = m + k0. An encoder for such codes is a finite-state machine with 2^m states. The complexity of a maximum-likelihood decoder is approximately proportional to the number of states, i.e., it grows exponentially with m.
The task of the decoder is to consider all possible paths on a trellis of about 5m stages, and find the most likely path, according to a specified goodness criterion. The goodness criterion is described in the article "The Viterbi Algorithm," by G. D. Forney, Proc. IEEE, vol. 61 (1973), pp. 263-278.
The storage and update of hypothesized information sequences (survivors) is performed by using a parallel version of the trace-back method, described in the textbook "Error-Correction for Digital Communications," by G. C. Clark and J. B. Cain, ©1981, Plenum Press.
SUMMARY OF THE INVENTION
Multiprocessor systems have the potential to obtain large computation power. This is possible if one can solve the problem of how to decompose the decoding algorithm. There are two key requirements in the problem decomposition: (1) divide the algorithm in equal parts, in order to share equally the resources available in each processor, and (2) minimize the communication between the parts, so that each processor needs to share information only with nearest neighbors.
Since a single microprocessor or VLSI chip cannot accommodate all the functions required to implement complex decoders, methods must be found to efficiently use a network of processors. [...] decoding is accomplished by a network of processors with each processor connected as the edges of an n-dimensional cube.
The most basic embodiment assigns each available processor to each single state of the decoder. The proper reordering of paths and states is obtained by making a given processor act as state $\hat{x}$, where $\hat{x}$ is a function of x and the stage of the trellis, based on cyclic shifts of the binary label representing x.
The method also provides a way to group sets of S = 2^s states into each processor still requiring only communication between neighboring processors. This arrangement yields high computational efficiency for complex codes. The arrangement can accommodate different numbers S = 2^s of the M = 2^m decoder states into each processor, depending on the code complexity and the number of available processors N = 2^n, where M = S × N.
In the aforementioned parent application, now issued as U.S. Pat. No. 4,730,322, each processor on the cube is in direct communication with n neighbors. Each dimension of the cube corresponds to a stage on an n-cube equivalent trellis, and those direct communication paths are shown on the equivalent trellis. Each given processor represents one or more trellis states in accordance with a state-order formula of the invention. Decoding in the parent is provided by transmitting/receiving decoding parameters including both accumulated metrics and survivors along the n-cube communication paths. The amount of information required to communicate survivors between neighbors is several times, in relative terms, that required for transmitting/receiving the accumulated metrics. This continuation-in-part application reduces the communication time between neighboring processors because only the accumulated metrics need be communicated with neighbors. The decoding is performed by tracing-back paths on the cube.
Each processor receives from neighboring processors the goodness measure (accumulated metric) associated with paths previously examined in such processors, adds the goodness measure for the current branch (branch metric), and then stores, in its local memory, the result of the comparison between the goodness of paths under consideration. This stored information is then used to trace-back the most likely path and finally to compute the decoded incoming symbols.
The method of this continuation-in-part application reduces the communication requirements between processors, by eliminating the need for exchanging survivors. Reduced communication requirements enhance efficiency.
FEATURES OF THE INVENTION
It is a feature of the invention to provide a method and apparatus for maximum-likelihood decoding of convolutional codes on a network of microprocessors interconnected as an n-dimensional cube and having each processor compute its own local portion of the goodness criteria, so that ultimately all paths of a trellis diagram in parallel with concurrent computations being achieved by other processors in the cube are determined. This feature results in high efficiency, in terms of decoding speed, and allows use of codes more complex than those presently decodable with known methods.
In accordance with the present invention, a method is provided for decoding convolutional codes, by the accumulated metric and survivor steps of the known Viterbi algorithm, but with the novel and unique difference that decoding operations are performed in parallel by a network of processors placed at the vertices of an n-dimensional cube, and bidirectionally communicating with neighboring processors only along the edges of the cube. This decoding method can operate with high efficiency because the parallel decoding operations concerning each state of the decoder are performed in a suitable and novel order within each processor.
Further provided in accordance with this invention is an arrangement for implementing a novel method, comprising a decoder structure which is characterized by:
(1) means for sending the received channel symbol sequence to all processors in the network;
(2) output means for delivering the decoded, most likely, data sequence to a user;
(3) a network of N = 2^n processors interconnected as an n-dimensional cube, having bidirectional communication along the edges of the cube for receiving/transmitting to neighboring processors accumulated metrics only;
(4) an equivalent trellis which has stages matching the n-dimensions of the n-cube, with the processors representing different states in the equivalent trellis; and
(5) means in each processor to internally perform comparisons between accumulated metrics, and to store the results of such comparisons for trace-back decoding.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a trellis diagram of the Viterbi algorithm, showing all the possible state transitions for an 8-state decoder (M = 2^m = 8, m = 3). Only four stages are shown, but the trellis can be extended as necessary;
FIG. 2 is the hypercube trellis diagram of this invention showing transitions between processors Pi of the hypercube and the states represented by each processor. The trellis can be extended by repeating the m stages shown, to form a periodic structure;
FIG. 3 is an example of an n-dimensional cube (n = 3) network of microprocessors placed at the vertices of the cube, where solid lines show connections used in a given stage;
FIG. 4 is a specific decoder structure where two states are represented by each of the four processors;
FIG. 5 is a block diagram of the internal arrangement of each processor in the network;
FIG. 6, including FIGS. 6a, 6b, and 6c, contains the arrangements for decoders with M = 8 states and S = 1, 2, or 4 states per processor respectively;
FIG. 7a is a decoder structure on a two-dimensional cube for an 8-state (m = 3) code of rate k0/n0 where k0 = 2;
FIG. 7b is a decoder trellis for an 8-state (m = 3) code of rate k0/n0, where k0 = 2;
FIG. 8a is a trellis diagram of a four state decoder showing how paths are eliminated according to the usual Viterbi Algorithm;
FIG. 8b is a trellis diagram for four states showing how paths are eliminated during the decoding operation of this invention; and
FIG. 9 is a structure for a 2M-state decoder formed by joining two M-state decoders, where each decoder is implemented on an n-dimensional cube (n = 3).
DETAILED DESCRIPTION OF THE DRAWINGS
Convolutional codes are well known. Such codes can achieve large coding gain, i.e., improvement on the performance of a digital communication link, if they use large memory m, or equivalently, large constraint length K = m + 1 (in the case when k0 = 1). An encoder for such codes is a finite-state machine with 2^m states. Decoders for convolutional codes can be effectively described in terms of a graph called a trellis diagram, as shown in FIG. 1 for an m = 3, 8-state code.
The left-hand column of FIG. 1 shows all of the possibilities for a three-bit grouping in a finite state coding of three shift register cells, or memory m = 3. In keeping with the Viterbi decoding algorithm, a decoder would examine all eight possibilities by using an iterative trial. Each iteration corresponds to a separate vertical column such as iteration No. 1, No. 2, etc. through 15. The number of iterations in a time sequence, with time being depicted horizontally, is shown in FIG. 1 for 15 iterations. It is a rule of thumb that 15 = 5m iterations will almost always yield the proper result. Obviously more iterations can be tried, but the extra number of iterations normally does not achieve any significantly-improved results.
The task of the decoder is to consider all possible paths on a trellis of about 5m stages, and find the most likely path, according to a specified goodness criterion. The goodness criterion is simply a number of merit which will be explained in more detail hereinafter for a typical simplified example of FIG. 8. Suffice it to say at this point that a received sequence of symbols is viewed by the decoder as simply a trial of hypothesized sequences. As each new received symbol is considered, the weighted values of trellis paths are reviewed by the decoding algorithm and the lowest number (indicating the highest level of goodness) is temporarily chosen as the best possible candidate to match the one that was originally encoded.
The trellis diagram 10 in FIG. 1 is used in the Viterbi algorithm to represent all possible transitions between
states as described in the above-mentioned articles. In 
particular, at each stage all eight states are examined 
and only one of the two possible paths 11, 12 coming 
into a state (such as 000) is preserved, along with its 
goodness or likelihood measure, called an accumulated 
metric. At a given time (stage), such as Time Ti, each 
state is associated with one (and only one) path (path 12, 
for example) reaching it. The algorithm performs the 
following steps: 
(1) Update the value of the accumulated metric of the 
two paths converging into each state, according to a 
known rule. This known rule, described in detail in the 
above-referenced paper by Forney consists, in sum- 
mary, of adding the so-called branch metric, which 
depends on the received symbol, to the accumulated 
metrics of the paths converging into each state; 
(2) Choose the preferred path between those two by 
comparing their metrics; and 
(3) Store the result of said comparison and the metric 
of the chosen path. 
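The three steps above can be sketched for a single state as follows. This is only an illustration: the function name `add_compare_select`, the tie-breaking rule, and the example numbers are assumptions, and the sketch takes the smaller metric to mean the more likely path, in line with the remark that the lowest number indicates the highest goodness.

```python
def add_compare_select(metric_a, branch_a, metric_b, branch_b):
    """One update for the two paths (A and B) converging into a state.

    Returns (decision, new_metric): decision is 0 if path A survives and
    1 if path B survives. Assumes a smaller metric means a better path;
    ties are resolved in favor of path A (an arbitrary choice).
    """
    # Step (1): add the branch metric to each accumulated metric.
    cand_a = metric_a + branch_a
    cand_b = metric_b + branch_b
    # Step (2): choose the preferred path by comparing the two metrics.
    decision = 0 if cand_a <= cand_b else 1
    # Step (3): the caller stores the decision and the surviving metric.
    return decision, min(cand_a, cand_b)


# Hypothetical numbers: path A arrives with metric 3 and branch metric 1,
# path B with metric 2 and branch metric 0, so path B survives.
decision, metric = add_compare_select(3, 1, 2, 0)
print(decision, metric)  # 1 2
```

Only the one-bit decision and the surviving metric leave this step, which is what allows a trace-back decoder to avoid carrying whole survivor sequences.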
Before describing the trellis for the processor/decoder method and apparatus of this invention, a brief review of the simplified diagram of FIG. 3 is believed helpful. The earlier-identified application discloses a complete concurrent processing system in which bidirectional communication links transmit data along the edges of a cube. For simplicity's sake each node, or processor, is shown simply as a dot. In FIG. 3, the bidirectional communication links are shown as solid lines with double-headed arrows 30x through 33x, 35y through 38y, and 40z through 43z, respectively. The letters x, y and z indicate directions of communication in the cube. Associated with the corner locations of the cube network are binary labels which identify the processors. For example, processor P0 is identified by the tribit group 000, P1 by the tribit group 001, etc. Each processor P000 (P0) through P111 (P7) is directly connected only to its n neighbors.
It is desirable to use direct communication links between processors, in order to speed up communication. Bidirectional link 30x delivers bit sequences back and forth between processors P0 and P1 in any well known manner. Likewise, as shown, link 31x is connected between P4 and P5, link 32x is connected between P6 and P7, and link 33x is connected between P2 and P3. In the second stage of FIG. 3, bidirectional link 35y is connected between processors P0 and P2, link 36y is connected between P4 and P6, etc. as is there depicted.
Since complex codes have a very large number of states, it becomes impossible to perform sequentially all the above steps in reasonable time with a single processor. It is therefore desirable to share the work-load among many processors working in parallel.
Processors must be able to communicate among themselves in order to exchange intermediate results. Therefore, the decoding process will include some time spent in computation internal to each processor, and some time for interprocessor communication. It is this latter communication time which must be kept as small as possible to yield decoding efficiency.
The advantage which can be achieved by the invention is the ability to use large constraint length convolutional codes, which yield high coding gain, and to keep acceptable decoding speed with feasible hardware.
This is due to the fact that the efficiency of the method, given by
$$\frac{\text{sequential alg. time}}{N \times (\text{parallel alg. time})} = \frac{N_o t_o}{N_o t_o + N_t t_t}$$
where N_o is the number of parallel metric comparisons, t_o is the comparison time, N_t is the number of parallel metric exchanges, and t_t is the exchange time, remains high even when a large number of processors are used.
While t_o and t_t depend on the hardware technology used for the processors, the method yields
$$N_t = S(m - s) = \frac{Mn}{N}$$
$$N_o = S m = \frac{Mm}{N}$$
so that the efficiency is always above the ratio
$$\frac{t_o}{t_o + t_t}$$
which is reached when N=M.
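As a numerical check of these expressions, the sketch below evaluates N_t = S(m - s) and N_o = S m for the 8-state decoder (m = 3) with S = 1, 2, and 4 states per processor, and the resulting efficiency. The function names and the unit values chosen for the two times are illustrative assumptions; the symbols follow the text as printed.

```python
def decoder_counts(m, s):
    """Return (N, N_t, N_o) for M = 2**m states grouped S = 2**s per processor."""
    M = 2 ** m
    S = 2 ** s
    N = M // S                  # number of processors, from M = S x N
    n_exchanges = S * (m - s)   # N_t: parallel metric exchanges
    n_comparisons = S * m       # N_o: parallel metric comparisons
    return N, n_exchanges, n_comparisons


def efficiency(m, s, t_compare, t_exchange):
    """Efficiency N_o*t_o / (N_o*t_o + N_t*t_t) from the formula in the text."""
    _, n_t, n_o = decoder_counts(m, s)
    return n_o * t_compare / (n_o * t_compare + n_t * t_exchange)


for s in (0, 1, 2):
    N, n_t, n_o = decoder_counts(3, s)
    # Assumed unit times t_o = t_t = 1.0, chosen only for illustration.
    print(2 ** s, N, n_t, n_o, efficiency(3, s, 1.0, 1.0))
```

With equal comparison and exchange times the efficiency is 0.5 when N = M (S = 1) and rises as more states are grouped into each processor, consistent with the stated lower bound.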
We assign each state of FIG. 1 to a different processor, so that all operations concerning all states can be done simultaneously (in parallel) at each stage, and sequentially stage by stage. Upon examination, however, I discovered that if we assign state 0 to processor P0, state 1 to processor P1, and so on up to state N-1 to processor P_{N-1} (P7) and we consider processors connected as in FIG. 3, then links between processors which are not directly connected in FIG. 3 would be necessary to implement all links between the states or processors in FIG. 1. My novel solution included mapping the states in the trellis 10 of FIG. 1 as the hypercube trellis 20 of FIG. 2.
According to the principles of my invention, a given processor is not assigned to a fixed state, as was the case in FIG. 1. Instead, for my invention the processors are identified by a binary label (P0 = P000, P1 = P001, etc.) as shown in FIG. 2, and the trellis labelling and stage order is uniquely defined by the formula to be described. In particular, processor x represents state $\hat{x}$ at stage k, if
$$\hat{x} = \rho^{(k)}(x),$$
where $\rho^{(k)}(\cdot)$ is the cyclic right shift of x by k binary positions. A path through given states in FIG. 1 is thus represented by a specified equivalent path in FIG. 2, which passes through the same states. This means that there is a well-defined correspondence between paths in FIGS. 1 and 2. According to this correspondence, a Viterbi-type algorithm, based on the trellis of FIG. 1, can be performed on the hypercube trellis of FIG. 2.
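The effect of the cyclic-shift relabelling can be checked numerically. The sketch below is an illustration under one stated assumption that the text does not spell out: the encoder is modeled as a shift register in which the new input bit enters at the most significant position and the register shifts right. The helper names (`rotr`, `rotl`, `successors`, `links_needed`) are hypothetical.

```python
def rotr(x, k, m):
    """Cyclic right shift of the m-bit label x by k positions."""
    k %= m
    mask = (1 << m) - 1
    return ((x >> k) | (x << (m - k))) & mask


def rotl(x, k, m):
    """Cyclic left shift, the inverse of rotr."""
    return rotr(x, m - (k % m), m)


def successors(state, m):
    """Next states of a shift-register encoder (assumed: new bit enters at the MSB)."""
    return [(b << (m - 1)) | (state >> 1) for b in (0, 1)]


def links_needed(m, stage):
    """Processor pairs that must communicate between `stage` and `stage + 1`."""
    links = set()
    for x in range(1 << m):
        state = rotr(x, stage, m)            # state held by processor x now
        for nxt in successors(state, m):
            y = rotl(nxt, stage + 1, m)      # processor holding nxt at the next stage
            if y != x:
                links.add((min(x, y), max(x, y)))
    return links


for stage in range(3):
    print(stage, sorted(links_needed(3, stage)))
# 0 [(0, 1), (2, 3), (4, 5), (6, 7)]
# 1 [(0, 2), (1, 3), (4, 6), (5, 7)]
# 2 [(0, 4), (1, 5), (2, 6), (3, 7)]
```

Under this assumption each stage uses the links of exactly one cube dimension, so every exchange is between processors whose labels differ in a single bit. With a fixed assignment of state i to processor i, state 011 would instead need a link to state 101, whose label differs in two bits.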
The interesting and useful property of the trellis diagram of FIG. 2 is that all the required links between processors are now exactly those available in the hypercube network of FIG. 3. The first stage labeled as such in FIG. 3 shows how the first stage of FIG. 2 can be performed by using the connections 30x through 33x (marked with solid lines and double-headed arrows) between processors P0 and P1, P2 and P3, P4 and P5, and P6 and P7, respectively. Similarly, the second and third stages
of FIG. 3 show the bidirectional communication links required for implementation of the second and third stages of FIG. 2. It should be understood that the embodiment of the decoder on the network has been explained for the case m = 3, but it clearly can be generalized to other decoder sizes and hypercube dimensionality.
When the number of states is larger than the available or feasible number of processors, it becomes necessary to assign more than one state per processor. This can be done as shown in FIG. 4, where S = 2 states are assigned to each processor. The required interprocessor communication links are provided by a two-cube-connected processor system which requires two decoding operations within each state. The method and apparatus of this invention thus generalizes to a number of states per processor which is a power of two, i.e., S = 2^s, where S is the number of states per processor.
The simplest embodiment of the invention is shown in FIG. 5 for the case of one state per processor, S = 1. The block diagram of FIG. 5 represents the arrangement used in each processor of the hypercube, where the input and output devices 60 and 65 sequentially connect to neighbors along each dimension of the cube, one dimension at a time, as shown by stages 1, 2 and 3 of FIG. 3. The block diagram of FIG. 5 may be thought of as a particular means for performing the several desired functions. FIG. 5 is a timed operation which is readily performable by any well-known and available processor, and to this extent FIG. 5 may be thought of as a flow diagram for the various computations.
Although not depicted, it should be understood that all processors are initialized to the same initial state. Operation of the decoder requires that blocks of received symbols be loaded in every processor by an operation called "broadcasting." In the hypercube network of processors under consideration, data from a host processor is directly exchanged only through node zero (the origin of the cube). An efficient concurrent method is required to broadcast a message from node zero to all other nodes. Since the diameter D of an n-cube is n, a lower bound of the broadcasting time is n time units (where the unit is the time to send a message to a neighbor).
Assume that a message is in node zero, at time zero. In each subsequent time slot t_k send messages in parallel from each node
$$x = [x_{n-1}, \ldots, x_{k+1}, 0, x_{k-1}, \ldots, x_0]$$
to each node
$$x' = [x_{n-1}, \ldots, x_{k+1}, 1, x_{k-1}, \ldots, x_0],$$
the neighbors along dimension k. After n time units, the message has propagated to all nodes.
Even though this method does not minimize the number of communications (with the advantage of a very simple indexing), it optimizes the total broadcasting time to n time units. The result is clearly optimum, since it achieves the lower bound.
The decoded information is computed by the trace-back method, whereby the result of each metric comparison is stored in each processor, instead of the actual information sequence. This trace-back information is performed periodically, once every block of received symbols has been processed. This method completely eliminates the need to exchange hypothesized information sequences (survivors) among processors.
At the start of the hypercube decoding algorithm, input device 60 loads a received channel sequence to be decoded into a suitable storage device 69. As noted earlier, the preferred embodiment of this invention is achieved by VLSI techniques. Thus the storage device 69 may advantageously be an addressable storage space in the processor memory. A sequence of processor computations are then performed by the decoder. The input sequence, stored in memory 69, is used to update both a locally-stored accumulated metric and an accumulated metric that has been received from a neighboring processor. The local metric is stored in a local metric memory 70. That metric value is then updated in the updated metric device 71.
Meanwhile, an accumulated metric value from a neighboring processor has been supplied by input device 60 to an accumulated metric storage 75 which is used to store the neighbor's accumulated metric value. The metric at storage 75 is updated and made available in the update metric unit 76. A suitable comparison between those updated metric values is achieved by comparator 72 and the proper metric value is retained as a new local metric value. Note that the comparator 72 supplies that new local metric value both to the local metric memory 70 and to the output device 65.
After the above described operations have been completed for a given block of data, the trace-back operation is performed. Thus, starting at any processor, the storage of selected paths in device 90 is interrogated in order to find out from which previous processor the selected path came from. A signal is sent through output device 65 to said processor, and its local storage is now interrogated, and so on. After 5m such iterations, the method continues for another full block length, but now the trace-back results are also sent through output device 65 to the decoded sequence unit 95 in the origin node, since they represent reliable decoded bits.
The decoded sequence output from storage 95 is delivered to a user. Note that input and output leads are labeled at the origin node in FIG. 3.
In FIG. 6a the trellis diagram is as that shown earlier in FIG. 2 and thus needs no further description. FIG. 6b describes the implementation of the same decoder of FIG. 6a but with two states per processor (S = 2). Note that double lines mean that an exchange of two metrics and two survivors along each bidirectional link between processors is required. Similarly, in FIG. 6c, each processor performs all the operations on a set of four states (S = 4). Note that, when sets of S = 2^s states of a decoder with a total of M = 2^m states are combined in each processor, the trace-back takes place inside local memories for at least s stages out of m.
When more than one state is assigned to each processor, the invention can be applied to the decoding of a more general class of codes having rate k0/n0, where k0 > 1. FIG. 7 shows a structure for decoding an 8-state code with k0 = 2 on a two-dimensional cube with four processors. Again, all the required links in the trellis of FIG. 7a can be implemented on a 2-cube as in FIG. 7b.
FIG. 8b is an example showing how various paths are eliminated during decoding. The solid line in FIGS. 8a and 8b is the path which has been chosen.
Consider, as an example, a decoder for a 4-state, rate one-half, convolutional code given by the generator polynomials g1 = 111 and g2 = 101. Such a received sequence is given in FIG. 8b.
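As a small check of this example, the rate one-half encoder with g1 = 111 and g2 = 101 can be sketched as below. The function names, the bit ordering (newest bit in the most significant register position), and the all-zero starting state are assumptions not stated in the text. Under them, the information bits 0 1 0 0 1 1 produce the symbol pairs 00 11 10 11 11 01, which appear to be the decoded and received sequences printed on drawing Sheet 7.

```python
def parity(value):
    """Return 1 if `value` has an odd number of set bits, else 0."""
    return bin(value).count("1") & 1


def encode(bits, g1=0b111, g2=0b101, m=2):
    """Rate 1/2 convolutional encoder with memory m (2**m = 4 states).

    Assumptions: the register is [new bit, previous bit, bit before that],
    and the encoder starts in the all-zero state.
    """
    state = 0
    symbols = []
    for b in bits:
        reg = (b << m) | state                  # new bit joins the register at the MSB
        symbols.append((parity(reg & g1), parity(reg & g2)))
        state = reg >> 1                        # shift right, dropping the oldest bit
    return symbols


print(encode([0, 1, 0, 0, 1, 1]))
# [(0, 0), (1, 1), (1, 0), (1, 1), (1, 1), (0, 1)]
```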
A conventional decoder searches for the maximum-likelihood path on the graph of FIG. 8a where, for a given received sequence, all survivor paths considered are shown. The decoder of my invention operates as shown in FIG. 8b, where the same survivor paths are shown in terms of transitions between processors. The decoded sequence is obviously the same in both cases, but the transitions in FIG. 8b involve only neighboring processors on a hypercube.
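The trace-back idea for this 4-state code can be sketched on a single machine as follows. This is not the hypercube implementation: it is a conventional, sequential Viterbi search that stores only the result of each comparison (the chosen predecessor) and recovers the bits by tracing back, which is the part the processors would keep in local storage. The Hamming-distance branch metric, the all-zero starting state, the bit ordering, and every name in the code are assumptions made for illustration.

```python
INF = float("inf")
G1, G2 = 0b111, 0b101   # generator polynomials from the text
M_BITS = 2              # encoder memory m, so 2**m = 4 states


def parity(value):
    return bin(value).count("1") & 1


def branch(state, bit):
    """Return (next_state, (c1, c2)) for one step; the new bit enters at the MSB."""
    reg = (bit << M_BITS) | state
    return reg >> 1, (parity(reg & G1), parity(reg & G2))


def viterbi_traceback(received):
    """Decode a list of (c1, c2) pairs using stored decisions and a final trace-back."""
    n_states = 1 << M_BITS
    metric = [0] + [INF] * (n_states - 1)   # assumed: encoder starts in state 0
    decisions = []                          # per stage: chosen predecessor of each state
    for r in received:
        new_metric = [INF] * n_states
        choice = [0] * n_states
        for s in range(n_states):
            if metric[s] == INF:
                continue                    # state not reachable yet
            for b in (0, 1):
                nxt, out = branch(s, b)
                # Accumulated metric plus Hamming-distance branch metric.
                cand = metric[s] + (out[0] ^ r[0]) + (out[1] ^ r[1])
                if cand < new_metric[nxt]:
                    new_metric[nxt] = cand
                    choice[nxt] = s
        decisions.append(choice)
        metric = new_metric
    # Trace back from the best final state through the stored decisions.
    state = min(range(n_states), key=metric.__getitem__)
    bits = []
    for choice in reversed(decisions):
        bits.append(state >> (M_BITS - 1))  # the input bit is the MSB of the state entered
        state = choice[state]
    return bits[::-1]


received = [(0, 0), (1, 1), (1, 0), (1, 1), (1, 1), (0, 1)]
print(viterbi_traceback(received))  # [0, 1, 0, 0, 1, 1]
```

Because only one decision per state per stage is stored, nothing but accumulated metrics would need to cross processor boundaries in the parallel version; the trace-back then follows the stored decisions from processor to processor.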
FIG. 9 shows how, for m>n, two $2^m$-state decoders ($M=2^m$),
each based on an n-cube structure, can be used to build a
$2^{m+1}$-state decoder. This is done by observing that an
(n+1)-cube $C_{n+1}$ is the union of two n-cubes $0,C_n$ and
$1,C_n$, such that every node $[0, x_{n-1}, \ldots, x_0]$ in
$0,C_n$ is adjacent to the node $[1, x_{n-1}, \ldots, x_0]$ in
$1,C_n$. Therefore, large decoders can be assembled from small,
modular decoders on a VLSI chip, if enough spare connections are
provided (one for each additional dimension). From a more
theoretical point of view, this shows how my invention can merge
decisions in lower-dimensional subspaces for making decisions in
the full-dimensional space.
The above description presents the best mode contemplated in
carrying out my invention. My invention is, however, susceptible
to modifications and alternate constructions from the embodiments
shown in the drawings and described above. Consequently, it is not
the intention to limit the invention to the particular embodiments
disclosed. On the contrary, the invention is intended and shall
cover all modifications, sizes and alternate constructions falling
within the spirit and scope of the invention, as expressed in the
appended claims when read in light of the description and drawings.
What is claimed is:
1. A method for maximum-likelihood decoding of convolutional codes
on a network of microprocessors interconnected as a hypercube in
which a sequence of received symbols to be decoded is broadcast to
every processor and each processor is adapted to compare the
accumulated metrics and select survivors during about 5m stages of
an m-state trellis (where m is a whole integer and each processor
is fixedly assigned to one state of said trellis), said method
comprising the steps of:
connecting each one of a plurality of $2^m$ processors, equipped
with a decoding algorithm means, in an n-cube configuration having
bidirectional communication links along the edges only of said
cube and certain processors thereof not having a direct
communication link between other processors of said n-cube
configuration;
mapping an equivalent trellis for said n-cube configuration
wherein processors represent more than one state, and in that
remapped state, have direct communication links between other
processors of said n-cube configuration on said equivalent trellis;
computing by the decoding algorithm at each processor, that
processor's local accumulated metrics;
transmitting that accumulated metric to that processor's
neighboring processors as identified by said equivalent trellis;
determining the current branch metric at each of said processors;
updating, based on each received symbol, the locally-accumulated
metrics by adding the current branch metrics thereto;
comparing in each processor the updated locally-accumulated
metrics with the updated accumulated metrics from a neighboring
processor;
locally storing the result of said comparing step in a local
storage;
tracing-back paths by interrogating local storage of each
processor; and
sending the decoded data resulting from this tracing-back step to
the origin node, and then to a user.
2. A method according to claim 1, and further characterized by:
communicating the accumulated metrics during the transmitting step
only between adjacent neighboring processors, wherein each
processor represents different states of the decoder at different
stages on the equivalent trellis, according to the relation:
$\hat{x} = p^{(k)}(x)$,
meaning that processor x represents states $\hat{x}$ at stage k,
where $p^{(k)}(\cdot)$ is the cyclic right shift of x by k binary
positions.
3. A method according to claim 1 comprising the additional step of:
designating one processor of the n-cube network as an origin
processor to receive said broadcast of said sequence of received
symbols to be decoded; and
delivering to a user, as part of said sending step, said output
signal from said origin processor after said tracing-back step is
concluded.
4. A method according to claim 1 comprising the additional step of:
dividing the states of the decoding algorithm into equal parts
wherein the number of states for each processor is a power of two.
5. A maximum likelihood decoding system for determining
accumulated metrics and selecting survivor paths of M states of a
trellis code in a network of $N=2^n$ processors with each processor
placed at the vertices of an n-dimensional cube and representing a
set of $S=2^s$ states of the decoder formed from said processors;
said decoder characterized by eliminating the need for exchanging
survivors between processors in said path selection, in that:
each processor represents different states of the trellis code at
different stages on the trellis, according to the relation:
$\hat{x} = p^{(k)}(x)$,
meaning that processor x represents state $\hat{x}$ at stage k,
where $p^{(k)}(x)$ is the cyclic right shift of x by k binary
positions;
means for receiving from neighboring processors the goodness
measure (accumulated metric) associated with the paths previously
examined in such processor and storing the result obtained by
adding the received goodness measure to the goodness measure for
the current branch (branch metric); and
means for tracing-back the most likely path based upon the results
stored in said local memory.
6. A decoding system in accordance with claim 5 wherein each
processor includes computing means operative with accumulated and
branch metrics, as factors for said computing means at each of
said processors, and said system further characterized by:
computing means at each processor for computing that processor's
locally accumulated metrics;
means in each processor updating its locally-accumulated metrics
by adding thereto the branch metrics;
comparing means in each processor for comparing the local updated
accumulated metrics with those accumulated branch metrics received
over said links from said neighboring processor; and
means for storing in a local processor memory for said trace-back
means, the result from said comparing means.
7. A decoding system based on the traceback method in accordance
with claim 6 and further characterized by:
means for interrogating the local memory of a processor and,
according to the information there contained, to proceed to a
specified other processor and so on, iteratively; and
means for sending the results of said interrogating means to the
origin node.
8. A decoding system in accordance with claim 5 wherein one
processor from said plurality is designated as an input processor
and further comprising:
input means at said input processor for receiving an encoded
channel sequence to be broadcast to every one of said processors;
and
output means at said one processor for outputting the results of
the traceback operation as a decoded output from said decoding
system.
9. A method to use two above described decoders each as defined by
claim 5 and, each designed for $2^m$ states, to form a $2^{m+1}$
decoder.
10. A maximum-likelihood decoding method for decoding sequences
generated by a convolutional encoder having $M=2^m$ states by
considering at the decoder all possible paths on a trellis of
about 5m stages (where m is a whole integer) and finding the most
likely path according to a specified goodness criterion, said
method comprising the steps of:
forming a concurrent processor network of $N=2^n$ microprocessors
interconnected as an n-dimensional cube having one microprocessor
each at the vertices of said cube and bidirectional communication
links for said microprocessors only along the edges of said cube;
broadcasting a sequence to be decoded to every one of said
microprocessors;
assigning each of the M states to each of the N microprocessors
according to the formula:
$\hat{x} = p^{(k)}(x)$
wherein microprocessor x represents state $\hat{x}$ at stage k and
the function $p^{(k)}$ is the cyclic right shift of x by k binary
positions;
computing and storing locally at each microprocessor accumulated
metrics; and
transmitting/receiving said computations between neighboring
processors in said n-dimensional cube.
11. A maximum-likelihood decoding system for decoding accumulated
metrics and tracing back paths of a M-state trellis code in a
network of $N=2^n$ processors with each processor placed at the
vertices of an n-dimensional cube and representing a set of
$S=2^s$ states of the decoder formed from said processors; said
decoder characterized in that:
each processor represents different states of the trellis code at
different stages on the trellis, according to the relation:
$\hat{x} = p^{(k)}(x)$,
meaning that processor x represents state $\hat{x}$ at stage k,
where $p^{(k)}(x)$ is the cyclic right shift of x by k binary
positions; and
trace-back decoding algorithm means at each of said processors for
determining accumulated metrics free from exchanging survivors
between each of said processors.
12. A maximum-likelihood decoding system in accordance with claim
11 and further comprising:
bidirectional processor-communication links along the edges only
of said n-dimensional cube for exchanging accumulated metrics only
(i.e., eliminating the need to exchange survivors over said links).
13. A decoding system in accordance with claim 11 wherein each of
said processors is characterized by means at each processor for
computing its local accumulated metrics and for updating the
locally-accumulated metrics by adding thereto branch metrics, and
said decoding system further comprises:
comparing means in each processor for comparing the local
accumulated metrics with those accumulated metrics received over
said links from said neighboring processors.
14. A decoding system in accordance with claim 12 and further
comprising:
an origin processor at one vertex only of said n-dimensional cube;
and
means for outputting to a user a decoded signal from said origin
processor.
15. A method for maximum-likelihood decoding of convolutional
codes for a received code sequence having $2^m$ states (where m is
a whole integer) by a plurality of interconnected processors each
of which is normally in communication with all the other
processors of said plurality and each provided with a decoding
algorithm means for deriving/interchanging decoding parameters
including branch metrics, accumulated metrics and survivors in an
m-state trellis with each processor fixedly assigned to one state
of said trellis, the improvement comprising:
connecting each one of a plurality of $2^m$ processors, equipped
with said decoding algorithm means, in an n-cube configuration
having bidirectional communication links along the edges only of
said cube and certain processors thereof not having a direct
communication link between other processors of said n-cube
configuration;
mapping an equivalent trellis for said n-cube configuration
wherein processors represent more than one state and thus have
direct communication links between other processors of said n-cube
configuration on said equivalent trellis;
deriving/interchanging, by said decoding algorithm means, the
branch and accumulated metrics between the processors while
leaving the survivors in storage at each processor rather than
exchanging said survivors between said processors;
representing all of the states in said equivalent trellis, and
tracing back through all of said processors in said equivalent
trellis in accordance with said stored survivors in order to
select, by said decoding algorithm means, the maximum likelihood
path from said equivalent trellis for said cube system.
16. A maximum likelihood decoding system for determining
accumulated metrics and selecting survivor paths during a
repetitive set of stages of a trellis code having a finite number
of states M (where M is a whole integer), said system comprising:
N processors ($N=2^n$) placed at the vertices of an n-dimensional
cube, where n is a whole integer, with each processor assigned to
represent more than one state at different stages, K, in a
repeating set of stages in the trellis code, that is, each
processor x is assigned state S according to the relation:
$\hat{x} = p^{(k)}(x)$,
meaning that processor x represents state $\hat{x}$ at stage k,
where $p^{(k)}(x)$ is the cyclic right shift of x by k binary
positions and k is the number of the stages which make up a set in
the trellis;
means for computing at each processor accumulated metrics and
storing, but not exchanging between processors, survivors at said
different states of the trellis code at correspondingly different
stages numbered (k=0, k=1, k=2, etc.) on the trellis; and
bidirectional processor-communication links along the edges only
of said n-dimensional cube for exchanging said computed
accumulated metrics only between adjacent neighboring processors
in order to select the proper in said n-dimensional cube.
17. A maximum-likelihood decoding method for decoding sequences
generated by a convolutional encoder having $M=2^m$ states by
considering at the decoder all possible paths on an n-cube trellis
and finding the most likely path according to a specified goodness
criterion involving decoding parameters with survivors not to be
exchanged but rather to be made available for a trace-back in said
path selection, said method comprising the steps of:
forming a concurrent processor network of $N=2^n$ microprocessors
interconnected as an n-dimensional cube having one microprocessor
each at the vertices of said cube and bidirectional communication
links for said microprocessors only along the edges of said cube
and corresponding to all paths to be checked on said n-cube
trellis by transmitting/receiving said decoding parameters over
said communicating links;
broadcasting a sequence to be decoded to every one of said
microprocessors;
assigning each of the M states on said n-cube trellis to each of
the N microprocessors according to the formula:
$\hat{x} = p^{(k)}(x)$
wherein, when expressed in binary notations, microprocessor x
represents state S at stage k and the function $p^{(k)}$ is the
cyclic right shift of x by k binary positions and M, m, N and n
are whole integers;
computing and storing locally at each microprocessor the decoding
parameters including the survivors; and
transmitting/receiving said decoding parameters, exclusive of said
survivors, between neighboring processors in said n-dimensional
cube; and
tracing back through said survivors to select the most likely path
in said system.
18. A decoding method for determining the maximum-likelihood path
of a trellis code having $M=2^m$ states representing a line
sequence generated by a convolutional encoder and subjected to
channel noise, said method involving known decoding algorithms
having decoding parameters to be transmitted/received between N
($N=2^n$) decoding processors, with M, N, and n being whole
integers, comprising the steps of:
dividing a decoding algorithm for said processors into equal parts
as a power of two from all of said states for processing of each
of said equal parts by one each of a plurality of N concurrent
processors;
locating $N=2^n$ processors at the vertices of an n-dimensional
cube having bidirectional communication paths for said processors
only along the edges of said cube;
mapping a trellis with certain processors representing more than
one state on the trellis in order to represent all of the M states
on said trellis by said N processors, with each of said processors
having direct transmitting/receiving paths from neighboring
processors along the dimensions of said cube and in said trellis;
and
computing the maximum-likelihood path by a trace-back through the
use of non-exchanged survivors at each of said processors for each
of the M states by each of the n microprocessors of said trellis
wherein said processors represent more than one state in an order
according to:
$\hat{x} = p^{(k)}(x)$
wherein microprocessor x represents a set of states S at stages k
and the function $p^{(k)}$ is the cyclic right shift of x by k
binary positions.
* * * * *
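The trace-back recited in the claims (survivor decisions kept in local storage and interrogated backwards, with only accumulated metrics compared between states) can be illustrated in a few lines of Python. This is a minimal single-process sketch under stated assumptions, not the hypercube implementation of the patent. The rate-1/2 code with generators 7 and 5 (octal), the hard-decision Hamming branch metric, and the names `step`, `encode` and `decode` are all assumptions chosen for illustration. The list `local[s]` stands in for the local memory of the processor that holds state s.

```python
M_BITS = 2               # assumed encoder memory, giving 2**M_BITS = 4 states
G = (0b111, 0b101)       # assumed generator polynomials (7, 5 octal)


def parity(v):
    return bin(v).count("1") & 1


def step(state, bit):
    """One encoder transition: returns (next_state, output_symbol)."""
    reg = (bit << M_BITS) | state           # new bit enters at the top
    out = tuple(parity(reg & g) for g in G)
    return reg >> 1, out


def encode(bits):
    state, symbols = 0, []
    for b in bits:
        state, sym = step(state, b)
        symbols.append(sym)
    return symbols


def decode(symbols):
    n_states = 1 << M_BITS
    inf = float("inf")
    metric = [0] + [inf] * (n_states - 1)    # encoder assumed to start in state 0
    local = [[] for _ in range(n_states)]    # per-state survivor decisions
    for sym in symbols:
        new = [inf] * n_states
        choice = [0] * n_states
        for prev in range(n_states):
            for bit in (0, 1):
                nxt, out = step(prev, bit)
                branch = sum(a != b for a, b in zip(out, sym))
                cand = metric[prev] + branch  # add, then compare and select
                if cand < new[nxt]:
                    new[nxt], choice[nxt] = cand, prev
        for s in range(n_states):
            local[s].append(choice[s])        # stored locally, never exchanged
        metric = new
    # Trace-back: start at the best final state and interrogate local storage.
    state = min(range(n_states), key=metric.__getitem__)
    bits = []
    for k in range(len(symbols) - 1, -1, -1):
        bits.append(state >> (M_BITS - 1))    # input bit is the top bit of the state
        state = local[state][k]
    return bits[::-1]


if __name__ == "__main__":
    msg = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
    print(decode(encode(msg)) == msg)
```

In this sketch only the accumulated metrics cross between states during the forward pass. The survivor information is read once, during trace-back, from the storage associated with each state.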
UNITED STATES PATENT AND TRADEMARK OFFICE
CERTIFICATE OF CORRECTION
PATENT NO.: 4,868,830
DATED: September 19, 1989
INVENTOR(S): Fabrizio Pollara-Bozzola
It is certified that error appears in the above-identified patent
and that said Letters Patent is hereby corrected as shown below:
Column 2, line 58, "the parent" should read --Patent No. 4,730,322--.
Column 4, line 21, "FIG. 8b a" should read --FIG. 8b is a--.
Column 5, line 62, "yield decoding" should read --yield high decoding--.
Column 6, lines 10 and 11, delete "for the processors, the method yields".
Column 9, line 16, "n-cube" should read --n-cubes--.
Column 9, line 59, "t" should read --at--.
Column 10, line 4, "comprising" should read --comparing--.
Column 10, line 20, "$\hat{x}$" should read --x--.
Column 10, line 20, "x" should read --$\hat{x}$--.
Column 11, line 67, "trellis accordance" should read --trellis in accordance--.
Column 13, line 28, "edges of only" should read --edges only--.
Signed and Sealed this Fourth Day of December, 1990
Attest:
HARRY F. MANBECK, JR.
Attesting Officer
Commissioner of Patents and Trademarks
