Abstract-As a specific application of the material presented in Part I, this companion paper identifies VLSI layout strategies for realizing correlative encoded MSK-type Viterbi receivers. When the source symbols are correlatively encoded using a first-order polynomial, the appropriate Viterbi receiver takes the form of a cube-connected cycle (CCC) structure. Second-order encoding polynomials give rise to a new type of area-efficient VLSI structure which is a generalization of the CCC structure. The results are important from two perspectives: 1) isomorphisms between certain concepts in theoretical computer science and digital communications are established, and 2) good practical VLSI layouts, which commercial silicon foundries can fabricate, are generated by a structured design methodology.
I. INTRODUCTION
In addition to decoding convolutional codes and demodulating intersymbol-interference and partial-response PAM signals, the Viterbi algorithm [1]-[3] is applicable to maximum likelihood demodulation of bandwidth-efficient continuous-phase modulations (CPM).
Using concepts introduced in the previous paper, a VLSI design methodology [4] is presented in this paper for synthesizing highly concurrent computing structures which directly implement the Viterbi receiver for correlative encoded MSK signals [5]. Our interest is focused on the MSK-type of CPM scheme because: 1) this technique gives rise to signals with excellent bandwidth efficiency, and 2) the receivers are not very complex, as they require either four or eight states to be stored and processed during each symbol interval. Consequently, with access to computer assisted design (CAD) workstations and commercial silicon foundries, the Viterbi receiver specified in this paper can be commercially realized today with dies containing fewer than 32 000 transistors (excluding synchronization hardware and correlators for path metric generation) with throughputs on the order of $10^7$ bits/s.
The presentation is organized into four sections. Initially, we review basic concepts on the VLSI grid model of computation. This model allows us to structure our design methodology when dealing both with massively parallel computing structures in general and with those parallel computing structures of interest for encoded MSK modulation in particular. The second section introduces details of duobinary MSK modulation which are relevant to laying out the floorplan of the VLSI processing structure. In the third section, we establish that the Viterbi receiver for MSK modulation, using first- and second-order encoding polynomials, falls within a generalized class of cube-connected cycle processing structures. The final section summarizes our findings and presents extensions to multi-h phase codes and phase estimation.
II. VLSI MODELS OF COMPUTATION
In the VLSI model of computation proposed by Thompson [6], a parallel computing system is viewed as a computation graph consisting of "nodes" and "wires" placed on a rectangular grid. Nodes correspond to processing elements, and wires are responsible for providing communication between processing elements. Each wire has unit width on the silicon chip and transmits a unit of information in a unit of time; information is taken from, or delivered to, the processing elements on the chip. The VLSI grid model is justified on the basis that its area and time charges are sufficiently realistic to accurately represent computational systems produced by current technology.
The task of VLSI design involves bridling the complexity of algorithm realization using a surface of perhaps thousands of simultaneously active computing elements. Complexity control is achieved by defining a compact and topologically regular processor and wire layout which allows the system to efficiently move information from where it was produced to where it is needed in the next time instant. By taking a realistic account of the placement of modules and their interconnection, the VLSI model of computation [7] allows us to identify which compositions will result in modularity and ease of layout, local communication paths, regular control and timing structures, extensibility, and minimum die area (yield is an inverse exponential function of die area). In the remaining sections
0733-8716/86/0100-0155$01.00 © 1986 IEEE
III. CORRELATIVE ENCODED MSK MODULATION
The mathematical representation of a CPM signal $x(t)$ is
$$x(t) = A\cos[2\pi f_c t + \phi(t) + \theta]$$
where $f_c$ is the carrier frequency, $\theta$ is the phase offset, and $\phi(t)$ is the information-carrying phase. We assume perfect carrier phase coherence and, hence, take $\theta = 0$ without loss of generality.
The information-carrying phase $\phi(t)$, with modulation index $h = 1/2$, can be written in the following form.
where $k$ is an integer, $T$ is the bit period, and $d_k$ is the correlative encoded data bit (implicitly rectangularly shaped) for the $k$th bit interval.
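For orientation, one common textbook way of writing the phase of a full-response, rectangular-pulse CPM signal with modulation index one-half is sketched here. The indexing convention, the lower limit of the sum, and the interval over which the expression holds are assumptions and may differ from the form used in [5].

```latex
\phi(t) = \frac{\pi}{2} \sum_{i<k} d_i + \frac{\pi}{2T}\, d_k \,(t - kT),
\qquad kT \le t \le (k+1)T
```

Under this convention the phase accumulated over one bit interval is $d_k \pi/2$, so a zero-valued encoded symbol leaves the phase unchanged.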
In the simple case of no encoding, $d_k = a_k$, where $a_k$ is a source symbol drawn from the finite alphabet
This is the minimum shift-keying (MSK) modulation format, which is just continuous-phase digital FM with modulation index one-half. For duobinary MSK, the encoding polynomial is
If $d_k = 0$, the phase remains constant; otherwise it follows the same linear trajectory between phases that occurs in MSK modulation. The incentive to correlate the data symbols prior to modulation is that duobinary MSK has less phase variation than MSK and consequently has better bandwidth efficiency. The Viterbi receiver, in this case, is specified by use of the four-phase $(0, \pi/2, \pi, 3\pi/2)$ modulation state diagram in Fig. 1(a). The system states can be divided into two classes, one class (Type A) occupied at odd-bit times and the other class (Type B) occupied at even-bit times. The two classes are shown in the recursive trellis diagram of Fig. 1(b). Type A transitions terminate at states 2, 3, 6, and 7, while Type B transitions terminate at states 1, 4, 5, and 8 in Fig. 1(b).
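The way correlative encoding restricts the phase to four values can be illustrated with a short sketch. It assumes the usual duobinary rule, d_k = (a_k + a_{k-1})/2 with source symbols in {-1, +1}; both the rule and the alphabet are assumptions here, as are the function names. The phase is tracked only at bit boundaries, in units of pi/2.

```python
def duobinary_encode(source, prev=1):
    """Encode +/-1 source symbols with the assumed rule d_k = (a_k + a_{k-1}) / 2.

    `prev` is the (hypothetical) symbol assumed to precede the sequence.
    """
    encoded = []
    for a in source:
        # a + prev is one of -2, 0, +2, so integer division is exact
        encoded.append((a + prev) // 2)
        prev = a
    return encoded


def boundary_phases(encoded, start=0):
    """Phase at each bit boundary, in units of pi/2, reduced modulo 2*pi.

    With modulation index one-half, each symbol d_k changes the phase by d_k * pi/2.
    """
    q = start
    phases = [q]
    for d in encoded:
        q = (q + d) % 4
        phases.append(q)
    return phases


source = [1, -1, -1, 1, 1]
encoded = duobinary_encode(source)
phases = boundary_phases(encoded)
```

For this input the encoded sequence contains zeros wherever consecutive source symbols differ, and over those intervals the phase index stays fixed, which matches the behaviour described above for d_k = 0.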
For tamed frequency modulation (TFM), the memory in the modulation is increased by one over that for duobinary MSK, providing additional bandwidth efficiency over duobinary MSK. States 1, 3, 5, 7, 9, 11, 13, and 15, in which one class of transitions terminates, are occupied at even-bit times. The two classes are shown in the trellis diagram of Fig. 2(b).
IV. VLSI REALIZATIONS
In this section, we demonstrate that correlative encoded MSK-type trellis structures can be implemented in a fully parallel manner on a relatively new processor interconnection scheme known as the cube-connected cycles (CCC). The topology of this network can be derived from a Boolean ($k$-dimensional) hypercube of $2^k$ vertices by replacing each vertex with a cycle of $k$ vertices, for a total of $k2^k$ vertices. It is known to have an area-efficient and regular embedding within the VLSI grid model [9] and hence is of interest in this application (see Appendix B, Part I).
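The construction just described can be sketched in a few lines. This is an illustrative model only: the vertex labelling (corner, position) and the function names are assumptions, and edges are treated as undirected, so for k = 2 each two-vertex cycle collapses to a single edge.

```python
def ccc_vertices(k):
    """All k * 2**k vertices, labelled (hypercube corner, position in cycle)."""
    return [(w, i) for w in range(2 ** k) for i in range(k)]


def ccc_edges(k):
    """Undirected edge set of a cube-connected cycles network of dimension k."""
    edges = set()
    for w in range(2 ** k):
        for i in range(k):
            # Cycle edge: link position i to the next position at the same corner.
            j = (i + 1) % k
            if j != i:
                edges.add(frozenset({(w, i), (w, j)}))
            # Cube edge: position i handles hypercube dimension i,
            # so it links to the corner whose label differs in bit i.
            edges.add(frozenset({(w, i), (w ^ (1 << i), i)}))
    return edges


vertices = ccc_vertices(2)   # 8 vertices, matching k * 2**k for k = 2
edges = ccc_edges(3)         # for k >= 3 every vertex has exactly three neighbours
```

Each vertex carries at most three connections regardless of k, which is the property that keeps the wiring local when the network is laid out on a grid.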
For duobinary MSK, the recursive two-stage trellis diagram of Fig. 1(b) can be equivalently implemented by the CCC structure of Fig. 1(c), where $k = 2$ for a total of $k2^k = 8$ unique processors. The processors or nodes have been numbered to correspond directly with the state information they manipulate and contain, consistent with the notation presented in [5]. Cycle connections can be identified as the four vertical loops. Note that the data flow is unidirectional and counterclockwise in each of the loops. Cube connections, illustrated by the horizontal wires, handle bidirectional data. Each node contains add-compare-select logic, a state metric register for generating and storing state metrics, and a survivor sequence register. The hardware required to implement a cycle slice is similar to that presented in Fig. 8 of the previous paper. No more than 2000 transistors per node (for eight-bit state metrics) are required to implement the required Boolean operations, for a total of 16 000 transistors.
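The add-compare-select operation performed at each node can be sketched as follows. This is a minimal software model, not the hardware of the previous paper: the function name and argument layout are hypothetical, and it assumes that larger metrics are better, as is usual when branch metrics are correlations. Register width and metric normalization are ignored.

```python
def add_compare_select(candidates):
    """One add-compare-select step for a single trellis node.

    `candidates` is a list of tuples
        (predecessor_metric, branch_metric, predecessor_survivor, decided_bit),
    one per incoming trellis branch.
    Returns the new state metric and the extended survivor sequence.
    """
    best_metric = None
    best_survivor = None
    for prev_metric, branch_metric, prev_survivor, bit in candidates:
        total = prev_metric + branch_metric               # add
        if best_metric is None or total > best_metric:    # compare
            best_metric = total                           # select
            best_survivor = prev_survivor + [bit]
    return best_metric, best_survivor


# Two incoming branches: path totals are 5 + 1 = 6 and 3 + 4 = 7.
metric, survivor = add_compare_select([
    (5, 1, [1, 0], 1),
    (3, 4, [0, 0], 0),
])
```

In the example the second branch wins, so the node keeps metric 7 and inherits the second predecessor's survivor sequence with the new decision appended.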
In addition, it is important to realize that the branch metric for each state transition is obtained by the correlation function between the received waveform and the expected signal waveform. In duobinary MSK, three pairs of correlators, as illustrated in Fig. 3(a), are required for this task. In Fig. 3(b) the complete duobinary MSK Viterbi receiver floorplan is illustrated. Using [5] as a reference, note that nodes in each cycle "slice" require three unique correlator outputs. Cycle slices can be grouped into pairs such that three correlators are local to a pair. This is the reasoning behind the rearrangement of cycle slices presented in Fig. 3(b). The active wires, nodes, and correlators during even- and odd-bit intervals are illustrated in Fig. 4. During odd-bit times, nodes 3, 2, 7, and 6 generate corresponding state metrics and update associated survivor sequences. Nodes 1, 8, 4, and 5 are inactive during this time. At even-bit times, state metric generation occurs at nodes 1, 8, 4, and 5 while nodes 3, 2, 7, and 6 are inactive.
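The alternating activity pattern and the correlation-based branch metric can be summarized in a small sketch. The node sets are taken from the description above; the function names, the convention that bit intervals are numbered so that odd indices are odd-bit times, and the use of a sampled (discrete) correlation are assumptions made for illustration.

```python
ODD_BIT_NODES = (3, 2, 7, 6)    # active during odd-bit times
EVEN_BIT_NODES = (1, 8, 4, 5)   # active during even-bit times


def active_nodes(bit_index):
    """Nodes that update their state metrics in the given bit interval."""
    return ODD_BIT_NODES if bit_index % 2 == 1 else EVEN_BIT_NODES


def branch_metric(received, expected):
    """Sampled correlation of the received waveform with an expected waveform.

    Both arguments are equal-length sequences of samples covering one bit interval.
    """
    return sum(r * e for r, e in zip(received, expected))


schedule = [active_nodes(n) for n in range(1, 5)]
metric = branch_metric([1.0, 2.0, 3.0], [1.0, 0.0, -1.0])
```

Because the two node sets are disjoint and together cover all eight nodes, exactly half of the processors are busy in any bit interval.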
Fixed time lag estimates of the data can be obtained alternately from the truncated survivor sequence of any
MSK modulation with correlative encoding using the TFM polynomial has a trellis structure which forces us to generalize the CCC structure into a new type of area-efficient VLSI structure, which we refer to as the "double CCC," or DCCC, shown in Fig. 5(a). This name is derived from the fact that an implementation of the trellis of Fig. 2(b) requires double the number of cube connections of a standard CCC, as illustrated in Fig. 5(b). During odd-bit times, state metric generation occurs at nodes 2 and 4, 14 and 16, 10 and 12, and 6 and 8. The remaining nodes are inactive during this time. At even-bit times, precisely the converse occurs: state metric generation occurs only at nodes 3 and 9, 7 and 13, 5 and 15, and 1 and 11. For each of the four cycle loops in Fig. 5(a), it is instructive to follow the state trajectory through the trellis illustrated in Fig. 2(b). Note that the direction of the data flow, as illustrated, is not the same in each of the cycle loops. One other important difference from duobinary MSK is that five pairs of correlators are required to generate the required path metrics, as described in McLane [5].
[Figure: two panels, each with blocks labeled "Correlators" and "Path metrics".]

V. DISCUSSION AND CONCLUSIONS
The Viterbi algorithm technique, as applied to correlative encoded MSK, is just a special case of a dynamic programming solution to modulo-$2\pi$ phase sequence estimation. The same techniques that were presented in this paper can be extended to develop special types of digital VLSI phase-lock loop equivalents [11].
In addition, the VLSI realization of complex trellis structures generated by multi-h phase codes can be achieved by an analogous approach. Fig. 6 shows the trellis structure and VLSI grid model implementation of a Viterbi receiver for a {2/4, 1/4} constraint length 2 phase code [12]. Integrated into this design are the four unique stages of the trellis diagram and appropriate recirculation edges. This particular example is not meant to disqualify the universality of the CCC construct but rather is presented to highlight the generality of a VLSI grid model approach to the expression and evaluation of digital communication algorithms, insofar as it serves as a navigational aid in the design of practical VLSI circuits.
Many applications associated with the Viterbi algorithm can be classified into grid-point problem instances represented as a lattice in space and time. Wires provide interconnection in space; memories provide interconnection in time. Resorting to the VLSI grid model allows us to quantify and thus appreciate the area and/or time tradeoff of having numerous processing elements cooperate in the execution of parallel algorithms.
In conclusion, well-structured VLSI layout strategies have been identified for realizing correlative encoded MSK-type Viterbi receivers. When the source symbols are correlatively encoded using a first-order polynomial, the appropriate Viterbi receiver takes the form of a cube-connected cycle (CCC) structure. Second-order encoding polynomials give rise to a new type of area-efficient VLSI structure called a "double CCC," which is a generalization of the CCC structure.
ACKNOWLEDGMENT
P. G. Gulak would like to thank Stanford University for providing the atmosphere and facilities to further develop these ideas. The authors would also like to thank the reviewers for providing many useful suggestions that were helpful in improving the clarity of this paper.
