A stream-oriented parallel concatenated convolutional (Turbo) code (PCCC) is examined. The stream encoder uses continuous encoding with the interleaver(s) viewed as Finite State Permuters (FSP's). With this encoding architecture, block interleaver design is reexamined and convolutional interleavers are considered. Decoding is performed using a pipelined architecture where each constituent decoder uses a continuous version of the symbol-by-symbol MAP decoding algorithm. Performance and implementation issues are discussed and comparisons are made to the traditional block decoding architecture. Finally, stream PCCC design using convolutional interleavers is examined based on performance and implementation.
Introduction
Parallel concatenated convolutional codes (PCCC's), often termed Turbo codes, are one of the most powerful coding techniques yet discovered. Introduced in [1], PCCC's have traditionally been viewed as a block coding technique. With this viewpoint, research has focused on the block nature of these codes, with extensive work done on block interleaver design and block decoding architectures. In this paper, we explore a stream encoding paradigm without explicit block boundaries. Of course block turbo coding can be overlaid on an indefinite stream, but in applications where messages are essentially indefinitely long strings, advantages accrue to the stream-oriented viewpoint, as we shall develop.
We begin by describing the encoding and decoding architectures for a stream PCCC. The encoder uses continuous encoding with a block or non-block interleaver. Interleaver design is examined for continuous encoding, with special attention given to a common type of non-block interleaver called the convolutional interleaver [2]. A stream PCCC decoding architecture is presented and its performance and implementation are studied.
Sponsored by National Science Foundation, Grant NCR-9415996 and National Aeronautics and Space Administration Lewis Research Center under NAG3-1948.
Stream PCCC Encoder
PCCC's are traditionally constructed from at least two recursive systematic convolutional codes (RSC's) connected in parallel and separated from the input data stream by interleaver(s). In [3], it is noted that encoding can be performed blockwise or continuously. For block encoding, data is segmented into non-overlapping blocks of length L with each block encoded (and decoded) independently. This scheme requires the use of a block interleaver, and the RSC's must begin in the same state for each new block. This requires either trellis termination or trellis truncation. Trellis termination requires appending extra symbols to the block in order to drive one or all of the constituent encoders to the zero state. Trellis truncation simply involves resetting the state of the RSC's for each new block.
In a stream PCCC, continuous encoding is used: all constituent encoders are free-running, and their trellises evolve continuously over time. In this fashion, a PCCC can be thought of as a time-varying convolutional code whose memory order and trellis structure are determined by the interleaver and constituent encoders. The interleaver in a stream PCCC is no longer constrained to have a block structure. In fact, any periodic interleaver can be used in the design.
A periodic interleaver is a device for which the permutation is a periodic function of time. The two most common types are block and convolutional interleavers. Block interleavers accept symbols in blocks and perform an identical permutation on each block. Convolutional interleavers have no fixed block structure, but perform a periodic permutation over a semi-infinite sequence of symbols.
In the stream PCCC encoder, it is convenient to view the interleaver as a "synchronous interleaver". From [4], any periodic interleaver can be viewed as a synchronous interleaver for which a symbol is read out each time a symbol is read in. These devices can be realized using a tapped finite-state shift register and a commutator. We refer to these devices as Finite State Permuters (FSP's) as in [5]. A periodic interleaver implemented as an FSP is characterized by a memory order $M$, a maximum delay $D$ and a mapping function $\pi(k)$. The mapping function represents the position in the input stream of the symbol output at a given time. For example, at time $k$, symbol $x[k]$ enters the interleaver and symbol $x[\pi(k)]$ is output, with $k \ge \pi(k)$ for causality. The mapping function also determines the tap positions and hopping pattern for the commutator. In Fig. 1, a 4x4 block interleaver is realized as an FSP. This realization illustrates that FSP's generally require a brief initialization. In Fig. 1, the initial delay is 9 symbols. In Fig. 2, a stream PCCC is shown using an FSP. The parity symbol for the interleaved data stream is indexed by $\pi(k)$ to indicate the position of the corresponding systematic symbol in the original data stream. If the interleaver in Fig. 1 were used in the stream PCCC encoder from Fig. 2 along with constituent encoders with memory order $\nu$, the stream PCCC would be a time-varying convolutional code with memory order $2\nu + 19$. Normally interleavers will have sufficiently large $M$ such that ML decoding on the entire code is infeasible.
In [3], it was shown through the use of an average upper bound over all block interleavers that continuous encoding of PCCC's was superior to trellis termination and trellis truncation. While this observation considered the ensemble of all block interleavers, we now wish to examine the implications of continuous encoding for interleaver design in a stream PCCC. We begin the examination of interleaver design with several definitions from [6].
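The synchronous (FSP) view of a block interleaver described above can be illustrated with a minimal sketch. It assumes a row/column interleaver that is written row-wise and read column-wise (the source does not state the read/write order), and it writes the mapping function as `pi(k)`, a symbol chosen here for illustration. All function names are hypothetical. Under these assumptions the 4x4 case needs a start-up delay of 9 symbols, which agrees with the initial delay quoted for Fig. 1.

```python
def row_column_permutation(rows, cols):
    """Block permutation, assuming write row-wise / read column-wise.

    perm[j] is the input index (within a block) read out at output slot j.
    """
    return [r * cols + c for c in range(cols) for r in range(rows)]


def fsp_parameters(perm):
    """Return (initial_delay, max_delay) of the synchronous realization.

    The initial delay is the smallest start-up offset for which every
    read is causal (the symbol has already entered the shift register).
    """
    initial_delay = max(p - j for j, p in enumerate(perm))
    max_delay = max(j + initial_delay - p for j, p in enumerate(perm))
    return initial_delay, max_delay


def pi(k, perm, initial_delay):
    """Input-stream index of the symbol output at time k.

    Returns None while the device is still initializing.
    """
    t = k - initial_delay
    if t < 0:
        return None
    block, slot = divmod(t, len(perm))
    return block * len(perm) + perm[slot]


perm = row_column_permutation(4, 4)
print(fsp_parameters(perm))  # (9, 18)
```

After the initialization period one symbol is read out for every symbol read in, and `pi(k) <= k` holds for every output time, which is the causality condition stated in the text.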
First, an output sequence of an RSC encoder is called a Finite Codeword (FC) if the distance of the sequence from the all-zeros sequence is finite. Since an RSC encoder is an IIR device, not all input sequences will generate FC's. If an input sequence generates an FC, it is called an FC input. For example, a weight-1 input sequence will never generate an FC, so it is never an FC input. In a PCCC, an input sequence is termed a global FC input if it generates FC's in all constituent encoders. In classifying inputs, the notion of sequence span must also be defined as the length, measured in symbols, between the first and last non-zero symbols in an infinite sequence. For example, the span of the sequence $u = [0 \ldots 010010 \ldots]$ is $S_u = 4$.
In a stream PCCC, the interleaver design should focus on low-weight, short-span FC inputs. Such sequences will produce low parity weight in one of the encoders. It is the purpose of the interleaver(s) to ensure that larger parity weight is produced in at least one of the other constituent encoders. Therefore, the interleaver should scramble all short-span FC inputs into 1) non-FC inputs or 2) long-span FC inputs. This design objective aims to maximize the minimum weight of all global FC's while also creating a thin weight spectrum. A thin weight spectrum creates the steepness of the BER curves at low SNR while the minimum distance determines the asymptotic coding gain.
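The definitions of span and FC input can be checked numerically with a small sketch. It assumes a hypothetical feedback polynomial $1 + D + D^4$ (a 16-state register with period 15); which polynomial the constituent codes actually use as feedback is not stated at this point, so the taps below are only an illustration. A finite input is an FC input exactly when the recursive register has returned to the all-zero state once the input ends, because a non-zero state never decays on its own.

```python
def span(u):
    """Length in symbols from the first to the last non-zero symbol."""
    nonzero = [i for i, bit in enumerate(u) if bit]
    return nonzero[-1] - nonzero[0] + 1 if nonzero else 0


def is_fc_input(u, feedback_taps):
    """True if the finite input u drives the recursive register back to zero.

    feedback_taps lists the exponents j >= 1 of the feedback polynomial
    1 + sum(D^j); [1, 4] stands for the assumed polynomial 1 + D + D^4.
    """
    state = [0] * max(feedback_taps)  # state[j - 1] holds a[k - j]
    for bit in u:
        a = bit
        for j in feedback_taps:
            a ^= state[j - 1]
        state = [a] + state[:-1]
    return not any(state)


taps = [1, 4]  # assumed feedback polynomial 1 + D + D^4
print(span([0, 1, 0, 0, 1, 0]))                  # 4
print(is_fc_input([1], taps))                    # False
print(is_fc_input([1] + [0] * 14 + [1], taps))   # True
```

With this assumed polynomial, a weight-2 input is an FC input only when its two ones are separated by a multiple of the period 15, which is why short-span weight-2 inputs are the first patterns an interleaver should break up.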
Block Interleavers
In a stream PCCC, block interleaver design should follow the objectives previously mentioned. However, interleavers which achieve these objectives do not always perform well for block encoding. For a block PCCC using trellis truncation, interleaver design must avoid edge effects. Edge effects occur when an interleaver maps short-span sequences such that the resulting parity weight is limited by the truncation. These edge effects often exclude the use of certain interleavers. For example, a row/column design is an attractive option for PCCC design because of its regular structure. However, in the permutation map, the last symbol in the block experiences a trivial permutation, meaning that it scrambles to the same position in the output sequence. If any row/column interleaver is used in a rate 1/3 block PCCC with trellis truncation, the minimum distance of the code is at most 3 and is generated by the sequence $u = [0 \ldots 01]$. A low minimum distance for a PCCC is evidenced by a very shallow "error floor" in bit error rate plots. Trellis termination fixes this problem by excluding weight-1 inputs, at the cost of a modest rate penalty. Continuous encoding avoids the problem without a rate penalty since a weight-1 input sequence can never be a global FC input, regardless of the interleaver.
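The edge effect for row/column designs can be made concrete with a short sketch. It assumes a write row-wise, read column-wise interleaver and a rate 1/3 code in which each constituent encoder emits one parity symbol per input symbol; the helper names are hypothetical. For a single one at block position i, truncation leaves each encoder only the symbols remaining in the block, so the codeword weight is bounded by one systematic symbol plus the remaining positions in each encoder.

```python
def row_column_permutation(rows, cols):
    """perm[j] = input index read at output slot j (row write, column read)."""
    return [r * cols + c for c in range(cols) for r in range(rows)]


def fixed_points(perm):
    """Block positions that the interleaver leaves in place."""
    return [j for j, p in enumerate(perm) if p == j]


def weight1_bound(perm, i):
    """Upper bound on codeword weight for a single 1 at block position i
    under trellis truncation (rate 1/3, one parity symbol per encoder)."""
    L = len(perm)
    slot = perm.index(i)           # where symbol i appears after interleaving
    return 1 + (L - i) + (L - slot)


perm = row_column_permutation(12, 16)
print(fixed_points(perm))        # [0, 191]
print(weight1_bound(perm, 191))  # 3
```

The last position of the 12x16 block is a fixed point, so a single one there leaves room for only one parity symbol in each encoder, giving the weight-3 bound described above.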
Convolutional Interleavers
Another class of periodic interleavers, termed convolutional interleavers (CI's), was first introduced in [4] and used for coded systems on bursty channels [2]. As previously noted, CI's have no block structure, but can be implemented as an FSP with equally spaced taps and a commutator which follows a regular hopping pattern.
(1)
For application to stream PCCC's, CI's have several attractive qualities. First, CI's have a regular structure, making them amenable to analysis and simplifying implementation. Secondly, PCCC decoding requires synchronization of the interleaver/deinterleaver. If a block interleaver is used, the synchronization ambiguity is roughly the block size $L$. For convolutional interleavers, this ambiguity is $N$, which is typically much less than $L$ for systems with comparable performance. A third advantage of CI's is that by choosing $NB$ even, the interleaver is an "odd-even" interleaver, which has been shown to be advantageous for PCCC design when puncturing is used [7].
The traditional PCCC decoder is constructed from constituent decoders which decode each constituent encoder separately using some variant of the symbol-by-symbol MAP decoding algorithm. The constituent decoders share information in the form of "extrinsic" information, which can be viewed as an estimate of the "a priori" symbol probabilities. Through the sharing of information between constituent decoders, the decoder can be iterated to gain further confidence in the received symbols and to lower the bit error rate. For block encoding, blocks can be decoded separately due to the lack of ambiguity in the starting state of the RSC's. With continuous encoding, each constituent trellis is continuous over time and block decoding is not applicable. For the decoding of stream PCCC's, we appeal to the pipelined architecture given in [1], where each decoding module in the pipeline is capable of performing one iteration.
A decoding module is composed of two constituent decoders along with an interleaver/deinterleaver and buffering for data alignment. The constituent decoders must use a continuous version of the symbol-by-symbol MAP decoding algorithm. Examples include sliding-window algorithms such as SW-Log-MAP (SW-LM), SW-MaxLog-MAP (SW-MLM) and SOVA. The complexity and [...] Fig. 4. Here, the decoding delay per iteration is the sum of the window lengths of the constituent decoders, $2W$, plus the delay of the interleaver/deinterleaver, $D$. A PCCC decoder is shown in Fig. 5, where multiple decoding modules are connected in series with the number of iterations equal to the number of decoding modules in the pipeline. Here, the decoding delay is $I(D + 2W)$, where $I$ is the number of decoding modules in the pipeline. The decoding delay in the pipelined architecture is directly proportional to the number of iterations and grows with the interleaving delay.
For comparison of stream PCCC designs, it is logical to compare based on equal decoding delay. However, comparisons between stream PCCC and block PCCC designs are more difficult. In [11], the decoding delay per iteration of a block PCCC decoder used in a pipelined architecture was assumed equal to the block size. The validity of this assumption is a function of technology as well as the input data rate of the link. In [12], an IC implementation of a block PCCC encoder/decoder is described where the decoding delay per iteration (2178) is roughly equal to the block length (2048). Here, we will assume that the decoding delay per iteration of a block PCCC is $L + 2W$ while the decoding delay per iteration of a stream PCCC is $N + 2W$. In this fashion, we can compare block PCCC's with interleavers of size $L$ to stream PCCC's using FSP's with memory order $N$ where $N \approx L$.
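The delay expressions for the pipelined decoder can be worked through with a few lines of arithmetic. The sketch below uses the values that appear elsewhere in the paper (window length W = 30, a CI latency of 183 symbols, a block length of 192 and 10 iterations); the function names are hypothetical.

```python
def delay_per_iteration(interleaver_delay, window):
    """One decoding module: interleaver delay plus two decoder windows."""
    return interleaver_delay + 2 * window


def pipeline_delay(iterations, interleaver_delay, window):
    """Total delay of I identical modules connected in series."""
    return iterations * delay_per_iteration(interleaver_delay, window)


W = 30
print(delay_per_iteration(183, W))  # 243 symbols, stream design
print(delay_per_iteration(192, W))  # 252 symbols, block design with L = 192
print(pipeline_delay(10, 183, W))   # 2430 symbols for 10 iterations
```

Because the per-module delay is multiplied by the number of iterations, a small saving in interleaver delay accumulates across the whole pipeline.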
Simulations and Results
Here, we consider the rate 1/3 PCCC from Fig. 2, which has been punctured to achieve a code rate of 1/2. 16-state RSC's with generator $(33, 31)_8$ are used as advocated in [13]. All constituent decoders use a SISO implementation of the sub-optimum SW2-MLM as described in [9], where the window length is chosen as $W = 6(\nu + 1) = 30$. In regions of low SNR, the use of the sub-optimal SW-MLM has been observed to cost between 0.25 and 0.5 dB in performance as compared to SW-LM [8]. This same result is observed for block decoding [14].
To begin, we compare block PCCC's using trellis termination and trellis truncation to stream PCCC's using two different block interleavers. Trellis termination uses tail bits to force one of the constituent encoders to the zero state as in [15]. We examined both a 12x16 row/column interleaver (L = 192) and a random interleaver with a block size of L = 1024. In Fig. 6, simulation results are shown for 10 iterations of the SW2-MLM on the AWGN channel. For the row/column design, an error floor due to the low minimum distance was observed for trellis truncation. With trellis termination and continuous encoding, the error floor was lowered. However, for the L = 1024 case, termination had little effect and all schemes performed equally well. This result indicates that interleavers which are poor in a block PCCC can perform well with continuous encoding.
Next, results for a stream PCCC design using CI's are shown. In Fig. 7 and 8, the effect of N is shown for a fixed value of B. Note that performance increases with N up to a point where increasing N actually degrades performance. This is contrary to the viewpoint that larger interleavers offer better performance. For stream PCCC's with CI's, careful design must accompany the growth of the interleaver size if improved performance is to be observed. It is also worth noting that the spike in Fig. 8 at N = 15 indicates that this design is poor. This can be attributed to the relationship between N and the period of the RSC, which is 15, and can be proven with careful analysis of weight-2 FC inputs and (1).
It is also interesting to compare the performance of a stream PCCC using a CI to a block PCCC with equal decoding delay. In [15], it is noted that the 12x16 row/column interleaver with trellis termination performed as well as any random interleaver with the same block size. For the stream PCCC using a CI with N = 14 and B = 1, the decoding latency is D = 183 symbols per iteration. Comparing the performance of this design to the 12x16 block PCCC where L = 192, a coding gain of 0.5 dB is observed for the CI design with less decoding delay. Therefore, stream PCCC's using CI's are viable options for latency-sensitive applications.
In Fig. 9, the effects of the parameter B for a fixed N = 14 are shown. It is interesting to note that performance increases up to a point where increasing B is no longer beneficial. Unlike Fig. 8, performance does not degrade but also does not improve for large D. From Fig. 8 and 9, we can state that large CI designs should balance the parameters N and B. In Fig. 8, roughly 4.6 dB were required for a BER of $10^{-5}$ for the design with N = 103, B = 1 and D = 10507. For roughly the same D, the design with N = 14, B = 55 and D = 10011 shown in Fig. 9 achieved the same BER at 1.6 dB. Another design with N = 30, B = 11 and D = 9571 achieved a BER of $10^{-5}$ at 1.6 dB with 10 iterations of the SW2-MLM. Using the SW2-LM, the same design achieved a BER of $10^{-5}$ at 0.95 dB after 18 iterations. For performance within 1 dB of the capacity limit, large "random" interleavers have been thought necessary. However, the CI with N = 30 and B = 11 is a very structured interleaver which yields near-capacity performance when used in a stream PCCC.
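One way to see why N = 15 could be a poor choice is sketched below. Because equation (1) is not available here, the sketch assumes the classic CI structure: N branches visited cyclically, with the symbol on branch `k mod N` delayed by `(k mod N) * N * B` symbol times. This structure and the function names are assumptions, not the source's definition. With an RSC of period 15, a weight-2 input is an FC input when its ones are 15 positions apart, so the question is what separation such a pair has after interleaving.

```python
def ci_output_time(k, N, B):
    """Time at which input symbol k leaves the assumed CI."""
    return k + (k % N) * N * B


def interleaved_separations(sep, N, B):
    """Output separations of all weight-2 inputs whose ones are sep apart.

    The result depends only on k mod N, so one commutator cycle suffices.
    """
    return {
        abs(ci_output_time(k + sep, N, B) - ci_output_time(k, N, B))
        for k in range(N)
    }


PERIOD = 15  # RSC period stated in the text
print(interleaved_separations(PERIOD, 15, 1))          # {15}
print(sorted(interleaved_separations(PERIOD, 14, 1)))  # [29, 167]
```

Under these assumptions, N = 15 places both ones on the same branch, so the pair keeps its separation of 15 and remains a short-span FC input for the second encoder. With N = 14 the separations become 29 or 167, neither a multiple of 15, so the interleaved pair is no longer an FC input.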
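The latency figures quoted for the CI designs follow a simple pattern that can be checked by arithmetic. The closed form below, N(N-1)B + 1, is inferred from the quoted numbers and is not taken from the source's equation (1); in a classic N-branch CI with delay increment B, N(N-1)B is the largest per-symbol delay, and the origin of the extra symbol is not recoverable here.

```python
def ci_latency(N, B):
    """Inferred latency of a CI design: N(N - 1)B + 1 symbols."""
    return N * (N - 1) * B + 1


# (N, B, D) triples as quoted in the text
designs = [(14, 1, 183), (103, 1, 10507), (14, 55, 10011), (30, 11, 9571)]
for N, B, D in designs:
    print(N, B, ci_latency(N, B), ci_latency(N, B) == D)
```

All four quoted designs match the inferred expression, which shows how different (N, B) pairs can trade off against each other at nearly the same latency.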
Conclusions
In conclusion, we have examined a stream-oriented PCCC which uses continuous encoding together with a modified decoding architecture. This scheme was shown to be compatible with any periodic interleaver and was studied using both block and convolutional interleavers. For certain block interleavers, continuous encoding provides performance comparable to block encoding with trellis termination. Convolutional interleavers were shown to be attractive for low decoding latency applications and, with higher delay, provide a structured interleaving option with near-capacity performance.
[14] P. Robertson, E. Villebrun, and P. Hoeher, "A comparison of optimal and sub-optimal MAP decoding algorithms operating in the log domain," in Interna
