A new method of designing high-performance, low-memory, interleaver banks for Turbo-codes is presented. The new interleavers are called dithered relative prime (DRP) interleavers. Only a small number of parameters are required to both store and implement each interleaver in the bank. The error rate performance is similar to that achieved by other good interleaver designs that typically require the storage of all K indexes for each interleaver of length K.
Introduction
Turbo-codes [1,2] have received considerable attention since their introduction in 1993. This is due to their powerful error correcting capability, reasonable complexity, and flexibility in terms of accommodating different block lengths and code rates. The Turbo-code (TC) encoder considered here consists of two 8-state, rate 1/2 recursive systematic convolutional (RSC) encoders operating in parallel with the data bits, d, interleaved between the two RSC encoders, as shown in Figure 1. The (feedback, feedforward) polynomials are (13,15) octal, as specified by the 3GPP standard [3]. Without puncturing, the overall code rate is 1/3. Other code rates are obtained by puncturing the coded bits. Standard practice has been to only puncture the parity bits, p. However, it will be shown that a significant increase in (Hamming) distance can be achieved by also puncturing a small number of data bits. A high minimum distance is desirable for both lowering the so-called "error floor" or flare and for making the asymptotic flare performance as steep as possible.
Interleaving is a key component of Turbo-codes, as shown in Figure 1. Two interleaver types that have been commonly investigated are the "random" interleaver and the so-called "S-random" or "spread" interleaver [4,5,6]. It was recognized early on that good spreading properties are desirable for both fast convergence and good distance properties. More recent high-spread interleavers include the dithered golden interleavers introduced in [7], and the low extrinsic correlation interleavers described in [8]. An efficient method of generating high-spread random (HSR) interleavers was described in [9]. This method also uses a more natural and effective definition of spread that is closely related to the distance properties of Turbo-codes. The same spread definition is used here. The HSR method, along with distance spectrum testing and index shuffling to eliminate low-weight codewords (post-processing), has provided some of the best performance results to date. The HSR method is used herein as one performance benchmark.
The above interleaver design methods typically require that all $K_b$ indexes be stored to implement a single interleaver of length $K_b$. This is not a major concern when only one interleaver is required, as the other memory requirements for the corresponding TC encoder and decoder are also order $K_b$. However, when a bank of B interleavers is required to accommodate B different block lengths, and B is on the order of the longest interleaver length, $K_B$, then the interleaver bank memory requirements become order $K_B^2$. This can be prohibitive, especially if $K_B$ is many thousands of bits. This is the interleaver bank problem.
In general, there are several criteria that a good interleaver bank should satisfy. The bank should provide a wide range of interleaver lengths, for example from a few tens of bits to many thousands of bits, depending on the application. The bank should have good resolution with convenient interleaver lengths. For example, the lengths could increase by 1 or 2 bits for short lengths (tens of bits), by a single byte (8 bits) for medium lengths (hundreds of bits), or by a few bytes for long lengths (thousands of bits). The amount of memory required to define and store each interleaver should be low. Ideally, there should only be a few parameters per interleaver length. The algorithm used to generate the interleaver indexes should also be simple. If the algorithm is simple enough, the indexes for a selected interleaver can be generated "on-the-fly", as needed by the encoder and decoder, saving even more memory. On-the-fly index generation is considered a bonus feature since the overall memory requirements remain order $K_B$, with or without this feature. However, this feature can still reduce the amount of memory required and simplify the initialization process when changing block lengths. Finally, the interleaver bank should provide good error rate performance for all block lengths.
It is easy to design highly structured interleaver banks that satisfy all of the above criteria, except for the last one. The challenge is to get good performance too. For example, given a block length, K, a simple relative prime (RP) interleaver can be defined by just one other parameter, p, the modulo-K index increment [7]. These interleavers can easily achieve high spreads and thus can eliminate the worst-case low-weight codewords. In fact, these interleavers do provide excellent performance for short block lengths. However, the performance for medium and long block lengths is poor because of the large number of compound low-weight codewords generated by the repetitive structure.
Another example of a low-memory interleaver bank is that specified in the 3GPP standard [3]. The 3GPP standard is used herein as one performance benchmark. The dithered-diagonal interleavers described in [9] are also candidates. In particular, excellent performance results have been obtained for the special block lengths of $K=2n^2$, where n is an integer, but not a multiple of 7 (the period of the feedback polynomial in the RSC encoders). These special interleavers can be stored and implemented using just n index increment values. This represents a significant reduction in the memory requirements. However, the bank resolution is rather coarse and the block lengths are not the most convenient (e.g. they are generally not multiples of bytes). Even so, it was the good error rate performance, and the low-memory requirements, of these special dithered-diagonal interleavers that partly motivated the new approach presented here.
This paper describes a new family of interleavers, called dithered relative prime (DRP) interleavers, that provides a good solution to the interleaver bank problem for Turbo-codes. Section 2 reviews the interleaver and spread definitions. Section 3 presents the new DRP interleavers. Section 4 addresses distance testing and presents some example distance results. Section 5 presents some simulation results. Section 6 contains the conclusions.
Interleaver and Spread Definitions
Interleavers can be defined and implemented in a number of different ways. Figure 2 shows the definition used here. The interleaver reads from a vector of input symbols or samples, $v_{in}$, and writes to a vector of interleaved or permuted output samples, $v_{out}$. The output samples are written using the write indexes i=0...K-1, where K is the interleaver length. Vector I defines the order that the samples are read from the input vector. That is, the i-th output, written to location i in the output vector, is read from location I(i) in the input vector. The interleaver is completely defined by read vector I.
For example, letting $[x]_m$ denote x modulo-m arithmetic, a simple RP interleaver of length K is defined by
$$I(i) = [s + ip]_K, \quad i = 0 \ldots K-1 \tag{1}$$
where p and K are relative primes and s is the starting index. Note that I can also be computed recursively using
$$I(i) = [I(i-1) + p]_K, \quad i = 1 \ldots K-1 \tag{2}$$
where I(0)=s. Thus, an RP interleaver can be implemented using a single modulo-K index increment, p.
The new spread measure associated with two write indexes i and j, for any interleaver I, is defined as [9]
$$S''_{new}(i,j) = |I(i) - I(j)| + |i - j| \tag{3}$$
The (minimum) spread associated with index i is then
$$S'_{new}(i) = \min_{j \ne i} S''_{new}(i,j) \tag{4}$$
The overall (minimum) spread is defined as
$$S_{new} = \min_{i} S'_{new}(i) \tag{5}$$
Proper termination of the TC's RSC constituent codes is very important for good performance at low error rates [7]. Some form of dual termination or dual tail-biting is recommended, as defined in [10,11] for example. With dual tail-biting, the absolute differences in (3) should be computed in a tail-biting sense. For these spread definitions, it can be shown that the theoretical maximum spread (with dual tail-biting) is $\lfloor \sqrt{2K} \rfloor$. As an example, for a block length of K=512, the theoretical maximum spread is 32 (i.e. $S_{new} \le 32$).
Dithered Relative Prime Interleavers
Figure 3 shows the approach used to design dithered relative prime (DRP) interleavers. The approach is well suited to dual tail-biting, which is the most difficult TC termination option to accommodate. The approach consists of three stages. First, the input vector, $v_{in}$, is dithered (permuted locally) using a small read dither vector, r, of length R. Vector r is a permutation of indexes 0 through R-1. Next, the resulting vector, $v_a$, is permuted using an RP interleaver to obtain good spread. Finally, the resulting vector, $v_b$, is dithered using a small write dither vector, w, of length W, to generate the output vector $v_{out}$. Vector w is a permutation of indexes 0 through W-1. The interleaver length, K, must be a multiple of both R and W. Note that short read and write dither vectors will not destroy the good spreading properties of an RP interleaver, but will tend to lower the spread somewhat. While a DRP interleaver could be implemented using the 3-stage process shown in Figure 3, this is not the recommended approach. The equivalent overall interleaver vector, I, as illustrated in Figure 2, is determined next.
The three stages in Figure 3 can be expressed as follows:
$$v_a(i) = v_{in}(I_a(i)), \quad v_b(i) = v_a(I_b(i)), \quad v_{out}(i) = v_b(I_c(i)) \tag{6}$$
where
$$I_a(i) = R \lfloor i/R \rfloor + r([i]_R) \tag{7}$$
$$I_b(i) = [s + ip]_K \tag{8}$$
$$I_c(i) = W \lfloor i/W \rfloor + w([i]_W) \tag{9}$$
Thus, the input vector can be interleaved using
$$v_{out}(i) = v_{in}(I(i)) \tag{10}$$
where the interleaver is completely defined by
$$I(i) = I_a(I_b(I_c(i))) \tag{11}$$
All the indexes of I can be computed using equations (7), (8), (9), and (11). Let M be the least common multiple of R and W. Because K is a multiple of M, it follows from (7), (8), (9), and (11) that
$$I(i+M) = [I(i) + Mp]_K \tag{12}$$
so the modulo-K index increments repeat with period M, and
$$I(i) = [I(i-1) + P([i]_M)]_K \tag{13}$$
where P is a vector of M index increments, with $P([i]_M) = [I(i) - I(i-1)]_K$ for i=1...M. Thus, all the indexes of I can be computed using the simple recursion in (13), and the interleaver can be stored by just storing P. (I(0) is arbitrary.) Further, equation (13) is simple enough to accommodate "on-the-fly" index generation, saving even more memory. In particular, this method works well with the circular buffer feature provided by most modern digital signal processors.
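The three-stage construction and the spread measure can be illustrated with a short Python sketch. It assumes the overall read vector is the composition I(i) = I_a(I_b(I_c(i))) of a read dither, an RP interleaver, and a write dither, and that the spread is the minimum over index pairs of |I(i)-I(j)| + |i-j|, with differences taken in a tail-biting sense. The function names and the parameter values (r, w, s, p) are hypothetical and chosen only for illustration. They are not the optimized values from this paper.

```python
from math import gcd, isqrt


def drp_interleaver(K, r, w, s, p):
    """Return the read vector I(i) = I_a(I_b(I_c(i))) for i = 0..K-1."""
    R, W = len(r), len(w)
    if K % R or K % W:
        raise ValueError("K must be a multiple of both R and W")
    if gcd(p, K) != 1:
        raise ValueError("p and K must be relative primes")
    I = []
    for i in range(K):
        c = W * (i // W) + w[i % W]        # write dither, I_c
        b = (s + p * c) % K                # RP interleaver, I_b
        I.append(R * (b // R) + r[b % R])  # read dither, I_a
    return I


def spread(I, tail_biting=True):
    """Overall minimum spread: min over i != j of |I(i)-I(j)| + |i-j|."""
    K = len(I)

    def dist(a, b):
        d = abs(a - b)
        return min(d, K - d) if tail_biting else d

    # Brute-force O(K^2) search; adequate for a few hundred indexes.
    return min(dist(I[i], I[j]) + dist(i, j)
               for i in range(K) for j in range(i + 1, K))


# Illustrative (not optimized) parameters with M = R = W = 8.
K = 512
r = [3, 6, 1, 4, 7, 2, 5, 0]
w = [5, 0, 3, 6, 1, 4, 7, 2]
I = drp_interleaver(K, r, w, s=2, p=45)
print("spread:", spread(I), "upper bound:", isqrt(2 * K))
```

Setting r = w = [0] reduces the sketch to a plain RP interleaver, which is a convenient sanity check. The brute-force spread search is quadratic in K, so a practical design tool would restrict the search to index pairs that are closer together than the spread bound.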
A few important properties are now explained further. At one extreme, R and W are relative primes, so that M=RW. In this case, even short dither vectors can still force a large number of index increments, M. This is undesirable since M is also the resolution of the interleaver bank (i.e. K must be a multiple of M). There is also no benefit derived from trying different s values since all relative shifts between dither vectors r and w will occur for every value of s. At the other extreme we have the special case where M=R=W. This case offers the largest amount of dither for the smallest number of index increments, M, and the finest interleaver bank resolution. In this case, different results can be achieved for all shifts s=0...M-1, and thus all of the different shift values are worth considering. This second case is more convenient and has generally been found to give better distance results. This is the only case considered further below. As an example, with M=R=W=8, only 8 index increments are required to both store and implement each interleaver, and the interleaver bank resolution is conveniently in bytes.
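The following Python sketch illustrates storing a DRP interleaver as M index increments and regenerating the indexes on-the-fly, for the case M=R=W=8. It assumes the read vector is built as I(i) = I_a(I_b(I_c(i))) and that the stored increments are the modulo-K differences between consecutive read indexes over one period of M. The helper names and the parameter values are hypothetical.

```python
def drp_read_vector(K, r, w, s, p):
    """Hypothetical helper: read dither r, RP interleaver (s, p), write dither w."""
    R, W = len(r), len(w)
    I = []
    for i in range(K):
        c = W * (i // W) + w[i % W]        # write dither
        b = (s + p * c) % K                # RP interleaver
        I.append(R * (b // R) + r[b % R])  # read dither
    return I


def index_increments(I, M):
    """Return P with P[i mod M] = (I(i) - I(i-1)) mod K for i = 1..M."""
    K = len(I)
    P = [0] * M
    for i in range(1, M + 1):
        P[i % M] = (I[i % K] - I[i - 1]) % K
    return P


def generate_indexes(K, P, start):
    """Yield I(0), I(1), ..., I(K-1) one at a time from the M stored increments."""
    M = len(P)
    index = start
    yield index
    for i in range(1, K):
        index = (index + P[i % M]) % K  # cycle through the M increments
        yield index


# Illustrative (not optimized) parameters with M = R = W = 8.
K, M = 512, 8
r = [3, 6, 1, 4, 7, 2, 5, 0]
w = [5, 0, 3, 6, 1, 4, 7, 2]
I = drp_read_vector(K, r, w, s=2, p=45)
P = index_increments(I, M)
regenerated = list(generate_indexes(K, P, I[0]))
print("increments:", P)
print("match:", regenerated == I)
```

Only the M values in P (plus a starting index) are kept per interleaver in this sketch. The generator never materializes the full table, which mirrors the circular-buffer style of index generation mentioned in the text.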
Example Distance Results
The lowest weight TC codewords are constructed from combinations of low input-weight (IW) patterns that lead to low-weight RSC codewords in both RSC constituent codes. It is important to determine which combinations of low IW patterns need to be considered. For example, certain combinations do not need to be considered because of high spread. A number of distance lower bounds were derived. The presentation of these bounds is beyond the scope of this paper. From these bounds it was concluded that the most important cases to test, and to try and improve, are: "IW2:2,2", "IW3:3,3", "IW4... Distance measurement routines have been developed for all of these cases. For completeness, and because it was feasible, routines were also developed to handle the other IW4 cases, namely "IW4:4,4", "IW4:4,22", and "IW4:22,4". With these extra IW4 cases included, the minimum measured distances are guaranteed to be the true minimum distances for all possible IW2, IW3, and IW4 cases. While the minimum distances for IW5 and IW6 cannot be guaranteed in general, the minimum measured distance for IW6 is sure to be the true minimum distance (over IW5 and IW6) for long blocks with large spread. This is because all the other IW5 and IW6 cases improve as the spread increases.
The M=8 interleaver is expected to perform the best for a code rate of 1/3, but the M=4 interleaver should also perform well when puncturing is used to achieve higher code rates.
Simulation Results
Simulation results are presented for binary antipodal signaling (e.g. BPSK or QPSK) and a block length of K=512. Dual termination was used, as described in [10,11]. The TC used 8-state constituent codes, and the decoder used an enhanced maximum-log-a-posteriori-probability (max-log-APP) approach, with scaled extrinsic information, as described in [12,13,14]. It has been found that this decoding approach typically provides performance within 0.1 dB of true log-APP processing for 8-state codes. The maximum number of decoding iterations was set to 16. Early stopping was also used, where the decisions before and after each half-iteration must agree 3 times in a row before stopping [14].
Figure 4 shows the packet error rate (PER) results for a block length of K=512 and a code rate of 1/3. (The nominal code rate is used for convenience. The exact code rate is slightly less due to the 6 termination bits included in the interleaver length, K.) Results are shown for the 4 DRP interleavers indicated in Table 1, with K=512 and M=1, 2, 4, and 8. For comparison, results are also shown for a random interleaver, the 3GPP interleaver, and a good HSR interleaver with post-processing to improve the distance spectrum. As expected, the random interleaver performs poorly, the 3GPP interleaver performs better, and the HSR interleaver performs the best.
Not surprisingly, the DRP interleaver with M=1 (actually a simple RP interleaver) performs worse than the random interleaver (although it is expected to cross over at higher SNRs). There is a significant improvement with M=2, but performance is still a little worse than that for the 3GPP interleaver. Performance continues to improve with M=4 and M=8. Note that the performance with M=8 is essentially the same as that for the HSR interleaver.
Figure 5 shows the corresponding PER results with a code rate of 2/3. Most of the results were obtained without any data puncturing using the puncture masks (data, par1, par2) = (1, 0100, 0010), where a "0" indicates a punctured bit. The DRP (M=1) interleaver is as good as the 3GPP interleaver. The DRP (M=2) interleaver is better than the HSR interleaver, and performance continues to improve with M=4. Note that the M=4 result is slightly better than the M=8 result. This is not surprising given the unpunctured distance results shown in Table 1. The low IW cases (IW2, IW3, and IW4) are clearly dominating the performance. It should be noted that the HSR interleaver was not designed with puncturing in mind, but the DRP interleavers were.
A small amount of data puncturing, in exchange for more parity bits, can significantly improve the flare performance. This works because most of the distance, especially for the low IW cases, tends to come from the parity bits. It follows that the better the interleaver the better data puncturing works. There is a practical trade-off, however, as too much data puncturing can significantly degrade the convergence performance up top. Figure 5 shows results with 1/6 data puncturing for the two DRP interleavers with M=4 and M=8. The puncture masks were (data, par1, par2) = (111110, 001, 001). As can be seen, the flare performance is improved with only a small degradation up top. Note that the M=8 result is now slightly better than the M=4 result, reversing the trend without data puncturing. This is expected because a decrease in the amount of parity puncturing tends to shift the emphasis away from the lowest IW cases. Even so, there is very little to choose from between these two DRP interleavers.
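As a quick check on the puncturing arithmetic, the nominal code rate implied by a set of puncture masks can be computed with the short Python sketch below. It assumes each mask repeats periodically over the block, that a "1" marks a transmitted bit and a "0" a punctured bit, and that termination bits are ignored. The function name is hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def nominal_code_rate(data_mask, par1_mask, par2_mask):
    """Nominal rate = data bits per period / transmitted bits per period."""
    masks = (data_mask, par1_mask, par2_mask)
    # One full period is the least common multiple of the mask lengths.
    period = reduce(lambda a, b: a * b // gcd(a, b), (len(m) for m in masks))
    kept = sum(m.count("1") * (period // len(m)) for m in masks)
    return Fraction(period, kept)


print(nominal_code_rate("1", "1", "1"))           # no puncturing
print(nominal_code_rate("1", "0100", "0010"))     # parity puncturing only
print(nominal_code_rate("111110", "001", "001"))  # 1/6 data puncturing
```

With parity-only puncturing, every 4 data bits produce 4 + 1 + 1 = 6 transmitted bits. With 1/6 data puncturing, every 6 data bits produce 5 + 2 + 2 = 9 transmitted bits. Both correspond to a nominal rate of 2/3, so the second mask set trades one data bit in six for additional parity bits at the same rate.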
Conclusions
Dithered relative prime (DRP) interleavers were introduced. These interleavers provide a good solution to the interleaver bank problem for Turbo-codes. The design is based on using a small read dither vector, r, of length R, a high-spread RP interleaver with starting index s and index increment p, and a small write dither vector, w, of length W. Distance testing is used to help select the dither parameters. A DRP interleaver can be stored by just storing r, w, s and p.
A DRP interleaver can also be stored using a vector, P, containing M index increments, where M is the least common multiple of R and W. The interleaver is generated by repeatedly cycling through these M index increments. This method is simple enough to accommodate "on-the-fly" index generation, and works well with the circular buffer feature provided by most modern digital signal processors. The special case of M=R=W offers the largest amount of dither for the smallest number of index increments. This is important because M is also the resolution of the interleaver bank. As an example, with M=8, only 8 index increments are required to both store and implement each interleaver, and the interleaver bank resolution is conveniently in bytes.
The memory can be reduced further by selecting a small number of "good" dither combinations (r, w, and s) and then just optimizing over p for each interleaver length. Good distance results have been obtained with as few as 8 dither combinations. With this approach, each interleaver in the bank can be stored by just storing 2 integers, the number of the best dither combination and the corresponding best p value found. In this case, the memory that is required to store a large bank of B interleavers is only about 2B integers.
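To put these storage figures in perspective, the Python sketch below compares three ways of storing a bank: full index tables, M index increments per interleaver, and 2 integers per interleaver plus a shared table of dither combinations. The bank (lengths from 40 to 8192 bits in steps of 8) is hypothetical, and the cost assigned to the shared dither table (M values each for r and w, plus s, per combination) is an assumption made for this illustration.

```python
def bank_storage(lengths, M=8, dither_combinations=8):
    """Return storage, in integers, for three ways of storing a bank."""
    B = len(lengths)
    full_tables = sum(lengths)  # all K indexes for every interleaver
    increments = M * B          # M index increments per interleaver
    # Two integers per interleaver (combination number and p), plus the
    # shared table: r (M values), w (M values), and s per combination.
    shared_dither = 2 * B + dither_combinations * (2 * M + 1)
    return full_tables, increments, shared_dither


# Hypothetical bank with byte resolution.
lengths = list(range(40, 8193, 8))
full_tables, increments, shared_dither = bank_storage(lengths)
print("interleavers in bank:", len(lengths))
print("full index tables   :", full_tables)
print("M increments each   :", increments)
print("shared dither table :", shared_dither)
```

For a bank of this size, the full-table approach grows with the sum of all interleaver lengths, while the other two grow only linearly with the number of interleavers B.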
