The Digital Video Broadcasting (DVB) Project was founded in 1993 by the European Telecommunications Standards Institute (ETSI) with the goal of standardizing digital television services. Its initial standard for satellite delivery of digital television, dubbed DVB-S, used a concatenation of an outer (204,188) byte shortened Reed Solomon code and an inner constraint length 7, variable rate (r ranges from 1/2 to 7/8) convolutional code [1] .
The Digital Video Broadcasting (DVB) Project was founded in 1993 by the European Telecommunications Standards Institute (ETSI) with the goal of standardizing digital television services. Its initial standard for satellite delivery of digital television, dubbed DVB-S, used a concatenation of an outer (204,188) byte shortened Reed Solomon code and an inner constraint length 7, variable rate (r ranges from 1/2 to 7/8) convolutional code [1] .
The same infrastructure used to deliver television via satellite can also be used to deliver Internet and data services to the subscriber. Internet over DVB-S is a natural competitor against cable modem and DSL technology, and its universal coverage allows even the most remote areas to be served. Because DVB-S only provides a downlink, an uplink is also needed to enable interactive applications such as web browsing. The uplink and downlink need not be symmetric, since many Internet services require a faster downlink.
One alternative for the uplink is to use a telephone modem, but this does not allow for always-on service, has modest data rates, and can be costly in remote areas. A more attractive alternative is for the subscriber equipment to transmit an uplink signal back to the satellite over the same antenna used for receiving the downlink signal. However, given the small antenna aperture and requirement for a low-cost, low-power amplifier, there is very little margin on the uplink. Therefore, strong FEC coding is desired. For this reason, the DVB Project has adopted turbo codes for the satellite return channel in its DVB-RCS (Return Channel via Satellite) standard [2] .
At the same time that the DVB Project was developing turbo coding technology for the return channel, it was updating the downlink with modern coding technology. The latest standard, called DVB-S2, replaces the concatenated Reed-Solomon/convolutional coding approach of DVB-S with a concatenation of an outer BCH code and inner low density parity check (LDPC) code [3] . The result is a 30% increase in capacity over DVB-S. In this chapter, the coding strategies used by both DVB-RCS and DVB-S2 are discussed.
DVB-RCS
The DVB-RCS turbo code was optimized for short frame sizes and high data rates. Twelve frame sizes are supported raging from 12 bytes to 216 bytes, including a 53 byte frame compatible with ATM and a 188 byte frame compatible with both MPEG-2 and the original DVB-S standard. The return link supports data rates from 144 kbps to 2 Mbps and is shared among terminals by using muti-frequency time-division multiple-access (MF-TDMA) and demand-assigned multiple-access (DAMA) techniques. Eight code rates are supported, ranging from r = 1/3 to r = 6/7.
Like the turbo codes used in other standards, a pair of constituent RSC encoders is used along with log-MAP or max-log-MAP decoding [4] . The decoder for each constituent code performs best if the encoder begins and ends in a known state, such as the all-zeros state. This can be accomplished by independently terminating the trellis of each encoder with a tail which forces the encoder back to the all-zeros state. However, for the small frame lengths supported by DVB-RCS, such a tail imposes a non-negligible reduction in code rate and is therefore undesirable. As an alternative to terminating the trellis of the code, DVB-RCS uses circular recursive systematic convolutional (CRSC) encoding [5] , which is based on the concept of tailbiting [6] . CRSC codes do not use tails, but rather are encoded in such a way that the ending state matches the starting state.
Most turbo codes use binary encoders defined over GF (2) . However, to facilitate faster decoding in hardware, the DVB-RCS code uses duobinary constituent encoders defined over GF(4) [7] . During each clock cycle, the encoder takes in two data bits and outputs two parity bits so that, when the systematic bits are included, the code rate is r = 2/4. In order to avoid parallel transitions in the code trellis, the memory of the encoder must exceed the number of input bits, and so DVB-RCS uses constituent encoders with memory three (a constraint length of four).
There are several benefits to using duobinary encoders. First, the trellis contains half as many states as a binary code of identical constraint length (but the same number of edges) and therefore needs half as much memory and the decoding hardware can be clocked at half the rate as a binary code. Second, the duobinary code can be decoded with the suboptimal but efficient max-log-MAP algorithm at a cost of only about 0.1-0.2 dB relative to the optimal log-MAP algorithm. This is in contrast with binary codes, which lose about 0.3-0.4 dB when decoded with the max-log-MAP algorithm [8] . Additionally, duobinary codes are less impacted by the uncertainty of the starting and ending states when using tailbiting and perform better than their binary counterparts when punctured to higher rates. 
Encoding
The CRSC constituent encoder used by DVB-RCS is shown in Fig. 1 . The encoder is fed blocks of k message bits which are grouped into N = k/2 couples. The number of couples per block can be N ∈ {48, 64, 212, 220, 228, 424, 432, 440, 752, 848, 856, 864}. The number of bytes per block is N/4. In Fig. 1 , A represents the the first bit of the couple, and B represents the second bit. The two parity bits are denoted W and Y . For ease of exposition, subscripts are left off the figure, but below a single subscript is used to denote the time index k ∈ {0, ..., N − 1} and an optional second index is used on the parity bits W and Y to indicate which of the two constituent encoders produced them.
Let the vector
, 1} denote the state of the encoder at time k. Note that although the inputs and outputs of the encoder are defined over GF (4) , only binary values are stored within the shift register and thus the encoder has just eight states. The encoder state at time k is related to the state at time k − 1 by
where
and
Because of the tailbiting nature of the code, the block must be encoded twice by each constituent encoder. During the first pass at encoding, the encoder is initialized to the all-zeros state, S 0 = [0 0 0] T . After the block is encoded, the final state of the encoder S N is used to derive the circulation state
Where the above operations are over GF (2) . Note that the matrix I + G N is not invertible if N is a multiple of the period of the encoder's impulse response (which is seven for this encoder). However, this is not a problem because none of the permitted values of N are multiples of seven. In practice, the circulation state S c can be found from S N by using a lookup table (which is specified in the standard). Once the circulation state is found, the data is encoded again. This time, the encoder is set to start in state S c and will be guaranteed to also end in state S c .
The first encoder operates on the data in its natural order, yielding parity couples {W k,1 , Y k,1 }. The second encoder operates on the data after it has been interleaved. Interleaving is performed on two levels. First, interleaving is performed within the couples, and second, interleaving is performed between couples. Let {A k , B k } denote the sequence after the first level of interleaving and {A k , B k } denote the sequence after the second level of interleaving. In the first level of interleaving, every other couple is reversed in order, i.e.
In the second level of interleaving, couples are permuted in a pseudorandom fashion. The exact details of the second level permutation can be found in the standard [2] .
After the two levels of interleaving, the second encoder (which is identical to the first) encodes the sequence {A k , B k } to produce the sequence of parity couples {W k,2 , Y k,2 }. As with the first encoder, two passes of encoding must be performed, and the second encoder will have its own independent circulation state. To create a rate r = 1/3 turbo code, a codeword is formed by first transmitting all the uninterleaved data couples {A k , B k }, then transmitting {Y k,1 , Y k,2 } and finally transmitting {W k,1 , W k,2 }. The bits are transmitted using QPSK modulation, so there is a one-to-one correspondence between couples and QPSK symbols. Alternatively, the code word can be transmitted by exchanging the parity and systematic bits, i.e.
Code rates higher than r = 1/3 are supported through the puncturing of parity bits. To achieve r = 2/5, both encoders maintain all the Y k but delete odd-indexed W k . For rate 1/2 and above, the encoders delete all W k . For rate r = 1/2, all the Y k bits are maintained, while for rate r = 2/3 only the even-indexed Y k are maintained, and for rate r = 4/5 only every fourth Y k is maintained. Rates r = 3/4 and 6/7 maintain every third and sixth Y k respectively, but are only exact rates if N is a multiple of three (otherwise the rates are slightly lower). Fig. 2 . A decoder for the DVB-RCS code.
Decoding
Decoding of the DVB-RCS code is complicated by the fact that the constituent codes are duobinary and circular. As with conventional turbo codes, decoding involves the iterative exchange of extrinsic information between the two component decoders. While decoding can be performed in the probability domain, the log-domain is preferred since the low complexity max-log-MAP algorithm can then be applied [4] . Unlike the decoder for a binary turbo code, which can represent each binary symbol as a single log-likelihood ratio, the decoder for a duobinary code requires three log-likelihood ratios. For example, the likelihood ratios for message couple (A k , B k ) can be represented in the form
where (a, b) can be (0, 1), (1, 0), or (1, 1). An iterative decoder that can be used to decode the DVB-RCS turbo code is shown in Fig. 2 . The goal of each of the two constituent decoders is to update the set of log-likelihood ratios associated with each message couple. In the figure and in the following discussion, {Λ a,b (A k , B k )} along with the received values of the parity bits generated by the corresponding encoder (in LLR form). Using these inputs and knowledge of the code constraints, it is able to produce the updated LLRs {Λ
As with binary turbo codes, extrinsic information is passed to the other constituent decoder instead of the raw LLRs. This prevents the positive feedback of previously resolved information. Extrinsic information is found by simply subtracting the appropriate input LLR from each output LLR, as indicated in Fig. 2 .
The extrinsic information that is passed between the two decoders must be interleaved or deinterleaved so that it is in the proper sequence at the input of the other decoder. Interleaving and deinterleaving between the two constituent decoders must be done on a symbol-wise basis by assuring that the three likelihood ratios
The trellis for the duobinary constituent code is as shown in Fig. 3 . The trellis contains eight states, with four branches entering and exiting each state. Note that this is in contrast with a conventional binary code which only has a pair of branches entering and exiting each state. The trellis contains two 4 by 4 butterflies, and because these two butterflies are independent, they can be processed in parallel. In the following, the i th state is denoted by S i where i ∈ {0, ..., 7} for DVB-RCS. Note that the subscript i takes on a slightly different connotation depending on whether encoding or decoding is being discussed. When discussing encoding, the subscript was used to indicate a time step, but when discussing decoding the subscript indicates a particular state.
The extension of the log-MAP and max-log-MAP algorithms [4] to the duobinary case is fairly straightforward. Each branch must be labeled with the log-likelihood ratios corresponding to the systematic and parity couples associated with that branch. Because QPSK modulation is orthogonal, the LLR of message couple (A, B) can be initialized prior to being fed into the first decoder as Λ
Because extrinsic information about the parity bits is not exchanged, the parity bits can always be decomposed in a similar manner. However, for the systematic bits, the three likelihood ratios defined in (5) must be calculated during each iteration and exchanged between the decoders.
Let γ k (S i → S j ) denote the branch metric corresponding to state transition S i → S j at time k. The branch metric depends on the message and parity couples that label the branch along with the channel observation and extrinsic information at the decoder input. In particular, if transition
As with binary codes, the constituent decoder must perform a forward and a backward recursion. Let α k (S i ) denote the normalized forward metric at trellis stage k and state S i , while α k+1 (S j ) is the forward metric at trellis stage k + 1 and state S j prior to normalization. The forward recursion is
where the max * operation 1 is performed over the four branches S i → S j leading into state S j at time k + 1. While the log-MAP algorithm uses the exact definition of max * , the max-log-MAP algorithm uses the approximation max * (x, y) ≈ max(x, y).
After computing α k+1 (S j ) for all S j at time k + 1, the forward metrics are normalized with respect to the metric stored in state zero
Similarly, let β k+1 (S j ) denote the normalized backward metric at trellis state k + 1 and state S j and β k (S i ) denote the backward metric at trellis state k and state S i prior to normalization. The backward recursion is
where max * is over the four branches S i → S j exiting state S i at time k. As with α, the β's are normalized with respect to the metric stored in state zero
Because the encoders are circular, special care must be taken to initialize the forward and backward recursions. Since the starting and stopping states are identical, the code trellis can be visualized as a cylinder (see, for example, Fig. 1 in [10] ). The forward recursion can be interpreted as going around the cylinder in the clockwise direction and the backward recursion as going around the cylinder in the counter-clockwise direction.
Several algorithms are presented in [10] for decoding circular/tailbiting convolutional codes. The most practical algorithm, called algorithm A3 in [10] , begins by initializing the decoder so that all initial states are equally likely. The forward recursion is initialized so that α 0 (S j ) = 0, ∀S j . The forward recursion then cycles through the entire trellis in the clockwise direction. If the encoder was terminated in a known state, then the forward recursion could halt once it reaches the end of the trellis. However, since the starting and ending states are not known, the forward recursion continues around the cylinder a second time. During the second cycle, the new value of the α k 's are compared against the same value computed during the first cycle, and the new α k 's are used to replace the ones from the last cycle. Once all the α k 's are close to the value from the last cycle, the forward recursion halts. The number of extra trellis sections beyond the first cycle around the cylinder is called the wrap depth. For long frames, the wrap depth is typically smaller than the frame length (so an entire second cycle does not need to be run). However, for short frames, a third or fourth cycle around the trellis cylinder could be required, i.e. the wrap depth could exceed N .
The backward recursion is executed in similar manner, with β N (S i ) = 0, ∀S i and the decoder cycling around the cylinder in the counter-clockwise direction. After making one lap around the cylinder, the algorithm continues until the β k 's closely match the values computed during the previous lap.
After the forward and backward recursions have been completed, a full set of {α k } and {β k } metrics will be stored in memory. The next step is for the decoder to use these metrics to compute the LLRs given by (5). This is accomplished by first computing the likelihood of each branch
Next, the likelihood that message pair (A k , B k ) = (a, b) is calculated using
where the max * operator is over the eight branches labelled by message couple (a, b). Finally, the LLR at the output of the decoder is found as
where (a, b) ∈ {(0, 1), (1, 0), (1, 1)}.
After the turbo decoder has completed a fixed number of iterations or met some other convergence criterion, a final decision on the bits must be made. This is accomplished by computing the LLR of each bit in the couple (A k , B k ) according to
The hard bit decisions can be found by comparing each of these likelihood ratios to a threshold.
Simulation Results
In this section, simulation results are presented that illustrate the performance of the DVB-RCS turbo code. Fig. 4 shows the frame error rate (FER) of several decoding algorithms when using blocks of N = 212 message couples (53 bytes) and code rate r = 1/3. One problem with using circular constituent codes is that the circulation state is unknown at the decoder. The two lowermost curves in Fig. 4 show the impact of the unknown circulation state when using eight iterations of log-MAP decoding. The lowermost curve was created by using a genie-aided decoder that knows the exact circulation state of the encoder. While this decoder is not feasible in practice, it serves as a bound for more practical decoders. The second curve shows the performance when the circulation state is not known and algorithm A3 from [10] is used to compensate for the unknown circulation state. Note that the loss due to imperfect knowledge of the circulation state is only about 0.02 dB at a FER of 10 −4 . The uppermost curve in Fig. 4 shows the performance when using eight iterations of the max-log-MAP algorithm along with algorithm A3 from [10] to handle the unknown circulation state. At a frame error rate (FER) of 10 −4 , the loss due to using max-log-MAP is only about 0.16 dB. This is in contrast with the 0.3 − 0.4 dB losses that are incurred when decoding binary turbo codes with the max-log-MAP algorithm, and for this reason many DVB-RCS decoder implementations use max-log-MAP [11] . Fig. 5 shows the influence of the block size. Frame error rate results are shown for blocks of N = {48, 64, 212, 432, 752} message couples, or correspondingly {12, 16, 53, 108, 188} bytes. In each case, the code rate is r = 1/3, the circulation state is unknown at the decoder, and eight iterations of maxlog-MAP decoding are performed. The SNR required to achieve a FER of 10 
DVB-S2
The DVB-S2 standard uses a serial concatenation of two binary linear codes: an outer BCH code and an inner low density parity check (LDPC) code [3] . With binary linear block codes, a k bit message d is encoded into a n bit codeword c according to
where G is the k by n generator matrix and the matrix multiplication is modulo-2. The parity-check matrix H is a m = n − k by n matrix that spans the null space of G and therefore satisfies GH T = 0. LDPC codes are characterized by having very sparse parity check matrices. An LDPC code is said to be regular if all rows in H have the same weight (number of ones) and all columns in H have the same weight; otherwise the code is irregular. LDPC codes were originally proposed by Gallager in his 1960 dissertation [12] along with an iterative process for decoding. Although Gallager proved that the codes were good in theory, they were largely ignored until the advent of turbo codes because the decoder was thought to be too complex. However, after turbo codes showed the practicality of iterative decoding, interest in LDPC codes was soon renewed. In the mid-1990's, MacKay rediscovered LDPC codes [13, 14] and showed that they are capable of approaching the Shannon limit. Soon afterwards, Richardson and Urbanke [15] and Luby et al [16] showed that long irregular LDPC codes can be superior to turbo codes of the same length and can approach the Shannon capacity by a fraction of a decibel [17] .
Encoding
The DVB-S2 channel encoder begins by first encoding a length k binary message into a n bit systematic BCH codeword. The k = n BCH codeword is then encoded into a n bit systematic LDPC codeword. The codeword length n can be either 64,800 or 16,200 bits long, producing normal and short frames, respectively. Note that unlike DVB-RCS, which fixes the value of the encoder input, DVB-S2 fixes the length of the encoder output. Because of this, the length k of the input to the BCH encoder and the length k of the input to the LDPC encoder (which equals the length n of the output of the BCH encoder) are variable and depend on the rate r of the LDPC code. Normal frames can be encoded at eleven different code rates, as shown in Table 1 . Short frames can be encoded at all the same code rates except for rate r = 9/10, which is not supported, as shown in Table 2 . For short frames, the "rates" Table 1 . The input k and output n word sizes of the outer BCH code used by normal DVB-S2 frames. Also listed is the error correcting capability t of the BCH code. Table 2 . The input k and output n word sizes of the outer BCH code used by short DVB-S2 frames. Each of these BCH codes has error correcting capability t = 12. Also listed is the actual rate r = k/16200 of the inner LDPC code.
1/4, 1/2, 3/4, 4/5, and 5/6 are misnomers because the actual rates, as defined by r = k/n = k/16200 are 1/5, 4/9, 11/15, 7/9 and 37/45, respectively. For short frames, the BCH code can correct t = 12 errors at the output of the LDPC decoder, while for normal frames it can correct between t = 8 and t = 12 errors, depending on rate of the LDPC code. The BCH code requires 168 parity bits for short frames and between 128 and 192 parity bits for long frames. Thus the overall rate r of the concatenated BCH/LDPC code is slightly lower than that of the LDPC code alone. For instance, short frames encode messages of length k = k − 168 = 16200r − 168 and thus have overall rate r = r − 168/16200.
Unlike with turbo codes, the encoding of LDPC codes can be very complex if the code is not designed with encoding in mind [18] . This is especially true for systematic LDPC codes, because a sparse H matrix could require a dense G matrix. In order to facilitate systematic encoding and produce irregular LDPC that are especially suitable for high code rates, DVB-S2 uses a class of LDPC codes called extended irregular repeat accumulate (eIRA) codes [19] . With eIRA codes, the parity check matrix is constrained to be in the form
where H 1 is a sparse m by k matrix and H 2 has the form
Given the constraint on the H matrix, the generator matrix can be expressed in systematic form as
The matrix H −T 2 is actually the generator matrix for a differential encoder (also called an accumulator). Thus the encoding of the DVB-S2 LDPC code can be accomplished in two stages. First the output of the BCH encoder d is multiplied by the sparse matrix H The modulation can be either QPSK, 8PSK, 16APSK, or 32APSK. The 16APSK constellation consists of two concentric rings with 4 uniformly spaced symbols on the inner ring and 12 uniformly spaced symbols on the outer ring. The 32APSK adds a third ring outside the 16APSK constellation, with 16 equally spaced symbols along the third ring. For the higher order modulations (everything except QPSK), a bit interleaver is placed between the channel encoder and the modulator, and thus the system uses bit interleaved coded modulation (BICM) [20] .
Decoding
An LDPC code can be decoded iteratively using a message passing algorithm [14] over a graphical representation of the code's parity check matrix called a Tanner graph [21] . A Tanner graph is a bipartite graph consisting of n variable nodes (v-nodes) and m check nodes (c-nodes). Variable node y j is connected to check node f i if and only if the (i, j) th entry of H is equal to one. As an example, consider the systematic (7, 4) Hamming code with parity check matrix:
The corresponding Tanner graph for this code is shown in Fig. 7 . In the message passing algorithm, messages (in the form of extrinsic information) flow up from the variable nodes to the check nodes and down from the check nodes to the variable nodes. A full description of the decoding algorithm is presented in [22] , so here only the main results are presented. The goal of the decoder is to compute the LLR of the i th code bit, Λ (o) (c i ), from which the systematic bits can be extracted and a hard decision made.
In the following, let q i,j represent the message passed from v-node i to c-node j and r j,i represent the message passed from c-node j to v-node i. Let C i = {j : h j,i = 1} denote the row locations of the 1's in the i th column of H and likewise R j = {j : h j,i = 1} the column locations of the 1's in the j th row. The set R j\i is equal to R j with i excluded.
Initially, the messages passed from the v-nodes to the c-nodes are set to the channel likelihood values, i.e. q i,j = Λ (i) (c i ). Then each c-node computes the messages that it sends to every v-node it is attached to according to:
where the nonlinear φ(x) function is defined as
Next, each v-node updates the LLR of its corresponding code bit according to
and then generates the output message sent to every c-node that it is connected to
A hard decision can be made by simply comparing Λ (o) (c i ) to a threshold. Since LDPC codes are linear block codes, they have a built in error detecting mechanism. When the estimated codewordĉ is a valid codeword, thenĉH T = 0 and thus the decoder can halt. IfĉH T = 0, then the decoder can iterate by having each c-node compute new messages to send to the v-nodes according to (21) . Once a maximum number of iterations is reached, the decoder will quit.
Simulation Results
In this section, simulation results are presented that illustrate the performance of the DVB-S2 LDPC code. Fig. 8 shows the frame error rate (FER) performance of the short frame size and Fig. 9 shows the FER performance of normal frame size. In each case, up to 100 iterations of the log-domain sumproduct algorithm [14] described in the last subsection are executed. Table 3 shows the E b /N o required to achieve a FER of 10 −3 for each rate and frame size. Because of the large size of the normal frame size code and the steepness of the corresponding FER curve, results could not be simulated all the way down to a FER 10 −3 for every code rate. Thus, extrapolated results are given for rates r = 1/3, 1/2, 2/3 and 5/6. Note that the results presented here are only for the LDPC code. The outer BCH code used by DVB-S2 helps to clean up additional errors at the output of the LDPC decoder and will improve the overall performance (mainly be reducing an error floor that begins to emerge below the shown error rates).
rate short normal 9/10 N/A 3.78 dB 8/9 3.78 dB 3.68 dB 5/6 3.06 dB 3.03* dB 4/5 2.72 dB 2.68 dB 3/4 2.33 dB 2.18 dB 2/3 1.95 dB 1.86* dB 3/5 1.59 dB 1.36 dB 1/2 0.93 dB 0.85* dB 2/5 0.55 dB 0.54 dB 1/3 0.37 dB 0.22* dB 1/4 0.25 dB 0.13 dB . Frame error rate performance of the n = 64, 800 bit (normal frame) LDPC code used in DVB-S2. The decoder uses 100 iterations of the log-domain sumproduct algorithm.
Putting It All Together
The turbo principle stands to revolutionize the delivery of digital content via satellite. The future of the DVB project hinges upon turbo-like coding techniques. The DVB-RCS standard, which uses a circular duobinary turbo code, provides a return channel for Internet services, thereby instantly making satellite a serious competitor to cable modems and DSL. The DVB-S2 standard, which uses LDPC codes, represents a significant improvement in the satellite downlink. However, for these technological improvements to be a complete success several hurdles remain. Turbo and LDPC codes are still more complex than their convolutional and Reed Solomon brethren, and therefore significant advances in implementation must still come to fruition. In addition, iteratively decodable codes are more sensitive to channel estimation and synchronization errors and therefore these issues must be dealt with carefully.
About the Simulations
The software to generate the plots in this chapter has been made available to the public at the http://www.iterativesolutions.com website. The software runs within matlab, but the key encoding and decoding functions are written in c for rapid execution and called as c-mex functions from matlab.
