Transactions Letters

I. INTRODUCTION

IN THIS LETTER, a novel decoding procedure for pragmatic trellis-coded modulation (TCM) schemes is introduced. It allows the use of an off-the-shelf Viterbi decoder, originally designed for a standard rate-1/2 64-state convolutional code over B/QPSK modulation. In a first stage, the Viterbi decoder estimates the coded bits based on transformed received symbols. The coded bits are interpreted as indexes of cosets of a signal subset in the constellation. To estimate the uncoded bits, a small look-up table (LUT) can be used in a second decoding stage, with addresses generated from hard decisions that are based on the location of the received symbols and the reencoded coset indexes.
The proposed trellis decoder for $M$-PSK modulation applies a rotational transformation to the received in-phase and quadrature channel symbols. The transformed channel symbols are then input to a conventional Viterbi decoder (VD) that operates in either BPSK or QPSK mode, depending upon the number of coded bits per symbol (CBPS). Let $\nu$ denote the number of CBPS. For TCM schemes with $\nu = 2$, the VD operates in QPSK mode, while for TCM schemes with $\nu = 1$, operation is in BPSK mode. For practical considerations, the signal constellations are assumed to be partitioned in such a way that the cosets are associated with the outputs of a standard rate-1/2 64-state convolutional encoder. This mapping leads to a so-called "pragmatic" trellis code [1].
The decoding procedure is similar to that presented in [2], [3], with the notable exception that a standard VD core is used in the first decoding stage without any modification in the branch metric computation stage. This leads to savings in the hardware resources needed to implement a two-stage trellis decoder. Also, the symbol transformation presented in this paper can be extended to trellis-coded QAM modulations [4], whereas previous results hold only for PSK constellations. In particular, the modulo phase operation in $M$-PSK modulation, introduced in this paper, is replaced by a linear modulo operation for two-stage decoding of pragmatic trellis-coded $M$-QAM modulation.
II. SYMBOL TRANSFORMATION OF TRELLIS-CODED $M$-PSK
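Let $(x, y)$ denote the $I$ and $Q$ coordinates of a received $M$-PSK symbol $\bar{r}$. Then the amplitude and phase of $\bar{r}$ are given by

$$r = \sqrt{x^2 + y^2} \quad \text{and} \quad \varphi = \tan^{-1}(y/x). \qquad (1)$$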
Based on the phase of a received symbol, a transformation is applied such that the $2^m$-PSK points are mapped into "coset" points labeled by the outputs of a rate-1/2 64-state convolutional encoder. For trellis-coded (TC) $2^m$-PSK modulation ($M = 2^m$), let $\nu$ denote the number of CBPS, where $\nu \in \{1, 2\}$. Then the following rotational transformation is applied to each received symbol $(x, y)$ to obtain an input symbol $(x', y')$ to the VD:

$$x' = r \cos\left[2^{m-\nu}(\varphi - \Phi)\right], \qquad y' = r \sin\left[2^{m-\nu}(\varphi - \Phi)\right], \qquad (2)$$

where $r$ and $\varphi$ are given by (1) and $\Phi$ is a constant phase rotation of the constellation. With this transformation, a $2^{m-\nu}$-PSK coset in the original $2^m$-PSK constellation "collapses" into a coset point in a $2^{\nu}$-PSK coset constellation in the $x'y'$-plane.
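The collapse of a coset onto a single coset point can be checked numerically. The following is a minimal sketch, assuming that the transformation keeps the amplitude and multiplies the rotated phase by the number of points per coset, $2^{m-\nu}$. The function names, the parameter `phi0` (standing for the constant phase rotation), and the placement of noiseless points at phases $2\pi k / 2^m$ are illustrative assumptions, not taken from the letter.

```python
import math

def rotational_transform(x, y, m, nu, phi0=0.0):
    """Map a received 2^m-PSK sample (x, y) to a Viterbi-decoder input (x', y').

    Assumption: the amplitude is kept and the rotated phase is multiplied
    by 2^(m - nu), the number of points per coset.
    """
    r = math.hypot(x, y)                     # amplitude
    phi = math.atan2(y, x)                   # four-quadrant phase
    phi_t = (2 ** (m - nu)) * (phi - phi0)   # transformed phase
    return r * math.cos(phi_t), r * math.sin(phi_t)

def psk_point(k, m):
    """Noiseless 2^m-PSK point with index k (assumed at phase 2*pi*k / 2^m)."""
    angle = 2.0 * math.pi * k / (2 ** m)
    return math.cos(angle), math.sin(angle)

# 8-PSK (m = 3) with two coded bits per symbol (nu = 2):
# points 1 and 5 are antipodal and land on the same coset point.
a = rotational_transform(*psk_point(1, 3), m=3, nu=2)
b = rotational_transform(*psk_point(5, 3), m=3, nu=2)

# 8-PSK with one coded bit per symbol (nu = 1):
# the four even-indexed points land on one BPSK point, the odd ones on the other.
even = [rotational_transform(*psk_point(k, 3), m=3, nu=1) for k in (0, 2, 4, 6)]
odd = [rotational_transform(*psk_point(k, 3), m=3, nu=1) for k in (1, 3, 5, 7)]
```

With `phi0 = 0` the coset points fall on the axes; a nonzero `phi0` would only rotate the resulting coset constellation.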
To estimate the uncoded bits, hard decisions on the received symbols are performed. The outputs are "sectors" that give information on the proximity of the received symbols to the PSK cosets. Each sector needs to be delayed by an amount equal to the decoding delay of the VD. This sector information, together with the estimated coded bits, is used to estimate the uncoded bits on a symbol-by-symbol basis. A block diagram of the two-stage decoding procedure is shown in Fig. 1.
A more general decoding scheme passes to the VD, at each decoding stage, a metric weighting the relative merit of each coset. In the case of a Gaussian channel, each coset metric is proportional to the square of the distance between the channel symbol and the nearest member of a coset. For each coset metric, the member of the coset nearest to the received channel symbol is recorded. Application of the Viterbi algorithm then produces a maximum-likelihood coset sequence estimate. Reencoding yields a coset selection sequence, which indexes the recorded list of coset members in order to estimate the uncoded information.
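The bookkeeping of this more general scheme can be sketched as follows. This is an illustration only: it assumes noiseless points at phases $2\pi k / 2^m$ and a hypothetical natural labeling in which the coset index of point $k$ is $k \bmod 2^{\nu}$, which need not coincide with the mappings used in the figures of this letter.

```python
import math

def coset_metrics(x, y, m, nu):
    """Return, for each coset, (squared distance to nearest member, member index)."""
    n_points, n_cosets = 2 ** m, 2 ** nu
    best = [(math.inf, -1)] * n_cosets
    for k in range(n_points):
        px = math.cos(2.0 * math.pi * k / n_points)
        py = math.sin(2.0 * math.pi * k / n_points)
        d2 = (x - px) ** 2 + (y - py) ** 2
        c = k % n_cosets  # assumed labeling: coset index = k mod 2^nu
        if d2 < best[c][0]:
            best[c] = (d2, k)  # keep the nearest member of coset c
    return best

def select_member(recorded, coset_index):
    """Second stage: the reencoded coset index picks the recorded nearest member."""
    return recorded[coset_index][1]

# Received sample close to the 8-PSK point with index 0.
metrics = coset_metrics(0.9, 0.1, m=3, nu=2)
```

The first element of each pair would be passed to the VD as the coset metric; the second is stored until the reencoded coset index is available.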
Note that any TC $M$-PSK system with 1 CBPS needs a bits-to-signal mapping such that the LSB alternates between neighboring signal points (as in Ungerboeck mapping). On the other hand, TC $2^m$-PSK modulation with 2 CBPS requires that the signal points be grouped into subsets, each of which is a symmetric $2^{m-2}$-PSK constellation. The two coded bits are then associated with a phase rotation of the subset in the $2^m$-PSK constellation. The uncoded bits can be labeled in an arbitrary manner, with relatively small impact on the overall error performance. (Thus, many mappings are possible.)
To illustrate the proposed two-stage trellis decoder, two examples are presented below.
Example 1: Rate-2/3 trellis-coded 8-PSK modulation with two coded bits per symbol.
Two information bits are encoded to produce three coded bits, which are mapped onto an 8-PSK signal point. One information bit is passed through the standard rate-1/2 64-state convolutional encoder, whose two outputs are produced by generators 171 and 133 (in octal); the other information bit is left uncoded. Each signal point is labeled by the uncoded bit together with the two encoder outputs, and the pair of encoder outputs is the index of a coset of a BPSK subset in the 8-PSK constellation, as shown at the top of Fig. 2.
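For reference, the rate-1/2 64-state encoder with generators 171 and 133 can be sketched in a few lines. The bit-ordering convention used here (the most significant generator bit taps the current input bit) and the function names are assumptions for illustration; the mapping of the outputs to 8-PSK labels in Fig. 2 is not reproduced.

```python
G1, G2 = 0o171, 0o133  # generator polynomials in octal, constraint length 7

def parity(v):
    """Parity (XOR of all bits) of a non-negative integer."""
    return bin(v).count("1") & 1

def conv_encode(bits):
    """Rate-1/2, 64-state encoder; returns a list of (out_171, out_133) pairs."""
    state = 0  # six memory bits, all zero at start
    out = []
    for b in bits:
        # Assumed convention: the newest input bit sits in the MSB of the register.
        reg = (b << 6) | state
        out.append((parity(reg & G1), parity(reg & G2)))
        state = reg >> 1  # shift: the newest bit becomes the top memory bit
    return out

# Impulse response: a single 1 followed by six 0s.
impulse = conv_encode([1, 0, 0, 0, 0, 0, 0])
```

Under this convention, the impulse response of each output reproduces the binary expansion of its generator (1111001 for 171 and 1011011 for 133).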
In this case, the phase of the received symbol is doubled, $\varphi' = 2(\varphi - \Phi)$, and, under the rotational transformation, a BPSK subset in the original 8-PSK constellation collapses to a coset point of the QPSK coset constellation in the $x'y'$-plane, as shown in Fig. 2. Note that both points of a given BPSK coset have the same value of $\varphi'$ (modulo $2\pi$). This is because their phases are given by $\varphi$ and $\varphi + \pi$. The output of the VD is an estimate of the coded information bit. In order to estimate the uncoded information bit, it is necessary to reencode the VD output to determine the most likely coset index. This index and the sector in which the received 8-PSK symbol lies can be used to decode the uncoded bit. For a given coset, each sector gives the point in the BPSK pair (indexed by the uncoded bit) that is closest to the received 8-PSK symbol. For example, for a given decoded coset and a received symbol lying within sector 3, the value of the uncoded bit can be read directly from Fig. 2.

Fig. 3 shows plots of bit error rate (BER) versus signal-to-noise ratio for the two-stage decoding procedure (TC8PSK_23_TSD) compared with maximum-likelihood decoding (or MLD, denoted by TC8PSK_23_SSD). With respect to MLD, the proposed two-stage decoder exhibits a small loss of less than 0.3 dB. This loss can be explained, with the aid of Fig. 4, as follows. Without loss of generality, consider the squared Euclidean distance from a point $P$ on the unit circle to a given coset. We refer to this distance as the coset distance. Let $\varphi$ denote the phase of $P$. The squared Euclidean distance between $P$ and the closest point in the coset is denoted by $d^2(\varphi)$. Let $d'^2(\varphi)$ denote the squared distance between the rotated point $P'$, after the transformation, and the corresponding coset point.
Both coset distances, $d^2(\varphi)$ before the transformation and $d'^2(\varphi)$ after it, are periodic functions of the phase $\varphi$ with period $\pi$ (or 180 degrees). The top part of Fig. 4 shows a plot of $d^2(\varphi)$ and $d'^2(\varphi)$ with respect to $\varphi$, for $-\pi/2 \le \varphi \le \pi/2$ (in the plot, from $-90$ degrees to 90 degrees). The suboptimality of the proposed decoding procedure comes from the fact that the metrics delivered to the VD are no longer optimal. That is, the ratio $d'^2(\varphi)/d^2(\varphi)$ is not a constant, as shown in the bottom part of Fig. 4. Therefore, depending on their phases, the transformed received points are weighted differently from the optimal metric, by a factor of $d'^2(\varphi)/d^2(\varphi)$. It follows that the output sequence estimated by the VD is not always associated with the most likely path.
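The phase dependence of the metric ratio can be illustrated with a short calculation. The sketch below assumes, purely for illustration, that the coset is the BPSK pair at phases 0 and $\pi$ and that the constant phase rotation is zero, so that the transformed point has phase $2\varphi$ and the coset point is $(1, 0)$. These choices and the function names are not taken from the letter.

```python
import math

def coset_distances(phi):
    """Squared coset distances for a unit-circle point at phase phi (radians).

    Assumptions: the coset is the BPSK pair at phases 0 and pi, and the
    constant rotation is zero, so the transformed point has phase 2*phi
    and the coset point sits at (1, 0).
    """
    d2 = 2.0 - 2.0 * abs(math.cos(phi))      # to the nearer of (+1, 0) and (-1, 0)
    d2_t = 2.0 - 2.0 * math.cos(2.0 * phi)   # transformed point to (1, 0)
    return d2, d2_t

def metric_ratio(phi):
    """Ratio of transformed to original squared coset distance (phi != 0 mod pi)."""
    d2, d2_t = coset_distances(phi)
    return d2_t / d2

ratio_at_90 = metric_ratio(math.pi / 2)  # point midway between the coset members
ratio_at_60 = metric_ratio(math.pi / 3)
```

Under these assumptions the ratio simplifies to $2(1 + |\cos\varphi|)$, which varies between 2 (at $\pm 90$ degrees) and values approaching 4 (near 0 degrees), so the transformed metric is not a constant multiple of the optimal one.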
Example 2: Rate-5/6 trellis-coded 8-PSK modulation with 1 coded bit per symbol.
In rate-5/6 TC 8-PSK modulation with 1 CBPS, the rate is increased by encoding in four dimensions. In this section, the European DVB-DSNG standard [6] is used as an example. The encoder of a TC 8-PSK modulation system with 2.5 bits/symbol is a conventional 64-state rate-1/2 convolutional encoder with outputs distributed over two 8-PSK symbols. There are a total of five information bits, one bit encoded and the remaining four bits uncoded, for every two output 8-PSK symbols. The five-bit information vector is encoded into two three-bit labels, which are mapped onto two 8-PSK signal points. One bit of each label is an output of the standard rate-1/2 64-state convolutional encoder (with generators 171 and 133, in octal), whose input is the coded information bit. Fig. 5 shows the mapping of bits to 8-PSK constellation points. In this case, the two convolutional encoder outputs are indexes of a pair of cosets of a QPSK subset in the 8-PSK constellation, over two symbol intervals. As a result, for the case of TC 8-PSK modulation with 1 CBPS, the coset constellation is a BPSK signal set.
Again, let $r$ and $\varphi$ denote the amplitude and phase of the received symbols. Then for TC 8-PSK modulation with 1 CBPS, the transformed phase is four times the received phase, $\varphi' = 4(\varphi - \Phi)$, where $\Phi$ is the constant phase rotation. The transformed $I$-channel and $Q$-channel symbols $x'$ and $y'$ are input to a VD that treats them as if BPSK modulation were employed. At each decoding stage, two BPSK symbols are passed to the VD.

Fig. 6 shows simulation results of the above two-stage decoding procedure (TC8PSK_56_TSD) and MLD (TC8PSK_56_SSD) for TC 8-PSK modulation with 2.5 bits per symbol. In this case, it is interesting to note that at the "quasi-error-free (QEF)" point, the performance of the two-stage decoder is practically the same as that of MLD. The reason behind this behavior is that the minimum squared Euclidean distance (MSED) between sequences associated with the uncoded bits is much smaller than the MSED between sequences associated with the coded bit. As a result, at medium to large values of the signal-to-noise ratio, the error performance is dominated by the uncoded bits. Since the uncoded bits select points in a QPSK signal set, the performance curve becomes parallel to that of uncoded QPSK. The difference between the curves equals the rate advantage of the TCM scheme (2.5 versus 2 bits per symbol), which translates into approximately 1 dB.
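The rate advantage quoted above can be checked with a one-line calculation. The sketch assumes that the error curves are plotted against the energy per information bit, so that two schemes with the same symbol energy and the same symbol error behavior are offset by the ratio of their bits per symbol; the function name is illustrative.

```python
import math

def rate_advantage_db(bits_a, bits_b):
    """Offset in dB between two schemes carrying bits_a and bits_b bits per symbol."""
    return 10.0 * math.log10(bits_a / bits_b)

# TCM scheme with 2.5 bits per symbol versus uncoded QPSK with 2 bits per symbol.
gain = rate_advantage_db(2.5, 2.0)
```

Here `gain` evaluates to roughly 0.97 dB.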
III. IMPLEMENTATION ISSUES
In terms of implementation complexity, the proposed two-stage decoder offers two advantages compared with existing similar decoders [7].
The first advantage is that there is no need to modify the branch metric computation stage. Existing trellis decoder architectures need to pull the branch metric computation (BMC) stage out of the Viterbi decoder. In that case, two different modes of operation (and signal paths) of the BMC are required, one for B/QPSK modulation and another for trellis-coded $M$-PSK modulation. The BMC for trellis decoding produces a set of branch metrics that are passed to the add-compare-select unit. In contrast, the proposed decoder uses a preprocessing stage, in the form of a rotational transformation of the incoming symbol, if the mode is TCM. This stage is bypassed if the mode is B/QPSK modulation. The outputs are pairs of modified channel symbols to be processed by the Viterbi decoder, which is not modified. Note that in both cases the sector information needs to be generated from the received channel symbols. The sector information, together with the reencoded VD output, is used to estimate the uncoded information bits in the second decoding stage.
The second advantage of the proposed decoder is that smaller memory elements are needed for symbol transformation and sector computation. Consider trellis-coded $M$-PSK modulation with one or two coded bits per symbol. Assume a fixed-point arithmetic implementation of the decoder with an LUT memory to compute branch metrics, transformed symbols, and sectors. Due to the symmetry of the $M$-PSK constellations, only a reduced number of bits is needed to address the LUT memory. In previous works, the LUT memory outputs the metrics for the VD, each with a fixed number of bits per metric, and the sector information used in the second stage. On the other hand, our decoder generates the sector information and pairs of transformed symbols with the same number of bits as the input symbols. For the 8-PSK example considered here, the number of memory bits needed per input symbol by the proposed two-stage decoder compares favorably with that of previous approaches: the new trellis decoder reduces the memory requirements by 40%. Although the complexity estimates depend on the specific implementation of the decoder, the above example shows the savings in hardware that are possible.
IV. FINAL REMARKS AND CONCLUSION
In this letter, a novel two-stage decoding procedure has been introduced for pragmatic trellis-coded $M$-PSK modulation, with one or two coded bits per symbol. In the first decoding stage, a rotational transformation applied to the incoming $I$-channel and $Q$-channel symbols allows the use of an off-the-shelf Viterbi decoder without any modifications. In a second decoding stage, reencoded coset indexes and sector information give estimates of the uncoded bits. Simulation results for TC 8-PSK with both one and two coded bits per symbol show that the two-stage decoder performs within 0.3 dB of maximum-likelihood decoding. Finally, we note that, without loss of generality, this approach can be extended to punctured codes [1], [5], where the process of de-puncturing, or erased symbol insertion, is a synchronized preprocessing step before the Viterbi decoder.
