ABSTRACT
INTRODUCTION
Signal Space Diversity (SSD) [1] [2] doubles the diversity order of conventional BICM schemes and significantly improves performance over fading channels, especially for high code rate systems [3]. When severe fading occurs, BICM with SSD [3] avoids the simultaneous fading of the I and Q components, leading to significant gains. This scheme has been adopted by the second generation terrestrial digital video broadcasting standard (DVB-T2). To achieve an additional performance improvement, iterations between the decoder output and the demapper (BICM-ID) can be introduced. BICM-ID with an outer LDPC code was investigated for different DVB-T2 transmission scenarios in [3]. It was shown that iterative processing combined with SSD can provide additional error correction capability, reaching more than 1.0 dB over some types of channels. Thanks to these advantages, BICM-ID has been recommended in the DVB-T2 implementation guidelines [4] as a candidate solution to improve the performance at the receiver. However, designing a low-complexity, high-throughput iterative receiver remains a challenging task. One critical problem is the latency introduced by the additional iterative loop. Therefore, a more efficient method of exchanging information between the demapper and the decoder is needed. Another critical problem is the computational complexity of both the rotated QAM demapper and the LDPC decoder. In [5], a flexible demapper architecture for DVB-T2 is presented, in which complexity is lowered by decomposing the rotated constellation into two-dimensional sub-regions in signal space. In [6], a novel reduced-complexity LDPC decoder architecture based on the vertical layered schedule [7] and the normalized Min-Sum (MS) algorithm is detailed. It closely approaches the full-complexity belief propagation (BP) performance reported in the implementation guidelines of the DVB-T2 standard.
An additional critical problem is dealing with memory conflicts caused by double-diagonal, triple-diagonal or multiple-diagonal sub-matrices within the parallel decoding units. These conflicts can introduce a performance loss when they are not properly resolved [8]. Since the vertical layered schedule enables parallel LDPC decoding, it can be extended to the demapping process. Architectures that continuously exchange information between the Soft-Input Soft-Output (SISO) decoder on one hand and the demapper on the other hand are attractive for implementation. In this context, the processing of one frame can be decomposed into multiple smaller parallel sub-frame processes, each having a length equal to the parallelism level. While having a computational complexity comparable to that of the standard iterative schedule, a receiver with a shuffled iterative schedule enjoys a lower latency. However, such parallel processing requires good matching between the demapping and the decoding processors in order to guarantee a high-throughput pipelined architecture. This calls for efficient message passing between both sides. In this paper, different scheduling solutions are investigated for the DVB-T2 standard.
The remainder of the paper is organized as follows. In Section 2, the basic principles of BICM-ID and SSD are briefly recalled, and the detection principle for rotated constellations and vertical layered decoding using a normalized MS algorithm are introduced. In Section 3, hardware-oriented iterative processing for the BICM receiver is detailed. Finally, simulation results validate the potential of the proposed iterative receiver for the DVB-T2 standard.
The SSD introduces two modifications to the classical BICM system, which are shown in Fig. 1.
First, the classical QAM constellation is rotated by a fixed angle. Second, the Q component of the rotated constellation is delayed by d symbol periods, so that the delayed Q component and the current I component form a new complex symbol [3]. The in-phase and quadrature components of each classical QAM symbol are therefore transmitted in two different rotated QAM symbols, which doubles the diversity order of the BICM scheme. When severe fading occurs, one of the components is erased, and the corresponding LLRs can be recovered from the remaining component.
At the transmitter side, the message u is encoded into the codeword c. This codeword is then interleaved by the bit interleaver and becomes the input sequence v of the mapper. At each symbol time t, m consecutive bits of the interleaved sequence v are mapped onto a complex symbol $x_t$. At the receiver side, the demapper computes two-dimensional squared Euclidean distances to obtain the LLR $\hat{v}_t^i$ of the i-th bit of symbol $v_t$. These demapped LLRs are then deinterleaved and used as inputs of the decoder. Finally, the decoder generates extrinsic information, which is fed back to the demapper for additional iterations.
In the LDPC decoding equations, $llr_n$ denotes the intrinsic channel reliability value of bit node n, $E_{mn}$ the message sent from check node m to bit node n, $T_{mn}$ the message sent from bit node n to check node m, and $T_n$ the a posteriori information of bit node n.
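The two SSD operations described above (constellation rotation, then Q-component delay) can be sketched in Python as follows. This is a minimal illustration, not the DVB-T2 modulator: it assumes Gray-mapped QPSK, a cyclic Q delay inside one block of symbols, and an illustrative rotation angle; the function names are hypothetical.

```python
import cmath
import math


def qpsk_map(bits):
    """Gray-mapped QPSK with unit energy: bit 0 sets the I sign, bit 1 the Q sign."""
    assert len(bits) % 2 == 0
    s = 1 / math.sqrt(2)
    return [complex(s * (1 - 2 * bits[i]), s * (1 - 2 * bits[i + 1]))
            for i in range(0, len(bits), 2)]


def rotate(symbols, angle_deg):
    """Rotate every constellation point by the same fixed angle."""
    r = cmath.exp(1j * math.radians(angle_deg))
    return [r * z for z in symbols]


def delay_q(symbols, d):
    """Cyclically delay the Q component by d symbol periods:
    x_t = Re(z_t) + j * Im(z_{t-d}) (cyclic within the block, an assumption)."""
    n = len(symbols)
    return [complex(symbols[t].real, symbols[(t - d) % n].imag) for t in range(n)]


def ssd_modulate(bits, angle_deg=29.0, d=1):
    """Map, rotate, then delay Q; angle_deg and d are illustrative parameters."""
    return delay_q(rotate(qpsk_map(bits), angle_deg), d)


bits = [0, 0, 1, 0, 1, 1, 0, 1]
x = ssd_modulate(bits)
```

Because of the delay, a deep fade on one transmitted symbol erases only one component of each of two rotated symbols, and the receiver has to re-align the I and Q components before demapping.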
Demapping algorithm for iterative receiver

ITERATIVE RECEIVER FOR DVB-T2
Hardware-oriented iterative algorithm
To reduce the computational complexity of (1), a sub-region selection algorithm was proposed in [5] to avoid an exhaustive search over the constellation points. However, when iterative processing is considered, the sub-region selection algorithm becomes sub-optimal. In fact, once the extrinsic information fed back by the decoder is taken into account, the region selected by the algorithm may no longer contain the constellation point with the minimum metric. Therefore, the Max-Log approximation is the only reduced-complexity demapping approach applied in this case.
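To illustrate why the a priori information changes the search, the sketch below computes Max-Log extrinsic LLRs for one rotated QPSK symbol with independent fading on the I and Q components. It uses the standard Max-Log bit metric, which we assume is close to (1) but may differ from it in scaling or sign convention; the helper names, the LLR convention L = log P(b=0)/P(b=1), and the rotation angle are assumptions.

```python
import cmath
import math


def rotated_qpsk(angle_deg):
    """Hypothetical helper: Gray-mapped QPSK points rotated by angle_deg,
    keyed by bit pair (b0, b1)."""
    r = cmath.exp(1j * math.radians(angle_deg))
    s = 1 / math.sqrt(2)
    return {(b0, b1): r * complex(s * (1 - 2 * b0), s * (1 - 2 * b1))
            for b0 in (0, 1) for b1 in (0, 1)}


def demap_maxlog(y_i, y_q, h_i, h_q, sigma2, points, prior=None):
    """Max-Log extrinsic LLRs, convention L = log P(b=0) / P(b=1).

    y_i, y_q: received I and Q components of one rotated symbol (Q delay undone),
    h_i, h_q: real fading gains, sigma2: noise variance per real dimension,
    prior: a priori LLRs fed back by the decoder (zeros on the first pass).
    """
    m = len(next(iter(points)))
    prior = prior if prior is not None else [0.0] * m
    llrs = []
    for i in range(m):
        best = {0: -math.inf, 1: -math.inf}
        for bits, s in points.items():
            # Two-dimensional squared Euclidean distance with per-component fading
            dist = ((y_i - h_i * s.real) ** 2 + (y_q - h_q * s.imag) ** 2) / (2 * sigma2)
            # A priori term from the other bits only, so the output is extrinsic
            apriori = -sum(prior[j] for j in range(m) if j != i and bits[j] == 1)
            best[bits[i]] = max(best[bits[i]], -dist + apriori)
        llrs.append(best[0] - best[1])
    return llrs


pts = rotated_qpsk(29.0)
tx = pts[(0, 1)]
# Q component completely erased (h_q = 0): both bits are still recovered from I
llr = demap_maxlog(tx.real, 0.0, 1.0, 0.0, 0.1, pts)
```

With a rotation angle of 0 and the same erasure, the LLR of the second bit is exactly zero, which is the situation SSD is designed to avoid. Since the prior term enters every candidate metric, the minimizing point can move outside a region selected from the channel observation alone.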
On the LDPC side, a Min-Sum decoder based on the vertical shuffled schedule (VSSMS) was proposed in [6]. It introduces a penalty of only 0.1 to 0.2 dB with respect to VSS BP while greatly reducing complexity. However, in the context of BICM-ID, VSSMS introduces an additional penalty and reduces the expected performance gain, so a more accurate decoding algorithm is required. The Min-Sum-3 variant, VSSMS3, seems to provide the required precision with the lowest impact on complexity. The difference between VSSMS and VSSMS3 is that the third minimum value is also updated and stored, which makes the processing of each check node m ($1 \le m \le M$) more accurate.
The crucial problem in implementing a frame-by-frame schedule in the iterative receiver is the latency introduced by the block interleaver and block de-interleaver. To overcome this problem, two new solutions are proposed. The first replaces the classical RAM-based block interleaver and de-interleaver by a Look-Up Table (LUT) that memorizes the connections between the demapper output and the decoder input. The second replaces the classical layered (HSS) LDPC decoding by VSS decoding. In this way, both the decoded and the demapped extrinsic information can be exchanged without waiting for the processing of the complete frame.
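One plausible way to see why storing a third minimum helps in a shuffled decoder is sketched below; this is our assumption, since the text only states that the third minimum is updated and saved. Each check node keeps only its k smallest incoming magnitudes $|T_{mn}|$. When a VSS bit-node update replaces the message that held a stored minimum by a larger one, a value previously dropped from the list cannot be recovered. The class name and the normalization factor alpha = 0.75 are hypothetical.

```python
import math


class CheckNodeState:
    """Compressed check-node memory for a normalized Min-Sum decoder (sketch).

    Keeps the k smallest |T_mn| with their edge indices, plus all signs.
    """

    def __init__(self, messages, k=2, alpha=0.75):
        # messages: dict edge index n -> bit-to-check message T_mn
        self.k = k
        self.alpha = alpha  # hypothetical normalization factor
        self.signs = {n: (1 if t >= 0 else -1) for n, t in messages.items()}
        self.mins = sorted((abs(t), n) for n, t in messages.items())[:k]

    def update(self, n, t_new):
        """Replace the message on edge n, as a VSS bit-node update does."""
        self.signs[n] = 1 if t_new >= 0 else -1
        self.mins = [(a, i) for a, i in self.mins if i != n]
        self.mins = sorted(self.mins + [(abs(t_new), n)])[: self.k]

    def message_to(self, n):
        """E_mn: normalized minimum magnitude over the other edges, with sign product."""
        sign = 1
        for i, s in self.signs.items():
            if i != n:
                sign *= s
        others = [a for a, i in self.mins if i != n]
        mag = others[0] if others else math.inf
        return self.alpha * sign * mag


msgs = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
cn2, cn3 = CheckNodeState(msgs, k=2), CheckNodeState(msgs, k=3)
for cn in (cn2, cn3):
    cn.update(0, 5.0)  # the edge holding the smallest magnitude becomes weak
print(cn2.message_to(1), cn3.message_to(1))  # 3.75 2.25
```

The exact normalized minimum for edge 1 after the update is 0.75 × min(3, 4, 5) = 2.25: the two-minimum state overestimates it, whereas the three-minimum state still returns the correct value.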
Shuffled demapping and decoding algorithm
The shuffled demapping and decoding algorithm is detailed as follows (see also Fig. 2).
The demappers compute (4) for the corresponding bits. For these updated bits, the decoding processors then compute equations (5)-(11). The next group of Q bits is then considered. The advantage of such a schedule is a lower decoding latency, since it decreases the number of required iterations, and/or a better BER performance. Several message passing schedules are possible. They correspond to the possible combinations of the LDPC decoder parallelism and the way messages are passed between the LDPC decoder and the demapper. Three cases of interest for implementation are listed in Table 1. Schedule A is based on the demapper: the LDPC decoder works serially with the VSS schedule. Each symbol leads to the update of $\log_2(M)$ variable bits, and then all the resulting extrinsic information is fed back to the originating symbol. Schedules B and C are based on a VSS LDPC decoder with a parallelism of 90. Thus, 90 variable bits are updated at once and generate 90 extrinsic messages, which are fed back to at most 90 demapper symbols. If all bits originate from different symbols, 90 demappers work in parallel, using the extrinsic information to update their LLRs. The difference between Schedule B and Schedule C is the number of LLRs that are updated at the demapper during the iterative processing. Schedule C is the preferred schedule for hardware implementation.
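To make the decoder-based schedules B and C more concrete, the sketch below assumes that the interleaver is stored as a LUT mapping each decoder bit to a (symbol, bit position) pair. For each group of Q = 90 decoder bits, it lists which demapper symbols receive fresh extrinsic information. The random permutation only stands in for the actual DVB-T2 bit interleaver, and the function names are hypothetical.

```python
import random


def build_lut(n_bits, bits_per_symbol, seed=0):
    """Hypothetical LUT: decoder bit index -> (symbol index, bit position).
    A seeded random permutation stands in for the real bit interleaver."""
    rng = random.Random(seed)
    perm = list(range(n_bits))
    rng.shuffle(perm)
    # perm[k] is the interleaved position of codeword bit k
    return [(p // bits_per_symbol, p % bits_per_symbol) for p in perm]


def schedule_groups(lut, q=90):
    """For each group of q decoder bits updated in parallel, return the
    sorted list of demapper symbols that receive new extrinsic information."""
    for start in range(0, len(lut), q):
        group = lut[start:start + q]
        yield sorted({sym for sym, _ in group})


# 16,200-bit frame with 16QAM (4 bits per symbol): 4050 symbols, 180 groups
lut = build_lut(16200, 4)
groups = list(schedule_groups(lut, 90))
busy_demappers = [len(g) for g in groups]
```

Each entry of `busy_demappers` is at most 90; it reaches 90 only when all bits of the group come from different symbols, which is the case where 90 demappers work fully in parallel.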
EXPERIMENTAL RESULTS
Simulations were carried out for all the schedules. Fig. 3 and Fig. 4 present two comparisons of the simulated BER performance of the different decoding schemes for QPSK and 16QAM, with a code rate R = 4/5, 16,200-bit frames and a maximum of 50 iterations. The channel model used to emulate the effect of erasure events is a modified version of the classical Rayleigh fading channel; more information about this model is given in [5]. The iterative floating-point VSSBP receiver provides a performance improvement of around 1.0 dB at a BER of $10^{-6}$ compared with the non-iterative receiver. The gain increases to 2.0 dB when a 16QAM constellation is used. In the BICM-ID context, the penalty of VSSMS3 with respect to VSSBP is reduced. In other words, iterative processing between the decoder and the demapper seems to reduce the penalty of suboptimal LDPC decoding. In both cases, fixed-point algorithms suffer from a small performance loss compared with floating-point algorithms; however, this penalty is once again smaller for the iterative receiver.
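As a rough reading of these figures (simple arithmetic on the reported values, not an additional result), a gain of G dB means the same BER is reached at an SNR lower by a factor $10^{-G/10}$. The frame sizes also fix the number of 90-bit groups per iteration, assuming each variable bit is processed once per iteration:

```latex
1 - 10^{-1.0/10} \approx 1 - 0.794 = 0.206, \qquad 1 - 10^{-2.0/10} \approx 1 - 0.631 = 0.369,
\qquad \frac{16200}{2} = 8100 \ \text{QPSK symbols}, \quad \frac{16200}{4} = 4050 \ \text{16QAM symbols}, \quad \frac{16200}{90} = 180 \ \text{groups}
```

That is, the required SNR is about 21% lower with QPSK and about 37% lower with 16QAM.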
CONCLUSION
In this paper, we have investigated possible schedules for the BICM iterative receiver. A schedule defines the order in which messages are passed between the demapper and the decoder. Our objective is to ensure a good match between the reception algorithms on one hand and the iterative receiver architecture on the other hand. Hardware-oriented simulated BER performance was given for two reception schemes over a fading channel with erasures. These results validate the potential of an iterative receiver as a practical and competitive solution for the DVB-T2 standard.
An FPGA prototype for measuring the performance of the proposed iterative receiver is currently being integrated.
ACKNOWLEDGMENT
This work has been carried out in the framework of the SME42 project of the EUREKA's Eurostars programme.
