A row-column parallel architecture of a turbo decoder dedicated to product codes is presented. This architecture enables the rows and the columns of a block to be decoded simultaneously. The performance of the proposed row-column parallel turbo decoder is similar to that of a conventional turbo decoder performing the successive decoding of rows and columns (two half-iterations).
Note that each iteration performs decoding operations on all the rows and all the columns of the matrix (product code). Moreover, the matrix must be reconstructed between successive decoding steps. The block diagram of an elementary soft-input soft-output (SISO) decoder is presented in Figure 1, where k denotes the current half-iteration.
R is the word received from the channel; W_k is the matrix of extrinsic information, that is, the additional information provided by the decoder about the reliability of each decoded symbol; R'_k is the soft input of the decoder, obtained by adding to R the extrinsic information weighted by the factor α_k; D_k is the result of symbol decoding; α_k and β_k are constants that depend on the current half-iteration.
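The decoding loop described above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the letter's implementation: it assumes the usual Chase-Pyndiah SISO rules (soft input R'_k = R + α_k W_k; the extrinsic value of symbol j comes from the distance gap between the decision D and the closest competing candidate, or is β_k·d_j when no competitor exists). The (8,4) extended Hamming component code, the use of its whole codebook as the candidate list instead of Chase test patterns, the names `siso` and `turbo_decode`, and the α/β schedules are all hypothetical. Transposing the matrix stands in for the reconstruction memory between row and column half-iterations.

```python
import numpy as np

# Hypothetical small component code: an (8,4) extended Hamming code with
# minimum distance 4 (the letter uses lengths 32, 64 and 128).
G = np.array([
    [1, 0, 0, 0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1, 1, 0, 1],
    [0, 0, 0, 1, 1, 1, 1, 0],
])
# All 16 codewords, BPSK-mapped (bit 0 -> +1, bit 1 -> -1).
INFO = np.array([[(i >> b) & 1 for b in range(4)] for i in range(16)])
CODEBOOK = 1.0 - 2.0 * ((INFO @ G) % 2)


def siso(R, W, alpha, beta):
    """Soft-in/soft-out decoding of one row or column.

    Assumption: the whole codebook is the candidate list (instead of Chase
    test patterns), so the beta branch is only reached with reduced lists.
    """
    R_in = R + alpha * W                        # R'_k = R + alpha_k * W_k
    dist = ((R_in - CODEBOOK) ** 2).sum(axis=1)  # squared Euclidean distances
    best = int(np.argmin(dist))
    D = CODEBOOK[best]                          # decision: closest candidate
    W_new = np.empty_like(R_in)
    for j in range(R_in.size):
        competitors = CODEBOOK[:, j] != D[j]    # candidates with d_j flipped
        if competitors.any():
            c = int(np.argmin(np.where(competitors, dist, np.inf)))
            W_new[j] = (dist[c] - dist[best]) / 4.0 * D[j] - R_in[j]
        else:
            W_new[j] = beta * D[j]              # no competitor: w_j = beta_k * d_j
    return D.copy(), W_new


def turbo_decode(R, alphas, betas):
    """Conventional schedule: even half-iterations decode all rows, odd ones all columns."""
    W = np.zeros_like(R)
    D = np.sign(R)
    for k, (alpha, beta) in enumerate(zip(alphas, betas)):
        # Column decoding = row decoding of the transposed matrix; transposing
        # the result back plays the role of the matrix reconstruction step.
        Rk, Wk = (R, W) if k % 2 == 0 else (R.T, W.T)
        Dk, Wn = np.empty_like(Rk), np.empty_like(Rk)
        for i in range(Rk.shape[0]):
            Dk[i], Wn[i] = siso(Rk[i], Wk[i], alpha, beta)
        D, W = (Dk, Wn) if k % 2 == 0 else (Dk.T, Wn.T)
    return D, W
```

Using the full codebook keeps the example exact but is only practical for tiny codes; a SISO decoder for lengths 32 to 128 would work from a Chase candidate list, where the β_k branch is actually exercised.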
Previous work:
In the last few years, many block turbo decoder architectures have been designed [4]. The conventional approach decodes all the rows or all the columns of a matrix before starting the next half-iteration. When an application requires high-speed decoders, an architectural solution is to cascade elementary SISO decoders, one per half-iteration, as shown in Figure 2. In this case, memory blocks are necessary between half-iterations to store the channel data and the extrinsic information. Each memory block is composed of four memories of qn^2 bits (n^2 symbols of q bits each), where q is the number of bits used to quantize the matrix symbols. Thus, duplicating an elementary SISO decoder also duplicates the memory block, which is very costly in terms of area. The latency can be defined as the number of symbols input to the turbo decoder before the first symbol has been fully processed. In the case of cascaded decoders, the architecture latency is equal to 2k(n^2 + n), where n is the code length and 2k is the number of half-iterations (k iterations). For one half-iteration, n^2 and n are the block memory latency and the row or column decoding latency, respectively.
Row-column parallel turbo decoder for product codes:
The proposed row-column parallel structure needs only one memory block per iteration, whereas two memory blocks per iteration are necessary for the conventional cascaded architecture of a block turbo decoder. This enables the memory complexity to be reduced by a factor of 8/3. Moreover, the proposed parallel architecture is also attractive from a latency point of view: its latency is equal to k(n^2 + n), half that of the cascaded architecture.
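The latency and memory figures quoted for the cascaded and row-column parallel decoders can be checked with a few lines of Python. This is a sketch assuming the formulas given in the text; the helper names are hypothetical, q = 5 is an arbitrary quantization width, and the 3qn^2 figure for the row-column parallel decoder is only what the stated 8/3 ratio implies, since the letter gives the ratio rather than a memory breakdown.

```python
def latency_symbols(n, iterations, row_column_parallel):
    """Latency in symbols for a decoder running `iterations` full iterations."""
    per_stage = n * n + n  # block memory latency n^2 + row/column decoding latency n
    # Cascaded conventional decoder: one stage per half-iteration (2k stages).
    # Row-column parallel decoder: rows and columns together, one stage per iteration (k stages).
    stages = iterations if row_column_parallel else 2 * iterations
    return stages * per_stage


def memory_bits_per_iteration(n, q, row_column_parallel):
    """Memory per iteration in bits; one memory block = four memories of q*n^2 bits."""
    if not row_column_parallel:
        return 2 * 4 * q * n * n  # two memory blocks per iteration
    # Hypothetical breakdown: 3*q*n^2 is simply what the stated 8/3 ratio implies.
    return 3 * q * n * n


for n in (32, 64, 128):
    cascaded = latency_symbols(n, 8, row_column_parallel=False)
    parallel = latency_symbols(n, 8, row_column_parallel=True)
    ratio = memory_bits_per_iteration(n, 5, False) / memory_bits_per_iteration(n, 5, True)
    print(f"n={n}: latency {cascaded} -> {parallel} symbols, memory ratio {ratio:.3f}")
```

Because both latencies share the per-stage cost n^2 + n, the factor-of-two saving comes entirely from halving the number of stages.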
Performance comparison: The bit error rate (BER) performance of block turbo codes using single-error-correcting extended BCH component codes after 8 iterations is given in Figure 4. We considered BPSK modulation over an AWGN channel. Simulation results are presented for three BCH component codes of length 32, 64 and 128 bits.
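For reference, the code parameters behind these simulations can be computed directly: a single-error-correcting extended BCH code of length n = 2^m has m + 1 parity bits and minimum distance 4, and the product code squares the length, dimension and distance of its component code. The sketch below also gives the AWGN noise standard deviation for a target Eb/N0, assuming unit-energy BPSK symbols; the function names are hypothetical and no simulated BER values are reproduced.

```python
import math


def ebch_sec_params(m):
    """(n, k, d) of the extended single-error-correcting BCH code of length 2**m."""
    n = 2 ** m
    # m parity bits from the BCH code plus one overall parity bit; d = 3 + 1.
    return n, n - 1 - m, 4


def product_params(m):
    """(N, K, D) of the product code built from two identical component codes."""
    n, k, d = ebch_sec_params(m)
    return n * n, k * k, d * d


def awgn_sigma(ebn0_db, rate):
    """Noise std dev for unit-energy BPSK: sigma^2 = 1 / (2 * rate * Eb/N0)."""
    return math.sqrt(1.0 / (2.0 * rate * 10 ** (ebn0_db / 10.0)))


for m in (5, 6, 7):
    N, K, D = product_params(m)
    print(f"n={2 ** m}: product code ({N},{K},{D}), rate {K / N:.3f}, "
          f"sigma at 3 dB {awgn_sigma(3.0, K / N):.3f}")
```

The code rate enters the noise variance because Eb/N0 is specified per information bit while the channel sees coded symbols.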
The performance of the proposed row-column parallel turbo decoding is similar to that of conventional turbo decoding.
Conclusion:
In this letter, a row-column parallel architecture of a turbo decoder dedicated to product codes has been presented. Unlike the conventional block turbo code (BTC) decoding approach, this architecture uses the more reliable extrinsic information at each step. It halves the decoding latency and removes the memory needed for block reconstruction between row decoding and column decoding. The proposal is therefore attractive in terms of both complexity and decoding delay. Simulation results show no performance degradation for the row-column parallel approach.
