5,193 research outputs found

    An FPGA Design of High-Speed Adaptive Turbo Decoder for Broadband Wireless Communications

    This thesis proposes an adaptive turbo decoding algorithm for high-order modulation schemes, combined with an original design for a standard rate-1/2 turbo decoder for B/QPSK modulation. A transformation applied to the incoming I-channel and Q-channel symbols allows the use of an off-the-shelf B/QPSK turbo decoder without any modifications. The adaptive turbo decoder processes the received symbols recursively to improve performance. As the number of iterations increases, the execution time and power consumption also increase. To reduce latency and power consumption, this thesis combines radix-4, dual-path processing, parallel decoding, and early-stop algorithms. The proposed scheme was implemented on a field-programmable gate array (FPGA), and its decoding speed was compared with that of a conventional decoder. The implementation results show that the proposed adaptive decoder is 6.4 times faster than the conventional scheme under the following conditions: N=212, 8 states, 3 iterations, and 8PSK modulation.

    Chapter I. Introduction
    Chapter II. Adaptive Turbo Decoding Algorithm: 2.1 Mapping of bits to signal; 2.2 Coset Symbol Transformer (CST); 2.3 Phase Sector Quantizer (PSQ); 2.4 Simulation Results
    Chapter III. High Speed Turbo Decoder Algorithm: 3.1 Radix-4 Algorithm; 3.2 Dual-Path Processing Algorithm; 3.3 Parallel Decoding Algorithm; 3.4 Early Stop Algorithm; 3.5 Simulation Results
    Chapter IV. Design of the Adaptive High-Speed Turbo Decoder: 4.1 The Adaptive High-Speed Turbo Decoder Structure; 4.2 The Optimum Quantized Bits of the Adaptive Turbo Decoder; 4.3 FPGA Implementation
    Chapter V. Conclusion
    References

    Beyond Gbps Turbo Decoder on Multi-Core CPUs

    This paper presents a high-throughput implementation of a portable software turbo decoder. The code is optimized for traditional multi-core CPUs (such as x86) and is based on the Enhanced max-log-MAP turbo decoding variant. The code follows the LTE-Advanced specification. The high performance comes from an inter-frame SIMD strategy combined with a fixed-point representation. Our results show that the proposed multi-core CPU implementation of turbo decoders is a competitive alternative to GPU implementations in terms of throughput and energy efficiency. On a high-end processor, our software turbo decoder exceeds 1 Gbps information throughput for all rate-1/3 LTE codes with K < 4096.

    High speed low complexity radix-16 Max-Log-MAP SISO decoder

    At present, the main challenge for hardware implementations of turbo decoders is to achieve the high data rates required by current and future communication system standards. To address this challenge, a low-complexity radix-16 SISO decoder for the Max-Log-MAP algorithm is proposed in this paper. Based on the elimination of parallel paths in the radix-16 trellis diagram, architectural solutions to reduce the hardware complexity of the different blocks of a SISO decoder are detailed. Moreover, two complementary techniques are introduced in order to overcome BER/FER performance degradation when turbo decoders based on the proposed SISO decoder are considered. Thus, a penalty lower than 0.05 dB is observed for an 8-state binary turbo code with respect to a traditional radix-2 turbo decoder for 6 decoding iterations.

    A New Simplified Algorithm Suitable for Implementation on FPGA for Turbo Codes

    In this thesis, a new algorithm for turbo codes and a novel implementation of a turbo decoder employing this algorithm are developed. The decoder has optimal performance in terms of Bit Error Rate (BER) at all Signal-to-Noise Ratios (SNR), for all frame sizes and any number of states of the turbo code. In the hardware implementation, we combine the normalization and matrices modules into a single module in order to minimize the internal connection delay, which is the bottleneck in hardware implementation, so that the result can be obtained in a single clock cycle. Implemented in this fashion, a data rate of 28 Mbps has been achieved for a 16-state decoder. This can be further improved by changing the algorithm for the normalization modules and LLR modules to use the MAX operator. With the matrices modules using the proposed algorithm and the normalization and LLR modules using the MAX-LOG-MAP algorithm, a data rate of 60 Mbps has been achieved.

    Implementation of a 3GPP LTE Turbo Decoder Accelerator on GPU

    This paper presents a 3GPP LTE compliant turbo decoder accelerator on a GPU. The challenge of implementing a turbo decoder is finding an efficient mapping of the decoder algorithm onto the GPU, e.g. finding a good way to parallelize the workload across cores and to allocate and use fast on-die memory to improve throughput. In our implementation, we increase throughput by 1) distributing the decoding workload for a codeword across multiple cores, 2) decoding multiple codewords simultaneously to increase concurrency, and 3) employing memory optimization techniques to reduce memory bandwidth requirements. In addition, we analyze how different MAP algorithm approximations affect both the throughput and the bit error rate (BER) performance of this decoder.
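
    As a rough illustration of what a "MAP algorithm approximation" trades away, the sketch below compares the exact Jacobian logarithm used in log-MAP decoding with the max-log-MAP shortcut that drops the correction term. It is a minimal sketch, not the authors' GPU code; the function names `max_star` and `max_log` are hypothetical.

```python
import math


def max_star(a: float, b: float) -> float:
    """Exact Jacobian logarithm: log(exp(a) + exp(b)), as used in log-MAP."""
    # log1p keeps the correction term accurate when |a - b| is large.
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_log(a: float, b: float) -> float:
    """Max-log-MAP approximation: the correction term is simply dropped."""
    return max(a, b)


if __name__ == "__main__":
    # The approximation error is largest (log 2) when the two metrics are equal
    # and shrinks as they move apart.
    for a, b in [(0.0, 0.0), (1.0, 3.0), (-2.0, 5.0)]:
        print(a, b, max_star(a, b) - max_log(a, b))
```

    The dropped term needs an exponential and a logarithm (or a lookup table) per trellis branch comparison, which is the usual reason the cheaper variant is faster at a small cost in BER.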

    On Maximum Contention-Free Interleavers and Permutation Polynomials over Integer Rings

    An interleaver is a critical component for the channel coding performance of turbo codes. Algebraic constructions are of particular interest because they admit analytical designs and simple, practical hardware implementation. Contention-free interleavers have been recently shown to be suitable for parallel decoding of turbo codes. In this correspondence, it is shown that permutation polynomials generate maximum contention-free interleavers, i.e., every factor of the interleaver length becomes a possible degree of parallel processing of the decoder. Further, it is shown by computer simulations that turbo codes using these interleavers perform very well for the 3rd Generation Partnership Project (3GPP) standard.
    Comment: 13 pages, 2 figures, submitted as a correspondence to the IEEE Transactions on Information Theory, revised version
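
    The contention-free property can be checked numerically. The sketch below builds a quadratic permutation polynomial (QPP) interleaver and tests, for every window size W that divides the length K, that the K/W parallel processors never address the same memory bank in the same step. The parameters K=40, f1=3, f2=10 are an assumption taken from the LTE interleaver table in 3GPP TS 36.212, not from the abstract, and the function names are hypothetical.

```python
def qpp_interleaver(K, f1, f2):
    """Return pi(x) = (f1*x + f2*x^2) mod K for x = 0..K-1."""
    return [(f1 * x + f2 * x * x) % K for x in range(K)]


def is_contention_free(pi, W):
    """Check the contention-free condition for window size W.

    With M = K/W processors, processor t reads position j + t*W at step j.
    The interleaved (and deinterleaved) addresses must fall in M distinct
    banks of size W at every step.
    """
    K = len(pi)
    if K % W != 0:
        return False
    M = K // W
    inverse = [0] * K
    for i, p in enumerate(pi):
        inverse[p] = i
    for perm in (pi, inverse):
        for j in range(W):
            banks = {perm[j + t * W] // W for t in range(M)}
            if len(banks) != M:
                return False
    return True


if __name__ == "__main__":
    K = 40
    pi = qpp_interleaver(K, 3, 10)
    assert sorted(pi) == list(range(K))  # pi is a permutation
    windows = [W for W in range(1, K + 1) if K % W == 0]
    print({W: is_contention_free(pi, W) for W in windows})
```

    Checking every divisor of K mirrors the "maximum" in the abstract's claim: each factor of the interleaver length is a usable degree of parallelism.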

    Block turbo codes : towards implementation

    This paper presents two implementations of the same block turbo decoding algorithm: in the first, an elementary decoder in association with a sequencer performs the complete turbo decoding process; in the second, the circuit contains one elementary decoder per half-iteration. The choice of parameters for each implementation brings the results more or less close to the theoretical limit. We briefly describe the iterative process which creates the "turbo" effect and explain the essential choices made in order to adapt the algorithm to an ASIC implementation.

    1.5 Gbit/s FPGA implementation of a fully-parallel turbo decoder designed for mission-critical machine-type communication applications

    In wireless communication schemes, turbo codes facilitate near-capacity transmission throughputs by achieving reliable forward error correction. However, owing to the serial data dependencies imposed by the underlying Logarithmic Bahl-Cocke-Jelinek-Raviv (Log-BCJR) algorithm, the limited processing throughputs of conventional turbo decoder implementations impose a severe bottleneck upon the overall throughputs of real-time wireless communication schemes. Motivated by this, we recently proposed a Fully Parallel Turbo Decoder (FPTD) algorithm, which eliminates these serial data dependencies, allowing parallel processing and hence offering a significantly higher processing throughput. In this paper, we propose a novel resource-efficient version of the FPTD algorithm, which reduces its computational resource requirement by 50%, thereby enhancing its suitability for Field-Programmable Gate Array (FPGA) implementations. We propose a model FPGA implementation. When using a Stratix IV FPGA, the proposed FPTD FPGA implementation achieves an average throughput of 1.53 Gbit/s and an average latency of 0.56 µs when decoding frames comprising N=720 bits. These are respectively 13.2 times and 11.1 times superior to those of the state-of-the-art FPGA implementation of the Log-BCJR Long-Term Evolution (LTE) turbo decoder, when decoding frames of the same length at the same error correction capability. Furthermore, our proposed FPTD FPGA implementation achieves a normalized resource usage of 0.42 kALUTs/(Mbit/s), which is 5.2 times superior to that of the benchmark decoder. When decoding the shortest N=40-bit LTE frames, the proposed FPTD FPGA implementation achieves an average throughput of 442 Mbit/s and an average latency of 0.18 µs, which are respectively 21.1 times and 10.6 times superior to those of the benchmark decoder. In this case, the normalized resource usage of 0.08 kALUTs/(Mbit/s) is 146.4 times superior to that of the benchmark decoder.

    FPGA implementation of LTE turbo decoder using MAX-log MAP algorithm

    © 2017 IEEE. Implementing an efficient turbo decoder with low complexity, short delay and insignificant performance degradation is currently a challenging task. The paper presents an implementation of a 3GPP TS 36.212 LTE turbo decoder. The design has been optimized to achieve efficient FPGA resource utilization. It can be useful for applications in which resource utilization is critical but high throughput is not required.