24 research outputs found

    A survey of FPGA-based LDPC decoders

    No full text
    Low-Density Parity Check (LDPC) error correction decoders have become popular in communications systems, as a benefit of their strong error correction performance and their suitability to parallel hardware implementation. A great deal of research effort has been invested into LDPC decoder designs that exploit the flexibility, the high processing speed and the parallelism of Field-Programmable Gate Array (FPGA) devices. FPGAs are ideal for design prototyping and for the manufacturing of small-production-run devices, where their in-system programmability makes them far more cost-effective than Application-Specific Integrated Circuits (ASICs). However, the FPGA-based LDPC decoder designs published in the open literature vary greatly in terms of design choices and performance criteria, making them a challenge to compare. This paper explores the key factors involved in FPGA-based LDPC decoder design and presents an extensive review of the current literature. In-depth comparisons are drawn amongst 140 published designs (both academic and industrial) and the associated performance trade-offs are characterised, discussed and illustrated. Seven key performance characteristics are described, namely their processing throughput, latency, hardware resource requirements, error correction capability, processing energy efficiency, bandwidth efficiency and flexibility. We offer recommendations that will facilitate fairer comparisons of future designs, as well as opportunities for improving the design of FPGA-based LDPC decoder

    High throughput low power decoder architectures for low density parity check codes

    Get PDF
    A high throughput scalable decoder architecture, a tiling approach to reduce the complexity of the scalable architecture, and two low power decoding schemes have been proposed in this research. The proposed scalable design is generated from a serial architecture by scaling the combinational logic; memory partitioning and constructing a novel H matrix to make parallelization possible. The scalable architecture achieves a high throughput for higher values of the parallelization factor M. The switch logic used to route the bit nodes to the appropriate checks is an important constituent of the scalable architecture and its complexity is high with higher M. The proposed tiling approach is applied to the scalable architecture to simplify the switch logic and reduce gate complexity. The tiling approach generates patterns that are used to construct the H matrix by repeating a fixed number of those generated patterns. The advantages of the proposed approach are two-fold. First, the information stored about the H matrix is reduced by onethird. Second, the switch logic of the scalable architecture is simplified. The H matrix information is also embedded in the switch and no external memory is needed to store the H matrix. Scalable architecture and tiling approach are proposed at the architectural level of the LDPC decoder. We propose two low power decoding schemes that take advantage of the distribution of errors in the received packets. Both schemes use a hard iteration after a fixed number of soft iterations. The dynamic scheme performs X soft iterations, then a parity checker cHT that computes the number of parity checks in error. Based on cHT value, the decoder decides on performing either soft iterations or a hard iteration. The advantage of the hard iteration is so significant that the second low power scheme performs a fixed number of iterations followed by a hard iteration. To compensate the bit error rate performance, the number of soft iterations in this case is higher than that of those performed before cHT in the first scheme

    High Performance Decoder Architectures for Error Correction Codes

    Get PDF
    Due to the rapid development of the information industry, modern communication and storage systems require much higher data rates and reliability to server various demanding applications. However, these systems suffer from noises from the practical channels. Various error correction codes (ECCs), such as Reed-Solomon (RS) codes, convolutional codes, turbo codes, Low-Density Parity-Check (LDPC) codes and so on, have been adopted in lots of current standards. With the increasing data rate, the research of more advanced ECCs and the corresponding efficient decoders will never stop.Binary LDPC codes have been adopted in lots of modern communication and storage applications due their superior error performance and efficient hardware decoder implementations. Non-binary LDPC (NB-LDPC) codes are an important extension of traditional binary LDPC codes. Compared with its binary counterpart, NB-LDPC codes show better error performance under short to moderate block lengths and higher order modulations. Moreover, NB-LDPC codes have lower error floor than binary LDPC codes. In spite of the excellent error performance, it is hard for current communication and storage systems to adopt NB-LDPC codes due to complex decoding algorithms and decoder architectures. In terms of hardware implementation, current NB-LDPC decoders need much larger area and achieve much lower data throughput.Besides the recently proposed NB-LDPC codes, polar codes, discovered by Ar{\i}kan, appear as a very promising candidate for future communication and storage systems. Polar codes are considered as a major breakthrough in recent coding theory society. Polar codes are proved to be capacity achieving codes over binary input symmetric memoryless channels. Besides, polar codes can be decoded by the successive cancelation (SC) algorithm with of complexity of O(Nlog2N)\mathcal{O}(N\log_2 N), where NN is the block length. The main sticking point of polar codes to date is that their error performance under short to moderate block lengths is inferior compared with LDPC codes or turbo codes. The list decoding technique can be used to improve the error performance of SC algorithms at the cost higher computational and memory complexities. Besides, the hardware implementation of current SC based decoders suffer from long decoding latency which is unsuitable for modern high speed communications.ECCs also find their applications in improving the reliability of network coding. Random linear network coding is an efficient technique for disseminating information in networks, but it is highly susceptible to errors. K\ {o}tter-Kschischang (KK) codes and Mahdavifar-Vardy (MV) codes are two important families of subspace codes that provide error control in noncoherent random linear network coding. List decoding has been used to decode MV codes beyond half distance. Existing hardware implementations of the rank metric decoder for KK codes suffer from limited throughput, long latency and high area complexity. The interpolation-based list decoding algorithm for MV codes still has high computational complexity, and its feasibility for hardware implementations has not been investigated.In this exam, we present efficient decoding algorithms and hardware decoder architectures for NB-LDPC codes, polar codes, KK and MV codes. For NB-LDPC codes, an efficient shuffled decoder architecture is presented to reduce the number of average iterations and improve the throughput. Besides, a fully parallel decoder architecture for NB-LDPC codes with short or moderate block lengths is also presented. Our fully parallel decoder architecture achieves much higher throughput and area efficiency compared with the state-of-art NB-LDPC decoders. For polar codes, a memory efficient list decoder architecture is first presented. Based on our reduced latency list decoding algorithm for polar codes, a high throughput list decoder architecture is also presented. At last, we present efficient decoder architectures for both KK and MV codes

    Hardware Design of Decoder for Low-Density Parity Check Codes

    Get PDF
    A hardware decoder architecture is presented in this thesis for quasi-cyclic (QC) low-density parity check (LDPC) codes. The decoder is real-time configurable and supports 15 codes which are combination of 3 rates and 5 lengths. The partly parallel architecture implements layered decoding. A check node decoder is serial and implements min-sum correction algorithm. The proposed design techniques include out-of-order memory-write, two-stage multi-size shifter, serial decoding termination. The decoder consumes about half amount of logic resource on the Xilinx FPGA chip XC2VP50-5F1152. The worst case throughput at 20 iterations ranges from 5 Mbits to 60 Mbits (information bits) per second. Higher throughput can be obtained by the proposed optimisation. Reuse for similar codes is possible

    Area and energy efficient VLSI architectures for low-density parity-check decoders using an on-the-fly computation

    Get PDF
    The VLSI implementation complexity of a low density parity check (LDPC) decoder is largely influenced by the interconnect and the storage requirements. This dissertation presents the decoder architectures for regular and irregular LDPC codes that provide substantial gains over existing academic and commercial implementations. Several structured properties of LDPC codes and decoding algorithms are observed and are used to construct hardware implementation with reduced processing complexity. The proposed architectures utilize an on-the-fly computation paradigm which permits scheduling of the computations in a way that the memory requirements and re-computations are reduced. Using this paradigm, the run-time configurable and multi-rate VLSI architectures for the rate compatible array LDPC codes and irregular block LDPC codes are designed. Rate compatible array codes are considered for DSL applications. Irregular block LDPC codes are proposed for IEEE 802.16e, IEEE 802.11n, and IEEE 802.20. When compared with a recent implementation of an 802.11n LDPC decoder, the proposed decoder reduces the logic complexity by 6.45x and memory complexity by 2x for a given data throughput. When compared to the latest reported multi-rate decoders, this decoder design has an area efficiency of around 5.5x and energy efficiency of 2.6x for a given data throughput. The numbers are normalized for a 180nm CMOS process. Properly designed array codes have low error floors and meet the requirements of magnetic channel and other applications which need several Gbps of data throughput. A high throughput and fixed code architecture for array LDPC codes has been designed. No modification to the code is performed as this can result in high error floors. This parallel decoder architecture has no routing congestion and is scalable for longer block lengths. When compared to the latest fixed code parallel decoders in the literature, this design has an area efficiency of around 36x and an energy efficiency of 3x for a given data throughput. Again, the numbers are normalized for a 180nm CMOS process. In summary, the design and analysis details of the proposed architectures are described in this dissertation. The results from the extensive simulation and VHDL verification on FPGA and ASIC design platforms are also presented

    Multiple Parallel Concatenated Gallager Codes and Their Applications

    Get PDF
    Due to the increasing demand of high data rate of modern wireless communications, there is a significant interest in error control coding. It now plays a significant role in digital communication systems in order to overcome the weaknesses in communication channels. This thesis presents a comprehensive investigation of a class of error control codes known as Multiple Parallel Concatenated Gallager Codes (MPCGCs) obtained by the parallel concatenation of well-designed LDPC codes. MPCGCs are constructed by breaking a long and high complexity of conventional single LDPC code into three or four smaller and lower complexity LDPC codes. This design of MPCGCs is simplified as the option of selecting the component codes completely at random based on a single parameter of Mean Column Weight (MCW). MPCGCs offer flexibility and scope for improving coding performance in theoretical and practical implementation. The performance of MPCGCs is explored by evaluating these codes for both AWGN and flat Rayleigh fading channels and investigating the puncturing of these codes by a proposed novel and efficient puncturing methods for improving the coding performance. Another investigating in the deployment of MPCGCs by enhancing the performance of WiMAX system. The bit error performances are compared and the results confirm that the proposed MPCGCs-WiMAX based IEEE 802.16 standard physical layer system provides better gain compared to the single conventional LDPC-WiMAX system. The incorporation of Quasi-Cyclic QC-LDPC codes in the MPCGC structure (called QC-MPCGC) is shown to improve the overall BER performance of MPCGCs with reduced overall decoding complexity and improved flexibility by using Layered belief propagation decoding instead of the sum-product algorithm (SPA). A proposed MIMO-MPCGC structure with both a 2X2 MIMO and 2X4 MIMO configurations is developed in this thesis and shown to improve the BER performance over fading channels over the conventional LDPC structure

    VLSI decoding architectures: flexibility, robustness and performance

    Get PDF
    Stemming from previous studies on flexible LDPC decoders, this thesis work has been mainly focused on the development of flexible turbo and LDPC decoder designs, and on the narrowing of the power, area and speed gap they might present with respect to dedicated solutions. Additional studies have been carried out within the field of increased code performance and of decoder resiliency to hardware errors. The first chapter regroups several main contributions in the design and implementation of flexible channel decoders. The first part concerns the design of a Network-on-Chip (NoC) serving as an interconnection network for a partially parallel LDPC decoder. A best-fit NoC architecture is designed and a complete multi-standard turbo/LDPC decoder is designed and implemented. Every time the code is changed, the decoder must be reconfigured. A number of variables influence the duration of the reconfiguration process, starting from the involved codes down to decoder design choices. These are taken in account in the flexible decoder designed, and novel traffic reduction and optimization methods are then implemented. In the second chapter a study on the early stopping of iterations for LDPC decoders is presented. The energy expenditure of any LDPC decoder is directly linked to the iterative nature of the decoding algorithm. We propose an innovative multi-standard early stopping criterion for LDPC decoders that observes the evolution of simple metrics and relies on on-the-fly threshold computation. Its effectiveness is evaluated against existing techniques both in terms of saved iterations and, after implementation, in terms of actual energy saving. The third chapter portrays a study on the resilience of LDPC decoders under the effect of memory errors. Given that the purpose of channel decoders is to correct errors, LDPC decoders are intrinsically characterized by a certain degree of resistance to hardware faults. This characteristic, together with the soft nature of the stored values, results in LDPC decoders being affected differently according to the meaning of the wrong bits: ad-hoc error protection techniques, like the Unequal Error Protection devised in this chapter, can consequently be applied to different bits according to their significance. In the fourth chapter the serial concatenation of LDPC and turbo codes is presented. The concatenated FEC targets very high error correction capabilities, joining the performance of turbo codes at low SNR with that of LDPC codes at high SNR, and outperforming both current deep-space FEC schemes and concatenation-based FECs. A unified decoder for the concatenated scheme is subsequently propose

    Hardware implementation aspects of polar decoders and ultra high-speed LDPC decoders

    Get PDF
    The goal of channel coding is to detect and correct errors that appear during the transmission of information. In the past few decades, channel coding has become an integral part of most communications standards as it improves the energy-efficiency of transceivers manyfold while only requiring a modest investment in terms of the required digital signal processing capabilities. The most commonly used channel codes in modern standards are low-density parity-check (LDPC) codes and Turbo codes, which were the first two types of codes to approach the capacity of several channels while still being practically implementable in hardware. The decoding algorithms for LDPC codes, in particular, are highly parallelizable and suitable for high-throughput applications. A new class of channel codes, called polar codes, was introduced recently. Polar codes have an explicit construction and low-complexity encoding and successive cancellation (SC) decoding algorithms. Moreover, polar codes are provably capacity achieving over a wide range of channels, making them very attractive from a theoretical perspective. Unfortunately, polar codes under standard SC decoding cannot compete with the LDPC and Turbo codes that are used in current standards in terms of their error-correcting performance. For this reason, several improved SC-based decoding algorithms have been introduced. The most prominent SC-based decoding algorithm is the successive cancellation list (SCL) decoding algorithm, which is powerful enough to approach the error-correcting performance of LDPC codes. The original SCL decoding algorithm was described in an arithmetic domain that is not well-suited for hardware implementations and is not clear how an efficient SCL decoder architecture can be implemented. To this end, in this thesis, we re-formulate the SCL decoding algorithm in two distinct arithmetic domains, we describe efficient hardware architectures to implement the resulting SCL decoders, and we compare the decoders with existing LDPC and Turbo decoders in terms of their error-correcting performance and their implementation efficiency. Due to the ongoing technology scaling, the feature sizes of integrated circuits keep shrinking at a remarkable pace. As transistors and memory cells keep shrinking, it becomes increasingly difficult and costly (in terms of both area and power) to ensure that the implemented digital circuits always operate correctly. Thus, manufactured digital signal processing circuits, including channel decoder circuits, may not always operate correctly. Instead of discarding these faulty dies or using costly circuit-level fault mitigation mechanisms, an alternative approach is to try to live with certain malfunctions, provided that the algorithm implemented by the circuit is sufficiently fault-tolerant. In this spirit, in this thesis we examine decoding of polar codes and LDPC codes under the assumption that the memories that are used within the decoders are not fully reliable. We show that, in both cases, there is inherent fault-tolerance and we also propose some methods to reduce the effect of memory faults on the error-correcting performance of the considered decoders

    A Study on High Performance Gbps MIMO Wireless System

    Get PDF
    九州工業大学博士学位論文 学位記番号:情工博甲第294号 学位授与年月日:平成26年12月25日1 Introduction||2 Wireless System Overview||3 RC4 Encryption Architectures||4 MIMO Detection Algorithm and Architecture||5 LDPC Decoder Architecture||6 Conclusion and Future Wor

    연판정 오류정정을 위한 낮은 복잡도의 블록 터보부호 복호화 연구

    Get PDF
    학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2016. 8. 성원용.As the throughput needed for communication systems and storage devices increases, high-performance forward error correction (FEC), especially soft-decision (SD) based technique, becomes essential. In particular, block turbo codes (BTCs) and low-density parity check (LDPC) codes are considered as candidate FEC codes for the next generation systems, such as beyond-100Gbps optical networks and under-20nm NAND flash memory devices, which require capacity-approaching performance and very low error floor. The BTCs have definite strengths in diversity and encoding complexity because they generally employ a two-dimensional structure, which enables sub-frame level decoding for the row or column code-words. This sub-frame level decoding gives a strong advantage for parallel processing. The BTC decoding throughput can be improved by applying a low-complexity algorithm to the small level decoding or by running multiple sub-frame decoding modules simultaneously. In this dissertation, we develop high-throughput BTC decoding software that pursuits these advantages. The first part of this dissertation is devoted to finding efficient test patterns in the Chase-Pyndiah algorithm. Although the complexity of this algorithm linearly increases according to the number of the test patterns, it naively considers all possible patterns containing least reliable positions. As a result, consideration of one more position nearly doubles the complexity. To solve this issue, we first introduce a new position selection criterion that excludes some of the selected ones having a relatively large reliability. This technique excludes the selection of sufficiently reliable positions, which greatly reduces the complexity. Secondly, we propose a pattern selection scheme considering the error coverage. We define the error coverage factor that represents the influence on the error-correcting performance and compute it by analyzing error events. Based on the computed factor, we select the patterns with the greedy algorithm. By using these methods, we can flexibly balance the complexity and the performance. The second part of this dissertation is developing low-complexity soft-output processing methods needed for BTC decoding. In the Chase-Pyndiah algorithm, the soft-output is updated in two different ways according to whether competing code-words exist on the updating positions or not. If the competing code-words exist, the Euclidean distance between the soft-input signal and the code-words that are generated from the test patterns is used. However, the cost of distance computation is very high and linearly increases with the sub-frame length. We identify computationally redundant positions and optimize the computing process by ignoring them. If the competing ones do not exist, the reliability factor that should be pre-determined by an extensive search is demanded. To avoid this, we propose adaptive determination methods, which provides even better error-correcting performance. In addition, we investigate the Pyndiah's soft-output computation and find its drawbacks that appear during the approximation process. To remove the drawbacks, we replace the updating method of the positions that are expected to be seriously damaged by the approximation with the reliability factor-based one, which is much simpler, even though they have the competing words. This dissertation also develops a graphics processing unit (GPU) based BTC decoding program. In order to hide the latency of arithmetic and memory access operations, this software applies the kernel structure that processes multiple BTC-words and allocates multiple sub-frames to each thread-block. Global memory access optimization and data compression, which demands less shared memory space, are also employed. For efficient mapping of the Chase-Pyndiah algorithm onto GPUs, we propose parallel processing schemes employing efficient reduction algorithms and provide step-by-step parallel algorithms for the algebraic decoding. The last part of this dissertation is devoted to summarizing the developed decoding method and comparing it with the decoding of the LDPC convolutional code (CC), which is currently reported as the most powerful candidate for the 100Gbps optical network. We first investigate the complexity reduction and the error rate performance improvement of the developed method. Then, we analyze the complexity of the LDPC-CC decoding and compare it with the developed BTC decoding for the 20% overhead codes. This dissertation is intended to develop high-throughput SD decoding software by introducing complexity reduction techniques for the Chase-Pyndiah algorithm and efficient parallel processing methods, and to emphasize the competitiveness of the BTC. The proposed decoding methods and parallel processing algorithms verified in the GPU-based systems are also applicable to hardware-based ones. By implementing hardware-based decoders that employ the developed methods in this dissertation, significant improvements on the throughputs and the energy efficiency can be obtained. Moreover, thanks to the wide rate coverage of the BTC, the developed techniques can be applied to many high-throughput error correction applications, such as the next-generation optical network and storage device systems.Chapter 1 Introduction 1 1.1 Turbo Codes 1 1.2 Applications of Turbo Codes 4 1.3 Outline of the Dissertation 5 Chapter 2 Encoding and Iterative Decoding of Block Turbo Codes 7 2.1 Introduction 7 2.2 Encoding Procedure of Shortened-Extended BTCs 9 2.3 Scheduling Methods for Iterative Decoding 9 2.3.1 Serial Scheduling 10 2.3.2 Parallel Scheduling 10 2.3.3 Replica Scheduling 11 2.4 Elementary Decoding with Chase-Pyndiah Algorithm 13 2.4.1 Chase-Pyndiah Algorithm for Extended BTCs 13 2.4.2 Reliability Computation of the ML Code-Word 17 2.4.3 Algebraic Decoding for SEC and DEC BCH Codes 20 2.5 Issues of Chase-Pyndiah Algorithm 23 Chapter 3 Complexity Reduction Techniques for Code-Word Set Generation of the Chase-Pyndiah Algorithm 24 3.1 Introduction 24 3.2 Adaptive Selection of LRPs 25 3.2.1 Selection Constraints of LRPs 25 3.2.2 Simulation Results 26 3.3 Test Pattern Selection 29 3.3.1 The Error Coverage Factor of Test Patterns 30 3.3.2 Greedy Selection of Test Patterns 33 3.3.3 Simulation Results 34 3.4 Concluding Remarks 34 Chapter 4 Complexity Reduction Techniques for Soft-Output Update of the Chase-Pyndiah Algorithm 37 4.1 Introduction 37 4.2 Distance Computation 38 4.2.1 Position-Index List Based Method 39 4.2.2 Double Index Set-Based Method 42 4.2.3 Complexity Analysis 46 4.2.4 Simulation Results 47 4.3 Reliability Factor Determination 49 4.3.1 Refinement of Distance-Based Reliability Factor 51 4.3.2 Adaptive Determination of the Reliability Factor 51 4.3.3 Simulation Results 53 4.4 Accuracy Improvement in Extrinsic Information Update 54 4.4.1 Drawbacks of the Sub-Optimal Update 55 4.4.2 Low-Complexity Extrinsic Information Update 58 4.4.3 Simulation Results 59 4.5 Concluding Remarks 61 Chapter 5 High-Throughput BTC Decoding on GPUs 64 5.1 Introduction 64 5.2 BTC Decoder Architecture for GPU Implementations 66 5.3 Memory Optimization 68 5.3.1 Global Memory Access Reduction 68 5.3.2 Improvement of Global Memory Access Coalescing 68 5.3.3 Efficient Shared Memory Control with Data Compression 70 5.3.4 Index Parity Check Scheme 73 5.4 Parallel Algorithms with the CUDA Shuffle Function 77 5.5 Implementation of Algebraic Decoder 78 5.5.1 Galois Field Operations with Look-Up Tables 78 5.5.2 Error-Locator Polynomial Setting with the LUTs 81 5.5.3 Parallel Chien Search with the LUTs 84 5.6 Simulation Results 85 5.7 Concluding Remarks 89 Chapter 6 Competitiveness of BTCs as FEC codes for the Next-Generation Optical Networks 91 6.1 Introduction 91 6.2 The Complexity Reduction of the Modified Chase-Pyndiah Algorithm 92 6.2.1 Summary of the Complexity Reduction 92 6.2.2 The Error-Correcting Performance 94 6.3 Comparison of BTCs and LDPC-CCs 97 6.3.1 Complexity Analysis of the LDPC-CC Decoding 97 6.3.2 Comparison of the 20% Overhead BTC and LDPC-CC 100 6.4 Concluding Remarks 101 Chapter 7 Conclusion 102 Bibliography 105 국문 초록 113Docto
    corecore