Search CORE

27 research outputs found

Area and energy efficient VLSI architectures for low-density parity-check decoders using an on-the-fly computation

Author: Gunnam Kiran Kumar
Publication venue
Publication date: 15/05/2009
Field of study

The VLSI implementation complexity of a low density parity check (LDPC) decoder is largely influenced by the interconnect and the storage requirements. This dissertation presents the decoder architectures for regular and irregular LDPC codes that provide substantial gains over existing academic and commercial implementations. Several structured properties of LDPC codes and decoding algorithms are observed and are used to construct hardware implementation with reduced processing complexity. The proposed architectures utilize an on-the-fly computation paradigm which permits scheduling of the computations in a way that the memory requirements and re-computations are reduced. Using this paradigm, the run-time configurable and multi-rate VLSI architectures for the rate compatible array LDPC codes and irregular block LDPC codes are designed. Rate compatible array codes are considered for DSL applications. Irregular block LDPC codes are proposed for IEEE 802.16e, IEEE 802.11n, and IEEE 802.20. When compared with a recent implementation of an 802.11n LDPC decoder, the proposed decoder reduces the logic complexity by 6.45x and memory complexity by 2x for a given data throughput. When compared to the latest reported multi-rate decoders, this decoder design has an area efficiency of around 5.5x and energy efficiency of 2.6x for a given data throughput. The numbers are normalized for a 180nm CMOS process. Properly designed array codes have low error floors and meet the requirements of magnetic channel and other applications which need several Gbps of data throughput. A high throughput and fixed code architecture for array LDPC codes has been designed. No modification to the code is performed as this can result in high error floors. This parallel decoder architecture has no routing congestion and is scalable for longer block lengths. When compared to the latest fixed code parallel decoders in the literature, this design has an area efficiency of around 36x and an energy efficiency of 3x for a given data throughput. Again, the numbers are normalized for a 180nm CMOS process. In summary, the design and analysis details of the proposed architectures are described in this dissertation. The results from the extensive simulation and VHDL verification on FPGA and ASIC design platforms are also presented

Texas A&M Repository

High-Performance Decoder Architectures For Low-Density Parity-Check Codes

Author: Zhang Kai
Publication venue: Digital WPI
Publication date: 09/01/2012
Field of study

The Low-Density Parity-Check (LDPC) codes, which were invented by Gallager back in 1960s, have attracted considerable attentions recently. Compared with other error correction codes, LDPC codes are well suited for wireless, optical, and magnetic recording systems due to their near- Shannon-limit error-correcting capacity, high intrinsic parallelism and high-throughput potentials. With these remarkable characteristics, LDPC codes have been adopted in several recent communication standards such as 802.11n (Wi-Fi), 802.16e (WiMax), 802.15.3c (WPAN), DVB-S2 and CMMB. This dissertation is devoted to exploring efficient VLSI architectures for high-performance LDPC decoders and LDPC-like detectors in sparse inter-symbol interference (ISI) channels. The performance of an LDPC decoder is mainly evaluated by area efficiency, error-correcting capability, throughput and rate flexibility. With this work we investigate tradeoffs between the four performance aspects and develop several decoder architectures to improve one or several performance aspects while maintaining acceptable values for other aspects. Firstly, we present a high-throughput decoder design for the Quasi-Cyclic (QC) LDPC codes. Two new techniques are proposed for the first time, including parallel layered decoding architecture (PLDA) and critical path splitting. Parallel layered decoding architecture enables parallel processing for all layers by establishing dedicated message passing paths among them. The decoder avoids crossbar-based large interconnect network. Critical path splitting technique is based on articulate adjustment of the starting point of each layer to maximize the time intervals between adjacent layers, such that the critical path delay can be split into pipeline stages. Furthermore, min-sum and loosely coupled algorithms are employed for area efficiency. As a case study, a rate-1/2 2304-bit irregular LDPC decoder is implemented using ASIC design in 90 nm CMOS process. The decoder can achieve an input throughput of 1.1 Gbps, that is, 3 or 4 times improvement over state-of-art LDPC decoders, while maintaining a comparable chip size of 2.9 mm^2. Secondly, we present a high-throughput decoder architecture for rate-compatible (RC) LDPC codes which supports arbitrary code rates between the rate of mother code and 1. While the original PLDA is lack of rate flexibility, the problem is solved gracefully by incorporating the puncturing scheme. Simulation results show that our selected puncturing scheme only introduces the BER performance degradation of less than 0.2dB, compared with the dedicated codes for different rates specified in the IEEE 802.16e (WiMax) standard. Subsequently, PLDA is employed for high throughput decoder design. As a case study, a RC- LDPC decoder based on the rate-1/2 WiMax LDPC code is implemented in CMOS 90 nm process. The decoder can achieve an input throughput of 975 Mbps and supports any rate between 1/2 and 1. Thirdly, we develop a low-complexity VLSI architecture and implementation for LDPC decoder used in China Multimedia Mobile Broadcasting (CMMB) systems. An area-efficient layered decoding architecture based on min-sum algorithm is incorporated in the design. A novel split-memory architecture is developed to efficiently handle the weight-2 submatrices that are rarely seen in conventional LDPC decoders. In addition, the check-node processing unit is highly optimized to minimize complexity and computing latency while facilitating a reconfigurable decoding core. Finally, we propose an LDPC-decoder-like channel detector for sparse ISI channels using belief propagation (BP). The BP-based detection computationally depends on the number of nonzero interferers only and are thus more suited for sparse ISI channels which are characterized by long delay but a small fraction of nonzero interferers. Layered decoding algorithm, which is popular in LDPC decoding, is also adopted in this paper. Simulation results show that the layered decoding doubles the convergence speed of the iterative belief propagation process. Exploring the special structure of the connections between the check nodes and the variable nodes on the factor graph, we propose an effective detector architecture for generic sparse ISI channels to facilitate the practical application of the proposed detection algorithm. The proposed architecture is also reconfigurable in order to switch flexible connections on the factor graph in the time-varying ISI channels

DigitalCommons@WPI

Energy-Efficient Decoders of Near-Capacity Channel Codes.

Author: Park Youn Sung
Publication venue
Publication date
Field of study

Channel coding has become essential in state-of-the-art communication and storage systems for ensuring reliable transmission and storage of information. Their goal is to achieve high transmission reliability while keeping the transmit energy consumption low by taking advantage of the coding gain provided by these codes. The lowest total system energy is achieved with a decoder that provides both good coding gain and high energy-efficiency. This thesis demonstrates the VLSI implementation of near-capacity channel decoders using the LDPC, nonbinary LDPC (NB-LDPC) and polar codes with an emphasis of reducing the decode energy. LDPC code is a widely used channel code due to its excellent error-correcting performance. However, memory dominates the power of high-throughput LDPC decoders. Therefore, these memories are replaced with a novel non-refresh embedded DRAM (eDRAM) taking advantage of the deterministic memory access pattern and short access window of the decoding algorithm to trade off retention time for faster access speed. The resulting LDPC decoder with integrated eDRAMs achieves state-of-the-art area- and energy-efficiency. NB-LDPC code achieves better error-correcting performance than LDPC code at the cost of higher decoding complexity. However, the factor graph is simplified, permitting a fully parallel architecture with low wiring overhead. To reduce the dynamic power of the decoder, a fine-grained dynamic clock gating technique is applied based on node-level convergence. This technique greatly reduces dynamic power allowing the decoder to achieve high energy-efficiency while achieving high throughput. The recently invented polar code has a similar error-correcting performance to LDPC code of comparable block length. However, the easy reconfigurability of code rate as well as block length makes it desirable in numerous applications where LDPC is not competitive. In addition, the regular structure and simple processing enables a highly efficient decoder in terms of area and power. Using the belief propagation algorithm with architectural and memory improvements, a polar decoder is demonstrated achieving high throughput and high energy- and area-efficiency. The demonstrated energy-efficient decoders have advanced the state-of-the-art. The decoders will allow the continued reduction of decode energy for the latest communication and storage applications. The developed techniques are widely applicable to designing low-power DSP processors.PhDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/108731/1/parkyoun_1.pd

Deep Blue Documents at the University of Michigan

Algorithm Development and VLSI Implementation of Energy Efficient Decoders of Polar Codes

Author: Xu Jingwei
Publication venue
Publication date: 27/02/2020
Field of study

With its low error-floor performance, polar codes attract significant attention as the potential standard error correction code (ECC) for future communication and data storage. However, the VLSI implementation complexity of polar codes decoders is largely influenced by its nature of in-series decoding. This dissertation is dedicated to presenting optimal decoder architectures for polar codes. This dissertation addresses several structural properties of polar codes and key properties of decoding algorithms that are not dealt with in the prior researches. The underlying concept of the proposed architectures is a paradigm that simplifies and schedules the computations such that hardware is simplified, latency is minimized and bandwidth is maximized. In pursuit of the above, throughput centric successive cancellation (TCSC) and overlapping path list successive cancellation (OPLSC) VLSI architectures and express journey BP (XJBP) decoders for the polar codes are presented. An arbitrary polar code can be decomposed by a set of shorter polar codes with special characteristics, those shorter polar codes are referred to as constituent polar codes. By exploiting the homogeneousness between decoding processes of different constituent polar codes, TCSC reduces the decoding latency of the SC decoder by 60% for codes with length n = 1024. The error correction performance of SC decoding is inferior to that of list successive cancellation decoding. The LSC decoding algorithm delivers the most reliable decoding results; however, it consumes most hardware resources and decoding cycles. Instead of using multiple instances of decoding cores in the LSC decoders, a single SC decoder is used in the OPLSC architecture. The computations of each path in the LSC are arranged to occupy the decoder hardware stages serially in a streamlined fashion. This yields a significant reduction of hardware complexity. The OPLSC decoder has achieved about 1.4 times hardware efficiency improvement compared with traditional LSC decoders. The hardware efficient VLSI architectures for TCSC and OPLSC polar codes decoders are also introduced. Decoders based on SC or LSC algorithms suffer from high latency and limited throughput due to their serial decoding natures. An alternative approach to decode the polar codes is belief propagation (BP) based algorithm. In BP algorithm, a graph is set up to guide the beliefs propagated and refined, which is usually referred to as factor graph. BP decoding algorithm allows decoding in parallel to achieve much higher throughput. XJBP decoder facilitates belief propagation by utilizing the specific constituent codes that exist in the conventional factor graph, which results in an express journey (XJ) decoder. Compared with the conventional BP decoding algorithm for polar codes, the proposed decoder reduces the computational complexity by about 40.6%. This enables an energy-efficient hardware implementation. To further explore the hardware consumption of the proposed XJBP decoder, the computations scheduling is modeled and analyzed in this dissertation. With discussions on different hardware scenarios, the optimal scheduling plans are developed. A novel memory-distributed micro-architecture of the XJBP decoder is proposed and analyzed to solve the potential memory access problems of the proposed scheduling strategy. The register-transfer level (RTL) models of the XJBP decoder are set up for comparisons with other state-of-the-art BP decoders. The results show that the power efficiency of BP decoders is improved by about 3 times

Texas A&M Repository

Algorithm Development and VLSI Implementation of Energy Efficient Decoders of Polar Codes

Author: Xu Jingwei
Publication venue
Publication date: 27/02/2020
Field of study

An Iterative Soft Decision Based LR-Aided MIMO Detector

Author: Rahman Mehnaz
Publication venue
Publication date: 27/02/2020
Field of study

The demand for wireless and high-rate communication system is increasing gradually and multiple-input-multiple-output (MIMO) is one of the feasible solutions to accommodate the growing demand for its spatial multiplexing and diversity gain. However, with high number of antennas, the computational and hardware complexity of MIMO increases exponentially. This accumulating complexity is a paramount problem in MIMO detection system directly leading to large power consumption. Hence, the major focus of this dissertation is algorithmic and hardware development of MIMO decoder with reduced complexity for both real and complex domain, which can be a beneficial solution with power efficiency and high throughput. Both hard and soft domain MIMO detectors are considered. The use of lattice reduction (LR) algorithm and on-demand-child-expansion for the reduction of noise propagation and node calculation respectively are the two of the key features of our developed architecture, presented in this literature. The real domain iterative soft MIMO decoding algorithm, simulated for 4 × 4 MIMO with different modulation scheme, achieves 1.1 to 2.7 dB improvement over Lease Sphere Decoder (LSD) and more than 8x reduction in list size, K as well as complexity of the detector. Next, the iterative real domain K-Best decoder is expanded to the complex domain with new detection scheme. It attains 6.9 to 8.0 dB improvement over real domain K-Best decoder and 1.4 to 2.5 dB better performance over conventional complex decoder for 8 × 8 MIMO with 64 QAM modulation scheme. Besides K, a new adjustable parameter, Rlimit has been introduced in order to append re-configurability trading-off between complexity and performance. After that, a novel low-power hardware architecture of complex decoder is developed for 8 × 8 MIMO and 64 QAM modulation scheme. The total word length of only 16 bits has been adopted limiting the bit error rate (BER) degradation to 0.3 dB with K and Rlimit equal to 4. The proposed VLSI architecture is modeled in Verilog HDL using Xilinx and synthesized using Synopsys Design Vision in 45 nm CMOS technology. According to the synthesize result, it achieves 1090.8 Mbps throughput with power consumption of 580 mW and latency of 0.33 us. The maximum frequency the design proposed is 181.8 MHz. All of the proposed decoders mentioned above are bounded by the fixed K. Hence, an adaptive real domain K-Best decoder is further developed to achieve the similar performance with less K, thereby reducing the computational complexity of the decoder. It does not require accurate SNR measurement to perform the initial estimation of list size, K. Instead, the difference between the first two minimal distances is considered, which inherently eliminates complexity. In summary, a novel iterative K-Best detector for both real and complex domain with efficient VLSI design is proposed in this dissertation. The results from extensive simulation and VHDL with analysis using Synopsys tool are also presented for justification and validation of the proposed works

An Iterative Soft Decision Based LR-Aided MIMO Detector

Author: Rahman Mehnaz
Publication venue
Publication date: 27/02/2020
Field of study

Texas A&M Repository

System Development and VLSI Implementation of High Throughput and Hardware Efficient Polar Code Decoder

Author: Che Tiben
Publication venue
Publication date: 21/02/2020
Field of study

Polar code is the first channel code which is provable to achieve the Shannon capacity. Additionally, it has a very good performance in terms of low error floor. All these merits make it a potential candidate for the future standard of wireless communication or storage system. Polar code is received increasing research interest these years. However, the hardware implementation of hardware decoder still has not meet the expectation of practical applications, no matter from neither throughput aspect nor hardware efficient aspect. This dissertation presents several system development approaches and hardware structures for three widely known decoding algorithms. These algorithms are successive cancellation (SC), list successive cancellation (LSC) and belief propagation (BP). All the efforts are in order to maximize the throughput meanwhile minimize the hardware cost. Throughput centric successive cancellation (TCSC) decoder is proposed for SC decoding. By introducing the concept of constituent code, the decoding latency is significantly reduced with a negligible decoding performance loss. However, the specifically designed computation unites dramatically increase the hardware cost, and how to handle the conventional polar code sets and constituent codes sets makes the hardware implementation more complicated. By exploiting the natural property of conventional SC decoder, datapaths for decoding constituent codes are compatibly built via computation units sharing technique. This approach does not incur additional hardware cost expect some multiplexer logic, but can significantly increase the decoding throughput. Other techniques such as pre-computing and gate-level optimization are used as well in order to further increase the decoding throughput. A specific designed partial sum generator (PSG) is also investigated in this dissertation. This PSG is hardware efficient and timing compatible with proposed TCSC decoder. Additionally, a polar code construction scheme with constituent codes optimization is also presents. This construction scheme aims to reduce the constituent codes based SC decoding latency. Results show that, compared with the state-of-art decoder, TCSC can achieve at least 60% latency reduction for the codes with length n = 1024. By using Nangate FreePDK 45nm process, TCSC decoder can reach throughput up to 5.81 Gbps and 2.01 Gbps for (1024, 870) and (1024, 512) polar code, respectively. Besides, with the proposed construction scheme, the TCSC decoder generally is able to further achieve at least around 20% latency deduction with an negligible gain loss. Overlapped List Successive Cancellation (OLSC) is proposed for LSC decoding as a design approach. LSC decoding has a better performance than LS decoding at the cost of hardware consumption. With such approach, the l (l > 1) instances of successive cancellation (SC) decoder for LSC with list size l can be cut down to only one. This results in a dramatic reduction of the hardware complexity without any decoding performance loss. Meanwhile, approaches to reduce the latency associated with the pipeline scheme are also investigated. Simulation results show that with proposed design approach the hardware efficiency is increased significantly over the recently proposed LSC decoders. Express Journey Belief Propagation (XJBP) is proposed for BP decoding. This idea origins from extending the constituent codes concept from SC to BP decoding. Express journey refers to the datapath of specific constituent codes in the factor graph, which accelerates the belief information propagation speed. The XJBP decoder is able to achieve 40.6% computational complexity reduction with the conventional BP decoding. This enables an energy efficient hardware implementation. In summary, all the efforts to optimize the polar code decoder are presented in this dissertation, supported by the careful analysis, precise description, extensively numerical simulations, thoughtful discussion and RTL implementation on VLSI design platforms

Texas A&M Repository

System Development and VLSI Implementation of High Throughput and Hardware Efficient Polar Code Decoder

Author: Che Tiben
Publication venue
Publication date: 21/02/2020
Field of study

Hardware implementation aspects of polar decoders and ultra high-speed LDPC decoders

Author: Balatsoukas Stimming Alexios Konstantinos
Publication venue: Lausanne, EPFL
Publication date: 19/10/2016
Field of study

The goal of channel coding is to detect and correct errors that appear during the transmission of information. In the past few decades, channel coding has become an integral part of most communications standards as it improves the energy-efficiency of transceivers manyfold while only requiring a modest investment in terms of the required digital signal processing capabilities. The most commonly used channel codes in modern standards are low-density parity-check (LDPC) codes and Turbo codes, which were the first two types of codes to approach the capacity of several channels while still being practically implementable in hardware. The decoding algorithms for LDPC codes, in particular, are highly parallelizable and suitable for high-throughput applications. A new class of channel codes, called polar codes, was introduced recently. Polar codes have an explicit construction and low-complexity encoding and successive cancellation (SC) decoding algorithms. Moreover, polar codes are provably capacity achieving over a wide range of channels, making them very attractive from a theoretical perspective. Unfortunately, polar codes under standard SC decoding cannot compete with the LDPC and Turbo codes that are used in current standards in terms of their error-correcting performance. For this reason, several improved SC-based decoding algorithms have been introduced. The most prominent SC-based decoding algorithm is the successive cancellation list (SCL) decoding algorithm, which is powerful enough to approach the error-correcting performance of LDPC codes. The original SCL decoding algorithm was described in an arithmetic domain that is not well-suited for hardware implementations and is not clear how an efficient SCL decoder architecture can be implemented. To this end, in this thesis, we re-formulate the SCL decoding algorithm in two distinct arithmetic domains, we describe efficient hardware architectures to implement the resulting SCL decoders, and we compare the decoders with existing LDPC and Turbo decoders in terms of their error-correcting performance and their implementation efficiency. Due to the ongoing technology scaling, the feature sizes of integrated circuits keep shrinking at a remarkable pace. As transistors and memory cells keep shrinking, it becomes increasingly difficult and costly (in terms of both area and power) to ensure that the implemented digital circuits always operate correctly. Thus, manufactured digital signal processing circuits, including channel decoder circuits, may not always operate correctly. Instead of discarding these faulty dies or using costly circuit-level fault mitigation mechanisms, an alternative approach is to try to live with certain malfunctions, provided that the algorithm implemented by the circuit is sufficiently fault-tolerant. In this spirit, in this thesis we examine decoding of polar codes and LDPC codes under the assumption that the memories that are used within the decoders are not fully reliable. We show that, in both cases, there is inherent fault-tolerance and we also propose some methods to reduce the effect of memory faults on the error-correcting performance of the considered decoders

Infoscience - École polytechnique fédérale de Lausanne