1,461 research outputs found

    High throughput low power decoder architectures for low density parity check codes

    Get PDF
    A high throughput scalable decoder architecture, a tiling approach to reduce the complexity of the scalable architecture, and two low power decoding schemes have been proposed in this research. The proposed scalable design is generated from a serial architecture by scaling the combinational logic; memory partitioning and constructing a novel H matrix to make parallelization possible. The scalable architecture achieves a high throughput for higher values of the parallelization factor M. The switch logic used to route the bit nodes to the appropriate checks is an important constituent of the scalable architecture and its complexity is high with higher M. The proposed tiling approach is applied to the scalable architecture to simplify the switch logic and reduce gate complexity. The tiling approach generates patterns that are used to construct the H matrix by repeating a fixed number of those generated patterns. The advantages of the proposed approach are two-fold. First, the information stored about the H matrix is reduced by onethird. Second, the switch logic of the scalable architecture is simplified. The H matrix information is also embedded in the switch and no external memory is needed to store the H matrix. Scalable architecture and tiling approach are proposed at the architectural level of the LDPC decoder. We propose two low power decoding schemes that take advantage of the distribution of errors in the received packets. Both schemes use a hard iteration after a fixed number of soft iterations. The dynamic scheme performs X soft iterations, then a parity checker cHT that computes the number of parity checks in error. Based on cHT value, the decoder decides on performing either soft iterations or a hard iteration. The advantage of the hard iteration is so significant that the second low power scheme performs a fixed number of iterations followed by a hard iteration. To compensate the bit error rate performance, the number of soft iterations in this case is higher than that of those performed before cHT in the first scheme

    Scalable and Low Power LDPC Decoder Design Using High Level Algorithmic Synthesis

    Get PDF
    This paper presents a scalable and low power low-density parity-check (LDPC) decoder design for the next generation wireless handset SoC. The methodology is based on high level synthesis: PICO (program-in chip-out) tool was used to produce efficient RTL directly from a sequential untimed C algorithm. We propose two parallel LDPC decoder architectures: (1) per-layer decoding architecture with scalable parallelism, and (2) multi-layer pipelined decoding architecture to achieve higher throughput. Based on the PICO technology, we have implemented a two-layer pipelined decoder on a TSMC 65nm 0.9V 8-metal layer CMOS technology with a core area of 1.2 mm2. The maximum achievable throughput is 415 Mbps when operating at 400 MHz clock frequency and the estimated peak power consumption is 180 mW.NokiaNokia Siemens Networks (NSN)XilinxNational Science Foundatio

    An Improved Throughput for Non-Binary Low-Density-Parity-Check Decoder

    Get PDF
    Low-Density-Parity-Check (LDPC) based error control decoders find wide range of application in both storage and communication systems, because of the merits they possess which include high appropriateness towards parallelization and excellent performance in error correction. Field-Programmable Gate Array (FPGA) has provided a robust platform in terms of parallelism, resource allocation and excellent performing speed for implementing non-binary LDPC decoder architectures. This paper proposes, a high throughput LDPC decoder through the implementation of fully parallel architecture and a reduction in the maximum iteration limit, needed for complete error correction. A Galois field of eight was utilized alongside a non-uniform quantization scheme, resulting in fewer bits per Log Likelihood Ratio (LLR) for the implementation. Verilog Hardware Description Language (HDL) was used in the description of the non-binary error control decoder. The propose decoder attained a throughput of 10Gbps at 400-MHz clock frequency when synthesized on a ZYNQ 7000 Series FPGA

    A 2.0 Gb/s Throughput Decoder for QC-LDPC Convolutional Codes

    Full text link
    This paper propose a decoder architecture for low-density parity-check convolutional code (LDPCCC). Specifically, the LDPCCC is derived from a quasi-cyclic (QC) LDPC block code. By making use of the quasi-cyclic structure, the proposed LDPCCC decoder adopts a dynamic message storage in the memory and uses a simple address controller. The decoder efficiently combines the memories in the pipelining processors into a large memory block so as to take advantage of the data-width of the embedded memory in a modern field-programmable gate array (FPGA). A rate-5/6 QC-LDPCCC has been implemented on an Altera Stratix FPGA. It achieves up to 2.0 Gb/s throughput with a clock frequency of 100 MHz. Moreover, the decoder displays an excellent error performance of lower than 101310^{-13} at a bit-energy-to-noise-power-spectral-density ratio (Eb/N0E_b/N_0) of 3.55 dB.Comment: accepted to IEEE Transactions on Circuits and Systems

    System-on-chip Computing and Interconnection Architectures for Telecommunications and Signal Processing

    Get PDF
    This dissertation proposes novel architectures and design techniques targeting SoC building blocks for telecommunications and signal processing applications. Hardware implementation of Low-Density Parity-Check decoders is approached at both the algorithmic and the architecture level. Low-Density Parity-Check codes are a promising coding scheme for future communication standards due to their outstanding error correction performance. This work proposes a methodology for analyzing effects of finite precision arithmetic on error correction performance and hardware complexity. The methodology is throughout employed for co-designing the decoder. First, a low-complexity check node based on the P-output decoding principle is designed and characterized on a CMOS standard-cells library. Results demonstrate implementation loss below 0.2 dB down to BER of 10^{-8} and a saving in complexity up to 59% with respect to other works in recent literature. High-throughput and low-latency issues are addressed with modified single-phase decoding schedules. A new "memory-aware" schedule is proposed requiring down to 20% of memory with respect to the traditional two-phase flooding decoding. Additionally, throughput is doubled and logic complexity reduced of 12%. These advantages are traded-off with error correction performance, thus making the solution attractive only for long codes, as those adopted in the DVB-S2 standard. The "layered decoding" principle is extended to those codes not specifically conceived for this technique. Proposed architectures exhibit complexity savings in the order of 40% for both area and power consumption figures, while implementation loss is smaller than 0.05 dB. Most modern communication standards employ Orthogonal Frequency Division Multiplexing as part of their physical layer. The core of OFDM is the Fast Fourier Transform and its inverse in charge of symbols (de)modulation. Requirements on throughput and energy efficiency call for FFT hardware implementation, while ubiquity of FFT suggests the design of parametric, re-configurable and re-usable IP hardware macrocells. In this context, this thesis describes an FFT/IFFT core compiler particularly suited for implementation of OFDM communication systems. The tool employs an accuracy-driven configuration engine which automatically profiles the internal arithmetic and generates a core with minimum operands bit-width and thus minimum circuit complexity. The engine performs a closed-loop optimization over three different internal arithmetic models (fixed-point, block floating-point and convergent block floating-point) using the numerical accuracy budget given by the user as a reference point. The flexibility and re-usability of the proposed macrocell are illustrated through several case studies which encompass all current state-of-the-art OFDM communications standards (WLAN, WMAN, xDSL, DVB-T/H, DAB and UWB). Implementations results are presented for two deep sub-micron standard-cells libraries (65 and 90 nm) and commercially available FPGA devices. Compared with other FFT core compilers, the proposed environment produces macrocells with lower circuit complexity and same system level performance (throughput, transform size and numerical accuracy). The final part of this dissertation focuses on the Network-on-Chip design paradigm whose goal is building scalable communication infrastructures connecting hundreds of core. A low-complexity link architecture for mesochronous on-chip communication is discussed. The link enables skew constraint looseness in the clock tree synthesis, frequency speed-up, power consumption reduction and faster back-end turnarounds. The proposed architecture reaches a maximum clock frequency of 1 GHz on 65 nm low-leakage CMOS standard-cells library. In a complex test case with a full-blown NoC infrastructure, the link overhead is only 3% of chip area and 0.5% of leakage power consumption. Finally, a new methodology, named metacoding, is proposed. Metacoding generates correct-by-construction technology independent RTL codebases for NoC building blocks. The RTL coding phase is abstracted and modeled with an Object Oriented framework, integrated within a commercial tool for IP packaging (Synopsys CoreTools suite). Compared with traditional coding styles based on pre-processor directives, metacoding produces 65% smaller codebases and reduces the configurations to verify up to three orders of magnitude
    corecore