382 research outputs found

    System-on-chip Computing and Interconnection Architectures for Telecommunications and Signal Processing

    This dissertation proposes novel architectures and design techniques targeting SoC building blocks for telecommunications and signal processing applications. Hardware implementation of Low-Density Parity-Check (LDPC) decoders is approached at both the algorithmic and the architectural level. LDPC codes are a promising coding scheme for future communication standards due to their outstanding error correction performance. This work proposes a methodology for analyzing the effects of finite-precision arithmetic on error correction performance and hardware complexity. The methodology is employed throughout for co-designing the decoder. First, a low-complexity check node based on the P-output decoding principle is designed and characterized on a CMOS standard-cell library. Results demonstrate an implementation loss below 0.2 dB down to a BER of 10^{-8} and a complexity saving of up to 59% with respect to other works in recent literature. High-throughput and low-latency issues are addressed with modified single-phase decoding schedules. A new "memory-aware" schedule is proposed that requires as little as 20% of the memory of traditional two-phase flooding decoding. Additionally, throughput is doubled and logic complexity is reduced by 12%. These advantages are traded off against error correction performance, making the solution attractive only for long codes such as those adopted in the DVB-S2 standard. The "layered decoding" principle is extended to codes not specifically conceived for this technique. The proposed architectures exhibit complexity savings on the order of 40% in both area and power consumption, while implementation loss is smaller than 0.05 dB. Most modern communication standards employ Orthogonal Frequency Division Multiplexing (OFDM) as part of their physical layer. The core of OFDM is the Fast Fourier Transform (FFT) and its inverse, which are in charge of symbol (de)modulation.
Requirements on throughput and energy efficiency call for FFT hardware implementation, while the ubiquity of the FFT suggests the design of parametric, reconfigurable and reusable IP hardware macrocells. In this context, this thesis describes an FFT/IFFT core compiler particularly suited for the implementation of OFDM communication systems. The tool employs an accuracy-driven configuration engine which automatically profiles the internal arithmetic and generates a core with minimum operand bit-width and thus minimum circuit complexity. The engine performs a closed-loop optimization over three different internal arithmetic models (fixed-point, block floating-point and convergent block floating-point) using the numerical accuracy budget given by the user as a reference point. The flexibility and reusability of the proposed macrocell are illustrated through several case studies which encompass all current state-of-the-art OFDM communication standards (WLAN, WMAN, xDSL, DVB-T/H, DAB and UWB). Implementation results are presented for two deep sub-micron standard-cell libraries (65 and 90 nm) and commercially available FPGA devices. Compared with other FFT core compilers, the proposed environment produces macrocells with lower circuit complexity and the same system-level performance (throughput, transform size and numerical accuracy). The final part of this dissertation focuses on the Network-on-Chip (NoC) design paradigm, whose goal is building scalable communication infrastructures connecting hundreds of cores. A low-complexity link architecture for mesochronous on-chip communication is discussed. The link relaxes skew constraints in clock tree synthesis and enables frequency speed-up, power consumption reduction and faster back-end turnarounds. The proposed architecture reaches a maximum clock frequency of 1 GHz on a 65 nm low-leakage CMOS standard-cell library.
In a complex test case with a full-blown NoC infrastructure, the link overhead is only 3% of chip area and 0.5% of leakage power consumption. Finally, a new methodology, named metacoding, is proposed. Metacoding generates correct-by-construction, technology-independent RTL codebases for NoC building blocks. The RTL coding phase is abstracted and modeled with an object-oriented framework, integrated within a commercial tool for IP packaging (Synopsys CoreTools suite). Compared with traditional coding styles based on pre-processor directives, metacoding produces 65% smaller codebases and reduces the number of configurations to verify by up to three orders of magnitude.
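
The accuracy-driven, closed-loop idea described for the FFT core compiler can be illustrated with a minimal Python sketch. It is not the dissertation's engine: it quantizes one block of samples in block floating-point (signed mantissas sharing a single exponent) and searches for the smallest mantissa width that meets a signal-to-quantization-noise budget. The function names, the SQNR metric and the test signal are all assumptions made for illustration.

```python
import math


def bfp_quantize(block, mantissa_bits):
    """Quantize a block to signed integer mantissas that share one exponent."""
    max_mag = max(abs(x) for x in block)
    if max_mag == 0:
        return [0] * len(block), 0
    limit = 2 ** (mantissa_bits - 1) - 1  # largest mantissa magnitude
    # Smallest power-of-two scale for which the largest sample still fits.
    exponent = math.ceil(math.log2(max_mag / limit))
    scale = 2.0 ** exponent
    return [round(x / scale) for x in block], exponent


def sqnr_db(block, mantissas, exponent):
    """Signal-to-quantization-noise ratio of the quantized block, in dB."""
    scale = 2.0 ** exponent
    signal = sum(x * x for x in block)
    noise = sum((x - m * scale) ** 2 for x, m in zip(block, mantissas))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)


def min_mantissa_bits(block, budget_db, max_bits=32):
    """Closed loop: widen the mantissa until the accuracy budget is met."""
    for bits in range(2, max_bits + 1):
        mantissas, exponent = bfp_quantize(block, bits)
        if sqnr_db(block, mantissas, exponent) >= budget_db:
            return bits
    return None  # budget not reachable within max_bits


# Hypothetical test signal; any block of real samples would do.
samples = [1000.0 * math.sin(0.37 * k) for k in range(64)]
```

Calling `min_mantissa_bits(samples, 40.0)` returns the narrowest width that reaches a 40 dB budget for this block. A real compiler would evaluate accuracy at the output of the whole transform, not on a single block.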

    A Simplified Min-Sum Decoding Algorithm for Non-Binary LDPC Codes

    Non-binary low-density parity-check codes are robust to various channel impairments. However, decoder implementations based on existing decoding algorithms are expensive because of their excessive computational complexity and memory usage. Using combinatorial optimization, we present an approximation method for check node processing. Simulation results demonstrate that our scheme incurs only a small performance loss over the additive white Gaussian noise channel and the independent Rayleigh fading channel. Furthermore, the proposed reduced-complexity realization provides significant hardware savings, so it yields a good performance-complexity tradeoff and can be implemented efficiently. Comment: Partially presented at ICNC 2012, International Conference on Computing, Networking and Communications. Accepted by IEEE Transactions on Communications.

    Configurable LDPC Decoder Architecture for Regular and Irregular Codes

    Low Density Parity Check (LDPC) codes are among the best error-correcting codes and enable future generations of wireless devices to achieve higher data rates with excellent quality of service. This paper presents two novel flexible decoder architectures. The first one supports (3, 6) regular codes of rate 1/2 and can be used for different block lengths. The second decoder is more general and supports both regular and irregular LDPC codes with twelve combinations of code lengths (648, 1296 and 1944 bits) and code rates (1/2, 2/3, 3/4 and 5/6) based on the IEEE 802.11n standard. All codes correspond to a block-structured parity check matrix in which the sub-blocks are either a shifted identity matrix or a zero matrix. Prototype architectures for both LDPC decoders have been implemented and tested on a Xilinx field programmable gate array. Sponsors: Nokia; National Science Foundation.
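
The block-structured matrix mentioned in this abstract can be sketched as follows. This is a minimal illustration: the base matrix, the sub-block size and the shift direction (a right cyclic shift of the identity) are toy assumptions and are not taken from the IEEE 802.11n tables. An entry of -1 stands for a zero sub-block, and a non-negative entry gives the cyclic shift.

```python
def shifted_identity(z, shift):
    """z x z identity matrix with its columns cyclically shifted right."""
    return [[1 if c == (r + shift) % z else 0 for c in range(z)]
            for r in range(z)]


def expand_base_matrix(base, z):
    """Expand a base matrix of shift values into a binary parity-check matrix.

    base[i][j] < 0  -> z x z zero sub-block
    base[i][j] >= 0 -> identity sub-block shifted by base[i][j]
    """
    h = []
    for base_row in base:
        blocks = [
            [[0] * z for _ in range(z)] if s < 0 else shifted_identity(z, s)
            for s in base_row
        ]
        # Concatenate the r-th row of every sub-block in this block row.
        for r in range(z):
            h.append([bit for block in blocks for bit in block[r]])
    return h


# Hypothetical 2 x 3 base matrix with sub-block size 4 (toy values).
TOY_BASE = [[0, 1, -1],
            [2, -1, 0]]
H = expand_base_matrix(TOY_BASE, 4)
```

Because every non-zero sub-block is a permutation matrix, row and column weights follow directly from the base matrix. That regularity is what makes this structure convenient for configurable hardware.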

    Improve the Usability of Polar Codes: Code Construction, Performance Enhancement and Configurable Hardware

    Error-correcting codes (ECC) have been widely used for forward error correction (FEC) in modern communication systems to dramatically reduce the signal-to-noise ratio (SNR) needed to achieve a given bit error rate (BER). Newly invented polar codes have attracted much interest because of their capacity-achieving potential, efficient encoder and decoder implementation, and flexible architecture design space. This dissertation aims to improve the usability of polar codes by providing a practical code design method, new approaches to improve the performance of polar codes, and a configurable hardware design that adapts to various specifications. State-of-the-art polar codes are used to achieve extremely low error rates. In this work, a high-performance FPGA is used to prototype polar decoders and catch rare-case errors for error-correcting performance verification and error analysis. To discover the polarization characteristics and error patterns of polar codes, an FPGA emulation platform for belief-propagation (BP) decoding is built using a semi-automated construction flow. The FPGA-based emulation achieves a significant speedup in large-scale experiments involving trillions of data frames. The platform is a key enabler of this work. The frozen set selection of polar codes, known as bit selection, is critical to their error-correcting performance. A simulation-based in-order bit selection method is developed to evaluate the error rate of each bit using Monte Carlo simulations. The frozen set is selected based on the bit reliability ranking. The resulting code construction exhibits up to 1 dB of coding gain with respect to conventional bit selection. To further improve the coding gain of the BP decoder for low-error-rate applications, the decoding error mechanisms are studied and analyzed, and the errors are classified based on their distinct signatures.
Error detection is enabled by low-cost CRC concatenation, and post-processing algorithms targeting each type of error are designed to mitigate the vast majority of the decoding errors. The post-processor incurs only a small implementation overhead, but it improves the error-correcting performance by more than an order of magnitude. The regularity of the BP decoder structure offers many hardware architecture choices. Silicon area, power consumption, throughput and latency can be traded off to reach the optimal design points for practical use cases. A comprehensive design space exploration reveals several practical architectures at different design points. The scalability of each architecture is also evaluated based on the implementation candidates. For dynamic communication channels, such as wireless channels in the upcoming 5G applications, multiple codes of different lengths and code rates are needed to fit varying channel conditions. To minimize implementation cost, a universal decoder architecture is proposed to support multiple codes through hardware reuse. A 40 nm length- and rate-configurable polar decoder ASIC is demonstrated to fit various communication environments and service requirements. PhD, Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/140817/1/shuangsh_1.pd

    Evaluation of flexible SPA based LPDC decoder using hardware friendly approximation methods

    Due to the computation-intensive nature of LDPC decoders, a lot of research is directed towards efficient implementation of their original algorithm (SPA). As the "Min-Sum" approximation is basically an overestimation of SPA, this thesis investigates more accurate, yet area-efficient, approximations of SPA in order to select an optimum one. In a general comparison between the main approximation methods (e.g. LUT, PWL, CRI), PWL showed the best area efficiency. After studying different mathematical formats of SPA, the Soft-XOR based format with a forward-backward scheme was chosen for hardware implementation. Its core function (Soft-XOR) was implemented with the CRI approximation, which achieved the highest efficiency compared to the other approximations. Using this core function, a flexible, pipelined, Soft-XOR based CNU (the computational unit of LDPC decoders) with a forward-backward architecture was developed in 18nm CMOS. The implemented CNU's area and speed can easily be changed at instantiation. An SPA decoder based on the developed CNU was estimated to have an area of 1.6M equivalent gates and a throughput of 10Gb/s, at a frequency of 1.25GHz and for 10 iterations. The decoder uses the IEEE 802.11n Wi-Fi standard with a flooding schedule. The BER/SNR loss, compared to floating-point SPA, is 0.3dB for 10 iterations and less than 0.1dB for 20 iterations. "You have to get lost before you can be found," a quote by Jeff Rasley, applies very well to Low Density Parity Check (LDPC) codes. First invented by Gallager in 1962, they were largely forgotten during the evolution of telecommunication networks because of their high complexity and demanding computations, which the technology of the time could not handle. However, during the late 1990s, the success of turbo codes prompted the rediscovery of LDPC codes.
Recently they have attracted tremendous research interest in the scientific community, as today's technology is advanced enough to make LDPC decoders commercially viable. In a wireless network, the information is not simply sent, but first encoded. In a sense, all the transmitted bits are tied together according to some mathematical rules. Therefore, if noise corrupts parts of the information in transit, the LDPC decoder at the receiver side can automatically detect and recover those parts based on the other parts. Here, our main focus is on the decoder. For actual hardware implementation of the decoder, some level of approximation of the ideal algorithm is always necessary, which reduces the accuracy depending on the approximation. Ericsson is developing the next-generation wireless network for 5G, and already possesses the "Min-Sum" approximation of the LDPC decoder. As the current requirements demand more accurate decoders, the goal of this thesis is to evaluate a more accurate but more costly version of the LDPC decoder, as well as its flexibility. Thus, several candidates were selected and evaluated based on their complexity, cost, and error correction accuracy. After performing several trade-offs, an approximation method is chosen and the corresponding cost is derived. With this data, a trade-off between accuracy and cost can be made, depending on the application.
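
The relationship between the Soft-XOR form of SPA and Min-Sum can be sketched in a few lines of Python. This is an illustrative floating-point model under stated assumptions, not the thesis's CNU: it uses the exact pairwise Soft-XOR (no LUT, PWL or CRI approximation), and all function names and LLR values are hypothetical. The forward-backward scheme computes, for each edge of a check node, the combination of all other incoming LLRs.

```python
import math


def soft_xor(a, b):
    """Exact pairwise check-node operation on two LLRs (numerically stable)."""
    sign = math.copysign(1.0, a) * math.copysign(1.0, b)
    correction = (math.log1p(math.exp(-abs(a + b)))
                  - math.log1p(math.exp(-abs(a - b))))
    return sign * min(abs(a), abs(b)) + correction


def min_sum(a, b):
    """Min-Sum keeps only the sign/min term and drops the correction."""
    sign = math.copysign(1.0, a) * math.copysign(1.0, b)
    return sign * min(abs(a), abs(b))


def check_node_outputs(llrs, op):
    """Forward-backward scheme: output i combines every input except i.

    Requires at least two inputs.
    """
    d = len(llrs)
    fwd = list(llrs)  # fwd[i] = llrs[0] op ... op llrs[i]
    bwd = list(llrs)  # bwd[i] = llrs[i] op ... op llrs[d-1]
    for i in range(1, d):
        fwd[i] = op(fwd[i - 1], llrs[i])
    for i in range(d - 2, -1, -1):
        bwd[i] = op(llrs[i], bwd[i + 1])
    out = []
    for i in range(d):
        if i == 0:
            out.append(bwd[1])
        elif i == d - 1:
            out.append(fwd[d - 2])
        else:
            out.append(op(fwd[i - 1], bwd[i + 1]))
    return out


def direct_spa_outputs(llrs):
    """Reference: tanh-product form of the SPA check-node update."""
    out = []
    for i in range(len(llrs)):
        prod = 1.0
        for j, value in enumerate(llrs):
            if j != i:
                prod *= math.tanh(value / 2.0)
        out.append(2.0 * math.atanh(prod))
    return out


example_llrs = [1.2, -0.7, 2.5, 0.4]  # hypothetical incoming messages
spa_out = check_node_outputs(example_llrs, soft_xor)
ms_out = check_node_outputs(example_llrs, min_sum)
```

For these inputs each Min-Sum output has the same sign as the SPA output and a magnitude at least as large, which is the overestimation the abstract refers to.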

    Design of High Throughput Reconfigurable LDPC CODEC

    Channel coding is an essential part of communication systems and significantly reduces the error rate of received messages. Nowadays, iterative decoding methods play an important role in wireless communication systems such as 5G and Wi-Fi. Low-Density Parity-Check (LDPC) codes are among the most widely used iteratively decoded codes and attract considerable interest in a wide range of applications. LDPC codes approach channel capacity and are also practical to implement. This thesis focuses on the design of a high-throughput reconfigurable LDPC channel codec with good performance. Its main contribution is a novel decoding algorithm for LDPC codes. The new algorithm can be configured to adjust its performance and complexity, which makes it flexible across applications. Its error correction capability is close to that of the sum-product algorithm but with significantly lower complexity. We further implement the LDPC encoder/decoder on an FPGA, reconfigurable for 5G NR or user-defined LDPC codes. In particular, we apply the new decoding algorithm to the decoder and analyse its performance on hardware. Moreover, we compare the error detection performance of the 5G NR CRC and the LDPC syndrome to investigate whether practical systems need CRC decoding, the LDPC syndrome check, or both. Finally, a 5G NR physical layer simulation SoC embedded system is built on an FPGA to verify the encoder and decoder.
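
As a hedged illustration of the syndrome check mentioned in this abstract, the sketch below tests whether a hard-decision word satisfies every parity equation. The matrix is a toy (7,4) Hamming parity-check matrix chosen only to keep the example small. It is not a 5G NR LDPC matrix, and the names are hypothetical.

```python
def syndrome(parity_check, word):
    """Compute H * word^T over GF(2), one bit per parity equation."""
    return [sum(h & c for h, c in zip(row, word)) % 2 for row in parity_check]


def passes_syndrome_check(parity_check, word):
    """A word is accepted when every parity equation is satisfied."""
    return not any(syndrome(parity_check, word))


# Toy parity-check matrix (assumption for illustration only).
TOY_H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

codeword = [1, 1, 1, 0, 0, 0, 0]

# A single bit flip violates at least one parity equation.
single_error = list(codeword)
single_error[4] ^= 1

# An error pattern that is itself a codeword maps one valid word onto
# another, so the syndrome check alone cannot see it.
undetected = [c ^ e for c, e in zip(codeword, [1, 1, 1, 0, 0, 0, 0])]
```

The last case shows why an outer CRC can still add detection capability: the syndrome is blind to error patterns that are valid codewords, whereas a CRC computed over the information bits is an independent check.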

    Comparison of Polar Decoders with Existing Low-Density Parity-Check and Turbo Decoders

    Polar codes are a recently proposed family of provably capacity-achieving error-correction codes that have received a lot of attention. While their theoretical properties render them interesting, their practicality compared to other types of codes has not been thoroughly studied. To this end, in this paper we compare polar decoders against LDPC and Turbo decoders that are used in existing communications standards. More specifically, we compare both the error-correction performance and the hardware efficiency of the corresponding hardware implementations. This comparison enables us to identify applications where polar codes are superior to existing error-correction coding solutions, as well as to determine the most promising research direction in terms of the hardware implementation of polar decoders. Comment: Fixes small mistakes from the paper to appear in the proceedings of IEEE WCNC 2017. Results were presented in the "Polar Coding in Wireless Communications: Theory and Implementation" Workshop.

    A Flexible LDPC/Turbo Decoder Architecture

    Low-density parity-check (LDPC) codes and convolutional Turbo codes are two of the most powerful error correcting codes that are widely used in modern communication systems. In a multi-mode baseband receiver, both LDPC and Turbo decoders may be required. However, the different decoding approaches for LDPC and Turbo codes usually lead to different hardware architectures. In this paper we propose a unified message passing algorithm for LDPC and Turbo codes and introduce a flexible soft-input soft-output (SISO) module to handle LDPC/Turbo decoding. We employ the trellis-based maximum a posteriori (MAP) algorithm as a bridge between LDPC and Turbo codes decoding. We view the LDPC code as a concatenation of n super-codes where each super-code has a simpler trellis structure so that the MAP algorithm can be easily applied to it. We propose a flexible functional unit (FFU) for MAP processing of LDPC and Turbo codes with a low hardware overhead (about 15% area and timing overhead). Based on the FFU, we propose an area-efficient flexible SISO decoder architecture to support LDPC/Turbo codes decoding. Multiple such SISO modules can be embedded into a parallel decoder for higher decoding throughput. As a case study, a flexible LDPC/Turbo decoder has been synthesized on a TSMC 90 nm CMOS technology with a core area of 3.2 mm². The decoder can support IEEE 802.16e LDPC codes, IEEE 802.11n LDPC codes, and 3GPP LTE Turbo codes. Running at 500 MHz clock frequency, the decoder can sustain up to 600 Mbps LDPC decoding or 450 Mbps Turbo decoding. Sponsors: Nokia; Nokia Siemens Networks (NSN); Xilinx; Texas Instruments; National Science Foundation.

    A High-Performance and Low-Complexity 5G LDPC Decoder: Algorithm and Implementation

    5G New Radio (NR) has stringent demands on both performance and complexity for the design of low-density parity-check (LDPC) decoding algorithms and corresponding VLSI implementations. Furthermore, decoders must fully support the wide range of all 5G NR blocklengths and code rates, which is a significant challenge. In this paper, we present a high-performance and low-complexity LDPC decoder, tailor-made to fulfill the 5G requirements. First, to close the gap between belief propagation (BP) decoding and its approximations in hardware, we propose an extension of adjusted min-sum decoding, called generalized adjusted min-sum (GA-MS) decoding. This decoding algorithm flexibly truncates the incoming messages at the check node level and carefully approximates the non-linear functions of BP decoding to balance error rate and hardware complexity. Numerical results demonstrate that the proposed fixed-point GA-MS has only a minor gap of 0.1 dB compared to floating-point BP under various scenarios of 5G standard specifications. Secondly, we present a fully reconfigurable 5G NR LDPC decoder implementation based on GA-MS decoding. Given that memory occupies a substantial portion of the decoder area, we adopt multiple data compression and approximation techniques to reduce the memory overhead by 42.2%. The corresponding 28 nm FD-SOI ASIC decoder has a core area of 1.823 mm² and operates at 895 MHz. It is compatible with all 5G NR LDPC codes and achieves a peak throughput of 24.42 Gbps and a maximum area efficiency of 13.40 Gbps/mm² at 4 decoding iterations. Comment: 14 pages, 14 figures.