57 research outputs found

    A High-Performance and Low-Complexity 5G LDPC Decoder: Algorithm and Implementation

    Full text link
    5G New Radio (NR) has stringent demands on both performance and complexity for the design of low-density parity-check (LDPC) decoding algorithms and corresponding VLSI implementations. Furthermore, decoders must fully support the wide range of all 5G NR blocklengths and code rates, which is a significant challenge. In this paper, we present a high-performance and low-complexity LDPC decoder, tailor-made to fulfill the 5G requirements. First, to close the gap between belief propagation (BP) decoding and its approximations in hardware, we propose an extension of adjusted min-sum decoding, called generalized adjusted min-sum (GA-MS) decoding. This decoding algorithm flexibly truncates the incoming messages at the check node level and carefully approximates the non-linear functions of BP decoding to balance the error-rate and hardware complexity. Numerical results demonstrate that the proposed fixed-point GAMS has only a minor gap of 0.1 dB compared to floating-point BP under various scenarios of 5G standard specifications. Secondly, we present a fully reconfigurable 5G NR LDPC decoder implementation based on GA-MS decoding. Given that memory occupies a substantial portion of the decoder area, we adopt multiple data compression and approximation techniques to reduce 42.2% of the memory overhead. The corresponding 28nm FD-SOI ASIC decoder has a core area of 1.823 mm2 and operates at 895 MHz. It is compatible with all 5G NR LDPC codes and achieves a peak throughput of 24.42 Gbps and a maximum area efficiency of 13.40 Gbps/mm2 at 4 decoding iterations.Comment: 14 pages, 14 figure

    VLSI algorithms and architectures for non-binary-LDPC decoding

    Full text link
    Tesis por compendio[EN] This thesis studies the design of low-complexity soft-decision Non-Binary Low-Density Parity-Check (NB-LDPC) decoding algorithms and their corresponding hardware architectures suitable for decoding high-rate codes at high throughput (hundreds of Mbps and Gbps). In the first part of the thesis the main aspects concerning to the NB-LDPC codes are analyzed, including a study of the main bottlenecks of conventional softdecision decoding algorithms (Q-ary Sum of Products (QSPA), Extended Min-Sum (EMS), Min-Max and Trellis-Extended Min-Sum (T-EMS)) and their corresponding hardware architectures. Despite the limitations of T-EMS algorithm (high complexity in the Check Node (CN) processor, wiring congestion due to the high number of exchanged messages between processors and the inability to implement decoders over high-order Galois fields due to the high decoder complexity), it was selected as starting point for this thesis due to its capability to reach high-throughput. Taking into account the identified limitations of the T-EMS algorithm, the second part of the thesis includes six papers with the results of the research made in order to mitigate the T-EMS disadvantages, offering solutions that reduce the area, the latency and increase the throughput compared to previous proposals from literature without sacrificing coding gain. Specifically, five low-complexity decoding algorithms are proposed, which introduce simplifications in different parts of the decoding process. Besides, five complete decoder architectures are designed and implemented on a 90nm Complementary Metal-Oxide-Semiconductor (CMOS) technology. The results show an achievement in throughput higher than 1Gbps and an area less than 10 mm2. The increase in throughput is 120% and the reduction in area is 53% compared to previous implementations of T-EMS, for the (837,726) NB-LDPC code over GF(32). The proposed decoders reduce the CN area, latency, wiring between CN and Variable Node (VN) processor and the number of storage elements required in the decoder. Considering that these proposals improve both area and speed, the efficiency parameter (Mbps / Million NAND gates) is increased in almost five times compared to other proposals from literature. The improvements in terms of area allow us to implement NB-LDPC decoders over high-order fields which had not been possible until now due to the highcomplexity of decoders previously proposed in literature. Therefore, we present the first post-place and route report for high-rate codes over high-order fields higher than Galois Field (GF)(32). For example, for the (1536,1344) NB-LDPC code over GF(64) the throughput is 1259Mbps occupying an area of 28.90 mm2. On the other hand, a decoder architecture is implemented on a Field Programmable Gate Array (FPGA) device achieving 630 Mbps for the high-rate (2304,2048) NB-LDPC code over GF(16). To the best knowledge of the author, these results constitute the highest ones presented in literature for similar codes and implemented on the same technologies.[ES] En esta tesis se aborda el estudio del diseño de algoritmos de baja complejidad para la decodificación de códigos de comprobación de paridad de baja densidad no binarios (NB-LDPC) y sus correspondientes arquitecturas apropiadas para decodificar códigos de alta tasa a altas velocidades (cientos de Mbps y Gbps). En la primera parte de la tesis los principales aspectos concernientes a los códigos NB-LDPC son analizados, incluyendo un estudio de los principales cuellos de botella presentes en los algoritmos de decodificación convencionales basados en decisión blanda (QSPA, EMS, Min-Max y T-EMS) y sus correspondientes arquitecturas hardware. A pesar de las limitaciones del algoritmo T-EMS (alta complejidad en el procesador del nodo de chequeo de paridad (CN), congestión en el rutado debido al intercambio de mensajes entre procesadores y la incapacidad de implementar decodificadores para campos de Galois de orden elevado debido a la elevada complejidad), éste fue seleccionado como punto de partida para esta tesis debido a su capacidad para alcanzar altas velocidades. Tomando en cuenta las limitaciones identificadas en el algoritmo T-EMS, la segunda parte de la tesis incluye seis artículos con los resultados de la investigación realizada con la finalidad de mitigar las desventajas del algoritmo T-EMS, ofreciendo soluciones que reducen el área, la latencia e incrementando la velocidad comparado con propuestas previas de la literatura sin sacrificar la ganancia de codificación. Especificamente, cinco algoritmos de decodificación de baja complejidad han sido propuestos, introduciendo simplificaciones en diferentes partes del proceso de decodificación. Además, arquitecturas completas de decodificadores han sido diseñadas e implementadas en una tecnologia CMOS de 90nm consiguiéndose una velocidad mayor a 1Gbps con un área menor a 10 mm2, aumentando la velocidad en 120% y reduciendo el área en 53% comparado con previas implementaciones del algoritmo T-EMS para el código (837,726) implementado sobre campo de Galois GF(32). Las arquitecturas propuestas reducen el área del CN, latencia, número de mensajes intercambiados entre el nodo de comprobación de paridad (CN) y el nodo variable (VN) y el número de elementos de almacenamiento en el decodificador. Considerando que estas propuestas mejoran tanto el área comola velocidad, el parámetro de eficiencia (Mbps / Millones de puertas NAND) se ha incrementado en casi cinco veces comparado con otras propuestas de la literatura. Las mejoras en términos de área nos ha permitido implementar decodificadores NBLDPC sobre campos de Galois de orden elevado, lo cual no habia sido posible hasta ahora debido a la alta complejidad de los decodificadores anteriormente propuestos en la literatura. Por lo tanto, en esta tesis se presentan los primeros resultados incluyendo el emplazamiento y rutado para códigos de alta tasa sobre campos finitos de orden mayor a GF(32). Por ejemplo, para el código (1536,1344) sobre GF(64) la velocidad es 1259 Mbps ocupando un área de 28.90 mm2. Por otro lado, una arquitectura de decodificador ha sido implementada en un dispositivo FPGA consiguiendo 660 Mbps de velocidad para el código de alta tasa (2304,2048) sobre GF(16). Estos resultados constituyen, según el mejor conocimiento del autor, los mayores presentados en la literatura para códigos similares implementados para las mismas tecnologías.[CA] En esta tesi s'aborda l'estudi del disseny d'algoritmes de baixa complexitat per a la descodificació de codis de comprovació de paritat de baixa densitat no binaris (NB-LDPC), i les seues corresponents arquitectures per a descodificar codis d'alta taxa a altes velocitats (centenars de Mbps i Gbps). En la primera part de la tesi els principals aspectes concernent als codis NBLDPC són analitzats, incloent un estudi dels principals colls de botella presents en els algoritmes de descodificació convencionals basats en decisió blana (QSPA, EMS, Min-Max i T-EMS) i les seues corresponents arquitectures. A pesar de les limitacions de l'algoritme T-EMS (alta complexitat en el processador del node de revisió de paritat (CN), congestió en el rutat a causa de l'intercanvi de missatges entre processadors i la incapacitat d'implementar descodificadors per a camps de Galois d'orde elevat a causa de l'elevada complexitat), este va ser seleccionat com a punt de partida per a esta tesi degut a la seua capacitat per a aconseguir altes velocitats. Tenint en compte les limitacions identificades en l'algoritme T-EMS, la segona part de la tesi inclou sis articles amb els resultats de la investigació realitzada amb la finalitat de mitigar els desavantatges de l'algoritme T-EMS, oferint solucions que redueixen l'àrea, la latència i incrementant la velocitat comparat amb propostes prèvies de la literatura sense sacrificar el guany de codificació. Específicament, s'han proposat cinc algoritmes de descodificació de baixa complexitat, introduint simplificacions en diferents parts del procés de descodificació. A més, s'han dissenyat arquitectures completes de descodificadors i s'han implementat en una tecnologia CMOS de 90nm aconseguint-se una velocitat major a 1Gbps amb una àrea menor a 10 mm2, augmentant la velocitat en 120% i reduint l'àrea en 53% comparat amb prèvies implementacions de l'algoritme T-EMS per al codi (837,726) implementat sobre camp de Galois GF(32). Les arquitectures proposades redueixen l'àrea del CN, la latència, el nombre de missatges intercanviats entre el node de comprovació de paritat (CN) i el node variable (VN) i el nombre d'elements d'emmagatzemament en el descodificador. Considerant que estes propostes milloren tant l'àrea com la velocitat, el paràmetre d'eficiència (Mbps / Milions deportes NAND) s'ha incrementat en quasi cinc vegades comparat amb altres propostes de la literatura. Les millores en termes d'àrea ens ha permès implementar descodificadors NBLDPC sobre camps de Galois d'orde elevat, la qual cosa no havia sigut possible fins ara a causa de l'alta complexitat dels descodificadors anteriorment proposats en la literatura. Per tant, nosaltres presentem els primers reports després de l'emplaçament i rutat per a codis d'alta taxa sobre camps finits d'orde major a GF(32). Per exemple, per al codi (1536,1344) sobre GF(64) la velocitat és 1259 Mbps ocupant una àrea de 28.90 mm2. D'altra banda, una arquitectura de descodificador ha sigut implementada en un dispositiu FPGA aconseguint 660 Mbps de velocitat per al codi d'alta taxa (2304,2048) sobre GF(16). Estos resultats constitueixen, per al millor coneixement de l'autor, els millors presentats en la literatura per a codis semblants implementats per a les mateixes tecnologies.Lacruz Jucht, JO. (2016). VLSI algorithms and architectures for non-binary-LDPC decoding [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/73266TESISCompendi

    Decomposition Methods for Large Scale LP Decoding

    Full text link
    When binary linear error-correcting codes are used over symmetric channels, a relaxed version of the maximum likelihood decoding problem can be stated as a linear program (LP). This LP decoder can be used to decode error-correcting codes at bit-error-rates comparable to state-of-the-art belief propagation (BP) decoders, but with significantly stronger theoretical guarantees. However, LP decoding when implemented with standard LP solvers does not easily scale to the block lengths of modern error correcting codes. In this paper we draw on decomposition methods from optimization theory, specifically the Alternating Directions Method of Multipliers (ADMM), to develop efficient distributed algorithms for LP decoding. The key enabling technical result is a "two-slice" characterization of the geometry of the parity polytope, which is the convex hull of all codewords of a single parity check code. This new characterization simplifies the representation of points in the polytope. Using this simplification, we develop an efficient algorithm for Euclidean norm projection onto the parity polytope. This projection is required by ADMM and allows us to use LP decoding, with all its theoretical guarantees, to decode large-scale error correcting codes efficiently. We present numerical results for LDPC codes of lengths more than 1000. The waterfall region of LP decoding is seen to initiate at a slightly higher signal-to-noise ratio than for sum-product BP, however an error floor is not observed for LP decoding, which is not the case for BP. Our implementation of LP decoding using ADMM executes as fast as our baseline sum-product BP decoder, is fully parallelizable, and can be seen to implement a type of message-passing with a particularly simple schedule.Comment: 35 pages, 11 figures. An early version of this work appeared at the 49th Annual Allerton Conference, September 2011. This version to appear in IEEE Transactions on Information Theor

    High Performance Decoder Architectures for Error Correction Codes

    Get PDF
    Due to the rapid development of the information industry, modern communication and storage systems require much higher data rates and reliability to server various demanding applications. However, these systems suffer from noises from the practical channels. Various error correction codes (ECCs), such as Reed-Solomon (RS) codes, convolutional codes, turbo codes, Low-Density Parity-Check (LDPC) codes and so on, have been adopted in lots of current standards. With the increasing data rate, the research of more advanced ECCs and the corresponding efficient decoders will never stop.Binary LDPC codes have been adopted in lots of modern communication and storage applications due their superior error performance and efficient hardware decoder implementations. Non-binary LDPC (NB-LDPC) codes are an important extension of traditional binary LDPC codes. Compared with its binary counterpart, NB-LDPC codes show better error performance under short to moderate block lengths and higher order modulations. Moreover, NB-LDPC codes have lower error floor than binary LDPC codes. In spite of the excellent error performance, it is hard for current communication and storage systems to adopt NB-LDPC codes due to complex decoding algorithms and decoder architectures. In terms of hardware implementation, current NB-LDPC decoders need much larger area and achieve much lower data throughput.Besides the recently proposed NB-LDPC codes, polar codes, discovered by Ar{\i}kan, appear as a very promising candidate for future communication and storage systems. Polar codes are considered as a major breakthrough in recent coding theory society. Polar codes are proved to be capacity achieving codes over binary input symmetric memoryless channels. Besides, polar codes can be decoded by the successive cancelation (SC) algorithm with of complexity of O(Nlog2N)\mathcal{O}(N\log_2 N), where NN is the block length. The main sticking point of polar codes to date is that their error performance under short to moderate block lengths is inferior compared with LDPC codes or turbo codes. The list decoding technique can be used to improve the error performance of SC algorithms at the cost higher computational and memory complexities. Besides, the hardware implementation of current SC based decoders suffer from long decoding latency which is unsuitable for modern high speed communications.ECCs also find their applications in improving the reliability of network coding. Random linear network coding is an efficient technique for disseminating information in networks, but it is highly susceptible to errors. K\ {o}tter-Kschischang (KK) codes and Mahdavifar-Vardy (MV) codes are two important families of subspace codes that provide error control in noncoherent random linear network coding. List decoding has been used to decode MV codes beyond half distance. Existing hardware implementations of the rank metric decoder for KK codes suffer from limited throughput, long latency and high area complexity. The interpolation-based list decoding algorithm for MV codes still has high computational complexity, and its feasibility for hardware implementations has not been investigated.In this exam, we present efficient decoding algorithms and hardware decoder architectures for NB-LDPC codes, polar codes, KK and MV codes. For NB-LDPC codes, an efficient shuffled decoder architecture is presented to reduce the number of average iterations and improve the throughput. Besides, a fully parallel decoder architecture for NB-LDPC codes with short or moderate block lengths is also presented. Our fully parallel decoder architecture achieves much higher throughput and area efficiency compared with the state-of-art NB-LDPC decoders. For polar codes, a memory efficient list decoder architecture is first presented. Based on our reduced latency list decoding algorithm for polar codes, a high throughput list decoder architecture is also presented. At last, we present efficient decoder architectures for both KK and MV codes

    Power Characterization of a Digit-Online FPGA Implementation of a Low-Density Parity-Check Decoder for WiMAX Applications

    Get PDF
    Low-density parity-check (LDPC) codes are a class of easily decodable error-correcting codes. Published parallel LDPC decoders demonstrate high throughput and low energy-per-bit but require a lot of silicon area. Decoders based on digit-online arithmetic (processing several bits per fundamental operation) process messages in a digit-serial fashion, reducing the area requirements, and can process multiple frames in frame-interlaced fashion. Implementations on Field-Programmable Gate Array (FPGA) are usually power- and area-hungry, but provide flexibility compared with application-specific integrated circuit implementations. With the penetration of mobile devices in the electronics industry the power considerations have become increasingly important. The power consumption of a digit-online decoder depends on various factors, like input log-likelihood ratio (LLR) bit precision, signal-to-noise ratio (SNR) and maximum number of iterations. The design is implemented on an Altera Stratix IV GX EP4SGX230 FPGA, which comes on an Altera DE4 Development and Education Board. In this work, both parallel and digit-online block LDPC decoder implementations on FPGAs for WiMAX 576-bit, rate-3/4 codes are studied, and power measurements from the DE4 board are reported. Various components of the system include a random-data generator, WiMAX Encoder, shift-out register, additive white Gaussian noise (AWGN) generator, channel LLR buffer, WiMAX Decoder and bit-error rate (BER) Calculator. The random-data generator outputs pseudo-random bit patterns through an implemented linear-feedback shift register (LFSR). Digit-online decoders with input LLR precisions ranging from 6 to 13 bits and parallel decoders with input LLR precisions ranging from 3 to 6 bits are synthesized in a Stratix IV FPGA. The digit-online decoders can be clocked at higher frequency for higher LLR precisions. A digit-online decoder can be used to decode two frames simultaneously in frame-interlaced mode. For the 6-bit implementation of digit-online decoder in single-frame mode, the minimum throughput achieved is 740 Mb/s at low SNRs. For the case of 11-bit LLR digit-online decoder in frame-interlaced mode, the minimum throughput achieved is 1363 Mb/s. Detailed analysis such as effect of SNR and LLR precision on decoder power is presented. Also, the effect of changing LLR precision on max clock frequency and logic utilization on the parallel and the digit-online decoders is studied. Alongside, power per iteration for a 6-bit LLR input digit-online decoder is also reported

    Improve the Usability of Polar Codes: Code Construction, Performance Enhancement and Configurable Hardware

    Full text link
    Error-correcting codes (ECC) have been widely used for forward error correction (FEC) in modern communication systems to dramatically reduce the signal-to-noise ratio (SNR) needed to achieve a given bit error rate (BER). Newly invented polar codes have attracted much interest because of their capacity-achieving potential, efficient encoder and decoder implementation, and flexible architecture design space.This dissertation is aimed at improving the usability of polar codes by providing a practical code design method, new approaches to improve the performance of polar code, and a configurable hardware design that adapts to various specifications. State-of-the-art polar codes are used to achieve extremely low error rates. In this work, high-performance FPGA is used in prototyping polar decoders to catch rare-case errors for error-correcting performance verification and error analysis. To discover the polarization characteristics and error patterns of polar codes, an FPGA emulation platform for belief-propagation (BP) decoding is built by a semi-automated construction flow. The FPGA-based emulation achieves significant speedup in large-scale experiments involving trillions of data frames. The platform is a key enabler of this work. The frozen set selection of polar codes, known as bit selection, is critical to the error-correcting performance of polar codes. A simulation-based in-order bit selection method is developed to evaluate the error rate of each bit using Monte Carlo simulations. The frozen set is selected based on the bit reliability ranking. The resulting code construction exhibits up to 1 dB coding gain with respect to the conventional bit selection. To further improve the coding gain of BP decoder for low-error-rate applications, the decoding error mechanisms are studied and analyzed, and the errors are classified based on their distinct signatures. Error detection is enabled by low-cost CRC concatenation, and post-processing algorithms targeting at each type of the error is designed to mitigate the vast majority of the decoding errors. The post-processor incurs only a small implementation overhead, but it provides more than an order of magnitude improvement of the error-correcting performance. The regularity of the BP decoder structure offers many hardware architecture choices. Silicon area, power consumption, throughput and latency can be traded to reach the optimal design points for practical use cases. A comprehensive design space exploration reveals several practical architectures at different design points. The scalability of each architecture is also evaluated based on the implementation candidates. For dynamic communication channels, such as wireless channels in the upcoming 5G applications, multiple codes of different lengths and code rates are needed to t varying channel conditions. To minimize implementation cost, a universal decoder architecture is proposed to support multiple codes through hardware reuse. A 40nm length- and rate-configurable polar decoder ASIC is demonstrated to fit various communication environments and service requirements.PHDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/140817/1/shuangsh_1.pd

    Polar coding for optical wireless communication

    Get PDF

    VLSI decoding architectures: flexibility, robustness and performance

    Get PDF
    Stemming from previous studies on flexible LDPC decoders, this thesis work has been mainly focused on the development of flexible turbo and LDPC decoder designs, and on the narrowing of the power, area and speed gap they might present with respect to dedicated solutions. Additional studies have been carried out within the field of increased code performance and of decoder resiliency to hardware errors. The first chapter regroups several main contributions in the design and implementation of flexible channel decoders. The first part concerns the design of a Network-on-Chip (NoC) serving as an interconnection network for a partially parallel LDPC decoder. A best-fit NoC architecture is designed and a complete multi-standard turbo/LDPC decoder is designed and implemented. Every time the code is changed, the decoder must be reconfigured. A number of variables influence the duration of the reconfiguration process, starting from the involved codes down to decoder design choices. These are taken in account in the flexible decoder designed, and novel traffic reduction and optimization methods are then implemented. In the second chapter a study on the early stopping of iterations for LDPC decoders is presented. The energy expenditure of any LDPC decoder is directly linked to the iterative nature of the decoding algorithm. We propose an innovative multi-standard early stopping criterion for LDPC decoders that observes the evolution of simple metrics and relies on on-the-fly threshold computation. Its effectiveness is evaluated against existing techniques both in terms of saved iterations and, after implementation, in terms of actual energy saving. The third chapter portrays a study on the resilience of LDPC decoders under the effect of memory errors. Given that the purpose of channel decoders is to correct errors, LDPC decoders are intrinsically characterized by a certain degree of resistance to hardware faults. This characteristic, together with the soft nature of the stored values, results in LDPC decoders being affected differently according to the meaning of the wrong bits: ad-hoc error protection techniques, like the Unequal Error Protection devised in this chapter, can consequently be applied to different bits according to their significance. In the fourth chapter the serial concatenation of LDPC and turbo codes is presented. The concatenated FEC targets very high error correction capabilities, joining the performance of turbo codes at low SNR with that of LDPC codes at high SNR, and outperforming both current deep-space FEC schemes and concatenation-based FECs. A unified decoder for the concatenated scheme is subsequently propose

    Polar-Coded OFDM with Index Modulation

    Get PDF
    Polar codes, as the first error-correcting codes with an explicit construction to provably achieve thesymmetric capacity of memoryless channels, which are constructed based on channel polarization, have recently become a primary contender in communication networks for achieving tighter requirements with relatively low complexity. As one of the contributions in this thesis, three modified polar decoding schemes are proposed. These schemes include enhanced versions of successive cancellation-flip (SC-F), belief propagation (BP), and sphere decoding (SD). The proposed SC-F utilizes novel potential incorrect bits selection criteria and stack to improve its error correction performance. Next, to make the decoding performance of BP better, permutation and feedback structure are utilized. Then, in order to reduce the complexity without compromising performance, a SD by using novel decoding strategies according to modified path metric (PM) and radius extension is proposed. Additionally, to solve the problem that BP has redundant iterations, a new stopping criterion based on bit different ratio (BDR) is proposed. According to the simulation results and mathematical proof, all proposed schemes can achieve corresponding performance improvement or complexity reduction compared with existing works. Beside applying polar coding, to achieve a reliable and flexible transmission in a wireless communication system, a modified version of orthogonal frequency division multiplexing (OFDM) modulation based on index modulation, called OFDM-in-phase/quadrature-IM (OFDM-I/Q-IM), is applied. This modulation scheme can simultaneously improve spectral efficiency and bit-error rate (BER) performance with great flexibility in design and implementation. Hence, OFDM-I/Q-IM is considered as a potential candidate in the new generation of cellular networks. As the main contribution in this work, a polar-coded OFDM-I/Q-IM system is proposed. The general design guidelines for overcoming the difficulties associated with the application of polar codes in OFDM-I/Q-IM are presented. In the proposed system, at the transmitter, we employ a random frozen bits appending scheme which not only makes the polar code compatible with OFDM-I/Q-IM but also improves the BER performance of the system. Furthermore, at the receiver, it is shown that the \textit{a posteriori} information for each index provided by the index detector is essential for the iterative decoding of polar codes by the BP algorithm. Simulation results show that the proposed polar-coded OFDM-I/Q-IM system outperforms its OFDM counterpart in terms of BER performance
    corecore