System-on-chip Computing and Interconnection Architectures for Telecommunications and Signal Processing
This dissertation proposes novel architectures and design techniques targeting SoC building blocks for telecommunications and signal processing applications.
Hardware implementation of Low-Density Parity-Check decoders is approached at both the algorithmic and the architecture level. Low-Density Parity-Check codes are a promising coding scheme for future communication standards due to their outstanding error correction performance.
This work proposes a methodology for analyzing the effects of finite-precision arithmetic on error correction performance and hardware complexity. The methodology is employed throughout for co-designing the decoder. First, a low-complexity check node based on the P-output decoding principle is designed and characterized on a CMOS standard-cells library. Results demonstrate an implementation loss below 0.2 dB down to a BER of 10^{-8} and a saving in complexity of up to 59% with respect to other works in recent literature. High-throughput and low-latency issues are addressed with modified single-phase decoding schedules. A new "memory-aware" schedule is proposed, requiring as little as 20% of the memory of the traditional two-phase flooding decoding. Additionally, throughput is doubled and logic complexity is reduced by 12%. These advantages are traded off against error correction performance, making the solution attractive only for long codes, such as those adopted in the DVB-S2 standard. The "layered decoding" principle is extended to codes not specifically conceived for this technique. The proposed architectures exhibit complexity savings on the order of 40% for both area and power consumption, while implementation loss is smaller than 0.05 dB.
Most modern communication standards employ Orthogonal Frequency Division Multiplexing (OFDM) as part of their physical layer. The core of OFDM is the Fast Fourier Transform (FFT) and its inverse, which are in charge of symbol (de)modulation. Requirements on throughput and energy efficiency call for a hardware implementation of the FFT, while the ubiquity of the FFT suggests the design of parametric, re-configurable and re-usable IP hardware macrocells. In this context, this thesis describes an FFT/IFFT core compiler particularly suited for the implementation of OFDM communication systems. The tool employs an accuracy-driven configuration engine which automatically profiles the internal arithmetic and generates a core with minimum operand bit-width and thus minimum circuit complexity. The engine performs a closed-loop optimization over three different internal arithmetic models (fixed-point, block floating-point and convergent block floating-point), using the numerical accuracy budget given by the user as a reference point. The flexibility and re-usability of the proposed macrocell are illustrated through several case studies which encompass all current state-of-the-art OFDM communication standards (WLAN, WMAN, xDSL, DVB-T/H, DAB and UWB). Implementation results are presented for two deep sub-micron standard-cells libraries (65 and 90 nm) and commercially available FPGA devices. Compared with other FFT core compilers, the proposed environment produces macrocells with lower circuit complexity and the same system-level performance (throughput, transform size and numerical accuracy).
The final part of this dissertation focuses on the Network-on-Chip (NoC) design paradigm, whose goal is building scalable communication infrastructures connecting hundreds of cores. A low-complexity link architecture for mesochronous on-chip communication is discussed. The link enables looser skew constraints in the clock tree synthesis, frequency speed-up, power consumption reduction and faster back-end turnarounds. The proposed architecture reaches a maximum clock frequency of 1 GHz on a 65 nm low-leakage CMOS standard-cells library. In a complex test case with a full-blown NoC infrastructure, the link overhead is only 3% of chip area and 0.5% of leakage power consumption.
Finally, a new methodology, named metacoding, is proposed. Metacoding generates correct-by-construction, technology-independent RTL codebases for NoC building blocks. The RTL coding phase is abstracted and modeled with an object-oriented framework, integrated within a commercial tool for IP packaging (Synopsys CoreTools suite). Compared with traditional coding styles based on pre-processor directives, metacoding produces 65% smaller codebases and reduces the number of configurations to verify by up to three orders of magnitude.
A Simplified Min-Sum Decoding Algorithm for Non-Binary LDPC Codes
Non-binary low-density parity-check codes are robust to various channel
impairments. However, based on the existing decoding algorithms, the decoder
implementations are expensive because of their excessive computational
complexity and memory usage. Based on the combinatorial optimization, we
present an approximation method for the check node processing. The simulation
results demonstrate that our scheme has small performance loss over the
additive white Gaussian noise channel and independent Rayleigh fading channel.
Furthermore, the proposed reduced-complexity realization provides significant
savings on hardware, so it yields a good performance-complexity tradeoff and
can be efficiently implemented.
Comment: Partially presented in ICNC 2012, International Conference on
Computing, Networking and Communications. Accepted by IEEE Transactions on
Communications
Configurable LDPC Decoder Architecture for Regular and Irregular Codes
Low Density Parity Check (LDPC) codes are among the best error correcting codes and enable future generations of wireless devices to achieve higher data rates with excellent quality of service. This paper presents two novel flexible decoder architectures. The first one supports (3, 6) regular codes of rate 1/2 and can be used for different block lengths. The second decoder is more general and supports both regular and irregular LDPC codes with twelve combinations of code lengths (648, 1296 and 1944 bits) and code rates (1/2, 2/3, 3/4 and 5/6) based on the IEEE 802.11n standard. All codes correspond to a block-structured parity check matrix, in which the sub-blocks are either a shifted identity matrix or a zero matrix. Prototype architectures for both LDPC decoders have been implemented and tested on a Xilinx field programmable gate array.
Sponsors: Nokia; National Science Foundation
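The block-structured parity check matrix described in the abstract above can be illustrated with a minimal sketch. The function name `expand_base_matrix`, the convention that -1 marks a zero block, and the toy base matrix are assumptions made for illustration; the matrix below is not taken from IEEE 802.11n.

```python
def expand_base_matrix(base, z):
    """Expand a base matrix into a binary parity check matrix.

    Assumed convention (illustrative): an entry of -1 stands for a z x z
    zero block, and an entry s >= 0 stands for the z x z identity matrix
    with its columns cyclically shifted right by s positions.
    """
    rows, cols = len(base), len(base[0])
    H = [[0] * (cols * z) for _ in range(rows * z)]
    for i, base_row in enumerate(base):
        for j, shift in enumerate(base_row):
            if shift < 0:
                continue  # zero sub-block: nothing to set
            for r in range(z):
                # Row r of the shifted identity has its single 1 in column (r + shift) mod z.
                H[i * z + r][j * z + (r + shift) % z] = 1
    return H


# Toy base matrix with sub-block size 3 (hypothetical values).
toy_base = [[0, 1, -1, 2],
            [2, -1, 0, 1]]
H = expand_base_matrix(toy_base, z=3)
for row in H:
    print("".join(str(bit) for bit in row))
```

Because each sub-block is fully described by a single shift value, a decoder only needs to store the small base matrix and the sub-block size, which is one reason this structure suits configurable hardware.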
Improve the Usability of Polar Codes: Code Construction, Performance Enhancement and Configurable Hardware
Error-correcting codes (ECC) have been widely used for forward error correction (FEC) in modern communication systems to dramatically reduce the signal-to-noise ratio (SNR) needed to achieve a given bit error rate (BER). Newly invented polar codes have attracted much interest because of their capacity-achieving potential, efficient encoder and decoder implementation, and flexible architecture design space. This dissertation is aimed at improving the usability of polar codes by providing a practical code design method, new approaches to improve the performance of polar codes, and a configurable hardware design that adapts to various specifications.
State-of-the-art polar codes are used to achieve extremely low error rates. In this work, high-performance FPGA is used in prototyping polar decoders to catch rare-case errors for error-correcting performance verification and error analysis. To discover the polarization characteristics and error patterns of polar codes, an FPGA emulation platform for belief-propagation (BP) decoding is built by a semi-automated construction flow. The FPGA-based emulation achieves significant speedup in large-scale experiments involving trillions of data frames. The platform is a key enabler of this work.
The frozen set selection of polar codes, known as bit selection, is critical to the error-correcting performance of polar codes. A simulation-based in-order bit selection method is developed to evaluate the error rate of each bit using Monte Carlo simulations. The frozen set is selected based on the bit reliability ranking. The resulting code construction exhibits up to 1 dB coding gain with respect to the conventional bit selection.
To further improve the coding gain of the BP decoder for low-error-rate applications, the decoding error mechanisms are studied and analyzed, and the errors are classified based on their distinct signatures. Error detection is enabled by low-cost CRC concatenation, and post-processing algorithms targeting each type of error are designed to mitigate the vast majority of the decoding errors. The post-processor incurs only a small implementation overhead, but it improves the error-correcting performance by more than an order of magnitude.
The regularity of the BP decoder structure offers many hardware architecture choices. Silicon area, power consumption, throughput and latency can be traded to reach the optimal design points for practical use cases. A comprehensive design space exploration reveals several practical architectures at different design points. The scalability of each architecture is also evaluated based on the implementation candidates.
For dynamic communication channels, such as wireless channels in the upcoming 5G applications, multiple codes of different lengths and code rates are needed to fit varying channel conditions. To minimize implementation cost, a universal decoder architecture is proposed to support multiple codes through hardware reuse. A 40nm length- and rate-configurable polar decoder ASIC is demonstrated to fit various communication environments and service requirements.
PhD, Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/140817/1/shuangsh_1.pd
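The frozen set selection by reliability ranking mentioned in the abstract above can be sketched as follows. Note the swapped-in technique: the dissertation estimates per-bit error rates with Monte Carlo simulation, whereas this sketch uses the closed-form Bhattacharyya recursion for a binary erasure channel as a stand-in reliability metric, purely because it is short and deterministic. The function names and the erasure probability are illustrative assumptions.

```python
def bec_bhattacharyya(n_levels, erasure_prob):
    """Bhattacharyya parameters of the 2**n_levels synthesized bit channels.

    Stand-in metric (assumption): exact recursion for a binary erasure
    channel, not the Monte Carlo estimate used in the dissertation.
    Larger values mean less reliable bit channels.
    """
    z = [erasure_prob]
    for _ in range(n_levels):
        # Each channel splits into a degraded one (2x - x^2) and an upgraded one (x^2).
        z = [v for x in z for v in (2 * x - x * x, x * x)]
    return z


def select_frozen_set(reliability_metric, num_info_bits):
    """Freeze the least reliable positions, keeping num_info_bits for data."""
    n = len(reliability_metric)
    worst_first = sorted(range(n), key=lambda i: reliability_metric[i], reverse=True)
    return sorted(worst_first[: n - num_info_bits])


metric = bec_bhattacharyya(n_levels=3, erasure_prob=0.5)  # length-8 code
frozen = select_frozen_set(metric, num_info_bits=4)
print(frozen)
```

Any other per-bit reliability estimate, including simulated bit error rates, could be passed to `select_frozen_set` in place of `metric`, as long as larger values mean less reliable positions.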
Evaluation of flexible SPA based LDPC decoder using hardware friendly approximation methods
Due to the computation-intensive nature of LDPC decoders, much research is directed towards efficient implementation of their original algorithm, the sum-product algorithm (SPA). As the "Min-Sum" approximation is basically an overestimation of SPA, this thesis investigates more accurate, yet area-efficient, approximations of SPA in order to select an optimum one. In a general comparison between the main approximation methods (e.g. LUT, PWL, CRI), PWL showed the highest area efficiency. After studying different mathematical formats of SPA, the Soft-XOR based format with a forward-backward scheme was chosen for hardware implementation. Its core function (Soft-XOR) was implemented with the CRI approximation, which achieved the highest efficiency compared to the other approximations. Using this core function, a flexible, pipelined, Soft-XOR based CNU (the computational unit of LDPC decoders) with forward-backward architecture was developed in 18nm CMOS. The implemented CNU's area and speed can easily be changed at instantiation. An SPA decoder based on the developed CNU was estimated to have an area of 1.6M equivalent gates and a throughput of 10Gb/s, at a frequency of 1.25GHz and for 10 iterations. The decoder uses the IEEE 802.11n Wi-Fi standard with a flooding schedule. The BER/SNR loss, compared to floating-point SPA, is 0.3dB for 10 iterations and less than 0.1dB for 20 iterations.
"You have to get lost before you can be found", a quote by Jeff Rasley, fits Low Density Parity Check (LDPC) codes very well. They were first invented by Gallager in 1962 but were largely lost during the evolution of telecommunication networks because of their high complexity and demanding computations, which the technology of the time was not advanced enough to handle. However, during the late 1990s, the success of turbo codes prompted the rediscovery of LDPC codes.
Recently they have attracted tremendous research interest in the scientific community, as today's technology is advanced enough to make LDPC decoders fully commercial. In a wireless network, the information is not simply sent, but first encoded. In a sense, all the transmitted bits are tied together according to some mathematical rules. Therefore, if noise destroys parts of the information while traveling, the LDPC decoder at the receiver side can automatically detect and retrieve those parts based on the other parts. Here, our main focus is on the decoder. For actual hardware implementation of the decoder, some level of approximation of the ideal algorithm is always necessary, which reduces the accuracy depending on the approximation. Ericsson is developing the next-generation wireless network for 5G, and already possesses the "Min-Sum" approximation of the LDPC decoder. As the current requirements demand more accurate decoders, the goal of this thesis is to evaluate a more accurate but more costly version of the LDPC decoder, as well as its flexibility. Thus, several candidates were selected and evaluated based on their complexity, cost, and their accuracy in error correction. After performing several trade-offs, an approximation method is chosen and the corresponding cost is derived. With this acquired data, a trade-off between accuracy and cost can be made, depending on the application.
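The statement that Min-Sum overestimates SPA can be made concrete with a small sketch of the two-input Soft-XOR operation on log-likelihood ratios. This is the textbook exact form and the plain Min-Sum form, not the LUT, PWL or CRI approximations evaluated in the thesis; the function names are illustrative.

```python
import math


def soft_xor(a, b):
    """Exact Soft-XOR of two LLRs, equal to 2*atanh(tanh(a/2)*tanh(b/2)).

    Written as the Min-Sum term plus a correction term, which is the
    part that hardware approximations typically target.
    """
    sign = 1.0 if (a >= 0) == (b >= 0) else -1.0
    correction = (math.log1p(math.exp(-abs(a + b)))
                  - math.log1p(math.exp(-abs(a - b))))
    return sign * min(abs(a), abs(b)) + correction


def min_sum(a, b):
    """Min-Sum approximation: the Soft-XOR with the correction term dropped."""
    sign = 1.0 if (a >= 0) == (b >= 0) else -1.0
    return sign * min(abs(a), abs(b))


for a, b in [(1.0, 2.0), (-0.5, 3.0), (4.0, 4.5)]:
    print(a, b, round(soft_xor(a, b), 4), min_sum(a, b))
```

In every case the Min-Sum magnitude is at least as large as the exact one, because the dropped correction term always pulls the result towards zero. The gap is largest when the two input magnitudes are close, which is where a more accurate approximation matters most.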
Design of High Throughput Reconfigurable LDPC CODEC
Channel coding is an essential part of communication systems, as it significantly reduces the error rate of received messages. Nowadays, iterative decoding methods play an important role in wireless communication systems such as 5G and Wi-Fi. Low-Density Parity-Check (LDPC) codes are among the most widely used iteratively decoded codes and attract considerable interest in a wide range of applications. LDPC codes offer capacity-approaching performance while remaining practical to implement. The thesis focuses on the design of a high-throughput reconfigurable LDPC channel codec with good performance.
The main focus of this thesis is the design of a novel decoding algorithm for LDPC codes. The new decoding algorithm is configurable to adjust its performance and complexity, which is very flexible for applications. Its error correction capability is close to the sum-product algorithm but with significantly lower complexity. We further implement the LDPC encoder/decoder on FPGA, which is reconfigurable for 5G NR or user-defined LDPC codes. In particular, we apply the new decoding algorithm to the decoder and analyse its performance on hardware.
Moreover, we compare the error detection performance of the 5G NR CRC and the LDPC syndrome to investigate the necessity of using CRC decoding, the LDPC syndrome check, or both in practical systems. Finally, a 5G NR physical layer simulating SoC embedded system is built on FPGA for the verification of the encoder and decoder.
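The LDPC syndrome check referred to in the abstract above amounts to verifying that every parity check equation is satisfied. The sketch below uses the parity check matrix of the (7,4) Hamming code as a small stand-in; it is not a 5G NR matrix, and the function names are illustrative.

```python
def syndrome(H, word):
    """Compute H * word over GF(2); each entry is one parity check result."""
    return [sum(h & c for h, c in zip(row, word)) % 2 for row in H]


def passes_syndrome_check(H, word):
    """A word is accepted when every parity check evaluates to zero."""
    return not any(syndrome(H, word))


# Toy parity check matrix ((7,4) Hamming code), used only for illustration.
H_toy = [[1, 0, 1, 0, 1, 0, 1],
         [0, 1, 1, 0, 0, 1, 1],
         [0, 0, 0, 1, 1, 1, 1]]

codeword = [1, 1, 1, 0, 0, 0, 0]
corrupted = codeword[:]
corrupted[4] ^= 1  # flip one bit to emulate a channel error

print(passes_syndrome_check(H_toy, codeword))
print(syndrome(H_toy, corrupted))
```

A zero syndrome only shows that the decoder output is some valid codeword, not necessarily the transmitted one, which is why comparing it against a CRC check is a meaningful question.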
Comparison of Polar Decoders with Existing Low-Density Parity-Check and Turbo Decoders
Polar codes are a recently proposed family of provably capacity-achieving
error-correction codes that received a lot of attention. While their
theoretical properties render them interesting, their practicality compared to
other types of codes has not been thoroughly studied. Towards this end, in this
paper, we perform a comparison of polar decoders against LDPC and Turbo
decoders that are used in existing communications standards. More specifically,
we compare both the error-correction performance and the hardware efficiency of
the corresponding hardware implementations. This comparison enables us to
identify applications where polar codes are superior to existing
error-correction coding solutions as well as to determine the most promising
research direction in terms of the hardware implementation of polar decoders.
Comment: Fixes small mistakes from the paper to appear in the proceedings of
IEEE WCNC 2017. Results were presented in the "Polar Coding in Wireless
Communications: Theory and Implementation" Workshop
A Flexible LDPC/Turbo Decoder Architecture
Low-density parity-check (LDPC) codes and convolutional Turbo codes are two of the most powerful error correcting codes that are widely used in modern
communication systems. In a multi-mode baseband receiver, both LDPC and Turbo decoders may be required. However, the different decoding approaches
for LDPC and Turbo codes usually lead to different hardware architectures. In this paper we propose a unified message passing algorithm for LDPC and Turbo
codes and introduce a flexible soft-input soft-output (SISO) module to handle LDPC/Turbo decoding. We employ the trellis-based maximum a posteriori (MAP)
algorithm as a bridge between LDPC and Turbo codes decoding. We view the LDPC code as a concatenation of n super-codes where each super-code has a simpler
trellis structure so that the MAP algorithm can be easily applied to it. We propose a flexible functional unit (FFU) for MAP processing of LDPC and Turbo
codes with a low hardware overhead (about 15% area and timing overhead). Based on the FFU, we propose an area-efficient flexible SISO decoder architecture to
support LDPC/Turbo codes decoding. Multiple such SISO modules can be embedded into a parallel decoder for higher decoding throughput. As a case study, a
flexible LDPC/Turbo decoder has been synthesized on a TSMC 90 nm CMOS technology with a core area of 3.2 mm². The decoder can support IEEE 802.16e LDPC codes, IEEE 802.11n LDPC codes, and 3GPP LTE Turbo codes. Running at 500 MHz clock frequency, the decoder can sustain up to 600 Mbps LDPC decoding or
450 Mbps Turbo decoding.
Sponsors: Nokia; Nokia Siemens Networks (NSN); Xilinx; Texas Instruments; National Science Foundation
A High-Performance and Low-Complexity 5G LDPC Decoder: Algorithm and Implementation
5G New Radio (NR) has stringent demands on both performance and complexity
for the design of low-density parity-check (LDPC) decoding algorithms and
corresponding VLSI implementations. Furthermore, decoders must fully support
the wide range of all 5G NR blocklengths and code rates, which is a significant
challenge. In this paper, we present a high-performance and low-complexity LDPC
decoder, tailor-made to fulfill the 5G requirements. First, to close the gap
between belief propagation (BP) decoding and its approximations in hardware, we
propose an extension of adjusted min-sum decoding, called generalized adjusted
min-sum (GA-MS) decoding. This decoding algorithm flexibly truncates the
incoming messages at the check node level and carefully approximates the
non-linear functions of BP decoding to balance the error-rate and hardware
complexity. Numerical results demonstrate that the proposed fixed-point GA-MS
has only a minor gap of 0.1 dB compared to floating-point BP under various
scenarios of 5G standard specifications. Secondly, we present a fully
reconfigurable 5G NR LDPC decoder implementation based on GA-MS decoding. Given
that memory occupies a substantial portion of the decoder area, we adopt
multiple data compression and approximation techniques to reduce 42.2% of the
memory overhead. The corresponding 28nm FD-SOI ASIC decoder has a core area of
1.823 mm² and operates at 895 MHz. It is compatible with all 5G NR LDPC codes
and achieves a peak throughput of 24.42 Gbps and a maximum area efficiency of
13.40 Gbps/mm² at 4 decoding iterations.
Comment: 14 pages, 14 figures
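As a quick consistency check on the figures quoted in the abstract above, the area efficiency can be recomputed from the peak throughput and core area. The bits-per-cycle value is a derived quantity that does not appear in the abstract, and the sketch assumes the peak throughput is reached at the quoted clock frequency.

```python
# Figures quoted in the abstract.
peak_throughput_gbps = 24.42
core_area_mm2 = 1.823
clock_mhz = 895.0

# Area efficiency: throughput delivered per unit of silicon area.
area_efficiency = peak_throughput_gbps / core_area_mm2

# Derived quantity (assumption: peak throughput occurs at the quoted clock):
# decoded bits produced per clock cycle.
bits_per_cycle = (peak_throughput_gbps * 1e9) / (clock_mhz * 1e6)

print(f"area efficiency: {area_efficiency:.2f} Gbps/mm^2")
print(f"bits per cycle:  {bits_per_cycle:.2f}")
```

The recomputed area efficiency agrees with the quoted 13.40 Gbps/mm² to two decimal places, which suggests that the peak throughput and the maximum area efficiency refer to the same operating point.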