672 research outputs found

    Concatenated Turbo/LDPC codes for deep space communications: performance and implementation

    Get PDF
    Deep space communications require error correction codes able to reach extremely low bit-error-rates, possibly with a steep waterfall region and without error floor. Several schemes have been proposed in the literature to achieve these goals. Most of them rely on the concatenation of different codes that leads to high hardware implementation complexity and poor resource sharing. This work proposes a scheme based on the concatenation of non-custom LDPC and turbo codes that achieves excellent error correction performance. Moreover, since both LDPC and turbo codes can be decoded with the BCJR algorithm, our preliminary results show that an efficient hardware architecture with high resource reuse can be designe

    System-on-chip Computing and Interconnection Architectures for Telecommunications and Signal Processing

    Get PDF
    This dissertation proposes novel architectures and design techniques targeting SoC building blocks for telecommunications and signal processing applications. Hardware implementation of Low-Density Parity-Check decoders is approached at both the algorithmic and the architecture level. Low-Density Parity-Check codes are a promising coding scheme for future communication standards due to their outstanding error correction performance. This work proposes a methodology for analyzing effects of finite precision arithmetic on error correction performance and hardware complexity. The methodology is throughout employed for co-designing the decoder. First, a low-complexity check node based on the P-output decoding principle is designed and characterized on a CMOS standard-cells library. Results demonstrate implementation loss below 0.2 dB down to BER of 10^{-8} and a saving in complexity up to 59% with respect to other works in recent literature. High-throughput and low-latency issues are addressed with modified single-phase decoding schedules. A new "memory-aware" schedule is proposed requiring down to 20% of memory with respect to the traditional two-phase flooding decoding. Additionally, throughput is doubled and logic complexity reduced of 12%. These advantages are traded-off with error correction performance, thus making the solution attractive only for long codes, as those adopted in the DVB-S2 standard. The "layered decoding" principle is extended to those codes not specifically conceived for this technique. Proposed architectures exhibit complexity savings in the order of 40% for both area and power consumption figures, while implementation loss is smaller than 0.05 dB. Most modern communication standards employ Orthogonal Frequency Division Multiplexing as part of their physical layer. The core of OFDM is the Fast Fourier Transform and its inverse in charge of symbols (de)modulation. Requirements on throughput and energy efficiency call for FFT hardware implementation, while ubiquity of FFT suggests the design of parametric, re-configurable and re-usable IP hardware macrocells. In this context, this thesis describes an FFT/IFFT core compiler particularly suited for implementation of OFDM communication systems. The tool employs an accuracy-driven configuration engine which automatically profiles the internal arithmetic and generates a core with minimum operands bit-width and thus minimum circuit complexity. The engine performs a closed-loop optimization over three different internal arithmetic models (fixed-point, block floating-point and convergent block floating-point) using the numerical accuracy budget given by the user as a reference point. The flexibility and re-usability of the proposed macrocell are illustrated through several case studies which encompass all current state-of-the-art OFDM communications standards (WLAN, WMAN, xDSL, DVB-T/H, DAB and UWB). Implementations results are presented for two deep sub-micron standard-cells libraries (65 and 90 nm) and commercially available FPGA devices. Compared with other FFT core compilers, the proposed environment produces macrocells with lower circuit complexity and same system level performance (throughput, transform size and numerical accuracy). The final part of this dissertation focuses on the Network-on-Chip design paradigm whose goal is building scalable communication infrastructures connecting hundreds of core. A low-complexity link architecture for mesochronous on-chip communication is discussed. The link enables skew constraint looseness in the clock tree synthesis, frequency speed-up, power consumption reduction and faster back-end turnarounds. The proposed architecture reaches a maximum clock frequency of 1 GHz on 65 nm low-leakage CMOS standard-cells library. In a complex test case with a full-blown NoC infrastructure, the link overhead is only 3% of chip area and 0.5% of leakage power consumption. Finally, a new methodology, named metacoding, is proposed. Metacoding generates correct-by-construction technology independent RTL codebases for NoC building blocks. The RTL coding phase is abstracted and modeled with an Object Oriented framework, integrated within a commercial tool for IP packaging (Synopsys CoreTools suite). Compared with traditional coding styles based on pre-processor directives, metacoding produces 65% smaller codebases and reduces the configurations to verify up to three orders of magnitude

    1.5 Gbit/s FPGA implementation of a fully-parallel turbo decoder designed for mission-critical machine-type communication applications

    No full text
    In wireless communication schemes, turbo codes facilitate near-capacity transmission throughputs by achieving reliable forward error correction. However, owing to the serial data dependencies imposed by the underlying Logarithmic Bahl-Cocke-Jelinek-Raviv (Log- BCJR) algorithm, the limited processing throughputs of conventional turbo decoder implementations impose a severe bottleneck upon the overall throughputs of realtime wireless communication schemes. Motivated by this, we recently proposed a Fully Parallel Turbo Decoder (FPTD) algorithm, which eliminates these serial data dependencies, allowing parallel processing and hence offering a significantly higher processing throughput. In this paper, we propose a novel resource-efficient version of the FPTD algorithm, which reduces its computational resource requirement by 50%, which enhancing its suitability for Field-Programmable Gate Array (FPGA) implementations. We propose a model FPGA implementation. When using a Stratix IV FPGA, the proposed FPTD FPGA implementation achieves an average throughput of 1.53 Gbit/s and an average latency of 0.56 s, when decoding frames comprising N=720 bits. These are respectively 13.2 times and 11.1 times superior to those of the state-of-the- art FPGA implementation of the Log-BCJR Long- Term Evolution (LTE) turbo decoder, when decoding frames of the same frame length at the same error correction capability. Furthermore, our proposed FPTD FPGA implementation achieves a normalized resource usage of 0.42 kALUTs Mbit/s , which is 5.2 times superior to that of the benchmarker decoder. Furthermore, when decoding the shortest N=40-bit LTE frames, the proposed FPTD FPGA implementation achieves an average throughput of 442 Mbit/s and an average latency of 0.18 s, which are respectively 21.1 times and 10.6 times superior to those of the benchmarker decoder. In this case, the normalized resource usage of 0.08 kALUTs Mbit/s is 146.4 times superior to that of the benchmarker decoder

    Exploring HLS Coding Techniques to Achieve Desired Turbo Decoder Architectures

    Get PDF
    Software defined radio (SDR) platforms implement many digital signal processing algorithms. These can be accelerated on an FPGA to meet performance requirements. Due to the flexibility of SDR\u27s and continually evolving communications protocols, high level synthesis (HLS) is a promising alternative to standard handcrafted design flows. A crucial component in any SDR is the error correction codes (ECC). Turbo codes are a common ECC that are implemented on an FPGA due to their computational complexity. The goal of this thesis is to explore the HLS coding techniques required to produce a design that targets the desired hardware architecture and can reach handcrafted levels of performance. This work implemented three existing turbo decoder architectures with HLS to produce quality hardware which reaches handcrafted performance. Each targeted design was analyzed to determine its functionality and algorithm so a C implementation could be developed. Then the C code was modified and HLS directives were added to refine the design through the HLS tools. The process of code modification and processing through the HLS tools continued until the desired architecture and performance were reached. Each design was implemented and the bottlenecks were identified and dealt with through appropriate usage of directives and C style. The use of pipelining to bypass bottlenecks added a small overhead from the ramp-up and ramp-down of the pipeline, reducing the performance by at most 1.24%. The impact of the clock constraint set within the HLS tools was also explored. It was found that the clock period and resource usage estimate generated by the HLS tools is not accurate and all evaluations should occur after hardware synthesis

    Efficient Decoder for Optical Transport Networks Achieving Near Capacity Performance

    Get PDF
    Today’s optical transport networks (OTNs) support a plethora of services such as video streaming, cloud computing, social networking and many more. To make such a wide assortment of services possible, a tremendous amount of data needs to be carried over the internet backbone supported by these optical transport networks. In order to cope with this increase in traffic, data rate on OTNs has increased significantly. Product codes (PC) are a class of codes that provide good coding gain at reasonable decoding complexity and, hence, have been a popular choice for OTNs in recent times. The key goal of this thesis is to implement a decoder for a Product Code (PC) on a Virtex7 Field Programmable Gate Array(FPGA). The product code of choice for this project is based on a (1023,993) BCH code as a component code. The conventional decoder for BCH codes has a computationally expensive step for finding the roots of error locator polynomial. The BCH decoder implemented as a part of this project is optimized to speed up the decoding process while at the same time also simplifying the hardware complexity of the design. The implementation is parallelized and pipelined to achieve high throughputs. This provides a hardware platform to evaluate the performance of product codes at low bit error rates that is infeasible using software simulations

    Domain specific high performance reconfigurable architecture for a communication platform

    Get PDF

    Towards a reconfigurable hardware architecture for implementing a LDPC module suitable for software radio systems

    Get PDF
    Forward Error Correction is a key piece in modern digital communications. When a signal is transmitted over a noisy channel, multiple errors are generated. FEC techniques are directed towards the recovery of such errors. In last years, LDPC (Low Density Parity Check) codes have attracted attention of researchers because of their excellent error correction capabilities, but for actual radios high performance is not enough since they require to communicate with other multiple radios too. In general, communication between multiple radios requires the use of different standards. In this sense, Software Defined Radio (SDR) approach allows building multi standard radios based on reconfigurability abilities which means that base components including recovery errors block must provide reconfigurable options. In this paper, some open problems in designing and implementing reconfigurable LDPC components are presented and discussed. Some features of works in the state of the art are commented and possible research lines proposed

    Implementation Effort and Parallelism - Metrics for Guiding Hardware/Software Partitioning in Embedded System Design

    Get PDF
    corecore