9 research outputs found

    VLSI implementation of a multi-mode turbo/LDPC decoder architecture

    Get PDF
    Flexible and reconfigurable architectures have gained wide popularity in the communications field. In particular, reconfigurable architectures for the physical layer are an attractive solution not only to switch among different coding modes but also to achieve interoperability. This work concentrates on the design of a reconfigurable architecture for both turbo and LDPC codes decoding. The novel contributions of this paper are: i) tackling the reconfiguration issue introducing a formal and systematic treatment that, to the best of our knowledge, was not previously addressed; ii) proposing a reconfigurable NoCbased turbo/LDPC decoder architecture and showing that wide flexibility can be achieved with a small complexity overhead. Obtained results show that dynamic switching between most of considered communication standards is possible without pausing the decoding activity. Moreover, post-layout results show that tailoring the proposed architecture to the WiMAX standard leads to an area occupation of 2.75 mm2 and a power consumption of 101.5 mW in the worst case

    Concatenated Turbo/LDPC codes for deep space communications: performance and implementation

    Get PDF
    Deep space communications require error correction codes able to reach extremely low bit-error-rates, possibly with a steep waterfall region and without error floor. Several schemes have been proposed in the literature to achieve these goals. Most of them rely on the concatenation of different codes that leads to high hardware implementation complexity and poor resource sharing. This work proposes a scheme based on the concatenation of non-custom LDPC and turbo codes that achieves excellent error correction performance. Moreover, since both LDPC and turbo codes can be decoded with the BCJR algorithm, our preliminary results show that an efficient hardware architecture with high resource reuse can be designe

    Exploiting generalized de-Bruijn/Kautz topologies for flexible iterative channel code decoder architectures

    Get PDF
    Modern iterative channel code decoder architectures have tight constrains on the throughput but require flexibility to support different modes and standards. Unfortunately, flexibility often comes at the expense of increasing the number of clock cycles required to complete the decoding of a data-frame, thus reducing the sustained throughput. The Network- on-Chip (NoC) paradigm is an interesting option to achieve flexibility, but several design choices, including the topology and the routing algorithm, can affect the decoder throughput. In this work logarithmic diameter topologies, in particular generalized de-Bruijn and Kautz topologies, are addressed as possible solutions to achieve both flexible and high throughput architectures for iterative channel code decoding. In particular, this work shows that the optimal shortest-path routing algorithm for these topologies, that is still available in the open literature, can be efficiently implemented resorting to a very simple circuit. Experimental results show that the proposed architecture features a reduction of about 14% and 10% for area and power consumption respectively, with respect to a previous shortest-path routing-table-based desig

    Exploiting generalized de-Bruijn/Kautz topologies for flexible iterative channel code decoder architectures

    Get PDF
    Modern iterative channel code decoder architectures have tight constrains on the throughput but require flexibility to support different modes and standards. Unfortunately, flexibility often comes at the expense of increasing the number of clock cycles required to complete the decoding of a data-frame, thus reducing the sustained throughput. The Network- on-Chip (NoC) paradigm is an interesting option to achieve flexibility, but several design choices, including the topology and the routing algorithm, can affect the decoder throughput. In this work logarithmic diameter topologies, in particular generalized de-Bruijn and Kautz topologies, are addressed as possible solutions to achieve both flexible and high throughput architectures for iterative channel code decoding. In particular, this work shows that the optimal shortest-path routing algorithm for these topologies, that is still available in the open literature, can be efficiently implemented resorting to a very simple circuit. Experimental results show that the proposed architecture features a reduction of about 14% and 10% for area and power consumption respectively, with respect to a previous shortest-path routing-table-based design

    Flexible and Low-Complexity Encoding and Decoding of Systematic Polar Codes

    Full text link
    In this work, we present hardware and software implementations of flexible polar systematic encoders and decoders. The proposed implementations operate on polar codes of any length less than a maximum and of any rate. We describe the low-complexity, highly parallel, and flexible systematic-encoding algorithm that we use and prove its correctness. Our hardware implementation results show that the overhead of adding code rate and length flexibility is little, and the impact on operation latency minor compared to code-specific versions. Finally, the flexible software encoder and decoder implementations are also shown to be able to maintain high throughput and low latency.Comment: Submitted to IEEE Transactions on Communications, 201

    Unified turbo/LDPC code decoder architecture for deep-space communications

    Get PDF
    Deep-space communications are characterized by extremely critical conditions; current standards foresee the usage of both turbo and low-density-parity-check (LDPC) codes to ensure recovery from received errors, but each of them displays consistent drawbacks. Code concatenation is widely used in all kinds of communication to boost the error correction capabilities of single codes; serial concatenation of turbo and LDPC codes has been recently proven effective enough for deep space communications, being able to overcome the shortcomings of both code types. This work extends the performance analysis of this scheme and proposes a novel hardware decoder architecture for concatenated turbo and LDPC codes based on the same decoding algorithm. This choice leads to a high degree of datapath and memory sharing; postlayout implementation results obtained with complementary metal-oxide semiconductor (CMOS) 90 nm technology show small area occupation (0.98 mm 2 ) and very low power consumption (2.1 mW)

    VLSI architectures design for encoders of High Efficiency Video Coding (HEVC) standard

    Get PDF
    The growing popularity of high resolution video and the continuously increasing demands for high quality video on mobile devices are producing stronger needs for more efficient video encoder. Concerning these desires, HEVC, a newest video coding standard, has been developed by a joint team formed by ISO/IEO MPEG and ITU/T VCEG. Its design goal is to achieve a 50% compression gain over its predecessor H.264 with an equal or even higher perceptual video quality. Motion Estimation (ME) being as one of the most critical module in video coding contributes almost 50%-70% of computational complexity in the video encoder. This high consumption of the computational resources puts a limit on the performance of encoders, especially for full HD or ultra HD videos, in terms of coding speed, bit-rate and video quality. Thus the major part of this work concentrates on the computational complexity reduction and improvement of timing performance of motion estimation algorithms for HEVC standard. First, a new strategy to calculate the SAD (Sum of Absolute Difference) for motion estimation is designed based on the statistics on property of pixel data of video sequences. This statistics demonstrates the size relationship between the sum of two sets of pixels has a determined connection with the distribution of the size relationship between individual pixels from the two sets. Taking the advantage of this observation, only a small proportion of pixels is necessary to be involved in the SAD calculation. Simulations show that the amount of computations required in the full search algorithm is reduced by about 58% on average and up to 70% in the best case. Secondly, from the scope of parallelization an enhanced TZ search for HEVC is proposed using novel schemes of multiple MVPs (motion vector predictor) and shared MVP. Specifically, resorting to multiple MVPs the initial search process is performed in parallel at multiple search centers, and the ME processing engine for PUs within one CU are parallelized based on the MVP sharing scheme on CU (coding unit) level. Moreover, the SAD module for ME engine is also parallelly implemented for PU size of 32×32. Experiments indicate it achieves an appreciable improvement on the throughput and coding efficiency of the HEVC video encoder. In addition, the other part of this thesis is contributed to the VLSI architecture design for finding the first W maximum/minimum values targeting towards high speed and low hardware cost. The architecture based on the novel bit-wise AND scheme has only half of the area of the best reference solution and its critical path delay is comparable with other implementations. While the FPCG (full parallel comparison grid) architecture, which utilizes the optimized comparator-based structure, achieves 3.6 times faster on average on the speed and even 5.2 times faster at best comparing with the reference architectures. Finally the architecture using the partial sorting strategy reaches a good balance on the timing performance and area, which has a slightly lower or comparable speed with FPCG architecture and a acceptable hardware cost

    VLSI decoding architectures: flexibility, robustness and performance

    Get PDF
    Stemming from previous studies on flexible LDPC decoders, this thesis work has been mainly focused on the development of flexible turbo and LDPC decoder designs, and on the narrowing of the power, area and speed gap they might present with respect to dedicated solutions. Additional studies have been carried out within the field of increased code performance and of decoder resiliency to hardware errors. The first chapter regroups several main contributions in the design and implementation of flexible channel decoders. The first part concerns the design of a Network-on-Chip (NoC) serving as an interconnection network for a partially parallel LDPC decoder. A best-fit NoC architecture is designed and a complete multi-standard turbo/LDPC decoder is designed and implemented. Every time the code is changed, the decoder must be reconfigured. A number of variables influence the duration of the reconfiguration process, starting from the involved codes down to decoder design choices. These are taken in account in the flexible decoder designed, and novel traffic reduction and optimization methods are then implemented. In the second chapter a study on the early stopping of iterations for LDPC decoders is presented. The energy expenditure of any LDPC decoder is directly linked to the iterative nature of the decoding algorithm. We propose an innovative multi-standard early stopping criterion for LDPC decoders that observes the evolution of simple metrics and relies on on-the-fly threshold computation. Its effectiveness is evaluated against existing techniques both in terms of saved iterations and, after implementation, in terms of actual energy saving. The third chapter portrays a study on the resilience of LDPC decoders under the effect of memory errors. Given that the purpose of channel decoders is to correct errors, LDPC decoders are intrinsically characterized by a certain degree of resistance to hardware faults. This characteristic, together with the soft nature of the stored values, results in LDPC decoders being affected differently according to the meaning of the wrong bits: ad-hoc error protection techniques, like the Unequal Error Protection devised in this chapter, can consequently be applied to different bits according to their significance. In the fourth chapter the serial concatenation of LDPC and turbo codes is presented. The concatenated FEC targets very high error correction capabilities, joining the performance of turbo codes at low SNR with that of LDPC codes at high SNR, and outperforming both current deep-space FEC schemes and concatenation-based FECs. A unified decoder for the concatenated scheme is subsequently propose

    Energy Efficient VLSI Circuits for MIMO-WLAN

    Get PDF
    Mobile communication - anytime, anywhere access to data and communication services - has been continuously increasing since the operation of the first wireless communication link by Guglielmo Marconi. The demand for higher data rates, despite the limited bandwidth, led to the development of multiple-input multiple-output (MIMO) communication which is often combined with orthogonal frequency division multiplexing (OFDM). Together, these two techniques achieve a high bandwidth efficiency. Unfortunately, techniques such as MIMO-OFDM significantly increase the signal processing complexity of transceivers. While fast improvements in the integrated circuit (IC) technology enabled to implement more signal processing complexity per chip, large efforts had and have to be done for novel algorithms as well as for efficient very large scaled integration (VLSI) architectures in order to meet today's and tomorrow's requirements for mobile wireless communication systems. In this thesis, we will present architectures and VLSI implementations of complete physical (PHY) layer application specific integrated circuits (ASICs) under the constraints imposed by an industrial wireless communication standard. Contrary to many other publications, we do not elaborate individual components of a MIMO-OFDM communication system stand-alone, but in the context of the complete PHY layer ASIC. We will investigate the performance of several MIMO detectors and the corresponding preprocessing circuits, being integrated into the entire PHY layer ASIC, in terms of achievable error-rate, power consumption, and area requirement. Finally, we will assemble the results from the proposed PHY layer implementations in order to enhance the energy efficiency of a transceiver. To this end, we propose a cross-layer optimization of PHY layer and medium access control (MAC) layer
    corecore