4 research outputs found

    Efficient VLSI Implementation of Soft-input Soft-output Fixed-complexity Sphere Decoder

    Get PDF
    Fixed-complexity sphere decoder (FSD) is one of the most promising techniques for the implementation of multipleinput multiple-output (MIMO) detection, with relevant advantages in terms of constant throughput and high flexibility of parallel architecture. The reported works on FSD are mainly based on software level simulations and a few details have been provided on hardware implementation. The authors present the study based on a four-nodes-per-cycle parallel FSD architecture with several examples of VLSI implementation in 4 × 4 systems with both 16-quadrature amplitude modulation (QAM) and 64-QAM modulation and both real and complex signal models. The implementation aspects and details of the architecture are analysed in order to provide a variety of performance-complexity trade-offs. The authors also provide a parallel implementation of loglikelihood- ratio (LLR) generator with optimised algorithm to enhance the proposed FSD architecture to be a soft-input softoutput (SISO) MIMO detector. To the authors best knowledge, this is the first complete VLSI implementation of an FSD based SISO MIMO detector. The implementation results show that the proposed SISO FSD architecture is highly efficient and flexible, making it very suitable for real application

    A Parallel Radix-Sort-Based VLSI Architecture for Finding the First W Maximum/Minimum Values

    Get PDF
    Very-large-scale integration (VLSI) architectures for finding the first W (W>2) maximum (or minimum) values are required in the implementation of several applications such as nonbinary low-density-parity-check decoders, K-best multiple-input–multiple-output (MIMO) detectors, and turbo product codes. In this brief, a parallel radix-sort-based VLSI architecture for finding the first W maximum (or minimum) values is proposed. The described architecture, called Bit-Wise-And (BWA) architecture, relies on analyzing input data from the most significant bit to the least significant one, with very simple logic circuits. One key feature in the BWA architecture is its high level of scalability, which enables the adoption of this solution in a large spectrum of applications, corresponding to large ranges for both W and the size of the input data set. Experimental results, achieved by implementing the proposed architecture on a high-speed 90-nm CMOS standard-cell technology, show that BWA architecture requires significantly less area than other solutions available in the literature, i.e., less than or about 50% in all the considered cases and about 50% in the worst case. Moreover, the BWA architecture exhibits the lowest area–delay product among almost all considered cases

    VLSI architectures design for encoders of High Efficiency Video Coding (HEVC) standard

    Get PDF
    The growing popularity of high resolution video and the continuously increasing demands for high quality video on mobile devices are producing stronger needs for more efficient video encoder. Concerning these desires, HEVC, a newest video coding standard, has been developed by a joint team formed by ISO/IEO MPEG and ITU/T VCEG. Its design goal is to achieve a 50% compression gain over its predecessor H.264 with an equal or even higher perceptual video quality. Motion Estimation (ME) being as one of the most critical module in video coding contributes almost 50%-70% of computational complexity in the video encoder. This high consumption of the computational resources puts a limit on the performance of encoders, especially for full HD or ultra HD videos, in terms of coding speed, bit-rate and video quality. Thus the major part of this work concentrates on the computational complexity reduction and improvement of timing performance of motion estimation algorithms for HEVC standard. First, a new strategy to calculate the SAD (Sum of Absolute Difference) for motion estimation is designed based on the statistics on property of pixel data of video sequences. This statistics demonstrates the size relationship between the sum of two sets of pixels has a determined connection with the distribution of the size relationship between individual pixels from the two sets. Taking the advantage of this observation, only a small proportion of pixels is necessary to be involved in the SAD calculation. Simulations show that the amount of computations required in the full search algorithm is reduced by about 58% on average and up to 70% in the best case. Secondly, from the scope of parallelization an enhanced TZ search for HEVC is proposed using novel schemes of multiple MVPs (motion vector predictor) and shared MVP. Specifically, resorting to multiple MVPs the initial search process is performed in parallel at multiple search centers, and the ME processing engine for PUs within one CU are parallelized based on the MVP sharing scheme on CU (coding unit) level. Moreover, the SAD module for ME engine is also parallelly implemented for PU size of 32×32. Experiments indicate it achieves an appreciable improvement on the throughput and coding efficiency of the HEVC video encoder. In addition, the other part of this thesis is contributed to the VLSI architecture design for finding the first W maximum/minimum values targeting towards high speed and low hardware cost. The architecture based on the novel bit-wise AND scheme has only half of the area of the best reference solution and its critical path delay is comparable with other implementations. While the FPCG (full parallel comparison grid) architecture, which utilizes the optimized comparator-based structure, achieves 3.6 times faster on average on the speed and even 5.2 times faster at best comparing with the reference architectures. Finally the architecture using the partial sorting strategy reaches a good balance on the timing performance and area, which has a slightly lower or comparable speed with FPCG architecture and a acceptable hardware cost

    Energy Efficient VLSI Circuits for MIMO-WLAN

    Get PDF
    Mobile communication - anytime, anywhere access to data and communication services - has been continuously increasing since the operation of the first wireless communication link by Guglielmo Marconi. The demand for higher data rates, despite the limited bandwidth, led to the development of multiple-input multiple-output (MIMO) communication which is often combined with orthogonal frequency division multiplexing (OFDM). Together, these two techniques achieve a high bandwidth efficiency. Unfortunately, techniques such as MIMO-OFDM significantly increase the signal processing complexity of transceivers. While fast improvements in the integrated circuit (IC) technology enabled to implement more signal processing complexity per chip, large efforts had and have to be done for novel algorithms as well as for efficient very large scaled integration (VLSI) architectures in order to meet today's and tomorrow's requirements for mobile wireless communication systems. In this thesis, we will present architectures and VLSI implementations of complete physical (PHY) layer application specific integrated circuits (ASICs) under the constraints imposed by an industrial wireless communication standard. Contrary to many other publications, we do not elaborate individual components of a MIMO-OFDM communication system stand-alone, but in the context of the complete PHY layer ASIC. We will investigate the performance of several MIMO detectors and the corresponding preprocessing circuits, being integrated into the entire PHY layer ASIC, in terms of achievable error-rate, power consumption, and area requirement. Finally, we will assemble the results from the proposed PHY layer implementations in order to enhance the energy efficiency of a transceiver. To this end, we propose a cross-layer optimization of PHY layer and medium access control (MAC) layer
    corecore