4 research outputs found

    A highly parallel Turbo Product Code decoder without interleaving resource

    No full text
    International audienceThis article presents an innovative turbo product code (TPC) decoder architecture without any interleaving resource. This architecture includes a full-parallel SISO decoder able to process n symbols in one clock period. Syntheses show the better efficiency of such an architecture compared with existing previous solutions. Considering a 6-iteration turbo decoder of a (32,26)2 BCH product code, synthetized in a 90 nm CMOS technology, the resulting information throughput is 2.5 Gb/s with an area of 233 Kgates. Finally a second architecture enhancing parallelism rate is described. The information throughput is 33.7 Gb/s while an area estimation gives A=10 mum2

    NanoMagnetic Logic Microprocessor Hierarchical Power Model

    Get PDF
    The interest on emerging nanotechnologies has been recently focused on NanoMagnetic Logic (NML), which has unique appealing features. NML circuits have a very low power consumption and, due to their magnetic nature, they maintain the information safely stored even without power supply. The nature of these circuits is highly different from the CMOS ones. As a consequence, to better understand NML logic, complex circuits and not only simple gates must be designed. This constraint calls for a new design and simulation methodology. It should efficiently encompass manifold properties: 1) being based on commonly used hardware description language (HDL) in order to easily manage complexity and hierarchy; 2) maintaining a clear link with physical characteristics 3) modeling performance aspects like speed and power, together with logic behavior. In this contribution we present a VHDL behavioral model for NML circuits, which allows to evaluate not only logic behavior but also power dissipation. It is based on a technological solution called ``snake-clock''. We demonstrate this model on a case study which offers the right variety of internal substructures to test the method: a four bit microprocessor designed using asynchronous logic. The model enables a hierarchical bottom-up evaluation of the processor logic behavior, area and power dissipation, which we evaluated using as benchmark a division algorithm. Results highlight the flexibility and the efficiency of this model, and the remarkable improvements that it brings to the analysis of NML circuit

    High speed architectures for finding the firsttwo maximum/minimum values

    Get PDF
    High speed architectures for finding the first two maximum/minimum values are of paramount importance in several applications, including iterative (e.g. turbo and LDPC) decoders. In this brief, stemming from a previous work, based on radix-2 solutions, we propose higher and mixed radix implementations that improve the architecture latency. Post place and route results on a 180 nm CMOS standard cell technology show that the proposed architectures achieve lower latency than radix-2 solutions with a moderate area increas

    VLSI architectures design for encoders of High Efficiency Video Coding (HEVC) standard

    Get PDF
    The growing popularity of high resolution video and the continuously increasing demands for high quality video on mobile devices are producing stronger needs for more efficient video encoder. Concerning these desires, HEVC, a newest video coding standard, has been developed by a joint team formed by ISO/IEO MPEG and ITU/T VCEG. Its design goal is to achieve a 50% compression gain over its predecessor H.264 with an equal or even higher perceptual video quality. Motion Estimation (ME) being as one of the most critical module in video coding contributes almost 50%-70% of computational complexity in the video encoder. This high consumption of the computational resources puts a limit on the performance of encoders, especially for full HD or ultra HD videos, in terms of coding speed, bit-rate and video quality. Thus the major part of this work concentrates on the computational complexity reduction and improvement of timing performance of motion estimation algorithms for HEVC standard. First, a new strategy to calculate the SAD (Sum of Absolute Difference) for motion estimation is designed based on the statistics on property of pixel data of video sequences. This statistics demonstrates the size relationship between the sum of two sets of pixels has a determined connection with the distribution of the size relationship between individual pixels from the two sets. Taking the advantage of this observation, only a small proportion of pixels is necessary to be involved in the SAD calculation. Simulations show that the amount of computations required in the full search algorithm is reduced by about 58% on average and up to 70% in the best case. Secondly, from the scope of parallelization an enhanced TZ search for HEVC is proposed using novel schemes of multiple MVPs (motion vector predictor) and shared MVP. Specifically, resorting to multiple MVPs the initial search process is performed in parallel at multiple search centers, and the ME processing engine for PUs within one CU are parallelized based on the MVP sharing scheme on CU (coding unit) level. Moreover, the SAD module for ME engine is also parallelly implemented for PU size of 32×32. Experiments indicate it achieves an appreciable improvement on the throughput and coding efficiency of the HEVC video encoder. In addition, the other part of this thesis is contributed to the VLSI architecture design for finding the first W maximum/minimum values targeting towards high speed and low hardware cost. The architecture based on the novel bit-wise AND scheme has only half of the area of the best reference solution and its critical path delay is comparable with other implementations. While the FPCG (full parallel comparison grid) architecture, which utilizes the optimized comparator-based structure, achieves 3.6 times faster on average on the speed and even 5.2 times faster at best comparing with the reference architectures. Finally the architecture using the partial sorting strategy reaches a good balance on the timing performance and area, which has a slightly lower or comparable speed with FPCG architecture and a acceptable hardware cost
    corecore