33 research outputs found

    Thermal Characterization of Next-Generation Workloads on Heterogeneous MPSoCs

    Get PDF
    Next-generation High-Performance Computing (HPC) applications need to tackle outstanding computational complexity while meeting latency and Quality-of-Service constraints. Heterogeneous Multi-Processor Systems-on-Chip (MPSoCs), equipped with a mix of general-purpose cores and reconfigurable fabric for custom acceleration of computational blocks, are key in providing the flexibility to meet the requirements of next-generation HPC. However, heterogeneity brings new challenges to efficient chip thermal management. In this context, accurate and fast thermal simulators are becoming crucial to understand and exploit the trade-offs brought by heterogeneous MPSoCs. In this paper, we first thermally characterize a next-generation HPC workload, the online video transcoding application, using a highly-accurate Infra-Red (IR) microscope. Second, we extend the 3D-ICE thermal simulation tool with a new generic heat spreader model capable of accurately reproducing package surface temperature, with an average error of 6.8% for the hot spots of the chip. Our model is used to characterize the thermal behaviour of the online transcoding application when running on a heterogeneous MPSoC. Moreover, by using our detailed thermal system characterization we are able to explore different application mappings as well as the thermal limits of such heterogeneous platforms

    VLSI Implementation of a Cost-Efficient Loeffler-DCT Algorithm with Recursive CORDIC for DCT-Based Encoder

    Get PDF
    This paper presents a low-cost and high-quality; hardware-oriented; two-dimensional discrete cosine transform (2-D DCT) signal analyzer for image and video encoders. In order to reduce memory requirement and improve image quality; a novel Loeffler DCT based on a coordinate rotation digital computer (CORDIC) technique is proposed. In addition; the proposed algorithm is realized by a recursive CORDIC architecture instead of an unfolded CORDIC architecture with approximated scale factors. In the proposed design; a fully pipelined architecture is developed to efficiently increase operating frequency and throughput; and scale factors are implemented by using four hardware-sharing machines for complexity reduction. Thus; the computational complexity can be decreased significantly with only 0.01 dB loss deviated from the optimal image quality of the Loeffler DCT. Experimental results show that the proposed 2-D DCT spectral analyzer not only achieved a superior average peak signal–noise ratio (PSNR) compared to the previous CORDIC-DCT algorithms but also designed cost-efficient architecture for very large scale integration (VLSI) implementation. The proposed design was realized using a UMC 0.18-μm CMOS process with a synthesized gate count of 8.04 k and core area of 75,100 μm2. Its operating frequency was 100 MHz and power consumption was 4.17 mW. Moreover; this work had at least a 64.1% gate count reduction and saved at least 22.5% in power consumption compared to previous designs
    corecore