2,459 research outputs found

    Custom-Instruction Synthesis for Extensible-Processor Platforms

    Full text link

    Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review

    Get PDF
    The technological evolution has increased the number of transistors for a given die area significantly and increased the switching speed from few MHz to GHz range. Such inversely proportional decline in size and boost in performance consequently demands shrinking of supply voltage and effective power dissipation in chips with millions of transistors. This has triggered substantial amount of research in power reduction techniques into almost every aspect of the chip and particularly the processor cores contained in the chip. This paper presents an overview of techniques for achieving the power efficiency mainly at the processor core level but also visits related domains such as buses and memories. There are various processor parameters and features such as supply voltage, clock frequency, cache and pipelining which can be optimized to reduce the power consumption of the processor. This paper discusses various ways in which these parameters can be optimized. Also, emerging power efficient processor architectures are overviewed and research activities are discussed which should help reader identify how these factors in a processor contribute to power consumption. Some of these concepts have been already established whereas others are still active research areas. © 2009 ACADEMY PUBLISHER

    BioThreads: a novel VLIW-based chip multiprocessor for accelerating biomedical image processing applications

    Get PDF
    We discuss BioThreads, a novel, configurable, extensible system-on-chip multiprocessor and its use in accelerating biomedical signal processing applications such as imaging photoplethysmography (IPPG). BioThreads is derived from the LE1 open-source VLIW chip multiprocessor and efficiently handles instruction, data and thread-level parallelism. In addition, it supports a novel mechanism for the dynamic creation, and allocation of software threads to uncommitted processor cores by implementing key POSIX Threads primitives directly in hardware, as custom instructions. In this study, the BioThreads core is used to accelerate the calculation of the oxygen saturation map of living tissue in an experimental setup consisting of a high speed image acquisition system, connected to an FPGA board and to a host system. Results demonstrate near-linear acceleration of the core kernels of the target blood perfusion assessment with increasing number of hardware threads. The BioThreads processor was implemented on both standard-cell and FPGA technologies; in the first case and for an issue width of two, full real-time performance is achieved with 4 cores whereas on a mid-range Xilinx Virtex6 device this is achieved with 10 dual-issue cores. An 8-core LE1 VLIW FPGA prototype of the system achieved 240 times faster execution time than the scalar Microblaze processor demonstrating the scalability of the proposed solution to a state-of-the-art FPGA vendor provided soft CPU core

    Real-time VLSI architecture for bio-medical monitoring

    Get PDF
    This paper discusses the architecture and implementation of SSS2, a high-performance real-time signal processing system developed with a hybrid ESL/RTL methodology and targeted to biomedical image processing. Traditional methodologies, as well as new tools, such as Cebatech's C2R untimed-C synthesizer have been employed in the design of the system. The SSS2 platform specifies a parametric number of scalar processing elements, based on multiple 32-bit Sparc-compliant engines, augmented with LE2, an ESL-designed 2-way LIW/SIMD accelerator. LE2, which is purely designed in C, exposes a consistent interface to its SIMD datapath directly which is directly derived from the C-source of open-source image processing codes. It is synthesized to Verilog RTL with C2R. Behaviorally-synthesized SIMD datapaths are then 'plugged-in' into the exposed LE2 datapath interface. The LE2 memory interface can be either a cache- based configurable vector load/store unit or a multi-banked, multi-channel streaming local memory system. Results drawn from this work strongly suggest a shift towards a hybrid approach in designing multi-core systems for high bandwidth streaming and for dealing with large scale medical image transfers and non-linear bio-signal processing algorithms
    • …
    corecore