1,272 research outputs found

    Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review

    Get PDF
    The technological evolution has increased the number of transistors for a given die area significantly and increased the switching speed from few MHz to GHz range. Such inversely proportional decline in size and boost in performance consequently demands shrinking of supply voltage and effective power dissipation in chips with millions of transistors. This has triggered substantial amount of research in power reduction techniques into almost every aspect of the chip and particularly the processor cores contained in the chip. This paper presents an overview of techniques for achieving the power efficiency mainly at the processor core level but also visits related domains such as buses and memories. There are various processor parameters and features such as supply voltage, clock frequency, cache and pipelining which can be optimized to reduce the power consumption of the processor. This paper discusses various ways in which these parameters can be optimized. Also, emerging power efficient processor architectures are overviewed and research activities are discussed which should help reader identify how these factors in a processor contribute to power consumption. Some of these concepts have been already established whereas others are still active research areas. © 2009 ACADEMY PUBLISHER

    Distributed data cache designs for clustered VLIW processors

    Get PDF
    Wire delays are a major concern for current and forthcoming processors. One approach to deal with this problem is to divide the processor into semi-independent units referred to as clusters. A cluster usually consists of a local register file and a subset of the functional units, while the L1 data cache typically remains centralized in What we call partially distributed architectures. However, as technology evolves, the relative latency of such a centralized cache will increase, leading to an important impact on performance. In this paper, we propose partitioning the L1 data cache among clusters for clustered VLIW processors. We refer to this kind of design as fully distributed processors. In particular; we propose and evaluate three different configurations: a snoop-based cache coherence scheme, a word-interleaved cache, and flexible LO-buffers managed by the compiler. For each alternative, instruction scheduling techniques targeted to cyclic code are developed. Results for the Mediabench suite'show that the performance of such fully distributed architectures is always better than the performance of a partially distributed one with the same amount of resources. In addition, the key aspects of each fully distributed configuration are explored.Peer ReviewedPostprint (published version

    DeSyRe: on-Demand System Reliability

    No full text
    The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design, in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect and fault-free system, would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault-tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints

    The hArtes Tool Chain

    Get PDF
    This chapter describes the different design steps needed to go from legacy code to a transformed application that can be efficiently mapped on the hArtes platform

    Parallel Architectures and Parallel Algorithms for Integrated Vision Systems

    Get PDF
    Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is a system that uses vision algorithms from all levels of processing to perform for a high level application (e.g., object recognition). An IVS normally involves algorithms from low level, intermediate level, and high level vision. Designing parallel architectures for vision systems is of tremendous interest to researchers. Several issues are addressed in parallel architectures and parallel algorithms for integrated vision systems

    Design techniques for low-power systems

    Get PDF
    Portable products are being used increasingly. Because these systems are battery powered, reducing power consumption is vital. In this report we give the properties of low-power design and techniques to exploit them on the architecture of the system. We focus on: minimizing capacitance, avoiding unnecessary and wasteful activity, and reducing voltage and frequency. We review energy reduction techniques in the architecture and design of a hand-held computer and the wireless communication system including error control, system decomposition, communication and MAC protocols, and low-power short range networks

    THE RAFAEL MULTI-TARGET HETEROGENEOUS SIGNAL-FLOW GRAPH COMPILER

    Get PDF
    This paper describes a signal-flow graph compiler which produces distributed code for heterogeneous target systems. The compiler is devoted for mainly Digital Signal Process- ing problems. The code generator features reprogrammable operation library, the static scheduler supports fully heterogeneous systems and the input graph may contain run-time decisions in a limited way. The system has been implemented on IBM PC compatibles under MS-VVindows so it does not require expansive host computer

    Mapping Framework for Heterogeneous Reconfigurable Architectures:Combining Temporal Partitioning and Multiprocessor Scheduling

    Get PDF

    Hardware-software codesign in a high-level synthesis environment

    Get PDF
    Interfacing hardware-oriented high-level synthesis to software development is a computationally hard problem for which no general solution exists. Under special conditions, the hardware-software codesign (system-level synthesis) problem may be analyzed with traditional tools and efficient heuristics. This dissertation introduces a new alternative to the currently used heuristic methods. The new approach combines the results of top-down hardware development with existing basic hardware units (bottom-up libraries) and compiler generation tools. The optimization goal is to maximize operating frequency or minimize cost with reasonable tradeoffs in other properties. The dissertation research provides a unified approach to hardware-software codesign. The improvements over previously existing design methodologies are presented in the frame-work of an academic CAD environment (PIPE). This CAD environment implements a sufficient subset of functions of commercial microelectronics CAD packages. The results may be generalized for other general-purpose algorithms or environments. Reference benchmarks are used to validate the new approach. Most of the well-known benchmarks are based on discrete-time numerical simulations, digital filtering applications, and cryptography (an emerging field in benchmarking). As there is a need for high-performance applications, an additional requirement for this dissertation is to investigate pipelined hardware-software systems\u27 performance and design methods. The results demonstrate that the quality of existing heuristics does not change in the enhanced, hardware-software environment
    • 

    corecore