4,105 research outputs found

    Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review

    Get PDF
    The technological evolution has increased the number of transistors for a given die area significantly and increased the switching speed from few MHz to GHz range. Such inversely proportional decline in size and boost in performance consequently demands shrinking of supply voltage and effective power dissipation in chips with millions of transistors. This has triggered substantial amount of research in power reduction techniques into almost every aspect of the chip and particularly the processor cores contained in the chip. This paper presents an overview of techniques for achieving the power efficiency mainly at the processor core level but also visits related domains such as buses and memories. There are various processor parameters and features such as supply voltage, clock frequency, cache and pipelining which can be optimized to reduce the power consumption of the processor. This paper discusses various ways in which these parameters can be optimized. Also, emerging power efficient processor architectures are overviewed and research activities are discussed which should help reader identify how these factors in a processor contribute to power consumption. Some of these concepts have been already established whereas others are still active research areas. © 2009 ACADEMY PUBLISHER

    Coarse-grained reconfigurable array architectures

    Get PDF
    Coarse-Grained Reconfigurable Array (CGRA) architectures accelerate the same inner loops that benefit from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efficiently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on flexibility, performance, and power-efficiency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual fine-tuning of source code

    Automatic synthesis and optimization of chip multiprocessors

    Get PDF
    The microprocessor technology has experienced an enormous growth during the last decades. Rapid downscale of the CMOS technology has led to higher operating frequencies and performance densities, facing the fundamental issue of power dissipation. Chip Multiprocessors (CMPs) have become the latest paradigm to improve the power-performance efficiency of computing systems by exploiting the parallelism inherent in applications. Industrial and prototype implementations have already demonstrated the benefits achieved by CMPs with hundreds of cores.CMP architects are challenged to take many complex design decisions. Only a few of them are:- What should be the ratio between the core and cache areas on a chip?- Which core architectures to select?- How many cache levels should the memory subsystem have?- Which interconnect topologies provide efficient on-chip communication?These and many other aspects create a complex multidimensional space for architectural exploration. Design Automation tools become essential to make the architectural exploration feasible under the hard time-to-market constraints. The exploration methods have to be efficient and scalable to handle future generation on-chip architectures with hundreds or thousands of cores.Furthermore, once a CMP has been fabricated, the need for efficient deployment of the many-core processor arises. Intelligent techniques for task mapping and scheduling onto CMPs are necessary to guarantee the full usage of the benefits brought by the many-core technology. These techniques have to consider the peculiarities of the modern architectures, such as availability of enhanced power saving techniques and presence of complex memory hierarchies.This thesis has several objectives. The first objective is to elaborate the methods for efficient analytical modeling and architectural design space exploration of CMPs. The efficiency is achieved by using analytical models instead of simulation, and replacing the exhaustive exploration with an intelligent search strategy. Additionally, these methods incorporate high-level models for physical planning. The related contributions are described in Chapters 3, 4 and 5 of the document.The second objective of this work is to propose a scalable task mapping algorithm onto general-purpose CMPs with power management techniques, for efficient deployment of many-core systems. This contribution is explained in Chapter 6 of this document.Finally, the third objective of this thesis is to address the issues of the on-chip interconnect design and exploration, by developing a model for simultaneous topology customization and deadlock-free routing in Networks-on-Chip. The developed methodology can be applied to various classes of the on-chip systems, ranging from general-purpose chip multiprocessors to application-specific solutions. Chapter 7 describes the proposed model.The presented methods have been thoroughly tested experimentally and the results are described in this dissertation. At the end of the document several possible directions for the future research are proposed

    Remotely pumped optical distribution networks: a distributed amplifier model

    Get PDF
    Optical distribution networks using remotely pumped erbium-doped fiber amplifiers (EDFAs) with a single pump source at the head end can conveniently provide signal gain without adding to the power-consumption cost and management complexity of having multiple locally pumped EDFAs in densely populated metropolitan areas. We introduce an analytical model for understanding the basic physical foundations of remotely pumped network design and for analyzing the number of users that can be supported using such a remote-pumping scheme

    Entanglement genesis by ancilla-based parity measurement in 2D circuit QED

    Get PDF
    We present an indirect two-qubit parity meter in planar circuit quantum electrodynamics, realized by discrete interaction with an ancilla and a subsequent projective ancilla measurement with a dedicated, dispersively coupled resonator. Quantum process tomography and successful entanglement by measurement demonstrate that the meter is intrinsically quantum non-demolition. Separate interaction and measurement steps allow commencing subsequent data qubit operations in parallel with ancilla measurement, offering time savings over continuous schemes.Comment: 5 pages, 4 figures; supplemental material with 5 figure

    A Hardware Architecture of a Counter-Based Entropy Coder

    Full text link
    This paper describes a hardware architectural design of a real-time counter based entropy coder at a register transfer level (RTL) computing model. The architecture is based on a lossless compression algorithm called Rice coding, which is optimal for an entropy range of bits per sample. The architecture incorporates a word-splitting scheme to extend the entropy coverage into a range of bits per sample. We have designed a data structure in a form of independent code blocks, allowing more robust compressed bitstream. The design focuses on an RTL computing model and architecture, utilizing 8-bit buffers, adders, registers, loader-shifters, select-logics, down-counters, up-counters, and multiplexers. We have validated the architecture (both the encoder and the decoder) in a coprocessor for 8 bits/sample data on an FPGA Xilinx XC4005, utilizing 61% of F&G-CLBs, 34% H-CLBs, 32% FF-CLBs, and 68% IO resources. On this FPGA implementation, the encoder and decoder can achieve 1.74 Mbits/s and 2.91 Mbits/s throughputs, respectively. The architecture allows pipelining, resulting in potentially maximum encoding throughput of 200 Mbit/s on typical real-time TTL implementations. In addition, it uses a minimum number of register elements. As a result, this architecture can result in low cost, low energy consumption and reduced silicon area realizations
    corecore