180 research outputs found

    Dynamically Tuning Processor Resources with Adaptive Processing

    Get PDF
    The productivity of modern society has become inextricably linked to its ability to produce energy-efficient computing technology. Increasingly sophisticated mobile computing systems, powered for hours solely by batteries, continue to proliferate rapidly throughout society, while battery technology improves at a much slower pace. In large data centers that handle everything from online orders for a dot-com company to sophisticated Web searches, row upon row of tightly packed computers may be warehoused in a city block. Microprocessor energy wastage in such a facility directly translates into higher electric bills. Simply receiving sufficient electricity from utilities to power such a center is no longer certain. Given this situation, energy efficiency has rapidly moved to the forefront of modern microprocessor design. The adaptive processing approach to improving microprocessor energy efficiency dynamically tunes major microprocessor resources—such as caches and hardware queues—during execution to better match varying application needs.1,2 This tuning usually involves reducing the size of a resource when its full capabilities are not needed, then restoring the disabled portions when they are needed again. Dynamically tailoring processor resources in active use contrasts sharply with techniques that simply turn off entire sections of a processor when they become idle. Presenting the application with the required amount of hardware—and nothing more— throughout its execution can achieve a potentially significant reduction in energy consumption. The challenges facing adaptive processing lie in achieving this greater efficiency with reasonable hardware and software overhead, and doing so without incurring undue performance loss. Unlike reconfigurable computing, which typically uses very different technology such as FPGAs, adaptive processing exploits the dynamic superscalar design approach that developers have used successfully in many generations of general-purpose processors. Whereas reconfigurable processors must demonstrate performance or energy savings large enough to overcome very large clock frequency and circuit density disadvantages, adaptive processors typically have baseline overheads of only a few percent

    Integrating adaptive on-chip storage structures for reduced dynamic power

    Get PDF
    Journal ArticleEnergy efficiency in microarchitectures has become a necessity. Significant dynamic energy savings can be realized for adaptive storage structures such as caches, issue queues, and register files by disabling unnecessary storage resources. Prior studies have analyzed individual structures and their control. A common theme to these studies is exploration of the configuration space and use of system IPC as feedback to guide reconfiguration. However, when multiple structures adapt in concert, the number of possible configurations increases dramatically, and assigning causal effects to IPC change becomes problematic. To overcome this issue, we introduce designs that are reconfigured solely on local behavior. We introduce a novel cache design that permits direct calculation of efficient configurations. For buffer and queue structures, limited histogramming permits precise resizing control. When applying these techniques we show energy savings of up to 70% on the individual structures, and savings averaging 30% overall for the portion of energy attributed to these structures with an average of 2.1% performance degradation

    Integrating Adaptive On-Chip Storage Structures for Reduced Dynamic Power

    Get PDF
    Energy efficiency in microarchitectures has become a necessity. Significant dynamic energy savings can be realized for adaptive storage structures such as caches, issue queues, and register files by disabling unnecessary storage resources. Prior studies have analyzed individual structures and their control. A common theme to these studies is exploration of the configuration space and use of system IPC as feedback to guide reconfiguration. However, when multiple structures adapt in concert, the number of possible configurations increases dramatically, and assigning causal effects to IPC change becomes problematic. To overcome this issue, we introduce designs that are reconfigured solely on local behavior. We introduce a novel cache design that permits direct calculation of efficient configurations. For buffer and queue structures, limited histogramming permits precise resizing control. When applying these techniques we show energy savings of up to 70% on the individual structures, and savings averaging 30% overall for the portion of energy attributed to these structures with an average of 2.1% performance degradation

    Power Management for Deep Submicron Microprocessors

    Get PDF
    As VLSI technology scales, the enhanced performance of smaller transistors comes at the expense of increased power consumption. In addition to the dynamic power consumed by the circuits there is a tremendous increase in the leakage power consumption which is further exacerbated by the increasing operating temperatures. The total power consumption of modern processors is distributed between the processor core, memory and interconnects. In this research two novel power management techniques are presented targeting the functional units and the global interconnects. First, since most leakage control schemes for processor functional units are based on circuit level techniques, such schemes inherently lack information about the operational profile of higher-level components of the system. This is a barrier to the pivotal task of predicting standby time. Without this prediction, it is extremely difficult to assess the value of any leakage control scheme. Consequently, a methodology that can predict the standby time is highly beneficial in bridging the gap between the information available at the application level and the circuit implementations. In this work, a novel Dynamic Sleep Signal Generator (DSSG) is presented. It utilizes the usage traces extracted from cycle accurate simulations of benchmark programs to predict the long standby periods associated with the various functional units. The DSSG bases its decisions on the current and previous standby state of the functional units to accurately predict the length of the next standby period. The DSSG presents an alternative to Static Sleep Signal Generation (SSSG) based on static counters that trigger the generation of the sleep signal when the functional units idle for a prespecified number of cycles. The test results of the DSSG are obtained by the use of a modified RISC superscalar processor, implemented by SimpleScalar, the most widely accepted open source vehicle for architectural analysis. In addition, the results are further verified by a Simultaneous Multithreading simulator implemented by SMTSIM. Leakage saving results shows an increase of up to 146% in leakage savings using the DSSG versus the SSSG, with an accuracy of 60-80% for predicting long standby periods. Second, chip designers in their effort to achieve timing closure, have focused on achieving the lowest possible interconnect delay through buffer insertion and routing techniques. This approach, though, taxes the power budget of modern ICs, especially those intended for wireless applications. Also, in order to achieve more functionality, die sizes are constantly increasing. This trend is leading to an increase in the average global interconnect length which, in turn, requires more buffers to achieve timing closure. Unconstrained buffering is bound to adversely affect the overall chip performance, if the power consumption is added as a major performance metric. In fact, the number of global interconnect buffers is expected to reach hundreds of thousands to achieve an appropriate timing closure. To mitigate the impact of the power consumed by the interconnect buffers, a power-efficient multi-pin routing technique is proposed in this research. The problem is based on a graph representation of the routing possibilities, including buffer insertion and identifying the least power path between the interconnect source and set of sinks. The novel multi-pin routing technique is tested by applying it to the ISPD and IBM benchmarks to verify the accuracy, complexity, and solution quality. Results obtained indicate that an average power savings as high as 32% for the 130-nm technology is achieved with no impact on the maximum chip frequency

    Resource Banking An Energy-efficient, Run-time Adaptive Processor Design Technique

    Get PDF
    From the earliest and simplest scalar computation engines to modern superscalar out-oforder processors, the evolution of computational machinery during the past century has largely been driven by a single goal: performance. In today’s world of cheap, billion-plus transistor count processors and with an exploding market in mobile computing, a design landscape has emerged where energy efficiency, arguably more than any other single metric, determines the viability of a processor for a given application. The historical emphasis on performance has left modern processors bloated and over provisioned for everyday tasks in the hope that during computationally intensive periods some performance improvement will be observed. This work explores an energy-efficient processor design technique that ensures even a highly over provisioned out-of-order processor has only as many of its computational resources active as it requires for efficient computation at any given time. Specifically, this paper examines the feasibility of a dynamically banked register file and reorder buffer with variable banking policies that enable unused rename registers or reorder buffer entries to be voltage gated (turned off) during execution to save power. The impact of bank placement, turn-off and turn-on policies as well as rail stabilization latencies for this approach are explored for high-performance desktop and server designs as well as low-power mobile processor

    Compiler-Directed Energy Savings in Superscalar Processors

    Get PDF
    Institute for Computing Systems ArchitectureSuperscalar processors contain large, complex structures to hold data and instructions as they wait to be executed. However, many of these structures consume large amounts of energy, making them hotspots requiring sophisticated cooling systems. With the trend towards larger, more complex processors, this will become more of a problem, having important implications for future technology. This thesis uses compiler-based optimisation schemes to target the issue queue and register file. These are two of the most energy consuming structures in the processor. The algorithms and hardware techniques developed in this work dynamically adapt the processor's resources to the changing program phases, turning off parts of each structure when they are unused to save dynamic and static energy. To optimise the issue queue, the compiler analysis tracks data dependences through each program procedure. It identifies the critical path through each program region and informs the hardware of the minimum number of queue entries required to prevent it slowing down. This reduces the occupancy of the queue and increases the opportunities to save energy. With just a 1.3% performance loss, 26% dynamic and 32% static energy savings are achieved. Registers can be idle for many cycles after they are last read, before they are released and put back on the free-list to be reused by another instruction. Alternatively, they can be turned off for energy savings. Early register releasing can be used to perform this operation sooner than usual, but hardware schemes must wait for the instruction redefining the relevant logical register to enter the pipeline. This thesis presents an exploration of compiler-directed early register releasing. The compiler can exactly identify the last use of each register and pass the information to the hardware, based on simple data-flow and liveness analysis. The best scheme achieves 15% dynamic and 19% static energy savings. Finally, the issue queue limiting and early register releasing schemes are combined for energy savings in both processor structures. Four different configurations are evaluated bringing 25% to 31% dynamic and 19% to 34% static issue queue energy savings and reductions of 18% to 25% dynamic and 20% to 21% static energy in the register file
    • …
    corecore