12 research outputs found

    Managing Static Leakage Energy in Microprocessor Functional Units

    Static energy due to subthreshold leakage current is projected to become a major component of the total energy in high performance microprocessors. Many studies so far have examined and proposed techniques to reduce leakage in on-chip storage structures. In this study, static energy is reduced in the integer functional units by leveraging the unique qualities of dual threshold voltage domino logic. Domino logic has desirable properties that greatly reduce leakage current while providing fast propagation times. However, due to the energy cost of entering the low leakage current state (sleep mode), domino logic has thus far been used only for leakage reduction in the long-term standby mode. We examine the utility of the sleep mode (while considering the aforementioned costs) when idle times are relatively short, one to a few hundred cycles, as is often the case for functional units. Using an analytical energy model suitable for architecture-level analysis, we explore the interaction of the application and technology, and the effect on energy and performance as the underlying parameters are varied, on a set of benchmarks. Our results show that if leakage approaches the magnitude projected in the literature, then even for idle intervals as short as ten cycles, an aggressive policy of activating the sleep mode at every idle period performs well, and a more complex control strategy may not be warranted. We also propose a simple design, called Gradual Sleep, to reduce the energy impact of using the sleep mode for smaller idle periods.
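The trade-off described in this abstract, a fixed energy cost to enter sleep mode against reduced leakage while asleep, can be illustrated with a minimal sketch. This is not the paper's analytical model: the function names, the energy units, and the parameter values (`LEAK`, `SLEEP_FRACTION`, `TRANSITION`) are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def idle_energy_awake(idle_cycles, leak_per_cycle):
    """Leakage energy if the unit stays in its normal state for the whole idle period."""
    return idle_cycles * leak_per_cycle

def idle_energy_asleep(idle_cycles, leak_per_cycle, sleep_leak_fraction, transition_energy):
    """Energy if the unit sleeps: a one-time transition cost plus reduced leakage."""
    return transition_energy + idle_cycles * leak_per_cycle * sleep_leak_fraction

def break_even_cycles(leak_per_cycle, sleep_leak_fraction, transition_energy):
    """Shortest idle period for which sleeping costs no more than staying awake."""
    saved_per_cycle = leak_per_cycle * (1.0 - sleep_leak_fraction)
    return math.ceil(transition_energy / saved_per_cycle)

def policy_energy(idle_intervals, leak_per_cycle, sleep_leak_fraction,
                  transition_energy, always_sleep):
    """Total idle energy under 'sleep at every idle period' or 'never sleep'."""
    total = 0.0
    for cycles in idle_intervals:
        if always_sleep:
            total += idle_energy_asleep(cycles, leak_per_cycle,
                                        sleep_leak_fraction, transition_energy)
        else:
            total += idle_energy_awake(cycles, leak_per_cycle)
    return total

# Hypothetical parameters in arbitrary energy units (assumptions, not measured data).
LEAK = 1.0            # leakage per idle cycle when awake
SLEEP_FRACTION = 0.5  # fraction of leakage that remains in sleep mode
TRANSITION = 5.0      # energy cost of entering sleep mode
idle_intervals = [4, 10, 40, 200]  # idle period lengths in cycles

threshold = break_even_cycles(LEAK, SLEEP_FRACTION, TRANSITION)
always = policy_energy(idle_intervals, LEAK, SLEEP_FRACTION, TRANSITION, True)
never = policy_energy(idle_intervals, LEAK, SLEEP_FRACTION, TRANSITION, False)
```

With these made-up numbers the aggressive policy loses energy on the 4-cycle interval but still wins overall, which is the kind of outcome the abstract reports when leakage is high relative to the transition cost.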

    Dynamically Tuning Processor Resources with Adaptive Processing

    The productivity of modern society has become inextricably linked to its ability to produce energy-efficient computing technology. Increasingly sophisticated mobile computing systems, powered for hours solely by batteries, continue to proliferate rapidly throughout society, while battery technology improves at a much slower pace. In large data centers that handle everything from online orders for a dot-com company to sophisticated Web searches, row upon row of tightly packed computers may be warehoused in a city block. Microprocessor energy wastage in such a facility directly translates into higher electric bills. Simply receiving sufficient electricity from utilities to power such a center is no longer certain. Given this situation, energy efficiency has rapidly moved to the forefront of modern microprocessor design. The adaptive processing approach to improving microprocessor energy efficiency dynamically tunes major microprocessor resources, such as caches and hardware queues, during execution to better match varying application needs [1, 2]. This tuning usually involves reducing the size of a resource when its full capabilities are not needed, then restoring the disabled portions when they are needed again. Dynamically tailoring processor resources in active use contrasts sharply with techniques that simply turn off entire sections of a processor when they become idle. Presenting the application with the required amount of hardware, and nothing more, throughout its execution can achieve a potentially significant reduction in energy consumption. The challenges facing adaptive processing lie in achieving this greater efficiency with reasonable hardware and software overhead, and doing so without incurring undue performance loss.
    Unlike reconfigurable computing, which typically uses very different technology such as FPGAs, adaptive processing exploits the dynamic superscalar design approach that developers have used successfully in many generations of general-purpose processors. Whereas reconfigurable processors must demonstrate performance or energy savings large enough to overcome very large clock frequency and circuit density disadvantages, adaptive processors typically have baseline overheads of only a few percent.

    Enhancing Branch Prediction via On-Line Statistical Analysis

    To attain peak efficiency, high performance processors must anticipate changes in the flow of control before they actually occur. Branch prediction is the method of determining the most likely path to be taken at branch decision points in the program. Many branch prediction mechanisms have been proposed. The most effective of these use a single tabular data structure in hardware to hold historical information regarding the behaviors of branches. The table is a limited resource, and it is managed in a way that can leave multiple branches competing for the same table locations.
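The table conflict described in this abstract can be seen in a small sketch. The abstract does not say how the table is organized, so everything here is an assumption: a table of 2-bit saturating counters indexed by low-order bits of the branch address (a textbook bimodal predictor), with hypothetical branch addresses `0x1000` and `0x1040`. It does not reproduce the statistical technique the paper proposes.

```python
class BimodalPredictor:
    """Table of 2-bit saturating counters indexed by low-order PC bits (assumed design)."""

    def __init__(self, index_bits):
        self.mask = (1 << index_bits) - 1
        # Counter values 0..3; 0-1 predict not-taken, 2-3 predict taken.
        self.counters = [1] * (1 << index_bits)

    def index(self, pc):
        # Drop the two low bits, assuming 4-byte aligned instructions.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.counters[self.index(pc)] >= 2

    def update(self, pc, taken):
        i = self.index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)

def accuracy(predictor, trace):
    """Fraction of branches predicted correctly, training the table after each one."""
    correct = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(trace)

# Two hypothetical branches with opposite behavior, executed alternately.
BRANCH_A, BRANCH_B = 0x1000, 0x1040
trace = [(BRANCH_A, True), (BRANCH_B, False)] * 100

small = BimodalPredictor(index_bits=4)  # both branches map to entry 0
large = BimodalPredictor(index_bits=8)  # the branches map to different entries

small_accuracy = accuracy(small, trace)
large_accuracy = accuracy(large, trace)
```

In the 16-entry table the two branches share one counter and keep undoing each other's training, so every prediction is wrong on this trace; in the 256-entry table each branch has its own counter and only the first prediction for `BRANCH_A` misses.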

    Miss Rate Prediction across All Program Inputs

    Improving cache performance requires understanding cache behavior. However, measuring cache performance for one or two data input sets provides little insight into how cache behavior varies across all data input sets. This paper uses our recently published locality analysis to generate a parameterized model of program cache behavior. Given a cache size and associativity, this model predicts the miss rate for arbitrary data input set sizes. This model also identifies critical data input sizes where cache behavior exhibits marked changes. Experiments show this technique is within 2% of the hit rate for set associative caches on a set of integer and floating-point programs.
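To make the link between locality and miss rate concrete, here is a minimal sketch based on LRU stack (reuse) distance, one common form of locality analysis. It is offered under stated assumptions and is not the paper's parameterized model: it measures a given trace rather than predicting across input sizes, it covers only a fully associative LRU cache rather than the set associative caches the abstract mentions, and the helper names and the cyclic-sweep trace are hypothetical.

```python
def stack_distances(trace):
    """For each access, count the distinct blocks touched since the last use
    of the same block; None marks a first (cold) access."""
    stack = []  # least recently used at the front, most recent at the end
    distances = []
    for block in trace:
        if block in stack:
            position = stack.index(block)
            distances.append(len(stack) - 1 - position)
            stack.pop(position)
        else:
            distances.append(None)
        stack.append(block)
    return distances

def miss_rate(distances, cache_blocks):
    """Miss rate of a fully associative LRU cache holding cache_blocks blocks.
    An access hits exactly when its stack distance is below the capacity."""
    misses = sum(1 for d in distances if d is None or d >= cache_blocks)
    return misses / len(distances)

def cyclic_sweep(data_blocks, passes):
    """Hypothetical program behavior: repeatedly sweep an array of data_blocks blocks."""
    return list(range(data_blocks)) * passes

CACHE_BLOCKS = 4
fits = miss_rate(stack_distances(cyclic_sweep(4, 3)), CACHE_BLOCKS)
overflows = miss_rate(stack_distances(cyclic_sweep(5, 3)), CACHE_BLOCKS)
```

For this toy access pattern, growing the data from 4 to 5 blocks moves the miss rate from one third (cold misses only) to 1.0, a small-scale instance of the critical input sizes the abstract refers to.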

    Miss rate prediction across program inputs and cache configurations

    Improving cache performance requires understanding cache behavior. However, measuring cache performance for one or two data input sets provides little insight into how cache behavior varies across all data input sets and all cache configurations. This paper uses locality analysis to generate a parameterized model of program cache behavior. Given a cache size and associativity, this model predicts the miss rate for arbitrary data input set sizes. This model also identifies critical data input sizes where cache behavior exhibits marked changes. Experiments show this technique is within 2 percent of the hit rate for set associative caches on a set of floating-point and integer programs using array and pointer-based data structures. Building on the new model, this paper presents an interactive visualization tool that uses a three-dimensional plot to show miss rate changes across program data sizes and cache sizes and its use in evaluating compiler transformations. Other uses of this visualization tool include assisting machine and benchmark-set design. The tool can be accessed on the Web a

    Linguistic Support for Heterogeneous Parallel Processing: A Survey and an Approach

    Coding a highly parallel application to run on a heterogeneous suite of processors (both metacomputers and mixed-mode computers) with high efficiency, ease of implementation, and portability is a significant challenge. This paper first surveys recently proposed and existing parallel languages from the perspective of programming complex, heterogeneous systems. We then propose two essential features to be included in programming languages that are intended to support heterogeneity.

    1. Introduction

    Recent examples have shown the success of combining heterogeneous computing hardware to solve complex problems [1, 2]. When the architecture of the machine matches the structure of the problem, the algorithmic solution is often easier to develop and can execute more efficiently. The currently popular practice of networking existing machines is merely a beginning [3]. As hardware continues to diminish in size and cost, new possibilities are being created for systems that are heterogeneous by..

    Miss Rate Prediction Across Program Inputs and Cache Configurations
