87 research outputs found
A survey of dynamic power optimization techniques
One of the most important considerations for the current VLSI/SOC design is power, which can be classified into power analysis and optimization. In this survey, the main concepts of power optimization including the sources and policies are introduced. Among the various approaches, dynamic power management (DPM), which implies to change devices states when they are not working at the highest speed or at their full capacity, is the most efficient one. Our explanations accompanying the figures specify the abstract concepts of DPM. This paper briefly surveys both heuristic and stochastic policies and discusses their advantages and disadvantages
Harnessing resilience: biased voltage overscaling for probabilistic signal processing
A central component of modern computing is the idea that computation requires
determinism. Contrary to this belief, the primary contribution of this work shows that
useful computation can be accomplished in an error-prone fashion. Focusing on low-power
computing and the increasing push toward energy conservation, the work seeks to sacrifice
accuracy in exchange for energy savings.
Probabilistic computing forms the basis for this error-prone computation by diverging from the requirement of determinism and allowing for randomness within computing.
Implemented as probabilistic CMOS (PCMOS), the approach realizes enormous energy sav-
ings in applications that require probability at an algorithmic level. Extending probabilistic
computing to applications that are inherently deterministic, the biased voltage overscaling
(BIVOS) technique presented here constrains the randomness introduced through PCMOS.
Doing so, BIVOS is able to limit the magnitude of any resulting deviations and realizes
energy savings with minimal impact to application quality.
Implemented for a ripple-carry adder, array multiplier, and finite-impulse-response (FIR)
filter; a BIVOS solution substantially reduces energy consumption and does so with im-
proved error rates compared to an energy equivalent reduced-precision solution. When
applied to H.264 video decoding, a BIVOS solution is able to achieve a 33.9% reduction in
energy consumption while maintaining a peak-signal-to-noise ratio of 35.0dB (compared to
14.3dB for a comparable reduced-precision solution).
While the work presented here focuses on a specific technology, the technique realized
through BIVOS has far broader implications. It is the departure from the conventional
mindset that useful computation requires determinism that represents the primary innovation of this work. With applicability to emerging and yet to be discovered technologies,
BIVOS has the potential to contribute to computing in a variety of fashions.PhDCommittee Chair: Anderson, David; Committee Member: Conte, Thomas; Committee Member: Ferri, Bonnie; Committee Member: Hasler, Paul; Committee Member: Mooney, Vincen
Multicore Performance Optimization Using Partner Cores
As the push for parallelism continues to increase the number of cores on a chip, and add to the complexity of system design, the task of optimizing performance at the application level becomes nearly impossible for the programmer. Much effort has been spent on developing techniques for optimizing performance at runtime, but many techniques for modern processors employ the use of speculative threads or performance counters. These approaches result in stolen cycles, or the use of an extra core, and such expensive penalties put demanding constraints on the gains provided by such methods. While processors have grown in power and complexity, the technology for small, efficient cores has emerged. We introduce the concept of Partner Cores for maximizing hardware power efficiency; these are low-area, low-power cores situated on-die, tightly coupled to each main processor core. We demonstrate that such cores enable performance improvement without incurring expensive penalties, and carry out potential applications that are impossible on a traditional chip multiprocessor
Design of asynchronous microprocessor for power proportionality
PhD ThesisMicroprocessors continue to get exponentially cheaper for end users following Moore’s
law, while the costs involved in their design keep growing, also at an exponential rate.
The reason is the ever increasing complexity of processors, which modern EDA tools
struggle to keep up with. This makes further scaling for performance subject to a high
risk in the reliability of the system. To keep this risk low, yet improve the performance,
CPU designers try to optimise various parts of the processor. Instruction Set Architecture
(ISA) is a significant part of the whole processor design flow, whose optimal design
for a particular combination of available hardware resources and software requirements
is crucial for building processors with high performance and efficient energy utilisation.
This is a challenging task involving a lot of heuristics and high-level design decisions.
Another issue impacting CPU reliability is continuous scaling for power consumption. For
the last decades CPU designers have been mainly focused on improving performance, but
“keeping energy and power consumption in mind”. The consequence of this was a development
of energy-efficient systems, where energy was considered as a resource whose
consumption should be optimised. As CMOS technology was progressing, with feature
size decreasing and power delivered to circuit components becoming less stable, the
energy resource turned from an optimisation criterion into a constraint, sometimes a critical
one. At this point power proportionality becomes one of the most important aspects
in system design. Developing methods and techniques which will address the problem
of designing a power-proportional microprocessor, capable to adapt to varying operating
conditions (such as low or even unstable voltage levels) and application requirements in
the runtime, is one of today’s grand challenges. In this thesis this challenge is addressed
by proposing a new design flow for the development of an ISA for microprocessors, which
can be altered to suit a particular hardware platform or a specific operating mode. This
flow uses an expressive and powerful formalism for the specification of processor instruction
sets called the Conditional Partial Order Graph (CPOG). The CPOG model captures
large sets of behavioural scenarios for a microarchitectural level in a computationally
efficient form amenable to formal transformations for synthesis, verification and automated
derivation of asynchronous hardware for the CPU microcontrol. The feasibility of
the methodology, novel design flow and a number of optimisation techniques was proven
in a full size asynchronous Intel 8051 microprocessor and its demonstrator silicon. The
chip showed the ability to work in a wide range of operating voltage and environmental
conditions. Depending on application requirements and power budget our ASIC supports
several operating modes: one optimised for energy consumption and the other one for
performance. This was achieved by extending a traditional datapath structure with an
auxiliary control layer for adaptable and fault tolerant operation. These and other optimisations
resulted in a reconfigurable and adaptable implementation, which was proven
by measurements, analysis and evaluation of the chip.EPSR
Optimality study of resource binding with multi-Vdds
Deploying multiple supply voltages (multi-Vdds) on one chip is an important technique to reduce dynamic power consumption. In this work we present an optimality study for resource binding targeting designs with multi-Vdds. This is similar to the voltage-island design concept, except that the granularity of our voltage island is on the functional-unit level as opposed to the core level. We are interested in achieving the maximum number of low-Vdd operations and, in the same time, minimizing switching activity during functional unit binding. To the best of our knowledge, there is no known optimal solution to this problem. To compute an optimal solution for this problem and examine the quality gap between our solution and previous heuristic solutions, we formulate this problem as a min-cost network flow problem, but with special equal-flow constraints. This formulation leads to an easy reduction to the integer linear programming (ILP) solution and also enables efficient approximate solution by Lagrangian relaxation. Experimental results show that the optimal solution computed based on our formulation provides 7% more low-Vdd operations and also reduces the total switching activity by 20% compared to one of the best known heuristic algorithms that consider multi-Vdd assignments only. Copyright 2006 ACM.EI
- …