31 research outputs found
Recommended from our members
Active timing margin management to improve microprocessor power efficiency
Improving power/performance efficiency is critical for today’s micro- processors. From edge devices to datacenters, lower power or higher performance always produces better systems, measured by lower cost of ownership or longer battery time. This thesis studies improving microprocessor power/performance efficiency by optimizing the pipeline timing margin. In particular, this thesis focuses on improving the efficacy of Active Timing Margin, a young technology that dynamically adjusts the margin.
Active timing margin trims down the pipeline timing margin with a control loop that adjusts voltage and frequency based on real-time chip environment monitoring. The key insight of this thesis is that in order to maximize active timing margin’s efficiency enhancement benefits, synergistic management from processor architecture design and system software scheduling are needed. To that end, this thesis covers the major consumers of pipeline timing margin, including temperature, voltage, and process variation. For temperature variation, the thesis proposes a table-lookup based active timing margin mechanism, and an associated temperature management scheme to minimize power consumption. For voltage variation, the thesis characterizes the limiting factors of adaptive clocking’s power saving and proposes application scheduling to maximize total system power reduction. For process variation, the thesis proposes core-level adaptive clocking reconfiguration to automatically expose inter-core variation and discusses workload scheduling and throttling management to control critical application performance.
The author believes the optimization presented in this thesis can potentially benefit a variety of processor architectures as the conclusions are based on the solid measurement on state-of-the-art processors, and the research objective, active timing margin, already has wide applicability in the latest microprocessors by the time this thesis is written.Electrical and Computer Engineerin
Exceeding Conservative Limits: A Consolidated Analysis on Modern Hardware Margins
Modern large-scale computing systems (data centers, supercomputers, cloud and
edge setups and high-end cyber-physical systems) employ heterogeneous
architectures that consist of multicore CPUs, general-purpose many-core GPUs,
and programmable FPGAs. The effective utilization of these architectures poses
several challenges, among which a primary one is power consumption. Voltage
reduction is one of the most efficient methods to reduce power consumption of a
chip. With the galloping adoption of hardware accelerators (i.e., GPUs and
FPGAs) in large datacenters and other large-scale computing infrastructures, a
comprehensive evaluation of the safe voltage reduction levels for each
different chip can be employed for efficient reduction of the total power. We
present a survey of recent studies in voltage margins reduction at the system
level for modern CPUs, GPUs and FPGAs. The pessimistic voltage guardbands
inserted by the silicon vendors can be exploited in all devices for significant
power savings. On average, voltage reduction can reach 12% in multicore CPUs,
20% in manycore GPUs and 39% in FPGAs.Comment: Accepted for publication in IEEE Transactions on Device and Materials
Reliabilit
Scrooge Attack: Undervolting ARM Processors for Profit
Latest ARM processors are approaching the computational power of x86
architectures while consuming much less energy. Consequently, supply follows
demand with Amazon EC2, Equinix Metal and Microsoft Azure offering ARM-based
instances, while Oracle Cloud Infrastructure is about to add such support. We
expect this trend to continue, with an increasing number of cloud providers
offering ARM-based cloud instances.
ARM processors are more energy-efficient leading to substantial electricity
savings for cloud providers. However, a malicious cloud provider could
intentionally reduce the CPU voltage to further lower its costs. Running
applications malfunction when the undervolting goes below critical thresholds.
By avoiding critical voltage regions, a cloud provider can run undervolted
instances in a stealthy manner.
This practical experience report describes a novel attack scenario: an attack
launched by the cloud provider against its users to aggressively reduce the
processor voltage for saving energy to the last penny. We call it the Scrooge
Attack and show how it could be executed using ARM-based computing instances.
We mimic ARM-based cloud instances by deploying our own ARM-based devices using
different generations of Raspberry Pi. Using realistic and synthetic workloads,
we demonstrate to which degree of aggressiveness the attack is relevant. The
attack is unnoticeable by our detection method up to an offset of -50mV. We
show that the attack may even remain completely stealthy for certain workloads.
Finally, we propose a set of client-based detection methods that can identify
undervolted instances. We support experimental reproducibility and provide
instructions to reproduce our results.Comment: European Commission Project: LEGaTO - Low Energy Toolset for
Heterogeneous Computing (EC-H2020-780681
Cross-Layer Approaches for an Aging-Aware Design of Nanoscale Microprocessors
Thanks to aggressive scaling of transistor dimensions, computers have revolutionized our life. However, the increasing unreliability of devices fabricated in nanoscale technologies emerged as a major threat for the future success of computers. In particular, accelerated transistor aging is of great importance, as it reduces the lifetime of digital systems. This thesis addresses this challenge by proposing new methods to model, analyze and mitigate aging at microarchitecture-level and above
ParaDox: Eliminating Voltage Margins via Heterogeneous Fault Tolerance.
Providing reliability is becoming a challenge for chip manufacturers, faced with simultaneously trying to improve miniaturization, performance and energy efficiency. This leads to very large margins on voltage and frequency, designed to avoid errors even in the worst case, along with significant hardware expenditure on eliminating voltage spikes and other forms of transient error, causing considerable inefficiency in power consumption and performance. We flip traditional ideas about reliability and performance around, by exploring the use of error resilience for power and performance gains. ParaMedic is a recent architecture that provides a solution for reliability with low overheads via automatic hardware error recovery. It works by splitting up checking onto many small cores in a heterogeneous multicore system with hardware logging support. However, its design is based on the idea that errors are exceptional. We transform ParaMedic into ParaDox, which shows high performance in both error-intensive and scarce-error scenarios, thus allowing correct execution even when undervolted and overclocked. Evaluation within error-intensive simulation environments confirms the error resilience of ParaDox and the low associated recovery cost. We estimate that compared to a non-resilient system with margins, ParaDox can reduce energy-delay product by 15% through undervolting, while completely recovering from any induced errors
Software Canaries: Software-based Path Delay Fault Testing for Variation-aware Energy-efficient Design
ABSTRACT Software-based path delay fault testing (SPDFT) has been used to identify faulty chips that cannot meet timing constraints due to gross delay defects. In this paper, we propose using SPDFT for a new purpose -aggressively selecting the operating point of a variation-affected design. In order to use SPDFT for this purpose, test routines must provide high coverage of potentially-critical paths and must have low dynamic performance overhead. We describe how to apply SPDFT for selecting an energy-efficient operating point for a variation-affected processor and demonstrate that our test routines achieve ample coverage and low overhead
Intelligent systems for efficiency and security
As computing becomes ubiquitous and personalized, resources like energy, storage and time are becoming increasingly scarce and, at the same time, computing systems must deliver in multiple dimensions, such as high performance, quality of service, reliability, security and low power. Building such computers is hard, particularly when the operating environment is becoming more dynamic, and systems are becoming heterogeneous and distributed.
Unfortunately, computers today manage resources with many ad hoc heuristics that are suboptimal, unsafe, and cannot be composed across the computer’s subsystems. Continuing this approach has severe consequences: underperforming systems, resource waste, information loss, and even life endangerment.
This dissertation research develops computing systems which, through intelligent adaptation, deliver efficiency along multiple dimensions. The key idea is to manage computers with principled methods from formal control. It is with these methods that the multiple subsystems of a computer sense their environment and configure themselves to meet system-wide goals.
To achieve the goal of intelligent systems, this dissertation makes a series of contributions, each building on the previous. First, it introduces the use of formal MIMO (Multiple Input Multiple Output) control for processors, to simultaneously optimize many goals like performance, power, and temperature. Second, it develops the Yukta control system, which uses coordinated formal controllers in different layers of the stack (hardware and operating system). Third, it uses robust control to develop a fast, globally coordinated and decentralized control framework called Tangram, for heterogeneous computers. Finally, it presents Maya, a defense against power side-channel attacks that uses formal control to reshape the power dissipated by a computer, confusing the attacker. The ideas in the dissertation have been demonstrated successfully with several prototypes, including one built along with AMD (Advanced Micro Devices, Inc.) engineers. These designs significantly outperformed the state of the art.
The research in this dissertation brought formal control closer to computer architecture and has been well-received in both domains. It has the first application of full-fledged MIMO control for processors, the first use of robust control in computer systems, and the first application of formal control for side-channel defense. It makes a significant stride towards intelligent systems that are efficient, secure and reliable