31 research outputs found

    Exceeding Conservative Limits: A Consolidated Analysis on Modern Hardware Margins

    Get PDF
    Modern large-scale computing systems (data centers, supercomputers, cloud and edge setups and high-end cyber-physical systems) employ heterogeneous architectures that consist of multicore CPUs, general-purpose many-core GPUs, and programmable FPGAs. The effective utilization of these architectures poses several challenges, among which a primary one is power consumption. Voltage reduction is one of the most efficient methods to reduce power consumption of a chip. With the galloping adoption of hardware accelerators (i.e., GPUs and FPGAs) in large datacenters and other large-scale computing infrastructures, a comprehensive evaluation of the safe voltage reduction levels for each different chip can be employed for efficient reduction of the total power. We present a survey of recent studies in voltage margins reduction at the system level for modern CPUs, GPUs and FPGAs. The pessimistic voltage guardbands inserted by the silicon vendors can be exploited in all devices for significant power savings. On average, voltage reduction can reach 12% in multicore CPUs, 20% in manycore GPUs and 39% in FPGAs.Comment: Accepted for publication in IEEE Transactions on Device and Materials Reliabilit

    Scrooge Attack: Undervolting ARM Processors for Profit

    Full text link
    Latest ARM processors are approaching the computational power of x86 architectures while consuming much less energy. Consequently, supply follows demand with Amazon EC2, Equinix Metal and Microsoft Azure offering ARM-based instances, while Oracle Cloud Infrastructure is about to add such support. We expect this trend to continue, with an increasing number of cloud providers offering ARM-based cloud instances. ARM processors are more energy-efficient leading to substantial electricity savings for cloud providers. However, a malicious cloud provider could intentionally reduce the CPU voltage to further lower its costs. Running applications malfunction when the undervolting goes below critical thresholds. By avoiding critical voltage regions, a cloud provider can run undervolted instances in a stealthy manner. This practical experience report describes a novel attack scenario: an attack launched by the cloud provider against its users to aggressively reduce the processor voltage for saving energy to the last penny. We call it the Scrooge Attack and show how it could be executed using ARM-based computing instances. We mimic ARM-based cloud instances by deploying our own ARM-based devices using different generations of Raspberry Pi. Using realistic and synthetic workloads, we demonstrate to which degree of aggressiveness the attack is relevant. The attack is unnoticeable by our detection method up to an offset of -50mV. We show that the attack may even remain completely stealthy for certain workloads. Finally, we propose a set of client-based detection methods that can identify undervolted instances. We support experimental reproducibility and provide instructions to reproduce our results.Comment: European Commission Project: LEGaTO - Low Energy Toolset for Heterogeneous Computing (EC-H2020-780681

    Cross-Layer Approaches for an Aging-Aware Design of Nanoscale Microprocessors

    Get PDF
    Thanks to aggressive scaling of transistor dimensions, computers have revolutionized our life. However, the increasing unreliability of devices fabricated in nanoscale technologies emerged as a major threat for the future success of computers. In particular, accelerated transistor aging is of great importance, as it reduces the lifetime of digital systems. This thesis addresses this challenge by proposing new methods to model, analyze and mitigate aging at microarchitecture-level and above

    Resource Management for Multicores to Optimize Performance under Temperature and Aging Constraints

    Get PDF

    ParaDox: Eliminating Voltage Margins via Heterogeneous Fault Tolerance.

    Get PDF
    Providing reliability is becoming a challenge for chip manufacturers, faced with simultaneously trying to improve miniaturization, performance and energy efficiency. This leads to very large margins on voltage and frequency, designed to avoid errors even in the worst case, along with significant hardware expenditure on eliminating voltage spikes and other forms of transient error, causing considerable inefficiency in power consumption and performance. We flip traditional ideas about reliability and performance around, by exploring the use of error resilience for power and performance gains. ParaMedic is a recent architecture that provides a solution for reliability with low overheads via automatic hardware error recovery. It works by splitting up checking onto many small cores in a heterogeneous multicore system with hardware logging support. However, its design is based on the idea that errors are exceptional. We transform ParaMedic into ParaDox, which shows high performance in both error-intensive and scarce-error scenarios, thus allowing correct execution even when undervolted and overclocked. Evaluation within error-intensive simulation environments confirms the error resilience of ParaDox and the low associated recovery cost. We estimate that compared to a non-resilient system with margins, ParaDox can reduce energy-delay product by 15% through undervolting, while completely recovering from any induced errors

    Software Canaries: Software-based Path Delay Fault Testing for Variation-aware Energy-efficient Design

    Get PDF
    ABSTRACT Software-based path delay fault testing (SPDFT) has been used to identify faulty chips that cannot meet timing constraints due to gross delay defects. In this paper, we propose using SPDFT for a new purpose -aggressively selecting the operating point of a variation-affected design. In order to use SPDFT for this purpose, test routines must provide high coverage of potentially-critical paths and must have low dynamic performance overhead. We describe how to apply SPDFT for selecting an energy-efficient operating point for a variation-affected processor and demonstrate that our test routines achieve ample coverage and low overhead

    Intelligent systems for efficiency and security

    Get PDF
    As computing becomes ubiquitous and personalized, resources like energy, storage and time are becoming increasingly scarce and, at the same time, computing systems must deliver in multiple dimensions, such as high performance, quality of service, reliability, security and low power. Building such computers is hard, particularly when the operating environment is becoming more dynamic, and systems are becoming heterogeneous and distributed. Unfortunately, computers today manage resources with many ad hoc heuristics that are suboptimal, unsafe, and cannot be composed across the computer’s subsystems. Continuing this approach has severe consequences: underperforming systems, resource waste, information loss, and even life endangerment. This dissertation research develops computing systems which, through intelligent adaptation, deliver efficiency along multiple dimensions. The key idea is to manage computers with principled methods from formal control. It is with these methods that the multiple subsystems of a computer sense their environment and configure themselves to meet system-wide goals. To achieve the goal of intelligent systems, this dissertation makes a series of contributions, each building on the previous. First, it introduces the use of formal MIMO (Multiple Input Multiple Output) control for processors, to simultaneously optimize many goals like performance, power, and temperature. Second, it develops the Yukta control system, which uses coordinated formal controllers in different layers of the stack (hardware and operating system). Third, it uses robust control to develop a fast, globally coordinated and decentralized control framework called Tangram, for heterogeneous computers. Finally, it presents Maya, a defense against power side-channel attacks that uses formal control to reshape the power dissipated by a computer, confusing the attacker. The ideas in the dissertation have been demonstrated successfully with several prototypes, including one built along with AMD (Advanced Micro Devices, Inc.) engineers. These designs significantly outperformed the state of the art. The research in this dissertation brought formal control closer to computer architecture and has been well-received in both domains. It has the first application of full-fledged MIMO control for processors, the first use of robust control in computer systems, and the first application of formal control for side-channel defense. It makes a significant stride towards intelligent systems that are efficient, secure and reliable
    corecore