Investigation of LSTM Based Prediction for Dynamic Energy Management in Chip Multiprocessors
In this paper, we investigate the effectiveness of using long short-term memory (LSTM) instead of Kalman filtering for prediction when constructing dynamic energy management (DEM) algorithms in chip multiprocessors (CMPs). Either of the two prediction methods is employed to estimate the workload in the next control period for each of the processor cores. These estimates are then used to select voltage-frequency (VF) pairs for each core of the CMP during the next control period as part of a dynamic voltage and frequency scaling (DVFS) technique. The objective of the DVFS technique is to reduce energy consumption under performance constraints that are set by the user. We conduct our investigation using a custom Sniper system simulation framework. Simulation results for 16- and 64-core network-on-chip based CMP architectures, using several benchmarks, demonstrate that the LSTM is slightly better than Kalman filtering.
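The predict-then-select loop described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the random-walk workload model, the noise variances, the `VF_PAIRS` table, and the names `ScalarKalman` and `select_vf` are all assumptions made for the example, and the LSTM predictor is not shown.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """Random-walk Kalman filter for one core's workload (assumed model)."""
    q: float = 0.01   # process noise variance (hypothetical value)
    r: float = 0.1    # measurement noise variance (hypothetical value)
    x: float = 0.0    # current workload estimate
    p: float = 1.0    # variance of the estimate

    def update(self, measured: float) -> float:
        # Predict step: a random walk keeps x; uncertainty grows by q.
        p_pred = self.p + self.q
        # Correct step: blend prediction and measurement via the Kalman gain.
        k = p_pred / (p_pred + self.r)
        self.x = self.x + k * (measured - self.x)
        self.p = (1.0 - k) * p_pred
        # Under a random walk, the next-period prediction equals the estimate.
        return self.x


# Hypothetical VF table: (voltage in V, frequency in GHz), sorted by frequency.
VF_PAIRS = [(0.8, 1.0), (0.9, 1.5), (1.0, 2.0), (1.1, 2.5)]


def select_vf(predicted_cycles: float, period_s: float):
    """Pick the lowest-frequency pair that completes the predicted work in one period."""
    needed_hz = predicted_cycles / period_s
    for volt, ghz in VF_PAIRS:
        if ghz * 1e9 >= needed_hz:
            return volt, ghz
    # No pair is fast enough: fall back to the highest setting.
    return VF_PAIRS[-1]
```

In a per-core control loop, each period's measured cycle count would be passed to `update`, and the returned prediction to `select_vf`, to choose the setting for the next period.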
Scrooge Attack: Undervolting ARM Processors for Profit
Latest ARM processors are approaching the computational power of x86
architectures while consuming much less energy. Consequently, supply follows
demand with Amazon EC2, Equinix Metal and Microsoft Azure offering ARM-based
instances, while Oracle Cloud Infrastructure is about to add such support. We
expect this trend to continue, with an increasing number of cloud providers
offering ARM-based cloud instances.
ARM processors are more energy-efficient, leading to substantial electricity
savings for cloud providers. However, a malicious cloud provider could
intentionally reduce the CPU voltage to further lower its costs. Running
applications malfunction when the undervolting goes below critical thresholds.
By avoiding critical voltage regions, a cloud provider can run undervolted
instances in a stealthy manner.
This practical experience report describes a novel attack scenario: an attack
launched by the cloud provider against its users to aggressively reduce the
processor voltage for saving energy to the last penny. We call it the Scrooge
Attack and show how it could be executed using ARM-based computing instances.
We mimic ARM-based cloud instances by deploying our own ARM-based devices using
different generations of Raspberry Pi. Using realistic and synthetic workloads,
we demonstrate to which degree of aggressiveness the attack is relevant. The
attack is unnoticeable by our detection method up to an offset of -50mV. We
show that the attack may even remain completely stealthy for certain workloads.
Finally, we propose a set of client-based detection methods that can identify
undervolted instances. We support experimental reproducibility and provide
instructions to reproduce our results.
Comment: European Commission Project: LEGaTO - Low Energy Toolset for Heterogeneous Computing (EC-H2020-780681)
Throughput-Distortion Computation Of Generic Matrix Multiplication: Toward A Computation Channel For Digital Signal Processing Systems
The generic matrix multiply (GEMM) function is the core element of
high-performance linear algebra libraries used in many
computationally-demanding digital signal processing (DSP) systems. We propose
an acceleration technique for GEMM based on dynamically adjusting the
imprecision (distortion) of computation. Our technique employs adaptive scalar
companding and rounding to input matrix blocks followed by two forms of packing
in floating-point that allow for concurrent calculation of multiple results.
Since the adaptive companding process controls the increase of concurrency (via
packing), the increase in processing throughput (and the corresponding increase
in distortion) depends on the input data statistics. To demonstrate this, we
derive the optimal throughput-distortion control framework for GEMM for the
broad class of zero-mean, independent identically distributed, input sources.
Our approach converts matrix multiplication in programmable processors into a
computation channel: when increasing the processing throughput, the output
noise (error) increases due to (i) coarser quantization and (ii) computational
errors caused by exceeding the machine-precision limitations. We show that,
under certain distortion in the GEMM computation, the proposed framework can
significantly surpass 100% of the peak performance of a given processor. The
practical benefits of our proposal are shown in a face recognition system and a
multi-layer perceptron system trained for metadata learning from a large music
feature database.
Comment: IEEE Transactions on Signal Processing (vol. 60, 2012)
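The packing idea in this abstract, computing several results with one floating-point operation stream, can be illustrated on a single dot product, the inner kernel of GEMM. This is a simplified sketch under stated assumptions: the helper names `quantize` and `packed_dot`, the fixed `shift_bits`, and the uniform rounding step are hypothetical and stand in for the paper's adaptive companding, which is not reproduced here.

```python
def quantize(values, step):
    """Uniform rounding to integers; a coarser step means more distortion."""
    return [round(v / step) for v in values]


def packed_dot(a1, a2, b, shift_bits=20):
    """Compute dot(a1, b) and dot(a2, b) with one floating-point dot product.

    Inputs are small integers (already quantized). The result is exact only
    while |dot(a1, b)| < 2**(shift_bits - 1) and the packed sum stays below
    2**53, the limit of the float64 mantissa.
    """
    scale = float(2 ** shift_bits)
    # Pack two operands into one float: low part a1, high part a2.
    packed = [float(x) + scale * float(y) for x, y in zip(a1, a2)]
    # A single multiply-accumulate pass carries both results.
    r = sum(p * float(v) for p, v in zip(packed, b))
    high = round(r / scale)      # recovers dot(a2, b)
    low = r - high * scale       # recovers dot(a1, b)
    return int(low), int(high)
```

Coarser quantization shrinks the integer range, which leaves room to pack more operands per float; exceeding the stated bounds corrupts the unpacked results, which corresponds to the machine-precision error source the abstract mentions.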
Computational Sprinting: Exceeding Sustainable Power in Thermally Constrained Systems
Although process technology trends predict that transistor sizes will continue to shrink for a few more generations, voltage scaling has stalled, and thus future chips are projected to be increasingly power-hungry compared with previous generations. Particularly in mobile devices, which are severely cooling-constrained, it is estimated that the peak operation of a future chip could generate heat ten times faster than the device can sustainably vent.
However, many mobile applications do not demand sustained performance; rather they comprise short bursts of computation in response to sporadic user activity. To improve responsiveness for such applications, this dissertation proposes computational sprinting, in which a system greatly exceeds sustainable power margins (by up to 10×) to provide up to a few seconds of high-performance computation when a user interacts with the device. Computational sprinting exploits the material property of thermal capacitance to temporarily store the excess heat generated when sprinting. After sprinting, the chip returns to sustainable power levels and dissipates the stored heat when the system is idle.
This dissertation: (i) broadly analyzes thermal, electrical, hardware, and software considerations to assess the feasibility of engineering a system which can provide the responsiveness of a platform with 10× higher sustainable power within today's cooling constraints, (ii) leverages existing sources of thermal capacitance to demonstrate sprinting on a real system today, and (iii) identifies the energy-performance characteristics of sprinting operation to determine runtime sprint pacing policies.
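The role of thermal capacitance can be made concrete with a back-of-the-envelope estimate of how long a sprint can last. The sketch below assumes a single lumped thermal mass that sheds heat at the sustainable rate while sprinting; the function name and all numeric values are hypothetical and are not taken from the dissertation.

```python
def sprint_duration_s(heat_capacity_j_per_k, temp_headroom_k, sprint_w, sustainable_w):
    """Seconds until the lumped thermal mass uses up its temperature headroom."""
    # Only the power above the sustainable level accumulates as stored heat.
    excess_w = sprint_w - sustainable_w
    if excess_w <= 0:
        # At or below sustainable power the system never overheats.
        return float("inf")
    # Stored energy budget (J) divided by the rate of accumulation (W).
    return heat_capacity_j_per_k * temp_headroom_k / excess_w


# Hypothetical numbers: 0.5 J/K of thermal capacitance, 20 K of headroom,
# a 1 W sustainable budget, and a 10x sprint at 10 W.
duration = sprint_duration_s(0.5, 20.0, 10.0, 1.0)
```

With these assumed values the budget is 10 J against 9 W of excess power, so the sprint lasts a little over one second, the same order of magnitude as the "few seconds" the abstract describes.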