Generalizing Amdahl’s Law for Power and Energy
Extending Amdahl's law to identify optimal power-performance configurations requires considering the interactive effects of power, performance, and parallel overhead.
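To make the trade-off concrete, here is a sketch in the spirit of the many-core energy extension of Amdahl's law by Woo and Lee; the parallel fraction f and the idle-power fraction k are modeling assumptions for illustration, not parameters taken from this paper.

```latex
% Hedged sketch: Amdahl's law extended with power. Assume a parallel
% fraction f, n identical cores, and idle cores drawing a fraction k of
% active-core power; time and power are normalized to one active core.
\[
  T(n) = (1-f) + \frac{f}{n}, \qquad S(n) = \frac{1}{T(n)}
\]
% Energy: the sequential phase runs one core with n-1 idle cores; the
% parallel phase runs n cores at power n for time f/n.
\[
  E(n) = (1-f)\bigl[1 + (n-1)k\bigr] + f
\]
% Performance per watt relative to a single core then collapses to 1/E(n):
\[
  \frac{\mathrm{Perf}}{\mathrm{W}}(n) = \frac{S(n)}{E(n)/T(n)} = \frac{1}{E(n)}
\]
```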
PowerPack: Energy Profiling and Analysis of High-Performance Systems and Applications
Energy efficiency is a major concern in modern high-performance computing system design. In the past few years, there has been mounting evidence that power usage limits system scale and computing density, and thus, ultimately, system performance. However, despite the impact of power and energy on the computer systems community, few studies provide insight into where and how power is consumed on high-performance systems and applications. In previous work, we designed a framework called PowerPack that was the first tool to isolate the power consumption of devices including disks, memory, NICs, and processors in a high-performance cluster and correlate these measurements to application functions. In this work, we extend our framework to support systems with multicore, multiprocessor-based nodes, and then provide in-depth analyses of the energy consumption of parallel applications on clusters of these systems. These analyses include the impacts of chip multiprocessing on power and energy efficiency, and its interaction with application execution. In addition, we use PowerPack to study the power dynamics and energy efficiencies of dynamic voltage and frequency scaling (DVFS) techniques on clusters. Our experiments reveal conclusively how intelligent DVFS scheduling can enhance system energy efficiency while maintaining performance.
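To illustrate the kind of device-level power-to-function correlation described above, here is a minimal sketch: it integrates timestamped per-component power samples over each function's execution window. The sample layout and function markers are illustrative assumptions, not PowerPack's actual interfaces.

```python
# Hedged sketch: correlate timestamped per-component power samples with
# application function boundaries and integrate energy per function.
# The data layout and names below are illustrative assumptions, not
# PowerPack's actual API.

def energy_per_function(samples, markers):
    """samples: list of (t, {"cpu": W, "memory": W, ...}) power readings.
       markers: list of (function_name, t_start, t_end) windows.
       Returns {function_name: {component: joules}} via trapezoidal integration."""
    result = {}
    for name, t0, t1 in markers:
        window = [(t, p) for t, p in samples if t0 <= t <= t1]
        energy = {}
        for (ta, pa), (tb, pb) in zip(window, window[1:]):
            dt = tb - ta
            for comp in pa:
                energy[comp] = energy.get(comp, 0.0) + 0.5 * (pa[comp] + pb[comp]) * dt
        result[name] = energy
    return result

# Example: two components sampled at 1 Hz across a 2 s function window.
samples = [(0, {"cpu": 40.0, "memory": 8.0}),
           (1, {"cpu": 45.0, "memory": 9.0}),
           (2, {"cpu": 44.0, "memory": 9.5})]
markers = [("mpi_allreduce", 0, 2)]
print(energy_per_function(samples, markers))
```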
COLAB: A Collaborative Multi-factor Scheduler for Asymmetric Multicore Processors
Funding: Partially funded by the UK EPSRC grants Discovery: Pattern Discovery and Program Shaping for Many-core Systems (EP/P020631/1) and ABC: Adaptive Brokerage for Cloud (EP/R010528/1), and by the Royal Academy of Engineering under the Research Fellowship scheme.

Increasingly prevalent asymmetric multicore processors (AMPs) are necessary for delivering performance in the era of limited power budgets and dark silicon. However, software fails to use them efficiently. OS schedulers, in particular, handle asymmetry only under restricted scenarios. We have efficient symmetric schedulers, efficient asymmetric schedulers for single-threaded workloads, and efficient asymmetric schedulers for single-program workloads. What we do not have is a scheduler that can handle all runtime factors affecting AMPs for multi-threaded multi-programmed workloads. This paper introduces the first general-purpose asymmetry-aware scheduler for multi-threaded multi-programmed workloads. It estimates the performance of each thread on each type of core and identifies communication patterns and bottleneck threads. The scheduler then makes coordinated core-assignment and thread-selection decisions that still provide each application its fair share of the processor's time. We evaluate our approach using the GEM5 simulator on four distinct big.LITTLE configurations and 26 mixed workloads composed of PARSEC and SPLASH2 benchmarks. Compared to the state-of-the-art Linux CFS and AMP-aware schedulers, we demonstrate performance gains of up to 25%, and 5% to 15% on average depending on the hardware setup.
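As a rough illustration of coordinated multi-factor core assignment on an AMP, the sketch below ranks threads by a combined speedup/bottleneck score with a fairness tiebreak; the score formula and thread fields are invented for illustration and are not COLAB's published algorithm.

```python
# Hedged sketch of a multi-factor AMP core-assignment pass: rank runnable
# threads by (predicted big-core speedup x bottleneck weight), give big
# cores to the top-ranked threads, and break ties in favor of threads that
# have received less big-core time (fairness). All fields and weights are
# illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Thread:
    tid: int
    app: str
    big_speedup: float      # predicted perf(big) / perf(little)
    is_bottleneck: bool     # e.g., holds a lock other threads wait on
    big_core_time: float    # big-core time already received

def assign_cores(threads, n_big, n_little):
    def score(t):
        factor = 2.0 if t.is_bottleneck else 1.0        # prioritize bottlenecks
        return (t.big_speedup * factor, -t.big_core_time)  # fairness tiebreak
    ranked = sorted(threads, key=score, reverse=True)
    big = [t.tid for t in ranked[:n_big]]
    little = [t.tid for t in ranked[n_big:n_big + n_little]]
    return big, little

threads = [
    Thread(1, "A", big_speedup=1.8, is_bottleneck=False, big_core_time=3.0),
    Thread(2, "A", big_speedup=1.2, is_bottleneck=True,  big_core_time=0.5),
    Thread(3, "B", big_speedup=1.6, is_bottleneck=False, big_core_time=0.0),
]
print(assign_cores(threads, n_big=1, n_little=2))  # bottleneck thread 2 gets the big core
```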
JALAD: Joint Accuracy- and Latency-Aware Deep Structure Decoupling for Edge-Cloud Execution
Recent years have witnessed rapid growth in deep-network-based services and applications. A practical and critical problem has thus emerged: how to effectively deploy deep neural network models so that they can be executed efficiently. Conventional cloud-based approaches usually run the deep models in data center servers, causing large latency because a significant amount of data has to be transferred from the network edge to the data center. In this paper, we propose JALAD, a joint accuracy- and latency-aware execution framework that decouples a deep neural network so that one part runs at edge devices and the other part inside the conventional cloud, while only a minimal amount of data has to be transferred between them. Though the idea seems straightforward, we face several challenges: i) how to find the best partition of a deep structure; ii) how to deploy the component at an edge device that has only limited computational power; and iii) how to minimize the overall execution latency. Our answers to these questions are a set of strategies in JALAD, including 1) a normalization-based in-layer data compression strategy that jointly considers compression rate and model accuracy; 2) a latency-aware deep decoupling strategy to minimize the overall execution latency; and 3) an edge-cloud structure adaptation strategy that dynamically changes the decoupling for different network conditions. Experiments demonstrate that our solution can significantly reduce execution latency: it speeds up overall inference while keeping model accuracy loss within a guaranteed bound.
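The core of the latency-aware decoupling step is a search over layer boundaries. The sketch below illustrates that search under simple assumptions (fixed per-layer latencies, a single compression ratio, one link); all numbers and parameter names are hypothetical, not JALAD's published strategy.

```python
# Hedged sketch of latency-aware DNN partitioning: pick the layer boundary
# that minimizes edge compute + (compressed) transfer + cloud compute time.
# The per-layer numbers and the compression ratio are illustrative
# assumptions, not JALAD's measured values.

def best_split(edge_ms, cloud_ms, out_mb, input_mb, bandwidth_mbps, compression=4.0):
    """edge_ms[i]/cloud_ms[i]: latency (ms) of layer i on the edge / in the cloud.
       out_mb[i]: layer i's output size in megabits; input_mb: raw input size.
       Splitting after layer k runs layers 0..k at the edge, the rest in the cloud."""
    n = len(edge_ms)
    best = (None, float("inf"))
    for k in range(-1, n):  # k = -1: everything runs in the cloud
        size = input_mb if k < 0 else (0.0 if k == n - 1 else out_mb[k])
        transfer = size / compression / bandwidth_mbps * 1000.0  # ms
        total = sum(edge_ms[:k + 1]) + transfer + sum(cloud_ms[k + 1:])
        if total < best[1]:
            best = (k, total)
    return best

# Example with three layers over a 10 Mbps link: the search picks the
# boundary with the smallest (and usually most compressible) feature map.
print(best_split(edge_ms=[30, 50, 80], cloud_ms=[5, 8, 12],
                 out_mb=[4.0, 1.0, 0.1], input_mb=8.0, bandwidth_mbps=10))
```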
Profitable Task Allocation in Mobile Cloud Computing
We propose a game-theoretic framework for task allocation in mobile cloud computing that corresponds to offloading compute tasks to a group of nearby mobile devices. Specifically, in our framework, a distributor node holds a multidimensional auction for allocating the tasks of a job among nearby mobile nodes based on their computational capabilities and the cost of computation at these nodes, with the goal of reducing the overall job completion time. Our proposed auction also has the desired incentive-compatibility property, which ensures that mobile devices truthfully reveal their capabilities and costs and that those devices benefit from the task allocation. To deal with node mobility, we perform multiple auctions over adaptive time intervals. We develop a heuristic approach to dynamically find the best time intervals between auctions so as to minimize unnecessary auctions and the accompanying overheads. We evaluate our framework and methods using both real-world and synthetic mobility traces. Our evaluation results show that our game-theoretic framework improves job completion time by a factor of 2-5 compared to executing the job locally, while minimizing the number of auctions and the accompanying overheads. Our approach is also profitable for the nearby nodes that execute the distributor's tasks, with these nodes receiving compensation higher than their actual costs.
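The paper's multidimensional auction is richer than can be shown here, but its incentive-compatibility property can be illustrated with the classic second-price (Vickrey) mechanism, under which truthful cost reporting is a dominant strategy; the node names and costs below are hypothetical.

```python
# Hedged sketch: a second-price (Vickrey) reverse auction for one task.
# The winner is the lowest-cost bidder but is paid the second-lowest bid,
# so truthful cost reporting is a dominant strategy and the winner is
# compensated above its true cost. This is a simplified stand-in for the
# paper's multidimensional auction, not its actual design.

def vickrey_reverse_auction(bids):
    """bids: dict node_id -> reported cost of executing the task.
       Returns (winner, payment): the winner executes the task and receives
       the second-lowest reported cost, guaranteeing payment >= true cost."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1])
    (winner, _), (_, second_price) = ranked[0], ranked[1]
    return winner, second_price

bids = {"phone_a": 3.0, "phone_b": 5.0, "phone_c": 4.5}
winner, payment = vickrey_reverse_auction(bids)
print(winner, payment)  # phone_a wins and is paid 4.5 (> its cost of 3.0)
```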
Iso-energy-efficiency: An approach to power-constrained parallel computation
Future large-scale high-performance supercomputer systems require high energy efficiency to achieve exaflops computational power and beyond. Despite the need to understand energy efficiency in high-performance systems, there are few techniques to evaluate energy efficiency at scale. In this paper, we propose a system-level iso-energy-efficiency model to analyze, evaluate, and predict the energy and performance of data-intensive parallel applications with various execution patterns running on large-scale power-aware clusters. Our analytical model can help users explore the effects of machine- and application-dependent characteristics on system energy efficiency and isolate efficient ways to scale system parameters (e.g., processor count, CPU power/frequency, workload size, and network bandwidth) to balance energy use and performance. We derive our iso-energy-efficiency model and apply it to the NAS Parallel Benchmarks on two power-aware clusters. Our results indicate that the model accurately predicts total system energy consumption within 5% error on average for parallel applications with various execution and communication patterns. We demonstrate effective use of the model for various application contexts and in scalability decision-making.
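As a rough illustration of what an iso-energy-efficiency condition looks like, the sketch below uses invented notation (workload W, processor count p, and energy terms E_comp, E_comm, E_idle); it conveys the shape of the idea, not the paper's exact model.

```latex
% Hedged sketch of the iso-energy-efficiency idea, in illustrative
% notation. Define energy efficiency as useful work per joule on p
% processors running workload W:
\[
  EE(W, p) = \frac{W}{E_{\mathrm{comp}}(W, p) + E_{\mathrm{comm}}(W, p) + E_{\mathrm{idle}}(W, p)}
\]
% The iso-energy-efficiency question: how must the workload grow with the
% processor count to hold efficiency at a fixed level EE_0?
\[
  W = f(p) \quad \text{such that} \quad EE\bigl(f(p), p\bigr) = EE_0 .
\]
```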
PowerPlanningDL: Reliability-Aware Framework for On-Chip Power Grid Design using Deep Learning
With the increase in the complexity of chip designs, VLSI physical design has become a time-consuming, iterative design process. Power planning is the part of floorplanning in VLSI physical design where power grid networks are designed to provide adequate power to all the underlying functional blocks. Power planning also requires multiple iterative steps to create the power grid network while satisfying the allowed worst-case IR drop and electromigration (EM) margins. For the first time, this paper introduces a deep learning (DL)-based framework to approximately predict the initial design of the power grid network, considering different reliability constraints. The proposed framework reduces many iterative design steps and speeds up the total design cycle. A neural-network-based multi-target regression technique is used to create the DL model. Features are extracted, and the training dataset is generated, from the floorplans of some of the power grid designs extracted from the IBM processor. The DL model is trained using the generated dataset. The proposed DL-based framework is validated using a new set of power grid specifications (obtained by perturbing the designs used in the training phase). The results show that the predicted power grid design is close to the original design, with minimal prediction error (~2%). The proposed DL-based approach also improves design cycle time, with a speedup of ~6X for standard power grid benchmarks.

Published in the proceedings of the IEEE/ACM Design, Automation and Test in Europe Conference (DATE) 2020.
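As an illustration of neural-network-based multi-target regression in this setting, the sketch below maps hypothetical floorplan features to several power grid parameters at once using scikit-learn; the features, targets, and synthetic data are assumptions, not the paper's dataset.

```python
# Hedged sketch of neural-network multi-target regression for power grid
# prediction: floorplan features in, several grid parameters out. The
# feature and target choices, and the synthetic data, are illustrative
# assumptions, not the paper's dataset.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# X: per-design floorplan features, e.g. [block_count, total_current_A,
#    die_area_mm2, hotspot_density]; y: grid parameters to predict, e.g.
#    [stripe_width_um, stripe_pitch_um, via_count] (all hypothetical).
X = rng.uniform(size=(200, 4))
y = np.column_stack([
    2.0 * X[:, 1] + 0.5 * X[:, 3],   # wider stripes for higher current
    1.0 / (0.2 + X[:, 3]),           # tighter pitch in hotspot regions
    50.0 * X[:, 0] * X[:, 2],        # more vias on larger, busier dies
]) + rng.normal(scale=0.05, size=(200, 3))

model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0),
)
model.fit(X[:160], y[:160])                       # train / held-out split
print("held-out R^2:", model.score(X[160:], y[160:]))
```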