Search CORE

3,298 research outputs found

Faithful Performance Prediction of a Dynamic Task-Based Runtime System for Heterogeneous Multi-Core Architectures

Author: Legrand Arnaud
Méhaut Jean-François
Stanisic Luka
Thibault Samuel
Videau Brice
Publication venue: 'Wiley'
Publication date: 04/05/2015
Field of study

International audienceSUMMARY Multi-core architectures comprising several GPUs have become mainstream in the field of High-Performance Computing. However, obtaining the maximum performance of such heterogeneous machines is challenging as it requires to carefully offload computations and manage data movements between the different processing units. The most promising and successful approaches so far build on task-based runtimes that abstract the machine and rely on opportunistic scheduling algorithms. As a consequence, the problem gets shifted to choosing the task granularity, task graph structure, and optimizing the scheduling strategies. Trying different combinations of these different alternatives is also itself a challenge. Indeed, getting accurate measurements requires reserving the target system for the whole duration of experiments. Furthermore, observations are limited to the few available systems at hand and may be difficult to generalize. In this article, we show how we crafted a coarse-grain hybrid simulation/emulation of StarPU, a dynamic runtime for hybrid architectures, over SimGrid, a versatile simulator of distributed systems. This approach allows to obtain performance predictions of classical dense linear algebra kernels accurate within a few percents and in a matter of seconds, which allows both runtime and application designers to quickly decide which optimization to enable or whether it is worth investing in higher-end GPUs or not. Additionally, it allows to conduct robust and extensive scheduling studies in a controlled environment whose characteristics are very close to real platforms while having reproducible behavior

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

RPPM : Rapid Performance Prediction of Multithreaded workloads on multicore processors

Author: Akram Shoaib
De Pestel Sander
Eeckhout Lieven
Van den Steen Sam
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019
Field of study

Analytical performance modeling is a useful complement to detailed cycle-level simulation to quickly explore the design space in an early design stage. Mechanistic analytical modeling is particularly interesting as it provides deep insight and does not require expensive offline profiling as empirical modeling. Previous work in mechanistic analytical modeling, unfortunately, is limited to single-threaded applications running on single-core processors. This work proposes RPPM, a mechanistic analytical performance model for multi-threaded applications on multicore hardware. RPPM collects microarchitecture-independent characteristics of a multi-threaded workload to predict performance on a previously unseen multicore architecture. The profile needs to be collected only once to predict a range of processor architectures. We evaluate RPPM's accuracy against simulation and report a performance prediction error of 11.2% on average (23% max). We demonstrate RPPM's usefulness for conducting design space exploration experiments as well as for analyzing parallel application performance

Crossref

Ghent University Academic Bibliography

Recommended from our members

Scheduling, Characterization and Prediction of HPC Workloads for Distributed Computing Environments

Author: Naghshnejad Mina
Publication venue: eScholarship, University of California
Publication date: 01/01/2019
Field of study

As High Performance Computing (HPC) has grown considerably and is expected to grow even more, effective resource management for distributed computing sys- tems is motivated more than ever. As the computational workloads grow in quantity, it is becoming more crucial to apply efficient resource management and workload scheduling to use resources efficiently while keeping the computational performance reasonably good. The problem of efficiently scheduling workloads on resources while meeting performance standards is hard. Additionally, non-clairvoyance of job dimen- sions makes resource management even harder in real-world scenarios. Our research methodology investigates the scheduling problem compliant for HPC and researches the challenges for deploying the scheduling in real world-scenarios using state of the art machine learning and data science techniques.To this end, this Ph.D. dissertation makes the following core contributions: a) We perform a theoretical analysis of space-sharing, non-preemptive scheduling: we studied this scheduling problem and proposed scheduling algorithms with polyno- mial computation time. We also proved constant upper-bounds for the performance of these algorithms. b) We studied the sensitivity of scheduling algorithms to the accuracy of runtime and devised a meta-learning approach to estimate prediction accuracy for newly submitted jobs to the HPC system. c) We studied the runtime prediction problem for HPC applications. For this purpose, we studied the distri- bution of available public workloads and proposed two different solutions that can predict multi-modal distributions: switching state-space models and Mixture Density Networks. d) We studied the effectiveness of recent recurrent neural network models for CPU usage trace prediction for individual VM traces as well as aggregate CPU usage traces. In this dissertation, we explore solutions to improve the performance of scheduling workloads on distributed systems.We begin by looking at the problem from the theoretical perspective. Modeling the problem mathematically, we first propose a scheduling algorithm that finds a constant approximation of the optimal solution for the problem in polynomial time. We prove that the performance of the algorithm (average completion time is the constant approximation of the performance of the optimal scheduling. We next look at the problem in real-world scenarios. Considering High-Performance Computing (HPC) workload computing environments as the most similar real-world equivalent of our mathematical model, we explore the problem of predicting application runtime. We propose an algorithm to handle the existing uncertainties in the real world and show-case our algorithm with demonstrative effectiveness in terms of response time and resource utilization. After looking at the uncertainty problem, we focus on trying to improve the accuracy of existing prediction approaches for HPC application runtime. We propose two solutions, one based on Kalman filters and one based on deep density mixture networks. We showcase the effectiveness of our prediction approaches by comparing with previous prediction approaches in terms of prediction accuracy and impact on improving scheduling performance. In the end, we focus on predicting resource usage for individual applications during their execution. We explore the application of recurrent neural networks for predicting resource usage of applications deployed on individual virtual machines. To validate our proposed models and solutions, we performed extensive trace-driven simulation and measured the effectiveness of our approaches

eScholarship - University of California

A study of the robustness of the group scheduling method using an emulation of a complex FMS

Author: Cardin Olivier
Mebarki Nasser
Pinot Guillaume
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013
Field of study

International audienceIn the field of predictive-reactive scheduling methods, group sequencing is reputed to be robust (in terms of uncertainties absorption) due to the flexibility it adds with regard to the sequence of operations. However, this assumption has been established on experiments made on simple theoretical examples. The aim of this paper is to carry out experimentation on a complex flexible manufacturing system in order to determine whether or not the flexibility of the group scheduling method can in fact absorb uncertainties. In the study, transportation times of parts between machines are considered as uncertain. Simulation studies have been designed in order to evaluate the relationship between flexibility and the ability to absorb uncertainties. Comparisons are made between schedules generated using the group sequencing method with different flexibility levels and a schedule with no flexibility. A last schedule takes into account uncertainties whereas schedules generated using the group sequencing method do not. As it is the best possible schedule, it provides a lower bound and enables to calculate the degradation of performance of calculated schedules. The results show that group sequencing perform very well, enabling the quality of the schedule to be improved, especially when the level of uncertainty of the problem increases. The results also show that flexibility is the key factor for robustness. The rise in the level of flexibility increases the robustness of the schedule towards the uncertainties

Probabilistic Hybrid Action Models for Predicting Concurrent Percept-driven Robot Behavior

Author: Beetz M.
Grosskreutz H.
Publication venue: 'AI Access Foundation'
Publication date: 01/01/2005
Field of study

This article develops Probabilistic Hybrid Action Models (PHAMs), a realistic causal model for predicting the behavior generated by modern percept-driven robot plans. PHAMs represent aspects of robot behavior that cannot be represented by most action models used in AI planning: the temporal structure of continuous control processes, their non-deterministic effects, several modes of their interferences, and the achievement of triggering conditions in closed-loop robot plans. The main contributions of this article are: (1) PHAMs, a model of concurrent percept-driven behavior, its formalization, and proofs that the model generates probably, qualitatively accurate predictions; and (2) a resource-efficient inference method for PHAMs based on sampling projections from probabilistic action models and state descriptions. We show how PHAMs can be applied to planning the course of action of an autonomous robot office courier based on analytical and experimental results

arXiv.org e-Print Archive

Crossref

Iso-energy-efficiency: An approach to power-constrained parallel computation

Author: Ge Rong
Kirk Cameron
Song Shuaiwen
Su Chunyi
Vishnu Abhinav
Publication venue
Publication date: 01/01/2010
Field of study

Future large scale high performance supercomputer systems require high energy efficiency to achieve exaflops computational power and beyond. Despite the need to understand energy efficiency in high-performance systems, there are few techniques to evaluate energy efficiency at scale. In this paper, we propose a system-level iso-energy-efficiency model to analyze, evaluate and predict energy-performance of data intensive parallel applications with various execution patterns running on large scale power-aware clusters. Our analytical model can help users explore the effects of machine and application dependent characteristics on system energy efficiency and isolate efficient ways to scale system parameters (e.g. processor count, CPU power/frequency, workload size and network bandwidth) to balance energy use and performance. We derive our iso-energy-efficiency model and apply it to the NAS Parallel Benchmarks on two power-aware clusters. Our results indicate that the model accurately predicts total system energy consumption within 5% error on average for parallel applications with various execution and communication patterns. We demonstrate effective use of the model for various application contexts and in scalability decision-making

Computer Science Technical Reports @Virginia Tech

Intelligent Management of Mobile Systems through Computational Self-Awareness

Author: Donyanavard Bryan
Dutt Nikil
Jantsch Axel
Mutlu Onur
Rahmani Amir M.
Publication venue
Publication date: 31/07/2020
Field of study

Runtime resource management for many-core systems is increasingly complex. The complexity can be due to diverse workload characteristics with conflicting demands, or limited shared resources such as memory bandwidth and power. Resource management strategies for many-core systems must distribute shared resource(s) appropriately across workloads, while coordinating the high-level system goals at runtime in a scalable and robust manner. To address the complexity of dynamic resource management in many-core systems, state-of-the-art techniques that use heuristics have been proposed. These methods lack the formalism in providing robustness against unexpected runtime behavior. One of the common solutions for this problem is to deploy classical control approaches with bounds and formal guarantees. Traditional control theoretic methods lack the ability to adapt to (1) changing goals at runtime (i.e., self-adaptivity), and (2) changing dynamics of the modeled system (i.e., self-optimization). In this chapter, we explore adaptive resource management techniques that provide self-optimization and self-adaptivity by employing principles of computational self-awareness, specifically reflection. By supporting these self-awareness properties, the system can reason about the actions it takes by considering the significance of competing objectives, user requirements, and operating conditions while executing unpredictable workloads

arXiv.org e-Print Archive

eScholarship - University of California

SCALO: Scalability-Aware Parallelism Orchestration for Multi-Threaded Workloads

Author: de Supinski Bronis
Fahringer Thomas
Georgakoudis Giorgis
Nikolopoulos Dimitrios
Thoman Peter
Vandierendonck Hans
Publication venue
Publication date: 01/12/2017
Field of study

This article contributes a solution to orchestrate concurrent application execution to increase throughput. SCALO monitors co-executing applications at runtime to evaluate their scalability

Queen's University Belfast Research Portal

Crossref

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Load forecasting and dispatch optimisation for decentralised co-generation plant with dual energy storage

Author: Crosbie Tracey
Dawood Muneeb
Dawood Nashwan
Short Michael
Publication venue: 'Elsevier BV'
Publication date: 21/04/2016
Field of study

Crossref

Teeside University's Research Repository