Model-driven Scheduling for Distributed Stream Processing Systems
Distributed stream processing frameworks are commonly used with the evolution of the Internet of Things (IoT). These frameworks are designed to adapt to dynamic input message rates by scaling in/out. Apache Storm, originally developed by Twitter, is a widely used stream processing engine; others include Flink and Spark Streaming. To run streaming applications successfully, the optimal resource requirement must be known, as over-estimation of resources adds extra cost. So we need a strategy to determine the optimal resource requirement for a given streaming application. In this article, we propose a model-driven approach for scheduling streaming applications that effectively utilizes a priori knowledge of the applications to provide predictable scheduling behavior. Specifically, we use application performance models to offer reliable estimates of the required resource allocation. Further, this intuition also drives resource mapping and helps narrow the gap between the estimated and actual dataflow performance and resource utilization. Together, this model-driven scheduling approach gives predictable application performance and resource utilization behavior for executing a given DSPS application at a target input stream rate on distributed resources.
Comment: 54 pages
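To make the idea concrete, here is a minimal sketch (not the paper's implementation; the operator names, profiled rates, and headroom factor are hypothetical) of how a performance model's per-task peak throughput could be turned into a resource estimate for a target input rate:

```python
import math

def required_parallelism(target_rate, peak_rate_per_task, headroom=0.8):
    """Tasks needed so that each task runs below `headroom` of its peak rate."""
    return math.ceil(target_rate / (peak_rate_per_task * headroom))

# Hypothetical per-task peak throughputs (msgs/sec) from offline profiling.
peak_rates = {"parse": 5000.0, "filter": 12000.0, "aggregate": 3000.0}
target_input_rate = 20000.0  # msgs/sec the dataflow must sustain

allocation = {op: required_parallelism(target_input_rate, rate)
              for op, rate in peak_rates.items()}
print(allocation)  # {'parse': 5, 'filter': 3, 'aggregate': 9}
```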
DCDB Wintermute: Enabling Online and Holistic Operational Data Analytics on HPC Systems
As we approach the exascale era, the size and complexity of HPC systems
continues to increase, raising concerns about their manageability and
sustainability. For this reason, more and more HPC centers are experimenting
with fine-grained monitoring coupled with Operational Data Analytics (ODA) to
optimize efficiency and effectiveness of system operations. However, while
monitoring is a common reality in HPC, there is no well-stated and
comprehensive list of requirements, nor matching frameworks, to support
holistic and online ODA. This leads to insular ad-hoc solutions, each
addressing only specific aspects of the problem.
In this paper we propose Wintermute, a novel generic framework to enable
online ODA on large-scale HPC installations. Its design is based on the results
of a literature survey of common operational requirements. We implement
Wintermute on top of the holistic DCDB monitoring system, offering a large
variety of configuration options to accommodate the varying requirements of ODA
applications. Moreover, Wintermute is based on a set of logical abstractions to
ease the configuration of models at a large scale and maximize code re-use. We
highlight Wintermute's flexibility through a series of practical case studies,
each targeting a different aspect of the management of HPC systems, and then
demonstrate the small resource footprint of our implementation.
Comment: Accepted for publication at the 29th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2020)
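As a flavor of what an online ODA processing unit computes, here is a minimal, self-contained sketch of sliding-window anomaly flagging over one sensor stream; it is not Wintermute's API, and the window size, threshold, and readings are assumptions for illustration:

```python
from collections import deque
import statistics

class OnlineAnalyzer:
    """Sliding-window anomaly flagging over a single sensor stream."""
    def __init__(self, window=60, threshold=3.0):
        self.readings = deque(maxlen=window)
        self.threshold = threshold

    def push(self, value):
        """Return True if `value` deviates strongly from the recent window."""
        flagged = False
        if len(self.readings) >= 2:
            mean = statistics.mean(self.readings)
            std = statistics.stdev(self.readings)
            flagged = std > 0 and abs(value - mean) > self.threshold * std
        self.readings.append(value)
        return flagged

analyzer = OnlineAnalyzer(window=30)
for temp in [61.0, 61.5, 60.8, 61.2, 92.0]:  # hypothetical node temperatures
    if analyzer.push(temp):
        print("anomalous reading:", temp)
```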
Dynamic optimization of provider-based scheduling for HPC workloads
The vast array of cloud providers present in today's market proffers a suite of High-Performance Computing (HPC) services. However, these offerings are characterized by significant variations in execution times and cost structures. Consequently, selecting the optimal cloud provider and configuring the features of the chosen computing instance (e.g., virtual machines) proves to be a challenging task for users intending to execute HPC workloads. This paper introduces a novel component designed for effortless integration with existing HPC scheduling systems. This module's primary function is to facilitate the selection of the most appropriate cloud provider for each distinct job, thereby empowering dynamic and adaptive cost-minimization strategies. Through the application of data augmentation techniques and the employment of Continuous Machine Learning, the system is endowed with the capability to operate efficiently with cloud providers that have not been previously utilized. Furthermore, it is capable of tracking the evolution of jobs over time. Our results show that this component can achieve consistent economic savings, with the magnitude of the savings depending on the quality of the data used in the training phase
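For illustration, a minimal sketch of per-job provider selection follows; the provider names, prices, and the `predict_runtime` stand-in for a trained (and continuously retrained) model are all hypothetical, not the paper's component:

```python
# Hypothetical hourly prices; a real deployment would query each provider.
PRICES = {"provA": 0.90, "provB": 1.40, "provC": 0.55}  # $/hour

def predict_runtime(job, provider):
    """Placeholder for a trained (and continuously retrained) regressor."""
    speedup = {"provA": 1.0, "provB": 1.8, "provC": 0.6}[provider]
    return job["base_hours"] / speedup

def cheapest_provider(job):
    """Pick the provider minimizing predicted cost = runtime * price."""
    costs = {p: predict_runtime(job, p) * price for p, price in PRICES.items()}
    return min(costs, key=costs.get), costs

job = {"name": "cfd-run-17", "base_hours": 10.0}
best, costs = cheapest_provider(job)
print(best, costs)  # provB wins here despite its higher hourly price
```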
Scheduling, Characterization and Prediction of HPC Workloads for Distributed Computing Environments
As High Performance Computing (HPC) has grown considerably and is expected to grow even more, effective resource management for distributed computing systems is motivated more than ever. As computational workloads grow in quantity, it becomes more crucial to apply efficient resource management and workload scheduling so that resources are used efficiently while computational performance remains reasonably good. The problem of efficiently scheduling workloads on resources while meeting performance standards is hard. Additionally, non-clairvoyance of job dimensions makes resource management even harder in real-world scenarios. Our research methodology investigates the scheduling problem for HPC and the challenges of deploying such scheduling in real-world scenarios using state-of-the-art machine learning and data science techniques.
To this end, this Ph.D. dissertation makes the following core contributions: a) We perform a theoretical analysis of space-sharing, non-preemptive scheduling: we studied this scheduling problem and proposed scheduling algorithms with polynomial computation time. We also proved constant upper bounds on the performance of these algorithms. b) We studied the sensitivity of scheduling algorithms to the accuracy of runtime estimates and devised a meta-learning approach to estimate prediction accuracy for newly submitted jobs to the HPC system. c) We studied the runtime prediction problem for HPC applications. For this purpose, we studied the distribution of available public workloads and proposed two different solutions that can predict multi-modal distributions: switching state-space models and Mixture Density Networks. d) We studied the effectiveness of recent recurrent neural network models for CPU usage trace prediction for individual VM traces as well as aggregate CPU usage traces.
In this dissertation, we explore solutions to improve the performance of scheduling workloads on distributed systems. We begin by looking at the problem from the theoretical perspective. Modeling the problem mathematically, we first propose a scheduling algorithm that finds a constant approximation of the optimal solution in polynomial time. We prove that the performance of the algorithm (average completion time) is a constant approximation of the performance of the optimal scheduling. We next look at the problem in real-world scenarios. Considering High-Performance Computing (HPC) workload computing environments as the most similar real-world equivalent of our mathematical model, we explore the problem of predicting application runtime. We propose an algorithm to handle the uncertainties existing in the real world and showcase its effectiveness in terms of response time and resource utilization. After looking at the uncertainty problem, we focus on improving the accuracy of existing prediction approaches for HPC application runtime. We propose two solutions, one based on Kalman filters and one based on deep mixture density networks. We showcase the effectiveness of our prediction approaches by comparing them with previous approaches in terms of prediction accuracy and impact on scheduling performance. In the end, we focus on predicting resource usage for individual applications during their execution. We explore the application of recurrent neural networks for predicting the resource usage of applications deployed on individual virtual machines.
To validate our proposed models and solutions, we performed extensive trace-driven simulation and measured the effectiveness of our approaches
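As an illustration of the Kalman-filter direction mentioned above, here is a minimal scalar Kalman filter tracking a drifting mean job runtime; the noise parameters and runtimes are hypothetical, and this sketch is far simpler than the dissertation's switching state-space models:

```python
class ScalarKalman:
    """Tracks a slowly drifting mean runtime from noisy per-job observations."""
    def __init__(self, x0, p0=1e4, process_var=25.0, obs_var=400.0):
        self.x, self.p = x0, p0              # state estimate and its variance
        self.q, self.r = process_var, obs_var

    def update(self, z):
        self.p += self.q                     # predict: drift inflates variance
        gain = self.p / (self.p + self.r)    # Kalman gain
        self.x += gain * (z - self.x)        # correct with observation z
        self.p *= (1.0 - gain)
        return self.x

runtimes = [118, 130, 125, 240, 236, 228]    # hypothetical job runtimes (sec)
kf = ScalarKalman(x0=runtimes[0])
for z in runtimes[1:]:
    estimate = kf.update(z)
print(f"next-runtime estimate: {estimate:.1f}s")
```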
HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges
High Performance Computing (HPC) clouds are becoming an alternative to
on-premise clusters for executing scientific applications and business
analytics services. Most research efforts in HPC cloud aim to understand the
cost-benefit of moving resource-intensive applications from on-premise
environments to public cloud platforms. Industry trends show hybrid
environments are the natural path to get the best of the on-premise and cloud
resources---steady (and sensitive) workloads can run on on-premise resources
and peak demand can leverage remote resources in a pay-as-you-go manner.
Nevertheless, there are plenty of questions to be answered in HPC cloud, which
range from how to extract the best performance of an unknown underlying
platform to what services are essential to make its usage easier. Moreover, the
discussion on the right pricing and contractual models to fit small and large
users is relevant for the sustainability of HPC clouds. This paper brings a
survey and taxonomy of efforts in HPC cloud and a vision on what we believe is
ahead of us, including a set of research challenges that, once tackled, can
help advance businesses and scientific discoveries. This becomes particularly
relevant due to the fast increasing wave of new HPC applications coming from
big data and artificial intelligence.
Comment: 29 pages, 5 figures, Published in ACM Computing Surveys (CSUR)
Design of robust scheduling methodologies for high performance computing
Scientific applications are often large, complex, computationally-intensive, and irregular. Loops are often an abundant source of parallelism in scientific applications. Due to the ever-increasing computational needs of scientific applications, high performance computing (HPC) systems have become larger and more complex, offering increased parallelism at multiple hardware levels.
Load imbalance, caused by irregular computational load per task and unpredictable computing system characteristics (system variability), often degrades the performance of applications. Besides, perturbations, such as reduced computing power, network latency, reduced availability of resources, or failures, can severely impact the performance of applications. System variability and perturbations are only expected to increase in future extreme-scale computing systems. Extrapolating the current failure rate to Exascale would result in a failure every 20 minutes. Such a failure rate and such perturbations would render the computing systems unusable. This doctoral thesis improves the performance of computationally-intensive scientific applications on HPC systems via robust load balancing. Robust scheduling ensures and maintains improved load balanced execution under unpredictable application and system characteristics.
A number of dynamic loop self-scheduling (DLS) techniques have been introduced and successfully used in scientific applications between the 1980s and 2000s. These DLS techniques are not fault-tolerant as originally introduced. In this thesis, we identify three major research questions to achieve robust scheduling: (1) How to ensure that the DLS techniques employed in scientific applications today adhere to their original design goals and specifications? (2) How to select a DLS technique that will achieve improved performance under perturbations? (3) How to tolerate perturbations during execution and maintain a load balanced execution on HPC systems?
To answer the first question, we reproduced the original experiments that introduced the DLS techniques to verify their present implementation. Simulation is used to reproduce experiments on systems from the past. Realistic simulation yields analyses and conclusions similar to those drawn from the native results. To this end, we devised an approach for bridging the native and simulative executions of parallel applications on HPC systems. This simulation approach is used to reproduce scheduling experiments on past and present systems to verify the implementation of DLS techniques.
Given the multiple levels of parallelism offered by the present HPC systems, we analyzed the load imbalance in scientific applications, from computer vision, astrophysics, and mathematical kernels, at both thread and process levels. This analysis revealed a significant interplay between thread level and process level load balancing. We found that dynamic load balancing at the thread level propagates to the process level and vice versa. However, the best application performance is only achieved by two-level dynamic load balancing.
Next, we examined the performance of applications under perturbations. We found that the most robust DLS technique does not always deliver the best performance under various perturbations. The most efficient DLS technique changes with the application, the system, or the perturbations during execution. This signifies the algorithm selection problem in DLS. We leveraged realistic simulations to address the algorithm selection problem of scheduling under perturbations via a simulation-assisted approach (SimAS), which answers the second question. SimAS dynamically selects the DLS techniques that improve performance depending on the application, the system, and the perturbations during execution.
To answer the third question, we introduced a robust dynamic load balancing (rDLB) approach for the robust self-scheduling of scientific applications under failures. rDLB proactively reschedules already allocated tasks and requires no detection of perturbations. rDLB tolerates up to P − 1 processor failures (where P is the number of processors allocated to the application) and boosts the flexibility of applications against nonfatal perturbations, such as reduced availability of resources.
This thesis is the first to provide insights into the interplay between thread- and process-level dynamic load balancing in scientific applications. The verified DLS techniques, SimAS, and rDLB are integrated into an MPI-based dynamic load balancing library (DLS4LB), which supports thirteen DLS techniques, for robust dynamic load balancing of scientific applications on HPC systems. Using the methods devised in this thesis, we improved the performance of scientific applications by up to 21% via two-level dynamic load balancing. Under perturbations, we enhanced their performance by a factor of 7 and their flexibility by a factor of 30. This thesis opens new horizons in understanding the interplay of load balancing between various levels of software parallelism and lays the groundwork for robust multilevel scheduling for the upcoming Exascale HPC systems and beyond
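For readers unfamiliar with DLS, the following sketch computes the chunk sequences of two classic techniques, guided self-scheduling and a simplified factoring; it illustrates the family of techniques discussed here, not the DLS4LB library's code:

```python
import math

def gss_chunks(n_tasks, n_procs):
    """Guided self-scheduling: each request receives ceil(remaining / P) tasks."""
    remaining, chunks = n_tasks, []
    while remaining > 0:
        chunk = math.ceil(remaining / n_procs)
        chunks.append(chunk)
        remaining -= chunk
    return chunks

def fac_chunks(n_tasks, n_procs):
    """Simplified factoring: batches of P equal chunks, half the remaining work per batch."""
    remaining, chunks = n_tasks, []
    while remaining > 0:
        chunk = max(1, math.ceil(remaining / (2 * n_procs)))
        for _ in range(n_procs):
            if remaining == 0:
                break
            take = min(chunk, remaining)
            chunks.append(take)
            remaining -= take
    return chunks

# Chunk sizes shrink as the remaining work drains, smoothing load imbalance.
print("GSS:", gss_chunks(100, 4))
print("FAC:", fac_chunks(100, 4))
```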
An Approach for Realistically Simulating the Performance of Scientific Applications on High Performance Computing Systems
Scientific applications often contain large, computationally-intensive, and
irregular parallel loops or tasks that exhibit stochastic characteristics.
Applications may suffer from load imbalance during their execution on
high-performance computing (HPC) systems due to such characteristics. Dynamic
loop self-scheduling (DLS) techniques are instrumental in improving the
performance of scientific applications on HPC systems via load balancing.
Selecting a DLS technique that results in the best performance for different
problems and system sizes requires a large number of exploratory experiments. A
theoretical model that can be used to predict the scheduling technique that
yields the best performance for a given problem and system has not yet been
identified. Therefore, simulation is the most appropriate approach for
conducting such exploratory experiments with reasonable costs. This work
devises an approach to realistically simulate computationally-intensive
scientific applications that employ DLS and execute on HPC systems. Several
approaches to represent the application tasks (or loop iterations) are compared
to establish their influence on the simulative application performance. A novel
simulation strategy is introduced, which transforms a native application code
into a simulative code. The native and simulative performance of two
computationally-intensive scientific applications are compared to evaluate the
realism of the proposed simulation approach. The comparison of the performance
characteristics extracted from the native and simulative executions shows that
the proposed simulation approach captures most of the performance
characteristics of interest. This work establishes the importance of
simulations that realistically predict the performance of DLS techniques for
different applications and system configurations
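A toy example of why such simulation is informative: the sketch below (pure Python, not the paper's simulation approach or a SimGrid model) compares static partitioning with one-task-at-a-time self-scheduling on stochastic task times:

```python
import heapq
import random

random.seed(1)
P = 8
tasks = [random.expovariate(1.0) for _ in range(400)]  # stochastic task times

def static_makespan(tasks, p):
    """Pre-assign equal-size contiguous blocks (assumes p divides len(tasks))."""
    block = len(tasks) // p
    return max(sum(tasks[i * block:(i + 1) * block]) for i in range(p))

def self_sched_makespan(tasks, p):
    """Workers pull one task at a time; the next-free worker takes the next task."""
    free_at = [0.0] * p
    heapq.heapify(free_at)
    for t in tasks:
        heapq.heappush(free_at, heapq.heappop(free_at) + t)
    return max(free_at)

print(f"static: {static_makespan(tasks, P):.1f}, "
      f"dynamic: {self_sched_makespan(tasks, P):.1f}")
```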
Understanding ML driven HPC: Applications and Infrastructure
We recently outlined the vision of "Learning Everywhere" which captures the
possibility and impact of how learning methods and traditional HPC methods can
be coupled together. A primary driver of such coupling is the promise that
Machine Learning (ML) will give major performance improvements for traditional
HPC simulations. Motivated by this potential, the ML around HPC class of
integration is of particular significance. In a related follow-up paper, we
provided an initial taxonomy for integrating learning around HPC methods. In
this paper, which is part of the Learning Everywhere series, we discuss "how"
learning methods and HPC simulations are being integrated to enhance effective
performance of computations. This paper identifies several modes---substitution, assimilation, and control---in which learning methods integrate with HPC simulations, and provides representative applications in each mode. The paper also discusses some open research questions that we hope will motivate and clear the ground for MLaroundHPC benchmarks.
Comment: Invited talk in the "Visionary Track" at IEEE eScience 2019. arXiv admin note: text overlap with arXiv:1806.04731 by other authors
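As a concrete flavor of the "substitution" mode, the sketch below trains a cheap surrogate on samples of an expensive function and then queries the surrogate in its place; the response surface, sample count, and use of scikit-learn are illustrative assumptions, not the paper's setup:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def expensive_simulation(x):
    """Stand-in for a costly HPC simulation; hypothetical response surface."""
    return np.sin(3.0 * x[0]) * np.exp(-x[1] ** 2)

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 2))           # sampled simulation inputs
y = np.array([expensive_simulation(x) for x in X])  # expensive to obtain once

surrogate = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# "Substitution": query the learned model instead of rerunning the simulation.
x_new = np.array([[0.3, -0.2]])
print("surrogate:", surrogate.predict(x_new)[0],
      "truth:", expensive_simulation(x_new[0]))
```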
Parallel Programming with Migratable Objects: Charm++ in Practice
The advent of petascale computing has introduced new challenges (e.g., heterogeneity, system failures) for programming scalable parallel applications. The increased complexity and dynamism of today's science and engineering applications have further exacerbated the situation. Addressing these challenges requires more emphasis on concepts that were previously of secondary importance, including migratability, adaptivity, and runtime system introspection. In this paper, we leverage our experience with these concepts to demonstrate their applicability and efficacy for real-world applications. Using the CHARM++ parallel programming framework, we present details on how these concepts can lead to the development of applications that scale irrespective of the rough landscape of supercomputing technology. The empirical evaluation presented in this paper spans many mini-applications and real applications executed on modern supercomputers, including Blue Gene/Q, Cray XE6, and Stampede
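To illustrate the kind of measurement-based load balancing that migratable objects enable, here is a toy greedy rebalancer in Python; Charm++'s actual load balancers live in the C++ runtime, so the object names and loads below are purely hypothetical:

```python
import heapq

def greedy_assign(object_loads, n_pes):
    """Heaviest-first assignment of migratable objects to the least-loaded PE."""
    pes = [(0.0, pe, []) for pe in range(n_pes)]
    heapq.heapify(pes)
    for obj, load in sorted(object_loads.items(), key=lambda kv: -kv[1]):
        total, pe, objs = heapq.heappop(pes)  # least-loaded PE so far
        objs.append(obj)
        heapq.heappush(pes, (total + load, pe, objs))
    return sorted(pes, key=lambda entry: entry[1])

# Hypothetical per-object loads, as a runtime might measure in a prior phase.
loads = {"chare0": 4.0, "chare1": 1.0, "chare2": 3.5,
         "chare3": 2.0, "chare4": 3.0}
for total, pe, objs in greedy_assign(loads, 2):
    print(f"PE {pe}: {objs} (load {total})")
```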