Multi-objective scheduling of Scientific Workflows in multisite clouds
Clouds appear as appropriate infrastructures for executing Scientific Workflows (SWfs). A cloud is typically made of several sites (or data centers), each with its own resources and data. Thus, it becomes important to be able to execute some SWfs at more than one cloud site because of the geographical distribution of data or available resources among different cloud sites. Therefore, a major problem is how to execute a SWf in a multisite cloud, while reducing execution time and monetary costs. In this paper, we propose a general solution based on multi-objective scheduling in order to execute SWfs in a multisite cloud. The solution consists of a multi-objective cost model including execution time and monetary costs, a Single Site Virtual Machine (VM) Provisioning approach (SSVP) and ActGreedy, a multisite scheduling approach. We present an experimental evaluation, based on the execution of the SciEvol SWf in Microsoft Azure cloud. The results reveal that our scheduling approach significantly outperforms two adapted baseline algorithms (which we propose by adapting two existing algorithms) and the scheduling time is reasonable compared with genetic and brute-force algorithms. The results also show that our cost model is accurate and that SSVP can generate better VM provisioning plans compared with an existing approach.
Work partially funded by EU H2020 Programme and MCTI/RNP-Brazil (HPC4E grant agreement number 689772), CNPq, FAPERJ, and INRIA (MUSIC project), Microsoft (ZcloudFlow project) and performed in the context of the Computational Biology Institute (www.ibc-montpellier.fr). We would like to thank Kary Ocaña for her help in modeling and executing the SciEvol SWf.
Peer Reviewed. Postprint (author's final draft)
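The multi-objective cost model mentioned in the abstract above combines execution time and monetary cost. As a rough illustration only, one common way to combine two such objectives is a weighted sum of normalized values. The sketch below assumes that form; the function names, the reference values used for normalization, and the weight `w_time` are hypothetical and are not taken from the paper.

```python
def normalized_cost(time_s, money, ref_time_s, ref_money, w_time=0.5):
    """Weighted sum of normalized execution time and monetary cost.

    Assumption: both objectives are divided by a reference value so that
    they are unitless before being combined. This is an illustrative
    form, not the cost model defined by the paper.
    """
    if not 0.0 <= w_time <= 1.0:
        raise ValueError("w_time must be in [0, 1]")
    w_money = 1.0 - w_time
    return w_time * (time_s / ref_time_s) + w_money * (money / ref_money)


def pick_site(candidates, ref_time_s, ref_money, w_time=0.5):
    """Return the name of the candidate with the lowest combined cost.

    candidates maps a (hypothetical) site name to a tuple of
    (estimated execution time in seconds, estimated monetary cost).
    """
    return min(
        candidates,
        key=lambda name: normalized_cost(
            *candidates[name], ref_time_s, ref_money, w_time
        ),
    )
```

With a weight close to 1 the faster site tends to win, and with a weight close to 0 the cheaper one does, which is the trade-off a user of such a model would tune.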
Task Runtime Prediction in Scientific Workflows Using an Online Incremental Learning Approach
Many algorithms in workflow scheduling and resource provisioning rely on the
performance estimation of tasks to produce a scheduling plan. A profiler that
is capable of modeling the execution of tasks and predicting their runtime
accurately, therefore, becomes an essential part of any Workflow Management
System (WMS). With the emergence of multi-tenant Workflow as a Service (WaaS)
platforms that use clouds for deploying scientific workflows, task runtime
prediction becomes more challenging because it requires the processing of a
significant amount of data in a near real-time scenario while dealing with the
performance variability of cloud resources. Hence, relying on methods such as
profiling tasks' execution data using basic statistical description (e.g.,
mean, standard deviation) or batch offline regression techniques to estimate
the runtime may not be suitable for such environments. In this paper, we
propose an online incremental learning approach to predict the runtime of tasks
in scientific workflows in clouds. To improve the performance of the
predictions, we harness fine-grained resources monitoring data in the form of
time-series records of CPU utilization, memory usage, and I/O activities that
reflect the unique characteristics of a task's execution. We compare our
solution to a state-of-the-art approach that exploits the resources monitoring
data based on a regression machine learning technique. In our experiments, the
proposed strategy improves performance, in terms of error, by up to
29.89% compared to the state-of-the-art solutions.
Comment: Accepted for presentation at the main conference track of the 11th IEEE/ACM
International Conference on Utility and Cloud Computing
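To make the idea of online incremental learning concrete, here is a minimal sketch of a linear regressor that is updated one observation at a time with stochastic gradient descent, so that no batch retraining is needed. It is not the model used in the paper: the class name, the learning rate, and the choice of a plain linear model over hypothetical per-task features (for example, summaries of CPU, memory, and I/O time series) are all assumptions made for illustration.

```python
class OnlineLinearRegressor:
    """Linear runtime predictor trained incrementally with SGD.

    Assumption: each task is described by a fixed-length list of numeric
    features. The feature design is hypothetical.
    """

    def __init__(self, n_features, learning_rate=0.01):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features):
        """Return the predicted runtime for one feature vector."""
        return self.bias + sum(w * x for w, x in zip(self.weights, features))

    def update(self, features, observed_runtime):
        """Take one gradient step on the squared error of a finished task.

        Returns the prediction error measured before the update.
        """
        error = self.predict(features) - observed_runtime
        self.weights = [
            w - self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]
        self.bias -= self.learning_rate * error
        return error
```

In use, a scheduler would call `predict` before dispatching a task and `update` once the task finishes and its actual runtime is known, so the model tracks changes in resource performance over time.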
Cloud computing resource scheduling and a survey of its evolutionary approaches
A disruptive technology fundamentally transforming the way that computing services are delivered, cloud computing offers information and communication technology users a new dimension of convenience of resources, as services via the Internet. Because the cloud provides a finite pool of virtualized on-demand resources, optimally scheduling them has become an essential and rewarding topic, where a trend of using Evolutionary Computation (EC) algorithms is emerging rapidly. Through analyzing the cloud computing architecture, this survey first presents a two-level taxonomy of cloud resource scheduling. It then paints a landscape of the scheduling problem and solutions. According to the taxonomy, a comprehensive survey of state-of-the-art approaches is presented systematically. Looking forward, challenges and potential future research directions are investigated and invited, including real-time scheduling, adaptive dynamic scheduling, large-scale scheduling, multiobjective scheduling, and distributed and parallel scheduling. At the dawn of Industry 4.0, cloud computing scheduling for cyber-physical integration with the presence of big data is also discussed. Research in this area is only in its infancy, but with the rapid fusion of information and data technology, more exciting and agenda-setting topics are likely to emerge on the horizon.
Exploring the Fairness and Resource Distribution in an Apache Mesos Environment
Apache Mesos, a cluster-wide resource manager, is widely deployed at massive
scale in several clouds and data centers. Mesos aims to provide high cluster
utilization via fine-grained resource co-scheduling and resource fairness among
multiple users through Dominant Resource Fairness (DRF) based allocation. DRF
takes into account different resource types (CPU, Memory, Disk I/O) requested
by each application and determines the share of each cluster resource that
could be allocated to the applications. Mesos has adopted a two-level
scheduling policy: (1) DRF to allocate resources to competing frameworks and
(2) task level scheduling by each framework for the resources allocated during
the previous step. We conducted experiments on a local Mesos cluster running
frameworks such as Apache Aurora, Marathon, and our own framework,
Scylla, to study resource fairness and cluster utilization. Experimental
results show how informed decisions regarding the second-level scheduling policy of
frameworks and attributes such as offer holding period, offer refusal cycle, and
task arrival rate can reduce unfair resource distribution. A Bin-Packing
scheduling policy on Scylla with Marathon can reduce unfair allocation from
38% to 3%. By reducing unused free resources in offers, we bring down the
unfairness from 90% to 28%. We also show how the task arrival rate can
reduce the unfairness from 23% to 7%.
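The Dominant Resource Fairness idea described above can be sketched as a simple progressive-filling loop: repeatedly give one more task to the user whose dominant share (their largest fractional use of any single resource) is currently the smallest. The code below is an illustrative simplification and not the Mesos allocator. The function name and data layout are hypothetical, and unlike the classic formulation it keeps serving other users after one user's next task no longer fits.

```python
def drf_allocate(capacity, demands):
    """Allocate whole tasks to users by progressive filling under DRF.

    capacity: resource name -> total amount in the cluster.
    demands:  user name -> {resource name -> amount needed per task}.
    Returns:  user name -> number of tasks granted.
    """
    for user, demand in demands.items():
        if not any(demand.get(r, 0) > 0 for r in capacity):
            raise ValueError(f"user {user!r} must demand some resource")

    used = {r: 0.0 for r in capacity}
    tasks = {u: 0 for u in demands}
    dominant_share = {u: 0.0 for u in demands}
    active = set(demands)

    while active:
        # Pick the user with the smallest dominant share (name breaks ties).
        user = min(sorted(active), key=lambda u: dominant_share[u])
        demand = demands[user]
        if any(used[r] + demand.get(r, 0) > capacity[r] for r in capacity):
            active.remove(user)  # this user's next task no longer fits
            continue
        for r in capacity:
            used[r] += demand.get(r, 0)
        tasks[user] += 1
        dominant_share[user] = max(
            tasks[user] * demand.get(r, 0) / capacity[r] for r in capacity
        )
    return tasks
```

For a cluster with 9 CPUs and 18 GB of memory, a user whose tasks need 1 CPU and 4 GB (memory-dominant) and a user whose tasks need 3 CPUs and 1 GB (CPU-dominant) end up with 3 and 2 tasks respectively, which equalizes their dominant shares at two thirds.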
SHADHO: Massively Scalable Hardware-Aware Distributed Hyperparameter Optimization
Computer vision is experiencing an AI renaissance, in which machine learning
models are expediting important breakthroughs in academic research and
commercial applications. Effectively training these models, however, is not
trivial due in part to hyperparameters: user-configured values that control a
model's ability to learn from data. Existing hyperparameter optimization
methods are highly parallel but make no effort to balance the search across
heterogeneous hardware or to prioritize searching high-impact spaces. In this
paper, we introduce a framework for massively Scalable Hardware-Aware
Distributed Hyperparameter Optimization (SHADHO). Our framework calculates the
relative complexity of each search space and monitors performance on the
learning task over all trials. These metrics are then used as heuristics to
assign hyperparameters to distributed workers based on their hardware. We first
demonstrate that our framework achieves double the throughput of a standard
distributed hyperparameter optimization framework by optimizing SVM for MNIST
using 150 distributed workers. We then conduct model search with SHADHO over
the course of one week using 74 GPUs across two compute clusters to optimize
U-Net for a cell segmentation task, discovering 515 models that achieve a lower
validation loss than standard U-Net.
Comment: 10 pages, 6 figures
A Time-driven Data Placement Strategy for a Scientific Workflow Combining Edge Computing and Cloud Computing
Compared to traditional distributed computing environments such as grids,
cloud computing provides a more cost-effective way to deploy scientific
workflows. Each task of a scientific workflow requires several large datasets
that are located in different datacenters of the cloud computing environment,
resulting in serious data transmission delays. Edge computing reduces the data
transmission delays and supports fixed storage of a scientific
workflow's private datasets, but its storage capacity is a bottleneck.
It is a challenge to combine the advantages of both edge computing and cloud
computing to rationalize the data placement of scientific workflows and
optimize the data transmission time across different datacenters. Traditional
data placement strategies maintain load balancing with a given number of
datacenters, which results in a large data transmission time. In this study, a
self-adaptive discrete particle swarm optimization algorithm with genetic
algorithm operators (GA-DPSO) was proposed to optimize the data transmission
time when placing data for a scientific workflow. This approach considered the
characteristics of data placement combining edge computing and cloud computing.
In addition, it considered the factors affecting transmission delay,
such as the bandwidth between datacenters, the number of edge datacenters, and
the storage capacity of edge datacenters. The crossover operator and mutation
operator of the genetic algorithm were adopted to avoid the premature
convergence of the traditional particle swarm optimization algorithm, which
enhanced the diversity of population evolution and effectively reduced the data
transmission time. The experimental results show that the data placement
strategy based on GA-DPSO can effectively reduce the data transmission time
during workflow execution combining edge computing and cloud computing.
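The objective that such a placement strategy tries to minimize can be illustrated with a small function that scores a candidate placement by the total time spent moving datasets to the datacenters where tasks run. This is a sketch under simplifying assumptions (transfers are sequential, and time is size divided by link bandwidth); it is not the fitness function of GA-DPSO, and all names and units are hypothetical.

```python
def transfer_time(placement, task_site, task_inputs, sizes_gb, bandwidth_gbps):
    """Total seconds needed to move input datasets to the tasks that use them.

    placement:      dataset -> datacenter where it is stored.
    task_site:      task -> datacenter where it executes.
    task_inputs:    task -> list of datasets it reads.
    sizes_gb:       dataset -> size in gigabytes.
    bandwidth_gbps: (source, destination) -> link bandwidth in gigabits/s.
    """
    total = 0.0
    for task, datasets in task_inputs.items():
        destination = task_site[task]
        for dataset in datasets:
            source = placement[dataset]
            if source != destination:
                # Gigabytes to gigabits, then divide by the link bandwidth.
                total += sizes_gb[dataset] * 8 / bandwidth_gbps[(source, destination)]
    return total
```

An optimizer, whether particle swarm, genetic, or a hybrid, would evaluate many candidate `placement` mappings with a function of this kind while also checking the storage capacity of each edge datacenter.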
D-SPACE4Cloud: A Design Tool for Big Data Applications
Recent years have seen a steep rise in data generation worldwide, with the
development and widespread adoption of several software projects targeting the
Big Data paradigm. Many companies currently engage in Big Data analytics as
part of their core business activities; nonetheless, there are no tools and
techniques to support the design of the underlying hardware configuration
backing such systems. In particular, the focus in this report is set on
cloud-deployed clusters, which represent a cost-effective alternative to on-premises
installations. We propose a novel tool implementing a battery of optimization
and prediction techniques integrated so as to efficiently assess several
alternative resource configurations, in order to determine the minimum cost
cluster deployment satisfying QoS constraints. Further, the experimental
campaign conducted on real systems shows the validity and relevance of the
proposed method.