11,860 research outputs found
Resource Provisioning and Scheduling Algorithm for Meeting Cost and Deadline-Constraints of Scientific Workflows in IaaS Clouds
Infrastructure as a Service model of cloud computing is a desirable platform
for the execution of cost and deadline constrained workflow applications as the
elasticity of cloud computing allows large-scale complex scientific workflow
applications to scale dynamically according to their deadline requirements.
However, scheduling of these multitask workflow jobs in a distributed computing
environment is a computationally hard multi-objective combinatorial
optimization problem. The critical challenge is to schedule the workflow tasks
whilst meeting user quality of service (QoS) requirements and the application's
deadline. Existing research not only fails to address this challenge but also
does not incorporate the basic principles of elasticity and heterogeneity of
computing resources in the cloud environment. In this paper, we
propose a resource provisioning and scheduling algorithm to schedule the
workflow applications on IaaS clouds to meet application deadline constraints
while optimizing the execution cost. The proposed algorithm is based on the
nature-inspired population based Intelligent Water Drop (IWD) optimization
algorithm. The experimental results in the simulated environment of CloudSim
with four real-world workflow applications demonstrate that the IWD algorithm
schedules workflow tasks with optimized cost within the specified deadlines.
Moreover, the IWD algorithm converges quickly to a near-optimal solution.
Comment: 15 pages, 8 figures. This work was done in 2015 when the first author
was part of NITTTR, Bhopal, India.
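As a rough illustration of the approach, the sketch below shows how an IWD-style
search can bias randomized task-to-VM assignments through "soil" values on
task-VM edges and reinforce the cheapest deadline-feasible assignment found. The
function name, parameters, cost model, and soil-update rule are assumptions for
illustration, and task precedence constraints are omitted; this is not the
paper's exact formulation.

import random

# Minimal IWD-style task-to-VM assignment (illustrative only: simplified cost
# model and soil update; workflow precedence constraints are omitted).
def iwd_schedule(tasks, vms, runtime, price, deadline, drops=20, iters=50):
    """tasks: task ids; vms: VM-type ids; runtime[(t, v)]: estimated hours of
    task t on VM v; price[v]: hourly price of VM v; deadline: hours."""
    soil = {(t, v): 1.0 for t in tasks for v in vms}  # lower soil = more attractive edge
    best, best_cost = None, float("inf")
    for _ in range(iters):
        for _ in range(drops):
            assign = {}
            for t in tasks:
                # pick a VM with probability inversely proportional to edge soil
                weights = [1.0 / soil[(t, v)] for v in vms]
                assign[t] = random.choices(vms, weights=weights)[0]
            makespan = max(sum(runtime[(t, v)] for t in tasks if assign[t] == v)
                           for v in vms)
            cost = sum(runtime[(t, assign[t])] * price[assign[t]] for t in tasks)
            if makespan <= deadline and cost < best_cost:
                best, best_cost = dict(assign), cost
                # erode soil along the best solution so later drops are biased
                # toward it (simplified reinforcement step)
                for t, v in best.items():
                    soil[(t, v)] *= 0.9
    return best, best_cost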
Parallelization in Scientific Workflow Management Systems
Over the last two decades, scientific workflow management systems (SWfMS)
have emerged as a means to facilitate the design, execution, and monitoring of
reusable scientific data processing pipelines. At the same time, the amounts of
data generated in various areas of science outpaced enhancements in
computational power and storage capabilities. This is especially true for the
life sciences, where new technologies increased the sequencing throughput from
kilobytes to terabytes per day. This trend requires current SWfMS to adapt:
Native support for parallel workflow execution must be provided to increase
performance; dynamically scalable "pay-per-use" compute infrastructures have to
be integrated to diminish hardware costs; adaptive scheduling of workflows in
distributed compute environments is required to optimize resource utilization.
In this survey we give an overview of parallelization techniques for SWfMS,
both in theory and in their realization in concrete systems. We find that
current systems leave considerable room for improvement and we propose key
advancements to the landscape of SWfMS.
Comment: 24 pages, 17 figures (13 PDF, 4 PNG)
Characterizing Application Scheduling on Edge, Fog and Cloud Computing Resources
Cloud computing has grown to become a popular distributed computing service
offered by commercial providers. More recently, Edge and Fog computing
resources have emerged on the wide-area network as part of Internet of Things
(IoT) deployments. These three resource abstraction layers are complementary,
and provide distinctive benefits. Scheduling applications on clouds has been an
active area of research, with workflow and dataflow models serving as a
flexible abstraction to specify applications for execution. However, the
application programming and scheduling models for edge and fog are still
maturing, and can benefit from learnings on cloud resources. At the same time,
there is also value in using these resources cohesively for application
execution. In this article, we present a taxonomy of concepts essential for
specifying and solving the problem of scheduling applications on edge, fog, and
cloud computing resources. We first characterize the resource capabilities and
limitations of these infrastructures, and design a taxonomy of application
models, Quality of Service (QoS) constraints and goals, and scheduling
techniques, based on a literature review. We also tabulate key research
prototypes and papers using this taxonomy. This survey benefits developers and
researchers on these distributed resources in designing and categorizing their
applications, selecting the relevant computing abstraction(s), and developing
or selecting the appropriate scheduling algorithm. It also highlights gaps in
literature where open problems remain.
Comment: Pre-print of journal article: Varshney P, Simmhan Y. Characterizing
application scheduling on edge, fog, and cloud computing resources. Softw:
Pract Exper. 2019; 1--37. https://doi.org/10.1002/spe.269
Harnessing the Power of Many: Extensible Toolkit for Scalable Ensemble Applications
Many scientific problems require multiple distinct computational tasks to be
executed in order to achieve a desired solution. We introduce the Ensemble
Toolkit (EnTK) to address the challenges of scale, diversity and reliability
they pose. We describe the design and implementation of EnTK, characterize its
performance and integrate it with two distinct exemplar use cases: seismic
inversion and adaptive analog ensembles. We perform nine experiments,
characterizing EnTK overheads, strong and weak scalability, and the performance
of two use case implementations, at scale and on production infrastructures. We
show how EnTK meets the following general requirements: (i) implementing
dedicated abstractions to support the description and execution of ensemble
applications; (ii) support for execution on heterogeneous computing
infrastructures; (iii) efficient scalability up to O(10^4) tasks; and (iv)
fault tolerance. We discuss novel computational capabilities that EnTK enables
and the scientific advantages arising thereof. We propose EnTK as an important
addition to the suite of tools in support of production scientific computing.
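The abstraction in requirement (i) can be pictured as pipelines of stages whose
tasks run concurrently while stages run in sequence. The sketch below uses
illustrative names (Task, Stage, Pipeline, run) and a local thread pool as a
stand-in executor; it is not EnTK's actual API or execution backend.

import subprocess
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    command: list          # executable plus arguments

@dataclass
class Stage:
    tasks: list = field(default_factory=list)   # tasks in a stage run concurrently

@dataclass
class Pipeline:
    stages: list = field(default_factory=list)  # stages run one after another

def run(pipeline, max_workers=8):
    for stage in pipeline.stages:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = [pool.submit(subprocess.run, t.command, check=True)
                       for t in stage.tasks]
            for f in futures:
                f.result()   # a stage completes only when all of its tasks do

# e.g. a 16-member simulation ensemble followed by one aggregation task
sim = Stage([Task(f"sim-{i}", ["echo", f"simulate member {i}"]) for i in range(16)])
agg = Stage([Task("aggregate", ["echo", "aggregate ensemble"])])
run(Pipeline([sim, agg]))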
Monetary Cost Optimizations for Hosting Workflow-as-a-Service in IaaS Clouds
Recently, we have witnessed workflows from science and other data-intensive
applications emerging on Infrastructure-as-a-Service (IaaS) clouds, and many
workflow service providers offering workflow as a service (WaaS). The major
concern of WaaS providers is to minimize the monetary cost of executing
workflows in the IaaS cloud. While there have been previous studies on this
concern, most of them assume static task execution time and static pricing
scheme, and have the QoS notion of satisfying a deterministic deadline.
However, the cloud environment is dynamic, with performance dynamics caused by the
interference from concurrent executions and price dynamics like spot prices
offered by Amazon EC2. Therefore, we argue that WaaS providers should have the
notion of offering probabilistic performance guarantees for individual
workflows on IaaS clouds. We develop a probabilistic scheduling framework
called Dyna to minimize the monetary cost while offering probabilistic deadline
guarantees. The framework includes an A*-based instance configuration method
for performance dynamics, and a hybrid instance configuration refinement for
utilizing spot instances. Experimental results with three real-world scientific
workflow applications on Amazon EC2 demonstrate (1) the accuracy of our
framework on satisfying the probabilistic deadline guarantees required by the
users; (2) the effectiveness of our framework on reducing monetary cost in
comparison with the existing approaches.
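The probabilistic-guarantee notion can be illustrated with a simple Monte Carlo
check: estimate the probability that a candidate instance configuration finishes
before the deadline when task runtimes vary, and keep the cheapest configuration
that meets the required probability. The names deadline_probability and pick,
the Gaussian runtime model, and the configuration format are assumptions for
illustration; Dyna's A*-based configuration search and spot-instance refinement
are not reproduced here.

import random

def deadline_probability(config, deadline, samples=10000):
    """config: list of (mean_runtime_h, stddev_h, hourly_price) per task for the
    chosen instance type; runtimes are sampled as truncated Gaussians."""
    hits = 0
    for _ in range(samples):
        makespan = sum(max(0.0, random.gauss(mu, sd)) for mu, sd, _ in config)
        if makespan <= deadline:
            hits += 1
    return hits / samples

def expected_cost(config):
    return sum(mu * price for mu, _, price in config)

def pick(candidates, deadline, guarantee=0.96):
    # keep the cheapest configuration whose deadline-hit probability meets the guarantee
    feasible = [c for c in candidates
                if deadline_probability(c, deadline) >= guarantee]
    return min(feasible, key=expected_cost) if feasible else None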
Deep Learning on Operational Facility Data Related to Large-Scale Distributed Area Scientific Workflows
Distributed computing platforms provide a robust mechanism to perform
large-scale computations by splitting the task and data among multiple
locations, possibly located thousands of miles apart geographically. Although
such distribution of resources can lead to benefits, it also comes with its
associated problems such as rampant duplication of file transfers increasing
congestion, long job completion times, unexpected site crashing, suboptimal
data transfer rates, unpredictable reliability in a time range, and suboptimal
usage of storage elements. In addition, each sub-system becomes a potential
failure node that can trigger system wide disruptions. In this vision paper, we
outline our approach to leveraging Deep Learning algorithms to discover
solutions to unique problems that arise in a system with computational
infrastructure that is spread over a wide area. The presented vision, motivated
by a real scientific use case from Belle II experiments, is to develop
multilayer neural networks to tackle forecasting, anomaly detection and
optimization challenges in a complex and distributed data movement environment.
Through this vision based on Deep Learning principles, we aim to achieve
reduced congestion events, faster file transfer rates, and enhanced site
reliability.
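As one concrete reading of the anomaly-detection goal, a small autoencoder can
be trained on features of healthy transfers and used to flag records with high
reconstruction error. The PyTorch sketch below is an assumption-laden
illustration: the feature count, layer sizes, and threshold are made up and do
not come from the paper.

import torch
import torch.nn as nn

class TransferAutoencoder(nn.Module):
    # toy autoencoder over per-transfer features (rate, size, queue time, retries, ...)
    def __init__(self, n_features=6):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 3), nn.ReLU())
        self.decoder = nn.Linear(3, n_features)

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train(model, normal_batches, epochs=20, lr=1e-3):
    # normal_batches: tensors of shape (batch, n_features) from healthy transfers
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for batch in normal_batches:
            opt.zero_grad()
            loss = loss_fn(model(batch), batch)
            loss.backward()
            opt.step()

def is_anomalous(model, record, threshold=0.1):
    # a record whose reconstruction error exceeds the threshold is flagged
    with torch.no_grad():
        err = torch.mean((model(record) - record) ** 2).item()
    return err > threshold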
Scientific Workflows and Provenance: Introduction and Research Opportunities
Scientific workflows are becoming increasingly popular for compute-intensive
and data-intensive scientific applications. The vision and promise of
scientific workflows includes rapid, easy workflow design, reuse, scalable
execution, and other advantages, e.g., to facilitate "reproducible science"
through provenance (e.g., data lineage) support. However, as described in the
paper, important research challenges remain. While the database community has
studied (business) workflow technologies extensively in the past, most current
work in scientific workflows seems to be done outside of the database
community, e.g., by practitioners and researchers in the computational sciences
and eScience. We provide a brief introduction to scientific workflows and
provenance, and identify areas and problems that suggest new opportunities for
database research.
Comment: 12 pages, 2 figures
Performance optimizations for scalable CFD applications on hybrid CPU+MIC heterogeneous computing system with millions of cores
For computational fluid dynamics (CFD) applications with a large number of
grid points/cells, parallel computing is a common efficient strategy to reduce
the computational time. How to achieve the best performance in the modern
supercomputer system, especially with heterogeneous computing resources such as
hybrid CPU+GPU, or a CPU + Intel Xeon Phi (MIC) co-processors, is still a great
challenge.
An in-house parallel CFD code capable of simulating three dimensional
structured grid applications is developed and tested in this study. Several
methods of parallelization, performance optimization and code tuning both in
the CPU-only homogeneous system and in the heterogeneous system are proposed
based on identifying potential parallelism of applications, balancing the work
load among all kinds of computing devices, tuning the multi-thread code toward
better performance in intra-machine node with hundreds of CPU/MIC cores, and
optimizing the communication among inter-nodes, inter-cores, and between CPUs
and MICs.
Some benchmark cases from model and/or industrial CFD applications are tested
on the Tianhe-1A and Tianhe-2 supercomputer to evaluate the performance. Among
these CFD cases, the maximum number of grid cells reached 780 billion. The
tuned solver successfully scales up to half of the entire Tianhe-2
supercomputer system with over 1.376 million heterogeneous cores. The test
results and performance analysis are discussed in detail.
Comment: 12 pages, 12 figures
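The load-balancing idea can be illustrated in its simplest form: split the grid
cells handled by one node between the host CPU and its MIC co-processors in
proportion to their measured throughput from a short calibration run. The
function name, the device rates, and the plain proportional rule below are
assumptions for illustration, not the tuning strategy of the paper.

def partition_cells(total_cells, throughputs):
    """throughputs: dict of device name -> sustained cells per second measured
    from a short calibration run of the solver kernel."""
    total_rate = sum(throughputs.values())
    shares = {dev: int(total_cells * rate / total_rate)
              for dev, rate in throughputs.items()}
    # hand any remainder from integer truncation to the fastest device
    fastest = max(throughputs, key=throughputs.get)
    shares[fastest] += total_cells - sum(shares.values())
    return shares

# e.g. one node with a multi-core CPU and two MIC cards (hypothetical rates)
print(partition_cells(10_000_000, {"cpu": 1.2e6, "mic0": 2.0e6, "mic1": 2.0e6}))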
Helix: Holistic Optimization for Accelerating Iterative Machine Learning
Machine learning workflow development is a process of trial-and-error:
developers iterate on workflows by testing out small modifications until the
desired accuracy is achieved. Unfortunately, existing machine learning systems
focus narrowly on model training---a small fraction of the overall development
time---and neglect to address iterative development. We propose Helix, a
machine learning system that optimizes the execution across
iterations---intelligently caching and reusing, or recomputing intermediates as
appropriate. Helix captures a wide variety of application needs within its
Scala DSL, with succinct syntax defining unified processes for data
preprocessing, model specification, and learning. We demonstrate that the reuse
problem can be cast as a Max-Flow problem, while the caching problem is
NP-Hard. We develop effective lightweight heuristics for the latter. Empirical
evaluation shows that Helix is not only able to handle a wide variety of use
cases in one unified workflow but also much faster, providing run time
reductions of up to 19x over state-of-the-art systems, such as DeepDive or
KeystoneML, on four real-world applications in natural language processing,
computer vision, and the social and natural sciences.
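The reuse-versus-recompute trade-off can be illustrated per intermediate: load a
cached copy when doing so is cheaper than recomputing it from its inputs. The
greedy pass below, with hypothetical argument names, only illustrates that
trade-off; Helix itself casts the reuse decision as a Max-Flow problem over the
workflow and uses lightweight heuristics for the NP-Hard caching side.

def plan(nodes, cached, load_cost, compute_cost, deps):
    """nodes: intermediate ids in topological order; cached: ids with a stored
    copy; load_cost/compute_cost: per-id costs; deps: id -> list of input ids."""
    cost, decision = {}, {}
    for n in nodes:  # inputs are costed before the nodes that consume them
        recompute = compute_cost[n] + sum(cost[d] for d in deps.get(n, ()))
        if n in cached and load_cost[n] <= recompute:
            cost[n], decision[n] = load_cost[n], "load"
        else:
            cost[n], decision[n] = recompute, "recompute"
    return decision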
Implementing and Running a Workflow Application on Cloud Resources
Scientists need to run applications that are time- and resource-consuming, but
not all of them have the required knowledge to run these applications in
parallel using grid, cluster, or cloud resources. In the past few years, many
workflow-building frameworks have been developed to help scientists take better
advantage of computing resources by designing workflows based on their
applications and executing them on heterogeneous resources. This paper presents
a case study of implementing and running a workflow for an eBay data retrieval
application. The workflow was designed using the Askalon framework and executed
on cloud resources. The purpose of this paper is to demonstrate how workflows
and cloud resources can be used by scientists to achieve speedup for their
applications without spending large amounts of money on computational
resources.
Keywords: Workflow, Cloud Resource