Resource provisioning in Science Clouds: Requirements and challenges
Cloud computing has permeated the information technology industry in the
last few years, and it is now emerging in scientific environments. Science
user communities demand a broad range of computing power, such as local clusters,
high-performance computing systems, and computing grids, to satisfy the
needs of high-performance applications. Different workloads
are needed from different computational models, and the cloud is already
considered a promising paradigm. The scheduling and allocation of resources
is always a challenging matter in any form of computation, and clouds are no
exception. Science applications have unique features that differentiate their
workloads; hence, their requirements have to be taken into consideration
when building a Science Cloud. This paper discusses the
main scheduling and resource allocation challenges for any Infrastructure as a
Service provider supporting scientific applications.
Architecture for Fault Tolerance in Mobile Cloud Computing using Disease Resistance Approach
Mobile cloud computing (MCC) is one of the emerging fields in distributed computing. MCC integrates mobile computing and cloud computing. Mobile devices are limited in storage, battery and processing capability. These limitations can be handled effectively with the introduction of cloud computing. The increasing functionality of the cloud and the complexity of applications cause resource failures in cloud computing, which reduce the overall performance of the MCC environment. Moreover, the existing approaches to resource scheduling in MCC propose several architectures, but they concentrate only on the allocation of resources. The existing architectures lack a fault tolerance mechanism to handle faulty resources. To overcome these issues, this paper proposes an architecture for fault tolerance in MCC using a Disease Resistance approach (DRFT). The main aim of the DRFT approach is to handle faulty VMs in the MCC effectively. The DRFT approach uses the human disease resistance mechanism as the basis of the proposed model. DRFT is capable of identifying faulty virtual machines and rescheduling their tasks to suitable virtual machines. This procedure ultimately minimizes the makespan and improves the overall performance of the scheduling process. To validate the effectiveness of the proposed approach, a series of simulations was carried out using the CloudSim simulator. The performance of the proposed DRFT approach is compared with the Dynamic group based fault tolerance approach (DGFT approach). The makespan value of DRFT is reduced to 7% and the performance of DRFT is increased when compared to the DGFT approach. The experimental results show the effectiveness of the proposed approach.
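The rescheduling step summarized in the abstract above (move tasks off faulty VMs, then measure makespan) can be illustrated with a minimal sketch. This is not the DRFT algorithm itself: the greedy placement rule, the function names `makespan` and `reschedule`, and the example task lengths and VM speeds are all assumptions made for illustration.

```python
def makespan(assignment, task_len, vm_speed):
    """Largest per-VM finish time, assuming tasks on a VM run back to back."""
    load = {}
    for task, vm in assignment.items():
        load[vm] = load.get(vm, 0.0) + task_len[task] / vm_speed[vm]
    return max(load.values()) if load else 0.0


def reschedule(assignment, task_len, vm_speed, faulty):
    """Move tasks from faulty VMs to the healthy VM that would finish them earliest.

    Hypothetical greedy rule, not the placement policy of the paper.
    """
    healthy = [vm for vm in vm_speed if vm not in faulty]
    if not healthy:
        raise ValueError("no healthy VM available")

    # Keep every task that already sits on a healthy VM.
    new_assignment = {t: vm for t, vm in assignment.items() if vm not in faulty}
    load = {vm: 0.0 for vm in healthy}
    for t, vm in new_assignment.items():
        load[vm] += task_len[t] / vm_speed[vm]

    # Place displaced tasks, longest first, on the VM with the earliest finish.
    displaced = sorted(
        (t for t, vm in assignment.items() if vm in faulty),
        key=lambda t: -task_len[t],
    )
    for t in displaced:
        best = min(healthy, key=lambda vm: load[vm] + task_len[t] / vm_speed[vm])
        load[best] += task_len[t] / vm_speed[best]
        new_assignment[t] = best
    return new_assignment


# Illustrative values only (task lengths in instructions, speeds in instructions/s).
task_len = {"t1": 100, "t2": 200, "t3": 300}
vm_speed = {"vm0": 100, "vm1": 100, "vm2": 200}
assignment = {"t1": "vm0", "t2": "vm1", "t3": "vm2"}

repaired = reschedule(assignment, task_len, vm_speed, faulty={"vm1"})
print(repaired, makespan(repaired, task_len, vm_speed))
```

In this toy input, the task on the faulty `vm1` moves to `vm2`, because `vm2` would finish it sooner than `vm0` despite its existing load.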
Optimization of Cloud Costs
A large number of companies and organizations are now deciding to migrate their applications to the cloud. The resources needed to host their applications are provided by a cloud provider, which determines the price for the resources according to certain criteria. The users of the services pay according to the resources they use. After the migration to the cloud, the consumers of cloud resources should try to optimize their costs. This paper presents several methods that can be used for optimization of cloud costs. In addition, a real case study of the application of these methods in practice is provided. According to the obtained results, cloud costs are reduced by about 65%.
CLOSURE: A cloud scientific workflow scheduling algorithm based on attack-defense game model
The multi-tenant coexistence service mode exposes cloud-based scientific workflows to the risk of intrusion. To address this problem, we propose a CLoud scientific wOrkflow SchedUling algoRithm based on attack-defensE game model (CLOSURE). In the algorithm, attacks based on different operating system vulnerabilities are regarded as different “attack” strategies, and different operating system distributions in a virtual machine cluster executing the workflows are regarded as different “defense” strategies. The information of the attacker and defender is not balanced. In other words, the defender cannot obtain information about the attacker’s strategies, while the attacker can acquire information about the defender’s strategies through a network scan. Therefore, we propose to dynamically switch the defense strategies during the workflow execution, which can weaken the network scan effects and transform the workflow security problem into an attack-defense game problem. Then, the probability distribution of the optimal mixed defense strategies can be obtained by calculating the Nash Equilibrium in the attack-defense game model. Based on this probability, diverse VMs are provisioned for workflow execution. Furthermore, a task-VM mapping algorithm based on dynamic Heterogeneous Earliest Finish Time (HEFT) is presented to accelerate the defense strategy switching and improve workflow efficiency. The experiments are conducted in both simulated and actual environments. Experimental results demonstrate that, compared with other algorithms, the proposed algorithm can reduce the attacker’s benefits by around 15.23% and decrease the time costs of the algorithm by around 7.86%.
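To make the idea of a mixed defense strategy concrete, the following is a minimal sketch for a two-by-two zero-sum game, using the textbook indifference condition. It is not the CLOSURE implementation: the loss matrix, the function names `defender_mix` and `provision`, and the operating system labels are assumptions chosen for illustration, and the real model may have more strategies and a non-zero-sum payoff.

```python
import random


def defender_mix(loss):
    """Equilibrium defense probabilities for a 2x2 zero-sum game.

    loss[i][j] is the defender's loss when running OS i while the attacker
    uses exploit j. The defender minimizes and the attacker maximizes.
    Returns (p0, p1), the probabilities of choosing OS 0 and OS 1.
    """
    (a, b), (c, d) = loss
    row_worst = [max(a, b), max(c, d)]   # worst case for each defense
    col_best = [min(a, c), min(b, d)]    # guaranteed gain for each attack
    if min(row_worst) == max(col_best):
        # Saddle point: a pure strategy is already optimal.
        return (1.0, 0.0) if row_worst[0] <= row_worst[1] else (0.0, 1.0)
    # No saddle point: mix so the attacker is indifferent between exploits.
    p = (d - c) / (a - b - c + d)
    return (p, 1.0 - p)


def provision(os_names, mix, n_vms, seed=0):
    """Draw the OS of each VM from the mixed strategy (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.choices(os_names, weights=mix, k=n_vms)


# Illustrative loss values only.
mix = defender_mix([[3, 1], [0, 2]])
print(mix, provision(["os_a", "os_b"], mix, n_vms=4))
```

With the example matrix, either exploit yields the attacker the same expected payoff against the returned mix, which is the defining property of the equilibrium in the no-saddle-point case.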
Performance optimization and energy efficiency of big-data computing workflows
Next-generation e-science is producing colossal amounts of data, now frequently termed Big Data, on the order of terabytes at present and petabytes or even exabytes in the foreseeable future. These scientific applications typically feature data-intensive workflows comprised of moldable parallel computing jobs, such as MapReduce, with intricate inter-job dependencies. The granularity of task partitioning in each moldable job of such big data workflows has a significant impact on workflow completion time, energy consumption, and financial cost if executed in clouds, yet this impact remains largely unexplored. This dissertation conducts an in-depth investigation into the properties of moldable jobs and provides an experiment-based validation of the performance model where the total workload of a moldable job increases along with the degree of parallelism. Furthermore, this dissertation conducts rigorous research on workflow execution dynamics in resource sharing environments and explores the interactions between workflow mapping and task scheduling on various computing platforms. A workflow optimization architecture is developed to seamlessly integrate three interrelated technical components, i.e., resource allocation, job mapping, and task scheduling.
Cloud computing provides a cost-effective computing platform for big data workflows where moldable parallel computing models are widely applied to meet stringent performance requirements. Based on the moldable parallel computing performance model, a big-data workflow mapping model is constructed and a workflow mapping problem is formulated to minimize workflow makespan under a budget constraint in public clouds. This dissertation shows this problem to be strongly NP-complete and designs i) a fully polynomial-time approximation scheme for a special case with a pipeline-structured workflow executed on virtual machines of a single class, and ii) a heuristic for a generalized problem with an arbitrary directed acyclic graph-structured workflow executed on virtual machines of multiple classes. The performance superiority of the proposed solution is illustrated by extensive simulation-based results in Hadoop/YARN in comparison with existing workflow mapping models and algorithms.
Considering that large-scale workflows for big data analytics have become a main consumer of energy in data centers, this dissertation also delves into the problem of static workflow mapping to minimize the dynamic energy consumption of a workflow request under a deadline constraint in Hadoop clusters, which is shown to be strongly NP-hard. A fully polynomial-time approximation scheme is designed for a special case with a pipeline-structured workflow on a homogeneous cluster and a heuristic is designed for the generalized problem with an arbitrary directed acyclic graph-structured workflow on a heterogeneous cluster. This problem is further extended to a dynamic version with deadline-constrained MapReduce workflows to minimize dynamic energy consumption in Hadoop clusters. This dissertation proposes a semi-dynamic online scheduling algorithm based on adaptive task partitioning to reduce dynamic energy consumption while meeting performance requirements from a global perspective, and also develops corresponding system modules for algorithm implementation in the Hadoop ecosystem. The performance superiority of the proposed solutions in terms of dynamic energy saving and deadline missing rate is illustrated by extensive simulation results in comparison with existing algorithms, and further validated through real-life workflow implementation and experiments using the Oozie workflow engine in Hadoop/YARN systems.
Partitioning workflow applications over federated clouds to meet non-functional requirements
PhD Thesis
With cloud computing, users can acquire computer resources when they need them
on a pay-as-you-go business model. Because of this, many applications are now being
deployed in the cloud, and there are many different cloud providers worldwide. Importantly,
all these various infrastructure providers offer services with different levels
of quality. For example, cloud data centres are governed by the privacy and security
policies of the country where the centre is located, while many organisations have
created their own internal "private cloud" to meet security needs.
With all this variety and uncertainty, application developers who decide to host their
system in the cloud face the issue of which cloud to choose to get the best operational
conditions in terms of price, reliability and security. The decision becomes even
more complicated if their application consists of a number of distributed components,
each with slightly different requirements.
Rather than trying to identify the single best cloud for an application, this thesis
considers an alternative approach, that is, combining different clouds to meet users'
non-functional requirements. Cloud federation offers the ability to distribute a single
application across two or more clouds, so that the application can benefit from the
advantages of each one of them. The key challenge for this approach is how to find the
distribution (or deployment) of application components that yields the greatest
benefits. In this thesis, we tackle this problem and propose a set of algorithms, and a
framework, to partition a workflow-based application over federated clouds in order to
exploit the strengths of each cloud. The specific goal is to split a distributed application
structured as a workflow such that the security and reliability requirements of each
component are met, whilst the overall cost of execution is minimised.
To achieve this, we propose and evaluate a cloud broker for partitioning a workflow
application over federated clouds. The broker integrates with the e-Science Central
cloud platform to automatically deploy a workflow over public and private clouds.
We developed a deployment planning algorithm to partition a large workflow application
across federated clouds so as to meet security requirements and minimise the
monetary cost.
A more generic framework is then proposed to model, quantify and guide the partitioning
and deployment of workflows over federated clouds. This framework considers
the situation where changes in cloud availability (including cloud failure) arise during
workflow execution.
Scheduling, Characterization and Prediction of HPC Workloads for Distributed Computing Environments
As High Performance Computing (HPC) has grown considerably and is expected to grow even more, effective resource management for distributed computing systems is needed more than ever. As computational workloads grow in quantity, it is becoming more crucial to apply efficient resource management and workload scheduling to use resources efficiently while keeping computational performance reasonably good. The problem of efficiently scheduling workloads on resources while meeting performance standards is hard. Additionally, non-clairvoyance of job dimensions makes resource management even harder in real-world scenarios. Our research investigates the scheduling problem for HPC and the challenges of deploying scheduling in real-world scenarios using state-of-the-art machine learning and data science techniques. To this end, this Ph.D. dissertation makes the following core contributions: a) We perform a theoretical analysis of space-sharing, non-preemptive scheduling: we studied this scheduling problem and proposed scheduling algorithms with polynomial computation time. We also proved constant upper bounds for the performance of these algorithms. b) We studied the sensitivity of scheduling algorithms to the accuracy of runtime predictions and devised a meta-learning approach to estimate prediction accuracy for jobs newly submitted to the HPC system. c) We studied the runtime prediction problem for HPC applications. For this purpose, we studied the distribution of available public workloads and proposed two different solutions that can predict multi-modal distributions: switching state-space models and Mixture Density Networks. d) We studied the effectiveness of recent recurrent neural network models for CPU usage trace prediction for individual VM traces as well as aggregate CPU usage traces.
In this dissertation, we explore solutions to improve the performance of scheduling workloads on distributed systems. We begin by looking at the problem from the theoretical perspective. Modeling the problem mathematically, we first propose a scheduling algorithm that finds a constant approximation of the optimal solution for the problem in polynomial time. We prove that the performance of the algorithm (average completion time) is a constant approximation of the performance of the optimal schedule. We next look at the problem in real-world scenarios. Considering High-Performance Computing (HPC) workload computing environments as the most similar real-world equivalent of our mathematical model, we explore the problem of predicting application runtime. We propose an algorithm to handle the existing uncertainties in the real world and showcase its effectiveness in terms of response time and resource utilization. After looking at the uncertainty problem, we focus on improving the accuracy of existing prediction approaches for HPC application runtime. We propose two solutions, one based on Kalman filters and one based on deep density mixture networks. We showcase the effectiveness of our prediction approaches by comparing them with previous prediction approaches in terms of prediction accuracy and impact on scheduling performance. Finally, we focus on predicting resource usage for individual applications during their execution. We explore the application of recurrent neural networks for predicting resource usage of applications deployed on individual virtual machines. To validate our proposed models and solutions, we performed extensive trace-driven simulation and measured the effectiveness of our approaches.
Simulation of the performance of complex data-intensive workflows
PhD Thesis
Recently, cloud computing has been used for analytical and data-intensive processes
as it offers many attractive features, including resource pooling, on-demand capability
and rapid elasticity. Scientific workflows use these features to tackle the problems of
complex data-intensive applications. Data-intensive workflows are composed of many
tasks that may involve large input data sets and produce large amounts of data as
output, and they typically run in highly dynamic environments. However, the resources
should be allocated dynamically depending on the demand changes of the workflow,
as over-provisioning increases the cost and under-provisioning causes Service Level
Agreement (SLA) violation and poor Quality of Service (QoS). Performance prediction
of complex workflows is a necessary step prior to the deployment of the workflow.
Performance analysis of complex data-intensive workflows is a challenging task due
to the complexity of their structure, diversity of big data, and data dependencies, in
addition to the required examination of the performance and challenges associated
with running these workflows in the real cloud.
In this thesis, a solution is explored to address these challenges, using a Next Generation
Sequencing (NGS) workflow pipeline as a case study, which may require hundreds/
thousands of CPU hours to process a terabyte of data. We propose a methodology to
model, simulate and predict runtime and the number of resources used by the complex
data-intensive workflows. One contribution of our simulation methodology is that it
provides an ability to extract the simulation parameters (e.g., MIPS and BW values)
that are required for constructing a training set and a fairly accurate prediction of
the run time for input for cluster sizes much larger than the ones used in training of the
prediction model. The proposed methodology permits the derivation of run time prediction
based on historical data from the provenance files. We present the run time
prediction of the complex workflow by considering different cases of its running in the
cloud, such as execution failure and library deployment time. In case of failure, the
framework can apply the prediction only partially, considering the successful parts of
the pipeline; in the other case, the framework can predict with or without considering
the time to deploy libraries. To further improve the accuracy of prediction, we propose
a simulation model that handles I/O contention.
Resource-efficient workflow scheduling in clouds
Workflow applications in science and engineering have steadily increased in variety and scale. Coinciding with this increase has been a relentless effort to improve the performance of these applications by exploiting the abundance of resources in hyper-scale clouds, with little attention to resource efficiency. The inefficient use of resources when executing scientific workflows results from both the excessive amount of resources provisioned and the wastage from unused resources among task runs. In this paper, we address the problem of resource-efficient workflow scheduling. To this end, we present the Maximum Effective Reduction (MER) algorithm, a resource efficiency solution that optimizes the resource usage of a workflow schedule generated by any particular scheduling algorithm. MER trades a minimal makespan increase for a maximal resource usage reduction by consolidating tasks, exploiting resource inefficiency in the original workflow schedule. The main novelty of MER lies in its identification of a "near-optimal" trade-off point between makespan increase and resource usage reduction. Finding such a point is of great practical importance and can lead to: (1) improvements in resource utilization, (2) reductions in resource provisioning, and (3) savings in energy consumption. Another significant contribution of this work is MER's broad applicability. In essence, MER can be applied to any environment that deals with the execution of (scientific) workflows of many precedence-constrained tasks, although MER is best suited to the IaaS cloud model. Based on results obtained from our extensive simulations using scientific workflow traces, we demonstrate that MER is capable of reducing the amount of actual resources used by 54% with an average makespan increase of less than 10%.
The efficacy of MER is further verified by results (from a comprehensive set of experiments with varying makespan delay limits) that show the resource usage reduction, makespan increase and the trade-off between them for various workflow applications.