1,167 research outputs found
Partitioning workflow applications over federated clouds to meet non-functional requirements
PhD ThesisWith cloud computing, users can acquire computer resources when they need them
on a pay-as-you-go business model. Because of this, many applications are now being
deployed in the cloud, and there are many di erent cloud providers worldwide. Importantly,
all these various infrastructure providers o er services with di erent levels
of quality. For example, cloud data centres are governed by the privacy and security
policies of the country where the centre is located, while many organisations have
created their own internal \private cloud" to meet security needs.
With all this varieties and uncertainties, application developers who decide to host their
system in the cloud face the issue of which cloud to choose to get the best operational
conditions in terms of price, reliability and security. And the decision becomes even
more complicated if their application consists of a number of distributed components,
each with slightly di erent requirements.
Rather than trying to identify the single best cloud for an application, this thesis
considers an alternative approach, that is, combining di erent clouds to meet users'
non-functional requirements. Cloud federation o ers the ability to distribute a single
application across two or more clouds, so that the application can bene t from the
advantages of each one of them. The key challenge for this approach is how to nd the
distribution (or deployment) of application components, which can yield the greatest
bene ts. In this thesis, we tackle this problem and propose a set of algorithms, and a
framework, to partition a work
ow-based application over federated clouds in order to
exploit the strengths of each cloud. The speci c goal is to split a distributed application
structured as a work
ow such that the security and reliability requirements of each
component are met, whilst the overall cost of execution is minimised.
To achieve this, we propose and evaluate a cloud broker for partitioning a work
ow
application over federated clouds. The broker integrates with the e-Science Central
cloud platform to automatically deploy a work
ow over public and private clouds.
We developed a deployment planning algorithm to partition a large work
ow appli-
- i -
cation across federated clouds so as to meet security requirements and minimise the
monetary cost.
A more generic framework is then proposed to model, quantify and guide the partitioning
and deployment of work
ows over federated clouds. This framework considers
the situation where changes in cloud availability (including cloud failure) arise during
work
ow execution
S-Store: Streaming Meets Transaction Processing
Stream processing addresses the needs of real-time applications. Transaction
processing addresses the coordination and safety of short atomic computations.
Heretofore, these two modes of operation existed in separate, stove-piped
systems. In this work, we attempt to fuse the two computational paradigms in a
single system called S-Store. In this way, S-Store can simultaneously
accommodate OLTP and streaming applications. We present a simple transaction
model for streams that integrates seamlessly with a traditional OLTP system. We
chose to build S-Store as an extension of H-Store, an open-source, in-memory,
distributed OLTP database system. By implementing S-Store in this way, we can
make use of the transaction processing facilities that H-Store already
supports, and we can concentrate on the additional implementation features that
are needed to support streaming. Similar implementations could be done using
other main-memory OLTP platforms. We show that we can actually achieve higher
throughput for streaming workloads in S-Store than an equivalent deployment in
H-Store alone. We also show how this can be achieved within H-Store with the
addition of a modest amount of new functionality. Furthermore, we compare
S-Store to two state-of-the-art streaming systems, Spark Streaming and Storm,
and show how S-Store matches and sometimes exceeds their performance while
providing stronger transactional guarantees
Remodelling Scientific Workflows for Cloud
Viimastel aastatel on hakanud teaduslikes kogukondades huvi pilvearvutuse vastu kasvama. Teaduskatsete läbiviimisel pilves on mitmeid eeliseid nagu elastsus, paindlikkus ja hooldatavus, kuid varasemad uuringud näitavad, et üks suurimaid probleeme teadusprogrammide jooksutamisel pilves on omavaheliste masinate andmevahetuse suurus. Üks lahendus sellele probleemile oleks tuvastada komponendid, mis omavahel palju suhtlevad ning panna nad pilves ühte kohta jooksma, et vähendada omavahelist andmevahetust. Antud bakalaureuse töös jagati (partitsioneeriti) Montage töövoo osad pilves asuvate virtuaalmasinate vahel ning rakendati valmis kirjutatud P2P süsteemi, et vähendada pilves olevat suhtlust. Tänu P2P süsteemile ja teadusprogrammi partitsioneerimisele vähendati kogu suhtlust pilves kuni 80%.In recent years, cloud computing has raised significant
interest in the scientific community. Running scientific
experiments in the cloud has its advantages like elasticity, scalability
and software maintenance. However, the communication
latencies are observed to be the
major hindrance for migrating scientific computing applications
to the cloud. The problem escalates further when we consider
scientific workflows, where significant data is exchanged across
different tasks.
One way to overcome this problem is to reduce the data communication by partitioning
and scheduling the workflow and adapting a peer-to-peer file
sharing among the nodes. Different size Montage workflows were
considered for the analysis of this problem. From the study it was
observed that the partitioning along with the peer-to-peer file
sharing reduced the data communication in the cloud up to 80
LOGOS: Enabling Local Resource Managers for the Efficient Support of Data-Intensive Workflows within Grid Sites
In this study we discuss how to enable grid sites for the support of data-intensive workflows. Usually, within grid sites, tasks and resources are administrated by local resource managers (LRMs). Many of LRMs have been designed for managing compute-intensive applications. Therefore, data-intensive workflow applications might not perform well on such environments due to the number and size of data transfers between tasks. To improve the performance of such kind of applications it is necessary to redefine the scheduling policies integrated on LRMs. This paper proposes a novel scheme for efficiently supporting data-intensive workflows in LRMs within grid sites. Such scheme is partially implemented in our grid middleware LOGOS and used to improve the performance of a well known LRM: HTCondor. The core of LOGOS is a novel communication-aware scheduling algorithm (PPSA) capable of finding near-optimal solutions. Experiments conducted in this study showed that our approach leads to performance improvements up to 52 % in the management of data-intensive workflow applications
Cloudbus Toolkit for Market-Oriented Cloud Computing
This keynote paper: (1) presents the 21st century vision of computing and
identifies various IT paradigms promising to deliver computing as a utility;
(2) defines the architecture for creating market-oriented Clouds and computing
atmosphere by leveraging technologies such as virtual machines; (3) provides
thoughts on market-based resource management strategies that encompass both
customer-driven service management and computational risk management to sustain
SLA-oriented resource allocation; (4) presents the work carried out as part of
our new Cloud Computing initiative, called Cloudbus: (i) Aneka, a Platform as a
Service software system containing SDK (Software Development Kit) for
construction of Cloud applications and deployment on private or public Clouds,
in addition to supporting market-oriented resource management; (ii)
internetworking of Clouds for dynamic creation of federated computing
environments for scaling of elastic applications; (iii) creation of 3rd party
Cloud brokering services for building content delivery networks and e-Science
applications and their deployment on capabilities of IaaS providers such as
Amazon along with Grid mashups; (iv) CloudSim supporting modelling and
simulation of Clouds for performance studies; (v) Energy Efficient Resource
Allocation Mechanisms and Techniques for creation and management of Green
Clouds; and (vi) pathways for future research.Comment: 21 pages, 6 figures, 2 tables, Conference pape
High-Performance Cloud Computing: A View of Scientific Applications
Scientific computing often requires the availability of a massive number of
computers for performing large scale experiments. Traditionally, these needs
have been addressed by using high-performance computing solutions and installed
facilities such as clusters and super computers, which are difficult to setup,
maintain, and operate. Cloud computing provides scientists with a completely
new model of utilizing the computing infrastructure. Compute resources, storage
resources, as well as applications, can be dynamically provisioned (and
integrated within the existing infrastructure) on a pay per use basis. These
resources can be released when they are no more needed. Such services are often
offered within the context of a Service Level Agreement (SLA), which ensure the
desired Quality of Service (QoS). Aneka, an enterprise Cloud computing
solution, harnesses the power of compute resources by relying on private and
public Clouds and delivers to users the desired QoS. Its flexible and service
based infrastructure supports multiple programming paradigms that make Aneka
address a variety of different scenarios: from finance applications to
computational science. As examples of scientific computing in the Cloud, we
present a preliminary case study on using Aneka for the classification of gene
expression data and the execution of fMRI brain imaging workflow.Comment: 13 pages, 9 figures, conference pape
Framework for Automated Partitioning of Scientific Workflows on the Cloud
Teaduslikud töövood on saanud populaarseks standardiks, et lihtsal viisil esitada ning lahendada erinevaid teaduslikke ülesandeid. Üldiselt koosnevad need töövood suurtest hulkadest ülesannetest, mis nõuavad tihti palju erinevaid arvuti ressursse, mistõttu jooksutatakse neid kas pilvearvutust, hajustöötlust või superarvuteid kasutades. Varem on tõestatud, et kui rakendada pilves töövoo erinevate osade jagamiseks k-way partitsioneerimis algoritmi, siis üleüldine kommunikatsioon pilves väheneb. Antud magistritöös programmeriti raamistik, et seda protsessi automatiseerida. Loodud raamistik võimaldab automaatselt partitsioneerida igasugusegi töövoo, mis on mõeldud Pegasuse programmiga jooksutamiseks. Raamistik, kasutades CloudML'i, seab automaatselt pilves üles klastri masinaid, konfigureerib ning sätestab kõik vajaliku tarkvara ning jooksutab ja partitsioneerib etteantud töövoo. Lisaks, kuvatakse pärast töövoo lõpetamist ka ajalise kalkulatsiooni visualisatsioon. Seda kasutades saab lõppkasutaja aimu, mitu tuuma peaks töövoo jooksutamiseks kasutama, et lõpetada eksperiment mingis kindlas ajavahemikus.Scientific workflows have become a standardized way for scientists to represent a set of tasks to overcome or solve a certain problem. Usually these workflows consist of numerous amount of jobs that are both CPU heavy and I/O intensive that are executed using some kind of workflow management system either on clouds, grids, supercomputers, etc. Previously, it has been shown that using k-way partitioning algorithm to distribute a workflow's tasks between multiple machines in the cloud reduces the overall data communication and therefore lowers the cost of the bandwidth usage. In this thesis, a framework was built in order to automate this process - partition any workflow submitted by a scientist that is meant to be run on Pegasus workflow management system in the cloud with ease. The framework provisions the instances in the cloud using CloudML, configures and installs all the software needed for the execution, runs and partitions the scientific workflow and finally shows the time estimation of the workflow, so that the user would have an approximate guidelines on, how many resources one should provision in order to finish an experiment under a certain time-frame
- …