Automatic deployment and reproducibility of workflow on the Cloud using container virtualization
PhD Thesis
Cloud computing is a service-oriented approach to distributed computing that has
many attractive features, including on-demand access to large compute resources. One
type of cloud application is the scientific workflow, which is playing an increasingly
important role in building applications from heterogeneous components. Workflows are
increasingly used in science as a means to capture, share, and publish computational
analysis. Clouds can offer a number of benefits to workflow systems, including the
dynamic provisioning of the resources needed for computation and storage, which has
the potential to dramatically increase the ability to quickly extract new results from
the huge amounts of data now being collected.
However, there is an increasing number of Cloud computing platforms, each with different
functionality and interfaces. It therefore becomes increasingly challenging to
define workflows in a portable way so that they can be run reliably on different clouds.
As a consequence, workflow developers face the problem of deciding which Cloud to
select and - more importantly for the long-term - how to avoid vendor lock-in.
A further issue that has arisen with workflows is that it is common for them to stop
being executable a relatively short time after they were created. This can be due to
the external resources required to execute a workflow - such as data and services -
becoming unavailable. It can also be caused by changes in the execution environment
on which the workflow depends, such as changes to a library causing an error when a
workflow service is executed. This "workflow decay" issue is recognised as an impediment
to the reuse of workflows and the reproducibility of their results. It is becoming
a major problem, as the reproducibility of science is increasingly dependent on the
reproducibility of scientific workflows.
In this thesis we present new solutions to address these challenges. We propose a new
approach to workflow modelling that offers a portable and re-usable description of the
workflow using the TOSCA specification language. Our approach addresses portability
by allowing workflow components to be systematically specified and automatically
deployed on a range of clouds, or in local computing environments, using container
virtualisation techniques.
To address the issues of reproducibility and workflow decay, our modelling and deployment
approach has also been integrated with source control and container management
techniques to create a new framework that efficiently supports dynamic workflow deployment,
(re-)execution and reproducibility.
To improve deployment performance, we extend the framework with a number of new
optimisation techniques, and evaluate their effect on a range of real and synthetic
workflows.
Ministry of Higher Education and
Scientific Research in Iraq and Mosul Universit