Interoperability of heterogeneous large-scale scientific workflows and data resources
Workflows allow e-Scientists to express their experimental processes in a structured
way and provide the glue that integrates remote applications. Since the Grid provides an
enormous amount of data and computational resources, executing workflows
on the Grid can yield significant performance improvements. Several workflow management
systems, widely used by different scientific communities, have been
developed for various purposes; as a result, they differ in several respects.
This thesis outlines two major problems of existing workflow systems: workflow
interoperability and data access. On the one hand, existing workflow systems are
based on different technologies, so achieving interoperability between their
workflows at any level is a challenging task. Although there is a clear
demand for interoperable workflows, for example to enable scientists to share workflows,
to leverage the existing work of others, and to create multi-disciplinary workflows,
only limited, ad-hoc workflow interoperability solutions are currently available
to scientists. Existing solutions realise workflow interoperability only between
a small set of workflow systems and do not consider the performance issues that arise
with large-scale (computationally and/or data-intensive) scientific workflows.
Scientific workflows are typically computationally and/or data-intensive and are executed
in a distributed environment to reduce their execution time. Therefore,
their performance is a key issue. In most scenarios, existing interoperability solutions
become a communication bottleneck between workflows, dramatically increasing execution time.
On the other hand, many scientific computational experiments are based
on data that reside in data resources of different types and from different vendors.
Many workflow systems support access to only limited subsets of such data resources,
which prevents data-level interoperation between workflows of different systems. Therefore,
there is a demand for a general solution that provides access to a wide range of data
resources of different types and vendors. If such a solution is general, in the sense
that it can be adopted by several workflow systems, then it also enables workflows
of different systems to access the same data resources and therefore to interoperate at
the data level. Note that data semantics are outside the scope of this work. For the
same reasons as described above, the performance characteristics of such a solution
are crucial. Although, in terms of functionality, there are solutions
that could be adopted by workflow systems for this purpose, they provide poor
performance. For that reason, they have not gained wide acceptance in the scientific
workflow community.
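To make the idea of a general data access solution shared by several workflow systems more concrete, here is a minimal Python sketch of a uniform interface with per-backend adapters (the adapter pattern). It is an illustration under stated assumptions, not the architecture proposed in this thesis: the names `DataResource`, `SQLiteResource`, `CSVResource`, `workflow_system_a_step` and `workflow_system_b_step` are hypothetical, an in-memory SQLite database and a CSV text stand in for data resources of different types and vendors, and performance is ignored entirely.

```python
from abc import ABC, abstractmethod
import csv
import io
import sqlite3


class DataResource(ABC):
    """Hypothetical uniform interface: every backend answers a request with rows as tuples."""

    @abstractmethod
    def query(self, request: str) -> list[tuple]:
        ...


class SQLiteResource(DataResource):
    """Adapter for a relational resource (assumption: SQLite stands in for any RDBMS)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def query(self, request: str) -> list[tuple]:
        # For this backend the request is interpreted as an SQL statement.
        return self.conn.execute(request).fetchall()


class CSVResource(DataResource):
    """Adapter for a flat-file resource; the request names a single column to project."""

    def __init__(self, text: str) -> None:
        self.rows = list(csv.DictReader(io.StringIO(text)))

    def query(self, request: str) -> list[tuple]:
        # CSV carries no types, so values come back as strings.
        return [(row[request],) for row in self.rows]


def workflow_system_a_step(resource: DataResource, request: str) -> list[tuple]:
    # A task in one hypothetical workflow system; it depends only on the interface.
    return resource.query(request)


def workflow_system_b_step(resource: DataResource, request: str) -> int:
    # A task in a second hypothetical workflow system reading through the same interface.
    return len(resource.query(request))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (id INTEGER, value REAL)")
    conn.executemany("INSERT INTO samples VALUES (?, ?)", [(1, 0.5), (2, 1.5)])
    db = SQLiteResource(conn)
    flat = CSVResource("id,value\n1,0.5\n2,1.5\n")

    # Both systems can reach both resources without knowing which backend they use.
    print(workflow_system_a_step(db, "SELECT id, value FROM samples ORDER BY id"))
    print(workflow_system_b_step(flat, "value"))
```

Because both workflow steps depend only on `DataResource`, supporting a new type of resource means writing one adapter rather than modifying every workflow system. Like the text, the sketch leaves data semantics aside: the CSV adapter returns strings while SQLite returns typed values.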
To address these issues, a set of architectures is proposed to realise heterogeneous
data access and heterogeneous workflow execution. The primary goal was
to investigate how such solutions can be implemented and integrated with workflow
systems. The secondary aim was to analyse how such solutions can be implemented
and utilised by single applications
- …