138 research outputs found
Enhancing Workflow with a Semantic Description of Scientific Intent
Peer reviewedPreprin
Grid application meta-repository system
As one of the most popular forms of distributed computing technology, Grid brings together different scientific communities that are able to deploy, access, and run complex applications with the help of the enormous computational and storage
power offered by the Grid infrastructure.
However as the number of Grid applications has been growing steadily in recent
years, they are now stored on a multitude of different repositories, which remain specific to each Grid. At the time this research was carried out there were no two well-known Grid application repositories sharing the same structure, same
implementation, same access technology and methods, same communication
protocols, same security system or same application description language used for application descriptions. This remained a great limitation for Grid users, who were bound to work on only one specific repository, and also presented a significant
limitation in terms of interoperability and inter-repository access. The research presented in this thesis provides a solution to this problem, as well as to several other related issues that have been identified while investigating these areas of Grid.
Following a comprehensive review of existing Grid repository capabilities, I defined the main challenges that need to be addressed in order to make Grid repositories more versatile and I proposed a solution that addresses these challenges. To this end, I designed a new Grid repository (GAMRS – Grid Application Meta-Repository
System), which includes a novel model and architecture, an improved application
description language and a matchmaking system. After implementing and testing
this solution, I have proved that GAMRS marks an improvement in Grid repository
systems. Its new features allow for the inter-connection of different Grid
repositories; make applications stored on these repositories visible on the web; allow for the discovery of similar or identical applications stored in different Grid repositories; permit the exchange and re-usage of application and applicationrelated objects between different repositories; and extend the use of applications
stored on Grid repositories to other distributed environments, such as virtualized cluster-on-demand and cloud computing
Generic access to symbolic computing services
Symbolic computation is one of the computational domains that requires large computational
resources. Computer Algebra Systems (CAS), the main tools used for symbolic
computations, are mainly designed to be used as software tools installed on standalone
machines that do not provide the required resources for solving large symbolic computation
problems. In order to support symbolic computations an infrastructure built upon
massively distributed computational environments must be developed.
Building an infrastructure for symbolic computations requires a thorough analysis of
the most important requirements raised by the symbolic computation world and must
be built based on the most suitable architectural styles and technologies. The architecture
that we propose is composed of several main components: the Computer Algebra
System (CAS) Server that exposes the functionality implemented by one or more supporting
CASs through generic interfaces of Grid Services; the Architecture for Grid
Symbolic Services Orchestration (AGSSO) Server that allows seamless composition of
CAS Server capabilities; and client side libraries to assist the users in describing workflows
for symbolic computations directly within the CAS environment. We have also
designed and developed a framework for automatic data management of mathematical
content that relies on OpenMath encoding.
To support the validation and fine tuning of the system we have developed a simulation
platform that mimics the environment on which the architecture is deployed
Service-Oriented Ad Hoc Grid Computing
Subject of this thesis are the design and implementation of an ad hoc Grid infrastructure. The vision of an ad hoc Grid further evolves conventional service-oriented Grid systems into a more robust, more flexible and more usable environment that is still standards compliant and interoperable with other Grid systems. A lot of work in current Grid middleware systems is focused on providing transparent access to high performance computing (HPC) resources (e.g. clusters) in virtual organizations spanning multiple institutions. The ad hoc Grid vision presented in this thesis exceeds this view in combining classical Grid components with more flexible components and usage models, allowing to form an environment combining dedicated HPC-resources with a large number of personal computers forming a "Desktop Grid".
Three examples from medical research, media research and mechanical engineering are presented as application scenarios for a service-oriented ad hoc Grid infrastructure. These sample applications are also used to derive requirements for the runtime environment as well as development tools for such an ad hoc Grid environment.
These requirements form the basis for the design and implementation of the Marburg ad hoc Grid Environment (MAGE) and the Grid Development Tools for Eclipse (GDT). MAGE is an implementation of a WSRF-compliant Grid middleware, that satisfies the criteria for an ad hoc Grid middleware presented in the introduction to this thesis. GDT extends the popular Eclipse integrated development environment by components that support application development both for traditional service-oriented Grid middleware systems as well as ad hoc Grid infrastructures such as MAGE. These development tools represent the first fully model driven approach to Grid service development integrated with infrastructure management components in service-oriented Grid computing.
This thesis is concluded by a quantitative discussion of the performance overhead imposed by the presented extensions to a service-oriented Grid middleware as well as a discussion of the qualitative improvements gained by the overall solution. The conclusion of this thesis also gives an outlook on future developments and areas for further research.
One of these qualitative improvements is "hot deployment" the ability to install and remove Grid services in a running node without interrupt to other active services on the same node. Hot deployment has been introduced as a novelty in service-oriented Grid systems as a result of the research conducted for this thesis. It extends service-oriented Grid computing with a new paradigm, making installation of individual application components a functional aspect of the application.
This thesis further explores the idea of using peer-to-peer (P2P networking for Grid computing by combining a general purpose P2P framework with a standard compliant Grid middleware. In previous work the application of P2P systems has been limited to replica location and use of P2P index structures for discovery purposes. The work presented in this thesis also uses P2P networking to realize seamless communication accross network barriers. Even though the web service standards have been designed for the internet, the two-way communication requirement introduced by the WSRF-standards and particularly the notification pattern is not well supported by the web service standards. This defficiency can be answered by mechanisms that are part of such general purpose P2P communication frameworks.
Existing security infrastructures for Grid systems focus on protection of data during transmission and access control to individual resources or the overall Grid environment. This thesis focuses on security issues within a single node of a dynamically changing service-oriented Grid environment. To counter the security threads arising from the new capabilities of an ad hoc Grid, a number of novel isolation solutions are presented. These solutions address security issues and isolation on a fine-grained level providing a range of applicable basic mechanisms for isolation, ranging from lightweight system call interposition to complete para-virtualization of the operating systems
DRIVER Technology Watch Report
This report is part of the Discovery Workpackage (WP4) and is the third report out of four deliverables. The objective of this report is to give an overview of the latest technical developments in the world of digital repositories, digital libraries and beyond, in order to serve as theoretical and practical input for the technical DRIVER developments, especially those focused on enhanced publications. This report consists of two main parts, one part focuses on interoperability standards for enhanced publications, the other part consists of three subchapters, which give a landscape picture of current and surfacing technologies and communities crucial to DRIVER. These three subchapters contain the GRID, CRIS and LTP communities and technologies. Every chapter contains a theoretical explanation, followed by case studies and the outcomes and opportunities for DRIVER in this field
Supporting Quality of Service in Scientific Workflows
While workflow management systems have been utilized in enterprises to support
businesses for almost two decades, the use of workflows in scientific environments
was fairly uncommon until recently. Nowadays, scientists use workflow systems to
conduct scientific experiments, simulations, and distributed computations. However,
most scientific workflow management systems have not been built using existing
workflow technology; rather they have been designed and developed from
scratch. Due to the lack of generality of early scientific workflow systems, many
domain-specific workflow systems have been developed. Generally speaking, those
domain-specific approaches lack common acceptance and tool support and offer
lower robustness compared to business workflow systems.
In this thesis, the use of the industry standard BPEL, a workflow language
for modeling business processes, is proposed for the modeling and the execution of
scientific workflows. Due to the widespread use of BPEL in enterprises, a number
of stable and mature software products exist. The language is expressive (Turingcomplete)
and not restricted to specific applications. BPEL is well suited for the
modeling of scientific workflows, but existing implementations of the standard lack
important features that are necessary for the execution of scientific workflows.
This work presents components that extend an existing implementation of the
BPEL standard and eliminate the identified weaknesses. The components thus provide
the technical basis for use of BPEL in academia. The particular focus is on
so-called non-functional (Quality of Service) requirements. These requirements include
scalability, reliability (fault tolerance), data security, and cost (of executing a
workflow). From a technical perspective, the workflow system must be able to interface
with the middleware systems that are commonly used by the scientific workflow
community to allow access to heterogeneous, distributed resources (especially Grid
and Cloud resources).
The major components cover exactly these requirements:
Cloud Resource Provisioner Scalability of the workflow system is achieved by
automatically adding additional (Cloud) resources to the workflow system’s
resource pool when the workflow system is heavily loaded.
Fault Tolerance Module High reliability is achieved via continuous monitoring
of workflow execution and corrective interventions, such as re-execution of a
failed workflow step or replacement of the faulty resource.
Cost Aware Data Flow Aware Scheduler The majority of scientific workflow
systems only take the performance and utilization of resources for the execution
of workflow steps into account when making scheduling decisions. The
presented workflow system goes beyond that. By defining preference values
for the weighting of costs and the anticipated workflow execution time,
workflow users may influence the resource selection process. The developed multiobjective
scheduling algorithm respects the defined weighting and makes both
efficient and advantageous decisions using a heuristic approach.
Security Extensions Because it supports various encryption, signature and authentication
mechanisms (e.g., Grid Security Infrastructure), the workflow
system guarantees data security in the transfer of workflow data.
Furthermore, this work identifies the need to equip workflow developers with
workflow modeling tools that can be used intuitively. This dissertation presents
two modeling tools that support users with different needs. The first tool, DAVO
(domain-adaptable, Visual BPEL Orchestrator), operates at a low level of abstraction
and allows users with knowledge of BPEL to use the full extent of the language.
DAVO is a software that offers extensibility and customizability for different application
domains. These features are used in the implementation of the second tool,
SimpleBPEL Composer. SimpleBPEL is aimed at users with little or no background
in computer science and allows for quick and intuitive development of BPEL workflows based on predefined components
An Integrated Methodology for Creating Composed Web/Grid Services
This thesis presents an approach to design, specify, validate, verify, implement, and evaluate composed web/grid services. Web and grid services can be composed to create new services
with complex behaviours. The BPEL (Business Process Execution Language) standard was created to enable the orchestration of web services, but there have also been investigation of
its use for grid services. BPEL specifies the implementation of service composition but has no formal semantics; implementations are in practice checked by testing. Formal methods are
used in general to define an abstract model of system behaviour that allows simulation and reasoning about properties. The approach can detect and reduce potentially costly errors at
design time.
CRESS (Communication Representation Employing Systematic Specification) is a domainindependent,
graphical, abstract notation, and integrated toolset for developing composite web service. The original version of CRESS had automated support for formal specification in
LOTOS (Language Of Temporal Ordering Specification), executing formal validation with MUSTARD (Multiple-Use Scenario Testing and Refusal Description), and implementing in
BPEL4WS as the early version of BPEL standard. This thesis work has extended CRESS and its integrated tools to design, specify, validate, verify, implement, and evaluate composed web/grid
services. The work has extended the CRESS notation to support a wider range of service compositions, and has applied it to grid services as a new domain. The thesis presents two new
tools, CLOVE (CRESS Language-Oriented Verification Environment) and MINT (MUSTARD Interpreter), to respectively support formal verification and implementation testing. New work
has also extended CRESS to automate implementation of composed services using the more recent BPEL standard WS-BPEL 2.0
DRIVE: A Distributed Economic Meta-Scheduler for the Federation of Grid and Cloud Systems
The computational landscape is littered with islands of disjoint resource providers including
commercial Clouds, private Clouds, national Grids, institutional Grids, clusters, and data centers.
These providers are independent and isolated due to a lack of communication and coordination,
they are also often proprietary without standardised interfaces, protocols, or execution environments.
The lack of standardisation and global transparency has the effect of binding consumers
to individual providers. With the increasing ubiquity of computation providers there is an opportunity
to create federated architectures that span both Grid and Cloud computing providers
effectively creating a global computing infrastructure. In order to realise this vision, secure and
scalable mechanisms to coordinate resource access are required. This thesis proposes a generic
meta-scheduling architecture to facilitate federated resource allocation in which users can provision
resources from a range of heterogeneous (service) providers.
Efficient resource allocation is difficult in large scale distributed environments due to the inherent
lack of centralised control. In a Grid model, local resource managers govern access to a
pool of resources within a single administrative domain but have only a local view of the Grid
and are unable to collaborate when allocating jobs. Meta-schedulers act at a higher level able to
submit jobs to multiple resource managers, however they are most often deployed on a per-client
basis and are therefore concerned with only their allocations, essentially competing against one
another. In a federated environment the widespread adoption of utility computing models seen in
commercial Cloud providers has re-motivated the need for economically aware meta-schedulers.
Economies provide a way to represent the different goals and strategies that exist in a competitive
distributed environment. The use of economic allocation principles effectively creates an
open service market that provides efficient allocation and incentives for participation.
The major contributions of this thesis are the architecture and prototype implementation of the
DRIVE meta-scheduler. DRIVE is a Virtual Organisation (VO) based distributed economic metascheduler
in which members of the VO collaboratively allocate services or resources. Providers
joining the VO contribute obligation services to the VO. These contributed services are in effect
membership “dues” and are used in the running of the VOs operations – for example allocation,
advertising, and general management. DRIVE is independent from a particular class of provider
(Service, Grid, or Cloud) or specific economic protocol. This independence enables allocation in
federated environments composed of heterogeneous providers in vastly different scenarios. Protocol
independence facilitates the use of arbitrary protocols based on specific requirements and
infrastructural availability. For instance, within a single organisation where internal trust exists,
users can achieve maximum allocation performance by choosing a simple economic protocol.
In a global utility Grid no such trust exists. The same meta-scheduler architecture can be used
with a secure protocol which ensures the allocation is carried out fairly in the absence of trust.
DRIVE establishes contracts between participants as the result of allocation. A contract describes
individual requirements and obligations of each party. A unique two stage contract negotiation
protocol is used to minimise the effect of allocation latency. In addition due to the co-op nature of
the architecture and the use of secure privacy preserving protocols, DRIVE can be deployed in a
distributed environment without requiring large scale dedicated resources.
This thesis presents several other contributions related to meta-scheduling and open service
markets. To overcome the perceived performance limitations of economic systems four high utilisation
strategies have been developed and evaluated. Each strategy is shown to improve occupancy,
utilisation and profit using synthetic workloads based on a production Grid trace. The
gRAVI service wrapping toolkit is presented to address the difficulty web enabling existing applications.
The gRAVI toolkit has been extended for this thesis such that it creates economically
aware (DRIVE-enabled) services that can be transparently traded in a DRIVE market without requiring
developer input. The final contribution of this thesis is the definition and architecture of
a Social Cloud – a dynamic Cloud computing infrastructure composed of virtualised resources
contributed by members of a Social network. The Social Cloud prototype is based on DRIVE
and highlights the ease in which dynamic DRIVE markets can be created and used in different
domains
- …