20 research outputs found

Provenance support for service-based infrastructure

Author: Rajbhandari Shrija
Publication date: 01/01/2007

Service-based architectures represent the next evolutionary step in the development of e-science, namely, the transformation of the Internet from a commercial marketplace to a mechanism for sharing multidisciplinary scientific resources. Although scientists in many disciplines have become increasingly reliant on distributed computing technologies for data processing and dissemination, the record of the processing history and origin of a data product, that is, its data provenance, is often nonexistent, incomplete, or impossible for potential users to recover. This thesis aims to address data provenance issues in service-based environments, particularly to answer how a scientist who performs a workflow execution in such an environment can (1) document the data provenance for a data item created by the execution, and (2) use the provenance documentation as a recipe to re-execute the workflow. This thesis proposes a provenance model for delivering data provenance support in a service-based environment. Through the use of an example scenario of a scientific workflow in the Astrophysics domain, we explore and identify components of the provenance model. The provenance model proposes a technique to collect and record data provenance for service-based workflow executions. The technique facilitates the collection of data provenance of workflow execution at runtime. In order to record the collected data provenance, the thesis also proposes a specification for representing provenance that describes the processing history by which a piece of data was derived. The thesis also proposes query interfaces that allow recorded provenance to be queried, formulates a technique to construct provenance graphs, and supports the re-execution of past workflows. The provenance representation specification, the collection technique, and the query interfaces have been used to implement a prototype system to demonstrate the proposed model. The thesis also experimentally evaluates the scalability of the components implemented.

EThOS - Electronic Theses Online Service, United Kingdom

OpenGrey Repository

BRIL - Capturing Experiments in the Wild

Author: Fabiane Stella
Hedges Mark
Rajbhandari Shrija
Publication venue: International Conference on Open Repositories : Proceedings
Publication date: 31/12/2010

This presentation describes a project to embed a repository system (based on Fedora) within the complex, experimental processes of a number of researchers in biophysics and structural biology. The project is capturing not just individual datasets but entire experimental workflows as complex objects, incorporating provenance information based on the Open Provenance Model, to support reproduction and validation of published results. The repository is integrated within these experimental processes, so that data capture is as far as possible automatic and invisible to the researcher. A particular challenge is that the researchers’ work takes place in local environments within the department, entirely decoupled from the repository. In meeting this challenge, the project is bridging the gap between the “wild”, ad hoc and independent environment of the researcher’s desktop, and the curated, sustainable, institutional environment of the repository, and in the process the project crosses the boundary between several of the pairs of polar opposites identified in the call

BieColl - Bielefeld eCollections

Provenance support for service-based infrastructure

Author: Rajbhandari Shrija

Online Research @ Cardiff

Trust assessment using provenance in service oriented applications

Author: Arnaud Contes
Ian Wootten
Omer F Rana
Shrija Rajbhandari
Vikas Deora
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2006

CiteSeerX

Consulting (in Writing) to the Corporation: Principles and Pragmatics

Author: Contes Arnaud
Deora Vikas
Rajbhandari Shrija
Rana Omer Farooq
Tamas Kifor
Varga Laszlo
Wootten Ian M.
Publication venue: Fairleigh Dickinson University
Publication date: 01/01/2005

Provenance information provides a useful basis to verify whether a particular application behavior has been adhered to. This is particularly useful to evaluate the basis for a particular outcome, as a result of a process, and to verify if the process involved in making the decision conforms to some pre-defined set of rules. This is significant in a healthcare scenario, where it is necessary to demonstrate that patient data has been processed in a particular way. Understanding how provenance information may be recorded, stored, and subsequently analyzed by a decision maker is therefore significant in a service oriented architecture, which involves the use of third party services over which the decision maker does not have control. The aggregation of data from multiple sources of patient information plays an important part in subsequent treatments that are proposed for a patient. A tool to navigate through and analyze such provenance information is proposed, based on the use of a portal framework that allows different views on provenance information to co-exist. The portal enables users to add custom portlets enabling application specific views that would facilitate particular decision making

CiteSeerX

Crossref

Online Research @ Cardiff

SZTAKI Publication Repository

University of Queensland eSpace

Trust Assessment Using Provenance in Service Oriented Applications

Author: Contes Arnaud
Deora Vikas
Rajbhandari Shrija
Rana Omer Farooq
Wootten Ian M.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2006

Workflow forms a key part of many existing Service Oriented applications, involving the integration of services that may be made available at distributed sites. It is possible to distinguish between an "abstract" workflow description, outlining which services must be involved in a workflow execution, and a "physical" workflow description, outlining the particular instances of services that were used in a particular enactment. Provenance information provides a useful way to capture the physical workflow description automatically, especially if this information is captured in a standard format. Subsequent analysis of this provenance information may be used to evaluate whether the abstract workflow description has been adhered to, and to enable a user executing a workflow-based application to establish "trust" in the outcome
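
The adherence check described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the `Invocation` record, the `adheres_to` function, the step names, and the example endpoints are all hypothetical, and adherence is simplified here to "the recorded service types match the abstract steps, in order".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invocation:
    """One recorded service call in a physical workflow (hypothetical record shape)."""
    service_type: str  # abstract service name, e.g. "align"
    endpoint: str      # concrete service instance that actually ran


def adheres_to(abstract_steps: list[str], physical: list[Invocation]) -> bool:
    """Return True if the recorded invocations match the abstract steps in order."""
    recorded = [inv.service_type for inv in physical]
    return recorded == abstract_steps


# Abstract workflow: which kinds of service must be involved, and in what order.
abstract_workflow = ["fetch", "align", "render"]

# Physical workflow: the instances recorded as provenance for one enactment.
provenance_log = [
    Invocation("fetch", "http://site-a.example/fetch"),
    Invocation("align", "http://site-b.example/align"),
    Invocation("render", "http://site-a.example/render"),
]

print(adheres_to(abstract_workflow, provenance_log))
```

A real trust assessment would likely weigh more than step order (for example, which endpoints are trusted), but the comparison of a recorded enactment against an abstract description is the core idea.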

CiteSeerX

Crossref

Online Research @ Cardiff

MetaTools - Investigating Metadata Generation Tools - Final Report

Author: Polfreman Malcolm
Rajbhandari Shrija
Publication date: 24/11/2008

Automatic metadata generation has sometimes been posited as a solution to the ‘metadata bottleneck’ that repositories and portals are facing as they struggle to provide resource discovery metadata for a rapidly growing number of new digital resources. Unfortunately there is no registry or trusted body of documentation that rates the quality of metadata generation tools or identifies the most effective tool(s) for any given task. The aim of the first stage of the project was to remedy this situation by developing a framework for evaluating tools used for the purpose of generating Dublin Core metadata. A range of intrinsic and extrinsic metrics (standard tests or measurements) that capture the attributes of good metadata from various perspectives were identified from the research literature and evaluated in a report. A test program was then implemented using metrics from the framework. It evaluated the quality of metadata generated from 1) Web pages (HTML) and 2) scholarly works (PDF) by four of the more widely known metadata generation tools - Data Fountains, DC-dot, SamgI, and the Yahoo! Term Extractor. The intention was also to test PaperBase, a prototype for generating metadata for scholarly works, but its developers ultimately preferred to conduct tests in-house. Some interesting comparisons with their results were nonetheless possible and were included in the stage 2 report. It was found that the output from Data Fountains was generally superior to that of the other tools that the project tested. But the output from all of the tools was considered to be disappointing and markedly inferior to the quality of metadata that Tonkin and Muller report that PaperBase has extracted from scholarly works. Overall, the prospects for generating high-quality metadata for scholarly works appear to be brighter because of their more predictable layout.
It is suggested that JISC should particularly encourage research into auto-generation methods that exploit the structural and syntactic features of scholarly works in PDF format, as exemplified by PaperBase, and strongly consider funding the development of tools in this direction. In the third stage of the project SOAP and RESTful Web Service interfaces were developed for three metadata generation tools – Data Fountains, SamgI and Kea. This had a dual purpose. Firstly, the creation of an optimal metadata record usually requires the merging of output from several tools each of which, until now, had to be invoked separately because of the ad hoc nature of their interfaces. As Web services, they will be available for use in a network such as the Web with well-defined interfaces that are implementation-independent. These services will be exposed for use by clients without them having to be concerned with how the service will execute their requests. Repositories should be able to plug them into their own cataloguing environments and experiment with automatic metadata generation under more ‘real-life’ circumstances than hitherto. Secondly, and more importantly (in view of the relatively poor quality of current tools), they enabled the project to experiment with the use of a high-level ontology for describing metadata generation tools. The value of an ontology being used in this way should be felt as higher quality tools (such as PaperBase?) emerge. The high-level ontology is part of a MetaTools system architecture that consists of various components to describe, register and discover services. Low-level definitions within a service ontology are mapped to higher-level human-understandable semantic descriptions contained within a MetaTools ontology. A user interface enables service providers to register their service in a public registry. This registry is used by consumers to find services that match certain criteria.
If the registry has such a service, it provides the consumer with a contract and an endpoint address for that service. The terms in the MetaTools ontology can, in turn, be part of a higher-level ontology that describes the preservation domain as a whole. The team believes that an ontology-aided approach to service discovery, as employed by the MetaTools project, is a practical solution. A stage 3 technical report was also written

Jisc Repository

Support for Provenance in a Service-based Computing Grid

Author: Rajbhandari Shrija
Walker David William
Publication venue: EPSRC
Publication date: 01/09/2004

There is an underlying need to support data provenance in a service-based computing environment such as the Grid, where web services may be automatically discovered, composed, and then consumed using innovative workflow management systems. In many scientific scenarios a succession of data transformations occurs, producing data of added scientific value. The provenance of such data needs to be recognized for other potential users to verify the data before using it in further studies. In this paper we present general requirements and implementation issues in provenance data collection, recording and reasoning, and discuss how these reflect on what aspects of information are essential for an effective and scalable system

Online Research @ Cardiff