Towards Provenance and Traceability in CRISTAL for HEP

Author: Branson Andrew
McClatchey Richard
Shamdasani Jetendr
Publication venue: 'IOP Publishing'
Publication date: 24/02/2014

This paper discusses the CRISTAL object lifecycle management system and its use in provenance data management and the traceability of system events. This software was initially used to capture the construction and calibration of the CMS ECAL detector at CERN for later use by physicists in their data analysis. Some further uses of CRISTAL in different projects (CMS, neuGRID and N4U) are presented as examples of its flexible data model. From these examples, applications are drawn for the High Energy Physics domain and some initial ideas for its use in data preservation in HEP are outlined in detail in this paper. Investigations are currently underway to gauge the feasibility of using the N4U Analysis Service, or a derivative of it, to address the requirements of data and analysis logging and provenance capture within the HEP long term data analysis environment. Comment: 5 pages and 1 figure. 20th International Conference on Computing in High Energy and Nuclear Physics (CHEP13). 14-18th October 2013. Amsterdam, Netherlands. To appear in Journal of Physics Conference Series

arXiv.org e-Print Archive

CERN Document Server

The Deployment of an Enhanced Model-Driven Architecture for Business Process Management

Author: McClatchey Richard
Publication date: 01/01/2018

Business systems these days need to be agile to address the needs of a changing world. Business modelling requires business process management to be highly adaptable with the ability to support dynamic workflows, inter-application integration (potentially between businesses) and process reconfiguration. Designing systems with the in-built ability to cater for evolution is also becoming critical to their success. To handle change, systems need the capability to adapt as and when necessary to changes in user requirements. Allowing systems to be self-describing is one way to facilitate this. Using our implementation of a self-describing system, a so-called description-driven approach, new versions of data structures or processes can be created alongside older versions, providing a log of changes to the underlying data schema and enabling the gathering of traceable (provenance) data. The CRISTAL software, which originated at CERN for handling physics data, uses versions of stored descriptions to define versions of data and workflows which can be evolved over time, and thereby handles evolving system needs. It has been customised for use in business applications as the Agilium-NG product. This paper reports on how the Agilium-NG software has enabled the deployment of a unique business process management solution that can be dynamically evolved to cater for changing user requirements. Comment: 11 pages, 4 figures, 1 table, 22nd International Database Engineering & Applications Symposium (IDEAS 2018). arXiv admin note: text overlap with arXiv:1402.5764, arXiv:1402.5753, arXiv:1502.0154

arXiv.org e-Print Archive

Crossref

Data provenance tracking as the basis for a biomedical virtual research environment

Author: McClatchey Richard
Publication date: 01/01/2018

In complex data analyses it is increasingly important to capture information about the usage of data sets in addition to their preservation over time to ensure reproducibility of results, to verify the work of others and to ensure appropriate conditions data have been used for specific analyses. Scientific workflow based studies are beginning to realize the benefit of capturing this provenance of data and the activities used to process, transform and carry out studies on those data. This is especially true in biomedicine where the collection of data through experiment is costly and/or difficult to reproduce and where that data needs to be preserved over time. One way to support the development of workflows and their use in (collaborative) biomedical analyses is through the use of a Virtual Research Environment. The dynamic and distributed nature of Grid/Cloud computing, however, makes the capture and processing of provenance information a major research challenge. Furthermore, most workflow provenance management services are designed only for data-flow oriented workflows, and researchers are now realising that tracking data or workflows alone or separately is insufficient to support the scientific process. What is required for collaborative research is traceable and reproducible provenance support in a fully orchestrated Virtual Research Environment (VRE) that enables researchers to define their studies in terms of the datasets and processes used, to monitor and visualize the outcome of their analyses and to log their results so that other users can call upon that acquired knowledge to support subsequent studies. We have extended the work carried out in the neuGRID and N4U projects in providing a so-called Virtual Laboratory to provide the foundation for a generic VRE in which sets of biomedical data (images, laboratory test results, patient records, epidemiological analyses etc.) and the workflows (pipelines) used to process those data, together with their provenance data and result sets, are captured in the CRISTAL software. This paper outlines the functionality provided for a VRE by the Open Source CRISTAL software and examines how that can provide the foundations for a practice-based knowledge base for biomedicine and, potentially, for a wider research community

arXiv.org e-Print Archive

Crossref

UWE Bristol Research Repository

Scientific Workflow Repeatability through Cloud-Aware Provenance

Author: Hasham Khawar
McClatchey Richard
Munir Kamran
Shamdasani Jetendr
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 05/02/2015

The transformations, analyses and interpretations of data in scientific workflows are vital for the repeatability and reliability of scientific workflows. This provenance of scientific workflows has been effectively carried out in Grid based scientific workflow systems. However, the recent adoption of Cloud-based scientific workflows presents an opportunity to investigate the suitability of existing approaches or propose new approaches to collect provenance information from the Cloud and to utilize it for workflow repeatability in the Cloud infrastructure. The dynamic nature of the Cloud makes this difficult because, unlike in the Grid, resources are provisioned on demand. This paper presents a novel approach that can assist in mitigating this challenge. This approach can collect Cloud infrastructure information along with workflow provenance and can establish a mapping between them. This mapping is later used to re-provision resources on the Cloud. Workflow execution is made repeatable by: (a) capturing the Cloud infrastructure information (virtual machine configuration) along with the workflow provenance, and (b) re-provisioning similar resources on the Cloud and re-executing the workflow on them. The evaluation of an initial prototype suggests that the proposed approach is feasible and can be investigated further. Comment: 6 pages; 5 figures; 3 tables in Proceedings of the Recomputability 2014 workshop of the 7th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2014). London December 201
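
The mapping between workflow provenance and Cloud infrastructure information described in this abstract could look roughly like the following Python sketch. It is an illustration only, not the paper's implementation: the `VMConfig` and `JobRecord` types, their fields, and the idea of joining on a host name are all assumptions made here for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMConfig:
    # Hypothetical virtual machine description; field names are assumptions.
    flavor: str
    vcpus: int
    ram_mb: int
    image: str


@dataclass(frozen=True)
class JobRecord:
    # Hypothetical workflow provenance record for one executed job.
    job_id: str
    host: str


def map_jobs_to_vms(jobs, vms_by_host):
    """Link each job to the configuration of the VM it ran on (step a)."""
    return {
        job.job_id: vms_by_host[job.host]
        for job in jobs
        if job.host in vms_by_host
    }


def reprovision_plan(mapping):
    """List the distinct VM configurations to recreate before re-execution (step b)."""
    plan = []
    for config in mapping.values():
        if config not in plan:
            plan.append(config)
    return plan
```

Given a job-to-VM mapping, `reprovision_plan` returns each distinct configuration once, which is the information a re-provisioning step would need before the workflow is re-executed.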

arXiv.org e-Print Archive

Crossref

Scientific workflow execution reproducibility using cloud-aware provenance

Author: Ahmad Mian Khawar Hasham

Scientific experiments and projects such as CMS and neuGRIDforYou (N4U) are annually producing data of the order of Peta-Bytes. They adopt scientific workflows to analyse this large amount of data in order to extract meaningful information. These workflows are executed over distributed resources, both compute and storage in nature, provided by the Grid and recently by the Cloud. The Cloud is becoming the playing field for scientists as it provides scalability and on-demand resource provisioning. Reproducing a workflow execution to verify results is vital for scientists and has proven to be a challenge. As per a study (Belhajjame et al. 2012) around 80% of workflows cannot be reproduced, and 12% of them are due to the lack of information about the execution environment. The dynamic and on-demand provisioning capability of the Cloud makes this more challenging. To overcome these challenges, this research aims to investigate how to capture the execution provenance of a scientific workflow along with the resources used to execute the workflow in a Cloud infrastructure. This information will then enable a scientist to reproduce workflow-based scientific experiments on the Cloud infrastructure by re-provisioning similar resources on the Cloud. Provenance has been recognised as information that helps in debugging, verifying and reproducing a scientific workflow execution. Recent adoption of Cloud-based scientific workflows presents an opportunity to investigate the suitability of existing approaches or to propose new approaches to collect provenance information from the Cloud and to utilize it for workflow reproducibility on the Cloud. From literature analysis, it was found that the existing approaches for Grid or Cloud do not provide detailed resource information and also do not present an automatic provenance capturing approach for the Cloud environment.
To mitigate the challenges and fill the knowledge gap, a provenance based approach, ReCAP, has been proposed in this thesis. In ReCAP, workflow execution reproducibility is achieved by (a) capturing the Cloud-aware provenance (CAP), (b) re-provisioning similar resources on the Cloud and re-executing the workflow on them, and (c) comparing the provenance graph structure, including the Cloud resource information, and the outputs of workflows. ReCAP captures the Cloud resource information and links it with the workflow provenance to generate Cloud-aware provenance. The Cloud-aware provenance consists of configuration parameters relating to hardware and software describing a resource on the Cloud. This information, once captured, aids in re-provisioning the same execution infrastructure on the Cloud for workflow re-execution. Since resources on the Cloud can be used in a static or dynamic (i.e. destroyed when a task is finished) manner, this presents a challenge for the devised provenance capturing approach. In order to deal with these scenarios, different capturing and mapping approaches have been presented in this thesis. These mapping approaches work outside the virtual machine and collect resource information from the Cloud middleware, thus they do not affect job performance. The impact of the collected Cloud resource information on the job as well as on the workflow execution has been evaluated through various experiments in this thesis. In ReCAP, the workflow reproducibility is verified by comparing the provenance graph structure, infrastructure details and the output produced by the workflows. To compare the provenance graphs, the captured provenance information including infrastructure details is translated to a graph model. These graphs of the original execution and the reproduced execution are then compared in order to analyse their similarity.
In this regard, two comparison approaches have been presented that can produce a qualitative as well as a quantitative analysis of the graph structure. The ReCAP framework and its constituent components are evaluated using different scientific workflows such as ReconAll and Montage from the domains of neuroscience (i.e. N4U) and astronomy respectively. The results have shown that ReCAP has been able to capture the Cloud-aware provenance and demonstrate the workflow execution reproducibility by re-provisioning the same resources on the Cloud. The results have also demonstrated that the provenance comparison approaches can determine the similarity between the two given provenance graphs. The results of workflow output comparison have shown that this approach is suitable to compare the outputs of scientific workflows, especially for deterministic workflows
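
To make the idea of comparing two provenance graphs concrete, here is a minimal Python sketch. The abstract does not say which comparison algorithms ReCAP uses, so the choices here are assumptions: graphs are represented as collections of `(source, relation, target)` triples, the qualitative check is plain edge-set equality, and the quantitative score is Jaccard similarity over edges.

```python
def edge_set(graph):
    """Normalise a graph given as (source, relation, target) triples into a set."""
    return {(src, rel, dst) for src, rel, dst in graph}


def same_structure(original, reproduced):
    """Qualitative check (assumed): do both executions yield identical edges?"""
    return edge_set(original) == edge_set(reproduced)


def jaccard_similarity(original, reproduced):
    """Quantitative score (assumed): shared edges divided by all distinct edges."""
    a, b = edge_set(original), edge_set(reproduced)
    if not a and not b:
        return 1.0  # two empty graphs are treated as identical
    return len(a & b) / len(a | b)
```

For example, if the reproduced run differs from the original only in the VM flavour that a job ran on, `same_structure` returns `False`, while `jaccard_similarity` still reports how much of the graph is shared.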

UWE Bristol Research Repository

The National Transport Data Framework

Author: Landshoff Peter Vincent
Polak John
Publication date: 18/08/2008

Report by Professor Peter Landshoff (Cambridge University) and Professor John Polak (Imperial College London) on a project for the Department for Transport. NTDF is designed to be a resource for data owners to deposit descriptions into a central catalogue, so that people can search for and find data and understand their characteristics. The value of this is to individuals, to commercial organizations, and to public bodies. For example, services that provide better information to travellers will help to make their journey less stressful and persuade them to make more use of public transport. Transport operators need very diverse information to help them plan developments to their services: demographic, geographical, economic etc. And policy makers need a similar range of information to help them decide how to divide their budget and afterwards to evaluate how valuable it has been. This work was supported by the Department for Transport (DfT)

Apollo (Cambridge)

Analysis traceability and provenance for HEP

Author: Branson Andrew
Hoekstra Rinke
Kovács Zsolt
McClatchey Richard
Ram Sudha
Shamdasani Jetendr
Wolstencroft Katherine
Publication venue: 'IOP Publishing'
Publication date: 01/01/2015

This paper presents the use of the CRISTAL software in the N4U project. CRISTAL was used to create a set of provenance-aware analysis tools for the Neuroscience domain. This paper argues that the approach taken in N4U to build the analysis suite is sufficiently generic to be applied to the HEP domain. A mapping to the PROV model for provenance interoperability is also presented, together with how it can be applied in the HEP domain to make HEP analyses interoperable

arXiv.org e-Print Archive

Crossref

UWE Bristol Research Repository

Analysing the provenance tracking of business process management in the quality domain

Author: Blanc Coralie Lucie

This work presents a framework for combining Provenance with BPM. This is achieved through a use case from the Martine Spécialités enterprise, which is using the Agilium BPM system (without the provenance functionality). Using the PROV Data Model, a graphical representation of the provenance tracking of the processes of the enterprise has been produced and the functional requirements for combining provenance and BPM have been defined. A survey was submitted to the employees in order to see whether adding the provenance functionality would be beneficial for them. The results have shown that provenance adds value for decision making during the execution of the processes, and that several benefits can be obtained from this, such as better decision making and time saving

UWE Bristol Research Repository

A Formal Study of Collaborative Access Control in Distributed Datalog

Author: Abiteboul Serge
Bourhis Pierre
Vianu Victor
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 19th International Conference on Database Theory (ICDT 2016)
Publication date: 01/01/2016

We formalize and study a declaratively specified collaborative access control mechanism for data dissemination in a distributed environment. Data dissemination is specified using distributed datalog. Access control is also defined by datalog-style rules, at the relation level for extensional relations, and at the tuple level for intensional ones, based on the derivation of tuples. The model also includes a mechanism for "declassifying" data, that allows circumventing overly restrictive access control. We consider the complexity of determining whether a peer is allowed to access a given fact, and address the problem of achieving the goal of disseminating certain information under some access control policy. We also investigate the problem of information leakage, which occurs when a peer is able to infer facts to which the peer is not allowed access by the policy. Finally, we consider access control extended to facts equipped with provenance information, motivated by the many applications where such information is required. We provide semantics for access control with provenance, and establish the complexity of determining whether a peer may access a given fact together with its provenance. This work is motivated by the access control of the Webdamlog system, whose core features it formalizes

INRIA a CCSD electronic archive server

HAL Descartes

Dagstuhl Research Online Publication Server

Hal-Diderot

Glueing grids and clouds together: A service-oriented approach

Author: Anjum Ashiq
Bessis Nik
Branson Andrew
Hill Richard
McClatchey Richard
Publication venue: 'Inderscience Publishers'
Publication date: 01/01/2012

Scientific communities are actively developing services to exploit the capabilities of service-oriented distributed systems. This exploitation requires services to be specified and developed for a range of activities such as management and scheduling of workflows and provenance capture and management. Most of these services are designed and developed for a particular community of scientific users. The constraints imposed by architectures, interfaces or platforms can restrict or even prohibit the free interchange of services between disparate scientific communities. Using the notion of 'Platform as a Service' (PaaS), we propose an architectural approach that addresses these limitations so that users can make use of a wider range of services without being concerned about the development of cross-platform middleware, wrappers or any need for bespoke applications. The proposed architecture shields the details of heterogeneous Grid/Cloud infrastructure within a brokering environment, thus enabling users to concentrate on the specification of higher level services. Copyright © 2012 Inderscience Enterprises Ltd

Crossref

UWE Bristol Research Repository

Edge Hill University Research Information Repository

UDORA - University of Derby Online Research Archive