275 research outputs found
Towards Provenance and Traceability in CRISTAL for HEP
This paper discusses the CRISTAL object lifecycle management system and its
use in provenance data management and the traceability of system events. This
software was initially used to capture the construction and calibration of the
CMS ECAL detector at CERN for later use by physicists in their data analysis.
Some further uses of CRISTAL in different projects (CMS, neuGRID and N4U) are
presented as examples of its flexible data model. From these examples,
applications are drawn for the High Energy Physics domain and some initial
ideas for its use in data preservation HEP are outlined in detail in this
paper. Currently investigations are underway to gauge the feasibility of using
the N4U Analysis Service or a derivative of it to address the requirements of
data and analysis logging and provenance capture within the HEP long term data
analysis environment.Comment: 5 pages and 1 figure. 20th International Conference on Computing in
High Energy and Nuclear Physics (CHEP13). 14-18th October 2013. Amsterdam,
Netherlands. To appear in Journal of Physics Conference Serie
The Deployment of an Enhanced Model-Driven Architecture for Business Process Management
Business systems these days need to be agile to address the needs of a
changing world. Business modelling requires business process management to be
highly adaptable with the ability to support dynamic workflows,
inter-application integration (potentially between businesses) and process
reconfiguration. Designing systems with the in-built ability to cater for
evolution is also becoming critical to their success. To handle change, systems
need the capability to adapt as and when necessary to changes in users
requirements. Allowing systems to be self-describing is one way to facilitate
this. Using our implementation of a self-describing system, a so-called
description-driven approach, new versions of data structures or processes can
be created alongside older versions providing a log of changes to the
underlying data schema and enabling the gathering of traceable (provenance)
data. The CRISTAL software, which originated at CERN for handling physics data,
uses versions of stored descriptions to define versions of data and workflows
which can be evolved over time and thereby to handle evolving system needs. It
has been customised for use in business applications as the Agilium-NG product.
This paper reports on how the Agilium-NG software has enabled the deployment of
an unique business process management solution that can be dynamically evolved
to cater for changing user requirement.Comment: 11 pages, 4 figures, 1 table, 22nd International Database Engineering
& Applications Symposium (IDEAS 2018). arXiv admin note: text overlap with
arXiv:1402.5764, arXiv:1402.5753, arXiv:1502.0154
Data provenance tracking as the basis for a biomedical virtual research environment
In complex data analyses it is increasingly important to capture information about the usage of data sets in addition to their preservation over time to ensure reproducibility of results, to verify the work of others and to ensure appropriate conditions data have been used for specific analyses. Scientific workflow based studies are beginning to realize the benefit of capturing this provenance of data and the activities used to process, transform and carry out studies on those data. This is especially true in biomedicine where the collection of data through experiment is costly and/or difficult to reproduce and where that data needs to be preserved over time. One way to support the development of workflows and their use in (collaborative) biomedical analyses is through the use of a Virtual Research Environment. The dynamic and distributed nature of Grid/Cloud computing, however, makes the capture and processing of provenance information a major research challenge. Furthermore most workflow provenance management services are designed only for data-flow oriented workflows and researchers are now realising that tracking data or workflows alone or separately is insufficient to support the scientific process. What is required for collaborative research is traceable and reproducible provenance support in a full orchestrated Virtual Research Environment (VRE) that enables researchers to define their studies in terms of the datasets and processes used, to monitor and visualize the outcome of their analyses and to log their results so that others users can call upon that acquired knowledge to support subsequent studies. We have extended the work carried out in the neuGRID and N4U projects in providing a so-called Virtual Laboratory to provide the foundation for a generic VRE in which sets of biomedical data (images, laboratory test results, patient records, epidemiological analyses etc.) and the workflows (pipelines) used to process those data, together with their provenance data and results sets are captured in the CRISTAL software. This paper outlines the functionality provided for a VRE by the Open Source CRISTAL software and examines how that can provide the foundations for a practice-based knowledge base for biomedicine and, potentially, for a wider research community
Scientific Workflow Repeatability through Cloud-Aware Provenance
The transformations, analyses and interpretations of data in scientific
workflows are vital for the repeatability and reliability of scientific
workflows. This provenance of scientific workflows has been effectively carried
out in Grid based scientific workflow systems. However, recent adoption of
Cloud-based scientific workflows present an opportunity to investigate the
suitability of existing approaches or propose new approaches to collect
provenance information from the Cloud and to utilize it for workflow
repeatability in the Cloud infrastructure. The dynamic nature of the Cloud in
comparison to the Grid makes it difficult because resources are provisioned
on-demand unlike the Grid. This paper presents a novel approach that can assist
in mitigating this challenge. This approach can collect Cloud infrastructure
information along with workflow provenance and can establish a mapping between
them. This mapping is later used to re-provision resources on the Cloud. The
repeatability of the workflow execution is performed by: (a) capturing the
Cloud infrastructure information (virtual machine configuration) along with the
workflow provenance, and (b) re-provisioning the similar resources on the Cloud
and re-executing the workflow on them. The evaluation of an initial prototype
suggests that the proposed approach is feasible and can be investigated
further.Comment: 6 pages; 5 figures; 3 tables in Proceedings of the Recomputability
2014 workshop of the 7th IEEE/ACM International Conference on Utility and
Cloud Computing (UCC 2014). London December 201
Scientific workflow execution reproducibility using cloud-aware provenance
Scientific experiments and projects such as CMS and neuGRIDforYou (N4U) are annually producing data of the order of Peta-Bytes. They adopt scientific workflows to analyse this large amount of data in order to extract meaningful information. These workflows are executed over distributed resources, both compute and storage in nature, provided by the Grid and recently by the Cloud. The Cloud is becoming the playing field for scientists as it provides scalability and on-demand resource provisioning. Reproducing a workflow execution to verify results is vital for scientists and have proven to be a challenge. As per a study (Belhajjame et al. 2012) around 80% of workflows cannot be reproduced, and 12% of them are due to the lack of information about the execution environment. The dynamic and on-demand provisioning capability of the Cloud makes this more challenging. To overcome these challenges, this research aims to investigate how to capture the execution provenance of a scientific workflow along with the resources used to execute the workflow in a Cloud infrastructure. This information will then enable a scientist to reproduce workflow-based scientific experiments on the Cloud infrastructure by re-provisioning the similar resources on the Cloud.Provenance has been recognised as information that helps in debugging, verifying and reproducing a scientific workflow execution. Recent adoption of Cloud-based scientific workflows presents an opportunity to investigate the suitability of existing approaches or to propose new approaches to collect provenance information from the Cloud and to utilize it for workflow reproducibility on the Cloud. From literature analysis, it was found that the existing approaches for Grid or Cloud do not provide detailed resource information and also do not present an automatic provenance capturing approach for the Cloud environment. To mitigate the challenges and fulfil the knowledge gap, a provenance based approach, ReCAP, has been proposed in this thesis. In ReCAP, workflow execution reproducibility is achieved by (a) capturing the Cloud-aware provenance (CAP), b) re-provisioning similar resources on the Cloud and re-executing the workflow on them and (c) by comparing the provenance graph structure including the Cloud resource information, and outputs of workflows. ReCAP captures the Cloud resource information and links it with the workflow provenance to generate Cloud-aware provenance. The Cloud-aware provenance consists of configuration parameters relating to hardware and software describing a resource on the Cloud. This information once captured aids in re-provisioning the same execution infrastructure on the Cloud for workflow re-execution. Since resources on the Cloud can be used in static or dynamic (i.e. destroyed when a task is finished) manner, this presents a challenge for the devised provenance capturing approach. In order to deal with these scenarios, different capturing and mapping approaches have been presented in this thesis. These mapping approaches work outside the virtual machine and collect resource information from the Cloud middleware, thus they do not affect job performance. The impact of the collected Cloud resource information on the job as well as on the workflow execution has been evaluated through various experiments in this thesis. In ReCAP, the workflow reproducibility isverified by comparing the provenance graph structure, infrastructure details and the output produced by the workflows. To compare the provenance graphs, the captured provenance information including infrastructure details is translated to a graph model. These graphs of original execution and the reproduced execution are then compared in order to analyse their similarity. In this regard, two comparison approaches have been presented that can produce a qualitative analysis as well as quantitative analysis about the graph structure. The ReCAP framework and its constituent components are evaluated using different scientific workflows such as ReconAll and Montage from the domains of neuroscience (i.e. N4U) and astronomy respectively. The results have shown that ReCAP has been able to capture the Cloud-aware provenance and demonstrate the workflow execution reproducibility by re-provisioning the same resources on the Cloud. The results have also demonstrated that the provenance comparison approaches can determine the similarity between the two given provenance graphs. The results of workflow output comparison have shown that this approach is suitable to compare the outputs of scientific workflows, especially for deterministic workflows
Recommended from our members
The National Transport Data Framework
Report by Professor Peter Landshoff (Cambridge University) and
Professor John Polak (Imperial College London) on a project for
the Department for Transport.
emails: [email protected] [email protected] NTDF is designed to be a resource for data owners to deposit descriptions
into a central catalogue, so that people can search for data and find data
and understand their characteristics. The value of this is to individuals, to
commercial organizations, and to public bodies. For example, services that
provide better information to travellers will help to make their journey
less stressful and persuade them to make more use of public transport.
Transport operators need very diverse information to help them
plan developments to their services: demographic, geographical, economic etc.
And policy makers need a similar range of information to help them decide
how to divide their budget and afterwards to evaluate how valuable it has
been.This work was supported by the Department for Transport (DfT)
Analysis traceability and provenance for HEP
This paper presents the use of the CRISTAL software in the N4U project. CRISTAL was used to create a set of provenance aware analysis tools for the Neuroscience domain. This paper advocates that the approach taken in N4U to build the analysis suite is sufficiently generic to be able to be applied to the HEP domain. A mapping to the PROV model for provenance interoperability is also presented and how this can be applied to the HEP domain for the interoperability of HEP analyses
Analysing the provenance tracking of business process management in the quality domain
This work presents a framework of how Provenance can be combined with BPM. This is achieved through a use case, from the Martine Spécialités enterprise which is using the Agilium BPM system (without the provenance functionality). Using the PROV Data Model, a graphical representation of the provenance tracking of the processes of the enterprise has been done and the functional requirements for combining provenance and BPM have been defined. A survey was submitted to the employees in order to see if adding the provenance functionality is something beneficial for them. The results have shown that provenance is adding value for the decision making during the execution of the processes, and that several benefits can be obtained from this, such as better decision making and time saving
A Formal Study of Collaborative Access Control in Distributed Datalog
We formalize and study a declaratively specified collaborative access control mechanism for data dissemination in a distributed environment. Data dissemination is specified using distributed datalog. Access control is also defined by datalog-style rules, at the relation level for extensional relations, and at the tuple level for intensional ones, based on the derivation of tuples. The model also includes a mechanism for "declassifying" data, that allows circumventing overly restrictive access control. We consider the complexity of determining whether a peer is allowed to access a given fact, and address the problem of achieving the goal of disseminating certain information under some access control policy. We also investigate the problem of information leakage, which occurs when a peer is able to infer facts to which the peer is not allowed access by the policy. Finally, we consider access control extended to facts equipped with provenance information, motivated by the many applications where such information is required. We provide semantics for access control with provenance, and establish the complexity of determining whether a peer may access a given fact together with its provenance. This work is motivated by the access control of the Webdamlog system, whose core features it formalizes
Glueing grids and clouds together: A service-oriented approach
Scientific communities are actively developing services to exploit the capabilities of service-oriented distributed systems. This exploitation requires services to be specified and developed for a range of activities such as management and scheduling of workflows and provenance capture and management. Most of these services are designed and developed for a particular community of scientific users. The constraints imposed by architectures, interfaces or platforms can restrict or even prohibit the free interchange of services between disparate scientific communities. Using the notion of 'Platform as a Service' (PaaS), we propose an architectural approach that addresses these limitations so that users can make use of a wider range of services without being concerned about the development of cross-platform middleware, wrappers or any need for bespoke applications. The proposed architecture shields the details of heterogeneous Grid/Cloud infrastructure within a brokering environment, thus enabling users to concentrate on the specification of higher level services. Copyright © 2012 Inderscience Enterprises Ltd
- âŠ