7 research outputs found

From scientific workflow patterns to 5-star linked open data

Author: Bihouée Audrey
Gaignard Alban
Skaf-Molli Hala
Publication venue: HAL CCSD
Publication date: 08/06/2016

Scientific workflow management systems have been largely adopted by data-intensive science communities. Many efforts have been dedicated to the representation and exploitation of provenance to improve reproducibility in data-intensive sciences. However, few works address the mining of provenance graphs to annotate the produced data with domain-specific context for better interpretation and sharing of results. In this paper, we propose PoeM, a lightweight framework for mining provenance in scientific workflows. PoeM produces linked in silico experiment reports based on workflow runs. PoeM leverages semantic web technologies and reference vocabularies (PROV-O, P-Plan) to generate provenance mining rules and finally assemble linked scientific experiment reports (Micropublications, Experimental Factor Ontology). Preliminary experiments demonstrate that PoeM enables the querying and sharing of Galaxy-processed genomic data as 5-star linked datasets.

Towards Harnessing Computational Workflow Provenance for Experiment Reporting

Author: Alper Pinar
Publication date: 01/08/2016

The University of Manchester - Institutional Repository

FAIR Computational Workflows

Author: Cohen-Boulakia Sarah
Crusoe Michael R.
Garijo Daniel
Gil Yolanda
Goble Carole
Peters Kristian
Schober Daniel
Soiland-Reyes Stian
Publication date: 06/07/2019

Computational workflows describe the complex multi-step methods that are used for data collection, data preparation, analytics, predictive modelling, and simulation that lead to new data products. They can inherently contribute to the FAIR data principles: by processing data according to established metadata; by creating metadata themselves during the processing of data; and by tracking and recording data provenance. These properties aid data quality assessment and contribute to secondary data usage. Moreover, workflows are digital objects in their own right. This paper argues that FAIR principles for workflows need to address their specific nature in terms of their composition of executable software steps, their provenance, and their development. Accepted for Data Intelligence special issue: FAIR best practices 2019. Carole Goble acknowledges funding by BioExcel2 (H2020 823830), IBISBA1.0 (H2020 730976) and EOSCLife (H2020 824087). Daniel Schober's work was financed by Phenomenal (H2020 654241) at the initiation phase of this effort; current work is an in-kind contribution. Kristian Peters is funded by the German Network for Bioinformatics Infrastructure (de.NBI) and acknowledges BMBF funding under grant number 031L0107. Stian Soiland-Reyes is funded by BioExcel2 (H2020 823830). Daniel Garijo and Yolanda Gil gratefully acknowledge support from DARPA award W911NF-18-1-0027, NIH award 1R01AG059874-01, and NSF award ICER-1740683.

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

The University of Manchester - Institutional Repository

An Incremental Learning Method to Support the Annotation of Workflows with Data-to-Data Relations

Author: C Francescomarino Di
D Garijo
DR Ferreira
J Liu
J Poelmans
JM Gómez-Pérez
K Wolstencroft
P Alper
R Godin
R Wille
SO Kuznetsov
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Workflow formalisations are often focused on the representation of a process with the primary objective of supporting execution. However, there are scenarios where what needs to be represented is the effect of the process on the data artefacts involved, for example when reasoning over the corresponding data policies. This can be achieved by annotating the workflow with the semantic relations that occur between these data artefacts. However, manually producing such annotations is difficult and time consuming. In this paper we introduce a method based on recommendations to support users in this task. Our approach is centred on an incremental rule association mining technique that compensates for the cold start problem caused by the lack of a training set of annotated workflows. We discuss the implementation of a tool relying on this approach and how its application to an existing repository of workflows effectively enables the generation of such annotations.

Crossref

Open Research Online (The Open University)

Knowledge Components and Methods for Policy Propagation in Data Flows

Author: Daga Enrico
Publication date: 20/06/2018

Data-oriented systems and applications are at the centre of current developments of the World Wide Web (WWW). On the Web of Data (WoD), information sources can be accessed and processed for many purposes. Users need to be aware of any licences or terms of use that are associated with the data sources they want to use. Conversely, publishers need support in assigning the appropriate policies alongside the data they distribute. In this work, we tackle the problem of policy propagation in data flows, an expression that refers to the way data is consumed, manipulated and produced within processes. We pose the question of what kind of components are required, and how they can be acquired, managed, and deployed, to support users in deciding which policies propagate to the output of a data-intensive system from the ones associated with its input. We observe three scenarios: applications of the Semantic Web, workflow reuse in Open Science, and the exploitation of urban data in City Data Hubs. Starting from the analysis of Semantic Web applications, we propose a data-centric approach to semantically describe processes as data flows: the Datanode ontology, which comprises a hierarchy of the possible relations between data objects. By means of Policy Propagation Rules, it is possible to link data flow steps and policies derivable from semantic descriptions of data licences. We show how these components can be designed, how they can be effectively managed, and how to reason efficiently with them. In a second phase, the developed components are verified using a Smart City Data Hub as a case study, where we developed an end-to-end solution for policy propagation. Finally, we evaluate our approach and report on a user study aimed at assessing both the quality and the value of the proposed solution.

ZENODO

Open Research Online (The Open University)

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

LabelFlow Framework for Annotating Workflow Provenance

Author: Alper Pinar
Belhajjame Khalid
Curcin Vasa
Goble Carole A.
Publication venue: 'MDPI AG'
Publication date: 01/02/2018

Scientists routinely analyse and share data for others to use. Successful data (re)use relies on having metadata describing the context of analysis of data. In many disciplines the creation of contextual metadata is referred to as reporting. One method of implementing analyses is with workflows. A stand-out feature of workflows is their ability to record provenance from executions. Provenance is useful when analyses are executed with changing parameters (changing contexts) and results need to be traced to their respective parameters. In this paper we investigate whether provenance can be exploited to support reporting. Specifically, we outline a case study based on a real-world workflow and a set of reporting queries. We observe that provenance, as collected from workflow executions, is of limited use for reporting, as it supports queries only partially. We identify that this is due to the generic nature of provenance, namely its lack of domain-specific contextual metadata. We observe that the required information is available in implicit form, embedded in data. We describe LabelFlow, a framework comprising four Labelling Operators for decorating provenance with domain-specific Labels. LabelFlow can be instantiated for a domain by plugging in domain-specific metadata extractors. We provide a tool that takes a workflow as input and produces as output a Labelling Pipeline for that workflow, composed of Labelling Operators. We revisit the case study and show how Labels provide a more complete implementation of reporting queries.

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

The University of Manchester - Institutional Repository

King's Research Portal

LabelFlow Framework for Annotating Workflow Provenance

Author: Booch
Callahan
Davenhall
Greenberg
Missier
Moreau
Rossum
Publication venue: 'MDPI AG'

Crossref