Automatic vs Manual Provenance Abstractions: Mind the Gap
In recent years the need to simplify or to hide sensitive information in
provenance has given way to research on provenance abstraction. In the context
of scientific workflows, existing research provides techniques to
semi-automatically create abstractions of a given workflow description, which are in
turn used as filters over the workflow's provenance traces. An alternative
approach that is commonly adopted by scientists is to build workflows with
abstractions embedded into the workflow's design, such as using sub-workflows.
This paper reports on the comparison of manual versus semi-automated approaches
in a context where result abstractions are used to filter report-worthy results
of computational scientific analyses. Specifically, we take a real-world
workflow containing user-created design abstractions and compare these with
abstractions created by ZOOM UserViews and Workflow Summaries systems. Our
comparison shows that semi-automatic and manual approaches largely overlap from
a process perspective; however, there is a dramatic mismatch in terms of data
artefacts retained in an abstracted account of derivation. We discuss reasons
and suggest future research directions.
Comment: Preprint accepted to the 2016 workshop on the Theory and Applications
of Provenance, TAPP 201
A Linked Data Approach to Sharing Workflows and Workflow Results
A bioinformatics analysis pipeline is often highly elaborate, due to the inherent complexity of biological systems and the variety and size of datasets. A digital equivalent of the ‘Materials and Methods’ section in wet laboratory publications would be highly beneficial to bioinformatics, for evaluating evidence and examining data across related experiments, while introducing the potential to find associated resources and integrate them as data and services. We present initial steps towards preserving bioinformatics ‘materials and methods’ by exploiting the workflow paradigm for capturing the design of a data analysis pipeline, and RDF to link the workflow, its component services, run-time provenance, and a personalized biological interpretation of the results. An example shows the reproduction of the unique graph of an analysis procedure, its results, provenance, and personal interpretation of a text mining experiment. It links data from Taverna, myExperiment.org, BioCatalogue.org, and ConceptWiki.org. The approach is relatively ‘light-weight’ and unobtrusive to bioinformatics users.
An Incremental Learning Method to Support the Annotation of Workflows with Data-to-Data Relations
Workflow formalisations are often focused on the representation of a process, with the primary objective of supporting execution. However, there are scenarios where what needs to be represented is the effect of the process on the data artefacts involved, for example when reasoning over the corresponding data policies. This can be achieved by annotating the workflow with the semantic relations that occur between these data artefacts. However, manually producing such annotations is difficult and time-consuming. In this paper we introduce a method based on recommendations to support users in this task. Our approach is centred on an incremental association rule mining technique that compensates for the cold-start problem caused by the lack of a training set of annotated workflows. We discuss the implementation of a tool relying on this approach and how its application to an existing repository of workflows effectively enables the generation of such annotations.
Chemical information matters: an e-Research perspective on information and data sharing in the chemical sciences
Recently, a number of organisations have called for open access to scientific information and especially to the data obtained from publicly funded research, among which the Royal Society report and the European Commission press release are particularly notable. It has long been accepted that building research on the foundations laid by other scientists is both effective and efficient. Regrettably, some disciplines, chemistry being one, have been slow to recognise the value of sharing and have thus been reluctant to curate their data and information in preparation for exchanging it. The very significant increases in both the volume and the complexity of the datasets produced have encouraged the expansion of e-Research, and stimulated the development of methodologies for managing, organising, and analysing "big data". We review the evolution of cheminformatics, the amalgam of chemistry, computer science, and information technology, and assess the wider e-Science and e-Research perspective. Chemical information does matter, as do matters of communicating data and collaborating with data. For chemistry, unique identifiers, structure representations, and property descriptors are essential to the activities of sharing and exchange. Open science entails the sharing of more than mere facts: for example, the publication of negative outcomes can facilitate better understanding of which synthetic routes to choose, an aspiration of the Dial-a-Molecule Grand Challenge. The protagonists of open notebook science go even further and exchange their thoughts and plans. We consider the concepts of preservation, curation, provenance, discovery, and access in the context of the research lifecycle, and then focus on the role of metadata, particularly the ontologies on which the emerging chemical Semantic Web will depend. Among our conclusions, we present our choice of the "grand challenges" for the preservation and sharing of chemical information.
User Applications Driven by the Community Contribution Framework MPContribs in the Materials Project
This work discusses how the MPContribs framework in the Materials Project
(MP) allows user-contributed data to be shown and analyzed alongside the core
MP database. The Materials Project is a searchable database of electronic
structure properties of over 65,000 bulk solid materials that is accessible
through a web-based science-gateway. We describe the motivation for enabling
user contributions to the materials data and present the framework's features
and challenges in the context of two real applications. These use-cases
illustrate how scientific collaborations can build applications with their own
"user-contributed" data using MPContribs. The Nanoporous Materials Explorer
application provides a unique search interface to a novel dataset of hundreds
of thousands of materials, each with tables of user-contributed values related
to material adsorption and density at varying temperature and pressure. The
Unified Theoretical and Experimental x-ray Spectroscopy application discusses a
full workflow for the association, dissemination and combined analyses of
experimental data from the Advanced Light Source with MP's theoretical core
data, using MPContribs tools for data formatting, management and exploration.
The capabilities being developed for these collaborations are serving as the
model for how new materials data can be incorporated into the Materials Project
website with minimal staff overhead while giving powerful tools for data search
and display to the user community.
Comment: 12 pages, 5 figures, Proceedings of 10th Gateway Computing
Environments Workshop (2015), to be published in "Concurrency in Computation:
Practice and Experience"