42 research outputs found

    Provenance Explorer: Customized provenance views using semantic inferencing

    Get PDF
    This paper presents Provenance Explorer, a secure provenance visualization tool designed to dynamically generate customized views of scientific data provenance that depend on the viewer's requirements and/or access privileges. Using RDF and graph visualizations, it enables scientists to view the data, states and events associated with a scientific workflow in order to understand the scientific methodology and validate the results. Initially, Provenance Explorer presents a simple, coarse-grained view of the scientific process or experiment. However, the GUI allows permitted users to expand links between nodes (input states, events and output states) to reveal more fine-grained information about particular sub-events and their inputs and outputs. Access control is implemented using Shibboleth to identify and authenticate users and XACML to define access control policies. The system also provides a platform for publishing scientific results: it enables users to select particular nodes within the visualized workflow and drag-and-drop them into an RDF package for publication or e-learning. The direct relationships between the individual components selected for such packages are inferred by the rule-inference engine.
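
    A minimal sketch of the core idea, assuming nothing about Provenance Explorer's actual code: a coarse-grained provenance edge hides a chain of fine-grained sub-events, and a policy check (standing in for the Shibboleth/XACML machinery) decides whether a viewer may expand it. All node, event, and role names below are illustrative.

```python
# Coarse-grained view: input state -> event -> output state.
COARSE_EDGE = [("raw_sample", "experiment", "published_dataset")]

# Hypothetical fine-grained sub-events hidden behind the "experiment" node.
FINE_SUBGRAPH = {
    "experiment": [
        ("raw_sample", "calibration", "calibrated_sample"),
        ("calibrated_sample", "simulation_run", "simulated_result"),
        ("simulated_result", "curation", "published_dataset"),
    ]
}

# Stand-in for an XACML policy decision point: role -> events it may expand.
POLICY = {"collaborator": {"experiment"}, "guest": set()}

def expand(event, role):
    """Return the fine-grained trail for `event` if the role is permitted;
    otherwise keep the coarse-grained view."""
    if event in POLICY.get(role, set()):
        return FINE_SUBGRAPH.get(event, COARSE_EDGE)
    return COARSE_EDGE

print(expand("experiment", "guest"))         # coarse view only
print(expand("experiment", "collaborator"))  # fine-grained sub-events
```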

    Provenance Explorer: A Tool for Viewing Provenance Trails and Constructing Scientific Publication Packages

    Get PDF

    Visualization of Network Data Provenance

    Get PDF
    Visualization facilitates the understanding of scientific data through both exploration and explanation of the visualized data. Provenance also contributes to the understanding of data by recording the contributing factors behind a result. The visualization of provenance, although supported in existing workflow management systems, generally focuses on small- to medium-sized provenance data and lacks techniques to deal with big data of high complexity. This paper discusses visualization techniques developed for the exploration and explanation of provenance, including layout algorithms, visual styling, graph abstraction techniques, and a graph matching algorithm, to deal with this high complexity. We demonstrate the techniques through application to two extensively analyzed case studies, each involving provenance capture and use over a three-year project: the first concerns the provenance of a satellite imagery ingest processing pipeline, and the second provenance in a large-scale computer network testbed.
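
    One graph abstraction technique of the kind the paper motivates can be sketched as collapsing fine-grained provenance nodes into the pipeline stage that produced them, shrinking a large graph to a stage-level summary. This illustrates the general idea only, not the paper's algorithm; the node and stage names (echoing the satellite-imagery case study) are hypothetical.

```python
import networkx as nx

# A small fine-grained provenance graph: data items and the jobs between them.
G = nx.DiGraph()
G.add_edges_from([
    ("tile_001", "ingest_job_1"), ("ingest_job_1", "mosaic_a"),
    ("tile_002", "ingest_job_2"), ("ingest_job_2", "mosaic_a"),
    ("mosaic_a", "qa_check"), ("qa_check", "product_v1"),
])

# Mapping from each node to the abstract pipeline stage it belongs to.
stage = {
    "tile_001": "inputs", "tile_002": "inputs",
    "ingest_job_1": "ingest", "ingest_job_2": "ingest",
    "mosaic_a": "mosaic", "qa_check": "qa", "product_v1": "outputs",
}

# Collapse: keep only edges that cross stage boundaries.
summary = nx.DiGraph()
for u, v in G.edges():
    if stage[u] != stage[v]:
        summary.add_edge(stage[u], stage[v])

print(sorted(summary.edges()))
# [('ingest', 'mosaic'), ('inputs', 'ingest'), ('mosaic', 'qa'), ('qa', 'outputs')]
```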

    Developing Materials Informatics Workbench for Expediting the Discovery of Novel Compound Materials

    Get PDF

    NeuroProv: Provenance data visualisation for neuroimaging analyses

    Get PDF
    Visualisation underpins the understanding of scientific data through both exploration and explanation of analysed data. Provenance strengthens the understanding of data by showing the process through which a result has been achieved. With the significant increase in data volumes and algorithm complexity, clinical researchers are struggling with information tracking, analysis reproducibility and the verification of scientific output. In addition, data coming from various heterogeneous sources with varying levels of trust in a collaborative environment adds to the uncertainty of the scientific outputs. This provides the motivation for provenance data capture and visualisation support for analyses. In this paper, a system, NeuroProv, is presented to visualise provenance data in order to aid in verifying scientific outputs, comparing analyses, and tracking the progression and evolution of results for neuroimaging analyses. The experimental results show the effectiveness of visualising provenance data for neuroimaging analyses.
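
    As a rough illustration of the comparison-of-analyses use case, the sketch below diffs the provenance trails of two hypothetical neuroimaging runs step by step. The step names and parameter records are invented for the example and do not reflect NeuroProv's actual data model.

```python
# Two hypothetical provenance trails: ordered (step, parameters) records.
run_a = [("skull_strip", {"tool": "BET", "frac": 0.5}),
         ("register", {"template": "MNI152", "dof": 12}),
         ("smooth", {"fwhm_mm": 6})]

run_b = [("skull_strip", {"tool": "BET", "frac": 0.5}),
         ("register", {"template": "MNI152", "dof": 6}),
         ("smooth", {"fwhm_mm": 8})]

def diff_trails(a, b):
    """Yield (step, params_a, params_b) wherever the two trails disagree."""
    for (step_a, pa), (step_b, pb) in zip(a, b):
        if step_a != step_b or pa != pb:
            yield step_a, pa, pb

for step, pa, pb in diff_trails(run_a, run_b):
    print(f"{step}: {pa} vs {pb}")
# register: {'template': 'MNI152', 'dof': 12} vs {'template': 'MNI152', 'dof': 6}
# smooth: {'fwhm_mm': 6} vs {'fwhm_mm': 8}
```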

    NeuroProv - A visualisation system to enhance the utility of provenance Data for neuroimaging analysis

    Get PDF
    E-Science platforms such as myGRID and NeuGRID for Users are growing at a remarkable rate. One of the key barriers to their widespread use in practice is the lack of provenance data to support reasoning about and verification of experimental or analysis results. Clinical researchers use workflows to orchestrate the data present in e-science platforms in order to facilitate processing. Even though most systems capture and store provenance data, they rarely make use of it, limiting the exploitation of its true potential. This thesis investigates mechanisms to visualise provenance data for neuroimaging analysis and to provide means to exploit the true potential of provenance data. To achieve this, a visualisation system has been implemented based on use-cases designed from requirements elicited for neuroimaging analysis. In this research, a technique has been developed to address the requirements of provenance visualisation for neuroimaging analysis. The prototype system has been tested against the provenance generated by NeuGRID for Users (N4U) as a proof of concept for our research. Different workflows have been visualised to study the efficacy of the proposed solution. Furthermore, evaluation metrics have been defined to determine whether the proposed solution is suitable for the purpose of the research conducted. The results show that the proposed visualisation system enhances the utility of provenance data for neuroimaging analysis, and the proposed research can therefore be used to add value to provenance data for neuroimaging analyses.

    Paths Explored, Paths Omitted, Paths Obscured: Decision Points & Selective Reporting in End-to-End Data Analysis

    Full text link
    Drawing reliable inferences from data involves many, sometimes arbitrary, decisions across the phases of data collection, wrangling, and modeling. As different choices can lead to diverging conclusions, understanding how researchers make analytic decisions is important for supporting robust and replicable analysis. In this study, we pore over nine published research studies and conduct semi-structured interviews with their authors. We observe that researchers often base their decisions on methodological or theoretical concerns, but subject to constraints arising from the data, expertise, or perceived interpretability. We confirm that researchers may experiment with choices in search of desirable results, but also identify other reasons why researchers explore alternatives yet omit findings. In concert with our interviews, we also contribute visualizations for communicating decision processes throughout an analysis. Based on our results, we identify design opportunities for strengthening end-to-end analysis, for instance via tracking and meta-analysis of multiple decision paths.
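
    The decision-point framing lends itself to a small sketch: enumerate the alternatives at each decision point and run the analysis down every path, so that explored-but-omitted paths stay visible and available for meta-analysis. The decision points and the stand-in analysis function below are hypothetical.

```python
from itertools import product

# Hypothetical decision points with the alternatives considered at each.
decision_points = {
    "outlier_rule": ["none", "iqr", "3sd"],
    "transform":    ["raw", "log"],
    "model":        ["ols", "robust"],
}

def run_analysis(path):
    # Stand-in for a real analysis; returns a fake, deterministic estimate.
    return round(sum(len(v) for v in path.values()) / 30, 2)

# One dict per end-to-end decision path (3 * 2 * 2 = 12 paths).
paths = [dict(zip(decision_points, choice))
         for choice in product(*decision_points.values())]

for path in paths[:3]:  # show a few of the 12 paths
    print(path, "->", run_analysis(path))
print(len(paths), "decision paths in total")
```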

    Scalable And Secure Provenance Querying For Scientific Workflows And Its Application In Autism Study

    Get PDF
    In the era of big data, scientific workflows have become essential to automate scientific experiments and guarantee repeatability. As both data and workflows increase in scale, a data lineage management system commensurate with the complexity of the workflow also becomes necessary, calling for new scalable storage, query, and analytics infrastructure. This system, which manages and preserves the derivation history and morphosis of data and is known as a provenance system, is essential for maintaining the quality and trustworthiness of data products and ensuring the reproducibility of scientific discoveries. With a flurry of research and the increased adoption of scientific workflows for processing sensitive data, e.g., in the health and medication domains, securing information flow and instrumenting access privileges in the system have become a fundamental precursor to deploying large-scale scientific workflows. This has become even more important now that teams of scientists around the world can collaborate on experiments using globally distributed sensitive data sources. Hence, it has become imperative to augment scientific workflow systems, as well as the underlying provenance management systems, with data security protocols. Provenance systems devoid of data security protocols are vulnerable. In this dissertation research, we delineate how scientific workflows can improve therapeutic practices in autism spectrum disorders. The data-intensive computation inherent in these workflows and the sensitive nature of the data necessitate support for scalable, parallel and robust provenance queries and a secured view of the data. With that in perspective, we propose OPQL^{Pig}, a parallel, robust, reliable and scalable provenance query language, and introduce the concept of access privilege inheritance in provenance systems. We characterize desirable properties of a role-based access control protocol for scientific workflows and demonstrate how these qualities are integrated into workflow provenance systems as well. Finally, we describe how these concepts fit within the DATAVIEW workflow management system.
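
    Access-privilege inheritance, as motivated above, can be illustrated with a toy rule: a restriction placed on a data product propagates to everything derived from it, so a provenance query never exposes descendants of data a role may not see. This is a concept sketch only, not OPQL^{Pig} or the DATAVIEW implementation; all names are invented.

```python
# A provenance DAG mapping each derived product to its parents.
derived_from = {
    "clean_records": ["patient_records"],
    "features": ["clean_records"],
    "model": ["features", "public_atlas"],
}
restricted = {"patient_records"}  # directly restricted products

def visible(node, cleared):
    """A node is hidden if it is restricted or inherits a restriction
    from any ancestor, unless the requesting role is cleared."""
    if cleared:
        return True
    if node in restricted:
        return False
    return all(visible(parent, cleared)
               for parent in derived_from.get(node, []))

for node in ("public_atlas", "features", "model"):
    print(node, "visible to an uncleared analyst:", visible(node, False))
# public_atlas is visible; features and model inherit the restriction.
```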

    Big Data Analytics in Static and Streaming Provenance

    Get PDF
    Thesis (Ph.D.), Indiana University, Informatics and Computing, 2016.

    With recent technological and computational advances, scientists increasingly integrate sensors and model simulations to understand spatial, temporal, social, and ecological relationships at unprecedented scale. Data provenance traces relationships of entities over time, thus providing a unique view on the over-time behavior under study. However, provenance can be overwhelming in both volume and complexity; the forecasting potential that provenance now offers creates additional demands. This dissertation focuses on Big Data analytics of static and streaming provenance. It develops filters and a non-preprocessing slicing technique for in-situ querying of static provenance. It presents a stream processing framework for online processing of provenance data at high receiving rates. While the former is sufficient for answering queries that are given prior to the application start (forward queries), the latter deals with queries whose targets are unknown beforehand (backward queries). Finally, it explores data mining on large collections of provenance and proposes a temporal representation of provenance that can reduce the high dimensionality while effectively supporting mining tasks like clustering, classification and association rule mining; this temporal representation can be further applied to streaming provenance as well. The proposed techniques are verified through software prototypes applied to Big Data provenance captured from computer network data, weather models, ocean models, remote (satellite) imagery data, and agent-based simulations of agricultural decision making.
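
    The backward-query problem can be sketched simply: because the query target is not known while provenance streams in, the processor maintains a derivation index online and answers backward lineage queries for arbitrary targets later. This is a minimal illustration of the idea, not the dissertation's framework; the event names are invented.

```python
from collections import defaultdict

# Provenance edges arriving as a stream: (derived_entity, source_entity).
stream = [
    ("calibrated", "raw_obs"),
    ("gridded", "calibrated"),
    ("forecast", "gridded"),
    ("forecast", "model_params"),
]

# Online ingestion: index each entity's direct sources as edges arrive.
index = defaultdict(set)
for derived, source in stream:
    index[derived].add(source)

def backward_lineage(target):
    """Everything the target was (transitively) derived from."""
    seen, frontier = set(), [target]
    while frontier:
        for src in index.get(frontier.pop(), ()):
            if src not in seen:
                seen.add(src)
                frontier.append(src)
    return seen

print(sorted(backward_lineage("forecast")))
# ['calibrated', 'gridded', 'model_params', 'raw_obs']
```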

    Engineering Agile Big-Data Systems

    Get PDF
    To be effective, data-intensive systems require extensive ongoing customisation to reflect changing user requirements, organisational policies, and the structure and interpretation of the data they hold. Manual customisation is expensive, time-consuming, and error-prone. In large complex systems, the value of the data can be such that exhaustive testing is necessary before any new feature can be added to the existing design. In most cases, the precise details of requirements, policies and data will change during the lifetime of the system, forcing a choice between expensive modification and continued operation with an inefficient design.

    Engineering Agile Big-Data Systems outlines an approach to dealing with these problems in software and data engineering, describing a methodology for aligning these processes throughout product lifecycles. It discusses tools which can be used to achieve these goals and, in a number of case studies, shows how the tools and methodology have been used to improve a variety of academic and business systems.