Search CORE

3,071 research outputs found

bdbms -- A Database Management System for Biological Data

Author: Aref Walid G.
Eltabakh Mohamed Y.
Ouzzani Mourad
Publication venue
Publication date: 01/12/2006
Field of study

Biologists are increasingly using databases for storing and managing their data. Biological databases typically consist of a mixture of raw data, metadata, sequences, annotations, and related data obtained from various sources. Current database technology lacks several functionalities that are needed by biological databases. In this paper, we introduce bdbms, an extensible prototype database management system for supporting biological data. bdbms extends the functionalities of current DBMSs to include: (1) Annotation and provenance management including storage, indexing, manipulation, and querying of annotation and provenance as first class objects in bdbms, (2) Local dependency tracking to track the dependencies and derivations among data items, (3) Update authorization to support data curation via content-based authorization, in contrast to identity-based authorization, and (4) New access methods and their supporting operators that support pattern matching on various types of compressed biological data types. This paper presents the design of bdbms along with the techniques proposed to support these functionalities including an extension to SQL. We also outline some open issues in building bdbms.Comment: This article is published under a Creative Commons License Agreement (http://creativecommons.org/licenses/by/2.5/.) You may copy, distribute, display, and perform the work, make derivative works and make commercial use of the work, but, you must attribute the work to the author and CIDR 2007. 3rd Biennial Conference on Innovative Data Systems Research (CIDR) January 710, 2007, Asilomar, California, US

arXiv.org e-Print Archive

CiteSeerX

Purdue E-Pubs

Querying and managing opm-compliant scientific workflow provenance

Author: Lim Chunhyeok
Publication venue: DigitalCommons@WayneState
Publication date: 01/01/2012
Field of study

Provenance, the metadata that records the derivation history of scientific results, is important in scientific workflows to interpret, validate, and analyze the result of scientific computing. Recently, to promote and facilitate interoperability among heterogeneous provenance systems, the Open Provenance Model (OPM) has been proposed and has played an important role in the community. In this dissertation, to efficiently query and manage OPM-compliant provenance, we first propose a provenance collection framework that collects both prospective provenance, which captures an abstract workflow specification as a recipe for future data derivation and retrospective provenance, which captures past workflow execution and data derivation information. We then propose a relational database-based provenance system, called OPMPROV that stores, reasons, and queries prospective and retrospective provenance, which is OPM-compliant provenance. We finally propose OPQL, an OPM-level provenance query language, that is directly defined over the OPM model. An OPQL query takes an OPM graph as input and produces an OPM graph as output; therefore, OPQL queries are not tightly coupled to the underlying provenance storage strategies. Our provenance store, provenance collection framework, and provenance query language feature the native support of the OPM model

Digital Commons@Wayne State University

AiiDA: Automated Interactive Infrastructure and Database for Computational Science

Author: Cepellotti Andrea
Kozinsky Boris
Marzari Nicola
Pizzi Giovanni
Sabatini Riccardo
Publication venue: 'Elsevier BV'
Publication date: 08/09/2015
Field of study

Computational science has seen in the last decades a spectacular rise in the scope, breadth, and depth of its efforts. Notwithstanding this prevalence and impact, it is often still performed using the renaissance model of individual artisans gathered in a workshop, under the guidance of an established practitioner. Great benefits could follow instead from adopting concepts and tools coming from computer science to manage, preserve, and share these computational efforts. We illustrate here our paradigm sustaining such vision, based around the four pillars of Automation, Data, Environment, and Sharing. We then discuss its implementation in the open-source AiiDA platform (http://www.aiida.net), that has been tuned first to the demands of computational materials science. AiiDA's design is based on directed acyclic graphs to track the provenance of data and calculations, and ensure preservation and searchability. Remote computational resources are managed transparently, and automation is coupled with data storage to ensure reproducibility. Last, complex sequences of calculations can be encoded into scientific workflows. We believe that AiiDA's design and its sharing capabilities will encourage the creation of social ecosystems to disseminate codes, data, and scientific workflows.Comment: 30 pages, 7 figure

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Towards Exascale Scientific Metadata Management

Author: Blanas Spyros
Byna Surendra
Publication venue
Publication date: 29/03/2015
Field of study

Advances in technology and computing hardware are enabling scientists from all areas of science to produce massive amounts of data using large-scale simulations or observational facilities. In this era of data deluge, effective coordination between the data production and the analysis phases hinges on the availability of metadata that describe the scientific datasets. Existing workflow engines have been capturing a limited form of metadata to provide provenance information about the identity and lineage of the data. However, much of the data produced by simulations, experiments, and analyses still need to be annotated manually in an ad hoc manner by domain scientists. Systematic and transparent acquisition of rich metadata becomes a crucial prerequisite to sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and domain-agnostic metadata management infrastructure that can meet the demands of extreme-scale science is notable by its absence. To address this gap in scientific data management research and practice, we present our vision for an integrated approach that (1) automatically captures and manipulates information-rich metadata while the data is being produced or analyzed and (2) stores metadata within each dataset to permeate metadata-oblivious processes and to query metadata through established and standardized data access interfaces. We motivate the need for the proposed integrated approach using applications from plasma physics, climate modeling and neuroscience, and then discuss research challenges and possible solutions

arXiv.org e-Print Archive

eScholarship - University of California

A policy language definition for provenance in pervasive computing

Author: Alsiyami Aeshah Abdulkarim Dammad
Publication venue
Publication date: 29/05/2012
Field of study

Recent advances in computing technology have led to the paradigm of pervasive computing, which provides a means of simplifying daily life by integrating information processing into the everyday physical world. Pervasive computing draws its power from knowing the surroundings and creates an environment which combines computing and communication capabilities. Sensors that provide high-resolution spatial and instant measurement are most commonly used for forecasting, monitoring and real-time environmental modelling. Sensor data generated by a sensor network depends on several influences, such as the configuration and location of the sensors or the processing performed on the raw measurements. Storing sufficient metadata that gives meaning to the recorded observation is important in order to draw accurate conclusions or to enhance the reliability of the result dataset that uses this automatically collected data. This kind of metadata is called provenance data, as the origin of the data and the process by which it arrived from its origin are recorded. Provenance is still an exploratory field in pervasive computing and many open research questions are yet to emerge. The context information and the different characteristics of the pervasive environment call for different approaches to a provenance support system. This work implements a policy language definition that specifies the collecting model for provenance management systems and addresses the challenges that arise with stream data and sensor environments. The structure graph of the proposed model is mapped to the Open Provenance Model in order to facilitating the sharing of provenance data and interoperability with other systems. As provenance security has been recognized as one of the most important components in any provenance system, an access control language has been developed that is tailored to support the special requirements of provenance: fine-grained polices, privacy policies and preferences. Experimental evaluation findings show a reasonable overhead for provenance collecting and a reasonable time for provenance query performance, while a numerical analysis was used to evaluate the storage overhead

Sussex Research Online

Automatic vs Manual Provenance Abstractions: Mind the Gap

Author: Alper Pinar
Belhajjame Khalid
Goble Carole A.
Publication venue
Publication date: 21/05/2016
Field of study

In recent years the need to simplify or to hide sensitive information in provenance has given way to research on provenance abstraction. In the context of scientific workflows, existing research provides techniques to semi automatically create abstractions of a given workflow description, which is in turn used as filters over the workflow's provenance traces. An alternative approach that is commonly adopted by scientists is to build workflows with abstractions embedded into the workflow's design, such as using sub-workflows. This paper reports on the comparison of manual versus semi-automated approaches in a context where result abstractions are used to filter report-worthy results of computational scientific analyses. Specifically; we take a real-world workflow containing user-created design abstractions and compare these with abstractions created by ZOOM UserViews and Workflow Summaries systems. Our comparison shows that semi-automatic and manual approaches largely overlap from a process perspective, meanwhile, there is a dramatic mismatch in terms of data artefacts retained in an abstracted account of derivation. We discuss reasons and suggest future research directions.Comment: Preprint accepted to the 2016 workshop on the Theory and Applications of Provenance, TAPP 201

arXiv.org e-Print Archive

The University of Manchester - Institutional Repository

trackr: A Framework for Enhancing Discoverability and Reproducibility of Data Visualizations and Other Artifacts in R

Author: Becker Gabriel
Lawrence Michael
Moore Sara E.
Publication venue
Publication date: 13/06/2017
Field of study

Research is an incremental, iterative process, with new results relying and building upon previous ones. Scientists need to find, retrieve, understand, and verify results in order to confidently extend them, even when the results are their own. We present the trackr framework for organizing, automatically annotating, discovering, and retrieving results. We identify sources of automatically extractable metadata for computational results, and we define an extensible system for organizing, annotating, and searching for results based on these and other metadata. We present an open-source implementation of these concepts for plots, computational artifacts, and woven dynamic reports generated in the R statistical computing language

arXiv.org e-Print Archive

Workflow Provenance: from Modeling to Reporting

Author: Ferdous Rayhan 1992-
Publication venue: 'University of Saskatchewan Library'
Publication date: 12/03/2019
Field of study

Workflow provenance is a crucial part of a workflow system as it enables data lineage analysis, error tracking, workflow monitoring, usage pattern discovery, and so on. Integrating provenance into a workflow system or modifying a workflow system to capture or analyze different provenance information is burdensome, requiring extensive development because provenance mechanisms rely heavily on the modelling, architecture, and design of the workflow system. Various tools and technologies exist for logging events in a software system. Unfortunately, logging tools and technologies are not designed for capturing and analyzing provenance information. Workflow provenance is not only about logging, but also about retrieving workflow related information from logs. In this work, we propose a taxonomy of provenance questions and guided by these questions, we created a workflow programming model 'ProvMod' with a supporting run-time library to provide automated provenance and log analysis for any workflow system. The design and provenance mechanism of ProvMod is based on recommendations from prominent research and is easy to integrate into any workflow system. ProvMod offers Neo4j graph database support to manage semi-structured heterogeneous JSON logs. The log structure is adaptable to any NoSQL technology. For each provenance question in our taxonomy, ProvMod provides the answer with data visualization using Neo4j and the ELK Stack. Besides analyzing performance from various angles, we demonstrate the ease of integration by integrating ProvMod with Apache Taverna and evaluate ProvMod usability by engaging users. Finally, we present two Software Engineering research cases (clone detection and architecture extraction) where our proposed model ProvMod and provenance questions taxonomy can be applied to discover meaningful insights

eCommons@USASK

University of Saskatchewan Research Archive