Search CORE

1,614 research outputs found

bdbms -- A Database Management System for Biological Data

Author: Aref Walid G.
Eltabakh Mohamed Y.
Ouzzani Mourad
Publication date: 01/12/2006

Biologists are increasingly using databases for storing and managing their data. Biological databases typically consist of a mixture of raw data, metadata, sequences, annotations, and related data obtained from various sources. Current database technology lacks several functionalities that biological databases need. In this paper, we introduce bdbms, an extensible prototype database management system for supporting biological data. bdbms extends the functionalities of current DBMSs to include: (1) annotation and provenance management, including storage, indexing, manipulation, and querying of annotation and provenance as first-class objects in bdbms; (2) local dependency tracking to track the dependencies and derivations among data items; (3) update authorization to support data curation via content-based authorization, in contrast to identity-based authorization; and (4) new access methods and their supporting operators that support pattern matching on various types of compressed biological data. This paper presents the design of bdbms along with the techniques proposed to support these functionalities, including an extension to SQL. We also outline some open issues in building bdbms.
Comment: This article is published under a Creative Commons License Agreement (http://creativecommons.org/licenses/by/2.5/). You may copy, distribute, display, and perform the work, make derivative works, and make commercial use of the work, but you must attribute the work to the author and CIDR 2007. 3rd Biennial Conference on Innovative Data Systems Research (CIDR), January 7-10, 2007, Asilomar, California, US.
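The abstract names local dependency tracking without saying how bdbms implements it. As a rough illustration only, here is a minimal Python sketch under one assumed design: each derived item records the items it was derived from, and updating a source flags every transitively dependent item as outdated so a curator can re-verify it. The class and method names (`DependencyTracker`, `derive`, `update`) and the example item IDs are hypothetical, not bdbms's API.

```python
from collections import defaultdict, deque


class DependencyTracker:
    """Hypothetical tracker of derivations among data items (not bdbms's API)."""

    def __init__(self):
        self.values = {}
        self.dependents = defaultdict(set)  # source id -> ids derived from it
        self.outdated = set()

    def insert(self, item_id, value):
        self.values[item_id] = value

    def derive(self, item_id, value, sources):
        # Store a derived item and remember every source it depends on.
        self.values[item_id] = value
        for src in sources:
            self.dependents[src].add(item_id)

    def update(self, item_id, value):
        # Change a value, then breadth-first flag all transitive dependents.
        self.values[item_id] = value
        self.outdated.discard(item_id)
        queue = deque([item_id])
        while queue:
            current = queue.popleft()
            for dep in self.dependents[current]:
                if dep not in self.outdated:
                    self.outdated.add(dep)
                    queue.append(dep)


tracker = DependencyTracker()
tracker.insert("gene_seq", "ATGC")
tracker.derive("protein_seq", "M", sources=["gene_seq"])
tracker.derive("annotation", "kinase", sources=["protein_seq"])
tracker.update("gene_seq", "ATGA")
print(sorted(tracker.outdated))  # ['annotation', 'protein_seq']
```

Flagging rather than recomputing reflects the curation setting: a derived annotation may need human review, not just a re-run.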

arXiv.org e-Print Archive

CiteSeerX

Purdue E-Pubs

Dynamic Provenance for SPARQL Update

Author: C. Gutierrez
G. Flouris
H. Halpin
J. Pérez
J.J. Carroll
L. Moreau
N. Lopes
O. Udrea
P. Buneman
R. Horne
R.T. Snodgrass
T.J. Green
V. Papavassiliou
Y. Theoharis
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

While the Semantic Web can currently exhibit provenance information by using the W3C PROV standards, there is a "missing link" in connecting PROV to storing and querying for dynamic changes to RDF graphs using SPARQL. Solving this problem is required for clear use cases such as the creation of version control systems for RDF. While some provenance models and annotation techniques for storing and querying provenance data, originally developed with databases or workflows in mind, transfer readily to RDF and SPARQL, these techniques do not readily adapt to describing changes in dynamic RDF datasets over time. In this paper we explore how to adapt the dynamic copy-paste provenance model of Buneman et al. [2] to RDF datasets that change over time in response to SPARQL updates, how to represent the resulting provenance records themselves as RDF in a manner compatible with W3C PROV, and how the provenance information can be defined by reinterpreting SPARQL updates. The primary contribution of this paper is a semantic framework that enables the semantics of SPARQL Update to be used as the basis for a 'cut-and-paste' provenance model in a principled manner.
Comment: Pre-publication version of ISWC 2014 paper
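To give a feel for version-level provenance of SPARQL updates, here is a simplified Python sketch. It is not the paper's formal copy-paste semantics. It models a graph as a set of triples, applies a ground DELETE/INSERT update (deletions before insertions, as in SPARQL 1.1 Update), keeps every version, and emits PROV-style triples linking the new version to the old one and to the update activity. The `prov:` terms are real PROV-O vocabulary; the `ex:` identifiers and the `ex:deleted`/`ex:inserted` predicates are hypothetical.

```python
from dataclasses import dataclass, field

Triple = tuple[str, str, str]


@dataclass
class VersionedGraph:
    """Hypothetical store keeping every graph version plus provenance triples."""
    versions: list[frozenset] = field(default_factory=lambda: [frozenset()])
    provenance: list[Triple] = field(default_factory=list)

    def apply_update(self, delete: set[Triple], insert: set[Triple]) -> int:
        old_id = len(self.versions) - 1
        old = self.versions[old_id]
        # Ground DELETE/INSERT: remove the delete set, then add the insert set.
        new = frozenset((old - delete) | insert)
        self.versions.append(new)
        new_id = old_id + 1
        upd = f"ex:update{new_id}"
        new_v, old_v = f"ex:graph/v{new_id}", f"ex:graph/v{old_id}"
        # PROV-O links: new version generated by the update, derived from old.
        self.provenance += [
            (upd, "rdf:type", "prov:Activity"),
            (upd, "prov:used", old_v),
            (new_v, "prov:wasGeneratedBy", upd),
            (new_v, "prov:wasDerivedFrom", old_v),
        ]
        # Record only triples that actually changed (serialised as strings).
        for t in sorted(old & delete):
            self.provenance.append((upd, "ex:deleted", repr(t)))
        for t in sorted(insert - old):
            self.provenance.append((upd, "ex:inserted", repr(t)))
        return new_id


g = VersionedGraph()
g.apply_update(delete=set(), insert={("ex:a", "ex:p", "ex:b")})
v2 = g.apply_update(delete={("ex:a", "ex:p", "ex:b")},
                    insert={("ex:a", "ex:p", "ex:c")})
print(sorted(g.versions[v2]))
```

Unchanged triples are implicitly "copied" via `prov:wasDerivedFrom`; the paper's model tracks such copies at a finer grain.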

arXiv.org e-Print Archive

CiteSeerX

Crossref

Edinburgh Research Explorer

Towards structured sharing of raw and derived neuroimaging data across existing resources

Author: Ashish N.
Burns G. A.
Gadde S.
Ghosh S. S.
Helmer K.
Keator D. B.
Nichols B. N.
Steffener J.
Turner J. A.
Van Erp T. G. M.
Publication date: 06/03/2013

Data sharing efforts increasingly contribute to the acceleration of scientific discovery. Neuroimaging data is accumulating in distributed domain-specific databases, and there is currently neither an integrated access mechanism nor an accepted format for the critically important meta-data that is necessary for making use of the combined, available neuroimaging data. In this manuscript, we present work from the Derived Data Working Group, an open-access group sponsored by the Biomedical Informatics Research Network (BIRN) and the International Neuroimaging Coordinating Facility (INCF) focused on practical tools for distributed access to neuroimaging data. The working group develops models and tools facilitating the structured interchange of neuroimaging meta-data and is making progress towards a unified set of tools for such data and meta-data exchange. We report on the key components required for integrated access to raw and derived neuroimaging data, as well as associated meta-data and provenance, across neuroimaging resources. The components include (1) a structured terminology that provides semantic context to data, (2) a formal data model for neuroimaging with robust tracking of data provenance, (3) a web service-based application programming interface (API) that provides a consistent mechanism to access and query the data model, and (4) a provenance library that can be used for the extraction of provenance data by image analysts and imaging software developers. We believe that the framework and set of tools outlined in this manuscript have great potential for solving many of the issues the neuroimaging community faces when sharing raw and derived neuroimaging data across the various existing database systems for the purpose of accelerating scientific discovery.

arXiv.org e-Print Archive

Crossref

PubMed Central

eScholarship - University of California

Using Provenance to support Good Laboratory Practice in Grid Environments

Author: Kloss Guy K.
Ney Miriam
Schreiber Andreas
Publication date: 12/12/2011

Conducting experiments and documenting results is the daily business of scientists. Good, traceable documentation enables other scientists to confirm procedures and results, increasing credibility. Documentation and scientific conduct are regulated under what is termed "good laboratory practice." Laboratory notebooks are used to record each step in conducting an experiment and processing data. Originally, these notebooks were paper-based. With computerised research systems, acquired data became more elaborate, increasing the need for electronic notebooks with data storage, computational features, and reliable electronic documentation. As a new approach to this, a scientific data management system (DataFinder) is enhanced with features for traceable documentation. Provenance recording is used to meet traceability requirements, and this information can later be queried for further analysis. DataFinder has further important features for scientific documentation: it employs a heterogeneous and distributed data storage concept, which enables access to different types of data storage systems (e.g., Grid data infrastructure, file servers). In this chapter we describe a number of building blocks that are available or close to completion. These components are intended for assembling an electronic laboratory notebook for use in Grid environments, while retaining maximal flexibility in usage scenarios as well as maximal compatibility with one another. By using such a system, provenance can successfully be used to trace the scientific workflow of preparation, execution, evaluation, interpretation, and archiving of research data. The reliability of research results increases, and the research process remains transparent to remote research partners.
Comment: Book chapter for "Data Provenance and Data Management for eScience," Studies in Computational Intelligence series, Springer. 25 pages, 8 figures
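The chapter describes recording provenance so that it "can later be queried for further analysis." A minimal Python sketch of that idea, assuming a simple append-only log of processing steps (this is not DataFinder's actual data model): each step lists its inputs and outputs, and a lineage query walks the log backwards to collect every step that contributed to a given result. `Step`, `ProvenanceLog`, and the file names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Step:
    activity: str               # e.g. "preparation", "execution", "evaluation"
    agent: str                  # person or program that performed the step
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]
    timestamp: datetime


class ProvenanceLog:
    """Hypothetical append-only record of processing steps."""

    def __init__(self):
        self.steps: list[Step] = []

    def record(self, activity, agent, inputs, outputs):
        step = Step(activity, agent, tuple(inputs), tuple(outputs),
                    datetime.now(timezone.utc))
        self.steps.append(step)
        return step

    def lineage(self, data_item):
        """Return every step that contributed to data_item, oldest first."""
        needed, found = {data_item}, []
        # Walk backwards in time: keep a step if it produced something needed,
        # then its inputs become needed too.
        for step in reversed(self.steps):
            if needed.intersection(step.outputs):
                found.append(step)
                needed.update(step.inputs)
        return list(reversed(found))


log = ProvenanceLog()
log.record("preparation", "alice", ["raw.csv"], ["clean.csv"])
log.record("execution", "solver", ["clean.csv", "config.yml"], ["result.h5"])
log.record("archiving", "alice", ["notes.txt"], ["notes.zip"])
print([s.activity for s in log.lineage("result.h5")])
```

The backward walk assumes data items are never overwritten in place; a versioned naming scheme would be needed otherwise.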

arXiv.org e-Print Archive

Scipedia

Provision of an integrated data analysis platform for computational neuroscience experiments

Author: Branson Andrew
Hasham Khawar
Kiani Saad Liaquat
McClatchey Richard
Munir Kamran
Shamdasani Jetendr
Publication venue: 'Emerald'
Publication date: 01/01/2014

© Emerald Group Publishing Limited.
Purpose – The purpose of this paper is to provide an integrated analysis base to facilitate computational neuroscience experiments, following a user-led approach to provide access to the integrated neuroscience data and to enable the analyses demanded by the biomedical research community.
Design/methodology/approach – The design and development of the N4U analysis base and related information services address the existing research and practical challenges by offering an integrated medical data analysis environment with the necessary building blocks for neuroscientists to optimally exploit neuroscience workflows, large image data sets, and algorithms to conduct analyses.
Findings – The provision of an integrated e-science environment for computational neuroimaging can enhance the prospects, speed, and utility of the data analysis process for neurodegenerative diseases.
Originality/value – The N4U analysis base enables biomedical data analyses by indexing and interlinking the neuroimaging and clinical study data sets stored on the grid infrastructure, algorithms, and scientific workflow definitions, along with their associated provenance information.

Crossref

UWE Bristol Research Repository

An Architecture for Provenance Systems

Author: Groth Paul
Jiang Sheng
Miles Simon
Moreau Luc
Munroe Steve
Tan Victor
Tsasakou Sofia
Publication venue: s.n.
Publication date: 01/02/2006

This document covers the logical and process architectures of provenance systems. The logical architecture identifies key roles and their interactions, whereas the process architecture discusses distribution and security. A fundamental aspect of our presentation is its technology-independent nature, which makes it reusable: the principles exposed in this document may be applied to different technologies.
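One way to read "technology-independent roles" in code is to define a role as an interface and let any backend fill it. The Python sketch below assumes a single store role with `record` and `query` operations, plus a helper in which both parties of an interaction assert their own view of it. The role names, method signatures, and the in-memory backend are illustrative assumptions, not the roles or protocols defined in the document.

```python
from typing import Protocol


class ProvenanceStore(Protocol):
    """Assumed store role: any backend (memory, database, web service) can fill it."""

    def record(self, interaction_id: str, asserter: str, content: str) -> None: ...

    def query(self, interaction_id: str) -> list[tuple[str, str]]: ...


class InMemoryStore:
    """One possible realisation of the store role, for illustration only."""

    def __init__(self) -> None:
        self._records: dict[str, list[tuple[str, str]]] = {}

    def record(self, interaction_id: str, asserter: str, content: str) -> None:
        self._records.setdefault(interaction_id, []).append((asserter, content))

    def query(self, interaction_id: str) -> list[tuple[str, str]]:
        return list(self._records.get(interaction_id, []))


def document_interaction(store: ProvenanceStore, interaction_id: str,
                         sender: str, receiver: str, message: str) -> None:
    # Each party asserts its own view of the same interaction independently.
    store.record(interaction_id, sender, f"sent: {message}")
    store.record(interaction_id, receiver, f"received: {message}")


store = InMemoryStore()
document_interaction(store, "msg-42", "client", "service", "run job")
print(store.query("msg-42"))
```

Because `document_interaction` depends only on the `ProvenanceStore` protocol, swapping the in-memory class for a networked store would not change the recording logic.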

Southampton (e-Prints Soton)

King's Research Portal

trackr: A Framework for Enhancing Discoverability and Reproducibility of Data Visualizations and Other Artifacts in R

Author: Becker Gabriel
Lawrence Michael
Moore Sara E.
Publication date: 13/06/2017

Research is an incremental, iterative process, with new results relying on and building upon previous ones. Scientists need to find, retrieve, understand, and verify results in order to confidently extend them, even when the results are their own. We present the trackr framework for organizing, automatically annotating, discovering, and retrieving results. We identify sources of automatically extractable metadata for computational results, and we define an extensible system for organizing, annotating, and searching for results based on these and other metadata. We present an open-source implementation of these concepts for plots, computational artifacts, and woven dynamic reports generated in the R statistical computing language.
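trackr itself is an R framework; the Python sketch below only illustrates the general idea of indexing results by automatically extractable metadata and searching on it. It assumes a content hash as the artifact ID and treats the artifact kind, a timestamp, title words, variable names, and user tags as metadata. `ArtifactRecord`, `ResultIndex`, and all parameters are hypothetical and do not mirror trackr's API.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ArtifactRecord:
    uid: str
    kind: str                       # e.g. "plot", "table", "report"
    created: datetime
    keywords: set[str] = field(default_factory=set)


class ResultIndex:
    """Hypothetical metadata index for computational results."""

    def __init__(self):
        self.records: dict[str, ArtifactRecord] = {}

    def add(self, content: bytes, kind: str, title: str = "",
            variables=(), tags=()):
        # Automatically extractable metadata: a content hash as identity,
        # the artifact kind, a timestamp, title words, and variable names.
        uid = hashlib.sha256(content).hexdigest()[:12]
        words = {w.lower() for w in title.split()}
        words |= {v.lower() for v in variables} | {t.lower() for t in tags}
        self.records[uid] = ArtifactRecord(uid, kind,
                                           datetime.now(timezone.utc), words)
        return uid

    def search(self, *terms, kind=None):
        # Return IDs whose keywords contain all terms (case-insensitive).
        wanted = {t.lower() for t in terms}
        return [r.uid for r in self.records.values()
                if wanted <= r.keywords and (kind is None or r.kind == kind)]


idx = ResultIndex()
plot_id = idx.add(b"<png bytes>", "plot", title="Mileage by weight",
                  variables=["mpg", "wt"])
table_id = idx.add(b"model summary", "table", title="Regression summary",
                   variables=["mpg"])
print(idx.search("mpg", kind="plot") == [plot_id])
```

Hashing the content means re-adding an identical artifact updates its existing record instead of creating a duplicate.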

arXiv.org e-Print Archive