19,658 research outputs found
bdbms -- A Database Management System for Biological Data
Biologists are increasingly using databases for storing and managing their
data. Biological databases typically consist of a mixture of raw data,
metadata, sequences, annotations, and related data obtained from various
sources. Current database technology lacks several functionalities that are
needed by biological databases. In this paper, we introduce bdbms, an
extensible prototype database management system for supporting biological data.
bdbms extends the functionalities of current DBMSs to include: (1) Annotation
and provenance management including storage, indexing, manipulation, and
querying of annotation and provenance as first class objects in bdbms, (2)
Local dependency tracking to track the dependencies and derivations among data
items, (3) Update authorization to support data curation via content-based
authorization, in contrast to identity-based authorization, and (4) New access
methods and their supporting operators that support pattern matching on various
types of compressed biological data types. This paper presents the design of
bdbms along with the techniques proposed to support these functionalities
including an extension to SQL. We also outline some open issues in building
bdbms.Comment: This article is published under a Creative Commons License Agreement
(http://creativecommons.org/licenses/by/2.5/.) You may copy, distribute,
display, and perform the work, make derivative works and make commercial use
of the work, but, you must attribute the work to the author and CIDR 2007.
3rd Biennial Conference on Innovative Data Systems Research (CIDR) January
710, 2007, Asilomar, California, US
Automatic vs Manual Provenance Abstractions: Mind the Gap
In recent years the need to simplify or to hide sensitive information in
provenance has given way to research on provenance abstraction. In the context
of scientific workflows, existing research provides techniques to semi
automatically create abstractions of a given workflow description, which is in
turn used as filters over the workflow's provenance traces. An alternative
approach that is commonly adopted by scientists is to build workflows with
abstractions embedded into the workflow's design, such as using sub-workflows.
This paper reports on the comparison of manual versus semi-automated approaches
in a context where result abstractions are used to filter report-worthy results
of computational scientific analyses. Specifically; we take a real-world
workflow containing user-created design abstractions and compare these with
abstractions created by ZOOM UserViews and Workflow Summaries systems. Our
comparison shows that semi-automatic and manual approaches largely overlap from
a process perspective, meanwhile, there is a dramatic mismatch in terms of data
artefacts retained in an abstracted account of derivation. We discuss reasons
and suggest future research directions.Comment: Preprint accepted to the 2016 workshop on the Theory and Applications
of Provenance, TAPP 201
Labeling Workflow Views with Fine-Grained Dependencies
This paper considers the problem of efficiently answering reachability
queries over views of provenance graphs, derived from executions of workflows
that may include recursion. Such views include composite modules and model
fine-grained dependencies between module inputs and outputs. A novel
view-adaptive dynamic labeling scheme is developed for efficient query
evaluation, in which view specifications are labeled statically (i.e. as they
are created) and data items are labeled dynamically as they are produced during
a workflow execution. Although the combination of fine-grained dependencies and
recursive workflows entail, in general, long (linear-size) data labels, we show
that for a large natural class of workflows and views, labels are compact
(logarithmic-size) and reachability queries can be evaluated in constant time.
Experimental results demonstrate the benefit of this approach over the
state-of-the-art technique when applied for labeling multiple views.Comment: VLDB201
Sciunits: Reusable Research Objects
Science is conducted collaboratively, often requiring knowledge sharing about
computational experiments. When experiments include only datasets, they can be
shared using Uniform Resource Identifiers (URIs) or Digital Object Identifiers
(DOIs). An experiment, however, seldom includes only datasets, but more often
includes software, its past execution, provenance, and associated
documentation. The Research Object has recently emerged as a comprehensive and
systematic method for aggregation and identification of diverse elements of
computational experiments. While a necessary method, mere aggregation is not
sufficient for the sharing of computational experiments. Other users must be
able to easily recompute on these shared research objects. In this paper, we
present the sciunit, a reusable research object in which aggregated content is
recomputable. We describe a Git-like client that efficiently creates, stores,
and repeats sciunits. We show through analysis that sciunits repeat
computational experiments with minimal storage and processing overhead.
Finally, we provide an overview of sharing and reproducible cyberinfrastructure
based on sciunits gaining adoption in the domain of geosciences
- …