4,232 research outputs found
Causality and the semantics of provenance
Provenance, or information about the sources, derivation, custody or history
of data, has been studied recently in a number of contexts, including
databases, scientific workflows and the Semantic Web. Many provenance
mechanisms have been developed, motivated by informal notions such as
influence, dependence, explanation and causality. However, there has been
little study of whether these mechanisms formally satisfy appropriate policies
or even how to formalize relevant motivating concepts such as causality. We
contend that mathematical models of these concepts are needed to justify and
compare provenance techniques. In this paper we review a theory of causality
based on structural models that has been developed in artificial intelligence,
and describe work in progress on a causal semantics for provenance graphs.Comment: Workshop submissio
Using Links to prototype a Database Wiki
Both relational databases and wikis have strengths that make them attractive for use in collaborative applications. In the last decade, database-backed Web applications have been used extensively to develop valuable shared biological references called curated databases. Databases offer many advantages such as scalability, query optimization and concurrency control, but are not easy to use and lack other features needed for collaboration. Wikis have become very popular for early-stage biocuration projects because they are easy to use, encourage sharing and collaboration, and provide built-in support for archiving, history-tracking and annotation. However, curation projects often outgrow the limited capabilities of wikis for structuring and efficiently querying data at scale, necessitating a painful phase transition to a database-backed Web application. We perceive a need for a new class of general-purpose system, which we call a Database Wiki, that combines flexible wiki-like support for collaboration with robust database-like capabilities for structuring and querying data. This paper presents DBWiki, a design prototype for such a system written in the Web programming language Links. We present the architecture, typical use, and wiki markup language design for DBWiki and discuss features of Links that provided unique advantages for rapid Web/database application prototyping
Well Structured Transition Systems with History
We propose a formal model of concurrent systems in which the history of a
computation is explicitly represented as a collection of events that provide a
view of a sequence of configurations. In our model events generated by
transitions become part of the system configurations leading to operational
semantics with historical data. This model allows us to formalize what is
usually done in symbolic verification algorithms. Indeed, search algorithms
often use meta-information, e.g., names of fired transitions, selected
processes, etc., to reconstruct (error) traces from symbolic state exploration.
The other interesting point of the proposed model is related to a possible new
application of the theory of well-structured transition systems (wsts). In our
setting wsts theory can be applied to formally extend the class of properties
that can be verified using coverability to take into consideration (ordered and
unordered) historical data. This can be done by using different types of
representation of collections of events and by combining them with wsts by
using closure properties of well-quasi orderings.Comment: In Proceedings GandALF 2015, arXiv:1509.0685
Towards Exascale Scientific Metadata Management
Advances in technology and computing hardware are enabling scientists from
all areas of science to produce massive amounts of data using large-scale
simulations or observational facilities. In this era of data deluge, effective
coordination between the data production and the analysis phases hinges on the
availability of metadata that describe the scientific datasets. Existing
workflow engines have been capturing a limited form of metadata to provide
provenance information about the identity and lineage of the data. However,
much of the data produced by simulations, experiments, and analyses still need
to be annotated manually in an ad hoc manner by domain scientists. Systematic
and transparent acquisition of rich metadata becomes a crucial prerequisite to
sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and
domain-agnostic metadata management infrastructure that can meet the demands of
extreme-scale science is notable by its absence.
To address this gap in scientific data management research and practice, we
present our vision for an integrated approach that (1) automatically captures
and manipulates information-rich metadata while the data is being produced or
analyzed and (2) stores metadata within each dataset to permeate
metadata-oblivious processes and to query metadata through established and
standardized data access interfaces. We motivate the need for the proposed
integrated approach using applications from plasma physics, climate modeling
and neuroscience, and then discuss research challenges and possible solutions
DataHub: Collaborative Data Science & Dataset Version Management at Scale
Relational databases have limited support for data collaboration, where teams
collaboratively curate and analyze large datasets. Inspired by software version
control systems like git, we propose (a) a dataset version control system,
giving users the ability to create, branch, merge, difference and search large,
divergent collections of datasets, and (b) a platform, DataHub, that gives
users the ability to perform collaborative data analysis building on this
version control system. We outline the challenges in providing dataset version
control at scale.Comment: 7 page
High Energy Physics Forum for Computational Excellence: Working Group Reports (I. Applications Software II. Software Libraries and Tools III. Systems)
Computing plays an essential role in all aspects of high energy physics. As
computational technology evolves rapidly in new directions, and data throughput
and volume continue to follow a steep trend-line, it is important for the HEP
community to develop an effective response to a series of expected challenges.
In order to help shape the desired response, the HEP Forum for Computational
Excellence (HEP-FCE) initiated a roadmap planning activity with two key
overlapping drivers -- 1) software effectiveness, and 2) infrastructure and
expertise advancement. The HEP-FCE formed three working groups, 1) Applications
Software, 2) Software Libraries and Tools, and 3) Systems (including systems
software), to provide an overview of the current status of HEP computing and to
present findings and opportunities for the desired HEP computational roadmap.
The final versions of the reports are combined in this document, and are
presented along with introductory material.Comment: 72 page
- …