Search CORE

4,232 research outputs found

Causality and the semantics of provenance

Provenance, or information about the sources, derivation, custody or history of data, has been studied recently in a number of contexts, including databases, scientific workflows and the Semantic Web. Many provenance mechanisms have been developed, motivated by informal notions such as influence, dependence, explanation and causality. However, there has been little study of whether these mechanisms formally satisfy appropriate policies or even how to formalize relevant motivating concepts such as causality. We contend that mathematical models of these concepts are needed to justify and compare provenance techniques. In this paper we review a theory of causality based on structural models that has been developed in artificial intelligence, and describe work in progress on a causal semantics for provenance graphs.Comment: Workshop submissio

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Using Links to prototype a Database Wiki

Author: Cheney James
Lindley Sam
Müller Heiko
Publication venue
Publication date: 01/01/2011
Field of study

Both relational databases and wikis have strengths that make them attractive for use in collaborative applications. In the last decade, database-backed Web applications have been used extensively to develop valuable shared biological references called curated databases. Databases offer many advantages such as scalability, query optimization and concurrency control, but are not easy to use and lack other features needed for collaboration. Wikis have become very popular for early-stage biocuration projects because they are easy to use, encourage sharing and collaboration, and provide built-in support for archiving, history-tracking and annotation. However, curation projects often outgrow the limited capabilities of wikis for structuring and efficiently querying data at scale, necessitating a painful phase transition to a database-backed Web application. We perceive a need for a new class of general-purpose system, which we call a Database Wiki, that combines flexible wiki-like support for collaboration with robust database-like capabilities for structuring and querying data. This paper presents DBWiki, a design prototype for such a system written in the Web programming language Links. We present the architecture, typical use, and wiki markup language design for DBWiki and discuss features of Links that provided unique advantages for rapid Web/database application prototyping

CiteSeerX

Heriot Watt Pure

Edinburgh Research Explorer

Well Structured Transition Systems with History

Author: Abdulla Parosh
Delzanno Giorgio
Montali Marco
Publication venue: 'Open Publishing Association'
Publication date: 01/09/2015
Field of study

We propose a formal model of concurrent systems in which the history of a computation is explicitly represented as a collection of events that provide a view of a sequence of configurations. In our model events generated by transitions become part of the system configurations leading to operational semantics with historical data. This model allows us to formalize what is usually done in symbolic verification algorithms. Indeed, search algorithms often use meta-information, e.g., names of fired transitions, selected processes, etc., to reconstruct (error) traces from symbolic state exploration. The other interesting point of the proposed model is related to a possible new application of the theory of well-structured transition systems (wsts). In our setting wsts theory can be applied to formally extend the class of properties that can be verified using coverability to take into consideration (ordered and unordered) historical data. This can be done by using different types of representation of collections of events and by combining them with wsts by using closure properties of well-quasi orderings.Comment: In Proceedings GandALF 2015, arXiv:1509.0685

arXiv.org e-Print Archive

Directory of Open Access Journals

Towards Exascale Scientific Metadata Management

Author: Blanas Spyros
Byna Surendra
Publication venue
Publication date: 29/03/2015
Field of study

Advances in technology and computing hardware are enabling scientists from all areas of science to produce massive amounts of data using large-scale simulations or observational facilities. In this era of data deluge, effective coordination between the data production and the analysis phases hinges on the availability of metadata that describe the scientific datasets. Existing workflow engines have been capturing a limited form of metadata to provide provenance information about the identity and lineage of the data. However, much of the data produced by simulations, experiments, and analyses still need to be annotated manually in an ad hoc manner by domain scientists. Systematic and transparent acquisition of rich metadata becomes a crucial prerequisite to sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and domain-agnostic metadata management infrastructure that can meet the demands of extreme-scale science is notable by its absence. To address this gap in scientific data management research and practice, we present our vision for an integrated approach that (1) automatically captures and manipulates information-rich metadata while the data is being produced or analyzed and (2) stores metadata within each dataset to permeate metadata-oblivious processes and to query metadata through established and standardized data access interfaces. We motivate the need for the proposed integrated approach using applications from plasma physics, climate modeling and neuroscience, and then discuss research challenges and possible solutions

arXiv.org e-Print Archive

eScholarship - University of California

DataHub: Collaborative Data Science & Dataset Version Management at Scale

Author: Bhardwaj Anant
Bhattacherjee Souvik
Chavan Amit
Deshpande Amol
Elmore Aaron J.
Madden Samuel
Parameswaran Aditya G.
Publication venue
Publication date: 02/09/2014
Field of study

Relational databases have limited support for data collaboration, where teams collaboratively curate and analyze large datasets. Inspired by software version control systems like git, we propose (a) a dataset version control system, giving users the ability to create, branch, merge, difference and search large, divergent collections of datasets, and (b) a platform, DataHub, that gives users the ability to perform collaborative data analysis building on this version control system. We outline the challenges in providing dataset version control at scale.Comment: 7 page

arXiv.org e-Print Archive

CiteSeerX

DSpace@MIT

High Energy Physics Forum for Computational Excellence: Working Group Reports (I. Applications Software II. Software Libraries and Tools III. Systems)

Author: Asai Makoto
Bauerdick Lothar
Borgland Anders
Calafiura Paolo
Dart Eli
Elmer Peter
Finkel Hal
Gottlieb Steve
Gutsche Oliver
Habib Salman
Hoeche Stefan
Izubuchi Taku
Kirby Michael
LeCompte Tom
Lyon Adam
Marshall Zach
Nugent Peter
Patton Simon
Petravick Don
Potekhin Maxim
Roser Robert
Sheldon Paul
Vay Jean-Luc
Viren Brett
Yanny Brian
Publication venue
Publication date: 28/10/2015
Field of study

Computing plays an essential role in all aspects of high energy physics. As computational technology evolves rapidly in new directions, and data throughput and volume continue to follow a steep trend-line, it is important for the HEP community to develop an effective response to a series of expected challenges. In order to help shape the desired response, the HEP Forum for Computational Excellence (HEP-FCE) initiated a roadmap planning activity with two key overlapping drivers -- 1) software effectiveness, and 2) infrastructure and expertise advancement. The HEP-FCE formed three working groups, 1) Applications Software, 2) Software Libraries and Tools, and 3) Systems (including systems software), to provide an overview of the current status of HEP computing and to present findings and opportunities for the desired HEP computational roadmap. The final versions of the reports are combined in this document, and are presented along with introductory material.Comment: 72 page

arXiv.org e-Print Archive

eScholarship - University of California