13 research outputs found

    Incorporating Provenance in Database Systems

    Full text link
    The importance of maintaining provenance has been widely recognized. Currently there are two approaches: provenance generated within workflow frameworks, and provenance within a contained relational database. Workflow provenance allows workflow re-execution and can offer some explanation of results. Within relational databases, knowledge of SQL queries and relational operators is used to express provenance. There is a disconnect between these two areas of provenance research. Techniques that work in relational databases cannot be applied to workflow systems because of heterogeneous data types and black-box operators. Meanwhile, the real-life utility of workflow systems has not been extended to database provenance. In the gap between provenance in workflow systems and databases, there are myriad systems that need provenance: for instance, creating a new dataset, like MiMI, from several sources and processes, or building an algorithm that generates sequence alignments, like MiBlast. These hybrid systems cannot be forced into a workflow framework and do not exist solely within a database. This work addresses the issues that block the use of provenance in hybrid systems. In particular, we look at capturing, storing, and using provenance information outside of workflow and database provenance systems. Database and workflow provenance systems provide no support for tracking the provenance of user actions, yet manual effort is often a large component of these hybrid systems. We describe an approach to track and record the user's actions in a queriable form. Once provenance is captured, storage can become prohibitively expensive, in both hybrid and workflow systems. We identify several techniques to reduce the provenance store. Additionally, the usability of provenance is a problem in workflow, database, and hybrid provenance systems. Provenance contains both too much and too little information. We highlight the missing information that can assist user understanding, and develop a model of provenance answers to decrease information overload. Finally, workflow and database systems are designed to explain the results users see; they do not explain why items are not in the result. We allow researchers to specify what they are looking for and answer why it does not exist in the result set.
    PhD; Computer Science & Engineering; University of Michigan, Horace H. Rackham School of Graduate Studies; http://deepblue.lib.umich.edu/bitstream/2027.42/61645/1/apchapma_1.pd
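
    For illustration, the minimal Python sketch below shows one way user actions could be recorded in a queriable form, in the spirit of the approach summarized above; the SQLite schema, field names, and example identifiers are assumptions made for the example, not the dissertation's actual design.

    # Sketch: record manual user actions as queriable provenance (assumed schema).
    import sqlite3
    from datetime import datetime, timezone

    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE user_provenance (
               id INTEGER PRIMARY KEY,
               actor TEXT,          -- who performed the action
               action TEXT,         -- e.g. 'merge', 'curate', 'delete'
               input_item TEXT,     -- identifier of the item acted on
               output_item TEXT,    -- identifier of the item produced
               recorded_at TEXT     -- ISO-8601 timestamp
           )"""
    )

    def record_action(actor, action, input_item, output_item):
        """Append one user action to the provenance store."""
        conn.execute(
            "INSERT INTO user_provenance (actor, action, input_item, output_item, recorded_at) "
            "VALUES (?, ?, ?, ?, ?)",
            (actor, action, input_item, output_item,
             datetime.now(timezone.utc).isoformat()),
        )

    # Capture a hypothetical manual curation step, then ask where an item came from.
    record_action("curator_1", "merge", "raw_interaction_42", "warehouse_interaction_7")
    for row in conn.execute(
        "SELECT actor, action, input_item FROM user_provenance WHERE output_item = ?",
        ("warehouse_interaction_7",),
    ):
        print(row)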

    Prospecting Period Measurements with LSST - Low Mass X-ray Binaries as a Test Case

    Full text link
    The Large Synoptic Survey Telescope (LSST) will provide for unbiased sampling of variability properties of objects with r mag < 24. This should allow for those objects whose variations reveal their orbital periods (P_orb), such as low mass X-ray binaries (LMXBs) and related objects, to be examined in much greater detail and with uniform systematic sampling. However, the baseline LSST observing strategy has temporal sampling that is not optimised for such work in the Galaxy. Here we assess four candidate observing strategies for measurement of P_orb in the range 10 minutes to 50 days. We simulate multi-filter quiescent LMXB lightcurves including ellipsoidal modulation and stochastic flaring, and then sample these using LSST's operations simulator (OpSim) over the (mag, P_orb) parameter space, and over five sightlines sampling a range of possible reddening values. The percentage of simulated parameter space with correctly returned periods ranges from ~23%, for the current baseline strategy, to ~70% for the two simulated specialist strategies. Convolving these results with a P_orb distribution, a modelled Galactic spatial distribution and reddening maps, we conservatively estimate that the most recent version of the LSST baseline strategy will allow P_orb determination for ~18% of the Milky Way's LMXB population, whereas strategies that do not reduce observations of the Galactic Plane can improve this dramatically to ~32%. This increase would allow characterisation of the full binary population by breaking degeneracies between suggested P_orb distributions in the literature. Our results can be used in the ongoing assessment of the effectiveness of various potential cadencing strategies.
    Comment: Replacement after addressing minor corrections from the referee - mainly improvements in clarification.
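
    To make the simulation setup above concrete, here is a minimal Python sketch of a quiescent-LMXB-like lightcurve with ellipsoidal modulation and stochastic flaring, sampled at irregular epochs; the period, amplitudes, flare rate, and noise level are assumed values, and this is not the paper's actual simulation code.

    # Sketch: ellipsoidal modulation plus stochastic flaring, irregular sampling.
    import numpy as np

    rng = np.random.default_rng(42)

    P_orb = 0.35          # assumed orbital period in days
    amp_ellip = 0.08      # assumed fractional ellipsoidal amplitude
    n_epochs = 900

    # Irregular epochs over ~10 years, crudely mimicking survey-style sampling.
    t = np.sort(rng.uniform(0.0, 3650.0, n_epochs))

    # Ellipsoidal modulation has two maxima per orbit, so it varies at P_orb / 2.
    flux = 1.0 + amp_ellip * np.cos(4.0 * np.pi * t / P_orb)

    # Stochastic flaring: occasional positive excursions with exponential amplitudes.
    flaring = rng.exponential(0.05, n_epochs) * (rng.random(n_epochs) < 0.1)
    noise = rng.normal(0.0, 0.02, n_epochs)
    mag = -2.5 * np.log10(flux + flaring) + noise  # relative magnitudes

    print(t[:3], mag[:3])  # first few simulated epochs and magnitudes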

    Incorporating provenance in database systems

    No full text

    The challenge of “quick and dirty” information quality

    No full text

    Efficient provenance storage

    No full text
    As the world is increasingly networked and digitized, the data we store has more and more frequently been chopped, baked, diced and stewed. In consequence, there is an increasing need to store and manage provenance for each data item stored in a database, describing exactly where it came from, and what manipulations have been applied to it. Storage of the complete provenance of each data item can become prohibitively expensive. In this paper, we identify important properties of provenance that can be used to considerably reduce the amount of storage required. We identify three techniques to decrease the amount of storage required for provenance: a family of factorization processes and two methods based on inheritance. We have used the techniques described in this work to significantly reduce the provenance storage costs associated with constructing MiMI [22], a warehouse of data regarding protein interactions, as well as two provenance stores, Karma [31] and PReServ [20], produced through workflow execution. In these real provenance sets, we were able to reduce the size of the provenance by up to a factor of 20. Additionally, we show that this reduced store can be queried efficiently and, further, that incremental changes can be made inexpensively.
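
    As a rough illustration of the factorization idea, the sketch below stores a provenance chain shared by many data items only once and keeps a small per-item reference; the function and variable names are hypothetical, and this is not the paper's algorithm.

    # Sketch: factor out provenance chains shared by many items (assumed structures).
    import hashlib
    import json

    shared_chains = {}    # chain_id -> list of processing steps (stored once)
    item_provenance = {}  # data item -> chain_id (small reference per item)

    def factor_provenance(item_id, steps):
        """Store the item's provenance chain, reusing an identical chain if already seen."""
        canonical = json.dumps(steps, sort_keys=True)
        chain_id = hashlib.sha1(canonical.encode()).hexdigest()[:12]
        shared_chains.setdefault(chain_id, steps)   # stored at most once
        item_provenance[item_id] = chain_id
        return chain_id

    # Many warehouse records derived by the same pipeline share one stored chain.
    pipeline = [{"op": "parse", "source": "source_db"}, {"op": "normalize"}, {"op": "load"}]
    for item in ("interaction_1", "interaction_2", "interaction_3"):
        factor_provenance(item, pipeline)

    print(len(shared_chains), "chain(s) stored for", len(item_provenance), "items")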

    Prospecting period measurements with LSST - low mass X-ray binaries as a test case

    No full text
    The Large Synoptic Survey Telescope (LSST) will provide for unbiased sampling of variability properties of objects with r mag < 24. This should allow for those objects whose variations reveal their orbital periods (P_orb), such as low mass X-ray binaries (LMXBs) and related objects, to be examined in much greater detail and with uniform systematic sampling. However, the baseline LSST observing strategy has temporal sampling that is not optimised for such work in the Galaxy. Here we assess four candidate observing strategies for measurement of P_orb in the range 10 minutes to 50 days. We simulate multi-filter quiescent LMXB lightcurves including ellipsoidal modulation and stochastic flaring, and then sample these using LSST's operations simulator (OpSim) over the (mag, P_orb) parameter space, and over five sightlines sampling a range of possible reddening values. The percentage of simulated parameter space with correctly returned periods ranges from ~23%, for the current baseline strategy, to ~70% for the two simulated specialist strategies. Convolving these results with a P_orb distribution, a modelled Galactic spatial distribution and reddening maps, we conservatively estimate that the most recent version of the LSST baseline strategy will allow P_orb determination for ~18% of the Milky Way's LMXB population, whereas strategies that do not reduce observations of the Galactic Plane can improve this dramatically to ~32%. This increase would allow characterisation of the full binary population by breaking degeneracies between suggested P_orb distributions in the literature. Our results can be used in the ongoing assessment of the effectiveness of various potential cadencing strategies.
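
    For a concrete sense of the period search being assessed, the self-contained sketch below recovers an orbital period from an irregularly sampled signal using astropy's Lomb-Scargle periodogram; the signal parameters are assumed, and the paper's own period-finding method may differ.

    # Sketch: recover P_orb from an irregularly sampled ellipsoidal signal.
    import numpy as np
    from astropy.timeseries import LombScargle

    rng = np.random.default_rng(0)
    P_orb = 0.35                                # assumed true orbital period in days
    t = np.sort(rng.uniform(0.0, 3650.0, 600))  # irregular epochs over ~10 years
    # Ellipsoidal modulation varies at P_orb / 2 (two maxima per orbit).
    y = 0.08 * np.cos(4.0 * np.pi * t / P_orb) + rng.normal(0.0, 0.02, t.size)

    frequency, power = LombScargle(t, y).autopower(maximum_frequency=20.0)
    best_period = 1.0 / frequency[np.argmax(power)]
    # The periodogram peaks at the photometric period P_orb / 2; double it to
    # recover the orbital period for ellipsoidal modulation.
    print(f"recovered P_orb ~ {2.0 * best_period:.4f} d (true {P_orb} d)")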