13 research outputs found

    Incorporating Provenance in Database Systems

    Full text link
    The importance of maintaining provenance has been widely recognized. Currently there are two approaches: provenance generated within workflow frameworks, and provenance within a contained relational database. Workflow provenance allows workflow re-execution and can offer some explanation of results. Within relational databases, knowledge of SQL queries and relational operators is used to express provenance. There is a disconnect between these two areas of provenance research. Techniques that work in relational databases cannot be applied to workflow systems because of heterogeneous data types and black-box operators. Meanwhile, the real-life utility of workflow systems has not been extended to database provenance. In the gap between provenance in workflow systems and databases, there are myriad systems that need provenance: for instance, creating a new dataset, like MiMI, from several sources and processes, or building an algorithm that generates sequence alignments, like MiBlast. These hybrid systems cannot be forced into a workflow framework and do not exist solely within a database. This work addresses the issues that block the use of provenance in hybrid systems. In particular, we look at capturing, storing, and using provenance information outside of workflow and database provenance systems. Database and workflow provenance systems provide no support for tracking the provenance of user actions, yet manual effort is often a large component of these hybrid systems. We describe an approach to track and record the user's actions in a queriable form. Once provenance is captured, storage can become prohibitively expensive, in both hybrid and workflow systems. We identify several techniques to reduce the provenance store. Additionally, the usability of provenance is a problem in workflow, database, and hybrid provenance systems. Provenance contains both too much and too little information. We highlight the missing information that can assist user understanding, and develop a model of provenance answers to decrease information overload. Finally, workflow and database systems are designed to explain the results users see; they do not explain why items are not in the result. We allow researchers to specify what they are looking for and answer why it does not exist in the result set.
    PhD; Computer Science & Engineering; University of Michigan, Horace H. Rackham School of Graduate Studies; http://deepblue.lib.umich.edu/bitstream/2027.42/61645/1/apchapma_1.pd
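
    For illustration, the minimal Python sketch below shows one way user actions could be recorded in a queriable form, in the spirit of the approach summarized above; the SQLite schema, field names, and example identifiers are assumptions made for the example, not the dissertation's actual design.

    # Sketch: record manual user actions as queriable provenance (assumed schema).
    import sqlite3
    from datetime import datetime, timezone

    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE user_provenance (
               id INTEGER PRIMARY KEY,
               actor TEXT,          -- who performed the action
               action TEXT,         -- e.g. 'merge', 'curate', 'delete'
               input_item TEXT,     -- identifier of the item acted on
               output_item TEXT,    -- identifier of the item produced
               recorded_at TEXT     -- ISO-8601 timestamp
           )"""
    )

    def record_action(actor, action, input_item, output_item):
        """Append one user action to the provenance store."""
        conn.execute(
            "INSERT INTO user_provenance (actor, action, input_item, output_item, recorded_at) "
            "VALUES (?, ?, ?, ?, ?)",
            (actor, action, input_item, output_item,
             datetime.now(timezone.utc).isoformat()),
        )

    # Capture a hypothetical manual curation step, then ask where an item came from.
    record_action("curator_1", "merge", "raw_interaction_42", "warehouse_interaction_7")
    for row in conn.execute(
        "SELECT actor, action, input_item FROM user_provenance WHERE output_item = ?",
        ("warehouse_interaction_7",),
    ):
        print(row)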

    Prospecting Period Measurements with LSST - Low Mass X-ray Binaries as a Test Case

    Full text link
    The Large Synoptic Survey Telescope (LSST) will provide for unbiased sampling of variability properties of objects with r mag < 24. This should allow for those objects whose variations reveal their orbital periods (P_orb), such as low mass X-ray binaries (LMXBs) and related objects, to be examined in much greater detail and with uniform systematic sampling. However, the baseline LSST observing strategy has temporal sampling that is not optimised for such work in the Galaxy. Here we assess four candidate observing strategies for measurement of P_orb in the range 10 minutes to 50 days. We simulate multi-filter quiescent LMXB lightcurves including ellipsoidal modulation and stochastic flaring, and then sample these using LSST's operations simulator (OpSim) over the (mag, P_orb) parameter space, and over five sightlines sampling a range of possible reddening values. The percentage of simulated parameter space with correctly returned periods ranges from ~23%, for the current baseline strategy, to ~70% for the two simulated specialist strategies. Convolving these results with a P_orb distribution, a modelled Galactic spatial distribution and reddening maps, we conservatively estimate that the most recent version of the LSST baseline strategy will allow P_orb determination for ~18% of the Milky Way's LMXB population, whereas strategies that do not reduce observations of the Galactic Plane can improve this dramatically to ~32%. This increase would allow characterisation of the full binary population by breaking degeneracies between suggested P_orb distributions in the literature. Our results can be used in the ongoing assessment of the effectiveness of various potential cadencing strategies.
    Comment: Replacement after addressing minor corrections from the referee - mainly improvements in clarification.
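
    To make the simulation setup above concrete, here is a minimal Python sketch of a quiescent-LMXB-like lightcurve with ellipsoidal modulation and stochastic flaring, sampled at irregular epochs; the period, amplitudes, flare rate, and noise level are assumed values, and this is not the paper's actual simulation code.

    # Sketch: ellipsoidal modulation plus stochastic flaring, irregular sampling.
    import numpy as np

    rng = np.random.default_rng(42)

    P_orb = 0.35          # assumed orbital period in days
    amp_ellip = 0.08      # assumed fractional ellipsoidal amplitude
    n_epochs = 900

    # Irregular epochs over ~10 years, crudely mimicking survey-style sampling.
    t = np.sort(rng.uniform(0.0, 3650.0, n_epochs))

    # Ellipsoidal modulation has two maxima per orbit, so it varies at P_orb / 2.
    flux = 1.0 + amp_ellip * np.cos(4.0 * np.pi * t / P_orb)

    # Stochastic flaring: occasional positive excursions with exponential amplitudes.
    flaring = rng.exponential(0.05, n_epochs) * (rng.random(n_epochs) < 0.1)
    noise = rng.normal(0.0, 0.02, n_epochs)
    mag = -2.5 * np.log10(flux + flaring) + noise  # relative magnitudes

    print(t[:3], mag[:3])  # first few simulated epochs and magnitudes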

    Incorporating provenance in database systems

    No full text

    The challenge of “quick and dirty” information quality

    No full text

    Efficient provenance storage

    No full text
    As the world is increasingly networked and digitized, the data we store has more and more frequently been chopped, baked, diced and stewed. In consequence, there is an increasing need to store and manage provenance for each data item stored in a database, describing exactly where it came from, and what manipulations have been applied to it. Storage of the complete provenance of each data item can become prohibitively expensive. In this paper, we identify important properties of provenance that can be used to considerably reduce the amount of storage required. We identify three techniques to decrease the amount of storage required for provenance: a family of factorization processes and two methods based on inheritance. We have used the techniques described in this work to significantly reduce the provenance storage costs associated with constructing MiMI [22], a warehouse of data regarding protein interactions, as well as two provenance stores, Karma [31] and PReServ [20], produced through workflow execution. In these real provenance sets, we were able to reduce the size of the provenance by up to a factor of 20. Additionally, we show that this reduced store can be queried efficiently and, further, that incremental changes can be made inexpensively.
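
    As a rough illustration of the factorization idea, the sketch below stores a provenance chain shared by many data items only once and keeps a small per-item reference; the function and variable names are hypothetical, and this is not the paper's algorithm.

    # Sketch: factor out provenance chains shared by many items (assumed structures).
    import hashlib
    import json

    shared_chains = {}    # chain_id -> list of processing steps (stored once)
    item_provenance = {}  # data item -> chain_id (small reference per item)

    def factor_provenance(item_id, steps):
        """Store the item's provenance chain, reusing an identical chain if already seen."""
        canonical = json.dumps(steps, sort_keys=True)
        chain_id = hashlib.sha1(canonical.encode()).hexdigest()[:12]
        shared_chains.setdefault(chain_id, steps)   # stored at most once
        item_provenance[item_id] = chain_id
        return chain_id

    # Many warehouse records derived by the same pipeline share one stored chain.
    pipeline = [{"op": "parse", "source": "source_db"}, {"op": "normalize"}, {"op": "load"}]
    for item in ("interaction_1", "interaction_2", "interaction_3"):
        factor_provenance(item, pipeline)

    print(len(shared_chains), "chain(s) stored for", len(item_provenance), "items")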

    Prospecting period measurements with LSST - low mass X-ray binaries as a test case

    No full text
    The Large Synoptic Survey Telescope (LSST) will provide for unbiased sampling of variability properties of objects with r mag < 24. This should allow for those objects whose variations reveal their orbital periods (P_orb), such as low mass X-ray binaries (LMXBs) and related objects, to be examined in much greater detail and with uniform systematic sampling. However, the baseline LSST observing strategy has temporal sampling that is not optimised for such work in the Galaxy. Here we assess four candidate observing strategies for measurement of P_orb in the range 10 minutes to 50 days. We simulate multi-filter quiescent LMXB lightcurves including ellipsoidal modulation and stochastic flaring, and then sample these using LSST's operations simulator (OpSim) over the (mag, P_orb) parameter space, and over five sightlines sampling a range of possible reddening values. The percentage of simulated parameter space with correctly returned periods ranges from ~23%, for the current baseline strategy, to ~70% for the two simulated specialist strategies. Convolving these results with a P_orb distribution, a modelled Galactic spatial distribution and reddening maps, we conservatively estimate that the most recent version of the LSST baseline strategy will allow P_orb determination for ~18% of the Milky Way's LMXB population, whereas strategies that do not reduce observations of the Galactic Plane can improve this dramatically to ~32%. This increase would allow characterisation of the full binary population by breaking degeneracies between suggested P_orb distributions in the literature. Our results can be used in the ongoing assessment of the effectiveness of various potential cadencing strategies.
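
    For a concrete sense of the period search being assessed, the self-contained sketch below recovers an orbital period from an irregularly sampled signal using astropy's Lomb-Scargle periodogram; the signal parameters are assumed, and the paper's own period-finding method may differ.

    # Sketch: recover P_orb from an irregularly sampled ellipsoidal signal.
    import numpy as np
    from astropy.timeseries import LombScargle

    rng = np.random.default_rng(0)
    P_orb = 0.35                                # assumed true orbital period in days
    t = np.sort(rng.uniform(0.0, 3650.0, 600))  # irregular epochs over ~10 years
    # Ellipsoidal modulation varies at P_orb / 2 (two maxima per orbit).
    y = 0.08 * np.cos(4.0 * np.pi * t / P_orb) + rng.normal(0.0, 0.02, t.size)

    frequency, power = LombScargle(t, y).autopower(maximum_frequency=20.0)
    best_period = 1.0 / frequency[np.argmax(power)]
    # The periodogram peaks at the photometric period P_orb / 2; double it to
    # recover the orbital period for ellipsoidal modulation.
    print(f"recovered P_orb ~ {2.0 * best_period:.4f} d (true {P_orb} d)")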