1,180 research outputs found
Labeling Workflow Views with Fine-Grained Dependencies
This paper considers the problem of efficiently answering reachability
queries over views of provenance graphs, derived from executions of workflows
that may include recursion. Such views include composite modules and model
fine-grained dependencies between module inputs and outputs. A novel
view-adaptive dynamic labeling scheme is developed for efficient query
evaluation, in which view specifications are labeled statically (i.e. as they
are created) and data items are labeled dynamically as they are produced during
a workflow execution. Although the combination of fine-grained dependencies and
recursive workflows entail, in general, long (linear-size) data labels, we show
that for a large natural class of workflows and views, labels are compact
(logarithmic-size) and reachability queries can be evaluated in constant time.
Experimental results demonstrate the benefit of this approach over the
state-of-the-art technique when applied for labeling multiple views.Comment: VLDB201
Answering Regular Path Queries on Workflow Provenance
This paper proposes a novel approach for efficiently evaluating regular path
queries over provenance graphs of workflows that may include recursion. The
approach assumes that an execution g of a workflow G is labeled with
query-agnostic reachability labels using an existing technique. At query time,
given g, G and a regular path query R, the approach decomposes R into a set of
subqueries R1, ..., Rk that are safe for G. For each safe subquery Ri, G is
rewritten so that, using the reachability labels of nodes in g, whether or not
there is a path which matches Ri between two nodes can be decided in constant
time. The results of each safe subquery are then composed, possibly with some
small unsafe remainder, to produce an answer to R. The approach results in an
algorithm that significantly reduces the number of subqueries k over existing
techniques by increasing their size and complexity, and that evaluates each
subquery in time bounded by its input and output size. Experimental results
demonstrate the benefit of this approach
Provenance Views for Module Privacy
Scientific workflow systems increasingly store provenance information about
the module executions used to produce a data item, as well as the parameter
settings and intermediate data items passed between module executions. However,
authors/owners of workflows may wish to keep some of this information
confidential. In particular, a module may be proprietary, and users should not
be able to infer its behavior by seeing mappings between all data inputs and
outputs. The problem we address in this paper is the following: Given a
workflow, abstractly modeled by a relation R, a privacy requirement \Gamma and
costs associated with data. The owner of the workflow decides which data
(attributes) to hide, and provides the user with a view R' which is the
projection of R over attributes which have not been hidden. The goal is to
minimize the cost of hidden data while guaranteeing that individual modules are
\Gamma -private. We call this the "secureview" problem. We formally define the
problem, study its complexity, and offer algorithmic solutions
A core genetic module : the Mixed Feedback Loop
The so-called Mixed Feedback Loop (MFL) is a small two-gene network where
protein A regulates the transcription of protein B and the two proteins form a
heterodimer. It has been found to be statistically over-represented in
statistical analyses of gene and protein interaction databases and to lie at
the core of several computer-generated genetic networks. Here, we propose and
mathematically study a model of the MFL and show that, by itself, it can serve
both as a bistable switch and as a clock (an oscillator) depending on kinetic
parameters. The MFL phase diagram as well as a detailed description of the
nonlinear oscillation regime are presented and some biological examples are
discussed. The results emphasize the role of protein interactions in the
function of genetic modules and the usefulness of modelling RNA dynamics
explicitly.Comment: To be published in Physical Review
Consequences of gravitational radiation recoil
Coalescing binary black holes experience an impulsive kick due to anisotropic
emission of gravitational waves. We discuss the dynamical consequences of the
recoil accompanying massive black hole mergers. Recoil velocities are
sufficient to eject most coalescing black holes from dwarf galaxies and
globular clusters, which may explain the apparent absence of massive black
holes in these systems. Ejection from giant elliptical galaxies would be rare,
but coalescing black holes are displaced from the center and fall back on a
time scale of order the half-mass crossing time. Displacement of the black
holes transfers energy to the stars in the nucleus and can convert a steep
density cusp into a core. Radiation recoil calls into question models that grow
supermassive black holes from hierarchical mergers of stellar-mass precursors.Comment: 5 pages, 4 figures, emulateapj style; minor changes made; accepted to
ApJ Letter
Deriving Probabilistic Databases with Inference Ensembles
Many real-world applications deal with uncertain or missing data, prompting a surge of activity in the area of probabilistic databases. A shortcoming of prior work is the assumption that an appropriate probabilistic model, along with the necessary probability distributions, is given. We address this shortcoming by presenting a framework for learning a set of inference ensembles, termed meta-rule semi-lattices, or MRSL, from the complete portion of the data. We use the MRSL to infer probability distributions for missing data, and demonstrate experimentally that high accuracy is achieved when a single attribute value is missing per tuple. We next propose an inference algorithm based on Gibbs sampling that accurately predicts the probability distribution for multiple missing values. We also develop an optimization that greatly improves performance of multi-attribute inference for collections of tuples, while maintaining high accuracy. Finally, we develop an experimental framework to evaluate the efficiency and accuracy of our approach
Multiple Score Comparison: a network meta-analysis approach to comparison and external validation of prognostic scores
BACKGROUND Prediction models and prognostic scores have been increasingly popular in both clinical practice and clinical research settings, for example to aid in risk-based decision making or control for confounding. In many medical fields, a large number of prognostic scores are available, but practitioners may find it difficult to choose between them due to lack of external validation as well as lack of comparisons between them. METHODS Borrowing methodology from network meta-analysis, we describe an approach to Multiple Score Comparison meta-analysis (MSC) which permits concurrent external validation and comparisons of prognostic scores using individual patient data (IPD) arising from a large-scale international collaboration. We describe the challenges in adapting network meta-analysis to the MSC setting, for instance the need to explicitly include correlations between the scores on a cohort level, and how to deal with many multi-score studies. We propose first using IPD to make cohort-level aggregate discrimination or calibration scores, comparing all to a common comparator. Then, standard network meta-analysis techniques can be applied, taking care to consider correlation structures in cohorts with multiple scores. Transitivity, consistency and heterogeneity are also examined. RESULTS We provide a clinical application, comparing prognostic scores for 3-year mortality in patients with chronic obstructive pulmonary disease using data from a large-scale collaborative initiative. We focus on the discriminative properties of the prognostic scores. Our results show clear differences in performance, with ADO and eBODE showing higher discrimination with respect to mortality than other considered scores. The assumptions of transitivity and local and global consistency were not violated. Heterogeneity was small. CONCLUSIONS We applied a network meta-analytic methodology to externally validate and concurrently compare the prognostic properties of clinical scores. Our large-scale external validation indicates that the scores with the best discriminative properties to predict 3 year mortality in patients with COPD are ADO and eBODE
- …