McRunjob: A High Energy Physics Workflow Planner for Grid Production Processing
McRunjob is a powerful grid workflow manager used to manage the generation of
large numbers of production processing jobs in High Energy Physics. In use at
both the DZero and CMS experiments, McRunjob has been used to manage large
Monte Carlo production processing since 1999 and is being extended to uses in
regular production processing for analysis and reconstruction. Described at
CHEP 2001, McRunjob converts core metadata into jobs submittable in a variety
of environments. The powerful core metadata description language includes
methods for converting the metadata into persistent forms, job descriptions,
multi-step workflows, and data provenance information. The language features
allow for structure in the metadata by including full expressions, namespaces,
functional dependencies, site specific parameters in a grid environment, and
ontological definitions. It also has simple control structures for
parallelization of large jobs. McRunjob features a modular design which allows
for easy expansion to new job description languages or new application level
tasks.
Comment: CHEP 2003 serial number TUCT00
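The core idea of expanding one metadata description into many parallel jobs can be sketched as follows. This is a hypothetical illustration only: the field names (`app`, `events_total`, `events_per_job`) are my invention and do not reflect McRunjob's actual metadata language.

```python
# Hypothetical sketch: expand a single metadata description into a set of
# parallel job descriptions. Field names are illustrative, not McRunjob's.
meta = {"app": "simulate", "events_total": 1000, "events_per_job": 250}

def expand(meta):
    # Split the total event count into independent, equally sized jobs.
    n = meta["events_total"] // meta["events_per_job"]
    return [
        {"app": meta["app"], "job": i, "events": meta["events_per_job"]}
        for i in range(n)
    ]

jobs = expand(meta)
print(len(jobs))  # 4 parallel jobs
```

Each resulting job description could then be translated to a concrete submission format for a given grid environment.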
Answering Regular Path Queries on Workflow Provenance
This paper proposes a novel approach for efficiently evaluating regular path
queries over provenance graphs of workflows that may include recursion. The
approach assumes that an execution g of a workflow G is labeled with
query-agnostic reachability labels using an existing technique. At query time,
given g, G and a regular path query R, the approach decomposes R into a set of
subqueries R1, ..., Rk that are safe for G. For each safe subquery Ri, G is
rewritten so that, using the reachability labels of nodes in g, whether or not
there is a path which matches Ri between two nodes can be decided in constant
time. The results of each safe subquery are then composed, possibly with some
small unsafe remainder, to produce an answer to R. The approach results in an
algorithm that significantly reduces the number of subqueries k over existing
techniques by increasing their size and complexity, and that evaluates each
subquery in time bounded by its input and output size. Experimental results
demonstrate the benefit of this approach.
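The role of query-agnostic reachability labels can be illustrated with a minimal sketch. This is my own toy example, not the labeling technique the paper builds on: it precomputes, for each node of an acyclic graph, the set of nodes it can reach, so that a path-existence check at query time is a constant-time set lookup.

```python
# Toy sketch (not the paper's labeling scheme): precompute reachability
# sets on a DAG so that "is there a path u -> v?" is an O(1) lookup.
def build_labels(adj):
    # adj: dict node -> list of successors; the graph is assumed acyclic.
    labels = {}
    def visit(u):
        if u in labels:
            return labels[u]
        s = {u}
        for v in adj.get(u, []):
            s |= visit(v)
        labels[u] = s
        return s
    for u in adj:
        visit(u)
    return labels

adj = {"a": ["b"], "b": ["c"], "c": [], "d": ["c"]}
labels = build_labels(adj)
print("c" in labels["a"])  # True: a -> b -> c
print("d" in labels["a"])  # False: no path from a to d
```

Real labeling schemes use compressed labels rather than full reachability sets, but the constant-time test at query time is the property the safe-subquery rewriting relies on.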
Polynomials for Multidimensional Provenance in Graph Databases
In this thesis, we study the provenance of queries over graph databases. Whereas the semiring of polynomials is the most general form of provenance for relational databases (Green, Karvounarakis, & Tannen, 2007), we show that the most general provenance for querying graph databases can be represented by regular expressions over paths in the database. We focus on single-source provenance, which is a more general representation and contains more information than the single-source, single-target problem considered in (Ramusat, Maniu, & Senellart, 2018). We present an algorithm that computes single-source multidimensional provenance for graph databases, where each dimension represents an application provenance semiring, and we also propose a potential application, using parse-tree techniques to derive results for various application provenances.
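The semiring of provenance polynomials N[X] referenced above (Green et al., 2007) can be sketched concretely: base tuples carry fresh variables, union of results corresponds to polynomial addition, and natural join to polynomial multiplication. The representation below (monomials as sorted tuples of variable names) is my own minimal encoding.

```python
from collections import Counter

# A provenance polynomial in N[X]: maps monomials (sorted tuples of
# tuple variables, with multiplicity) to natural-number coefficients.
def add(p, q):
    # Union of query results = polynomial addition.
    r = Counter(p)
    r.update(q)
    return dict(r)

def mul(p, q):
    # Natural join = polynomial multiplication.
    r = Counter()
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            r[tuple(sorted(m1 + m2))] += c1 * c2
    return dict(r)

# Base tuples annotated with fresh variables: p = x, q = y.
p = {("x",): 1}
q = {("y",): 1}

# (x + y) * x = x^2 + x*y: the monomials record which base tuples were
# joined, the coefficients how many derivations produced each result.
print(mul(add(p, q), p))
```

Specializing the coefficients and monomials (e.g. dropping exponents or collapsing to booleans) recovers the less informative provenance semirings, which is why N[X] is called the most general.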
First-Order Provenance Games
We propose a new model of provenance, based on a game-theoretic approach to
query evaluation. First, we study games G in their own right, and ask how to
explain that a position x in G is won, lost, or drawn. The resulting notion of
game provenance is closely related to winning strategies, and excludes from
provenance all "bad moves", i.e., those which unnecessarily allow the opponent
to improve the outcome of a play. In this way, the value of a position is
determined by its game provenance. We then define provenance games by viewing
the evaluation of a first-order query as a game between two players who argue
whether a tuple is in the query answer. For RA+ queries, we show that game
provenance is equivalent to the most general semiring of provenance polynomials
N[X]. Variants of our game yield other known semirings. However, unlike
semiring provenance, game provenance also provides a "built-in" way to handle
negation and thus to answer why-not questions: In (provenance) games, the
reason why x is not won, is the same as why x is lost or drawn (the latter is
possible for games with draws). Since first-order provenance games are
draw-free, they yield a new provenance model that combines how- and why-not
provenance.
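The won/lost/drawn classification of game positions can be sketched with a standard fixpoint solver for finite game graphs. This is an illustration of the game-solving convention the abstract relies on (a player who cannot move loses), not the paper's provenance construction itself.

```python
# Sketch: classify positions of a finite game graph as won/lost/drawn,
# under the convention that a player unable to move loses. Positions
# that never resolve in the fixpoint are drawn.
def solve(moves):
    # moves: dict position -> list of successor positions.
    value = {}
    changed = True
    while changed:
        changed = False
        for x, succs in moves.items():
            if x in value:
                continue
            if any(value.get(y) == "lost" for y in succs):
                value[x] = "won"      # some move puts the opponent in a lost position
                changed = True
            elif succs and all(value.get(y) == "won" for y in succs):
                value[x] = "lost"     # every move hands the opponent a win
                changed = True
            elif not succs:
                value[x] = "lost"     # no moves available
                changed = True
    return {x: value.get(x, "drawn") for x in moves}

g = {"a": ["b"], "b": [], "c": ["c"]}  # b is terminal; c loops forever
print(solve(g))  # {'a': 'won', 'b': 'lost', 'c': 'drawn'}
```

Game provenance then keeps only the "good" moves, i.e. those consistent with this position valuation, discarding moves that let the opponent improve the outcome.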
Broadening the Scope of Nanopublications
In this paper, we present an approach for extending the existing concept of
nanopublications --- tiny entities of scientific results in RDF representation
--- to broaden their application range. The proposed extension uses English
sentences to represent informal and underspecified scientific claims. These
sentences follow a syntactic and semantic scheme that we call AIDA (Atomic,
Independent, Declarative, Absolute), which provides a uniform and succinct
representation of scientific assertions. Such AIDA nanopublications are
compatible with the existing nanopublication concept and enjoy most of its
advantages such as information sharing, interlinking of scientific findings,
and detailed attribution, while being more flexible and applicable to a much
wider range of scientific results. We show that users are able to create AIDA
sentences for given scientific results quickly and at high quality, and that it
is feasible to automatically extract and interlink AIDA nanopublications from
existing unstructured data sources. To demonstrate our approach, a web-based
interface is introduced, which also exemplifies the use of nanopublications for
non-scientific content, including meta-nanopublications that describe other
nanopublications.
Comment: To appear in the Proceedings of the 10th Extended Semantic Web Conference (ESWC 2013).
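The structure involved can be sketched as follows. A nanopublication standardly bundles three named graphs (assertion, provenance, publication info); here the assertion is an informal AIDA sentence. The URI scheme and all property names below are illustrative placeholders, not necessarily the exact vocabulary the paper uses.

```python
from urllib.parse import quote_plus

# Hedged sketch: the three named graphs of a nanopublication, with an
# AIDA sentence as the assertion. URIs and keys are illustrative only.
aida = "Malaria is transmitted by mosquitoes."

nanopub = {
    "assertion": {
        # The AIDA sentence itself serves as the (URL-encoded) identifier.
        "sentence_uri": "http://purl.org/aida/" + quote_plus(aida),
        "text": aida,
    },
    "provenance": {
        # Who made or extracted the claim (placeholder URI).
        "attributedTo": "http://example.org/researcher/1",
    },
    "pubinfo": {
        # Metadata about the nanopublication itself.
        "created": "2013-05-01",
    },
}
print(sorted(nanopub))  # ['assertion', 'provenance', 'pubinfo']
```

Because the assertion is a plain English sentence rather than formal RDF triples, the same container format can carry informal or underspecified claims, which is the extension the paper proposes.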
TAPER: query-aware, partition-enhancement for large, heterogeneous graphs
Graph partitioning has long been seen as a viable approach to address Graph
DBMS scalability. A partitioning, however, may introduce extra query processing
latency unless it is sensitive to a specific query workload, and optimised to
minimise inter-partition traversals for that workload. Additionally, it should
also be possible to incrementally adjust the partitioning in reaction to
changes in the graph topology, the query workload, or both. Because of their
complexity, current partitioning algorithms fall short of one or both of these
requirements, as they are designed for offline use and as one-off operations.
The TAPER system aims to address both requirements, whilst leveraging existing
partitioning algorithms. TAPER takes any given initial partitioning as a
starting point, and iteratively adjusts it by swapping chosen vertices across
partitions, heuristically reducing the probability of inter-partition
traversals for a given workload of pattern-matching queries. Iterations are
inexpensive thanks to time and space optimisations in the underlying support
data structures. We evaluate TAPER on two different large test graphs and over
realistic query workloads. Our results indicate that, given a hash-based
partitioning, TAPER reduces the number of inter-partition traversals by around
80%; given an unweighted METIS partitioning, by around 30%. These reductions
are achieved within 8 iterations and with the additional advantage of being
workload-aware and usable online.
Comment: 12 pages, 11 figures, unpublished.
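The general idea of improving a partitioning by swapping vertices can be sketched as below. This is my own toy illustration of cut-reducing pairwise swaps, not TAPER's actual heuristic, which is query-workload-aware and uses optimized support structures.

```python
# Toy sketch (not TAPER's heuristic): exchange vertex pairs across
# partitions, keeping a swap only when it reduces the number of
# inter-partition edges. Pairwise exchange preserves partition sizes.
def cut_edges(edges, part):
    return sum(1 for u, v in edges if part[u] != part[v])

def swap_pass(edges, part):
    nodes = list(part)
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if part[u] == part[v]:
                continue  # same partition: swapping changes nothing
            before = cut_edges(edges, part)
            part[u], part[v] = part[v], part[u]
            if cut_edges(edges, part) >= before:
                part[u], part[v] = part[v], part[u]  # undo: no improvement
    return part

edges = [("a", "b"), ("c", "d"), ("a", "c")]
part = swap_pass(edges, {"a": 0, "b": 1, "c": 0, "d": 1})
print(cut_edges(edges, part))  # 1 (down from 2)
```

TAPER replaces the global cut count with an estimate of inter-partition *traversal* probability for the query workload, so swaps are driven by how queries actually navigate the graph rather than by raw edge cuts.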