McRunjob: A High Energy Physics Workflow Planner for Grid Production Processing
McRunjob is a powerful grid workflow manager used to manage the generation of
large numbers of production processing jobs in High Energy Physics. In use at
both the DZero and CMS experiments, McRunjob has been used to manage large
Monte Carlo production processing since 1999 and is being extended to uses in
regular production processing for analysis and reconstruction. Described at
CHEP 2001, McRunjob converts core metadata into jobs submittable in a variety
of environments. The powerful core metadata description language includes
methods for converting the metadata into persistent forms, job descriptions,
multi-step workflows, and data provenance information. The language features
allow for structure in the metadata by including full expressions, namespaces,
functional dependencies, site specific parameters in a grid environment, and
ontological definitions. It also has simple control structures for
parallelization of large jobs. McRunjob features a modular design which allows
for easy expansion to new job description languages or new application level
tasks.
Comment: CHEP 2003 serial number TUCT00
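The core idea of expanding one metadata description into many parallel jobs can be sketched as follows. This is a hypothetical illustration only: the field names (`app`, `events_total`, `events_per_job`) are my invention and do not reflect McRunjob's actual metadata language.

```python
# Hypothetical sketch: expand a single metadata description into a set of
# parallel job descriptions. Field names are illustrative, not McRunjob's.
meta = {"app": "simulate", "events_total": 1000, "events_per_job": 250}

def expand(meta):
    # Split the total event count into independent, equally sized jobs.
    n = meta["events_total"] // meta["events_per_job"]
    return [
        {"app": meta["app"], "job": i, "events": meta["events_per_job"]}
        for i in range(n)
    ]

jobs = expand(meta)
print(len(jobs))  # 4 parallel jobs
```

Each resulting job description could then be translated to a concrete submission format for a given grid environment.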
Answering Regular Path Queries on Workflow Provenance
This paper proposes a novel approach for efficiently evaluating regular path
queries over provenance graphs of workflows that may include recursion. The
approach assumes that an execution g of a workflow G is labeled with
query-agnostic reachability labels using an existing technique. At query time,
given g, G and a regular path query R, the approach decomposes R into a set of
subqueries R1, ..., Rk that are safe for G. For each safe subquery Ri, G is
rewritten so that, using the reachability labels of nodes in g, whether or not
there is a path which matches Ri between two nodes can be decided in constant
time. The results of each safe subquery are then composed, possibly with some
small unsafe remainder, to produce an answer to R. The approach results in an
algorithm that significantly reduces the number of subqueries k over existing
techniques by increasing their size and complexity, and that evaluates each
subquery in time bounded by its input and output size. Experimental results
demonstrate the benefit of this approach.
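The role of query-agnostic reachability labels can be illustrated with a minimal sketch. This is my own toy example, not the labeling technique the paper builds on: it precomputes, for each node of an acyclic graph, the set of nodes it can reach, so that a path-existence check at query time is a constant-time set lookup.

```python
# Toy sketch (not the paper's labeling scheme): precompute reachability
# sets on a DAG so that "is there a path u -> v?" is an O(1) lookup.
def build_labels(adj):
    # adj: dict node -> list of successors; the graph is assumed acyclic.
    labels = {}
    def visit(u):
        if u in labels:
            return labels[u]
        s = {u}
        for v in adj.get(u, []):
            s |= visit(v)
        labels[u] = s
        return s
    for u in adj:
        visit(u)
    return labels

adj = {"a": ["b"], "b": ["c"], "c": [], "d": ["c"]}
labels = build_labels(adj)
print("c" in labels["a"])  # True: a -> b -> c
print("d" in labels["a"])  # False: no path from a to d
```

Real labeling schemes use compressed labels rather than full reachability sets, but the constant-time test at query time is the property the safe-subquery rewriting relies on.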
Polynomials for Multidimensional Provenance in Graph Databases
In this thesis, we study the provenance of queries over graph databases. Whereas the semiring of polynomials is the most general form of provenance for relational databases (Green, Karvounarakis, & Tannen, 2007), we show that the most general provenance for querying graph databases can be represented by regular expressions over paths in the database. We focus on single-source provenance, which is a more general representation and contains more information than the single-source, single-target problem considered in (Ramusat, Maniu, & Senellart, 2018). We present an algorithm that computes single-source multidimensional provenance for graph databases, where each dimension represents an application provenance semiring, and we also propose a potential application, using parse-tree techniques to derive results for various application provenances.
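The semiring of provenance polynomials N[X] referenced above (Green et al., 2007) can be sketched concretely: base tuples carry fresh variables, union of results corresponds to polynomial addition, and natural join to polynomial multiplication. The representation below (monomials as sorted tuples of variable names) is my own minimal encoding.

```python
from collections import Counter

# A provenance polynomial in N[X]: maps monomials (sorted tuples of
# tuple variables, with multiplicity) to natural-number coefficients.
def add(p, q):
    # Union of query results = polynomial addition.
    r = Counter(p)
    r.update(q)
    return dict(r)

def mul(p, q):
    # Natural join = polynomial multiplication.
    r = Counter()
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            r[tuple(sorted(m1 + m2))] += c1 * c2
    return dict(r)

# Base tuples annotated with fresh variables: p = x, q = y.
p = {("x",): 1}
q = {("y",): 1}

# (x + y) * x = x^2 + x*y: the monomials record which base tuples were
# joined, the coefficients how many derivations produced each result.
print(mul(add(p, q), p))
```

Specializing the coefficients and monomials (e.g. dropping exponents or collapsing to booleans) recovers the less informative provenance semirings, which is why N[X] is called the most general.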
First-Order Provenance Games
We propose a new model of provenance, based on a game-theoretic approach to
query evaluation. First, we study games G in their own right, and ask how to
explain that a position x in G is won, lost, or drawn. The resulting notion of
game provenance is closely related to winning strategies, and excludes from
provenance all "bad moves", i.e., those which unnecessarily allow the opponent
to improve the outcome of a play. In this way, the value of a position is
determined by its game provenance. We then define provenance games by viewing
the evaluation of a first-order query as a game between two players who argue
whether a tuple is in the query answer. For RA+ queries, we show that game
provenance is equivalent to the most general semiring of provenance polynomials
N[X]. Variants of our game yield other known semirings. However, unlike
semiring provenance, game provenance also provides a "built-in" way to handle
negation and thus to answer why-not questions: In (provenance) games, the
reason why x is not won, is the same as why x is lost or drawn (the latter is
possible for games with draws). Since first-order provenance games are
draw-free, they yield a new provenance model that combines how- and why-not
provenance.
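The won/lost/drawn classification of game positions can be sketched with a standard fixpoint solver for finite game graphs. This is an illustration of the game-solving convention the abstract relies on (a player who cannot move loses), not the paper's provenance construction itself.

```python
# Sketch: classify positions of a finite game graph as won/lost/drawn,
# under the convention that a player unable to move loses. Positions
# that never resolve in the fixpoint are drawn.
def solve(moves):
    # moves: dict position -> list of successor positions.
    value = {}
    changed = True
    while changed:
        changed = False
        for x, succs in moves.items():
            if x in value:
                continue
            if any(value.get(y) == "lost" for y in succs):
                value[x] = "won"      # some move puts the opponent in a lost position
                changed = True
            elif succs and all(value.get(y) == "won" for y in succs):
                value[x] = "lost"     # every move hands the opponent a win
                changed = True
            elif not succs:
                value[x] = "lost"     # no moves available
                changed = True
    return {x: value.get(x, "drawn") for x in moves}

g = {"a": ["b"], "b": [], "c": ["c"]}  # b is terminal; c loops forever
print(solve(g))  # {'a': 'won', 'b': 'lost', 'c': 'drawn'}
```

Game provenance then keeps only the "good" moves, i.e. those consistent with this position valuation, discarding moves that let the opponent improve the outcome.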
Broadening the Scope of Nanopublications
In this paper, we present an approach for extending the existing concept of
nanopublications --- tiny entities of scientific results in RDF representation
--- to broaden their application range. The proposed extension uses English
sentences to represent informal and underspecified scientific claims. These
sentences follow a syntactic and semantic scheme that we call AIDA (Atomic,
Independent, Declarative, Absolute), which provides a uniform and succinct
representation of scientific assertions. Such AIDA nanopublications are
compatible with the existing nanopublication concept and enjoy most of its
advantages such as information sharing, interlinking of scientific findings,
and detailed attribution, while being more flexible and applicable to a much
wider range of scientific results. We show that users are able to create AIDA
sentences for given scientific results quickly and at high quality, and that it
is feasible to automatically extract and interlink AIDA nanopublications from
existing unstructured data sources. To demonstrate our approach, a web-based
interface is introduced, which also exemplifies the use of nanopublications for
non-scientific content, including meta-nanopublications that describe other
nanopublications.
Comment: To appear in the Proceedings of the 10th Extended Semantic Web Conference (ESWC 2013).
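The structure involved can be sketched as follows. A nanopublication standardly bundles three named graphs (assertion, provenance, publication info); here the assertion is an informal AIDA sentence. The URI scheme and all property names below are illustrative placeholders, not necessarily the exact vocabulary the paper uses.

```python
from urllib.parse import quote_plus

# Hedged sketch: the three named graphs of a nanopublication, with an
# AIDA sentence as the assertion. URIs and keys are illustrative only.
aida = "Malaria is transmitted by mosquitoes."

nanopub = {
    "assertion": {
        # The AIDA sentence itself serves as the (URL-encoded) identifier.
        "sentence_uri": "http://purl.org/aida/" + quote_plus(aida),
        "text": aida,
    },
    "provenance": {
        # Who made or extracted the claim (placeholder URI).
        "attributedTo": "http://example.org/researcher/1",
    },
    "pubinfo": {
        # Metadata about the nanopublication itself.
        "created": "2013-05-01",
    },
}
print(sorted(nanopub))  # ['assertion', 'provenance', 'pubinfo']
```

Because the assertion is a plain English sentence rather than formal RDF triples, the same container format can carry informal or underspecified claims, which is the extension the paper proposes.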
TAPER: query-aware, partition-enhancement for large, heterogeneous graphs
Graph partitioning has long been seen as a viable approach to address Graph
DBMS scalability. A partitioning, however, may introduce extra query processing
latency unless it is sensitive to a specific query workload, and optimised to
minimise inter-partition traversals for that workload. Additionally, it should
also be possible to incrementally adjust the partitioning in reaction to
changes in the graph topology, the query workload, or both. Because of their
complexity, current partitioning algorithms fall short of one or both of these
requirements, as they are designed for offline use and as one-off operations.
The TAPER system aims to address both requirements, whilst leveraging existing
partitioning algorithms. TAPER takes any given initial partitioning as a
starting point, and iteratively adjusts it by swapping chosen vertices across
partitions, heuristically reducing the probability of inter-partition
traversals for a given workload of pattern-matching queries. Iterations are
inexpensive thanks to time and space optimisations in the underlying support
data structures. We evaluate TAPER on two different large test graphs and over
realistic query workloads. Our results indicate that, given a hash-based
partitioning, TAPER reduces the number of inter-partition traversals by around
80%; given an unweighted METIS partitioning, by around 30%. These reductions
are achieved within 8 iterations and with the additional advantage of being
workload-aware and usable online.
Comment: 12 pages, 11 figures, unpublished.
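The general idea of improving a partitioning by swapping vertices can be sketched as below. This is my own toy illustration of cut-reducing pairwise swaps, not TAPER's actual heuristic, which is query-workload-aware and uses optimized support structures.

```python
# Toy sketch (not TAPER's heuristic): exchange vertex pairs across
# partitions, keeping a swap only when it reduces the number of
# inter-partition edges. Pairwise exchange preserves partition sizes.
def cut_edges(edges, part):
    return sum(1 for u, v in edges if part[u] != part[v])

def swap_pass(edges, part):
    nodes = list(part)
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if part[u] == part[v]:
                continue  # same partition: swapping changes nothing
            before = cut_edges(edges, part)
            part[u], part[v] = part[v], part[u]
            if cut_edges(edges, part) >= before:
                part[u], part[v] = part[v], part[u]  # undo: no improvement
    return part

edges = [("a", "b"), ("c", "d"), ("a", "c")]
part = swap_pass(edges, {"a": 0, "b": 1, "c": 0, "d": 1})
print(cut_edges(edges, part))  # 1 (down from 2)
```

TAPER replaces the global cut count with an estimate of inter-partition *traversal* probability for the query workload, so swaps are driven by how queries actually navigate the graph rather than by raw edge cuts.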