Labeling Workflow Views with Fine-Grained Dependencies

Author: Bao Zhuowei
Davidson Susan B.
Milo Tova
Publication date: 01/01/2012

This paper considers the problem of efficiently answering reachability queries over views of provenance graphs, derived from executions of workflows that may include recursion. Such views include composite modules and model fine-grained dependencies between module inputs and outputs. A novel view-adaptive dynamic labeling scheme is developed for efficient query evaluation, in which view specifications are labeled statically (i.e., as they are created) and data items are labeled dynamically as they are produced during a workflow execution. Although the combination of fine-grained dependencies and recursive workflows entails, in general, long (linear-size) data labels, we show that for a large natural class of workflows and views, labels are compact (logarithmic-size) and reachability queries can be evaluated in constant time. Experimental results demonstrate the benefit of this approach over the state-of-the-art technique when applied for labeling multiple views. Comment: VLDB201

arXiv.org e-Print Archive

CiteSeerX

ScholarlyCommons@Penn

Answering Regular Path Queries on Workflow Provenance

Author: Bao Zhuowei
Davidson Susan B.
Huang Xiaocheng
Milo Tova
Yuan Xiaojie
Publication date: 04/08/2014

This paper proposes a novel approach for efficiently evaluating regular path queries over provenance graphs of workflows that may include recursion. The approach assumes that an execution g of a workflow G is labeled with query-agnostic reachability labels using an existing technique. At query time, given g, G and a regular path query R, the approach decomposes R into a set of subqueries R1, ..., Rk that are safe for G. For each safe subquery Ri, G is rewritten so that, using the reachability labels of nodes in g, whether there is a path between two nodes that matches Ri can be decided in constant time. The results of each safe subquery are then composed, possibly with some small unsafe remainder, to produce an answer to R. The approach results in an algorithm that significantly reduces the number of subqueries k over existing techniques by increasing their size and complexity, and that evaluates each subquery in time bounded by its input and output size. Experimental results demonstrate the benefit of this approach.

arXiv.org e-Print Archive

Crossref

Search and Result Presentation in Scientific Workflow Repositories

Author: Davidson Susan B.
Huang Xiaocheng
Stoyanovich Julia
Yuan Xiaojie
Publication date: 01/01/2013

We study the problem of searching a repository of complex hierarchical workflows whose component modules, both composite and atomic, have been annotated with keywords. Since keyword search does not use the graph structure of a workflow, we develop a model of workflows using context-free bag grammars. We then give efficient polynomial-time algorithms that, given a workflow and a keyword query, determine whether some execution of the workflow matches the query. Based on these algorithms, we develop a search and ranking solution that efficiently retrieves the top-k grammars from a repository. Finally, we propose a novel result presentation method for grammars matching a keyword query, based on representative parse trees. The effectiveness of our approach is validated through an extensive experimental evaluation.

arXiv.org e-Print Archive

Crossref

ScholarlyCommons@Penn

Verifying Recursive Active Documents with Positive Data Tree Rewriting

Author: Genest Blaise
Muscholl Anca
Wu Zhilin
Publication date: 01/01/2010

This paper proposes a data tree-rewriting framework for modeling evolving documents. The framework is close to Guarded Active XML, a platform used for handling XML repositories evolving through web services. We focus on automatic verification of properties of evolving documents that can contain data from an infinite domain. We establish the boundaries of decidability, and show that verification of a positive fragment that can handle recursive service calls is decidable. We also consider bounded model-checking in our data tree-rewriting framework and show that it is NEXPTIME-complete.

arXiv.org e-Print Archive

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

Dagstuhl Research Online Publication Server

HAL-Rennes 1

Search and Result Presentation in Scientific Workflow Repositories

Author: Davidson Susan
Huang Xiaocheng
Stoyanovich Julia
Yuan Xiaojie
Publication venue: ScholarlyCommons
Publication date: 17/05/2013

ScholarlyCommons@Penn

Tools and Algorithms for the Construction and Analysis of Systems: 24th International Conference, TACAS 2018, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2018, Thessaloniki, Greece, April 14-20, 2018, Proceedings, Part II

Publication venue: Springer
Publication date: 01/01/2018

University of Twente Research Information

Secure Time-Aware Provenance for Distributed Systems

Author: Zhou Wenchao
Publication venue: ScholarlyCommons
Publication date: 01/01/2012

Operators of distributed systems often need to answer forensic questions in order to perform a variety of managerial tasks, including fault detection, system debugging, accountability enforcement, and attack analysis. In this dissertation, we present Secure Time-Aware Provenance (STAP), a novel approach that provides the fundamental functionality required to answer such forensic questions: the capability to “explain” the existence (or change) of a certain distributed system state at a given time in a potentially adversarial environment. This dissertation makes the following contributions. First, we propose the STAP model, to explicitly represent time and state changes. The STAP model allows consistent and complete explanations of system state (and changes) in dynamic environments. Second, we show that it is both possible and practical to efficiently and scalably maintain and query provenance in a distributed fashion, where provenance maintenance and querying are modeled as recursive continuous queries over distributed relations. Third, we present security extensions that allow operators to reliably query provenance information in adversarial environments. Our extensions incorporate tamper-evident properties that guarantee eventual detection of compromised nodes that lie or falsely implicate correct nodes. Finally, the proposed research results in a proof-of-concept prototype, which includes a declarative query language for specifying a range of useful provenance queries, an interactive exploration tool, and a distributed provenance engine for operators to conduct analysis of their distributed systems. We discuss the applicability of this tool in several use cases, including Internet routing, overlay routing, and cloud data processing.

ScholarlyCommons@Penn

PatternLab for proteomics: a tool for differential shotgun proteomics

Author: Emily I Chen
John R Yates
Juliana SG Fischer
Paulo C Carvalho
Valmir C Barbosa
Publication venue: BioMed Central
Publication date: 01/07/2008

Background: A goal of proteomics is to distinguish between states of a biological system by identifying protein expression differences. Liu et al. demonstrated a method to perform semi-relative protein quantitation in shotgun proteomics data by correlating the number of tandem mass spectra obtained for each protein, or "spectral count", with its abundance in a mixture; however, two issues have remained open: how to normalize spectral counting data and how to efficiently pinpoint differences between profiles. Moreover, Chen et al. recently showed how to increase the number of identified proteins in shotgun proteomics by analyzing samples with different MS-compatible detergents while performing proteolytic digestion. The latter introduced new challenges from the data analysis perspective, since replicate readings are not acquired. Results: To address the open issues above, we present a program termed PatternLab for proteomics. This program implements existing strategies and adds two new methods to pinpoint differences in protein profiles. The first method, ACFold, addresses experiments with fewer than three replicates from each state or with assays acquired by different protocols, as described by Chen et al. ACFold uses a combined criterion based on expression fold changes, the AC test, and the false-discovery rate, and can supply a "bird's-eye view" of differentially expressed proteins. The other method addresses experimental designs having multiple readings from each state and is referred to as nSVM (natural support vector machine) because of its roots in evolutionary computing and in statistical learning theory. Our observations suggest that nSVM's niche comprises projects that select a minimum set of proteins for classification purposes; for example, the development of an early detection kit for a given pathology. We demonstrate the effectiveness of each method on experimental data and compare them with existing strategies. Conclusion: PatternLab offers easy and unified access to a variety of feature selection and normalization strategies, each having its own niche. Additionally, graphing tools are available to aid in the analysis of high-throughput experimental data. PatternLab is available at http://pcarvalho.com/patternlab.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central