5 research outputs found

    A formal model of provenance in distributed systems

    We present a formalism for provenance in distributed systems based on the π-calculus. Its main feature is that all data products are annotated with metadata representing their provenance. The calculus is given a provenance tracking semantics, which ensures that data provenance is updated as the computation proceeds. The calculus also enjoys a pattern-restricted input primitive which allows processes to decide what data to receive and what branch of computation to proceed with based on the provenance information of data. We give examples to illustrate the use of the calculus and discuss some of the semantic properties of our provenance notion. We conclude by reviewing related work and discussing directions for future research.

    Causality and the semantics of provenance

    Provenance, or information about the sources, derivation, custody or history of data, has been studied recently in a number of contexts, including databases, scientific workflows and the Semantic Web. Many provenance mechanisms have been developed, motivated by informal notions such as influence, dependence, explanation and causality. However, there has been little study of whether these mechanisms formally satisfy appropriate policies or even how to formalize relevant motivating concepts such as causality. We contend that mathematical models of these concepts are needed to justify and compare provenance techniques. In this paper we review a theory of causality based on structural models that has been developed in artificial intelligence, and describe work in progress on a causal semantics for provenance graphs. Comment: Workshop submission

    Provenance in distributed systems: a process algebraic study of provenance management and its role in establishing trust in data quality

    We aim to develop a formal framework to reason about provenance in distributed systems. We take as our starting point an extension of the asynchronous pi-calculus where processes are explicitly assigned principal identities. We enrich this basic setting with provenance annotated data, dynamic provenance tracking and dynamically checked trust policies. We give several examples to illustrate the use of the calculus in modelling systems where principals base their trust in the quality of data on the provenance information associated with it. We consider the role of provenance in the calculus by relating the provenance tracking semantics to a plain one in which no provenance tracking or checking takes place. We further substantiate this by studying bisimulation-based behavioural equivalences for the plain and annotated versions of the calculus and contrasting the discriminating power of the equivalences obtained in each case. We also give a more denotational take on the semantics of the provenance calculus and look at notions of well-formedness and soundness for the provenance tracking semantics. We consider two different extensions of the basic calculus. The first aims to alleviate the cost of run-time provenance tracking and checking by defining a static type system which guarantees that in well-typed systems principals always receive data with provenance that matches their requirements. The second extension looks at the ramifications of provenance tracking on privacy and security policies and consists of extending the calculus with a notion we call filters. This gives principals the ability to assign different views of the provenance of a given value to different principals, thus allowing for the selective disclosure of provenance information. We study behavioural equivalences for this extension of the calculus, paying particular attention to the set of principals composing the observer and its role in discriminating between systems.
