
    DataHub: Collaborative Data Science & Dataset Version Management at Scale

    Relational databases have limited support for data collaboration, where teams collaboratively curate and analyze large datasets. Inspired by software version control systems like git, we propose (a) a dataset version control system, giving users the ability to create, branch, merge, difference and search large, divergent collections of datasets, and (b) a platform, DataHub, that gives users the ability to perform collaborative data analysis building on this version control system. We outline the challenges in providing dataset version control at scale. Comment: 7 pages.
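The git-style operations named in the abstract (create, branch, merge, difference) can be illustrated on a single keyed table. The following is a minimal sketch, not DataHub's design or API: the class `DatasetRepo`, its methods, and the choice to store every version as a full snapshot of rows keyed by primary key are all hypothetical assumptions made to keep the example short.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Row = Tuple  # hypothetical: a row is an immutable tuple of column values


@dataclass
class Version:
    vid: int
    parents: Tuple[int, ...]
    rows: Dict[str, Row]  # primary key -> row; a full snapshot, for simplicity


class DatasetRepo:
    """Toy git-like version store for one keyed dataset (hypothetical, not DataHub's API)."""

    def __init__(self) -> None:
        self.versions: Dict[int, Version] = {0: Version(0, (), {})}  # empty root version
        self.branches: Dict[str, int] = {"master": 0}
        self._next = 1

    def _add(self, parents: Tuple[int, ...], rows: Dict[str, Row]) -> int:
        v = Version(self._next, parents, dict(rows))
        self.versions[v.vid] = v
        self._next += 1
        return v.vid

    def commit(self, branch: str, rows: Dict[str, Row]) -> int:
        """Record a new version on `branch` whose parent is the current branch head."""
        vid = self._add((self.branches[branch],), rows)
        self.branches[branch] = vid
        return vid

    def branch(self, new: str, from_branch: str) -> None:
        self.branches[new] = self.branches[from_branch]

    def diff(self, a: int, b: int):
        """Rows added, removed and changed going from version a to version b."""
        ra, rb = self.versions[a].rows, self.versions[b].rows
        added = {k: rb[k] for k in rb.keys() - ra.keys()}
        removed = {k: ra[k] for k in ra.keys() - rb.keys()}
        changed = {k: (ra[k], rb[k]) for k in ra.keys() & rb.keys() if ra[k] != rb[k]}
        return added, removed, changed

    def _ancestors(self, vid: int) -> set:
        seen, stack = set(), [vid]
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(self.versions[v].parents)
        return seen

    def merge(self, target: str, source: str) -> int:
        """Three-way, row-level merge of `source` into `target`; raises on conflicting edits."""
        a, b = self.branches[target], self.branches[source]
        # Ids grow monotonically, so the largest common ancestor is a nearest merge base.
        base = self.versions[max(self._ancestors(a) & self._ancestors(b))].rows
        ra, rb = self.versions[a].rows, self.versions[b].rows
        merged: Dict[str, Row] = {}
        for k in base.keys() | ra.keys() | rb.keys():
            x, y, o = ra.get(k), rb.get(k), base.get(k)  # None means "row absent"
            if x == y:
                val = x
            elif x == o:
                val = y  # only the source side changed this row
            elif y == o:
                val = x  # only the target side changed this row
            else:
                raise ValueError(f"merge conflict on key {k!r}")
            if val is not None:
                merged[k] = val
        vid = self._add((a, b), merged)
        self.branches[target] = vid
        return vid
```

Full snapshots keep the sketch readable, but they duplicate unchanged rows in every version. A system handling large, divergent collections would likely need a more compact representation, such as storing deltas between versions.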

    A template-based graph transformation system for the PROV data model

    As data provenance becomes significant metadata for validating the origin of information and asserting its quality, it is crucial to hide sensitive information in provenance data to maintain trustworthiness before sharing provenance in open environments such as the Web. In this paper, a graph rewriting system is constructed from the PROV data model to hide restricted provenance information while preserving the integrity and connectivity of the provenance graph. The system is formally established as a template-based framework and formalised using category-theoretic concepts such as functors, diagrams, and natural transformations.
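The central idea, hiding restricted provenance while keeping the graph connected, can be illustrated with a single rewrite rule. This is a sketch under stated assumptions, not the paper's template-based, category-theoretic framework: the graph encoding, the function names `hide_nodes` and `weak_components`, and the placeholder ids `hidden1`, `hidden2`, … are hypothetical. The rule replaces each restricted node with an attribute-free node of the same PROV type and re-points its edges, and a union-find count confirms that connectivity is unchanged.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical minimal PROV-like graph encoding (not the PROV-N or PROV-JSON serialisation).
Nodes = Dict[str, Tuple[str, Dict[str, str]]]  # node id -> (PROV type, attributes)
Edges = List[Tuple[str, str, str]]  # (source id, PROV relation, target id)


def hide_nodes(nodes: Nodes, edges: Edges, restricted: Set[str]) -> Tuple[Nodes, Edges]:
    """Replace each restricted node with an opaque node of the same PROV type.

    Attributes of the restricted node are dropped, but every edge is re-pointed
    to the opaque stand-in, so no path through the hidden node is lost.
    Assumes no existing node already uses an id of the form "hiddenN".
    """
    rename: Dict[str, str] = {}
    new_nodes: Nodes = {}
    for nid in sorted(nodes):  # sorted for deterministic placeholder ids
        ptype, attrs = nodes[nid]
        if nid in restricted:
            opaque = f"hidden{len(rename) + 1}"
            rename[nid] = opaque
            new_nodes[opaque] = (ptype, {})
        else:
            new_nodes[nid] = (ptype, dict(attrs))
    new_edges = [(rename.get(s, s), rel, rename.get(t, t)) for s, rel, t in edges]
    return new_nodes, new_edges


def weak_components(nodes: Nodes, edges: Edges) -> int:
    """Count weakly connected components with a small union-find."""
    parent = {n: n for n in nodes}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, _, t in edges:
        parent[find(s)] = find(t)
    return len({find(n) for n in nodes})
```

For example, hiding an agent linked by `wasAssociatedWith` to an activity keeps that relation, but it now points to an anonymous agent. A richer rule might also abstract away whole subgraphs, which is where a template-based formalism becomes useful.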

    An Architecture for Provenance Systems

    This document covers the logical and process architectures of provenance systems. The logical architecture identifies key roles and their interactions, whereas the process architecture discusses distribution and security. A fundamental aspect of our presentation is its technology-independent nature, which makes it reusable: the principles that are exposed in this document may be applied to different technologies.

    Supporting provenance of digital calibration certificates with temporal databases

    Trust in current and historical calibration data is crucial. The recently proposed XML schema for digital calibration certificates (DCCs) provides machine-readability and a common exchange format to enhance this trust. We present a prototype web application, developed in the programming language Links, for storing and displaying a DCC using a relational database. In particular, we leverage the temporal database features that Links provides to capture different versions of a certificate and inspect the differences between versions. The prototype is the starting point for developing software to support DCCs and the data with which they are populated, and it has highlighted that DCCs are the tip of the iceberg in automating the management of digital calibration data, an activity that includes data provenance and the tracking of modifications.

    A methodology to take account of diversity in collective adaptive systems

    Collective Adaptive Systems (CASs) are composed of a heterogeneous set of components, often developed in a distributed manner. Their users are diverse in their profiles, preferences, interests, and goals, and hence have different requirements. We propose a typology for the diversity of these components, users, and requirements. We then present a methodology that provides steps for integrating features that record diversity in order to support accountability. The foundation of accountability is provided by provenance data and a CAS vocabulary; these knowledge representation languages provide the core vocabulary that agents and services can exploit.