
    GeneaLog: Fine-Grained Data Streaming Provenance at the Edge

    Fine-grained data provenance in data streaming allows linking each result tuple back to the source data that contributed to it, which is beneficial for many applications (e.g., to find the conditions triggering a security- or safety-related alert). Further, when data transmission or storage has to be minimized, as in edge computing and cyber-physical systems, it can help in identifying the source data to be prioritized. The memory and processing costs of fine-grained data provenance, possibly affordable on high-end servers, can be prohibitive for the resource-constrained devices deployed in edge computing and cyber-physical systems. Motivated by this challenge, we present GeneaLog, a novel fine-grained data provenance technique for data streaming applications. Leveraging the logical dependencies of the data, GeneaLog takes advantage of cross-layer properties of the software stack and incurs a minimal, constant-size per-tuple overhead. Furthermore, it allows for a modular and efficient algorithmic implementation using only standard data streaming operators. This is particularly useful for distributed streaming applications, since the provenance processing can be executed at separate nodes, orthogonal to the data processing. We evaluate an implementation of GeneaLog using vehicular and smart grid applications, confirming that it efficiently captures fine-grained provenance data with minimal overhead.
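
    The general idea of attaching a small provenance annotation to each tuple and propagating it through standard streaming operators can be illustrated with a minimal sketch. This is not GeneaLog's actual implementation: the ProvTuple class, the operator functions, and the simple backward-pointer scheme below are illustrative assumptions only.

    # Minimal sketch of fine-grained per-tuple provenance propagation.
    # NOT GeneaLog's implementation: names, fields, and the backward-pointer
    # scheme here are illustrative assumptions.
    from dataclasses import dataclass
    from typing import Any, Tuple

    @dataclass(frozen=True)
    class ProvTuple:
        """A stream tuple carrying a small provenance annotation:
        references to the immediate input tuples that produced it."""
        value: Any
        sources: Tuple["ProvTuple", ...] = ()

    def source(value) -> ProvTuple:
        # Source tuples have no predecessors; they are the provenance roots.
        return ProvTuple(value)

    def op_map(t: ProvTuple, fn) -> ProvTuple:
        # A map/transform operator keeps a pointer back to its single input.
        return ProvTuple(fn(t.value), (t,))

    def op_join(left: ProvTuple, right: ProvTuple, fn) -> ProvTuple:
        # A binary operator (e.g., a join) records both contributing inputs.
        return ProvTuple(fn(left.value, right.value), (left, right))

    def lineage(t: ProvTuple):
        """Walk the backward pointers to recover the source tuples
        that contributed to a result (its fine-grained provenance)."""
        if not t.sources:
            return [t.value]
        out = []
        for s in t.sources:
            out.extend(lineage(s))
        return out

    if __name__ == "__main__":
        a = source({"sensor": "A", "reading": 41})
        b = source({"sensor": "B", "reading": 97})
        alert = op_join(op_map(a, lambda v: v["reading"]),
                        op_map(b, lambda v: v["reading"]),
                        lambda x, y: {"alert": x + y > 120})
        print(alert.value)     # {'alert': True}
        print(lineage(alert))  # the two source readings behind the alert

    In this sketch the provenance logic is just another layer of operators over the data-processing ones, which is the kind of separation the abstract refers to when it notes that provenance processing can run at separate nodes, orthogonal to the data processing.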

    A Survey of Scholarly Data: From Big Data Perspective

    Recently, organizations and governments have shifted their focus towards the digitization of academic and technical documents, adding a new facet to the concept of digital libraries. The volume, variety, and velocity of the generated data satisfy the big data definition, as a result of which this scholarly reserve is popularly referred to as big scholarly data. To facilitate data analytics for big scholarly data, suitable architectures and services need to be developed. The evolving nature of research problems has made them essentially interdisciplinary, so there is a growing demand for scholarly applications such as collaborator discovery, expert finding, and research recommendation systems, among others. This paper investigates current trends and identifies existing challenges in the development of a big scholarly data platform, with specific focus on directions for future research, and maps them to the different phases of the big data lifecycle.

    Provenance Research Issues and Challenges in the Big Data Era

    Provenance of big data is a hot topic in the database and data mining research communities. Essentially, provenance is the process of tracing the lineage and derivation of data and data objects, and it plays a major role in database management systems as well as in workflow management systems and distributed systems. Despite this, research on big data provenance is still in its embryonic phase, and much work remains to be done in this area. Inspired by these considerations, in this paper we provide an overview of relevant issues and challenges in the context of big data provenance research, while also highlighting possible future efforts within these research directions.