570 research outputs found

    Provenance-based Auditing of Private Data Use

    Across the world, organizations are required to comply with regulatory frameworks dictating how to manage personal information. Despite these frameworks, several cases of data leaks and exposure of private data to unauthorized recipients have been widely publicized. For authorities and system administrators to check compliance with regulations, auditing of private data processing becomes crucial in IT systems. Finding the origin of some data, determining how some data is being used, and checking that the processing of some data is compatible with the purpose for which the data was captured are typical functions that an auditing capability should support, but they are difficult to implement in a reusable manner. Such questions are so-called provenance questions, where provenance is defined as the process that led to some data being produced. The aim of this paper is to articulate how data provenance can be used as the underpinning approach of an auditing capability in IT systems. We present a case study based on requirements of the Data Protection Act and an application that audits the processing of private data, which we apply to an example manipulating private data in a university.

    Referencing Sources of Molecular Spectroscopic Data in the Era of Data Science: Application to the HITRAN and AMBDAS Databases

    The application described has been designed to create bibliographic entries automatically in large databases with diverse sources, which reduces both the frequency of mistakes and the workload for administrators. This new system uniquely identifies each reference by its digital object identifier (DOI) and retrieves the corresponding bibliographic information from any of several online services, including the SAO/NASA Astrophysics Data System (ADS) and CrossRef APIs. Once this information is parsed into a relational database, the software can produce bibliographies in any of several formats, including HTML and BibTeX, for use on websites or in printed articles. The application is provided free of charge for general use by any scientific database. The power of this application is demonstrated by using it to populate reference data for the HITRAN and AMBDAS databases as test cases. HITRAN contains data provided by researchers and collaborators throughout the spectroscopic community. These contributors are credited for their contributions through the bibliography produced alongside the data returned by an online search in HITRAN. Prior to the work presented here, HITRAN and AMBDAS created these bibliographies manually, which is a tedious, time-consuming and error-prone process. The complete code for the new referencing system can be found at https://github.com/hitranonline/refs. Comment: 11 pages, 5 figures, already published online at https://doi.org/10.3390/atoms802001
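As a rough illustration of the last step described in this abstract (turning a stored bibliographic record into BibTeX), here is a minimal Python sketch. It is not the code from the HITRAN repository: the record layout, the `to_bibtex` helper, the citation-key scheme, and the example DOI are all assumptions made for illustration, and no online service is contacted.

```python
# Hypothetical record, shaped like a row one might store after parsing
# metadata retrieved for a DOI. All values are placeholders.
example_record = {
    "doi": "10.1234/example.5678",
    "authors": ["A. Author", "B. Author"],
    "title": "An Example Title",
    "journal": "Journal of Examples",
    "year": 2020,
}


def to_bibtex(record: dict) -> str:
    """Render one bibliographic record as a BibTeX @article entry."""
    # Derive the citation key from the DOI so each reference stays unique
    # (an assumed convention, not necessarily the one used by the authors).
    key = record["doi"].replace("/", "_").replace(".", "_")
    fields = [
        ("author", " and ".join(record["authors"])),
        ("title", record["title"]),
        ("journal", record["journal"]),
        ("year", str(record["year"])),
        ("doi", record["doi"]),
    ]
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@article{{{key},\n{body}\n}}"


entry = to_bibtex(example_record)
print(entry)
```

Keying entries by DOI means the same reference cited from several data sources collapses to a single bibliography entry.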

    Mapping attribution metadata to the Open Provenance Model


    Metadata and provenance management

    Scientists today collect, analyze, and generate terabytes and petabytes of data. These data are often shared among collaborators and further processed and analyzed. To facilitate sharing and interpretation, data need to carry with them metadata about how they were collected or generated, and provenance information about how they were processed. This chapter describes metadata and provenance in the context of the data lifecycle. It also gives an overview of approaches to metadata and provenance management, followed by examples of how applications use metadata and provenance in their scientific processes.

    Decentralized Identity and Access Management Framework for Internet of Things Devices

    The emerging Internet of Things (IoT) domain is about connecting people, devices, and systems via sensors and actuators, to collect meaningful information from the devices' surrounding environment and take actions to enhance productivity and efficiency. The proliferation of IoT devices, from around a few billion today to over 25 billion in the next few years, spanning heterogeneous networks, defines a paradigm shift for many industrial and smart connectivity applications. Existing IoT networks face a number of operational challenges linked to device management and to the devices' capability for mutual authentication and authorization. While significant progress has been made in adopting existing connectivity and management frameworks, most of these frameworks are designed for unconstrained devices connected in centralized networks. IoT devices, by contrast, are constrained devices that tend to operate in decentralized, peer-to-peer arrangements. As a result of this tendency towards peer-to-peer service exchange, many existing frameworks fail to address the main challenge of giving actual users ownership of their devices and of the data those devices generate. Moreover, the diversity of devices and offered services requires more granular access control mechanisms that limit the exposure of devices to external threats and provide finer access control policies under the control of the device owner, without the need for a middleman. This work addresses these challenges by utilizing the concepts of decentralization introduced in Distributed Ledger Technologies (DLT) and the capability of automating business flows through smart contracts.
The proposed work utilizes decentralized identifiers (DIDs) to establish a decentralized device identity management framework and exploits blockchain tokenization, through both fungible and non-fungible tokens (NFTs), to build a self-controlled and self-contained access control policy based on the capability-based access control model (CapBAC). The defined framework provides a layered approach that builds on identity management as the foundation to enable authentication and authorization processes and establishes a mechanism for accounting through the adoption of a standardized DLT tokenization structure. The proposed framework is demonstrated by implementing a number of use cases that address issues related to identity management in industries that suffer losses of billions of dollars due to counterfeiting and the lack of global and immutable identity records. The extension of the framework to support applications that build verifiable data paths in the application layer is addressed through two simple examples. The system has been analyzed in the case of issuing authorization tokens, where DLT consensus mechanisms are expected to introduce major performance hurdles. A proof of concept emulating concurrent connections to a single device showed no timed-out requests at 200 concurrent connections and a rise in the ratio of timed-out requests to 5% at 600 connections. The analysis also showed a considerable overhead of 10.4% in the data link budget due to the use of a self-contained policy token, which is a trade-off between building self-contained access tokens with no middleman and link cost.

    Big Data Analytics in Static and Streaming Provenance

    Thesis (Ph.D.) - Indiana University, Informatics and Computing, 2016. With recent technological and computational advances, scientists increasingly integrate sensors and model simulations to understand spatial, temporal, social, and ecological relationships at unprecedented scale. Data provenance traces relationships of entities over time, thus providing a unique view of the over-time behavior under study. However, provenance can be overwhelming in both volume and complexity; the now forecasting potential of provenance creates additional demands. This dissertation focuses on Big Data analytics of static and streaming provenance. It develops filters and a non-preprocessing slicing technique for in-situ querying of static provenance. It presents a stream processing framework for online processing of provenance data arriving at a high rate. While the former is sufficient for answering queries that are given prior to the application start (forward queries), the latter deals with queries whose targets are unknown beforehand (backward queries). Finally, it explores data mining on large collections of provenance and proposes a temporal representation of provenance that can reduce the high dimensionality while effectively supporting mining tasks such as clustering, classification, and association rule mining; the temporal representation can also be applied to streaming provenance. The proposed techniques are verified through software prototypes applied to Big Data provenance captured from computer network data, weather models, ocean models, remote (satellite) imagery data, and agent-based simulations of agricultural decision making.

    Rover: Architectural Support for Exposing and Using Context

    Technology has advanced to the point where many people feel it has created a world with an insurmountable amount of information. Information includes messages people send to each other, logged data from their activities, and the services available to them. This problem has been exacerbated in modern societies by the high availability of Internet connectivity. All types of information contain context, whether it has been stated explicitly or understood implicitly. Understanding, handling, and using context represents one of the most critical steps towards coping with the amount of information available today. In this dissertation, we examine two topics: context and the design of a context-aware platform. We describe fundamental types of context associated with every piece of information and discuss issues that may occur when implementing a system that utilizes context. We present a context-aware platform called Rover. The Rover architecture provides a conceptual framework geared towards understanding how application developers can utilize a variety of aspects of context to assist the development of modern applications. To aid developers in figuring out what context may be useful in their application, we describe the concept of a Rover ecosystem: a logical organization analogous to how similar groups of people interact with each other. We also discuss how information and context can be shared between ecosystems. To examine the feasibility of the Rover architecture's conceptual framework, we have built a reference implementation of the core unit of a Rover ecosystem: the Rover server. We discuss the details of the Rover server and describe the implementation of an emergency response application that demonstrates the utility of the conceptual framework.

    Curating the CIA World Factbook

    This paper is based on the paper given by the authors at the 5th International Digital Curation Conference, December 2009; received November 2009, published December 2009. The CIA World Factbook is a prime example of a curated database: a database that is constructed and maintained with a great deal of human effort in collecting, verifying, and annotating data. Preservation of old versions of the Factbook is important for verification of citations; it is also essential for anyone interested in the history of the data, such as demographic change. Although the Factbook has been published, both physically and electronically, only for the past 30 years, we appear to be in danger of losing this history. This paper investigates the issues involved in capturing the history of an evolving database and its application to the CIA World Factbook. In particular, it shows that there is substantial added value to be gained by preserving databases in such a way that questions about the change in data (longitudinal queries) can be readily answered. Within this paper, we describe techniques for recording change in a curated database and we describe novel techniques for querying the change. Using the example of this archived curated database, we discuss the extent to which the accepted practices and terminology of archiving, curation and digital preservation apply to this important class of digital artefacts.
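To make the idea of a longitudinal query concrete, the following Python sketch stores one snapshot per edition and asks how a single field changed over time. It is only an assumed, simplified model: the paper's own recording and querying techniques are not described in this abstract, and the `archive` layout, the `history` and `changes` helpers, and all entity names and values are hypothetical placeholders rather than Factbook data.

```python
from typing import Any

# Hypothetical archive: one snapshot per edition, keyed by year.
archive: dict[int, dict[str, dict[str, Any]]] = {
    2007: {"Atlantis": {"population": 100}},
    2008: {"Atlantis": {"population": 110}},
    2009: {"Atlantis": {"population": 110}, "Lemuria": {"population": 50}},
}


def history(archive, entity, field):
    """Return (year, value) pairs for one field, in edition order."""
    result = []
    for year in sorted(archive):
        record = archive[year].get(entity)
        if record is not None and field in record:
            result.append((year, record[field]))
    return result


def changes(series):
    """Keep only the points where the value differs from the previous edition."""
    changed = []
    previous = object()  # sentinel that compares unequal to any real value
    for year, value in series:
        if value != previous:
            changed.append((year, value))
        previous = value
    return changed


atlantis = history(archive, "Atlantis", "population")
print(atlantis)
print(changes(atlantis))
```

Storing full snapshots keeps the sketch short; an archive of many editions would more plausibly record only the differences between versions.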