8,020 research outputs found

    Abstracting PROV provenance graphs:A validity-preserving approach

    Get PDF
    Data provenance is a structured form of metadata designed to record the activities and datasets involved in data production, as well as their dependency relationships. The PROV data model, released by the W3C in 2013, defines a schema and constraints that together provide a structural and semantic foundation for provenance. This enables the interoperable exchange of provenance between data producers and consumers. When the provenance content is sensitive and subject to disclosure restrictions, however, a way of hiding parts of the provenance in a principled way before communicating it to certain parties is required. In this paper we present a provenance abstraction operator that achieves this goal. It maps a graphical representation of a PROV document PG1 to a new abstract version PG2, ensuring that (i) PG2 is a valid PROV graph, and (ii) the dependencies that appear in PG2 are justified by those that appear in PG1. These two properties ensure that further abstraction of abstract PROV graphs is possible. A guiding principle of the work is that of minimum damage: the resultant graph is altered as little as possible, while ensuring that the two properties are maintained. The operator developed is implemented as part of a user tool, described in a separate paper, that lets owners of sensitive provenance information control the abstraction by specifying an abstraction policy.</p

    Data mining and fusion

    No full text

    Securing the software-defined networking control plane by using control and data dependency techniques

    Get PDF
    Software-defined networking (SDN) fundamentally changes how network and security practitioners design, implement, and manage their networks. SDN decouples the decision-making about traffic forwarding (i.e., the control plane) from the traffic being forwarded (i.e., the data plane). SDN also allows for network applications, or apps, to programmatically control network forwarding behavior and policy through a logically centralized control plane orchestrated by a set of SDN controllers. As a result of logical centralization, SDN controllers act as network operating systems in the coordination of shared data plane resources and comprehensive security policy implementation. SDN can support network security through the provision of security services and the assurances of policy enforcement. However, SDNā€™s programmability means that a networkā€™s security considerations are different from those of traditional networks. For instance, an adversary who manipulates the programmable control plane can leverage significant control over the data planeā€™s behavior. In this dissertation, we demonstrate that the security posture of SDN can be enhanced using control and data dependency techniques that track information flow and enable understanding of application composability, control and data plane decoupling, and control plane insight. We support that statement through investigation of the various ways in which an attacker can use control flow and data flow dependencies to influence the SDN control plane under different threat models. We systematically explore and evaluate the SDN security posture through a combination of runtime, pre-runtime, and post-runtime contributions in both attack development and defense designs. We begin with the development a conceptual accountability framework for SDN. We analyze the extent to which various entities within SDN are accountable to each other, what they are accountable for, mechanisms for assurance about accountability, standards by which accountability is judged, and the consequences of breaching accountability. We discover significant research gaps in SDNā€™s accountability that impact SDNā€™s security posture. In particular, the results of applying the accountability framework showed that more control plane attribution is necessary at different layers of abstraction, and that insight motivated the remaining work in this dissertation. Next, we explore the influence of apps in the SDN control planeā€™s secure operation. We find that existing access control protections that limit what apps can do, such as role-based access controls, prove to be insufficient for preventing malicious apps from damaging control plane operations. The reason is SDNā€™s reliance on shared network state. We analyze SDNā€™s shared state model to discover that benign apps can be tricked into acting as ā€œconfused deputiesā€; malicious apps can poison the state used by benign apps, and that leads the benign apps to make decisions that negatively affect the network. That violates an implicit (but unenforced) integrity policy that governs the networkā€™s security. Because of the strong interdependencies among apps that result from SDNā€™s shared state model, we show that apps can be easily co-opted as ā€œgadgets,ā€ and that allows an attacker who minimally controls one app to make changes to the network state beyond his or her originally granted permissions. We use a data provenance approach to track the lineage of the network state objects by assigning attribution to the set of processes and agents responsible for each control plane object. We design the ProvSDN tool to track API requests from apps as they access the shared network stateā€™s objects, and to check requests against a predefined integrity policy to ensure that low-integrity apps cannot poison high-integrity apps. ProvSDN acts as both a reference monitor and an information flow control enforcement mechanism. Motivated by the strong inter-app dependencies, we investigate whether implicit data plane dependencies affect the control planeā€™s secure operation too. We find that data plane hosts typically have an outsized effect on the generation of the network state in reactive-based control plane designs. We also find that SDNā€™s event-based design, and the apps that subscribe to events, can induce dependencies that originate in the data plane and that eventually change forwarding behaviors. That combination gives attackers that are residing on data plane hosts significant opportunities to influence control plane decisions without having to compromise the SDN controller or apps. We design the EventScope tool to automatically identify where such vulnerabilities occur. EventScope clusters appsā€™ event usage to decide in which cases unhandled events should be handled, statically analyzes controller and app code to understand how events affect control plane execution, and identifies valid control flow paths in which a data plane attacker can reach vulnerable code to cause unintended data plane changes. We use EventScope to discover 14 new vulnerabilities, and we develop exploits that show how such vulnerabilities could allow an attacker to bypass an intended network (i.e., data plane) access control policy. This research direction is critical for SDN security evaluation because such vulnerabilities could be induced by host-based malware campaigns. Finally, although there are classes of vulnerabilities that can be removed prior to deployment, it is inevitable that other classes of attacks will occur that cannot be accounted for ahead of time. In those cases, a network or security practitioner would need to have the right amount of after-the-fact insight to diagnose the root causes of such attacks without being inundated with too much informa- tion. Challenges remain in 1) the modeling of apps and objects, which can lead to overestimation or underestimation of causal dependencies; and 2) the omission of a data plane model that causally links control and data plane activities. We design the PicoSDN tool to mitigate causal dependency modeling challenges, to account for a data plane model through the use of the data plane topology to link activities in the provenance graph, and to account for network semantics to appropriately query and summarize the control planeā€™s history. We show how prior work can hinder investigations and analysis in SDN-based attacks and demonstrate how PicoSDN can track SDN control plane attacks.Ope

    Understanding Legacy Workflows through Runtime Trace Analysis

    Get PDF
    abstract: When scientific software is written to specify processes, it takes the form of a workflow, and is often written in an ad-hoc manner in a dynamic programming language. There is a proliferation of legacy workflows implemented by non-expert programmers due to the accessibility of dynamic languages. Unfortunately, ad-hoc workflows lack a structured description as provided by specialized management systems, making ad-hoc workflow maintenance and reuse difficult, and motivating the need for analysis methods. The analysis of ad-hoc workflows using compiler techniques does not address dynamic languages - a program has so few constrains that its behavior cannot be predicted. In contrast, workflow provenance tracking has had success using run-time techniques to record data. The aim of this work is to develop a new analysis method for extracting workflow structure at run-time, thus avoiding issues with dynamics. The method captures the dataflow of an ad-hoc workflow through its execution and abstracts it with a process for simplifying repetition. An instrumentation system first processes the workflow to produce an instrumented version, capable of logging events, which is then executed on an input to produce a trace. The trace undergoes dataflow construction to produce a provenance graph. The dataflow is examined for equivalent regions, which are collected into a single unit. The workflow is thus characterized in terms of its treatment of an input. Unlike other methods, a run-time approach characterizes the workflow's actual behavior; including elements which static analysis cannot predict (for example, code dynamically evaluated based on input parameters). This also enables the characterization of dataflow through external tools. The contributions of this work are: a run-time method for recording a provenance graph from an ad-hoc Python workflow, and a method to analyze the structure of a workflow from provenance. Methods are implemented in Python and are demonstrated on real world Python workflows. These contributions enable users to derive graph structure from workflows. Empowered by a graphical view, users can better understand a legacy workflow. This makes the wealth of legacy ad-hoc workflows accessible, enabling workflow reuse instead of investing time and resources into creating a workflow.Dissertation/ThesisMasters Thesis Computer Science 201

    The Origin of Data: Enabling the Determination of Provenance in Multi-institutional Scientific Systems through the Documentation of Processes

    Get PDF
    The Oxford English Dictionary defines provenance as (i) the fact of coming from some particular source or quarter; origin, derivation. (ii) the history or pedigree of a work of art, manuscript, rare book, etc.; concr., a record of the ultimate derivation and passage of an item through its various owners. In art, knowing the provenance of an artwork lends weight and authority to it while providing a context for curators and the public to understand and appreciate the workā€™s value. Without such a documented history, the work may be misunderstood, unappreciated, or undervalued. In computer systems, knowing the provenance of digital objects would provide them with greater weight, authority, and context just as it does for works of art. Specifically, if the provenance of digital objects could be determined, then users could understand how documents were produced, how simulation results were generated, and why decisions were made. Provenance is of particular importance in science, where experimental results are reused, reproduced, and verified. However, science is increasingly being done through large-scale collaborations that span multiple institutions, which makes the problem of determining the provenance of scientific results significantly harder. Current approaches to this problem are not designed specifically for multi-institutional scientific systems and their evolution towards greater dynamic and peer-to-peer topologies. Therefore, this thesis advocates a new approach, namely, that through the autonomous creation, scalable recording, and principled organisation of documentation of systemsā€™ processes, the determination of the provenance of results produced by complex multi-institutional scientific systems is enabled. The dissertation makes four contributions to the state of the art. First is the idea that provenance is a query performed over documentation of a systemā€™s past process. Thus, the problem is one of how to collect and collate documentation from multiple distributed sources and organise it in a manner that enables the provenance of a digital object to be determined. Second is an open, generic, shared, principled data model for documentation of processes, which enables its collation so that it provides high-quality evidence that a systemā€™s processes occurred. Once documentation has been created, it is recorded into specialised repositories called provenance stores using a formally specified protocol, which ensures documentation has high-quality characteristics. Furthermore, patterns and techniques are given to permit the distributed deployment of provenance stores. The protocol and patterns are the third contribution. The fourth contribution is a characterisation of the use of documentation of process to answer questions related to the provenance of digital objects and the impact recording has on application performance. Specifically, in the context of a bioinformatics case study, it is shown that six different provenance use cases are answered given an overhead of 13% on experiment run-time. Beyond the case study, the solution has been applied to other applications including fault tolerance in service-oriented systems, aerospace engineering, and organ transplant management

    An Analytical Survey of Provenance Sanitization

    Get PDF
    Security is likely becoming a critical factor in the future adoption of provenance technology, because of the risk of inadvertent disclosure of sensitive information. In this survey paper we review the state of the art in secure provenance, considering mechanisms for controlling access, and the extent to which these mechanisms preserve provenance integrity. We examine seven systems or approaches, comparing features and identifying areas for future work.Comment: To appear, IPAW 201
    • ā€¦
    corecore