24 research outputs found

Practical whole-system provenance capture

Author: Akoush Sherif
Amir-Mohammadian Sepehr
Balakrishnan Nikilesh
Bates Adam
Bauer Mick
Berger Stefan
Chan Sheung Chi
Davidson Susan B
Gonzalez Joseph E
Greenwood Mark
Gulzar Muhammad Ali
Han Xueyuan
Hoffman Steve
Katcher Jeffrey
Kyrola Aapo
Lee Brian
Lerner Barbara
Macko Peter
Morris James
Morris Thomas
Moyer Thomas
Muniswamy-Reddy Kiran-Kumar
Murta Leonardo
Pasquier Thomas
Povey Dean
Sailer Reiner
Schaufler Casey
Somayaji Anil
Xie Yulai
Zanussi Tom
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 24/09/2017
Field of study

Data provenance describes how data came to be in its present form. It includes data sources and the transformations that have been applied to them. Data provenance has many uses, from forensics and security to aiding the reproducibility of scientific experiments. We present CamFlow, a whole-system provenance capture mechanism that integrates easily into a PaaS offering. While there have been several prior whole-system provenance systems that captured a comprehensive, systemic and ubiquitous record of a system’s behavior, none have been widely adopted. They either A) impose too much overhead, B) are designed for long-outdated kernel releases and are hard to port to current systems, C) generate too much data, or D) are designed for a single system. CamFlow addresses these shortcomings by: 1) leveraging the latest kernel design advances to achieve efficiency; 2) using a self-contained, easily maintainable implementation relying on a Linux Security Module, NetFilter, and other existing kernel facilities; 3) providing a mechanism to tailor the captured provenance data to the needs of the application; and 4) making it easy to integrate provenance across distributed systems. The provenance we capture is streamed and consumed by tenant-built auditor applications. We illustrate the usability of our implementation by describing three such applications: demonstrating compliance with data regulations; performing fault/intrusion detection; and implementing data loss prevention. We also show how CamFlow can be leveraged to capture meaningful provenance without modifying existing applications.

Engineering and Applied Science

arXiv.org e-Print Archive

Crossref

Harvard University - DASH

Explore Bristol Research

Xanthus: Push-button Orchestration of Host Provenance Data Collection

Author: Balakrishnan Nikilesh
Bates Adam
Gregg Brendan
Guo Philip J
Han Xueyuan
Hassan Wajih Ul
Jiang Xuxian
Kennedy David
Muniswamy-Reddy Kiran-Kumar
National Academies of Sciences Engineering, and
Pohly J
Spillane P
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 10/05/2020

Host-based anomaly detectors generate alarms by inspecting audit logs for suspicious behavior. Unfortunately, evaluating these anomaly detectors is hard. There are few high-quality, publicly available audit logs, and there are no pre-existing frameworks that enable push-button creation of realistic system traces. To make trace generation easier, we created Xanthus, an automated tool that orchestrates virtual machines to generate realistic audit logs. Using Xanthus' simple management interface, administrators select a base VM image, configure a particular tracing framework to use within that VM, and define post-launch scripts that collect and save trace data. Once data collection is finished, Xanthus creates a self-describing archive, which contains the VM, its configuration parameters, and the collected trace data. We demonstrate that Xanthus hides many of the tedious (yet subtle) orchestration tasks that humans often get wrong; Xanthus avoids mistakes that lead to non-replicable experiments.

Comment: 6 pages, 1 figure, 7 listings, 1 table, worksho

arXiv.org e-Print Archive

Crossref

Explore Bristol Research

Aggregating unsupervised provenance anomaly detectors

Author: Berrada Ghita
Cheney James
Publication date: 16/05/2019

Edinburgh Research Explorer

CWLProv - Interoperable Retrospective Provenance capture and its challenges

Author: Crusoe Michael R.
Khan Farah Zaib
Lonie Andrew
Sinnott Richard
Soiland-Reyes Stian
Publication date: 27/03/2018

The automation of data analysis in the form of scientific workflows is a widely adopted practice in many fields of research nowadays. Computationally driven data-intensive experiments using workflows enable Automation, Scaling, Adaption and Provenance support (ASAP). However, there are still several challenges associated with the effective sharing, publication, understandability and reproducibility of such workflows due to the incomplete capture of provenance and the dependence on particular technical (software) platforms. This paper presents CWLProv, an approach for retrospective provenance capture utilizing open source community-driven standards involving application and customization of workflow-centric <a href="http://www.researchobject.org/">Research Objects</a> (ROs). The ROs are produced as an output of a workflow enactment defined in the <a href="http://www.commonwl.org/">Common Workflow Language</a> (CWL) using the CWL reference implementation and its data structures. The approach aggregates and annotates all the resources involved in the scientific investigation including inputs, outputs, workflow specification, command line tool specifications and input parameter settings. The resources are linked within the RO to enable re-enactment of an analysis without depending on external resources. The workflow provenance profile is represented in the W3C recommended standard <a href="https://www.w3.org/TR/prov-n/">PROV-N</a> and <a href="https://www.w3.org/Submission/prov-json/">PROV-JSON</a> formats to capture retrospective provenance of the workflow enactment. The workflow-centric RO produced as an output of a CWL workflow enactment is expected to be interoperable, reusable, shareable and portable across different platforms. This paper describes the need and motivation for <a href="https://github.com/common-workflow-language/cwltool/tree/provenance">CWLProv</a> and the lessons learned in applying it for ROs using CWL in the bioinformatics domain.

ZENODO

The University of Manchester - Institutional Repository

FigShare

Data Provenance

Author: C Wang
D Bailo
HU Asuncion
I Altintas
I Celino
J Frew
J Kim
J Zhao
L Gadelha
L Moreau
L Murta
MA Borkin
P Buneman
P Macko
P Yue
S Cox
T Lebo
T Tanhua
TD Huynh
Z Zhao
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

Crossref

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Flexible graph matching and graph edit distance using answer set programming

Author: BD McKay
H Bunke
J Kazius
J Lee
J Lerouge
K Riesen
M Frank
M Gebser
MR Garey
S Auer
Stéphane Zampelli
V Arvind
X Chen
X Gao
Z Abu-Aisheh
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 26/11/2019

The graph isomorphism, subgraph isomorphism, and graph edit distance problems are combinatorial problems with many applications. Heuristic exact and approximate algorithms for each of these problems have been developed for different kinds of graphs: directed, undirected, labeled, etc. However, additional work is often needed to adapt such algorithms to different classes of graphs, for example to accommodate both labels and property annotations on nodes and edges. In this paper, we propose an approach based on answer set programming. We show how each of these problems can be defined for a general class of property graphs with directed edges, and labels and key-value properties annotating both nodes and edges. We evaluate this approach on a variety of synthetic and realistic graphs, demonstrating that it is feasible as a rapid prototyping approach.

Comment: To appear, PADL 202

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

Towards Specificationless Monitoring of Provenance-Emitting Systems

Author: Stoffers Martin
Weinert Alexander
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2022

Monitoring often requires insight into the monitored system as well as concrete specifications of expected behavior. More and more systems, however, provide information about their inner procedures by emitting provenance information in a W3C-standardized graph format. In this work, we present an approach to monitor such provenance data for anomalous behavior by performing spectral graph analysis on slices of the constructed provenance graph and by comparing the characteristics of each slice with those of a sliding window over recently seen slices. We argue that this approach not only simplifies the monitoring of heterogeneous distributed systems, but also enables applying a host of well-studied techniques to monitor such systems.

Institute of Transport Research:Publications

From Here to Provtopia

Author: A Schreiber
G Coker
J Cheney
J Freire
L Carata
L Moreau
M Interlandi
P Alvaro
P Buneman
Ragib Hasan
RN Watson
SC Xu
ST King
T Garfinkel
T Jaeger
T Pasquier
Y Huang
Publication date: 30/08/2019

Crossref

Explore Bristol Research