31,061 research outputs found
Practical whole-system provenance capture
Data provenance describes how data came to be in its present form. It includes data sources and the transformations that have been applied to them. Data provenance has many uses, from forensics and security to aiding the reproducibility of scientific experiments. We present CamFlow, a whole-system provenance capture mechanism that integrates easily into a PaaS offering. While there have been several prior whole-system provenance systems that captured a comprehensive, systemic and ubiquitous record of a systemâs behavior, none have been widely adopted. They either A) impose too much overhead, B) are designed for long-outdated kernel releases and are hard to port to current systems, C) generate too much data, or D) are designed for a single system. CamFlow addresses these shortcoming by: 1) leveraging the latest kernel design advances to achieve efficiency; 2) using a self-contained, easily maintainable implementation relying on a Linux Security Module, NetFilter, and other existing kernel facilities; 3) providing a mechanism to tailor the captured provenance data to the needs of the application; and 4) making it easy to integrate provenance across distributed systems. The provenance we capture is streamed and consumed by tenant-built auditor applications. We illustrate the usability of our implementation by describing three such applications: demonstrating compliance with data regulations; performing fault/intrusion detection; and implementing data loss prevention. We also show how CamFlow can be leveraged to capture meaningful provenance without modifying existing applications.Engineering and Applied Science
Viewpoint | Personal Data and the Internet of Things: It is time to care about digital provenance
The Internet of Things promises a connected environment reacting to and
addressing our every need, but based on the assumption that all of our
movements and words can be recorded and analysed to achieve this end.
Ubiquitous surveillance is also a precondition for most dystopian societies,
both real and fictional. How our personal data is processed and consumed in an
ever more connected world must imperatively be made transparent, and more
effective technical solutions than those currently on offer, to manage personal
data must urgently be investigated.Comment: 3 pages, 0 figures, preprint for Communication of the AC
The lifecycle of provenance metadata and its associated challenges and opportunities
This chapter outlines some of the challenges and opportunities associated
with adopting provenance principles and standards in a variety of disciplines,
including data publication and reuse, and information sciences
Working with Legacy Media: A Lone Arranger\u27s First Steps
[Excerpt] In 2013, a naked hard drive from Fiji arriving in my small religious archives (an equivalent full-time staff of 2.5 â one archivist and two archivesâ assistants) started me off on the path of digital preservation and, in particular, the digital forensics practices that are beneficial for archivists. With such a small staff, outsourced IT services, and no digital preservation policy in sight, it was time to start exploring how institutions of my size could manage legacy media and start planning for the born-digital archives that will continue to arrive. Since I hold a part-time position, I was able to undertake this exploration in my own time through the support provided by a scholarship from the Ian McLean Wards Memorial Trust in 2015
Sharing and Preserving Computational Analyses for Posterity with encapsulator
Open data and open-source software may be part of the solution to science's
"reproducibility crisis", but they are insufficient to guarantee
reproducibility. Requiring minimal end-user expertise, encapsulator creates a
"time capsule" with reproducible code in a self-contained computational
environment. encapsulator provides end-users with a fully-featured desktop
environment for reproducible research.Comment: 11 pages, 6 figure
The Research Object Suite of Ontologies: Sharing and Exchanging Research Data and Methods on the Open Web
Research in life sciences is increasingly being conducted in a digital and
online environment. In particular, life scientists have been pioneers in
embracing new computational tools to conduct their investigations. To support
the sharing of digital objects produced during such research investigations, we
have witnessed in the last few years the emergence of specialized repositories,
e.g., DataVerse and FigShare. Such repositories provide users with the means to
share and publish datasets that were used or generated in research
investigations. While these repositories have proven their usefulness,
interpreting and reusing evidence for most research results is a challenging
task. Additional contextual descriptions are needed to understand how those
results were generated and/or the circumstances under which they were
concluded. Because of this, scientists are calling for models that go beyond
the publication of datasets to systematically capture the life cycle of
scientific investigations and provide a single entry point to access the
information about the hypothesis investigated, the datasets used, the
experiments carried out, the results of the experiments, the people involved in
the research, etc. In this paper we present the Research Object (RO) suite of
ontologies, which provide a structured container to encapsulate research data
and methods along with essential metadata descriptions. Research Objects are
portable units that enable the sharing, preservation, interpretation and reuse
of research investigation results. The ontologies we present have been designed
in the light of requirements that we gathered from life scientists. They have
been built upon existing popular vocabularies to facilitate interoperability.
Furthermore, we have developed tools to support the creation and sharing of
Research Objects, thereby promoting and facilitating their adoption.Comment: 20 page
Meeting the design challenges of nano-CMOS electronics: an introduction to an upcoming EPSRC pilot project
The years of âhappy scalingâ are over and the fundamental challenges that the semiconductor industry faces, at both technology and device level, will impinge deeply upon the design of future integrated circuits and systems. This paper provides an introduction to these challenges and gives an overview of the Grid infrastructure that will be developed as part of a recently funded EPSRC pilot project to address them, and we hope, which will revolutionise the electronics design industry
- âŠ