MrLazy: Lazy runtime label propagation for MapReduce
Organisations are starting to publish datasets containing potentially sensitive information in the Cloud; it is therefore important that there is a clear audit trail showing that the involved parties respect data-sharing laws and policies. Information Flow Control (IFC) has been proposed as a solution. However, fine-grained IFC has various deployment challenges and runtime overhead issues that have so far limited wide adoption. In this paper we present MrLazy, a system that practically addresses some of these issues for MapReduce. Within one trust domain, we relax the need to continuously check policies; we instead rely on lineage (information about the origin of a piece of data) as a mechanism to retrospectively apply policies on demand. We show that MrLazy imposes manageable temporal and spatial overheads while enabling fine-grained data regulation.
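The lazy, lineage-based checking the abstract describes can be illustrated with a small sketch: records carry lineage (origin identifiers) through the job instead of being label-checked at every step, and policies are applied retrospectively when data is released from the trust domain. All names here are illustrative assumptions, not MrLazy's actual API.

```python
# Hypothetical sketch of lazy, lineage-based policy checking in the spirit
# of MrLazy. Records are (value, lineage) pairs; transforms propagate
# lineage without any policy check, and policies are applied on demand at
# release time.

def run_job(records, transform):
    """Run a MapReduce-style transform, propagating lineage lazily."""
    return [(transform(value), lineage) for value, lineage in records]

def release(records, policies):
    """On-demand policy check: drop records whose lineage forbids release."""
    return [
        (value, lineage)
        for value, lineage in records
        if all(policies.get(src, lambda v: True)(value) for src in lineage)
    ]

# Two sources: one public, one whose policy forbids releasing negative values.
data = [(5, {"public"}), (-3, {"sensitive"}), (7, {"sensitive"})]
policies = {"sensitive": lambda v: v >= 0}

doubled = run_job(data, lambda v: v * 2)
released = release(doubled, policies)  # the -3 record is filtered out
```

The point of the design is visible in `run_job`: the hot path does no policy work at all, deferring the cost to the (usually much rarer) release step.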
Accelerating the Configuration Tuning of Big Data Analytics with Similarity-aware Multitask Bayesian Optimization
One of the key challenges for data analytics deployment is configuration tuning. Existing approaches to configuration tuning are expensive and overlook the dynamic characteristics of the analytics environment (i.e. frequent changes in workload due to evolving input sizes, or changes in the underlying cluster environment). Such workload/environment changes can cause significant performance degradation, while retuning the configuration to accommodate them can yield up to 85% potential execution-time savings.
We propose SimTune, an approach that accommodates such changes through efficient configuration tuning. SimTune combines workload characterization and multitask Bayesian optimization to identify similarity across workloads and accelerate finding near-optimal configurations. Our experimental results show that SimTune reduces the search time for finding close-to-optimal configurations by 56-73% (at the median) compared to existing state-of-the-art techniques. This means that the amortization of the tuning cost happens significantly faster, enabling practical tuning in the rapidly changing environment of distributed analytics.
To Tune or Not to Tune?: In Search of Optimal Configurations for Data Analytics
This experimental study presents several overlooked issues that pose a challenge for data analytics configuration tuning and deployment: (1) the assumption of a static workload/environment, which ignores the dynamic characteristics of the analytics environment (e.g. the frequent need for workload retuning); (2) the speed of tuning-cost amortization and how this influences the tuning decision; (3) the need for comprehensive incremental tuning across a diverse set of workloads.
To prove our point, we present Tuneful, an efficient configuration tuning framework for data analytics. We show how it is designed to overcome the above issues and illustrate its applicability by experimenting with it on two cloud service providers.
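Issue (2) above, tuning-cost amortization, reduces to a simple break-even calculation: tuning pays off only if the per-run saving, over the runs that remain before the workload changes again, outweighs the cost of the tuning runs themselves. The numbers below are illustrative, not from the study.

```python
# Back-of-the-envelope amortization check for the "to tune or not to tune"
# decision. All figures are made-up examples.

def tuning_pays_off(saving_per_run_s, remaining_runs, tuning_cost_s):
    """True if expected total saving exceeds the cost of tuning."""
    return saving_per_run_s * remaining_runs > tuning_cost_s

# Saving 40 s/run over 100 remaining runs, after spending 3000 s on tuning.
decision = tuning_pays_off(40, 100, 3000)  # 4000 s saved > 3000 s spent
```

The dynamic-environment point follows directly: if the workload changes after only a few runs, `remaining_runs` shrinks and the same tuning cost may never amortize.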
Shadow kernels: A general mechanism for kernel specialization in existing operating systems
Existing operating systems share a common kernel text section amongst all processes. Despite the benefits of kernel specialization for performance-guided optimization, exokernels, kernel fastpaths, and cheaper hardware access, it is not possible to specialize or tune the kernel so that different applications execute text optimized for their kernel use. Current specialization primitives involve system-wide changes to kernel text, which can have adverse effects on other processes sharing the kernel due to the global side effects. We present shadow kernels: a primitive that allows multiple kernel text sections to coexist in a contemporary operating system. By remapping kernel virtual memory on a context switch, or for individual system calls, we specialize the kernel on a fine-grained basis. Our implementation of shadow kernels uses the Xen hypervisor, so it can be applied to any operating system that runs on Xen.

This work was principally supported by internal funds from the Computer Laboratory at the University of Cambridge, and also by the Engineering and Physical Sciences Research Council [grant number EP/K503009/1]. This is the final version of the article. It first appeared from ACM via http://dx.doi.org/10.1145/2797022.279702
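The core idea, several kernel text variants coexisting, with the context switch selecting which one a process sees, can be caricatured in a few lines. This is a purely conceptual Python sketch with invented names; the real mechanism remaps kernel virtual memory under Xen and involves none of this code.

```python
# Conceptual model of shadow kernels: each process is bound to its own
# variant of a kernel routine, and dispatch at "context switch" / syscall
# time selects that variant instead of a single shared text section.

def generic_read(n):
    return f"generic read of {n} bytes"

def fastpath_read(n):
    return f"fastpath read of {n} bytes"

# Per-process "shadow" text: which variant of the routine each process sees.
shadow_map = {"pid_1": generic_read, "pid_2": fastpath_read}

def syscall_read(pid, n):
    """Dispatch to the kernel text variant bound to this process."""
    return shadow_map[pid](n)
```

The property the paper cares about falls out of the model: rebinding `pid_2` to a specialized variant changes nothing for `pid_1`, avoiding the global side effects of patching shared kernel text.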
Structural analysis of whole-system provenance graphs
Whole-system provenance generates traces captured from various systems; a natural representation for reasoning about these traces is a graph. These graphs are not well understood, and current work focuses on their extraction and processing without a thorough characterization being in place. This paper studies the topology of such graphs. We analyze multiple whole-system provenance graphs and show that they exhibit a hubs-and-authorities structure as well as a power-law degree distribution. Our observations allow for a novel understanding of the structure of whole-system provenance graphs.
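One structural check of the kind described above can be sketched directly: compute the node-degree distribution and estimate its slope on log-log axes, where an approximately straight line with negative slope is consistent with a power law. The tiny graph below is a toy stand-in for a whole-system provenance graph, not real captured data.

```python
import math
from collections import Counter

# Estimate the log-log slope of a graph's degree distribution; a clear
# negative linear trend is the usual first hint of power-law structure.

def degree_distribution(edges):
    """Map degree -> number of nodes with that degree (undirected count)."""
    deg = Counter()
    for src, dst in edges:
        deg[src] += 1
        deg[dst] += 1
    return Counter(deg.values())

def loglog_slope(dist):
    """Least-squares slope of log(count) versus log(degree)."""
    pts = [(math.log(d), math.log(c)) for d, c in dist.items()]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    num = sum((x - mx) * (y - my) for x, y in pts)
    den = sum((x - mx) ** 2 for x, _ in pts)
    return num / den

# A hub node plus a sparse periphery: many low-degree nodes, one high-degree.
edges = [("hub", f"n{i}") for i in range(8)] + [("n0", "n1"), ("n2", "n3")]
slope = loglog_slope(degree_distribution(edges))  # negative for this shape
```

A rigorous power-law claim needs a proper fit over a much larger graph (e.g. maximum-likelihood estimation of the exponent), but this captures the basic measurement.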
A primer on provenance
Better understanding data requires tracking its history and context.
Soroban: Attributing latency in virtualized environments
Applications running in the cloud have highly variable response times due to the lack of perfect performance isolation from other services served by common infrastructure. In particular, response latency when executing on a loaded hypervisor or in a container is substantially higher than uncontested bare-metal performance. While efforts to increase performance isolation continue, we present Soroban, a framework for attributing latency to either the cloud provider or their customer. Soroban allows cloud providers to instrument commonly used programs, such as a web server, to determine, for each request, how much of the latency is due to the cloud provider and how much to the customer. We apply Soroban to an HTTP server and show that it identifies when the cause of latency is a provider-induced activity, such as underprovisioning a host, or is due to the software run by the customer.

This is the author accepted manuscript. The final version is available from USENIX via https://www.usenix.org/conference/hotcloud15/workshop-program/presentation/sne
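The attribution the abstract describes can be sketched as a per-request split: time the request spent descheduled by the hypervisor is charged to the provider, and the remainder to the customer's own software. The field names and the simple subtraction are assumptions for illustration, not Soroban's actual accounting.

```python
# Hypothetical per-request latency attribution in the spirit of Soroban:
# split wall-clock latency into provider-attributable time (e.g. time the
# vCPU was scheduled out) and the customer-attributable remainder.

def attribute_latency(total_ms, scheduled_out_ms):
    """Split a request's latency between provider and customer."""
    provider = min(scheduled_out_ms, total_ms)  # cannot exceed the total
    customer = total_ms - provider
    return {"provider_ms": provider, "customer_ms": customer}

# A 120 ms request during which the vCPU was descheduled for 45 ms.
split = attribute_latency(120, 45)
```

Instrumenting a web server with this split lets an operator answer, per request, the question the paper poses: is the latency the provider's fault or the customer's?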
Applying provenance in APT monitoring and analysis: Practical challenges for scalable, efficient and trustworthy distributed provenance
Advanced Persistent Threats (APT) are a class of security threats in which a well-resourced attacker targets a specific individual or organisation with a predefined goal. This typically involves exfiltration of confidential material, although increasingly attacks target the encryption or destruction of mission-critical data. With traditional prevention and detection mechanisms failing to stem the tide of such attacks, there is a pressing need for new monitoring and analysis tools that reduce both false-positive rates and the cognitive burden on human analysts. We propose that local and distributed provenance metadata can simplify and improve monitoring and analysis of APTs by providing a single, authoritative sequence of events that captures the context (and side effects) of potentially malicious activities. Provenance metadata allows a human analyst to backtrack from detection of malicious activity to the point of intrusion and, similarly, to work forward to fully understand the consequences. Applying provenance to APT monitoring and analysis introduces some significantly different challenges and requirements in comparison to more traditional applications. Drawing from our experiences working with and adapting the OPUS (Observed Provenance in User Space) system to an APT monitoring and analysis use case, we introduce and discuss some of the key challenges in this space. These preliminary observations are intended to prime a discussion within the community about the design space for scalable, efficient and trustworthy distributed provenance for scenarios that impose different constraints from traditional provenance applications such as workflow and data processing frameworks.
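The backtracking step described above is, at its core, a graph traversal: from a detected malicious artifact, walk provenance edges toward parents until the point of intrusion is reached. The toy graph and node names below are invented for illustration and are not OPUS output.

```python
from collections import deque

# Minimal provenance backtracking: breadth-first search over child->parent
# edges from a detection point, recovering every ancestor event/object.

def backtrack(parents, start):
    """Return all ancestors of `start` in the provenance graph."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

# Toy incident: a phishing email leads, via a dropper, to staged exfiltration.
parents = {
    "exfil.tar": ["staging_dir"],
    "staging_dir": ["dropper.sh"],
    "dropper.sh": ["phishing_email"],
}
origin = backtrack(parents, "exfil.tar")  # includes "phishing_email"
```

Forward analysis ("understand the consequences") is the same traversal over the reversed edges; the scalability challenges the paper raises arise because real whole-system graphs contain millions of such nodes.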
Data provenance to audit compliance with privacy policy in the Internet of Things
Managing privacy in the IoT presents a significant challenge. We make the case that information obtained by auditing the flows of data can assist in demonstrating that the systems handling personal data satisfy regulatory and user requirements. Thus, components handling personal data should be audited to demonstrate that their actions comply with all such policies and requirements. A valuable side effect of this approach is that such an auditing process will highlight areas where technical enforcement has been incompletely or incorrectly specified. There is a clear role for technical assistance in aligning privacy policy enforcement mechanisms with data protection regulations. The first step necessary in producing technology to accomplish this alignment is to gather evidence of data flows. We describe our work producing, representing and querying audit data and discuss outstanding challenges.
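Auditing flows against a policy, as described above, amounts to checking each recorded flow for a policy entry that permits it. The record fields and policy shape below are assumptions for illustration, not a real IoT audit format.

```python
# Illustrative compliance audit over recorded data flows: each flow names a
# receiving component and a category of personal data; flows with no
# permitting policy entry are flagged as violations.

def audit(flows, policy):
    """Return the flows the policy does not permit."""
    return [f for f in flows if f["category"] not in policy.get(f["to"], set())]

policy = {"analytics": {"usage"}, "billing": {"usage", "identity"}}
flows = [
    {"to": "analytics", "category": "usage"},
    {"to": "analytics", "category": "location"},  # not permitted -> flagged
]
violations = audit(flows, policy)
```

As the abstract notes, the flagged flows are valuable in both directions: they expose non-compliant components, and they expose places where the policy itself was incompletely specified.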