MrLazy: Lazy runtime label propagation for MapReduce
Organisations are starting to publish datasets containing potentially sensitive information in the Cloud; it is therefore important that there is a clear audit trail showing that the involved parties respect data-sharing laws and policies. Information Flow Control (IFC) has been proposed as a solution. However, fine-grained IFC has various deployment challenges and runtime overhead issues that have so far limited wide adoption. In this paper we present MrLazy, a system that practically addresses some of these issues for MapReduce. Within one trust domain, we relax the need to continuously check policies; we instead rely on lineage (information about the origin of a piece of data) as a mechanism to retrospectively apply policies on demand. We show that MrLazy imposes manageable temporal and spatial overheads while enabling fine-grained data regulation.
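The lazy, lineage-based checking the abstract describes can be illustrated with a small sketch: records carry lineage (origin identifiers) through the job instead of being label-checked at every step, and policies are applied retrospectively when data is released from the trust domain. All names here are illustrative assumptions, not MrLazy's actual API.

```python
# Hypothetical sketch of lazy, lineage-based policy checking in the spirit
# of MrLazy. Records are (value, lineage) pairs; transforms propagate
# lineage without any policy check, and policies are applied on demand at
# release time.

def run_job(records, transform):
    """Run a MapReduce-style transform, propagating lineage lazily."""
    return [(transform(value), lineage) for value, lineage in records]

def release(records, policies):
    """On-demand policy check: drop records whose lineage forbids release."""
    return [
        (value, lineage)
        for value, lineage in records
        if all(policies.get(src, lambda v: True)(value) for src in lineage)
    ]

# Two sources: one public, one whose policy forbids releasing negative values.
data = [(5, {"public"}), (-3, {"sensitive"}), (7, {"sensitive"})]
policies = {"sensitive": lambda v: v >= 0}

doubled = run_job(data, lambda v: v * 2)
released = release(doubled, policies)  # the -3 record is filtered out
```

The point of the design is visible in `run_job`: the hot path does no policy work at all, deferring the cost to the (usually much rarer) release step.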
Accelerating the Configuration Tuning of Big Data Analytics with Similarity-aware Multitask Bayesian Optimization
One of the key challenges for data analytics deployment is configuration tuning. Existing approaches to configuration tuning are expensive and overlook the dynamic characteristics of the analytics environment (i.e. frequent changes in workload due to evolving input sizes, or changes in the underlying cluster environment). Such workload/environment changes can cause significant performance degradation, while retuning the configuration to accommodate them can yield up to 85% potential execution-time savings.
We propose SimTune, an approach that accommodates such changes through efficient configuration tuning. SimTune combines workload characterization and multitask Bayesian optimization to identify similarity across workloads and accelerate finding near-optimal configurations. Our experimental results show that SimTune reduces the search time for finding close-to-optimal configurations by 56-73% (at the median) compared to existing state-of-the-art techniques. This means that the amortization of the tuning cost happens significantly faster, enabling practical tuning in the rapidly changing environment of distributed analytics.
To Tune or Not to Tune?: In Search of Optimal Configurations for Data Analytics
This experimental study presents several overlooked issues that pose a challenge for data analytics configuration tuning and deployment: (1) the assumption of a static workload/environment, which ignores the dynamic characteristics of the analytics environment (e.g. the frequent need for workload retuning); (2) the speed of tuning-cost amortization and how this influences the tuning decision; (3) the need for comprehensive incremental tuning across a diverse set of workloads.
To prove our point, we present Tuneful, an efficient configuration tuning framework for data analytics. We show how it is designed to overcome the above issues and illustrate its applicability by experimenting with it on two cloud service providers.
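Issue (2) above, tuning-cost amortization, reduces to a simple break-even calculation: tuning pays off only if the per-run saving, over the runs that remain before the workload changes again, outweighs the cost of the tuning runs themselves. The numbers below are illustrative, not from the study.

```python
# Back-of-the-envelope amortization check for the "to tune or not to tune"
# decision. All figures are made-up examples.

def tuning_pays_off(saving_per_run_s, remaining_runs, tuning_cost_s):
    """True if expected total saving exceeds the cost of tuning."""
    return saving_per_run_s * remaining_runs > tuning_cost_s

# Saving 40 s/run over 100 remaining runs, after spending 3000 s on tuning.
decision = tuning_pays_off(40, 100, 3000)  # 4000 s saved > 3000 s spent
```

The dynamic-environment point follows directly: if the workload changes after only a few runs, `remaining_runs` shrinks and the same tuning cost may never amortize.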
Shadow kernels: A general mechanism for kernel specialization in existing operating systems
Existing operating systems share a common kernel text section amongst all processes. Despite the benefits of kernel specialization for performance-guided optimization, exokernels, kernel fastpaths, and cheaper hardware access, it is not possible to specialize or tune the kernel so that different applications execute text optimized for their kernel use. Current specialization primitives involve system-wide changes to kernel text, which can have adverse effects on other processes sharing the kernel due to the global side effects. We present shadow kernels: a primitive that allows multiple kernel text sections to coexist in a contemporary operating system. By remapping kernel virtual memory on a context switch, or for individual system calls, we specialize the kernel on a fine-grained basis. Our implementation of shadow kernels uses the Xen hypervisor, so it can be applied to any operating system that runs on Xen.

This work was principally supported by internal funds from the Computer Laboratory at the University of Cambridge, and also by the Engineering and Physical Sciences Research Council [grant number EP/K503009/1]. This is the final version of the article. It first appeared from ACM via http://dx.doi.org/10.1145/2797022.279702
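The core idea, several kernel text variants coexisting, with the context switch selecting which one a process sees, can be caricatured in a few lines. This is a purely conceptual Python sketch with invented names; the real mechanism remaps kernel virtual memory under Xen and involves none of this code.

```python
# Conceptual model of shadow kernels: each process is bound to its own
# variant of a kernel routine, and dispatch at "context switch" / syscall
# time selects that variant instead of a single shared text section.

def generic_read(n):
    return f"generic read of {n} bytes"

def fastpath_read(n):
    return f"fastpath read of {n} bytes"

# Per-process "shadow" text: which variant of the routine each process sees.
shadow_map = {"pid_1": generic_read, "pid_2": fastpath_read}

def syscall_read(pid, n):
    """Dispatch to the kernel text variant bound to this process."""
    return shadow_map[pid](n)
```

The property the paper cares about falls out of the model: rebinding `pid_2` to a specialized variant changes nothing for `pid_1`, avoiding the global side effects of patching shared kernel text.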
Structural analysis of whole-system provenance graphs
Whole-system provenance generates traces captured from various systems; a natural representation for reasoning about these traces is a graph. These graphs are not well understood, and current work focuses on their extraction and processing without a thorough characterization being in place. This paper studies the topology of such graphs. We analyze multiple whole-system provenance graphs and show that they exhibit a hubs-and-authorities structure as well as a power-law degree distribution. Our observations allow for a novel understanding of the structure of whole-system provenance graphs.
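One structural check of the kind described above can be sketched directly: compute the node-degree distribution and estimate its slope on log-log axes, where an approximately straight line with negative slope is consistent with a power law. The tiny graph below is a toy stand-in for a whole-system provenance graph, not real captured data.

```python
import math
from collections import Counter

# Estimate the log-log slope of a graph's degree distribution; a clear
# negative linear trend is the usual first hint of power-law structure.

def degree_distribution(edges):
    """Map degree -> number of nodes with that degree (undirected count)."""
    deg = Counter()
    for src, dst in edges:
        deg[src] += 1
        deg[dst] += 1
    return Counter(deg.values())

def loglog_slope(dist):
    """Least-squares slope of log(count) versus log(degree)."""
    pts = [(math.log(d), math.log(c)) for d, c in dist.items()]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    num = sum((x - mx) * (y - my) for x, y in pts)
    den = sum((x - mx) ** 2 for x, _ in pts)
    return num / den

# A hub node plus a sparse periphery: many low-degree nodes, one high-degree.
edges = [("hub", f"n{i}") for i in range(8)] + [("n0", "n1"), ("n2", "n3")]
slope = loglog_slope(degree_distribution(edges))  # negative for this shape
```

A rigorous power-law claim needs a proper fit over a much larger graph (e.g. maximum-likelihood estimation of the exponent), but this captures the basic measurement.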
A primer on provenance
Better understanding data requires tracking its history and context.
Soroban: Attributing latency in virtualized environments
Applications running in the cloud have highly variable response times due to the lack of perfect performance isolation from other services served by common infrastructure. In particular, response latency when executing on a loaded hypervisor or in a container is substantially higher than uncontested bare-metal performance. While efforts to increase performance isolation continue, we present Soroban, a framework for attributing latency to either the cloud provider or their customer. Soroban allows cloud providers to instrument commonly used programs, such as a web server, to determine, for each request, how much of the latency is due to the cloud provider and how much to the customer. We apply Soroban to an HTTP server and show that it identifies when the cause of latency is a provider-induced activity, such as underprovisioning a host, or is due to the software run by the customer.

This is the author accepted manuscript. The final version is available from USENIX via https://www.usenix.org/conference/hotcloud15/workshop-program/presentation/sne
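The attribution the abstract describes can be sketched as a per-request split: time the request spent descheduled by the hypervisor is charged to the provider, and the remainder to the customer's own software. The field names and the simple subtraction are assumptions for illustration, not Soroban's actual accounting.

```python
# Hypothetical per-request latency attribution in the spirit of Soroban:
# split wall-clock latency into provider-attributable time (e.g. time the
# vCPU was scheduled out) and the customer-attributable remainder.

def attribute_latency(total_ms, scheduled_out_ms):
    """Split a request's latency between provider and customer."""
    provider = min(scheduled_out_ms, total_ms)  # cannot exceed the total
    customer = total_ms - provider
    return {"provider_ms": provider, "customer_ms": customer}

# A 120 ms request during which the vCPU was descheduled for 45 ms.
split = attribute_latency(120, 45)
```

Instrumenting a web server with this split lets an operator answer, per request, the question the paper poses: is the latency the provider's fault or the customer's?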
Applying provenance in APT monitoring and analysis: Practical challenges for scalable, efficient and trustworthy distributed provenance
Advanced Persistent Threats (APT) are a class of security threats in which a well-resourced attacker targets a specific individual or organisation with a predefined goal. This typically involves exfiltration of confidential material, although increasingly attacks target the encryption or destruction of mission-critical data. With traditional prevention and detection mechanisms failing to stem the tide of such attacks, there is a pressing need for new monitoring and analysis tools that reduce both false-positive rates and the cognitive burden on human analysts. We propose that local and distributed provenance metadata can simplify and improve monitoring and analysis of APTs by providing a single, authoritative sequence of events that captures the context (and side effects) of potentially malicious activities. Provenance metadata allows a human analyst to backtrack from detection of malicious activity to the point of intrusion and, similarly, to work forward to fully understand the consequences. Applying provenance to APT monitoring and analysis introduces some significantly different challenges and requirements in comparison to more traditional applications. Drawing from our experiences working with and adapting the OPUS (Observed Provenance in User Space) system to an APT monitoring and analysis use case, we introduce and discuss some of the key challenges in this space. These preliminary observations are intended to prime a discussion within the community about the design space for scalable, efficient and trustworthy distributed provenance for scenarios that impose different constraints from traditional provenance applications such as workflow and data processing frameworks.
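The backtracking step described above is, at its core, a graph traversal: from a detected malicious artifact, walk provenance edges toward parents until the point of intrusion is reached. The toy graph and node names below are invented for illustration and are not OPUS output.

```python
from collections import deque

# Minimal provenance backtracking: breadth-first search over child->parent
# edges from a detection point, recovering every ancestor event/object.

def backtrack(parents, start):
    """Return all ancestors of `start` in the provenance graph."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

# Toy incident: a phishing email leads, via a dropper, to staged exfiltration.
parents = {
    "exfil.tar": ["staging_dir"],
    "staging_dir": ["dropper.sh"],
    "dropper.sh": ["phishing_email"],
}
origin = backtrack(parents, "exfil.tar")  # includes "phishing_email"
```

Forward analysis ("understand the consequences") is the same traversal over the reversed edges; the scalability challenges the paper raises arise because real whole-system graphs contain millions of such nodes.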
Data provenance to audit compliance with privacy policy in the Internet of Things
Managing privacy in the IoT presents a significant challenge. We make the case that information obtained by auditing the flows of data can assist in demonstrating that the systems handling personal data satisfy regulatory and user requirements. Thus, components handling personal data should be audited to demonstrate that their actions comply with all such policies and requirements. A valuable side effect of this approach is that such an auditing process will highlight areas where technical enforcement has been incompletely or incorrectly specified. There is a clear role for technical assistance in aligning privacy policy enforcement mechanisms with data protection regulations. The first step necessary in producing technology to accomplish this alignment is to gather evidence of data flows. We describe our work producing, representing and querying audit data and discuss outstanding challenges.
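Auditing flows against a policy, as described above, amounts to checking each recorded flow for a policy entry that permits it. The record fields and policy shape below are assumptions for illustration, not a real IoT audit format.

```python
# Illustrative compliance audit over recorded data flows: each flow names a
# receiving component and a category of personal data; flows with no
# permitting policy entry are flagged as violations.

def audit(flows, policy):
    """Return the flows the policy does not permit."""
    return [f for f in flows if f["category"] not in policy.get(f["to"], set())]

policy = {"analytics": {"usage"}, "billing": {"usage", "identity"}}
flows = [
    {"to": "analytics", "category": "usage"},
    {"to": "analytics", "category": "location"},  # not permitted -> flagged
]
violations = audit(flows, policy)
```

As the abstract notes, the flagged flows are valuable in both directions: they expose non-compliant components, and they expose places where the policy itself was incompletely specified.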