882 research outputs found
DataHub: Collaborative Data Science & Dataset Version Management at Scale
Relational databases have limited support for data collaboration, where teams
collaboratively curate and analyze large datasets. Inspired by software version
control systems like git, we propose (a) a dataset version control system,
giving users the ability to create, branch, merge, difference and search large,
divergent collections of datasets, and (b) a platform, DataHub, that gives
users the ability to perform collaborative data analysis building on this
version control system. We outline the challenges in providing dataset version
control at scale.Comment: 7 page
Scaling Causality Analysis for Production Systems.
Causality analysis reveals how program values influence each other.
It is important for debugging, optimizing, and understanding the execution of
programs. This thesis scales causality analysis to production systems
consisting of desktop and server applications as well as large-scale Internet
services. This enables developers to employ causality analysis to debug and
optimize complex, modern software systems. This thesis shows that it is
possible to scale causality analysis to both fine-grained instruction level
analysis and analysis of Internet scale distributed systems with thousands of
discrete software components by developing and employing automated methods to
observe and reason about causality.
First, we observe causality at a fine-grained instruction level by developing
the first taint tracking framework to support tracking millions of input
sources. We also introduce flexible taint tracking to allow
for scoping different queries and dynamic filtering of inputs, outputs, and
relationships.
Next, we introduce the Mystery Machine, which uses a ``big data'' approach to
discover causal relationships between software components in a large-scale
Internet service. We leverage the fact that large-scale Internet services
receive a large number of requests in order to observe counterexamples to
hypothesized causal relationships. Using discovered casual relationships, we
identify the critical path for request execution and use the critical path
analysis to explore potential scheduling optimizations.
Finally, we explore using causality to make data-quality tradeoffs in
Internet services. A data-quality tradeoff is an explicit decision by a software
component to return lower-fidelity data in order to improve response time or
minimize resource usage. We perform a study of data-quality tradeoffs in a
large-scale Internet service to show the pervasiveness of these
tradeoffs. We develop DQBarge, a system that enables better data-quality
tradeoffs by propagating critical information along the causal path of request
processing. Our evaluation shows that DQBarge helps Internet services mitigate
load spikes, improve utilization of spare resources, and implement dynamic
capacity planning.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/135888/1/mcchow_1.pd
Enforcement and Spectrum Sharing: Case Studies of Federal-Commercial Sharing
To promote economic growth and unleash the potential of wireless broadband, there is a need to introduce more spectrally efficient technologies and spectrum management regimes. That led to an environment where commercial wireless broadband need to share spectrum with the federal and non-federal operations. Implementing sharing regimes on a non-opportunistic basis means that sharing agreements must be implemented. To have meaning, those agreements must be enforceable.\ud
\ud
With the significant exception of license-free wireless systems, commercial wireless services are based on exclusive use. With the policy change facilitating spectrum sharing, it becomes necessary to consider how sharing might take place in practice. Beyond the technical aspects of sharing, that must be resolved lie questions about how usage rights are appropriately determined and enforced. This paper is reasoning about enforcement in a particular spectrum bands (1695-1710 MHz and 3.5 GHz) that are currently being proposed for sharing between commercial services and incumbent spectrum users in the US. We examine three enforcement approaches, exclusion zones, protection zones and pure ex post and consider their implications in terms of cost elements, opportunity cost, and their adaptability
Recommended from our members
Camflow: Managed Data-Sharing for Cloud Services
A model of cloud services is emerging whereby a few trusted providers manage the underlying hardware and communications whereas many companies build on this infrastructure to offer higher level, cloud-hosted PaaS services and/or SaaS applications. From the start, strong isolation between cloud tenants was seen to be of paramount importance, provided first by virtual machines (VM) and later by containers, which share the operating system (OS) kernel. Increasingly it is the case that applications also require facilities to effect isolation and protection of data managed by those applications. They also require flexible data sharing with other applications, often across the traditional cloud-isolation boundaries; for example, when government, consisting of different departments, provides services to its citizens through a common platform. These concerns relate to the management of data. Traditional access control is application and principal/role specific, applied at policy enforcement points, after which there is no subsequent control over where data flows;a crucial issue once data has left its owner's control by cloud-hosted applications andwithin cloud-services. Information Flow Control (IFC), in addition, offers system-wide, end-To-end, flow control based on the properties of the data. We discuss the potential of cloud-deployed IFC for enforcing owners' data flow policy with regard to protection and sharing, aswell as safeguarding against malicious or buggy software. In addition, the audit log associated with IFC provides transparency and offers system-wide visibility over data flows. This helps those responsible to meet their data management obligations, providing evidence of compliance, and aids in the identification ofpolicy errors and misconfigurations. We present our IFC model and describe and evaluate our IFC architecture and implementation (CamFlow). This comprises an OS level implementation of IFC with support for application management, together with an IFC-enabled middleware.This work was supported by UK Engineering and Physical Sciences Research Council grant EP/K011510 CloudSafetyNet: End-to-End Application Security in the Cloud. We acknowledge the support of Microsoft through the Microsoft Cloud Computing Research Centre
CamFlow: Managed Data-sharing for Cloud Services
A model of cloud services is emerging whereby a few trusted providers manage
the underlying hardware and communications whereas many companies build on this
infrastructure to offer higher level, cloud-hosted PaaS services and/or SaaS
applications. From the start, strong isolation between cloud tenants was seen
to be of paramount importance, provided first by virtual machines (VM) and
later by containers, which share the operating system (OS) kernel. Increasingly
it is the case that applications also require facilities to effect isolation
and protection of data managed by those applications. They also require
flexible data sharing with other applications, often across the traditional
cloud-isolation boundaries; for example, when government provides many related
services for its citizens on a common platform. Similar considerations apply to
the end-users of applications. But in particular, the incorporation of cloud
services within `Internet of Things' architectures is driving the requirements
for both protection and cross-application data sharing.
These concerns relate to the management of data. Traditional access control
is application and principal/role specific, applied at policy enforcement
points, after which there is no subsequent control over where data flows; a
crucial issue once data has left its owner's control by cloud-hosted
applications and within cloud-services. Information Flow Control (IFC), in
addition, offers system-wide, end-to-end, flow control based on the properties
of the data. We discuss the potential of cloud-deployed IFC for enforcing
owners' dataflow policy with regard to protection and sharing, as well as
safeguarding against malicious or buggy software. In addition, the audit log
associated with IFC provides transparency, giving configurable system-wide
visibility over data flows. [...]Comment: 14 pages, 8 figure
Towards automated provenance collection for runtime models to record system history
In highly dynamic environments, systems are expected to make decisions on the fly based on their observations that are bound to be partial. As such, the reasons for its runtime behaviour may be difficult to understand. In these cases, accountability is crucial, and decisions by the system need to be traceable. Logging is essential to support explanations of behaviour, but it poses challenges. Concerns about analysing massive logs have motivated the introduction of structured logging, however, knowing what to log and which details to include is still a challenge. Structured logs still do not necessarily relate events to each other, or indicate time intervals. We argue that logging changes to a runtime model in a provenance graph can mitigate some of these problems. The runtime model keeps only relevant details, therefore reducing the volume of the logs, while the provenance graph records causal connections between the changes and the activities performed by the agents in the system that have introduced them. In this paper, we demonstrate a first version towards a reusable infrastructure for the automated construction of such a provenance graph. We apply it to a multithreaded traffic simulation case study, with multiple concurrent agents managing different parts of the simulation. We show how the provenance graphs can support validating the system behaviour, and how a seeded fault is reflected in the provenance graphs
- …