15,670 research outputs found
PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems
Machine Learning models are often composed of pipelines of transformations.
While this design allows to efficiently execute single model components at
training time, prediction serving has different requirements such as low
latency, high throughput and graceful performance degradation under heavy load.
Current prediction serving systems consider models as black boxes, whereby
prediction-time-specific optimizations are ignored in favor of ease of
deployment. In this paper, we present PRETZEL, a prediction serving system
introducing a novel white box architecture enabling both end-to-end and
multi-model optimizations. Using production-like model pipelines, our
experiments show that PRETZEL is able to introduce performance improvements
over different dimensions; compared to state-of-the-art approaches PRETZEL is
on average able to reduce 99th percentile latency by 5.5x while reducing memory
footprint by 25x, and increasing throughput by 4.7x.Comment: 16 pages, 14 figures, 13th USENIX Symposium on Operating Systems
Design and Implementation (OSDI), 201
Improving the Scalability of DPWS-Based Networked Infrastructures
The Devices Profile for Web Services (DPWS) specification enables seamless
discovery, configuration, and interoperability of networked devices in various
settings, ranging from home automation and multimedia to manufacturing
equipment and data centers. Unfortunately, the sheer simplicity of event
notification mechanisms that makes it fit for resource-constrained devices,
makes it hard to scale to large infrastructures with more stringent
dependability requirements, ironically, where self-configuration would be most
useful. In this report, we address this challenge with a proposal to integrate
gossip-based dissemination in DPWS, thus maintaining compatibility with
original assumptions of the specification, and avoiding a centralized
configuration server or custom black-box middleware components. In detail, we
show how our approach provides an evolutionary and non-intrusive solution to
the scalability limitations of DPWS and experimentally evaluate it with an
implementation based on the the Web Services for Devices (WS4D) Java Multi
Edition DPWS Stack (JMEDS).Comment: 28 pages, Technical Repor
Harnessing the Power of Many: Extensible Toolkit for Scalable Ensemble Applications
Many scientific problems require multiple distinct computational tasks to be
executed in order to achieve a desired solution. We introduce the Ensemble
Toolkit (EnTK) to address the challenges of scale, diversity and reliability
they pose. We describe the design and implementation of EnTK, characterize its
performance and integrate it with two distinct exemplar use cases: seismic
inversion and adaptive analog ensembles. We perform nine experiments,
characterizing EnTK overheads, strong and weak scalability, and the performance
of two use case implementations, at scale and on production infrastructures. We
show how EnTK meets the following general requirements: (i) implementing
dedicated abstractions to support the description and execution of ensemble
applications; (ii) support for execution on heterogeneous computing
infrastructures; (iii) efficient scalability up to O(10^4) tasks; and (iv)
fault tolerance. We discuss novel computational capabilities that EnTK enables
and the scientific advantages arising thereof. We propose EnTK as an important
addition to the suite of tools in support of production scientific computing
Cloudy Forecast: How Predictable is Communication Latency in the Cloud?
Many systems and services rely on timing assumptions for performance and
availability to perform critical aspects of their operation, such as various
timeouts for failure detectors or optimizations to concurrency control
mechanisms. Many such assumptions rely on the ability of different components
to communicate on time -- a delay in communication may trigger the failure
detector or cause the system to enter a less-optimized execution mode.
Unfortunately, these timing assumptions are often set with little regard to
actual communication guarantees of the underlying infrastructure -- in
particular, the variability of communication delays between processes in
different nodes/servers. The higher communication variability holds especially
true for systems deployed in the public cloud since the cloud is a utility
shared by many users and organizations, making it prone to higher performance
variance due to noisy neighbor syndrome. In this work, we present Cloud Latency
Tester (CLT), a simple tool that can help measure the variability of
communication delays between nodes to help engineers set proper values for
their timing assumptions. We also provide our observational analysis of running
CLT in three major cloud providers and share the lessons we learned
An objective based classification of aggregation techniques for wireless sensor networks
Wireless Sensor Networks have gained immense popularity in recent years due to their ever increasing capabilities and wide range of critical applications. A huge body of research efforts has been dedicated to find ways to utilize limited resources of these sensor nodes in an efficient manner. One of the common ways to minimize energy consumption has been aggregation of input data. We note that every aggregation technique has an improvement objective to achieve with respect to the output it produces. Each technique is designed to achieve some target e.g. reduce data size, minimize transmission energy, enhance accuracy etc. This paper presents a comprehensive survey of aggregation techniques that can be used in distributed manner to improve lifetime and energy conservation of wireless sensor networks. Main contribution of this work is proposal of a novel classification of such techniques based on the type of improvement they offer when applied to WSNs. Due to the existence of a myriad of definitions of aggregation, we first review the meaning of term aggregation that can be applied to WSN. The concept is then associated with the proposed classes. Each class of techniques is divided into a number of subclasses and a brief literature review of related work in WSN for each of these is also presented
Run Time Approximation of Non-blocking Service Rates for Streaming Systems
Stream processing is a compute paradigm that promises safe and efficient
parallelism. Modern big-data problems are often well suited for stream
processing's throughput-oriented nature. Realization of efficient stream
processing requires monitoring and optimization of multiple communications
links. Most techniques to optimize these links use queueing network models or
network flow models, which require some idea of the actual execution rate of
each independent compute kernel within the system. What we want to know is how
fast can each kernel process data independent of other communicating kernels.
This is known as the "service rate" of the kernel within the queueing
literature. Current approaches to divining service rates are static. Modern
workloads, however, are often dynamic. Shared cloud systems also present
applications with highly dynamic execution environments (multiple users,
hardware migration, etc.). It is therefore desirable to continuously re-tune an
application during run time (online) in response to changing conditions. Our
approach enables online service rate monitoring under most conditions,
obviating the need for reliance on steady state predictions for what are
probably non-steady state phenomena. First, some of the difficulties associated
with online service rate determination are examined. Second, the algorithm to
approximate the online non-blocking service rate is described. Lastly, the
algorithm is implemented within the open source RaftLib framework for
validation using a simple microbenchmark as well as two full streaming
applications.Comment: technical repor
- …