F2: Designing a Key-Value Store for Large Skewed Workloads
Today's key-value stores are either disk-optimized, focusing on large data
and saturating device IOPS, or memory-optimized, focusing on high throughput
with linear thread scaling assuming plenty of main memory. However, many
practical workloads demand high performance for read and write working sets
that are much larger than main memory, over a total data size that is even
larger. They require judicious use of memory and disk, and today's systems do
not handle such workloads well. We present F2, a new key-value store design
based on compartmentalization -- it consists of five key components that work
together in well-defined ways to achieve high throughput -- saturating disk and
memory bandwidths -- while incurring low disk read and write amplification. A
key design characteristic of F2 is that it separates the management of hot and
cold data, across the read and write domains, and adapts the use of memory to
optimize each case. Through a sequence of new latch-free system constructs, F2
solves the key challenge of maintaining high throughput with linear thread
scalability in such a compartmentalized design. Detailed experiments on
benchmark data validate our design's superiority, in terms of throughput, over
state-of-the-art key-value stores, when the available memory resources are
scarce.
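The abstract does not detail F2's five components, but the core idea of separating hot and cold data can be illustrated with a toy two-tier store (a minimal sketch with invented names, not F2's design): a bounded in-memory hot tier demotes least-recently-used records to a cold tier standing in for disk, and promotes records back on access.

```python
from collections import OrderedDict

class TieredStore:
    """Toy key-value store: hot records live in a bounded in-memory
    tier; records evicted from it move to a cold tier (a dict here,
    standing in for an on-disk structure)."""

    def __init__(self, hot_capacity):
        self.hot = OrderedDict()   # LRU-ordered in-memory tier
        self.cold = {}             # stand-in for the disk tier
        self.hot_capacity = hot_capacity

    def put(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)          # mark as most recently used
        while len(self.hot) > self.hot_capacity:
            old_key, old_val = self.hot.popitem(last=False)
            self.cold[old_key] = old_val   # demote cold record

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        if key in self.cold:               # read-hot record: promote it
            self.put(key, self.cold.pop(key))
            return self.hot[key]
        return None
```

The real system additionally manages separate read and write domains and uses latch-free structures for thread scalability, which this single-threaded sketch omits.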
Blox: A Modular Toolkit for Deep Learning Schedulers
Deep Learning (DL) workloads have rapidly increased in popularity in
enterprise clusters and several new cluster schedulers have been proposed in
recent years to support these workloads. With rapidly evolving DL workloads, it
is challenging to quickly prototype and compare scheduling policies across
workloads. Further, as prior systems target different aspects of scheduling
(resource allocation, placement, elasticity etc.), it is also challenging to
combine these techniques and understand the overall benefits. To address these
challenges, we propose Blox, a modular toolkit that allows developers to
compose individual components and realize diverse scheduling frameworks. We
identify a set of core abstractions for DL scheduling, implement several
existing schedulers using these abstractions, and verify the fidelity of these
implementations by reproducing results from prior research. We also highlight
how we can evaluate and compare existing schedulers in new settings: different
workload traces, higher cluster load, change in DNN workloads and deployment
characteristics. Finally, we showcase Blox's extensibility by composing
policies from different schedulers, and implementing novel policies with
minimal code changes. Blox is available at
\url{https://github.com/msr-fiddle/blox}.
Comment: To be presented at Eurosys'2
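Blox's actual abstractions live in its repository; the flavor of composing a scheduler from independently swappable policy components can be sketched as follows (hypothetical function names, not Blox's API):

```python
# Toy sketch: a scheduling loop parameterized by pluggable policies.

def fifo_order(jobs):
    """Admission-order policy: serve jobs by submission time."""
    return sorted(jobs, key=lambda j: j["submit_time"])

def pack_smallest_first(job, free_gpus):
    """Placement policy: grant up to the job's demand from free GPUs."""
    return min(job["gpus"], free_gpus)

def schedule(jobs, total_gpus, order_policy, place_policy):
    """Compose an ordering policy and a placement policy into a
    scheduling round, returning a {job_name: granted_gpus} plan."""
    free = total_gpus
    plan = {}
    for job in order_policy(jobs):
        grant = place_policy(job, free)
        if grant > 0:
            plan[job["name"]] = grant
            free -= grant
    return plan
```

Swapping `fifo_order` for a different ordering policy, or `pack_smallest_first` for a different placement policy, changes the scheduler's behavior without touching the loop, which is the composability the abstract describes.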
Move Fast and Meet Deadlines: Fine-grained Real-time Stream Processing with Cameo
Resource provisioning in multi-tenant stream processing systems faces the
dual challenges of keeping resource utilization high (without
over-provisioning), and ensuring performance isolation. In our common
production use cases, where streaming workloads have to meet latency targets
and avoid breaching service-level agreements, existing solutions are incapable
of handling the wide variability of user needs. Our framework called Cameo uses
fine-grained stream processing (inspired by actor computation models), and is
able to provide high resource utilization while meeting latency targets. Cameo
dynamically calculates and propagates priorities of events based on user
latency targets and query semantics. Experiments on Microsoft Azure show that
compared to the state of the art, the Cameo framework: i) reduces query latency
by 2.7X in single-tenant settings, ii) reduces query latency by 4.6X in
multi-tenant scenarios, and iii) weathers transient spikes of workload
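The abstract says only that event priorities derive from user latency targets and query semantics; one minimal way to realize that is an earliest-deadline-first ordering, sketched below (an assumed formulation for illustration, not Cameo's exact mechanism): an event's priority is its arrival time plus the latency target of its query, and the scheduler processes the event whose deadline expires soonest.

```python
import heapq

def event_priority(arrival_ms, latency_target_ms):
    """Deadline-based priority: smaller value = more urgent."""
    return arrival_ms + latency_target_ms

# Toy scheduler queue: pop the event with the earliest deadline.
queue = []
heapq.heappush(queue, (event_priority(0, 500), "batch-query-event"))
heapq.heappush(queue, (event_priority(100, 50), "latency-critical-event"))
deadline, first = heapq.heappop(queue)
```

Here the event that arrived later but belongs to a tight-latency query is served first, since its deadline (100 + 50 = 150 ms) precedes the batch event's (0 + 500 = 500 ms).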