BestConfig: Tapping the Performance Potential of Systems via Automatic Configuration Tuning
System users are offered an ever-increasing number of configuration
parameters. Yet many users reuse one configuration setting across different
workloads, leaving untapped the performance potential of their systems. A good
configuration setting can greatly improve the performance of a deployed system
under certain workloads. But with tens or hundreds of parameters, deciding
which configuration setting leads to the best performance becomes a highly
costly task. Such a task requires strong expertise in both the system and the
application, expertise that users commonly lack.
To help users tap the performance potential of their systems, we present
BestConfig, a system for automatically finding a best configuration setting
within a resource limit for a deployed system under a given application
workload. BestConfig is designed with an extensible architecture to automate
configuration tuning for general systems. To tune system configurations
within a resource limit, we propose the divide-and-diverge sampling method and
the recursive bound-and-search algorithm. BestConfig can improve the throughput
of Tomcat by 75%, that of Cassandra by 63%, and that of MySQL by 430%, and it
can reduce the running time of the Hive join job by about 50% and that of the
Spark join job by about 80%, solely by configuration adjustment.
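The abstract names but does not detail the two techniques. The following is a
minimal sketch of what Latin-hypercube-style "divide-and-diverge" sampling
combined with a shrinking "bound-and-search" loop could look like; the interval
counts, shrink rule, and function names are illustrative assumptions, not
BestConfig's actual implementation.

```python
import random

def divide_and_diverge(bounds, k):
    """Divide each parameter's range into k intervals, then 'diverge' by
    permuting interval choices per parameter (Latin-hypercube style), so the
    k samples jointly cover every interval of every parameter."""
    perms = []
    for _ in bounds:
        idx = list(range(k))
        random.shuffle(idx)
        perms.append(idx)
    samples = []
    for s in range(k):
        point = []
        for (lo, hi), idx in zip(bounds, perms):
            width = (hi - lo) / k
            point.append(lo + (idx[s] + random.random()) * width)
        samples.append(point)
    return samples

def recursive_bound_and_search(measure, bounds, k, budget):
    """Sample, keep the best configuration seen, shrink the bounds around it,
    and repeat until the sampling budget runs out."""
    best_point, best_perf = None, float("-inf")
    while budget >= k:
        for p in divide_and_diverge(bounds, k):
            perf = measure(p)
            budget -= 1
            if perf > best_perf:
                best_perf, best_point = perf, p
        # bound: shrink each range to the interval around the best point
        bounds = [(max(lo, b - (hi - lo) / (2 * k)),
                   min(hi, b + (hi - lo) / (2 * k)))
                  for (lo, hi), b in zip(bounds, best_point)]
    return best_point, best_perf
```

In practice `measure` would run a benchmark against the deployed system with
the candidate configuration applied, which is why the resource (sample) budget
matters.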
APUS: Fast and Scalable PAXOS on RDMA
State machine replication (SMR) uses Paxos to enforce the same
inputs for a program (e.g., Redis) replicated on a number of hosts,
tolerating various types of failures. Unfortunately, traditional Paxos
protocols incur prohibitive performance overhead on server programs
due to their high consensus latency on TCP/IP. Worse, the
consensus latency of extant Paxos protocols increases drastically
when more concurrent client connections or hosts are added. This
paper presents APUS, the first RDMA-based Paxos protocol that
aims to be fast and scalable to client connections and hosts. APUS
intercepts inbound socket calls of an unmodified server program,
assigns a total order for all input requests, and uses fast RDMA
primitives to replicate these requests concurrently.
We evaluated APUS on nine widely-used server programs (e.g.,
Redis and MySQL). APUS incurred a mean overhead of 4.3% in
response time and 4.2% in throughput. We integrated APUS with an
SMR system Calvin. Our Calvin-APUS integration was 8.2X faster
than the extant Calvin-ZooKeeper integration. The consensus
latency of APUS outperformed an RDMA-based consensus protocol
by 4.9X. APUS source code and raw results are released on github.
com/hku-systems/apus.
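The core idea the abstract describes, intercepting inbound requests, assigning
a total order, and delivering only after replication, can be sketched as
follows. This is a toy model: the class names and stub follower are
hypothetical, and the real APUS performs replication with one-sided RDMA
writes rather than method calls.

```python
class PaxosLeaderSketch:
    """Toy leader: intercept each inbound request, assign it a total-order
    index, replicate it to followers, and deliver it to the server program
    only once a majority (including the leader) has a copy."""
    def __init__(self, followers):
        self.followers = followers   # hypothetical follower stubs
        self.next_index = 0
        self.log = []

    def on_inbound_request(self, payload, deliver):
        index = self.next_index          # total order over all inputs
        self.next_index += 1
        self.log.append((index, payload))
        acks = sum(f.replicate(index, payload) for f in self.followers)
        if acks + 1 > (len(self.followers) + 1) // 2:  # majority reached
            deliver(index, payload)      # safe to hand to the server program

class FollowerStub:
    """Stand-in for a replica's log; real APUS followers receive the entry
    via an RDMA write into a pre-registered memory region."""
    def __init__(self):
        self.log = []
    def replicate(self, index, payload):
        self.log.append((index, payload))
        return 1                          # ack
```

Because every replica applies the same totally ordered input log, an
unmodified deterministic server program stays consistent across hosts.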
Practical whole-system provenance capture
Data provenance describes how data came to be in its present form. It includes data sources and the transformations that have been applied to them. Data provenance has many uses, from forensics and security to aiding the reproducibility of scientific experiments. We present CamFlow, a whole-system provenance capture mechanism that integrates easily into a PaaS offering. While there have been several prior whole-system provenance systems that captured a comprehensive, systemic and ubiquitous record of a system's behavior, none have been widely adopted. They either A) impose too much overhead, B) are designed for long-outdated kernel releases and are hard to port to current systems, C) generate too much data, or D) are designed for a single system. CamFlow addresses these shortcomings by: 1) leveraging the latest kernel design advances to achieve efficiency; 2) using a self-contained, easily maintainable implementation relying on a Linux Security Module, NetFilter, and other existing kernel facilities; 3) providing a mechanism to tailor the captured provenance data to the needs of the application; and 4) making it easy to integrate provenance across distributed systems. The provenance we capture is streamed and consumed by tenant-built auditor applications. We illustrate the usability of our implementation by describing three such applications: demonstrating compliance with data regulations; performing fault/intrusion detection; and implementing data loss prevention. We also show how CamFlow can be leveraged to capture meaningful provenance without modifying existing applications.
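A tenant-built auditor of the kind the abstract mentions can be very small.
Below is a toy data-loss-prevention auditor over a provenance stream; the
(src, relation, dst) record shape and the entity-naming scheme are assumptions
made for illustration (CamFlow actually streams W3C PROV-style graph data).

```python
def audit_provenance(stream, sensitive):
    """Toy DLP auditor: each record (src, relation, dst) means dst was
    derived from or influenced by src. Taint propagates transitively from
    sensitive entities; a network send of tainted data raises an alert."""
    tainted = set(sensitive)
    alerts = []
    for src, relation, dst in stream:
        if src in tainted:
            tainted.add(dst)                      # propagate taint
            if relation == "send" and dst.startswith("socket:"):
                alerts.append((src, dst))         # tainted data leaving host
    return alerts
```

The same traversal pattern, with different predicates, would serve the other
two illustrated applications (regulatory-compliance checks and
fault/intrusion detection).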
Carbon Containers: A System-level Facility for Managing Application-level Carbon Emissions
To reduce their environmental impact, cloud datacenters are increasingly
focused on optimizing applications' carbon-efficiency, or work done per mass of
carbon emitted. To facilitate such optimizations, we present Carbon Containers,
a simple system-level facility, which extends prior work on power containers,
that automatically regulates applications' carbon emissions in response to
variations in both their workload's intensity and their energy's
carbon-intensity. Specifically, Carbon Containers enable applications to
specify a maximum carbon emissions rate (in gCO2e/hr), and then
transparently enforce this rate via a combination of vertical scaling,
container migration, and suspend/resume while maximizing either
energy-efficiency or performance.
Carbon Containers are especially useful for applications that i) must
continue running even during high-carbon periods, and ii) execute in regions
with few variations in carbon-intensity. These low-variability regions also
tend to have high average carbon-intensity, which increases the importance of
regulating carbon emissions. We implement a Carbon Containers prototype by
extending Linux Containers to incorporate the mechanisms above and evaluate it
using real workload traces and carbon-intensity data from multiple regions. We
compare Carbon Containers with prior work that regulates carbon emissions by
suspending/resuming applications during high/low carbon periods. We show that
Carbon Containers are more carbon-efficient and improve performance while
maintaining similar carbon emissions.
Comment: ACM Symposium on Cloud Computing (SoCC)
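The control decision at the heart of the mechanism can be sketched simply:
given the emissions cap and the grid's current carbon-intensity, choose the
highest resource allocation whose emissions rate stays under the cap, or
suspend if none does. The function name, units, and the reduction of vertical
scaling/migration/suspend-resume to a single returned action are illustrative
assumptions, not the paper's actual controller.

```python
def regulate(cap_g_per_hr, carbon_intensity_g_per_kwh, power_levels_kw):
    """Toy Carbon Containers decision: emissions rate = power x
    carbon-intensity. Pick the largest power level (a proxy for
    performance) that keeps the container under its cap; if even the
    smallest level exceeds it, signal suspend until intensity drops."""
    feasible = [p for p in power_levels_kw
                if p * carbon_intensity_g_per_kwh <= cap_g_per_hr]
    if not feasible:
        return ("suspend", 0.0)
    return ("scale", max(feasible))
```

Re-running this decision as carbon-intensity data updates is what lets the
container track workload and grid variations transparently.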
No Provisioned Concurrency: Fast RDMA-codesigned Remote Fork for Serverless Computing
Serverless platforms essentially face a tradeoff between container startup
time and provisioned concurrency (i.e., cached instances), which is further
exacerbated by the frequent need for remote container initialization. This
paper presents MITOSIS, an operating system primitive that provides fast remote
fork, which exploits a deep codesign of the OS kernel with RDMA. By leveraging
the fast remote read capability of RDMA and partial state transfer across
serverless containers, MITOSIS bridges the performance gap between local and
remote container initialization. MITOSIS is the first to fork over 10,000 new
containers from one instance across multiple machines within a second, while
allowing the new containers to efficiently transfer the pre-materialized states
of the forked one. We have implemented MITOSIS on Linux and integrated it with
FN, a popular serverless platform. Under load spikes in real-world serverless
workloads, MITOSIS reduces the function tail latency by 89% with orders of
magnitude lower memory usage. For serverless workflows that require state
transfer, MITOSIS improves their execution time by 86%.
Comment: To appear in OSDI'2
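The "partial state transfer" idea, starting a child instantly and pulling the
parent's memory on demand, can be modeled in a few lines. This is a toy
analogy only: the class and field names are hypothetical, and in the real
system the lazy fetch is a one-sided RDMA read triggered by a page fault in
the kernel, not a dictionary lookup.

```python
class RemoteForkedState:
    """Toy model of a MITOSIS-style remote fork: the child starts with only
    metadata, fetching the parent's memory pages lazily on first access;
    pre-materialized ('eager') pages are pushed at fork time."""
    def __init__(self, parent_pages, eager=()):
        self._remote = parent_pages                    # parent address space
        self._local = {k: parent_pages[k] for k in eager}
        self.fetches = 0                               # remote reads issued

    def read(self, page):
        if page not in self._local:                    # "page fault"
            self._local[page] = self._remote[page]     # stands in for an RDMA read
            self.fetches += 1
        return self._local[page]
```

Because a child pays only for the pages it actually touches, thousands of
children can be forked from one instance without eagerly copying its whole
state, which is the property behind the sub-second 10,000-container result.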