PrismDB: Read-aware Log-structured Merge Trees for Heterogeneous Storage
In recent years, emerging hardware storage technologies have focused on
divergent goals: better performance or lower cost-per-bit of storage.
Correspondingly, data systems that employ these new technologies are optimized
either to be fast (but expensive) or cheap (but slow). We take a different
approach: by combining multiple tiers of fast and low-cost storage technologies
within the same system, we can achieve a Pareto-efficient balance between
performance and cost-per-bit.
This paper presents the design and implementation of PrismDB, a novel
log-structured merge tree based key-value store that exploits a full spectrum
of heterogeneous storage technologies (from 3D XPoint to QLC NAND). We
introduce the notion of "read-awareness" to log-structured merge trees, which
allows hot objects to be pinned to faster storage, achieving better tiering and
hot-cold separation of objects. Compared to the standard use of RocksDB on
flash in datacenters today, PrismDB's average throughput on heterogeneous
storage is 2.3x higher and its tail latency is more than an order of
magnitude better, using hardware that is half the cost.
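The "read-awareness" idea can be illustrated with a minimal sketch: track approximate per-key read popularity, and at placement time pin the hottest keys to the fast tier. The class, capacity parameter, and placement policy below are illustrative assumptions, not PrismDB's actual implementation (which integrates this into LSM compaction).

```python
from collections import Counter

class ReadAwarePlacer:
    """Hypothetical sketch: rank keys by observed reads and pin the hottest
    ones to a fast tier (e.g., 3D XPoint), spilling the rest to a cheap
    tier (e.g., QLC NAND). Thresholds and names are illustrative."""

    def __init__(self, fast_tier_capacity):
        self.reads = Counter()                    # approximate read popularity
        self.fast_tier_capacity = fast_tier_capacity

    def record_read(self, key):
        self.reads[key] += 1

    def place(self, keys):
        """Return (fast_tier_keys, slow_tier_keys) for a batch of objects."""
        ranked = sorted(keys, key=lambda k: self.reads[k], reverse=True)
        hot = set(ranked[:self.fast_tier_capacity])
        return hot, [k for k in keys if k not in hot]

placer = ReadAwarePlacer(fast_tier_capacity=2)
for key in ["a", "a", "a", "b", "b", "c", "d"]:
    placer.record_read(key)
hot, cold = placer.place(["a", "b", "c", "d"])
print(sorted(hot), cold)  # the two most-read keys land on the fast tier
```

A real system would use a compact popularity estimator (e.g., a clock or count-min sketch) rather than exact counters, since per-key counts do not scale to billions of objects.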
No Provisioned Concurrency: Fast RDMA-codesigned Remote Fork for Serverless Computing
Serverless platforms essentially face a tradeoff between container startup
time and provisioned concurrency (i.e., cached instances), which is further
exacerbated by the frequent need for remote container initialization. This
paper presents MITOSIS, an operating system primitive that provides fast remote
fork, which exploits a deep codesign of the OS kernel with RDMA. By leveraging
the fast remote read capability of RDMA and partial state transfer across
serverless containers, MITOSIS bridges the performance gap between local and
remote container initialization. MITOSIS is the first to fork over 10,000 new
containers from one instance across multiple machines within a second, while
allowing the new containers to efficiently transfer the pre-materialized states
of the forked one. We have implemented MITOSIS on Linux and integrated it with
Fn, a popular serverless platform. Under load spikes in real-world serverless
workloads, MITOSIS reduces the function tail latency by 89% with orders of
magnitude lower memory usage. For serverless workflows that require state
transfer, MITOSIS improves their execution time by 86%.
Comment: To appear in OSDI'2
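The core remote-fork mechanism can be sketched at a high level: a child starts from only a small descriptor of the parent and pulls state lazily on first access, standing in for MITOSIS's one-sided RDMA reads. All class and method names below are illustrative assumptions; the real system operates on kernel page tables, not Python dictionaries.

```python
class ParentImage:
    """Pre-materialized state of a running instance, exposed for remote reads."""
    def __init__(self, pages):
        self.pages = pages              # page_id -> bytes

class RemoteFork:
    """Hypothetical sketch: fork is instant (no bulk copy); each page is
    fetched from the parent on first access, then served locally."""

    def __init__(self, parent):
        self.parent = parent            # stands in for an RDMA-readable region
        self.local = {}                 # pages copied so far
        self.fetches = 0

    def read_page(self, page_id):
        if page_id not in self.local:   # fault: fetch from the parent on demand
            self.local[page_id] = self.parent.pages[page_id]
            self.fetches += 1
        return self.local[page_id]

parent = ParentImage({0: b"code", 1: b"heap", 2: b"stack"})
child = RemoteFork(parent)              # fork completes without copying state
child.read_page(1)
child.read_page(1)                      # second access hits the local copy
print(child.fetches)  # 1
```

Because the fork itself transfers almost nothing, many children can be created from one parent at once, which is what makes forking 10,000 containers in a second plausible; the cost is paid incrementally as pages are touched.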
Replicating Persistent Memory Key-Value Stores with Efficient RDMA Abstraction
Combining persistent memory (PM) with RDMA is a promising approach to
performant replicated distributed key-value stores (KVSs). However, existing
replication approaches do not work well when applied to PM KVSs: 1) Using RPC
induces software queueing and execution at backups, increasing request latency;
2) Using one-sided RDMA WRITE causes many streams of small PM writes, leading
to severe device-level write amplification (DLWA) on PM. In this paper, we
propose Rowan, an efficient RDMA abstraction to handle replication writes in PM
KVSs; it aggregates concurrent remote writes from different servers, and lands
these writes to PM in a sequential (thus low DLWA) and one-sided (thus low
latency) manner. We realize Rowan with off-the-shelf RDMA NICs. Further, we
build Rowan-KV, a log-structured PM KVS using Rowan for replication. Evaluation
shows that under write-intensive workloads, compared with PM KVSs using RPC and
RDMA WRITE for replication, Rowan-KV boosts throughput by 1.22X and 1.39X as
well as lowers median PUT latency by 1.77X and 2.11X, respectively, while
largely eliminating DLWA.
Comment: Accepted to OSDI 202
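The aggregation idea behind Rowan can be sketched with a toy append-only log: concurrent replication writes from many servers all land at the current tail, so the device sees one sequential stream instead of many small random writes. This is an illustrative assumption-laden sketch using threads and a lock; the real Rowan performs this merge one-sided in off-the-shelf RDMA NICs, with no CPU involvement at the backup.

```python
import threading

class SequentialLog:
    """Hypothetical sketch: merge concurrent remote writes into one
    append-only sequence, keeping device writes sequential (low DLWA)."""

    def __init__(self):
        self.buf = bytearray()
        self.lock = threading.Lock()

    def append(self, record: bytes) -> int:
        """Land a replication write at the current tail; return its offset."""
        with self.lock:
            off = len(self.buf)
            self.buf += record
            return off

log = SequentialLog()
offsets = []

def replica_writer(payload):
    offsets.append(log.append(payload))

# Four "servers" replicate concurrently; each payload is 5 bytes.
threads = [threading.Thread(target=replica_writer, args=(f"put-{i}".encode(),))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(offsets))  # contiguous offsets: the writes landed back-to-back
```

The contiguous offsets are the point: however the writes interleave, the backing store is written strictly in order, which is what avoids the small scattered PM writes that cause device-level write amplification.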