9,948 research outputs found
Persistent Buffer Management with Optimistic Consistency
Finding the best way to leverage non-volatile memory (NVM) on modern database
systems is still an open problem. The answer is far from trivial since the
clear boundary between memory and storage present in most systems seems to be
incompatible with the intrinsic memory-storage duality of NVM. Rather than
treating NVM either solely as memory or solely as storage, in this work we
propose how NVM can be simultaneously used as both in the context of modern
database systems. We design a persistent buffer pool on NVM, enabling pages to
be directly read/written by the CPU (like memory) while recovering corrupted
pages after a failure (like storage). The main benefits of our approach are an
easy integration in the existing database architectures, reduced costs (by
replacing DRAM with NVM), and faster peak-performance recovery
Leveraging Non-Volatile Memory in Modern Storage Management Architectures
Non-volatile memory technologies (NVM) introduce a novel class of devices that combine characteristics of both storage and main memory. Like storage, NVM is not only persistent, but also denser and cheaper than DRAM. Like DRAM, NVM is byte-addressable and has lower access latency. In recent years, NVM has gained a lot of attention both in academia and in the data management industry, with views ranging from skepticism to over excitement. Some critics claim that NVM is not cheap enough to replace flash-based SSDs nor is it fast enough to replace DRAM, while others see it simply as a storage device. Supporters of NVM have observed that its low latency and byte-addressability requires radical changes and a complete rewrite of storage management architectures.
This thesis takes a moderate stance between these two views. We consider that, while NVM might not replace flash-based SSD or DRAM in the near future, it has the potential to reduce the gap between them. Furthermore, treating NVM as a regular storage media does not fully leverage its byte-addressability and low latency. On the other hand, completely redesigning systems to be NVM-centric is impractical. Proposals that attempt to leverage NVM to simplify storage management result in completely new architectures that face the same challenges that are already well-understood and addressed by the traditional architectures. Therefore, we take three common storage management architectures as a starting point, and propose incremental changes to enable them to better leverage NVM. First, in the context of log-structured merge-trees, we investigate the impact of storing data in NVM, and devise methods to enable small granularity accesses and NVM-aware caching policies. Second, in the context of B+Trees, we propose to extend the buffer pool and describe a technique based on the concept of optimistic consistency to handle corrupted pages in NVM. Third, we employ NVM to enable larger capacity and reduced costs in a index+log key-value store, and combine it with other techniques to build a system that achieves low tail latency. This thesis aims to describe and evaluate these techniques in order to enable storage management architectures to leverage NVM and achieve increased performance and lower costs, without major architectural changes.:1 Introduction
1.1 Non-Volatile Memory
1.2 Challenges
1.3 Non-Volatile Memory & Database Systems
1.4 Contributions and Outline
2 Background
2.1 Non-Volatile Memory
2.1.1 Types of NVM
2.1.2 Access Modes
2.1.3 Byte-addressability and Persistency
2.1.4 Performance
2.2 Related Work
2.3 Case Study: Persistent Tree Structures
2.3.1 Persistent Trees
2.3.2 Evaluation
3 Log-Structured Merge-Trees
3.1 LSM and NVM
3.2 LSM Architecture
3.2.1 LevelDB
3.3 Persistent Memory Environment
3.4 2Q Cache Policy for NVM
3.5 Evaluation
3.5.1 Write Performance
3.5.2 Read Performance
3.5.3 Mixed Workloads
3.6 Additional Case Study: RocksDB
3.6.1 Evaluation
4 B+Trees
4.1 B+Tree and NVM
4.1.1 Category #1: Buffer Extension
4.1.2 Category #2: DRAM Buffered Access
4.1.3 Category #3: Persistent Trees
4.2 Persistent Buffer Pool with Optimistic Consistency
4.2.1 Architecture and Assumptions
4.2.2 Embracing Corruption
4.3 Detecting Corruption
4.3.1 Embracing Corruption
4.4 Repairing Corruptions
4.5 Performance Evaluation and Expectations
4.5.1 Checksums Overhead
4.5.2 Runtime and Recovery
4.6 Discussion
5 Index+Log Key-Value Stores
5.1 The Case for Tail Latency
5.2 Goals and Overview
5.3 Execution Model
5.3.1 Reactive Systems and Actor Model
5.3.2 Message-Passing Communication
5.3.3 Cooperative Multitasking
5.4 Log-Structured Storage
5.5 Networking
5.6 Implementation Details
5.6.1 NVM Allocation on RStore
5.6.2 Log-Structured Storage and Indexing
5.6.3 Garbage Collection
5.6.4 Logging and Recovery
5.7 Systems Operations
5.8 Evaluation
5.8.1 Methodology
5.8.2 Environment
5.8.3 Other Systems
5.8.4 Throughput Scalability
5.8.5 Tail Latency
5.8.6 Scans
5.8.7 Memory Consumption
5.9 Related Work
6 Conclusion
Bibliography
A PiBenc
Assise: Performance and Availability via NVM Colocation in a Distributed File System
The adoption of very low latency persistent memory modules (PMMs) upends the
long-established model of disaggregated file system access. Instead, by
colocating computation and PMM storage, we can provide applications much higher
I/O performance, sub-second application failover, and strong consistency. To
demonstrate this, we built the Assise distributed file system, based on a
persistent, replicated coherence protocol for managing a set of
server-colocated PMMs as a fast, crash-recoverable cache between applications
and slower disaggregated storage, such as SSDs. Unlike disaggregated file
systems, Assise maximizes locality for all file IO by carrying out IO on
colocated PMM whenever possible and minimizes coherence overhead by maintaining
consistency at IO operation granularity, rather than at fixed block sizes.
We compare Assise to Ceph/Bluestore, NFS, and Octopus on a cluster with Intel
Optane DC PMMs and SSDs for common cloud applications and benchmarks, such as
LevelDB, Postfix, and FileBench. We find that Assise improves write latency up
to 22x, throughput up to 56x, fail-over time up to 103x, and scales up to 6x
better than its counterparts, while providing stronger consistency semantics.
Assise promises to beat the MinuteSort world record by 1.5x
Designing a commutative replicated data type
Commuting operations greatly simplify consistency in distributed systems.
This paper focuses on designing for commutativity, a topic neglected
previously. We show that the replicas of \emph{any} data type for which
concurrent operations commute converges to a correct value, under some simple
and standard assumptions. We also show that such a data type supports
transactions with very low cost. We identify a number of approaches and
techniques to ensure commutativity. We re-use some existing ideas
(non-destructive updates coupled with invariant identification), but propose a
much more efficient implementation. Furthermore, we propose a new technique,
background consensus. We illustrate these ideas with a shared edit buffer data
type
Improving the Performance and Endurance of Persistent Memory with Loose-Ordering Consistency
Persistent memory provides high-performance data persistence at main memory.
Memory writes need to be performed in strict order to satisfy storage
consistency requirements and enable correct recovery from system crashes.
Unfortunately, adhering to such a strict order significantly degrades system
performance and persistent memory endurance. This paper introduces a new
mechanism, Loose-Ordering Consistency (LOC), that satisfies the ordering
requirements at significantly lower performance and endurance loss. LOC
consists of two key techniques. First, Eager Commit eliminates the need to
perform a persistent commit record write within a transaction. We do so by
ensuring that we can determine the status of all committed transactions during
recovery by storing necessary metadata information statically with blocks of
data written to memory. Second, Speculative Persistence relaxes the write
ordering between transactions by allowing writes to be speculatively written to
persistent memory. A speculative write is made visible to software only after
its associated transaction commits. To enable this, our mechanism supports the
tracking of committed transaction ID and multi-versioning in the CPU cache. Our
evaluations show that LOC reduces the average performance overhead of memory
persistence from 66.9% to 34.9% and the memory write traffic overhead from
17.1% to 3.4% on a variety of workloads.Comment: This paper has been accepted by IEEE Transactions on Parallel and
Distributed System
LogBase: A Scalable Log-structured Database System in the Cloud
Numerous applications such as financial transactions (e.g., stock trading)
are write-heavy in nature. The shift from reads to writes in web applications
has also been accelerating in recent years. Write-ahead-logging is a common
approach for providing recovery capability while improving performance in most
storage systems. However, the separation of log and application data incurs
write overheads observed in write-heavy environments and hence adversely
affects the write throughput and recovery time in the system. In this paper, we
introduce LogBase - a scalable log-structured database system that adopts
log-only storage for removing the write bottleneck and supporting fast system
recovery. LogBase is designed to be dynamically deployed on commodity clusters
to take advantage of elastic scaling property of cloud environments. LogBase
provides in-memory multiversion indexes for supporting efficient access to data
maintained in the log. LogBase also supports transactions that bundle read and
write operations spanning across multiple records. We implemented the proposed
system and compared it with HBase and a disk-based log-structured
record-oriented system modeled after RAMCloud. The experimental results show
that LogBase is able to provide sustained write throughput, efficient data
access out of the cache, and effective system recovery.Comment: VLDB201
An Architecture for Multi-User Software Development Environments
We present an architecture for multi-user software development environments, covering general, process-centered and rule-based MUSDEs. Our architecture is founded on componentization, with particular concern for the capability to replace the synchronization component - to allow experimentation with novel concurrency control mechanisms - with minimal effects on other components while still supporting integration. The architecture has been implemented in the MARVEL SD
Research issues in real-time database systems
Cataloged from PDF version of article.Today's real-time systems are characterized by managing large volumes of data.
Efficient database management algorithms for accessing and manipulating data are
required to satisfy timing constraints of supported applications. Real-time database
systems involve a new research area investigating possible ways of applying database
systems technology to real-time systems. Management of real-time information through
a database system requires the integration of concepts from both real-time systems and
database systems. Some new criteria need to be developed to involve timing constraints
of real-time applications in many database systems design issues, such as
transaction/query processing, data buffering, CPU, and IO scheduling. In this paper, a
basic understanding of the issues in real-time database systems is provided and the
research efforts in this area are introduced. Different approaches to various problems of
real-time database systems are briefly described, and possible future research directions
are discussed
- …