Search CORE

130,227 research outputs found

Persistent Buffer Management with Optimistic Consistency

Author: Lehner Wolfgang
Lersch Lucas
Oukid Ismail
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019
Field of study

Finding the best way to leverage non-volatile memory (NVM) on modern database systems is still an open problem. The answer is far from trivial since the clear boundary between memory and storage present in most systems seems to be incompatible with the intrinsic memory-storage duality of NVM. Rather than treating NVM either solely as memory or solely as storage, in this work we propose how NVM can be simultaneously used as both in the context of modern database systems. We design a persistent buffer pool on NVM, enabling pages to be directly read/written by the CPU (like memory) while recovering corrupted pages after a failure (like storage). The main benefits of our approach are an easy integration in the existing database architectures, reduced costs (by replacing DRAM with NVM), and faster peak-performance recovery

arXiv.org e-Print Archive

Crossref

Architectural Principles for Database Systems on Storage-Class Memory

Author: Oukid Ismail
Publication venue
Publication date: 05/12/2017
Field of study

Database systems have long been optimized to hide the higher latency of storage media, yielding complex persistence mechanisms. With the advent of large DRAM capacities, it became possible to keep a full copy of the data in DRAM. Systems that leverage this possibility, such as main-memory databases, keep two copies of the data in two different formats: one in main memory and the other one in storage. The two copies are kept synchronized using snapshotting and logging. This main-memory-centric architecture yields nearly two orders of magnitude faster analytical processing than traditional, disk-centric ones. The rise of Big Data emphasized the importance of such systems with an ever-increasing need for more main memory. However, DRAM is hitting its scalability limits: It is intrinsically hard to further increase its density. Storage-Class Memory (SCM) is a group of novel memory technologies that promise to alleviate DRAM’s scalability limits. They combine the non-volatility, density, and economic characteristics of storage media with the byte-addressability and a latency close to that of DRAM. Therefore, SCM can serve as persistent main memory, thereby bridging the gap between main memory and storage. In this dissertation, we explore the impact of SCM as persistent main memory on database systems. Assuming a hybrid SCM-DRAM hardware architecture, we propose a novel software architecture for database systems that places primary data in SCM and directly operates on it, eliminating the need for explicit IO. This architecture yields many benefits: First, it obviates the need to reload data from storage to main memory during recovery, as data is discovered and accessed directly in SCM. Second, it allows replacing the traditional logging infrastructure by fine-grained, cheap micro-logging at data-structure level. Third, secondary data can be stored in DRAM and reconstructed during recovery. Fourth, system runtime information can be stored in SCM to improve recovery time. Finally, the system may retain and continue in-flight transactions in case of system failures. However, SCM is no panacea as it raises unprecedented programming challenges. Given its byte-addressability and low latency, processors can access, read, modify, and persist data in SCM using load/store instructions at a CPU cache line granularity. The path from CPU registers to SCM is long and mostly volatile, including store buffers and CPU caches, leaving the programmer with little control over when data is persisted. Therefore, there is a need to enforce the order and durability of SCM writes using persistence primitives, such as cache line flushing instructions. This in turn creates new failure scenarios, such as missing or misplaced persistence primitives. We devise several building blocks to overcome these challenges. First, we identify the programming challenges of SCM and present a sound programming model that solves them. Then, we tackle memory management, as the first required building block to build a database system, by designing a highly scalable SCM allocator, named PAllocator, that fulfills the versatile needs of database systems. Thereafter, we propose the FPTree, a highly scalable hybrid SCM-DRAM persistent B+-Tree that bridges the gap between the performance of transient and persistent B+-Trees. Using these building blocks, we realize our envisioned database architecture in SOFORT, a hybrid SCM-DRAM columnar transactional engine. We propose an SCM-optimized MVCC scheme that eliminates write-ahead logging from the critical path of transactions. Since SCM -resident data is near-instantly available upon recovery, the new recovery bottleneck is rebuilding DRAM-based data. To alleviate this bottleneck, we propose a novel recovery technique that achieves nearly instant responsiveness of the database by accepting queries right after recovering SCM -based data, while rebuilding DRAM -based data in the background. Additionally, SCM brings new failure scenarios that existing testing tools cannot detect. Hence, we propose an online testing framework that is able to automatically simulate power failures and detect missing or misplaced persistence primitives. Finally, our proposed building blocks can serve to build more complex systems, paving the way for future database systems on SCM

Technische Universität Dresden: Qucosa

Memory management techniques for large-scale persistent-main-memory systems

Author: Booss Daniel
Gomes Grégoire
Lehner Wolfgang
Lespinasse Adrien
Oukid Ismail
Willhalm Thomas
Publication venue: 'VLDB Endowment'
Publication date: 10/01/2023
Field of study

Storage Class Memory (SCM) is a novel class of memory technologies that promise to revolutionize database architectures. SCM is byte-addressable and exhibits latencies similar to those of DRAM, while being non-volatile. Hence, SCM could replace both main memory and storage, enabling a novel single-level database architecture without the traditional I/O bottleneck. Fail-safe persistent SCM allocation can be considered conditio sine qua non for enabling this novel architecture paradigm for database management systems. In this paper we present PAllocator, a fail-safe persistent SCM allocator whose design emphasizes high concurrency and capacity scalability. Contrary to previous works, PAllocator thoroughly addresses the important challenge of persistent memory fragmentation by implementing an efficient defragmentation algorithm. We show that PAllocator outperforms state-of-the-art persistent allocators by up to one order of magnitude, both in operation throughput and recovery time, and enables up to 2.39x higher operation throughput on a persistent B-Tree

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa

Redesigning Transaction Processing Systems for Non-Volatile Memory

Author: Kim Wookhee
Publication venue: Graduate School of UNIST
Publication date: 01/02/2019
Field of study

Department of Computer Science and EngineeringTransaction Processing Systems are widely used because they make the user be able to manage their data more efficiently. However, they suffer performance bottleneck due to the redundant I/O for guaranteeing data consistency. In addition to the redundant I/O, slow storage device makes the performance more degraded. Leveraging non-volatile memory is one of the promising solutions the performance bottleneck in Transaction Processing Systems. However, since the I/O granularity of legacy storage devices and non-volatile memory is not equal, traditional Transaction Processing System cannot fully exploit the performance of persistent memory. The goal of this dissertation is to fully exploit non-volatile memory for improving the performance of Transaction Processing Systems. Write amplification between Transaction Processing System is pointed out as a performance bottleneck. As first approach, we redesigned Transaction Processing Systems to minimize the redundant I/O between the Transaction Processing Systems. We present LS-MVBT that integrates recovery information into the main database file to remove temporary files for recovery. The LS-MVBT also employs five optimizations to reduce the write traffics in single fsync() calls. We also exploit the persistent memory to reduce the performance bottleneck from slow storage devices. However, since the traditional recovery method is for slow storage devices, we develop byte-addressable differential logging, user-level heap manager, and transaction-aware persistence to fully exploit the persistent memory. To minimize the redundant I/O for guarantee data consistency, we present the failure-atomic slotted paging with persistent buffer cache. Redesigning indexing structure is the second approach to exploit the non-volatile memory fully. Since the B+-tree is originally designed for block granularity, It generates excessive I/O traffics in persistent memory. To mitigate this traffic, we develop cache line friendly B+-tree which aligns its node size to cache line size. It can minimize the write traffic. Moreover, with hardware transactional memory, it can update its single node atomically without any additional redundant I/O for guaranteeing data consistency. It can also adapt Failure-Atomic Shift and Failure-Atomic In-place Rebalancing to eliminate unnecessary I/O. Furthermore, We improved the persistent memory manager that exploit traditional memory heap structure with free-list instead of segregated lists for small memory allocations to minimize the memory allocation overhead. Our performance evaluation shows that our improved version that consider I/O granularity of non-volatile memory can efficiently reduce the redundant I/O traffic and improve the performance by large of a margin.ope

ScholarWorks@UNIST

Recovery algorithms for in-memory OLTP databases

Author: Malviya Nirmesh
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2012
Field of study

Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012.Cataloged from PDF version of thesis.Includes bibliographical references (p. 63-66).Fine-grained, record-oriented write-ahead logging, as exemplified by systems like ARIES, has been the gold standard for relational database recovery. In this thesis, we show that in modern high-throughput transaction processing systems, this is no longer the optimal way to recover a database system. In particular, as transaction throughputs get higher, ARIES-style logging starts to represent a non-trivial fraction of the overall transaction execution time. We propose a lighter weight, coarse-grained command logging technique which only records the transactions that were executed on the database. It then does recovery by starting from a transactionally consistent checkpoint and replaying the commands in the log as if they were new transactions. By avoiding the overhead of fine-grained, page-level logging of before and after images (and substantial associated I/O), command logging can yield significantly higher throughput at run-time. Recovery times for command logging are higher compared to ARIES, but especially with the advent of high-availability techniques that can mask the outage of a recovering node, recovery speeds have become secondary in importance to run-time performance for most applications. We evaluated our approach on an implementation of TPC-C in a main memory database system (VoltDB), and found that command logging can offer 1.5x higher throughput than a main-memory optimized implementation of ARIES.by Nirmesh Malviya.S.M

DSpace@MIT

Persistent Database Buffer Caching and Logging with Slotted Page Structure

Author: Seo Jihye
Publication venue: Graduate School of UNIST
Publication date: 01/02/2018
Field of study

Department of Computer Science and EngineeringEmerging byte-addressable persistent memory (PM) will be effective to improve the performance of computer system by reducing the redundant write operations. Traditional database management system uses recovery techniques to prevent data loss. The techniques copy the entire page into the block device storage several times for one insertion, so the amount of I/O is not negligible. In this work, we consider PM as main memory. Then, the durability of data in the buffer cache is ensured. To guarantee consistency, we exploit slotted page structure which is commonly used in database systems. We revisit that the slot header, which stores the metadata of the page in the slotted page structure, can act like a commit mark in the persistent database buffer cache. We then present two novel database management schemes using persistent buffer cache and slotted page. In-place commit scheme updates the page atomically using hardware transactional memory. It doesn't make any other copies and has optimal performance. Slot header logging scheme is needed for the case of updating pages more than one. Unlike the existing logging technique, slot header logging reduces the write operations by logging only commit mark. We implemented these schemes in SQLite and evaluate the performance compared with NVWAL, which is the state-of-the-art scheme. Our experiments show that in-place commit scheme needs only 3 cache line flush instructions for one insertion and slot header logging scheme reduces logging overhead at least 1/4.ope

ScholarWorks@UNIST

RGLock: Recoverable Mutual Exclusion for Non-Volatile Main Memory Systems

Author: Ramaraju Aditya
Publication venue: 'University of Waterloo'
Publication date: 01/01/2015
Field of study

Mutex locks have traditionally been the most popular concurrent programming mechanisms for inter-process synchronization in the rapidly advancing field of concurrent computing systems that support high-performance applications. However, the concept of recoverability of these algorithms in the event of a crash failure has not been studied thoroughly. Popular techniques like transaction roll-back are widely known for providing fault-tolerance in modern Database Management Systems. Whereas in the context of mutual exclusion in shared memory systems, none of the prominent lock algorithms (e.g., Lamport’s Bakery algorithm, MCS lock, etc.) are designed to tolerate crash failures, especially in operations carried out in the critical sections. Each of these algorithms may fail to maintain mutual exclusion, or sacrifice some of the liveness guarantees in presence of crash failures. Storing application data and recovery information in the primary storage with conventional volatile memory limits the development of efficient crash-recovery mechanisms since a failure on any component in the system causes a loss of program data. With the advent of Non-Volatile Main Memory technologies, opportunities have opened up to redefine the problem of Mutual Exclusion in the context of a crash-recovery model where processes may recover from crash failures and resume execution. When the main memory is non-volatile, an application’s entire state can be recovered from a crash using the in-memory state near-instantaneously, making a process’s failure appear as a suspend/resume event. This thesis proceeds to envision a solution for the problem of mutual exclusion in such systems. The goal is to provide a first-of-its-kind mutex lock that guarantees mutual exclusion and starvation freedom in emerging shared-memory architectures that incorporate non-volatile main memory (NVMM)

University of Waterloo's Institutional Repository

Instant restore after a media failure

Author: C Mohan
C Mohan
E Levy
G Graefe
G Graefe
J Gray
JN Gray
PM Chen
T Härder
Publication venue
Publication date: 26/02/2017
Field of study

Media failures usually leave database systems unavailable for several hours until recovery is complete, especially in applications with large devices and high transaction volume. Previous work introduced a technique called single-pass restore, which increases restore bandwidth and thus substantially decreases time to repair. Instant restore goes further as it permits read/write access to any data on a device undergoing restore--even data not yet restored--by restoring individual data segments on demand. Thus, the restore process is guided primarily by the needs of applications, and the observed mean time to repair is effectively reduced from several hours to a few seconds. This paper presents an implementation and evaluation of instant restore. The technique is incrementally implemented on a system starting with the traditional ARIES design for logging and recovery. Experiments show that the transaction latency perceived after a media failure can be cut down to less than a second and that the overhead imposed by the technique on normal processing is minimal. The net effect is that a few "nines" of availability are added to the system using simple and low-overhead software techniques

arXiv.org e-Print Archive

Crossref