
    Transactional filesystems

    Master's dissertation in Computer Engineering (Dissertação de Mestrado em Engenharia Informática). The task of implementing correct software is not trivial, especially when concurrency must be supported. To overcome this difficulty, several researchers have proposed offering the well-known database transactional models as an abstraction in existing programming languages, allowing a programmer to define groups of computations as transactions and benefit from the predictable semantics of the underlying transactional model. Prototypes of this programming model are nowadays available from many research teams, but they remain far from mature due to a considerable number of operational restrictions. Mostly, these restrictions derive from limitations on the use of input-output functions inside a transaction. These functions are frequently irreversible, which makes them incompatible with a transactional engine, since their effects cannot be undone when a transaction aborts. However, one group of input-output operations is potentially reversible and becomes a valuable tool when provided within the transactional programming model described above: file system operations. A programming model that includes in a transaction not only a set of memory operations but also a set of file operations would allow the programmer to define algorithms in a much more flexible and simple way, achieving greater stability and consistency in each application. In this document we propose to specify and enable the use of this type of operation inside a transactional programming model, and we study the advantages and disadvantages of this approach.
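    The key idea above, that file operations become transactional once their effects can be undone, can be sketched with a small undo log: before each write, the pre-image of the file is saved so an abort can roll it back. This is a minimal illustration, not the dissertation's actual design; all names are hypothetical.

```python
import os
import shutil
import tempfile

class FileTransaction:
    """Undo-log sketch: back up each file before overwriting it so the
    write can be rolled back if the transaction aborts."""

    def __init__(self):
        self._undo = []  # (path, backup_path or None), in apply order

    def write(self, path, data):
        backup = None
        if os.path.exists(path):
            fd, backup = tempfile.mkstemp()
            os.close(fd)
            shutil.copy2(path, backup)  # preserve the pre-image
        self._undo.append((path, backup))
        with open(path, "w") as f:
            f.write(data)

    def commit(self):
        for _, backup in self._undo:
            if backup:
                os.remove(backup)  # pre-images no longer needed
        self._undo.clear()

    def abort(self):
        for path, backup in reversed(self._undo):
            if backup:
                shutil.move(backup, path)  # restore the pre-image
            else:
                os.remove(path)  # file did not exist before the write
        self._undo.clear()
```

    A real system would also have to handle renames, deletions, directory operations, and concurrent access, which is where most of the operational restrictions mentioned above come from.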

    Using Lightweight Transactions and Snapshots for Fault-Tolerant Services Based on Shared Storage Bricks

    To satisfy current and future application needs in a cost-effective manner, storage systems are evolving from monolithic disk arrays to networked storage architectures based on commodity components. So far, this architectural transition has mostly been envisioned as a way to scale capacity and performance. In this work we examine how the block-level interface exported by such networked storage systems can be extended to deal with reliability. Our goals are: (a) at the design level, to examine how strong reliability semantics can be offered at the block level; (b) at the implementation level, to examine the mechanisms required and how they may be provided in a modular and configurable manner. We first discuss how transactional-type semantics may be offered at the block level. We present a system design that uses the concept of atomic update intervals combined with existing block-level locking and snapshot mechanisms, in contrast to the more common journaling techniques. We discuss in detail the design of the associated mechanisms and the trade-offs and challenges when dividing the required functionality between the file system and the block-level storage. Our approach is based on a unified and thus non-redundant set of mechanisms for providing reliability at both the block and file level. Our design and implementation effectively provide a tunable, lightweight transactions mechanism to higher system and application layers. Finally, we describe how the associated protocols can be implemented in a modular way in a prototype storage system we are currently building. As our system is still being implemented, we do not present performance results.
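    The atomic-update-interval idea combines a lock over the interval with a snapshot of the blocks it touches, so an abort restores the snapshot instead of replaying a journal. A toy in-memory version, with hypothetical names, might look like this:

```python
import threading

class BrickVolume:
    """Sketch of atomic update intervals on a block store: a lock covers
    the interval; old block versions are snapshotted copy-on-first-write
    so the interval can be rolled back without a journal."""

    def __init__(self, nblocks, block_size=512):
        self.blocks = [bytes(block_size)] * nblocks
        self.lock = threading.Lock()
        self._snapshot = None

    def begin_interval(self):
        self.lock.acquire()
        self._snapshot = {}  # block index -> pre-image

    def write_block(self, idx, data):
        if idx not in self._snapshot:
            self._snapshot[idx] = self.blocks[idx]  # save old version once
        self.blocks[idx] = data

    def commit_interval(self):
        self._snapshot = None  # discard pre-images
        self.lock.release()

    def abort_interval(self):
        for idx, old in self._snapshot.items():
            self.blocks[idx] = old  # restore the snapshot
        self._snapshot = None
        self.lock.release()
```

    The trade-off against journaling is visible even in the sketch: commit is cheap (drop the snapshot), while abort pays the restore cost, whereas a journal pays on every commit.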

    Persistent Memory File Systems: A Survey

    Persistent Memory (PM) is non-volatile byte-addressable memory that offers read and write latencies an order of magnitude smaller than flash storage such as SSDs. This survey discusses how file systems address the most prominent challenges of implementing file systems for Persistent Memory. First, we discuss how the properties of Persistent Memory change file system design. Second, we discuss work that aims to optimize small-file I/O and the associated metadata resolution. Third, we address how existing Persistent Memory file systems achieve (meta)data persistence and consistency.
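    The persistence-and-consistency challenge the survey refers to boils down to ordering: CPU stores land in volatile caches, so PM file systems must flush and fence (e.g., clwb followed by sfence on x86) so that a flag never becomes durable before the data it guards. A toy simulation of that discipline, with hypothetical names:

```python
class PMem:
    """Toy model separating cache-resident stores (lost on crash) from
    flushed stores (durable). Real systems use clwb/sfence instead."""

    def __init__(self):
        self.cache = {}    # stores not yet flushed; lost on crash
        self.durable = {}  # what survives a crash

    def store(self, addr, val):
        self.cache[addr] = val

    def flush(self, addr):
        if addr in self.cache:
            self.durable[addr] = self.cache[addr]

    def crash(self):
        self.cache = {}  # unflushed stores vanish

def persist_record(pm, data):
    """Data first, then the validity flag: a crash can never expose a
    durable flag pointing at non-durable data."""
    pm.store("data", data)
    pm.flush("data")        # acts as the fence before the flag
    pm.store("valid", True)
    pm.flush("valid")
```

    Flushing the flag before the data, or skipping the flush entirely, is exactly the class of bug the surveyed file systems design around.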

    Gurret: Decentralized data management using subscription-based file attribute propagation

    Research institutions and funding agencies are increasingly adopting open-data science, where data is freely available or available under some data sharing policy. In addition to making publication efforts easier, open data science also promotes collaborative work using data from various sources around the world. While research datasets are often static and immutable, the metadata of a file can be ever-changing. For researchers who frequently work with metadata, accessing the latest version may be essential. However, this is not trivial in a distributed environment where multiple people access the same file. We hypothesize that the publisher-subscriber model is a useful abstraction for building such a system. To this end, we present Gurret: a distributed system for open science that uses a publisher-subscriber substrate to propagate metadata updates to client machines. Gurret offers a transparent system infrastructure that lets users subscribe to metadata, configure update frequencies, and define custom metadata to create data policies. Additionally, Gurret tracks information flow inside a filesystem container to prevent data leakage and policy violations. Our evaluations show that Gurret has minimal overhead for small to medium-sized files and can support hundreds of custom metadata entries without losing transparency.
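    The publisher-subscriber substrate described above can be reduced to a tiny broker: clients register interest in a file's metadata and the publisher's updates are pushed to them. This is an illustrative sketch, not Gurret's actual API; all names are invented.

```python
class MetadataBroker:
    """Minimal pub-sub broker for file-attribute updates: callbacks
    registered per path are invoked whenever an attribute changes."""

    def __init__(self):
        self._subs = {}  # path -> list of callbacks

    def subscribe(self, path, callback):
        self._subs.setdefault(path, []).append(callback)

    def publish(self, path, attr, value):
        # Push the update to every subscriber of this file only.
        for cb in self._subs.get(path, []):
            cb(path, attr, value)

broker = MetadataBroker()
seen = []
broker.subscribe("/data/genome.csv",
                 lambda p, a, v: seen.append((p, a, v)))
broker.publish("/data/genome.csv", "license", "CC-BY-4.0")
broker.publish("/other.csv", "license", "MIT")  # no subscriber: dropped
```

    A real system additionally needs per-subscriber update frequencies and durable delivery across client restarts, which is where most of the engineering lies.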

    ADDING PERSISTENCE TO MAIN MEMORY PROGRAMMING

    Unlocking the true potential of the new persistent memories (PMEMs) requires eliminating traditional persistent I/O abstractions altogether, by introducing persistent semantics directly into main memory programming. Such a programming model elevates failure atomicity to a first-class application property, in addition to in-memory data layout, concurrency control, and fault tolerance, and therefore requires a redesign of programming abstractions for both program correctness and maximum performance gains. To address these challenges, this thesis proposes a set of system software designs that integrate persistence with main memory programming, and makes the following contributions. First, this thesis proposes a PMEM-aware I/O runtime, NVStream, that supports fast durable streaming I/O. NVStream uses a memory-based I/O interface that integrates with an application's existing I/O data movement operations to accelerate persistent data writes. NVStream carefully designs its persistent data storage layout and crash-consistent semantics to match both application and PMEM characteristics. Specifically, we leverage the streaming nature of I/O in HPC workflows to benefit from a log-structured PMEM storage engine design that uses relaxed write orderings and append-only failure-atomic semantics to form strongly consistent application checkpoints. Furthermore, we identify that optimizing the I/O software stack exposes PMEM bandwidth limitations as a bottleneck during parallel HPC I/O writes, and propose a novel data movement design, PHX. PHX uses alternative network data movement paths available in datacenters to ease the bandwidth pressure on the PMEM memory interconnects, all while maintaining the correctness of the persistent data. Next, the thesis explores the challenges and opportunities of using PMEM for true main-memory persistent programming: a single data domain for both runtime and persistent application state. Such a programming model includes maintaining ACID properties during each and every update to an application's persistent structures. ACID-qualified persistent programming for multi-threaded applications is hard, as the programmer has to reason about both crash-consistency and synchronization (crash-sync) semantics for programming correctness. The thesis contributes a new understanding of the correctness requirements for mixing different crash-consistency and synchronization protocols, characterizes the performance of different crash-sync realizations for different applications and hardware architectures, and draws actionable insights for future designs of PMEM systems. Finally, the application state stored on node-local persistent memory is still vulnerable to catastrophic node failures. The thesis proposes a replicated persistent memory runtime, Blizzard, that supports truly fault-tolerant, concurrent, and persistent data-structure programming. Blizzard carefully integrates userspace networking with byte-addressable PMEM for a fast persistent memory replication runtime. The design also incorporates a replication-aware crash-sync protocol that supports consistent and concurrent updates on persistent data structures. Blizzard offers applications the flexibility to use the data structures that best match their functional requirements, while offering better performance and providing crucial reliability guarantees lacking from existing persistent memory runtimes.
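    The append-only, failure-atomic checkpoint semantics mentioned for NVStream can be illustrated with a length-plus-checksum record format: each append is one atom, and recovery keeps only complete records, discarding a torn tail. This is a generic sketch of the technique, not NVStream's actual layout.

```python
import struct
import zlib

class StreamLog:
    """Append-only log of checkpoint records. Each record is
    <length, crc32, payload>; a crash mid-append leaves a torn tail
    that recovery detects and discards."""

    def __init__(self):
        self.buf = bytearray()  # stands in for a PMEM region

    def append(self, payload: bytes):
        header = struct.pack("<II", len(payload), zlib.crc32(payload))
        self.buf += header + payload  # the record is the failure atom

    def recover(self):
        """Return all complete records up to the first torn/corrupt one."""
        out, off = [], 0
        while off + 8 <= len(self.buf):
            length, crc = struct.unpack_from("<II", self.buf, off)
            payload = bytes(self.buf[off + 8 : off + 8 + length])
            if len(payload) < length or zlib.crc32(payload) != crc:
                break  # torn or corrupt tail: stop here
            out.append(payload)
            off += 8 + length
        return out
```

    Because each record validates itself, writes inside a record can use relaxed ordering; only the record boundary has to be made durable in order, which is the property the abstract exploits.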

    Benchmarking Hadoop performance on different distributed storage systems

    Distributed storage systems have been in place for years and have undergone significant changes in architecture to ensure reliable storage of data in a cost-effective manner. With the demand for data increasing, there has been a shift from disk-centric to memory-centric computing: the focus is on saving data in memory rather than on disk. The primary motivation for this is the increased speed of data processing. This could, however, mean a change in the approach to providing the necessary fault tolerance: instead of data replication, other techniques may be considered. One example of an in-memory distributed storage system is Tachyon. Instead of replicating data files in memory, Tachyon provides fault tolerance by maintaining a record of the operations needed to regenerate the data files; these operations are replayed if the files are lost. This approach is termed lineage. Tachyon is already deployed by many well-known companies. This thesis compares the storage performance of Tachyon with that of the on-disk storage systems HDFS and Ceph. After studying the architectures of well-known distributed storage systems, the major contribution of the work is to integrate Tachyon with Ceph as an underlying storage system, understand how this affects its performance, and tune Tachyon to extract maximum performance from it.
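    Lineage, as described above, trades storage for recomputation: instead of keeping a replica of a derived dataset, the system remembers the operation and its inputs and replays them on loss. A minimal sketch of the idea (illustrative names, not Tachyon's API):

```python
class LineageStore:
    """Fault tolerance via lineage: derived data that is lost is
    regenerated by replaying the recorded operation on its inputs."""

    def __init__(self):
        self.data = {}     # name -> materialized value
        self.lineage = {}  # name -> (fn, input names)

    def put(self, name, value):
        self.data[name] = value  # base data: no lineage, must be durable

    def derive(self, name, fn, *inputs):
        self.lineage[name] = (fn, inputs)  # record how it was produced
        self.data[name] = fn(*(self.data[i] for i in inputs))

    def lose(self, name):
        del self.data[name]  # simulate a node failure

    def get(self, name):
        if name not in self.data and name in self.lineage:
            fn, inputs = self.lineage[name]
            # Replay, recursively regenerating lost inputs as well.
            self.data[name] = fn(*(self.get(i) for i in inputs))
        return self.data[name]
```

    Note that base (non-derived) data still needs conventional durability; lineage only protects what can be recomputed, which is why the thesis pairs Tachyon with an on-disk understore such as Ceph.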

    Optimizing File Systems for High-Performance Storage Devices

    Ph.D. dissertation, Department of Electrical and Computer Engineering, College of Engineering, Seoul National University, February 2018. Advisor: Heon Young Yeom. High-performance storage technologies such as solid-state drives (SSDs) provide low latency, high throughput, and high I/O parallelism to legacy storage systems. SSDs access data without mechanical overhead and often lead to order-of-magnitude improvements in performance over legacy storage devices such as hard disk drives (HDDs). However, replacing HDDs with SSDs while keeping the software I/O stack unchanged, or not exploiting SSD features, does not lead to maximum performance. In this dissertation, we optimize file systems to fully exploit SSD features (e.g., low latency and high I/O parallelism). First, we analyze and explore I/O strategies in existing file systems on low-latency SSDs. The file systems issue and complete several I/O requests when blocks are not contiguous, which does not take advantage of the low latency of SSDs. To address this problem, we propose efficient I/O strategies that transfer data from discontiguous host memory buffers in the file system to discontiguous storage segments in a single I/O request, enabling file systems to fully exploit the performance of low-latency SSDs. Second, we investigate locking and I/O parallelism in existing file systems on highly parallel SSDs. These file systems use coarse-grained locking to access shared data structures, and I/O operations are serialized by a single thread. For these reasons, they often suffer from lock contention and underutilization of I/O bandwidth on multi-core machines with highly parallel SSDs. To address these issues, we enable concurrent updates on data structures and parallelize I/O operations. We implement our techniques in EXT4/JBD2 and evaluate them on low-latency and highly parallel SSDs. The experimental results show that our optimized file system improves performance compared to the existing EXT4 file system.

    Contents: Chapter 1 Introduction (1.1 Motivation; 1.2 Approach and Contributions; 1.3 Dissertation Structure); Chapter 2 Background (2.1 High-performance Storage Devices; 2.2 Crash Consistency in File Systems; 2.3 Read and Write Operations in the Existing File Systems; 2.4 Journal I/O in the Journaling File Systems; 2.5 Recovery in the Journaling File Systems; 2.6 Existing Locking and I/O Parallelism in Journaling File Systems); Chapter 3 Design and Implementation (3.1 Optimizing File Systems for Low-latency Storage Devices: Design, Implementation; 3.2 Optimizing File Systems for Highly Parallel Storage Devices: Design, Implementation); Chapter 4 Evaluation (4.1 Evaluating the Optimized File System for Low-latency Storage: Run-time Performance, Recovery Performance, Experimental Analysis; 4.2 Evaluating the Optimized File System for Highly Parallel Storage: Run-time Performance, Recovery Performance, Experimental Analysis); Chapter 5 Related Work (5.1 Analysis and Evaluation of High-Performance Storage; 5.2 Study of Journaling File Systems; 5.3 File and I/O System Optimizations for Low-latency Storage; 5.4 Study of Scalability in Operating Systems; 5.5 File and I/O System Optimizations for Highly Parallel Storage); Chapter 6 Conclusion (6.1 Summary; 6.2 Future Work)
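    The first optimization above, batching discontiguous buffers and segments into a single scatter-gather request instead of one request per extent, can be shown with a toy device model that counts submissions. This is a generic illustration of the technique, not the dissertation's implementation; all names are invented.

```python
class Device:
    """Toy block device that counts I/O submissions. One submission may
    carry a whole vector of (lba, data) segments (scatter-gather)."""

    def __init__(self):
        self.submissions = 0
        self.blocks = {}

    def submit(self, iov):
        self.submissions += 1
        for lba, data in iov:
            self.blocks[lba] = data

def write_per_extent(dev, extents):
    """Legacy strategy: one I/O request per discontiguous extent."""
    for seg in extents:
        dev.submit([seg])

def write_scatter_gather(dev, extents):
    """Optimized strategy: all extents packed into a single request."""
    dev.submit(extents)
```

    On an HDD, the per-request software overhead is dwarfed by seek time; on a low-latency SSD it dominates, which is why collapsing N submissions into one pays off.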