
    Nature of System Calls in CPU-centric Computing Paradigm

    Modern operating systems are typically POSIX-compliant, with their major system calls specified decades ago. The next generation of non-volatile memory (NVM) technologies raises concerns about the efficiency of traditional POSIX-based systems. As one step toward building high-performance NVM systems, this paper explores the potential dependencies between system call performance and major hardware components (e.g., CPU, memory, storage) under typical use cases (e.g., software compilation, installation, web browsing, office suites). We build histograms of the most frequent and most time-consuming system calls with the goal of understanding how their latency distributions differ across platforms. We find a strong dependency between system call performance and the CPU architecture; the type of persistent storage, on the other hand, plays a less important role in performance.
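    As a concrete illustration of the kind of measurement described above, the hedged C sketch below times a single system call in a tight loop and bins the observed latencies into a coarse histogram. The choice of getpid() (invoked via syscall() to force a real kernel entry), the bucket width, and the iteration count are illustrative assumptions, not the paper's actual methodology or workloads.

    /* Minimal sketch: building a latency histogram for one system call. */
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    #define ITERS     100000
    #define BUCKETS   10
    #define BUCKET_NS 100               /* each bucket covers 100 ns */

    int main(void)
    {
        long hist[BUCKETS] = {0};
        struct timespec t0, t1;

        for (int i = 0; i < ITERS; i++) {
            clock_gettime(CLOCK_MONOTONIC, &t0);
            syscall(SYS_getpid);        /* bypass glibc caching, enter the kernel */
            clock_gettime(CLOCK_MONOTONIC, &t1);

            long ns = (t1.tv_sec - t0.tv_sec) * 1000000000L
                    + (t1.tv_nsec - t0.tv_nsec);
            int b = ns / BUCKET_NS;
            if (b >= BUCKETS)
                b = BUCKETS - 1;        /* clamp outliers into the last bucket */
            hist[b]++;
        }

        for (int b = 0; b < BUCKETS; b++)
            printf("%4d-%4d ns: %ld\n",
                   b * BUCKET_NS, (b + 1) * BUCKET_NS - 1, hist[b]);
        return 0;
    }

    Comparing such histograms across machines with different CPUs and storage devices is one simple way to see which hardware component dominates the distribution.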

    Understanding Persistent-Memory Related Issues in the Linux Kernel

    Persistent memory (PM) technologies have inspired a wide range of PM-based system optimizations. However, building correct PM-based systems is difficult due to the unique characteristics of PM hardware. To better understand the challenges, as well as the opportunities to address them, this paper presents a comprehensive study of PM-related issues in the Linux kernel. By analyzing 1,553 PM-related kernel patches in depth and conducting experiments on reproducibility and tool extension, we derive multiple insights in terms of PM patch categories, PM bug patterns, consequences, fix strategies, triggering conditions, and remedy solutions. We hope our results can contribute to the development of robust PM-based storage systems. (Comment: ACM Transactions on Storage, TOS '23)
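    To make the flavor of such bug patterns concrete, the hedged C sketch below shows one classic PM pitfall: an update written through a DAX mapping is not durable until it is explicitly flushed out of the CPU caches. The mount path and sizes are hypothetical, and production PM code would typically use CLWB+SFENCE or libpmem's pmem_persist() rather than the msync() call used here to keep the example self-contained; adding such a missing flush is a typical fix for this class of bug.

    /* Hedged sketch of a common PM crash-consistency pattern: a store to a
     * DAX-mapped file must be explicitly flushed before it is durable.
     * Path and sizes are hypothetical. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        const char *path = "/mnt/pmem/example";   /* hypothetical DAX mount */
        size_t len = 4096;

        int fd = open(path, O_CREAT | O_RDWR, 0644);
        if (fd < 0 || ftruncate(fd, len) != 0) {
            perror("open/ftruncate");
            return 1;
        }

        char *pm = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (pm == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        strcpy(pm, "hello, persistent memory");

        /* BUG PATTERN: returning here would leave the update only in the CPU
         * caches, so a crash could lose or tear it. The explicit flush below
         * is what makes the store durable. */
        if (msync(pm, len, MS_SYNC) != 0)
            perror("msync");

        munmap(pm, len);
        close(fd);
        return 0;
    }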

    Manifesting reliability issues in Storage Systems

    Storage systems are vital in managing the ever-increasing data generated by High Performance Computing-based and Cloud-based applications; ensuring reliability while providing the desired performance is therefore important. However, building reliable storage systems is challenging, and systems may fail for reasons such as power faults, device failures, and software bugs. In such events, storage systems rely on recovery components to bring the system back to a consistent state. Unfortunately, similar failure events may occur while performing system recovery and can lead to severe corruption in the file system. At the same time, storage systems are constantly updated to accommodate new storage technologies, such as Persistent Memory (PM) devices, to satisfy the demand for high performance. PM devices are storage-class memory devices that offer low access latency and data persistence. In addition, these devices offer new features such as Direct Access (DAX), which bypasses the complex Linux storage stack. However, building new storage systems on PM devices is quite a challenge. First, there is a new method for accessing data on these devices: unlike traditional storage devices that operate over a block I/O interface, PM devices operate over a memory I/O interface, so system developers need to develop new data-access methods. Second, the Linux kernel had to be modified by adding new drivers to accommodate these devices and by modifying file systems to support the new DAX feature. These modifications increase the complexity of the storage stack and may hinder the reliability of the storage system.

    Therefore, as a first step towards building reliable storage systems, this dissertation focuses on manifesting the reliability issues described above. We first analyze the impact of interrupted recovery procedures on the durability of storage systems. To do this, we build a fault injection framework to systematically interrupt the recovery procedure of four popular Linux file systems (Ext4, XFS, BtrFS, and F2FS). We observe that interrupted recovery not only induces severe corruption in the file system, but that these corruptions are permanent and cannot be fixed by another run of recovery. We conclude this part by building a generalized redo log library with transaction support that can be easily integrated with existing recovery components to provide some resilience against interruptions.

    Second, we analyze the impact of the PM software stack on system reliability by studying PM-related issues reported in the Linux kernel. To do this, we collect all patches submitted to the Linux kernel over the last decade and extract 1,553 PM-related kernel patches. We study these patches in depth and characterize PM-related bugs based on their cause. In addition, we conduct experiments on PM bug reproducibility and evaluate existing bug detection tools to derive multiple insights, such as bug-manifesting conditions and remedy solutions. The goal of this study is to assist future work in building tools that can effectively manifest these bugs; we have therefore open-sourced our dataset and the workloads used to reproduce a subset of the PM bugs.
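    The abstract does not spell out the redo log library it mentions, so the hedged C sketch below illustrates only the general idea under a hypothetical interface: each transaction appends redo records to an append-only log, makes the log durable, writes a commit marker, and only then applies the updates in place, so that an interrupted apply phase can always be finished by replaying the log.

    /* Minimal redo-log sketch (hypothetical interface, not the dissertation's
     * library). Records are appended and fsync'd, a commit marker follows,
     * and only then are updates applied in place; replay is idempotent. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    struct redo_rec {
        uint64_t offset;          /* where in the data file to apply the update */
        uint32_t len;             /* number of payload bytes that follow */
    };

    static const uint32_t COMMIT_MARKER = 0xC0FFEE00;

    /* Append one redo record (header + payload) to the log. */
    static int log_append(int log_fd, uint64_t offset, const void *buf, uint32_t len)
    {
        struct redo_rec rec = { .offset = offset, .len = len };
        if (write(log_fd, &rec, sizeof(rec)) != (ssize_t)sizeof(rec)) return -1;
        if (write(log_fd, buf, len) != (ssize_t)len) return -1;
        return 0;
    }

    /* Durably commit the transaction: persist the records, then the marker. */
    static int log_commit(int log_fd)
    {
        if (fsync(log_fd) != 0) return -1;                       /* records first */
        if (write(log_fd, &COMMIT_MARKER, sizeof(COMMIT_MARKER))
                != (ssize_t)sizeof(COMMIT_MARKER)) return -1;
        return fsync(log_fd);                                    /* then the marker */
    }

    /* Replay every record into the data file; safe to repeat after a crash.
     * A full implementation would first verify that the commit marker exists
     * and discard an uncommitted log instead of replaying it. */
    static int log_replay(int log_fd, int data_fd)
    {
        struct redo_rec rec;
        char buf[4096];

        lseek(log_fd, 0, SEEK_SET);
        while (read(log_fd, &rec, sizeof(rec)) == (ssize_t)sizeof(rec)) {
            if (rec.len > sizeof(buf)) return -1;
            if (read(log_fd, buf, rec.len) != (ssize_t)rec.len) break;
            if (pwrite(data_fd, buf, rec.len, rec.offset) != (ssize_t)rec.len)
                return -1;
        }
        return fsync(data_fd);
    }

    int main(void)
    {
        /* Hypothetical file names, purely for demonstration. */
        int log_fd  = open("redo.log", O_CREAT | O_RDWR | O_TRUNC, 0644);
        int data_fd = open("data.img", O_CREAT | O_RDWR, 0644);
        if (log_fd < 0 || data_fd < 0) { perror("open"); return 1; }

        const char *update = "new block contents";
        if (log_append(log_fd, 0, update, strlen(update)) != 0 ||
            log_commit(log_fd) != 0) { perror("log"); return 1; }

        /* From this point on, replay can always finish the in-place update,
         * even if it is interrupted and run again. */
        if (log_replay(log_fd, data_fd) != 0) { perror("replay"); return 1; }

        close(log_fd);
        close(data_fd);
        return 0;
    }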