31 research outputs found

    IRON file systems

    No full text
    Commodity file systems trust disks to either work or fail completely, yet modern disks exhibit more complex failure modes. We suggest a new fail-partial failure model for disks, which incorporates realistic localized faults such as latent sector errors and block corruption. We then develop and apply a novel failure-policy fingerprinting framework, to investigate how commodity file systems react to a range of more realistic disk failures. We classify their failure policies in a new taxonomy that measures their Internal RObustNess (IRON), which includes both failure detection and recovery techniques. We show that commodity file system failure policies are often inconsistent, sometimes buggy, and generally inadequate in their ability to recover from partial disk failures. Finally, we design, implement, and evaluate a prototype IRON file system, Linux ixt3, showing that techniques such as in-disk checksumming, replication, and parity greatly enhance file system robustness while incurring minimal time and space overheads

    An analysis of latent sector errors in disk drives

    No full text
    The reliability measures in today’s disk drive-based storage systems focus predominantly on protecting against complete disk failures. Previous disk reliability studies have analyzed empirical data in an attempt to better understand and predict disk failure rates. Yet, very little is known about the incidence of latent sector errors i.e., errors that go undetected until the corresponding disk sectors are accessed. Our study analyzes data collected from production storage systems over 32 months across 1.53 million disks (both nearline and enterprise class). We analyze factors that impact latent sector errors, observe trends, and explore their implications on the design of reliability mechanisms in storage systems. To the best of our knowledge, this is the first study of such large scale – our sample size is at least an order of magnitude larger than previously published studies – and the first one to focus specifically on latent sector errors and their implications on the design and reliability of storage systems

    Limiting Trust in the Storage Stack

    No full text
    We propose a framework for examining trust in the storage stack based on different levels of trustworthiness present across different channels of information flow. We focus on corruption in one of the channels, the data channel and as a case study, we apply type-aware corruption techniques to examine Windows NTFS behavior when on-disk pointers are corrupted. We find that NTFS does not verify on-disk pointers thoroughly before using them and that even established error handling techniques like replication are often used ineffectively. Our study indicates the need to more carefully examine how trust is managed within modern file systems
    corecore