    echofs: a scheduler-guided temporary filesystem to leverage node-local NVMs

    © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
    The growth in data-intensive scientific applications poses strong demands on the HPC storage subsystem, as data needs to be copied from compute nodes to I/O nodes and vice versa for jobs to run. The emerging trend of adding denser, NVM-based burst buffers to compute nodes, however, offers the possibility of using these resources to build temporary file systems with specific I/O optimizations for a batch job. In this work, we present echofs, a temporary filesystem that coordinates with the job scheduler to preload a job's input files into node-local burst buffers. We present the results measured with NVM emulation, and different FS backends with DAX/FUSE on a local node, to show the benefits of our proposal and such coordination.
    This work was partially supported by the Spanish Ministry of Science and Innovation under the TIN2015-65316 grant, the Generalitat de Catalunya under contract 2014-SGR-1051, as well as the European Union's Horizon 2020 Research and Innovation Programme, under Grant Agreement no. 671951 (NEXTGenIO). Source code available at https://github.com/bsc-ssrg/echofs.
    Peer reviewed. Postprint (author's final draft).

    Understanding (Un)Written Contracts of NVMe ZNS Devices with zns-tools

    Operational and performance characteristics of flash SSDs have long been associated with a set of Unwritten Contracts due to their hidden, complex internals and lack of control from the host software stack. These unwritten contracts govern how data should be stored, accessed, and garbage collected. The emergence of Zoned Namespace (ZNS) flash devices with their open and standardized interface allows us to write these unwritten contracts for the storage stack. However, even with a standardized storage-host interface, due to the lack of appropriate end-to-end operational data collection tools, the quantification and reasoning of such contracts remain a challenge. In this paper, we propose zns.tools, an open-source framework for end-to-end event and metadata collection, analysis, and visualization for the ZNS SSDs contract analysis. We showcase how zns.tools can be used to understand how the combination of RocksDB with the F2FS file system interacts with the underlying storage. Our tools are available openly at https://github.com/stonet-research/zns-tools.

    Towards a set of metrics to guide the generation of fake computer file systems

    Fake file systems are used in the field of cyber deception to bait intruders and fool forensic investigators. File system researchers also frequently generate their own synthetic document repositories, due to data privacy and copyright concerns associated with experimenting on real-world corpora. For both these fields, realism is critical. Unfortunately, after creating a set of files and folders, there are no current testing standards that can be applied to validate their authenticity, or conversely, reliably automate their detection. This paper reviews the previous 30 years of file system surveys on real-world corpora, to identify a set of discrete measures for generating synthetic file systems. Statistical distributions, such as size, age and lifetime of files, common file types, compression and duplication ratios, directory distribution and depth (and its relationship with numbers of files and sub-directories) were identified and their respective merits discussed. Additionally, this paper highlights notable absences in these surveys that could be beneficial to address, such as analysing, en masse, the text content distribution, file naming habits, and comparing file access times against traditional working hours.

    Residual-Based Estimation of Peer and Link Lifetimes in P2P Networks

    Existing methods of measuring lifetimes in P2P systems usually rely on the so-called Create-Based Method (CBM), which divides a given observation window into two halves and samples users "created" in the first half every Δ time units until they die or the observation period ends. Despite its frequent use, this approach has no rigorous accuracy or overhead analysis in the literature. To shed more light on its performance, we first derive a model for CBM and show that small window size or large Δ may lead to highly inaccurate lifetime distributions. We then show that create-based sampling exhibits an inherent tradeoff between overhead and accuracy, which does not allow any fundamental improvement to the method. Instead, we propose a completely different approach for sampling user dynamics that keeps track of only residual lifetimes of peers and uses a simple renewal-process model to recover the actual lifetimes from the observed residuals. Our analysis indicates that for reasonably large systems, the proposed method can reduce bandwidth consumption by several orders of magnitude compared to prior approaches while simultaneously achieving higher accuracy. We finish the paper by implementing a two-tier Gnutella network crawler equipped with the proposed sampling method and obtain the distribution of ultrapeer lifetimes in a network of 6.4 million users and 60 million links. Our experimental results show that ultrapeer lifetimes are Pareto with shape α ≈ 1.1; however, link lifetimes exhibit much lighter tails with α ≈ 1.8.

    Residual-Based Measurement of Peer and Link Lifetimes in Gnutella Networks

    Existing methods of measuring lifetimes in P2P systems usually rely on the so-called create-based method (CBM), which divides a given observation window into two halves and samples users created in the first half every Δ time units until they die or the observation period ends. Despite its frequent use, this approach has no rigorous accuracy or overhead analysis in the literature. To shed more light on its performance, we first derive a model for CBM and show that small window size or large Δ may lead to highly inaccurate lifetime distributions. We then show that create-based sampling exhibits an inherent tradeoff between overhead and accuracy, which does not allow any fundamental improvement to the method. Instead, we propose a completely different approach for sampling user dynamics that keeps track of only residual lifetimes of peers and uses a simple renewal-process model to recover the actual lifetimes from the observed residuals. Our analysis indicates that for reasonably large systems, the proposed method can reduce bandwidth consumption by several orders of magnitude compared to prior approaches while simultaneously achieving higher accuracy. We finish the paper by implementing a two-tier Gnutella network crawler equipped with the proposed sampling method and obtain the distribution of ultrapeer lifetimes in a network of 6.4 million users and 60 million links. Our experimental results show that ultrapeer lifetimes are Pareto with shape α ≈ 1.1; however, link lifetimes exhibit much lighter tails with α ≈ 1.9.
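
The residual-based idea in the abstract above can be illustrated with a standard renewal-theory identity. This is a sketch under stated assumptions, not the authors' estimator: if lifetimes have CDF F and finite mean, the equilibrium residual CDF is H(x) = (1/E[L]) ∫₀ˣ (1 − F(u)) du, so its density h satisfies F(x) = 1 − h(x)/h(0). The shifted (Lomax) Pareto form, the parameter names `alpha` and `beta`, and all function names below are hypothetical choices made for illustration only.

```python
def pareto_ccdf(x, alpha, beta):
    """Survival function 1 - F(x) of an assumed shifted (Lomax) Pareto lifetime."""
    return (1.0 + x / beta) ** (-alpha)


def residual_cdf(x, alpha, beta):
    """Equilibrium residual CDF H(x) for the lifetime above (requires alpha > 1).

    Closed form of (1/E[L]) * integral_0^x (1 - F(u)) du with E[L] = beta/(alpha-1).
    """
    return 1.0 - (1.0 + x / beta) ** (-(alpha - 1.0))


def recover_lifetime_cdf(residual_cdf_fn, x, step=1e-4):
    """Recover F(x) = 1 - h(x)/h(0), where h = H' is approximated
    by forward finite differences of the residual CDF."""
    h_x = (residual_cdf_fn(x + step) - residual_cdf_fn(x)) / step
    h_0 = (residual_cdf_fn(step) - residual_cdf_fn(0.0)) / step
    return 1.0 - h_x / h_0


if __name__ == "__main__":
    alpha, beta = 1.1, 1.0  # shape taken from the abstract; scale is an assumption
    for x in (0.5, 2.0, 10.0):
        recovered = recover_lifetime_cdf(lambda t: residual_cdf(t, alpha, beta), x)
        exact = 1.0 - pareto_ccdf(x, alpha, beta)
        print(f"x={x}: recovered F={recovered:.4f}, exact F={exact:.4f}")
```

Here the residual distribution is known in closed form; with measured data, `residual_cdf_fn` would instead be an empirical estimate built from the sampled residuals, and the differentiation step would need smoothing.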