311 research outputs found

    Scalability study of database-backed file systems for High Throughput Computing

    The purpose of this project is to study the read performance of transparent database-backed file systems, a meld of two technologies with seemingly similar purposes, relative to conventional file systems. Systems such as the ARC middleware rely on reading several million files every day, and as the number of files grows, performance suffers. To study the capabilities of a database-backed file system, a candidate is chosen and put to the test. The chosen candidate, Database File System (DBFS), is Oracle Database exposed through FUSE as a transparent file system interface. DBFS is tested by storing millions of small files in its datafile and executing a scanning process of the ARC software. From the performance data gathered in these tests, it was concluded that DBFS, while performing well on an HDD compared to ext4 in terms of scalability and read performance, is simply outperformed by XFS for both small (from 50 000 files) and large (up to 1 600 000 files) directories.
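The workload described above (scanning a directory of millions of small files and timing the reads) can be sketched in a few lines. This is a minimal illustration only, not the thesis's actual ARC test harness; the function names and the `/tmp/scan_bench` path are assumptions made for the example.

```python
# Minimal sketch of a small-file scan benchmark: populate a directory
# with many tiny files, then read each one and time the full scan.
import os
import time


def create_small_files(root: str, count: int, size: int = 64) -> None:
    """Populate `root` with `count` files of `size` bytes each."""
    os.makedirs(root, exist_ok=True)
    payload = b"x" * size
    for i in range(count):
        with open(os.path.join(root, f"f{i:06d}"), "wb") as fh:
            fh.write(payload)


def scan_directory(root: str) -> tuple[int, float]:
    """Read every regular file once; return (files_read, seconds_elapsed)."""
    start = time.perf_counter()
    n = 0
    for entry in os.scandir(root):
        if entry.is_file():
            with open(entry.path, "rb") as fh:
                fh.read()
            n += 1
    return n, time.perf_counter() - start


if __name__ == "__main__":
    create_small_files("/tmp/scan_bench", 1000)
    n, secs = scan_directory("/tmp/scan_bench")
    print(f"read {n} files in {secs:.3f}s")
```

Running this against directories of increasing size (the study goes up to 1 600 000 files) is how scalability differences between file systems such as ext4, XFS, and DBFS become visible.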

    Gurret: Decentralized data management using subscription-based file attribute propagation

    Research institutions and funding agencies are increasingly adopting open data science, where data is freely available or available under some data-sharing policy. In addition to making publication efforts easier, open data science also promotes collaborative work using data from sources around the world. While research datasets are often static and immutable, the metadata of a file can be ever-changing. For researchers who frequently work with metadata, accessing the latest version may be essential. However, this is not trivial in a distributed environment where multiple people access the same file. We hypothesize that the publisher-subscriber model is a useful abstraction for building such a system. To this end, we present Gurret: a distributed system for open science that uses a publisher-subscriber substrate to propagate metadata updates to client machines. Gurret offers a transparent system infrastructure that lets users subscribe to metadata, configure update frequencies, and define custom metadata to create data policies. Additionally, Gurret tracks information flow inside a filesystem container to prevent data leakage and policy violations. Our evaluations show that Gurret has minimal overhead for small to medium-sized files and can support hundreds of custom metadata entries without losing transparency.
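The publisher-subscriber abstraction the abstract builds on can be sketched as a toy broker that routes per-file metadata updates to subscribers. Everything here (the `MetadataBroker` name and its methods) is hypothetical and illustrates only the general model, not Gurret's actual API or implementation.

```python
# Toy publisher-subscriber substrate for file metadata: subscribers
# register a callback for a path and receive every later update,
# plus the latest known version at subscription time.
from collections import defaultdict
from typing import Callable


class MetadataBroker:
    """Routes metadata updates for a file path to its subscribers."""

    def __init__(self) -> None:
        self._subs: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
        self._latest: dict[str, dict] = {}

    def subscribe(self, path: str, callback: Callable[[dict], None]) -> None:
        self._subs[path].append(callback)
        if path in self._latest:  # deliver the current version immediately
            callback(self._latest[path])

    def publish(self, path: str, metadata: dict) -> None:
        self._latest[path] = metadata
        for cb in self._subs[path]:
            cb(metadata)


broker = MetadataBroker()
seen: list[dict] = []
broker.publish("/data/a.csv", {"version": 1})
broker.subscribe("/data/a.csv", seen.append)  # gets version 1 on subscribe
broker.publish("/data/a.csv", {"version": 2})  # pushed to the subscriber
```

A real system like the one described would add update-frequency configuration and network transport on top of this routing core.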

    Characterizing Synchronous Writes in Stable Memory Devices

    Distributed algorithms that operate in the fail-recovery model rely on state stored in stable memory to guarantee the irreversibility of operations even in the presence of failures. The performance of these algorithms leans heavily on the performance of stable memory. Current storage technologies have a well-defined performance profile: data is accessed in blocks of hundreds or thousands of bytes, random access to these blocks is expensive, and sequential access is somewhat better. File system implementations hide some of the performance limitations of the underlying storage devices using buffers and caches. However, fail-recovery distributed algorithms bypass some of these techniques and perform synchronous writes so they can tolerate a failure during the write itself. Assuming the distributed-system designer is able to buffer the algorithm's writes, we ask how buffer size and latency complement each other. In this paper we start to answer this question by characterizing the performance (throughput and latency) of typical stable memory devices using a representative set of current file systems.
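The measurement the abstract describes, synchronous writes of a given buffer size with throughput and latency recorded, can be sketched as below. This is an assumed, simplified probe, not the paper's benchmark suite; device and file-system specifics are deliberately omitted.

```python
# Minimal synchronous-write probe: write `count` buffers of `buf_size`
# bytes, forcing each to stable storage with fsync, and report mean
# per-write latency and overall throughput.
import os
import time


def sync_write_profile(path: str, buf_size: int, count: int) -> dict:
    """Write `count` buffers of `buf_size` bytes, fsync after each one."""
    buf = b"\0" * buf_size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    latencies = []
    try:
        for _ in range(count):
            t0 = time.perf_counter()
            os.write(fd, buf)
            os.fsync(fd)  # force the block through caches to the device
            latencies.append(time.perf_counter() - t0)
    finally:
        os.close(fd)
    total = sum(latencies)
    return {
        "throughput_bytes_per_s": buf_size * count / total,
        "mean_latency_s": total / count,
    }


if __name__ == "__main__":
    for size in (512, 4096, 65536):
        print(size, sync_write_profile("/tmp/sync_probe.dat", size, 100))
```

Sweeping `buf_size` in such a probe is one way to explore the buffer-size versus latency trade-off the paper poses.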

    High speed interconnects for DAQ applications
