311 research outputs found

    Scalability study of database-backed file systems for High Throughput Computing

    The purpose of this project is to study the read performance of transparent database-backed file systems, a meld of two technologies with seemingly similar purposes, relative to conventional file systems. Systems such as the ARC middleware rely on reading several million files every day, and as the number of files grows, performance suffers. To study the capabilities of a database-backed file system, a candidate is chosen and put to the test. The chosen candidate, Database File System (DBFS), is Oracle Database exposed through FUSE as a transparent file system interface. DBFS is tested by storing millions of small files in its datafile and executing a scanning process of the ARC software. From the performance data gathered in these tests, it was concluded that DBFS, while performing well on an HDD compared to ext4 in terms of scalability and read performance, is simply outperformed by XFS for both small (from 50 000 files) and large (up to 1 600 000 files) directories.
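The workload described above (scanning a directory of millions of small files and timing the reads) can be sketched in a few lines. This is a minimal illustration only, not the thesis's actual ARC test harness; the function names and the `/tmp/scan_bench` path are assumptions made for the example.

```python
# Minimal sketch of a small-file scan benchmark: populate a directory
# with many tiny files, then read each one and time the full scan.
import os
import time


def create_small_files(root: str, count: int, size: int = 64) -> None:
    """Populate `root` with `count` files of `size` bytes each."""
    os.makedirs(root, exist_ok=True)
    payload = b"x" * size
    for i in range(count):
        with open(os.path.join(root, f"f{i:06d}"), "wb") as fh:
            fh.write(payload)


def scan_directory(root: str) -> tuple[int, float]:
    """Read every regular file once; return (files_read, seconds_elapsed)."""
    start = time.perf_counter()
    n = 0
    for entry in os.scandir(root):
        if entry.is_file():
            with open(entry.path, "rb") as fh:
                fh.read()
            n += 1
    return n, time.perf_counter() - start


if __name__ == "__main__":
    create_small_files("/tmp/scan_bench", 1000)
    n, secs = scan_directory("/tmp/scan_bench")
    print(f"read {n} files in {secs:.3f}s")
```

Running this against directories of increasing size (the study goes up to 1 600 000 files) is how scalability differences between file systems such as ext4, XFS, and DBFS become visible.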

    Gurret: Decentralized data management using subscription-based file attribute propagation

    Research institutions and funding agencies are increasingly adopting open data science, where data is freely available or available under some data-sharing policy. In addition to making publication efforts easier, open data science also promotes collaborative work using data from sources around the world. While research datasets are often static and immutable, the metadata of a file can be ever-changing. For researchers who frequently work with metadata, accessing the latest version may be essential. However, this is not trivial in a distributed environment where multiple people access the same file. We hypothesize that the publisher-subscriber model is a useful abstraction for building such a system. To this end, we present Gurret: a distributed system for open science that uses a publisher-subscriber substrate to propagate metadata updates to client machines. Gurret offers a transparent system infrastructure that lets users subscribe to metadata, configure update frequencies, and define custom metadata to create data policies. Additionally, Gurret tracks information flow inside a filesystem container to prevent data leakage and policy violations. Our evaluations show that Gurret has minimal overhead for small to medium-sized files and can support hundreds of custom metadata entries without losing transparency.
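The publisher-subscriber abstraction the abstract builds on can be sketched as a toy broker that routes per-file metadata updates to subscribers. Everything here (the `MetadataBroker` name and its methods) is hypothetical and illustrates only the general model, not Gurret's actual API or implementation.

```python
# Toy publisher-subscriber substrate for file metadata: subscribers
# register a callback for a path and receive every later update,
# plus the latest known version at subscription time.
from collections import defaultdict
from typing import Callable


class MetadataBroker:
    """Routes metadata updates for a file path to its subscribers."""

    def __init__(self) -> None:
        self._subs: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
        self._latest: dict[str, dict] = {}

    def subscribe(self, path: str, callback: Callable[[dict], None]) -> None:
        self._subs[path].append(callback)
        if path in self._latest:  # deliver the current version immediately
            callback(self._latest[path])

    def publish(self, path: str, metadata: dict) -> None:
        self._latest[path] = metadata
        for cb in self._subs[path]:
            cb(metadata)


broker = MetadataBroker()
seen: list[dict] = []
broker.publish("/data/a.csv", {"version": 1})
broker.subscribe("/data/a.csv", seen.append)  # gets version 1 on subscribe
broker.publish("/data/a.csv", {"version": 2})  # pushed to the subscriber
```

A real system like the one described would add update-frequency configuration and network transport on top of this routing core.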

    Characterizing Synchronous Writes in Stable Memory Devices

    Distributed algorithms that operate in the fail-recovery model rely on state stored in stable memory to guarantee the irreversibility of operations even in the presence of failures. The performance of these algorithms leans heavily on the performance of stable memory. Current storage technologies have a well-defined performance profile: data is accessed in blocks of hundreds or thousands of bytes, random access to these blocks is expensive, and sequential access is somewhat better. File system implementations hide some of the performance limitations of the underlying storage devices using buffers and caches. However, fail-recovery distributed algorithms bypass some of these techniques and perform synchronous writes so they can tolerate a failure during the write itself. Assuming the distributed-system designer is able to buffer the algorithm's writes, we ask how buffer size and latency complement each other. In this paper we start to answer this question by characterizing the performance (throughput and latency) of typical stable memory devices using a representative set of current file systems.
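The measurement the abstract describes, synchronous writes of a given buffer size with throughput and latency recorded, can be sketched as below. This is an assumed, simplified probe, not the paper's benchmark suite; device and file-system specifics are deliberately omitted.

```python
# Minimal synchronous-write probe: write `count` buffers of `buf_size`
# bytes, forcing each to stable storage with fsync, and report mean
# per-write latency and overall throughput.
import os
import time


def sync_write_profile(path: str, buf_size: int, count: int) -> dict:
    """Write `count` buffers of `buf_size` bytes, fsync after each one."""
    buf = b"\0" * buf_size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    latencies = []
    try:
        for _ in range(count):
            t0 = time.perf_counter()
            os.write(fd, buf)
            os.fsync(fd)  # force the block through caches to the device
            latencies.append(time.perf_counter() - t0)
    finally:
        os.close(fd)
    total = sum(latencies)
    return {
        "throughput_bytes_per_s": buf_size * count / total,
        "mean_latency_s": total / count,
    }


if __name__ == "__main__":
    for size in (512, 4096, 65536):
        print(size, sync_write_profile("/tmp/sync_probe.dat", size, 100))
```

Sweeping `buf_size` in such a probe is one way to explore the buffer-size versus latency trade-off the paper poses.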

    High speed interconnects for DAQ applications
