Efficient access control for distributed hierarchical file systems
To determine whether a user can access a file in a hierarchical file system, the directory hierarchy must be traversed to check the access control of every parent directory. This traversal can be especially expensive in a distributed system, where the files may reside on separate devices. We present two approaches for representing the complete access control for a file and its parent directories such that it can be stored locally with each file, avoiding the traversal. We use the well-known CNF and DNF (Conjunctive and Disjunctive Normal Form) formats to store permission and ownership information compactly for the entire path to a file. An examination of the structure of an existing large shared file system demonstrates the efficacy of our solution.
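A minimal sketch of the CNF idea, assuming a simplified permission model: each path component (the directories, then the file) grants access to its owner, its group, and/or everyone, and a user passes a component if they match any granted class. This ignores the POSIX rule that owner and group bits take precedence over "other", and `Node`, `clause_for`, `path_cnf` and `can_access` are hypothetical names. The paper's actual encoding may differ. Each component becomes one OR-clause, the path becomes the AND of those clauses, and always-true or repeated clauses are dropped to keep the stored form small.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

# A literal names the principals that satisfy it: ("user", uid), ("group", gid) or ("any", None).
Lit = Tuple[str, Optional[int]]
Clause = FrozenSet[Lit]  # a disjunction (OR) of literals


@dataclass(frozen=True)
class Node:
    """One path component with simplified owner/group/other bits (hypothetical model)."""
    owner: int
    group: int
    owner_ok: bool
    group_ok: bool
    other_ok: bool


def clause_for(node: Node) -> Clause:
    """Translate one component's permission bits into an OR over allowed principals."""
    lits = set()
    if node.owner_ok:
        lits.add(("user", node.owner))
    if node.group_ok:
        lits.add(("group", node.group))
    if node.other_ok:
        lits.add(("any", None))
    return frozenset(lits)


def path_cnf(path: List[Node]) -> List[Clause]:
    """AND of one clause per component, minus always-true and duplicate clauses."""
    cnf: List[Clause] = []
    for node in path:
        clause = clause_for(node)
        if ("any", None) in clause:
            continue  # satisfied by every user, so it adds no constraint
        if clause not in cnf:
            cnf.append(clause)
    return cnf


def can_access(cnf: List[Clause], uid: int, gids: FrozenSet[int]) -> bool:
    """Evaluate the CNF stored with the file; no directory traversal is needed."""
    def holds(lit: Lit) -> bool:
        kind, value = lit
        return (kind == "user" and value == uid) or (kind == "group" and value in gids)
    return all(any(holds(lit) for lit in clause) for clause in cnf)
```

Consider a world-accessible `/`, a `/proj` open to owner 1000 or group 50, and a file open only to owner 1000. For that path, `path_cnf` keeps two clauses, so user 1000 is admitted while another member of group 50 is refused. An empty clause, meaning a component that grants nobody, correctly makes the whole CNF unsatisfiable.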
Dynamic Metadata Management for Petabyte-scale File Systems
In petabyte-scale distributed file systems that decouple reads and writes from metadata operations, the behavior of the metadata server cluster will be critical to overall system performance and scalability. We present a dynamic subtree partitioning and adaptive metadata management system designed to efficiently manage hierarchical metadata workloads that evolve over time. We examine the relative merits of our approach in the context of traditional workload partitioning strategies and demonstrate its performance, scalability, and adaptability advantages in a simulation environment.
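The abstract does not spell out the partitioning algorithm, so the toy model below only illustrates the general idea of dynamic subtree partitioning under stated assumptions. Each namespace subtree is served by one metadata server. Per-subtree request counters decay at the end of every epoch, and the 0.5 factor is arbitrary. A greedy rebalancer moves the hottest subtree from the busiest server to the idlest one whenever that strictly narrows the gap. `SubtreePartitioner` and its methods are hypothetical.

```python
from collections import defaultdict
from typing import Dict


class SubtreePartitioner:
    """Toy model: subtree roots are assigned to metadata servers (MDS) and hot
    subtrees migrate between servers. Names and the decay factor are illustrative."""

    def __init__(self, num_servers: int, decay: float = 0.5):
        self.num_servers = num_servers
        self.decay = decay
        self.owner: Dict[str, int] = {}                    # subtree root -> MDS id
        self.load: Dict[str, float] = defaultdict(float)   # decayed request counter per subtree

    def record(self, subtree: str) -> None:
        """Count one metadata request; a previously unseen subtree starts on server 0."""
        self.owner.setdefault(subtree, 0)
        self.load[subtree] += 1.0

    def server_loads(self) -> Dict[int, float]:
        loads = {s: 0.0 for s in range(self.num_servers)}
        for subtree, mds in self.owner.items():
            loads[mds] += self.load[subtree]
        return loads

    def rebalance(self) -> None:
        """Move the hottest subtree off the busiest server while the move narrows the gap."""
        while True:
            loads = self.server_loads()
            busiest = max(loads, key=loads.get)
            idlest = min(loads, key=loads.get)
            gap = loads[busiest] - loads[idlest]
            # Only subtrees lighter than the gap leave both servers strictly closer together.
            movable = [t for t, m in self.owner.items()
                       if m == busiest and 0 < self.load[t] < gap]
            if not movable:
                return
            hottest = max(movable, key=lambda t: self.load[t])
            self.owner[hottest] = idlest

    def end_epoch(self) -> None:
        """Age the counters so placement follows workloads that evolve over time."""
        for subtree in self.load:
            self.load[subtree] *= self.decay
```

Each move shifts a load L smaller than the current gap, so the sum of squared server loads strictly decreases and the loop always terminates. With 60, 30 and 10 requests on `/home`, `/scratch` and `/proj`, for example, one rebalance separates `/home` from the other two, giving server loads of 60 and 40.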
Quota enforcement for high-performance distributed storage systems
Storage systems manage quota to ensure that each user gets the storage they need, and that no single user can, even by accident, use up all available storage. This is difficult for large, distributed systems, especially those used for high-performance computing applications, because resource allocation occurs on many nodes concurrently. We present a scheme in which quota is enforced asynchronously by intelligent storage servers: storage clients contact a shared management service to obtain vouchers, capability-like certificates that clients can redeem at participating storage servers to allocate storage space. This approach places little load on the shared management service, scales well, and lets the client decide which storage server(s) to use without contacting the management service for further approval. Storage servers and the management service periodically reconcile voucher usage to ensure that clients do not cheat by spending the same voucher at multiple storage servers. We report on a simulation study showing that this approach performs nearly as well as not enforcing quota at all, and that the load on the shared management server is remarkably low.
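A minimal sketch of the voucher flow, under assumptions the abstract does not state. The "capability-like certificate" is modeled as an HMAC-SHA256 tag under a key shared by the management service and the storage servers. Quota is debited when a voucher is issued, and unused vouchers are never refunded. Reconciliation simply counts how many servers redeemed each voucher ID. The class and method names are hypothetical, and the paper's cryptography and bookkeeping may differ.

```python
import hashlib
import hmac
import itertools
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Voucher:
    voucher_id: int
    user: str
    nbytes: int
    tag: bytes  # MAC over the fields above, so servers can verify it without the manager


def _mac(key: bytes, voucher_id: int, user: str, nbytes: int) -> bytes:
    # Toy encoding; a real design would use an unambiguous serialization.
    msg = f"{voucher_id}|{user}|{nbytes}".encode()
    return hmac.new(key, msg, hashlib.sha256).digest()


class ManagementService:
    def __init__(self, key: bytes, quotas: Dict[str, int]):
        self.key = key
        self.remaining = dict(quotas)
        self._ids = itertools.count(1)

    def issue(self, user: str, nbytes: int) -> Voucher:
        """Debit the user's quota up front and return a signed voucher."""
        if nbytes > self.remaining.get(user, 0):
            raise PermissionError(f"quota exceeded for {user}")
        self.remaining[user] -= nbytes
        vid = next(self._ids)
        return Voucher(vid, user, nbytes, _mac(self.key, vid, user, nbytes))

    def reconcile(self, logs: List[List[Voucher]]) -> Set[int]:
        """Return the IDs of vouchers redeemed at more than one storage server."""
        servers_per_voucher: Dict[int, int] = defaultdict(int)
        for log in logs:
            for vid in {v.voucher_id for v in log}:
                servers_per_voucher[vid] += 1
        return {vid for vid, n in servers_per_voucher.items() if n > 1}


class StorageServer:
    def __init__(self, key: bytes):
        self.key = key
        self.redeemed: List[Voucher] = []  # log sent to the manager at reconciliation time

    def redeem(self, v: Voucher) -> bool:
        """Allocate space locally if the voucher is authentic and new to this server."""
        expected = _mac(self.key, v.voucher_id, v.user, v.nbytes)
        if not hmac.compare_digest(expected, v.tag):
            return False  # forged or altered voucher
        if any(r.voucher_id == v.voucher_id for r in self.redeemed):
            return False  # replay at the same server
        self.redeemed.append(v)
        return True
```

A second server cannot detect a voucher already spent elsewhere when it redeems it. That double spend only appears when `reconcile` merges the servers' logs, which matches the periodic reconciliation the abstract describes.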
PRESIDIO
The ever-increasing volume of archival data that must be reliably retained for long periods, together with the decreasing costs of disk storage, memory, and processing, has motivated the design of low-cost, high-efficiency disk-based storage systems. However, managed disk storage is still expensive. To lower the cost further, redundancy can be eliminated with inter-file and intra-file data compression. However, given the diversity of data collections, it is not clear what the optimal compression strategy is. To create a scalable archival storage system that efficiently stores diverse data, we present PRESIDIO, a framework that selects among different space-reduction efficient storage methods (ESMs) to detect similarity and reduce or eliminate redundancy when storing objects. In addition, the framework uses a virtualized content addressable store (VCAS) that hides from the user the complexity of knowing which space-efficient techniques, such as chunk-based deduplication or delta compression, are used. Storing and retrieving objects are polymorphic operations independent of their content-based address. A new technique, harmonic super-fingerprinting, is also used to obtain successively more accurate (but also more costly) measures of similarity, identifying the existing objects in a very large data set that are most similar to an incoming new object. When first reported, the PRESIDIO design was the first to comprehensively introduce the notion of deduplication, which major vendors now offer as a service in storage systems. As an aid to the design of such systems, we evaluate and present, using empirical data, various parameters that affect the efficiency of a storage system. © 2011 ACM
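PRESIDIO's ESM selection and harmonic super-fingerprinting are not reproduced here. The sketch below covers only one space-reduction method the abstract names: chunk-based deduplication into a content-addressable store. It swaps in content-defined chunking with a "gear" rolling hash, a common technique chosen for illustration. It also assumes SHA-256 chunk addresses and arbitrary size parameters (`avg_bits`, `min_size`, `max_size`). `cdc_chunks` and `ChunkStore` are hypothetical names.

```python
import hashlib
from typing import Dict, List

# Deterministic 64-bit gear table derived from SHA-256 (an illustrative choice).
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:8], "big") for i in range(256)]
MASK64 = (1 << 64) - 1


def cdc_chunks(data: bytes, avg_bits: int = 12,
               min_size: int = 1024, max_size: int = 16384) -> List[bytes]:
    """Split data at content-defined boundaries using a gear rolling hash."""
    cut_mask = (1 << avg_bits) - 1  # cut where the low avg_bits bits of the hash are zero
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64  # older bytes shift out of the 64-bit state
        size = i - start + 1
        if (size >= min_size and (h & cut_mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


class ChunkStore:
    """Toy content-addressable store: chunks keyed by SHA-256, objects kept as recipes."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}

    def put(self, data: bytes) -> List[str]:
        """Store an object and return its recipe (the ordered list of chunk addresses)."""
        recipe = []
        for chunk in cdc_chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # an identical chunk is stored only once
            recipe.append(digest)
        return recipe

    def get(self, recipe: List[str]) -> bytes:
        """Reassemble an object from its recipe."""
        return b"".join(self.chunks[d] for d in recipe)
```

Boundaries depend on local content rather than on fixed offsets. As a result, inserting a few bytes near the start of an object typically shifts only the first boundaries, and most later chunks still deduplicate, which fixed-size chunking would not achieve.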