CSR: Small: Collaborative Research: SANE: Semantic-Aware Namespace in Exascale File Systems

Abstract

Explosive growth in the volume and complexity of data exacerbates a key challenge: managing massive data in a way that fundamentally improves the ease and efficacy of its use. Exascale storage systems generally rely on a hierarchically structured namespace, which leads to severe performance bottlenecks and makes it hard to support real-time queries on multi-dimensional attributes. Existing storage systems built on a hierarchical directory tree therefore do not scale with this growth, and the directory-tree-based namespace has become restrictive, difficult to use, and limited in scalability for today's large-scale file systems. This project investigates a novel semantic-aware namespace scheme that provides dynamic and adaptive namespace management and supports typical file-based operations in Exascale file systems. The project leverages semantic correlations among files and exploits the evolution of metadata attributes to support customized namespace management, with the end goal of efficiently facilitating file identification and end users' data lookup. The project provides significant performance improvements for existing Exascale file systems. Since Exascale file systems constitute one of the backbones of the high-performance computing infrastructure, the semantic-aware techniques also benefit a great number of data-intensive scientific and engineering applications. This project strengthens the ongoing development of high-performance computing infrastructures at both UNL and UMaine. It also enhances undergraduate and graduate education at both participating institutions and supports K-12 outreach at UMaine through an ongoing NSF-funded ITEST program.
