    Towards a distributed multi-tier file system for cluster computing

    Distributed storage systems running on clusters of commodity hardware are challenged by the ever-growing data storage and I/O demands of modern large-scale data analytics. A promising trend is to exploit the recent improvements in memory, storage media, and network technologies for sustaining high performance at low cost. While recent work explores using memory and SSDs as a cache for local storage or combining local with network-attached storage, no work has ever looked at all layers together in a distributed setting. We present a novel design for a distributed file system that is aware of heterogeneous storage media (e.g., memory, SSDs, HDDs, NAS) with different capacities and performance characteristics. The storage media are explicitly exposed to users and applications, allowing them to choose the distribution and placement of replicas in the cluster based on their own performance and fault tolerance requirements. At the same time, the system offers a variety of pluggable policies for automating data management for increased performance and better cluster utilization. We analyze the new trends and challenges that led to our application- and data-centric design choices, and discuss how those choices inspire new research opportunities for data-intensive processing systems.