2 research outputs found

    Survey of storage systems for high-performance computing

    In current supercomputers, storage is typically provided by parallel distributed file systems for hot data and tape archives for cold data. These file systems are often compatible with local file systems due to their use of the POSIX interface and semantics, which eases development and debugging because applications can easily run both on workstations and supercomputers. There is a wide variety of file systems to choose from, each tuned for different use cases and implementing different optimizations. However, the overall application performance is often held back by I/O bottlenecks due to insufficient performance of file systems or I/O libraries for highly parallel workloads. Performance problems are dealt with using novel storage hardware technologies as well as alternative I/O semantics and interfaces. These approaches have to be integrated into the storage stack seamlessly to make them convenient to use. Upcoming storage systems abandon the traditional POSIX interface and semantics in favor of alternative concepts such as object and key-value storage; moreover, they heavily rely on technologies such as NVM and burst buffers to improve performance. Additional tiers of storage hardware will increase the importance of hierarchical storage management. Many of these changes will be disruptive and require application developers to rethink their approaches to data management and I/O. A thorough understanding of today's storage infrastructures, including their strengths and weaknesses, is crucially important for designing and implementing scalable storage systems suitable for the demands of exascale computing.

    PASCAL: A Learning-aided Cooperative Bandwidth Control Policy for Hierarchical Storage Systems

    Hierarchical storage systems (HSS) are now widely regarded as an ideal model for meeting cost-performance demands. Data migration between the storage tiers of an HSS is the means of achieving the cost-performance goal, and bandwidth control limits the maximum amount of data migrated. Most previous research on HSS focuses on the data migration policy rather than on bandwidth control. However, recent research on cache and networking optimization suggests that bandwidth control has a significant impact on system performance. Few previous works achieve satisfactory bandwidth control in HSS, since it is hard to control bandwidth for so many data migration tasks simultaneously. In this paper, we first give a stochastic programming model to formalize the bandwidth control problem in HSS. We then propose a learning-aided bandwidth control policy for HSS, named PASCAL, which learns to control the bandwidth of different data migration tasks in a cooperative way. We implement PASCAL on a commercial HSS and compare it with three strong baselines over a group of workloads. Our evaluation on the physical system shows that PASCAL can effectively reduce tail latency by 1.95X and greatly improve throughput stability (2X lower throughput jitter), while keeping throughput at a relatively high level.