
    Improving Parallel I/O Performance Using Interval I/O

    Today's most advanced scientific applications run on large clusters consisting of hundreds of thousands of processing cores, access state-of-the-art parallel file systems that allow files to be distributed across hundreds of storage targets, and utilize advanced interconnection systems that allow for a theoretical I/O bandwidth of hundreds of gigabytes per second. Despite these advanced technologies, these applications often fail to obtain a reasonable proportion of the available I/O bandwidth. The reasons for the poor performance of application I/O include the noncontiguous I/O access patterns used for scientific computing, contention due to false sharing, and the somewhat finicky nature of parallel file system performance. We argue that a more fundamental cause of this problem is the legacy view of a file as a linear sequence of bytes. To address these issues, we introduce a novel approach to parallel I/O called Interval I/O. Interval I/O uses application access patterns to partition a file into a series of intervals, which are used as the fundamental unit for subsequent I/O operations. This approach provides superior performance for the noncontiguous access patterns that are frequently used by scientific applications. In addition, the approach reduces false contention and the unnecessary serialization it causes. Interval I/O also significantly increases the performance of atomic mode operations. Finally, the Interval I/O approach includes a technique for supporting parallel I/O for cooperating applications. We provide a prototype implementation of our Interval I/O system and use it to demonstrate performance improvements of as much as 1000% compared to ROMIO when using Interval I/O with several common benchmarks.
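    The partitioning idea in this abstract can be illustrated with a small sketch. It is not the authors' algorithm: it simply assumes that each process declares the byte ranges it will touch and cuts the file at every range boundary, so that each resulting interval has one fixed set of accessing ranks. The function name `partition_into_intervals` and the sample access pattern are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Range = Tuple[int, int]  # (start, end) byte offsets, end exclusive


def partition_into_intervals(
    accesses: Dict[int, List[Range]],
) -> List[Tuple[int, int, FrozenSet[int]]]:
    """Cut the file at every access boundary (illustrative assumption only)."""
    # Every start and end offset of every declared range is a cut point.
    cuts = sorted({off for ranges in accesses.values() for r in ranges for off in r})
    intervals = []
    for start, end in zip(cuts, cuts[1:]):
        # Ranks with a declared range that covers this whole interval.
        owners = frozenset(
            rank
            for rank, ranges in accesses.items()
            if any(s <= start and end <= e for s, e in ranges)
        )
        if owners:  # skip gaps that no rank touches
            intervals.append((start, end, owners))
    return intervals


# Hypothetical pattern: two ranks whose byte ranges overlap in [50, 100).
accesses = {0: [(0, 100)], 1: [(50, 150)]}
intervals = partition_into_intervals(accesses)
```

    In this toy pattern only the middle interval is shared by both ranks, so only that interval would need coordination; the two outer intervals each have a single owner and could proceed without contention.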

    Scientific Data Management Integrated Software Infrastructure Center


    Extending the POSIX I/O interface: a parallel file system perspective.


    Orthrus: A Framework for Implementing Efficient Collective I/O in Multi-core Clusters

    Optimization of access patterns using collective I/O imposes the overhead of exchanging data between processes. In a multi-core-based cluster the costs of inter-node and intra-node data communication are vastly different, and this heterogeneity in the efficiency of data exchange poses both a challenge and an opportunity for implementing efficient collective I/O. The opportunity is to effectively exploit fast intra-node communication. We propose to improve communication locality for greater data exchange efficiency. However, such an effort is at odds with improving access locality for I/O efficiency, which can also be critical to collective-I/O performance. To address this issue we propose a framework, Orthrus, that can accommodate multiple collective-I/O implementations, each optimized for some performance aspects, and dynamically select the best-performing one according to current workload and system patterns. We have implemented Orthrus in the ROMIO library. Our experimental results with representative MPI-IO benchmarks on both a small dedicated cluster and a large production HPC system show that Orthrus can significantly improve collective I/O performance under various workloads and system scenarios.
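    The abstract does not say how Orthrus chooses among implementations, so the following is only a generic sketch of dynamic selection: record the bandwidth observed for each candidate implementation and prefer the one with the highest mean, trying untested candidates first. The class `StrategySelector`, the strategy names, and the bandwidth figures are all hypothetical and are not taken from Orthrus or ROMIO.

```python
from typing import Dict, Iterable, List


class StrategySelector:
    """Pick the collective-I/O strategy with the best observed bandwidth."""

    def __init__(self, names: Iterable[str]) -> None:
        # One list of bandwidth samples (e.g. MB/s) per candidate strategy.
        self.samples: Dict[str, List[float]] = {name: [] for name in names}

    def record(self, name: str, bandwidth: float) -> None:
        """Store one bandwidth measurement for a strategy."""
        self.samples[name].append(bandwidth)

    def best(self) -> str:
        """Return an untried strategy if any, else the highest-mean one."""
        for name, values in self.samples.items():
            if not values:
                return name
        return max(
            self.samples,
            key=lambda n: sum(self.samples[n]) / len(self.samples[n]),
        )


# Hypothetical strategy names and made-up measurements.
selector = StrategySelector(["comm_locality", "access_locality"])
first_choice = selector.best()           # nothing measured yet
selector.record("comm_locality", 400.0)
selector.record("access_locality", 650.0)
selector.record("comm_locality", 500.0)
chosen = selector.best()
```

    A real selector would also have to account for changing workloads, for example by discarding old samples, which this sketch leaves out.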

    ParalleX File System (PXFS): Bridging the Gap Between Exascale Processing Capabilities and I/O Performance

    Because processors are reaching the maximum performance allowable by current technology, architectural trends for computer systems continue to increase the number of cores per processing chip to maximize system performance. Most estimates suggest that massively parallel systems containing millions of cores and capable of exaflops of performance will be available within the decade. New models of execution are necessary to maximize processor utilization and minimize power costs for these exascale systems. ParalleX is one such execution model, which attempts to address inefficiencies of current execution models by exposing fine-grained parallelism, increasing system utilization using asynchronous workflow, and resolving resource contention through the use of adaptive and dynamic resource scheduling. A particularly important aspect of these exascale execution models is the design of the I/O subsystem, which has seen limited performance increases compared to processor and network technologies. Parallel file systems have been designed to help alleviate the poor performance of storage technologies by distributing file data across multiple nodes of a parallel system to maximize the aggregate throughput attainable by file system clients. However, the design of parallel file systems needs to be modified to explicitly address the inherently high latency of remote file system operations without degrading file system performance and scalability. We present modifications to OrangeFS, a high-performance, working-model parallel file system geared towards facilitating research in the field of parallel I/O, to help address the inefficiencies of current file systems. We call the resulting parallel file system implementation the ParalleX File System (PXFS), as it attempts to support the features required by the I/O subsystem of the ParalleX execution model. Specifically, PXFS offers mechanisms for masking the latency of file system operations, defining meaningful computation to be overlapped with file system communication, and maintaining the high performance and scalability exhibited by OrangeFS. Our results indicate that PXFS successfully improves file system performance and supports the semantics of ParalleX with limited programmer intervention, potentially simplifying the design and increasing the performance of many ParalleX applications.
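    The latency-masking idea, overlapping useful computation with an outstanding file system operation, can be shown in miniature with standard Python futures. This is a sketch of the general technique only; it does not use PXFS, OrangeFS, or ParalleX, and the helper names `read_file` and `overlapped_work` are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, Tuple


def read_file(path: str) -> bytes:
    """Blocking read, standing in for a high-latency file system call."""
    with open(path, "rb") as f:
        return f.read()


def overlapped_work(path: str, values: Iterable[int]) -> Tuple[int, bytes]:
    """Start the read in the background, compute, then collect the data."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(read_file, path)  # I/O begins immediately
        total = sum(v * v for v in values)      # work done while I/O is in flight
        data = pending.result()                 # block only when the data is needed
    return total, data


# Write a small temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"checkpoint")
    path = tmp.name

total, data = overlapped_work(path, range(4))
os.remove(path)
```

    The caller waits only at `pending.result()`, so any computation placed before that call runs while the read is still outstanding.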