2,359,439 research outputs found
Parallel netCDF: A Scientific High-Performance I/O Interface
Dataset storage, exchange, and access play a critical role in scientific
applications. For such purposes netCDF serves as a portable and efficient file
format and programming interface, which is popular in numerous scientific
application domains. However, the original interface does not provide an
efficient mechanism for parallel data storage and access. In this work, we
present a new parallel interface for writing and reading netCDF datasets. This
interface is derived with minimum changes from the serial netCDF interface but
defines semantics for parallel access and is tailored for high performance. The
underlying parallel I/O is achieved through MPI-IO, allowing for dramatic
performance gains through the use of collective I/O optimizations. We compare
the implementation strategies with HDF5 and analyze both. Our tests indicate
programming convenience and significant I/O performance improvement with this
parallel netCDF interface.Comment: 10 pages,7 figure
Recommended from our members
ExaHDF5: Delivering Efficient Parallel I/O on Exascale Computing Systems
Scientific applications at exascale generate and analyze massive amounts of data. A critical requirement of these applications is the capability to access and manage this data efficiently on exascale systems. Parallel I/O, the key technology enables moving data between compute nodes and storage, faces monumental challenges from new applications, memory, and storage architectures considered in the designs of exascale systems. As the storage hierarchy is expanding to include node-local persistent memory, burst buffers, etc., as well as disk-based storage, data movement among these layers must be efficient. Parallel I/O libraries of the future should be capable of handling file sizes of many terabytes and beyond. In this paper, we describe new capabilities we have developed in Hierarchical Data Format version 5 (HDF5), the most popular parallel I/O library for scientific applications. HDF5 is one of the most used libraries at the leadership computing facilities for performing parallel I/O on existing HPC systems. The state-of-the-art features we describe include: Virtual Object Layer (VOL), Data Elevator, asynchronous I/O, full-featured single-writer and multiple-reader (Full SWMR), and parallel querying. In this paper, we introduce these features, their implementations, and the performance and feature benefits to applications and other libraries
Improving Parallel I/O Performance Using Interval I/O
Today\u27s most advanced scientific applications run on large clusters consisting of hundreds of thousands of processing cores, access state of the art parallel file systems that allow files to be distributed across hundreds of storage targets, and utilize advanced interconnections systems that allow for theoretical I/O bandwidth of hundreds of gigabytes per second. Despite these advanced technologies, these applications often fail to obtain a reasonable proportion of available I/O bandwidth. The reasons for the poor performance of application I/O include the noncontiguous I/O access patterns used for scientific computing, contention due to false sharing, and the somewhat finicky nature of parallel file system performance. We argue that a more fundamental cause of this problem is the legacy view of a file as a linear sequence of bytes. To address these issues, we introduce a novel approach for parallel I/O called Interval I/O. Interval I/O is an innovative approach that uses application access patterns to partition a file into a series of intervals, which are used as the fundamental unit for subsequent I/O operations. Use of this approach provides superior performance for the noncontiguous access patterns which are frequently used by scientific applications. In addition, the approach reduces false contention and the unnecessary serialization it causes. Interval I/O also significantly increases the performance of atomic mode operations. Finally, the Interval I/O approach includes a technique for supporting parallel I/O for cooperating applications. We provide a prototype implementation of our Interval I/O system and use it to demonstrate performance improvements of as much as 1000% compared to ROMIO when using Interval I/O with several common benchmarks
MPI-Vector-IO: Parallel I/O and Partitioning for Geospatial Vector Data
In recent times, geospatial datasets are growing in terms of size, complexity and heterogeneity. High performance systems are needed to analyze such data to produce actionable insights in an efficient manner. For polygonal a.k.a vector datasets, operations such as I/O, data partitioning, communication, and load balancing becomes challenging in a cluster environment. In this work, we present MPI-Vector-IO 1 , a parallel I/O library that we have designed using MPI-IO specifically for partitioning and reading irregular vector data formats such as Well Known Text. It makes MPI aware of spatial data, spatial primitives and provides support for spatial data types embedded within collective computation and communication using MPI message-passing library. These abstractions along with parallel I/O support are useful for parallel Geographic Information System (GIS) application development on HPC platforms
Lemon: an MPI parallel I/O library for data encapsulation using LIME
We introduce Lemon, an MPI parallel I/O library that is intended to allow for
efficient parallel I/O of both binary and metadata on massively parallel
architectures. Motivated by the demands of the Lattice Quantum Chromodynamics
community, the data is stored in the SciDAC Lattice QCD Interchange Message
Encapsulation format. This format allows for storing large blocks of binary
data and corresponding metadata in the same file. Even if designed for LQCD
needs, this format might be useful for any application with this type of data
profile. The design, implementation and application of Lemon are described. We
conclude with presenting the excellent scaling properties of Lemon on state of
the art high performance computers
Exploring Scientific Application Performance Using Large Scale Object Storage
One of the major performance and scalability bottlenecks in large scientific
applications is parallel reading and writing to supercomputer I/O systems. The
usage of parallel file systems and consistency requirements of POSIX, that all
the traditional HPC parallel I/O interfaces adhere to, pose limitations to the
scalability of scientific applications. Object storage is a widely used storage
technology in cloud computing and is more frequently proposed for HPC workload
to address and improve the current scalability and performance of I/O in
scientific applications. While object storage is a promising technology, it is
still unclear how scientific applications will use object storage and what the
main performance benefits will be. This work addresses these questions, by
emulating an object storage used by a traditional scientific application and
evaluating potential performance benefits. We show that scientific applications
can benefit from the usage of object storage on large scales.Comment: Preprint submitted to WOPSSS workshop at ISC 201
- …