    Data organization and I/O in a parallel ocean circulation model

    We describe an efficient and scalable parallel I/O strategy for writing out gigabytes of data generated hourly in ocean model simulations on massively parallel distributed-memory architectures. Working with the Modular Ocean Model, using the netCDF file format, and implemented on the Cray T3E, the strategy speeds up I/O by a factor of 50 in the sequential case. In the parallel case, on 8 to 256 PEs, our implementation writes out most model dynamic fields, about 1 GB in total, to a single netCDF file in 65 seconds, independent of the number of processors. The remap-and-write parallel strategy resolves the memory limitation problem and requires minimal collective I/O capability from the file system. Several critical optimizations of memory management and file access ensure this scalability and also speed up the numerical simulation through improved memory management.
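The abstract does not spell out the remap-and-write step, so the following is only a minimal sketch of one plausible reading: each PE starts with a rectangular block of a 2D field, the blocks are remapped into slabs of full rows, and each slab is then written as one contiguous chunk. The function names, the 2D layout, and the in-memory "file" are assumptions made for illustration; they are not taken from the paper, and no real netCDF or MPI calls are used.

```python
from array import array

def block_decompose(field, ny, nx, py, px):
    """Split a row-major ny x nx field into py*px blocks, one per (hypothetical) PE."""
    by, bx = ny // py, nx // px
    blocks = {}
    for p in range(py):
        for q in range(px):
            # Each block is stored as a list of row fragments.
            blocks[(p, q)] = [
                field[(p * by + j) * nx + q * bx:(p * by + j) * nx + (q + 1) * bx]
                for j in range(by)
            ]
    return blocks

def remap_to_slabs(blocks, py, px):
    """Remap blocks so that each writer holds one contiguous slab of full rows."""
    slabs = []
    for p in range(py):
        slab = []
        rows_in_block = len(blocks[(p, 0)])
        for j in range(rows_in_block):
            for q in range(px):
                slab.extend(blocks[(p, q)][j])
        slabs.append(slab)
    return slabs

def write_slabs(slabs):
    """Append each slab as a single contiguous write and return the file image."""
    out = bytearray()
    for slab in slabs:
        out += array("d", slab).tobytes()
    return bytes(out)

# Assumed toy sizes: a 4 x 6 field on a 2 x 3 grid of PEs.
ny, nx, py, px = 4, 6, 2, 3
field = [float(i) for i in range(ny * nx)]
blocks = block_decompose(field, ny, nx, py, px)
slabs = remap_to_slabs(blocks, py, px)
image = write_slabs(slabs)
```

In this sketch the number of writes equals the number of slabs rather than the number of row fragments, which is the kind of saving a remap step is meant to buy; the real implementation's data layout and communication pattern may differ.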

    A Lightweight I/O Scheme to Facilitate Spatial and Temporal Queries of Scientific Data Analytics

    In the era of petascale computing, more scientific applications are being deployed on leadership-scale computing platforms to enhance scientific productivity. Many I/O techniques have been designed to address the growing I/O bottleneck on large-scale systems by handling massive scientific data in a holistic manner. While such techniques have been leveraged in a wide range of applications, they have not been shown to be adequate for many mission-critical applications, particularly in the data post-processing stage. For example, some scientific applications generate datasets composed of a vast number of small data elements that are organized along many spatial and temporal dimensions but require sophisticated data analytics on one or more dimensions. Including such dimensional knowledge in the data organization can benefit the efficiency of data post-processing, but it is often missing from existing I/O techniques. In this study, we propose a novel I/O scheme named STAR (Spatial and Temporal AggRegation) to enable high-performance data queries for scientific analytics. STAR is able to dive into the massive data, identify the spatial and temporal relationships among data variables, and accordingly organize them into an optimized multi-dimensional data structure before writing them to storage. This technique not only facilitates the common access patterns of data analytics, but also further reduces the application turnaround time. In particular, STAR enables efficient data queries along the time dimension, a practice common in scientific analytics but not yet supported by existing I/O techniques. In our case study with a critical climate modeling application, GEOS-5, the experimental results on the Jaguar supercomputer demonstrate an improvement of up to 73 times in read performance compared to the original I/O method.
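The benefit of organizing data along the time dimension can be illustrated with a small sketch. This is not STAR itself; it only assumes a toy dataset stored as one record per timestep and shows how regrouping it per spatial point turns a time-series query from many small reads into one contiguous read. The names `records`, `to_point_major`, and the sizes used are hypothetical.

```python
def to_point_major(records):
    """Regroup records[t][p] (one record per timestep) into series[p][t]."""
    return [list(series) for series in zip(*records)]

# Assumed toy data: 5 timesteps, 4 spatial points, value = 10 * t + p.
num_steps, num_points = 5, 4
records = [[10 * t + p for p in range(num_points)] for t in range(num_steps)]

# Time-series query for point 2 in the original layout:
# one element is picked out of every timestep record (5 separate accesses).
point = 2
series_from_records = [record[point] for record in records]
records_touched = len(records)

# The same query after regrouping: a single contiguous series is read.
series = to_point_major(records)
series_from_regrouped = series[point]
chunks_touched = 1
```

Both queries return the same values; the difference is that the regrouped layout serves the query from one chunk instead of one access per timestep, which is the access pattern the abstract describes as common in scientific analytics.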