69 research outputs found
Parallel netCDF: A Scientific High-Performance I/O Interface
Dataset storage, exchange, and access play a critical role in scientific
applications. For such purposes netCDF serves as a portable and efficient file
format and programming interface, which is popular in numerous scientific
application domains. However, the original interface does not provide an
efficient mechanism for parallel data storage and access. In this work, we
present a new parallel interface for writing and reading netCDF datasets. This
interface is derived with minimum changes from the serial netCDF interface but
defines semantics for parallel access and is tailored for high performance. The
underlying parallel I/O is achieved through MPI-IO, allowing for dramatic
performance gains through the use of collective I/O optimizations. We compare
the implementation strategies with HDF5 and analyze both. Our tests indicate
programming convenience and significant I/O performance improvement with this
parallel netCDF interface. Comment: 10 pages, 7 figures
Exploring Scientific Application Performance Using Large Scale Object Storage
One of the major performance and scalability bottlenecks in large scientific
applications is parallel reading and writing to supercomputer I/O systems. The
usage of parallel file systems and the consistency requirements of POSIX, to which
all the traditional HPC parallel I/O interfaces adhere, pose limitations to the
scalability of scientific applications. Object storage is a widely used storage
technology in cloud computing and is more frequently proposed for HPC workloads
to address and improve the current scalability and performance of I/O in
scientific applications. While object storage is a promising technology, it is
still unclear how scientific applications will use object storage and what the
main performance benefits will be. This work addresses these questions, by
emulating an object storage used by a traditional scientific application and
evaluating potential performance benefits. We show that scientific applications
can benefit from the usage of object storage on large scales. Comment: Preprint submitted to WOPSSS workshop at ISC 201
Many-Task Computing and Blue Waters
This report discusses many-task computing (MTC) generically and in the
context of the proposed Blue Waters system, which is planned to be the largest
NSF-funded supercomputer when it begins production use in 2012. The aim of this
report is to inform the BW project about MTC, including understanding aspects
of MTC applications that can be used to characterize the domain and
understanding the implications of these aspects to middleware and policies.
Many MTC applications do not neatly fit the stereotypes of high-performance
computing (HPC) or high-throughput computing (HTC) applications. Like HTC
applications, by definition MTC applications are structured as graphs of
discrete tasks, with explicit input and output dependencies forming the graph
edges. However, MTC applications have significant features that distinguish
them from typical HTC applications. In particular, different engineering
constraints for hardware and software must be met in order to support these
applications. HTC applications have traditionally run on platforms such as
grids and clusters, through either workflow systems or parallel programming
systems. MTC applications, in contrast, will often demand a short time to
solution, may be communication intensive or data intensive, and may comprise
very short tasks. Therefore, hardware and software for MTC must be engineered
to support the additional communication and I/O and must minimize task dispatch
overheads. The hardware of large-scale HPC systems, with its high degree of
parallelism and support for intensive communication, is well suited for MTC
applications. However, HPC systems often lack a dynamic resource-provisioning
feature, are not ideal for task communication via the file system, and have an
I/O system that is not optimized for MTC-style applications. Hence, additional
software support is likely to be required to gain full benefit from the HPC
hardware.
Benefit of DDN's IME-FUSE for I/O intensive HPC applications
Many scientific applications are limited by the I/O performance offered by parallel file systems on conventional storage systems. Flash-based burst buffers provide significantly better performance than HDD-backed storage, but at the expense of capacity. Burst buffers are considered the next step towards achieving wire-speed of the interconnect and providing more predictable low-latency I/O, which are the holy grail of storage. A critical evaluation of storage technology is mandatory, as there is no long-term experience with performance behavior for particular application scenarios. The evaluation enables data centers to choose the right products and system architects to integrate them into HPC architectures. This paper investigates the native performance of DDN-IME, a flash-based burst buffer solution. Then, it takes a closer look at the IME-FUSE file system, which uses IME as a burst buffer and a Lustre file system as back-end. Finally, by utilizing a NetCDF benchmark, it estimates the performance benefit for climate applications.
Assessing the Utility of a Personal Desktop Cluster
The computer workstation, introduced by Sun Microsystems in 1982, was the tool of
choice for scientists and engineers as an interactive computing environment for the development
of scientific codes. However, by the mid-1990s, the performance of workstations
began to lag behind high-end commodity PCs. This, coupled with the disappearance of
BSD-based operating systems in workstations and the emergence of Linux as an open-source
operating system for PCs, arguably led to the demise of the workstation as we
knew it.
Around the same time, computational scientists started to leverage PCs running
Linux to create a commodity-based (Beowulf) cluster that provided dedicated compute
cycles, i.e., supercomputing for the rest of us, as a cost-effective alternative to large
supercomputers, i.e., supercomputing for the few. However, as the cluster movement
has matured, with respect to cluster hardware and open-source software, these clusters
have become much more like their large-scale supercomputing brethren — a shared
datacenter resource that resides in a machine room.
Consequently, the above observations, when coupled with the ever-increasing performance
gap between the PC and cluster supercomputer, provide the motivation for a
personal desktop cluster workstation — a turnkey solution that provides an interactive and parallel computing environment with the approximate form factor of a Sun SPARCstation
1 “pizza box” workstation. In this paper, we present the hardware and software
architecture of such a solution as well as its prowess as a developmental platform for parallel codes. In short, imagine a 12-node personal desktop cluster that achieves 14 Gflops on Linpack but sips only 150-180 watts of power, resulting in a performance-power ratio that is over 300% better than our test SMP platform.
Modeling and Implementation of an Asynchronous Approach to Integrating HPC and Big Data Analysis
With the emergence of exascale computing and big data analytics, many important scientific applications require the integration of computationally intensive modeling and simulation with data-intensive analysis to accelerate scientific discovery. In this paper, we create an analytical model to steer the optimization of the end-to-end time-to-solution for the integrated computation and data analysis. We also design and develop an intelligent data broker that efficiently intertwines the computation stage and the analysis stage to achieve, in practice, the optimal time-to-solution predicted by the analytical model. We perform experiments on both synthetic applications and real-world computational fluid dynamics (CFD) applications. The experiments show that the analytical model exhibits an average relative error of less than 10%, and that application performance can be improved by up to 131% for the synthetic programs and by up to 78% for the real-world CFD application.