
    MPI-Vector-IO: Parallel I/O and Partitioning for Geospatial Vector Data

    Geospatial datasets are growing in size, complexity, and heterogeneity, and high-performance systems are needed to analyze such data and produce actionable insights efficiently. For polygonal (a.k.a. vector) datasets, operations such as I/O, data partitioning, communication, and load balancing become challenging in a cluster environment. In this work, we present MPI-Vector-IO, a parallel I/O library that we have designed using MPI-IO specifically for partitioning and reading irregular vector data formats such as Well-Known Text (WKT). It makes MPI aware of spatial data and spatial primitives, and provides support for spatial data types embedded within collective computation and communication using the MPI message-passing library. These abstractions, along with parallel I/O support, are useful for developing parallel Geographic Information System (GIS) applications on HPC platforms.
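
    The core pattern the abstract describes, splitting an irregular, line-delimited text format such as WKT across ranks with MPI-IO, can be sketched with standard MPI calls. The file name and the even byte partition below are illustrative assumptions, not MPI-Vector-IO's actual API:

        /* Hedged sketch: parallel chunked read of a line-delimited WKT file.
         * Illustrates the general pattern only; MPI-Vector-IO's real
         * interface and partitioning strategy may differ. */
        #include <mpi.h>
        #include <stdlib.h>

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            int rank, nprocs;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

            MPI_File fh;
            MPI_File_open(MPI_COMM_WORLD, "polygons.wkt",
                          MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);

            MPI_Offset fsize;
            MPI_File_get_size(fh, &fsize);

            /* Even byte partition; each rank would then shift its start to
             * the next '\n' so no one-line WKT record is split in two. */
            MPI_Offset chunk = fsize / nprocs;
            MPI_Offset start = rank * chunk;
            MPI_Offset end   = (rank == nprocs - 1) ? fsize : start + chunk;

            char *buf = malloc((size_t)(end - start) + 1);
            MPI_File_read_at_all(fh, start, buf, (int)(end - start),
                                 MPI_CHAR, MPI_STATUS_IGNORE);
            buf[end - start] = '\0';

            /* ... skip to first '\n' unless rank 0, then parse WKT records ... */

            free(buf);
            MPI_File_close(&fh);
            MPI_Finalize();
            return 0;
        }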

    Object-Based Caching for MPI-IO

    As the size of the data sets manipulated by data-intensive scientific applications approaches the petabyte level and beyond, the need for scalable I/O techniques becomes increasingly important and difficult. Much of the research on this issue has been performed within the context of MPI-IO: the de facto standard parallel I/O interface for data-intensive applications. Its popularity stems from the fact that MPI-IO provides applications a rich and flexible parallel I/O API coupled with highly efficient implementations of this API. This problem is being further addressed by the development of powerful parallel I/O subsystems and state-of-the-art file systems that can efficiently access this infrastructure. However, even with such advances, I/O continues to be a significant bottleneck in application performance. The goal of this research is to provide high-performance I/O for data-intensive applications. A key insight is that a major obstacle in the way of this goal is the legacy view of a file as a linear sequence of bytes, because scientific applications rarely access data in a way that matches this file model, using instead what is more accurately described as an object model. In fact, it is the runtime translation between these two data models that is a major contributor to poor I/O performance. To address this issue, this research will develop a more powerful object-based file model for MPI applications, and an object-based caching system to serve as an interface between MPI applications and object-based files. Objects will be carefully defined to encapsulate information about an application's I/O access patterns, and such information will be used to increase the parallelism of file accesses and decrease the cost of maintaining global cache coherence.
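
    As a concrete illustration of the gap between the linear byte model and an object model, the standard MPI-IO sketch below uses a derived datatype to describe a 2D block. This is plain MPI-IO, not the proposed object-based API; the array sizes, file name, and the assumption of exactly 4 ranks are illustrative:

        #include <mpi.h>
        #include <stdlib.h>

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            int rank;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            /* Global 1024x1024 array of doubles; this rank owns a
             * 1024x256 column block (assumes exactly 4 ranks). */
            int gsizes[2] = {1024, 1024};
            int lsizes[2] = {1024, 256};
            int starts[2] = {0, rank * 256};

            MPI_Datatype filetype;
            MPI_Type_create_subarray(2, gsizes, lsizes, starts,
                                     MPI_ORDER_C, MPI_DOUBLE, &filetype);
            MPI_Type_commit(&filetype);

            MPI_File fh;
            MPI_File_open(MPI_COMM_WORLD, "array.dat",
                          MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
            MPI_File_set_view(fh, 0, MPI_DOUBLE, filetype,
                              "native", MPI_INFO_NULL);

            double *block = malloc(sizeof(double) * 1024 * 256);
            /* One collective call reads a block that is non-contiguous in
             * the linear byte stream but contiguous in the application's
             * view; the filetype carries the access-pattern information
             * an object-based file model would capture explicitly. */
            MPI_File_read_all(fh, block, 1024 * 256, MPI_DOUBLE,
                              MPI_STATUS_IGNORE);

            MPI_File_close(&fh);
            MPI_Type_free(&filetype);
            free(block);
            MPI_Finalize();
            return 0;
        }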

    MPI-IO: A Parallel File I/O Interface for MPI Version 0.3

    Thanks to MPI [9], writing portable message-passing parallel programs is almost a reality. One of the remaining problems is file I/O. Although parallel file systems support similar interfaces, the lack of a standard makes developing a truly portable program impossible. Further, the closest thing to a standard, the UNIX file interface, is ill-suited to parallel computing. Working together, IBM Research and NASA Ames have drafted MPI-IO, a proposal to address the portable parallel I/O problem. In a nutshell, this proposal is based on the idea that I/O can be modeled as message passing: writing to a file is like sending a message, and reading from a file is like receiving a message. MPI-IO intends to leverage the relatively wide acceptance of the MPI interface in order to create a similar I/O interface. The above approach can be materialized in different ways. The current proposal represents the result of extensive discussions (and arguments), but is by no means finished. Many changes can be expected as additional participants join the effort to define an interface for portable I/O. This document is organized as follows. The remainder of this section includes a discussion of some issues that have shaped the style of the interface. Section 2 presents an overview of MPI-IO as it is currently defined; it specifies what the interface currently supports and states what would need to be added to the current proposal to make the interface more complete and robust. The next seven sections contain the interface definition itself. Section 3 presents definitions and conventions. Section 4 contains functions for file control, most notably open. Section 5 includes functions for independent I/O, both blocking and nonblocking. Section 6 includes functions for collective I/O, both blocking and nonblocking. Section 7 presents functions to support system-maintained file pointers and shared file pointers. Section 8 presents constructors that can be used to define useful filetypes (the role of filetypes is explained in Section 2). Section 9 presents how the error-handling mechanism of MPI is supported by the MPI-IO interface. All this is followed by a set of appendices, which contain information about issues that have not been totally resolved yet and about design considerations; the reader can find there the motivation behind some of our design choices. More input on this would definitely be welcome and will be included in a future release of this document. The first appendix contains a description of MPI-IO's 'hints' structure, which is used when opening a file. Appendix B is a discussion of various issues in the support for file pointers. Appendix C explains what we mean in talking about atomic access. Appendix D provides detailed examples of filetype constructors, and Appendix E contains a collection of arguments for and against various design decisions.
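
    A minimal example of the "I/O as message passing" idea follows, written with the function names that were eventually standardized in MPI-2 (the 0.3 draft's names differ, but the model is the same: a buffer, a count, and a datatype describe the transfer, as in MPI_Send):

        #include <mpi.h>

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            int rank;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            MPI_File fh;
            MPI_File_open(MPI_COMM_WORLD, "out.dat",
                          MPI_MODE_CREATE | MPI_MODE_WRONLY,
                          MPI_INFO_NULL, &fh);

            int val = rank;
            /* Like MPI_Send, but the "destination" is a file offset. */
            MPI_File_write_at(fh, rank * (MPI_Offset)sizeof(int), &val, 1,
                              MPI_INT, MPI_STATUS_IGNORE);

            MPI_File_close(&fh);
            MPI_Finalize();
            return 0;
        }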

    Parallel netCDF: A Scientific High-Performance I/O Interface

    Dataset storage, exchange, and access play a critical role in scientific applications. For such purposes netCDF serves as a portable and efficient file format and programming interface, which is popular in numerous scientific application domains. However, the original interface does not provide an efficient mechanism for parallel data storage and access. In this work, we present a new parallel interface for writing and reading netCDF datasets. This interface is derived with minimum changes from the serial netCDF interface but defines semantics for parallel access and is tailored for high performance. The underlying parallel I/O is achieved through MPI-IO, allowing for dramatic performance gains through the use of collective I/O optimizations. We compare the implementation strategies with HDF5 and analyze both. Our tests indicate programming convenience and significant I/O performance improvement with this parallel netCDF interface.
    Comment: 10 pages, 7 figures
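
    For readers unfamiliar with the interface, a short sketch of the parallel netCDF style of use follows. The dimension name, variable name, and sizes are illustrative assumptions, while the ncmpi_* calls are the library's documented API, mirroring serial netCDF with a communicator added at create time and collective "_all" data calls mapped onto MPI-IO underneath:

        #include <mpi.h>
        #include <pnetcdf.h>

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            int rank, nprocs;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

            int ncid, dimid, varid;
            ncmpi_create(MPI_COMM_WORLD, "data.nc", NC_CLOBBER,
                         MPI_INFO_NULL, &ncid);
            ncmpi_def_dim(ncid, "x", (MPI_Offset)nprocs * 100, &dimid);
            ncmpi_def_var(ncid, "temperature", NC_DOUBLE, 1, &dimid, &varid);
            ncmpi_enddef(ncid);

            double buf[100];
            for (int i = 0; i < 100; i++) buf[i] = (double)rank;

            /* Each rank writes its own 100-element slab; the "_all"
             * suffix makes this collective, enabling the MPI-IO
             * optimizations the abstract refers to. */
            MPI_Offset start = (MPI_Offset)rank * 100, count = 100;
            ncmpi_put_vara_double_all(ncid, varid, &start, &count, buf);

            ncmpi_close(ncid);
            MPI_Finalize();
            return 0;
        }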

    Development of an MPI-IO Interface for HDFS and Its Use in Map-Reduce Applications in MPI

    High-performance computing (HPC) emerged to solve complex problems in scientific domains; it consists of a distributed system that presents itself to the user as a single computer and to which compute power can easily be added. One of the main HPC interfaces is the Message Passing Interface (MPI), which lets a set of machines synchronize their work in a coordinated, distributed fashion. MPI not only enables communication between processes but also defines an interface for accessing parallel file systems, called MPI-IO. In this work, a new MPI-IO interface is created for the HDFS file system. HDFS is a distributed file system used in the Big Data field and developed by the Apache Software Foundation; together with its framework Hadoop, it allows large data sets to be stored and subsequently processed through a technique called Map-Reduce. Map-Reduce processes data and produces as a result a set of (key, value) tuples. In this way, Map-Reduce applications can be built to count the number of words in a text or to determine the number of times a given web page is accessed. This work presents the new MPI-IO interface created for HDFS, and the optimization of an MPI Map-Reduce library using the interface developed for the Hadoop Distributed File System (HDFS), improving its performance by adding locality policies.
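
    The adapter idea can be sketched by mapping a positional MPI-IO-style read onto libhdfs. The wrapper type and function below are hypothetical placeholders, not the thesis's actual interface; only the hdfs* calls are libhdfs's real C API:

        /* Hypothetical sketch of an MPI-IO-to-HDFS adapter: an
         * MPI_File_read_at-style call implemented with libhdfs's
         * positional read. HdfsMpiFile and hdfs_read_at are made up. */
        #include <fcntl.h>
        #include <hdfs.h>
        #include <mpi.h>

        typedef struct {
            hdfsFS   fs;
            hdfsFile file;
        } HdfsMpiFile;  /* hypothetical per-rank handle */

        int hdfs_read_at(HdfsMpiFile *fh, tOffset offset, void *buf, tSize len)
        {
            /* Positional read, analogous to MPI_File_read_at: no shared
             * file pointer, so concurrent ranks need no coordination. */
            return (int)hdfsPread(fh->fs, fh->file, offset, buf, len);
        }

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            int rank;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            HdfsMpiFile fh;
            fh.fs   = hdfsConnect("default", 0);  /* namenode from config */
            fh.file = hdfsOpenFile(fh.fs, "/data/input.txt", O_RDONLY, 0, 0, 0);

            char buf[4096];
            /* Each rank reads its own 4 KiB stripe, as a Map task would. */
            hdfs_read_at(&fh, (tOffset)rank * 4096, buf, sizeof buf);

            hdfsCloseFile(fh.fs, fh.file);
            hdfsDisconnect(fh.fs);
            MPI_Finalize();
            return 0;
        }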

    Extending Message Passing Interface Windows to Storage

    This work presents an extension to MPI supporting the one-sided communication model and window allocations in storage. Our design integrates transparently with current MPI implementations, enabling applications to target MPI windows in storage, in memory, or in both simultaneously, without major modifications. Initial performance results demonstrate that the presented MPI window extension could be helpful for a wide range of use cases, with low overhead.
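
    The abstract suggests window allocations steered to storage without changing the one-sided model. A plausible shape for such an extension is sketched below; the MPI_Info hint keys are hypothetical placeholders, not the paper's actual keys, and everything else is standard MPI RMA:

        #include <mpi.h>

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            int rank;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            MPI_Info info;
            MPI_Info_create(&info);
            MPI_Info_set(info, "alloc_type", "storage");      /* hypothetical key */
            MPI_Info_set(info, "storage_path", "/mnt/nvme");  /* hypothetical key */

            double *base;
            MPI_Win win;
            MPI_Win_allocate(1024 * sizeof(double), sizeof(double), info,
                             MPI_COMM_WORLD, &base, &win);

            /* One-sided access is unchanged: the same MPI_Put targets a
             * window that lives in storage instead of memory
             * (assumes exactly 2 ranks for the target computation). */
            double v = 3.14;
            MPI_Win_fence(0, win);
            MPI_Put(&v, 1, MPI_DOUBLE, (rank + 1) % 2, 0, 1, MPI_DOUBLE, win);
            MPI_Win_fence(0, win);

            MPI_Win_free(&win);
            MPI_Info_free(&info);
            MPI_Finalize();
            return 0;
        }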