104 research outputs found

    Direct-pNFS: Scalable, transparent, and versatile access to parallel file systems

    Full text link
    Grid computations require global access to massive data stores. To meet this need, the GridNFS project aims to provide scalable, high-performance, transparent, and secure wide-area data management as well as a scalable and agile name space. While parallel file systems offer high I/O throughput, they are highly specialized, have limited operating system and hardware platform support, and often lack strong security mechanisms. Remote data access tools such as NFS and GridFTP overcome some of these limitations, but fail to provide universal, transparent, and scalable remote data access. As part of GridNFS, this paper introduces Direct-pNFS, which builds on the NFSv4.1 protocol to meet a key challenge in accessing remote parallel file systems: high-performance and scalable data access without sacrificing transparency, security, or portability. Experiments with Direct-pNFS demonstrate I/O throughput that equals or outperforms the exported parallel file system across a range of workloads.
    http://deepblue.lib.umich.edu/bitstream/2027.42/107917/1/citi-tr-07-2.pd
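
    The paper's central claim is that pNFS clients reach the parallel file system through the standard I/O interface, so applications need no changes. As a minimal illustration of that transparency (the mount point /mnt/pnfs and file name are hypothetical), an unmodified program simply issues ordinary POSIX calls, and the NFSv4.1 client layer performs the parallel access:

```c
/* Minimal sketch: an unmodified application writing through a
 * pNFS (NFSv4.1) mount with ordinary POSIX calls. The mount point
 * /mnt/pnfs is a hypothetical example; the kernel pNFS client,
 * not the application, performs the parallel striping. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char msg[] = "checkpoint data\n";
    int fd = open("/mnt/pnfs/output.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    if (write(fd, msg, sizeof msg - 1) != (ssize_t)(sizeof msg - 1))
        perror("write");
    close(fd);
    return 0;
}
```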

    I/O performance evaluation with Parabench — programmable I/O benchmark

    Get PDF
    Choosing an appropriate cluster file system for a specific high performance computing application is challenging and depends mainly on the application's I/O needs. I/O requirements vary widely: some applications read and write large datasets, others need out-of-core data access, and others have database access requirements. Application access patterns reflect this differing I/O behavior and can be used for performance testing. This paper presents the programmable I/O benchmarking tool Parabench. It takes access patterns as input, which can be adapted to mimic the behavior of a rich set of applications. Using this benchmarking tool, composed patterns can be automatically tested and easily compared across different local and cluster file systems. We introduce the design of the proposed benchmark, focusing on the Parabench programming language, which was developed for flexible pattern creation, and demonstrate an exemplary usage of Parabench and its ability to handle the POSIX and MPI-IO interfaces.
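
    The abstract does not reproduce Parabench's pattern language, so the sketch below is illustrative only: it shows, in plain C with POSIX calls, the kind of parameterized access pattern (block size, stride, block count) such a programmable benchmark would drive. The function and parameter names are assumptions, not Parabench syntax:

```c
/* Illustrative only: a parameterized access pattern of the kind a
 * programmable I/O benchmark might generate. Block size, stride,
 * and count are the tunable parameters; this is not Parabench's
 * actual pattern language. */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Write `count` blocks of `block` bytes, `stride` bytes apart. */
static void strided_write(int fd, size_t block, off_t stride, int count)
{
    char *buf = calloc(1, block);
    if (!buf)
        return;
    for (int i = 0; i < count; i++)
        pwrite(fd, buf, block, (off_t)i * stride);
    free(buf);
}

int main(void)
{
    int fd = open("pattern.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return 1;
    strided_write(fd, 4096, 1 << 20, 64);  /* 64 x 4 KiB blocks, 1 MiB apart */
    close(fd);
    return 0;
}
```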

    Measurement of PVFS2 performance on InfiniBand

    Get PDF
    InfiniBand is becoming increasingly popular as a fast interconnect between servers and storage. It offers a far better price/performance ratio than both Gigabit Ethernet and 10 Gigabit Ethernet, and hence is increasingly used for high-performance computing applications. PVFS2, the second-generation Parallel Virtual File System (PVFS), is a distributed file system for parallel data access that is increasingly used in clustered applications. Previous studies have shown that, in general, PVFS2 over InfiniBand offers higher I/O rates than PVFS2 over TCP and Gigabit Ethernet. Apart from the hardware technology, the application programming interface into the file system also makes a difference: to get better parallel performance, the choice of file system interface is important. Our study benchmarks and compares the performance of PVFS2 running over InfiniBand using different file system interfaces. IOR is a popular I/O microbenchmarking tool that supports the POSIX and MPI-IO file system interfaces. In addition to testing these already supported interfaces, we have written a PVFS2 module extension for IOR to support the native PVFS2 interface. As this study shows, using the native PVFS2 interface offers a significant performance benefit over the other file system interfaces on the PVFS2 file system. Our benchmarking effort also studies the effect of a multi-client environment on the I/O performance of the different file system interfaces. Based on the benchmarking results, we determine the most efficient application programming interface for parallel I/O on PVFS2 in a typical multi-client parallel application scenario.
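
    To make the interface comparison concrete, here is a minimal sketch of the MPI-IO access pattern IOR exercises: every rank writes its own block of a shared file with a collective call. The MPI calls are the standard MPI-IO API; the file name and block size are illustrative:

```c
/* Minimal MPI-IO sketch: each rank writes one block of a shared
 * file at its own offset, the pattern exercised by IOR's MPI-IO
 * backend. File name and block size are illustrative. */
#include <mpi.h>
#include <string.h>

#define BLOCK (1 << 20)  /* 1 MiB per rank */

static char buf[BLOCK];

int main(int argc, char **argv)
{
    int rank;
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    memset(buf, rank & 0xff, BLOCK);

    MPI_File_open(MPI_COMM_WORLD, "shared.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    /* Collective write: rank i owns byte range [i*BLOCK, (i+1)*BLOCK). */
    MPI_File_write_at_all(fh, (MPI_Offset)rank * BLOCK, buf, BLOCK,
                          MPI_BYTE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```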

    Hopes and facts in evaluating the performance of HPC-I/O on a cloud environment

    Get PDF
    There is currently increasing interest in cloud platforms within the High Performance Computing (HPC) community, and parallel I/O for high-performance systems is no exception. On cloud platforms, the user must take into account not only execution time but also cost, because cost can be one of the most important issues. In this paper, we propose a methodology to quickly evaluate the performance and cost of Virtual Clusters for parallel scientific applications that use parallel I/O. From the application's parallel I/O model, extracted automatically with our tool PAS2P-IO, we obtain the I/O requirements; the user can then select the Virtual Cluster that meets the application's requirements. The application I/O model does not depend on the underlying I/O system. One of the main benefits of applying our methodology is that it is not necessary to execute the application in order to select the Virtual Cluster on the cloud. Finally, costs and the performance-cost ratio of the Virtual Clusters are provided to facilitate decision making when selecting resources on a cloud platform.
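
    The selection step ranks candidate Virtual Clusters by their performance-cost ratio. Below is a minimal sketch of that metric under assumed units (MB/s for bandwidth, USD per hour for cluster price); the struct, function names, and figures are hypothetical, not output of PAS2P-IO or results from the paper:

```c
/* Illustrative cost-efficiency metric for ranking virtual clusters:
 * performance-cost ratio = achieved I/O bandwidth / hourly price.
 * Units (MB/s, USD/hour) and all figures are assumptions for the
 * sketch, not results from the paper. */
#include <stdio.h>

struct vcluster {
    const char *name;
    double bandwidth_mbs;  /* measured or predicted I/O bandwidth */
    double price_per_hour; /* on-demand cost of the whole cluster */
};

static double perf_cost_ratio(const struct vcluster *vc)
{
    return vc->bandwidth_mbs / vc->price_per_hour;
}

int main(void)
{
    struct vcluster candidates[] = {
        { "vc-small", 400.0, 2.0 },   /* hypothetical figures */
        { "vc-large", 900.0, 6.0 },
    };
    for (int i = 0; i < 2; i++)
        printf("%s: %.1f MB/s per USD/hour\n", candidates[i].name,
               perf_cost_ratio(&candidates[i]));
    return 0;
}
```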

    HEC: Collaborative Research: SAM^2 Toolkit: Scalable and Adaptive Metadata Management for High-End Computing

    Get PDF
    The increasing demand for exabyte-scale storage capacity by high-end computing applications requires a higher level of scalability and dependability than current file and storage systems provide. The proposal deals with file systems research for metadata management of scalable cluster-based parallel and distributed file storage systems in the HEC environment. It aims to develop a scalable and adaptive metadata management (SAM2) toolkit to extend the features of, and fully leverage the peak performance promised by, state-of-the-art cluster-based parallel and distributed file storage systems used by the high performance computing community. There is a large body of research on scaling data movement and management; however, the need to scale the handling of the attributes of cluster-based file systems and I/O, that is, metadata, has been underestimated. An understanding of the characteristics of metadata traffic, and the application of proper load-balancing, caching, prefetching, and grouping mechanisms to metadata management, will lead to high scalability. It is anticipated that by appropriately plugging the scalable and adaptive metadata management components into state-of-the-art cluster-based parallel and distributed file storage systems, one could increase the performance of applications and file systems and help translate the promise of high peak performance of such systems into real application performance improvements. The project involves the following components:
    1. Develop multi-variable forecasting models to analyze and predict file metadata access patterns.
    2. Develop scalable and adaptive file name mapping schemes using the duplicative Bloom filter array technique to enforce load balance and increase scalability (a sketch of the underlying Bloom filter idea follows this list).
    3. Develop decentralized, locality-aware metadata grouping schemes to facilitate bulk metadata operations such as prefetching.
    4. Develop an adaptive cache coherence protocol using a distributed shared object model for client-side and server-side metadata caching.
    5. Prototype the SAM2 components in the state-of-the-art parallel virtual file system PVFS2 and a distributed storage data caching system, set up an experimental framework for a DOE CMS Tier 2 site at the University of Nebraska-Lincoln, and conduct benchmark, evaluation, and validation studies.
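
    As a companion to component 2, here is a minimal sketch of the plain Bloom filter membership test that underlies file-name-to-server mapping: a server can answer "might this name map here?" with a few hash probes and no disk access. The sizes and FNV-1a-based hashing are illustrative choices, and the duplicative-array aspect of the actual scheme is not shown:

```c
/* Minimal Bloom filter sketch for metadata name lookup. Sizes and
 * the seeded FNV-1a hashing are illustrative choices, not the
 * paper's duplicative Bloom filter array scheme. */
#include <stdint.h>

#define BITS (1u << 20)  /* 1 Mbit filter */
#define K    4           /* number of hash probes */

static uint8_t filter[BITS / 8];

static uint32_t fnv1a(const char *s, uint32_t seed)
{
    uint32_t h = 2166136261u ^ seed;
    while (*s) { h ^= (uint8_t)*s++; h *= 16777619u; }
    return h;
}

static void bloom_add(const char *name)
{
    for (uint32_t i = 0; i < K; i++) {
        uint32_t bit = fnv1a(name, i) % BITS;
        filter[bit / 8] |= (uint8_t)(1u << (bit % 8));
    }
}

/* Returns 0 only if `name` was definitely never added. */
static int bloom_maybe_contains(const char *name)
{
    for (uint32_t i = 0; i < K; i++) {
        uint32_t bit = fnv1a(name, i) % BITS;
        if (!(filter[bit / 8] & (1u << (bit % 8))))
            return 0;
    }
    return 1;
}

int main(void)
{
    bloom_add("/scratch/run42/output.h5");
    return !bloom_maybe_contains("/scratch/run42/output.h5");
}
```

    False positives are possible but false negatives are not, which is why such filters suit load-balanced name mapping: a miss definitively rules a server out.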

    A MIDDLE-WARE LEVEL CLIENT CACHE FOR A HIGH PERFORMANCE COMPUTING I/O SIMULATOR

    Get PDF
    This thesis describes the design and run-time analysis of the system-level middleware cache for Hecios, a high-performance cluster I/O simulator. With Hecios, we provide a simulation environment that accurately captures the performance characteristics of all the components in a cluster-wide parallel file system. Hecios was specifically modeled after PVFS2 and was designed to be extensible, allowing individual component modules to be replaced by modules that model other system types. Built around the OMNeT++ simulation package, Hecios' inner-cluster communication module is easily adaptable to any TCP/IP-based protocol and all standard network interface cards, switches, hubs, and routers. We examine the system cache component and describe a methodology for implementing other coherence and replacement techniques within Hecios. As in other cache simulation tools, the size of the system cache can be varied independently of the replacement policy and caching technique used.
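
    That last point, varying cache size independently of the replacement policy, suggests a design in which victim selection is a pluggable hook. A minimal sketch of that separation follows; the types and names are illustrative, not Hecios' actual interface:

```c
/* Sketch of a cache whose replacement policy is a pluggable hook,
 * mirroring the separation described above. The types and the LRU
 * policy are illustrative, not Hecios' API. */
#include <stddef.h>

struct cache_entry {
    unsigned long block_id;
    unsigned long last_used;  /* logical timestamp for LRU */
    int valid;
};

/* A replacement policy picks a victim slot; the cache core stays
 * unchanged when the policy is swapped. */
typedef size_t (*victim_fn)(const struct cache_entry *e, size_t n);

static size_t pick_lru(const struct cache_entry *e, size_t n)
{
    size_t victim = 0;
    for (size_t i = 0; i < n; i++) {
        if (!e[i].valid)
            return i;  /* free slot: no eviction needed */
        if (e[i].last_used < e[victim].last_used)
            victim = i;
    }
    return victim;
}

struct cache {
    struct cache_entry *entries;
    size_t nentries;          /* size varies independently... */
    victim_fn choose_victim;  /* ...of the replacement policy */
};

int main(void)
{
    struct cache_entry slots[64] = { 0 };
    struct cache c = { slots, 64, pick_lru };
    return (int)c.choose_victim(c.entries, c.nentries);  /* empty cache: slot 0 */
}
```

    Swapping in a different policy means assigning a different function to choose_victim; the cache core and its size are untouched.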

    Extending the POSIX I/O interface: a parallel file system perspective.

    Full text link