16 research outputs found

    A RAID reconfiguration scheme for gracefully degraded operations

    One distinct advantage of Redundant Array of Independent Disks (RAID) is fault tolerance. However, the performance of a disk array in degraded mode is so poor that the array is effectively unusable after a failure. Continuous operation of RAID in degraded mode is nevertheless important in many real-time applications that cannot interrupt the services they provide. In this paper, we propose an efficient architectural reconfiguration scheme, called reconfigurable RAID-5, to enhance the performance of RAID-5 in degraded mode. It reconfigures RAID-5 to RPTD-0 in degraded mode. This scheme reduces the cost of recalculating the failed disk's data and of generating parity when new data is written to the failed disk, and it also alleviates the small-write problem of RAID-5 in degraded mode. We use the phase parallel model to analyze the total execution time of conventional RAID-5 and of the reconfigurable RAID-5. Through theoretical analysis and benchmark tests, we find that the performance of the reconfigurable RAID-5 can be 200 times better than that of conventional RAID-5.
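
    The abstract does not detail the RPTD-0 layout itself, but the cost it seeks to avoid comes from the standard RAID-5 parity relation. The minimal Python sketch below (illustrative only, not the paper's scheme) shows why degraded-mode access is expensive: a block on the failed disk can only be rebuilt by XOR-ing the corresponding blocks of every surviving disk in the stripe.

        # Illustrative sketch: the XOR parity relation behind RAID-5 degraded-mode cost.
        def xor_blocks(blocks):
            """XOR a list of equal-length byte blocks together."""
            result = bytearray(len(blocks[0]))
            for block in blocks:
                for i, b in enumerate(block):
                    result[i] ^= b
            return bytes(result)

        # A 4-disk stripe: three data blocks plus one parity block.
        data = [b"AAAA", b"BBBB", b"CCCC"]
        parity = xor_blocks(data)            # parity = D0 ^ D1 ^ D2

        # Simulate losing disk 1: its block must be rebuilt from all other disks.
        surviving = [data[0], data[2], parity]
        rebuilt = xor_blocks(surviving)
        assert rebuilt == data[1]
        print("reconstructed block:", rebuilt)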

    A fast input/output library for high-resolution climate models

    We describe the design and implementation of climate fast input/output (CFIO), a fast input/output (I/O) library for high-resolution climate models. CFIO provides a simple method for modelers to overlap the I/O phase with the computing phase automatically, so as to shorten the running time of numerical simulations. To minimize the code modifications required for porting, CFIO provides interfaces and features similar to parallel Network Common Data Form (PnetCDF), one of the most widely used I/O libraries in climate models. We deployed CFIO in three high-resolution climate models: two ocean models (POP and LICOM) and one sea ice model (CICE). The experimental results show that CFIO improves the performance of climate models significantly compared with the original serial I/O approach. When running with CFIO at 0.1° resolution with about 1000 CPU cores, we reduced the running time by factors of 7.9, 4.6, and 2.0 for POP, CICE, and LICOM, respectively. We also compared the performance of CFIO against two existing libraries, PnetCDF and parallel I/O (PIO), in different scenarios. For scenarios with both data output and computations, CFIO decreases the I/O overhead compared with PnetCDF and PIO.
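
    CFIO's actual interface is not reproduced here; the sketch below only illustrates the general idea of overlapping the I/O phase with the compute phase by handing each output snapshot to a background writer thread. All names (writer, out_queue, the file naming) are invented for the example, not CFIO's API.

        # Minimal sketch of overlapping output with computation via a writer thread.
        import queue, threading

        out_queue = queue.Queue()

        def writer():
            # Background thread: drain queued fields to disk while the model computes.
            while True:
                item = out_queue.get()
                if item is None:             # sentinel: no more output
                    break
                step, field = item
                with open(f"field_{step}.txt", "w") as f:
                    f.write(" ".join(map(str, field)))

        t = threading.Thread(target=writer)
        t.start()

        field = [0.0] * 1000
        for step in range(5):
            field = [x + 1.0 for x in field]        # stand-in for the compute phase
            out_queue.put((step, list(field)))      # hand a snapshot to the I/O thread
            # computation of the next step continues without waiting on the disk

        out_queue.put(None)
        t.join()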

    Automatic parallelization tools and their applications

    In parallelizing huge legacy codes such as NCAR/Penn State MM5, a proper software environment is critical for reducing time and effort. This thesis presents an empirical study of automatic parallelization based on the NCAR/Penn State MM5 model, the Pacific Northwest National Laboratory (PNNL) version of MM5, and an FDM benchmark program. ParAgent, a tool for automatic parallelization; Vis5D, a visualization tool; a web-based monitor; and Rabbit, a performance analysis tool were used in this study. In addition, a high-level communication library was developed to complement the use of ParAgent. Performance is one of the most important aspects of parallelism: we tested different types of networks and PC clusters to see how communication between processors affects performance, and we also put effort into analyzing and reducing load imbalance.

    Impact of input/output on parallel computers

    The growing number of processing units in clusters, the advances in the speed and power of those units, and the increasing complexity of scientific applications place ever greater demands on the input/output (I/O) systems of parallel computers. This work proposes a methodology for analyzing I/O in computer clusters that makes it possible to study how different configurations affect the application and to use that information to select the best configuration of the I/O system. The methodology covers characterization of the I/O system at several levels (device, system, and application), configuration of the different elements that affect performance, and evaluation that takes into account both the application and the I/O architecture. Presented at the X Workshop Procesamiento Distribuido y Paralelo (WPDP). Red de Universidades con Carreras en Informática (RedUNCI).

    Load balance and Parallel I/O: Optimising COSA for large simulations

    This paper presents the optimisation of the parallel functionalities of the Navier-Stokes Computational Fluid Dynamics research code COSA, a finite volume structured multi-block code featuring a steady solver, a general-purpose time-domain solver, and a frequency-domain harmonic balance solver for the rapid solution of unsteady periodic flows. The optimisation focuses on improving the scalability of the parallel input/output functionalities of the code and on developing an effective and user-friendly load balancing approach. Both features are paramount for using COSA efficiently for large-scale production simulations on tens of thousands of computational cores. The efficiency enhancements resulting from optimising the parallel I/O functionality and addressing load balance issues have provided up to a 4x performance improvement for unbalanced simulations and a 2x performance improvement for balanced simulations.
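
    The abstract does not describe the load-balancing algorithm used in COSA, so the sketch below only shows one common approach for multi-block grids: greedily assigning each block to the currently least-loaded rank so that the cell count per rank stays roughly even. The sizes and rank count are invented for the example.

        # Illustrative greedy load balancing for a multi-block grid.
        import heapq

        def assign_blocks(block_sizes, num_ranks):
            """Return a list mapping each block index to a rank."""
            heap = [(0, rank) for rank in range(num_ranks)]   # (cells assigned so far, rank)
            heapq.heapify(heap)
            assignment = [None] * len(block_sizes)
            # Place the largest blocks first so smaller ones can fill the gaps.
            for block in sorted(range(len(block_sizes)), key=lambda b: -block_sizes[b]):
                load, rank = heapq.heappop(heap)
                assignment[block] = rank
                heapq.heappush(heap, (load + block_sizes[block], rank))
            return assignment

        sizes = [90_000, 40_000, 35_000, 30_000, 20_000, 5_000]
        print(assign_blocks(sizes, 3))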

    Optimized next-generation sequencing genotype-haplotype calling for genome variability analysis

    Other funding: CERCA Programme / Generalitat de Catalunya. The accurate estimation of nucleotide variability using next-generation sequencing data is challenged by the high number of sequencing errors produced by new sequencing technologies, especially for non-model species, where reference sequences may not be available and read depth may be low due to limited budgets. The most popular single-nucleotide polymorphism (SNP) callers are designed to achieve high SNP recovery and a low false discovery rate, but are not designed to account appropriately for the frequency of the variants. Instead, algorithms designed to account for the frequency of SNPs give precise results for estimating the levels and patterns of variability; these algorithms focus on the unbiased estimation of variability rather than on high SNP recovery. Here, we implemented a fast and optimized parallel algorithm that includes the method developed by Roesti et al. and Lynch, which estimates the genotype of each individual at each site, considering the possibility of calling both bases of the genotype, a single one, or none. This algorithm does not use the reference and is therefore independent of biases related to the specified reference nucleotide. The pipeline starts from a BAM file converted to pileup or mpileup format, and the software outputs a FASTA file. The new program not only reduces running times but, given its improved use of resources, can also be run on both smaller computers and large parallel computers, expanding its benefits to a wider range of researchers. The output file can be analyzed with software for population genetics analysis, such as the R library PopGenome, the software VariScan, and the program mstatspop for analyses that consider positions with missing data.
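
    As a rough illustration of per-site genotype calling with the three possible outcomes mentioned above (both bases, a single base, or no call), the sketch below compares simple binomial likelihoods for the homozygote and the heterozygote at one site. It is a simplified stand-in with invented thresholds, not the Roesti et al./Lynch estimator implemented by the program.

        # Simplified per-site genotype call from base counts at one position.
        from math import log

        def call_site(base_counts, error_rate=0.01, min_depth=4, min_llr=2.0):
            """base_counts: dict like {'A': 10, 'G': 1}. Returns 'AA', 'AG', or None."""
            depth = sum(base_counts.values())
            if depth < min_depth:
                return None                                  # too shallow: no call
            ranked = sorted(base_counts.items(), key=lambda kv: -kv[1])
            b1, n1 = ranked[0]
            b2, n2 = ranked[1] if len(ranked) > 1 else (b1, 0)
            # Log-likelihood of the major-allele homozygote: minor reads are errors.
            log_hom = n1 * log(1 - error_rate) + n2 * log(error_rate) if n2 else n1 * log(1 - error_rate)
            # Log-likelihood of the heterozygote: each read comes from either allele.
            log_het = (n1 + n2) * log(0.5)
            if log_het - log_hom > min_llr:
                return "".join(sorted(b1 + b2))              # call both bases
            if log_hom - log_het > min_llr:
                return b1 + b1                               # call a single base
            return None                                      # ambiguous: no call

        print(call_site({"A": 10, "G": 1}))   # 'AA'
        print(call_site({"A": 5, "G": 5}))    # 'AG'
        print(call_site({"A": 2}))            # None: below minimum depth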

    A Persistent Storage Model for Extreme Computing

    Continuing technological progress has resulted in dramatic growth in the aggregate computational performance of the largest supercomputing systems. Unfortunately, these advances have not translated to the required extent into the accompanying I/O systems, which have improved little in terms of architecture or effective access latency. New classes of algorithms developed for massively parallel applications, which gracefully handle the challenges of asynchrony, heavily multi-threaded distributed codes, and message-driven computation, must be matched by similar advances in I/O methods and algorithms to produce a well-performing and balanced supercomputing system. This dissertation proposes PXFS, a storage model for persistent objects inspired by the ParalleX model of execution that addresses many of these challenges. The PXFS model is asynchronous in nature to comply with the ParalleX model and proposes an active TupleSpace concept to hold all kinds of metadata and meta-objects for both storage objects and runtime objects. The active TupleSpace can also register ParalleX actions to be triggered by certain tuple operations. A first implementation of PXFS uses the well-known Orange parallel file system (OrangeFS) as its back end via an asynchronous I/O layer, together with an implementation of the TupleSpace component in HPX, an implementation of ParalleX; these are described along with preliminary performance data. A custom micro-benchmark was developed to measure the disk I/O throughput of the PXFS asynchronous interface. The results show perfect scalability and a 3x to 20x speedup in I/O throughput compared with the OrangeFS synchronous user interface. Use cases of the TupleSpace component are discussed for real-world applications, including micro-checkpointing. By using the TupleSpace for I/O in HPX applications, global barriers can be replaced with fine-grained parallelism that overlaps more computation with communication, greatly boosting performance and efficiency. The dissertation also showcases the distributed directory service in the Orange file system, which processes directory entries in parallel and effectively improves directory metadata operations.
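
    The sketch below illustrates the "active TupleSpace" idea in miniature: tuples can be stored and taken, and registered actions fire when a matching tuple is inserted, analogous to triggering ParalleX actions on tuple operations. It is not the PXFS/HPX implementation; all class and tuple names are illustrative.

        # Toy active tuple space with put/take and action registration.
        class ActiveTupleSpace:
            def __init__(self):
                self.tuples = []
                self.actions = []            # list of (predicate, callback)

            def register_action(self, predicate, callback):
                """Run callback(tuple) whenever a newly inserted tuple matches."""
                self.actions.append((predicate, callback))

            def put(self, tup):
                self.tuples.append(tup)
                for predicate, callback in self.actions:
                    if predicate(tup):
                        callback(tup)

            def take(self, predicate):
                """Remove and return the first tuple matching predicate, or None."""
                for i, tup in enumerate(self.tuples):
                    if predicate(tup):
                        return self.tuples.pop(i)
                return None

        space = ActiveTupleSpace()
        space.register_action(lambda t: t[0] == "checkpoint",
                              lambda t: print("checkpoint tuple stored:", t))
        space.put(("metadata", "file.dat", 4096))
        space.put(("checkpoint", "rank-0", 17))      # triggers the registered action
        print(space.take(lambda t: t[0] == "metadata"))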

    Using an Adaptive HPC Runtime System to Reconfigure the Cache Hierarchy

    The cache hierarchy often consumes a large portion of a processor's energy. To save energy in HPC environments, this paper proposes software-controlled reconfiguration of the cache hierarchy with an adaptive runtime system. Our approach addresses the two major limitations of other methods that reconfigure the caches: predicting the application's future and finding the best cache hierarchy configuration. Our approach uses formal language theory to express the application's pattern and help predict its future. Furthermore, it uses the prevalent Single Program Multiple Data (SPMD) model of HPC codes to find the best configuration in parallel quickly. Our experiments using cycle-level simulations indicate that 67% of the cache energy can be saved with only a 2.4% performance penalty on average. Moreover, we demonstrate that, for some applications, switching to a software-controlled reconfigurable streaming buffer configuration can improve performance by up to 30% and save 75% of the cache energy.
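
    The paper's energy model and configuration space are not given in the abstract; the sketch below only illustrates the SPMD search idea, in which each rank evaluates a different candidate cache configuration on the same work and a global reduction selects the cheapest one. The candidate list and the cost function are made-up stand-ins.

        # Sketch of an SPMD search over candidate cache configurations.
        candidate_configs = [
            {"l1_kb": 16, "l2_kb": 256},
            {"l1_kb": 32, "l2_kb": 256},
            {"l1_kb": 32, "l2_kb": 512},
            {"l1_kb": 64, "l2_kb": 1024},
        ]

        def estimated_cost(config):
            # Hypothetical cost: bigger caches spend more energy, smaller ones more time.
            energy = config["l1_kb"] * 0.5 + config["l2_kb"] * 0.1
            slowdown = 100.0 / config["l1_kb"] + 200.0 / config["l2_kb"]
            return energy + slowdown

        # Each SPMD rank would evaluate candidate_configs[rank]; here a loop emulates
        # the ranks and min() plays the role of a global allreduce(MIN).
        per_rank_results = [(estimated_cost(c), c) for c in candidate_configs]
        best_cost, best_config = min(per_rank_results, key=lambda r: r[0])
        print("selected configuration:", best_config, "cost:", round(best_cost, 2))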