
    High-performance computing: the essential tool and the essential challenge

    [EN] Prolog to the Journal of Supercomputing, volume 73, issue 1. We would also like to acknowledge the “Ministerio de Educación y Ciencia” of Spain for its support of the Spanish CAPAP-H5 network (HPC in Heterogeneous Systems, TIN2014-53522-REDT), and the “Ministerio de Economía y Competitividad” of Spain/FEDER for supporting Grants TEC2015-67387-C4-1-R and TEC2015-67387-C4-3-R. Alonso-Jordá, P.; Ranilla, J.; Vigo-Aguiar, J. (2017). High-performance computing: the essential tool and the essential challenge. The Journal of Supercomputing. 73(1):1-3. https://doi.org/10.1007/s11227-016-1922-5

    Factorized solution of generalized stable Sylvester equations using many-core GPU accelerators

    [EN] We investigate the factorized solution of generalized stable Sylvester equations such as those arising in model reduction, image restoration, and observer design. Our algorithms, based on the matrix sign function, take advantage of the current trend to integrate high-performance graphics accelerators (also known as GPUs) in computer systems. As a result, our realisations provide a valuable tool for solving large-scale problems on a variety of platforms. We acknowledge support of the ANII - MPG Independent Research Group: "Efficient Heterogeneous Computing" at UdelaR, a partner group of the Max Planck Institute in Magdeburg. Benner, P.; Dufrechou, E.; Ezzatti, P.; Gallardo, R.; Quintana-Ortí, ES. (2021). Factorized solution of generalized stable Sylvester equations using many-core GPU accelerators. The Journal of Supercomputing (Online). 77(9):10152-10164. https://doi.org/10.1007/s11227-021-03658-y
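
    The matrix sign function at the heart of these solvers is compact enough to sketch. The following is a minimal NumPy sketch of Roberts' sign-function method for the standard Sylvester equation AX + XB + C = 0, assuming A and B are stable (all eigenvalues in the open left half-plane); the function name and stopping rule are illustrative, and the paper's GPU realisations additionally cover the generalized, factorized case.

    import numpy as np

    def solve_sylvester_sign(A, B, C, tol=1e-12, max_iter=50):
        # Solve A X + X B + C = 0 via the matrix sign function.
        # For stable A and B, sign([[A, C], [0, -B]]) = [[-I, 2X], [0, I]].
        n, m = A.shape[0], B.shape[0]
        Z = np.block([[A, C], [np.zeros((m, n)), -B]])
        for _ in range(max_iter):
            Z_next = 0.5 * (Z + np.linalg.inv(Z))  # Newton iteration for sign(Z)
            if np.linalg.norm(Z_next - Z, 1) <= tol * np.linalg.norm(Z, 1):
                Z = Z_next
                break
            Z = Z_next
        return 0.5 * Z[:n, n:]  # X is half the top-right block of sign(Z)

    For stable inputs, the result can be checked against scipy.linalg.solve_sylvester(A, B, -C).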

    Hierarchical Dynamic Loop Self-Scheduling on Distributed-Memory Systems Using an MPI+MPI Approach

    Computationally intensive loops are the primary source of parallelism in scientific applications. Such loops are often irregular, and a balanced execution of their loop iterations is critical for achieving high performance. However, several factors, such as problem characteristics and algorithmic and systemic variations, may lead to a load-imbalanced execution. Dynamic loop self-scheduling (DLS) techniques are devised to mitigate these factors and, consequently, improve application performance. On distributed-memory systems, DLS techniques can be implemented using a hierarchical master-worker execution model and are, therefore, called hierarchical DLS techniques. These techniques self-schedule loop iterations at two levels of hardware parallelism: across and within compute nodes. Hybrid programming approaches that combine the message passing interface (MPI) with open multi-processing (OpenMP) dominate the implementation of hierarchical DLS techniques. The MPI-3 standard includes the feature of sharing memory regions among MPI processes. This feature introduced the MPI+MPI approach, which simplifies the implementation of parallel scientific applications. The present work designs and implements hierarchical DLS techniques by exploiting the MPI+MPI approach. Four well-known DLS techniques are considered in the evaluation proposed herein. The results indicate certain performance advantages of the proposed approach compared to the hybrid MPI+OpenMP approach.
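
    The MPI-3 shared-memory feature underlying the MPI+MPI approach can be illustrated with a short, hypothetical mpi4py sketch (the counter and its use are illustrative, not the paper's implementation): ranks on the same node are grouped with Split_type, and a window allocated with Allocate_shared gives them direct load/store access to a common memory region.

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD

    # Group the processes that share a compute node (the node-local MPI level).
    node_comm = comm.Split_type(MPI.COMM_TYPE_SHARED)

    # Rank 0 on each node allocates the shared region; the others attach to it.
    itemsize = MPI.INT64_T.Get_size()
    size = itemsize if node_comm.rank == 0 else 0
    win = MPI.Win.Allocate_shared(size, itemsize, comm=node_comm)

    # Every node-local rank maps rank 0's buffer as a NumPy array.
    buf, _ = win.Shared_query(0)
    counter = np.ndarray(buffer=buf, dtype=np.int64, shape=(1,))

    if node_comm.rank == 0:
        counter[0] = 0   # e.g., next unscheduled loop iteration on this node
    node_comm.Barrier()

    # All ranks on the node now see the same value through plain loads/stores.
    print(node_comm.rank, counter[0])

    Run under, e.g., mpiexec -n 4; concurrent updates in real DLS code would use atomics or window synchronization rather than a barrier.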

    Dynamic Loop Scheduling Using MPI Passive-Target Remote Memory Access

    Scientific applications often contain large, computationally intensive parallel loops. Loop scheduling techniques aim to achieve load-balanced executions of such applications. For distributed-memory systems, existing dynamic loop scheduling (DLS) libraries are typically MPI-based and employ a master-worker execution model to assign variably-sized chunks of loop iterations. The master-worker execution model may adversely impact performance due to contention at the master. This work proposes a distributed chunk-calculation approach that does not require the master-worker execution scheme. Moreover, it considers the novel features in the latest MPI standards, such as passive-target remote memory access, shared-memory window creation, and atomic read-modify-write operations. To evaluate the proposed approach, five well-known DLS techniques, two applications, and two heterogeneous hardware setups have been considered. The DLS techniques implemented using the proposed approach outperformed their counterparts implemented using the traditional master-worker execution model.
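
    The distributed chunk calculation rests on passive-target RMA with atomic read-modify-write operations. Below is a minimal, hypothetical mpi4py sketch (fixed-size chunks for simplicity; the evaluated DLS techniques compute chunk sizes differently): each rank claims its next chunk by atomically advancing a global iteration counter with Fetch_and_op inside a Lock/Unlock epoch, so no master process is needed to hand out work.

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    N = 10_000    # total loop iterations (illustrative)
    CHUNK = 128   # fixed chunk size; real DLS techniques vary this

    # Rank 0 hosts the shared iteration counter; other ranks expose no memory.
    counter = np.zeros(1 if comm.rank == 0 else 0, dtype=np.int64)
    win = MPI.Win.Create(counter, comm=comm)

    result = np.empty(1, dtype=np.int64)
    increment = np.full(1, CHUNK, dtype=np.int64)

    while True:
        # Passive-target epoch: the target process's CPU is not involved.
        win.Lock(0)
        win.Fetch_and_op(increment, result, 0, op=MPI.SUM)
        win.Unlock(0)

        start = int(result[0])
        if start >= N:
            break         # all iterations have been claimed
        end = min(start + CHUNK, N)
        # ... execute loop iterations [start, end) ...

    win.Free()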