94 research outputs found

    Sparse approximate inverse preconditioners on high performance GPU platforms

    Simulation with models based on partial differential equations often requires the solution of (sequences of) large and sparse algebraic linear systems. In multidimensional domains, preconditioned Krylov iterative solvers are often appropriate for these duties. Therefore, the search for efficient preconditioners for Krylov subspace methods is a crucial theme. Recent developments, especially in computing hardware, have renewed the interest in approximate inverse preconditioners in factorized form, because their application during the solution process can be more efficient. We present here some experiences focused on the approximate inverse preconditioners proposed by Benzi and Tůma from 1996 and the sparsification and inversion proposed by van Duin in 1999. Computational costs, reorderings and implementation issues are considered both on conventional and innovative computing architectures like Graphics Processing Units (GPUs).

    Sparse matrix-vector multiplication on GPGPUs

    The multiplication of a sparse matrix by a dense vector (SpMV) is a centerpiece of scientific computing applications: it is the essential kernel for the solution of sparse linear systems and sparse eigenvalue problems by iterative methods. The efficient implementation of the sparse matrix-vector multiplication is therefore crucial and has been the subject of an immense amount of research, with interest renewed with every major new trend in high performance computing architectures. The introduction of General Purpose Graphics Processing Units (GPGPUs) is no exception, and many articles have been devoted to this problem. With this paper we provide a review of the techniques for implementing the SpMV kernel on GPGPUs that have appeared in the literature of the last few years. We discuss the issues and trade-offs that have been encountered by the various researchers, and present a list of solutions, organized in categories according to common features. We also provide a performance comparison across different GPGPU models and on a set of test matrices coming from various application domains.

    Coarray-based Load Balancing on Heterogeneous and Many-Core Architectures

    In order to reach challenging performance goals, computer architecture is expected to change significantly in the near future. Heterogeneous chips, equipped with different types of cores and memory, will force application developers to deal with irregular communication patterns, high levels of parallelism, and unexpected behavior. Load balancing among the heterogeneous compute units will be a critical task in order to achieve an effective usage of the computational power provided by such new architectures. In this highly dynamic scenario, Partitioned Global Address Space (PGAS) languages, like Coarray Fortran, appear to be a promising alternative to standard MPI programming that uses two-sided communications, in particular because of PGAS one-sided semantics and ease of programmability. In this paper, we show how Coarray Fortran can be used for implementing dynamic load balancing algorithms on an exascale compute node and how these algorithms can produce performance benefits for an Asian option pricing problem, running in symmetric mode on Intel Xeon Phi Knights Corner and Knights Landing architectures.

    Spin Signatures of Photogenerated Radical Anions in Polymer-[70]Fullerene Bulk Heterojunctions: High Frequency Pulsed EPR Spectroscopy

    Charged polarons in thin films of polymer-fullerene composites are investigated by light-induced electron paramagnetic resonance (EPR) at 9.5 GHz (X-band) and 130 GHz (D-band). The materials studied were poly(3-hexylthiophene) (PHT), [6,6]-phenyl-C61-butyric acid methyl ester (C60-PCBM), and two different soluble C70-derivatives: C70-PCBM and diphenylmethano[70]fullerene oligoether (C70-DPM-OE). The first experimental identification of the negative polaron localized on the C70-cage in polymer-fullerene bulk heterojunctions has been obtained. When recorded at conventional X-band EPR, this signal overlaps with the signal of the positive polaron, which does not allow for its direct experimental identification. Owing to the superior spectral resolution of the high frequency D-band EPR, we were able to separate light-induced signals from P+ and P- in PHT-C70 bulk heterojunctions. Comparing signals from C70-derivatives with different side-chains, we have obtained experimental proof that the polaron is localized on the cage of the C70 molecule.

    BootCMatch: A software package for bootstrap AMG based on graph weighted matching

    This article has two main objectives: one is to describe some extensions of an adaptive Algebraic Multigrid (AMG) method of the form previously proposed by the first and third authors, and a second one is to present a new software framework, named BootCMatch, which implements all the components needed to build and apply the described adaptive AMG both as a stand-alone solver and as a preconditioner in a Krylov method. The adaptive AMG presented is meant to handle general symmetric and positive definite (SPD) sparse linear systems, without assuming any a priori information about the problem and its origin; the goal of adaptivity is to achieve a method with a prescribed convergence rate. The presented method exploits a general coarsening process based on aggregation of unknowns, obtained by a maximum weight matching in the adjacency graph of the system matrix. More specifically, a maximum product matching is employed to define an effective smoother subspace (complementary to the coarse space), a process referred to as compatible relaxation, at every level of the recursive two-level hierarchical AMG process. Results on a large variety of test cases and comparisons with related work demonstrate the reliability and efficiency of the method and of the software.

    A framework for unit testing with coarray Fortran

    Parallelism is a ubiquitous feature of modern computing architectures; indeed, we might even say that serial code is now automatically legacy code. Writing parallel code poses significant challenges to programmers, and is often error-prone. Partitioned Global Address Space (PGAS) languages, such as Coarray Fortran (CAF), represent a promising development direction in the quest for a trade-off between simplicity and performance. CAF is a parallel programming model that allows a smooth migration from serial to parallel code. However, despite CAF's simplicity, refactoring serial code and migrating it to parallel versions is still error-prone, especially in complex software. The combination of unit testing, which drastically reduces defect injection, and CAF is therefore a very appealing prospect; however, it requires appropriate tools to realize its potential. In this paper, we present the first CAF-compatible framework for unit tests, developed as an extension to the Parallel Fortran Unit Test framework (pFUnit).

    Extracting UML Class Diagrams from Object-Oriented Fortran: ForUML

    Many scientists who implement computational science and engineering software have adopted the object-oriented (OO) Fortran paradigm. One of the challenges faced by OO Fortran developers is the inability to obtain high level software design descriptions of existing applications. Knowledge of the overall software design is not only valuable in the absence of documentation, it can also serve to assist developers with accomplishing different tasks during the software development process, especially maintenance and refactoring. The software engineering community commonly uses reverse engineering techniques to deal with this challenge. A number of reverse engineering-based tools have been proposed, but few of them can be applied to OO Fortran applications. In this paper, we propose a software tool to extract unified modeling language (UML) class diagrams from Fortran code. The UML class diagram facilitates the developers' ability to examine the entities and their relationships in the software system. The extracted diagrams enhance software maintenance and evolution. The experiments carried out to evaluate the proposed tool show its accuracy and a few of its limitations.

    Stereodivergent-at-Metal Synthesis of [60]Fullerene Hybrids

    Chiral fullerene–metal hybrids with complete control over the four stereogenic centers, including the absolute configuration of the metal atom, have been synthesized for the first time. The stereochemistry of the four chiral centers formed during [60]fullerene functionalization is the result of both the chiral catalysts employed and the diastereoselective addition of the metal complexes used (iridium, rhodium, or ruthenium). DFT calculations underpin the observed configurational stability at the metal center, which does not undergo an epimerization process.