12 research outputs found

    Architecture-Aware Algorithms for Scalable Performance and Resilience on Heterogeneous Architectures

    The goal of the Extreme-scale Algorithms & Software Institute (EASI) is to close the "application-architecture performance gap" by exploring algorithms and runtime improvements that will enable key science applications to better exploit the architectural features of DOE extreme-scale systems. Over the past year of the project, our efforts at the University of Tennessee have concentrated on, and made significant progress toward, the following high-level EASI goals: (1) develop multi-precision and architecture-aware implementations of Krylov, Poisson, and Helmholtz solvers and of dense factorizations for heterogeneous multi-core systems (a short sketch of the multi-precision idea follows this abstract); (2) explore new methods of algorithm resilience and develop new algorithms with these capabilities; (3) develop runtime support for adaptable algorithms dealing with resilience and scalability; (4) distribute the new algorithms and runtime support through widely used software packages; and (5) establish a strong outreach program to disseminate results, interact with colleagues, and train students and junior members of our community.
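    One technique commonly used for the multi-precision goal above is mixed-precision iterative refinement: factor the matrix once in fast low precision, then recover full accuracy with cheap residual corrections carried out in higher precision. The sketch below illustrates only that general pattern; it is a minimal NumPy/SciPy example under assumed names (mixed_precision_solve, the test matrix, the tolerances), not the institute's actual code.

        # Minimal sketch of mixed-precision iterative refinement (illustrative,
        # not the EASI implementation): the O(n^3) LU factorization is done in
        # float32, while cheap residual corrections restore float64 accuracy.
        import numpy as np
        from scipy.linalg import lu_factor, lu_solve

        def mixed_precision_solve(A, b, max_iters=10, tol=1e-12):
            """Solve A x = b with a single-precision LU and double-precision refinement."""
            lu, piv = lu_factor(A.astype(np.float32))           # expensive step, low precision
            x = lu_solve((lu, piv), b.astype(np.float32)).astype(np.float64)
            for _ in range(max_iters):
                r = b - A @ x                                   # residual in float64
                if np.linalg.norm(r) <= tol * np.linalg.norm(b):
                    break
                d = lu_solve((lu, piv), r.astype(np.float32)).astype(np.float64)
                x += d                                          # apply the correction
            return x

        rng = np.random.default_rng(0)
        n = 500
        A = rng.standard_normal((n, n)) + n * np.eye(n)         # well-conditioned test matrix
        b = rng.standard_normal(n)
        x = mixed_precision_solve(A, b)
        print(np.linalg.norm(A @ x - b) / np.linalg.norm(b))    # small relative residual at float64 accuracy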

    A class of communication-avoiding algorithms for solving general dense linear systems on CPU/GPU parallel machines

    We study several solvers for general dense linear systems, with the main objective of reducing the communication overhead due to pivoting. We first describe two existing algorithms for LU factorization on hybrid CPU/GPU architectures: the first is based on partial pivoting, and the second uses random preconditioning of the original matrix to avoid pivoting altogether. We then introduce a solver in which the panel factorization is performed using a communication-avoiding pivoting heuristic while the update of the trailing submatrix is performed by the GPU. We provide performance comparisons of these solvers on current hybrid multicore-GPU parallel machines.
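    To make the structure of such a hybrid solver concrete, the sketch below shows a blocked right-looking LU factorization: the narrow panel factorization is the CPU-side work, and the large trailing-submatrix GEMM is the part a GPU would handle. It is only an illustrative NumPy example; plain partial pivoting stands in for the paper's communication-avoiding pivoting heuristic, and the function name blocked_lu and the block size are assumptions.

        # Illustrative blocked right-looking LU with row pivoting (P A = L U).
        # In a hybrid CPU/GPU scheme, the panel loop stays on the CPU while the
        # trailing-submatrix update (the GEMM below) is offloaded to the GPU.
        import numpy as np

        def blocked_lu(A, nb=64):
            """Return (LU, piv): L and U packed in one array, piv the row permutation."""
            A = A.astype(np.float64).copy()
            n = A.shape[0]
            piv = np.arange(n)
            for k in range(0, n, nb):
                kb = min(nb, n - k)
                # --- panel factorization (CPU): unblocked LU on columns k .. k+kb-1 ---
                for j in range(k, k + kb):
                    p = j + np.argmax(np.abs(A[j:, j]))         # partial-pivot row
                    if p != j:
                        A[[j, p], :] = A[[p, j], :]             # swap full rows
                        piv[[j, p]] = piv[[p, j]]
                    A[j + 1:, j] /= A[j, j]                     # column of L
                    A[j + 1:, j + 1:k + kb] -= np.outer(A[j + 1:, j], A[j, j + 1:k + kb])
                if k + kb < n:
                    # --- block row of U: solve L11 * U12 = A12 ---
                    L11 = np.tril(A[k:k + kb, k:k + kb], -1) + np.eye(kb)
                    A[k:k + kb, k + kb:] = np.linalg.solve(L11, A[k:k + kb, k + kb:])
                    # --- trailing-submatrix update: the GEMM a GPU would perform ---
                    A[k + kb:, k + kb:] -= A[k + kb:, k:k + kb] @ A[k:k + kb, k + kb:]
            return A, piv

        rng = np.random.default_rng(1)
        M = rng.standard_normal((300, 300))
        LU, piv = blocked_lu(M, nb=32)
        L = np.tril(LU, -1) + np.eye(300)
        U = np.triu(LU)
        print(np.allclose(L @ U, M[piv]))                       # True: P M == L U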

    Dynamically Balanced Synchronization-Avoiding LU Factorization with Multicore and GPUs

    Collective Mind: Towards Practical and Collaborative Auto-Tuning
