1,016 research outputs found

    An efficient GPU version of the preconditioned GMRES method

    [EN] In a large number of scientific applications, the solution of sparse linear systems is the stage that concentrates most of the computational effort. This situation has motivated the study and development of several iterative solvers, among which preconditioned Krylov subspace methods occupy a place of privilege. In a previous effort, we developed a GPU-aware version of the GMRES method included in ILUPACK, a package of solvers distinguished by its inverse-based multilevel ILU preconditioner. In this work, we study the performance of our previous proposal and integrate several enhancements in order to mitigate its principal bottlenecks. The numerical evaluation shows that our novel proposal can reach important run-time reductions.
    Aliaga, JI.; Dufrechou, E.; Ezzatti, P.; Quintana-Orti, ES. (2019). An efficient GPU version of the preconditioned GMRES method. The Journal of Supercomputing. 75(3):1455-1469. https://doi.org/10.1007/s11227-018-2658-1

    Lanczos eigensolution method for high-performance computers

    The theory, computational analysis, and applications of a Lanczos algorithm on high-performance computers are presented. The computationally intensive steps of the algorithm are identified as the matrix factorization, the forward/backward equation solution, and the matrix-vector multiplies. These computational steps are optimized to exploit the vector and parallel capabilities of high-performance computers. The savings in computational time from applying optimization techniques such as variable-band and sparse data storage and access, loop unrolling, use of local memory, and compiler directives are presented. Two large-scale structural analysis applications are described: the buckling of a composite blade-stiffened panel with a cutout, and the vibration analysis of a high-speed civil transport. The sequential computational time of 181.6 seconds for the panel problem on a CONVEX computer was decreased to 14.1 seconds with the optimized vector algorithm. The best computational time for the transport problem, with 17,000 degrees of freedom, was 23 seconds on the Cray-YMP using an average of 3.63 processors.

    Software Support for Irregular and Loosely Synchronous Problems

    A large class of scientific and engineering applications may be classified as irregular and loosely synchronous from the perspective of parallel processing. We present a partial classification of such problems. This classification has motivated us to enhance Fortran D to provide language support for irregular, loosely synchronous problems. We present techniques for parallelizing such problems in the context of Fortran D.

    CSM Testbed Development and Large-Scale Structural Applications

    A research activity called Computational Structural Mechanics (CSM), conducted at the NASA Langley Research Center, is described. This activity is developing advanced structural analysis and computational methods that exploit high-performance computers. Methods are developed in the framework of the CSM Testbed software system and applied to representative complex structural analysis problems from the aerospace industry. An overview of the CSM Testbed methods development environment is presented, and some new numerical methods developed on a CRAY-2 are described. Selected application studies performed on the NAS CRAY-2 are also summarized.

    KSPHPDDM and PCHPDDM: Extending PETSc with advanced Krylov methods and robust multilevel overlapping Schwarz preconditioners

    [EN] Contemporary applications in computational science and engineering often require the solution of linear systems which may be of different sizes, shapes, and structures. The goal of this paper is to explain how two libraries, PETSc and HPDDM, have been interfaced in order to offer end-users robust overlapping Schwarz preconditioners and advanced Krylov methods featuring recycling and the ability to deal with multiple right-hand sides. The flexibility of the implementation is showcased and explained with minimalist, easy-to-run, and reproducible examples, to ease the integration of these algorithms into more advanced frameworks. The examples provided cover applications from eigenanalysis, elasticity, combustion, and electromagnetism.
    Jose E. Roman was supported by the Spanish Agencia Estatal de Investigacion (AEI) under project SLEPc-DA (PID2019-107379RB-I00).
    Jolivet, P.; Roman, JE.; Zampini, S. (2021). KSPHPDDM and PCHPDDM: Extending PETSc with advanced Krylov methods and robust multilevel overlapping Schwarz preconditioners. Computers & Mathematics with Applications. 84:277-295. https://doi.org/10.1016/j.camwa.2021.01.003