37 research outputs found

    Parallel Krylov Solvers for the Polynomial Eigenvalue Problem in SLEPc

    Polynomial eigenvalue problems are often found in scientific computing applications. When the coefficient matrices of the polynomial are large and sparse, usually only a few eigenpairs are required and projection methods are the best choice. We focus on Krylov methods that operate on the companion linearization of the polynomial but exploit the block structure with the aim of being memory-efficient in the representation of the Krylov subspace basis. The problem may appear in the form of a low-degree polynomial (quartic or quintic, say) expressed in the monomial basis, or a high-degree polynomial (coming from interpolation of a nonlinear eigenproblem) expressed in a nonmonomial basis. We have implemented a parallel solver in SLEPc covering both cases that is able to compute exterior as well as interior eigenvalues via spectral transformation. We discuss important issues such as scaling and restart, and illustrate the robustness and performance of the solver with some numerical experiments.

    The first author was supported by the Spanish Ministry of Education, Culture and Sport through an FPU grant with reference AP2012-0608.

    Campos, C.; Román Moltó, JE. (2016). Parallel Krylov Solvers for the Polynomial Eigenvalue Problem in SLEPc. SIAM Journal on Scientific Computing. 38(5):385-411. https://doi.org/10.1137/15M1022458
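
    As a rough illustration of the use case described above, the following minimal sketch (not the paper's code) sets up a quadratic eigenproblem through SLEPc's Python interface (slepc4py) and asks for a few interior eigenvalues near a target via shift-and-invert. The placeholder matrices K, C, M, the target value, and the number of requested eigenpairs are arbitrary illustrative choices.

        # Minimal sketch: solve (lambda^2*M + lambda*C + K) x = 0 with SLEPc's PEP solver.
        from petsc4py import PETSc
        from slepc4py import SLEPc

        n = 100
        K = PETSc.Mat().createAIJ([n, n]); K.setUp()
        C = PETSc.Mat().createAIJ([n, n]); C.setUp()
        M = PETSc.Mat().createAIJ([n, n]); M.setUp()
        rstart, rend = K.getOwnershipRange()
        for i in range(rstart, rend):              # each process fills only its own rows
            K.setValue(i, i, 2.0)
            if i > 0: K.setValue(i, i - 1, -1.0)
            if i < n - 1: K.setValue(i, i + 1, -1.0)
            C.setValue(i, i, 0.1)
            M.setValue(i, i, 1.0)
        for A in (K, C, M):
            A.assemble()

        pep = SLEPc.PEP().create()
        pep.setOperators([K, C, M])                # coefficients ordered by increasing degree
        pep.setDimensions(nev=6)                   # only a few eigenpairs are needed
        pep.setTarget(1.0)                         # interior eigenvalues near the target...
        pep.setWhichEigenpairs(SLEPc.PEP.Which.TARGET_MAGNITUDE)
        pep.getST().setType(SLEPc.ST.Type.SINVERT) # ...via spectral transformation
        pep.setFromOptions()
        pep.solve()

        for i in range(pep.getConverged()):
            print(i, pep.getEigenpair(i))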

    Strategies for spectrum slicing based on restarted Lanczos methods

    In the context of symmetric-definite generalized eigenvalue problems, it is often required to compute all eigenvalues contained in a prescribed interval. For large-scale problems, the method of choice is the so-called spectrum slicing technique: a shift-and-invert Lanczos method combined with a dynamic shift selection that sweeps the interval in a smart way. This kind of strategy was proposed initially in the context of unrestarted Lanczos methods, back in the 1990s. We propose variations that try to incorporate recent developments in the field of Krylov methods, including thick restarting in the Lanczos solver and a rational Krylov update when moving from one shift to the next. We discuss a parallel implementation in the SLEPc library and provide performance results. © 2012 Springer Science+Business Media, LLC.

    This work was supported by the Spanish Ministerio de Ciencia e Innovación under grant TIN2009-07519.

    Campos González, MC.; Román Moltó, JE. (2012). Strategies for spectrum slicing based on restarted Lanczos methods. Numerical Algorithms. 60(2):279-295. https://doi.org/10.1007/s11075-012-9564-z
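
    The sketch below (not the paper's code) shows how an all-eigenvalues-in-an-interval computation of this kind is typically requested through SLEPc's Python interface (slepc4py). The matrices, the interval, and the Cholesky choice are placeholder assumptions; in parallel runs the shift-and-invert factorization normally relies on an external direct solver such as MUMPS so that inertia counts are available.

        # Minimal sketch: all eigenvalues of a symmetric-definite pencil (A, B) in an interval.
        from petsc4py import PETSc
        from slepc4py import SLEPc

        n = 200
        A = PETSc.Mat().createAIJ([n, n]); A.setUp()
        B = PETSc.Mat().createAIJ([n, n]); B.setUp()
        rstart, rend = A.getOwnershipRange()
        for i in range(rstart, rend):
            A.setValue(i, i, 2.0)
            if i > 0: A.setValue(i, i - 1, -1.0)
            if i < n - 1: A.setValue(i, i + 1, -1.0)
            B.setValue(i, i, 1.0)                      # B = identity (positive definite)
        A.assemble(); B.assemble()

        eps = SLEPc.EPS().create()
        eps.setOperators(A, B)
        eps.setProblemType(SLEPc.EPS.ProblemType.GHEP)
        eps.setWhichEigenpairs(SLEPc.EPS.Which.ALL)    # all eigenvalues in the interval
        eps.setInterval(0.5, 1.5)
        st = eps.getST()
        st.setType(SLEPc.ST.Type.SINVERT)              # shift-and-invert spectral transformation
        ksp = st.getKSP(); ksp.setType(PETSc.KSP.Type.PREONLY)
        ksp.getPC().setType(PETSc.PC.Type.CHOLESKY)    # factorization provides inertia counts
        eps.setFromOptions()
        eps.solve()

        print([eps.getEigenvalue(i).real for i in range(eps.getConverged())])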

    Optimized analysis of isotropic high-nuclearity spin clusters with GPU acceleration

    This is the author’s version of a work that was accepted for publication in Computer Physics Communications. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Computer Physics Communications, vol. 209 (2016). DOI 10.1016/j.cpc.2016.08.014.

    The numerical simulation of molecular clusters formed by a finite number of exchange-coupled paramagnetic centers is very relevant for many applications, modeling systems between molecules and extended solids. In the context of realistic scenarios, many centers need to be considered, and thus the required computational effort grows very fast. In a previous work (Ramos et al., 2010), a set of parallel programs was presented with standard message-passing parallelization (MPI) for both anisotropic and isotropic systems. In this work, we have further developed the code for isotropic models. On the one hand, the computational cost has been significantly reduced by avoiding some of the matrix diagonalizations, corresponding to blocks with negligible contribution for the particular configuration. On the other hand, we have extended the parallelization in order to exploit available graphics processing units (GPUs). The new MPI-GPU paradigm reduces the computational time by at least one additional order of magnitude and enables the resolution of larger problems. © 2016 Elsevier B.V. All rights reserved.

    This work was partially supported by the Spanish Ministry of Economy and Competitiveness under grant TIN2013-41049-P. Alejandro Lamas Daviña was supported by the Spanish Ministry of Education, Culture and Sports through a grant with reference FPU13-06655.

    Lamas Daviña, A.; Ramos Peinado, E.; Román Moltó, JE. (2016). Optimized analysis of isotropic high-nuclearity spin clusters with GPU acceleration. Computer Physics Communications. 209:70-78. https://doi.org/10.1016/j.cpc.2016.08.014
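
    The following schematic sketch (not the authors' code) illustrates the two ideas mentioned in the abstract: skipping blocks whose contribution is negligible and distributing the remaining independent block diagonalizations over MPI processes with mpi4py. The build_block() and weight() functions, the threshold, and the block sizes are hypothetical placeholders, and GPU offloading is omitted.

        # Schematic sketch: distribute independent block diagonalizations over MPI ranks,
        # skipping blocks assumed to have negligible contribution.
        import numpy as np
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        def build_block(k):
            # placeholder for the k-th total-spin block of the Hamiltonian
            rng = np.random.default_rng(k)
            m = 50 + k
            H = rng.standard_normal((m, m))
            return (H + H.T) / 2

        def weight(k):
            # placeholder estimate of how much block k contributes
            return 1.0 / (1 + k)

        nblocks, threshold = 64, 1e-3
        local_spectra = []
        for k in range(rank, nblocks, size):       # round-robin distribution of blocks
            if weight(k) < threshold:              # skip negligible blocks
                continue
            local_spectra.append(np.linalg.eigh(build_block(k))[0])

        all_spectra = comm.gather(local_spectra, root=0)
        if rank == 0:
            print(sum(len(s) for s in all_spectra), "blocks diagonalized")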

    Design and implementation of Java bindings in Open MPI

    This paper describes the Java MPI bindings that have been included in the Open MPI distribution. Open MPI is one of the most popular implementations of MPI, the Message-Passing Interface, which is the predominant programming paradigm for parallel applications on distributed memory computers. We have added Java support to Open MPI, exposing MPI functionality to Java programmers. Our approach is based on the Java Native Interface, and has similarities with previous efforts, as well as important differences. This paper serves as a reference for the application program interface, and in addition we provide details of the internal implementation to justify some of the design decisions. We also show some results to assess the performance of the bindings. © 2016 Elsevier B.V. All rights reserved.

    We are indebted to Siegmar Gross for his exhaustive testing of the Java bindings. We also thank Ralph Castain for helping in the integration of the Java bindings in the Open MPI infrastructure. The NPB-MPJ benchmarks used in Section 5 were kindly provided by Guillermo López Taboada. The first two authors were supported by the Spanish Ministry of Economy and Competitiveness under project number TIN2013-41049-P.

    Vega Gisbert, O.; Román Moltó, JE.; Squyres, JM. (2016). Design and implementation of Java bindings in Open MPI. Parallel Computing. 59:1-20. https://doi.org/10.1016/j.parco.2016.08.004

    A Parallel Structured Divide-and-Conquer Algorithm for Symmetric Tridiagonal Eigenvalue Problems

    © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

    In this article, a parallel structured divide-and-conquer (PSDC) eigensolver is proposed for symmetric tridiagonal matrices based on ScaLAPACK and a parallel structured matrix multiplication algorithm, called PSMMA. Computing the eigenvectors via matrix-matrix multiplications is the most computationally expensive part of the divide-and-conquer algorithm, and one of the matrices involved in such multiplications is a rank-structured Cauchy-like matrix. By exploiting this particular property, PSMMA constructs the local matrices by using generators of Cauchy-like matrices without any communication, and further reduces the computation costs by using a structured low-rank approximation algorithm. Thus, both the communication and computation costs are reduced. Experimental results show that both PSMMA and PSDC are highly scalable and scale to 4096 processes at least. PSDC has better scalability than PHDC, which was proposed in [16] and only scaled to 300 processes for the same matrices. Compared with PDSTEDC in ScaLAPACK, PSDC is always faster and achieves 1.4x-1.6x speedup for some matrices with few deflations. PSDC is also comparable with ELPA, with PSDC being faster than ELPA when using few processes and a little slower when using many processes.

    The authors would like to thank the referees for their valuable comments, which greatly improved the presentation of this article. This work was supported by the National Natural Science Foundation of China (No. NNW2019ZT6-B20, NNW2019ZT6B21, NNW2019ZT5-A10, U1611261, 61872392, and U1811461), the National Key R&D Program of China (2018YFB0204303), NSF of Hunan (No. 2019JJ40339), NSF of NUDT (No. ZK18-03-01), Guangdong Natural Science Foundation (2018B030312002), and the Program for Guangdong Introducing Innovative and Entrepreneurial Teams under Grant 2016ZT06D211. The work of Jose E. Roman was supported by the Spanish Agencia Estatal de Investigación (AEI) under project SLEPc-DA (PID2019-107379RB-I00).

    Liao, X.; Li, S.; Lu, Y.; Román Moltó, JE. (2021). A Parallel Structured Divide-and-Conquer Algorithm for Symmetric Tridiagonal Eigenvalue Problems. IEEE Transactions on Parallel and Distributed Systems. 32(2):367-378. https://doi.org/10.1109/TPDS.2020.3019471
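
    The communication-free construction mentioned above relies on the fact that a Cauchy-like matrix is determined entirely by short generator vectors, so each process can build its local block directly from replicated generators. The NumPy sketch below illustrates only that idea; the function name cauchy_like_block and all data are illustrative placeholders, not the paper's code.

        # Sketch: build a local block of a Cauchy-like matrix from its generators only.
        import numpy as np

        def cauchy_like_block(u, v, d, lam, rows, cols):
            # local block with entries u[i] * v[j] / (d[i] - lam[j])
            i = np.asarray(rows)[:, None]
            j = np.asarray(cols)[None, :]
            return (u[i] * v[j]) / (d[i] - lam[j])

        n = 8
        rng = np.random.default_rng(0)
        u, v = rng.standard_normal(n), rng.standard_normal(n)
        d = np.arange(n, dtype=float)              # chosen distinct from lam below
        lam = np.arange(n, dtype=float) + 0.5
        # e.g. the block owned by a process holding rows 0..3 and columns 4..7
        local = cauchy_like_block(u, v, d, lam, range(0, 4), range(4, 8))
        print(local.shape)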

    Control rod drop transient analysis with the coupled parallel code pCTF-PARCSv2.7

    This is the author’s version of a work that was accepted for publication in Annals of Nuclear Energy. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Annals of Nuclear Energy, vol. 87 (2016). DOI 10.1016/j.anucene.2015.09.016.

    In order to reduce the response time when simulating large reactors in detail, a parallel version of the thermal–hydraulic subchannel code COBRA-TF (CTF) has been developed using the standard Message Passing Interface (MPI). The parallelization is oriented to reactor cells, so it is best suited for models consisting of many cells. The generation of the Jacobian matrix is parallelized, in such a way that each processor is in charge of generating the data associated with a subset of cells. Also, the solution of the linear system of equations is done in parallel, using the PETSc toolkit. With the goal of creating a powerful tool to simulate the reactor core behavior during asymmetrical transients, the 3D neutron diffusion code PARCSv2.7 (PARCS) has been coupled with the parallel version of CTF (pCTF) using the Parallel Virtual Machine (PVM) technology. In order to validate the correctness of the parallel coupled code, a control rod drop transient has been simulated, comparing the results with the real experimental measurements acquired during an NPP real test. © 2015 Elsevier Ltd. All rights reserved.

    This work has been partially supported by the Universitat Politècnica de València under Projects COBRA_PAR (PAID-05-11-2810) and OpenNUC (PAID-05-12), and by the Spanish Ministerio de Economía y Competitividad under Projects SLEPc-PFE (TIN2013-41049-P) and NUC-MULTPHYS (ENE2012-34585). The authors would like to acknowledge the technical support provided by CNAT and IBERDROLA GENERACION S.A. for the realization of this work.

    Ramos Peinado, E.; Román Moltó, JE.; Abarca Giménez, A.; Miró Herrero, R.; Bermejo, JA. (2016). Control rod drop transient analysis with the coupled parallel code pCTF-PARCSv2.7. Annals of Nuclear Energy. 87(2):308-317. https://doi.org/10.1016/j.anucene.2015.09.016
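
    The cell-wise parallelization described above, with each process assembling the Jacobian rows of its own subset of cells and PETSc solving the resulting linear system, can be illustrated by the petsc4py sketch below. This is not CTF code; the stencil, the number of cells, and the right-hand side are placeholder values.

        # Illustrative sketch: distributed Jacobian assembly and parallel linear solve with PETSc.
        from petsc4py import PETSc

        ncells = 1000
        J = PETSc.Mat().createAIJ([ncells, ncells]); J.setUp()
        b = PETSc.Vec().createMPI(ncells)

        rstart, rend = J.getOwnershipRange()
        for i in range(rstart, rend):              # only the locally owned cells
            J.setValue(i, i, 4.0)
            if i > 0: J.setValue(i, i - 1, -1.0)
            if i < ncells - 1: J.setValue(i, i + 1, -1.0)
            b.setValue(i, 1.0)
        J.assemble(); b.assemble()

        x = b.duplicate()
        ksp = PETSc.KSP().create()
        ksp.setOperators(J)
        ksp.setFromOptions()                       # solver and preconditioner chosen at run time
        ksp.solve(b, x)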

    On low-frequency variability of the midlatitude ocean gyres

    This paper studies the large-scale low-frequency variability of the wind-driven midlatitude ocean gyres and their western boundary currents, such as the Gulf Stream or Kuroshio, simulated with the eddy-resolving quasi-geostrophic model. We applied empirical orthogonal function analysis to turbulent flow solutions and statistically extracted robust and significant large-scale decadal variability modes concentrated around the eastward jet extension of the western boundary currents. In order to interpret these statistical modes dynamically, we linearized the governing quasi-geostrophic equations around the time-mean circulation and solved for the corresponding full set of linear eigenmodes with their eigenfrequencies. We then projected the extracted decadal variability on the eigenmodes and found that this variability is a multimodal coherent pattern phenomenon rather than a single mode or a combination of several modes as in the flow regimes preceding developed turbulence.

    The first two authors are thankful to the Natural Environment Research Council for the support of this work through the grant NE/J006602/1 and the use of ARCHER (the UK National Supercomputing Service). We express our gratitude to S. Burbidge and M. Harvey for their help with the Imperial College London cluster, as well as to A. Thomas for his help with managing and maintaining the data storage. The last two authors were supported by the Spanish Ministry of Economy and Competitiveness under grant TIN2013-41049-P, and this support is gratefully acknowledged. We would also like to thank the anonymous referees for valuable comments and suggestions, which helped us to improve the paper.

    Shevchenko, IV.; Berloff, PS.; Guerrero López, D.; Román Moltó, JE. (2016). On low-frequency variability of the midlatitude ocean gyres. Journal of Fluid Mechanics. 795:423-442. https://doi.org/10.1017/jfm.2016.208
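
    The statistical procedure outlined above, an EOF analysis of the turbulent solution followed by projection onto linear eigenmodes, can be sketched with NumPy as follows. All arrays are random placeholders standing in for the quasi-geostrophic model output and its eigenmodes; this is not the authors' analysis code.

        # Minimal sketch: EOFs via SVD of the anomaly field, then projection onto eigenmodes.
        import numpy as np

        rng = np.random.default_rng(1)
        nt, nx = 500, 2000                         # time snapshots x grid points
        field = rng.standard_normal((nt, nx))
        anom = field - field.mean(axis=0)          # remove the time mean

        # EOFs are the right singular vectors of the anomaly matrix
        _, s, eofs = np.linalg.svd(anom, full_matrices=False)
        variance_frac = s**2 / np.sum(s**2)

        # project the leading EOF onto (placeholder) orthonormal eigenmodes to see
        # how many modes participate in the low-frequency pattern
        nmodes = 50
        modes = np.linalg.qr(rng.standard_normal((nx, nmodes)))[0]
        coeffs = modes.T @ eofs[0]
        print(variance_frac[:3], np.abs(coeffs).max())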