14 research outputs found

    A parallel Schur method for solving continuous-time algebraic Riccati equations

    Numerical algorithms for solving the continuous-time algebraic Riccati matrix equation on a distributed memory parallel computer are considered. In particular, it is shown that the Schur method, based on computing the stable invariant subspace of a Hamiltonian matrix, can be parallelized in an efficient and scalable way. Our implementation employs the state-of-the-art library ScaLAPACK as well as recently developed parallel methods for reordering the eigenvalues in a real Schur form. Some experimental results are presented, confirming the scalability of our implementation and comparing it with an existing implementation of the matrix sign iteration from the PLiCOC library.
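
The Schur method summarized above can be illustrated serially. The following is a minimal sketch, assuming SciPy's dense `schur` routine in place of the parallel ScaLAPACK-based implementation the abstract describes; the function name `care_schur` and the small double-integrator test problem are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.linalg import schur, solve, solve_continuous_are


def care_schur(A, B, Q, R):
    """Solve A^T X + X A - X B R^{-1} B^T X + Q = 0 by the Schur method.

    Serial, dense sketch (hypothetical helper, not the paper's code).
    """
    n = A.shape[0]
    G = B @ solve(R, B.T)                      # G = B R^{-1} B^T
    H = np.block([[A, -G], [-Q, -A.T]])        # 2n x 2n Hamiltonian matrix
    # Real Schur form with the stable (left-half-plane) eigenvalues
    # reordered to the leading block.
    T, U, sdim = schur(H, output="real", sort="lhp")
    if sdim != n:
        raise ValueError("Hamiltonian matrix lacks an n-dimensional stable subspace")
    U11, U21 = U[:n, :n], U[n:, :n]            # basis of the stable invariant subspace
    X = solve(U11.T, U21.T).T                  # X = U21 U11^{-1}
    return 0.5 * (X + X.T)                     # symmetrize against rounding errors


# Double integrator with unit weights (illustrative data).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

X = care_schur(A, B, Q, R)
residual = A.T @ X + X @ A - X @ B @ solve(R, B.T) @ X + Q
print("residual norm:", np.linalg.norm(residual))
```

The reordering step is the part that the abstract's parallel eigenvalue-reordering methods target; here it is hidden inside the `sort="lhp"` argument.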

    Parallel eigenvalue reordering in real Schur forms

    A parallel algorithm for reordering the eigenvalues in the real Schur form of a matrix is presented and discussed. Our novel approach adopts computational windows and delays multiple outside-window updates until each window has been completely reordered locally. By using multiple concurrent windows, the parallel algorithm achieves a high level of concurrency, and most of the work consists of level 3 BLAS operations. The presented algorithm is also extended to the generalized real Schur form. Experimental results for ScaLAPACK-style Fortran 77 implementations on a Linux cluster confirm the efficiency and scalability of our algorithms, with a parallel speedup of more than 16 on 64 processors for large-scale problems. Even on a single processor, our implementation is demonstrated to perform significantly better than the state-of-the-art serial implementation. Copyright (C) 2009 John Wiley & Sons, Ltd.

    Parallel computation of 3-D soil-structure interaction in time domain with a coupled FEM/SBFEM approach

    The final publication is available at Springer via http://dx.doi.org/10.1007/s10915-011-9551-x
    This paper introduces a parallel algorithm for the scaled boundary finite element method (SBFEM). The application code is designed to run on clusters of computers, and it enables the analysis of large-scale soil-structure-interaction problems, where an unbounded domain has to fulfill the radiation condition for wave propagation to infinity. The main focus of the paper is on the mathematical description and numerical implementation of the SBFEM. In particular, we describe in detail the algorithm to compute the acceleration unit impulse response matrices used in the SBFEM as well as the solvers for the Riccati and Lyapunov equations. Finally, two test cases validate the new code, illustrating the numerical accuracy of the results and the parallel performance. © Springer Science+Business Media, LLC 2011.
    Jose E. Roman and Enrique S. Quintana-Orti were partially supported by the Spanish Ministerio de Ciencia e Innovacion under grants TIN2009-07519 and TIN2008-06570-C04-01, respectively.
    Schauer, M.; Román Moltó, JE.; Quintana Orti, ES.; Langer, S. (2012). Parallel computation of 3-D soil-structure interaction in time domain with a coupled FEM/SBFEM approach. Journal of Scientific Computing. 52(2):446-467. doi:10.1007/s10915-011-9551-x

    A Novel Parallel QR Algorithm For Hybrid Distributed Memory HPC Systems

    A novel variant of the parallel QR algorithm for solving dense nonsymmetric eigenvalue problems on hybrid distributed high performance computing systems is presented. For this purpose, we introduce the concept of multiwindow bulge chain chasing and parallelize aggressive early deflation. The multiwindow approach ensures that most computations when chasing chains of bulges are performed in level 3 BLAS operations, while the aim of aggressive early deflation is to speed up the convergence of the QR algorithm. Mixed MPI-OpenMP coding techniques are utilized for porting the codes to distributed memory platforms with multithreaded nodes, such as multicore processors. Numerous numerical experiments confirm the superior performance of our parallel QR algorithm in comparison with the existing ScaLAPACK code, leading to an implementation that is one to two orders of magnitude faster for sufficiently large problems, including a number of examples from applications.
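
For readers unfamiliar with the underlying iteration, the sketch below shows the textbook unshifted QR iteration on a small nonsymmetric matrix. It is deliberately not the multiwindow bulge-chasing variant with aggressive early deflation described in the abstract; it omits Hessenberg reduction, shifts, and deflation entirely. The function name `qr_iteration` and the test matrix (built with known real eigenvalues 4, 3, 2, 1) are assumptions made for illustration.

```python
import numpy as np


def qr_iteration(A, iters=300):
    """Textbook unshifted QR iteration (illustrative, not production-grade)."""
    Ak = np.array(A, dtype=float)
    for _ in range(iters):
        Qk, Rk = np.linalg.qr(Ak)
        Ak = Rk @ Qk          # similarity transform: Qk^T Ak Qk
    return Ak


# Nonsymmetric test matrix with known, distinct real eigenvalues.
rng = np.random.default_rng(0)
S = np.eye(4) + 0.1 * rng.uniform(-1.0, 1.0, size=(4, 4))  # well-conditioned basis
A = S @ np.diag([4.0, 3.0, 2.0, 1.0]) @ np.linalg.inv(S)

T = qr_iteration(A)
eigs = np.sort(np.diag(T))    # diagonal of the (nearly) triangular limit
print(eigs)
```

The iterates converge to an upper triangular matrix whose diagonal holds the eigenvalues. Practical implementations first reduce to Hessenberg form and use shifts, which is where bulge chasing enters.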

    TR-2012001: Algebraic Algorithms


    Dense and sparse parallel linear algebra algorithms on graphics processing units

    One line of development followed in the field of supercomputing is the use of special-purpose processors to speed up certain types of computations. In this thesis we study the use of graphics processing units as computing accelerators and apply it to the field of linear algebra. In particular, we work with the SLEPc library to solve large-scale eigenvalue problems, and to apply matrix functions in scientific applications. SLEPc is a parallel library based on the MPI standard and is developed with the premise of being scalable, i.e. of allowing larger problems to be solved by increasing the number of processing units. We address the linear eigenvalue problem, Ax = λx in its standard form, using iterative techniques, in particular Krylov methods, with which we compute a small portion of the eigenvalue spectrum. This type of algorithm is based on generating a subspace of reduced size (m) onto which the large problem (of size n) is projected, with m << n. Once the problem has been projected, it is solved by direct methods, which provide approximations to the eigenvalues of the original problem. The operations used in the expansion of the subspace vary depending on whether the desired eigenvalues lie in the exterior or in the interior of the spectrum. For exterior eigenvalues, the expansion is done by matrix-vector multiplications. We perform this operation on the GPU, either by using libraries or by writing functions that take advantage of the structure of the matrix. For interior eigenvalues, the expansion requires solving linear systems of equations. In this thesis we implement several algorithms, which run on the GPU, to solve linear systems of equations for the specific case of matrices with a block-tridiagonal structure. In the computation of matrix functions we have to distinguish between the direct application of a matrix function, f(A), and the action of a matrix function on a vector, f(A)b. The first case involves a dense computation that limits the size of the problem. The second allows us to work with large sparse matrices, and to solve it we also make use of Krylov methods. The expansion of the subspace is done by matrix-vector multiplications, and we use GPUs in the same way as when computing eigenvalues. In this case the projected problem starts with size m, but grows by m on each restart of the method. The projected problem is solved by directly applying a matrix function. We have implemented several algorithms to compute the matrix square root and the matrix exponential, in which the use of GPUs allows us to speed up the computation.
    Lamas Daviña, A. (2018). Dense and sparse parallel linear algebra algorithms on graphics processing units [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/112425
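
The Krylov approach to f(A)b outlined in the abstract above can be sketched on the CPU with NumPy and SciPy. This is a minimal, unrestarted Arnoldi sketch under stated assumptions: it does not use SLEPc or a GPU, the helper names `arnoldi` and `expm_action` are hypothetical, and the test matrix (a 1-D discrete Laplacian) is chosen only for illustration. It projects A onto a subspace of size m << n and applies the dense matrix exponential to the small projected matrix.

```python
import numpy as np
from scipy.linalg import expm


def arnoldi(A, b, m):
    """Build an orthonormal Krylov basis V and the projected matrix H = V^T A V."""
    n = b.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(b)
    V[:, 0] = b / beta
    for j in range(m):
        w = A @ V[:, j]                       # subspace expansion: matrix-vector product
        for i in range(j + 1):                # modified Gram-Schmidt orthogonalization
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:               # invariant subspace found: stop early
            return V[:, : j + 1], H[: j + 1, : j + 1], beta
        V[:, j + 1] = w / H[j + 1, j]
    return V[:, :m], H[:m, :m], beta


def expm_action(A, b, m=30):
    """Approximate exp(A) b as beta * V exp(H) e1 with a size-m projected problem."""
    V, H, beta = arnoldi(A, b, m)
    e1 = np.zeros(H.shape[0])
    e1[0] = 1.0
    return beta * (V @ (expm(H) @ e1))        # dense f(H) on the small matrix only


# Illustrative problem: negative 1-D Laplacian stencil, n = 100, m = 30.
n = 100
A = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
b = np.linspace(1.0, 2.0, n)

y = expm_action(A, b, m=30)
print("error:", np.linalg.norm(y - expm(A) @ b))
```

The only operation involving the large matrix is `A @ V[:, j]`, which is the step the thesis offloads to the GPU; the dense function evaluation acts on the small m-by-m matrix.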