Dense and sparse parallel linear algebra algorithms on graphics processing units
One line of development followed in the field of supercomputing is the use of special-purpose processors to speed up certain types of computation. In this thesis we study the use of graphics processing units (GPUs) as computational accelerators and apply them to the field of linear algebra. In particular, we work with the SLEPc library to solve large-scale eigenvalue problems and to apply matrix functions in scientific applications. SLEPc is a parallel library based on the MPI standard, developed with the premise of being scalable, i.e., of allowing larger problems to be solved by increasing the number of processing units.
We address the linear eigenvalue problem, Ax = lambda x in its standard form, using iterative techniques, in particular Krylov methods, with which we compute a small portion of the eigenvalue spectrum. This type of algorithm generates a subspace of reduced size m onto which the problem of large dimension n is projected, with m << n. Once the problem has been projected, it is solved by direct methods, which provide approximations to the eigenvalues of the original problem. The operations used to expand the subspace vary depending on whether the desired eigenvalues lie in the exterior or in the interior of the spectrum. For exterior eigenvalues, the expansion is done with matrix-vector multiplications. We perform this operation on the GPU, either by using libraries or by creating functions that exploit the structure of the matrix. For interior eigenvalues, the expansion requires solving linear systems of equations. In this thesis we implement several algorithms, executed on GPU, to solve linear systems of equations for the specific case of matrices with block-tridiagonal structure.
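The projection scheme described in the abstract can be sketched in a few lines of NumPy. The following minimal Arnoldi iteration (an illustrative sketch, not the SLEPc implementation) expands the subspace with one matrix-vector product per step and approximates exterior eigenvalues of a sparse test matrix through the small projected problem:

```python
import numpy as np
from scipy.sparse import diags
from scipy.linalg import eig

def arnoldi(A, b, m):
    """Build an m-dimensional Krylov basis V and projected matrix H (Arnoldi)."""
    n = A.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(m):
        w = A @ V[:, j]                      # subspace expansion: one mat-vec per step
        for i in range(j + 1):               # modified Gram-Schmidt orthogonalization
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V[:, :m], H[:m, :m]

# large sparse test matrix (1D Laplacian), small projected problem: m << n
n, m = 1000, 40
A = diags([-1, 2, -1], [-1, 0, 1], shape=(n, n))
V, H = arnoldi(A, np.random.default_rng(0).standard_normal(n), m)
ritz = np.sort(eig(H)[0].real)[::-1]         # Ritz values approximate exterior eigenvalues
exact = 2 - 2 * np.cos(np.pi * np.arange(n, 0, -1) / (n + 1))
```

The projected matrix H is only m-by-m, so solving it with a dense (direct) eigensolver is cheap; its exterior Ritz values converge to the exterior eigenvalues of A.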
In the computation of matrix functions we must distinguish between the direct application of a matrix function, f(A), and the action of a matrix function on a vector, f(A)b. The first case involves a dense computation that limits the size of the problem. The second allows us to work with large sparse matrices, and to solve it we again make use of Krylov methods. The expansion of the subspace is done with matrix-vector multiplications, and we use GPUs in the same way as when solving eigenvalue problems. In this case the projected problem starts with size m, but grows by m at each restart of the method. The projected problem is solved by applying a matrix function directly. We have implemented several algorithms to compute the square root and the exponential matrix functions, in which the use of GPUs speeds up the computation.
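The Krylov evaluation of f(A)b reduces the problem to applying f directly to the small projected matrix. A minimal single-cycle (no restart) sketch in NumPy/SciPy, taking f = exp as an example and making no claim to reproduce the thesis code:

```python
import numpy as np
from scipy.linalg import expm
from scipy.sparse import diags

def arnoldi(A, b, m):
    """Krylov basis V_m and projected matrix H_m via Arnoldi (modified Gram-Schmidt)."""
    n = A.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V[:, :m], H[:m, :m]

def expm_action(A, b, m=30):
    """Approximate exp(A) b as ||b|| * V_m exp(H_m) e_1: the sparse problem of
    size n is reduced to a dense matrix function of a small m-by-m matrix."""
    V, H = arnoldi(A, b, m)
    return np.linalg.norm(b) * (V @ expm(H)[:, 0])

# compare against a dense evaluation on a small 1D Laplacian test matrix
n = 200
A = diags([1, -2, 1], [-1, 0, 1], shape=(n, n))
b = np.random.default_rng(0).standard_normal(n)
approx = expm_action(A, b)
exact = expm(A.toarray()) @ b
```

Only the m-by-m evaluation of exp(H_m) is dense; this is the step that the thesis accelerates on GPU when the projected problem grows across restarts.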
Lamas Daviña, A. (2018). Dense and sparse parallel linear algebra algorithms on graphics processing units [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/112425
Evaluación de prestaciones de aplicaciones paralelas en diferentes configuraciones de memoria en arquitectura NUMA
Lamas Daviña, A. (2010). Evaluación de prestaciones de aplicaciones paralelas en diferentes configuraciones de memoria en arquitectura NUMA. http://hdl.handle.net/10251/9108
MPI-CUDA parallel linear solvers for block-tridiagonal matrices in the context of SLEPc's eigensolvers
[EN] We consider the computation of a few eigenpairs of a generalized eigenvalue problem Ax = lambda Bx with block-tridiagonal matrices, not necessarily symmetric, in the context of Krylov methods. In this kind of computation, it is often necessary to solve a linear system of equations in each iteration of the eigensolver, for instance when B is not the identity matrix or when computing interior eigenvalues with the shift-and-invert spectral transformation. In this work, we aim to compare different direct linear solvers that can exploit the block-tridiagonal structure. Block cyclic reduction and the Spike algorithm are considered. A parallel implementation based on MPI is developed in the context of the SLEPc library. The use of GPU devices to accelerate local computations proves to be competitive for large block sizes.
This work was supported by Agencia Estatal de Investigación (AEI) under grant TIN2016-75985-P, which includes European Commission ERDF funds. Alejandro Lamas Daviña was supported by the Spanish Ministry of Education, Culture and Sport through a grant with reference FPU13-06655.
Lamas Daviña, A.; Roman, J. E. (2018). MPI-CUDA parallel linear solvers for block-tridiagonal matrices in the context of SLEPc's eigensolvers. Parallel Computing. 74:118-135. https://doi.org/10.1016/j.parco.2017.11.006
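As a point of reference for the structure these solvers exploit, a sequential block LU elimination (the block Thomas algorithm) can be sketched in a few lines of NumPy; block cyclic reduction and Spike reorganize this same elimination to expose parallelism. This sketch is illustrative only and is not the paper's MPI-CUDA implementation:

```python
import numpy as np

def block_thomas(D, L, U, b):
    """Solve a block-tridiagonal system by block LU (Thomas) elimination.
    D: list of N diagonal blocks (k x k); L, U: N-1 sub-/superdiagonal blocks;
    b: right-hand side of length N*k."""
    N, k = len(D), D[0].shape[0]
    Dh = [d.astype(float) for d in D]            # working copies of the diagonal blocks
    bh = b.reshape(N, k).astype(float)
    for i in range(1, N):                        # forward elimination
        W = L[i - 1] @ np.linalg.inv(Dh[i - 1])  # block multiplier
        Dh[i] = Dh[i] - W @ U[i - 1]
        bh[i] = bh[i] - W @ bh[i - 1]
    x = np.zeros((N, k))
    x[N - 1] = np.linalg.solve(Dh[N - 1], bh[N - 1])
    for i in range(N - 2, -1, -1):               # back substitution
        x[i] = np.linalg.solve(Dh[i], bh[i] - U[i] @ x[i + 1])
    return x.ravel()

# block-diagonally dominant test system, checked against the assembled matrix
rng = np.random.default_rng(1)
N, k = 8, 4
D = [rng.standard_normal((k, k)) + 10 * np.eye(k) for _ in range(N)]
L = [0.5 * rng.standard_normal((k, k)) for _ in range(N - 1)]
U = [0.5 * rng.standard_normal((k, k)) for _ in range(N - 1)]
b = rng.standard_normal(N * k)
A = np.zeros((N * k, N * k))
for i in range(N):
    A[i*k:(i+1)*k, i*k:(i+1)*k] = D[i]
for i in range(N - 1):
    A[(i+1)*k:(i+2)*k, i*k:(i+1)*k] = L[i]
    A[i*k:(i+1)*k, (i+1)*k:(i+2)*k] = U[i]
x = block_thomas(D, L, U, b)
```

The dense block operations in the inner loop (block products and small solves) are exactly the local computations that the paper offloads to the GPU, which is why large block sizes favor the GPU variant.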
Optimized analysis of isotropic high-nuclearity spin clusters with GPU acceleration
This is the author's version of a work that was accepted for publication in Computer Physics Communications. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Computer Physics Communications, vol. 209 (2016). DOI 10.1016/j.cpc.2016.08.014.
The numerical simulation of molecular clusters formed by a finite number of exchange-coupled paramagnetic centers is very relevant for many applications, modeling systems between molecules and extended solids. In realistic scenarios, many centers need to be considered, and thus the required computational effort grows very fast. In a previous work (Ramos et al., 2010), a set of parallel programs was presented with standard message-passing parallelization (MPI) for both anisotropic and isotropic systems. In this work, we have further developed the code for isotropic models. On one hand, the computational cost has been significantly reduced by avoiding some of the matrix diagonalizations, namely those corresponding to blocks with a negligible contribution for the particular configuration. On the other hand, we have extended the parallelization in order to exploit available graphics processing units (GPUs). The new MPI-GPU paradigm reduces the computational time by at least one additional order of magnitude and enables the resolution of larger problems.
© 2016 Elsevier B.V. All rights reserved.
This work was partially supported by the Spanish Ministry of Economy and Competitiveness under grant TIN2013-41049-P. Alejandro Lamas Daviña was supported by the Spanish Ministry of Education, Culture and Sports through a grant with reference FPU13-06655.
Lamas Daviña, A.; Ramos Peinado, E.; Román Moltó, J. E. (2016). Optimized analysis of isotropic high-nuclearity spin clusters with GPU acceleration. Computer Physics Communications. 209:70-78. https://doi.org/10.1016/j.cpc.2016.08.014
Parallel Direct Solution of the Covariance-Localized Ensemble Square Root Kalman Filter Equations with Matrix Functions
[EN] Recently, the serial approach to solving the square root ensemble Kalman filter (ESRF) equations in the presence of covariance localization was found to depend on the order of observations. As shown previously, correctly updating the localized posterior covariance in serial requires additional effort and computational expense. A recent work by Steward et al. details an all-at-once direct method to solve the ESRF equations in parallel. This method uses the eigenvectors and eigenvalues of the forward observation covariance matrix to solve the difficult portion of the ESRF equations. The remaining assimilation is easily parallelized, and the analysis does not depend on the order of observations. While this allows for long localization lengths that would render local analysis methods inefficient, in theory an eigenpair-based method scales as the cube of the number of observations, making it infeasible for large numbers of observations. In this work, we extend this method to use the theory of matrix functions and thereby avoid eigenpair computations. The Arnoldi process is used to evaluate the covariance-localized ESRF equations on the reduced-order Krylov subspace basis. This method is shown to converge quickly and apparently regains a linear scaling with the number of observations. The method scales similarly to the widely used serial approach of Anderson and Collins in wall time, but not in memory usage. To improve the memory usage, this method can potentially be used without an explicit matrix. In addition, hybrid ensemble and climatological covariances can be incorporated.
This research was partially funded by the NOAA Hurricane Forecast Improvement Project Award NA14NWS4680022. This work was partially supported by Agencia Estatal de Investigación (AEI) under Grant TIN2016-75985-P, which includes European Commission ERDF funds. Alejandro Lamas Daviña was supported by the Spanish Ministry of Education, Culture and Sport through a grant with reference FPU13-06655. The fourth author's work was in part carried out under the auspices of CIMAS, a joint institute of the University of Miami and NOAA, Cooperative Agreement NA15OAR4320064. The authors acknowledge the NOAA Research and Development High Performance Computing Program for providing computing and storage resources that have contributed to the research results reported within this paper (http://rdhpcs.noaa.gov). We thank Jeff Anderson, Shu-Chih Yang, and three anonymous reviewers for their helpful comments and contributions. We also thank Hui Christophersen for providing technical assistance.
Steward, J. L.; Roman, J. E.; Lamas Daviña, A.; Aksoy, A. (2018). Parallel Direct Solution of the Covariance-Localized Ensemble Square Root Kalman Filter Equations with Matrix Functions. Monthly Weather Review. 146(9):2819-2836. https://doi.org/10.1175/MWR-D-18-0022.1
Diseño y desarrollo de una plataforma distribuida de lanzamiento y ejecución de trabajos no interactivos basada en arquitectura OGSA, haciendo uso de los estándares BES y JSDL, comunicación de los componentes mediante AMQP y transferencia de datos sobre CDMI
[EN] Design and development of a distributed execution platform following current standards, including its deployment and testing. The platform is oriented to heterogeneous infrastructure environments, with work scheduling based on a bag of tasks that provides automatic distribution of the workload among the workers according to their performance. Its modular design allows it to take advantage of different computing resources, and also allows its work capacity to be increased by means of a complete or partial deployment on external services.
Lamas Daviña, A. (2013). Diseño y desarrollo de una plataforma distribuida de lanzamiento y ejecución de trabajos no interactivos basada en arquitectura OGSA, haciendo uso de los estándares BES y JSDL, comunicación de los componentes mediante AMQP y transferencia de datos sobre CDMI. http://hdl.handle.net/10251/45099
Improvements to SLEPc in releases 3.14-3.18
[EN] This short article describes the main new features added to SLEPc, the Scalable Library for Eigenvalue Problem Computations, in the past two and a half years, corresponding to five release versions. The main novelty is the extension of the SVD module with new problem types, such as the generalized SVD or the hyperbolic SVD. Additionally, many improvements have been incorporated in different parts of the library, including contour integral eigensolvers, preconditioning, and GPU support.
Roman, J. E.; Alvarruiz Bermejo, F.; Campos, C.; Dalcin, L.; Jolivet, P.; Lamas Daviña, A. (2023). Improvements to SLEPc in releases 3.14-3.18. ACM Transactions on Mathematical Software. 49(3). https://doi.org/10.1145/3603373