436 research outputs found

    Alternating-Direction Line-Relaxation Methods on Multicomputers

    We study the multicomputer performance of a three-dimensional Navier–Stokes solver based on alternating-direction line-relaxation methods. We compare several multicomputer implementations, each of which combines a particular line-relaxation method and a particular distributed block-tridiagonal solver. In our experiments, the problem size was determined by the resolution requirements of the application. As a result, the granularity of the computations in our study is finer than is customary in the performance analysis of concurrent block-tridiagonal solvers. Our best results were obtained with a modified half-Gauss–Seidel line-relaxation method implemented by means of a new iterative block-tridiagonal solver that is developed here. Most computations were performed on the Intel Touchstone Delta, but we also used the Intel Paragon XP/S, the Parsytec SC-256, and the Fujitsu S-600 for comparison.
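    For orientation, a line-relaxation sweep ultimately solves one tridiagonal system per grid line. The sketch below is a sequential, scalar Thomas algorithm and not the distributed or iterative block-tridiagonal solver developed in the paper; the function name and argument layout are assumptions made for illustration. In the block case, each scalar would become a small dense block and each division a block solve.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm (illustrative sketch).

    Row i reads: lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1] = rhs[i].
    lower[0] and upper[-1] are ignored. No pivoting is performed, so the
    matrix is assumed to be diagonally dominant (or otherwise safe to
    eliminate in natural order).
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal after forward elimination
    d = [0.0] * n  # modified right-hand side after forward elimination

    # Forward elimination: remove the sub-diagonal row by row.
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        pivot = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / pivot if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / pivot

    # Back substitution: recover the unknowns from the last row upward.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

    The forward elimination is inherently sequential along a line, which is one reason distributed variants such as the ones compared in this paper are needed when a line is split across processors.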

    GPU implementation of Krylov solvers for block-tridiagonal eigenvalue problems

    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-32149-3_18. In an eigenvalue problem defined by one or two matrices with block-tridiagonal structure, if only a few eigenpairs are required, it is worth considering iterative methods based on Krylov subspaces, even if the matrix blocks are dense. In this context, using the GPU for the associated dense linear algebra may provide high performance. We analyze this in an implementation done in the context of SLEPc, the Scalable Library for Eigenvalue Problem Computations. In the case of a generalized eigenproblem or when interior eigenvalues are computed with shift-and-invert, the main computational kernel is the solution of linear systems with a block-tridiagonal matrix. We explore possible implementations of this operation on the GPU, including a block cyclic reduction algorithm. This work was partially supported by the Spanish Ministry of Economy and Competitiveness under grant TIN2013-41049-P. Alejandro Lamas was supported by the Spanish Ministry of Education, Culture and Sport through grant FPU13-06655. Lamas Daviña, A.; Román Moltó, JE. (2016). GPU implementation of Krylov solvers for block-tridiagonal eigenvalue problems. In Parallel Processing and Applied Mathematics. Springer. 182-191. https://doi.org/10.1007%2F978-3-319-32149-3_18

    Minimizing Communication for Eigenproblems and the Singular Value Decomposition

    Algorithms have two costs: arithmetic and communication. The latter represents the cost of moving data, either between levels of a memory hierarchy, or between processors over a network. Communication often dominates arithmetic and represents a rapidly increasing proportion of the total cost, so we seek algorithms that minimize communication. In \cite{BDHS10} lower bounds were presented on the amount of communication required for essentially all $O(n^3)$-like algorithms for linear algebra, including eigenvalue problems and the SVD. Conventional algorithms, including those currently implemented in (Sca)LAPACK, perform asymptotically more communication than these lower bounds require. In this paper we present parallel and sequential eigenvalue algorithms (for pencils, nonsymmetric matrices, and symmetric matrices) and SVD algorithms that do attain these lower bounds, and analyze their convergence and communication costs. Comment: 43 pages, 11 figures
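    As a reminder of the kind of bound the abstract refers to, the commonly quoted form for $O(n^3)$-like dense linear algebra is sketched below. The symbols are assumptions not used in the abstract itself: $F$ is the number of arithmetic operations, $M$ the size of fast (or per-processor local) memory, $W$ the number of words moved, and $S$ the number of messages.

```latex
\begin{align*}
  W &= \Omega\!\left(\frac{F}{\sqrt{M}}\right), &
  S &= \Omega\!\left(\frac{F}{M^{3/2}}\right), &
  F &= \Theta(n^3) \quad \text{(sequential case)}.
\end{align*}
```

    The message bound follows from the word bound because a single message can carry at most $M$ words.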

    Fast computation of spectral projectors of banded matrices

    We consider the approximate computation of spectral projectors for symmetric banded matrices. While this problem has received considerable attention, especially in the context of linear scaling electronic structure methods, the presence of small relative spectral gaps challenges existing methods based on approximate sparsity. In this work, we show how a data-sparse approximation based on hierarchical matrices can be used to overcome this problem. We prove a priori bounds on the approximation error and propose a fast algorithm based on the QDWH algorithm, following the work of Nakatsukasa et al. Numerical experiments demonstrate that the performance of our algorithm is robust with respect to the spectral gap. A preliminary Matlab implementation becomes faster than eig already for matrix sizes of a few thousand. Comment: 27 pages, 10 figures

    Improved Accuracy and Parallelism for MRRR-based Eigensolvers -- A Mixed Precision Approach

    The real symmetric tridiagonal eigenproblem is of outstanding importance in numerical computations; it arises frequently as part of eigensolvers for standard and generalized dense Hermitian eigenproblems that are based on a reduction to tridiagonal form. For its solution, the algorithm of Multiple Relatively Robust Representations (MRRR) is among the fastest methods. Although fast, the solvers based on MRRR do not deliver the same accuracy as competing methods like Divide & Conquer or the QR algorithm. In this paper, we demonstrate that the use of mixed precisions leads to improved accuracy of MRRR-based eigensolvers with limited or no performance penalty. As a result, we obtain eigensolvers that are not only equally or more accurate than the best available methods, but also, in most circumstances, faster and more scalable than the competition.

    High-Performance Solvers for Dense Hermitian Eigenproblems

    We introduce a new collection of solvers, subsequently called EleMRRR, for large-scale dense Hermitian eigenproblems. EleMRRR solves various types of problems: generalized, standard, and tridiagonal eigenproblems. Among these, the last is of particular importance as it is a solver in its own right, as well as the computational kernel for the first two; we present a fast and scalable tridiagonal solver, referred to as PMRRR, based on the Algorithm of Multiple Relatively Robust Representations. Like the other EleMRRR solvers, PMRRR is part of the freely available Elemental library, and is designed to fully support both message-passing (MPI) and multithreading parallelism (SMP). As a result, the solvers can equally be used in pure MPI or in hybrid MPI-SMP fashion. We conducted a thorough performance study of EleMRRR and ScaLAPACK's solvers on two supercomputers. Such a study, performed with up to 8,192 cores, provides precise guidelines to assemble the fastest solver within the ScaLAPACK framework; it also indicates that EleMRRR outperforms even the fastest solvers built from ScaLAPACK's components.

    Band gap engineering in finite elongated graphene nanoribbon heterojunctions: Tight-binding model

    A simple model based on the divide and conquer rule and tight-binding (TB) approximation is employed for studying the role of finite size effect on the electronic properties of elongated graphene nanoribbon (GNR) heterojunctions. In our model, the GNR heterojunction is divided into three parts: a left (L) part, middle (M) part, and right (R) part. The left part is a GNR of width $W_{L}$, the middle part is a GNR of width $W_{M}$, and the right part is a GNR of width $W_{R}$. We assume that the left and right parts of the GNR heterojunction interact with the middle part only. Under this approximation, the Hamiltonian of the system can be expressed as a block tridiagonal matrix. The matrix elements of the tridiagonal matrix are computed using real space nearest neighbor orthogonal TB approximation. The electronic structure of the GNR heterojunction is analyzed by computing the density of states. We demonstrate that for heterojunctions for which $W_{L} = W_{R}$, the band gap of the system can be tuned continuously by varying the length of the middle part, thus providing a new approach to band gap engineering in GNRs. Our TB results were compared with calculations employing divide and conquer rule in combination with density functional theory (DFT) and were found to agree nicely. Comment: arXiv admin note: text overlap with arXiv:1404.249
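    To illustrate how a density of states can be read off a tridiagonal tight-binding Hamiltonian, the sketch below counts eigenvalues in an energy window with a Sturm-sequence (inertia) count. It is a deliberately simplified scalar analogue: a 1D nearest-neighbor chain with zero on-site energy and hopping `t`, not the block-tridiagonal GNR Hamiltonian of the paper. All function names and parameters are hypothetical.

```python
def chain_hamiltonian(n_sites, t=1.0):
    """Diagonal and off-diagonal of a 1D nearest-neighbor tight-binding chain.

    Assumption for illustration: zero on-site energy, uniform hopping -t.
    """
    return [0.0] * n_sites, [-t] * (n_sites - 1)


def count_eigenvalues_below(diag, offdiag, energy):
    """Number of eigenvalues of a symmetric tridiagonal matrix below `energy`.

    Counts the negative pivots of the LDL^T factorization of (H - energy*I);
    by Sylvester's law of inertia this equals the number of eigenvalues
    smaller than `energy`.
    """
    tiny = 1e-150  # replaces an exactly zero pivot to avoid division by zero
    count = 0
    q = 1.0
    for i, a in enumerate(diag):
        b2 = offdiag[i - 1] ** 2 if i > 0 else 0.0
        q = a - energy - b2 / q
        if q == 0.0:
            q = tiny
        if q < 0.0:
            count += 1
    return count


def states_in_window(diag, offdiag, e_low, e_high):
    """Number of eigenvalues in [e_low, e_high): one bin of a histogram DOS."""
    return (count_eigenvalues_below(diag, offdiag, e_high)
            - count_eigenvalues_below(diag, offdiag, e_low))
```

    Sweeping `states_in_window` over a grid of energies gives a histogram estimate of the density of states without computing any eigenvector; a window that contains no states signals a gap.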