
    BCYCLIC: A parallel block tridiagonal matrix cyclic solver

    13 pages, 6 figures. A block tridiagonal matrix is factored with minimal fill-in using a cyclic reduction algorithm that is easily parallelized. Storage of the factored blocks allows the application of the inverse to multiple right-hand sides which may not be known at factorization time. Scalability with the number of block rows is achieved with cyclic reduction, while scalability with the block size is achieved using multithreaded routines (OpenMP, GotoBLAS) for block matrix manipulation. This dual scalability is a noteworthy feature of this new solver, as is its ability to efficiently handle arbitrary (non-power-of-2) numbers of block rows and processors. A comparison with a state-of-the-art parallel sparse solver is presented. It is expected that this new solver will allow many physical applications to optimally use the parallel resources on current supercomputers. Example usage of the solver in magneto-hydrodynamic (MHD), three-dimensional equilibrium solvers for high-temperature fusion plasmas is cited. This research has been sponsored by the US Department of Energy under Contract DE-AC05-00OR22725 with UT-Battelle, LLC. This research used resources of the National Center for Computational Sciences at Oak Ridge National Laboratory, which is supported by the Office of Science of the Department of Energy under Contract DE-AC05-00OR22725.
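
    The cyclic reduction idea described in this abstract can be illustrated with a minimal serial sketch. It is not the BCYCLIC implementation: the function name `block_cyclic_reduction` and the list-of-blocks layout are hypothetical, the factored blocks are not stored for reuse, nothing is parallelized, and no pivoting across blocks is done, so every diagonal block (original and reduced) is assumed to be nonsingular. Row i of the system reads L[i] x[i-1] + D[i] x[i] + U[i] x[i+1] = b[i], with L[0] and U[-1] passed as zero blocks by convention. Each level eliminates the odd-numbered block rows, solves the half-size system recursively, and back-substitutes, which works for any number of block rows, not only powers of 2.

```python
import numpy as np


def block_cyclic_reduction(L, D, U, b):
    """Solve a block tridiagonal system by recursive cyclic reduction.

    L, D, U are lists of (m, m) arrays (sub-, main- and super-diagonal blocks);
    b is a list of (m,) or (m, k) right-hand-side blocks.
    Hypothetical helper for illustration only; assumes all diagonal blocks
    that appear during the reduction are nonsingular.
    """
    n = len(D)
    if n == 1:
        return [np.linalg.solve(D[0], b[0])]

    # Reduction: keep the even-numbered rows, eliminating their odd neighbours.
    L2, D2, U2, b2 = [], [], [], []
    for i in range(0, n, 2):
        Di = np.array(D[i], dtype=float)
        bi = np.array(b[i], dtype=float)
        Li = np.zeros_like(Di)
        Ui = np.zeros_like(Di)
        if i - 1 >= 0:
            # W = L[i] @ inv(D[i-1]), computed through a transposed solve
            W = np.linalg.solve(D[i - 1].T, L[i].T).T
            Di -= W @ U[i - 1]
            bi -= W @ b[i - 1]
            Li = -W @ L[i - 1]
        if i + 1 < n:
            # W = U[i] @ inv(D[i+1])
            W = np.linalg.solve(D[i + 1].T, U[i].T).T
            Di -= W @ L[i + 1]
            bi -= W @ b[i + 1]
            Ui = -W @ U[i + 1]
        L2.append(Li)
        D2.append(Di)
        U2.append(Ui)
        b2.append(bi)

    # Solve the half-size system for the even-numbered unknowns.
    x = [None] * n
    x[0::2] = block_cyclic_reduction(L2, D2, U2, b2)

    # Back-substitution: recover each odd-numbered unknown from its neighbours.
    for j in range(1, n, 2):
        r = b[j] - L[j] @ x[j - 1]
        if j + 1 < n:
            r = r - U[j] @ x[j + 1]
        x[j] = np.linalg.solve(D[j], r)
    return x
```

    In this sketch the per-block work (the `solve` and `@` calls) is where threaded dense linear algebra would help with large blocks, while the loops over independent block rows at each level are where distribution across processors would apply.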

    Extension of the SIESTA MHD equilibrium code to free-plasma-boundary problems

    SIESTA is a recently developed MHD equilibrium code designed to perform fast and accurate calculations of ideal MHD equilibria for three-dimensional magnetic configurations. Since SIESTA does not assume closed magnetic surfaces, the solution can exhibit magnetic islands and stochastic regions. In its original implementation SIESTA addressed only fixed-boundary problems. That is, the shape of the plasma edge, assumed to be a magnetic surface, was kept fixed as the solution iteratively converged to equilibrium. This condition somewhat restricts the possible applications of SIESTA. In this paper, we discuss an extension that will enable SIESTA to address free-plasma-boundary problems, opening up the possibility of investigating problems in which the plasma boundary is perturbed either externally or internally. As an illustration, SIESTA is applied to a configuration of the W7-X stellarator. This research was funded in part by the Ministerio de Economía, Industria y Competitividad of Spain, Grant No. ENE2015-68265. This research was carried out in part at the Max-Planck-Institute for Plasma Physics in Greifswald (Germany), whose hospitality is gratefully acknowledged. This research was supported in part by the U.S. Department of Energy, Office of Fusion Energy Sciences under Award DE-AC05-00OR22725. SIESTA runs have been carried out on Uranus, a supercomputer cluster located at Universidad Carlos III de Madrid and funded jointly by the European Regional Development Funds (EU-FEDER) Project No. UNC313-4E-2361, and by the Ministerio de Economía, Industria y Competitividad via the National Project Nos. ENE2009-12213-C03-03, ENE2012-33219, and ENE2012-31753.

    Perpendicular momentum injection by lower hybrid wave in a tokamak

    The injection of lower hybrid waves for current drive into a tokamak affects the profile of intrinsic rotation. In this article, the momentum deposition by the lower hybrid wave on the electrons is studied. Due to the increase in the poloidal momentum of the wave as it propagates into the tokamak, the parallel momentum of the wave increases considerably. The change of the perpendicular momentum of the wave is such that the toroidal angular momentum of the wave is conserved. If the perpendicular momentum transfer via electron Landau damping is ignored, the transfer of the toroidal angular momentum to the plasma will be larger than the injected toroidal angular momentum. A proper quasilinear treatment proves that both perpendicular and parallel momentum are transferred to the electrons. The toroidal angular momentum of the electrons is then transferred to the ions via different mechanisms for the parallel and perpendicular momentum. The perpendicular momentum is transferred to ions through an outward radial electron pinch, while the parallel momentum is transferred through collisions. Comment: 22 pages, 4 figures

    Plasmas and Controlled Nuclear Fusion

    Contains reports on three research projects. U. S. Atomic Energy Commission (Contract AT(11-1)-3070)

    GPU implementation of Krylov solvers for block-tridiagonal eigenvalue problems

    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-32149-3_18. In an eigenvalue problem defined by one or two matrices with block-tridiagonal structure, if only a few eigenpairs are required it is worth considering iterative methods based on Krylov subspaces, even if the matrix blocks are dense. In this context, using the GPU for the associated dense linear algebra may provide high performance. We analyze this in an implementation done in the context of SLEPc, the Scalable Library for Eigenvalue Problem Computations. In the case of a generalized eigenproblem or when interior eigenvalues are computed with shift-and-invert, the main computational kernel is the solution of linear systems with a block-tridiagonal matrix. We explore possible implementations of this operation on the GPU, including a block cyclic reduction algorithm. This work was partially supported by the Spanish Ministry of Economy and Competitiveness under grant TIN2013-41049-P. Alejandro Lamas was supported by the Spanish Ministry of Education, Culture and Sport through grant FPU13-06655. Lamas Daviña, A.; Román Moltó, JE. (2016). GPU implementation of Krylov solvers for block-tridiagonal eigenvalue problems. In Parallel Processing and Applied Mathematics. Springer. 182-191. https://doi.org/10.1007%2F978-3-319-32149-3_18