    GPU implementation of Krylov solvers for block-tridiagonal eigenvalue problems

    In an eigenvalue problem defined by one or two matrices with block-tridiagonal structure, if only a few eigenpairs are required, it is worth considering iterative methods based on Krylov subspaces, even if the matrix blocks are dense. In this context, using the GPU for the associated dense linear algebra may provide high performance. We analyze this in an implementation done in the context of SLEPc, the Scalable Library for Eigenvalue Problem Computations. In the case of a generalized eigenproblem, or when interior eigenvalues are computed with shift-and-invert, the main computational kernel is the solution of linear systems with a block-tridiagonal matrix. We explore possible implementations of this operation on the GPU, including a block cyclic reduction algorithm.
    This work was partially supported by the Spanish Ministry of Economy and Competitiveness under grant TIN2013-41049-P. Alejandro Lamas was supported by the Spanish Ministry of Education, Culture and Sport through grant FPU13-06655.
    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-32149-3_18
    Lamas Daviña, A.; Román Moltó, JE. (2016). GPU implementation of Krylov solvers for block-tridiagonal eigenvalue problems. In Parallel Processing and Applied Mathematics. Springer. 182-191. https://doi.org/10.1007%2F978-3-319-32149-3_18
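    The abstract above names block cyclic reduction as one way to solve the block-tridiagonal systems that arise with shift-and-invert, where the Krylov method needs one such solve per iteration. The following is a minimal CPU sketch in Python with NumPy of the textbook recursive form of block cyclic reduction; it is not the authors' SLEPc or GPU code. The function names `block_cyclic_reduction` and `to_dense` are hypothetical, and the sketch assumes real-valued blocks stored as arrays `A` (sub-diagonal), `B` (diagonal) and `C` (super-diagonal) of shape `(n, m, m)`, with diagonal blocks that stay nonsingular during elimination (for example, a block diagonally dominant matrix).

```python
import numpy as np


def block_cyclic_reduction(A, B, C, f):
    """Solve A[i] x[i-1] + B[i] x[i] + C[i] x[i+1] = f[i], i = 0..n-1.

    Hypothetical helper. A, B, C have shape (n, m, m); f has shape (n, m).
    A[0] and C[n-1] must be zero. Assumes every diagonal block that is
    eliminated stays nonsingular (e.g. block diagonal dominance).
    """
    n, m = f.shape
    if n == 1:
        return np.linalg.solve(B[0], f[0])[None, :]

    # Odd-indexed unknowns are eliminated; precompute B_k^{-1} times A_k, C_k, f_k.
    odd = range(1, n, 2)
    inv_A = {k: np.linalg.solve(B[k], A[k]) for k in odd}
    inv_C = {k: np.linalg.solve(B[k], C[k]) for k in odd}
    inv_f = {k: np.linalg.solve(B[k], f[k]) for k in odd}

    # Build the half-size block-tridiagonal system on the even indices.
    even = list(range(0, n, 2))
    A2 = np.zeros((len(even), m, m))
    B2 = np.zeros_like(A2)
    C2 = np.zeros_like(A2)
    f2 = np.zeros((len(even), m))
    for r, j in enumerate(even):
        B2[r] = B[j]
        f2[r] = f[j]
        if j - 1 >= 0:  # substitute the odd unknown on the left
            k = j - 1
            A2[r] = -A[j] @ inv_A[k]
            B2[r] -= A[j] @ inv_C[k]
            f2[r] -= A[j] @ inv_f[k]
        if j + 1 < n:  # substitute the odd unknown on the right
            k = j + 1
            B2[r] -= C[j] @ inv_A[k]
            C2[r] = -C[j] @ inv_C[k]
            f2[r] -= C[j] @ inv_f[k]

    # Recurse on the reduced system, then back-substitute the odd unknowns.
    x = np.zeros((n, m))
    x[0::2] = block_cyclic_reduction(A2, B2, C2, f2)
    for k in odd:
        right = inv_C[k] @ x[k + 1] if k + 1 < n else 0.0
        x[k] = inv_f[k] - inv_A[k] @ x[k - 1] - right
    return x


def to_dense(A, B, C):
    """Assemble the full (n*m) x (n*m) matrix, for checking only (hypothetical)."""
    n, m, _ = B.shape
    M = np.zeros((n * m, n * m))
    for i in range(n):
        s = slice(i * m, (i + 1) * m)
        M[s, s] = B[i]
        if i > 0:
            M[s, (i - 1) * m:i * m] = A[i]
        if i < n - 1:
            M[s, (i + 1) * m:(i + 2) * m] = C[i]
    return M


# Usage: compare against a dense solve on a random diagonally dominant system.
rng = np.random.default_rng(0)
n, m = 7, 3
A = rng.standard_normal((n, m, m)); A[0] = 0.0
C = rng.standard_normal((n, m, m)); C[-1] = 0.0
B = rng.standard_normal((n, m, m)) + 10.0 * np.eye(m)
f = rng.standard_normal((n, m))
x = block_cyclic_reduction(A, B, C, f)
x_ref = np.linalg.solve(to_dense(A, B, C), f.ravel()).reshape(n, m)
print(np.allclose(x, x_ref))
```

    Within one reduction level, the solves with the odd-indexed diagonal blocks and the products that update the even rows are independent of one another, which is why cyclic reduction maps naturally onto batched dense kernels on a GPU; a sequential block Thomas (block LU) sweep, by contrast, has to process the blocks one after another.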

    Arctigenin Efficiently Enhanced Sedentary Mice Treadmill Endurance

    Physical inactivity is considered one of the potential risk factors for the development of type 2 diabetes and other metabolic diseases, while endurance exercise training can enhance fat oxidation, which is associated with improved insulin sensitivity in obesity. AMP-activated protein kinase (AMPK), an energy sensor, plays pivotal roles in the regulation of energy homeostasis, and its activation can improve glucose uptake, promote mitochondrial biogenesis, and increase glycolysis. Recent research has even suggested that AMPK activation contributes to endurance enhancement without exercise. Here we report that the natural product arctigenin, from the traditional herb Arctium lappa L. (Compositae), strongly increased AMPK phosphorylation and subsequently up-regulated its downstream pathway in both H9C2 and C2C12 cells. We found that arctigenin phosphorylated AMPK via calmodulin-dependent protein kinase kinase (CaMKK)- and serine/threonine kinase 11 (LKB1)-dependent pathways. A treadmill-based in vivo assay in mice further indicated that arctigenin administration efficiently improved endurance, as reflected by increased fatigue time and distance, and potently enhanced the expression of genes related to mitochondrial biogenesis and fatty acid oxidation (FAO) in muscle tissues. Our results thus suggest that arctigenin might serve as a lead compound for the discovery of agents that mimic the effects of exercise training to treat metabolic diseases.

    Role of biomechanics in the understanding of normal, injured, and healing ligaments and tendons

    Ligaments and tendons are soft connective tissues that play essential roles in the biomechanical function of the musculoskeletal system by stabilizing and guiding the motion of diarthrodial joints. Nevertheless, these tissues are frequently injured through repetition and overuse, as well as through quick cutting motions that involve acceleration and deceleration. These injuries often upset the balance between mobility and stability of the joint, causing damage to other soft tissues that manifests as pain and other morbidity, such as osteoarthritis.

    Three-dimensional equilibria and transport in RFX-mod: A description using stellarator tools

    RFX-mod self-organized single helical axis (SHAx) states provide a unique opportunity to advance 3D fusion physics and to establish a common knowledge base in a parameter region not covered by stellarators and tokamaks. The VMEC code has been adapted to the reversed-field pinch (RFP) to model SHAx equilibria in fixed-boundary mode, with experimental measurements as constraints. The particle diffusivity averaged over the helical volume, estimated with the Monte Carlo code ORBIT, has a neoclassical-like dependence on collisionality and does not show the 1/ν trend of unoptimized stellarators. In particular, the boundary of the helical region, which corresponds to an electron transport barrier with zero magnetic shear and improved confinement, has been investigated using numerical codes common to the stellarator community. The DKES/PENTA codes have been applied to the RFP for local neoclassical transport computations, including the radial electric field, to estimate thermal diffusion coefficients in the barrier region for typical RFX-mod temperature and density profiles. A comparison with power balance estimates shows that residual chaos due to secondary tearing modes and small-scale turbulence still contribute to driving anomalous transport in the barrier region. © 2011 American Institute of Physics

    Magnetic configuration effects on the Wendelstein 7-X stellarator

    The two leading concepts for confining high-temperature fusion plasmas are the tokamak and the stellarator. Tokamaks are rotationally symmetric and use a large plasma current to achieve confinement, whereas stellarators are non-axisymmetric and employ three-dimensionally shaped magnetic field coils to twist the field and confine the plasma. As a result, the magnetic field of a stellarator needs to be carefully designed to minimize the collisional transport arising from poorly confined particle orbits, which would otherwise cause excessive power losses at high plasma temperatures. In addition, this type of transport leads to the appearance of a net toroidal plasma current, the so-called bootstrap current. Here, we analyse results from the first experimental campaign of the Wendelstein 7-X stellarator, showing that its magnetic-field design allows good control of bootstrap currents and collisional transport. The energy confinement time is among the best ever achieved in stellarators, both in absolute figures (τE > 100 ms) and relative to the stellarator confinement scaling. The bootstrap current responds as predicted to changes in the magnetic mirror ratio. These initial experiments confirm several theoretically predicted properties of Wendelstein 7-X plasmas, and already indicate consistency with optimization measures.