24 research outputs found

    GPU implementation of Krylov solvers for block-tridiagonal eigenvalue problems

    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-32149-3_18

    In an eigenvalue problem defined by one or two matrices with block-tridiagonal structure, if only a few eigenpairs are required, it is interesting to consider iterative methods based on Krylov subspaces, even if matrix blocks are dense. In this context, using the GPU for the associated dense linear algebra may provide high performance. We analyze this in an implementation done in the context of SLEPc, the Scalable Library for Eigenvalue Problem Computations. In the case of a generalized eigenproblem or when interior eigenvalues are computed with shift-and-invert, the main computational kernel is the solution of linear systems with a block-tridiagonal matrix. We explore possible implementations of this operation on the GPU, including a block cyclic reduction algorithm.

    This work was partially supported by the Spanish Ministry of Economy and Competitiveness under grant TIN2013-41049-P. Alejandro Lamas was supported by the Spanish Ministry of Education, Culture and Sport through grant FPU13-06655.

    Lamas Daviña, A.; Román Moltó, JE. (2016). GPU implementation of Krylov solvers for block-tridiagonal eigenvalue problems. In Parallel Processing and Applied Mathematics. Springer. 182-191. https://doi.org/10.1007%2F978-3-319-32149-3_18
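
    The cyclic reduction idea named in the abstract above can be illustrated with a minimal sketch. This is not the authors' GPU code or any SLEPc routine: it is a plain-Python version for the scalar case (1x1 blocks), and the function name cyclic_reduction_solve is hypothetical. In a block version, each division by a diagonal entry would presumably become a solve with a dense diagonal block, and each product a dense matrix-matrix product, which is the kind of work a GPU handles well.

```python
def cyclic_reduction_solve(a, b, c, d):
    """Solve a tridiagonal system by recursive cyclic reduction.

    Row i reads: a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i],
    with a[0] == 0 and c[-1] == 0. Scalar sketch (1x1 blocks), no pivoting,
    so it assumes the diagonal entries met along the way are nonzero
    (e.g. a diagonally dominant matrix).
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]

    # Reduction: eliminate the even-indexed unknowns from each odd-indexed row.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        alpha = a[i] / b[i - 1]          # multiplier for row i-1
        diag = b[i] - alpha * c[i - 1]
        rhs = d[i] - alpha * d[i - 1]
        lower = -alpha * a[i - 1]        # couples x[i] to x[i-2]
        upper = 0.0
        if i + 1 < n:
            gamma = c[i] / b[i + 1]      # multiplier for row i+1
            diag -= gamma * a[i + 1]
            rhs -= gamma * d[i + 1]
            upper = -gamma * c[i + 1]    # couples x[i] to x[i+2]
        ra.append(lower)
        rb.append(diag)
        rc.append(upper)
        rd.append(rhs)

    # The odd-indexed unknowns satisfy a tridiagonal system of half the size.
    x = [0.0] * n
    x[1::2] = cyclic_reduction_solve(ra, rb, rc, rd)

    # Back-substitution: recover the even-indexed unknowns from their own rows.
    for i in range(0, n, 2):
        s = d[i]
        if i > 0:
            s -= a[i] * x[i - 1]
        if i + 1 < n:
            s -= c[i] * x[i + 1]
        x[i] = s / b[i]
    return x


# Usage: the 5x5 matrix tridiag(-1, 2, -1) with right-hand side built
# from the known solution [1, 2, 3, 4, 5].
a = [0.0, -1.0, -1.0, -1.0, -1.0]
b = [2.0, 2.0, 2.0, 2.0, 2.0]
c = [-1.0, -1.0, -1.0, -1.0, 0.0]
d = [0.0, 0.0, 0.0, 0.0, 6.0]
solution = cyclic_reduction_solve(a, b, c, d)
```

    All eliminations within one reduction level are independent of each other, which is what makes the method attractive for parallel hardware, in contrast with the inherently sequential sweep of standard block LU elimination.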

    The SPARC Water Vapor Assessment II: assessment of satellite measurements of upper tropospheric humidity

    Nineteen limb-viewing data sets (occultation, passive thermal, and UV scattering) and two nadir upper tropospheric humidity (UTH) data sets are intercompared and also compared to frost-point hygrometer balloon sondes. The upper troposphere considered here covers the pressure range from 300 to 100 hPa. UTH is a challenging measurement, because concentrations vary between 2 and 1000 ppmv (parts per million by volume), with sharp changes in vertical gradients near the tropopause. Cloudiness in this region also makes the measurement challenging. The atmospheric temperature is also highly variable, ranging from 180 to 250 K. The assessment of satellite-measured UTH is based on coincident comparisons with balloon frost-point hygrometer sondes, multi-month mapped comparisons, zonal mean time series comparisons, and coincident satellite-to-satellite comparisons. While the satellite fields show similar features in maps and time series, quantitatively they can differ by a factor of 2 in concentration, with strong dependencies on the amount of UTH. Additionally, time-lag response-corrected Vaisala RS92 radiosondes are compared to satellites and the frost-point hygrometer measurements. In summary, most satellite data sets reviewed here show on average approximately 30% agreement amongst themselves and with frost-point data, but with an additional approximately 30% variability about the mean bias. The Vaisala RS92 sonde, even with a time-lag correction, shows poor behavior for pressures less than 200 hPa.

    Manganese deficiency and toxicity levels for Japanese mint


    Radiation Dry Bias of the Vaisala RS92 Humidity Sensor

    The comparison of simultaneous humidity measurements by the Vaisala RS92 radiosonde and by the Cryogenic Frostpoint Hygrometer (CFH) launched at Alajuela, Costa Rica, during July 2005 reveals a large solar radiation dry bias of the Vaisala RS92 humidity sensor and a minor temperature-dependent calibration error. For soundings launched at solar zenith angles between 10° and 30°, the average dry bias is on the order of 9% at the surface and increases to 50% at 15 km. A simple pressure- and temperature-dependent correction based on the comparison with the CFH can reduce this error to less than 7% at all altitudes up to 15.2 km, which is 700 m below the tropical tropopause. The correction does not depend on relative humidity, but is able to reproduce the relative humidity distribution observed by the CFH.