24 research outputs found
Performance and Accuracy of LAPACK's Symmetric Tridiagonal Eigensolvers
We compare four algorithms from the latest LAPACK 3.1 release for computing eigenpairs of a symmetric tridiagonal matrix. These include QR iteration, bisection and inverse iteration (BI), the Divide-and-Conquer method (DC), and the method of Multiple Relatively Robust Representations (MR). Our evaluation considers speed and accuracy when computing all eigenpairs, and additionally subset computations. Using carefully selected test problems, our study covers a variety of today's computer architectures. Our conclusions can be summarized as follows. (1) DC and MR are generally much faster than QR and BI on large matrices. (2) MR almost always does the fewest floating point operations, but at a lower MFlop rate than all the other algorithms. (3) The exact performance of MR and DC strongly depends on the matrix at hand. (4) DC and QR are the most accurate algorithms, with observed accuracy O(√n ε). The accuracy of BI and MR is generally O(nε). (5) MR is preferable to BI for subset computations.
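To make the subset computations mentioned above concrete, here is a minimal pure-Python sketch of the bisection half of BI: a Sturm-sequence count locates the k-th eigenvalue of a symmetric tridiagonal matrix without computing the rest of the spectrum. This is not LAPACK's implementation; the function names, the `tiny` pivot guard, and the tolerance are assumptions made for illustration.

```python
import math

def sturm_count(d, e, x, tiny=1e-30):
    """Count eigenvalues < x of the symmetric tridiagonal matrix with
    diagonal d and off-diagonal e (pivot signs of LDL^T of T - x*I)."""
    count = 0
    q = 1.0
    for i in range(len(d)):
        off = e[i - 1] ** 2 if i > 0 else 0.0
        q = d[i] - x - off / q
        if q == 0.0:
            q = tiny  # assumed guard against division by an exact zero pivot
        if q < 0.0:
            count += 1
    return count

def gershgorin_bounds(d, e):
    """Interval guaranteed to contain all eigenvalues."""
    n = len(d)
    lo, hi = math.inf, -math.inf
    for i in range(n):
        r = (abs(e[i - 1]) if i > 0 else 0.0) + (abs(e[i]) if i < n - 1 else 0.0)
        lo = min(lo, d[i] - r)
        hi = max(hi, d[i] + r)
    return lo, hi

def eigenvalue_by_bisection(d, e, k, tol=1e-13):
    """k-th smallest eigenvalue (0-based), found by bisection on the count."""
    lo, hi = gershgorin_bounds(d, e)
    while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if sturm_count(d, e, mid) > k:
            hi = mid  # at least k+1 eigenvalues lie below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Usage: two smallest eigenvalues of the 6x6 matrix tridiag(-1, 2, -1),
# whose exact eigenvalues are 2 - 2*cos(j*pi/(n+1)), j = 1..n.
n = 6
d = [2.0] * n
e = [-1.0] * (n - 1)
subset = [eigenvalue_by_bisection(d, e, k) for k in range(2)]
exact = [2.0 - 2.0 * math.cos((k + 1) * math.pi / (n + 1)) for k in range(2)]
```

Each eigenvalue costs O(n) work per bisection step, so computing a few of them avoids the cost of the full spectrum. Eigenvectors would still require a separate step such as inverse iteration, which this sketch omits.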
GPU implementation of Krylov solvers for block-tridiagonal eigenvalue problems
In an eigenvalue problem defined by one or two matrices with block-tridiagonal structure, if only a few eigenpairs are required it is interesting to consider iterative methods based on Krylov subspaces, even if matrix blocks are dense. In this context, using the GPU for the associated dense linear algebra may provide high performance. We analyze this in an implementation done in the context of SLEPc, the Scalable Library for Eigenvalue Problem Computations. In the case of a generalized eigenproblem or when interior eigenvalues are computed with shift-and-invert, the main computational kernel is the solution of linear systems with a block-tridiagonal matrix. We explore possible implementations of this operation on the GPU, including a block cyclic reduction algorithm.
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-32149-3_18
This work was partially supported by the Spanish Ministry of Economy and Competitiveness under grant TIN2013-41049-P. Alejandro Lamas was supported by the Spanish Ministry of Education, Culture and Sport through grant FPU13-06655.
Lamas Daviña, A.; Román Moltó, JE. (2016). GPU implementation of Krylov solvers for block-tridiagonal eigenvalue problems. In Parallel Processing and Applied Mathematics. Springer. 182-191. https://doi.org/10.1007%2F978-3-319-32149-3_18
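As a rough illustration of the cyclic reduction idea named in the abstract, the sketch below solves a scalar tridiagonal system on the CPU in plain Python. It is deliberately simplified: the paper works with dense blocks on the GPU, whereas here every block is a single number, no pivoting is performed, and the function name and argument layout are assumptions made for this example.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] for x.

    Convention (assumed): a[0] == 0 and c[-1] == 0. No pivoting is done,
    so the sketch suits diagonally dominant or SPD systems only.
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]

    kept = list(range(1, n, 2))  # odd-indexed unknowns survive this level
    ra, rb, rc, rd = [], [], [], []
    for i in kept:
        has_next = i + 1 < n
        alpha = a[i] / b[i - 1]
        gamma = c[i] / b[i + 1] if has_next else 0.0
        a_next, c_next, d_next = (a[i + 1], c[i + 1], d[i + 1]) if has_next else (0.0, 0.0, 0.0)
        # Eliminate x[i-1] and x[i+1] using the neighbouring equations.
        ra.append(-alpha * a[i - 1])
        rb.append(b[i] - alpha * c[i - 1] - gamma * a_next)
        rc.append(-gamma * c_next)
        rd.append(d[i] - alpha * d[i - 1] - gamma * d_next)

    reduced = cyclic_reduction(ra, rb, rc, rd)  # half-size tridiagonal system

    x = [0.0] * n
    for k, i in enumerate(kept):
        x[i] = reduced[k]
    # Back-substitute the eliminated (even-indexed) unknowns.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x

# Usage: tridiag(-1, 2, -1) of size 7 with known solution x = [1, ..., 7].
n = 7
a = [0.0] + [-1.0] * (n - 1)
b = [2.0] * n
c = [-1.0] * (n - 1) + [0.0]
rhs = [0.0] * (n - 1) + [8.0]
x = cyclic_reduction(a, b, c, rhs)
```

Within one level the eliminations are independent of each other, which is what makes the method attractive for parallel hardware. In a block version the scalar divisions would become solves with the diagonal blocks.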
The SPARC Water Vapor Assessment II: assessment of satellite measurements of upper tropospheric humidity
Nineteen limb-viewing data sets (occultation, passive thermal, and UV scattering) and two nadir upper tropospheric humidity (UTH) data sets are intercompared and also compared to frost-point hygrometer balloon sondes. The upper troposphere considered here covers the pressure range from 300 to 100 hPa. UTH is a challenging measurement, because concentrations vary between 2 and 1000 ppmv (parts per million by volume), with sharp changes in vertical gradients near the tropopause. Cloudiness in this region also makes the measurement challenging. The atmospheric temperature is also highly variable, ranging from 180 to 250 K. The assessment of satellite-measured UTH is based on coincident comparisons with balloon frost-point hygrometer sondes, multi-month mapped comparisons, zonal mean time series comparisons, and coincident satellite-to-satellite comparisons. While the satellite fields show similar features in maps and time series, quantitatively they can differ by a factor of 2 in concentration, with strong dependencies on the amount of UTH. Additionally, time-lag response-corrected Vaisala RS92 radiosondes are compared to satellites and the frost-point hygrometer measurements. In summary, most satellite data sets reviewed here show on average approximately 30 % agreement amongst themselves and with frost-point data, but with an additional approximately 30 % variability about the mean bias. The Vaisala RS92 sonde, even with a time-lag correction, shows poor behavior for pressures less than 200 hPa.
Computations of Eigenpair Subsets with the MRRR Algorithm
The main advantage of inverse iteration over the QR algorithm and Divide & Conquer for the symmetric tridiagonal eigenproblem is that subsets of eigenpairs can be computed at reduced cost. The MRRR algorithm (MRRR = Multiple Relatively Robust Representations) is a clever variant of inverse iteration without the need for reorthogonalization. STEGR, the current version of MRRR in LAPACK 3.0, does not allow for subset computations. The next release of STEGR is designed to compute a (sub-)set of k eigenpairs with O(kn) operations. Because of the special way in which eigenvectors are computed, MRRR subset computations are more complicated than when using inverse iteration. Unlike the latter, MRRR sometimes cannot ignore the unwanted part of the spectrum. We describe the problems with what we call 'false singletons'. These are eigenvalues that appear to be isolated with respect to the wanted eigenvalues but in fact belong to a tight cluster of unwanted eigenvalues. This paper analyzes these complications and ways to deal with them.
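The definition of a false singleton can be illustrated with a small check. The sketch below is one reading of that definition, not the paper's procedure: it flags a wanted eigenvalue that looks isolated when compared only against the other wanted eigenvalues but sits in a tight cluster once the full spectrum is considered. The relative-gap test and the threshold `tol=1e-3` are assumptions chosen for illustration.

```python
def isolated(i, others, eigs, tol):
    """True if eigs[i] has relative gap >= tol to every eigenvalue in `others`."""
    lam = eigs[i]
    return all(abs(lam - eigs[j]) >= tol * abs(lam) for j in others if j != i)

def false_singletons(eigs, wanted, tol=1e-3):
    """Indices in `wanted` that look isolated within the wanted set but are
    clustered with some eigenvalue of the full spectrum."""
    everything = range(len(eigs))
    return [
        i for i in wanted
        if isolated(i, wanted, eigs, tol) and not isolated(i, everything, eigs, tol)
    ]

# Usage: index 2 is well separated from the wanted eigenvalues 1.0 and 2.0,
# but it belongs to a tight cluster with the unwanted indices 3 and 4.
eigs = [1.0, 2.0, 3.0, 3.0000001, 3.0000002, 5.0]
flagged = false_singletons(eigs, wanted=[0, 1, 2])
```

In this toy spectrum only the eigenvalue at index 2 is flagged. Restricting the gap test to the wanted set would have missed its cluster, which is the situation the abstract describes.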
Radiation Dry Bias of the Vaisala RS92 Humidity Sensor
The comparison of simultaneous humidity measurements by the Vaisala RS92 radiosonde and by the Cryogenic Frostpoint Hygrometer (CFH) launched at Alajuela, Costa Rica, during July 2005 reveals a large solar radiation dry bias of the Vaisala RS92 humidity sensor and a minor temperature-dependent calibration error. For soundings launched at solar zenith angles between 10° and 30°, the average dry bias is on the order of 9% at the surface and increases to 50% at 15 km. A simple pressure- and temperature-dependent correction based on the comparison with the CFH can reduce this error to less than 7% at all altitudes up to 15.2 km, which is 700 m below the tropical tropopause. The correction does not depend on relative humidity, but is able to reproduce the relative humidity distribution observed by the CFH.