
    GPU implementation of Krylov solvers for block-tridiagonal eigenvalue problems

    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-32149-3_18
    In an eigenvalue problem defined by one or two matrices with block-tridiagonal structure, if only a few eigenpairs are required it is interesting to consider iterative methods based on Krylov subspaces, even if matrix blocks are dense. In this context, using the GPU for the associated dense linear algebra may provide high performance. We analyze this in an implementation done in the context of SLEPc, the Scalable Library for Eigenvalue Problem Computations. In the case of a generalized eigenproblem or when interior eigenvalues are computed with shift-and-invert, the main computational kernel is the solution of linear systems with a block-tridiagonal matrix. We explore possible implementations of this operation on the GPU, including a block cyclic reduction algorithm.
    This work was partially supported by the Spanish Ministry of Economy and Competitiveness under grant TIN2013-41049-P. Alejandro Lamas was supported by the Spanish Ministry of Education, Culture and Sport through grant FPU13-06655.
    Lamas Daviña, A.; Román Moltó, JE. (2016). GPU implementation of Krylov solvers for block-tridiagonal eigenvalue problems. In Parallel Processing and Applied Mathematics. Springer. 182-191. https://doi.org/10.1007%2F978-3-319-32149-3_18

    The SPARC Water Vapor Assessment II: assessment of satellite measurements of upper tropospheric humidity

    Nineteen limb-viewing data sets (occultation, passive thermal, and UV scattering) and two nadir upper tropospheric humidity (UTH) data sets are intercompared and also compared to frost-point hygrometer balloon sondes. The upper troposphere considered here covers the pressure range from 300 to 100 hPa. UTH is a challenging measurement because concentrations vary between 2 and 1000 ppmv (parts per million by volume), with sharp changes in vertical gradients near the tropopause. Cloudiness in this region also makes the measurement challenging. The atmospheric temperature is also highly variable, ranging from 180 to 250 K. The assessment of satellite-measured UTH is based on coincident comparisons with balloon frost-point hygrometer sondes, multi-month mapped comparisons, zonal mean time series comparisons, and coincident satellite-to-satellite comparisons. While the satellite fields show similar features in maps and time series, quantitatively they can differ by a factor of 2 in concentration, with strong dependencies on the amount of UTH. Additionally, time-lag response-corrected Vaisala RS92 radiosondes are compared to satellites and the frost-point hygrometer measurements. In summary, most satellite data sets reviewed here show on average ~30 % agreement amongst themselves and with frost-point data, but with an additional ~30 % variability about the mean bias. The Vaisala RS92 sonde, even with a time-lag correction, shows poor behavior for pressures less than 200 hPa.

    Reference Upper-Air Observations for Climate: Rationale, Progress, and Plans

    While the global upper-air observing network has provided useful observations for operational weather forecasting for decades, its measurements lack the accuracy and long-term continuity needed for understanding climate change. Consequently, the scientific community faces uncertainty on key climate issues, such as the nature of temperature trends in the troposphere and stratosphere; the climatology, radiative effects, and hydrological role of water vapor in the upper troposphere and stratosphere; and the vertical profile of changes in atmospheric ozone, aerosols, and other trace constituents. Radiosonde data provide adequate vertical resolution to address these issues, but they have questionable accuracy and time-varying biases due to changing instrumentation and techniques. Although satellite systems provide global coverage, their vertical resolution is sometimes inadequate and they require independent reference observations for sensor and data product validation, and for merging observations from different platforms into homogeneous climate records. To address these shortcomings, and to ensure that future climate records will be more useful than the records to date, the Global Climate Observing System (GCOS) program is initiating a GCOS Reference Upper-Air Network (GRUAN) to provide high-quality observations using specialized radiosondes and complementary remote sensing profiling instrumentation that can be used for validation. This paper outlines the scientific rationale for GRUAN, its role in the Global Earth Observation System of Systems, network requirements and likely instrumentation, management structure, current status, and future plans. It also illustrates the value of prototype reference upper-air observations in constructing climate records and their potential contribution to the Global Space-Based Inter-Calibration System. We invite constructive feedback on the GRUAN concept and the engagement of the scientific community.

    Analysis of Raman Lidar and Radiosonde Measurements from the AWEX-G Field Campaign and Its Relation to Aqua Validation

    Early work within the Aqua validation activity revealed large differences in water vapor measurement accuracy among the various technologies used to provide validation data. The validation measurements were made at globally distributed sites, making it difficult to isolate the sources of the apparent measurement differences among the various sensors, which included both Raman lidar and radiosonde. Because of this, the AIRS Water Vapor Experiment-Ground (AWEX-G) was held in October-November 2003 with the goal of bringing validation technologies to a common site for intercomparison and resolving the measurement discrepancies. Using the University of Colorado Cryogenic Frostpoint Hygrometer (CFH) as the water vapor reference, the AWEX-G field campaign permitted correction techniques to be validated for Raman lidar, Vaisala RS80-H, and RS90/92 that significantly improve the absolute accuracy of water vapor measurements from these systems, particularly in the upper troposphere. Mean comparisons of radiosondes and lidar are performed, demonstrating agreement between corrected sensors and the CFH to generally within 5%, thereby providing data of sufficient accuracy for Aqua validation purposes. Examples of the use of the correction techniques in radiance and retrieval comparisons are provided and discussed.

    Leading Edge Hybrid Multi-GPU Algorithms for Generalized Eigenproblems in Electronic Structure Calculations

    Today’s high computational demands from engineering fields and complex hardware development make it necessary to develop and optimize new algorithms toward achieving high performance and good scalability on the next generation of computers. The enormous gap between the high-performance capabilities of GPUs and the slow interconnect between them has made the development of numerical software that is scalable across multiple GPUs extremely challenging. We describe and analyze a successful methodology to address the challenges (starting from our algorithm design, kernel optimization and tuning, to our programming model) in the development of a scalable high-performance generalized eigenvalue solver in the context of electronic structure calculations in materials science applications. We developed a set of leading-edge dense linear algebra algorithms, as part of a generalized eigensolver, featuring fine-grained memory-aware kernels, a task-based approach, and hybrid execution/scheduling. The goal of the new design is to increase the computational intensity of the major compute kernels and to reduce synchronization and data transfers between GPUs. We report the performance impact on the generalized eigensolver when different fractions of eigenvectors are needed. The algorithm described provides an enormous performance boost compared to current GPU-based solutions, and performance comparable to state-of-the-art distributed solutions, using a single node with multiple GPUs.

    SHADOZ in the Aura Era

    We present comparisons of observed tropical and sub-tropical ozone from the Southern Hemisphere Additional Ozonesondes (SHADOZ) project with satellite measurements using Aura's Ozone Monitoring Instrument (OMI) and Microwave Limb Sounder (MLS) instruments. Satellite products of total and derived tropospheric column ozone from OMI and profiles of ozone in the UT/LS region from MLS are used.