37 research outputs found

    Accelerating the task/data-parallel version of ILUPACK's BiCG in multi-CPU/GPU configurations

    [EN] ILUPACK is a valuable tool for the solution of sparse linear systems via iterative Krylov subspace-based methods. Its relevance for the solution of real problems has motivated several efforts to enhance its performance on parallel machines. In this work we focus on exploiting the task-level parallelism derived from the structure of the BiCG method, in addition to the data-level parallelism of the internal matrix computations, with the goal of boosting the performance of a GPU (graphics processing unit) implementation of this solver. First, we revisit the use of dual-GPU systems to execute independent stages of the BiCG concurrently on both accelerators, while leveraging the extra memory space to improve the data access patterns. In addition, we extend our ideas to compute the BiCG method efficiently on multicore platforms with a single GPU. In this line, we study the possibilities offered by hybrid CPU-GPU computations, as well as a novel synchronization-free sparse triangular linear solver. The experimental results with the new solvers show important acceleration factors with respect to the previous data-parallel CPU and GPU versions. (C) 2019 Elsevier B.V. All rights reserved.
    J. I. Aliaga and E. S. Quintana-Ortí were supported by project TIN2017-82972-R of the MINECO and FEDER. E. Dufrechou and P. Ezzatti were supported by Programa de Desarrollo de las Ciencias Básicas (PEDECIBA), Uruguay.
    Aliaga, JI.; Dufrechou, E.; Ezzatti, P.; Quintana-Ortí, ES. (2019). Accelerating the task/data-parallel version of ILUPACK's BiCG in multi-CPU/GPU configurations. Parallel Computing. 85:79-87. https://doi.org/10.1016/j.parco.2019.02.005

    Factorized solution of generalized stable Sylvester equations using many-core GPU accelerators

    [EN] We investigate the factorized solution of generalized stable Sylvester equations such as those arising in model reduction, image restoration, and observer design. Our algorithms, based on the matrix sign function, take advantage of the current trend to integrate high performance graphics accelerators (also known as GPUs) in computer systems. As a result, our realisations provide a valuable tool to solve large-scale problems on a variety of platforms.
    We acknowledge support of the ANII - MPG Independent Research Group: "Efficient Heterogeneous Computing" at UdelaR, a partner group of the Max Planck Institute in Magdeburg.
    Benner, P.; Dufrechou, E.; Ezzatti, P.; Gallardo, R.; Quintana-Ortí, ES. (2021). Factorized solution of generalized stable Sylvester equations using many-core GPU accelerators. The Journal of Supercomputing (Online). 77(9):10152-19164. https://doi.org/10.1007/s11227-021-03658-y

    Unleashing GPU acceleration for symmetric band linear algebra kernels and model reduction

    Linear algebra operations arise in a myriad of scientific and engineering applications and, therefore, their optimization is targeted by a significant number of high performance computing (HPC) research efforts. In particular, the matrix multiplication and the solution of linear systems are two key problems with efficient implementations (or kernels) for a variety of high performance parallel architectures. For these specific problems, leveraging the structure of the associated matrices often leads to remarkable time and memory savings, as is the case, e.g., for symmetric band problems. In this work, we exploit the ample hardware concurrency of many-core graphics processors (GPUs) to accelerate the solution of symmetric positive definite band linear systems, introducing highly tuned versions of the corresponding LAPACK routines. The experimental results with the new GPU kernels reveal important reductions of the execution time when compared with tuned implementations of the same operations provided in Intel’s MKL. In addition, we evaluate the performance of the GPU kernels when applied to the solution of model order reduction problems and the associated matrix equations.
    Ernesto Dufrechou and Pablo Ezzatti acknowledge the support from Programa de Desarrollo de las Ciencias Básicas, and Agencia Nacional de Investigación e Innovación, Uruguay. Enrique S. Quintana-Ortí was supported by project TIN2011-23283 of the Ministry of Science and Competitiveness (MINECO) and EU FEDER, and project P1-1B2013-20 of the Fundació Caixa Castelló-Bancaixa and UJI.

    Characterizing the efficiency of multicore and manycore processors for the solution of sparse linear systems

    We analyze the efficiency of servers equipped with state-of-the-art general-purpose multicore processors as well as platforms based on accelerators such as graphics processing units (GPUs) and the Intel Xeon Phi. Following the proposal recently advocated in the High Performance Conjugate Gradient (HPCG) benchmark, we leverage for this purpose efficient implementations of ILUPACK, a preconditioned solver for sparse linear systems that comprises numerical kernels and data access patterns analogous to those of HPCG. Our study analyzes the (computational) performance and energy efficiency, with two different metrics for each: time/floating-point throughput for the former; and energy/floating-point throughput-per-Watt for the latter.
    This work was supported by the CICYT project TIN2011-23283 of MINECO and FEDER, and the EU Project FP7 318793 “EXA2GREEN”.

    An efficient GPU version of the preconditioned GMRES method

    [EN] In a large number of scientific applications, the solution of sparse linear systems is the stage that concentrates most of the computational effort. This situation has motivated the study and development of several iterative solvers, among which preconditioned Krylov subspace methods occupy a place of privilege. In a previous effort, we developed a GPU-aware version of the GMRES method included in ILUPACK, a package of solvers distinguished by its inverse-based multilevel ILU preconditioner. In this work, we study the performance of our previous proposal and integrate several enhancements in order to mitigate its principal bottlenecks. The numerical evaluation shows that our novel proposal can reach important run-time reductions.
    Aliaga, JI.; Dufrechou, E.; Ezzatti, P.; Quintana-Ortí, ES. (2019). An efficient GPU version of the preconditioned GMRES method. The Journal of Supercomputing. 75(3):1455-1469. https://doi.org/10.1007/s11227-018-2658-1

    3175 Chemical Shifts and Coupling Constants for C14H28NO3P

    No full text

    3075 Chemical Shifts and Coupling Constants for C14H22NO3P

    No full text

    Selecting optimal SpMV realizations for GPUs via machine learning

    [EN] More than 10 years of research related to the development of efficient GPU routines for the sparse matrix-vector product (SpMV) have led to several realizations, each with its own strengths and weaknesses. In this work, we review some of the most relevant efforts on the subject, evaluate a few prominent routines that are publicly available using more than 3000 matrices from different applications, and apply machine learning techniques to anticipate which SpMV realization will perform best for each sparse matrix on a given parallel platform. Our numerical experiments confirm that the methods offer such varied behaviors depending on the matrix structure that the identification of general rules to select the optimal method for a given matrix becomes extremely difficult, though some useful strategies (heuristics) can be defined. Using a machine learning approach, we show that it is possible to obtain inexpensive classifiers that predict the best method for a given sparse matrix with over 80% accuracy, demonstrating that this approach can deliver important reductions in both execution time and energy consumption.
    The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: ES Quintana-Ortí was supported by project TIN2017-82972-R of the MINECO and FEDER.
    Dufrechou, E.; Ezzatti, P.; Quintana-Ortí, ES. (2021). Selecting optimal SpMV realizations for GPUs via machine learning. International Journal of High Performance Computing Applications. 35(3):254-267. https://doi.org/10.1177/1094342021990738
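
    For readers unfamiliar with the operation named in the abstract above, the following is a minimal sequential sketch of SpMV for a matrix stored in CSR (compressed sparse row) format. It is not one of the GPU routines evaluated in the paper; the function name `spmv_csr` and the CSR storage choice are assumptions made only for illustration. GPU realizations typically differ in how they map the two loops below onto threads, which is one reason their performance can depend so strongly on the matrix structure.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A given in CSR format.

    row_ptr[i]:row_ptr[i+1] delimits the nonzeros of row i inside
    col_idx (column indices) and values (nonzero entries).
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Accumulate the dot product of row i with x over its nonzeros only.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Illustrative 3x3 matrix (hypothetical data):
# A = [[4, 0, 1],
#      [0, 3, 0],
#      [2, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]

y = spmv_csr(row_ptr, col_idx, values, x)
print(y)
```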

    Studies of polyphosphate composition and their interaction with dairy matrices by ion chromatography and 31P NMR spectroscopy

    No full text
    The use of ion-exchange chromatography and 31P-nuclear magnetic resonance (NMR) to analyse the composition and the chain length of phosphate emulsifying salts was studied, as well as the impact of these salts in dairy products. Ion chromatography was more appropriate than 31P-NMR to study polyphosphate composition in complex environments, whereas interactions between phosphate species and dairy components were elucidated by 31P-NMR. Phosphate species interacting with calcium, as well as the percentage of chelated calcium, were identified using 31P-NMR. Thus, ion chromatography and solid-state 31P-NMR could be used as complementary methods to study compositions of polyphosphate blends and their interactions with dairy matrices.

    Machine learning for optimal selection of sparse triangular system solvers on GPUs

    [EN] Many numerical algorithms for science and engineering applications require the solution of sparse triangular linear systems (sptrsv) as their most costly stage. For this reason, considerable research has been dedicated to producing efficient implementations for almost all high performance computing platforms. In the case of graphics processing units (GPUs), there are several strategies to perform this operation, which translate into a handful of different routines. In general, it is difficult to establish a priori which is the best routine for a given problem, and thus, an automatic procedure able to select the best solver for each matrix can entail large performance benefits. This work extends a previous effort, in which we relied on machine learning techniques to predict the best sptrsv routine for each matrix, by improving both the accuracy and the speed of the selection procedure. Specifically, we focus on the most efficient machine learning techniques regarding the speed of their training and prediction stages; evaluate the artificial generation of sparse matrices to expand our dataset; and propose heuristics to compute approximations of some expensive features. The experimental results show that we can strongly improve the runtime of our procedure without compromising the quality of the results. (C) 2021 Elsevier Inc. All rights reserved.
    The researchers from UdelaR were supported by PEDECIBA.
    Dufrechou, E.; Ezzatti, P.; Freire, M.; Quintana-Ortí, ES. (2021). Machine learning for optimal selection of sparse triangular system solvers on GPUs. Journal of Parallel and Distributed Computing. 158:47-55. https://doi.org/10.1016/j.jpdc.2021.07.013
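
    As background for the sptrsv operation named in the abstract above, the following is a minimal sequential sketch of forward substitution for a lower triangular matrix in CSR format. It is not any of the GPU routines the paper selects among; the function name `sptrsv_lower_csr` and the assumption that each row stores its diagonal entry are illustrative choices. The dependency of each unknown on earlier ones is what makes the operation hard to parallelize.

```python
def sptrsv_lower_csr(row_ptr, col_idx, values, b):
    """Solve L x = b by forward substitution, with L lower triangular in CSR.

    Assumes every row stores a nonzero diagonal entry (an assumption of
    this sketch, checked at run time).
    """
    n = len(row_ptr) - 1
    x = [0.0] * n
    for i in range(n):
        acc = b[i]
        diag = None
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j == i:
                diag = values[k]
            elif j < i:
                # x[j] is already known because rows are processed in order.
                acc -= values[k] * x[j]
            else:
                raise ValueError("matrix is not lower triangular")
        if diag is None or diag == 0.0:
            raise ValueError("missing or zero diagonal in row %d" % i)
        x[i] = acc / diag
    return x


# Illustrative 3x3 system (hypothetical data):
# L = [[2, 0, 0],
#      [1, 3, 0],
#      [0, 4, 5]],  b = [2, 7, 23]
row_ptr = [0, 1, 3, 5]
col_idx = [0, 0, 1, 1, 2]
values = [2.0, 1.0, 3.0, 4.0, 5.0]
b = [2.0, 7.0, 23.0]

x = sptrsv_lower_csr(row_ptr, col_idx, values, b)
print(x)
```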