
    Sparse matrix‐vector and matrix‐multivector products for the truncated SVD on graphics processors

    Many practical algorithms for numerical rank computations implement an iterative procedure that involves repeated multiplications of a vector, or a collection of vectors, with both a sparse matrix A and its transpose. Unfortunately, the realization of these sparse products in current high-performance libraries often delivers much lower arithmetic throughput when the matrix involved in the product is transposed. In this work, we propose a hybrid sparse matrix layout, named CSRC, that combines the flexibility of some well-known sparse formats to offer a number of appealing properties: (1) CSRC can be obtained at low cost from the popular CSR (compressed sparse row) format; (2) CSRC has storage requirements similar to those of CSR; and especially, (3) the implementation of the sparse product kernels delivers high performance for both the direct product and its transposed variant on modern graphics accelerators, thanks to a significant reduction of atomic operations compared to a conventional implementation based on CSR. This solution thus renders considerably higher performance when integrated into an iterative algorithm for the truncated singular value decomposition (SVD), such as the randomized SVD or, as demonstrated in the experimental results, the block Golub–Kahan–Lanczos algorithm.
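
The asymmetry between the two products can be seen with plain CSR arrays. The sketch below is a hypothetical illustration, not the CSRC layout (whose internals the abstract does not describe): the helper names `csr_matvec` and `csr_matvec_transposed` and the array names `row_ptr`, `col_idx`, `vals` are assumptions. The direct product gathers, so each row writes only its own output entry; the transposed product computed from the same arrays scatters, so different rows add into the same output entry, which on a GPU with one thread per row would require atomic additions.

```python
def csr_matvec(n_rows, row_ptr, col_idx, vals, x):
    """y = A x with A in CSR format (gather: one output entry per row)."""
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc  # only row i writes y[i]: no write conflicts between rows
    return y


def csr_matvec_transposed(n_rows, n_cols, row_ptr, col_idx, vals, x):
    """y = A^T x computed from the same CSR arrays (scatter)."""
    y = [0.0] * n_cols
    for i in range(n_rows):
        xi = x[i]
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Several rows i may update the same y[col_idx[k]]; with one GPU
            # thread per row this read-modify-write would need an atomic add.
            y[col_idx[k]] += vals[k] * xi
    return y


# A = [[4, 0, 1],
#      [0, 2, 0],
#      [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]
print(csr_matvec(3, row_ptr, col_idx, vals, x))                # [7.0, 4.0, 18.0]
print(csr_matvec_transposed(3, 3, row_ptr, col_idx, vals, x))  # [13.0, 4.0, 16.0]
```

In this small example, rows 0 and 2 both update entries 0 and 2 of the transposed result, which is exactly the kind of conflict a GPU kernel must serialize.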

    Compressed basis GMRES on high-performance graphics processing units

    Krylov methods provide a fast and highly parallel numerical tool for the iterative solution of many large-scale sparse linear systems. To a large extent, the performance of practical realizations of these methods is constrained by the communication bandwidth in current computer architectures, motivating the investigation of sophisticated techniques to avoid, reduce, and/or hide the message-passing costs (in distributed platforms) and the memory accesses (in all architectures). This article leverages Ginkgo’s memory accessor to integrate a communication-reduction strategy into the (Krylov) GMRES solver that decouples the storage format (i.e., the data representation in memory) of the orthogonal basis from the arithmetic precision employed during the operations with that basis. Given that the execution time of the GMRES solver is largely determined by the memory accesses, the cost of the datatype transforms can be mostly hidden, resulting in the acceleration of the iterative step via a decrease in the volume of bits retrieved from memory. Together with the special properties of the orthonormal basis (whose elements are all bounded by 1 in magnitude), this paves the road toward aggressive customization of the storage format, which includes some floating-point as well as fixed-point formats with mild impact on the convergence of the iterative process. We develop a high-performance implementation of the “compressed basis GMRES” solver in the Ginkgo sparse linear algebra library and evaluate it on a large set of test problems from the SuiteSparse Matrix Collection. We demonstrate robustness and performance advantages of up to 50% on a modern NVIDIA V100 graphics processing unit (GPU) over the standard GMRES solver that stores all data in IEEE double precision.
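
The decoupling of storage format and arithmetic precision can be sketched in a few lines. What follows is a hypothetical, minimal single-cycle GMRES (dense matrix, zero initial guess, no restarts), not Ginkgo's implementation or its accessor API. Each Krylov basis vector is stored in an `array('f')` (32-bit floats), is converted to a Python float (IEEE double) whenever it is read, and all arithmetic runs in double precision; `storage='d'` keeps the basis in double for comparison. The function name `cb_gmres` and its parameters are assumptions made for this example.

```python
import math
from array import array


def matvec(A, x):
    """Dense y = A x (a sparse kernel would be used in practice)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(v):
    return math.sqrt(dot(v, v))


def cb_gmres(A, b, m, storage="f", tol=1e-12):
    """One GMRES cycle with x0 = 0 and at most m iterations.

    Basis vectors live in array(storage): 'f' = 32-bit, 'd' = 64-bit floats.
    They are converted to Python floats (double) on every read, so all
    arithmetic is performed in double precision.
    """
    n = len(b)
    beta = norm(b)
    basis = [array(storage, [bi / beta for bi in b])]  # write: compress
    H = [[0.0] * m for _ in range(m + 1)]
    cs, sn = [0.0] * m, [0.0] * m
    g = [beta] + [0.0] * m  # rotated right-hand side of the least-squares problem
    k = 0
    for j in range(m):
        w = matvec(A, list(basis[j]))  # read: decompress to double
        for i in range(j + 1):  # modified Gram-Schmidt in double precision
            vi = list(basis[i])
            H[i][j] = dot(w, vi)
            w = [wk - H[i][j] * vk for wk, vk in zip(w, vi)]
        h_next = norm(w)
        H[j + 1][j] = h_next
        for i in range(j):  # apply earlier Givens rotations to column j
            t = cs[i] * H[i][j] + sn[i] * H[i + 1][j]
            H[i + 1][j] = -sn[i] * H[i][j] + cs[i] * H[i + 1][j]
            H[i][j] = t
        r = math.hypot(H[j][j], H[j + 1][j])  # new rotation zeroes H[j+1][j]
        cs[j], sn[j] = H[j][j] / r, H[j + 1][j] / r
        H[j][j], H[j + 1][j] = r, 0.0
        g[j + 1] = -sn[j] * g[j]
        g[j] = cs[j] * g[j]
        k = j + 1
        if abs(g[j + 1]) <= tol * beta or h_next == 0.0 or k == m:
            break
        basis.append(array(storage, [wk / h_next for wk in w]))  # write: compress
    y = [0.0] * k  # back substitution on the k x k upper-triangular system
    for i in reversed(range(k)):
        y[i] = (g[i] - sum(H[i][q] * y[q] for q in range(i + 1, k))) / H[i][i]
    x = [0.0] * n
    for i in range(k):
        x = [xk + y[i] * vk for xk, vk in zip(x, basis[i])]
    return x


A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 4.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 4.0]]
b = [1.0, 2.0, 3.0, 4.0]
for storage in ("d", "f"):
    x = cb_gmres(A, b, m=4, storage=storage)
    res = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    print(storage, norm(res) / norm(b))
```

Because this sketch assembles the solution directly from the 32-bit basis and never restarts, its final accuracy with `storage='f'` is limited to roughly single precision; the point is only that halving the bytes per basis element leaves the arithmetic itself in double precision.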

    Clinical value of diascopy and other non-invasive techniques on differential diagnosis algorithms of oral pigmentations: a systematic review

    Objectives: To determine the diagnostic value of diascopy and other non-invasive clinical aids in recent differential diagnosis algorithms of oral mucosal pigmentations affecting subjects of any age. Material and Methods: Data Sources: this systematic review was conducted by searching PubMed, Scopus, Dentistry & Oral Sciences Source and the Cochrane Library (2000-2015); Study Selection: two reviewers independently selected all types of English-language articles describing differential diagnosis algorithms of oral pigmentations and checked the references of the finally included papers; Data Extraction: one reviewer performed the data extraction and quality assessment based on previously defined fields, while the other reviewer checked their validity. Results: Data Synthesis: eight narrative reviews and one single case report met the inclusion criteria. Diascopy was used in six algorithms (66.67%) and X-ray was included once (11.11%; 44.44% including mentions in the text); these were considered helpful tools in the diagnosis of intravascular and exogenous pigmentations, respectively. Surface rubbing was described once in the text (11.11%). Conclusions: Diascopy was the most applied method, followed by X-ray and surface rubbing. The limited scope of these procedures makes them useful only when a positive result is obtained, which makes biopsy the most recommended technique when diagnosis cannot be established on clinical grounds alone.

    Plasmatic protein values in captive adult Iberian red deer stags (Cervus elaphus hispanicus)

    The aim of this study was to assess the time trend of plasmatic proteins in red deer stags. Blood samples were taken monthly from 17 male red deer over 22 months. Total plasmatic protein determination and protein electrophoresis were performed. Plasmatic proteins showed minimum values during spring and summer and a maximum at the peak of the mating period. Total, β and γ globulins followed a pattern similar to that observed for total proteins, whereas α1 and α2 globulins showed no seasonal variations. Albumin showed higher values in early spring and summer and lower values at the beginning of autumn, coinciding with the mating season. These seasonal changes in plasmatic proteins should be taken into account when interpreting blood protein analysis results. This study was funded by projects AGL2007-63838/gan, PBI-05-040, PAC 06-01304298 and PET2006-0263 and MICINN (PTQ 09-02-01923). Peer reviewed.

    Performance–energy trade-offs of deep learning convolution algorithms on ARM processors

    In this work, we assess the performance and energy efficiency of high-performance codes for the convolution operator, based on the direct, explicit/implicit lowering and Winograd algorithms used for deep learning (DL) inference on a series of ARM-based processor architectures. Specifically, we evaluate the NVIDIA Denver2 and Carmel processors, as well as the ARM Cortex-A57 and Cortex-A78AE CPUs as part of a recent set of NVIDIA Jetson platforms. The performance–energy evaluation is carried out using the ResNet-50 v1.5 convolutional neural network (CNN) on varying configurations of convolution algorithms, number of threads/cores, and operating frequencies on the tested processor cores. The results demonstrate that the best throughput is obtained on all platforms with the Winograd convolution operator running on all the cores at their highest frequency. However, if the goal is to reduce the energy footprint, there is no rule of thumb for the optimal configuration. Funding for open access charge: CRUE-Universitat Jaume I.
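
Explicit lowering, one of the algorithms named above, turns a convolution into a single matrix-matrix product (GEMM) by unfolding the input into a matrix, often called im2col. The sketch below is a hypothetical, unoptimized illustration for stride 1 and no padding, checked against a direct convolution; it is not the code evaluated in the paper, where the GEMM would be a tuned kernel. The function names and the C×H×W / K×C×R×S layouts are assumptions. An implicit-lowering variant would, roughly, build the same columns on the fly inside the GEMM instead of materializing them.

```python
def conv_dims(x, w):
    """Sizes for input x (C x H x W) and filters w (K x C x R x S)."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    K, R, S = len(w), len(w[0][0]), len(w[0][0][0])
    return C, K, R, S, H - R + 1, W - S + 1  # stride 1, no padding


def direct_conv(x, w):
    """Direct algorithm: out[k][i][j] = sum over c, r, s of w * x."""
    C, K, R, S, Ho, Wo = conv_dims(x, w)
    return [[[sum(w[k][c][r][s] * x[c][i + r][j + s]
                  for c in range(C) for r in range(R) for s in range(S))
              for j in range(Wo)]
             for i in range(Ho)]
            for k in range(K)]


def im2col(x, R, S):
    """Row (c, r, s) holds the input pixels seen by that filter tap,
    one column per output position (i, j) in row-major order."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    Ho, Wo = H - R + 1, W - S + 1
    return [[x[c][i + r][j + s] for i in range(Ho) for j in range(Wo)]
            for c in range(C) for r in range(R) for s in range(S)]


def lowered_conv(x, w):
    """Explicit lowering: (K x CRS) filter matrix times (CRS x HoWo) columns."""
    C, K, R, S, Ho, Wo = conv_dims(x, w)
    cols = im2col(x, R, S)
    wmat = [[w[k][c][r][s] for c in range(C) for r in range(R) for s in range(S)]
            for k in range(K)]
    P = C * R * S
    # Naive GEMM; a real implementation would call an optimized BLAS kernel.
    out = [[sum(wmat[k][p] * cols[p][q] for p in range(P)) for q in range(Ho * Wo)]
           for k in range(K)]
    return [[out[k][i * Wo:(i + 1) * Wo] for i in range(Ho)] for k in range(K)]


# Small integer example: 2 input channels of 4 x 4, 3 filters of 2 x 3 x 3.
x = [[[(c * 16 + i * 4 + j) % 7 - 3 for j in range(4)] for i in range(4)]
     for c in range(2)]
w = [[[[(k + 2 * c + r - s) % 5 - 2 for s in range(3)] for r in range(3)]
      for c in range(2)] for k in range(3)]
y = lowered_conv(x, w)
print(y == direct_conv(x, w))
```

The design trade-off is memory: the lowered matrix repeats each input pixel up to R·S times, in exchange for handing the bulk of the work to a highly optimized GEMM.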

    Performance–energy trade-offs of deep learning convolution algorithms on ARM processors

    In this work, we assess the performance and energy efficiency of high-performance codes for the convolution operator, based on the direct, explicit/implicit lowering and Winograd algorithms used for deep learning (DL) inference on a series of ARM-based processor architectures. Specifically, we evaluate the NVIDIA Denver2 and Carmel processors, as well as the ARM Cortex-A57 and Cortex-A78AE CPUs as part of a recent set of NVIDIA Jetson platforms. The performance–energy evaluation is carried out using the ResNet-50 v1.5 convolutional neural network (CNN) on varying configurations of convolution algorithms, number of threads/cores, and operating frequencies on the tested processor cores. The results demonstrate that the best throughput is obtained on all platforms with the Winograd convolution operator running on all the cores at their highest frequency. However, if the goal is to reduce the energy footprint, there is no rule of thumb for the optimal configuration. Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This research was funded by Project PID2020-113656RB-C21/C22 supported by MCIN/AEI/10.13039/501100011033. Manuel F. Dolz was also supported by the Plan Gen–T grant CDEIGENT/2018/014 of the Generalitat Valenciana. Héctor Martínez is a POSTDOC_21_00025 fellow supported by Junta de Andalucía. Adrián Castelló is a FJC2019-039222-I fellow supported by MCIN/AEI/10.13039/501100011033. Antonio Maciá is a PRE2021-099284 fellow supported by MCIN/AEI/10.13039/501100011033.
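
The Winograd algorithms that give the best throughput here save multiplications by transforming small tiles. The 2-D variants used for CNN layers nest the 1-D scheme, so the minimal 1-D case F(2,3) shows the idea: two outputs of a 3-tap filter from a 4-element tile with 4 multiplications instead of 6. This is a generic textbook sketch, not the paper's ARM code; the function names are hypothetical, and, as in DL frameworks, "convolution" here means cross-correlation.

```python
def winograd_f23(d, g):
    """Winograd F(2,3): 2 outputs of a 3-tap correlation, 4 multiplications."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform: in a CNN this is precomputed once per filter.
    u0, u1, u2, u3 = g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2
    # Input transform combined with the 4 element-wise products.
    m1 = (d0 - d2) * u0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * u3
    # Output transform: recovers d0*g0 + d1*g1 + d2*g2 and d1*g0 + d2*g1 + d3*g2.
    return [m1 + m2 + m3, m2 - m3 - m4]


def conv1d_direct(x, g):
    """Reference valid correlation with a 3-tap filter (6 mults per 2 outputs)."""
    return [x[i] * g[0] + x[i + 1] * g[1] + x[i + 2] * g[2]
            for i in range(len(x) - 2)]


def conv1d_winograd(x, g):
    """Valid correlation computed tile by tile, 2 outputs per 4-element tile."""
    n_out = len(x) - 2
    y = []
    for t in range(0, n_out, 2):
        tile = x[t:t + 4]
        tile = tile + [0.0] * (4 - len(tile))  # zero-pad a short final tile
        y.extend(winograd_f23(tile, g))
    return y[:n_out]  # drop the extra output produced by padding, if any


x = [1.0, 3.0, -2.0, 5.0, 0.5, 4.0, -1.0]
g = [1.0, 2.0, -1.0]
print(conv1d_winograd(x, g))  # [9.0, -6.0, 7.5, 2.0, 9.5]
print(conv1d_direct(x, g))    # [9.0, -6.0, 7.5, 2.0, 9.5]
```

The additions in the transforms are cheap relative to the saved multiplications only when tiles are large enough and data stay in cache, which is one reason the best configuration depends on the platform.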

    Calibration of semi-analytic models of galaxy formation using Particle Swarm Optimization

    We present a fast and accurate method to select an optimal set of parameters in semi-analytic models of galaxy formation and evolution (SAMs). Our approach compares the results of a model against a set of observables applying a stochastic technique called Particle Swarm Optimization (PSO), a self-learning algorithm for localizing regions of maximum likelihood in multidimensional spaces that outperforms traditional sampling methods in terms of computational cost. We apply the PSO technique to the SAG semi-analytic model combined with merger trees extracted from a standard ΛCDM N-body simulation. The calibration is performed using a combination of observed galaxy properties as constraints, including the local stellar mass function and the black hole to bulge mass relation. We test the ability of the PSO algorithm to find the best set of free parameters of the model by comparing the results with those obtained using an MCMC exploration. Both methods find the same maximum likelihood region; however, the PSO method requires one order of magnitude fewer evaluations. This new approach allows a fast estimation of the best-fitting parameter set in multidimensional spaces, providing a practical tool to test the consequences of including other astrophysical processes in SAMs. Comment: 11 pages, 4 figures, 1 table. Accepted for publication in ApJ. Comments are welcome.
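
In PSO, each particle keeps a position and a velocity in parameter space, remembers its own best position, and is pulled toward both that and the best position found by the whole swarm. The following is a minimal global-best PSO sketch, not the authors' calibration code: the objective is a hypothetical quadratic stand-in for a negative log-likelihood (maximizing the likelihood means minimizing its negative) rather than the SAG model and its observational constraints, and the inertia and acceleration coefficients (0.7298 and 1.49618) are commonly used defaults assumed here, not values taken from the paper.

```python
import random


def pso_minimize(f, bounds, n_particles=20, n_iter=200,
                 w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Global-best Particle Swarm Optimization (minimization) within bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]              # each particle's best position
    pbest_val = [f(p) for p in pos]
    best_i = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[best_i][:], pbest_val[best_i]  # swarm's best
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def neg_log_like(p):
    """Hypothetical two-parameter objective with its optimum at (1, -2)."""
    return (p[0] - 1.0) ** 2 + 10.0 * (p[1] + 2.0) ** 2


best, best_val = pso_minimize(neg_log_like, [(-5.0, 5.0), (-5.0, 5.0)])
print(best, best_val)
```

The total cost is n_particles × (n_iter + 1) objective evaluations, which is the budget that matters when each evaluation is a full SAM run.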

    Bifunctional W/NH Cuboidal Aminophosphino W3S4 Cluster Hydrides: The Puzzling Behaviour behind the Hydridic-Protonic Interplay

    The novel [W₃S₄H₃(edpp)₃]⁺ (edpp = (2-aminoethyl)diphenylphosphine) (1⁺) cluster hydride with an acidic −NH₂ functionality has been synthesized and studied. Its crystal structure shows the characteristic incomplete W₃S₄ cubane core with the outer positions occupied by the P and N atoms of the edpp ligands. Although no signal due to the hydride ligands is observed in the ¹H NMR spectrum, the hydride assignment is supported by ¹H-¹⁵N HSQC techniques, the changes in the ³¹P{¹H} NMR chemical shift, and the FT-IR spectra in the W−H region of the deuterated [W₃S₄D₂H(edpp)₃]⁺ (1⁺-d₂) samples. Moreover, all the NMR evidence suggests that one of the hydrogen atoms of the NH₂ group in 1⁺ is rapidly exchanging with the hydride. The reaction of 1⁺ with acids (HCl, HBr and DCl) features complex polyphasic kinetics that are zero order with respect to the acid concentration and also independent of the nature of the solvent. This behavior differs from that of its diphosphino analogues, suggesting a different mechanism.