    Computing the sparse matrix vector product using block-based kernels without zero padding on processors with AVX-512 instructions

    The sparse matrix-vector product (SpMV) is a fundamental operation in many scientific applications from various fields. The High Performance Computing (HPC) community has therefore continuously invested a lot of effort to provide an efficient SpMV kernel on modern CPU architectures. Although it has been shown that block-based kernels help to achieve high performance, they are difficult to use in practice because of the zero padding they require. In the current paper, we propose new kernels using the AVX-512 instruction set, which makes it possible to use a blocking scheme without any zero padding in the matrix memory storage. We describe mask-based sparse matrix formats and their corresponding SpMV kernels highly optimized in assembly language. Considering that the optimal blocking size depends on the matrix, we also provide a method to predict the best kernel to be used utilizing a simple interpolation of results from previous executions. We compare the performance of our approach to that of the Intel MKL CSR kernel and the CSR5 open-source package on a set of standard benchmark matrices. We show that we can achieve significant improvements in many cases, both for sequential and for parallel executions. Finally, we provide the corresponding code in an open source library, called SPC5. Comment: Published in Peer J C
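
The idea of a mask-based block format without zero padding can be illustrated with a small scalar sketch. It is an assumption-laden illustration, not the SPC5 library's API or its assembly kernels: the function names `to_mask_blocks` and `spmv_mask_blocks`, the 1 x `width` block shape, and the list-based layout are all hypothetical. Each block stores a starting column and a bitmask marking which positions hold nonzeros, while the values array holds only the nonzeros themselves.

```python
def to_mask_blocks(dense, width=8):
    """Convert a dense matrix (list of rows) to a mask-based block format.

    Hypothetical layout for illustration: 1 x `width` blocks, each described
    by a starting column and a bitmask; values are packed with no padding.
    """
    row_ptr = [0]     # row_ptr[i] .. row_ptr[i + 1] indexes the blocks of row i
    block_col = []    # starting column of each block
    block_mask = []   # bit k set => column block_col + k holds a nonzero
    values = []       # nonzero values only, in block order
    for row in dense:
        n = len(row)
        j = 0
        while j < n:
            if row[j] == 0:
                j += 1
                continue
            # A block starts at the first nonzero not yet covered.
            mask = 0
            for k in range(min(width, n - j)):
                if row[j + k] != 0:
                    mask |= 1 << k
                    values.append(row[j + k])
            block_col.append(j)
            block_mask.append(mask)
            j += width
        row_ptr.append(len(block_col))
    return row_ptr, block_col, block_mask, values


def spmv_mask_blocks(fmt, x):
    """Compute y = A @ x from the mask-based format (scalar reference loop)."""
    row_ptr, block_col, block_mask, values = fmt
    y = [0.0] * (len(row_ptr) - 1)
    v = 0  # running index into the packed values array
    for i in range(len(y)):
        acc = 0.0
        for b in range(row_ptr[i], row_ptr[i + 1]):
            mask, col, k = block_mask[b], block_col[b], 0
            # Walk the mask bit by bit; a vectorized kernel would instead
            # consume the whole mask in one masked load.
            while mask:
                if mask & 1:
                    acc += values[v] * x[col + k]
                    v += 1
                mask >>= 1
                k += 1
        y[i] = acc
    return y


A = [[1, 0, 2, 0],
     [0, 0, 0, 3],
     [0, 4, 0, 0]]
fmt = to_mask_blocks(A)
y = spmv_mask_blocks(fmt, [1, 2, 3, 4])
```

In this sketch the storage cost is one column index and one mask per block plus the nonzeros, so a block with a single nonzero costs no extra value slots, which is the property the abstract attributes to the mask-based formats.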

    Shock capturing techniques for hp-adaptive finite elements

    The aim of this work is to propose an hp-adaptive algorithm for discontinuous Galerkin methods that is capable of detecting discontinuities and sharp layers and of avoiding spurious oscillations of the solution around them. To control the spurious oscillations, artificial viscosity is used, with the particularity that it is applied only around the layers where the solution changes abruptly. To do so, a novel troubled-cell detector has been developed to mark the elements around those layers and to impose linear order in them. The detector takes advantage of the evolution of the value of the gradient through the adaptive process. Peer Reviewed. Postprint (published version)

    Separation of Plastic Particles in Electrostatic Field Produced by Electrodes of Optimized Shape

    Shape optimization of the electrodes of a device for electrostatic separation of triboelectrically charged plastic particles is carried out. The objective function maximizes the efficiency of separation, defined as the highest possible number of particles falling into the prescribed bins. The electric field in the system is solved numerically, using the fully adaptive higher-order finite element method. The movement of particles in the device under the influence of the Coulomb force is determined by means of an adaptive Runge-Kutta-Fehlberg method with a time-varying time step. The shape optimization is carried out using a technique based on genetic algorithms. The methodology is illustrated by an example whose results are discussed.

    Unmasking Differential Effects of Rosiglitazone and Pioglitazone in the Combination Treatment with n-3 Fatty Acids in Mice Fed a High-Fat Diet

    Combining pharmacological treatments and lifestyle interventions is necessary for effective therapy of major diseases associated with obesity, which are clustered in the metabolic syndrome. Acting via multiple mechanisms, combination treatments may reduce dose requirements and, therefore, lower the risk of adverse side effects, which are usually associated with long-term pharmacological interventions. Our previous study in mice fed a high-fat diet indicated additivity in preservation of insulin sensitivity and in amelioration of major metabolic syndrome phenotypes by the combination treatment using n-3 long-chain polyunsaturated fatty acids (n-3 LC-PUFA) and rosiglitazone, i.e. an anti-diabetic drug of the thiazolidinedione (TZD) family. We investigated here whether pioglitazone, a TZD-drug in clinical use, could elicit the additive beneficial effects when combined with n-3 LC-PUFA. Adult male mice (C57BL/6N) were fed an obesogenic corn oil-based high-fat diet (cHF) for 8 weeks, or randomly assigned to various dietary treatments: (i) cHF+F, cHF with n-3 LC-PUFA concentrate replacing 15% of dietary lipids; (ii) cHF+ROSI, cHF with 10 mg rosiglitazone/kg diet; (iii) cHF+F+ROSI; (iv) cHF+PIO, cHF with 50 mg pioglitazone/kg diet; and (v) cHF+F+PIO, or chow-fed. Plasma concentrations of 163 metabolites were evaluated using a targeted metabolomics approach. Both TZDs preserved glucose homeostasis and normal plasma lipid levels while inducing adiponectin, with pioglitazone showing better effectiveness. The beneficial effects of TZDs were further augmented by the combination treatments. cHF+F+ROSI but not cHF+F+PIO counteracted development of obesity, in correlation with inducibility of fatty acid β-oxidation, as revealed by the metabolomic analysis. By contrast, only cHF+F+PIO eliminated hepatic steatosis and this treatment also reversed insulin resistance in dietary obese mice.
    Our results reveal differential effects of rosiglitazone and pioglitazone, unmasked in the combination treatment with n-3 LC-PUFA, and support the notion that n-3 LC-PUFA could be used as add-on treatment to TZDs in order to improve therapy for diabetic patients.

    Benefits from using mixed precision computations in the ELPA-AEO and ESSEX-II eigensolver projects

    We first briefly report on the status and recent achievements of the ELPA-AEO (Eigenvalue Solvers for Petaflop Applications - Algorithmic Extensions and Optimizations) and ESSEX-II (Equipping Sparse Solvers for Exascale) projects. In both collaborative efforts, scientists from the application areas, mathematicians, and computer scientists work together to develop and make available efficient, highly parallel methods for the solution of eigenvalue problems. Then we focus on a topic addressed in both projects, the use of mixed precision computations to enhance efficiency. We give a more detailed description of our approaches for benefiting from either lower or higher precision in three selected contexts and of the results thus obtained.

    Software for the frontiers of quantum chemistry: An overview of developments in the Q-Chem 5 package

    This article summarizes technical advances contained in the fifth major release of the Q-Chem quantum chemistry program package, covering developments since 2015. A comprehensive library of exchange–correlation functionals, along with a suite of correlated many-body methods, continues to be a hallmark of the Q-Chem software. The many-body methods include novel variants of both coupled-cluster and configuration-interaction approaches along with methods based on the algebraic diagrammatic construction and variational reduced density-matrix methods. Methods highlighted in Q-Chem 5 include a suite of tools for modeling core-level spectroscopy, methods for describing metastable resonances, methods for computing vibronic spectra, the nuclear–electronic orbital method, and several different energy decomposition analysis techniques. High-performance capabilities including multithreaded parallelism and support for calculations on graphics processing units are described. Q-Chem boasts a community of well over 100 active academic developers, and the continuing evolution of the software is supported by an “open teamware” model and an increasingly modular design.

    ATLAS Run 1 searches for direct pair production of third-generation squarks at the Large Hadron Collider


    Measurements of top-quark pair differential cross-sections in the $e\mu$ channel in $pp$ collisions at $\sqrt{s} = 13$ TeV using the ATLAS detector


    Measurement of the W boson polarisation in $t\bar{t}$ events from $pp$ collisions at $\sqrt{s} = 8$ TeV in the lepton + jets channel with ATLAS
