
    Evaluation of Directive-Based GPU Programming Models on a Block Eigensolver with Consideration of Large Sparse Matrices

    Achieving high performance and performance portability for large-scale scientific applications is a major challenge on heterogeneous computing systems such as many-core CPUs and accelerators like GPUs. In this work, we implement a widely used block eigensolver, Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG), using two popular directive-based programming models (OpenMP and OpenACC) for GPU-accelerated systems. Our work differs from existing work in that it adopts a holistic approach that optimizes the full solver performance rather than narrowing the problem into small kernels (e.g., SpMM, SpMV). Our LOBPCG GPU implementation achieves a 2.8×–4.3× speedup over an optimized CPU implementation when tested with four different input matrices. The evaluation compared one Skylake CPU against one Skylake CPU paired with one NVIDIA V100 GPU. Our OpenMP and OpenACC LOBPCG GPU implementations gave nearly identical performance. We also consider how to create an efficient LOBPCG solver that can solve problems larger than GPU memory capacity. To this end, we create microbenchmarks representing the two dominant kernels (inner product and SpMM kernel) in LOBPCG and then evaluate performance when using two different programming approaches: tiling the kernels, and using Unified Memory with the original kernels. Our tiled SpMM implementation achieves a 2.9× and 48.2× speedup over the Unified Memory implementation on supercomputers with PCIe Gen3 and NVLink 2.0 CPU to GPU interconnects, respectively.
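The tiling approach mentioned in the abstract above can be illustrated with a minimal CPU-side sketch. It is not the authors' OpenMP/OpenACC implementation: the function names `csr_spmm_rows` and `tiled_spmm`, the `tile_rows` parameter, and the choice to tile by row blocks are all assumptions made for illustration. The idea shown is only that a CSR matrix can be multiplied one row tile at a time, so that a single tile needs to be resident in device memory at once.

```python
def csr_spmm_rows(row_ptr, col_idx, vals, B, row_start, row_end):
    """Multiply rows [row_start, row_end) of a CSR matrix by the dense matrix B.

    B is a list of rows; the result is a list of dense output rows.
    """
    n_cols = len(B[0])
    out = []
    for i in range(row_start, row_end):
        acc = [0.0] * n_cols
        # Walk the nonzeros of row i and accumulate scaled rows of B.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a = vals[k]
            b_row = B[col_idx[k]]
            for j in range(n_cols):
                acc[j] += a * b_row[j]
        out.append(acc)
    return out


def tiled_spmm(row_ptr, col_idx, vals, B, tile_rows):
    """Compute C = A @ B by processing A in tiles of `tile_rows` rows.

    Hypothetical stand-in for a GPU loop that would copy one tile to the
    device, run the kernel, and copy the matching rows of C back.
    """
    n_rows = len(row_ptr) - 1
    C = []
    for start in range(0, n_rows, tile_rows):
        end = min(start + tile_rows, n_rows)
        C.extend(csr_spmm_rows(row_ptr, col_idx, vals, B, start, end))
    return C


# A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in CSR form.
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

C = tiled_spmm(row_ptr, col_idx, vals, B, tile_rows=2)
```

The tile size does not change the result, only how much of the matrix is touched per step; on a real system it would be chosen so that one tile plus the dense operands fit in GPU memory.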

    Theoretical investigation of TbNi_{5-x}Cu_x optical properties

    In this paper we present a theoretical investigation of the optical conductivity of the intermetallic TbNi_{5-x}Cu_x series. Within the LSDA+U framework, the electronic structure for x=0,1,2 and, on top of that, the optical conductivities were calculated. Disorder effects of substituting Cu for Ni were taken into account at the level of the LSDA+U densities of states (DOS) by averaging over all possible Cu ion positions for a given doping level x. The gradual suppression and loss of structure in the optical conductivity at 2 eV, together with the simultaneous intensity growth at 4 eV, correspond to the increase of Cu and decrease of Ni content. As reported before [Knyazev et al., Optics and Spectroscopy 104, 360 (2008)], the plasma frequency depends non-monotonically on doping, with a maximum at x=1. This behaviour is explained as a competition between the lowering of the total density of states at the Fermi level N(E_F) and the growing number of carriers. Our theoretical results agree well with a variety of recent experiments. Comment: 4 pages, 3 figures

    Electronic structure, magnetic and optical properties of intermetallic compounds R2Fe17 (R=Pr,Gd)

    In this paper we report a comprehensive experimental and theoretical investigation of the magnetic and electronic properties of the intermetallic compounds Pr2Fe17 and Gd2Fe17. For the first time, the electronic structure of these two systems was probed by optical measurements in the spectral range of 0.22-15 micrometers. On top of that, the charge-carrier parameters (plasma frequency and relaxation frequency) and the optical conductivity σ(ω) were determined. Self-consistent spin-resolved band-structure calculations within the conventional LSDA+U method were performed. Theoretical interpretation of the experimental σ(ω) dispersions indicates that transitions between the 3d and 4p states of Fe ions make the largest contributions. Qualitatively, the line shape of the theoretical optical conductivity agrees well with our experimental data. The magnetic moments per formula unit calculated by the LSDA+U method are in good agreement with the experimentally observed values of saturation magnetization. Comment: 16 pages, 5 figures, 1 table

    Introductory Remarks from the Guest Editors


    Optical spectroscopy and electronic structure of compounds HoNi5-xAlx (x = 0, 1, 2)

    The optical properties of the compounds HoNi5-xAlx (x = 0, 1, 2) have been investigated using the ellipsometric method in the wavelength range from 0.22 to 16 μm. The electronic structure of these intermetallic compounds has been calculated in the local electron-spin density approximation with the correction for strong electronic interactions in the 4f shell of the holmium ions. The experimental dispersion dependences of optical conductivity in the region of interband light absorption have been interpreted based on the results of the calculation of the electron density of states. The plasma and relaxation frequencies of electrons have been determined. © 2013 Pleiades Publishing, Ltd

    Specific features of the electronic structure and spectral properties of NdNi5-xCux compounds

    The spectral properties of the intermetallic compounds NdNi5-xCux (x = 0, 1, 2) have been studied using optical ellipsometry in the wavelength range 0.22-16 μm. It has been established that substitution of copper atoms for nickel leads to noticeable changes in the optical absorption spectra, plasma frequencies, and relaxation frequencies of conduction electrons. Spin-polarized calculations of the electronic structure of these compounds have been performed in the local spin density approximation allowing for strong electron correlations (LSDA + U method) in the 4f shell of the rare-earth ion. The calculated electron densities of states have been used to interpret the experimental dispersion curves of optical conductivity in the interband light absorption region. © 2013 Pleiades Publishing, Ltd

    Design Principles for Sparse Matrix Multiplication on the GPU

    We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion. While previous SpMM work concentrates on thread-level parallelism, we additionally focus on latency hiding with instruction-level parallelism and load-balancing. We show, both theoretically and experimentally, that the proposed SpMM is a better fit for the GPU than previous approaches. We identify a key memory access pattern that allows efficient access into both input and output matrices and that is crucial to getting excellent performance on SpMM. By combining these two ingredients, (i) merge-based load-balancing and (ii) row-major coalesced memory access, we demonstrate a 4.1x peak speedup and a 31.7% geomean speedup over state-of-the-art SpMM implementations on real-world datasets. Comment: 16 pages, 7 figures, International European Conference on Parallel and Distributed Computing (Euro-Par) 201
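The merge-based load-balancing named in the abstract above can be sketched as follows. This is a generic illustration of the merge-path idea for CSR matrices, not the paper's GPU kernel: the function names `merge_path_search` and `partition_csr` and the `n_workers` parameter are assumptions. The row-end offsets are conceptually merged with the sequence of nonzero indices, and the merged list is cut into equal pieces, so each worker receives about the same combined number of rows and nonzeros even when a few rows are very long.

```python
def merge_path_search(diagonal, row_end, nnz):
    """Find where a diagonal crosses the merge path.

    Returns (rows_consumed, nonzeros_consumed), whose sum equals `diagonal`.
    `row_end` is row_ptr[1:]; the second merged list is the counting
    sequence 0, 1, ..., nnz - 1, so its k-th element is simply k.
    """
    n_rows = len(row_end)
    lo = max(diagonal - nnz, 0)
    hi = min(diagonal, n_rows)
    while lo < hi:
        pivot = (lo + hi) // 2
        if row_end[pivot] <= diagonal - pivot - 1:
            lo = pivot + 1
        else:
            hi = pivot
    return lo, diagonal - lo


def partition_csr(row_ptr, n_workers):
    """Split the combined row and nonzero work evenly across n_workers.

    Returns n_workers + 1 boundary coordinates; worker w handles the work
    between boundaries w and w + 1.
    """
    row_end = row_ptr[1:]
    nnz = row_ptr[-1]
    total = len(row_end) + nnz
    per_worker = -(-total // n_workers)  # ceiling division
    return [
        merge_path_search(min(w * per_worker, total), row_end, nnz)
        for w in range(n_workers + 1)
    ]


# One heavy row (6 nonzeros), two empty rows, one row with a single nonzero.
skewed_bounds = partition_csr([0, 6, 6, 6, 7], n_workers=3)
```

With a plain row split, the first worker in this example would own nearly all the nonzeros; with the merge-path split, the heavy row is shared between workers, at the cost of needing a fix-up step to combine partial results for rows that straddle a boundary.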

    Influence of aluminum impurity on the electronic structure and optical properties of the TbNi5 intermetallic compound

    The electronic structure of the TbNi5-xAlx intermetallic compounds (x = 0, 1, 2) is calculated in the local electron density approximation with the correction for strong electron correlations in the 4f shell of terbium ions. Spectral properties of these compounds are measured by ellipsometry in a wavelength range of 0.22-16 μm. Frequency dependences of optical conductivity in the region of interband optical absorption are interpreted based on the results of calculations of electron densities of states. The relaxation and plasma frequencies of conduction electrons are determined. © 2013 Pleiades Publishing, Ltd

    Renormalization of hole-hole interaction at decreasing Drude conductivity

    The diffusion contribution of the hole-hole interaction to the conductivity is analyzed in gated GaAs/In_xGa_{1-x}As/GaAs heterostructures. We show that the change of the interaction correction to the conductivity with the decreasing Drude conductivity results both from the compensation of the singlet and triplet channels and from the appearance of a prefactor α_i < 1 in the conventional expression for the interaction correction. Comment: 6 pages, 5 figures