842 research outputs found

    An efficient MPI/OpenMP parallelization of the Hartree-Fock method for the second generation of Intel Xeon Phi processor

    Modern OpenMP threading techniques are used to convert the MPI-only Hartree-Fock code in the GAMESS program to a hybrid MPI/OpenMP algorithm. Two implementations are considered, which differ in whether key data structures (the density and Fock matrices) are shared or replicated among threads. All implementations are benchmarked on a supercomputer of 3,000 Intel Xeon Phi processors. With 64 cores per processor, scaling results are reported on up to 192,000 cores. The hybrid MPI/OpenMP implementation reduces the memory footprint by approximately 200 times compared to the legacy code. The MPI/OpenMP code runs up to six times faster than the original for a range of molecular system sizes. Comment: SC17 conference paper, 12 pages, 7 figures

    DPP-PMRF: Rethinking Optimization for a Probabilistic Graphical Model Using Data-Parallel Primitives

    We present a new parallel algorithm for probabilistic graphical model optimization. The algorithm relies on data-parallel primitives (DPPs), which provide portable performance across hardware architectures. We evaluate results on CPUs and GPUs for an image segmentation problem. Compared to a serial baseline, we observe runtime speedups of up to 13X (CPU) and 44X (GPU). We also compare our performance to a reference OpenMP-based algorithm and find speedups of up to 7X (CPU). Comment: LDAV 2018, October 201
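
    As a rough illustration of what "data-parallel primitives" means, the sketch below builds a stream-compaction step out of map, reduce, and exclusive scan. It is not the DPP-PMRF algorithm; the helper names (`dpp_map`, `dpp_reduce`, `dpp_exclusive_scan`, `compact`) and the pixel values are hypothetical, and the primitives run serially here, whereas a DPP library would execute each one in parallel on the CPU or GPU.

```python
from functools import reduce
from itertools import accumulate
import operator


def dpp_map(fn, xs):
    """Map: apply fn to every element independently."""
    return [fn(x) for x in xs]


def dpp_reduce(op, xs, init):
    """Reduce: combine all elements with an associative operator."""
    return reduce(op, xs, init)


def dpp_exclusive_scan(xs):
    """Exclusive prefix sum: out[i] = sum of xs[0..i-1]."""
    return list(accumulate(xs, initial=0))[:-1]


def compact(xs, keep):
    """Stream compaction expressed only through map, scan, reduce, scatter."""
    flags = dpp_map(lambda x: 1 if keep(x) else 0, xs)
    offsets = dpp_exclusive_scan(flags)          # output slot of each kept item
    total = dpp_reduce(operator.add, flags, 0)   # size of the output
    out = [None] * total
    for x, flag, offset in zip(xs, flags, offsets):  # scatter step
        if flag:
            out[offset] = x
    return out


# Hypothetical pixel intensities; keep only the "bright" ones.
pixels = [12, 200, 45, 180, 90, 220]
bright = compact(pixels, lambda p: p > 100)
```

    Because every step is one of a small set of primitives, the same program can be retargeted to different hardware by swapping the primitive implementations, which is the portability argument the abstract makes.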

    Performance and Optimization Abstractions for Large Scale Heterogeneous Systems in the Cactus/Chemora Framework

    We describe a set of lower-level abstractions to improve performance on modern large-scale heterogeneous systems. These provide portable access to system- and hardware-dependent features, automatically apply dynamic optimizations at run time, and target stencil-based codes used in finite differencing, finite volume, or block-structured adaptive mesh refinement codes. These abstractions include a novel data structure to manage refinement information for block-structured adaptive mesh refinement, an iterator mechanism to efficiently traverse multi-dimensional arrays in stencil-based codes, and a portable API and implementation for explicit SIMD vectorization. These abstractions can be employed manually, targeted by automated code generation, or used via support libraries by compilers during code generation. The implementations described below are available in the Cactus framework, and are used e.g. in the Einstein Toolkit for relativistic astrophysics simulations.

    Angpow: a software for the fast computation of accurate tomographic power spectra

    The statistical distribution of galaxies is a powerful probe to constrain cosmological models and gravity. In particular, the matter power spectrum $P(k)$ brings together information about the cosmological distance evolution and the galaxy clustering. However, building $P(k)$ from galaxy catalogues requires a cosmological model to convert angles on the sky and redshifts into distances, which leads to difficulties when comparing data with predicted $P(k)$ from other cosmological models, and for photometric surveys like LSST. The angular power spectrum $C_\ell(z_1,z_2)$ between two bins located at redshifts $z_1$ and $z_2$ contains the same information as the matter power spectrum and is free from any cosmological assumption, but the prediction of $C_\ell(z_1,z_2)$ from $P(k)$ is a costly computation when performed exactly. The Angpow software aims at computing quickly and accurately the auto ($z_1=z_2$) and cross ($z_1 \neq z_2$) angular power spectra between redshift bins. We describe the developed algorithm, based on expansions on the Chebyshev polynomial basis and on the Clenshaw-Curtis quadrature method. We validate the results against other codes and benchmark the performance. Angpow is flexible and can handle any user-defined power spectra, transfer functions, and redshift selection windows. The code is fast enough to be embedded inside programs exploring large cosmological parameter spaces through the comparison of $C_\ell(z_1,z_2)$ with data. We emphasize that Limber's approximation, often used to speed up the computation, gives wrong $C_\ell$ values for cross-correlations. Comment: Published in Astronomy & Astrophysics
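
    For readers unfamiliar with the Clenshaw-Curtis quadrature named in this abstract, here is a minimal sketch of the textbook rule on Chebyshev extrema nodes. It is a generic illustration only and does not reproduce Angpow's algorithm for the oscillatory $C_\ell$ integrals; the function names `clenshaw_curtis` and `integrate` and the default `n=16` are assumptions made for the example.

```python
import math


def clenshaw_curtis(n):
    """Nodes and weights of the (n+1)-point Clenshaw-Curtis rule on [-1, 1].

    Nodes are the Chebyshev extrema x_k = cos(k*pi/n), k = 0..n.
    """
    nodes = [math.cos(k * math.pi / n) for k in range(n + 1)]
    weights = []
    for k in range(n + 1):
        c_k = 1.0 if k in (0, n) else 2.0        # endpoints get half weight
        s = 0.0
        for j in range(1, n // 2 + 1):
            b_j = 1.0 if 2 * j == n else 2.0     # last term is halved for even n
            s += b_j / (4 * j * j - 1) * math.cos(2 * j * k * math.pi / n)
        weights.append(c_k / n * (1.0 - s))
    return nodes, weights


def integrate(f, a, b, n=16):
    """Approximate the integral of f over [a, b] with an (n+1)-point rule."""
    nodes, weights = clenshaw_curtis(n)
    half, mid = 0.5 * (b - a), 0.5 * (a + b)     # affine map from [-1, 1] to [a, b]
    return half * sum(w * f(mid + half * x) for x, w in zip(nodes, weights))


# Smooth integrands converge very quickly with n.
approx = integrate(math.exp, -1.0, 1.0)
exact = math.e - 1.0 / math.e
```

    With `n = 2` the rule reduces to Simpson's rule (weights 1/3, 4/3, 1/3), which is a convenient sanity check on the weight formula.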

    A Sparse SCF algorithm and its parallel implementation: Application to DFTB

    We present an algorithm and its parallel implementation for solving a self-consistent problem as encountered in Hartree-Fock or Density Functional Theory. The algorithm takes advantage of the sparsity of matrices through the use of local molecular orbitals. The implementation makes efficient use of modern symmetric multiprocessing (SMP) computer architectures. As a first application, the algorithm is used within the density functional based tight binding method, for which most of the computational time is spent in the linear algebra routines (diagonalization of the Fock/Kohn-Sham matrix). We show that with this algorithm (i) single point calculations on very large systems (millions of atoms) can be performed on large SMP machines, (ii) calculations involving intermediate size systems (1,000 to 100,000 atoms) are also strongly accelerated and can run efficiently on standard servers, and (iii) the error on the total energy due to the use of a cut-off in the molecular orbital coefficients can be controlled such that it remains smaller than the SCF convergence criterion. Comment: 13 pages, 11 figures

    AMRA: An Adaptive Mesh Refinement Hydrodynamic Code for Astrophysics

    Implementation details and test cases of a newly developed hydrodynamic code, AMRA, are presented. The numerical scheme exploits the adaptive mesh refinement technique coupled to modern high-resolution schemes which are suitable for relativistic and non-relativistic flows. Various physical processes are incorporated using the operator splitting approach, and include self-gravity, nuclear burning, physical viscosity, implicit and explicit schemes for conductive transport, simplified photoionization, and radiative losses from an optically thin plasma. Several aspects related to the accuracy and stability of the scheme are discussed in the context of hydrodynamic and astrophysical flows. Comment: 41 pages, 21 figures (9 low-resolution), LaTeX, requires elsart.cls, submitted to Comp. Phys. Comm.; additional documentation and high-resolution figures available from http://www.camk.edu.pl/~tomek/AMRA/index.htm
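
    The operator splitting approach mentioned in this abstract can be sketched on a toy problem. The example below is not AMRA's scheme: it assumes a single scalar equation du/dt = -k*u + s, with a hypothetical loss rate `k` and source `s`, and advances the two terms one after the other (first-order Lie splitting), solving each sub-step exactly.

```python
import math


def split_solve(u0, k, s, t_end, n_steps):
    """Advance du/dt = -k*u + s by Lie splitting.

    Each time step applies the two processes one after the other:
      1. decay only:  du/dt = -k*u  (solved exactly over dt)
      2. source only: du/dt = s     (solved exactly over dt)
    """
    dt = t_end / n_steps
    u = u0
    for _ in range(n_steps):
        u = u * math.exp(-k * dt)   # sub-step 1: loss term
        u = u + s * dt              # sub-step 2: source term
    return u


def exact_solution(u0, k, s, t):
    """Closed-form solution of the unsplit equation, for comparison."""
    return s / k + (u0 - s / k) * math.exp(-k * t)


# Hypothetical parameters chosen only for illustration.
u0, k, s, t_end = 1.0, 2.0, 1.0, 1.0
reference = exact_solution(u0, k, s, t_end)
err_coarse = abs(split_solve(u0, k, s, t_end, 100) - reference)
err_fine = abs(split_solve(u0, k, s, t_end, 200) - reference)
```

    Halving the time step roughly halves the error, which reflects the first-order accuracy of this simple splitting; the benefit is that each physical process can use its own solver.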