
    Flexible Multi-Dimensional FFTs for Plane Wave Density Functional Theory Codes

    Multi-dimensional Fourier transforms are key mathematical building blocks that appear in a wide range of applications in materials science, physics, chemistry and even machine learning. Over the past years, a multitude of software packages targeting distributed multi-dimensional Fourier transforms have been developed. Most of them aim to offer efficient implementations of single transforms applied to data mapped onto rectangular grids. However, not all scientific applications conform to this pattern; for example, plane wave Density Functional Theory codes require multi-dimensional Fourier transforms applied to data represented as batches of spheres. Typically, the implementations for this use case are hand-coded and tailored to the requirements of each application. In this work, we present the Fastest Fourier Transform from Berkeley (FFTB), a distributed framework that offers flexible implementations for both regular and non-regular data grids and for both batched and non-batched transforms. We provide flexible implementations with a user-friendly API that captures most of the use cases. Furthermore, we provide implementations for both CPU and GPU platforms and show that our approach offers improved execution time and scalability on the HPE Cray EX supercomputer. In addition, we outline the need for flexible implementations for different use cases of the software package. Comment: 17 pages, 9 figures
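    To make the "batches of spheres" layout concrete, here is a minimal single-node sketch in Python/NumPy, assuming a toy cubic grid and a cutoff applied to integer frequency vectors. Each band's coefficients exist only inside a sphere, are zero-padded onto the full grid, and the whole batch is transformed with one batched `numpy.fft.ifftn` call. The helpers `sphere_mask`, `spheres_to_real_space` and `real_space_to_spheres` are hypothetical names for illustration. This is not FFTB's API, and it does none of the distribution or sparsity-aware work a real implementation would need.

```python
import numpy as np

def sphere_mask(n, cutoff):
    """Boolean mask of the n x n x n grid points whose integer frequency
    vector G satisfies |G|^2 <= cutoff (a toy plane wave cutoff sphere)."""
    freqs = np.fft.fftfreq(n, d=1.0 / n)  # integer frequencies in FFT order
    gx, gy, gz = np.meshgrid(freqs, freqs, freqs, indexing="ij")
    return gx**2 + gy**2 + gz**2 <= cutoff

def spheres_to_real_space(coeffs, mask):
    """Scatter a batch of sphere coefficient vectors (shape: nbatch x npoints)
    onto full grids, then inverse-FFT all grids in one batched call."""
    n = mask.shape[0]
    grids = np.zeros((coeffs.shape[0], n, n, n), dtype=complex)
    grids[:, mask] = coeffs  # everything outside the sphere stays zero
    return np.fft.ifftn(grids, axes=(1, 2, 3))

def real_space_to_spheres(fields, mask):
    """Forward-FFT a batch of real-space grids and gather the sphere points."""
    return np.fft.fftn(fields, axes=(1, 2, 3))[:, mask]
```

The round trip `real_space_to_spheres(spheres_to_real_space(c, mask), mask)` recovers `c`, which is a convenient sanity check for any packing scheme of this kind.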

    Static Subspace Approximation for Random Phase Approximation Correlation Energies: Implementation and Performance

    Developing a theoretical understanding of complex reactions and processes at interfaces requires methods that go beyond semilocal density functional theory to accurately describe the interactions between solvent, reactants and substrates. Methods based on many-body perturbation theory, such as the random phase approximation (RPA), have previously been limited by their computational complexity. However, this barrier is now surmountable thanks to advances in available computational power, in particular modern GPU-based supercomputers. In this work, we describe the implementation of RPA calculations within BerkeleyGW and show its favorable computational performance on large complex systems relevant for catalysis and electrochemistry applications. Our implementation builds on the static subspace approximation, which, by employing a compressed representation of the frequency-dependent polarizability, enables the evaluation of the RPA correlation energy with significant acceleration and systematically controllable accuracy. We find that the computational cost of calculating the RPA correlation energy scales only linearly with system size for systems containing up to 50 thousand bands, and is expected to scale quadratically thereafter. We also show excellent strong scaling results across several supercomputers, demonstrating the performance and portability of this implementation. Comment: 10 pages, 5 figures
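    For reference, the RPA correlation energy is commonly written as $E_c = \frac{1}{2\pi}\int_0^\infty d\omega\,\mathrm{Tr}[\ln(1-\chi_0(i\omega)v) + \chi_0(i\omega)v]$. The Python sketch below illustrates the idea of a static-subspace compression on a toy model. The symmetrized polarizability is diagonalized once at $\omega = 0$. Only the eigenvectors with the largest eigenvalue magnitudes are kept, and every frequency is then evaluated in that small basis. The model polarizability, the frequency quadrature and all function names are assumptions made for illustration. This is not the BerkeleyGW implementation.

```python
import numpy as np

def model_chi(omega, B, gaps):
    """Toy symmetrized polarizability v^1/2 chi0(i omega) v^1/2 built from
    transitions with energies `gaps` and couplings in the columns of B.
    Like chi0 on the imaginary axis, it is negative semidefinite."""
    weights = 2.0 * gaps / (gaps**2 + omega**2)
    return -(B * weights) @ B.T

def rpa_energy(chi_of_omega, nquad=32, omega0=1.0):
    """E_c = 1/(2 pi) * int_0^inf dw Tr[ln(1 - chi) + chi], evaluated from
    eigenvalues, with Gauss-Legendre nodes mapped from (-1, 1) to (0, inf)."""
    x, w = np.polynomial.legendre.leggauss(nquad)
    omegas = omega0 * (1 + x) / (1 - x)
    jac = 2 * omega0 / (1 - x) ** 2  # d omega / d x for the mapping above
    total = 0.0
    for om, wt, jc in zip(omegas, w, jac):
        lam = np.linalg.eigvalsh(chi_of_omega(om))
        total += wt * jc * np.sum(np.log1p(-lam) + lam)
    return total / (2 * np.pi)

def static_subspace(B, gaps, neig):
    """Columns: the neig eigenvectors of chi(omega = 0) with the largest
    |eigenvalue|, used as a fixed basis for all frequencies."""
    vals, vecs = np.linalg.eigh(model_chi(0.0, B, gaps))
    order = np.argsort(np.abs(vals))[::-1]
    return vecs[:, order[:neig]]
```

Usage: with `U = static_subspace(B, gaps, neig)`, the compressed estimate is `rpa_energy(lambda om: U.T @ model_chi(om, B, gaps) @ U)`. Keeping all eigenvectors reproduces the uncompressed result exactly, and increasing `neig` is the knob that controls accuracy.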

    Cost-Effective Methodology for Complex Tuning Searches in HPC: Navigating Interdependencies and Dimensionality

    Tuning searches are pivotal in High-Performance Computing (HPC), addressing complex optimization challenges in computational applications. The complexity arises not only from finely tuning parameters within routines but also from potential interdependencies among them, rendering traditional optimization methods inefficient. Instead of scrutinizing interdependencies among parameters and routines, practitioners often face the dilemma of conducting independent tuning searches for each routine, thereby overlooking interdependence, or pursuing a more resource-intensive joint search for all routines. This decision is driven by the consideration that some interdependence analysis and high-dimensional decomposition techniques in the literature may be prohibitively expensive in HPC tuning searches. Our methodology adapts and refines these methods to ensure computational feasibility while maximizing performance gains in real-world scenarios. It leverages a cost-effective interdependence analysis to decide whether to merge several tuning searches into a joint search or to conduct orthogonal searches. Tested on synthetic functions with varying levels of parameter interdependence, our methodology efficiently explores the search space. In comparison to fully independent or fully joint searches based on Bayesian optimization, our methodology suggested an optimized breakdown of independent and merged searches that led to final configurations up to 8% more accurate while reducing the search time by up to 95%. When applied to GPU-offloaded Real-Time Time-Dependent Density Functional Theory (RT-TDDFT), an application in computational materials science that challenges modern HPC autotuners, our methodology achieved an effective tuning search. Its adaptability and efficiency extend beyond RT-TDDFT, making it valuable for related applications in HPC.
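    One simple way to picture an interdependence check is a mixed finite difference. If a cost is additively separable, cost(a, b) = g(a) + h(b), then cost(a, b) - cost(a0, b) - cost(a, b0) + cost(a0, b0) vanishes for every sample, and the two parameter groups can be tuned in orthogonal searches. The Python sketch below applies that test to a random sample of configurations to choose between orthogonal and joint searches. It is a hypothetical stand-in chosen for illustration, not the interdependence analysis of the paper. The function names, sample size and tolerance are assumptions.

```python
import itertools
import random

def interaction_strength(cost, a_values, b_values, a0, b0, samples=16, seed=0):
    """Largest |cost(a, b) - cost(a0, b) - cost(a, b0) + cost(a0, b0)| over a
    random sample of (a, b) pairs. It is exactly zero for every pair when
    cost(a, b) = g(a) + h(b), i.e. when the two groups do not interact."""
    rng = random.Random(seed)
    pairs = list(itertools.product(a_values, b_values))
    sampled = rng.sample(pairs, min(samples, len(pairs)))
    base = cost(a0, b0)
    return max(abs(cost(a, b) - cost(a0, b) - cost(a, b0) + base)
               for a, b in sampled)

def plan_search(cost, a_values, b_values, a0, b0, tol=1e-9):
    """Return 'orthogonal' if the sampled interaction is negligible, else 'joint'."""
    strength = interaction_strength(cost, a_values, b_values, a0, b0)
    return "orthogonal" if strength <= tol else "joint"

def orthogonal_search(cost, a_values, b_values, a0, b0):
    """Tune group a with b fixed at b0, then tune b with the best a fixed."""
    best_a = min(a_values, key=lambda a: cost(a, b0))
    best_b = min(b_values, key=lambda b: cost(best_a, b))
    return best_a, best_b
```

The design choice is to pay a small, fixed number of extra evaluations up front. A joint search is only worth its larger cost when the sampled interaction shows that tuning one group changes the best setting of the other.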

    Measurement of the cosmic ray spectrum above $4\times10^{18}$ eV using inclined events detected with the Pierre Auger Observatory

    A measurement of the cosmic-ray spectrum for energies exceeding $4\times10^{18}$ eV is presented, which is based on the analysis of showers with zenith angles greater than $60^{\circ}$ detected with the Pierre Auger Observatory between 1 January 2004 and 31 December 2013. The measured spectrum confirms a flux suppression at the highest energies. Above $5.3\times10^{18}$ eV, the "ankle", the flux can be described by a power law $E^{-\gamma}$ with index $\gamma = 2.70 \pm 0.02\,\text{(stat)} \pm 0.1\,\text{(sys)}$ followed by a smooth suppression region. For the energy ($E_\text{s}$) at which the spectral flux has fallen to one-half of its extrapolated value in the absence of suppression, we find $E_\text{s} = (5.12 \pm 0.25\,\text{(stat)}\,^{+1.0}_{-1.2}\,\text{(sys)})\times10^{19}$ eV. Comment: Replaced with published version. Added journal reference and DOI.
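    The definition of $E_\text{s}$ can be checked with an assumed parametrization. Take a power law multiplied by a suppression factor $1/(1 + (E/E_\text{s})^{\Delta\gamma})$, which equals exactly one half at $E = E_\text{s}$. The sketch below plugs in the central values of γ and $E_\text{s}$ quoted above. The functional form and the value of $\Delta\gamma$ are hypothetical illustrations, not the fit function used in the measurement.

```python
# Illustrative constants: GAMMA and E_S are the central values quoted in the
# abstract; DELTA_GAMMA is a hypothetical steepness, not a measured value.
E_ANKLE = 5.3e18   # eV
GAMMA = 2.70
E_S = 5.12e19      # eV
DELTA_GAMMA = 3.0  # hypothetical

def power_law(E, J0=1.0, E_ref=E_ANKLE, gamma=GAMMA):
    """Unsuppressed (extrapolated) flux J0 * (E / E_ref)^(-gamma)."""
    return J0 * (E / E_ref) ** (-gamma)

def suppressed_flux(E, E_s=E_S, delta_gamma=DELTA_GAMMA):
    """Power law times the assumed factor 1 / (1 + (E / E_s)^delta_gamma)."""
    return power_law(E) / (1.0 + (E / E_s) ** delta_gamma)

def suppression_ratio(E, E_s=E_S, delta_gamma=DELTA_GAMMA):
    """Flux relative to the extrapolated power law at energy E."""
    return suppressed_flux(E, E_s, delta_gamma) / power_law(E)
```

Whatever steepness is assumed, `suppression_ratio(E_S)` is 0.5. Below $E_\text{s}$ the ratio is above one half, and above $E_\text{s}$ it is below one half, which is the sense in which $E_\text{s}$ marks the suppression.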

    Standalone vertex finding in the ATLAS muon spectrometer

    A dedicated reconstruction algorithm to find decay vertices in the ATLAS muon spectrometer is presented. The algorithm searches the region just upstream of or inside the muon spectrometer volume for multi-particle vertices that originate from the decay of particles with long decay paths. The performance of the algorithm is evaluated using both a sample of simulated Higgs boson events, in which the Higgs boson decays to long-lived neutral particles that in turn decay to $b\bar{b}$ final states, and pp collision data at $\sqrt{s} = 7$ TeV collected with the ATLAS detector at the LHC during 2011.

    Measurement of inclusive two-particle angular correlations in pp collisions with the ATLAS detector at the LHC

    We present a measurement of two-particle angular correlations in proton-proton collisions at $\sqrt{s} = 900$ GeV and 7 TeV. The collision events were collected during 2009 and 2010 with the ATLAS detector at the Large Hadron Collider using a single-arm minimum bias trigger. Correlations are measured for charged particles produced in the kinematic range of transverse momentum $p_\text{T} > 100$ MeV and pseudorapidity $|\eta| < 2.5$. A complex structure in pseudorapidity and azimuth is observed at both collision energies. Results are compared to PYTHIA 8 and HERWIG++ as well as to the AMBT2B, DW and Perugia 2011 tunes of PYTHIA 6. The data are not satisfactorily described by any of these models.
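    A correlation measurement of this kind is built from pairs of selected tracks. The Python sketch below applies the kinematic cuts quoted above ($p_\text{T} > 100$ MeV, $|\eta| < 2.5$) and computes $\Delta\eta$ and $\Delta\phi$ for every pair, folding $\Delta\phi$ into $[0, \pi]$. The track tuple layout (pT in GeV, η, φ) and the function names are assumptions. A full analysis would also histogram these pairs and normalize them against a background distribution, which this sketch omits.

```python
import math
from itertools import combinations

def select_tracks(tracks, pt_min=0.1, eta_max=2.5):
    """Keep (pt [GeV], eta, phi) tracks with pt > 100 MeV and |eta| < 2.5."""
    return [t for t in tracks if t[0] > pt_min and abs(t[1]) < eta_max]

def pair_deltas(tracks):
    """Return (|delta eta|, delta phi) for every unordered track pair,
    with delta phi folded into [0, pi]."""
    deltas = []
    for (_, eta1, phi1), (_, eta2, phi2) in combinations(tracks, 2):
        dphi = abs(phi1 - phi2) % (2.0 * math.pi)
        if dphi > math.pi:
            dphi = 2.0 * math.pi - dphi  # the shorter way round the circle
        deltas.append((abs(eta1 - eta2), dphi))
    return deltas
```

Folding matters near the φ wrap-around. Tracks at φ = 0.1 and φ = 2π − 0.1 are only 0.2 rad apart in azimuth, and the modulo-and-fold step reports them that way.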