Computational Physics on Graphics Processing Units
The use of graphics processing units for scientific computations is an
emerging strategy that can significantly speed up a wide range of algorithms.
In this review, we discuss advances made in the field of computational physics,
focusing on classical molecular dynamics and on quantum simulations for
electronic structure calculations using density functional theory, wave
function techniques, and quantum field theory.
Comment: Proceedings of the 11th International Conference, PARA 2012,
Helsinki, Finland, June 10-13, 2012
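As an illustration of the kind of data-parallel kernel that makes classical molecular dynamics a natural fit for GPUs, here is a minimal NumPy sketch of a Lennard-Jones force and energy evaluation (illustrative only; the function name, units, and parameters are our own, not from the review):

```python
import numpy as np

def lj_forces(pos, eps=1.0, sigma=1.0):
    """Pairwise Lennard-Jones forces and potential energy.

    The O(N^2) pair loop is expressed as array operations -- the same
    data-parallel structure that maps onto GPU threads in MD codes.
    pos : (N, 3) particle coordinates.
    """
    disp = pos[:, None, :] - pos[None, :, :]       # (N, N, 3) displacements
    r2 = np.sum(disp**2, axis=-1)
    np.fill_diagonal(r2, np.inf)                   # exclude self-interaction
    inv6 = (sigma**2 / r2) ** 3                    # (sigma/r)^6
    # U = 4 eps sum_{i<j} (inv12 - inv6); double sum counts pairs twice
    energy = 0.5 * np.sum(4.0 * eps * (inv6**2 - inv6))
    # F_i = sum_j 24 eps (2 inv12 - inv6) / r^2 * disp_ij
    fmag = 24.0 * eps * (2.0 * inv6**2 - inv6) / r2
    forces = np.sum(fmag[:, :, None] * disp, axis=1)
    return forces, energy
```

At the pair separation r = 2^(1/6) sigma this gives the well depth -eps and vanishing forces, a quick sanity check for any such kernel.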
On the Efficient Evaluation of the Exchange Correlation Potential on Graphics Processing Unit Clusters
The predominance of Kohn-Sham density functional theory (KS-DFT) for the
theoretical treatment of large experimentally relevant systems in molecular
chemistry and materials science relies primarily on the existence of efficient
software implementations which are capable of leveraging the latest advances in
modern high performance computing (HPC). With recent trends in HPC leading
towards an increasing reliance on heterogeneous, accelerator-based
architectures such as graphics processing units (GPUs), existing code bases
must embrace these architectural advances to maintain the high levels of
performance which have come to be expected for these methods. In this work, we propose a three-level
parallelism scheme for the distributed numerical integration of the
exchange-correlation (XC) potential in the Gaussian basis set discretization of
the Kohn-Sham equations on large computing clusters consisting of multiple GPUs
per compute node. In addition, we propose and demonstrate the efficacy of
using batched kernels, including batched level-3 BLAS operations, in achieving
high levels of performance on the GPU. We demonstrate the performance and
scalability of the implementation of the proposed method in the NWChemEx
software package by comparing to the existing scalable CPU XC integration in
NWChem.
Comment: 26 pages, 9 figures
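The batched-GEMM idea can be sketched in a few lines of NumPy: each quadrature batch contributes a weighted product of collocated basis-function matrices to the XC matrix, and stacking the batches turns many small matrix products into one batched level-3 BLAS call (a schematic sketch with invented array names, not the NWChemEx implementation):

```python
import numpy as np

def batched_xc_contraction(phi, wf):
    """Sketch of the batched level-3 BLAS step of Gaussian-basis XC integration.

    phi : (nbatch, npts, nbf) basis functions collocated on grid batches
    wf  : (nbatch, npts)      quadrature weight times XC kernel value
    Returns V_mn = sum_b sum_g phi[b,g,m] * wf[b,g] * phi[b,g,n].
    """
    wphi = wf[:, :, None] * phi                    # scale grid rows by weights
    # One batched GEMM over the leading batch axis (cf. cuBLAS's batched
    # gemm routines on the GPU), followed by a reduction over batches.
    return np.matmul(phi.transpose(0, 2, 1), wphi).sum(axis=0)
```

Grouping the small per-batch products into a single batched call is what keeps the GPU saturated when individual grid batches are too small to fill the device on their own.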
QMCPACK: Advances in the development, efficiency, and application of auxiliary field and real-space variational and diffusion Quantum Monte Carlo
We review recent advances in the capabilities of the open source ab initio
Quantum Monte Carlo (QMC) package QMCPACK and the workflow tool Nexus used for
greater efficiency and reproducibility. The auxiliary field QMC (AFQMC)
implementation has been greatly expanded to include k-point symmetries,
tensor hypercontraction, and graphics processing unit (GPU) acceleration
support. These scaling and memory reductions greatly increase the number of
orbitals that can practically be included in AFQMC calculations, increasing
accuracy. Advances in real space methods include techniques for accurate
computation of band gaps and for systematically improving the nodal surface of
ground state wavefunctions. Results of these calculations can be used to
validate application of more approximate electronic structure methods including
GW and density functional based techniques. To provide an improved foundation
for these calculations we utilize a new set of correlation-consistent effective
core potentials (pseudopotentials) that are more accurate than previous sets;
these can also be applied in quantum-chemical and other many-body applications,
not only QMC. These advances increase the efficiency, accuracy, and range of
properties that can be studied in both molecules and materials with QMC and
QMCPACK.
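To make the variational flavor of QMC concrete, the following toy NumPy sketch runs Metropolis-sampled variational Monte Carlo on a 1D harmonic oscillator (a textbook illustration unrelated to QMCPACK's actual code; all names and defaults are ours):

```python
import numpy as np

def vmc_energy(alpha, nsteps=20000, step=1.0, seed=0):
    """Variational Monte Carlo for a 1D harmonic oscillator (atomic units).

    Trial wavefunction psi(x) = exp(-alpha x^2); Metropolis sampling of
    |psi|^2 and averaging of the local energy
        E_L(x) = alpha + x^2 (1/2 - 2 alpha^2).
    At alpha = 1/2, E_L is constant and equals the exact energy 1/2.
    """
    rng = np.random.default_rng(seed)
    x, energies = 0.0, []
    for i in range(nsteps):
        xn = x + step * rng.uniform(-1, 1)
        # acceptance probability |psi(xn)/psi(x)|^2
        if rng.random() < np.exp(-2.0 * alpha * (xn**2 - x**2)):
            x = xn
        if i > nsteps // 10:                       # discard equilibration
            energies.append(alpha + x**2 * (0.5 - 2.0 * alpha**2))
    return float(np.mean(energies))
```

Minimizing this estimate over alpha is the variational step; production codes like QMCPACK apply the same principle to many-electron trial wavefunctions and then refine the result with diffusion Monte Carlo.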
Distributed Memory, GPU Accelerated Fock Construction for Hybrid, Gaussian Basis Density Functional Theory
With the growing reliance of modern supercomputers on accelerator-based
architectures such as GPUs, the development and optimization of electronic
structure methods to exploit these massively parallel resources has become a
recent priority. While significant strides have been made in the development of
GPU-accelerated, distributed-memory algorithms for many-body (e.g.
coupled-cluster) and spectral single-body (e.g. planewave, real-space, and
finite-element density functional theory [DFT]) methods, the vast majority of
GPU-accelerated Gaussian atomic orbital methods have focused on shared memory
systems with only a handful of examples pursuing massive parallelism on
distributed memory GPU architectures. In the present work, we present a set of
distributed memory algorithms for the evaluation of the Coulomb and
exact-exchange matrices for hybrid Kohn-Sham DFT with Gaussian basis sets via
direct density-fitted (DF-J-Engine) and seminumerical (sn-K) methods,
respectively. The absolute performance and strong scalability of the developed
methods are demonstrated on systems ranging from a few hundred to over one
thousand atoms using up to 128 NVIDIA A100 GPUs on the Perlmutter
supercomputer.
Comment: 45 pages, 9 figures
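The density-fitted Coulomb build at the heart of a DF-J engine reduces, per node, to two tensor contractions and a metric solve; a serial NumPy sketch (array names and shapes are illustrative, not the paper's actual code) is:

```python
import numpy as np

def df_coulomb(B, metric, D):
    """Serial core of a density-fitted Coulomb (DF-J) build.

    B      : (naux, nbf, nbf)  three-center integrals (P|mn)
    metric : (naux, naux)      two-center metric (P|Q)
    D      : (nbf, nbf)        density matrix
    Returns J_mn = sum_PQ (mn|P) [metric^-1]_PQ (Q|ls) D_ls.
    """
    gamma = np.einsum('qls,ls->q', B, D)   # contract density onto aux basis
    c = np.linalg.solve(metric, gamma)     # fitting coefficients
    return np.einsum('pmn,p->mn', B, c)    # assemble Coulomb matrix
```

In a distributed setting the two einsum contractions shard naturally over shells or auxiliary-function blocks, which is the parallelism the abstract's DF-J-Engine exploits across GPUs.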
Thoughts on finding the right computer buddy: a moveable feast.
The burgeoning supernova of medical information is rapidly overtaking the practicing physician's envelope of comprehension. More physicians by necessity are turning to automated resources as a means of amplifying the information they need to know while, at the same time, reducing the volume of technical pollution. Computers are capable of being a silent partner at your side as you talk with your patient--ready to cut to the quick and retrieve the latest information for the particular clinical problem at hand. Computers can be considered an extension of the brain. In a sense, they are silicon-based "life" forms. Virtuosity is learned from them as familiarity is gained--the same as becoming acquainted with a human stranger. This article is about one physician's solution to the problem of too much information. It's unabashedly anecdotal, but we hope the reader will glean some hints while navigating through the realms of cyberspace.
Grid Infrastructure for Domain Decomposition Methods in Computational ElectroMagnetics
The accurate and efficient solution of Maxwell's equations is the problem addressed by the scientific discipline called Computational ElectroMagnetics (CEM). Many macroscopic phenomena in a great number of fields are governed by this set of differential equations: electronics, geophysics, medical and biomedical technologies, and virtual EM prototyping, besides the traditional antenna and propagation applications. Therefore, many efforts are focused on the development of new and more efficient approaches to solving Maxwell's equations, and interest in CEM applications keeps growing. Several problems that were hard to tackle a few years ago can now be easily addressed thanks to the reliability and flexibility of new technologies, together with the increased computational power. This technological evolution opens the possibility of addressing large and complex tasks. Many of these applications aim to simulate electromagnetic behavior, for example in terms of input impedance and radiation pattern in antenna problems, or radar cross section in scattering applications. Problems whose solution requires high accuracy instead need full-wave analysis techniques, e.g., in the virtual prototyping context, where the objective is to obtain reliable simulations in order to minimize the number of measurements and, as a consequence, their cost. Besides, other tasks require the analysis of complete structures (including a high number of details) by directly simulating a CAD model. This approach relieves the researcher of the burden of removing useless details, while maintaining the original complexity and taking all details into account. Unfortunately, it implies: (a) high computational effort, due to the increased number of degrees of freedom, and (b) worsening of the spectral properties of the linear system during complex analyses.
The above considerations underline the need to identify appropriate information technologies that ease the achievement of a solution and speed up the required computations. The authors' analysis and expertise suggest that Grid Computing techniques can be very useful for these purposes. Grids appear mainly in high performance computing environments: hundreds of off-the-shelf nodes are linked together and work in parallel to solve problems that previously could be addressed only sequentially or by using supercomputers. Grid Computing is a technique developed to process enormous amounts of data; it enables large-scale resource sharing to solve problems by exploiting distributed scenarios. The main advantage of the Grid comes from parallel computing: if a problem can be split into smaller tasks that can be executed independently, the computation of its solution speeds up considerably. To exploit this advantage, it is necessary to identify a technique able to split the original electromagnetic task into a set of smaller subproblems. The Domain Decomposition (DD) technique, based on the block generation algorithm introduced in Matekovits et al. (2007) and Francavilla et al. (2011), perfectly addresses our requirements (see Section 3.4 for details). In this chapter, a Grid Computing infrastructure is presented. This architecture allows parallel block execution by distributing tasks to the nodes that belong to the Grid. The set of nodes is composed of physical machines and virtualized ones. This feature enables great flexibility and increases the available computational power. Furthermore, the presence of virtual nodes allows full and efficient Grid usage: indeed, the presented architecture can be used by different users running different applications.
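The "split into independently solvable blocks" idea behind such a DD approach can be illustrated with a block-Jacobi (additive Schwarz) iteration, in which each subdomain solve touches only its own diagonal block of the system matrix and could therefore be dispatched to a separate Grid node (a generic sketch, not the block-generation algorithm of Matekovits et al.):

```python
import numpy as np

def block_jacobi_step(A, b, x, blocks):
    """One additive-Schwarz (block-Jacobi) iteration for A x = b.

    blocks : list of index arrays partitioning the unknowns into subdomains.
    Each subdomain solve uses only its own diagonal block of A, so the
    solves are independent and can run on separate nodes in parallel.
    """
    r = b - A @ x                          # global residual
    x_new = x.copy()
    for idx in blocks:                     # independent -> distributable
        Abb = A[np.ix_(idx, idx)]          # subdomain diagonal block
        x_new[idx] += np.linalg.solve(Abb, r[idx])
    return x_new
```

Only the residual exchange couples the subdomains, which keeps inter-node communication per iteration small -- the property that makes the scheme attractive on a loosely coupled Grid.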
A real-time digital holographic microscope with an optical tweezer
The most significant advantage of holographic microscopy is being able to image transparent objects, such as biological cells, without staining; therefore, the cell image can be captured while it is alive. Moreover, manipulating a living cell without destroying its structure can be achieved with an optical tweezer, which applies a pulling force around a tightly focused laser beam without physical contact. Therefore, an instrument that combines a holographic microscope and an optical tweezer is quite useful for biological studies. Another advantage of holographic imaging is that one does not need to mechanically focus the scene when recording the hologram: focusing is achieved by reconstructing the hologram at a certain depth. If the object's optical depth from the recording plane is not known a priori, auto-focusing algorithms must be used to estimate this distance. However, auto-focusing and reconstruction can become quite time consuming as hologram sizes increase, and the microscope cannot operate in real time with high-resolution holograms using traditional central processing units (CPUs). Therefore, for real-time operation, additional hardware accelerators are required for reconstructing high-resolution holograms. A hologram can be reconstructed tens of times faster with a graphics processing unit than with state-of-the-art CPUs. In this thesis, an auto-focusing, megapixel-resolution digital holographic microscope (DHM) that uses a commodity graphics card as the calculation engine is presented. The computational power of the GPU allows the DHM to work in real time, such that the reconstruction distance is estimated unsupervised and the postprocessing of the hologram is transparent to the user. Performances of the DHM under GPU and CPU settings are presented, and a maximum of 70 focused reconstructions per second (frps) is achieved with 1024 × 1024 pixel holograms.
Moreover, a setup for incorporating an optical tweezer into the holographic microscope is provided. With this setup, it is possible to trap small particles while performing holographic imaging.
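The per-depth reconstruction that a GPU accelerates in such a DHM is essentially one FFT-based propagation. A NumPy sketch of the angular spectrum method, together with a simple variance focus metric of the kind auto-focusing loops score candidate depths with (our own schematic with made-up parameter names, not the thesis code), follows:

```python
import numpy as np

def angular_spectrum(field, z, wavelength, dx):
    """Propagate a complex field by distance z (angular spectrum method).

    This FFT-based reconstruction is the per-depth kernel that GPU DHM
    implementations accelerate; auto-focusing repeats it over candidate
    depths and scores each result with a sharpness metric.
    field : (n, n) complex hologram plane; dx : pixel pitch in meters.
    """
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=dx)           # spatial frequencies (cycles/m)
    FX, FY = np.meshgrid(fx, fx)
    kz2 = (1.0 / wavelength) ** 2 - FX**2 - FY**2
    H = np.exp(2j * np.pi * z * np.sqrt(np.maximum(kz2, 0.0)))
    H[kz2 < 0] = 0.0                       # drop evanescent components
    return np.fft.ifft2(np.fft.fft2(field) * H)

def sharpness(field):
    """Intensity-variance focus metric; largest near the in-focus depth."""
    I = np.abs(field) ** 2
    return float(np.var(I))
```

Because the transfer function H is applied in frequency space, propagating to each candidate depth costs two FFTs plus a pointwise multiply, which is exactly the workload that scales well on a GPU.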