Direct numerical simulation of turbulence on a Connection Machine CM-5
In this paper we report on our first experiences with direct numerical simulation of turbulent flow on a 16-node Connection Machine CM-5. The CM-5 has been programmed at a global level using data parallel Fortran. A two-dimensional direct simulation, in which the pressure is solved using a Conjugate Gradient method without preconditioning, runs at 23% of peak performance. Due to higher communication costs, 3D simulations run at 13% of peak. A diagonalwise re-ordered Incomplete Choleski Conjugate Gradient method cannot compete with a standard CG method on the CM-5.
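To make the solver concrete, here is a minimal serial sketch of unpreconditioned CG applied to a 2D pressure-Poisson-type problem. It assumes a 5-point stencil, unit grid spacing and homogeneous Dirichlet boundaries, none of which the abstract specifies, and the function names are hypothetical. It also shows where communication arises: each iteration needs one stencil application and two inner products, and in a data-parallel code the inner products become global reductions over all nodes.

```python
# Minimal sketch: unpreconditioned conjugate gradient (CG) for a 2D
# pressure-Poisson-type problem. Assumed (not from the paper): 5-point
# stencil, unit grid spacing, homogeneous Dirichlet boundaries.


def apply_neg_laplacian(u, n):
    """Matrix-free product with the 5-point negative Laplacian on an
    n-by-n grid of unknowns stored row by row in the flat list u."""
    out = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            k = i * n + j
            s = 4.0 * u[k]
            if i > 0:
                s -= u[k - n]
            if i < n - 1:
                s -= u[k + n]
            if j > 0:
                s -= u[k - 1]
            if j < n - 1:
                s -= u[k + 1]
            out[k] = s
    return out


def dot(a, b):
    # In a data-parallel code this is a global reduction over all nodes.
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(apply_a, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, with no preconditioner.
    Returns the solution and the number of iterations performed."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the initial guess x = 0
    p = list(r)  # first search direction
    rr = dot(r, r)
    it = 0
    while it < max_iter and rr ** 0.5 > tol:
        ap = apply_a(p)
        alpha = rr / dot(p, ap)  # step length along p
        x = [xi + alpha * pk for xi, pk in zip(x, p)]
        r = [ri - alpha * apk for ri, apk in zip(r, ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr  # keeps search directions A-conjugate
        p = [ri + beta * pk for ri, pk in zip(r, p)]
        rr = rr_new
        it += 1
    return x, it


# Usage: uniform right-hand side on a 16 x 16 grid.
n = 16
b = [1.0] * (n * n)
x, iterations = conjugate_gradient(lambda v: apply_neg_laplacian(v, n), b)
residual = [bi - axi for bi, axi in zip(b, apply_neg_laplacian(x, n))]
print(iterations, dot(residual, residual) ** 0.5)
```

A preconditioner such as incomplete Cholesky would add a triangular solve per iteration; the abstract reports that the re-ordered variant of that approach did not pay off on the CM-5.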
Parallel eigensolvers in plane-wave Density Functional Theory
We consider the problem of parallelizing electronic structure computations in
plane-wave Density Functional Theory. Because of the limited scalability of
Fourier transforms, parallelism has to be found at the eigensolver level. We
show how a recently proposed algorithm based on Chebyshev polynomials can scale
into the tens of thousands of processors, outperforming block conjugate
gradient algorithms for large computations.
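The core of Chebyshev-based eigensolvers is a polynomial filter that damps the unwanted part of the spectrum using only matrix-vector products. The sketch below is the generic Chebyshev filter, not necessarily the exact algorithm the abstract refers to; the name `chebyshev_filter` and the test matrix are hypothetical. In a full eigensolver the filtered block of vectors would then be orthonormalized and passed to a Rayleigh-Ritz step, which is omitted here.

```python
# Minimal sketch of a Chebyshev polynomial filter, the building block of
# polynomial-filtered subspace eigensolvers. Names are hypothetical.


def chebyshev_filter(apply_a, x, degree, a, b):
    """Return T_m((A - c I) / e) x with m = degree, where c = (a + b) / 2 and
    e = (b - a) / 2 map the unwanted interval [a, b] onto [-1, 1].
    Eigencomponents in [a, b] stay bounded by 1 in magnitude; those below a grow."""
    c = (a + b) / 2.0
    e = (b - a) / 2.0

    def shifted(v):
        # (A - c I) v / e, using only a matrix-vector product
        av = apply_a(v)
        return [(avi - c * vi) / e for avi, vi in zip(av, v)]

    t_prev = list(x)  # T_0 applied to x
    if degree == 0:
        return t_prev
    t_curr = shifted(x)  # T_1 applied to x
    for _ in range(2, degree + 1):
        # three-term recurrence: T_{k+1} = 2 t T_k - T_{k-1}
        s = shifted(t_curr)
        t_next = [2.0 * si - tpi for si, tpi in zip(s, t_prev)]
        t_prev, t_curr = t_curr, t_next
    return t_curr


# Diagonal test matrix with eigenvalues 1, 2, ..., 10. Damping [1.5, 10]
# makes the component along the lowest eigenvector dominate after filtering.
eigenvalues = [float(k) for k in range(1, 11)]


def apply_diag(v):
    return [lam * vi for lam, vi in zip(eigenvalues, v)]


y = chebyshev_filter(apply_diag, [1.0] * 10, degree=8, a=1.5, b=10.0)
print([round(v, 3) for v in y])
```

Because the filter involves no inner products between different vectors, it can be applied to each vector of a block independently, which is one reason such filters are attractive for massive parallelism.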
Large scale ab initio calculations based on three levels of parallelization
We suggest and implement a parallelization scheme based on an efficient
multiband eigenvalue solver, called the locally optimal block preconditioned
conjugate gradient (LOBPCG) method, and using an optimized three-dimensional (3D)
fast Fourier transform (FFT) in the ab initio plane-wave code ABINIT. In
addition to the standard data partitioning over processors corresponding to
different k-points, we introduce data partitioning with respect to blocks of
bands as well as spatial partitioning in the Fourier space of coefficients over
the plane waves basis set used in ABINIT. This k-points-multiband-FFT
parallelization avoids any collective communications on the whole set of
processors, relying instead on one-dimensional communications only. For a single
k-point, super-linear scaling is achieved for up to 100 processors due to
extensive use of hardware-optimized BLAS, LAPACK, and ScaLAPACK routines,
mainly in the LOBPCG routine. We observe good performance up to 200 processors.
With 10 k-points our three-way data partitioning results in linear scaling up
to 1000 processors for a practical system used for testing.
Comment: 8 pages, 5 figures. Accepted to Computational Material Science
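The three-level partitioning can be pictured as a 3D process grid (k-points × band blocks × FFT slices) in which each communication involves only the processors along one axis. The sketch below assumes the grid dimensions multiply to the processor count and that the FFT index varies fastest; this ordering and the names `grid_coords`, `grid_rank` and `axis_group` are hypothetical and are not taken from ABINIT.

```python
# Minimal sketch of a k-point x band x FFT process grid. Assumed (not from
# ABINIT): ranks are numbered with the FFT index varying fastest.


def grid_coords(rank, n_band, n_fft):
    """Map a flat rank to its (kpt, band, fft) grid coordinates."""
    fft = rank % n_fft
    band = (rank // n_fft) % n_band
    kpt = rank // (n_fft * n_band)
    return kpt, band, fft


def grid_rank(kpt, band, fft, n_band, n_fft):
    """Inverse of grid_coords."""
    return (kpt * n_band + band) * n_fft + fft


def axis_group(rank, axis, n_kpt, n_band, n_fft):
    """Ranks sharing every coordinate of `rank` except along `axis`
    ('kpt', 'band' or 'fft'): the 1D group used at that level."""
    kpt, band, fft = grid_coords(rank, n_band, n_fft)
    if axis == "kpt":
        return [grid_rank(k, band, fft, n_band, n_fft) for k in range(n_kpt)]
    if axis == "band":
        return [grid_rank(kpt, bb, fft, n_band, n_fft) for bb in range(n_band)]
    if axis == "fft":
        return [grid_rank(kpt, band, f, n_band, n_fft) for f in range(n_fft)]
    raise ValueError(f"unknown axis: {axis}")


# Usage: 2 k-point groups x 3 band blocks x 4 FFT slices = 24 processors.
for axis in ("kpt", "band", "fft"):
    print(axis, axis_group(5, axis, 2, 3, 4))
```

With MPI, each such group would typically become its own communicator, for example via `MPI_Comm_split` with the two fixed coordinates as the color, so that FFT transposes and band-block operations never involve the whole machine.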
New Algebraic Formulation of Density Functional Calculation
This article addresses a fundamental problem faced by the ab initio
community: the lack of an effective formalism for the rapid exploration and
exchange of new methods. To rectify this, we introduce a novel, basis-set
independent, matrix-based formulation of generalized density functional
theories which reduces the development, implementation, and dissemination of
new ab initio techniques to the derivation and transcription of a few lines of
algebra. This new framework enables us to concisely demystify the inner
workings of fully functional, highly efficient modern ab initio codes and to
give complete instructions for constructing such codes for calculations
employing arbitrary basis sets. Within this framework, we also discuss in full
detail a variety of leading-edge ab initio techniques, minimization algorithms,
and highly efficient computational kernels for use with scalar as well as
shared and distributed-memory supercomputer architectures.