Baryonic and mesonic 3-point functions with open spin indices
We have implemented a new way of computing three-point correlation functions.
It is based on a factorization of the entire correlation function into two
parts which are evaluated with open spin- (and to some extent flavor-) indices.
This allows us to estimate the two contributions simultaneously for many
different initial and final states and momenta, with little computational
overhead. We explain this factorization as well as its efficient implementation
in a new library which has been written to provide the necessary functionality
on modern parallel architectures and on CPUs, including Intel's Xeon Phi
series.
Comment: 7 pages, 5 figures, Proceedings of Lattice 201
Lattice QCD with Domain Decomposition on Intel Xeon Phi Co-Processors
The gap between the cost of moving data and the cost of computing continues
to grow, making it ever harder to design iterative solvers on extreme-scale
architectures. This problem can be alleviated by alternative algorithms that
reduce the amount of data movement. We investigate this in the context of
Lattice Quantum Chromodynamics and implement such an alternative solver
algorithm, based on domain decomposition, on Intel Xeon Phi co-processor (KNC)
clusters. We demonstrate close-to-linear on-chip scaling to all 60 cores of the
KNC. With a mix of single- and half-precision the domain-decomposition method
sustains 400-500 Gflop/s per chip. Compared to an optimized KNC implementation
of a standard solver [1], our full multi-node domain-decomposition solver
strong-scales to more nodes and reduces the time-to-solution by a factor of 5.
Comment: 12 pages, 7 figures, presented at Supercomputing 2014, November
16-21, 2014, New Orleans, Louisiana, USA, speaker Simon Heybrock; SC '14
Proceedings of the International Conference for High Performance Computing,
Networking, Storage and Analysis, pages 69-80, IEEE Press Piscataway, NJ, USA
(c)201
A nested Krylov subspace method to compute the sign function of large complex matrices
We present an acceleration of the well-established Krylov-Ritz methods to
compute the sign function of large complex matrices, as needed in lattice QCD
simulations involving the overlap Dirac operator at both zero and nonzero
baryon density. Krylov-Ritz methods approximate the sign function using a
projection on a Krylov subspace. To achieve high accuracy this subspace must
be taken quite large, which makes the method too costly. The new idea is to
make a further projection on an even smaller, nested Krylov subspace. If
additionally an intermediate preconditioning step is applied, this projection
can be performed without affecting the accuracy of the approximation, and a
substantial gain in efficiency is achieved for both Hermitian and non-Hermitian
matrices. The numerical efficiency of the method is demonstrated on lattice
configurations of sizes ranging from 4^4 to 10^4, and the new results are
compared with those obtained with rational approximation methods.
Comment: 17 pages, 12 figures, minor corrections, extended analysis of the
preconditioning ste
QPACE 2 and Domain Decomposition on the Intel Xeon Phi
We give an overview of QPACE 2, which is a custom-designed supercomputer
based on Intel Xeon Phi processors, developed in a collaboration of Regensburg
University and Eurotech. We give some general recommendations for how to write
high-performance code for the Xeon Phi and then discuss our implementation of a
domain-decomposition-based solver and present a number of benchmarks.
Comment: plenary talk at Lattice 2014, to appear in the conference proceedings
PoS(LATTICE2014), 15 pages, 9 figure
Short-recurrence Krylov subspace methods for the overlap Dirac operator at nonzero chemical potential
The overlap operator in lattice QCD requires the computation of the sign
function of a matrix, which is non-Hermitian in the presence of a quark
chemical potential. In previous work we introduced an Arnoldi-based Krylov
subspace approximation, which uses long recurrences. Even after the deflation
of critical eigenvalues, the low efficiency of the method restricts its
application to small lattices. Here we propose new short-recurrence methods
which strongly enhance the efficiency of the computational method. Using
rational approximations to the sign function we introduce two variants, based
on the restarted Arnoldi process and on the two-sided Lanczos method,
respectively, which become very efficient when combined with multishift
solvers. Alternatively, in the variant based on the two-sided Lanczos method
the sign function can be evaluated directly. We present numerical results which
compare the efficiencies of a restarted Arnoldi-based method and the direct
two-sided Lanczos approximation for various lattice sizes. We also show that
our new methods gain substantially when combined with deflation.
Comment: 14 pages, 4 figures; as published in Comput. Phys. Commun., modified
data in Figs. 2,3 and 4 for improved implementation of FOM algorithm,
extended discussion of the algorithmic cos