Towards an Efficient Use of the BLAS Library for Multilinear Tensor Contractions
Mathematical operators whose transformation rules constitute the building
blocks of a multi-linear algebra are widely used in physics and engineering
applications where they are very often represented as tensors. In the last
century, thanks to the advances in tensor calculus, it was possible to uncover
new research fields and make remarkable progress in the existing ones, from
electromagnetism to the dynamics of fluids and from the mechanics of rigid
bodies to quantum mechanics of many atoms. By now, the formal mathematical and
geometrical properties of tensors are well defined and understood; conversely,
in the context of scientific and high-performance computing, many tensor-
related problems are still open. In this paper, we address the problem of
efficiently computing contractions between two tensors of arbitrary dimension by
using kernels from the highly optimized BLAS library. In particular, we
establish precise conditions to determine if and when GEMM, the kernel for
matrix products, can be used. Such conditions take into consideration both the
nature of the operation and the storage scheme of the tensors, and induce a
classification of the contractions into three groups. For each group, we
provide a recipe to guide the users towards the most effective use of BLAS.
Comment: 27 pages, 7 figures and additional tikz generated diagrams. Submitted
to Applied Mathematics and Computation
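To make the idea of mapping a contraction onto GEMM concrete, here is a minimal sketch in plain Python. It is not taken from the paper: the function names, the row-major flat-list storage, and the particular contraction C[i,j,l] = sum_p A[i,p] * B[p,j,l] are all assumptions chosen for illustration. The point is that when the free indices (j, l) of B are adjacent in memory, B can be read as an nk x (nb*nc) matrix without copying, so one matrix product computes the whole contraction.

```python
def gemm(m, n, k, a, b):
    """Row-major C (m x n) = A (m x k) * B (k x n); all operands are flat lists.

    Stand-in for a BLAS GEMM call; a real code would call the library instead.
    """
    c = [0.0] * (m * n)
    for i in range(m):
        for p in range(k):
            aip = a[i * k + p]
            for j in range(n):
                c[i * n + j] += aip * b[p * n + j]
    return c


def contract_reference(na, nk, nb, nc, a, b):
    """C[i,j,l] = sum_p A[i,p] * B[p,j,l], written as explicit loops."""
    c = [0.0] * (na * nb * nc)
    for i in range(na):
        for j in range(nb):
            for l in range(nc):
                s = 0.0
                for p in range(nk):
                    s += a[i * nk + p] * b[(p * nb + j) * nc + l]
                c[(i * nb + j) * nc + l] = s
    return c


# Hypothetical sizes and data, chosen only for the demonstration.
na, nk, nb, nc = 2, 3, 4, 5
a = [float(i + 1) for i in range(na * nk)]
b = [float((7 * i) % 11) for i in range(nk * nb * nc)]

# (j, l) are contiguous in B, so B is already an (nk x nb*nc) matrix in memory.
c_gemm = gemm(na, nb * nc, nk, a, b)
c_ref = contract_reference(na, nk, nb, nc, a, b)
```

If the contracted index were not stored so that the operand can be viewed as a matrix with a constant leading dimension, a single GEMM in this form would not apply, and the tensor would have to be sliced or transposed first. Under the assumptions of this sketch, that is the kind of distinction the abstract's classification refers to.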
Tensor hypercontraction: A universal technique for the resolution of matrix elements of local, finite-range N-body potentials in many-body quantum problems
Configuration-space matrix elements of N-body potentials arise naturally and
ubiquitously in the Ritz-Galerkin solution of many-body quantum problems. For
the common specialization of local, finite-range potentials, we develop the
eXact Tensor HyperContraction (X-THC) method, which provides a quantized
renormalization of the coordinate-space form of the N-body potential, allowing
for a highly separable tensor factorization of the configuration-space matrix
elements. This representation allows for substantial computational savings in
chemical, atomic, and nuclear physics simulations, particularly with respect to
difficult "exchange-like" contractions.
Comment: Third version of the manuscript after referee's comments. In press in
PRL. Main text: 4 pages, 2 figures, 1 table; Supplemental material (also
included): 14 pages, 2 figures, 2 tables
TTC: A Tensor Transposition Compiler for Multiple Architectures
We consider the problem of transposing tensors of arbitrary dimension and
describe TTC, an open source domain-specific parallel compiler. TTC generates
optimized parallel C++/CUDA C code that achieves a significant fraction of the
system's peak memory bandwidth. TTC exhibits high performance across multiple
architectures, including modern AVX-based systems (e.g., Intel Haswell, AMD
Steamroller), Intel's Knights Corner as well as different CUDA-based GPUs such
as NVIDIA's Kepler and Maxwell architectures. We report speedups of TTC over a
meaningful baseline implementation generated by external C++ compilers; the
results suggest that a domain-specific compiler can significantly outperform
its general-purpose counterpart. For instance, comparing with Intel's latest
C++ compiler on the Haswell and Knights Corner architecture, TTC yields
speedups of up to and , respectively. We also showcase
TTC's support for multiple leading dimensions, making it a suitable candidate
for the generation of performance-critical packing functions that are at the
core of the ubiquitous BLAS 3 routines.
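For readers unfamiliar with the operation, the following is a minimal, unoptimized sketch of a tensor transposition in plain Python. It is not TTC's generated code: the function name `transpose_tensor`, the row-major flat-list storage, and the permutation convention (output dimension d takes input dimension perm[d]) are assumptions made for illustration only.

```python
def transpose_tensor(data, shape, perm):
    """Permute the dimensions of a row-major tensor stored as a flat list.

    Output dimension d corresponds to input dimension perm[d].
    Returns (new_data, new_shape).
    """
    ndim = len(shape)
    new_shape = [shape[p] for p in perm]

    # Row-major strides of the input tensor.
    strides = [1] * ndim
    for d in range(ndim - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]

    # Input stride to use when stepping along each output dimension.
    perm_strides = [strides[p] for p in perm]

    out = []
    idx = [0] * ndim  # current output multi-index
    for _ in range(len(data)):
        offset = sum(i * s for i, s in zip(idx, perm_strides))
        out.append(data[offset])
        # Advance the output multi-index, last dimension fastest.
        for d in range(ndim - 1, -1, -1):
            idx[d] += 1
            if idx[d] < new_shape[d]:
                break
            idx[d] = 0
    return out, new_shape
```

The loop writes the output contiguously but reads the input with strided accesses. That access pattern makes the operation memory-bound, which is consistent with the abstract reporting performance as a fraction of peak memory bandwidth.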