1,161 research outputs found

Towards an Efficient Use of the BLAS Library for Multilinear Tensor Contractions

Author: Bientinesi Paolo
Di Napoli Edoardo
Fabregat-Traver Diego
Quintana-Ortì Gregorio
Publication date: 01/01/2013

Mathematical operators whose transformation rules constitute the building blocks of a multi-linear algebra are widely used in physics and engineering applications where they are very often represented as tensors. In the last century, thanks to the advances in tensor calculus, it was possible to uncover new research fields and make remarkable progress in the existing ones, from electromagnetism to the dynamics of fluids and from the mechanics of rigid bodies to quantum mechanics of many atoms. By now, the formal mathematical and geometrical properties of tensors are well defined and understood; conversely, in the context of scientific and high-performance computing, many tensor-related problems are still open. In this paper, we address the problem of efficiently computing contractions among two tensors of arbitrary dimension by using kernels from the highly optimized BLAS library. In particular, we establish precise conditions to determine if and when GEMM, the kernel for matrix products, can be used. Such conditions take into consideration both the nature of the operation and the storage scheme of the tensors, and induce a classification of the contractions into three groups. For each group, we provide a recipe to guide the users towards the most effective use of BLAS.

Comment: 27 Pages, 7 figures and additional tikz generated diagrams. Submitted to Applied Mathematics and Computatio
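As a rough illustration of the idea in this abstract, the sketch below shows one case in which a contraction maps onto a single matrix product without copying data. It is not taken from the paper and does not reproduce its conditions or its three groups. The row-major flat storage, the particular contraction C[x, y] = sum over (i, j) of A[x, i, j] * B[i, j, y], and the helper names `gemm`, `contract_via_gemm`, and `contract_reference` are all assumptions made for this example; `gemm` is a naive pure-Python stand-in for a BLAS GEMM call.

```python
def gemm(m, n, k, a, b):
    """Naive row-major (m x k) times (k x n) product; stand-in for BLAS GEMM."""
    c = [0.0] * (m * n)
    for i in range(m):
        for p in range(k):
            a_ip = a[i * k + p]
            for j in range(n):
                c[i * n + j] += a_ip * b[p * n + j]
    return c


def contract_via_gemm(a, b, na, ni, nj, nb):
    """C[x, y] = sum_{i, j} A[x, i, j] * B[i, j, y] on flat row-major data.

    Assumption of this sketch: the contracted indices (i, j) are contiguous
    and appear in the same order in both tensors, so A can be read as an
    (na x ni*nj) matrix and B as an (ni*nj x nb) matrix with no copy.
    """
    return gemm(na, nb, ni * nj, a, b)


def contract_reference(a, b, na, ni, nj, nb):
    """Direct loop-nest evaluation of the same contraction, for checking."""
    c = [0.0] * (na * nb)
    for x in range(na):
        for i in range(ni):
            for j in range(nj):
                for y in range(nb):
                    c[x * nb + y] += (
                        a[(x * ni + i) * nj + j] * b[(i * nj + j) * nb + y]
                    )
    return c


# Small integer-valued test tensors, so the float sums are exact.
na, ni, nj, nb = 2, 3, 4, 5
a = [float(v % 7) for v in range(na * ni * nj)]
b = [float(v % 5) for v in range(ni * nj * nb)]
assert contract_via_gemm(a, b, na, ni, nj, nb) == contract_reference(
    a, b, na, ni, nj, nb
)
```

When the contracted indices are not contiguous in memory, or appear in a different order in the two tensors, this direct reinterpretation no longer applies. The storage scheme therefore matters as much as the operation itself.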

arXiv.org e-Print Archive

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositori Institucional de la Universitat Jaume I

Publikationsserver der RWTH Aachen University

Juelich Shared Electronic Resources

Tensor hypercontraction: A universal technique for the resolution of matrix elements of local, finite-range $N$-body potentials in many-body quantum problems

Author: Hohenstein Edward G.
Martinez Todd J.
Parrish Robert M.
Schunck Nicolas F.
Sherrill C. David
Publication venue: 'American Physical Society (APS)'
Publication date: 04/09/2013

Configuration-space matrix elements of N-body potentials arise naturally and ubiquitously in the Ritz-Galerkin solution of many-body quantum problems. For the common specialization of local, finite-range potentials, we develop the eXact Tensor HyperContraction (X-THC) method, which provides a quantized renormalization of the coordinate-space form of the N-body potential, allowing for a highly separable tensor factorization of the configuration-space matrix elements. This representation allows for substantial computational savings in chemical, atomic, and nuclear physics simulations, particularly with respect to difficult "exchange-like" contractions.

Comment: Third version of the manuscript after referee's comments. In press in PRL. Main text: 4 pages, 2 figures, 1 table; Supplemental material (also included): 14 pages, 2 figures, 2 table

arXiv.org e-Print Archive

UNT Digital Library

TTC: A Tensor Transposition Compiler for Multiple Architectures

Author: Abadi M.
Knijnenburg P. M.
Springer P.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016

We consider the problem of transposing tensors of arbitrary dimension and describe TTC, an open source domain-specific parallel compiler. TTC generates optimized parallel C++/CUDA C code that achieves a significant fraction of the system's peak memory bandwidth. TTC exhibits high performance across multiple architectures, including modern AVX-based systems (e.g., Intel Haswell, AMD Steamroller), Intel's Knights Corner as well as different CUDA-based GPUs such as NVIDIA's Kepler and Maxwell architectures. We report speedups of TTC over a meaningful baseline implementation generated by external C++ compilers; the results suggest that a domain-specific compiler can outperform its general purpose counterpart significantly: For instance, comparing with Intel's latest C++ compiler on the Haswell and Knights Corner architecture, TTC yields speedups of up to $8\times$ and $32\times$, respectively. We also showcase TTC's support for multiple leading dimensions, making it a suitable candidate for the generation of performance-critical packing functions that are at the core of the ubiquitous BLAS 3 routines
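To make the operation in this abstract concrete, here is a minimal reference sketch of a tensor transposition (an index permutation) on a flat, row-major array. It is not TTC's generated code, which according to the abstract is optimized parallel C++/CUDA; it only pins down what result such code has to produce. The function names `strides_row_major` and `transpose_tensor`, the row-major layout, and the convention that output axis `d` takes input axis `perm[d]` are assumptions of this example.

```python
from itertools import product


def strides_row_major(shape):
    """Element strides of a dense row-major tensor with the given shape."""
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    return strides


def transpose_tensor(data, shape, perm):
    """Permute the axes of a flat row-major tensor.

    Output axis d corresponds to input axis perm[d] (an assumed convention).
    Returns the permuted flat data together with the new shape.
    """
    out_shape = [shape[p] for p in perm]
    in_strides = strides_row_major(shape)
    out_strides = strides_row_major(out_shape)
    out = [None] * len(data)
    for idx in product(*(range(n) for n in shape)):
        # Offset of this element in the input layout.
        src = sum(i * s for i, s in zip(idx, in_strides))
        # Offset of the same element in the permuted output layout.
        dst = sum(idx[p] * s for p, s in zip(perm, out_strides))
        out[dst] = data[src]
    return out, out_shape


# A 2 x 3 matrix [[0, 1, 2], [3, 4, 5]] transposed to 3 x 2.
out, out_shape = transpose_tensor([0, 1, 2, 3, 4, 5], [2, 3], [1, 0])
assert out == [0, 3, 1, 4, 2, 5]
assert out_shape == [3, 2]
```

Each element is read once and written once, so the operation is limited by memory traffic rather than arithmetic. This is consistent with the abstract measuring performance against peak memory bandwidth.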

arXiv.org e-Print Archive

Crossref

Publikationsserver der RWTH Aachen University