1,161 research outputs found

    Towards an Efficient Use of the BLAS Library for Multilinear Tensor Contractions

    Mathematical operators whose transformation rules constitute the building blocks of a multi-linear algebra are widely used in physics and engineering applications where they are very often represented as tensors. In the last century, thanks to the advances in tensor calculus, it was possible to uncover new research fields and make remarkable progress in the existing ones, from electromagnetism to the dynamics of fluids and from the mechanics of rigid bodies to quantum mechanics of many atoms. By now, the formal mathematical and geometrical properties of tensors are well defined and understood; conversely, in the context of scientific and high-performance computing, many tensor-related problems are still open. In this paper, we address the problem of efficiently computing contractions among two tensors of arbitrary dimension by using kernels from the highly optimized BLAS library. In particular, we establish precise conditions to determine if and when GEMM, the kernel for matrix products, can be used. Such conditions take into consideration both the nature of the operation and the storage scheme of the tensors, and induce a classification of the contractions into three groups. For each group, we provide a recipe to guide the users towards the most effective use of BLAS. Comment: 27 pages, 7 figures and additional tikz generated diagrams. Submitted to Applied Mathematics and Computation
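    To make the idea of mapping a contraction onto GEMM concrete, here is a minimal sketch of one favourable case, not the paper's classification or recipes. It assumes flat row-major storage (BLAS itself is conventionally column-major), and the function names `gemm`, `contract_via_gemm` and `contract_reference` are hypothetical. For C[a,b,c] = sum_k A[a,b,k] * B[k,c], the free indices (a, b) of A are adjacent in memory and the contracted index k is last, so A can be read as a (dim_a*dim_b) x dim_k matrix without copying and a single matrix product computes the whole contraction.

```python
from itertools import product


def gemm(m, n, k, a, b):
    """Naive stand-in for a BLAS GEMM call on flat row-major buffers.

    Computes the (m x n) product of an (m x k) matrix `a` and a (k x n) matrix `b`.
    """
    c = [0.0] * (m * n)
    for i in range(m):
        for p in range(k):
            a_ip = a[i * k + p]
            for j in range(n):
                c[i * n + j] += a_ip * b[p * n + j]
    return c


def contract_via_gemm(dim_a, dim_b, dim_k, dim_c, a, b):
    """C[a,b,c] = sum_k A[a,b,k] * B[k,c], computed as one matrix product.

    Assumption: row-major storage, so A is already laid out as a
    (dim_a * dim_b) x dim_k matrix and no data movement is needed.
    """
    return gemm(dim_a * dim_b, dim_c, dim_k, a, b)


def contract_reference(dim_a, dim_b, dim_k, dim_c, a, b):
    """Direct loop-nest evaluation of the same contraction, for checking."""
    c = [0.0] * (dim_a * dim_b * dim_c)
    for ia, ib, ic in product(range(dim_a), range(dim_b), range(dim_c)):
        s = 0.0
        for ik in range(dim_k):
            s += a[(ia * dim_b + ib) * dim_k + ik] * b[ik * dim_c + ic]
        c[(ia * dim_b + ib) * dim_c + ic] = s
    return c
```

    When the contracted index is not positioned this conveniently, the flattening above is no longer a free reinterpretation of memory, which is the kind of situation where storage scheme and operation have to be considered together.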

    Tensor hypercontraction: A universal technique for the resolution of matrix elements of local, finite-range N-body potentials in many-body quantum problems

    Configuration-space matrix elements of N-body potentials arise naturally and ubiquitously in the Ritz-Galerkin solution of many-body quantum problems. For the common specialization of local, finite-range potentials, we develop the eXact Tensor HyperContraction (X-THC) method, which provides a quantized renormalization of the coordinate-space form of the N-body potential, allowing for a highly separable tensor factorization of the configuration-space matrix elements. This representation allows for substantial computational savings in chemical, atomic, and nuclear physics simulations, particularly with respect to difficult "exchange-like" contractions. Comment: Third version of the manuscript after referee's comments. In press in PRL. Main text: 4 pages, 2 figures, 1 table; Supplemental material (also included): 14 pages, 2 figures, 2 tables

    TTC: A Tensor Transposition Compiler for Multiple Architectures

    We consider the problem of transposing tensors of arbitrary dimension and describe TTC, an open source domain-specific parallel compiler. TTC generates optimized parallel C++/CUDA C code that achieves a significant fraction of the system's peak memory bandwidth. TTC exhibits high performance across multiple architectures, including modern AVX-based systems (e.g., Intel Haswell, AMD Steamroller), Intel's Knights Corner as well as different CUDA-based GPUs such as NVIDIA's Kepler and Maxwell architectures. We report speedups of TTC over a meaningful baseline implementation generated by external C++ compilers; the results suggest that a domain-specific compiler can outperform its general purpose counterpart significantly: for instance, comparing with Intel's latest C++ compiler on the Haswell and Knights Corner architectures, TTC yields speedups of up to 8× and 32×, respectively. We also showcase TTC's support for multiple leading dimensions, making it a suitable candidate for the generation of performance-critical packing functions that are at the core of the ubiquitous BLAS 3 routines.
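    For readers unfamiliar with the operation, a tensor transposition is a permutation of a tensor's indices together with the matching reordering of its data in memory. The sketch below is a plain reference version of that operation only; it is not TTC's generated code and makes no attempt at the blocking, vectorization or parallelism the abstract refers to. It assumes flat row-major storage, and `strides_row_major` and `transpose_tensor` are hypothetical names introduced for illustration.

```python
from itertools import product


def strides_row_major(shape):
    """Element strides of a dense row-major tensor with the given shape."""
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    return strides


def transpose_tensor(data, shape, perm):
    """Permute tensor axes so that output axis j is input axis perm[j].

    `data` is a flat row-major buffer. Returns (out, out_shape), where
    `out` is a new flat row-major buffer holding the permuted tensor.
    """
    out_shape = [shape[p] for p in perm]
    in_strides = strides_row_major(shape)
    out_strides = strides_row_major(out_shape)
    out = [None] * len(data)
    for idx in product(*(range(s) for s in shape)):
        # Offset of this element in the input buffer.
        src = sum(i * s for i, s in zip(idx, in_strides))
        # Offset of the same element once its indices are permuted.
        dst = sum(idx[p] * s for p, s in zip(perm, out_strides))
        out[dst] = data[src]
    return out, out_shape
```

    Every element is read once and written once, so the operation does no arithmetic and its cost is dominated by memory traffic, which is why the abstract measures performance as a fraction of peak memory bandwidth.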