
    Format Abstraction for Sparse Tensor Algebra Compilers

    This paper shows how to build a sparse tensor algebra compiler that is agnostic to tensor formats (data layouts). We develop an interface that describes formats in terms of their capabilities and properties, and show how to build a modular code generator where new formats can be added as plugins. We then describe six implementations of the interface that compose to form the dense, CSR/CSF, COO, DIA, ELL, and HASH tensor formats and countless variants thereof. With these implementations at hand, our code generator can generate code to compute any tensor algebra expression on any combination of the aforementioned formats. To demonstrate our technique, we have implemented it in the taco tensor algebra compiler. Our modular code generator design makes it simple to add support for new tensor formats, and the performance of the generated code is competitive with hand-optimized implementations. Furthermore, by extending taco to support a wider range of formats specialized for different application and data characteristics, we can improve end-user application performance. For example, if input data is provided in the COO format, our technique allows computing a single matrix-vector multiplication directly with the data in COO, which is up to 3.6× faster than first converting the data to CSR. Comment: Presented at OOPSLA 2018.
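
    The COO example above is easy to make concrete. The following NumPy sketch (written for this listing; it is neither taco's API nor its generated code) computes a matrix-vector product directly from COO triplets, with no CSR conversion:

        import numpy as np

        def spmv_coo(n_rows, rows, cols, vals, x):
            """Compute y = A @ x with A stored as COO triplets (rows[k], cols[k], vals[k])."""
            y = np.zeros(n_rows)
            # Scatter-accumulate each nonzero contribution vals[k] * x[cols[k]] into y[rows[k]].
            np.add.at(y, rows, vals * x[cols])
            return y

        # Tiny usage example: a 3x3 matrix with four nonzeros.
        rows = np.array([0, 1, 2, 2])
        cols = np.array([0, 2, 1, 2])
        vals = np.array([1.0, 2.0, 3.0, 4.0])
        x = np.array([1.0, 1.0, 1.0])
        print(spmv_coo(3, rows, cols, vals, x))  # -> [1. 2. 7.]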

    The Tensor Networks Anthology: Simulation techniques for many-body quantum lattice systems

    We present a compendium of numerical simulation techniques, based on tensor network methods, aiming to address problems of many-body quantum mechanics on a classical computer. The core setting of this anthology is lattice problems in low spatial dimension at finite size, a physical scenario where tensor network methods, both Density Matrix Renormalization Group and beyond, have long proven to be winning strategies. Here we explore in detail the numerical frameworks and methods employed to deal with low-dimensional physical setups, from a computational physics perspective. We focus on symmetries and closed-system simulations with arbitrary boundary conditions, while discussing the numerical data structures and linear algebra manipulation routines involved, which form the core libraries of any tensor network code. At a higher level, we put the spotlight on loop-free network geometries, discussing their advantages, and presenting in detail algorithms to simulate low-energy equilibrium states. Accompanied by discussions of data structures, numerical techniques and performance, this anthology serves as a programmer's companion, as well as a self-contained introduction and review of the basic and selected advanced concepts in tensor networks, including examples of their applications. Comment: 115 pages, 56 figures.
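
    As a minimal illustration of the kind of data structure and contraction routine such a code is built from, the sketch below (a toy NumPy example written for this listing, not code from the anthology) contracts a random open-boundary matrix product state with itself to obtain its squared norm:

        import numpy as np

        def random_mps(n_sites, phys_dim=2, bond_dim=4):
            """A random open-boundary matrix product state: tensors of shape (Dl, d, Dr)."""
            dims = [1] + [bond_dim] * (n_sites - 1) + [1]
            return [np.random.rand(dims[i], phys_dim, dims[i + 1]) for i in range(n_sites)]

        def mps_norm_squared(mps):
            """Contract <psi|psi> site by site, carrying only a small environment matrix."""
            env = np.ones((1, 1))
            for A in mps:
                # env[a, b] * A[a, s, c] * conj(A)[b, s, d]  ->  new env[c, d]
                env = np.einsum('ab,asc,bsd->cd', env, A, A.conj())
            return float(env[0, 0].real)

        print(mps_norm_squared(random_mps(6)))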

    Design and optimization of a portable LQCD Monte Carlo code using OpenACC

    The present panorama of HPC architectures is extremely heterogeneous, ranging from traditional multi-core CPU processors, supporting a wide class of applications but delivering moderate computing performance, to many-core GPUs, exploiting aggressive data-parallelism and delivering higher performance for streaming computing applications. In this scenario, code portability (and performance portability) becomes necessary for easy maintainability of applications; this is very relevant in scientific computing, where code changes are very frequent and keeping different code versions aligned is tedious and error-prone. In this work we present the design and optimization of a state-of-the-art production-level LQCD Monte Carlo application, using the directive-based OpenACC programming model. OpenACC abstracts parallel programming to a descriptive level, relieving programmers from specifying how codes should be mapped onto the target architecture. We describe the implementation of a code fully written in OpenACC, and show that we are able to target several different architectures, including state-of-the-art traditional CPUs and GPUs, with the same code. We also measure performance, evaluating the computing efficiency of our OpenACC code on several architectures, comparing with GPU-specific implementations and showing that a good level of performance portability can be reached. Comment: 26 pages, 2 png figures, preprint of an article submitted for consideration in International Journal of Modern Physics.

    Conjugate gradient solvers on Intel Xeon Phi and NVIDIA GPUs

    Lattice Quantum Chromodynamics simulations typically spend most of the runtime in inversions of the Fermion Matrix. This part is therefore frequently optimized for various HPC architectures. Here we compare the performance of the Intel Xeon Phi to current Kepler-based NVIDIA Tesla GPUs running a conjugate gradient solver. By exposing more parallelism to the accelerator through inverting multiple vectors at the same time, we obtain a performance greater than 300 GFlop/s on both architectures. This more than doubles the performance of the inversions. We also give a short overview of the Knights Corner architecture, discuss some details of the implementation, and describe the effort required to obtain the achieved performance. Comment: 7 pages, proceedings, presented at 'GPU Computing in High Energy Physics', September 10-12, 2014, Pisa, Italy.
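
    The key optimization described here, inverting several vectors at once, can be sketched in plain NumPy. This is not the production Xeon Phi or GPU solver; it simply runs an independent conjugate gradient recursion for every right-hand side in lockstep, so the dominant operation becomes a matrix-matrix rather than a matrix-vector product:

        import numpy as np

        def block_cg(apply_A, B, tol=1e-10, max_iter=1000):
            """Solve A X = B for all columns of B at once with conjugate gradient.

            Each column follows its own CG recursion; running them together turns
            the dominant matrix-vector products into matrix-matrix products.
            """
            X = np.zeros_like(B)
            R = B - apply_A(X)
            P = R.copy()
            rs_old = np.sum(R * R, axis=0)
            for _ in range(max_iter):
                AP = apply_A(P)
                alpha = rs_old / np.sum(P * AP, axis=0)
                X += alpha * P
                R -= alpha * AP
                rs_new = np.sum(R * R, axis=0)
                if np.all(np.sqrt(rs_new) < tol):
                    break
                P = R + (rs_new / rs_old) * P
                rs_old = rs_new
            return X

        # Usage: a small symmetric positive-definite matrix and 4 right-hand sides.
        n, k = 64, 4
        M = np.random.rand(n, n)
        A = M @ M.T + n * np.eye(n)
        B = np.random.rand(n, k)
        X = block_cg(lambda V: A @ V, B)
        print(np.linalg.norm(A @ X - B))  # should be close to zero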

    Optimal network topologies: Expanders, Cages, Ramanujan graphs, Entangled networks and all that

    We report on some recent developments in the search for optimal network topologies. First we review some basic concepts of spectral graph theory, including adjacency and Laplacian matrices, paying special attention to the topological implications of having large spectral gaps. We also introduce related concepts such as expanders, Ramanujan graphs, and cage graphs. Afterwards, we discuss two different dynamical features of networks, synchronizability and the flow of random walkers, and show that both are optimized when the corresponding Laplacian matrix has a large spectral gap. Building on this, we develop a numerical optimization algorithm and show that maximum synchronizability and fast random-walk spreading are obtained for a particular type of extremely homogeneous regular networks, with long loops and poor modular structure, that we call entangled networks. These turn out to be related to Ramanujan and cage graphs. We also argue that these graphs are very good finite-size approximations to Bethe lattices, and provide optimal or almost optimal solutions to many other problems, for instance searchability in the presence of congestion or the performance of neural networks. We then study how these results are modified for dynamical processes controlled by a normalized (weighted and directed) dynamics; much more heterogeneous graphs are optimal in this case. Finally, a critical discussion of the limitations and possible extensions of this work is presented. Comment: 17 pages, 11 figures. Small corrections and a new reference. Accepted for publication in JSTAT.
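
    The central quantity here, the spectral gap of the graph Laplacian (its algebraic connectivity), is simple to compute for small graphs. The NumPy sketch below uses example graphs chosen for this listing rather than taken from the paper:

        import numpy as np

        def laplacian_spectral_gap(adj):
            """Spectral gap (algebraic connectivity) of an undirected graph.

            For the Laplacian L = D - A the smallest eigenvalue is 0; the gap is
            the second-smallest eigenvalue.
            """
            L = np.diag(adj.sum(axis=1)) - adj
            return np.sort(np.linalg.eigvalsh(L))[1]

        # Example: a 4-cycle versus the complete graph K4 (a larger gap is "better").
        cycle4 = np.array([[0, 1, 0, 1],
                           [1, 0, 1, 0],
                           [0, 1, 0, 1],
                           [1, 0, 1, 0]], dtype=float)
        k4 = np.ones((4, 4)) - np.eye(4)
        print(laplacian_spectral_gap(cycle4))  # 2.0
        print(laplacian_spectral_gap(k4))      # 4.0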

    An Algebraic Framework for Compositional Program Analysis

    The purpose of a program analysis is to compute an abstract meaning for a program which approximates its dynamic behaviour. A compositional program analysis accomplishes this task with a divide-and-conquer strategy: the meaning of a program is computed by dividing it into sub-programs, computing their meaning, and then combining the results. Compositional program analyses are desirable because they can yield scalable (and easily parallelizable) program analyses. This paper presents an algebraic framework for designing, implementing, and proving the correctness of compositional program analyses. A program analysis in our framework is defined by an algebraic structure equipped with sequencing, choice, and iteration operations. From the analysis design perspective, a particularly interesting consequence of this is that the meaning of a loop is computed by applying the iteration operator to the loop body. This style of compositional loop analysis can yield interesting ways of computing loop invariants that cannot be defined iteratively. We identify a class of algorithms, the so-called path-expression algorithms [Tarjan1981, Scholz2007], which can be used to efficiently implement analyses in our framework. Lastly, we develop a theory for proving the correctness of an analysis by establishing an approximation relationship between an algebra defining a concrete semantics and an algebra defining an analysis. Comment: 15 pages.
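
    To give a flavor of such an algebraic structure, here is a deliberately tiny instance invented for this listing (not the framework from the paper): an analysis tracking which variables a fragment may modify, with sequencing, choice, and iteration operators, so that the meaning of a loop is the iteration operator applied to the loop body:

        from dataclasses import dataclass

        @dataclass(frozen=True)
        class ModifiedVars:
            """Toy compositional analysis: the set of variables a fragment may modify."""
            vars: frozenset

            def seq(self, other):      # run self, then other
                return ModifiedVars(self.vars | other.vars)

            def choice(self, other):   # run either self or other
                return ModifiedVars(self.vars | other.vars)

            def iterate(self):         # run self zero or more times
                return ModifiedVars(self.vars)

        # The meaning of a loop is the iteration operator applied to the loop body.
        assign_x = ModifiedVars(frozenset({'x'}))
        assign_y = ModifiedVars(frozenset({'y'}))
        loop = assign_x.seq(assign_y).iterate()   # while (...) { x = ...; y = ... }
        print(sorted(loop.vars))                  # ['x', 'y']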

    MILC Code Performance on High End CPU and GPU Supercomputer Clusters

    With recent developments in parallel supercomputing architecture, many-core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism, memory hierarchy, and programming complexity. It has been necessary to adapt the MILC code to these new processors, starting with NVIDIA GPUs and, more recently, the Intel Xeon Phi processors. We report on our efforts to port and optimize our code for the Intel Knights Landing architecture. We consider performance of the MILC code with MPI and OpenMP, and optimizations with QOPQDP and QPhiX. For the latter approach, we concentrate on the staggered conjugate gradient and gauge force. We also consider performance on recent NVIDIA GPUs using the QUDA library.

    Lecture Notes of Tensor Network Contractions

    Tensor network (TN), a young mathematical tool of high vitality and great potential, has been undergoing extremely rapid developments in the last two decades, gaining tremendous success in condensed matter physics, atomic physics, quantum information science, statistical physics, and so on. In these lecture notes, we focus on the contraction algorithms of TNs as well as some of their applications to the simulation of quantum many-body systems. Starting from basic concepts and definitions, we first explain the relations between TNs and physical problems, including the TN representations of classical partition functions, quantum many-body states (by matrix product states, tree TNs, and projected entangled pair states), time evolution simulations, etc. These problems, which are challenging to solve, can be transformed into TN contraction problems. We then present several paradigmatic algorithms based on the ideas of the numerical renormalization group and/or boundary states, including the density matrix renormalization group, time-evolving block decimation, coarse-graining/corner tensor renormalization group, and several distinguished variational algorithms. Finally, we revisit the TN approaches from the perspective of multi-linear algebra (also known as tensor algebra or tensor decompositions) and quantum simulation. Despite the apparent differences in the ideas and strategies of different TN algorithms, we aim at revealing the underlying relations and resemblances in order to present a systematic picture for understanding the TN contraction approaches. Comment: 134 pages, 68 figures. In this version, the manuscript has been changed into the format of a book; new sections about tensor networks and quantum circuits have been added.
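
    One of the simplest examples of the "physical problem as TN contraction" viewpoint is the partition function of a classical 1D Ising ring, which contracts to the trace of a matrix power. The sketch below, written for this listing with arbitrary parameters, checks the contraction against brute-force enumeration:

        import numpy as np
        from itertools import product

        def ising_ring_Z_tensor(n, beta=0.5, J=1.0):
            """Partition function of a 1D Ising ring as a tensor-network contraction.

            Each bond carries a 2x2 Boltzmann-weight tensor; contracting the ring of
            tensors reduces to a matrix power and a trace.
            """
            T = np.array([[np.exp(beta * J), np.exp(-beta * J)],
                          [np.exp(-beta * J), np.exp(beta * J)]])
            return np.trace(np.linalg.matrix_power(T, n))

        def ising_ring_Z_bruteforce(n, beta=0.5, J=1.0):
            """Same partition function by summing over all 2**n spin configurations."""
            Z = 0.0
            for spins in product([1, -1], repeat=n):
                energy = -J * sum(spins[i] * spins[(i + 1) % n] for i in range(n))
                Z += np.exp(-beta * energy)
            return Z

        print(ising_ring_Z_tensor(8), ising_ring_Z_bruteforce(8))  # the two values agree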