Near-optimal loop tiling by means of cache miss equations and genetic algorithms

Author: Abella Ferrer Jaume
González Colás Antonio María
Llosa Espuny José Francisco
Vera Rivera Francisco Javier
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2002

The effectiveness of the memory hierarchy is critical for the performance of current processors. The performance of the memory hierarchy can be improved by program transformations such as loop tiling, a code transformation that targets capacity misses. This paper presents a novel systematic approach to near-optimal loop tiling based on an accurate data locality analysis (cache miss equations) and a powerful genetic-algorithm-based technique for searching the solution space. The results show that this approach can remove practically all capacity misses for all considered benchmarks. For the matrix multiply kernel, the reduction of replacement misses lowers the miss ratio by as much as a factor of 7. Peer Reviewed. Postprint (published version).

UPCommons. Open Knowledge Portal of the UPC
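The loop-tiling transformation that this paper tunes can be illustrated on matrix multiplication. The Python sketch below is a generic blocked (tiled) loop nest, not the authors' method: the tile size `tile` is a fixed, hypothetical parameter here, whereas the paper chooses tile sizes with cache miss equations and a genetic algorithm. In CPython, interpreter overhead hides most cache effects, so the example only shows how the iteration space is reordered while the result stays the same.

```python
def matmul_naive(A, B):
    """Reference i-k-j matrix product C = A * B for lists of lists."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            a = A[i][k]
            for j in range(p):
                C[i][j] += a * B[k][j]
    return C


def matmul_tiled(A, B, tile=32):
    """Same product, with the i, k and j loops split into tile-sized blocks.

    `tile` is a hypothetical fixed tile size; choosing it well is the
    problem the paper addresses with cache miss equations and a GA.
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for ii in range(0, n, tile):            # block of rows of A and C
        for kk in range(0, m, tile):        # block of the reduction dimension
            for jj in range(0, p, tile):    # block of columns of B and C
                # Update one tile of C from one tile of A and one tile of B.
                for i in range(ii, min(ii + tile, n)):
                    Ci, Ai = C[i], A[i]
                    for k in range(kk, min(kk + tile, m)):
                        a, Bk = Ai[k], B[k]
                        for j in range(jj, min(jj + tile, p)):
                            Ci[j] += a * Bk[j]
    return C


print(matmul_tiled([[1, 2], [3, 4]], [[5, 6], [7, 8]], tile=1))  # [[19, 22], [43, 50]]
```

Both functions return the same matrix for any tile size; only the order in which elements of `A`, `B` and `C` are touched changes, so that data brought into the cache for one tile can be reused before it is likely to be evicted.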

AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture

Author: Cong Jason
Wei Peng
Yu Cody Hao
Zhang Peng
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 29/07/2018

CPU-FPGA heterogeneous architectures are attracting ever-increasing attention as a way to advance computational capability and energy efficiency in today's datacenters. These architectures let programmers reprogram the FPGAs for flexible acceleration of many workloads. Nonetheless, this advantage is often overshadowed by the poor programmability of FPGAs, which are conventionally programmed through RTL design. Although recent advances in high-level synthesis (HLS) significantly improve FPGA programmability, programmers still face the challenge of identifying the optimal design configuration in a tremendous design space. This paper aims to address this challenge and pave the path from software programs towards high-quality FPGA accelerators. Specifically, we first propose the composable, parallel and pipeline (CPP) microarchitecture as a template of accelerator designs. Such a well-defined template supports efficient accelerator designs for a broad class of computation kernels and, more importantly, drastically reduces the design space. We also introduce an analytical model that captures the performance and resource trade-offs among different design configurations of the CPP microarchitecture, which lays the foundation for fast design space exploration. On top of the CPP microarchitecture and its analytical model, we develop the AutoAccel framework to automate the entire accelerator generation. AutoAccel accepts a software program as input and performs a series of code transformations, based on the result of the analytical-model-based design space exploration, to construct the desired CPP microarchitecture. Our experiments show that the AutoAccel-generated accelerators outperform their corresponding software implementations by an average of 72x for a broad class of computation kernels.

arXiv.org e-Print Archive

Crossref

Scipedia
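The analytical-model-based design space exploration described in the AutoAccel abstract can be sketched as an exhaustive search over a small configuration space. Everything in this Python sketch is hypothetical: the `Config` fields, the latency and resource formulas, the costs `pe_cost` and `pipe_overhead`, and the candidate parallel factors are illustrative assumptions, not the CPP microarchitecture's model or AutoAccel's actual cost model.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Config:
    parallel: int    # hypothetical number of replicated processing elements (PEs)
    pipelined: bool  # whether the loop body is pipelined with initiation interval 1


def latency(cfg, trip_count, depth):
    """Toy cycle count for a loop of `trip_count` iterations whose body takes `depth` cycles."""
    iters = -(-trip_count // cfg.parallel)  # ceiling division: iterations per PE
    if cfg.pipelined:
        return iters + depth - 1  # one iteration per cycle once the pipeline fills
    return iters * depth          # each iteration finishes before the next starts


def resources(cfg, pe_cost, pipe_overhead):
    """Toy resource estimate: each PE costs pe_cost, plus pipe_overhead if pipelined."""
    return cfg.parallel * (pe_cost + (pipe_overhead if cfg.pipelined else 0))


def explore(trip_count, depth, budget, pe_cost=10, pipe_overhead=5):
    """Exhaustively pick the fastest configuration that fits in `budget`.

    Latency ties are broken by lower resource use; returns None if nothing fits.
    """
    best = None
    for p, pipe in product((1, 2, 4, 8, 16, 32), (False, True)):
        cfg = Config(p, pipe)
        cost = resources(cfg, pe_cost, pipe_overhead)
        if cost > budget:
            continue
        key = (latency(cfg, trip_count, depth), cost)
        if best is None or key < best[0]:
            best = (key, cfg)
    return None if best is None else best[1]


print(explore(1024, 8, budget=100))  # Config(parallel=4, pipelined=True)
```

Because a closed-form model is cheap to evaluate, even brute-force enumeration is fast; the abstract's point is that a well-defined template keeps the space small enough for this kind of search to be practical.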

Nodal domains of the equilateral triangle billiard

Author: Jain Sudhir R.
Samajdar Rhine
Publication venue: 'IOP Publishing'
Publication date: 06/03/2014

We characterise the eigenfunctions of an equilateral triangle billiard in terms of its nodal domains. The number of nodal domains has a quadratic form in terms of the quantum numbers, with a non-trivial number-theoretic factor. The eigenfunction patterns follow a group-theoretic connection that makes them predictable from one state to the next. Extensive numerical investigations bring out the distribution functions of the mode number and signed areas. The statistics of the boundary intersections are also treated analytically. Finally, using the semiclassical trace formula, the distribution functions of the nodal loop count and the nodal counting function are shown to contain information about the classical periodic orbits. We believe that these results hold generically for non-separable systems, thus extending previous work, which concentrated on separable and chaotic systems. Comment: 26 pages, 13 figures

arXiv.org e-Print Archive

Open Access Repository of IISc Research Publications
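Counting nodal domains, the central quantity of the triangle-billiard abstract, amounts to counting connected regions on which an eigenfunction keeps one sign. The Python sketch below is a generic grid-based counter, not the authors' numerical method; the grid resolution and the `inside` mask parameter are assumptions. It is checked on the separable unit-square eigenfunction sin(mπx)·sin(nπy), whose m·n nodal domains are easy to verify; the equilateral-triangle eigenfunctions from the paper are not reproduced here, although a triangular region could be handled by passing a suitable `inside` predicate.

```python
import math
from collections import deque


def count_nodal_domains(f, nx, ny, inside=lambda x, y: True):
    """Count connected regions where f keeps one sign, on an nx-by-ny grid over [0, 1]^2.

    Each grid cell is sampled at its center; cells outside the hypothetical
    `inside` mask, or where f is exactly zero, are ignored. Four-neighbor
    connectivity keeps regions that only touch at a corner (where nodal
    lines cross) separate.
    """
    sign = {}
    for i in range(nx):
        for j in range(ny):
            x, y = (i + 0.5) / nx, (j + 0.5) / ny
            if inside(x, y):
                v = f(x, y)
                if v != 0.0:
                    sign[(i, j)] = v > 0

    seen = set()
    domains = 0
    for start in sign:
        if start in seen:
            continue
        # Breadth-first flood fill of one same-sign region.
        domains += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            i, j = queue.popleft()
            for nb in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if nb in sign and nb not in seen and sign[nb] == sign[(i, j)]:
                    seen.add(nb)
                    queue.append(nb)
    return domains


def psi(x, y, m=3, n=4):
    """Unit-square Dirichlet eigenfunction with m * n nodal domains."""
    return math.sin(m * math.pi * x) * math.sin(n * math.pi * y)


print(count_nodal_domains(psi, 120, 120))  # 12
```

The grid size is chosen as a multiple of m and n so that no sample point falls exactly on a nodal line; a finer grid is needed when nodal domains become small.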

Refactoring intermediately executed code to reduce cache capacity misses

Author: Beyls Kristof
D'Hollander Erik
Publication date: 01/01/2008

The growing memory wall requires that more attention be given to the data cache behavior of programs. This paper focuses on capacity misses, i.e., the misses that occur because the cache is smaller than the data footprint between the use and the reuse of the same data. The data footprint is measured with the reuse distance metric, which counts the distinct memory locations accessed between use and reuse. For reuse distances larger than the cache size, the associated code needs to be refactored so that the reuse distance drops below the cache size, eliminating the capacity misses. For some simple loops, the reuse distance can be calculated analytically. In most cases, however, profiling is needed to pinpoint where the program needs to be transformed for better data locality. This is done by the reuse distance visualizer, RDVIS, which shows the intermediately executed code for critical data reuses. In addition, another tool, SLO, annotates the source program with suggestions for locality optimization. Both tools have been used to analyze and refactor a number of SPEC2000 benchmark programs with very positive results.

Ghent University Academic Bibliography
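The reuse distance metric defined in the abstract above (the number of distinct memory locations accessed between a use and the next reuse) can be computed from an address trace with an LRU stack. This Python sketch is a straightforward O(N·M) version for illustration, not RDVIS or SLO. It assumes addresses are already at cache-line granularity and, when classifying misses, a hypothetical fully associative LRU cache of `cache_lines` lines, in which an access hits exactly when its reuse distance is smaller than the number of lines.

```python
def reuse_distances(trace):
    """Reuse distance of each access: the number of distinct addresses touched
    since the previous access to the same address (None for a first access).

    Uses a simple LRU stack, so it runs in O(len(trace) * distinct addresses).
    """
    stack = []  # least recently used first, most recently used last
    distances = []
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            # Every address above `addr` in the stack was touched since its last use.
            distances.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            distances.append(None)  # cold (first-time) access
        stack.append(addr)
    return distances


def classify_misses(trace, cache_lines):
    """Return (cold_misses, capacity_misses) for a hypothetical fully
    associative LRU cache holding `cache_lines` lines."""
    cold = capacity = 0
    for d in reuse_distances(trace):
        if d is None:
            cold += 1
        elif d >= cache_lines:
            capacity += 1  # footprint since last use exceeded the cache
    return cold, capacity


trace = ["a", "b", "c", "a", "b", "d", "a"]
print(reuse_distances(trace))     # [None, None, None, 2, 2, None, 2]
print(classify_misses(trace, 2))  # (4, 3)
print(classify_misses(trace, 3))  # (4, 0)
```

Going from 2 to 3 cache lines turns all three reuses into hits; the refactoring the paper describes attacks the same condition from the other side, by shrinking the reuse distance below the cache size.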

Domino tilings and the six-vertex model at its free fermion point

Author: Allison D, Reshetikhin N
Bleher P
Brak R
Cohn H
Herbert Spohn
Korepin V
Lieb E H
Patrik L Ferrari
Zinn-Justin P
Publication venue: 'IOP Publishing'
Publication date: 16/05/2006

At the free-fermion point, the six-vertex model with domain wall boundary conditions (DWBC) can be related to the Aztec diamond, a domino tiling problem. We study this mapping at the level of complete statistics for general domains and boundary conditions. This is obtained by associating to both models a set of non-intersecting lines in the Lindström-Gessel-Viennot (LGV) scheme. One consequence for DWBC is that the boundaries of the ordered phases are described by the Airy process in the thermodynamic limit. Comment: 14 pages, 8 figures

arXiv.org e-Print Archive

Crossref
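The Aztec diamond mentioned in the domino-tilings abstract is a region whose domino tilings are classically counted by 2^(n(n+1)/2) for order n. The brute-force Python counter below is only a sanity check of that count on small orders, not the LGV non-intersecting-line mapping used in the paper; the cell-coordinate convention for the diamond is an assumption of this sketch.

```python
def aztec_diamond(n):
    """Unit cells (i, j) of the order-n Aztec diamond: squares whose centers
    (i + 0.5, j + 0.5) satisfy |x| + |y| <= n (coordinate convention assumed)."""
    return [(i, j) for i in range(-n, n) for j in range(-n, n)
            if abs(i + 0.5) + abs(j + 0.5) <= n]


def count_domino_tilings(cells):
    """Count domino tilings of a set of unit cells by backtracking.

    The first uncovered cell in sorted order can only be paired with its
    neighbor at (i + 1, j) or (i, j + 1), because every cell before it in
    that order is already covered.
    """
    cells = frozenset(cells)
    order = sorted(cells)

    def count(covered, k):
        while k < len(order) and order[k] in covered:
            k += 1
        if k == len(order):
            return 1  # every cell covered: one complete tiling
        i, j = order[k]
        total = 0
        for nb in ((i + 1, j), (i, j + 1)):
            if nb in cells and nb not in covered:
                total += count(covered | {order[k], nb}, k + 1)
        return total

    return count(frozenset(), 0)


for n in range(1, 5):
    print(n, count_domino_tilings(aztec_diamond(n)), 2 ** (n * (n + 1) // 2))
# 1 2 2
# 2 8 8
# 3 64 64
# 4 1024 1024
```

Brute force grows exponentially with the region size, which is why structural mappings such as the non-intersecting-line picture are needed for the thermodynamic-limit statements in the abstract.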

Open boundary Quantum Knizhnik-Zamolodchikov equation and the weighted enumeration of Plane Partitions with symmetries

Author: Batchelor M T
Bressoud D
Ciucu M
Di Francesco P
Di Francesco P, Zinn-Justin P
Izergin A
Kasatani M, Pasquier V
Knutson A
Knutson A, Zinn-Justin P
Krattenthaler C
Lindström B
P Di Francesco
Pasquier V
Pearce P
Pearce P, Rittenberg V, de Gier J
Razumov A V
Publication venue: 'IOP Publishing'
Publication date: 15/01/2007

We propose new conjectures relating sum rules for the polynomial solution of the qKZ equation with open (reflecting) boundaries as a function of the quantum parameter $q$ and the $\tau$-enumeration of Plane Partitions with specific symmetries, with $\tau = -(q + q^{-1})$. We also find a conjectural relation à la Razumov-Stroganov between the $\tau \to 0$ limit of the qKZ solution and refined numbers of Totally Symmetric Self Complementary Plane Partitions. Comment: 27 pages, uses lanlmac, epsf and hyperbasics, minor revision

arXiv.org e-Print Archive

Crossref
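As a quick worked check of the substitution $\tau = -(q + q^{-1})$ in the qKZ abstract, writing the quantum parameter on the unit circle gives the following. The parametrization $q = e^{i\theta}$ is an assumption made here for illustration, not a choice taken from the paper.

```latex
q = e^{i\theta}
\;\Longrightarrow\;
\tau = -\left(q + q^{-1}\right) = -\left(e^{i\theta} + e^{-i\theta}\right) = -2\cos\theta ,
\qquad
\theta = \tfrac{2\pi}{3} \;\Rightarrow\; \tau = -2\cos\tfrac{2\pi}{3} = 1 ,
\qquad
\tau \to 0 \iff \cos\theta \to 0 \iff q \to \pm i .
```

Assuming, as the name suggests, that a $\tau$-enumeration weights each plane partition by a power of $\tau$, every weight equals 1 at $\tau = 1$, so the $\tau$-enumeration reduces to an ordinary count at $q = e^{2\pi i/3}$.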