
2 research outputs found

Reducing Redundancy in Data Organization and Arithmetic Calculation for Stencil Computations

Authors: Cao Hang, Li Kun, Lu Pengqi, Yuan Liang, Yue Yue, Zhang Yunquan
Publication date: 16/03/2021

Stencil computation is one of the most important kernels in various scientific and engineering applications. Much prior work has focused on vectorization techniques that aim to exploit in-core data parallelism. However, these techniques either incur data alignment conflicts or hurt data locality when integrated with tiling. In this paper, a novel transpose layout is devised that preserves data locality for tiling in the data space and, at the same time, reduces the data reorganization overhead of vectorization. We then propose a temporal computation folding approach that further reduces redundant arithmetic by exploiting register reuse, alleviating the resulting increase in register pressure, and generalizing the scheme with a linear regression model. Experimental results on AVX-2 and AVX-512 CPUs show that our approach obtains competitive performance.

Comment: arXiv admin note: substantial text overlap with arXiv:2103.0882

arXiv.org e-Print Archive
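
As a rough illustration of the kind of kernel the abstract above refers to, the sketch below applies one sweep of a 1D 3-point stencil in two equivalent ways: an element-by-element loop, and a form that combines three shifted views of the input. The function names, the weights, and the fixed-boundary treatment are assumptions made for this example; it does not reproduce the paper's transpose layout or temporal folding.

```python
# Hypothetical illustration: names, weights, and boundary handling are
# assumptions for this sketch, not taken from the paper.

def stencil_loop(u, w=(0.25, 0.5, 0.25)):
    """One sweep of a 1D 3-point stencil, one element at a time."""
    out = list(u)  # boundary points keep their input values
    for i in range(1, len(u) - 1):
        out[i] = w[0] * u[i - 1] + w[1] * u[i] + w[2] * u[i + 1]
    return out


def stencil_shifted(u, w=(0.25, 0.5, 0.25)):
    """Same sweep, written as a combination of three shifted views."""
    interior = [
        w[0] * left + w[1] * centre + w[2] * right
        for left, centre, right in zip(u[:-2], u[1:-1], u[2:])
    ]
    return [u[0]] + interior + [u[-1]]


u0 = [float(i * i) for i in range(8)]
result_loop = stencil_loop(u0)
result_shifted = stencil_shifted(u0)
```

The shifted form mirrors how a SIMD implementation reads the same array at offsets -1, 0, and +1. Those overlapping, mutually offset reads are the usual source of the alignment conflicts and data reorganization cost that the abstract mentions.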

Domain-Specific Multi-Level IR Rewriting for GPU

Authors: Davis Eddie, Fuhrer Oliver, Grosser Tobias, Gysi Tobias, Herhut Stephan, Hoefler Torsten, Müller Christoph, Wicky Tobias, Zinenko Oleksandr
Publication date: 27/07/2020

Traditional compilers operate on a single generic intermediate representation (IR). These IRs are usually low-level and close to machine instructions. As a result, optimizations relying on domain-specific information are either not possible or require complex analysis to recover the missing information. In contrast, multi-level rewriting instantiates a hierarchy of dialects (IRs), lowers programs level-by-level, and performs code transformations at the most suitable level. We demonstrate the effectiveness of this approach for the weather and climate domain. In particular, we develop a prototype compiler and design stencil- and GPU-specific dialects based on a set of newly introduced design principles. We find that two domain-specific optimizations (500 lines of code) realized on top of LLVM's extensible MLIR compiler infrastructure suffice to outperform state-of-the-art solutions. In essence, multi-level rewriting promises to herald the age of specialized compilers composed from domain- and target-specific dialects implemented on top of a shared infrastructure.

Comment: 12 pages, 16 figures

arXiv.org e-Print Archive
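
The idea of rewriting at the most suitable level, as described in the abstract above, can be sketched with a toy two-level pipeline. Everything here is hypothetical: the op names (`scale`, `shift`, `for_each`), the tuple encoding, and the functions are invented for illustration and do not use MLIR's API or the paper's dialects. A high-level rewrite fuses adjacent scaling ops while they are still easy to recognise, and only then is the program lowered to explicit loops.

```python
# Toy two-level IR. All op names and helpers are hypothetical; this is
# not MLIR and not the dialects designed in the paper.

def fuse_scales(ops):
    """High-level rewrite: merge adjacent ('scale', k) ops into one."""
    fused = []
    for op in ops:
        if fused and op[0] == "scale" and fused[-1][0] == "scale":
            fused[-1] = ("scale", fused[-1][1] * op[1])
        else:
            fused.append(op)
    return fused


def lower_to_loops(ops):
    """Lowering: each high-level op becomes one explicit loop-level op."""
    lowered = []
    for name, arg in ops:
        if name == "scale":
            lowered.append(("for_each", "mul", arg))
        elif name == "shift":
            lowered.append(("for_each", "add", arg))
        else:
            raise ValueError(f"unknown high-level op: {name}")
    return lowered


def run_loops(loops, data):
    """Interpret the loop-level IR: one full pass over the data per op."""
    data = list(data)
    for _, kind, arg in loops:
        for i in range(len(data)):
            data[i] = data[i] * arg if kind == "mul" else data[i] + arg
    return data


program = [("scale", 2.0), ("scale", 3.0), ("shift", 1.0)]
naive = lower_to_loops(program)
optimized = lower_to_loops(fuse_scales(program))
```

Both lowered programs compute the same values, but the optimized one makes one fewer pass over the data. Recovering the same fusion from the loop-level form alone would require analysing loop bodies, which is the kind of extra analysis the abstract attributes to single-level, low-level IRs.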