2 research outputs found
Reducing Redundancy in Data Organization and Arithmetic Calculation for Stencil Computations
Stencil computation is one of the most important kernels in various
scientific and engineering applications. A variety of work has focused on
vectorization techniques, aiming at exploiting the in-core data parallelism.
However, these techniques either incur data alignment conflicts or hurt data locality
when integrated with tiling. In this paper, a novel transpose layout is devised
to preserve the data locality for tiling in the data space and reduce the data
reorganization overhead for vectorization simultaneously. We then propose
temporal computation folding, an approach designed to further reduce
redundant arithmetic calculations by exploiting register reuse, alleviating
the increased register pressure, and generalizing the approach with a
linear regression model. Experimental results on AVX-2 and AVX-512 CPUs
show that our approach achieves competitive performance.
Comment: arXiv admin note: substantial text overlap with arXiv:2103.0882
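As a rough illustration of how consecutive time steps of a linear stencil can be combined, the sketch below composes a 3-point stencil with itself so that two time steps become a single 5-point pass over the data. It is not the paper's implementation: the function names `step` and `fold`, the periodic boundary, and the weights are assumptions chosen for illustration, and the paper's transpose layout, vectorization, and register-level folding are not modeled.

```python
def step(u, w):
    """Apply one time step of a periodic 1D stencil with odd-length weights w."""
    n, r = len(u), len(w) // 2
    return [
        sum(w[k] * u[(i + k - r) % n] for k in range(len(w)))
        for i in range(n)
    ]


def fold(w):
    """Compose a stencil with itself (hypothetical helper, not from the paper).

    The result holds the weights that apply two time steps in one pass.
    """
    out = [0.0] * (2 * len(w) - 1)
    for i, wi in enumerate(w):
        for j, wj in enumerate(w):
            out[i + j] += wi * wj
    return out


# Illustrative weights and input, assumed for this example only.
w = [0.25, 0.5, 0.25]
u = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]

two_passes = step(step(u, w), w)  # two sweeps with the 3-point stencil
one_pass = step(u, fold(w))       # one sweep with the folded 5-point stencil
```

Both variants produce the same values up to floating-point rounding; the folded form reads the input once instead of twice, which is the kind of redundancy a folding scheme could target.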
Domain-Specific Multi-Level IR Rewriting for GPU
Traditional compilers operate on a single generic intermediate representation
(IR). These IRs are usually low-level and close to machine instructions. As a
result, optimizations relying on domain-specific information are either not
possible or require complex analysis to recover the missing information. In
contrast, multi-level rewriting instantiates a hierarchy of dialects (IRs),
lowers programs level-by-level, and performs code transformations at the most
suitable level. We demonstrate the effectiveness of this approach for the
weather and climate domain. In particular, we develop a prototype compiler and
design stencil- and GPU-specific dialects based on a set of newly introduced
design principles. We find that two domain-specific optimizations (500 lines of
code) realized on top of LLVM's extensible MLIR compiler infrastructure suffice
to outperform state-of-the-art solutions. In essence, multi-level rewriting
promises to herald the age of specialized compilers composed from domain- and
target-specific dialects implemented on top of a shared infrastructure.
Comment: 12 pages, 16 figures
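The multi-level idea described in this abstract can be sketched with a toy example: a rewrite is applied at the level where it is a simple pattern match, and only then is the program lowered. Everything here is hypothetical, including the op names `high.scale` and `loop.for_each` and the helper functions; it uses plain Python tuples and is not the MLIR API or the paper's stencil and GPU dialects.

```python
def fuse_scales(prog):
    """High-level rewrite: merge consecutive ("high.scale", factor) ops."""
    out = []
    for op in prog:
        if out and op[0] == "high.scale" and out[-1][0] == "high.scale":
            out[-1] = ("high.scale", out[-1][1] * op[1])
        else:
            out.append(op)
    return out


def lower_to_loops(prog):
    """Lower each high-level op to one loop-level op over all elements."""
    return [("loop.for_each", ("mul", factor)) for _, factor in prog]


def run(loop_prog, data):
    """Interpret the loop-level program (stands in for target code generation)."""
    for _, (kind, factor) in loop_prog:
        assert kind == "mul"
        data = [x * factor for x in data]
    return data


# Hypothetical two-op program: scale by 2, then by 3.
prog = [("high.scale", 2.0), ("high.scale", 3.0)]

unoptimized = lower_to_loops(prog)             # two loops over the data
optimized = lower_to_loops(fuse_scales(prog))  # one loop after the rewrite
result = run(optimized, [1.0, 2.0])
```

At the high level, the fusion is a one-line pattern match. After lowering, the same optimization would need a loop-fusion analysis to rediscover that the two loops can be merged, which mirrors the abstract's point about recovering missing domain information.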