
    PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR

    Deep neural networks (DNNs) are critical in many domains. To accelerate DNN computation, tensor compilers have been proposed to generate efficient code for various domain-specific accelerators. Existing tensor compilers mainly focus on optimizing computational efficiency. However, memory access is becoming a key performance bottleneck because the computational performance of accelerators is increasing much faster than memory performance. Because the intermediate representations (IRs) of current tensor compilers lack a direct description of memory access and data dependence, generating memory-efficient code is significantly challenging. In this paper, we propose IntelliGen, a tensor compiler that generates high-performance code for memory-intensive operators by considering both computation and data movement optimizations. IntelliGen represents a DNN program using GIR, which includes primitives describing its computation, data movement, and parallelization strategies. This information is further composed into an instruction-level dataflow graph, on which holistic optimizations are performed by searching over different memory access patterns and computation operations, yielding memory-efficient code on different hardware. We evaluate IntelliGen on an NVIDIA GPU, an AMD GPU, and a Cambricon MLU, showing speedups of up to 1.97×, 2.93×, and 16.91× (1.28×, 1.23×, and 2.31× on average), respectively, over the current most performant frameworks.
    Comment: 12 pages, 14 figures

    OLLIE: Derivation-based Tensor Program Optimizer

    Boosting the runtime performance of deep neural networks (DNNs) is critical due to their wide adoption in real-world tasks. Existing approaches to optimizing the tensor algebra expression of a DNN only consider expressions representable by a fixed set of predefined operators, missing optimization opportunities that arise between general expressions. We propose OLLIE, the first derivation-based tensor program optimizer. OLLIE optimizes tensor programs by leveraging transformations between general tensor algebra expressions, enabling a significantly larger expression search space that includes those supported by prior work as special cases. OLLIE uses a hybrid derivation-based optimizer that effectively combines explorative and guided derivations to quickly discover highly optimized expressions. Evaluation on seven DNNs shows that OLLIE can outperform existing optimizers by up to 2.73× (1.46× on average) on an A100 GPU and by up to 2.68× (1.51× on average) on a V100 GPU.

    Boussinesq Simulation of Coastal Wave Interaction with Bottom-Mounted Porous Structures

    A Boussinesq-type wave model is developed in this paper to simulate the interaction of coastal waves with bottom-mounted porous structures. The governing equations are rewritten in conservative form to facilitate the use of a hybrid finite volume (FV) and finite difference (FD) method. Higher-order slope terms are also added to the equations to account for rapidly varying bathymetry. The convective flux is approximated using the FV method, while the remaining terms are discretized using the FD method on a uniform rectangular grid. Time integration uses a third-order Runge–Kutta method with an adaptive time step. The computation is also parallelized on a single GPU to reduce computational cost. The numerical model is validated against a series of experimental datasets, including data acquired in a new laboratory experiment. The predictions agree well overall with the measurements, demonstrating that the model can handle wave interaction with porous structures in the coastal region across a wide range of scenarios.
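    The abstract does not say which third-order Runge–Kutta variant or which time-step criterion the model uses. As a hedged illustration of the general pattern (FV flux residual, third-order RK time marching, time step recomputed from the current state), the sketch below assumes the common three-stage strong-stability-preserving (SSP) RK3 in Shu-Osher form and a CFL-based step with CFL number 0.5. It is applied to a toy 1D inviscid Burgers' equation with a first-order Rusanov flux rather than the paper's Boussinesq equations. All function names and parameter values are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def rusanov_residual(u, dx):
    """First-order finite-volume residual L(u) for Burgers' equation
    u_t + (u^2 / 2)_x = 0 on a periodic grid, using a Rusanov
    (local Lax-Friedrichs) flux at each cell face (toy stand-in)."""
    u_l, u_r = u, np.roll(u, -1)                # states on either side of face i+1/2
    f_l, f_r = 0.5 * u_l**2, 0.5 * u_r**2
    a = np.maximum(np.abs(u_l), np.abs(u_r))    # local bound on wave speed |f'(u)| = |u|
    face_flux = 0.5 * (f_l + f_r) - 0.5 * a * (u_r - u_l)
    # F_{i+1/2} - F_{i-1/2}; the differences telescope, so total mass is conserved.
    return -(face_flux - np.roll(face_flux, 1)) / dx


def ssp_rk3_step(u, dt, dx):
    """One step of the three-stage, third-order SSP Runge-Kutta scheme (Shu-Osher form)."""
    u1 = u + dt * rusanov_residual(u, dx)
    u2 = 0.75 * u + 0.25 * (u1 + dt * rusanov_residual(u1, dx))
    return u / 3.0 + (2.0 / 3.0) * (u2 + dt * rusanov_residual(u2, dx))


def cfl_time_step(u, dx, cfl=0.5):
    """Adaptive step dt = cfl * dx / max|wave speed| (assumed criterion)."""
    max_speed = max(np.max(np.abs(u)), 1e-12)   # guard against division by zero
    return cfl * dx / max_speed


def integrate(u0, dx, t_end, cfl=0.5):
    """March from t = 0 to t_end, recomputing dt from the current state each step."""
    u, t, dts = u0.astype(float).copy(), 0.0, []
    while t_end - t > 1e-12:
        dt = min(cfl_time_step(u, dx, cfl), t_end - t)  # shorten the last step to hit t_end
        u = ssp_rk3_step(u, dt, dx)
        t += dt
        dts.append(dt)
    return u, t, dts


# Usage: a smooth periodic profile advanced to t = 0.3 (just before a shock forms).
x = np.linspace(0.0, 1.0, 200, endpoint=False)
dx = x[1] - x[0]
u0 = 1.0 + 0.5 * np.sin(2.0 * np.pi * x)
u, t, dts = integrate(u0, dx, t_end=0.3)
```

    Because dt is recomputed from the current maximum wave speed, the step grows or shrinks as the solution evolves. In a depth-integrated wave model the wave-speed bound would typically combine the flow speed with the shallow-water celerity sqrt(gh); that substitution is likewise an assumption here.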
