
    From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation

    Starting from a high-level problem description in terms of partial differential equations in abstract tensor notation, the Chemora framework discretizes, optimizes, and generates complete high-performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, enabling efficient use of large-scale CPU/GPU systems for complex applications without low-level code tuning. Chemora achieves parallelism through MPI and multi-threading, combining OpenMP and CUDA. Optimizations include high-level code transformations, efficient loop traversal strategies, dynamically selected data and instruction cache usage strategies, and JIT compilation of GPU code tailored to the problem characteristics. The discretization is based on higher-order finite differences on multi-block domains. Chemora's capabilities are demonstrated by simulations of black hole collisions. This problem provides an acid test of the framework, as the Einstein equations contain hundreds of variables and thousands of terms.
    Comment: 18 pages, 4 figures, accepted for publication in Scientific Programmin

    Numerical Fitting-based Likelihood Calculation to Speed up the Particle Filter

    Calculating the likelihood of a vast number of particles is the computational bottleneck for the particle filter in applications where the observation information is rich. To compute particle likelihoods quickly, a numerical fitting approach is proposed that constructs the Likelihood Probability Density Function (Li-PDF) from a comparatively small number of so-called fulcrums. The likelihood of each particle is then inferred analytically, explicitly or implicitly, from the Li-PDF instead of being computed directly from the observation, which can significantly reduce the computation and enables real-time filtering. The proposed approach guarantees the estimation quality when an appropriate fitting function and properly distributed fulcrums are used. The construction of the fitting function and the placement of the fulcrums are each addressed in detail. In particular, to deal with multivariate fitting, a nonparametric kernel density estimator is presented, which is flexible and convenient for implicit Li-PDF implementation. Simulation comparisons with a variety of existing approaches on a benchmark 1-dimensional model, multi-dimensional robot localization, and visual tracking demonstrate the validity of the approach.
    Comment: 42 pages, 17 figures, 4 tables and 1 appendix. This paper is a draft/preprint of one paper submitted to the IEEE Transaction
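The fulcrum idea summarized in this abstract can be illustrated with a minimal 1-dimensional sketch. It is not the paper's implementation: the Gaussian observation model, the piecewise-linear fitting function, and the names `gaussian_likelihood`, `build_li_pdf`, and `fitted_likelihood` are all assumptions chosen for illustration. The expensive likelihood is evaluated only at the fulcrums, and particle likelihoods are then read off the fitted curve.

```python
import bisect
import math


def gaussian_likelihood(x, z, sigma=1.0):
    """Direct likelihood of observation z given state x.

    Assumed observation model (hypothetical): z = x + Gaussian noise.
    """
    return math.exp(-0.5 * ((z - x) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def build_li_pdf(fulcrums, z, likelihood=gaussian_likelihood):
    """Evaluate the direct likelihood only at the fulcrums."""
    xs = sorted(fulcrums)
    ys = [likelihood(x, z) for x in xs]
    return xs, ys


def fitted_likelihood(x, xs, ys):
    """Infer a particle's likelihood by piecewise-linear interpolation
    between fulcrums (an assumed fitting function), clamping outside
    the fulcrum range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect.bisect_right(xs, x)  # xs[i - 1] <= x < xs[i]
    t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])


# 21 fulcrums stand in for an arbitrarily large particle set.
fulcrums = [-5 + 0.5 * k for k in range(21)]
xs, ys = build_li_pdf(fulcrums, z=0.3)
particles = [-1.2, 0.1, 0.5, 2.7]
weights = [fitted_likelihood(p, xs, ys) for p in particles]
```

In this sketch the direct likelihood is called 21 times regardless of the number of particles. The accuracy depends on the fulcrum spacing and on the fitting function, which matches the abstract's caveat about choosing both appropriately.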

    Achieving High Speed CFD simulations: Optimization, Parallelization, and FPGA Acceleration for the unstructured DLR TAU Code

    Today, large-scale parallel simulations are fundamental tools for handling complex problems. The number of processors in current computation platforms has recently increased, so it is necessary to optimize application performance and to enhance scalability on massively parallel systems. In addition, new heterogeneous architectures, which combine conventional processors with specific hardware such as FPGAs to accelerate the most time-consuming functions, are considered a strong alternative for boosting performance. In this paper, the performance of the DLR TAU code is analyzed and optimized. The improvement of code efficiency is addressed through three key activities: optimization, parallelization, and hardware acceleration. First, a profiling analysis of the most time-consuming processes of the Reynolds-Averaged Navier-Stokes flow solver on a three-dimensional unstructured mesh is performed. Then, the scalability of the code is studied, and new partitioning algorithms are tested to identify the most suitable ones for the selected applications. Finally, a feasibility study on the application of FPGAs and GPUs for the hardware acceleration of CFD simulations is presented.