14,690 research outputs found
From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation
Starting from a high-level problem description in terms of partial
differential equations using abstract tensor notation, the Chemora framework
discretizes, optimizes, and generates complete high performance codes for a
wide range of compute architectures. Chemora extends the capabilities of
Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient
manner for complex applications, without low-level code tuning. Chemora
achieves parallelism through MPI and multi-threading, combining OpenMP and
CUDA. Optimizations include high-level code transformations, efficient loop
traversal strategies, dynamically selected data and instruction cache usage
strategies, and JIT compilation of GPU code tailored to the problem
characteristics. The discretization is based on higher-order finite differences
on multi-block domains. Chemora's capabilities are demonstrated by simulations
of black hole collisions. This problem provides an acid test of the framework,
as the Einstein equations contain hundreds of variables and thousands of terms.Comment: 18 pages, 4 figures, accepted for publication in Scientific
Programmin
Numerical Fitting-based Likelihood Calculation to Speed up the Particle Filter
The likelihood calculation of a vast number of particles is the computational
bottleneck for the particle filter in applications where the observation
information is rich. For fast computing the likelihood of particles, a
numerical fitting approach is proposed to construct the Likelihood Probability
Density Function (Li-PDF) by using a comparably small number of so-called
fulcrums. The likelihood of particles is thereby analytically inferred,
explicitly or implicitly, based on the Li-PDF instead of directly computed by
utilizing the observation, which can significantly reduce the computation and
enables real time filtering. The proposed approach guarantees the estimation
quality when an appropriate fitting function and properly distributed fulcrums
are used. The details for construction of the fitting function and fulcrums are
addressed respectively in detail. In particular, to deal with multivariate
fitting, the nonparametric kernel density estimator is presented which is
flexible and convenient for implicit Li-PDF implementation. Simulation
comparison with a variety of existing approaches on a benchmark 1-dimensional
model and multi-dimensional robot localization and visual tracking demonstrate
the validity of our approach.Comment: 42 pages, 17 figures, 4 tables and 1 appendix. This paper is a
draft/preprint of one paper submitted to the IEEE Transaction
Achieving High Speed CFD simulations: Optimization, Parallelization, and FPGA Acceleration for the unstructured DLR TAU Code
Today, large scale parallel simulations are fundamental tools to handle complex problems. The number of processors in current computation platforms has been recently increased and therefore it is necessary to optimize the application performance and to enhance the scalability of massively-parallel systems. In addition, new heterogeneous architectures, combining conventional processors with specific hardware, like FPGAs, to accelerate the most time consuming functions are considered as a strong alternative to boost the performance.
In this paper, the performance of the DLR TAU code is analyzed and optimized. The improvement of the code efficiency is addressed through three key activities: Optimization, parallelization and hardware acceleration. At first, a profiling analysis of the most time-consuming processes of the Reynolds Averaged Navier Stokes flow solver on a three-dimensional unstructured mesh is performed. Then, a study of the code scalability with new partitioning algorithms are tested to show the most suitable partitioning algorithms for the selected applications. Finally, a feasibility study on the application of FPGAs and GPUs for the hardware acceleration of CFD simulations is presented
Recent Advances in Graph Partitioning
We survey recent trends in practical algorithms for balanced graph
partitioning together with applications and future research directions
- …