Implementation of a parallel unstructured Euler solver on shared and distributed memory architectures
An efficient three-dimensional unstructured Euler solver is parallelized on a Cray Y-MP C90 shared memory computer and on an Intel Touchstone Delta distributed memory computer. This paper relates the experience gained and describes the software tools and hardware used in the study. The performance of the two architectures is compared.
Construction and Application of an AMR Algorithm for Distributed Memory Computers
While the parallelization of block-structured adaptive mesh refinement (AMR) techniques is relatively straightforward on shared memory architectures, appropriate distribution strategies for the emerging generation of distributed memory machines are a topic of ongoing research. In this paper, a locality-preserving domain decomposition is proposed that partitions the entire AMR hierarchy from the base level on. It is shown that the approach reduces the communication costs and simplifies the implementation. Emphasis is put on the effective parallelization of the flux correction procedure at coarse-fine boundaries, which is indispensable for conservative finite volume schemes. An easily reproducible standard benchmark and a highly resolved parallel AMR simulation of a diffracting hydrogen-oxygen detonation demonstrate the proposed strategy in practice.
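
The idea of partitioning the whole hierarchy from the base level can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not say how locality is preserved, so a Morton (Z-order) space-filling curve is assumed here as one common choice. The names `morton_key`, `partition_cells`, and `workloads` are hypothetical. Each base-level cell is assumed to carry the accumulated cost of itself and of all refined cells lying on top of it.

```python
def morton_key(i, j, bits=16):
    """Interleave the bits of (i, j) to obtain a Z-order (Morton) index."""
    key = 0
    for b in range(bits):
        key |= ((i >> b) & 1) << (2 * b)
        key |= ((j >> b) & 1) << (2 * b + 1)
    return key


def partition_cells(cells, workloads, n_ranks):
    """Assign base-level cells to ranks as contiguous chunks of the Morton curve.

    cells     : list of (i, j) base-level cell indices
    workloads : assumed cost per base cell, including all finer levels above it
    Returns a list with the owning rank of each cell.
    """
    order = sorted(range(len(cells)), key=lambda c: morton_key(*cells[c]))
    total = float(sum(workloads))
    owner = [0] * len(cells)
    running = 0.0
    for c in order:
        # Pick the rank from the workload midpoint of this cell along the curve.
        midpoint = running + 0.5 * workloads[c]
        owner[c] = min(int(midpoint * n_ranks / total), n_ranks - 1)
        running += workloads[c]
    return owner
```

Because every rank owns a contiguous piece of the curve, neighbouring cells tend to land on the same rank, and the refined cells stay with their base cell, so coarse-fine operations such as flux correction need less inter-rank communication.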
High-order, Dispersionless "Fast-Hybrid" Wave Equation Solver. Part I: Sampling Cost via Incident-Field Windowing and Recentering
This paper proposes a frequency/time hybrid integral-equation method for the
time dependent wave equation in two and three-dimensional spatial domains.
Relying on Fourier Transformation in time, the method utilizes a fixed
(time-independent) number of frequency-domain integral-equation solutions to
evaluate, with superalgebraically-small errors, time domain solutions for
arbitrarily long times. The approach relies on two main elements, namely, 1) A
smooth time-windowing methodology that enables accurate band-limited
representations for arbitrarily-long time signals, and 2) A novel Fourier
transform approach which, in a time-parallel manner and without causing
spurious periodicity effects, delivers numerically dispersionless
spectrally-accurate solutions. A similar hybrid technique can be obtained on
the basis of Laplace transforms instead of Fourier transforms, but we do not
consider the Laplace-based method in the present contribution. The algorithm
can handle dispersive media, it can tackle complex physical structures, it
enables parallelization in time in a straightforward manner, and it allows for
time leaping, that is, solution sampling at any given time $T$ at
$\mathcal{O}(1)$-bounded sampling cost, for arbitrarily large values of $T$,
and without requirement of evaluation of the solution at intermediate times.
The proposed frequency-time hybridization strategy, which generalizes to any
linear partial differential equation in the time domain for which
frequency-domain solutions can be obtained (including e.g. the time-domain
Maxwell equations), and which is applicable in a wide range of scientific and
engineering contexts, provides significant advantages over other available
alternatives such as volumetric discretization, time-domain integral equations,
and convolution-quadrature approaches.
Comment: 33 pages, 8 figures, revised and extended manuscript (and now
including direct comparisons to existing CQ and TDIE solver implementations)
(Part I of II)
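
The smooth time-windowing element can be pictured with a generic partition of unity. The Python sketch below is an illustration under stated assumptions, not the window function used in the paper: `smooth_step`, `window`, and the window spacing `H` are hypothetical names, and the step is built from the standard infinitely smooth function exp(-1/u).

```python
import math


def smooth_step(u):
    """Infinitely smooth step: 0 for u <= 0, 1 for u >= 1."""
    if u <= 0.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    a = math.exp(-1.0 / u)
    b = math.exp(-1.0 / (1.0 - u))
    return a / (a + b)


def window(t, k, H):
    """Smooth window centred at k*H, supported on [(k-1)*H, (k+1)*H]."""
    u = (t - k * H) / H
    return smooth_step(u + 1.0) * (1.0 - smooth_step(u))


def windowed_pieces(f, t, H, k_range):
    """Split f(t) into compactly supported pieces w_k(t) * f(t)."""
    return [window(t, k, H) * f(t) for k in k_range]
```

Neighbouring windows sum to one, so summing the pieces returns the original signal, while each piece lives on an interval of length 2H regardless of how long the full signal is.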
Recent Advances in Graph Partitioning
We survey recent trends in practical algorithms for balanced graph
partitioning, together with applications and future research directions.
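
For orientation, balanced graph partitioning is usually stated as minimizing the number of cut edges while keeping every block at most (1 + eps) * ceil(n / k) vertices. The following Python sketch only evaluates these two quantities for a given assignment; the function names and the list-based `block` representation are assumptions for illustration and are not taken from the survey.

```python
import math


def edge_cut(edges, block):
    """Number of edges whose endpoints lie in different blocks."""
    return sum(1 for u, v in edges if block[u] != block[v])


def imbalance(block, k):
    """Relative excess of the largest block over the ideal size ceil(n / k)."""
    sizes = [0] * k
    for b in block:
        sizes[b] += 1
    return max(sizes) / math.ceil(len(block) / k) - 1.0
```

A partition is acceptable for a tolerance eps when `imbalance(block, k) <= eps`; among acceptable partitions, a smaller `edge_cut` is better.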
Achieving High Speed CFD simulations: Optimization, Parallelization, and FPGA Acceleration for the unstructured DLR TAU Code
Today, large-scale parallel simulations are fundamental tools for handling complex problems. The number of processors in current computing platforms has recently increased, so it is necessary to optimize application performance and to enhance scalability on massively parallel systems. In addition, new heterogeneous architectures, which combine conventional processors with specialized hardware such as FPGAs to accelerate the most time-consuming functions, are considered a strong alternative for boosting performance.
In this paper, the performance of the DLR TAU code is analyzed and optimized. The improvement of the code efficiency is addressed through three key activities: optimization, parallelization, and hardware acceleration. First, a profiling analysis of the most time-consuming processes of the Reynolds-averaged Navier-Stokes flow solver on a three-dimensional unstructured mesh is performed. Then, the scalability of the code is studied with new partitioning algorithms to identify the most suitable ones for the selected applications. Finally, a feasibility study on the application of FPGAs and GPUs for the hardware acceleration of CFD simulations is presented.