Parallel Anisotropic Unstructured Grid Adaptation
Computational Fluid Dynamics (CFD) has become critical to the design and analysis of aerospace vehicles. Parallel grid adaptation that resolves multiple scales with anisotropy is identified in the CFD Vision 2030 Study as one of the challenges to increasing the capacity and capability of CFD simulation. The Study also cautions that computer architectures are undergoing a radical change and that dramatic increases in algorithm concurrency will be required to exploit their full performance. This paper reviews four different methods for parallel anisotropic grid adaptation. They cover both ends of the spectrum: (i) taking existing state-of-the-art software optimized for a single core and modifying it for parallel platforms, and (ii) designing and implementing scalable software with incomplete, but rapidly maturing, functionality. A brief overview of each grid adaptation system is presented in the context of a telescopic approach to multilevel concurrency. These methods employ different strategies to enable parallel execution, which provides a unique opportunity to illustrate the relative behavior of each approach. Qualitative and quantitative metric evaluations are used to draw lessons for future developments in this critical area of parallel CFD simulation.
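The abstract does not specify the adaptation criteria used by the four systems, but a common building block of anisotropic grid adaptation is measuring edge lengths in a metric tensor field and driving the mesh toward unit edge length in that metric. A minimal illustrative sketch (the function name and thresholds below are conventional, not taken from the paper):

```python
import numpy as np

def metric_edge_length(p, q, M):
    """Length of the edge pq measured in the SPD metric M.

    Metric-based adaptation typically refines edges whose metric length
    exceeds sqrt(2) and collapses edges shorter than 1/sqrt(2), so that
    all edges approach unit length in the metric.
    """
    e = np.asarray(q, float) - np.asarray(p, float)
    return float(np.sqrt(e @ M @ e))

# A metric requesting spacing h_x = 0.1 along x and h_y = 1.0 along y:
# M = diag(1/h_x^2, 1/h_y^2), i.e. strongly anisotropic.
M = np.diag([1.0 / 0.1**2, 1.0 / 1.0**2])

# An edge of physical length 0.1 along x has unit length in this metric.
L = metric_edge_length([0.0, 0.0], [0.1, 0.0], M)
```

The same 0.1-long edge oriented along y would have metric length 0.1, i.e. it would be a candidate for coarsening; this directional disparity is exactly what "anisotropic" adaptation exploits.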
Million Atom Electronic Structure and Device Calculations on Peta-Scale Computers
Semiconductor devices are scaled down to the level at which their constituent
materials can no longer be considered continuous. To account for atomistic
randomness, surface effects, and quantum mechanical effects, an atomistic
modeling approach must be pursued. The Nanoelectronic Modeling Tool (NEMO
3-D) has met this requirement by including empirical tight-binding models
and accounting for strain to successfully simulate various semiconductor
material systems. Computationally, however, NEMO 3-D needs significant
improvements to exploit the increasing supply of processors. This paper
introduces the new modeling tool, OMEN 3-D, and discusses its major
computational improvements: 3-D domain decomposition and multi-level
parallelism. As a featured application, a fully parallelized 3-D
Schr\"odinger-Poisson solver and its application to calculating the
bandstructure of a doped phosphorus (P) layer in silicon are demonstrated.
Impurity bands due to the donor ion potentials are computed.
Comment: 4 pages, 6 figures, IEEE proceedings of the 13th International
Workshop on Computational Electronics, Tsinghua University, Beijing, May
27-29 200
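As an illustration of the 3-D domain decomposition idea (the actual OMEN 3-D decomposition scheme is not detailed in the abstract), the following sketch assigns each process in a px × py × pz process grid a contiguous sub-box of the simulation domain. All names are illustrative:

```python
def block_extents(n, p):
    """Split n grid points into p near-equal contiguous (start, stop) chunks."""
    base, rem = divmod(n, p)
    extents, start = [], 0
    for r in range(p):
        size = base + (1 if r < rem else 0)  # first `rem` chunks get one extra
        extents.append((start, start + size))
        start += size
    return extents

def decompose_3d(shape, grid):
    """Map each rank of a px*py*pz process grid to its (x, y, z) sub-box."""
    ex = [block_extents(n, p) for n, p in zip(shape, grid)]
    boxes, rank = {}, 0
    for ix in range(grid[0]):
        for iy in range(grid[1]):
            for iz in range(grid[2]):
                boxes[rank] = (ex[0][ix], ex[1][iy], ex[2][iz])
                rank += 1
    return boxes

# Eight processes, each owning one octant of a 100^3 atomistic domain.
boxes = decompose_3d((100, 100, 100), (2, 2, 2))
```

In a real atomistic solver, each rank would build the Hamiltonian rows for the atoms inside its box and exchange halo data with neighboring boxes; the sketch only shows the ownership map.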
Hybrid PDE solver for data-driven problems and modern branching
The numerical solution of large-scale PDEs, such as those occurring in
data-driven applications, unavoidably requires powerful parallel computers and
tailored parallel algorithms to make the best possible use of them. In fact,
considerations about the parallelization and scalability of realistic problems
are often critical enough to warrant acknowledgement in the modelling phase.
The purpose of this paper is to spread awareness of the Probabilistic Domain
Decomposition (PDD) method, a fresh approach to the parallelization of PDEs
with excellent scalability properties. The idea exploits the stochastic
representation of the PDE and its approximation via Monte Carlo in combination
with deterministic high-performance PDE solvers. We describe the ingredients of
PDD and its applicability in the scope of data science. In particular, we
highlight recent advances in stochastic representations for nonlinear PDEs
using branching diffusions, which have significantly broadened the scope of
PDD.
We envision this work as a dictionary giving large-scale PDE practitioners
references on the very latest algorithms and techniques of a non-standard, yet
highly parallelizable, methodology at the interface of deterministic and
probabilistic numerical methods. We close this work with an invitation to the
fully nonlinear case and open research questions.
Comment: 23 pages, 7 figures; Final SMUR version; To appear in the European
Journal of Applied Mathematics (EJAM)
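The stochastic-representation ingredient of PDD can be illustrated in its simplest linear case: the Feynman-Kac formula for the heat equation, estimated by Monte Carlo. In PDD, pointwise estimates of this kind at subdomain interfaces decouple the subdomains, which deterministic high-performance solvers then handle independently. A hedged sketch, assuming the model problem u_t = u_xx / 2 on the real line (function names and parameters are illustrative, not from the paper):

```python
import math
import random

def mc_heat(x, t, g, n_paths=200_000, seed=0):
    """Feynman-Kac Monte Carlo for u_t = u_xx / 2 with initial datum g:
    u(x, t) = E[g(x + W_t)], where W_t ~ N(0, t) is Brownian motion.

    Each sample is an independent path endpoint, so the estimator is
    embarrassingly parallel -- the property PDD exploits at interfaces.
    """
    rng = random.Random(seed)
    s = math.sqrt(t)
    return sum(g(x + s * rng.gauss(0.0, 1.0)) for _ in range(n_paths)) / n_paths

# For g(x) = x^2 the exact solution is u(x, t) = x^2 + t, so u(1, 0.5) = 1.5.
u = mc_heat(1.0, 0.5, lambda x: x * x)
```

The nonlinear extensions mentioned in the abstract replace the single Brownian path by a branching diffusion, but the interface-decoupling role of the estimator is the same.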
Achieving High Speed CFD simulations: Optimization, Parallelization, and FPGA Acceleration for the unstructured DLR TAU Code
Today, large-scale parallel simulations are fundamental tools for handling complex problems. The number of processors in current computation platforms has recently increased, and it is therefore necessary to optimize application performance and to enhance the scalability of massively parallel systems. In addition, new heterogeneous architectures, which combine conventional processors with specific hardware such as FPGAs to accelerate the most time-consuming functions, are considered a strong alternative for boosting performance.
In this paper, the performance of the DLR TAU code is analyzed and optimized. The improvement of the code's efficiency is addressed through three key activities: optimization, parallelization, and hardware acceleration. First, a profiling analysis of the most time-consuming processes of the Reynolds-averaged Navier-Stokes flow solver on a three-dimensional unstructured mesh is performed. Then, the code's scalability is studied: new partitioning algorithms are tested to identify the most suitable ones for the selected applications. Finally, a feasibility study on the application of FPGAs and GPUs for the hardware acceleration of CFD simulations is presented.
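The abstract does not state which criteria were used to compare partitioning algorithms; two metrics conventionally used for mesh partitions are load imbalance and edge cut. A minimal sketch of both (illustrative, not the DLR TAU tooling):

```python
from collections import Counter

def partition_quality(part, edges):
    """Load imbalance and edge cut of a mesh partition.

    part  : list mapping cell index -> partition id
    edges : iterable of (i, j) adjacency pairs between cells

    Imbalance is max part size over average part size (1.0 = perfect
    balance); edge cut counts adjacencies crossing partition boundaries,
    a proxy for inter-process communication volume.
    """
    sizes = Counter(part)
    avg = len(part) / len(sizes)
    imbalance = max(sizes.values()) / avg
    edge_cut = sum(1 for i, j in edges if part[i] != part[j])
    return imbalance, edge_cut

# Four cells in a chain, split evenly in two: perfect balance, one cut edge.
part = [0, 0, 1, 1]
edges = [(0, 1), (1, 2), (2, 3)]
imb, cut = partition_quality(part, edges)
```

Comparing partitioners then amounts to trading a lower edge cut (less communication) against a tighter imbalance (less idle time), which is the kind of trade-off a scalability study exposes.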