13 research outputs found

    Production Level CFD Code Acceleration for Hybrid Many-Core Architectures

    In this work, a novel graphics processing unit (GPU) distributed sharing model for hybrid many-core architectures is introduced and employed in the acceleration of a production-level computational fluid dynamics (CFD) code. The latest generation of graphics hardware allows multiple processor cores to simultaneously share a single GPU through concurrent kernel execution. This feature has allowed the NASA FUN3D code to be accelerated in parallel with up to four processor cores sharing a single GPU. To scale and fully use the resources of these and next-generation machines, codes will need to employ some type of GPU sharing model, such as the one presented in this work. Findings include the effects of GPU sharing on overall performance. A discussion of the inherent challenges that parallel unstructured CFD codes face in accelerator-based computing environments is included, with considerations for future-generation architectures. This work was completed by the author in August 2010 and reflects the analysis and results of that time.
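
    The sharing pattern described above can be sketched in a few lines: several host threads, standing in for processor cores or MPI ranks, each submit work to the same physical GPU, and the runtime is free to overlap the kernels. The sketch below is not FUN3D's implementation (which targeted CUDA on Fermi-class hardware); it is a minimal modern analogue in SYCL, assuming a SYCL 2020 compiler and an available GPU device, with a trivial kernel standing in for a CFD kernel.

    #include <sycl/sycl.hpp>
    #include <thread>
    #include <vector>

    int main() {
        const int n_ranks = 4;                    // processor cores sharing the one GPU
        const size_t n = 1 << 20;
        sycl::device gpu{sycl::gpu_selector_v};   // the single shared device

        auto rank_work = [&]() {
            sycl::queue q{gpu};   // each "rank" talks to the GPU through its own queue
            double *x = sycl::malloc_device<double>(n, q);
            q.fill(x, 1.0, n).wait();
            // Kernels submitted on independent queues may execute concurrently,
            // which is the sharing mechanism the paper exploits.
            q.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
                x[i] = 2.0 * x[i] + 1.0;          // stand-in for a real CFD kernel
            }).wait();
            sycl::free(x, q);
        };

        std::vector<std::thread> ranks;
        for (int r = 0; r < n_ranks; ++r) ranks.emplace_back(rank_work);
        for (auto &t : ranks) t.join();
        return 0;
    }

    In the paper the sharing is across processes rather than threads, but the enabling feature is the same: the hardware's ability to execute kernels from multiple submission streams concurrently.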

    Optimization of Ported CFD Kernels on Intel Data Center GPU Max 1550 Using oneAPI ESIMD

    We describe our experience porting FUN3D's CUDA-optimized kernels to Intel oneAPI SYCL. We faced several challenges, foremost the suboptimal performance of the oneAPI code on Intel's new data center GPU, due primarily to high register spills, memory latency, and poor vectorization. We addressed these issues by implementing the kernels using Intel oneAPI's Explicit SIMD SYCL extension (ESIMD) API. The ESIMD API enables the writing of explicitly vectorized kernel code, gives more precise control over register usage and prefetching, and handles thread divergence better than SYCL. The ESIMD code outperforms the optimized SYCL code by up to a factor of 3.6, depending on the kernel. We also compared the performance of three ESIMD kernels on the Intel Data Center GPU Max 1550 with the CUDA-optimized versions on NVIDIA V100 and A100 GPUs. We found the performance of a single tile of the Intel GPU using ESIMD to be greater than that of the V100 and similar to that of the A100.
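
    To make the contrast with standard SYCL concrete, the sketch below shows the flavor of ESIMD: each work-item owns an entire vector of lanes, and loads, arithmetic, and stores operate on explicit fixed-width registers under programmer control. This is a minimal vector-add in the style of Intel's ESIMD samples, not one of the paper's CFD kernels; it assumes the oneAPI DPC++ compiler and a supported Intel GPU.

    #include <sycl/sycl.hpp>
    #include <sycl/ext/intel/esimd.hpp>

    namespace esimd = sycl::ext::intel::esimd;
    constexpr int VL = 16;                     // explicit vector length per work-item

    int main() {
        sycl::queue q{sycl::gpu_selector_v};
        const size_t n = 1 << 20;              // assumed divisible by VL
        float *a = sycl::malloc_shared<float>(n, q);
        float *b = sycl::malloc_shared<float>(n, q);
        for (size_t i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

        // One work-item per VL elements; each handles a whole vector explicitly.
        q.parallel_for(sycl::range<1>(n / VL), [=](sycl::id<1> i) SYCL_ESIMD_KERNEL {
            size_t off = i[0] * VL;
            esimd::simd<float, VL> va(a + off);   // explicit 16-wide vector load
            esimd::simd<float, VL> vb(b + off);
            vb = va * 2.0f + vb;                  // register-resident vector arithmetic
            vb.copy_to(b + off);                  // explicit vector store
        }).wait();

        sycl::free(a, q);
        sycl::free(b, q);
        return 0;
    }

    Because the vector width, loads, and stores are spelled out, the compiler no longer has to discover vectorization on its own, and register usage can be tuned directly; this is the kind of control the authors credit for the speedups over the SYCL versions.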

    Geometry Modeling for Unstructured Mesh Adaptation

    The quantification and control of discretization error is critical to obtaining reliable simulation results. Adaptive mesh techniques have the potential to automate discretization error control, but have made limited impact on production analysis workflows. Recent progress has matured a number of independent implementations of flow solvers, error estimation methods, and anisotropic mesh adaptation mechanics. However, the poor integration of initial mesh generation and adaptive mesh mechanics with typical sources of geometry has hindered the adoption of adaptive mesh techniques, where these geometries are often created in Mechanical Computer-Aided Design (MCAD) systems. The difficulty of this coupling is compounded by two factors: the inherent complexity of the model (e.g., large range of scales, bodies in proximity, details not required for analysis) and unintended geometry construction artifacts (e.g., translation, uneven parameterization, degeneracy, self-intersection, sliver faces, gaps, large tolerances between topological elements, local high curvature to enforce continuity). Manual preparation of geometry is commonly employed to enable fixed-grid and adaptive-grid workflows by reducing the severity and negative impacts of these construction artifacts, but manual process interaction inhibits workflow automation. Techniques to permit the use of complex geometry models and to reduce the impact of geometry construction artifacts on unstructured grid workflows are presented. Models from the AIAA Sonic Boom and High Lift Prediction Workshops are shown to demonstrate the utility of the current approach.

    Computational Methods in Science and Engineering : Proceedings of the Workshop SimLabs@KIT, November 29 - 30, 2010, Karlsruhe, Germany

    In this proceedings volume we provide a compilation of article contributions covering applications from different research fields in equal measure, ranging from capacity up to capability computing. Besides classical computing aspects such as parallelization, the focus of these proceedings is on multi-scale approaches and methods for tackling algorithm and data complexity. Practical aspects regarding the use of the HPC infrastructure and the tools and software available at the SCC are also presented.

    Asynchronous versions of Jacobi, multigrid, and Chebyshev solvers

    Iterative methods are commonly used for solving large, sparse systems of linear equations on parallel computers. Implementations of parallel iterative solvers contain kernels (e.g., parallel sparse matrix-vector products) in which parallel processes alternate between phases of computation and communication. Standard software packages use synchronous implementations with one or more synchronization points per iteration. These synchronization points occur during communication phases, where each process sends data to other processes and idles until all data needed for the next iteration is received. Synchronization points scale poorly on massively parallel machines and may become the primary bottleneck for future exascale computers. This calls for research and development of asynchronous iterative methods, which is the subject of this dissertation.

    In asynchronous iterative methods there are no synchronization points: after a phase of computation, processes immediately proceed to the next phase of computation using whatever data is currently available. Since the late 1960s, research on asynchronous methods has primarily considered basic fixed-point methods, e.g., Jacobi, with a focus on proving asymptotic convergence bounds. However, the practical behavior of asynchronous methods is not well understood, and asynchronous versions of certain fast-converging solvers have not been developed. This dissertation focuses on studying the practical behavior of asynchronous Jacobi, developing new communication-avoiding asynchronous iterative solvers, and introducing the first asynchronous versions of multigrid and Chebyshev.

    To better understand the practical behavior of asynchronous Jacobi, we examine a model in which communication delays are neglected, which we call simplified asynchronous Jacobi. This model describes asynchronous Jacobi implemented in shared memory, or in distributed memory with fast communication networks. We analyze simplified asynchronous Jacobi for linear systems where the coefficient matrix is symmetric positive-definite and compare our analysis to experimental results from shared- and distributed-memory implementations. We present three important results for asynchronous Jacobi: it can converge when synchronous Jacobi does not, it can reduce the residual norm when some processes are delayed, and its convergence rate can increase with increasing parallelism.

    We develop new asynchronous communication-avoiding methods using the idea of the sequential Southwell method, which converges faster than Gauss-Seidel by relaxing, at each iteration, the equation whose residual component is largest in absolute value. We use this idea of choosing large residual values to create communication-avoiding parallel methods in which residual values of communication neighbors are compared rather than computing a global maximum. We present three such methods: Parallel Southwell, Distributed Southwell, and Stochastic Parallel Southwell. All of our methods converge faster than Jacobi and use less communication.

    We introduce the first asynchronous multigrid methods, using the idea of additive multigrid, where smoothing on all grids is carried out concurrently. We present models of asynchronous additive multigrid and use these models to study its convergence properties. We also introduce algorithms for implementing asynchronous multigrid in shared and distributed memory. Our experimental results show that asynchronous multigrid can exhibit grid-size-independent convergence and can be faster than classical multigrid in terms of wall-clock time.

    Lastly, we present the first asynchronous Chebyshev methods, including models of Jacobi-preconditioned asynchronous Chebyshev. We use a little-known version of the BPX multigrid preconditioner in which BPX is written as Jacobi on an extended system, making it convenient for asynchronous execution within Chebyshev. Our experimental results show that asynchronous Chebyshev is faster than its synchronous counterpart in terms of both wall-clock time and number of iterations.
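
    The asynchronous model is easy to state in code. The sketch below is a minimal shared-memory asynchronous Jacobi for a toy problem (the 1-D Laplacian, a symmetric positive-definite tridiagonal system whose exact solution is all ones; not an example from the dissertation): each thread sweeps over its own block of unknowns using whatever neighbor values happen to be visible at that moment, with relaxed atomics and no barriers between sweeps.

    #include <atomic>
    #include <cstdio>
    #include <thread>
    #include <vector>

    int main() {
        const int n = 64, sweeps = 5000, n_threads = 4;
        // A = tridiag(-1, 2, -1) and b = A * ones, so the exact solution is all ones.
        auto b = [n](int i) { return (i == 0 || i == n - 1) ? 1.0 : 0.0; };
        std::vector<std::atomic<double>> x(n);
        for (auto &xi : x) xi.store(0.0, std::memory_order_relaxed);

        // Jacobi relaxation of row i: x[i] = (b[i] + x[i-1] + x[i+1]) / 2.
        // Each thread owns a block of unknowns and never waits for the others;
        // it simply reads whatever values of x are currently visible.
        auto worker = [&](int lo, int hi) {
            for (int k = 0; k < sweeps; ++k)
                for (int i = lo; i < hi; ++i) {
                    double left  = (i > 0)     ? x[i - 1].load(std::memory_order_relaxed) : 0.0;
                    double right = (i < n - 1) ? x[i + 1].load(std::memory_order_relaxed) : 0.0;
                    x[i].store((b(i) + left + right) / 2.0, std::memory_order_relaxed);
                }
        };

        std::vector<std::thread> pool;
        const int chunk = n / n_threads;
        for (int t = 0; t < n_threads; ++t)
            pool.emplace_back(worker, t * chunk, (t + 1) * chunk);
        for (auto &t : pool) t.join();

        std::printf("x[n/2] = %.4f (exact: 1.0)\n", x[n / 2].load());
        return 0;
    }

    The Southwell-type methods described above refine this pattern: instead of sweeping unconditionally, a process relaxes its unknowns only when its residual is larger than those of its communication neighbors, trading a local comparison for the global maximum used by sequential Southwell.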

    Proceedings, MSVSCC 2018

    Proceedings of the 12th Annual Modeling, Simulation & Visualization Student Capstone Conference, held on April 19, 2018 at VMASC in Suffolk, Virginia. 155 pp.

    Proceedings of the 2018 Canadian Society for Mechanical Engineering (CSME) International Congress

    Published proceedings of the 2018 Canadian Society for Mechanical Engineering (CSME) International Congress, hosted by York University, 27-30 May 2018.

    Undergraduate Course Catalog 2015-2016


    Undergraduate Academic Catalog 2020-2021
