Swept time-space domain decomposition on GPUs and heterogeneous computing systems
Modern scientific and engineering problems often require simulations with a level of resolution difficult to achieve in reasonable amounts of time, even in effectively parallelized programs. Therefore, applications that exploit high performance computing (HPC) systems have become invaluable in academia and industry over the past two decades. Addressing the questions that arise from continual scientific advancement requires hardware and software solutions that supply the necessary throughput demanded across scientific disciplines.
The most important development on the hardware side has been the General Purpose Graphics Processing Unit (GPGPU), a class of massively parallel devices that now compose a substantial portion of the computational power of the top 500 supercomputers. As these systems grow, barriers to increased performance arise from small costs accumulated over innumerable iterations, such as latency: the fixed cost of a memory access, which becomes significantly larger when the access requires communication between two distant CPU processes. This thesis implements and analyzes swept time-space domain decomposition, a communication-avoiding scheme for time-stepping stencil codes, on GPGPU and heterogeneous (CPU/GPU) architectures.
The GPGPU program significantly improves the execution time of finite-difference solvers for relatively simple one-dimensional time-stepping partial differential equations (PDEs). The swept decomposition code showed speedups of 2-9x compared with simple GPU domain decompositions and 7-300x compared with parallel CPU versions over a range of problem sizes: 10^3-10^6 spatial points. However, for a more sophisticated one-dimensional system of equations discretized with a second-order finite-volume scheme, the swept rule performs 1.2-1.9x worse than a standard implementation for all problem sizes. The program targeting heterogeneous systems with distributed memory performs significantly better on both simple problems, with speedups of 4-18x, and more complex equation systems, with speedups of 1.5-3x, over a range of problem sizes: 10^5-10^7 spatial points. This demonstrates the benefit of GPU architecture and the contingent effectiveness of swept time-space decomposition for accelerating explicit PDE solvers on current computational architectures.
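The core idea of the swept rule can be sketched in a few lines: with a three-point stencil, a sub-domain can advance several time steps before any halo exchange, at the cost of a valid region that shrinks by one point per step, tracing a triangle in time-space. The following minimal sketch is our own illustration, not the thesis code; the forward-Euler heat-equation stencil and all names are assumptions. It checks that the surviving "triangle" core of an uncommunicated block matches a fully global solve.

```python
def step(u, nu=0.25):
    """One explicit 3-point stencil update; endpoints are held fixed."""
    return [u[0]] + [u[i] + nu * (u[i - 1] - 2 * u[i] + u[i + 1])
                     for i in range(1, len(u) - 1)] + [u[-1]]

def swept_triangle(block, steps, nu=0.25):
    """Advance a block 'steps' times with no halo exchange.

    Because no neighbor data arrives, only indices [steps, n - steps)
    remain trustworthy: the valid region shrinks by one point per step.
    """
    u = list(block)
    for _ in range(steps):
        u = step(u, nu)
    return u[steps:len(u) - steps]          # the core of the triangle

# Compare against a fully global solve: the triangle core must match.
n, s = 20, 3
u0 = [float((3 * i) % 7) for i in range(n)]
ref = list(u0)
for _ in range(s):
    ref = step(ref)
core = swept_triangle(u0[4:16], s)          # valid for global indices 7..12
assert core == ref[7:13]
```

The payoff is that one communication suffices for `s` time steps instead of `s` separate halo exchanges, which is exactly the latency saving the abstract describes.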
Accelerating solutions of one-dimensional unsteady PDEs with GPU-based swept time-space decomposition
The expedient design of precision components in aerospace and other high-tech industries requires simulations of physical phenomena often described by partial differential equations (PDEs) without exact solutions. Modern design problems require simulations with a level of resolution difficult to achieve in reasonable amounts of time, even in effectively parallelized solvers. Though the scale of the problem relative to available computing power is the greatest impediment to accelerating these applications, significant performance gains can be achieved through careful attention to the details of memory communication and access. The swept time-space decomposition rule reduces communication between sub-domains by exhausting the domain of influence before communicating boundary values. Here we present a GPU implementation of the swept rule, which modifies the algorithm for improved performance on this processing architecture by prioritizing use of private (shared) memory, avoiding interblock communication, and overwriting unnecessary values. It shows significant improvement in the execution time of finite-difference solvers for one-dimensional unsteady PDEs, producing speedups of 2-9x compared with simple GPU versions and 7-300x compared with parallel CPU versions over a range of problem sizes. However, for a more sophisticated one-dimensional system of equations discretized with a second-order finite-volume scheme, the swept rule performs 1.2-1.9x worse than a standard implementation for all problem sizes. (C) 2017 Elsevier Inc. All rights reserved.
A survey of computational aerodynamics in the United States
Programs in theoretical and computational aerodynamics in the United States are described. Those aspects of programs that relate to aeronautics are detailed. The role of analysis at various levels of sophistication is discussed as well as the inverse solution techniques that are of primary importance in design methodology. The research is divided into the broad categories of application for boundary layer flow, Navier-Stokes turbulence modeling, internal flows, two-dimensional configurations, subsonic and supersonic aircraft, transonic aircraft, and the space shuttle. A survey of representative work in each area is presented.
A fast Fourier transform approach to dislocation-based polycrystal plasticity
Polycrystalline materials serve as a basis for much of our current technology and will undoubtedly continue to serve a similar role in the future. Their mechanical properties depend not only on intragranular interactions between various defects, including the distribution of sizes and orientations of the grains, but also on interactions with the grain boundaries. Modeling the mechanical behavior of polycrystals has become a standard part of the multiscale treatment of deformation. Currently, polycrystalline simulations are done through crystal plasticity methods, which are often informed by elastically isotropic single-crystal dislocation dynamics studies. These single-crystal studies, however, miss crucial effects due to the presence of grain boundaries, and as such, a corrective factor has to be applied when passing the output to higher-scale methods. In addition, these studies are generally done under an assumption of isotropic elasticity, due to the computational expense incurred when including anisotropic calculations. I have developed a Fourier transform-based spectral method that allows for the simulation of the evolution of defects, such as dislocations, in heterogeneous systems. This method allows for a more accurate understanding of the interplay between defects and their environment, and will have the capability to determine more accurate constitutive laws for the deformation of polycrystals, to be fed into crystal plasticity models.
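The kernel operation of such an FFT-based spectral method is solving a periodic field equation by an algebraic, per-frequency division in Fourier space. A hedged one-dimensional sketch of that kernel, with a periodic Poisson model problem u'' = f standing in for the elastic equilibrium operator (all names are ours, not the thesis code):

```python
import numpy as np

# Model problem: u'' = f on a 2*pi-periodic domain. In Fourier space the
# operator d^2/dx^2 becomes multiplication by (i*k)^2 = -k^2, so solving
# reduces to a division per wavenumber -- the essence of spectral methods.
n = 64
x = 2 * np.pi * np.arange(n) / n
f = np.sin(3 * x)                       # source with known solution -sin(3x)/9

k = np.fft.fftfreq(n, d=1.0 / n)        # integer wavenumbers for this grid
f_hat = np.fft.fft(f)
u_hat = np.zeros_like(f_hat)
nz = k != 0                             # zero mode fixed (mean set to zero)
u_hat[nz] = -f_hat[nz] / k[nz] ** 2     # invert -k^2 * u_hat = f_hat
u = np.fft.ifft(u_hat).real

assert np.allclose(u, -np.sin(3 * x) / 9)
```

The same transform-divide-inverse pattern generalizes to heterogeneous and anisotropic operators through iterative schemes that apply the Fourier-space inverse of a reference medium as a preconditioner.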
Coupling different discretizations for fluid structure interaction in a monolithic approach
In this thesis we present a monolithic coupling approach for the simulation of phenomena involving interacting fluid and structure using different discretizations for the subproblems. For many applications in fluid dynamics, the Finite Volume method is the first choice in simulation science. Likewise, for the simulation of structural mechanics the Finite Element method is one of the most, if not the most, popular discretization methods. However, despite the advantages of these discretizations in their respective application domains, monolithic coupling schemes have so far been restricted to a single discretization for both subproblems. We present a fluid structure coupling scheme based on a mixed Finite Volume/Finite Element method that combines the benefits of these discretizations. An important challenge in coupling fluid and structure is the transfer of forces and velocities at the fluid-structure interface in a stable and efficient way. In our approach this is achieved by means of a fully implicit formulation, i.e., the transfer of forces and displacements is carried out in a common set of equations for fluid and structure. We assemble the two different discretizations for the fluid and structure subproblems as well as the coupling conditions for forces and displacements into a single large algebraic system. Since we simulate real world problems, as a consequence of the complexity of the considered geometries, we end up with algebraic systems with a large number of degrees of freedom. This necessitates the use of parallel solution techniques. Our work covers the design and implementation of the proposed heterogeneous monolithic coupling approach as well as the efficient solution of the arising large nonlinear systems on distributed memory supercomputers. We apply Newton's method to linearize the fully implicit coupled nonlinear fluid structure interaction problem. The resulting linear system is solved with a Krylov subspace correction method.
For the preconditioning of the iterative solver we propose the use of multilevel methods. Specifically, we study a multigrid method as well as a two-level restricted additive Schwarz method. We illustrate the performance of our method on a benchmark example and compare the aforementioned preconditioning strategies for the parallel solution of the monolithic coupled system.
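The Newton-Krylov solver structure described above can be illustrated on a small model system: an outer Newton loop linearizes the nonlinear residual, and an inner Krylov method solves each linear correction. This is a generic sketch under our own assumptions (a small SPD model Jacobian and a hand-rolled conjugate-gradient inner solver), not the thesis implementation or its FSI system.

```python
import numpy as np

def cg(A, b, tol=1e-12, maxit=200):
    """Minimal conjugate-gradient Krylov solver for SPD systems."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(maxit):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

n = 16
A = 4 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # SPD model "stiffness"
b = np.ones(n)

def F(u):                  # nonlinear residual: F(u) = A u + u^3 - b = 0
    return A @ u + u ** 3 - b

def J(u):                  # Jacobian: A + diag(3 u^2), stays SPD
    return A + np.diag(3 * u ** 2)

u = np.zeros(n)
for _ in range(30):                    # outer Newton loop
    r = F(u)
    if np.linalg.norm(r) < 1e-10:
        break
    u += cg(J(u), -r)                  # inner Krylov solve of J du = -F
assert np.linalg.norm(F(u)) < 1e-8
```

In practice the inner solve is preconditioned (e.g., by the multilevel methods above), since the convergence of the Krylov iteration, not the Newton loop, dominates the cost for large monolithic systems.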
Components of Nonlinear Oscillation and Optimal Averaging for Stiff PDEs
A novel solver which uses finite wave averaging to mitigate oscillatory stiffness is proposed and analysed. We have found that triad resonances contribute to the oscillatory stiffness of the problem and that they provide a natural way of understanding stability limits and the role averaging has in reducing stiffness. In particular, an explicit formulation of the nonlinearity gives rise to a stiffness regulator function which allows for analysis of the wave averaging.
A practical application of such a solver is also presented. As this method provides large timesteps at comparable computational cost, but with some additional error when compared to a full solution, it is a natural choice for the coarse solver in a Parareal-style parallel-in-time method.
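The coarse/fine division of labour in a Parareal-style method can be sketched on a scalar model problem. In this hedged sketch (our own illustration; the thesis's coarse solver is the wave-averaged method, which we do not reproduce) the coarse propagator G is a single forward-Euler step and the fine propagator F takes many small Euler substeps:

```python
# Model problem: y' = lam * y, y(0) = 1, on [0, T] split into N intervals.
lam, T, N = -1.0, 1.0, 8
dt = T / N

def G(y, dt):                  # coarse propagator: one cheap Euler step
    return y * (1 + lam * dt)

def F(y, dt, m=64):            # fine propagator: m accurate substeps
    for _ in range(m):
        y = y * (1 + lam * dt / m)
    return y

# Parareal update: Y[n+1] <- G(Y[n]) + F_old(Y[n]) - G_old(Y[n]),
# where the F and G terms on old iterates are parallel over n.
Y = [1.0] * (N + 1)
for n in range(N):             # iteration 0: serial coarse sweep
    Y[n + 1] = G(Y[n], dt)
for k in range(4):             # a few correction iterations
    Fk = [F(Y[n], dt) for n in range(N)]   # parallelizable fine solves
    Gk = [G(Y[n], dt) for n in range(N)]
    for n in range(N):
        Y[n + 1] = G(Y[n], dt) + Fk[n] - Gk[n]

fine = 1.0                     # serial fine reference for comparison
for n in range(N):
    fine = F(fine, dt)
assert abs(Y[N] - fine) < 1e-4
```

The scheme pays for the serial coarse sweeps with parallel fine solves, so a cheap-but-stable coarse solver, such as the wave-averaged method above, is exactly what the iteration rewards.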
Development of a Navier-Stokes algorithm for parallel-processing supercomputers
An explicit flow solver, applicable to the hierarchy of model equations ranging from Euler to full Navier-Stokes, is combined with several techniques designed to reduce computational expense. The computational domain consists of local grid refinements embedded in a global coarse mesh, where the locations of these refinements are defined by the physics of the flow. Flow characteristics are also used to determine which set of model equations is appropriate for solution in each region, thereby reducing not only the number of grid points at which the solution must be obtained, but also the computational effort required to get that solution. Acceleration to steady-state is achieved by applying multigrid on each of the subgrids, regardless of the particular model equations being solved. Since each of these components is explicit, advantage can readily be taken of the vector- and parallel-processing capabilities of machines such as the Cray X-MP and Cray-2.
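The multigrid acceleration mentioned above rests on a simple mechanism: smoothing on the fine grid removes high-frequency error, and the remaining smooth error is corrected cheaply on a coarser grid. A hedged sketch of one two-grid cycle for the 1D model problem -u'' = f (our own illustration with Dirichlet boundaries; all names are assumptions, not the paper's solver):

```python
import numpy as np

def jacobi(u, f, h, sweeps=3, w=2.0 / 3.0):
    """Weighted-Jacobi smoothing sweeps on interior points of -u'' = f."""
    for _ in range(sweeps):
        u[1:-1] = (1 - w) * u[1:-1] + w * 0.5 * (u[:-2] + u[2:] + h * h * f[1:-1])
    return u

def residual(u, f, h):
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2 * u[1:-1] - u[:-2] - u[2:]) / (h * h)
    return r

def two_grid(u, f, h):
    """One two-grid cycle: pre-smooth, coarse correction, post-smooth."""
    u = jacobi(u, f, h)
    r = residual(u, f, h)
    rc = r[::2].copy()                               # full-weighting restriction
    rc[1:-1] = 0.25 * r[1:-3:2] + 0.5 * r[2:-2:2] + 0.25 * r[3:-1:2]
    hc = 2 * h
    m = len(rc) - 2                                  # coarse interior size
    A = (2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / (hc * hc)
    ec = np.zeros_like(rc)
    ec[1:-1] = np.linalg.solve(A, rc[1:-1])          # exact coarse solve
    e = np.zeros_like(u)
    e[::2] = ec                                      # linear prolongation
    e[1::2] = 0.5 * (ec[:-1] + ec[1:])
    u += e
    return jacobi(u, f, h)

n = 64
h = 1.0 / n
x = np.linspace(0.0, 1.0, n + 1)
f = np.pi ** 2 * np.sin(np.pi * x)                   # exact solution sin(pi*x)
u = np.zeros(n + 1)
r0 = np.linalg.norm(residual(u, f, h))
for _ in range(10):
    u = two_grid(u, f, h)
assert np.linalg.norm(residual(u, f, h)) < 1e-3 * r0
```

Applied recursively on each embedded subgrid, the same cycle gives grid-size-independent convergence rates, which is what makes it attractive as an accelerator for explicit solvers.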