347 research outputs found

    Three-dimensional Euler time accurate simulations of fan rotor-stator interactions

    A numerical method for describing unsteady 3-D flow fields within turbomachinery stages is presented. The method solves the compressible, time-dependent Euler conservation equations with a finite-volume, flux-splitting, total variation diminishing, approximately factored, implicit scheme. Multiblock composite gridding is used to partition the flow field into a specified arrangement of blocks with static and dynamic interfaces. The code is optimized to take full advantage of the processing power and speed of the Cray Y/MP supercomputer. The method is applied to the computation of the flow field within a single-stage, axial-flow fan, thus reproducing the unsteady 3-D rotor-stator interaction.

    Nonlinear structural response using adaptive dynamic relaxation on a massively-parallel-processing system

    A parallel adaptive dynamic relaxation (ADR) algorithm has been developed for nonlinear structural analysis. This algorithm has minimal memory requirements, is easily parallelizable and scalable to many processors, and is generally very reliable and efficient for highly nonlinear problems. Performance evaluations on single-processor computers have shown that the ADR algorithm is reliable and highly vectorizable, and that it is competitive with direct solution methods for the highly nonlinear problems considered. The present algorithm is implemented on the 512-processor Intel Touchstone DELTA system at Caltech, and it is designed to minimize the extent and frequency of interprocessor communication. The algorithm has been used to solve for the nonlinear static response of two- and three-dimensional hyperelastic systems involving contact. Impressive relative speedups have been achieved and demonstrate the high scalability of the ADR algorithm. For the class of problems addressed, the ADR algorithm represents a very promising approach for parallel-vector processing.

    Code Generation for High Performance PDE Solvers on Modern Architectures

    Numerical simulation with partial differential equations is an important discipline in high performance computing. Notable application areas include geosciences, fluid dynamics, solid mechanics and electromagnetics. Recent hardware developments have made it increasingly hard to achieve very good performance. This is both due to a lack of numerical algorithms suited for the hardware and efficient implementations of these algorithms not being available. Modern CPUs require a sufficiently high arithmetic intensity in order to unfold their full potential. In this thesis, we use a numerical scheme that is well-suited for this scenario: The Discontinuous Galerkin Finite Element Method on cuboid meshes can be implemented with optimal complexity exploiting the tensor product structure of basis functions and quadrature formulae using a technique called sum factorization. A matrix-free implementation of this scheme significantly lowers the memory footprint of the method and delivers a fully compute-bound algorithm. An efficient implementation of this scheme for a modern CPU requires maximum use of the processor’s SIMD units. General purpose compilers are not capable of autovectorizing traditional PDE simulation codes, requiring high performance implementations to explicitly spell out SIMD instructions. With the SIMD width increasing in the last years (reaching its current peak at 512 bits in the Intel Skylake architecture) and programming languages not providing tools to directly target SIMD units, such code suffers from a performance portability issue. This work proposes generative programming as a solution to this issue. To this end, we develop a toolchain that translates a PDE problem expressed in a domain specific language into a piece of machine-dependent, optimized C++ code. This toolchain is embedded into the existing user workflow of the DUNE project, an open source framework for the numerical solution of PDEs. 
Compared to other such toolchains, special emphasis is put on an intermediate representation that enables performance-oriented transformations. Furthermore, this thesis defines a new class of SIMD vectorization strategies that operate on batches of subkernels within one integration kernel. The space of these vectorization strategies is explored systematically from within the code generator in an autotuning procedure. We demonstrate the performance of our vectorization strategies and their implementation by providing measurements on the Intel Haswell and Intel Skylake architectures. We present numbers for the diffusion-reaction equation, the Stokes equations and Maxwell’s equations, achieving up to 40% of the machine’s theoretical floating point performance for an application of the DG operator.

    Dynamic Boundary Element Analysis of Machine Foundations

    The central theme of this thesis is the further development of boundary element methods for the analysis of three-dimensional machine foundations, pertaining to various (translational and rotational) modes of vibration and, in particular, to high frequency response. Surface and embedded rectangular foundations are considered. The soil is assumed to behave approximately as a linear elastic material for small amplitudes of strain. The problem is formulated and solved in the frequency domain. This work includes rigorous theoretical studies, effective numerical techniques for the solution of the boundary integral equations, and efficient computer implementation of the algorithm. The derivation of the boundary integral formulation is reviewed and the dynamic fundamental solutions are examined in detail. The particular fundamental solutions for incompressible media have been derived in order to deal more effectively with these materials. Advanced integration schemes for non-singular and singular integrals have been developed in order to improve the computational accuracy and efficiency of the boundary element analysis. A novel infinite boundary element for dynamic analyses has been developed, which provides an efficient means for including far-field effects without the necessity of explicit discrete representation outside the near field. The implementation and vectorization of the computer program using the IBM 3090-150 Vector Facility are described. Various numerical results for rectangular foundations are presented in order to illustrate the potential of the infinite boundary element formulation. Included among these are new results pertaining to the high frequency response of machine foundations.

    An Efficient Estimator for Dealing with Missing Data on Explanatory Variables in a Probit Choice Model

    A common approach to dealing with missing data in econometrics is to estimate the model on the common subset of data, by necessity throwing away potentially useful data. In this paper we consider a particular pattern of missing data on explanatory variables that often occurs in practice and develop a new efficient estimator for models where the dependent variable is binary. We derive exact formulae for the estimator and its asymptotic variance. Simulation results show that our estimator performs well when compared to popular alternatives, such as complete case analysis and multiple imputation. We then use our estimator to examine the portfolio allocation decision of Italian households using the Survey of Household Income and Wealth carried out by the Bank of Italy. Keywords: Missing Data, Probit Model, Portfolio Allocation, Risk Aversion

    Simulation of 1+1 dimensional surface growth and lattices gases using GPUs

    Restricted solid-on-solid surface growth models can be mapped onto binary lattice gases. We show that efficient simulation algorithms can be realized on GPUs either by CUDA or by OpenCL programming. We consider a deposition/evaporation model following Kardar-Parisi-Zhang growth in 1+1 dimensions related to the Asymmetric Simple Exclusion Process and show that, for sizes that fit into the shared memory of GPUs, one can achieve a maximum parallelization speedup of about 100x on a Quadro FX 5800 graphics card with respect to a single 2.67 GHz CPU. This permits us to study the effect of quenched columnar disorder, which requires extremely long simulation times. We compare the CUDA realization with an OpenCL implementation designed for processor clusters via MPI. A two-lane traffic model with randomized turning points is also realized and its dynamical behavior has been investigated. Comment: 20 pages, 12 figures, 1 table, to appear in Comp. Phys. Com

    HPC-enabling technologies for high-fidelity combustion simulations

    With the increase in computational power in the last decade and the forthcoming Exascale supercomputers, a new horizon in computational modelling and simulation is envisioned in combustion science. Considering the multiscale and multiphysics characteristics of turbulent reacting flows, combustion simulations are considered one of the most computationally demanding applications running on cutting-edge supercomputers. Exascale computing opens new frontiers for the simulation of combustion systems as more realistic conditions can be achieved with high-fidelity methods. However, an efficient use of these computing architectures requires methodologies that can exploit all levels of parallelism. The efficient utilization of the next generation of supercomputers needs to be considered from a global perspective, that is, involving physical modelling and numerical methods with methodologies based on High-Performance Computing (HPC) and hardware architectures. This review introduces recent developments in numerical methods for large-eddy simulations (LES) and direct-numerical simulations (DNS) to simulate combustion systems, with focus on the computational performance and algorithmic capabilities. Due to the broad scope, a first section is devoted to describing the fundamentals of turbulent combustion, which is followed by a general description of state-of-the-art computational strategies for solving these problems. These applications require advanced HPC approaches to exploit modern supercomputers, which is addressed in the third section. The increasing complexity of new computing architectures, with tightly coupled CPUs and GPUs, as well as high levels of parallelism, requires new parallel models and algorithms exposing the required level of concurrency. Advances in terms of dynamic load balancing, vectorization, GPU acceleration and mesh adaptation have made it possible to achieve highly efficient combustion simulations with data-driven methods in HPC environments. 
Therefore, dedicated sections covering the use of high-order methods for reacting flows, integration of detailed chemistry and two-phase flows are addressed. Final remarks and directions of future work are given at the end. The research leading to these results has received funding from the European Union’s Horizon 2020 Programme under the CoEC project, grant agreement No. 952181, and the CoE RAISE project, grant agreement No. 951733.

    Development of upwind schemes for the Euler equations

    Many algorithmic and computational aspects of upwind schemes and their second-order accurate formulations based on Total-Variation-Diminishing (TVD) approaches are described. An operational unification of the underlying first-order scheme is first presented encompassing Godunov's, Roe's, Osher's, and Split-Flux methods. For higher order versions, the preprocessing and postprocessing approaches to constructing TVD discretizations are considered. TVD formulations can be used to construct relaxation methods for unfactored implicit upwind schemes, which in turn can be exploited to construct space-marching procedures for even the unsteady Euler equations. A major part of the report describes time- and space-marching procedures for solving the Euler equations in 2-D, 3-D, Cartesian, and curvilinear coordinates. Along with many illustrative examples, several results of efficient computations on 3-D supersonic flows with subsonic pockets are presented.
