    Integral formulation of the measured equation of invariance

    A novel integral formulation of the measured equation of invariance is derived from the reciprocity theorem. This formulation leads to a sparse matrix equation for the induced surface current, resulting in substantial savings in CPU time and memory over conventional approaches. The algorithm has been implemented for two-dimensional perfectly conducting scatterers. Peer reviewed. Postprint (published version).

    Integral equation MEI applied to three-dimensional arbitrary surfaces

    The authors present a new formulation of the integral equation of the measured equation of invariance (MEI) as a confined field integral equation discretised by the method of moments. The use of numerically derived testing functions results in an approximately sparse linear system, with storage requirements and CPU time for computing the matrix coefficients both proportional to the number of unknowns. Peer reviewed. Postprint (published version).

    A Second-Order Distributed Trotter-Suzuki Solver with a Hybrid Kernel

    The Trotter-Suzuki approximation leads to an efficient algorithm for solving the time-dependent Schrödinger equation. Using existing highly optimized CPU and GPU kernels, we developed a distributed version of the algorithm that runs efficiently on a cluster. Our implementation also improves single-node performance and is able to use multiple GPUs within a node. The scaling is close to linear using the CPU kernels, whereas the efficiency of the GPU kernels improves with larger matrices. We also introduce a hybrid kernel that simultaneously uses multicore CPUs and GPUs in a distributed system. This kernel is shown to be efficient when the matrix would not fit in the GPU memory. Larger quantum systems scale especially well with a high number of nodes. The code is available under an open source license. Comment: 11 pages, 10 figures.
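    The second-order splitting that the abstract above refers to can be illustrated on a toy problem. The sketch below is not the authors' distributed kernel: it assumes a hypothetical two-level Hamiltonian H = X + Z (Pauli matrices), for which both the split factors and the exact propagator have closed forms, and all function names are made up for illustration. It applies the symmetric step exp(-i dt X/2) exp(-i dt Z) exp(-i dt X/2) repeatedly and measures the deviation from the exact propagator.

```python
import cmath
import math


def matmul(a, b):
    """Product of two 2x2 complex matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def exp_x(t):
    """exp(-i t X) = cos(t) I - i sin(t) X for the Pauli matrix X."""
    c, s = math.cos(t), -1j * math.sin(t)
    return [[c, s], [s, c]]


def exp_z(t):
    """exp(-i t Z) = diag(e^{-it}, e^{+it}) for the Pauli matrix Z."""
    return [[cmath.exp(-1j * t), 0], [0, cmath.exp(1j * t)]]


def exact(t):
    """exp(-i t (X + Z)), using (X + Z)^2 = 2 I."""
    w = math.sqrt(2.0)
    c, s = math.cos(w * t), -1j * math.sin(w * t) / w
    return [[c + s, s], [s, c - s]]


def trotter2(t, n):
    """Approximate exp(-i t (X + Z)) with n second-order split steps."""
    dt = t / n
    # Symmetric (second-order) step: half step in X, full step in Z, half step in X.
    step = matmul(exp_x(dt / 2), matmul(exp_z(dt), exp_x(dt / 2)))
    u = [[1, 0], [0, 1]]
    for _ in range(n):
        u = matmul(step, u)
    return u


def error(t, n):
    """Largest entrywise deviation of the split propagator from the exact one."""
    u, v = trotter2(t, n), exact(t)
    return max(abs(u[i][j] - v[i][j]) for i in range(2) for j in range(2))


if __name__ == "__main__":
    for n in (50, 100, 200):
        print(n, error(1.0, n))
```

    Because the global error of a second-order splitting scales as dt squared, doubling the number of steps should reduce the printed error by roughly a factor of four.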

    Efficient GPU Offloading with OpenMP for a Hyperbolic Finite Volume Solver on Dynamically Adaptive Meshes

    We identify and show how to overcome an OpenMP bottleneck in the administration of GPU memory. It arises for a wave equation solver on dynamically adaptive block-structured Cartesian meshes, which keeps all CPU threads busy and allows all of them to offload sets of patches to the GPU. Our studies show that multithreaded, concurrent, non-deterministic access to the GPU leads to performance breakdowns, since the GPU memory bookkeeping offered through OpenMP’s map clause, i.e., the allocation and freeing, becomes another runtime challenge besides expensive data transfer and actual computation. We therefore propose to retain the memory management responsibility on the host: a caching mechanism acquires memory on the accelerator for all CPU threads, keeps hold of this memory, and hands it out to the offloading threads upon demand. We show that this user-managed, CPU-based memory administration helps us to overcome the GPU memory bookkeeping bottleneck and reduces the time-to-solution of Finite Volume kernels by more than an order of magnitude.
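    The host-side caching idea described in the abstract above can be sketched in a few lines. This is only a conceptual illustration, not the authors' OpenMP implementation: the class name `DeviceBufferPool` is hypothetical, and `bytearray` stands in for a real accelerator allocation call. The pool hands out an idle buffer of the requested size when one exists and only falls back to a fresh allocation otherwise, so repeated offloads reuse memory instead of allocating and freeing each time.

```python
import threading
from collections import defaultdict


class DeviceBufferPool:
    """Host-managed cache of accelerator buffers (hypothetical sketch)."""

    def __init__(self, allocate=bytearray):
        # `allocate` is a stand-in for a real device allocation routine.
        self._allocate = allocate
        self._idle = defaultdict(list)   # buffer size -> idle buffers of that size
        self._lock = threading.Lock()
        self.allocations = 0             # number of real allocations performed

    def acquire(self, size):
        """Return a cached buffer of `size` bytes, allocating only on a miss."""
        with self._lock:
            if self._idle[size]:
                return self._idle[size].pop()
            self.allocations += 1
        return self._allocate(size)

    def release(self, buf):
        """Hand the buffer back to the pool instead of freeing it."""
        with self._lock:
            self._idle[len(buf)].append(buf)


def offload(pool, size, rounds):
    """Mimic a CPU thread that repeatedly offloads a patch of `size` bytes."""
    for _ in range(rounds):
        buf = pool.acquire(size)
        buf[0] = 1                       # placeholder for transfer and kernel launch
        pool.release(buf)


if __name__ == "__main__":
    pool = DeviceBufferPool()
    threads = [threading.Thread(target=offload, args=(pool, 4096, 1000))
               for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("offloads:", 4 * 1000, "real allocations:", pool.allocations)
```

    In this toy run the number of real allocations is bounded by the number of threads, whereas allocating per offload would perform one allocation for every offload.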

    An efficient and robust algorithm for two dimensional time dependent incompressible Navier-Stokes equations: High Reynolds number flows

    An algorithm is presented for unsteady two-dimensional incompressible Navier-Stokes calculations. The algorithm is based on the fourth-order partial differential equation for incompressible fluid flow, which uses the streamfunction as the only dependent variable. It is second-order accurate in both time and space and uses a multigrid solver at each time step. It is extremely efficient in its use of both CPU time and physical memory, and extremely robust with respect to Reynolds number.

    Two-dimensional Euler and Navier-Stokes Time accurate simulations of fan rotor flows

    Two numerical methods are presented which describe the unsteady flow field in the blade-to-blade plane of an axial fan rotor. These methods solve the compressible, time-dependent Euler equations and the compressible, turbulent, time-dependent Navier-Stokes conservation equations for mass, momentum, and energy. The Navier-Stokes equations are written in Favre-averaged form and are closed with an approximate two-equation turbulence model with low Reynolds number and compressibility effects included. The unsteady aerodynamic component is obtained by superposing inflow or outflow unsteadiness on the steady conditions through time-dependent boundary conditions. The integration in space is performed by using a finite volume scheme, and the integration in time is performed by using k-stage Runge-Kutta schemes, k = 2,5. The numerical integration algorithm reduces the computational cost of an unsteady simulation involving high-frequency disturbances in both CPU time and memory requirements. Less than 200 sec of CPU time is required to advance the Euler equations on a computational grid of about 2000 grid points through 10,000 time steps on a CRAY Y-MP computer, with a required memory of less than 0.3 megawords.

    Numerical solution of a three-dimensional cubic cavity flow by using the Boltzmann equation

    A three-dimensional cubic cavity flow has been analyzed for diatomic gases by using the Boltzmann equation with the Bhatnagar-Gross-Krook (B-G-K) model. The discrete ordinates method was applied, and the diffuse reflection boundary condition was assumed. The results, which show a consistent trend toward the Navier-Stokes solution as the Knudsen number is reduced, give us confidence to apply the method to a three-dimensional geometry for practical predictions of rarefied-flow characteristics. The CPU time and the main memory required for a three-dimensional geometry using this method seem reasonable.