Search CORE

7,133 research outputs found

An INTRODUCTION TO CUDA Programming

Author: Irina Mocanu
Publication venue
Publication date
Field of study

The graphics boards have become so powerful that they are usded for mathematical computations, such as matrix multiplication and transposition, which are required for complex visual and physics simulations in computer games. NVIDIA has supported this trend by releasing the CUDA (Compute Unified Device Architecture) interface library to allow applications developers to write code that can be uploaded into an NVIDIA-based card for execution by NVIDIA's massively parallel GPUs. This paper is an introduction to the CUDA programming based on the documentation from [2] and [4].cuda programming

Research Papers in Economics

Fine-sorting One-dimensional Particle-In-Cell Algorithm with Monte-Carlo Collisions on a Graphics Processing Unit

Author: Anderson
Anderson
Asadchev
Badal
Birdsall
Birdsall
Conte
Denis Eremin
Harvey
Hockney
Kersevan
Marsaglia
Peter Awakowicz
Phelps
Phelps
Philipp Mertmann
Rajasekaran
Ralf Peter Brinkmann
Stantchev
Tajima
Thomas Mussenbrock
Tskhakaya
Turner
Verboncoeur
Verboncoeur
Yanguas-Gil
Publication venue: 'Elsevier BV'
Publication date: 20/04/2011
Field of study

Particle-in-cell (PIC) simulations with Monte-Carlo collisions are used in plasma science to explore a variety of kinetic effects. One major problem is the long run-time of such simulations. Even on modern computer systems, PIC codes take a considerable amount of time for convergence. Most of the computations can be massively parallelized, since particles behave independently of each other within one time step. Current graphics processing units (GPUs) offer an attractive means for execution of the parallelized code. In this contribution we show a one-dimensional PIC code running on Nvidia GPUs using the CUDA environment. A distinctive feature of the code is that size of the cells that the code uses to sort the particles with respect to their coordinates is comparable to size of the grid cells used for discretization of the electric field. Hence, we call the corresponding algorithm "fine-sorting". Implementation details and optimization of the code are discussed and the speed-up compared to classical CPU approaches is computed

arXiv.org e-Print Archive

Crossref

Solving the Boltzmann Equation on GPU

Author: A. Frezzotti
Anderson
Aristov
Aristov
Baker
Bird
Cercignani
Chapman
Elsen
Frezzotti
Frezzotti
Frezzotti
G.P. Ghiroldi
Homolle
Januszewski
L. Gibelli
Matinsen
Salas
Tcheremissine
Varoutis
Wagner
Publication venue: 'Elsevier BV'
Publication date: 28/05/2010
Field of study

We show how to accelerate the direct solution of the Boltzmann equation using Graphics Processing Units (GPUs). In order to fully exploit the computational power of the GPU, we choose a method of solution which combines a finite difference discretization of the free-streaming term with a Monte Carlo evaluation of the collision integral. The efficiency of the code is demonstrated by solving the two-dimensional driven cavity flow. Computational results show that it is possible to cut down the computing time of the sequential code of two order of magnitudes. This makes the proposed method of solution a viable alternative to particle simulations for studying unsteady low Mach number flows.Comment: 18 pages, 3 pseudo-codes, 6 figures, 1 tabl

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref

Highly parallel computation

Author: Denning Peter J.
Tichy Walter F.
Publication venue
Publication date
Field of study

Highly parallel computing architectures are the only means to achieve the computation rates demanded by advanced scientific problems. A decade of research has demonstrated the feasibility of such machines and current research focuses on which architectures designated as multiple instruction multiple datastream (MIMD) and single instruction multiple datastream (SIMD) have produced the best results to date; neither shows a decisive advantage for most near-homogeneous scientific problems. For scientific problems with many dissimilar parts, more speculative architectures such as neural networks or data flow may be needed

NASA Technical Reports Server

A Pure Java Parallel Flow Solver

Author: Gollnick Torsten
Hauser Jochem
Ludewig Thorsten
Muylaert Jean
Spel Martin
Williams Roy
Winkelmann Ralf
Publication venue: 'California Institute of Technology Library'
Publication date: 01/01/1999
Field of study

In this paper an overview is given on the "Have Java" project to attain a pure Java parallel Navier-Stokes flow solver (JParNSS) based on the thread concept and remote method invocation (RMI). The goal of this project is to produce an industrial flow solver running on an arbitrary sequential or parallel architecture, utilizing the Internet, capable of handling the most complex 3D geometries as well as flow physics, and also linking to codes in other areas such as aeroelasticity etc. Since Java is completely object-oriented the code has been written in an object-oriented programming (OOP) style. The code also includes a graphics user interface (GUI) as well as an interactive steering package for the parallel architecture. The Java OOP approach provides profoundly improved software productivity, robustness, and security as well as reusability and maintainability. OOP allows code construction similar to the aerodynamic design process because objects can be software coded and integrated, reflecting actual design procedures. In addition, Java is the programming language of the Internet and thus Java is the programming language of the Internet and thus Java objects on disparate machines or even separate networks can be connected. We explain the motivation for the design of JParNSS along with its capabilities that set it apart from other solvers. In the first two sections we present a discussion of the Java language as the programming tool for aerospace applications. In section three the objectives of the Have Java project are presented. In the next section the layer structures of JParNSS are discussed with emphasis on the parallelization and client-server (RMI) layers. JParNSS, like its predecessor ParNSS (ANSI-C), is based on the multiblock idea, and allows for arbitrarily complex topologies. Grids are accepted in GridPro property settings, grids of any size or block number can be directly read by JParNSS without any further modifications, requiring no additional preparation time for the solver input. In the last section, computational results are presented, with emphasis on multiprocessor Pentium and Sun parallel systems run by the Solaris operating system (OS)

Caltech Authors