Fat vs. thin threading approach on GPUs: application to stochastic simulation of chemical reactions
We explore two different threading approaches on a graphics processing unit (GPU), each exploiting a different characteristic of the current GPU architecture. The fat-thread approach tries to minimise data access time by relying on shared memory and registers, potentially sacrificing parallelism. The thin-thread approach maximises parallelism and tries to hide access latencies. We apply both approaches to the parallel stochastic simulation of chemical reaction systems using the stochastic simulation algorithm (SSA) of Gillespie (J. Phys. Chem., vol. 81, pp. 2340-2361, 1977). The proposed thin-thread approach shows comparable performance while eliminating the limitation on the size of the reaction system.
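For reference, Gillespie's direct method (the SSA the abstract applies on the GPU) can be sketched in a few lines of serial Python. This is a minimal illustration, not the paper's GPU implementation; the toy A -> B system and all names are ours:

```python
import math
import random

def gillespie_direct(x, rates, reactions, t_max, seed=1):
    """Serial sketch of Gillespie's direct SSA.
    x: initial species counts; rates: rate constants;
    reactions: per-reaction (propensity_fn, state_change) pairs."""
    rng = random.Random(seed)
    t = 0.0
    while t < t_max:
        # Propensity of each reaction in the current state.
        a = [k * prop(x) for k, (prop, _) in zip(rates, reactions)]
        a0 = sum(a)
        if a0 == 0:
            break  # no reaction can fire any more
        # Time to the next reaction is exponentially distributed.
        t += -math.log(rng.random()) / a0
        # Choose which reaction fires, proportionally to its propensity.
        r = rng.random() * a0
        for ai, (_, change) in zip(a, reactions):
            r -= ai
            if r <= 0:
                x = [xi + d for xi, d in zip(x, change)]
                break
    return x

# Toy system: A -> B with propensity k * [A] (illustrative values).
final = gillespie_direct(
    x=[100, 0],
    rates=[0.5],
    reactions=[(lambda s: s[0], (-1, +1))],
    t_max=50.0,
)
```

In the thin-thread setting of the abstract, one such trajectory would run per GPU thread, each with an independent random stream.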
Adaptive Mesh Fluid Simulations on GPU
We describe an implementation of compressible inviscid fluid solvers with
block-structured adaptive mesh refinement on Graphics Processing Units using
NVIDIA's CUDA. We show that a class of high resolution shock capturing schemes
can be mapped naturally on this architecture. Using the method of lines
approach with the second order total variation diminishing Runge-Kutta time
integration scheme, piecewise linear reconstruction, and a Harten-Lax-van Leer
Riemann solver, we achieve overall execution approximately 10 times faster
on one graphics card than on a single core of the host computer. We attain
this speedup in uniform grid runs as well as in problems
with deep AMR hierarchies. Our framework can readily be applied to more general
systems of conservation laws and extended to higher order shock capturing
schemes. This is shown directly by an implementation of a magneto-hydrodynamic
solver and comparing its performance to the pure hydrodynamic case. Finally, we
also combined our CUDA parallel scheme with MPI to make the code run on GPU
clusters. Close to ideal speedup is observed on up to four GPUs. Comment: Submitted to New Astronomy
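The time integration described above (method of lines, minmod-limited piecewise-linear reconstruction, second-order TVD Runge-Kutta) can be illustrated on the simplest case, scalar linear advection, where the HLL flux reduces to upwinding. A serial NumPy sketch under our own assumptions, not the authors' compressible AMR solver:

```python
import numpy as np

def minmod(a, b):
    # Slope limiter: zero at extrema, smaller-magnitude slope otherwise.
    return np.where(a * b > 0, np.where(np.abs(a) < np.abs(b), a, b), 0.0)

def rhs(u, dx, c):
    # Piecewise-linear reconstruction with minmod-limited slopes.
    s = minmod(np.roll(u, -1) - u, u - np.roll(u, 1))
    # Left/right states at each i+1/2 interface (periodic domain).
    uL = u + 0.5 * s
    uR = np.roll(u - 0.5 * s, -1)
    # For linear advection the HLL flux reduces to upwinding.
    flux = c * uL if c > 0 else c * uR
    return -(flux - np.roll(flux, 1)) / dx

def step_tvd_rk2(u, dt, dx, c):
    # Second-order TVD (Heun) Runge-Kutta:
    # u* = u + dt L(u);  u^{n+1} = (u + u* + dt L(u*)) / 2.
    u1 = u + dt * rhs(u, dx, c)
    return 0.5 * u + 0.5 * (u1 + dt * rhs(u1, dx, c))

# Advect a square pulse once around a unit periodic domain.
n, c = 200, 1.0
dx = 1.0 / n
x = (np.arange(n) + 0.5) * dx
u = np.where((x > 0.3) & (x < 0.5), 1.0, 0.0)
dt = 0.4 * dx / abs(c)  # CFL number 0.4
for _ in range(int(round(1.0 / (c * dt)))):
    u = step_tvd_rk2(u, dt, dx, c)
```

The same update, applied component-wise to conserved variables with a genuine HLL flux, is what each GPU thread would evaluate per cell in the scheme the abstract describes.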
Evaluation of a local strategy for high performance memory management
Conventional operating systems, such as Silicon Graphics' IRIX and IBM's AIX, adopt a single memory management algorithm, usually some approximation of LRU (least recently used), chosen for its good average performance over the set of programs executed on the computer. This choice can lead to situations in
which the computer performs badly because the algorithm behaves poorly for certain programs. A possible solution is to let each program use a specific management algorithm (a local strategy) adapted to its memory access pattern. For example, programs with a sequential access pattern, such as SOR, should be managed by the MRU (most recently used) algorithm because they perform badly under LRU. In this strategy it is essential to decide how memory is partitioned among the programs executing in a multiprogramming environment. Our strategy, named CAPR (Compiler-Aided Page Replacement), analyses the pattern of memory references in an application's source program and communicates these characteristics to the operating system, which then chooses the best management algorithm and memory partitioning strategy.
This paper evaluates the influence of the management algorithm and the memory partitioning strategy on global system performance and on the individual performance of each program. We also compare this local strategy with the classic global strategy and analyse its viability. The results show a difference of at least an order of magnitude in the number of page faults between LRU and MRU under the global strategy. Then, starting from an analysis of each application's intrinsic memory access pattern and page-fault behaviour, we developed a procedure for optimising memory system performance in multiprogramming environments. This procedure decides system performance parameters such as the memory partitioning among programs and the appropriate management algorithm for each program. With the local management strategy we obtained a reduction of at least an order of magnitude in the number of page faults and a reduction in mean memory usage of about 3 to 4 times relative to the global strategy. This performance improvement shows the viability of our strategy. We also present some implementation aspects of the strategy in traditional operating systems.
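The LRU-versus-MRU gap on sequential access patterns can be reproduced with a small simulation. This is an illustrative sketch; the frame count and reference string are our own choices, not the paper's workloads:

```python
def simulate(refs, frames, policy):
    """Count page faults for a reference string under LRU or MRU.
    On a fault with memory full, LRU evicts the least recently used
    page and MRU evicts the most recently used one."""
    mem = []  # pages ordered from least to most recently used
    faults = 0
    for p in refs:
        if p in mem:
            mem.remove(p)  # hit: refresh recency below
        else:
            faults += 1
            if len(mem) == frames:
                victim = mem[0] if policy == "LRU" else mem[-1]
                mem.remove(victim)
        mem.append(p)  # p is now the most recently used page
    return faults

# Cyclic sequential pattern over 5 pages with 4 frames, as in
# SOR-like codes: LRU faults on every reference, while MRU
# retains most of the working set.
refs = list(range(5)) * 20
lru = simulate(refs, 4, "LRU")  # faults on all 100 references
mru = simulate(refs, 4, "MRU")  # an order of magnitude fewer faults
```

This is exactly the pathology the abstract mentions: cycling over one more page than fits in memory makes LRU evict each page just before it is needed again.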
A GPU-based Implementation for Improved Online Rebinning Performance in Clinical 3-D PET
Online rebinning is an important and well-established technique for reducing the time required to process Positron Emission Tomography (PET) data. However, the need for efficient data processing in a clinical setting is growing rapidly and is beginning to exceed the capability of traditional online processing methods. High count-rate applications such as Rubidium 3-D PET studies can easily saturate current online rebinning technology, and real-time processing at these count rates is essential to avoid significant data loss. In addition, the emergence of time-of-flight (TOF) scanners is producing very large data sets, so TOF applications require efficient online rebinning methods to maintain high patient throughput. New hardware architectures such as Graphics Processing Units (GPUs) are now available to speed up data-parallel and number-crunching algorithms. Compared with the usual parallel systems, such as multiprocessor or clustered machines, GPU hardware can be much faster and, above all, significantly cheaper. GPUs were originally designed for video-game graphics but are now used for high-performance computing across many domains. The goal of this thesis is to investigate the suitability of the GPU for PET rebinning algorithms.
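One common rebinning method for 3-D PET, single-slice rebinning (SSRB), can be sketched in a few lines of NumPy: each oblique line of response between detector rings r1 and r2 is assigned to the direct plane at their mean axial position. This is a serial illustration under our own assumptions (array shapes and names are ours); a GPU implementation like the one the thesis investigates would parallelise the accumulation over sinogram bins:

```python
import numpy as np

def ssrb(oblique, max_ring_diff):
    """Single-slice rebinning sketch: collapse oblique sinograms
    oblique[r1, r2, angle, radial] onto 2*nrings - 1 direct planes,
    indexed by r1 + r2 (i.e. twice the mean ring index)."""
    nrings = oblique.shape[0]
    nplanes = 2 * nrings - 1
    out = np.zeros((nplanes,) + oblique.shape[2:])
    for r1 in range(nrings):
        for r2 in range(nrings):
            # Accept only line-of-response obliquities up to the
            # scanner's maximum ring difference.
            if abs(r1 - r2) <= max_ring_diff:
                out[r1 + r2] += oblique[r1, r2]
    return out

# Tiny example: 3 rings, 2 angles, 2 radial bins, all counts = 1.
obl = np.ones((3, 3, 2, 2))
planes = ssrb(obl, max_ring_diff=2)
```

Because every accepted event is added into exactly one output plane, total counts are conserved, which is a convenient sanity check for any rebinning implementation.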
Demand-based coscheduling of parallel jobs on multiprogrammed multiprocessors
Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1997. Includes bibliographical references (p. 92-94). By Patrick Gregory Sobalvarro.