Fat vs. thin threading approach on GPUs: application to stochastic simulation of chemical reactions
We explore two different threading approaches on a graphics processing unit (GPU), each exploiting a different characteristic of the current GPU architecture. The fat-thread approach tries to minimise data access time by relying on shared memory and registers, potentially sacrificing parallelism. The thin-thread approach maximises parallelism and tries to hide access latencies. We apply both approaches to the parallel stochastic simulation of chemical reaction systems using the stochastic simulation algorithm (SSA) of Gillespie (J. Phys. Chem., vol. 81, pp. 2340-2361, 1977). The proposed thin-thread approach shows comparable performance while eliminating the limitation on the size of the reaction system.
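For reference, Gillespie's direct method (the SSA the abstract applies on the GPU) can be sketched in a few lines of serial Python. This is a minimal illustration, not the paper's GPU implementation; the toy A -> B system and all names are ours:

```python
import math
import random

def gillespie_direct(x, rates, reactions, t_max, seed=1):
    """Serial sketch of Gillespie's direct SSA.
    x: initial species counts; rates: rate constants;
    reactions: per-reaction (propensity_fn, state_change) pairs."""
    rng = random.Random(seed)
    t = 0.0
    while t < t_max:
        # Propensity of each reaction in the current state.
        a = [k * prop(x) for k, (prop, _) in zip(rates, reactions)]
        a0 = sum(a)
        if a0 == 0:
            break  # no reaction can fire any more
        # Time to the next reaction is exponentially distributed.
        t += -math.log(rng.random()) / a0
        # Choose which reaction fires, proportionally to its propensity.
        r = rng.random() * a0
        for ai, (_, change) in zip(a, reactions):
            r -= ai
            if r <= 0:
                x = [xi + d for xi, d in zip(x, change)]
                break
    return x

# Toy system: A -> B with propensity k * [A] (illustrative values).
final = gillespie_direct(
    x=[100, 0],
    rates=[0.5],
    reactions=[(lambda s: s[0], (-1, +1))],
    t_max=50.0,
)
```

In the thin-thread setting of the abstract, one such trajectory would run per GPU thread, each with an independent random stream.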
Adaptive Mesh Fluid Simulations on GPU
We describe an implementation of compressible inviscid fluid solvers with
block-structured adaptive mesh refinement on Graphics Processing Units using
NVIDIA's CUDA. We show that a class of high resolution shock capturing schemes
can be mapped naturally on this architecture. Using the method of lines
approach with the second order total variation diminishing Runge-Kutta time
integration scheme, piecewise linear reconstruction, and a Harten-Lax-van Leer
Riemann solver, we achieve overall execution approximately 10 times faster
on one graphics card than on a single core of the host computer. We attain
this speedup in uniform grid runs as well as in problems
with deep AMR hierarchies. Our framework can readily be applied to more general
systems of conservation laws and extended to higher order shock capturing
schemes. This is shown directly by an implementation of a magneto-hydrodynamic
solver and comparing its performance to the pure hydrodynamic case. Finally, we
also combined our CUDA parallel scheme with MPI to make the code run on GPU
clusters. Close to ideal speedup is observed on up to four GPUs. Comment: Submitted to New Astronomy
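The time integration described above (method of lines, minmod-limited piecewise-linear reconstruction, second-order TVD Runge-Kutta) can be illustrated on the simplest case, scalar linear advection, where the HLL flux reduces to upwinding. A serial NumPy sketch under our own assumptions, not the authors' compressible AMR solver:

```python
import numpy as np

def minmod(a, b):
    # Slope limiter: zero at extrema, smaller-magnitude slope otherwise.
    return np.where(a * b > 0, np.where(np.abs(a) < np.abs(b), a, b), 0.0)

def rhs(u, dx, c):
    # Piecewise-linear reconstruction with minmod-limited slopes.
    s = minmod(np.roll(u, -1) - u, u - np.roll(u, 1))
    # Left/right states at each i+1/2 interface (periodic domain).
    uL = u + 0.5 * s
    uR = np.roll(u - 0.5 * s, -1)
    # For linear advection the HLL flux reduces to upwinding.
    flux = c * uL if c > 0 else c * uR
    return -(flux - np.roll(flux, 1)) / dx

def step_tvd_rk2(u, dt, dx, c):
    # Second-order TVD (Heun) Runge-Kutta:
    # u* = u + dt L(u);  u^{n+1} = (u + u* + dt L(u*)) / 2.
    u1 = u + dt * rhs(u, dx, c)
    return 0.5 * u + 0.5 * (u1 + dt * rhs(u1, dx, c))

# Advect a square pulse once around a unit periodic domain.
n, c = 200, 1.0
dx = 1.0 / n
x = (np.arange(n) + 0.5) * dx
u = np.where((x > 0.3) & (x < 0.5), 1.0, 0.0)
dt = 0.4 * dx / abs(c)  # CFL number 0.4
for _ in range(int(round(1.0 / (c * dt)))):
    u = step_tvd_rk2(u, dt, dx, c)
```

The same update, applied component-wise to conserved variables with a genuine HLL flux, is what each GPU thread would evaluate per cell in the scheme the abstract describes.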
Evaluation of a local strategy for high performance memory management
Conventional operating systems, such as Silicon Graphics' IRIX and IBM's AIX, adopt a single memory management algorithm, usually some approximation of LRU (least recently used), chosen for its good average performance over the set of programs executed on the computer. This choice can lead to situations in
which the computer performs badly because the algorithm behaves poorly for certain programs. A possible solution is to let each program use a specific management algorithm (a local strategy) adapted to its memory access pattern. For example, programs with a sequential access pattern, such as SOR, should be managed by the MRU (most recently used) algorithm because they perform badly under LRU. In this strategy it is essential to decide how memory is partitioned among the programs executing in a multiprogramming environment. Our strategy, named CAPR (Compiler-Aided Page Replacement), analyses the pattern of memory references in an application's source program and communicates these characteristics to the operating system, which then chooses the best management algorithm and memory partitioning strategy.
This paper evaluates the influence of the management algorithm and the memory partitioning strategy on global system performance and on the individual performance of each program. We also compare this local strategy with the classic global strategy and analyse its viability. The results show a difference of at least an order of magnitude in the number of page faults between LRU and MRU under the global strategy. Then, starting from an analysis of each application's intrinsic memory access pattern and page-fault behaviour, we developed a procedure for optimising memory system performance in multiprogramming environments. This procedure decides system performance parameters such as the memory partitioning among programs and the appropriate management algorithm for each program. With the local management strategy we obtained a reduction of at least an order of magnitude in the number of page faults and a reduction in mean memory usage of about 3 to 4 times relative to the global strategy. This performance improvement shows the viability of our strategy. We also present some implementation aspects of the strategy in traditional operating systems.
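The LRU-versus-MRU gap on sequential access patterns can be reproduced with a small simulation. This is an illustrative sketch; the frame count and reference string are our own choices, not the paper's workloads:

```python
def simulate(refs, frames, policy):
    """Count page faults for a reference string under LRU or MRU.
    On a fault with memory full, LRU evicts the least recently used
    page and MRU evicts the most recently used one."""
    mem = []  # pages ordered from least to most recently used
    faults = 0
    for p in refs:
        if p in mem:
            mem.remove(p)  # hit: refresh recency below
        else:
            faults += 1
            if len(mem) == frames:
                victim = mem[0] if policy == "LRU" else mem[-1]
                mem.remove(victim)
        mem.append(p)  # p is now the most recently used page
    return faults

# Cyclic sequential pattern over 5 pages with 4 frames, as in
# SOR-like codes: LRU faults on every reference, while MRU
# retains most of the working set.
refs = list(range(5)) * 20
lru = simulate(refs, 4, "LRU")  # faults on all 100 references
mru = simulate(refs, 4, "MRU")  # an order of magnitude fewer faults
```

This is exactly the pathology the abstract mentions: cycling over one more page than fits in memory makes LRU evict each page just before it is needed again.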
A GPU-based Implementation for Improved Online Rebinning Performance in Clinical 3-D PET
Online rebinning is an important and well-established technique for reducing the time required to process Positron Emission Tomography (PET) data. However, the need for efficient data processing in a clinical setting is growing rapidly and is beginning to exceed the capability of traditional online processing methods. High count-rate applications such as Rubidium 3-D PET studies can easily saturate current online rebinning technology, and real-time processing at these count rates is essential to avoid significant data loss. In addition, the emergence of time-of-flight (TOF) scanners is producing very large data sets, so TOF applications require efficient online rebinning methods to maintain high patient throughput. New hardware architectures such as Graphics Processing Units (GPUs) are now available to speed up data-parallel and number-crunching algorithms. Compared with the usual parallel systems, such as multiprocessor or clustered machines, GPU hardware can be much faster and, above all, significantly cheaper. GPUs were originally designed for video-game graphics but are now used for high-performance computing across many domains. The goal of this thesis is to investigate the suitability of the GPU for PET rebinning algorithms.
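One common rebinning method for 3-D PET, single-slice rebinning (SSRB), can be sketched in a few lines of NumPy: each oblique line of response between detector rings r1 and r2 is assigned to the direct plane at their mean axial position. This is a serial illustration under our own assumptions (array shapes and names are ours); a GPU implementation like the one the thesis investigates would parallelise the accumulation over sinogram bins:

```python
import numpy as np

def ssrb(oblique, max_ring_diff):
    """Single-slice rebinning sketch: collapse oblique sinograms
    oblique[r1, r2, angle, radial] onto 2*nrings - 1 direct planes,
    indexed by r1 + r2 (i.e. twice the mean ring index)."""
    nrings = oblique.shape[0]
    nplanes = 2 * nrings - 1
    out = np.zeros((nplanes,) + oblique.shape[2:])
    for r1 in range(nrings):
        for r2 in range(nrings):
            # Accept only line-of-response obliquities up to the
            # scanner's maximum ring difference.
            if abs(r1 - r2) <= max_ring_diff:
                out[r1 + r2] += oblique[r1, r2]
    return out

# Tiny example: 3 rings, 2 angles, 2 radial bins, all counts = 1.
obl = np.ones((3, 3, 2, 2))
planes = ssrb(obl, max_ring_diff=2)
```

Because every accepted event is added into exactly one output plane, total counts are conserved, which is a convenient sanity check for any rebinning implementation.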
Demand-based coscheduling of parallel jobs on multiprogrammed multiprocessors
Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1997. Includes bibliographical references (p. 92-94). By Patrick Gregory Sobalvarro.