Solving Large Problem Sizes of Index-Digit Algorithms on GPU: FFT and Tridiagonal System Solvers

Author: Amor Margarita
Doallo Ramón
Lobeiras Blanco Jacobo
Pérez Diéguez Adrián
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2018

Current Graphics Processing Units (GPUs) can deliver high computational performance in scientific applications. To reach that performance, however, programmers must choose parallel algorithms suited to these architectures and usually have to apply optimization techniques in the implementation. Many efficient proposals exist for limited-size problems that fit directly in the shared memory of CUDA GPUs, but few GPU proposals tackle the design of efficient algorithms for large problem sizes that exceed shared memory capacity. In this work, we present a tuning strategy that addresses this problem for some parallel prefix algorithms that can be represented by a set of common permutations of the digits of their element indices [1], denoted Index-Digit (ID) algorithms. Specifically, our strategy has been applied to develop flexible Multi-Stage (MS) algorithms for the Fast Fourier Transform (MS-ID-FFT) and a tridiagonal system solver (MS-ID-TS) on the GPU. The resulting implementation is compact and outperforms well-known, commonly used state-of-the-art libraries, with an improvement of up to 1.47x over NVIDIA's complex CUFFT and up to 33.2x over NVIDIA's CUSPARSE for real-data tridiagonal systems.
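For readers unfamiliar with the tridiagonal problem this record refers to, the following is a minimal sketch of the classic serial Thomas algorithm. It is not the MS-ID-TS method of the paper and is not a GPU implementation; it only shows the kind of system being solved. The function name `solve_tridiagonal` and its argument layout are assumptions made for this illustration.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the serial Thomas algorithm.

    Illustrative baseline only (not the paper's multi-stage GPU solver).
    All four lists have length n; lower[0] and upper[n-1] are ignored.
    Assumes no zero pivots arise (e.g. a diagonally dominant matrix).
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side

    # Forward sweep: eliminate the sub-diagonal.
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom

    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# Example: a 3x3 system whose exact solution is x = [1, 2, 3].
solution = solve_tridiagonal(
    lower=[0.0, 1.0, 1.0],
    diag=[2.0, 2.0, 2.0],
    upper=[1.0, 1.0, 0.0],
    rhs=[4.0, 8.0, 8.0],
)
```

Each step of the forward sweep depends on the previous one, which is why GPU solvers generally rely on different, parallel formulations.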

Repositorio da Universidade da Coruña

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Using parallel computation to improve Independent Metropolis–Hastings based estimation

Author: Jacob Pierre
Robert Christian P.
Smith Murray H.
Publication date: 01/01/2011

In this paper, we consider the implications of the fact that raw parallel power can be exploited by a generic Metropolis–Hastings algorithm if the proposed values are independent. In particular, we present improvements to the independent Metropolis–Hastings algorithm that significantly decrease the variance of any estimator derived from the MCMC output, at no additional computing cost, since those improvements are based on a fixed number of target density evaluations. Furthermore, the techniques developed in this paper do not jeopardize the Markovian convergence properties of the algorithm, since they are based on the Rao–Blackwell principles of Gelfand and Smith (1990), already exploited in Casella and Robert (1996), Atchade and Perron (2005) and Douc and Robert (2010). We illustrate those improvements both on a toy normal example and on a classical probit regression model, but stress that they apply wherever the independent Metropolis–Hastings algorithm is applicable.

Comment: 19 pages, 8 figures, to appear in Journal of Computational and Graphical Statistics
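The property this abstract relies on, that proposals do not depend on the current state of the chain, can be sketched as follows. This is a plain independent Metropolis–Hastings sampler, not the Rao–Blackwellized estimators of the paper; the function name `independent_mh` and its parameters are hypothetical. All proposals and their importance weights are computed up front, which is the step that could be distributed across parallel workers.

```python
import math
import random


def independent_mh(log_target, sample_proposal, log_proposal, n_iter, rng):
    """Plain independent Metropolis-Hastings (illustrative sketch).

    log_target and log_proposal may be unnormalized log densities.
    """
    # Proposals ignore the chain state, so every draw and every
    # log-weight can be evaluated in advance (and in parallel).
    proposals = [sample_proposal(rng) for _ in range(n_iter + 1)]
    log_w = [log_target(y) - log_proposal(y) for y in proposals]

    chain = []
    current = 0  # index of the current state; start at the first draw
    for j in range(1, n_iter + 1):
        # Accept with probability min(1, w(y_j) / w(x_current)).
        log_alpha = min(0.0, log_w[j] - log_w[current])
        if rng.random() < math.exp(log_alpha):
            current = j
        chain.append(proposals[current])
    return chain


# Toy normal example (assumed here): N(0, 1) target, N(0, 2^2) proposal.
rng = random.Random(0)
chain = independent_mh(
    log_target=lambda x: -0.5 * x * x,
    sample_proposal=lambda r: r.gauss(0.0, 2.0),
    log_proposal=lambda x: -0.125 * x * x,
    n_iter=20000,
    rng=rng,
)
mean_estimate = sum(chain) / len(chain)
```

Only the sequential accept/reject loop remains serial, and it performs no further density evaluations.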

arXiv.org e-Print Archive

CiteSeerX

Base de publications de l'université Paris-Dauphine

Crossref

HAL-Polytechnique

A survey on algorithmic aspects of modular decomposition

Author: Habib Michel
Paul Christophe
Publication date: 01/01/2009

Modular decomposition is a technique that applies to graphs but is not restricted to them. The notion of a module appears naturally in the proofs of many graph-theoretical theorems. Computing the modular decomposition tree is an important preprocessing step for solving a large number of combinatorial optimization problems. Since the first polynomial-time algorithm in the early 1970s, the algorithmics of modular decomposition has developed considerably. This paper surveys the ideas and techniques that arose from this line of research.
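The abstract does not define a module. Under the standard definition, a module of a graph is a vertex set M such that every vertex outside M is adjacent either to all of M or to none of it. The check below is a minimal sketch of that definition only, not one of the decomposition algorithms the survey covers; the function name `is_module` and the adjacency-set representation are assumptions for this illustration.

```python
def is_module(adj, candidate):
    """Return True if `candidate` is a module of the undirected graph `adj`.

    `adj` maps each vertex to the set of its neighbours.
    """
    module = set(candidate)
    for v in adj:
        if v in module:
            continue
        # An outside vertex must see all of the module or none of it.
        seen = len(adj[v] & module)
        if 0 < seen < len(module):
            return False
    return True


# Example graph: vertex 3 is adjacent to vertices 1, 2 and 4.
graph = {1: {3}, 2: {3}, 3: {1, 2, 4}, 4: {3}}
leaves_form_module = is_module(graph, {1, 2})    # 3 sees both, 4 sees neither
mixed_set_is_module = is_module(graph, {3, 4})   # 1 sees 3 but not 4
```

This test takes time proportional to the size of the graph for a single candidate set.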

arXiv.org e-Print Archive

CiteSeerX

HAL Descartes

Hal-Diderot