
    Sparse Systems Solving on GPUs with GMRES

    Scientific applications very often rely on solving one or more linear systems. When matrices are sparse, iterative methods are preferred to direct ones. Nevertheless, the values of the nonzero elements and their distribution (i.e., the sparsity pattern of the matrix) greatly influence the efficiency of those methods (in terms of computation time, number of iterations, and result precision) or may simply prevent convergence. Among iterative methods, GMRES is often chosen when dealing with general non-symmetric matrices: its convergence is fast and more stable than that of the biconjugate gradient method. Furthermore, it is mainly based on mathematical operations (matrix-vector products, dot products, norms, etc.) that can be heavily parallelized, and it is thus a good candidate for implementing a sparse system solver on Graphics Processing Units (GPUs). This paper presents a GMRES method for such an architecture. It is based on the modified Gram-Schmidt approach and is very similar to that of Sparselib. Our version uses restarting and a very basic preconditioning. Our implementation builds on the CUBLAS and SpMV libraries in order to achieve good performance whatever the matrix sizes and sparsity patterns. Our experiments show encouraging results when comparing Central Processing Unit (CPU) and GPU executions in double precision, with speedups ranging from 8 up to 23 for a large variety of problems.
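    The restarted, modified-Gram-Schmidt variant of GMRES mentioned above can be sketched in a few lines of NumPy. This is a minimal textbook sketch, not the authors' CUDA code; the function name and parameters are illustrative:

```python
import numpy as np

def gmres_restarted(A, b, x0=None, m=30, tol=1e-10, max_restarts=50):
    """Restarted GMRES(m) using modified Gram-Schmidt orthogonalization."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.copy()
    for _ in range(max_restarts):
        r = b - A @ x
        beta = np.linalg.norm(r)
        if beta < tol:
            return x
        V = np.zeros((n, m + 1))       # Krylov basis
        H = np.zeros((m + 1, m))       # Hessenberg matrix
        V[:, 0] = r / beta
        for j in range(m):
            w = A @ V[:, j]
            # modified Gram-Schmidt: orthogonalize against previous vectors
            for i in range(j + 1):
                H[i, j] = V[:, i] @ w
                w -= H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] < 1e-14:    # lucky breakdown
                m_eff = j + 1
                break
            V[:, j + 1] = w / H[j + 1, j]
        else:
            m_eff = m
        # small least-squares problem: min || beta*e1 - H y ||
        e1 = np.zeros(m_eff + 1)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:m_eff + 1, :m_eff], e1, rcond=None)
        x += V[:, :m_eff] @ y
    return x
```

    On a GPU, the matrix-vector product (SpMV) and the dot products/norms are the operations offloaded to libraries such as CUBLAS, which is why the method parallelizes well.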

    Parallelization of direct algorithms using multisplitting methods in grid environments

    The goal of this paper is to introduce a new approach to building efficient distributed linear system solvers. The starting point is the fact that parallelizing direct algorithms requires frequent synchronizations in order to obtain the solution of a linear problem. In a grid computing environment, communication times are significant and the bandwidth is variable, so frequent synchronizations degrade performance. It is therefore desirable to reduce the number of synchronizations in a parallel direct algorithm. Inspired by multisplitting techniques, the method we present consists of solving several linear problems obtained by splitting the original one. Each linear system is solved independently on a cluster using a direct method. This paper uses the theoretical results of \cite{BMR97} to build coarse-grained algorithms designed for solving linear systems in the grid computing context.
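    The multisplitting idea — split the system, solve each sub-system independently with a direct method, and iterate — can be illustrated by a minimal block-Jacobi-style sketch in NumPy. This is our own illustration under simplifying assumptions (a sequential loop stands in for clusters working in parallel, and all names are illustrative), not the paper's algorithm:

```python
import numpy as np

def multisplitting_solve(A, b, blocks, iters=100, tol=1e-12):
    """Block-Jacobi multisplitting: each block is solved directly
    (np.linalg.solve stands in for one cluster's direct solver)."""
    x = np.zeros_like(b)
    for _ in range(iters):
        x_new = x.copy()
        for idx in blocks:
            # local right-hand side: subtract the other blocks' contribution
            rest = np.setdiff1d(np.arange(len(b)), idx)
            rhs = b[idx] - A[np.ix_(idx, rest)] @ x[rest]
            # direct solve of the local sub-system
            x_new[idx] = np.linalg.solve(A[np.ix_(idx, idx)], rhs)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```

    Only the vector pieces exchanged between blocks require communication, which is what lets the synchronization frequency drop compared with a fully synchronized parallel direct solver; convergence of this simple scheme holds for, e.g., diagonally dominant matrices.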

    MAHEVE: An Efficient Reliable Mapping of Asynchronous Iterative Applications on Volatile and Heterogeneous Environments

    The asynchronous iteration model, called AIAC, has been proven to be an efficient solution for heterogeneous and distributed architectures. An efficient mapping of application tasks is essential to reduce their execution time. In this paper we present a new mapping algorithm, called MAHEVE (Mapping Algorithm for HEterogeneous and Volatile Environments), which is efficient on such architectures and integrates a fault-tolerance mechanism to withstand computing node failures. Our experiments show gains of up to 65% on the execution time of a typical AIAC application, run on distributed cluster architectures containing more than 400 computing cores with the JaceP2P-V2 environment.

    Adaptation and Evaluation of the Multisplitting-Newton and Waveform Relaxation Methods Over Distributed Volatile Environments

    This paper presents new adaptations to the grid context of two methods for solving large systems of differential equations. The first method is based on the multisplitting concept and the second on the waveform relaxation concept. Their adaptations are implemented according to the asynchronous iteration model, which is well suited to volatile architectures that suffer from high-latency networks. Many experiments were conducted to evaluate and compare the accuracy and performance of both methods while solving the advection-diffusion problem over heterogeneous, distributed and volatile architectures. The JaceP2P-V2 middleware provided the fault-tolerant asynchronous environment required for these experiments.

    A parallel implementation of the Durand-Kerner algorithm for polynomial root-finding on GPU

    In this article we present a parallel implementation of the Durand-Kerner algorithm for finding the roots of high-degree polynomials on a GPU architecture (Graphics Processing Unit). We have implemented both a CPU version and a GPU-compatible version with CUDA. The main result of our work is a parallel implementation that is 10 times as fast as its sequential counterpart on a single CPU for polynomials of degree greater than about 48,000.
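    The Durand-Kerner iteration itself is simple to state: all roots are updated simultaneously, which is what makes it a natural fit for GPUs. Below is a minimal sequential NumPy sketch (not the authors' CUDA implementation; the initial guesses follow the common (0.4 + 0.9i)^k convention):

```python
import numpy as np

def durand_kerner(coeffs, iters=200, tol=1e-12):
    """Find all roots of a polynomial simultaneously.
    coeffs: coefficients, highest degree first (as for np.polyval)."""
    c = np.asarray(coeffs, dtype=complex)
    c = c / c[0]                          # make the polynomial monic
    n = len(c) - 1
    # standard initial guesses: powers of a point off the unit roots
    z = (0.4 + 0.9j) ** np.arange(n)
    for _ in range(iters):
        delta = np.empty(n, dtype=complex)
        for i in range(n):
            # Weierstrass correction: p(z_i) / prod_{j != i} (z_i - z_j)
            prod = np.prod(z[i] - np.delete(z, i))
            delta[i] = np.polyval(c, z[i]) / prod
        z = z - delta
        if np.max(np.abs(delta)) < tol:
            break
    return z
```

    The inner loop over `i` is embarrassingly parallel (one thread per root), which is the property a CUDA version exploits for very high degrees.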

    An efficient and robust decentralized algorithm for detecting the global convergence in asynchronous iterative algorithms

    URL: http://vecpar.fe.up.pt/2008/papers/25.pdf
    In this paper we present a practical, efficient and robust algorithm for detecting global convergence in any asynchronous iterative process. A proven theoretical version, together with a first practical version, was presented in [1]. However, the main drawback of that first practical version was that it required determining the maximal communication time between any pair of nodes in the system during the entire iterative process. The version presented in this paper does not require any additional information on the parallel system while always ensuring correct detections.

    On the probability distribution of the classical Gras implication intensity index between two binary random variables

    In this contribution we study the behavior of the classical Gras implication index as a random variable, when applied to a pair of Bernoulli variables (X, Y), independent or not. We also show the effect of the conditional probability p(Y|X) on its probability distribution, and especially on its mean value and quartiles.

    On the probability distribution of the classical Gras implication index between two binary random variables

    In this contribution we study the behavior of the classical Gras implication index as a random variable, when applied to a pair of Bernoulli variables, independent or not. We also show the effect of the conditional probability p(Y|X) on its probability distribution, and especially on its mean value and quartiles.
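    Assuming the usual definition from statistical implicative analysis — the implication index compares the observed number of counterexamples n(a, not-b) with its expected value n_a * n_notb / n under independence, normalized by the square root of that expectation — the index can be computed as below. The formula and names are our assumption for illustration; the paper's exact notation is not reproduced here:

```python
import math

def gras_implication_index(n, n_a, n_b, n_a_notb):
    """Classical Gras implication index q(a, not-b).

    n        : total number of individuals
    n_a      : number of individuals with a = 1
    n_b      : number of individuals with b = 1
    n_a_notb : observed counterexamples (a = 1, b = 0)
    """
    n_notb = n - n_b
    # expected counterexamples if a and b were independent
    expected = n_a * n_notb / n
    return (n_a_notb - expected) / math.sqrt(expected)
```

    Strongly negative values (far fewer counterexamples than independence predicts) support the implication a => b; treating the counts as random, as the contribution does, makes the index itself a random variable.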

    Dynamic Load Balancing and Efficient Load Estimators for Asynchronous Iterative Algorithms

    In a previous paper~\cite{HPCS2002}, we showed the great power of asynchronism for parallel iterative algorithms in the global context of grid computing. In this article, we study the benefit of coupling load balancing with asynchronism in such algorithms. After proposing a decentralized version of dynamic load balancing, which is best suited to asynchronism, we verify its efficiency through experiments on a general Partial Differential Equation (PDE) problem. Finally, we give some general conditions for using load balancing to obtain good results with this kind of algorithm, and discuss the choice of the residual as an efficient load estimator.