82 research outputs found

A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures

Authors: Buttari Alfredo; Dongarra Jack; Kurzak Jakub; Langou Julien
Publication date: 01/01/2007

As multicore systems continue to gain ground in the High Performance Computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features of these new processors. Fine-grain parallelism becomes a major requirement and introduces the necessity of loose synchronization in the parallel execution of an operation. This paper presents an algorithm for the Cholesky, LU, and QR factorizations where the operations can be represented as a sequence of small tasks that operate on square blocks of data. These tasks can be dynamically scheduled for execution based on the dependencies among them and on the availability of computational resources. This may result in an out-of-order execution of the tasks, which will completely hide the presence of intrinsically sequential tasks in the factorization. Performance comparisons are presented against the LAPACK algorithms, where parallelism can only be exploited at the level of the BLAS operations, and against vendor implementations.
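The idea of tasks on square blocks scheduled by their dependencies can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the kernel names POTRF, TRSM, SYRK, and GEMM are the conventional LAPACK/BLAS names for the Cholesky tile operations and are not taken from the abstract, the function names are hypothetical, and the "scheduler" simply groups tasks into waves assuming unlimited cores. No numerical work is performed; only the task graph is built.

```python
def tiled_cholesky_tasks(t):
    """List (name, tiles_read, tile_written) for a tiled Cholesky on a t x t tile grid.

    Assumption: a right-looking variant working on the lower triangle.
    """
    tasks = []
    for k in range(t):
        tasks.append((f"POTRF({k})", [], (k, k)))           # factor diagonal tile
        for i in range(k + 1, t):
            tasks.append((f"TRSM({i},{k})", [(k, k)], (i, k)))  # panel solve
        for i in range(k + 1, t):
            tasks.append((f"SYRK({i},{k})", [(i, k)], (i, i)))  # diagonal update
            for j in range(k + 1, i):
                tasks.append((f"GEMM({i},{j},{k})", [(i, k), (j, k)], (i, j)))
    return tasks


def build_dependencies(tasks):
    """Map each task to the tasks it must wait for.

    A task depends on the last writer of every tile it reads or writes.
    Tiles that are read here are never written again afterwards, so
    write-after-read hazards do not need to be tracked in this sketch.
    """
    last_writer = {}
    deps = {}
    for name, reads, write in tasks:
        deps[name] = {last_writer[tile] for tile in reads + [write]
                      if tile in last_writer}
        last_writer[write] = name
    return deps


def schedule_waves(deps):
    """Group tasks into waves; every task in a wave has all dependencies done."""
    done, waves = set(), []
    remaining = dict(deps)
    while remaining:
        ready = sorted(n for n, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("dependency cycle detected")
        waves.append(ready)
        done.update(ready)
        for n in ready:
            del remaining[n]
    return waves


tasks = tiled_cholesky_tasks(3)
waves = schedule_waves(build_dependencies(tasks))
for step, wave in enumerate(waves, 1):
    print(step, wave)
```

Each printed wave lists tasks that have no unmet dependencies and could therefore run concurrently; a real runtime would instead dispatch ready tasks to whichever cores are free.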

arXiv.org e-Print Archive

CiteSeerX

Recommended from our members

Improving parallel program performance using critical path analysis

Authors: Bic Lubomir; Gajski Daniel D.; Kwan Andrew W.
Publication venue: eScholarship, University of California
Publication date: 01/01/1989

A programming tool that performs analysis of critical paths for parallel programs has been developed. This tool determines the critical path for the program as scheduled onto a parallel computer with P processing elements, the critical path for the program expressed as a data flow graph (when maximal parallelism can be expressed), and the minimum number of processing elements (P_opt) needed to obtain maximum program speedup. Experiments were performed using several versions of a Gaussian elimination program to examine how speedup varied with changes in granularity and critical path length. These experiments showed that when the available number of processing elements P < P_opt, increasing granularity improved program speedup more than reducing the data flow graph's critical path length, whereas when P ≥ P_opt, increasing granularity degraded program speedup while reducing critical path length improved it.
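The critical path of a data flow graph can be illustrated with a short Python sketch. It is not the tool described above: the function name, the example graph, and the task durations are all hypothetical, and the sketch covers only the unlimited-processor case (the longest weighted path through the graph). It does not compute P_opt; it only reports the speedup bound given by total work divided by critical path length.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(durations, preds):
    """Return (length, path) of the longest weighted path in a task DAG.

    durations: task -> execution time (hypothetical units)
    preds:     task -> iterable of tasks that must finish first
    """
    graph = {node: preds.get(node, ()) for node in durations}
    finish, best_pred = {}, {}
    for node in TopologicalSorter(graph).static_order():
        start, chosen = 0, None
        for p in graph[node]:
            if finish[p] > start:       # latest-finishing predecessor
                start, chosen = finish[p], p
        finish[node] = start + durations[node]
        best_pred[node] = chosen
    end = max(finish, key=finish.get)
    length = finish[end]
    path = []
    while end is not None:              # walk back along the critical path
        path.append(end)
        end = best_pred[end]
    return length, path[::-1]


# Hypothetical four-task graph: a feeds b and c, which both feed d.
durations = {"a": 2, "b": 3, "c": 1, "d": 4}
preds = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}

length, path = critical_path(durations, preds)
total_work = sum(durations.values())
print("critical path:", path, "length:", length)
print("speedup bound:", total_work / length)
```

Shortening the critical path raises the speedup bound, while changing granularity changes the durations and the number of tasks; the abstract's experiments compare these two effects for different processor counts.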

eScholarship - University of California

Parallel algorithms for algebraic and numerical problems

Author: Ajwa Iyad A.
Publication venue: Lehigh Preserve

Lehigh University: Lehigh Preserve

Numerical simulation of non-Newtonian fluid flow in mixing geometries

Author: Havard Stephen Paul
Publication venue: University of Glamorgan
Publication date: 16/05/2012

In this thesis, a theoretical investigation is undertaken into fluid and mixing flows generated by various geometries for Newtonian and non-Newtonian fluids, on both sequential and parallel computer systems. The thesis begins by giving the necessary background to the mixing process and a summary of the fundamental characteristics of parallel architecture machines. This is followed by a literature review covering prior work on mixing flows, numerical methods employed to simulate fluid mechanics problems, and relevant parallel algorithms. Next, an overview is given of the numerical methods that have been reviewed, discussing the advantages and disadvantages of the different methods. In the first section of the work, the implementation of the primitive variable finite element method to solve a simple two-dimensional fluid flow problem is studied. For the same geometry, colour band mixing is also investigated. Further investigative work is undertaken into the flows generated by various rotors for both Newtonian and non-Newtonian fluids. An extended version of the primitive variable formulation is employed, and colour band mixing is also carried out on two of these geometries. The latter work is carried out on a parallel architecture machine. The design specifications of a parallel algorithm for a MIMD system are discussed, with particular emphasis placed on frontal and multifrontal methods. This is followed by an explanation of the implementation of the proposed parallel algorithm, applied to the same fluid flow problems as considered earlier, and a discussion of the efficiency of the system. Finally, the conclusions of the entire work are presented, together with a number of suggestions for future work. Three published papers relating to the work carried out on the transputer networks are included in the appendices.