Search CORE

6,838 research outputs found

Revisiting Matrix Product on Master-Worker Platforms

Author: Dongarra Jack
Laboratoire de l'informatique du parallélisme
Pineau Jean-François
Robert Yves
Shi Zhiao
Vivien Frédéric
Publication venue
Publication date: 01/01/2006
Field of study

This paper is aimed at designing efficient parallel matrix-product algorithms for heterogeneous master-worker platforms. While matrix-product is well-understood for homogeneous 2D-arrays of processors (e.g., Cannon algorithm and ScaLAPACK outer product algorithm), there are three key hypotheses that render our work original and innovative: - Centralized data. We assume that all matrix files originate from, and must be returned to, the master. - Heterogeneous star-shaped platforms. We target fully heterogeneous platforms, where computational resources have different computing powers. - Limited memory. Because we investigate the parallelization of large problems, we cannot assume that full matrix panels can be stored in the worker memories and re-used for subsequent updates (as in ScaLAPACK). We have devised efficient algorithms for resource selection (deciding which workers to enroll) and communication ordering (both for input and result messages), and we report a set of numerical experiments on various platforms at Ecole Normale Superieure de Lyon and the University of Tennessee. However, we point out that in this first version of the report, experiments are limited to homogeneous platforms

arXiv.org e-Print Archive

HAL-ENS-LYON

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

Libre Acces aux Rapports Scientifiques et Techniques

The University of Manchester - Institutional Repository

Hal-Diderot

HAL-Rennes 1

A Case Study in Coordination Programming: Performance Evaluation of S-Net vs Intel's Concurrent Collections

Author: Gijsbers Bert
Grelck Clemens
Shafarenko Alex
Tveretina Olga
Zaichenkov Pavel
Publication venue
Publication date: 01/01/2014
Field of study

We present a programming methodology and runtime performance case study comparing the declarative data flow coordination language S-Net with Intel's Concurrent Collections (CnC). As a coordination language S-Net achieves a near-complete separation of concerns between sequential software components implemented in a separate algorithmic language and their parallel orchestration in an asynchronous data flow streaming network. We investigate the merits of S-Net and CnC with the help of a relevant and non-trivial linear algebra problem: tiled Cholesky decomposition. We describe two alternative S-Net implementations of tiled Cholesky factorization and compare them with two CnC implementations, one with explicit performance tuning and one without, that have previously been used to illustrate Intel CnC. Our experiments on a 48-core machine demonstrate that S-Net manages to outperform CnC on this problem.Comment: 9 pages, 8 figures, 1 table, accepted for PLC 2014 worksho

arXiv.org e-Print Archive

Crossref

Ghent University Academic Bibliography

International Migration, Integration and Social Cohesion online publications

Strategies to optimize the LU factorization algorithm on multicore computers

Author: Ortiz Javier
Soler Janet
Wolfmann Aaron Gustavo
Publication venue
Publication date: 01/01/2013
Field of study

The number of cores in multicore computers has an irreversible tendency to increase. Also, computers with multiple sockets to insert multicore chips are based on a complex hardware design and are becoming more common. To parallelize the algorithms that run on this type of computers in order to obtain a higher performance rate, is a goal that can only be achieved by taking into account hardware architecture. As hardware evolves, so must software. This leads to old parallelization strategies quickly become obsolete. This paper presents a series of alternatives for parallelization the LU factorization algorithm and its results intended to running on a multicore system. Simple strategies lead to poor results. This study presents complex strategies that merge double levels of parallelism with asynchronous scheduling whose results reach up to the State-of-the-art in the field and even go further.http://hpc2013.hpclatam.org/talks.html#fullpaper17Fil: Soler, Janet. Universidad Nacional de Córdoba. Facultad de Ciencias Exactas, Físicas y Naturales. Laboratorio de Computación; Argentina.Fil: Ortiz, Javier. Universidad Nacional de Córdoba. Facultad de Ciencias Exactas, Físicas y Naturales. Laboratorio de Computación; Argentina.Fil: Wolfmann, Aaron Gustavo. Universidad Nacional de Córdoba. Facultad de Ciencias Exactas, Físicas y Naturales. Laboratorio de Computación; Argentina.Ciencias de la Computació

Repositorio Digital de la Universidad Nacional de Córdoba