Search CORE

14 research outputs found

Leveraging task-parallelism in message-passing dense matrix factorizations using SMPSs

Author: Badia Sala Rosa Maria
Martín Huertas Alberto Francisco
Quintana Ortí Enrique Salvador
Reyes Ruyman
Publication venue: 'Elsevier BV'
Publication date: 01/01/2014
Field of study

In this paper, we investigate how to exploit task-parallelism during the execution of the Cholesky factorization on clusters of multicore processors with the SMPSs programming model. Our analysis reveals that the major difficulties in adapting the code for this operation in ScaLAPACK to SMPSs lie in algorithmic restrictions and the semantics of the SMPSs programming model, but also that they both can be overcome with a limited programming effort. The experimental results report considerable gains in performance and scalability of the routine parallelized with SMPSs when compared with conventional approaches to execute the original ScaLAPACK implementation in parallel as well as two recent message-passing routines for this operation. In summary, our study opens the door to the possibility of reusing message-passing legacy codes/libraries for linear algebra, by introducing up-to-date techniques like dynamic out-of-order scheduling that significantly upgrade their performance, while avoiding a costly rewrite/reimplementation.This research was supported by Project EU INFRA-2010-1.2.2 \TEXT:Towards EXa op applicaTions". The researcher at BSC-CNS was supported by the HiPEAC-2 Network of Excellence (FP7/ICT 217068), the Spanish Ministry of Education (CICYT TIN2011-23283, TIN2007-60625 and CSD2007- 00050), and the Generalitat de Catalunya (2009-SGR-980). The researcher at CIMNE was partially funded by the UPC postdoctoral grants under the programme \BKC5-Atracció i Fidelització de talent al BKC". The researcher at UJI was supported by project CICYT TIN2008-06570-C04-01 and FEDER. We thank Jesus Labarta, from BSC-CNS, for helpful discussions on SMPSs and his help with the performance analysis of the codes with Paraver. We thank Vladimir Marjanovic, also from BSC-CNS, for his help in the set-up and tuning of the MPI/SMPSs tools on JuRoPa. Finally, we thank Rafael Mayo, from UJI, for his support in the preliminary stages of this work. The authors gratefully acknowledge the computing time granted on the supercomputer JuRoPa at Jülich Supercomputing Centrer.Peer ReviewedPreprin

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Repositori Institucional de la Universitat Jaume I

A Makespan Lower Bound for the Scheduling of the Tiled Cholesky Factorization based on ALAP Schedule

Author: Beaumont Olivier
Langou Julien
Quach Willy
Shilova Alena
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 24/08/2020
Field of study

International audienceDue to the advent of multicore architectures and massive parallelism, the tiled Cholesky factorization algorithm has recently received plenty of attention and is often referenced by practitioners as a case study. It is also implemented in mainstream dense linear algebra libraries and is used as a testbed for runtime systems. However, we note that theoretical study of the parallelism of this algorithm is currently lacking. In this paper, we present new theoretical results about the tiled Cholesky factorization in the context of a parallel homogeneous model without communication costs. Based on the relative costs of involved kernels, we prove that only two different situations must be considered, typically corresponding to CPUs and GPUs. By a careful analysis on the number of tasks of each type that run simultaneously in the ALAP (As Late As Possible) schedule without resource limitation, we are able to determine precisely the number of busy processors at any time (as degree 2 polynomials). We then use this information to find a closed form formula for the minimum time to schedule a tiled Cholesky factorization of size n on P processors. We show that this bound outperforms classical bounds from the literature. We also prove that ALAP(P), an ALAP-based schedule where the number of resources is limited to P , has a makespan extremely close to the lower bound, thus proving both the effectiveness of ALAP(P) schedule and of the lower bound on the makespan

INRIA a CCSD electronic archive server

DAGuE: A generic distributed DAG engine for High Performance Computing

Author: Anthony Danalis
Augonnet
Aurelien Bouteiller
Bernstein
Blackford
Bolze
Buttari
Buttari
Buttari
Chan
Choi
Cosnard
Cosnard
Dongarra
George Bosilca
Gustavson
Gustavson
Husbands
Jack Dongarra
Pierre Lemarinier
Quintana-Ortí
Schreiber
Song
Thomas Herault
Publication venue: 'Elsevier BV'
Publication date
Field of study

Crossref

A Software Cache Autotuning Strategy for Dataflow Computing with UPC++ DepSpawn

Author: Andrade Diego
Fraguela Basilio B.
Publication venue: 'Wiley'
Publication date: 01/01/2021
Field of study

This is the accepted version of the following article: B. B. Fraguela, D. Andrade. A software cache autotuning strategy for dataflow computing with UPC++ DepSpawn. Computational and Mathematical Methods, 3(6), e1148. November 2021, which has been published in final form at http://dx.doi.org/10.1002/cmm4.1148. This article may be used for noncommercial purposes in accordance with the Wiley Self-Archiving Policy [http://www.wileyauthors.com/self-archiving].[Abstract] Dataflow computing allows to start computations as soon as all their dependencies are satisfied. This is particularly useful in applications with irregular or complex patterns of dependencies which would otherwise involve either coarse grain synchronizations which would degrade performance, or high programming costs. A recent proposal for the easy development of performant dataflow algorithms in hybrid shared/distributed memory systems is UPC++ DepSpawn. Among the many techniques it applies to provide good performance is a software cache that minimizes the communications among the processes involved. In this article we provide the details of the implementation and operation of this cache and we present an autotuning strategy that simplifies its usage by freeing the user from having to estimate an adequate size for this cache. Rather, the runtime is now able to define reasonably sized caches that provide near optimal behavior.This research was funded by the Ministry of Science and Innovation of Spain (TIN2016-75845-P and PID2019-104184RB-I00, AEI/FEDER/EU, 10.13039/501100011033), and by the Xunta de Galicia co-funded by the European Regional Development Fund (ERDF) under the Consolidation Programme of Competitive Reference Groups (ED431C 2017/04). The authors acknowledge also the support from the Centro Singular de Investigación de Galicia “CITIC,” funded by Xunta de Galicia and the European Union (European Regional Development Fund- Galicia 2014-2020 Program), by grant ED431G 2019/01. They also acknowledge the Centro de Supercomputación de Galicia (CESGA) for the use of its computersXunta de Galicia; ED431C 2017/04Xunta de Galicia; ED431G/0

Repositorio da Universidade da Coruña

Algoritmes voor Software Radio, geschikt voor gecodeerde transmissie --Software Radio Algorithms for Coded Transmission

Author: Wymeersch H
Publication venue
Publication date: 01/01/2005
Field of study

Ghent University Academic Bibliography

Extended Abstracts presented at the 25th International Symposium on Mathematical Theory of Networks and Systems MTNS 2022 : held 12-16 September 2022 in Bayreuth, Germany

Author: make_name_string expected hash reference
Publication venue
Publication date: 01/01/2022
Field of study

EPub Bayreuth

Distributed SBP Cholesky factorization algorithms with near-optimal scheduling

Author: Bo Kågström
Choi J.
Dackland K.
Forum
Fred G. Gustavson
Gerasoulis A.
Gustavson F. G.
Gustavson F. G.
Lars Karlsson
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date
Field of study

Crossref

Handbook of Mathematical Geosciences

Author
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

This Open Access handbook published at the IAMG's 50th anniversary, presents a compilation of invited path-breaking research contributions by award-winning geoscientists who have been instrumental in shaping the IAMG. It contains 45 chapters that are categorized broadly into five parts (i) theory, (ii) general applications, (iii) exploration and resource estimation, (iv) reviews, and (v) reminiscences covering related topics like mathematical geosciences, mathematical morphology, geostatistics, fractals and multifractals, spatial statistics, multipoint geostatistics, compositional data analysis, informatics, geocomputation, numerical methods, and chaos theory in the geosciences

OAPEN Library

A comparison of the CAR and DAGAR spatial random effects models with an application to diabetics rate estimation in Belgium

Author: Christel Faes
Niel Hens
Pierpaolo Brutti
Vittoria La Serra
Publication venue: place:Milano
Publication date: 01/01/2020
Field of study

When hierarchically modelling an epidemiological phenomenon on a finite collection of sites in space, one must always take a latent spatial effect into account in order to capture the correlation structure that links the phenomenon to the territory. In this work, we compare two autoregressive spatial models that can be used for this purpose: the classical CAR model and the more recent DAGAR model. Differently from the former, the latter has a desirable property: its ρ parameter can be naturally interpreted as the average neighbor pair correlation and, in addition, this parameter can be directly estimated when the effect is modelled using a DAGAR rather than a CAR structure. As an application, we model the diabetics rate in Belgium in 2014 and show the adequacy of these models in predicting the response variable when no covariates are available

Archivio della ricerca- Università di Roma La Sapienza