Search CORE

4 research outputs found

Optimisation de l'utilisation du cache dans EUROPLEXUS

Author: Faucher Vincent
Gautier Thierry
Raffin Bruno
Sridi Marwa
Publication venue: HAL CCSD
Publication date: 22/04/2014

National audience. In this paper we propose a new organization of the data structure of EUROPLEXUS, a simulation code developed by the CEA and dedicated to the analysis of fast dynamic phenomena in fluids and structures. The organization is built so that the data accessed by a processor working on one portion of the computation during a time step Ti are as contiguous as possible, so that they fit in that processor's cache. This new layout will help minimize the number of cache misses compared with the current organization of the data structure. Performance studies have validated the gain achieved with the new organization on large-scale problems.
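The abstract's idea of making each processor's data contiguous can be illustrated with a small sketch. It is not the EUROPLEXUS layout, which the abstract does not detail: the names `owner`, `group_by_owner` and `reorder`, and the sample field values, are hypothetical. The sketch assumes each mesh element is assigned to one worker and computes a permutation (a counting sort) so that every worker's elements occupy one contiguous slice of each field array.

```python
from typing import List, Sequence, Tuple


def group_by_owner(owner: Sequence[int], n_workers: int) -> Tuple[List[int], List[int]]:
    """Group element ids by owning worker (stable counting sort).

    Returns (perm, offsets): worker w owns perm[offsets[w]:offsets[w + 1]].
    """
    counts = [0] * n_workers
    for w in owner:
        counts[w] += 1
    # Prefix sums give the start of each worker's contiguous slice.
    offsets = [0] * (n_workers + 1)
    for w in range(n_workers):
        offsets[w + 1] = offsets[w] + counts[w]
    cursor = offsets[:-1].copy()
    perm = [0] * len(owner)
    for elem, w in enumerate(owner):
        perm[cursor[w]] = elem
        cursor[w] += 1
    return perm, offsets


def reorder(values: Sequence[float], perm: Sequence[int]) -> List[float]:
    """Copy a per-element field into the grouped order."""
    return [values[e] for e in perm]


# Hypothetical example: 6 elements split across 3 workers.
owner = [1, 0, 1, 0, 2, 1]
perm, offsets = group_by_owner(owner, 3)
stress = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
packed = reorder(stress, perm)
# Worker 1 now reads the single slice packed[offsets[1]:offsets[2]].
```

After the reordering, a worker sweeps one contiguous slice per field instead of gathering scattered entries, which is the access pattern that tends to reduce cache misses.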

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL-CEA

HAL-Rennes 1

Communication-optimal Parallel and Sequential Cholesky Decomposition

Author: Grey Ballard
James Demmel
Oded Schwartz
Olga Holtz
Publication venue: Society for Industrial & Applied Mathematics (SIAM)
Publication date: 01/01/2009

Numerical algorithms have two kinds of costs: arithmetic and communication, by which we mean either moving data between levels of a memory hierarchy (in the sequential case) or over a network connecting processors (in the parallel case). Communication costs often dominate arithmetic costs, so it is of interest to design algorithms minimizing communication. In this paper we first extend known lower bounds on the communication cost (both for bandwidth and for latency) of conventional (O(n^3)) matrix multiplication to Cholesky factorization, which is used for solving dense symmetric positive definite linear systems. Second, we compare the costs of various Cholesky decomposition implementations to these lower bounds and identify the algorithms and data structures that attain them. In the sequential case, we consider both the two-level and hierarchical memory models. Combined with prior results in [13, 14, 15], this gives a set of communication-optimal algorithms for O(n^3) implementations of the three basic factorizations of dense linear algebra: LU with pivoting, QR and Cholesky. But it goes beyond this prior work on sequential LU by optimizing communication for any number of levels of memory hierarchy.

Comment: 29 pages, 2 tables, 6 figures
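To make the factorization concrete, here is a minimal sketch of a generic right-looking blocked Cholesky in pure Python. It is not claimed to be one of the specific implementations the paper analyzes. The function name `blocked_cholesky` and the block-size parameter `b` are illustrative assumptions. The point of blocking is that each step works on b-by-b blocks, so a suitably chosen `b` lets the working blocks stay in fast memory while they are reused.

```python
import math
from typing import List

Matrix = List[List[float]]


def blocked_cholesky(a: Matrix, b: int) -> Matrix:
    """Return lower-triangular L with L * L^T == a, for symmetric positive definite a.

    The matrix is processed in block columns of width b (an assumed tuning knob).
    """
    n = len(a)
    L = [row[:] for row in a]  # work on a copy; only the lower triangle is used
    for k in range(0, n, b):
        kend = min(k + b, n)
        # 1. Unblocked Cholesky of the diagonal block L[k:kend, k:kend].
        for j in range(k, kend):
            for p in range(k, j):
                L[j][j] -= L[j][p] * L[j][p]
            L[j][j] = math.sqrt(L[j][j])
            for i in range(j + 1, kend):
                for p in range(k, j):
                    L[i][j] -= L[i][p] * L[j][p]
                L[i][j] /= L[j][j]
        # 2. Panel solve for the rows below the diagonal block.
        for i in range(kend, n):
            for j in range(k, kend):
                for p in range(k, j):
                    L[i][j] -= L[i][p] * L[j][p]
                L[i][j] /= L[j][j]
        # 3. Symmetric update of the trailing submatrix (lower triangle only).
        for i in range(kend, n):
            for j in range(kend, i + 1):
                s = 0.0
                for p in range(k, kend):
                    s += L[i][p] * L[j][p]
                L[i][j] -= s
    # Clear the strict upper triangle so the result is exactly lower triangular.
    for i in range(n):
        for j in range(i + 1, n):
            L[i][j] = 0.0
    return L


# Small symmetric positive definite example.
A = [[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]]
L = blocked_cholesky(A, b=2)
```

The block size changes only the order of operations, not the result. In a real implementation steps 2 and 3 would be calls to tuned triangular-solve and matrix-multiply kernels, and `b` would be chosen from the fast-memory size.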

arXiv.org e-Print Archive

CiteSeerX

Crossref

Hardware-Oriented Implementation of Cache Oblivious Matrix Operations Based on Space-Filling Curves

Author: K. Yotov
M. Bader
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2008

Crossref