4 research outputs found

    Efficient generation of elimination trees and graph associahedra

    An elimination tree for a connected graph G is a rooted tree on the vertices of G obtained by choosing a root x and recursing on the connected components of G−x to produce the subtrees of x. Elimination trees appear in many guises in computer science and discrete mathematics, and they encode many interesting combinatorial objects, such as bitstrings, permutations and binary trees. We apply the recent Hartung-Hoang-Mütze-Williams combinatorial generation framework to elimination trees, and prove that all elimination trees for a chordal graph G can be generated by tree rotations using a simple greedy algorithm. This yields a short proof for the existence of Hamilton paths on graph associahedra of chordal graphs. Graph associahedra are a general class of high-dimensional polytopes introduced by Carr, Devadoss, and Postnikov, whose vertices correspond to elimination trees and whose edges correspond to tree rotations. As special cases of our results, we recover several classical Gray codes for bitstrings, permutations and binary trees, and we obtain a new Gray code for partial permutations. Our algorithm for generating all elimination trees for a chordal graph G can be implemented in time O(m+n) per generated elimination tree, where m and n are the number of edges and vertices of G, respectively. If G is a tree, we improve this to a loopless algorithm running in time O(1) per generated elimination tree. We also prove that our algorithm produces a Hamilton cycle on the graph associahedron of G, rather than just a Hamilton path, if the graph G is chordal and 2-connected. Moreover, our algorithm characterizes chordality: it computes a Hamilton path on the graph associahedron of G if and only if G is chordal.
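    The recursive definition above (pick a root, recurse on the components of the remaining graph) translates directly into code. The following Python sketch is illustrative only, not from the paper; the function names and the `choose=min` root-selection rule are assumptions for the example.

    ```python
    # Sketch: build one elimination tree for a connected graph, following
    # the recursive definition: choose a root x, then recurse on the
    # connected components of G - x to obtain the subtrees of x.

    def components(vertices, adj):
        """Connected components of the subgraph induced by `vertices`."""
        vertices = set(vertices)
        seen, comps = set(), []
        for s in vertices:
            if s in seen:
                continue
            comp, stack = set(), [s]
            while stack:
                v = stack.pop()
                if v in comp:
                    continue
                comp.add(v)
                stack.extend(w for w in adj[v] if w in vertices)
            seen |= comp
            comps.append(comp)
        return comps

    def elimination_tree(vertices, adj, choose=min):
        """Return {child: parent} for the tree rooted at choose(vertices)."""
        root = choose(vertices)
        parent = {}
        for comp in components(set(vertices) - {root}, adj):
            parent.update(elimination_tree(comp, adj, choose))
            parent[choose(comp)] = root  # attach subtree root below `root`
        return parent

    # Path graph 1-2-3, rooted at 1: the tree is the chain 1 <- 2 <- 3.
    adj = {1: {2}, 2: {1, 3}, 3: {2}}
    print(elimination_tree({1, 2, 3}, adj))  # {3: 2, 2: 1}
    ```

    Generating all such trees efficiently, and ordering them so that consecutive trees differ by a single rotation, is the substance of the paper; the sketch only shows the underlying object.
    
    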

    Book of Abstracts of the Sixth SIAM Workshop on Combinatorial Scientific Computing

    Book of Abstracts of CSC14, edited by Bora Uçar. The Sixth SIAM Workshop on Combinatorial Scientific Computing, CSC14, was organized at the École Normale Supérieure de Lyon, France, from 21 to 23 July 2014. This two-and-a-half-day event marked the sixth in a series that started ten years earlier in San Francisco, USA. The workshop's focus was combinatorial mathematics and algorithms in high performance computing, broadly interpreted. The workshop featured three invited talks, 27 contributed talks and eight poster presentations. The three invited talks focused on two fields of research: randomized algorithms for numerical linear algebra, and network analysis. The contributed talks and the posters targeted modeling, analysis, bisection, clustering, and partitioning of graphs, applied in the context of networks, sparse matrix factorizations, iterative solvers, fast multipole methods, automatic differentiation, high-performance computing, and linear programming. The workshop was held at the premises of the LIP laboratory of ENS Lyon and was generously supported by the LABEX MILYON (ANR-10-LABX-0070, Université de Lyon, within the program "Investissements d'Avenir" ANR-11-IDEX-0007 operated by the French National Research Agency) and by SIAM.

    Problèmes de mémoire et de performance de la factorisation multifrontale parallèle et de la résolution triangulaire à seconds membres creux

    We consider the solution of very large sparse systems of linear equations on parallel architectures. In this context, memory is often a bottleneck that prevents or limits the use of direct solvers, especially those based on the multifrontal method. This work focuses on memory and performance issues in the two most memory- and compute-intensive phases of direct methods, namely the numerical factorization and the solution phase. In the first part we consider the solution phase with sparse right-hand sides, and in the second part we consider the memory scalability of the multifrontal factorization. In the first part, we focus on the triangular solution phase with multiple sparse right-hand sides, which appear in numerous applications. We especially emphasize the computation of entries of the inverse, where both the right-hand sides and the solution are sparse. We first present several storage schemes that enable a significant compression of the solution space, both in a sequential and in a parallel context. We then show that the way the right-hand sides are partitioned into blocks strongly influences the performance, and we consider two different settings: the out-of-core case, where the aim is to reduce the number of accesses to the factors, which are stored on disk, and the in-core case, where the aim is to reduce the computational cost. Finally, we show how to enhance the parallel efficiency. In the second part, we consider the parallel multifrontal factorization. We show that controlling the active memory specific to the multifrontal method is critical, and that commonly used mapping techniques usually fail to do so: they cannot achieve high memory scalability, i.e., they dramatically increase the amount of memory needed by the factorization as the number of processors increases.
We propose a class of "memory-aware" mapping and scheduling algorithms that aim at maximizing performance while enforcing a user-given memory constraint, and that provide robust memory estimates before the factorization. These techniques exposed performance issues in the parallel dense kernels used at each step of the factorization, and we have proposed several algorithmic improvements. The ideas presented throughout this study have been implemented within the MUMPS (MUltifrontal Massively Parallel Solver) solver and tested on large matrices (up to a few tens of millions of unknowns) and massively parallel architectures (up to a few thousand cores). They have been shown to improve the performance and the robustness of the code, and will be available in a future release. Some of the ideas presented in the first part have also been implemented within the PDSLin (Parallel Domain decomposition Schur complement based Linear solver) package.
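    To illustrate why sparse right-hand sides matter when computing entries of the inverse, the following NumPy sketch computes a single entry (A⁻¹)ᵢⱼ from an LU factorization A = LU via two triangular solves. This is a minimal dense stand-in for the sparse multifrontal structures discussed above, not the MUMPS implementation; the function name and the dense loops are assumptions for the example.

    ```python
    # Sketch: (A^-1)_{ij} = x_i where L y = e_j and U x = y.
    # Because the right-hand side e_j is sparse (a single nonzero),
    # y is zero above row j and the forward solve can start at row j --
    # the kind of pruning that the sparse right-hand-side schemes
    # generalize to the tree structure of the factors.
    import numpy as np

    def inverse_entry(L, U, i, j):
        n = L.shape[0]
        # Forward substitution L y = e_j, skipping the all-zero rows < j.
        y = np.zeros(n)
        y[j] = 1.0 / L[j, j]
        for k in range(j + 1, n):
            y[k] = -(L[k, j:k] @ y[j:k]) / L[k, k]
        # Backward substitution U x = y; only x_i is needed, but the
        # full solve is shown for clarity.
        x = np.zeros(n)
        for k in range(n - 1, -1, -1):
            x[k] = (y[k] - U[k, k + 1:] @ x[k + 1:]) / U[k, k]
        return x[i]

    A = np.array([[4.0, 1.0],
                  [2.0, 3.0]])
    L = np.array([[1.0, 0.0],
                  [0.5, 1.0]])   # A = L U with this pair
    U = np.array([[4.0, 1.0],
                  [0.0, 2.5]])
    print(inverse_entry(L, U, 0, 0))        # 0.3
    print(np.linalg.inv(A)[0, 0])           # 0.3, for comparison
    ```

    When many entries of the inverse are requested, the thesis shows that how the columns e_j are grouped into blocks strongly affects both factor accesses (out-of-core) and operation counts (in-core); the sketch covers only the single-entry case.
    
    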

    Memory and performance issues in parallel multifrontal factorizations and triangular solutions with sparse right-hand sides

    (Same thesis as the preceding entry; the abstract is identical.)