18 research outputs found

    Derivation of passage-time densities in PEPA models using ipc: the imperial PEPA compiler

    Get PDF
    We present a technique for defining and extracting passage-time densities from high-level stochastic process algebra models. Our high-level formalism is PEPA, a popular Markovian process algebra for expressing compositional performance models. We introduce ipc, a tool which can process PEPA-specified passage-time densities and models by compiling the PEPA model and passage specification into the DNAmaca formalism. DNAmaca is an established modelling language for the low-level specification of very large Markov and semi-Markov chains. We provide performance results for ipc/DNAmaca and comparisons with another tool which supports PEPA, PRISM. Finally, we generate passage-time densities and quantiles for a case study of a high-availability web server. 1

    Multi-level direct K-way hypergraph partitioning with multiple constraints and fixed vertices

    Get PDF
    K-way hypergraph partitioning has an ever-growing use in parallelization of scientific computing applications. We claim that hypergraph partitioning with multiple constraints and fixed vertices should be implemented using direct K-way refinement, instead of the widely adopted recursive bisection paradigm. Our arguments are based on the fact that recursive-bisection-based partitioning algorithms perform considerably worse when used in the multiple constraint and fixed vertex formulations. We discuss possible reasons for this performance degradation. We describe a careful implementation of a multi-level direct K-way hypergraph partitioning algorithm, which performs better than a well-known recursive-bisection-based partitioning algorithm in hypergraph partitioning with multiple constraints and fixed vertices. We also experimentally show that the proposed algorithm is effective in standard hypergraph partitioning. © 2007 Elsevier Inc. All rights reserved

    Replicated partitioning for undirected hypergraphs

    Get PDF
    Cataloged from PDF version of article.Hypergraph partitioning (HP) and replication are diverse but powerful tools that are traditionally applied separately to minimize the costs of parallel and sequential systems that access related data or process related tasks. When combined together, these two techniques have the potential of achieving significant improvements in performance of many applications. In this study, we provide an approach involving a tool that simultaneously performs replication and partitioning of the vertices of an undirected hypergraph whose vertices represent data and nets represent task dependencies among these data. In this approach, we propose an iterative-improvement-based replicated bipartitioning heuristic, which is capable of move, replication, and unreplication of vertices. In order to utilize our replicated bipartitioning heuristic in a recursive bipartitioning framework, we also propose appropriate cut-net removal, cut-net splitting, and pin selection algorithms to correctly encapsulate the two most commonly used cutsize metrics. We embed our replicated bipartitioning scheme into the state-of-the-art multilevel HP tool PaToH to provide an effective and efficient replicated HP tool, rpPaToH. The performance of the techniques proposed and the tools developed is tested over the undirected hypergraphs that model the communication costs of parallel query processing in information retrieval systems. Our experimental analysis indicates that the proposed technique provides significant improvements in the quality of the partitions, especially under low replication ratios. (C) 2012 Elsevier Inc. All rights reserved

    Shared Transaction Markov Chains for Fluid Analysis of Massively Parallel Systems

    No full text
    Published versio

    Résolution triangulaire de systèmes linéaires creux de grande taille dans un contexte parallèle multifrontal et hors-mémoire

    Get PDF
    Nous nous intéressons à la résolution de systèmes linéaires creux de très grande taille par des méthodes directes de factorisation. Dans ce contexte, la taille de la matrice des facteurs constitue un des facteurs limitants principaux pour l'utilisation de méthodes directes de résolution. Nous supposons donc que la matrice des facteurs est de trop grande taille pour être rangée dans la mémoire principale du multiprocesseur et qu'elle a donc été écrite sur les disques locaux (hors-mémoire : OOC) d'une machine multiprocesseurs durant l'étape de factorisation. Nous nous intéressons à l'étude et au développement de techniques efficaces pour la phase de résolution après une factorization multifrontale creuse. La phase de résolution, souvent négligée dans les travaux sur les méthodes directes de résolution directe creuse, constitue alors un point critique de la performance de nombreuses applications scientifiques, souvent même plus critique que l'étape de factorisation. Cette thèse se compose de deux parties. Dans la première partie nous nous proposons des algorithmes pour améliorer la performance de la résolution hors-mémoire. Dans la deuxième partie nous pousuivons ce travail en montrant comment exploiter la nature creuse des seconds membres pour réduire le volume de données accédées en mémoire. Dans la première partie de cette thèse nous introduisons deux approches de lecture des données sur le disque dur. Nous montrons ensuite que dans un environnement parallèle le séquencement des tâches peut fortement influencer la performance. Nous prouvons qu'un ordonnancement contraint des tâches peut être introduit; qu'il n'introduit pas d'interblocage entre processus et qu'il permet d'améliorer les performances. Nous conduisons nos expériences sur des problèmes industriels de grande taille (plus de 8 Millions d'inconnues) et utilisons une version hors-mémoire d'un code multifrontal creux appelé MUMPS (solveur multifrontal parallèle). Dans la deuxième partie de ce travail nous nous intéressons au cas de seconds membres creux multiples. Ce problème apparaît dans des applications en electromagnétisme et en assimilation de données et résulte du besoin de calculer l'espace propre d'une matrice fortement déficiente, du calcul d'éléments de l'inverse de la matrice associée aux équations normales pour les moindres carrés linéaires ou encore du traitement de matrices fortement réductibles en programmation linéaire. Nous décrivons un algorithme efficace de réduction du volume d'Entrées/Sorties sur le disque lors d'une résolution hors-mémoire. Plus généralement nous montrons comment le caractère creux des seconds -membres peut être exploité pour réduire le nombre d'opérations et le nombre d'accès à la mémoire lors de l'étape de résolution. Le travail présenté dans cette thèse a été partiellement financé par le projet SOLSTICE de l'ANR (ANR-06-CIS6-010). ABSTRACT : We consider the solution of very large systems of linear equations with direct multifrontal methods. In this context the size of the factors is an important limitation for the use of sparse direct solvers. We will thus assume that the factors have been written on the local disks of our target multiprocessor machine during parallel factorization. Our main focus is the study and the design of efficient approaches for the forward and backward substitution phases after a sparse multifrontal factorization. These phases involve sparse triangular solution and have often been neglected in previous works on sparse direct factorization. In many applications, however, the time for the solution can be the main bottleneck for the performance. This thesis consists of two parts. The focus of the first part is on optimizing the out-of-core performance of the solution phase. The focus of the second part is to further improve the performance by exploiting the sparsity of the right-hand side vectors. In the first part, we describe and compare two approaches to access data from the hard disk. We then show that in a parallel environment the task scheduling can strongly influence the performance. We prove that a constraint ordering of the tasks is possible; it does not introduce any deadlock and it improves the performance. Experiments on large real test problems (more than 8 million unknowns) using an out-of-core version of a sparse multifrontal code called MUMPS (MUltifrontal Massively Parallel Solver) are used to analyse the behaviour of our algorithms. In the second part, we are interested in applications with sparse multiple right-hand sides, particularly those with single nonzero entries. The motivating applications arise in electromagnetism and data assimilation. In such applications, we need either to compute the null space of a highly rank deficient matrix or to compute entries in the inverse of a matrix associated with the normal equations of linear least-squares problems. We cast both of these problems as linear systems with multiple right-hand side vectors, each containing a single nonzero entry. We describe, implement and comment on efficient algorithms to reduce the input-output cost during an outof- core execution. We show how the sparsity of the right-hand side can be exploited to limit both the number of operations and the amount of data accessed. The work presented in this thesis has been partially supported by SOLSTICE ANR project (ANR-06-CIS6-010)

    Parallel triangular solution in the out-of-core multifrontal approach for solving large sparse linear systems

    Get PDF
    We consider the solution of very large systems of linear equations with direct multifrontal methods. In this context the size of the factors is an important limitation for the use of sparse direct solvers. We will thus assume that the factors have been written on the local disks of our target multiprocessor machine during parallel factorization. Our main focus is the study and the design of efficient approaches for the forward and backward substitution phases after a sparse multifrontal factorization. These phases involve sparse triangular solution and have often been neglected in previous works on sparse direct factorization. In many applications, however, the time for the solution can be the main bottleneck for the performance. This thesis consists of two parts. The focus of the first part is on optimizing the out-of-core performance of the solution phase. The focus of the second part is to further improve the performance by exploiting the sparsity of the right-hand side vectors. In the first part, we describe and compare two approaches to access data from the hard disk. We then show that in a parallel environment the task scheduling can strongly influence the performance. We prove that a constraint ordering of the tasks is possible; it does not introduce any deadlock and it improves the performance. Experiments on large real test problems (more than 8 million unknowns) using an out-of-core version of a sparse multifrontal code called MUMPS (MUltifrontal Massively Parallel Solver) are used to analyse the behaviour of our algorithms. In the second part, we are interested in applications with sparse multiple right-hand sides, particularly those with single nonzero entries. The motivating applications arise in electromagnetism and data assimilation. In such applications, we need either to compute the null space of a highly rank deficient matrix or to compute entries in the inverse of a matrix associated with the normal equations of linear least-squares problems. We cast both of these problems as linear systems with multiple right-hand side vectors, each containing a single nonzero entry. We describe, implement and comment on efficient algorithms to reduce the input-output cost during an outof- core execution. We show how the sparsity of the right-hand side can be exploited to limit both the number of operations and the amount of data accessed. The work presented in this thesis has been partially supported by SOLSTICE ANR project (ANR-06-CIS6-010)
    corecore