19 research outputs found

    Scalable hierarchical parallel algorithm for the solution of super large-scale sparse linear equations

    Parallel linear solvers capable of effectively using 1000+ processors have become the bottleneck of large-scale implicit engineering simulations. In this paper, we present a new hierarchical parallel master-slave-structured iterative algorithm for the solution of very large-scale sparse linear equations on distributed-memory computer clusters. By alternately performing global equilibrium computation and local relaxation, the proposed algorithm reaches the specified accuracy in a small number of iterative steps. Moreover, each set/slave processor communicates mainly with its nearest neighbors, and the volume of data transferred between the sets/slave processors and the master is always far below that of the set-neighbor communication. The corresponding algorithm for implicit finite element analysis has been implemented on top of the MPI library, and a very large two-dimensional square triangle-lattice truss structure under random static loads is simulated, with over one billion degrees of freedom and up to 2001 processors, on the "Exploration 100" cluster at Tsinghua University. The numerical experiments demonstrate that this algorithm has excellent parallel efficiency and high scalability, and it may find broad application in other implicit simulations. Comment: 23 pages, 9 figures, 1 table
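
    The abstract describes the method only at a high level. Below is a minimal serial sketch in Python/NumPy of the alternation it describes, under stated assumptions: the "global equilibrium computation" is read as a coarse correction over piecewise-constant basis vectors (one per slave set), and the "local relaxation" as Jacobi sweeps. The function name, the `parts` argument, and this particular coarse space are illustrative assumptions, not the paper's actual MPI-distributed scheme.

```python
import numpy as np

def hierarchical_solve(A, b, parts, n_local_sweeps=3, tol=1e-8, max_iter=200):
    """Serial sketch: alternate a coarse 'global equilibrium' correction
    (the master's job) with local relaxation sweeps on each partition
    (the slaves' job)."""
    n = A.shape[0]
    x = np.zeros(n)
    # Coarse space: one piecewise-constant basis vector per partition.
    P = np.zeros((n, len(parts)))
    for j, idx in enumerate(parts):
        P[idx, j] = 1.0
    A_coarse = P.T @ A @ P            # small "master" operator
    D = A.diagonal()
    for _ in range(max_iter):
        r = b - A @ x
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        # Global equilibrium: the master solves the small coarse system.
        x += P @ np.linalg.solve(A_coarse, P.T @ r)
        # Local relaxation: each partition smooths locally (Jacobi here).
        for _ in range(n_local_sweeps):
            x += (b - A @ x) / D
    return x

# Example on a diagonally dominant tridiagonal system, split into 8 sets.
n = 400
A = np.eye(n) * 4 - np.eye(n, k=1) - np.eye(n, k=-1)
x = hierarchical_solve(A, np.ones(n), parts=np.array_split(np.arange(n), 8))
```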

    Synthetic presentation of iterative asynchronous parallel algorithms.

    Iterative asynchronous parallel methods are nowadays attracting renewed interest in the High Performance Computing (HPC) community, specifically in the context of massive parallelism, because these methods avoid deadlock phenomena and do not require rigorous load balancing, unlike synchronous methods. Such methods are of particular interest when there are many synchronizations between processors, which for iterative methods occurs when convergence is slow. Indeed, in synchronous parallel iterative methods, in order to respect the task-sequence graph that defines the logic of the algorithm, processors must wait for results computed by other processors; waiting for results from concurrent processors therefore causes idle time on the waiting processors. It is to overcome this drawback that asynchronous parallel iterative methods were introduced, first for the solution of large-scale linear systems and then for large, highly nonlinear algebraic systems as well, where the solution may be subject to constraints. Methods of this kind have been widely studied by many authors worldwide. The purpose of this presentation is to introduce, as broadly and pedagogically as possible, asynchronous parallel iterative methods and the issues related to their implementation and to their application in solving many problems arising in High Performance Computing. We therefore try as much as possible to present the underlying concepts that allow a good understanding of these methods, avoiding as far as possible an overly rigorous mathematical formalism; references to the main pioneering works are also given. After a general introduction, we present the basic concepts used to model asynchronous parallel iterative methods, which include synchronous methods as a particular case. We then present the algorithmic extensions of these methods: asynchronous sub-domain methods, asynchronous multisplitting methods, and asynchronous parallel methods with flexible communications. In each case an analysis of the behavior of these methods is presented; the first kind of analysis yields an estimate of the asymptotic rate of convergence. The difficult problem of the stopping test for asynchronous parallel iterations is also studied, both from computer science considerations and from numerical aspects related to the mathematical analysis of the behavior of these parallel iterative methods. Asynchronous parallel methods have been implemented on various architectures, and we present the main principles that make it possible to code them. These methods have been used to solve several kinds of mathematical problems, and we list the main applications treated. Finally, we try to specify in which cases, and on which types of architecture, these methods are efficient and interesting to use.
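
    As a concrete illustration of the idea described above, here is a small Python sketch of an asynchronous block iteration: each worker thread repeatedly relaxes its own block using whatever values of the shared iterate happen to be present, with no barrier between sweeps. The fixed sweep count stands in for the stopping test, which, as the abstract notes, is one of the genuinely difficult issues; all names are illustrative, and the threads here merely model the asynchrony of a real distributed implementation.

```python
import threading
import numpy as np

def async_block_jacobi(A, b, n_blocks=4, sweeps=500):
    """Model of an asynchronous block-Jacobi iteration: each worker
    updates its own block using whatever values the shared iterate
    currently holds -- there is no barrier between sweeps."""
    n = len(b)
    x = np.zeros(n)                        # shared iterate
    D = A.diagonal()
    blocks = np.array_split(np.arange(n), n_blocks)

    def worker(idx):
        for _ in range(sweeps):            # fixed count replaces the
            r = b[idx] - A[idx, :] @ x     # (hard) stopping test; reads
            x[idx] += r / D[idx]           # may see stale entries

    threads = [threading.Thread(target=worker, args=(idx,)) for idx in blocks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return x
```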

    Parallel two-stage algorithms for solving the PageRank problem

    In this work we present parallel algorithms based on two-stage methods for solving the PageRank problem formulated as a linear system. Different parallel versions of these methods are explored and their convergence properties are analyzed. The parallel implementation has been developed using a mixed MPI/OpenMP model to exploit parallelism beyond a single level. In order to investigate and analyze the proposed parallel algorithms, we have used several realistic large datasets. The numerical results show that the proposed algorithms reduce the time to convergence relative to the parallel Power algorithm and behave better than other well-known techniques. This research was supported by the Spanish Ministry of Economy and Competitiveness (MINECO) and the European Commission (FEDER funds) under Grant Number TIN2015-66972-C5-4-R.
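
    For readers unfamiliar with the formulation, the PageRank vector solves the linear system (I - alpha*P^T) x = (1 - alpha) v for a row-stochastic link matrix P. The sketch below shows one plausible two-stage scheme in Python/NumPy: an outer splitting A = M - N whose M-solve is itself only approximated by a few inner Jacobi sweeps. The specific splittings chosen here, and the dangling-node handling real datasets would need, are assumptions; the paper's actual MPI/OpenMP variants differ.

```python
import numpy as np

def pagerank_two_stage(P, alpha=0.85, inner=5, tol=1e-10, max_outer=500):
    """Two-stage solve of (I - alpha P^T) x = (1 - alpha) v:
    outer splitting A = M - N, where each outer step M y = N x + b
    is itself only approximated by a few inner Jacobi sweeps."""
    n = P.shape[0]
    v = np.ones(n) / n                  # uniform teleportation vector
    b = (1 - alpha) * v
    A = np.eye(n) - alpha * P.T
    M = np.tril(A)                      # outer splitting: triangular M
    N = M - A
    d = M.diagonal()
    x = v.copy()
    for _ in range(max_outer):
        rhs = N @ x + b
        y = x.copy()
        for _ in range(inner):          # inner stage: Jacobi on M y = rhs
            y += (rhs - M @ y) / d
        if np.linalg.norm(y - x, 1) < tol:
            return y
        x = y
    return x
```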

    Estimation in Gaussian Graphical Models Using Tractable Subgraphs: A Walk-Sum Analysis

    Graphical models provide a powerful formalism for statistical signal processing. Due to their sophisticated modeling capabilities, they have found applications in a variety of fields such as computer vision, image processing, and distributed sensor networks. In this paper, we present a general class of algorithms for estimation in Gaussian graphical models with arbitrary structure. These algorithms involve a sequence of inference problems on tractable subgraphs over subsets of variables. This framework includes parallel iterations such as embedded trees, serial iterations such as block Gauss-Seidel, and hybrid versions of these iterations. We also discuss a method that uses local memory at each node to overcome temporary communication failures that may arise in distributed sensor network applications. We analyze these algorithms based on the recently developed walk-sum interpretation of Gaussian inference. We describe the walks "computed" by the algorithms using walk-sum diagrams, and show that for iterations based on a very large and flexible set of sequences of subgraphs, convergence is guaranteed in walk-summable models. Consequently, we are free to choose spanning trees and subsets of variables adaptively at each iteration. This leads to efficient methods for optimizing the next iteration step to achieve maximum reduction in error. Simulation results demonstrate that these nonstationary algorithms provide a significant speedup in convergence over traditional one-tree and two-tree iterations.
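
    To make the embedded-trees idea concrete, here is a small Python sketch: at each step only the edges of one spanning tree (plus the diagonal) of the information matrix J are kept, that tractable system is solved exactly, and the cut edges are moved to the right-hand side. A dense solve stands in for the linear-time tree solver used in practice, and the `tree_masks` input (a list of boolean matrices marking the diagonal plus tree edges) is an illustrative interface, not the paper's; convergence in walk-summable models is the property the paper establishes.

```python
import numpy as np

def embedded_trees(J, h, tree_masks, max_iter=200, tol=1e-10):
    """Embedded-trees sketch for a Gaussian model with information
    matrix J and potential vector h: keep one tree's edges (plus the
    diagonal), solve that tractable system exactly, and move the
    cut edges to the right-hand side."""
    x = np.zeros(len(h))
    for t in range(max_iter):
        mask = tree_masks[t % len(tree_masks)]  # cycle through the trees
        J_tree = np.where(mask, J, 0.0)         # tree-structured part of J
        K = J_tree - J                          # cut (off-tree) entries
        x_new = np.linalg.solve(J_tree, K @ x + h)  # stands in for a
        if np.linalg.norm(x_new - x) < tol:         # linear-time tree solve
            return x_new
        x = x_new
    return x
```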

    Modeling and estimation in Gaussian graphical models : maximum-entropy methods and walk-sum analysis

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. Includes bibliographical references (leaves 81-86). Graphical models provide a powerful formalism for statistical signal processing. Due to their sophisticated modeling capabilities, they have found applications in a variety of fields such as computer vision, image processing, and distributed sensor networks. In this thesis we study two central signal processing problems involving Gaussian graphical models, namely modeling and estimation. The modeling problem involves learning a sparse graphical model approximation to a specified distribution. The estimation problem in turn exploits this graph structure to solve high-dimensional estimation problems very efficiently. We propose a new approach for learning a thin graphical model approximation to a specified multivariate probability distribution (e.g., the empirical distribution from sample data). The selection of sparse graph structure arises naturally in our approach through the solution of a convex optimization problem, which differentiates our procedure from standard combinatorial methods. In our approach, we seek the maximum entropy relaxation (MER) within an exponential family, which maximizes entropy subject to constraints that marginal distributions on small subsets of variables are close to the prescribed marginals in relative entropy. We also present a primal-dual interior point method that is scalable and tractable provided the level of relaxation is sufficient to obtain a thin graph. A crucial element of this algorithm is that we exploit sparsity of the Fisher information matrix in models defined on chordal graphs. The merits of this approach are investigated by recovering the graphical structure of some simple graphical models from sample data. Next, we present a general class of algorithms for estimation in Gaussian graphical models with arbitrary structure. These algorithms involve a sequence of inference problems on tractable subgraphs over subsets of variables. This framework includes parallel iterations such as Embedded Trees, serial iterations such as block Gauss-Seidel, and hybrid versions of these iterations. We also discuss a method that uses local memory at each node to overcome temporary communication failures that may arise in distributed sensor network applications. We analyze these algorithms based on the recently developed walk-sum interpretation of Gaussian inference. We describe the walks "computed" by the algorithms using walk-sum diagrams, and show that for non-stationary iterations based on a very large and flexible set of sequences of subgraphs, convergence is achieved in walk-summable models. Consequently, we are free to choose spanning trees and subsets of variables adaptively at each iteration. This leads to efficient methods for optimizing the next iteration step to achieve maximum reduction in error. Simulation results demonstrate that these non-stationary algorithms provide a significant speedup in convergence over traditional one-tree and two-tree iterations. By Venkat Chandrasekaran. S.M.
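
    Since both this thesis and the paper above hinge on walk-summability, a minimal Python sketch of the standard test may help: normalize J to unit diagonal, form the edge-weight matrix R = I - J_norm, and check that the spectral radius of the entrywise absolute matrix |R| is below one. This follows the usual definition from the walk-sum literature; the function name is an illustrative assumption.

```python
import numpy as np

def is_walk_summable(J):
    """Walk-summability test: normalize J to unit diagonal, form the
    edge-weight matrix R = I - J_norm, and check that the spectral
    radius of |R| (entrywise absolute values) is below one."""
    d = 1.0 / np.sqrt(J.diagonal())
    J_norm = J * np.outer(d, d)
    R = np.eye(J.shape[0]) - J_norm
    return float(np.max(np.abs(np.linalg.eigvals(np.abs(R))))) < 1.0
```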

    High-performance computing for the simulation of fluid-structure interactions

    This thesis addresses the solution of fluid-structure interaction problems by an algorithm consisting of a coupling between two solvers: one for the fluid and one for the structure. To ensure consistency between the fluid and structure meshes, a finite-volume discretization of each domain is also considered. Because of the difficulties of decomposing the domain into subdomains, we consider for each environment a parallel multisplitting algorithm, which corresponds to a unified presentation of subdomain methods with or without overlap. This method combines several contracting fixed-point mappings, and we show that, under appropriate assumptions, each fixed-point mapping is contracting in finite-dimensional spaces normed by Hilbertian and non-Hilbertian norms. Moreover, we show that such an analysis is valid for synchronous and, more generally, asynchronous parallel solution of the large linear systems arising from the discretization of fluid-structure interaction problems, and that it can be extended to the case where the displacement of the structure is subject to constraints. Furthermore, the convergence of these asynchronous parallel multisplitting methods can also be analyzed by partial-ordering techniques related to the discrete maximum principle, both in the linear setting and in the setting obtained when the displacements of the structure are subject to constraints. We carry out parallel simulations for various fluid-structure test cases on different clusters, considering both blocking and non-blocking communications. In the latter case we had to resolve an implementation difficulty, as an unrecoverable error occurred during execution; this difficulty was removed by introducing a method ensuring the termination of all non-blocking communications before the mesh update. The performance of the parallel simulations is presented and analyzed. Finally, we apply the methodology presented above to various industrial fluid-structure interaction settings on unstructured meshes, which constitutes an additional difficulty.
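
    As a minimal illustration of the multisplitting idea mentioned above, the Python sketch below combines several splittings A = M_l - N_l, each solved independently, through nonnegative diagonal weighting matrices E_l that sum to the identity (here: block-Jacobi-style M_l and 0/1 weights). This is a synchronous, serial toy version that assumes a diagonally dominant A; the thesis's asynchronous and constrained variants, and the fluid-structure coupling itself, are well beyond this sketch.

```python
import numpy as np

def multisplitting(A, b, n_parts=4, max_iter=500, tol=1e-10):
    """Synchronous multisplitting sketch: several splittings
    A = M_l - N_l are solved independently and combined through
    nonnegative diagonal weights E_l that sum to the identity."""
    n = len(b)
    blocks = np.array_split(np.arange(n), n_parts)
    Ms, Es = [], []
    for idx in blocks:
        # Splitting l keeps A's l-th diagonal block; elsewhere M_l is
        # just diag(A).  E_l selects the unknowns of block l.
        M = np.diag(A.diagonal()).astype(float)
        M[np.ix_(idx, idx)] = A[np.ix_(idx, idx)]
        E = np.zeros((n, n))
        E[idx, idx] = 1.0
        Ms.append(M)
        Es.append(E)
    x = np.zeros(n)
    for _ in range(max_iter):
        x_new = sum(E @ np.linalg.solve(M, (M - A) @ x + b)
                    for M, E in zip(Ms, Es))
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```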

    Resilience for Asynchronous Iterative Methods for Sparse Linear Systems

    Large-scale simulations are used in a variety of application areas in science and engineering to help drive innovation forward. Many such simulations spend the vast majority of their computational time solving large systems of linear equations, typically arising from discretizations of the partial differential equations used to mathematically model various phenomena. The algorithms used to solve these problems are typically iterative in nature, and making efficient use of computational time on High Performance Computing (HPC) clusters requires constantly improving these iterative algorithms. Future HPC platforms are expected to face three main problem areas: scalability of code, reliability of hardware, and energy efficiency of the platform. The HPC resources expected to run these large programs are planned to consist of billions of processing units, drawn from more traditional multicore processors as well as a variety of hardware accelerators. This growth in parallelism gives rise to all three problems. Previous work on algorithm development has focused primarily on creating fault-tolerance mechanisms for traditional iterative solvers. Recent work has begun to revisit asynchronous methods for solving large-scale applications, and this dissertation presents research into fault tolerance for fine-grained methods that are asynchronous in nature. Classical convergence results for asynchronous methods are revisited and modified to account for the possible occurrence of a fault, and a variety of techniques for recovering from the effects of a fault are proposed. Examples of how these techniques can be used are shown for various algorithms, including an analysis of a fine-grained algorithm for computing incomplete factorizations. Lastly, numerous modeling and simulation tools for the further construction of iterative algorithms for HPC applications are developed, including numerical models for simulating faults and a simulation framework that can be used to extrapolate the performance of algorithms on future HPC systems.
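
    Here is a minimal Python sketch of the kind of numerical fault model the dissertation mentions: with small probability per sweep, one entry of the iterate is overwritten by garbage (a soft fault), and recovery rewrites that entry from its own fixed-point equation using its neighbors' current values. Both the fault model and this particular recovery rule are simplified assumptions for illustration, not the dissertation's actual techniques.

```python
import numpy as np

def jacobi_with_soft_faults(A, b, p_fault=0.001, max_iter=5000, tol=1e-8):
    """Jacobi iteration under a simple soft-fault model: with
    probability p_fault per sweep one entry of the iterate is
    corrupted; recovery recomputes that entry from its own
    fixed-point equation using its neighbors' current values."""
    rng = np.random.default_rng(0)
    n = len(b)
    x = np.zeros(n)
    D = A.diagonal()
    for _ in range(max_iter):
        x = x + (b - A @ x) / D                 # ordinary Jacobi sweep
        if rng.random() < p_fault:              # inject a soft fault
            i = rng.integers(n)
            x[i] = rng.normal(scale=1e6)        # corrupted entry
            # Recovery: the A[i, i] * x[i] term cancels the corrupted
            # value, so x[i] is rebuilt from uncorrupted neighbors.
            x[i] = (b[i] - A[i] @ x + A[i, i] * x[i]) / A[i, i]
        if np.linalg.norm(b - A @ x) <= tol * np.linalg.norm(b):
            break
    return x
```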