12 research outputs found

    Generalizing Random Butterfly Transforms to Arbitrary Matrix Sizes

    Full text link
    Parker and L\^e introduced random butterfly transforms (RBTs) as a preprocessing technique to replace pivoting in dense LU factorization. Unfortunately, their FFT-like recursive structure restricts the dimensions of the matrix. Furthermore, on multi-node systems, efficient management of the communication overheads restricts the matrix's distribution even more. To remove these limitations, we have generalized the RBT to arbitrary matrix sizes by truncating the dimensions of each layer in the transform. We expanded Parker's theoretical analysis to generalized RBT, specifically that in exact arithmetic, Gaussian elimination with no pivoting will succeed with probability 1 after transforming a matrix with full-depth RBTs. Furthermore, we experimentally show that these generalized transforms improve performance over Parker's formulation by up to 62\% while retaining the ability to replace pivoting. This generalized RBT is available in the SLATE numerical software library

    Sur la symmétrisation de matrices et des solveurs directs

    Get PDF
    We investigate algorithms for finding column permutations of sparse matrices in order to have large diagonal entriesand to have many entries symmetrically positioned around the diagonal. The aim is to improve the memory and running time requirements of a certain class of sparse direct solvers.We propose efficient algorithms for this purpose by combining two existing approaches and demonstratethe effect of our findings in practice using a direct solver. In particular, we show improvements in a number of components of the running time of a sparse direct solver with respect to the state of the art on a diverse set of matrices.Nous étudions des algorithmes pour trouver des permutations de colonnes de matrices creuses afin d’avoir de grandes entrées sur la diagonaleet d’avoir de nombreuses entrées symétriquement positionnées autour de la diagonale. Notre but est d’améliorer la mémoire et le temps d’exécution d’unecertaine classe de solveurs directs creux. Nous proposons des algorithmes efficaces à cet effet en combinant deux approches existantes et exposons l’effet de nos résultats dans la pratique en utilisant un solveur direct. En particulier, nous montrons des améliorations dans de plusieurs components du temps d’exécution d’un solveur direct creux par rapport à l’état de l’art sur un ensemble divers de matrice

    Making Sparse Gaussian Elimination Scalable by Static Pivoting

    No full text
    We propose several techniques as alternatives to partial pivoting to stabilize sparse Gaussian elimination. From numerical experiments we demonstrate that for a wide range of problems the new method is as stable as partial pivoting. The main advantage of the new method over partial pivoting is that it permits a priori determination of data structures and communication pattern for Gaussian elimination, which makes it more scalable on distributed memory machines. Based on this a priori knowledge, we design highly parallel algorithms for both sparse Gaussian elimination and triangular solve and we show that they are suitable for large-scale distributed memory machines. Keywords: sparse unsymmetric linear systems, static pivoting, iterative refinement, MPI, 2-D matrix decomposition. 1 Introduction In our earlier work [8, 9, 22], we developed new algorithms to solve unsymmetric sparse linear systems using Gaussian elimination with partial pivoting (GEPP). The new algorithms are hi..

    Performance Improvements of Common Sparse Numerical Linear Algebra Computations

    Get PDF
    Manufacturers of computer hardware are able to continuously sustain an unprecedented pace of progress in computing speed of their products, partially due to increased clock rates but also because of ever more complicated chip designs. With new processor families appearing every few years, it is increasingly harder to achieve high performance rates in sparse matrix computations. This research proposes new methods for sparse matrix factorizations and applies in an iterative code generalizations of known concepts from related disciplines. The proposed solutions and extensions are implemented in ways that tend to deliver efficiency while retaining ease of use of existing solutions. The implementations are thoroughly timed and analyzed using a commonly accepted set of test matrices. The tests were conducted on modern processors that seem to have gained an appreciable level of popularity and are fairly representative for a wider range of processor types that are available on the market now or in the near future. The new factorization technique formally introduced in the early chapters is later on proven to be quite competitive with state of the art software currently available. Although not totally superior in all cases (as probably no single approach could possibly be), the new factorization algorithm exhibits a few promising features. In addition, an all-embracing optimization effort is applied to an iterative algorithm that stands out for its robustness. This also gives satisfactory results on the tested computing platforms in terms of performance improvement. The same set of test matrices is used to enable an easy comparison between both investigated techniques, even though they are customarily treated separately in the literature. Possible extensions of the presented work are discussed. They range from easily conceivable merging with existing solutions to rather more evolved schemes dependent on hard to predict progress in theoretical and algorithmic research

    Distributed Sparse Matrix Solver and Pivoting Algorithms for Large Linear Equations

    Get PDF
    This thesis presents a distributed sparse direct solver and pivoting strategies for distributed sparse LU factorization. In Chapter I, we introduce some background concepts in linear algebra. In Chapter II, we discuss parallel hardware architectures and introduce our problem for discussion. In Chapter III, we describe the implementation of our sparse direct solver and our pivoting algorithms. Next, we present our simulation methodology and results in Chapter IV and we show that our pivoting algorithm yields up to 50% more accurate results than a current state-of-the-art solver, SuperLU_DIST. Finally, in Chapter V, we present some ideas to further improve the accuracy and speed of our distributed sparse matrix solver

    High-speed extended-term time-domain simulation for online cascading analysis of power systemalysis of Power System

    Get PDF
    A high-speed extended-term (HSET) time domain simulator (TDS), intended to become a part of an energy management system (EMS), has been newly developed for use in online extended-term dynamic cascading analysis of power systems. HSET-TDS includes the following attributes for providing situational awareness of high-consequence events: i) online analysis, including n-1 and n-k events, ii) ability to simulate both fast and slow dynamics for 1-3 hours in advance, iii) inclusion of rigorous protection-system modeling, iv) intelligence for corrective action ID, storage, and fast retrieval, and v) high-speed execution. Very fast on-line computational capability is the most desired attribute of this simulator. Based on the process of solving algebraic differential equations describing the dynamics of power system, HSET-TDS seeks to develop computational efficiency at each of the following hierarchical levels, i) hardware, ii) strategies, iii) integration methods, iv) nonlinear solvers, and v) linear solver libraries. This thesis first describes the Hammer-Hollingsworth 4 (HH4) implicit integration method. Like the trapezoidal rule, HH4 is symmetrically A-Stable but it possesses greater high-order precision (h4) than the trapezoidal rule. Such precision enables larger integration steps and therefore improves simulation efficiency for variable step size implementations. This thesis provides the underlying theory on which we advocate use of HH4 over other numerical integration methods for power system time-domain simulation. Second, motivated by the need to perform high speed extended-term time domain simulation (HSET-TDS) for on-line purposes, this thesis presents principles for designing numerical solvers of differential algebraic systems associated with power system time-domain simulation, including DAE construction strategies (Direct Solution Method), integration methods(HH4), nonlinear solvers(Very Dishonest Newton), and linear solvers(SuperLU). We have implemented a design appropriate for HSET-TDS, and we compare it to various solvers, including the commercial grade PSS\E program, with respect to computational efficiency and accuracy, using as examples the New England 39 bus system, the expanded 8775 bus system, and PJM 13029 buses system. Third, we have explored a stiffness-decoupling method, intended to be part of parallel design of time domain simulation software for super computers. The stiffness-decoupling method is able to combine the advantages of implicit methods (A-stability) and explicit method(less computation). With the new stiffness detection method proposed herein, the stiffness can be captured. The expanded 975 buses system is used to test simulation efficiency. Finally, several parallel strategies for super computer deployment to simulate power system dynamics are proposed and compared. Design A partitions the task via scale with the stiffness decoupling method, waveform relaxation, and parallel linear solver. Design B partitions the task via the time axis using a highly precise integration method, the Kuntzmann-Butcher Method - order 8 (KB8). The strategy of partitioning events is designed to partition the whole simulation via the time axis through a simulated sequence of cascading events. For all strategies proposed, a strategy of partitioning cascading events is recommended, since the sub-tasks for each processor are totally independent, and therefore minimum communication time is needed

    Reducing Communication in the Solution of Linear Systems

    Get PDF
    There is a growing performance gap between computation and communication on modern computers, making it crucial to develop algorithms with lower latency and bandwidth requirements. Because systems of linear equations are important for numerous scientific and engineering applications, I have studied several approaches for reducing communication in those problems. First, I developed optimizations to dense LU with partial pivoting, which downstream applications can adopt with little to no effort. Second, I consider two techniques to completely replace pivoting in dense LU, which can provide significantly higher speedups, albeit without the same numerical guarantees as partial pivoting. One technique uses randomized preprocessing, while the other is a novel combination of block factorization and additive perturbation. Finally, I investigate using mixed precision in GMRES for solving sparse systems, which reduces the volume of data movement, and thus, the pressure on the memory bandwidth
    corecore