Highly parallel sparse Cholesky factorization
Several fine-grained parallel algorithms were developed and compared for computing the Cholesky factorization of a sparse matrix. The experimental implementations run on the Connection Machine, a distributed-memory SIMD machine whose programming model conceptually supplies one processor per data element. In contrast to special-purpose algorithms in which the matrix structure conforms to the connection structure of the machine, the focus is on matrices with arbitrary sparsity structure. The most promising algorithm is one whose inner loop performs several dense factorizations simultaneously on a 2-D grid of processors. Virtually any massively parallel dense factorization algorithm can be used as the key subroutine. The sparse code attains execution rates comparable to those of the dense subroutine. Although at present architectural limitations prevent the dense factorization from realizing its potential efficiency, it is concluded that a regular data-parallel architecture can be used efficiently to solve arbitrarily structured sparse problems. A performance model is also presented and used to analyze the algorithms.
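As a rough illustration of the dense kernel discussed above, here is a minimal right-looking Cholesky factorization in pure Python (a sketch for exposition only; the function name and dense representation are ours, and the paper's algorithms run many such factorizations simultaneously on a processor grid, skipping structurally zero updates):

```python
import math

def cholesky_right_looking(A):
    """Right-looking Cholesky: after eliminating column k, apply a
    rank-1 update to the trailing submatrix. In the sparse setting only
    the nonzero entries of column k generate updates."""
    n = len(A)
    L = [row[:] for row in A]          # work on a copy, lower triangle
    for k in range(n):
        L[k][k] = math.sqrt(L[k][k])   # pivot
        for i in range(k + 1, n):
            L[i][k] /= L[k][k]         # scale column k
        for j in range(k + 1, n):      # rank-1 update of trailing block
            for i in range(j, n):
                L[i][j] -= L[i][k] * L[j][k]
    for i in range(n):                 # zero the strict upper triangle
        for j in range(i + 1, n):
            L[i][j] = 0.0
    return L

A = [[4.0, 2.0, 0.0],
     [2.0, 5.0, 1.0],
     [0.0, 1.0, 3.0]]
L = cholesky_right_looking(A)          # L @ L.T reconstructs A
```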
Efficient Factor Graph Fusion for Multi-robot Mapping
This work presents a novel method to efficiently factorize the combination of multiple factor graphs that share variables of estimation. Rapid innovation in algebraic graph theory has enabled new state-estimation tools such as factor graphs. Recent factor graph formulations for Simultaneous Localization and Mapping (SLAM), such as Incremental Smoothing and Mapping using the Bayes tree (iSAM2), have been very successful and have garnered much attention. Variable ordering, a well-known technique in linear algebra, is employed for solving the factor graph. Our primary contribution in this work is to reuse the variable orderings of the graphs being combined to find the ordering of the fused graph. In the case of mapping, multiple robots provide a great advantage over a single robot: faster map coverage and better estimation quality. This, coupled with the inevitable increase in the number of robots around us, produces a demand for faster algorithms. For example, a city full of self-driving cars could rapidly pool their observation measurements to plan traffic-free navigation. By reusing the variable orderings of the parent graphs, we were able to produce an order-of-magnitude reduction in the time required to solve the fused graph. We also provide a formal verification showing that the proposed strategy does not violate any of the relevant standards. A common problem in multi-robot SLAM is relative pose graph initialization to produce a globally consistent map. Our other contribution addresses this by minimizing a specially formulated error function as part of solving the factor graph. The performance is illustrated on a publicly available SuiteSparse dataset and the multi-robot AP Hill dataset.
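A minimal sketch of the ordering-reuse idea, assuming variables shared by both graphs act as a separator that is eliminated last (the helper `fuse_orderings` and this specific strategy are our illustration, not necessarily the paper's exact algorithm):

```python
def fuse_orderings(order_a, order_b):
    """Hypothetical sketch: build an elimination ordering for the fused
    graph by reusing each parent's ordering. Variables private to each
    parent keep their relative order; variables shared by both graphs
    act like a separator and are eliminated last."""
    shared = set(order_a) & set(order_b)
    fused = [v for v in order_a if v not in shared]   # private to A
    fused += [v for v in order_b if v not in shared]  # private to B
    fused += [v for v in order_a if v in shared]      # separator last
    return fused

# Two pose graphs that both observe the shared variable "s".
fused = fuse_orderings(["x1", "x2", "s"], ["x3", "s", "x4"])
```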
Distributed Robust Stability Analysis of Interconnected Uncertain Systems
This paper considers robust stability analysis of a large network of
interconnected uncertain systems. To avoid analyzing the entire network as a
single large, lumped system, we model the network interconnections with
integral quadratic constraints. This approach yields a sparse linear matrix
inequality which can be decomposed into a set of smaller, coupled linear matrix
inequalities. This allows us to solve the analysis problem efficiently and in a
distributed manner. We also show that the decomposed problem is equivalent to
the original robustness analysis problem, and hence our method does not
introduce additional conservatism.
Note: this paper was accepted for presentation at the 51st IEEE Conference on Decision and Control, Maui, Hawaii, 2012.
A Parallel Direct Method for Finite Element Electromagnetic Computations Based on Domain Decomposition
High-performance parallel computing and direct (factorization-based) solution methods have been the two main trends in electromagnetic computations in recent years. When time-harmonic (frequency-domain) Maxwell's equations are discretized directly with the Finite Element Method (FEM) or other Partial Differential Equation (PDE) methods, the resulting linear system of equations is sparse and indefinite, and thus harder to factorize efficiently, serially or in parallel, than the dense linear systems produced by alternative approaches such as integral equation methods. State-of-the-art sparse direct solvers such as MUMPS and PARDISO do not scale favorably: they have low parallel efficiency and a high memory footprint. This work introduces a new class of sparse direct solvers based on the domain decomposition method, termed the Direct Domain Decomposition Method (D3M), which is reliable, memory efficient, and offers very good parallel scalability for arbitrary 3D FEM problems.
Unlike recent trends in approximate/low-rank solvers, this method focuses on 'numerically exact' solution methods, as they are more reliable for complex 'real-life' models. The proposed method leverages physical insights at every stage of the development through a new symmetric domain decomposition method (DDM) with a single set of Lagrange multipliers. By applying a special regularization scheme at the interfaces, either artificial loss or gain is introduced into each domain to eliminate non-physical internal resonances. A block-wise recursive algorithm based on the Takahashi relationship is proposed for the efficient computation of the discrete Dirichlet-to-Neumann (DtN) map, which reduces the volumetric problem over all domains to an auxiliary surface problem defined on the domain interfaces only. Numerical results show up to 50% run-time savings in the DtN map computation using the proposed block-wise recursive algorithm compared to alternative approaches. The auxiliary unknowns on the domain interfaces form a considerably (approximately an order of magnitude) smaller block-wise sparse matrix, which is efficiently factorized using a customized block LDL factorization with restricted pivoting to ensure stability.
The parallelization of the proposed D3M is realized with a Directed Acyclic Graph (DAG). Recent advances in parallel dense direct solvers have shifted toward implementations that rely on DAG scheduling to achieve highly efficient asynchronous parallel execution. However, adapting such schemes to sparse matrices is harder and often impractical. In D3M, the computation of each domain's discrete DtN map is embarrassingly parallel, whereas the customized block LDLT factorization is suitable for block directed acyclic graph (B-DAG) task scheduling, similar to that used in dense-matrix parallel direct solvers. In this approach, computations are represented as a sequence of small tasks that operate on domains of the DDM or on dense matrix blocks of the reduced matrix. These tasks can be statically scheduled for parallel execution using their DAG dependencies and weights derived from estimates of computation and communication costs.
Comparisons with state-of-the-art exact direct solvers on electrically large problems suggest up to 20% better parallel efficiency, 30% to 3X less memory, and slightly faster runtimes, while maintaining the same accuracy.
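The B-DAG scheduling described above can be sketched with Python's standard-library `graphlib`: the task names and dependencies below illustrate a hypothetical 3x3 block LDLT factorization, not D3M's actual task set.

```python
from graphlib import TopologicalSorter

# Task dependencies for a 3x3 block factorization: factoring a diagonal
# block precedes solving the blocks below it; the solves feeding a
# trailing block's update precede that update; updates on a block
# precede its own factor/solve. Each dict value lists predecessors.
deps = {
    "factor0": set(),
    "solve10": {"factor0"},
    "solve20": {"factor0"},
    "update11": {"solve10"},
    "update21": {"solve10", "solve20"},
    "update22": {"solve20"},
    "factor1": {"update11"},
    "solve21": {"factor1", "update21"},
    "factor2": {"update22", "solve21"},
}

# A static schedule: any topological order respects all dependencies;
# a runtime would additionally weight tasks by cost estimates.
order = list(TopologicalSorter(deps).static_order())
```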
Book of Abstracts of the Sixth SIAM Workshop on Combinatorial Scientific Computing
Book of Abstracts of CSC14, edited by Bora Uçar. The Sixth SIAM Workshop on Combinatorial Scientific Computing, CSC14, was organized at the École Normale Supérieure de Lyon, France, from 21 to 23 July 2014. This two-and-a-half-day event marked the sixth in a series that started ten years earlier in San Francisco, USA. The CSC14 Workshop's focus was on combinatorial mathematics and algorithms in high-performance computing, broadly interpreted. The workshop featured three invited talks, 27 contributed talks, and eight poster presentations. The three invited talks focused on two fields of research: randomized algorithms for numerical linear algebra and network analysis. The contributed talks and the posters targeted modeling, analysis, bisection, clustering, and partitioning of graphs, applied in the context of networks, sparse matrix factorizations, iterative solvers, fast multipole methods, automatic differentiation, high-performance computing, and linear programming. The workshop was held at the premises of the LIP laboratory of ENS Lyon and was generously supported by the LABEX MILYON (ANR-10-LABX-0070, Université de Lyon, within the program ''Investissements d'Avenir'' ANR-11-IDEX-0007 operated by the French National Research Agency) and by SIAM.
Engineering Data Reduction for Nested Dissection
Many applications rely on time-intensive matrix operations, such as
factorization, which can be sped up significantly for large sparse matrices by
interpreting the matrix as a sparse graph and computing a node ordering that
minimizes the so-called fill-in. In this paper, we engineer new data reduction
rules for the minimum fill-in problem, which significantly reduce the size of
the graph while producing an equivalent (or near-equivalent) instance. By
applying both new and existing data reduction rules exhaustively before nested
dissection, we obtain improved quality and at the same time large improvements
in running time on a variety of instances. Our overall algorithm outperforms
the state-of-the-art significantly: it not only yields better elimination
orders, but it does so significantly faster than previously possible. For
example, on road networks, where nested dissection algorithms are typically
used as a preprocessing step for shortest path computations, our algorithms are
on average six times faster than Metis while computing orderings with less
fill-in.
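The fill-in minimized above can be counted directly on the graph: eliminating a vertex pairwise connects its not-yet-eliminated neighbours, which is exactly the fill a sparse factorization incurs. A minimal sketch (the helper `fill_in` is our illustration, not the paper's reduction rules):

```python
def fill_in(adj, order):
    """Count fill edges created by eliminating vertices in `order`.
    `adj` maps each vertex to the set of its neighbours."""
    adj = {v: set(ns) for v, ns in adj.items()}       # mutable copy
    pos = {v: i for i, v in enumerate(order)}
    fill = 0
    for v in order:
        # Neighbours of v that are eliminated after v must form a clique.
        later = sorted(u for u in adj[v] if pos[u] > pos[v])
        for i in range(len(later)):
            for j in range(i + 1, len(later)):
                a, b = later[i], later[j]
                if b not in adj[a]:                   # new fill edge
                    adj[a].add(b)
                    adj[b].add(a)
                    fill += 1
    return fill

# Path graph 1-2-3: eliminating the middle vertex first creates one
# fill edge; eliminating the endpoints first creates none.
path = {1: {2}, 2: {1, 3}, 3: {2}}
```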
Sparse Approximate Inference for Spatio-Temporal Point Process Models
Spatio-temporal point process models play a central role in the analysis of
spatially distributed systems in several disciplines. Yet, scalable inference
remains computationally challenging, both due to the high-resolution modelling
generally required and the analytically intractable likelihood function. Here,
we exploit the sparsity structure typical of (spatially) discretised
log-Gaussian Cox process models by using approximate message-passing
algorithms. The proposed algorithms scale well with the state dimension and the
length of the temporal horizon with moderate loss in distributional accuracy.
They hence provide a flexible and faster alternative to both non-linear
filtering-smoothing type algorithms and to approaches that implement the
Laplace method or expectation propagation on (block) sparse latent Gaussian
models. We infer the parameters of the latent Gaussian model using a structured
variational Bayes approach. We demonstrate the proposed framework on simulation
studies with both Gaussian and point-process observations and use it to
reconstruct the conflict intensity and dynamics in Afghanistan from the
WikiLeaks Afghan War Diary.