Data Structures and Algorithms for Efficient Solution of Simultaneous Linear Equations from 3-D Ice Sheet Models
Two current software packages for solving large systems of sparse simultaneous linear equations are evaluated in terms of their applicability to solving systems of equations generated by the University of Maine Ice Sheet Model. SuperLU, the first package, was developed by researchers at the University of California at Berkeley and the Lawrence Berkeley National Laboratory. UMFPACK, the second package, was developed by T. A. Davis of the University of Florida, who has ties with the U.C. Berkeley researchers as well as European researchers. Both packages are direct solvers that use LU factorization with forward and backward substitution. The University of Maine Ice Sheet Model uses the finite element method to solve partial differential equations that describe ice thickness, velocity, and temperature throughout glaciers as functions of position and time. The finite element method generates systems of linear equations having tens of thousands of variables and one hundred or so non-zero coefficients per equation. Matrices representing these systems of equations may be strictly banded or banded with right and lower borders. In order to efficiently interface the software packages with the ice sheet model, a modified compressed column data structure and supporting routines were designed and written. The data structure interfaces directly with both software packages and allows the ice sheet model to access matrix coefficients by row and column number in roughly 100 nanoseconds while storing only the non-zero entries of the matrix. No a priori knowledge of the matrix's sparsity pattern is required. Both software packages were tested with matrices produced by the model, and their performance characteristics were measured and compared with banded Gaussian elimination. When combined with high-performance basic linear algebra subprograms (BLAS), the packages are as much as 5 to 7 times faster than banded Gaussian elimination. The BLAS produced by K. Goto of the University of Texas was used. Memory usage by the packages varied from slightly more than banded Gaussian elimination with UMFPACK to as much as a 40% savings with SuperLU. In addition, the packages provide componentwise backward error measures and estimates of the matrix's condition number. SuperLU is available for parallel computers as well as single-processor computers; UMFPACK is only for single-processor computers. Both packages are also capable of efficiently solving the bordered matrix problem.
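The abstract does not spell out the modified compressed column structure, but its two stated requirements (store only nonzeros, and reach any coefficient by row and column number in roughly constant time, with no a priori sparsity pattern) can be sketched as follows. The class name, method names, and layout are illustrative assumptions, not the thesis's actual design.

```python
class SparseByColumn:
    """Illustrative column-oriented sparse store (names and layout are
    assumptions). Only nonzeros are stored, and a coefficient is
    reachable by (row, column) in O(1) average time via a per-column
    hash map, with no a priori knowledge of the sparsity pattern."""

    def __init__(self, n):
        self.n = n
        self.cols = [dict() for _ in range(n)]  # column j -> {row: value}

    def set(self, i, j, v):
        self.cols[j][i] = v                     # insert or overwrite a nonzero

    def get(self, i, j):
        return self.cols[j].get(i, 0.0)         # absent entries read as zero

    def to_csc(self):
        """Flatten into the (values, row_indices, col_pointers) arrays
        that compressed-column solvers such as SuperLU and UMFPACK consume."""
        values, rows, colptr = [], [], [0]
        for j in range(self.n):
            for i in sorted(self.cols[j]):
                rows.append(i)
                values.append(self.cols[j][i])
            colptr.append(len(values))
        return values, rows, colptr
```

A dict-per-column keeps assembly-time access cheap, while `to_csc` produces the standard compressed-column arrays at solve time.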
The numerical solution of sparse matrix equations by fast methods and associated computational techniques
Computation of eigenvectors of block tridiagonal matrices based on twisted factorizations
Computing the eigenvalues and eigenvectors of a band or block tridiagonal matrix is an important aspect of various applications in Scientific Computing. Most existing algorithms for computing eigenvectors of a band matrix rely on a prior tridiagonalization of the matrix. While the eigenvalues and eigenvectors of tridiagonal matrices can be computed very efficiently, the preceding tridiagonalization process can be relatively costly. Moreover, many eigensolvers require additional measures to ensure the orthogonality of the computed eigenvectors, which constitutes a significant computational expense.
In this thesis we explore a new method for computing eigenvectors of block tridiagonal matrices based on twisted factorizations. We describe the basic principles of an algorithm for computing block twisted factorizations of block tridiagonal matrices. We also show some interesting properties of these twisted factorizations and investigate the relation of the block, where the factorizations meet, to an eigenvector of the block tridiagonal matrix. This relation can be exploited to compute the eigenvector in a very efficient way.
Contrary to most conventional techniques, our algorithm for the determination of eigenvectors does not require a reduction of the matrix to tridiagonal form, and attempts to compute a good eigenvector approximation with only a single step of inverse iteration. This idea is based on finding a starting vector for inverse iteration which minimizes the residual of the resulting eigenpair. One of the main contributions of this thesis is the investigation and evaluation of different strategies for the selection of a suitable starting vector.
Furthermore, we present experimental data for the accuracy, orthogonality, and runtime behavior of an implementation of the new algorithm, and compare these results with existing methods. Our results show that the new algorithm returns eigenvectors with very low residuals, while being more efficient in terms of computational cost for large matrices and/or small bandwidths. Due to its structure and inherent parallelization potential, the new algorithm is also well suited to exploiting modern and future hardware, which is characterized by a high degree of concurrency.
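The scalar-tridiagonal analogue of this idea can be sketched in a few lines. Assuming a symmetric tridiagonal matrix with diagonal d and off-diagonal e, the forward and backward factorization pivots of T - lam*I combine into a quantity gamma_k at the position where the factorizations meet; the smallest |gamma_k| marks the starting index for a single inverse-iteration step. The function name and the scalar simplification are illustrative; the thesis treats the block tridiagonal case, where scalars become blocks.

```python
import numpy as np

def twisted_eigvec(d, e, lam):
    """Sketch (scalar case): given T = tridiag(e, d, e) and an eigenvalue
    approximation lam, form the forward (LDL^T) and backward (UDU^T)
    pivots of T - lam*I, pick the twist index with smallest |gamma_k|,
    and take one inverse iteration step from that index."""
    n = len(d)
    a = d - lam                        # shifted diagonal of T - lam*I
    dp = np.empty(n)                   # forward pivots D+_k
    dp[0] = a[0]
    for k in range(1, n):
        dp[k] = a[k] - e[k - 1] ** 2 / dp[k - 1]
    dm = np.empty(n)                   # backward pivots D-_k
    dm[n - 1] = a[n - 1]
    for k in range(n - 2, -1, -1):
        dm[k] = a[k] - e[k] ** 2 / dm[k + 1]
    gamma = dp + dm - a                # pivot where the factorizations meet
    k = int(np.argmin(np.abs(gamma)))  # best starting index
    # one inverse iteration step: solve (T - lam*I) z = e_k, then normalize
    shifted = np.diag(a) + np.diag(e, 1) + np.diag(e, -1)
    rhs = np.zeros(n)
    rhs[k] = 1.0
    z = np.linalg.solve(shifted, rhs)
    return z / np.linalg.norm(z)
```

When lam is accurate, the near-singular solve strongly amplifies the eigenvector direction, so one step typically yields a very small residual.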
A Parallel Direct Method for Finite Element Electromagnetic Computations Based on Domain Decomposition
High-performance parallel computing and direct (factorization-based) solution methods have been the two main trends in electromagnetic computations in recent years. When the time-harmonic (frequency-domain) Maxwell's equations are discretized directly with the Finite Element Method (FEM) or other Partial Differential Equation (PDE) methods, the resulting linear system of equations is sparse and indefinite, and thus harder to factorize efficiently, serially or in parallel, than the dense linear systems produced by alternative methods such as integral equation solutions. State-of-the-art sparse direct solvers such as MUMPS and PARDISO do not scale favorably, and have low parallel efficiency and a high memory footprint. This work introduces a new class of sparse direct solvers based on the domain decomposition method, termed the Direct Domain Decomposition Method (D3M), which is reliable, memory efficient, and offers very good parallel scalability for arbitrary 3D FEM problems.
Unlike recent trends in approximate/low-rank solvers, this method focuses on "numerically exact" solution methods, as they are more reliable for complex "real-life" models. The proposed method leverages physical insight at every stage of the development through a new symmetric domain decomposition method (DDM) with a single set of Lagrange multipliers. By applying a special regularization scheme at the interfaces, either artificial loss or gain is introduced into each domain to eliminate non-physical internal resonances. A block-wise recursive algorithm based on the Takahashi relationship is proposed for the efficient computation of the discrete Dirichlet-to-Neumann (DtN) map, reducing the volumetric problem over all domains to an auxiliary surface problem defined on the domain interfaces only. Numerical results show up to 50% run-time savings in the DtN map computation using the proposed block-wise recursive algorithm compared to alternative approaches. The auxiliary unknowns on the domain interfaces form a considerably (approximately an order of magnitude) smaller block-wise sparse matrix, which is efficiently factorized using a customized block LDL factorization with restricted pivoting to ensure stability.
The parallelization of the proposed D3M is realized using a Directed Acyclic Graph (DAG). Recent advances in parallel dense direct solvers have shifted toward implementations that rely on DAG scheduling to achieve highly efficient asynchronous parallel execution. However, adapting such schemes to sparse matrices is harder and often impractical. In D3M, the computation of each domain's discrete DtN map is "embarrassingly parallel", whereas the customized block LDLT factorization is suitable for block directed acyclic graph (B-DAG) task scheduling, similar to that used in dense-matrix parallel direct solvers. In this approach, computations are represented as a sequence of small tasks that operate on domains of the DDM or on dense matrix blocks of the reduced matrix. These tasks can be statically scheduled for parallel execution using their DAG dependencies and weights derived from estimates of computation and communication costs.
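Static DAG scheduling of the kind described here can be sketched with a simple list scheduler: tasks are released in topological order and each one is placed on the worker that can start it earliest, respecting dependency finish times. The task names and scalar cost model below are illustrative; D3M would weight tasks with estimates of per-domain factorization and communication costs.

```python
from collections import defaultdict

def static_dag_schedule(tasks, deps, cost, workers):
    """Minimal static list-scheduling sketch for a task DAG.
    tasks: list of task names; deps: list of (a, b) edges meaning
    b depends on a; cost: dict task -> duration; workers: int."""
    preds, succ = defaultdict(list), defaultdict(list)
    indeg = {t: 0 for t in tasks}
    for a, b in deps:
        succ[a].append(b)
        preds[b].append(a)
        indeg[b] += 1
    ready = [t for t in tasks if indeg[t] == 0]   # topological frontier
    worker_free = [0.0] * workers                 # earliest free time per worker
    finish, schedule = {}, []
    while ready:
        t = ready.pop(0)
        # a task may start once all its predecessors have finished
        data_ready = max((finish[p] for p in preds[t]), default=0.0)
        w = min(range(workers), key=lambda i: worker_free[i])
        start = max(worker_free[w], data_ready)
        finish[t] = start + cost[t]
        worker_free[w] = finish[t]
        schedule.append((t, w, start))
        for v in succ[t]:                         # release successors
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return schedule, max(finish.values())
```

Real B-DAG runtimes refine this with priorities and communication-aware placement, but the dependency-respecting placement logic is the same.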
Comparisons with state-of-the-art exact direct solvers on electrically large problems suggest up to 20% better parallel efficiency, 30% to 3x lower memory usage, and slightly faster runtimes, while maintaining the same accuracy.
Solution of partial differential equations on vector and parallel computers
The present status of numerical methods for partial differential equations on vector and parallel computers is reviewed. The relevant aspects of these computers are discussed, and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are covered for elliptic equations, as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods, as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.
Explicit alternating direction methods for problems in fluid dynamics
Recently an iterative method was formulated employing a new splitting strategy for the solution of tridiagonal systems of difference equations. The method was successful in solving the systems of equations arising from one-dimensional initial boundary value problems, and a theoretical analysis proving the convergence of the method for systems whose constituent matrices are positive definite was presented by Evans and Sahimi [22]. The method became known as the Alternating Group Explicit (AGE) method and is referred to as AGE-1D. The explicit nature of the method means that its implementation on parallel machines can be very promising.

The method was also extended to solve systems arising from two- and three-dimensional initial boundary value problems, but the AGE-2D and AGE-3D algorithms proved to be too demanding in computational cost, which largely negates the advantages of their parallel nature.

In this thesis, further theoretical analyses and experimental studies are pursued to establish the convergence and suitability of the AGE-1D method for a wider class of systems arising from univariate and multivariate differential equations with symmetric and non-symmetric difference operators. The possibility of a Chebyshev acceleration of the AGE-1D algorithm is also considered.

For two- and three-dimensional problems it is proposed to couple the use of the AGE-1D algorithm with an ADI scheme or an ADI iterative method in what is called the Explicit Alternating Direction (EAD) method. It is then shown through experimental results that the EAD method retains the parallel features of the AGE method and, moreover, leads to savings of up to 83% in the computational cost of solving some of the model problems.

The thesis also includes applications of the AGE-1D algorithm and the EAD method to some problems of fluid dynamics, such as the linearized Shallow Water equations and the Navier-Stokes equations for the flow in an idealized one-dimensional Planetary Boundary Layer.

The thesis terminates with conclusions and suggestions for further work, together with a comprehensive bibliography and an appendix containing some selected programs.
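The AGE idea can be illustrated on the model tridiagonal system A = tridiag(-1, 2, -1): split A = G1 + G2 into two block-diagonal matrices of 2x2 groups on alternating index pairs, then alternate two half-sweeps. The splitting below and the choice r = 1 are a minimal sketch, not the thesis's tuned parameters; dense solves stand in for the explicit 2x2 block inversions used in practice.

```python
import numpy as np

def age_splitting(n):
    """Build the AGE splitting A = G1 + G2 for the model matrix
    A = tridiag(-1, 2, -1) of even order n. G1 and G2 are
    block-diagonal with 2x2 groups on alternating index pairs."""
    A = np.diag(2.0 * np.ones(n)) + np.diag(-np.ones(n - 1), 1) \
        + np.diag(-np.ones(n - 1), -1)
    block = np.array([[1.0, -1.0], [-1.0, 1.0]])
    G1, G2 = np.zeros((n, n)), np.zeros((n, n))
    for k in range(0, n, 2):           # G1 couples pairs (0,1), (2,3), ...
        G1[k:k + 2, k:k + 2] = block
    G2[0, 0] = G2[n - 1, n - 1] = 1.0  # leftover scalars at the boundaries
    for k in range(1, n - 1, 2):       # G2 couples pairs (1,2), (3,4), ...
        G2[k:k + 2, k:k + 2] = block
    return A, G1, G2

def age_solve(A, G1, G2, b, r=1.0, tol=1e-8, max_iter=5000):
    """AGE-1D iteration: two alternating half-sweeps,
        (rI + G1) u_half = (rI - G2) u      + b
        (rI + G2) u_new  = (rI - G1) u_half + b.
    The method is 'explicit' because each half-sweep only requires
    inverting independent 2x2 blocks, which is trivially parallel;
    dense solves are used here purely for brevity."""
    n = len(b)
    I = np.eye(n)
    u = np.zeros(n)
    for _ in range(max_iter):
        u_half = np.linalg.solve(r * I + G1, (r * I - G2) @ u + b)
        u = np.linalg.solve(r * I + G2, (r * I - G1) @ u_half + b)
        if np.linalg.norm(A @ u - b) < tol:
            break
    return u
```

The fixed point of the two half-sweeps satisfies (G1 + G2)u = Au = b, so on convergence the iterate solves the original system.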
Application of HPC in eddy current electromagnetic problem solution
As engineering problems are becoming more and more advanced, the size of an average model solved by partial differential equations is rapidly growing and, in order to keep simulation times within reasonable bounds, both faster computers and more efficient software implementations are needed.
In the first part of this thesis, the full potential of simulation software is exploited through high-performance parallel computing techniques. In particular, the simulation of induction heating processes is accomplished within reasonable solution times by implementing different parallel direct solvers for large sparse linear systems in the solution process of a commercial software package. The performance of such a library on shared-memory systems has been remarkably improved by implementing a multithreaded version of the MUMPS (MUltifrontal Massively Parallel Solver) library, which has been tested on benchmark matrices arising from typical induction heating process simulations.

A new multithreading approach and a low-rank approximation technique have been implemented and developed by the MUMPS team in Lyon and Toulouse. In the context of a collaboration between the MUMPS team and DII-University of Padova, a preliminary version of these functionalities was tested on induction heating benchmark problems, and a substantial reduction of the computational cost and memory requirements was achieved.

In the second part of this thesis, some examples of design methodology by virtual prototyping are described. Complex multiphysics simulations involving electromagnetic, circuit, thermal, and mechanical problems have been performed by exploiting parallel solvers, as developed in the first part of this thesis. Finally, multiobjective stochastic optimization algorithms have been applied to multiphysics 3D model simulations in search of a set of improved induction heating device configurations.
A New Method for Efficient Parallel Solution of Large Linear Systems on a SIMD Processor.
This dissertation proposes a new technique for the efficient parallel solution of very large linear systems of equations on a SIMD processor. The model problem used to investigate both the efficiency and applicability of the technique had a regular structure with a given semi-bandwidth and resulted from the approximation of a second-order, two-dimensional elliptic equation on a regular domain under Dirichlet and periodic boundary conditions. With only slight modifications, chiefly to account properly for the mathematical effects of varying bandwidths, the technique can be extended to encompass the solution of any regular banded system. The computational platform used was the MasPar MP-X (model 1208B), a massively parallel processor hostnamed hurricane and housed in the Concurrent Computing Laboratory of the Physics/Astronomy department, Louisiana State University. The maximum bandwidth which caused the problem's size to fit the nyproc × nxproc machine array exactly was determined. This size, as well as smaller ones, was used in four experiments to evaluate the efficiency of the new technique. Four benchmark algorithms, two direct (Gaussian elimination (GE) and orthogonal factorization) and two iterative (symmetric over-relaxation (SOR) with ω = 2, and the conjugate gradient method (CG)), were used to test the efficiency of the new approach based upon three evaluation metrics: deviations of the computed results from the exact solution, measured as average absolute errors; the CPU times; and the megaflop rates of execution. All the benchmarks except GE were implemented in parallel. In all evaluation categories the new approach outperformed the benchmarks, and very much so when N ≫ p, p being the number of processors and N the problem size. At the maximum system size, the new method was about 2.19 times more accurate and about 1.7 times faster than the benchmarks.

But when the system size was much smaller than the machine's size, the new approach's performance deteriorated precipitously; in fact, in this circumstance its performance was worse than that of GE, the serial code. Hence, this technique is recommended for the solution of linear systems with regular structures on array processors when the problem's size is large in relation to the processor's size.
The NASTRAN theoretical manual
Designed to accommodate additions and modifications, this commentary on NASTRAN describes the problem-solving capabilities of the program in narrative fashion and presents developments of the analytical and numerical procedures that underlie the program. Seventeen major sections and numerous subsections cover: the organizational aspects of the program, utility matrix routines, static structural analysis, heat transfer, dynamic structural analysis, computer graphics, special structural modeling techniques, error analysis, interaction between structures and fluids, and aeroelastic analysis.