An Arnoldi-frontal approach for the stability analysis of flows in a collapsible channel
In this paper, we present a new approach based on a combination of the Arnoldi and frontal methods for solving large sparse asymmetric and generalized complex eigenvalue problems. The new eigensolver seeks the most unstable eigensolution in the Krylov subspace and makes use of the efficiency of the frontal solver developed for finite element methods. The approach is used for a stability analysis of flows in a collapsible channel and is found to significantly improve the computational efficiency compared to the traditionally used QZ solver or a standard Arnoldi method. With the new approach, we are able to validate previous results that were either obtained on a much coarser mesh or estimated from unsteady simulations. New neutral stability solutions of the system have been obtained which are beyond the limits of previously used methods.
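As a rough illustration of the Krylov-subspace step mentioned in this abstract, the following is a minimal sketch of the textbook Arnoldi iteration for a standard eigenproblem. It is not the authors' Arnoldi-frontal solver: the function name `arnoldi`, the dense NumPy matrix, and the reading of "most unstable" as "largest real part" are assumptions made for the example, and the generalized problem and frontal solver described in the paper are not reproduced.

```python
import numpy as np

def arnoldi(a, v0, m):
    """Textbook Arnoldi iteration (illustrative sketch, not the paper's solver).

    Builds an orthonormal basis Q of the Krylov subspace
    span{v0, A v0, ..., A^(m-1) v0} and the projected Hessenberg matrix H.
    """
    n = len(v0)
    q = np.zeros((n, m + 1), dtype=complex)
    h = np.zeros((m + 1, m), dtype=complex)
    q[:, 0] = v0 / np.linalg.norm(v0)
    for k in range(m):
        w = a @ q[:, k]
        # Modified Gram-Schmidt against the basis built so far.
        for j in range(k + 1):
            h[j, k] = np.vdot(q[:, j], w)
            w = w - h[j, k] * q[:, j]
        h[k + 1, k] = np.linalg.norm(w)
        if h[k + 1, k] < 1e-12:
            # Breakdown: the Krylov subspace is invariant, so stop early.
            return q[:, :k + 1], h[:k + 1, :k + 1]
        q[:, k + 1] = w / h[k + 1, k]
    return q[:, :m], h[:m, :m]

def most_unstable_ritz_value(a, v0, m):
    """Ritz value with the largest real part (assumed stability convention)."""
    _, h = arnoldi(a, v0, m)
    ritz = np.linalg.eigvals(h)
    return ritz[np.argmax(ritz.real)]
```

The eigenvalues of the small matrix H (the Ritz values) approximate eigenvalues of A, and when `m` equals the matrix dimension they coincide with them up to rounding.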
h-multigrid agglomeration based solution strategies for discontinuous Galerkin discretizations of incompressible flow problems
In this work we exploit agglomeration-based h-multigrid preconditioners to
speed up the iterative solution of discontinuous Galerkin discretizations of
the Stokes and Navier-Stokes equations. As a distinctive feature, h-coarsened
mesh sequences are generated by recursive agglomeration of a fine grid,
admitting arbitrarily unstructured grids of complex domains, and agglomeration
based discontinuous Galerkin discretizations are employed to deal with
agglomerated elements of coarse levels. Both the expense of building coarse
grid operators and the performance of the resulting multigrid iteration are
investigated. For the sake of efficiency coarse grid operators are inherited
through element-by-element projections, avoiding the cost of numerical
integration over agglomerated elements. Specific care is devoted to the
projection of viscous terms discretized by means of the BR2 dG method. We
demonstrate that enforcing the correct amount of stabilization on coarse grid
levels is mandatory for achieving uniform convergence with respect to the
number of levels. The numerical solution of steady and unsteady, linear and
non-linear problems is considered tackling challenging 2D test cases and 3D
real life computations on parallel architectures. Significant execution time
gains are documented. Comment: 78 pages, 7 figures
Parallel Factorizations in Numerical Analysis
In this paper we review the parallel solution of sparse linear systems,
usually arising from the discretization of ODE-IVPs or ODE-BVPs. The approach is
based on the concept of parallel factorization of a (block) tridiagonal matrix.
This makes it possible to obtain efficient parallel extensions of many known matrix
factorizations, and to derive, as a by-product, a unifying approach to the
parallel solution of ODEs. Comment: 15 pages, 5 figures
An efficient multi-core implementation of a novel HSS-structured multifrontal solver using randomized sampling
We present a sparse linear system solver that is based on a multifrontal
variant of Gaussian elimination, and exploits low-rank approximation of the
resulting dense frontal matrices. We use hierarchically semiseparable (HSS)
matrices, which have low-rank off-diagonal blocks, to approximate the frontal
matrices. For HSS matrix construction, a randomized sampling algorithm is used
together with interpolative decompositions. The combination of the randomized
compression with a fast ULV HSS factorization leads to a solver with lower
computational complexity than the standard multifrontal method for many
applications, resulting in speedups of up to 7-fold for problems in our test
suite. The implementation targets many-core systems by using task parallelism
with dynamic runtime scheduling. Numerical experiments show performance
improvements over state-of-the-art sparse direct solvers. The implementation
achieves high performance and good scalability on a range of modern shared
memory parallel systems, including the Intel Xeon Phi (MIC). The code is part
of a software package called STRUMPACK -- STRUctured Matrices PACKage, which
also has a distributed memory component for dense rank-structured matrices.
A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures
As multicore systems continue to gain ground in the High Performance
Computing world, linear algebra algorithms have to be reformulated or new
algorithms have to be developed in order to take advantage of the architectural
features on these new processors. Fine grain parallelism becomes a major
requirement and introduces the necessity of loose synchronization in the
parallel execution of an operation. This paper presents an algorithm for the
Cholesky, LU and QR factorizations where the operations can be represented as a
sequence of small tasks that operate on square blocks of data. These tasks can
be dynamically scheduled for execution based on the dependencies among them and
on the availability of computational resources. This may result in an
out-of-order execution of the tasks, which will completely hide the presence of
intrinsically sequential tasks in the factorization. Performance comparisons
are presented with the LAPACK algorithms where parallelism can only be
exploited at the level of the BLAS operations and vendor implementations.
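To make the idea of "small tasks that operate on square blocks of data" concrete, here is a minimal sketch of a tiled Cholesky factorization. It runs the tile tasks in a fixed sequential order. The dynamic, dependency-driven scheduling described in the abstract is not reproduced. The function name `tiled_cholesky`, the tile size parameter `nb`, and the kernel labels in the comments (the usual LAPACK/BLAS names POTRF, TRSM, SYRK, GEMM) are assumptions for illustration and are not taken from the abstract.

```python
import numpy as np

def tiled_cholesky(a, nb):
    """Right-looking tiled Cholesky sketch: returns lower-triangular L with A = L L^T.

    Each statement inside the loops is one tile task; a runtime scheduler
    could run independent tasks out of order, but here they run sequentially.
    """
    n = a.shape[0]
    if n % nb != 0:
        raise ValueError("matrix size must be a multiple of the tile size")
    t = n // nb
    w = np.array(a, dtype=float)  # working copy, overwritten tile by tile

    def tile(i, j):
        # View (not a copy) of tile (i, j) inside the working array.
        return w[i * nb:(i + 1) * nb, j * nb:(j + 1) * nb]

    for k in range(t):
        # POTRF-like task: factor the diagonal tile.
        tile(k, k)[:] = np.linalg.cholesky(tile(k, k))
        lkk = tile(k, k)
        for i in range(k + 1, t):
            # TRSM-like task: A_ik <- A_ik * L_kk^{-T}
            # (a general solve stands in for a triangular solve).
            tile(i, k)[:] = np.linalg.solve(lkk, tile(i, k).T).T
        for i in range(k + 1, t):
            for j in range(k + 1, i + 1):
                # SYRK/GEMM-like task: A_ij <- A_ij - L_ik * L_jk^T.
                tile(i, j)[:] -= tile(i, k) @ tile(j, k).T
    return np.tril(w)
```

Each task reads and writes only whole tiles, so the dependencies between tasks can be expressed at tile granularity, which is what allows a scheduler to reorder them.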
A weakly stable algorithm for general Toeplitz systems
We show that a fast algorithm for the QR factorization of a Toeplitz or
Hankel matrix A is weakly stable in the sense that R^T.R is close to A^T.A.
Thus, when the algorithm is used to solve the semi-normal equations R^T.Rx =
A^Tb, we obtain a weakly stable method for the solution of a nonsingular
Toeplitz or Hankel linear system Ax = b. The algorithm also applies to the
solution of the full-rank Toeplitz or Hankel least squares problem. Comment: 17 pages. An old Technical Report with postscript added. For further
details, see http://wwwmaths.anu.edu.au/~brent/pub/pub143.htm
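The semi-normal equations R^T.Rx = A^Tb mentioned in this abstract can be sketched as two solves with the triangular factor R. In the sketch below, R comes from NumPy's dense QR factorization, which stands in for the fast Toeplitz QR algorithm analysed in the report. The helper names `toeplitz_matrix` and `solve_seminormal` are hypothetical and chosen only for this example.

```python
import numpy as np

def toeplitz_matrix(col, row):
    """Dense Toeplitz matrix from its first column and first row (col[0] == row[0])."""
    n = len(col)
    return np.array(
        [[col[i - j] if i >= j else row[j - i] for j in range(n)] for i in range(n)],
        dtype=float,
    )

def solve_seminormal(a, b):
    """Solve A x = b via the semi-normal equations R^T R x = A^T b.

    R is taken from a dense QR factorization here (an assumption for the
    sketch); only R is used, never Q.
    """
    r = np.linalg.qr(a, mode="r")
    rhs = a.T @ b
    # General solves stand in for forward and back substitution.
    y = np.linalg.solve(r.T, rhs)  # R^T y = A^T b
    return np.linalg.solve(r, y)   # R x = y
```

For example, with `a = toeplitz_matrix([4, 1, 0.5, 0.2], [4, 2, 0.3, 0.1])` and `b = a @ x_true`, `solve_seminormal(a, b)` should recover `x_true` up to rounding error for this well-conditioned matrix.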