
    An Efficient Block Circulant Preconditioner For Simulating Fracture Using Large Fuse Networks

    {\it Critical slowing down} associated with the iterative solvers close to the critical point often hinders large-scale numerical simulation of fracture using discrete lattice networks. This paper presents a block circulant preconditioner for iterative solvers for the simulation of progressive fracture in disordered, quasi-brittle materials using large discrete lattice networks. The average computational cost of the present algorithm per iteration is $O(rs \log s) + delops$, where the stiffness matrix ${\bf A}$ is partitioned into $r$-by-$r$ blocks such that each block is an $s$-by-$s$ matrix, and $delops$ represents the operational count associated with solving a block-diagonal matrix with $r$-by-$r$ dense matrix blocks. This algorithm using the block circulant preconditioner is faster than the Fourier accelerated preconditioned conjugate gradient (PCG) algorithm, and alleviates the {\it critical slowing down} that is especially severe close to the critical point. Numerical results using random resistor networks substantiate the efficiency of the present algorithm. Comment: 16 pages including 2 figures
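
The $O(s \log s)$ factor in the cost above comes from the fact that a circulant matrix is diagonalized by the discrete Fourier transform. The following is a minimal sketch of that single building block, not the paper's block circulant preconditioner: it solves one $s$-by-$s$ circulant system with FFTs. The function names `circulant_solve` and `circulant_matrix` and the sample data are assumptions made for illustration only.

```python
import numpy as np


def circulant_solve(c, b):
    """Solve C x = b, where C is the circulant matrix with first column c.

    The eigenvalues of C are the DFT of c, so the solve costs O(s log s).
    Assumes C is nonsingular (no zero eigenvalue).
    """
    eigenvalues = np.fft.fft(c)
    x_hat = np.fft.fft(b) / eigenvalues
    # Real input data gives a real solution; drop round-off imaginary parts.
    return np.real(np.fft.ifft(x_hat))


def circulant_matrix(c):
    """Build the dense circulant matrix explicitly (only used for checking)."""
    s = len(c)
    return np.array([[c[(i - j) % s] for j in range(s)] for i in range(s)])


# Hypothetical 4-by-4 example: a shifted periodic 1D Laplacian stencil.
c = np.array([4.0, -1.0, 0.0, -1.0])
b = np.array([1.0, 2.0, 3.0, 4.0])

x = circulant_solve(c, b)
residual = np.linalg.norm(circulant_matrix(c) @ x - b)
print(residual)
```

In a preconditioned conjugate gradient loop, a routine of this kind would be applied to the residual at every iteration. The block version described in the abstract additionally requires the block-diagonal solves counted by $delops$.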

    Parallel accelerated cyclic reduction preconditioner for three-dimensional elliptic PDEs with variable coefficients

    We present a robust and scalable preconditioner for the solution of large-scale linear systems that arise from the discretization of elliptic PDEs amenable to rank compression. The preconditioner is based on hierarchical low-rank approximations and the cyclic reduction method. The setup and application phases of the preconditioner achieve log-linear complexity in memory footprint and number of operations, and numerical experiments exhibit good weak and strong scalability at large processor counts in a distributed memory environment. Numerical experiments with linear systems that feature symmetry and nonsymmetry, definiteness and indefiniteness, and constant and variable coefficients demonstrate the preconditioner's applicability and robustness. Furthermore, the number of iterations can be controlled via the accuracy threshold of the hierarchical matrix approximations and their arithmetic operations, and via the tuning of the admissibility condition parameter. Together, these parameters allow the memory requirements and performance of the preconditioner to be optimized. Comment: 24 pages, Elsevier Journal of Computational and Applied Mathematics, Dec 201

    High performance interior point methods for three-dimensional finite element limit analysis

    The ability to obtain rigorous upper and lower bounds on collapse loads of various structures makes finite element limit analysis an attractive design tool. The increasingly high cost of computing those bounds, however, has limited its application to problems in three dimensions. This work reports on a high-performance homogeneous self-dual primal-dual interior point method developed for three-dimensional finite element limit analysis. This implementation achieves convergence times over 4.5× faster than the leading commercial solver across a set of three-dimensional finite element limit analysis test problems, making investigation of three-dimensional limit loads viable. A comparison between a range of iterative linear solvers and direct methods used to determine the search direction is also provided, demonstrating the superiority of direct methods for this application. The components of the interior point solver considered include the elimination of free variables and options for handling those that remain, a comparison of multifrontal and supernodal Cholesky factorization for computing the search direction, differences between approximate minimum degree [1] and nested dissection [13] orderings, the handling of dense columns and fixed variables, and acceleration of the linear system solver through parallelization. Each of these areas resulted in an improvement on at least one of the problems in the test set, with many achieving gains across the whole set. The serial implementation achieved runtime performance 1.7× faster than the commercial solver Mosek [5]. Compared with the parallel version of Mosek, the use of parallel BLAS routines in the supernodal solver saw a 1.9× speedup, and with a modified version of the GPU-enabled CHOLMOD [11] and a single NVIDIA Tesla K20c this speedup increased to 4.65×.