129 research outputs found

    Domain Decomposition Based High Performance Parallel Computing

    The study deals with the parallelization of finite element based Navier-Stokes codes using domain decomposition and state-of-the-art sparse direct solvers. There has been significant improvement in the performance of sparse direct solvers; however, parallel sparse direct solvers are not found to exhibit good scalability. Hence, the parallelization of the sparse direct solver is done using domain decomposition techniques. A highly efficient sparse direct solver, PARDISO, is used in this study. The scalability of both the Newton and modified Newton algorithms is tested.
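    The contrast between the Newton and modified Newton algorithms mentioned above comes down to how often the sparse Jacobian is refactorized. The minimal sketch below illustrates that difference; it assumes generic residual and jacobian callables and uses SciPy's splu as a stand-in for PARDISO, so it is not the study's actual code.

```python
# Sketch: Newton vs. modified Newton with a sparse direct solver.
# scipy.sparse.linalg.splu stands in for PARDISO; `residual` and `jacobian`
# are placeholder callables, not the Navier-Stokes code itself.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def newton(residual, jacobian, u0, tol=1e-8, max_iter=20, modified=False):
    u = u0.copy()
    lu = None
    for _ in range(max_iter):
        r = residual(u)
        if np.linalg.norm(r) < tol:
            break
        # Modified Newton factorizes the Jacobian once and reuses it;
        # full Newton pays for a fresh factorization every iteration.
        if lu is None or not modified:
            lu = spla.splu(sp.csc_matrix(jacobian(u)))
        u -= lu.solve(r)
    return u
```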

    A domain decomposing parallel sparse linear system solver

    The solution of large sparse linear systems is often the most time-consuming part of many science and engineering applications. Computational fluid dynamics, circuit simulation, power network analysis, and materials science are just a few of the application areas in which large sparse linear systems need to be solved effectively. In this paper we introduce a new parallel hybrid sparse linear system solver for distributed-memory architectures that contains both direct and iterative components. We show that by using our solver one can alleviate the drawbacks of direct and iterative solvers, achieving better scalability than direct solvers and more robustness than classical preconditioned iterative solvers. Comparisons to well-known direct and iterative solvers on a parallel architecture are provided. (Comment: to appear in Journal of Computational and Applied Mathematics.)
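    A common way to combine direct and iterative components in a domain-decomposing solver is to eliminate subdomain interiors with a direct factorization and solve the resulting interface (Schur complement) system iteratively. The sketch below illustrates that structure for a single interior block; the block names and the use of SciPy's splu and GMRES are assumptions made for illustration, not the paper's actual solver.

```python
# Sketch: hybrid direct/iterative solve via an interface Schur complement.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def hybrid_solve(A_II, A_IB, A_BI, A_BB, b_I, b_B):
    """Direct solve on the interior block, iterative solve on the interface."""
    lu = spla.splu(sp.csc_matrix(A_II))          # direct factorization of interiors

    def schur_matvec(v):
        # Apply S = A_BB - A_BI A_II^{-1} A_IB without ever forming S.
        return A_BB @ v - A_BI @ lu.solve(A_IB @ v)

    n_B = A_BB.shape[0]
    S = spla.LinearOperator((n_B, n_B), matvec=schur_matvec)
    rhs_B = b_B - A_BI @ lu.solve(b_I)
    x_B, info = spla.gmres(S, rhs_B)             # iterative interface solve
    x_I = lu.solve(b_I - A_IB @ x_B)             # recover interior unknowns
    return x_I, x_B
```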

    An efficient multi-core implementation of a novel HSS-structured multifrontal solver using randomized sampling

    We present a sparse linear system solver that is based on a multifrontal variant of Gaussian elimination and exploits low-rank approximation of the resulting dense frontal matrices. We use hierarchically semiseparable (HSS) matrices, which have low-rank off-diagonal blocks, to approximate the frontal matrices. For HSS matrix construction, a randomized sampling algorithm is used together with interpolative decompositions. The combination of the randomized compression with a fast ULV HSS factorization leads to a solver with lower computational complexity than the standard multifrontal method for many applications, resulting in speedups of up to sevenfold for problems in our test suite. The implementation targets many-core systems by using task parallelism with dynamic runtime scheduling. Numerical experiments show performance improvements over state-of-the-art sparse direct solvers. The implementation achieves high performance and good scalability on a range of modern shared-memory parallel systems, including the Intel Xeon Phi (MIC). The code is part of a software package called STRUMPACK -- STRUctured Matrices PACKage, which also has a distributed-memory component for dense rank-structured matrices.
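    The key to HSS construction by randomized sampling is that a numerically low-rank off-diagonal block can be compressed from a handful of matrix-vector products with random vectors. The sketch below shows a plain randomized range finder, which captures that idea but not the interpolative decompositions or the ULV factorization used by the actual solver; sizes and ranks are illustrative.

```python
# Sketch: randomized low-rank compression of an off-diagonal block.
import numpy as np

def randomized_compress(B, rank, oversample=10, rng=None):
    """Return Q, C with B ~= Q @ C, built only from products B @ Omega."""
    rng = np.random.default_rng(rng)
    Omega = rng.standard_normal((B.shape[1], rank + oversample))  # random test matrix
    Q, _ = np.linalg.qr(B @ Omega)       # orthonormal basis for the sampled range
    return Q, Q.T @ B                    # project B onto that basis

# Illustration on an exactly rank-12 "off-diagonal block"
rng = np.random.default_rng(0)
B = rng.standard_normal((600, 12)) @ rng.standard_normal((12, 600))
Q, C = randomized_compress(B, rank=12)
print(np.linalg.norm(B - Q @ C) / np.linalg.norm(B))   # ~ machine precision
```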

    High-performance direct solution of finite element problems on multi-core processors

    A direct solution procedure is proposed and developed which exploits the parallelism that exists in current symmetric multiprocessing (SMP) multi-core processors. Several algorithms are proposed and developed to improve the performance of the direct solution of FE problems. A high-performance sparse direct solver is developed which allows experimentation with the newly developed and existing algorithms. The performance of the algorithms is investigated using a large set of FE problems. Furthermore, operation count estimations are developed to further assess the various algorithms. An out-of-core version of the solver is developed to reduce the memory requirements of the solution. I/O is performed asynchronously, without blocking the thread that makes the I/O request. Asynchronous I/O allows factorization and triangular-solution computations to be overlapped with I/O. The performance of the developed solver is demonstrated on a large number of test problems. A problem with nearly 10 million degrees of freedom is solved on a low-cost desktop computer using the out-of-core version of the direct solver. Furthermore, the developed solver usually outperforms a commonly used shared-memory solver. Ph.D. Committee Chair: Will, Kenneth; Committee Member: Emkin, Leroy; Committee Member: Kurc, Ozgur; Committee Member: Vuduc, Richard; Committee Member: White, Donald
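    The overlap between factorization and I/O described above can be pictured as a background thread writing each factored panel to disk while the next panel is being factorized. The sketch below only illustrates that pattern, assuming dense Cholesky factorizations of independent panels rather than the sparse solver developed in the thesis.

```python
# Sketch: overlapping factorization with asynchronous out-of-core writes.
import os, tempfile
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def factor_out_of_core(panels, out_dir):
    pending = []
    with ThreadPoolExecutor(max_workers=1) as io_pool:
        for i, A in enumerate(panels):
            L = np.linalg.cholesky(A)                         # "factorize" this panel
            path = os.path.join(out_dir, f"panel_{i}.npy")
            pending.append(io_pool.submit(np.save, path, L))  # write in the background
        for f in pending:
            f.result()                                        # wait for all writes to finish

panels = [np.eye(300) + 0.01 * np.ones((300, 300)) for _ in range(4)]  # SPD test panels
factor_out_of_core(panels, tempfile.mkdtemp())
```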

    Matching-based preprocessing algorithms to the solution of saddle-point problems in large-scale nonconvex interior-point optimization

    Interior-point methods are among the most efficient approaches for solving large-scale nonlinear programming problems. At the core of these methods, highly ill-conditioned symmetric saddle-point problems have to be solved. We present combinatorial methods to preprocess these matrices in order to establish more favorable numerical properties for the subsequent factorization. Our approach is based on symmetric weighted matchings and is used in a sparse direct LDL^T factorization method where the pivoting is restricted to static supernode data structures. In addition, we dynamically expand the supernode data structure in cases where additional fill-in helps to select better numerical pivot elements. This technique can be seen as an alternative to the more traditional threshold pivoting techniques. We demonstrate the competitiveness of this approach within an interior-point method on a large set of test problems from the CUTE and COPS sets, as well as on large optimal control problems based on partial differential equations. The largest nonlinear optimization problem solved has more than 12 million variables and 6 million constraints.
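    The matching-based preprocessing aims to permute large entries onto the diagonal so that the restricted (static) pivoting in the LDL^T factorization finds acceptable pivots. A dense sketch of that idea is given below, posed as an assignment problem on -log|a_ij|; the actual codes use sparse weighted-matching algorithms and then symmetrize the resulting permutation, which this illustration omits.

```python
# Sketch: permute large entries onto the diagonal via a weighted matching.
import numpy as np
from scipy.optimize import linear_sum_assignment

def diagonal_boosting_permutation(A):
    """Column permutation that maximizes the product of |diagonal| entries."""
    cost = np.full(A.shape, 1e30)          # effectively forbid zero entries
    nz = A != 0
    cost[nz] = -np.log(np.abs(A[nz]))      # minimizing sum(-log|a_ij|) maximizes prod|a_ij|
    _, cols = linear_sum_assignment(cost)
    return cols

A = np.array([[0.0, 3.0, 0.1],
              [2.0, 0.0, 0.2],
              [0.5, 0.1, 4.0]])
perm = diagonal_boosting_permutation(A)
print(np.diag(A[:, perm]))                 # [3. 2. 4.] -- large entries now on the diagonal
```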

    Scalable hierarchical parallel algorithm for the solution of super large-scale sparse linear equations

    A parallel linear equation solver that can effectively use 1000+ processors has become the bottleneck of large-scale implicit engineering simulations. In this paper, we present a new hierarchical parallel master-slave iterative algorithm for the solution of very large sparse linear systems on distributed-memory computer clusters. By alternating global equilibrium computation with local relaxation, the proposed algorithm reaches the specified accuracy requirement within a few iterative steps. Moreover, each set (slave processor) communicates mainly with its nearest-neighbor sets, and the amount of data transferred between the sets and the master is always far smaller than the set-to-set neighbor communication. The corresponding algorithm for implicit finite element analysis has been implemented with the MPI library, and a very large two-dimensional square triangle-lattice truss structure under random static loads is simulated with over one billion degrees of freedom and up to 2001 processors on the "Exploration 100" cluster at Tsinghua University. The numerical experiments demonstrate that the algorithm has excellent parallel efficiency and high scalability, and it may find broad application in other implicit simulations. (Comment: 23 pages, 9 figures, 1 table.)
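    The alternation between local relaxation and global equilibrium can be pictured as a two-level iteration: each subdomain relaxes its own block of unknowns, and a small coarse problem restores global balance. The serial sketch below, assuming an aggregation-based coarse space and SciPy's sparse solvers, only illustrates that structure; it is not the paper's MPI master-slave implementation.

```python
# Sketch: alternate block-local relaxation with a global coarse correction.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def two_level_solve(A, b, parts, tol=1e-8, max_iter=200):
    n = A.shape[0]
    # Piecewise-constant coarse space: one coarse unknown per subdomain (the "master" problem).
    P = sp.csr_matrix((np.ones(n), (np.arange(n), parts)))
    A_c = (P.T @ A @ P).tocsc()
    blocks = [np.flatnonzero(parts == p) for p in range(parts.max() + 1)]
    block_lu = [spla.splu(A[idx, :][:, idx].tocsc()) for idx in blocks]

    x = np.zeros(n)
    for _ in range(max_iter):
        # Local relaxation: each "slave" solves its own subdomain block.
        r = b - A @ x
        for idx, lu in zip(blocks, block_lu):
            x[idx] += lu.solve(r[idx])
        # Global equilibrium: coarse correction computed on the small problem.
        r = b - A @ x
        x += P @ spla.spsolve(A_c, P.T @ r)
        if np.linalg.norm(b - A @ x) <= tol * np.linalg.norm(b):
            break
    return x

# Well-conditioned 1-D test problem split into 8 subdomains
n = 400
A = sp.diags([-1.0, 2.5, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
parts = np.repeat(np.arange(8), n // 8)
b = np.ones(n)
x = two_level_solve(A, b, parts)
print(np.linalg.norm(b - A @ x))   # small residual after a modest number of sweeps
```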