1,321 research outputs found

    Parallel accelerated cyclic reduction preconditioner for three-dimensional elliptic PDEs with variable coefficients

    We present a robust and scalable preconditioner for the solution of large-scale linear systems that arise from the discretization of elliptic PDEs amenable to rank compression. The preconditioner is based on hierarchical low-rank approximations and the cyclic reduction method. The setup and application phases of the preconditioner achieve log-linear complexity in memory footprint and number of operations, and numerical experiments exhibit good weak and strong scalability at large processor counts in a distributed memory environment. Numerical experiments with linear systems that feature symmetry and nonsymmetry, definiteness and indefiniteness, and constant and variable coefficients demonstrate the preconditioner's applicability and robustness. Furthermore, the number of iterations can be controlled via the accuracy threshold of the hierarchical matrix approximations and their arithmetic operations, and via the tuning of the admissibility condition parameter. Together, these parameters allow for optimization of the memory requirements and performance of the preconditioner. Comment: 24 pages, Elsevier Journal of Computational and Applied Mathematics, Dec 201
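    The cyclic reduction idea named in this abstract can be illustrated on the simplest case, a scalar tridiagonal system. The sketch below is not the paper's preconditioner: the paper works with hierarchical low-rank blocks on 3D problems, while this example uses plain scalars, and the function name `cyclic_reduction` and its argument layout are assumptions made for illustration. It eliminates the even-indexed unknowns, recurses on the remaining half-size tridiagonal system, and then back-substitutes.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by recursive cyclic reduction.

    Row i reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i],
    with a[0] == 0 and c[-1] == 0. No pivoting is performed, so the
    sketch assumes a well-behaved (e.g. diagonally dominant) matrix.
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]

    # Build the reduced system on the odd-indexed unknowns by
    # eliminating their even-indexed neighbours.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        has_next = i + 1 < n
        alpha = -a[i] / b[i - 1]                      # scales row i-1
        gamma = -c[i] / b[i + 1] if has_next else 0.0  # scales row i+1
        ra.append(alpha * a[i - 1])
        rb.append(b[i] + alpha * c[i - 1] + (gamma * a[i + 1] if has_next else 0.0))
        rc.append(gamma * c[i + 1] if has_next else 0.0)
        rd.append(d[i] + alpha * d[i - 1] + (gamma * d[i + 1] if has_next else 0.0))

    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)

    # Back-substitute for the even-indexed unknowns.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x


if __name__ == "__main__":
    # 1D Poisson stencil (-1, 2, -1) on 7 unknowns.
    n = 7
    a = [0.0] + [-1.0] * (n - 1)
    b = [2.0] * n
    c = [-1.0] * (n - 1) + [0.0]
    d = [0.0] * (n - 1) + [8.0]
    print(cyclic_reduction(a, b, c, d))
```

    All eliminations within one level are independent of each other, which is the property that makes the method attractive for parallel execution.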

    Parallel solution of power system linear equations

    At the heart of many power system computations lies the solution of a large sparse set of linear equations. These equations arise from the modelling of the network and are the cause of a computational bottleneck in power system analysis applications. Efficient sequential techniques have been developed to solve these equations, but the solution is still too slow for applications such as real-time dynamic simulation and on-line security analysis. Parallel computing techniques have been explored in the attempt to find faster solutions, but the methods developed to date have not efficiently exploited the full power of parallel processing. This thesis considers the solution of the linear network equations encountered in power system computations. Based on the insight provided by the elimination tree, it is proposed that a novel matrix structure be adopted to allow the exploitation of parallelism which exists within the cutset of a typical parallel solution. Using this matrix structure it is possible to reduce the size of the sequential part of the problem and to increase the speed and efficiency of a typical LU-based parallel solution. A method for transforming the admittance matrix into the required form is presented along with network partitioning and load balancing techniques. Sequential solution techniques are considered and existing parallel methods are surveyed to determine their strengths and weaknesses. Combining the benefits of existing solutions with the new matrix structure allows an improved LU-based parallel solution to be derived. A simulation of the improved LU solution is used to show the improvements in performance over a standard LU-based solution that result from the adoption of the new techniques. The results of a multiprocessor implementation of the method are presented and the new method is shown to perform better than existing methods for distributed memory multiprocessors.

    An Efficient Interior-Point Decomposition Algorithm for Parallel Solution of Large-Scale Nonlinear Problems with Significant Variable Coupling

    In this dissertation we develop multiple algorithms for efficient parallel solution of structured nonlinear programming problems by decomposition of the linear augmented system solved at each iteration of a nonlinear interior-point approach. In particular, we address large-scale, block-structured problems with a significant number of complicating, or coupling, variables. This structure arises in many important problem classes including multi-scenario optimization, parameter estimation, two-stage stochastic programming, optimal control and power network problems. The structure of these problems induces a block-angular structure in the augmented system, and parallel solution is possible using a Schur-complement decomposition. Three major variants are implemented: a serial, full-space interior-point method, serial and parallel versions of an explicit Schur-complement decomposition, and serial and parallel versions of an implicit PCG-based Schur-complement decomposition. All of these algorithms have been implemented in C++ in an extensible software framework for nonlinear optimization. The explicit Schur-complement decomposition is typically effective for problems with a few hundred coupling variables. We demonstrate the performance of our implementation on an important problem in optimal power grid operation, the contingency-constrained AC optimal power flow problem. In this dissertation, we present a rectangular IV formulation for the contingency-constrained ACOPF problem and demonstrate that the explicit Schur-complement decomposition can dramatically reduce solution times for a problem with a large number of contingency scenarios. Moreover, we compare the explicit Schur-complement decomposition implementation with the Progressive Hedging approach provided by Pyomo, showing that the internal decomposition approach is computationally favorable to the external approach.
However, the explicit Schur-complement decomposition approach is not appropriate for problems with a large number of coupling variables because of the high computational cost associated with forming and solving the dense Schur-complement. We show that this bottleneck can be overcome by solving the Schur-complement equations implicitly using a quasi-Newton preconditioned conjugate gradient method. This new algorithm avoids explicit formation and factorization of the Schur-complement. The computational efficiency of the serial and parallel versions of this algorithm is compared with the serial full-space approach, and the serial and parallel explicit Schur-complement approach, on a set of quadratic parameter estimation problems and nonlinear optimization problems. These results show that the PCG implicit Schur-complement approach dramatically reduces the computational expense for problems with many coupling variables.
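    The explicit Schur-complement decomposition described in this abstract can be sketched on a tiny block-angular system. This is an illustration under stated assumptions, not the dissertation's C++ implementation: the helper names `solve` and `schur_solve`, the dense list-of-rows storage, and the use of Gaussian elimination for every block are all choices made here for brevity. Each scenario block `(K_i, B_i, r_i)` contributes independently to the Schur complement `S = K_0 - sum(B_i^T K_i^{-1} B_i)`, and that per-block loop is the part that could run in parallel.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[piv] = m[piv], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def schur_solve(blocks, k0, r0):
    """Solve a block-angular system via an explicit Schur complement.

    blocks: list of (K_i, B_i, r_i), with B_i of shape n_i x n_c.
    k0, r0: coupling block (n_c x n_c) and its right-hand side.
    Returns (list of block solutions x_i, coupling solution y).
    """
    nc = len(k0)
    s = [row[:] for row in k0]
    rhs = r0[:]
    for k_i, b_i, r_i in blocks:          # independent per block
        n_i = len(k_i)
        # Columns of K_i^{-1} B_i and the vector K_i^{-1} r_i.
        z = [solve(k_i, [b_i[r][col] for r in range(n_i)]) for col in range(nc)]
        w = solve(k_i, r_i)
        for p in range(nc):
            for q in range(nc):
                s[p][q] -= sum(b_i[r][p] * z[q][r] for r in range(n_i))
            rhs[p] -= sum(b_i[r][p] * w[r] for r in range(n_i))
    y = solve(s, rhs)                      # dense coupling solve
    xs = []
    for k_i, b_i, r_i in blocks:          # independent back-solves
        local = [r_i[r] - sum(b_i[r][q] * y[q] for q in range(nc))
                 for r in range(len(k_i))]
        xs.append(solve(k_i, local))
    return xs, y
```

    The cost of forming and factorizing the dense `S` grows quickly with the number of coupling variables, which is the bottleneck the abstract attributes to the explicit variant.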

    Parallel harmonic balance method for analysis of nonlinear mechanical systems

    Mechanical vibration analysis and modelling are essential tools used in the design of various mechanical components and structures. In the case of turbine engine design specifically, the ability to accurately predict vibration of various parts is crucial to ensure their safe operation while maintaining efficiency. As the designs become increasingly complex and margins for error get smaller, high-fidelity numerical vibration models are necessary for their analysis. Research on parallel algorithms has progressed significantly in recent decades, thanks to the exponential growth of the world's available computational resources. This work explores the possibilities for parallel implementations for solving large-scale nonlinear vibration problems. A C++ code using MPI was developed to validate these implementations in practice. The harmonic balance method is used in combination with finite element discretisation and applied to an elastic body with the Green-Lagrange nonlinear model for large deformations. A parameter continuation scheme using a predictor-corrector approach is included to compute frequency response functions. A Newton-Raphson solver is used to solve the bordered nonlinear system of equations in the frequency domain. Three different parallel algorithms for solving the linearised problem in each Newton iteration are analysed: a sparse direct solver (using the MUMPS library), GMRES (using the PETSc library) and an in-house implementation of FETI. The performance of the solvers is analysed using beam test cases and a fan blade geometry. Scalability of MUMPS and the FETI solver is assessed. Full nonlinear frequency response functions with turning points are also computed. Use of an artificial coarse space and preconditioning in FETI is discussed, as it greatly impacts the convergence properties of the solver. The presented parallel linear solvers show promising scalability results and an ability to solve nonlinear systems of several million degrees of freedom.

    A hierarchically blocked Jacobi SVD algorithm for single and multiple graphics processing units

    We present a hierarchically blocked one-sided Jacobi algorithm for the singular value decomposition (SVD), targeting both single and multiple graphics processing units (GPUs). The blocking structure reflects the levels of the GPU's memory hierarchy. The algorithm may outperform MAGMA's dgesvd, while retaining high relative accuracy. To this end, we developed a family of parallel pivot strategies on the GPU's shared address space, but applicable also to inter-GPU communication. Unlike common hybrid approaches, our algorithm in a single-GPU setting needs a CPU for controlling purposes only, while utilizing the GPU's resources to the fullest extent permitted by the hardware. When required by the problem size, the algorithm, in principle, scales to an arbitrary number of GPU nodes. The scalability is demonstrated by a more than twofold speedup for sufficiently large matrices on a Tesla S2050 system with four GPUs vs. a single Fermi card. Comment: Accepted for publication in SIAM Journal on Scientific Computing
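    For readers unfamiliar with the one-sided Jacobi method that this abstract builds on, here is a minimal serial, unblocked sketch (the classical Hestenes variant). It is not the paper's hierarchically blocked GPU algorithm: the function name `one_sided_jacobi_singular_values`, the cyclic row-by-row pivot order, and the tolerance are assumptions chosen for illustration. Each step applies a plane rotation to a pair of columns so that they become orthogonal; once all pairs are numerically orthogonal, the column norms are the singular values.

```python
import math


def one_sided_jacobi_singular_values(a, tol=1e-12, max_sweeps=60):
    """Singular values of a (list of rows, m >= n), in descending order."""
    m, n = len(a), len(a[0])
    # Store the matrix column-wise; rotations act on column pairs.
    cols = [[a[i][j] for i in range(m)] for j in range(n)]

    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(x * x for x in cols[p])
                beta = sum(y * y for y in cols[q])
                gamma = sum(x * y for x, y in zip(cols[p], cols[q]))
                # Skip pairs that are already orthogonal to working accuracy.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                # Rotation angle that zeroes the inner product of the pair.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                cs = 1.0 / math.sqrt(1.0 + t * t)
                sn = cs * t
                cols[p], cols[q] = (
                    [cs * x - sn * y for x, y in zip(cols[p], cols[q])],
                    [sn * x + cs * y for x, y in zip(cols[p], cols[q])],
                )
        if not rotated:
            break

    return sorted((math.sqrt(sum(x * x for x in col)) for col in cols), reverse=True)


if __name__ == "__main__":
    print(one_sided_jacobi_singular_values([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))
```

    Rotations on disjoint column pairs do not interfere with each other, so a parallel pivot strategy can process many pairs at once.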

    PARALLEL ALGORITHMS FOR NONLINEAR PROGRAMMING AND APPLICATIONS IN PHARMACEUTICAL MANUFACTURING

    Effective manufacturing of pharmaceuticals presents a number of challenging optimization problems due to complex distributed, time-independent models and the need to handle uncertainty. These challenges are multiplied when real-time solutions are required. The demand for fast solution of nonlinear optimization problems, coupled with the emergence of new concurrent computing architectures, drives the need for parallel algorithms to solve challenging NLP problems. The goal of this work is the development of parallel algorithms for nonlinear programming problems on different computing architectures, and the application of large-scale nonlinear programming to challenging problems in pharmaceutical manufacturing.

    A class of linear solvers based on multilevel and supernodal factorization

    The solution of large, sparse linear systems is a critical component of modern science and engineering simulations. Iterative methods, namely the class of modern Krylov subspace methods, are often used to solve large-scale linear systems. To improve the robustness and convergence rate of iterative methods, preconditioning techniques are often regarded as crucial components of the linear system solution. This thesis presents a class of algebraic multilevel solvers for preconditioning general linear systems that arise from computational science and engineering applications. They can produce sparse patterns and save memory costs by applying recursive combinatorial algorithms. Robustness is improved by combining the factorization with recently developed overlapping and compression strategies and by using efficient local solvers. We have demonstrated the good performance of the proposed strategies with numerical experiments on realistic matrix problems, also in comparison with some of the most popular algebraic preconditioners in use today.
