14 research outputs found

    ParILUT - A parallel threshold ILU for GPUs


    Combinatorial problems in solving linear systems

    42 pages, available as LIP research report RR-2009-15. Numerical linear algebra and combinatorial optimization are vast subjects, as is their interaction. In virtually all cases there should be a notion of sparsity for a combinatorial problem to arise. Sparse matrices therefore form the basis of the interaction of these two seemingly disparate subjects. As the core of many of today's numerical linear algebra computations consists of the solution of sparse linear systems by direct or iterative methods, we survey some combinatorial problems, ideas, and algorithms relating to these computations. On the direct methods side, we discuss issues such as matrix ordering; bipartite matching and matrix scaling for better pivoting; and task assignment and scheduling for parallel multifrontal solvers. On the iterative methods side, we discuss preconditioning techniques including incomplete factorization preconditioners, support graph preconditioners, and algebraic multigrid. In a separate part, we discuss the block triangular form of sparse matrices.

    Preconditioning for Sparse Linear Systems at the Dawn of the 21st Century: History, Current Developments, and Future Perspectives

    Iterative methods are currently the solvers of choice for large sparse linear systems of equations. However, it is well known that the key factor for accelerating, or even allowing for, convergence is the preconditioner. The research on preconditioning techniques has characterized the last two decades. Nowadays, there are a number of different options to be considered when choosing the most appropriate preconditioner for the specific problem at hand. The present work provides an overview of the most popular algorithms available today, emphasizing the respective merits and limitations. The overview is restricted to algebraic preconditioners, that is, general-purpose algorithms requiring the knowledge of the system matrix only, independently of the specific problem it arises from. Along with the traditional distinction between incomplete factorizations and approximate inverses, the most recent developments are considered, including the scalable multigrid and parallel approaches which represent the current frontier of research. A separate section devoted to saddle-point problems, which arise in many different applications, closes the paper

    New Sequential and Scalable Parallel Algorithms for Incomplete Factor Preconditioning

    The solution of large, sparse, linear systems of equations Ax = b is an important kernel, and the dominant term with regard to execution time, in many applications in scientific computing. The large size of the systems of equations being solved currently (millions of unknowns and equations) requires iterative solvers on parallel computers. Preconditioning, which is the process of translating a linear system into a related system that is easier to solve, is widely used to reduce solution time and is sometimes required to ensure convergence. Level-based preconditioning (ILU(ℓ)) has long been used in serial contexts and is widely recognized as robust and effective for a wide range of problems. However, the method has long been regarded as an inherently sequential technique. Parallelism, it has been thought, can be achieved primarily at the expense of increased iterations. We dispute these claims. The first half of this dissertation takes an in-depth look at structurally based ILU(ℓ) symbolic factorization. Two definitions of fill level, “sum” and “max,” have been proposed. Hitherto, these definitions have been cast in terms of matrix terminology. We develop a sequence of lemmas and theorems that provide graph theoretic characterizations of both definitions; these characterizations are based on the static graph of a matrix, G(A). Our Incomplete Fill Path Theorem characterizes fill levels per the sum definition; this is the definition that is used in most library implementations of the “classic” ILU(ℓ) factorization algorithm. Our theorem leads to several new graph-search algorithms that compute factors identical, or nearly identical, to those computed by the “classic” algorithm. Our analyses show that the new algorithms have lower run time complexity than that of the previously existing algorithms for certain classes of matrices that are commonly encountered in scientific applications.
The second half of this dissertation presents a Parallel ILU algorithmic framework (PILU). This framework enables scalable parallel ILU preconditioning by combining concepts from domain decomposition and graph ordering. The framework can accommodate ILU(ℓ) factorization as well as threshold-based ILUT methods. A model implementation of the framework, the Euclid library, was developed as part of this dissertation. This library was used to obtain experimental results for Poisson's equation, the Convection-Diffusion equation, and a nonlinear Radiative Transfer problem. The experiments, which were conducted on a variety of platforms with up to 400 CPUs, demonstrate that our approach is highly scalable for arbitrary ILU(ℓ) fill levels.
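
To make the “sum” definition of fill level mentioned in this abstract concrete, here is a minimal sketch of the textbook row-by-row symbolic ILU(ℓ) phase. It is not the dissertation's graph-search algorithm and not the Euclid library's code; the function name `ilu_symbolic_levels` and the dense loop over candidate pivots are assumptions made for readability, not for speed.

```python
def ilu_symbolic_levels(pattern, n, max_level):
    """Return {(i, j): level} for every entry kept by ILU(max_level).

    pattern   -- iterable of (i, j) positions of the nonzeros of A (0-based)
    n         -- matrix dimension
    max_level -- the fill level l in ILU(l)

    Uses the "sum" rule: level(i, j) = min(level(i, j),
                                           level(i, k) + level(k, j) + 1).
    """
    # rows[i] maps a column index to the current fill level of entry (i, j).
    rows = [dict() for _ in range(n)]
    for i, j in pattern:
        rows[i][j] = 0              # original nonzeros have level 0
    for i in range(n):
        rows[i].setdefault(i, 0)    # assumption: the diagonal is always kept

    for i in range(1, n):
        row = rows[i]
        # Visit pivots k < i in increasing order; fill created at a column
        # j < i is picked up when the loop later reaches k == j.
        for k in range(i):
            if k not in row:
                continue
            for j, level_kj in rows[k].items():
                if j <= k:
                    continue        # only the upper part of row k eliminates
                new_level = row[k] + level_kj + 1
                if new_level <= max_level and new_level < row.get(j, max_level + 1):
                    row[j] = new_level
    return {(i, j): lev for i, row in enumerate(rows) for j, lev in row.items()}


# A 3x3 "arrow" pattern: full first row and column plus the diagonal.
arrow = {(0, 0), (1, 1), (2, 2), (0, 1), (0, 2), (1, 0), (2, 0)}
levels_0 = ilu_symbolic_levels(arrow, 3, 0)   # ILU(0): no fill allowed
levels_1 = ilu_symbolic_levels(arrow, 3, 1)   # ILU(1): level-1 fill allowed
```

For the arrow pattern, ILU(0) keeps exactly the original positions, while ILU(1) additionally admits the two positions (1, 2) and (2, 1), each created by eliminating against row 0.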

    Optimization of stepped plates in the case of smooth yield surfaces (Astmeliste plaatide optimiseerimine siledate voolavuspindade korral)

    This dissertation examines questions related to the optimization of elastic-plastic stepped plates made of von Mises, Hill, and Tsai-Wu materials. It is based on seven of the author's scientific publications, six of which were published within the last three years. The dissertation consists of four chapters, a bibliography, and the author's curriculum vitae. The first chapter is essentially a review of the application of numerical methods to the optimization of structural elements. It surveys work devoted to the optimization of plates and shells and describes the historical development of the finite element method and of parallel computing. In this study, the finite element method and the Haar wavelet method are used to solve ordinary and partial differential equations, and the principles of high-performance and parallel computing are applied. The second chapter considers the bending of a symmetric elastic-plastic circular sandwich plate under a uniformly distributed load and seeks the minimum-weight design for a prescribed maximum deflection. The plate material is assumed to obey the von Mises yield condition. The finite element method is used to find the optimal solution. The third chapter studies the problems posed in the previous chapter for symmetric elastic-plastic stepped annular plates. The optimal solution is found using the finite element method and the Haar wavelet method; the latter is also used to solve ordinary differential equations. The fourth chapter studies the bending of anisotropic annular plates and finds minimum-weight designs under the Hill and Tsai-Wu yield conditions. The computations use the Haar wavelet method. The dissertation develops a parallel computing methodology that makes it possible to solve optimization problems for elastic-plastic plates numerically.
The solutions obtained are compared with the results of Ohashi and Murakami, Turvey, and Upadrasta. The results are in good agreement with the work of other authors. The research showed that for optimization problems it is more sensible to use the wavelet method, whose parallelization saves more computing resources.
The current work is devoted to the theory of analysis and optimization of stepped circular and annular plates subject to smooth yield surfaces. Chapter 1 provides a brief historical review of the problem and of the finite element method. The basic ideas of parallel computation and of the multigrid method are also presented there. Chapter 2 presents a method for the numerical investigation of axisymmetric plates subjected to distributed transverse pressure loading. The material of the plates studied is assumed to be an ideal elastic-plastic material obeying the non-linear yield condition of von Mises and the associated flow law. Strain hardening and geometrical non-linearity are neglected in the present investigation. The calculations showed that the results are in good agreement with those obtained by ABAQUS when solving the direct problem of determining the stress-strain state of the plate. Chapter 3 undertakes an analytical-numerical study of annular plates operating in the range of elastic-plastic deformations. The material of the plates is assumed to be an ideal elastic-plastic material obeying the Mises yield condition. The author succeeded in the analytical derivation of optimality conditions for this highly non-linear problem. The resulting systems of equations were solved by existing computer codes. In Chapter 4 the methods of analysis and optimization of plates with piecewise constant thickness, developed earlier for homogeneous isotropic materials, are extended to plates made of anisotropic materials.
The plastic yielding of the material is assumed to take place according to the Tsai-Wu criterion and the associated gradientality law. The traditional bending theory is used; non-linear effects are neglected in the current study.

    Aspects of Ocean Circulation with Finite Element Modelling

    This thesis deals with the development and evaluation of the three-dimensional, nonstationary ocean model FEOM0 (the basic version of the Finite Element Ocean Model FEOM). This model is based on the Finite Element Method (FEM), which allows for the use of unstructured grids with variable resolution. The first part of the thesis introduces the governing equations, the mathematical formulation, and the discretisation using FEM. After introducing the discrete form of the equations, some details on the numerical implementation are given. The second part of the thesis contains applications of FEOM0 to different oceanographic tasks under idealised conditions. Comparisons to analytical results as well as to results of other numerical models in corresponding experiments are presented. The first application investigates the propagation of waves in a stratified ocean. The model shows good correspondence to theoretically obtained wave properties as well as to results of the Modular Ocean Model (MOM). The second investigation considers the wind-driven ocean circulation, especially the resulting vertical structure of the flow field. The influence of topography is examined; the results coincide with the predictions of linear theory. Finally, an idealised overflow scenario is investigated. The flow of dense water on a slope poses a special problem for numerical ocean models. An international intercomparison study (DOME: Dynamics of Overflow Mixing and Entrainment) was conceived in order to gain insight into the capabilities of different numerical models in reproducing this process. FEOM0 is applied to the idealised DOME setup with and without interior density stratification. In the case of a homogeneous interior, a variability in the overflow rate of several days shows up; the model gives a reasonable path of the plume and reproduces the theoretically obtained dependence of the overflow transport on the Coriolis parameter and density structure.

    Polynomial and rational approximation for electronic structure calculations

    Atomic-scale simulation of matter has become an important research tool in physics, chemistry, material science and biology as it allows for insights which neither theoretical nor experimental investigation can provide. The most accurate of these simulations are based on the laws of quantum mechanics, in which case the main computational bottleneck becomes the evaluation of functions f(H) of a sparse matrix H (the Hamiltonian). One way to evaluate such matrix functions is through polynomial and rational approximation, the theory of which is reviewed in Chapter 2 of this thesis. It is well known that rational functions can approximate the relevant functions with much lower degrees than polynomials, but they are more challenging to use in practice since they require fast algorithms for evaluating rational functions r(H) of a matrix argument H. Such an algorithm has recently been proposed in the form of the Pole Expansion and Selected Inversion (PEXSI) scheme, which evaluates r(H) by writing $r(x) = \sum_k \frac{c_k}{x - z_k}$ in partial-fraction-decomposed form and then employing advanced sparse factorisation techniques to evaluate only a small subset of the entries of the resolvents $(H - z)^{-1}$. This scheme scales better than cubically in the matrix dimension, but it is not a linear scaling algorithm in general. We overcome this limitation in Chapter 3 by devising a modified, linear-scaling PEXSI algorithm which exploits that most of the fill-in entries in the triangular factorisations computed by the PEXSI algorithm are negligibly small. Finally, Chapter 4 presents a novel algorithm for computing electric conductivities which requires evaluating a bivariate matrix function f(H, H). We show that the Chebyshev coefficients $c_{k_1 k_2}$ of the relevant function $f(x_1, x_2)$ concentrate along the diagonal $k_1 \sim k_2$ and that this allows us to approximate $f(x_1, x_2)$ much more efficiently than one would expect based on a straightforward tensor-product extension of the one-dimensional arguments.
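
The pole-expansion idea in this abstract can be illustrated with a small sketch: a rational function in partial-fraction form is applied to a vector by solving one shifted linear system per pole. This is only the pole-expansion half of the idea. It uses dense Gaussian elimination instead of selected inversion, and the poles, weights, and function names (`solve`, `apply_rational`) are illustrative assumptions rather than anything taken from the thesis or from the PEXSI software.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    Entries may be complex, which is needed for complex poles.
    """
    n = len(A)
    # Augmented matrix [A | b] with complex entries.
    M = [[complex(a) for a in row] + [complex(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def apply_rational(H, v, poles, weights):
    """Return r(H) v for r(x) = sum_k weights[k] / (x - poles[k])."""
    n = len(H)
    out = [0j] * n
    for z, c in zip(poles, weights):
        # One resolvent application (H - z I)^{-1} v per pole.
        shifted = [[H[i][j] - (z if i == j else 0) for j in range(n)]
                   for i in range(n)]
        y = solve(shifted, v)
        out = [o + c * yi for o, yi in zip(out, y)]
    return out


# Hypothetical example: r(x) = 0.5/(x - i) + 0.5/(x + i) = x / (x^2 + 1).
H = [[2.0, 1.0], [1.0, 2.0]]   # eigenvalues 1 and 3
v = [1.0, 1.0]                 # eigenvector for eigenvalue 3
result = apply_rational(H, v, poles=[1j, -1j], weights=[0.5, 0.5])
```

Because `v` is an eigenvector of `H` with eigenvalue 3, the result should equal r(3) v = 0.3 v up to rounding, which gives a quick sanity check of the partial-fraction evaluation.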

    High performance computing for multiphase fluid flows

    Multiphase fluid flows are very common in engineering and science applications. Examples include air flow on a water surface, metallurgical flow and blood flow in the body. In these flows, fluids are separated by a sharp interface and form different phases. The flow is characterized by the movement of this interface. Accurate modelling of the interface movement is a fundamental problem in the numerical simulation of these flows. Velocities for the movement are provided by the numerical solution of the Navier-Stokes (N-S) equations. These equations are discretized and converted into linear systems of equations. Solving these systems efficiently has been the main focus of many researchers in the field of Computational Fluid Dynamics (CFD). A modified Volume of Fluid (VOF) method for modelling two-phase flows is implemented using an analytic relation for its reconstruction step. The Finite Volume Method (FVM) is utilized, by incorporating a staggered grid, to discretize the two-dimensional (2-D) N-S equations. A preconditioned Krylov subspace iterative method, namely the Bi-Conjugate Gradient Stabilized (Bi-CGSTAB) method, is employed to solve the linear systems of equations. Solving the linear system usually consumes most of the simulation time for multiphase flow problems. Novel algorithms for the Incomplete LU Threshold (ILUT) preconditioner, forward and backward substitution and other matrix operations for penta-diagonal matrices are proposed here by adopting a diagonal sparse matrix format. The novel algorithm for ILUT reduces the computational complexity from $O(n^3 - n^2)$ to $O(n)$ in comparison to the dense format. Further, it brings down the communication overhead, consequently facilitating parallelization. Parallel versions of these algorithms are developed using a new load balancing scheme. The MPI C++ communication library is utilized to develop the parallel version.
The 2-D VOF code is applied to shape advection problems and results are found to be in good agreement with those available in the literature. In the case of translation of a square box, it provides more accurate results than other VOF methods. The code for the VOF method and the parallel iterative solvers are integrated with the 2-D N-S code in C++. The whole code is then used to simulate several two-phase flow problems: dam breaking with and without an obstacle, rising of an air bubble and lid-driven cavity flows. Speedup data from parallel programs implemented on these problems are generated.
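
The O(n) cost claimed for a diagonal storage format can be seen in a simpler setting than the thesis's ILUT algorithm. The sketch below is a zero-fill incomplete factorization, ILU(0), of a penta-diagonal matrix with diagonals at offsets 0, ±1 and ±m, stored as five arrays. It is a stand-in technique, not the author's method; the names `laplacian_2d` and `ilu0_five_diagonal`, the storage convention, and the restriction m ≥ 3 are all assumptions of this example.

```python
def laplacian_2d(m):
    """Five diagonals of the 5-point Laplacian on an m-by-m grid.

    Convention (an assumption of this sketch), all arrays of length n = m*m:
      d[i]  = A[i][i]      l1[i] = A[i][i-1]    u1[i] = A[i][i+1]
      lm[i] = A[i][i-m]    um[i] = A[i][i+m]
    Entries that would fall outside the matrix or the grid row are 0.
    """
    n = m * m
    d = [4.0] * n
    l1 = [0.0 if i % m == 0 else -1.0 for i in range(n)]
    u1 = [0.0 if i % m == m - 1 else -1.0 for i in range(n)]
    lm = [0.0 if i < m else -1.0 for i in range(n)]
    um = [0.0 if i >= n - m else -1.0 for i in range(n)]
    return d, l1, u1, lm, um


def ilu0_five_diagonal(d, l1, u1, lm, um, m):
    """ILU(0) of a penta-diagonal matrix in diagonal storage, in O(n) work.

    Returns (f1, fm, piv): the two sub-diagonals of the unit lower factor L
    and the main diagonal of U. With zero fill and m >= 3, the upper
    diagonals of U are simply u1 and um, so they are not returned.
    """
    assert m >= 3, "for m < 3 the off-diagonals interact and this shortcut fails"
    n = len(d)
    f1 = [0.0] * n
    fm = [0.0] * n
    piv = [0.0] * n
    for i in range(n):
        piv[i] = d[i]
        if i >= 1:
            f1[i] = l1[i] / piv[i - 1]       # multiplier for column i-1
            piv[i] -= f1[i] * u1[i - 1]
        if i >= m:
            fm[i] = lm[i] / piv[i - m]       # multiplier for column i-m
            piv[i] -= fm[i] * um[i - m]
    return f1, fm, piv


d, l1, u1, lm, um = laplacian_2d(3)
f1, fm, piv = ilu0_five_diagonal(d, l1, u1, lm, um, 3)
```

Each row needs a constant number of operations because only the two stored sub-diagonals can produce multipliers, which is where the linear cost comes from. A dense factorization of the same matrix would touch every entry below the diagonal.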