
    Automatic Performance Optimization of Stencil Codes

    Stencil codes are a widely used class of codes. Their general structure is very simple: data points in a large grid are repeatedly recomputed from neighboring values. This predefined neighborhood is the so-called stencil. Despite their simple structure, stencil codes are hard to optimize because only a few computations are performed while a comparatively large number of values has to be accessed; that is, stencil codes usually have very low computational intensity. Moreover, the appropriate set of optimizations and their parameters depend on the hardware on which the code is executed. In short, current production compilers cannot fully optimize this class of codes, and optimizing each application by hand is not practical. As a remedy, we propose a set of optimizations and describe how they can be applied automatically by a code generator for the domain of stencil codes. A combination of space and time tiling increases data locality, which significantly reduces memory-bandwidth requirements: a standard three-dimensional 7-point Jacobi stencil can be accelerated by a factor of 3. This optimization can target essentially any stencil code, while others are more specialized. For example, support for arbitrary linear data layout transformations is especially beneficial for colored kernels, such as a Red-Black Gauss-Seidel smoother. For such kernels, an optimized data layout both reduces the bandwidth requirements and simplifies explicit vectorization. Other notable optimizations described in detail are redundancy elimination techniques that remove common subexpressions both within a sequence of statements and across loop boundaries, arithmetic simplifications and normalizations, and the vectorization mentioned previously.
In combination, these optimizations increase the performance not only of the model problem given by Poisson’s equation but also of real-world applications: an optical flow simulation and the simulation of a non-isothermal, non-Newtonian fluid flow.
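As an illustration of the loop restructuring described above, here is a minimal plain-Python sketch (not the code the generator emits) of one Jacobi sweep of the 7-point stencil for Poisson's equation -Δu = f on a cube, together with a spatially tiled variant. Because a Jacobi sweep only reads the previous iterate, reordering the loop nest into tiles changes the memory access pattern but not the result. The helper names and the tile size `block` are hypothetical; temporal tiling, which fuses several sweeps per tile, is not shown.

```python
def make_grid(n, value=0.0):
    """n x n x n grid stored as nested lists; the outer layer holds Dirichlet values."""
    return [[[value] * n for _ in range(n)] for _ in range(n)]


def update(u, f, h2, i, j, k):
    # 7-point stencil: the six face neighbours plus the scaled source term
    return (u[i - 1][j][k] + u[i + 1][j][k]
            + u[i][j - 1][k] + u[i][j + 1][k]
            + u[i][j][k - 1] + u[i][j][k + 1]
            + h2 * f[i][j][k]) / 6.0


def jacobi_sweep(u, f, h):
    """One Jacobi sweep in plain i-j-k loop order; returns a new grid."""
    n = len(u)
    new = [[row[:] for row in plane] for plane in u]  # copy keeps the boundary layer
    h2 = h * h
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                new[i][j][k] = update(u, f, h2, i, j, k)
    return new


def jacobi_sweep_tiled(u, f, h, block=4):
    """Same sweep, but the j-k plane is traversed in block x block tiles."""
    n = len(u)
    new = [[row[:] for row in plane] for plane in u]
    h2 = h * h
    for jj in range(1, n - 1, block):
        for kk in range(1, n - 1, block):
            for i in range(1, n - 1):
                for j in range(jj, min(jj + block, n - 1)):
                    for k in range(kk, min(kk + block, n - 1)):
                        new[i][j][k] = update(u, f, h2, i, j, k)
    return new
```

Both functions perform exactly the same floating-point operations per grid point, so after any number of sweeps their results are identical; only the traversal order, and hence cache reuse in a compiled implementation, differs.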

    Multiphysics simulations: challenges and opportunities.

    Scalable parallel simulation of variably saturated flow

    In this thesis, we develop highly accurate simulation tools for variably saturated flow through porous media that can take advantage of the latest supercomputing resources. In particular, we aim for parallel scalability to very large compute resources of over 10⁵ CPU cores. Our starting point is the parallel subsurface flow simulator ParFlow. This library is widely used in the hydrology community and is known to have excellent parallel scalability up to 16k processes. We first investigate the numerical tools this library implements to perform the simulations it was designed for. ParFlow solves the governing equation for subsurface flow with a cell-centered finite difference (FD) method. The code targets high performance computing (HPC) systems by means of distributed-memory parallelism. We propose to reorganize ParFlow's mesh subsystem using the fast partitioning algorithms provided by the parallel adaptive mesh refinement (AMR) library p4est. We realize this in a minimally invasive manner by modifying selected parts of the code to reinterpret the existing mesh data structures. Furthermore, we evaluate the scaling performance of the modified version of ParFlow, demonstrating excellent weak and strong scaling up to 458k cores of the Juqueen supercomputer at the Jülich Supercomputing Centre. The above results were obtained for uniform meshes and hence without explicitly exploiting the AMR capabilities of the p4est library. A natural extension of our work is to activate this functionality and make ParFlow a true AMR application. Enabling ParFlow to use AMR is challenging for several reasons: the code may rely on assumptions about the parallel partition that cannot be maintained with AMR, it may use mesh-related metadata that is replicated on all CPUs, and it may assume uniform meshes in the construction of mathematical operators. Additionally, the use of locally refined meshes changes the spectral properties of these operators.
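p4est partitions its forest of octrees along a Morton (z-order) space-filling curve. The sketch below imitates that idea for a uniform 2D grid in plain Python: cells are sorted by their Morton key and each rank receives a contiguous segment of the curve, with cell counts differing by at most one. It does not use p4est's API; `morton_key` and `partition_cells` are hypothetical helpers for illustration.

```python
def morton_key(x, y, bits=16):
    """Interleave bits: bit b of x goes to position 2b, bit b of y to position 2b + 1."""
    key = 0
    for b in range(bits):
        key |= ((x >> b) & 1) << (2 * b)
        key |= ((y >> b) & 1) << (2 * b + 1)
    return key


def partition_cells(nx, ny, nprocs):
    """Map each cell (x, y) of an nx-by-ny grid to a rank in 0..nprocs-1.

    Cells are ordered along the Morton curve; rank r owns curve positions
    idx with r <= idx * nprocs / total < r + 1, i.e. a contiguous segment.
    """
    cells = sorted(((x, y) for x in range(nx) for y in range(ny)),
                   key=lambda c: morton_key(*c))
    total = len(cells)
    return {cell: idx * nprocs // total for idx, cell in enumerate(cells)}
```

Because the Morton curve visits cells quadrant by quadrant, each rank's segment tends to be spatially compact, which keeps the communication surface between ranks small.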
In this work, we develop an algorithmic approach to enable locally refined grids in ParFlow. AMR allows meshes in which elements of different sizes neighbor each other. In this case, ParFlow may produce erroneous results when it attempts to communicate data across inter-element boundaries. We propose and discuss two solutions to this issue operating at two different levels: the first manipulates the indices of the degrees of freedom, while the second operates directly on the degrees of freedom. Both approaches aim to introduce minimal changes to the original ParFlow code. In an AMR framework, the FD method used by ParFlow requires modifications to deal correctly with elements of different sizes. Mixed finite elements (MFE), on the other hand, are better suited for use with AMR. It is known that the cell-centered FD method used in ParFlow can be reinterpreted as an MFE discretization using lowest-order Raviart-Thomas elements. We conclude this thesis by presenting a block preconditioner for saddle point problems arising from an MFE discretization on locally refined meshes, and we evaluate its robustness with respect to various classes of coefficients for uniform and locally refined meshes.
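The cell-centered finite difference idea can be sketched in one dimension: one pressure unknown per cell and a two-point flux across each face, whose transmissibility uses the harmonic mean of the neighboring conductivities. The harmonic averaging is an assumption here (it is common for such schemes but not taken from ParFlow's source), and all function names are hypothetical. The example solves -(K p')' = 0 with Dirichlet values.

```python
def harmonic_mean(a, b):
    return 2.0 * a * b / (a + b)


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def cell_centered_fd(K, p_left, p_right, length=1.0):
    """Solve -(K p')' = 0 on [0, length] with p(0) = p_left, p(length) = p_right.

    K holds one (piecewise constant) conductivity per cell; the unknowns are
    the pressures at the cell centres.
    """
    n = len(K)
    h = length / n
    # T[i] is the face left of cell i, T[i + 1] the face right of it.
    # Boundary faces are only h/2 from the cell centre, hence 2 K / h.
    T = [2.0 * K[0] / h]
    T += [harmonic_mean(K[i], K[i + 1]) / h for i in range(n - 1)]
    T.append(2.0 * K[-1] / h)
    # Flux balance per cell: T[i] (p_i - p_{i-1}) + T[i+1] (p_i - p_{i+1}) = 0
    lower = [0.0] + [-T[i] for i in range(1, n)]
    diag = [T[i] + T[i + 1] for i in range(n)]
    upper = [-T[i + 1] for i in range(n - 1)] + [0.0]
    rhs = [0.0] * n
    rhs[0] += T[0] * p_left
    rhs[-1] += T[n] * p_right
    return solve_tridiagonal(lower, diag, upper, rhs)
```

For a two-layer medium whose interface coincides with a cell face, this scheme reproduces the exact piecewise-linear pressure at the cell centres: `cell_centered_fd([1, 1, 4, 4], 1.0, 0.0)` gives approximately `[0.8, 0.4, 0.15, 0.05]`. Faces between cells of different sizes, as created by AMR, are exactly where this simple two-point construction no longer applies directly.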

    Preconditioned fast solvers for large linear systems with specific sparse and/or Toeplitz-like structures and applications

    In this thesis, the design of the preconditioners we propose starts from applications instead of treating the problem in a completely general way. The reason is that not all types of linear systems can be addressed with the same tools. In this sense, the techniques for designing efficient iterative solvers depend mostly on properties inherited from the continuous problem from which the sequence of discretized matrices originates. Classical examples in the PDE context are locality and isotropy, whose discrete counterparts are sparsity and matrices constant along the diagonals, respectively. Therefore, it is often important to take the properties of the originating continuous model into account to obtain better performance and to provide an accurate convergence analysis. We consider linear systems that arise in the solution of both linear and nonlinear partial differential equations of both integer and fractional type. For the latter case, an introduction to both the theory and the numerical treatment is given. All the algorithms and strategies presented in this thesis are developed with their parallel implementation in mind. In particular, we consider the processor-co-processor framework, in which the main part of the computation is performed on a Graphics Processing Unit (GPU) accelerator. In Part I, we introduce our proposal for sparse approximate inverse preconditioners for the solution of both time-dependent Partial Differential Equations (PDEs) (Chapter 3) and Fractional Differential Equations (FDEs) containing both classical and fractional terms (Chapter 5). More precisely, we propose a new technique for updating preconditioners for sequences of linear systems arising from PDEs and FDEs, which can also be used to compute functions of large matrices via quadrature formulas (Chapter 4) and for the optimal control of FDEs (Chapter 6). Finally, in Part II, we consider structured preconditioners for quasi-Toeplitz systems.
The focus is on the numerical treatment of discretized convection-diffusion equations (Chapter 7) and on the solution of FDEs with linear multistep formulas in boundary value form (Chapter 8).
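The thesis' own preconditioners for quasi-Toeplitz systems are not reproduced here. As a point of reference, the sketch below shows a classical structured preconditioner for a symmetric Toeplitz system, Strang's circulant, applied inside preconditioned conjugate gradients. A circulant matrix is diagonalized by the discrete Fourier transform, so applying its inverse costs one forward and one inverse transform; a naive O(n²) DFT keeps the example dependency-free, whereas an FFT would be used in practice. Function names and the test matrix are illustrative assumptions.

```python
import cmath


def toeplitz_matvec(t, x):
    """Multiply the symmetric Toeplitz matrix T[i][j] = t[|i - j|] by x."""
    n = len(x)
    return [sum(t[abs(i - j)] * x[j] for j in range(n)) for i in range(n)]


def circulant_matvec(c, x):
    """Multiply the circulant matrix with first column c (C[i][j] = c[(i - j) % n]) by x."""
    n = len(x)
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]


def strang_column(t):
    """First column of Strang's circulant: keep the central diagonals of T and wrap them."""
    n = len(t)
    return [t[k] if k <= n // 2 else t[n - k] for k in range(n)]


def dft(x, sign=-1):
    """Naive DFT (sign=-1) or unscaled inverse DFT (sign=+1)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]


def strang_preconditioner(t):
    """Return a function v -> C^{-1} v, where C is Strang's circulant for t."""
    n = len(t)
    eigenvalues = dft(strang_column(t))  # DFT of the first column = eigenvalues of C

    def apply(v):
        w = [vk / lk for vk, lk in zip(dft(v), eigenvalues)]
        return [(z / n).real for z in dft(w, sign=+1)]

    return apply


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def pcg(matvec, b, precond, tol=1e-10, maxiter=200):
    """Preconditioned conjugate gradients for a symmetric positive definite system."""
    x = [0.0] * len(b)
    r = list(b)
    z = precond(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for it in range(1, maxiter + 1):
        ap = matvec(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, maxiter
```

For example, with `t = [4.0] + [1.0 / (k + 1) ** 2 for k in range(1, n)]` (a diagonally dominant, hence positive definite, symmetric Toeplitz matrix), `pcg(lambda v: toeplitz_matvec(t, v), b, strang_preconditioner(t))` solves `T x = b`. The point of the circulant choice is that the preconditioner never needs to be stored or factorized as a dense matrix.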

    Modelling of natural attenuation processes in groundwater using adaptive and parallel numerical methods.

    Biodegradation is an important process contributing to the natural attenuation (NA) of organic contaminants in groundwater. A numerical model was created to describe anaerobic phenol biodegradation data from an aquifer-derived laboratory-scale microcosm. The dynamic behaviour of the system was simulated using a two-step syntrophic biodegradation model with fermentation and respiration steps, both simulated kinetically, with hydrogen and acetate as intermediate species, together with other geochemical reactions including aqueous speciation, surface complexation, and mineral dissolution and precipitation. The model suggested that microbial competition between respiration processes using different electron acceptors was important. In contrast, a partial equilibrium approach, which considers only thermodynamics and not kinetics for respiration, did not explain the data. The laboratory-scale biodegradation model was transferred to a field-scale reactive transport model of the phenol plume at Four Ashes, UK. The effects of acclimatisation, toxicity, and bioavailability on microbial kinetics were considered. The simulations suggest that plume core processes are much more important than previously thought, possibly with a greater impact than plume fringe processes. The field-scale model was computationally demanding due to its biogeochemical complexity. Two strategies for dealing with high computational demands are (i) parallel processing, where the workload is shared between multiple processors, and (ii) locally adaptive remeshing, where a refined area of the grid tracks moving plume fringes through the domain. A new code was developed using the partial differential equation software toolbox UG and tested against other biodegradation simulators. The relative efficiency of parallel, adaptive methods for multispecies biodegradation simulations was measured.
It appears, in general, that relatively complex models are required for the realistic, quantitative assessment of NA at field scale, and that parallel, adaptive numerical methods provide appropriate efficiency benefits for such simulations.
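A kinetic two-step chain of the kind described above can be sketched with Monod rate laws and forward Euler time stepping. This is a deliberately reduced, hypothetical model: a substrate is fermented to a single lumped intermediate, which is then respired, with unit stoichiometry, constant biomass folded into the maximum rates, made-up parameter values, and none of the geochemistry. It is not the UG-based code from the thesis.

```python
def monod(c, k_max, half_sat):
    """Monod rate law k_max * c / (half_sat + c)."""
    return k_max * c / (half_sat + c)


def simulate_two_step(s0, t_end, dt, k1=1.0, K1=0.5, k2=0.8, K2=0.3):
    """Forward Euler for S -> I (fermentation) -> P (respiration).

    All rate constants are hypothetical. With unit stoichiometry, S + I + P
    stays equal to s0; concentrations stay non-negative if dt <= K1 / k1
    and dt <= K2 / k2.
    Returns a list of (time, S, I, P) tuples.
    """
    s, i, p = s0, 0.0, 0.0
    history = [(0.0, s, i, p)]
    for step in range(1, round(t_end / dt) + 1):
        r_ferm = monod(s, k1, K1)  # substrate -> intermediate
        r_resp = monod(i, k2, K2)  # intermediate -> respired product
        s, i, p = s - dt * r_ferm, i + dt * (r_ferm - r_resp), p + dt * r_resp
        history.append((step * dt, s, i, p))
    return history
```

With the default parameters, `simulate_two_step(10.0, 20.0, 0.01)` shows the intermediate building up while the substrate is fermented and then being consumed once fermentation slows, which is the qualitative behaviour a kinetic (rather than partial-equilibrium) treatment is meant to capture.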

    Mathematical models and numerical simulation of mechanochemical pattern formation in biological tissues

    Mechanical and chemical pattern formation in the development of biological tissue is a fundamental and fascinating process of self-complexation and self-organization. Yet the understanding of the underlying mechanisms and their mathematical description is still lacking in many interesting cases, such as embryogenesis. In this thesis, we combine recent experimental and theoretical insights and numerically investigate the capacity of mechanochemical processes to spontaneously generate patterns in biological tissue. First, we develop and numerically analyze a prototypical system of partial differential equations (PDEs) leading to mechanochemical pattern formation in evolving tissues. Based on recent experimental data, we propose a novel coupling of tissue mechanics to the production of signaling molecules (morphogens) via tensor invariants describing stretch, stress, or strain. In turn, morphogens lead to piecewise-defined active deformations of individual biological cells. The presented approach is flexible, and we apply it to two prominent examples of evolving tissue: we show how these simple interaction rules (“feedback loops”) lead to spontaneous, robust mechanochemical patterns in embryogenesis and in symmetry breaking in the freshwater polyp Hydra. Our results reveal that the full 3D model geometry is essential to obtain realistic results such as gastrulation events. We also present predictive numerical experiments that assess the sensitivity of biological tissue to mechanical stimuli, namely micropipette aspiration. These numerical experiments allow for cross-validation with experimental observations. In addition, we apply our modeling approach to growing tips in colonial hydroids and investigate the role of rotational and shearing active deformations by comparison with experimental data.
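The mechanical quantities entering such a coupling can be computed from the deformation gradient F. The sketch below evaluates the three principal invariants of the right Cauchy-Green tensor C = FᵀF and feeds the first one into a hypothetical production law that is active only under stretch (I₁ > 3). The choice of invariant and the form of `production_rate` are assumptions for illustration, not the coupling proposed in the thesis.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]


def det3(A):
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))


def invariants(F):
    """Principal invariants (I1, I2, I3) of the right Cauchy-Green tensor C = F^T F."""
    C = matmul(transpose(F), F)
    C2 = matmul(C, C)
    tr_c = C[0][0] + C[1][1] + C[2][2]
    tr_c2 = C2[0][0] + C2[1][1] + C2[2][2]
    return tr_c, 0.5 * (tr_c * tr_c - tr_c2), det3(C)


def production_rate(F, rate=1.0):
    # Hypothetical coupling: morphogen is produced only where the tissue is
    # stretched beyond its reference state, measured by I1 - 3 (zero for F = I).
    I1, _, _ = invariants(F)
    return rate * max(0.0, I1 - 3.0)
```

Because the invariants of C are unchanged by rigid rotations (F = R gives C = I), a production law built on them responds only to genuine deformation of the tissue, not to its orientation.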
Second, we develop an efficient numerical method to reliably solve these strongly coupled, prototypical systems of PDEs that model long-term mechanochemical problems. We employ state-of-the-art finite element methods and parallel geometric multigrid solvers, and we present a simple local mesh refinement strategy to obtain an efficient solution approach. Parallel solvers are essential to deal with the huge problem size in 3D and were modified to keep track of biological cells. Further, we propose a stabilization of the structural equation to handle the strongly coupled system of equations and the different timescales of growth (days) and nonlinear elasticity (seconds). This stabilization also addresses the instabilities that result from prescribing homogeneous Neumann conditions on the entire boundary, which is necessary because the locations of the patterns are not known a priori.
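The geometric multigrid idea can be illustrated on a much smaller problem than the 3D finite element systems in the thesis. The sketch below is a recursive V-cycle for the 1D Poisson problem -u'' = f with homogeneous Dirichlet values, using weighted Jacobi smoothing, full-weighting restriction, and linear interpolation. Grid sizes of the form 2^L - 1 interior points, the smoother weight 2/3, and two smoothing sweeps are assumptions chosen for simplicity.

```python
def residual(u, f, h):
    """r = f - A u for the standard 3-point discretization of -u'' (zero boundary values)."""
    n = len(u)
    r = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r[i] = f[i] - (2.0 * u[i] - left - right) / (h * h)
    return r


def smooth(u, f, h, sweeps=2, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps, updating u in place."""
    n = len(u)
    for _ in range(sweeps):
        old = u[:]
        for i in range(n):
            left = old[i - 1] if i > 0 else 0.0
            right = old[i + 1] if i < n - 1 else 0.0
            u[i] = (1.0 - omega) * old[i] + omega * 0.5 * (left + right + h * h * f[i])
    return u


def restrict(r):
    """Full weighting: coarse point j coincides with fine point 2j + 1."""
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2]
            for j in range((len(r) - 1) // 2)]


def prolong(e, n_fine):
    """Linear interpolation of a coarse-grid correction to the fine grid."""
    u = [0.0] * n_fine
    for j, v in enumerate(e):
        u[2 * j + 1] += v
        u[2 * j] += 0.5 * v
        u[2 * j + 2] += 0.5 * v
    return u


def v_cycle(u, f, h):
    """One V-cycle for A u = f on a grid with len(u) = 2^L - 1 interior points."""
    n = len(u)
    if n == 1:
        u[0] = 0.5 * h * h * f[0]  # coarsest grid: solve 2 u / h^2 = f exactly
        return u
    smooth(u, f, h)                              # pre-smoothing
    rc = restrict(residual(u, f, h))             # restrict the residual
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h)   # coarse-grid correction
    u = [ui + ei for ui, ei in zip(u, prolong(ec, n))]
    return smooth(u, f, h)                       # post-smoothing
```

Each cycle costs a constant amount of work per grid point summed over all levels, which is what makes multigrid attractive for the very large 3D systems described above.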