731 research outputs found

    Parallel Low-Storage Runge-Kutta Solvers for ODE Systems with Limited Access Distance

    We consider the solution of initial value problems (IVPs) of large systems of ordinary differential equations (ODEs) for which memory space requirements determine the choice of the integration method. In particular, we discuss the space-efficient sequential and parallel implementation of embedded Runge-Kutta (RK) methods. Our focus is on the exploitation of a special structure of commonly appearing ODE systems, referred to as "limited access distance," to improve scalability and memory usage. Such systems may arise, for example, from the semi-discretization of partial differential equations (PDEs). The storage space required by classical RK methods is directly proportional to the dimension n of the ODE system and the number of stages s of the method. We propose an implementation strategy based on a pipelined processing of the stages of the RK method and show how the memory usage of this computation scheme can be reduced to less than three storage registers by an overlapping of vectors, without compromising the choice of method coefficients or the potential for efficient stepsize control. We analyze and compare the scalability of different parallel implementation strategies in detailed runtime experiments on different modern parallel architectures.
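    To make the storage idea concrete, the following is a minimal sketch in Python/NumPy of a classical Williamson-style 2N-storage (two-register) Runge-Kutta scheme applied to a 1D heat-equation semi-discretization, i.e. an ODE system with limited access distance. It only illustrates how stages can share registers; the paper's pipelined embedded RK implementation with stepsize control is considerably more elaborate, and the grid size, step size, and initial condition below are arbitrary.

```python
# Minimal sketch (not the paper's pipelined scheme): a Williamson-style
# 2N-storage Runge-Kutta method that keeps only two length-n registers
# (the state u and one accumulator dS) instead of one register per stage.
import numpy as np

# Williamson's 3-stage, 3rd-order low-storage coefficients
A = np.array([0.0, -5.0 / 9.0, -153.0 / 128.0])
B = np.array([1.0 / 3.0, 15.0 / 16.0, 8.0 / 15.0])

n = 1000                                   # number of interior grid points
dx = 1.0 / (n + 1)
dt = 1e-7                                  # well inside the explicit stability limit

def rhs(u):
    # 1D heat equation with second-order central differences and zero Dirichlet
    # boundaries; each component only touches its immediate neighbours
    # ("limited access distance").
    up = np.concatenate(([0.0], u, [0.0]))
    return (up[:-2] - 2.0 * up[1:-1] + up[2:]) / dx**2

u = np.sin(np.pi * np.linspace(dx, 1.0 - dx, n))   # state register (length n)
dS = np.zeros_like(u)                              # single accumulator register (length n)

for step in range(100):
    for i in range(3):                     # all stages reuse the same two registers
        dS = A[i] * dS + dt * rhs(u)
        u = u + B[i] * dS
```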

    Automated Translation and Accelerated Solving of Differential Equations on Multiple GPU Platforms

    We demonstrate a high-performance vendor-agnostic method for massively parallel solving of ensembles of ordinary differential equations (ODEs) and stochastic differential equations (SDEs) on GPUs. The method is integrated with a widely used differential equation solver library in a high-level language (Julia's DifferentialEquations.jl) and enables GPU acceleration without requiring code changes by the user. Our approach achieves state-of-the-art performance compared to hand-optimized CUDA-C++ kernels, while performing 20-100x faster than the vectorized-map (vmap) approach implemented in JAX and PyTorch. Performance evaluation on NVIDIA, AMD, Intel, and Apple GPUs demonstrates performance portability and vendor-agnosticism. We show composability with MPI to enable distributed multi-GPU workflows. The implemented solvers are fully featured, supporting event handling, automatic differentiation, and incorporation of datasets via the GPU's texture memory, allowing scientists to take advantage of GPU acceleration on all major current architectures without changing their model code and without loss of performance.
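    For reference, the vectorized-map baseline that the paper benchmarks against can be sketched in Python with JAX: a hand-written fixed-step RK4 for the Lorenz system is batched over an ensemble of parameter sets with jax.vmap. This is the baseline approach reported as 20-100x slower, not the paper's own kernel-generation method, and the ensemble size, parameter perturbations, and step count below are illustrative.

```python
# Sketch of the "vectorized-map" ensemble approach: one ODE solve written for a
# single parameter set, then batched over many parameter sets with jax.vmap.
import jax
import jax.numpy as jnp

def lorenz(u, p):
    sigma, rho, beta = p
    x, y, z = u
    return jnp.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_solve(u0, p, dt=0.01, n_steps=1000):
    def step(u, _):
        k1 = lorenz(u, p)
        k2 = lorenz(u + 0.5 * dt * k1, p)
        k3 = lorenz(u + 0.5 * dt * k2, p)
        k4 = lorenz(u + dt * k3, p)
        u_next = u + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        return u_next, u_next
    _, traj = jax.lax.scan(step, u0, xs=None, length=n_steps)
    return traj

u0 = jnp.array([1.0, 0.0, 0.0])
key = jax.random.PRNGKey(0)
base = jnp.array([10.0, 28.0, 8.0 / 3.0])
params = base * (1.0 + 0.1 * jax.random.normal(key, (10_000, 3)))   # 10,000 parameter sets

ensemble_solve = jax.jit(jax.vmap(rk4_solve, in_axes=(None, 0)))
trajectories = ensemble_solve(u0, params)   # shape (10_000, 1000, 3)
```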

    Lecture 12: Recent Advances in Time Integration Methods and How They Can Enable Exascale Simulations

    To prepare for exascale systems, scientific simulations are growing in physical realism and thus complexity. This increase often results in additional and changing time scales. Time integration methods are critical to the efficient solution of these multiphysics systems. Yet many large-scale applications have not fully embraced modern time integration methods or efficient software implementations. Hence, achieving temporal accuracy with new and complex simulations has proved challenging. We will overview recent advances in time integration methods, including additive IMEX methods, multirate methods, and parallel-in-time approaches, expected to help realize the potential of exascale systems on multiphysics simulations. Efficient execution of these methods relies, in turn, on efficient algebraic solvers, and we will discuss the relationships between integrators and solvers. In addition, an effective time integration approach is not complete without efficient software, and we will discuss effective software design approaches for time integrators and their use in application codes. Lastly, examples demonstrating some of these new methods and their implementations will be presented. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL-ABS-819501.
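    As a concrete illustration of the additive IMEX idea mentioned in the abstract, the sketch below (Python/NumPy) applies a first-order IMEX Euler split to a 1D advection-diffusion problem: the stiff diffusion term is treated implicitly and the nonstiff advection term explicitly, so the step size is not constrained by the diffusive stability limit. This is a generic textbook example, not any particular library's method, and the grid, coefficients, and step size are illustrative.

```python
# First-order IMEX Euler on u_t + c u_x = nu u_xx (periodic domain):
# diffusion (stiff, linear) implicit, advection (nonstiff) explicit.
import numpy as np

n, dt = 200, 1e-3
dx = 1.0 / n
nu, c = 0.5, 1.0
x = np.arange(n) * dx
u = np.exp(-100.0 * (x - 0.5) ** 2)      # initial pulse

# Stiff part: periodic second-difference diffusion operator D (implicit).
eye = np.eye(n)
D = nu / dx**2 * (np.roll(eye, 1, axis=1) - 2.0 * eye + np.roll(eye, -1, axis=1))

def advection(u):
    # Nonstiff part: first-order upwind advection (explicit).
    return -c * (u - np.roll(u, 1)) / dx

# IMEX Euler step: (I - dt*D) u_{n+1} = u_n + dt * advection(u_n).
# Fully explicit diffusion would require dt <~ dx^2/(2*nu) ~ 2.5e-5 here,
# whereas the split scheme runs stably at dt = 1e-3 (advection CFL = 0.2).
lhs = eye - dt * D
for step in range(500):
    u = np.linalg.solve(lhs, u + dt * advection(u))
```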

    Efficient implicit spectral/hp element DG techniques for compressible flows

    In the simulation of stiff problems, such as fluid flows at high Reynolds numbers, the efficiency of explicit time integration is significantly limited by the need to use very small time steps. To alleviate this limitation and to accelerate compressible flow simulations based on high-order spectral/hp element methods, an implicit time integration method is developed using singly diagonally implicit Runge-Kutta temporal discretization schemes combined with a Jacobian-free Newton-Krylov (JFNK) method. This thesis studies several topics influencing the efficiency, accuracy and robustness of the solver. Firstly, an efficient partially matrix-free block relaxed Jacobi (BRJ) preconditioner is proposed, in which the Jacobian matrix and preconditioning matrices are properly approximated based on studies of their influence on convergence. The preconditioner only forms and stores the diagonal part of the Jacobian matrix, while the off-diagonal operators are calculated on the fly. Used together with techniques such as single-precision storage, the BRJ preconditioner can greatly reduce memory consumption compared with matrix-based preconditioners such as incomplete LU (ILU) factorization. To further accelerate the solver, the influences of different parts of the flux Jacobian on the preconditioning effect are studied and terms with minor influence are neglected. This reduces the computational cost of the BRJ preconditioner by about a factor of three while maintaining a similar preconditioning effect. Secondly, adaptive strategies for a suitable choice of several free parameters are designed to maintain temporal accuracy and relatively high efficiency. The free parameters in the implicit solver have significant influence on accuracy, efficiency and stability, so designing proper strategies for choosing them is essential for developing a robust general-purpose solver. Based on the idea of constructing proper relations between the temporal, spatial and iterative errors, adaptive strategies are designed for determining the time step and Newton tolerance. These strategies maintain the temporal accuracy of the solver, in the sense that further decreasing the temporal and iterative errors would not noticeably improve the solution, while keeping the solver relatively efficient by avoiding excessively small time steps and Newton tolerances. The strategies are tested on different types of cases to illustrate their performance and generality. Finally, the implicit solver is studied in high-fidelity simulations of turbulent flows based on a hierarchical implementation in the open-source spectral/hp element framework Nektar++. The solver is applied to large-eddy simulations of the Taylor-Green vortex, turbulent channel flow, and flow over a circular cylinder. The efficiency of the solver and the prediction accuracy for these problems are studied. The results show that the solver yields good predictions in turbulence simulations while maintaining good stability and high efficiency.
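    The core Jacobian-free Newton-Krylov idea underlying such a solver can be sketched in a few lines of Python/SciPy: the Jacobian of the nonlinear residual is never assembled, and the Krylov solver only sees Jacobian-vector products approximated by a finite difference of the residual. The sketch below uses backward Euler on a small illustrative stiff ODE and omits preconditioning; the thesis solver instead wraps SDIRK stages around this idea and adds the BRJ preconditioner, so this is only a generic illustration.

```python
# Generic JFNK sketch: Newton iteration on the backward-Euler residual, with
# matrix-free Jacobian-vector products and GMRES as the inner Krylov solver.
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def f(u):
    # Illustrative stiff linear ODE right-hand side.
    return np.array([-1000.0 * u[0] + u[1], u[0] - 2.0 * u[1]])

def residual(u, u_old, dt):
    # Backward-Euler residual F(u) = u - u_old - dt * f(u).
    return u - u_old - dt * f(u)

def jfnk_step(u_old, dt, newton_tol=1e-10, max_newton=20):
    u = u_old.copy()
    for _ in range(max_newton):
        F = residual(u, u_old, dt)
        if np.linalg.norm(F) < newton_tol:
            break
        eps = 1e-7 * (1.0 + np.linalg.norm(u))

        def jac_vec(v):
            # Matrix-free Jacobian-vector product via a residual finite difference.
            return (residual(u + eps * v, u_old, dt) - F) / eps

        J = LinearOperator((u.size, u.size), matvec=jac_vec)
        du, info = gmres(J, -F)          # Krylov solve of the Newton correction
        u = u + du
    return u

u = np.array([1.0, 1.0])
for _ in range(10):                      # ten backward-Euler steps of size 0.1
    u = jfnk_step(u, dt=0.1)
```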

    The numerical solution of neural field models posed on realistic cortical domains

    The mathematical modelling of neural activity is a hugely complex and prominent area of exploration that has been the focus of many researchers since the mid-1900s. Although many advancements and scientific breakthroughs have been made, there is still a great deal that is not yet understood about the brain. There has been a considerable number of studies in mathematical neuroscience that consider the brain as a simple one-dimensional or two-dimensional domain; however, this is not biologically realistic and is primarily selected as the domain of choice to aid analytical progress. The primary aim of this thesis is to develop and provide a novel suite of codes to facilitate the computationally efficient numerical solution of large-scale delay differential equations, and to utilise this to explore both neural mass and neural field models with space-dependent delays. Through this, we seek to widen the scope of models of neural activity by posing them on realistic cortical domains and incorporating real brain data to describe non-local cortical connections. The suite is validated using a selection of examples that compare numerical and analytical results, along with recreating existing results from the literature. The relationship between structural connectivity and functional connectivity is then analysed as we use an eigenmode fitting approach to inform the desired stability regimes of a selection of neural mass models with delays. Here, we explore a next-generation neural mass model developed by Coombes and Byrne [36], and compare results to the more traditional Wilson-Cowan formulation [180, 181]. Finally, we examine a variety of solutions to three different neural field models that incorporate real structural connectivity, path length, and geometric surface data, using our NFESOLVE library to efficiently compute the numerical solutions. We demonstrate how the field version of the next-generation model can yield intricate and detailed solutions which push us closer to recreating observed brain dynamics.
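    As an illustration of what space-dependent delays entail numerically, the sketch below (Python/NumPy) integrates a generic delayed rate network of the broad type discussed above, u_i'(t) = -u_i(t) + sum_j W_ij S(u_j(t - tau_ij)) with tau_ij = d_ij / v, using fixed-step Euler and a ring buffer of past states. The connectivity, path lengths, conduction speed, and step size are all illustrative, the model is neither the Wilson-Cowan nor the next-generation formulation, and this is not the NFESOLVE algorithm.

```python
# History-buffer integration of a delayed rate network with pairwise delays
# tau_ij = d_ij / v, using a ring buffer holding the last max_lag+1 states.
import numpy as np

rng = np.random.default_rng(0)
N, dt, v = 50, 1e-3, 10.0
W = rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))          # illustrative connectivity weights
d = rng.uniform(0.0, 1.0, (N, N))                      # illustrative inter-node path lengths
lag = np.maximum(1, np.round(d / v / dt)).astype(int)  # delays in steps (at least one step)

S = lambda u: 1.0 / (1.0 + np.exp(-u))                 # sigmoid firing-rate function

max_lag = int(lag.max())
hist = np.zeros((max_lag + 1, N))                      # ring buffer of past states
u0 = rng.normal(0.0, 0.1, N)
hist[:] = u0                                           # constant history for t <= 0
u = u0.copy()

cols = np.arange(N)[None, :]                           # column index j for u_j(t - tau_ij)
for step in range(2000):
    u_delayed = hist[(step - lag) % (max_lag + 1), cols]   # shape (N, N), entry (i, j)
    u = u + dt * (-u + np.sum(W * S(u_delayed), axis=1))   # Euler step of the rate model
    hist[(step + 1) % (max_lag + 1)] = u               # overwrite the oldest buffer slot
```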
