
    A fully-coupled discontinuous Galerkin method for two-phase flow in porous media with discontinuous capillary pressure

    In this paper we formulate and numerically test a fully-coupled discontinuous Galerkin (DG) method for incompressible two-phase flow with discontinuous capillary pressure. The spatial discretization uses the symmetric interior penalty DG formulation with weighted averages and is based on a wetting-phase potential / capillary potential formulation of the two-phase flow system. After discretizing in time with diagonally implicit Runge-Kutta schemes, the resulting systems of nonlinear algebraic equations are solved with Newton's method, and the arising systems of linear equations are solved efficiently and in parallel with an algebraic multigrid method. The new scheme is investigated for various test problems from the literature and is also compared to a cell-centered finite volume scheme in terms of accuracy and time to solution. We find that the method is accurate, robust and efficient. In particular, no post-processing of the DG velocity field is necessary, in contrast to results reported by several authors for decoupled schemes. Moreover, the solver scales well in parallel, and three-dimensional problems with up to nearly 100 million degrees of freedom per time step have been computed on 1000 processors.

    Multigrid preconditioners for the mixed finite element dynamical core of the LFRic atmospheric model

    Due to the wide separation of time scales in geophysical fluid dynamics, semi-implicit time integrators are commonly used in operational atmospheric forecast models. They guarantee the stable treatment of fast (acoustic and gravity) waves, while not suffering from severe restrictions on the timestep size. To propagate the state of the atmosphere forward in time, a non-linear equation for the prognostic variables has to be solved at every timestep. Since the nonlinearity is typically weak, this is done with a small number of Newton or Picard iterations, which in turn require the efficient solution of a large system of linear equations with O(10^6 - 10^9) unknowns. This linear solve is often the computationally most costly part of the model. In this paper an efficient linear solver for the LFRic next-generation model, currently developed by the Met Office, is described. The model uses an advanced mimetic finite element discretisation, which makes the construction of efficient solvers challenging compared to models using standard finite-difference and finite-volume methods. The linear solver hinges on a bespoke multigrid preconditioner of the Schur-complement system for the pressure correction. By comparing to Krylov-subspace methods, the superior performance and robustness of the multigrid algorithm are demonstrated for standard test cases and realistic model setups. In production mode, the model will have to run in parallel on 100,000s of processing elements. As confirmed by numerical experiments, one particular advantage of the multigrid solver is its excellent parallel scalability, which results from avoiding expensive global reduction operations.

    Microwave Tomography Using Stochastic Optimization And High Performance Computing

    This thesis discusses the application of parallel computing in microwave tomography for detection and imaging of dielectric objects. The main focus is on microwave tomography with the use of a parallelized Finite Difference Time Domain (FDTD) forward solver in conjunction with non-linear stochastic optimization based inverse solvers. Because such solvers require very heavy computation, their investigation has been limited in favour of deterministic inverse solvers that make use of assumptions and approximations of the imaging target. Without the use of linearization assumptions, a non-linear stochastic microwave tomography system is able to resolve targets of arbitrary permittivity contrast profiles while avoiding convergence to local minima of the microwave tomography optimization space. This work is focused on ameliorating this computational load with the use of heavy parallelization. The presented microwave tomography system is capable of modelling complex, heterogeneous, and dispersive media using the Debye model. A detailed explanation of the dispersive FDTD is presented herein. The system uses scattered field data due to multiple excitation angles, frequencies, and observation angles in order to improve target resolution, reduce the ill-posedness of the microwave tomography inverse problem, and improve the accuracy of the complex permittivity profile of the imaging target. The FDTD forward solver is parallelized with the use of the Compute Unified Device Architecture (CUDA) programming model developed by NVIDIA Corporation. In the forward solver, the time stepping of the fields is computed on a Graphics Processing Unit (GPU). In addition, the inverse solver makes use of the Message Passing Interface (MPI) system to distribute computation across multiple workstations. The FDTD method was chosen due to its ease of parallelization using GPU computing, in addition to its ability to simulate wideband excitation signals during a single forward simulation.
We investigated the use of distributed Particle Swarm Optimization (PSO) and Differential Evolution (DE) methods in the inverse solver for this microwave tomography system. In these optimization algorithms, candidate solutions are farmed out to separate workstations to be evaluated. As fitness evaluations are returned asynchronously, the optimization algorithm updates the population of candidate solutions and hands new candidate solutions to idle workstations for evaluation. In this manner, we used a total of eight graphics processing units during optimization with minimal downtime. Presented in this thesis is a microwave tomography algorithm that does not rely on linearization assumptions and is capable of imaging a target in a reasonable amount of time for clinical applications. The proposed algorithm was tested using numerical phantoms with material parameters similar to what one would find in normal or malignant human tissue.

    Principles for problem aggregation and assignment in medium scale multiprocessors

    One of the most important issues in parallel processing is the mapping of workload to processors. This paper considers a large class of problems having a high degree of potential fine grained parallelism, and execution requirements that are either not predictable, or are too costly to predict. The main issues in mapping such a problem onto medium scale multiprocessors are those of aggregation and assignment. We study a method of parameterized aggregation that makes few assumptions about the workload. The mapping of aggregate units of work onto processors is uniform, and exploits locality of workload intensity to balance the unknown workload. In general, a finer aggregate granularity leads to a better balance at the price of increased communication/synchronization costs; the aggregation parameters can be adjusted to find a reasonable granularity. The effectiveness of this scheme is demonstrated on three model problems: an adaptive one-dimensional fluid dynamics problem with message passing, a sparse triangular linear system solver on both a shared memory and a message-passing machine, and a two-dimensional time-driven battlefield simulation employing message passing. Using the model problems, the tradeoffs are studied between balanced workload and the communication/synchronization costs. Finally, an analytical model is used to explain why the method balances workload and minimizes the variance in system behavior
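The granularity tradeoff described in this abstract can be illustrated with a small sketch. It is not the paper's scheme: it assumes a one-dimensional array of work items, fixed-size blocks as the aggregates, and a round-robin (wrapped) mapping of blocks to processors. The function name `wrap_assign`, the parameter `block`, and the synthetic cost profile are hypothetical and chosen only for illustration.

```python
def wrap_assign(costs, block, n_proc):
    """Aggregate consecutive work items into blocks of size `block` and deal
    the blocks out to processors round-robin (an assumed mapping, not
    necessarily the one used in the paper).

    Returns (imbalance, cut):
      imbalance = heaviest processor load / mean load (1.0 is perfect),
      cut       = number of neighbouring item pairs owned by different
                  processors, a crude proxy for communication cost.
    """
    loads = [0.0] * n_proc
    owner = []
    for i, cost in enumerate(costs):
        p = (i // block) % n_proc  # block index, wrapped onto processors
        loads[p] += cost
        owner.append(p)
    cut = sum(1 for a, b in zip(owner, owner[1:]) if a != b)
    imbalance = max(loads) / (sum(loads) / n_proc)
    return imbalance, cut


if __name__ == "__main__":
    # Synthetic workload with a localized hot spot: the last 16 of 64 items
    # are ten times as expensive as the rest.
    costs = [1.0] * 48 + [10.0] * 16
    for block in (16, 4, 1):
        imbalance, cut = wrap_assign(costs, block, n_proc=4)
        print(f"block={block:2d}  imbalance={imbalance:.3f}  cut={cut}")
```

In this synthetic case the coarsest blocks put the whole hot spot on one processor, while finer blocks spread it out at the cost of more cut edges, which mirrors the balance versus communication tradeoff the abstract describes.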

    Sparse matrix based power flow solver for real-time simulation of power system

    Analyzing a massive number of Power Flow (PF) equations, even on identical or nearly identical network topologies, is a highly time-consuming process for large-scale power systems. Most of the computation time is consumed by the linear solves that are repeated within the iterative solution of the nonlinear equations until convergence is achieved. This is a paramount concern for any PF analysis method. This thesis presents a sparse matrix-based power flow solver that is fast enough to be implemented in the real-time analysis of large-scale power systems. It uses KLU, a sparse matrix solver, for PF analysis. It also implements parallel processing on CPU and GPU, which enables the simultaneous computation of multiple blocks in the algorithm, leading to faster execution. It runs 1000 times and 200 times faster than the Newton-Raphson method for DC and AC power systems, respectively. On average, it is around 10 times faster than MATPOWER for both AC and DC power systems.

    Simultaneous analysis of large INTEGRAL/SPI datasets: optimizing the computation of the solution and its variance using sparse matrix algorithms

    Nowadays, analyzing and reducing the ever larger astronomical datasets is becoming a crucial challenge, especially for long cumulated observation times. The INTEGRAL/SPI X/γ-ray spectrometer is an instrument for which it is essential to process many exposures at the same time in order to increase the low signal-to-noise ratio of the weakest sources. In this context, the conventional methods for data reduction are inefficient and sometimes not feasible at all. Processing several years of data simultaneously requires computing not only the solution of a large system of equations, but also the associated uncertainties. We aim at reducing the computation time and the memory usage. Since the SPI transfer function is sparse, we have used some popular methods for the solution of large sparse linear systems; we briefly review these methods. We use the Multifrontal Massively Parallel Solver (MUMPS) to compute the solution of the system of equations. We also need to compute the variance of the solution, which amounts to computing selected entries of the inverse of the sparse matrix corresponding to our linear system. This can be achieved through one of the latest features of the MUMPS software that has been partly motivated by this work. In this paper we provide a brief presentation of this feature and evaluate its effectiveness on astrophysical problems requiring the processing of large datasets simultaneously, such as the study of the entire emission of the Galaxy. We used these algorithms to solve the large sparse systems arising from SPI data processing and to obtain both their solutions and the associated variances. In conclusion, thanks to these newly developed tools, processing large datasets arising from SPI is now feasible with both a reasonable execution time and a low memory usage.

    A performance portable, fully implicit Landau collision operator with batched linear solvers

    Modern accelerators use hierarchically parallel programming models that enable massive multithreading within a processing element (PE), with multiple PEs per device driven by traditional processes. Batching is a technique for exposing PE-level parallelism in algorithms that previously ran on entire processes or multiple threads within a single MPI process. Kinetic discretizations of magnetized plasmas, for example, advance the Vlasov-Maxwell system, which is then followed by a fully implicit time advance of a collision operator. These collision advances are independent at each spatial point and are well suited to batch processing. This paper builds on previous work on a high-performance, fully nonlinear Landau collision operator by batching the linear solver, as well as batching the spatial point problems and adding new support for multiple grids for highly multiscale, multi-species problems. An anisotropic relaxation verification test that agrees well with previous published results and analytical solutions is presented. The performance of the NVIDIA A100 and AMD MI250X nodes is evaluated, with a detailed hardware utilization analysis on the A100. For portability, the entire Landau operator time advance is implemented in Kokkos and is available in the PETSc numerical library

    High Performance Matrix-Free Method for Large-Scale Finite Element Analysis on Graphics Processing Units

    This thesis presents a high performance computing (HPC) algorithm on graphics processing units (GPU) for large-scale numerical simulations. In particular, the research focuses on the development of an efficient matrix-free conjugate gradient solver for the acceleration and scalability of steady-state heat transfer finite element analysis (FEA) on a three-dimensional uniform structured hexahedral mesh using a voxel-based technique. One of the greatest challenges in large-scale FEA is the availability of computer memory for solving the linear system of equations. In large-scale heat transfer simulations, where the assembled system matrix becomes very large, the FEA solver requires huge amounts of computational time and memory that very often exceed the limits of the available hardware resources. To overcome this problem, a matrix-free conjugate gradient (MFCG) method is designed and implemented for finite element computations, which avoids the global matrix assembly. The main difference between the MFCG and the classical conjugate gradient (CG) solver lies in the implementation of the matrix-vector product operation. The matrix-vector operation was found to be the most expensive process, consuming more than 80% of the total computation for the numerical solution, and thus a matrix-free matrix-vector (MFMV) approach becomes beneficial for saving memory and computational time throughout the execution of the FEA. In summary, the MFMV algorithm consists of three nested steps: (a) a loop over the mesh elements of the domain, (b) a loop over the element nodal values to perform the element matrix-vector operations and (c) the summation and transformation of the nodal values into their correct positions in the global index. A performance analysis of a serial and a parallel implementation on a GPU shows that the MFCG solver outperforms the classical CG, consuming significantly less memory and allowing much larger simulations.
The outcome of this study suggests that the MFCG can also speed up and scale the execution of large-scale finite element simulations.
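
The three MFMV steps (element loop, element matrix-vector product, scatter into the global index) can be sketched as follows. This is a deliberately reduced illustration, not the thesis code: it assumes a one-dimensional uniform mesh of linear elements with unit conductivity and homogeneous Dirichlet ends instead of the 3D voxel mesh, and it runs serially in plain Python rather than on a GPU. All function names (`matvec_free`, `cg`, `solve_heat_1d`) are hypothetical.

```python
def matvec_free(x, n_elem, h):
    """Return y = K x without assembling K.

    (a) loop over elements, (b) apply the 2x2 element matrix
    (1/h) * [[1, -1], [-1, 1]], (c) scatter-add into the global vector.
    The two end nodes are Dirichlet nodes and act as identity rows.
    """
    k = 1.0 / h
    last = n_elem
    y = [0.0] * (n_elem + 1)
    for e in range(n_elem):                      # (a) element loop
        u0 = 0.0 if e == 0 else x[e]             # gather, constrained nodes as 0
        u1 = 0.0 if e + 1 == last else x[e + 1]
        y0 = k * (u0 - u1)                       # (b) element matrix-vector product
        y1 = k * (u1 - u0)
        y[e] += y0                               # (c) scatter to global index
        y[e + 1] += y1
    y[0], y[last] = x[0], x[last]                # identity rows for Dirichlet nodes
    return y


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def cg(apply_a, b, tol=1e-12, max_iter=1000):
    """Plain conjugate gradient; `apply_a` is any function computing A @ v."""
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr ** 0.5 <= tol:
            break
        ap = apply_a(p)
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def solve_heat_1d(n_elem):
    """Solve -u'' = 1 on [0, 1] with u(0) = u(1) = 0 using matrix-free CG."""
    h = 1.0 / n_elem
    b = [0.0] + [h] * (n_elem - 1) + [0.0]       # load vector for a unit source
    return cg(lambda v: matvec_free(v, n_elem, h), b)


if __name__ == "__main__":
    u = solve_heat_1d(8)
    print(u[4])  # midpoint temperature; the exact solution is x(1 - x)/2
```

Only vectors of length n_elem + 1 are stored, so memory grows linearly with the mesh size; the element loop is also the part that a GPU implementation would distribute across threads.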
