    Optimization Techniques for Stencil Data Parallel Programs: Methodologies and Applications

    The optimization of data parallel programs is a challenging open problem. We analyzed in detail the optimization techniques for stencil computations, a subset of data parallel computations. Drawing on previous research, we developed a structured model to describe the program transformations. We used this model to compare the different optimizations presented in the literature and to study the interactions among them.
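    A classic example of the kind of program transformation such a model describes is spatial loop tiling. The Python sketch below is our own illustration, not the paper's model: the 5-point averaging stencil, the function names `jacobi_naive` and `jacobi_tiled`, and the default tile sizes are assumptions. It shows that tiling only reorders the iteration space, so the tiled sweep yields exactly the same values as the naive one.

```python
def jacobi_naive(u):
    """One Jacobi sweep of a 2D 5-point averaging stencil; boundary rows and columns stay fixed."""
    n, m = len(u), len(u[0])
    out = [row[:] for row in u]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
    return out


def jacobi_tiled(u, tile_i=4, tile_j=4):
    """The same sweep with the interior split into tile_i x tile_j blocks visited one at a time."""
    n, m = len(u), len(u[0])
    out = [row[:] for row in u]
    for ii in range(1, n - 1, tile_i):
        for jj in range(1, m - 1, tile_j):
            # min() clips the last tile in each direction at the interior edge.
            for i in range(ii, min(ii + tile_i, n - 1)):
                for j in range(jj, min(jj + tile_j, m - 1)):
                    out[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
    return out


if __name__ == "__main__":
    grid = [[float((7 * i + 3 * j) % 5) for j in range(10)] for i in range(9)]
    print(jacobi_tiled(grid, 3, 4) == jacobi_naive(grid))  # True: same values, different visit order
```

    In Python the reordering brings no speedup; in compiled code the intent is that each tile's rows stay in cache while neighboring points reuse them.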

    A multi-GPU shallow-water simulation with transport of contaminants

    This work presents cost-effective multi-graphics processing unit (GPU) parallel implementations of a finite-volume numerical scheme for solving pollutant transport problems in two-dimensional domains. The fluid is modeled by the 2D shallow-water equations, whereas the transport of the pollutant is modeled by a transport equation. The 2D domain is discretized using a first-order Roe finite-volume scheme. Specifically, this paper presents multi-GPU implementations of both a solution that exploits recomputation on the GPU and an optimized solution based on a ghost cell decoupling approach. Our multi-GPU implementations have been optimized using nonblocking communications, overlapping of communications and computations, and ghost cell expansion to minimize communications. The fastest one reached a speedup of 78× using four GPUs on an InfiniBand network with respect to a parallel execution on a multicore CPU with six cores and two-way hyperthreading per core. Such performance, measured on a realistic problem, made it possible to compute solutions not only in real time but orders of magnitude faster than the simulated time. Copyright © 2012 John Wiley & Sons, Ltd.
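    For readers unfamiliar with the numerical scheme, the following Python sketch shows a first-order Roe finite-volume update for the 1D shallow-water equations. It is a simplified illustration under stated assumptions, not the paper's implementation: one dimension instead of two, no pollutant transport equation, wet states only with no entropy fix or source terms, transmissive boundaries, and a single process with no GPUs. The dam-break setup and all names are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def physical_flux(h, hu):
    """Flux of the 1D shallow-water equations for depth h and discharge hu."""
    u = hu / h
    return hu, hu * u + 0.5 * G * h * h


def roe_flux(hl, hul, hr, hur):
    """First-order Roe flux at one interface (wet states only, no entropy fix)."""
    sl, sr = math.sqrt(hl), math.sqrt(hr)
    u_hat = (sl * hul / hl + sr * hur / hr) / (sl + sr)  # Roe-averaged velocity
    c_hat = math.sqrt(0.5 * G * (hl + hr))               # Roe-averaged wave speed
    lam1, lam2 = u_hat - c_hat, u_hat + c_hat            # eigenvalues; eigenvectors are (1, lam_k)
    dh, dhu = hr - hl, hur - hul
    a1 = ((u_hat + c_hat) * dh - dhu) / (2.0 * c_hat)    # wave strengths: jump = a1*e1 + a2*e2
    a2 = (dhu - (u_hat - c_hat) * dh) / (2.0 * c_hat)
    fl, fr = physical_flux(hl, hul), physical_flux(hr, hur)
    f_h = 0.5 * (fl[0] + fr[0]) - 0.5 * (abs(lam1) * a1 + abs(lam2) * a2)
    f_hu = 0.5 * (fl[1] + fr[1]) - 0.5 * (abs(lam1) * a1 * lam1 + abs(lam2) * a2 * lam2)
    return f_h, f_hu


def step(h, hu, dx, dt):
    """One explicit finite-volume update; one ghost cell per side copies the edge value (transmissive)."""
    hp, hup = [h[0]] + h + [h[-1]], [hu[0]] + hu + [hu[-1]]
    flux = [roe_flux(hp[i], hup[i], hp[i + 1], hup[i + 1]) for i in range(len(hp) - 1)]
    r = dt / dx
    new_h = [h[i] - r * (flux[i + 1][0] - flux[i][0]) for i in range(len(h))]
    new_hu = [hu[i] - r * (flux[i + 1][1] - flux[i][1]) for i in range(len(hu))]
    return new_h, new_hu


def dam_break(n=200, steps=40, cfl=0.45):
    """Dam break on [0, 1]: depth 2 on the left half, 1 on the right, fluid initially at rest."""
    dx = 1.0 / n
    h = [2.0 if i < n // 2 else 1.0 for i in range(n)]
    hu = [0.0] * n
    for _ in range(steps):
        smax = max(abs(q / d) + math.sqrt(G * d) for d, q in zip(h, hu))
        h, hu = step(h, hu, dx, cfl * dx / smax)  # time step from the CFL condition
    return h, hu, dx


if __name__ == "__main__":
    h, hu, dx = dam_break()
    print(sum(h) * dx)  # total volume stays at 1.5 up to rounding
```

    Each cell's update reads only its immediate neighbors, which is why a multi-GPU version needs ghost cells at subdomain boundaries.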

    JAX-DIPS: Neural bootstrapping of finite discretization methods and application to elliptic problems with discontinuities

    We present a scalable strategy for developing mesh-free hybrid neuro-symbolic partial differential equation (PDE) solvers based on existing mesh-based numerical discretization methods. In particular, this strategy can be used to efficiently train neural network surrogate models of PDEs by (i) leveraging the accuracy and convergence properties of advanced numerical methods, solvers, and preconditioners, and (ii) scaling better to higher-order PDEs by strictly limiting optimization to first-order automatic differentiation. The presented neural bootstrapping method (hereafter dubbed NBM) is based on evaluating the finite discretization residuals of the PDE system, obtained on implicit Cartesian cells centered on a set of random collocation points, with respect to the trainable parameters of the neural network. Importantly, the conservation laws and symmetries present in the bootstrapped finite discretization equations inform the neural network about solution regularities within local neighborhoods of training points. We apply NBM to the important class of elliptic problems with jump conditions across irregular interfaces in three spatial dimensions. We show that the method is convergent, in that model accuracy improves as the number of collocation points in the domain increases and when the residuals are preconditioned. We show that NBM is competitive in terms of memory and training speed with other PINN-type frameworks. The algorithms presented here are implemented using JAX in a software package named JAX-DIPS (https://github.com/JAX-DIPS/JAX-DIPS), short for differentiable interfacial PDE solver. We have open-sourced JAX-DIPS to facilitate research into the use of differentiable algorithms for developing hybrid PDE solvers.
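    The core of the bootstrapping idea, minimizing finite-difference residuals evaluated on small cells centered at random collocation points, can be illustrated with a toy problem. The Python sketch below is not JAX-DIPS: it replaces the neural network with a hypothetical quadratic surrogate u(x) = a + b x + c x^2, solves the 1D Poisson problem u'' = 2 on (0, 1) with u(0) = 0 and u(1) = 1 (exact solution u = x^2) instead of a 3D interface problem, and uses plain gradient descent with hand-derived first-order gradients. All names and hyperparameters are assumptions.

```python
import random


def basis(x):
    """Hypothetical stand-in for a neural network: u(x) = a + b*x + c*x^2, linear in (a, b, c)."""
    return [1.0, x, x * x]


def fd_basis(x, h):
    """Central-difference second derivative of each basis function on a cell of width 2h centered at x."""
    plus, mid, minus = basis(x + h), basis(x), basis(x - h)
    return [(p - 2.0 * m + q) / (h * h) for p, m, q in zip(plus, mid, minus)]


def train(num_points=64, h=1e-2, lr=0.05, steps=3000, seed=0):
    """Fit u'' = 2 on (0, 1), u(0) = 0, u(1) = 1, by minimizing discretization residuals."""
    rng = random.Random(seed)
    # Random collocation points; each one is the center of a small finite-difference cell.
    points = [rng.uniform(h, 1.0 - h) for _ in range(num_points)]
    # Each row is (coefficients, target, weight); its residual is coefficients . params - target.
    rows = [(fd_basis(x, h), 2.0, 1.0 / num_points) for x in points]
    rows += [(basis(0.0), 0.0, 1.0), (basis(1.0), 1.0, 1.0)]  # Dirichlet boundary conditions
    params = [0.0, 0.0, 0.0]
    for _ in range(steps):
        # Gradient of sum(weight * residual^2); only first-order derivatives are needed.
        grad = [0.0, 0.0, 0.0]
        for coeffs, target, weight in rows:
            r = sum(p * c for p, c in zip(params, coeffs)) - target
            for k in range(3):
                grad[k] += 2.0 * weight * r * coeffs[k]
        params = [p - lr * g for p, g in zip(params, grad)]
    return params


if __name__ == "__main__":
    a, b, c = train()
    print(a, b, c)  # approaches 0, 0, 1, i.e. u(x) = x^2
```

    Because this surrogate is linear in its parameters, the loss is a convex quadratic and gradient descent converges to a = b = 0, c = 1; with an actual neural network the same residual loss would be minimized through automatic differentiation.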

    Multicore-aware parallel temporal blocking of stencil codes for shared and distributed memory

    New algorithms and optimization techniques are needed to counterbalance the accelerating trend towards bandwidth-starved multicore chips. It is well known that the performance of stencil codes can be improved by temporal blocking, which lessens the pressure on the memory interface. We introduce a new pipelined approach that makes explicit use of shared caches in multicore environments and minimizes synchronization and boundary overhead. For clusters of shared-memory nodes, we demonstrate how temporal blocking can be employed successfully in a hybrid shared/distributed-memory environment. Comment: 9 pages, 6 figures
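    The idea of temporal blocking can be sketched in Python for a 1D 3-point stencil. This is a deliberately simple, sequential parallelogram (skewed) tiling, not the paper's pipelined multicore scheme; the function names, the averaging stencil, and the block size are assumptions. Each tile advances several time levels before the next tile starts, and the one-cell-per-level skew guarantees that every value a tile reads has already been computed. For easy verification the sketch stores every time level, whereas a real implementation keeps only a small working set per tile, which is where the memory-traffic savings come from.

```python
def jacobi_steps(u, steps):
    """Reference: apply the 1D 3-point averaging stencil `steps` times, one full sweep at a time."""
    cur = list(u)
    for _ in range(steps):
        nxt = list(cur)  # end points are fixed (Dirichlet) boundary values
        for i in range(1, len(cur) - 1):
            nxt[i] = (cur[i - 1] + cur[i] + cur[i + 1]) / 3.0
        cur = nxt
    return cur


def jacobi_temporal_blocked(u, steps, block=8):
    """Same result, computed tile by tile: each tile advances `steps` levels before the next starts."""
    assert block >= 2, "the skew below needs tiles at least two cells wide"
    n = len(u)
    # Every time level is stored for clarity; boundary values are copied into each level.
    levels = [list(u)] + [[u[0]] + [0.0] * (n - 2) + [u[-1]] for _ in range(steps)]
    for x0 in range(1, n - 1 + steps, block):
        for t in range(steps):
            # The window shifts one cell left per level (a parallelogram in space-time), so
            # its reads at level t come from this tile or from tiles already finished.
            lo, hi = max(1, x0 - t), min(n - 1, x0 + block - t)
            prev, out = levels[t], levels[t + 1]
            for i in range(lo, hi):
                out[i] = (prev[i - 1] + prev[i] + prev[i + 1]) / 3.0
    return levels[steps]


if __name__ == "__main__":
    data = [float((7 * i) % 11) for i in range(37)]
    print(jacobi_temporal_blocked(data, 5, 4) == jacobi_steps(data, 5))  # True
```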

    Proceedings for the ICASE Workshop on Heterogeneous Boundary Conditions

    Domain decomposition is a complex problem with many interesting aspects. The decomposition can be chosen according to many different criteria, and the possible choices of interface or internal boundary conditions are numerous. The various regions under study may have different dynamical balances, indicating that different physical processes dominate the flow in these regions. This conference was called in recognition of the need to define the nature of these complex problems more clearly. These proceedings are a collection of the presentations and the discussion-group sessions.

    Facilitating the development of stencil applications using the Heterogeneous Programming Library

    Stencil computations are very common in scientific codes. Heterogeneous systems achieve good results on these problems, but programming them is complex because of the ghost regions required in multi-device implementations and the difficulty of properly exploiting their hardware. The Heterogeneous Programming Library (HPL) is a recent framework that improves the programmability of heterogeneous devices. This paper describes two extensions of HPL focused on stencil computations. The first automatically updates the ghost regions these computations involve. The second automates the implementation of their computational kernels. In our evaluation, compared with equivalent HPL baselines, the first mechanism reduces the number of lines of code and the Halstead programming effort of the host code by 34% and 64.2% on average, respectively, while the second reduces these metrics by 72% and 79% in the computational kernels, respectively. Moreover, the first technique has negligible performance overhead, while the second matches the performance of manually developed kernels. As an added benefit, by making these codes easier to develop, these techniques help programmers experiment with optimizations suited to these applications, such as the ghost cell expansion technique, which provides speedups of up to 13% in our experiments.
    Funding: Ministerio de Economía y Competitividad de España, TIN2013-42148-P; Ministerio de Economía y Competitividad de España, TIN2016-75845-P; Xunta de Galicia, ED431G/0
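    The ghost cell expansion technique mentioned above can be sketched in plain Python. This is our own sequential illustration of the idea, not HPL's API: the names `sweep`, `reference_run`, and `expanded_ghost_run`, the 3-point averaging stencil, and the decomposition parameters are assumptions. With halos of width g, each subdomain can take g sweeps between exchanges, recomputing halo cells redundantly, so the number of exchange rounds drops from `steps` to ceil(steps / g) while the result stays identical to the single-domain computation.

```python
def sweep(cells):
    """One sweep of a 3-point averaging stencil; the first and last entries are left unchanged."""
    inner = [(cells[i - 1] + cells[i] + cells[i + 1]) / 3.0 for i in range(1, len(cells) - 1)]
    return [cells[0]] + inner + [cells[-1]]


def reference_run(u, steps):
    """Global computation on a single domain, used to check the decomposed version."""
    for _ in range(steps):
        u = sweep(u)
    return u


def expanded_ghost_run(u, steps, parts=3, ghost=2):
    """Decomposed run: each subdomain copies `ghost` halo cells per side, then takes up to
    `ghost` sweeps before the next exchange, recomputing halo cells instead of communicating
    after every sweep. Returns the result and the number of exchange rounds."""
    n = len(u)
    # Owned interior ranges [bounds[p], bounds[p + 1]); u[0] and u[n - 1] are fixed boundary values.
    bounds = [1 + (n - 2) * p // parts for p in range(parts + 1)]
    cur, done, exchanges = list(u), 0, 0
    while done < steps:
        k = min(ghost, steps - done)  # each sweep invalidates one more halo layer
        nxt = list(cur)
        for s, e in zip(bounds[:-1], bounds[1:]):
            lo, hi = max(0, s - ghost), min(n, e + ghost)
            local = cur[lo:hi]  # "exchange": owned cells plus halo cells from neighbors
            for _ in range(k):
                local = sweep(local)
            nxt[s:e] = local[s - lo:e - lo]  # keep only the owned cells, which are still valid
        cur, done, exchanges = nxt, done + k, exchanges + 1
    return cur, exchanges


if __name__ == "__main__":
    data = [float((5 * i) % 13) for i in range(40)]
    result, rounds = expanded_ghost_run(data, 7, parts=3, ghost=3)
    print(result == reference_run(data, 7), rounds)  # True 3
```

    A wider halo means fewer, larger messages but more redundant work per subdomain, which is the trade-off the technique tunes.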

    Multiscale Modeling with Differential Equations

    Many physical systems are governed by ordinary or partial differential equations (see, for example, the chapters "Differential equations" and "System of Differential Equations"). Typically, the solutions of such systems are functions of time or of a single space variable (in the case of ODEs), or depend on multidimensional space coordinates or on space and time (in the case of PDEs). In some cases, the solutions may depend on several time or space scales. Examples include the damped harmonic oscillator, governed by ODEs, in the two extreme cases of very small or very large damping; the cardiovascular system, where the thickness of the arteries and veins varies from centimeters to microns; the shallow-water equations, which are valid when the water depth is small compared to the typical wavelength of surface waves; and sorption kinetics, in which the range of interaction of a surfactant with an air bubble is much smaller than the size of the bubble itself. In all such cases, a detailed simulation that resolves all space and time scales is often inefficient or intractable, and usually unnecessary for a reasonable description of the system's behavior. In the chapter "Multiscale modeling with differential equations", we present examples of systems described by ODEs and PDEs that are intrinsically multiscale, and illustrate how suitable modeling provides an effective way to capture the essential behavior of their solutions without resolving the small scales. Comment: 40 pages, 20 figures, to be published as a book chapter in a SIAM book
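    For the very-large-damping extreme of the damped harmonic oscillator, a short Python check (our illustration; the parameter values are arbitrary assumptions) makes the scale separation visible. For m x'' + c x' + k x = 0 with c^2 much larger than 4mk, the characteristic roots are approximately -c/m (a fast transient) and -k/c (the slow decay), so after the transient the reduced first-order model c x' + k x = 0 describes the behavior without resolving the fast scale.

```python
import math


def oscillator_rates(m, c, k):
    """Characteristic roots of m*x'' + c*x' + k*x = 0 in the overdamped case c*c > 4*m*k."""
    disc = math.sqrt(c * c - 4.0 * m * k)
    fast = (-c - disc) / (2.0 * m)
    slow = k / (m * fast)  # product of the roots is k/m; avoids cancellation in (-c + disc)
    return fast, slow


if __name__ == "__main__":
    m, c, k = 1.0, 100.0, 1.0  # illustrative values with strong damping
    fast, slow = oscillator_rates(m, c, k)
    print(fast, -c / m)  # fast rate, close to -c/m
    print(slow, -k / c)  # slow rate, close to -k/c from the reduced model c*x' + k*x = 0
```

    With these values the two rates differ by about four orders of magnitude, which is exactly the situation where resolving the fast scale is wasteful.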