Optimization Techniques for Stencil Data Parallel Programs: Methodologies and Applications
The optimization of data parallel programs is a challenging open problem. We analyzed in detail the optimization techniques for stencil computations, which are a subset of data parallel computations.
Drawing from previous research, we developed a structured model to describe the program transformations. We used this model to compare the different optimizations presented in the literature and to study the interactions between them.
A multi-GPU shallow-water simulation with transport of contaminants
This work presents cost-effective multi-graphics processing unit (GPU) parallel implementations of a finite-volume numerical scheme for solving pollutant transport problems in bidimensional domains. The fluid is modeled by 2D shallow-water equations, whereas the transport of pollutant is modeled by a transport equation. The 2D domain is discretized using a first-order Roe finite-volume scheme. Specifically, this paper presents multi-GPU implementations of both a solution that exploits recomputation on the GPU and an optimized solution that is based on a ghost cell decoupling approach. Our multi-GPU implementations have been optimized using nonblocking communications, overlapping of communications and computations, and the application of ghost cell expansion to minimize communications. The fastest one reached a speedup of 78× using four GPUs on an InfiniBand network with respect to a parallel execution on a multicore CPU with six cores and two-way hyperthreading per core. Such performance, measured using a realistic problem, enabled the calculation of solutions not only in real time but also orders of magnitude faster than the simulated time. Copyright © 2012 John Wiley & Sons, Ltd.
JAX-DIPS: Neural bootstrapping of finite discretization methods and application to elliptic problems with discontinuities
We present a scalable strategy for development of mesh-free hybrid
neuro-symbolic partial differential equation solvers based on existing
mesh-based numerical discretization methods. Particularly, this strategy can be
used to efficiently train neural network surrogate models of partial
differential equations by (i) leveraging the accuracy and convergence
properties of advanced numerical methods, solvers, and preconditioners, as well
as (ii) better scalability to higher order PDEs by strictly limiting
optimization to first order automatic differentiation. The presented neural
bootstrapping method (hereby dubbed NBM) is based on evaluation of the finite
discretization residuals of the PDE system obtained on implicit Cartesian cells
centered on a set of random collocation points with respect to trainable
parameters of the neural network. Importantly, the conservation laws and
symmetries present in the bootstrapped finite discretization equations inform
the neural network about solution regularities within local neighborhoods of
training points. We apply NBM to the important class of elliptic problems with
jump conditions across irregular interfaces in three spatial dimensions. We
show the method is convergent, in that model accuracy improves by increasing the
number of collocation points in the domain and preconditioning the residuals.
We show NBM is competitive in terms of memory and training speed with other
PINN-type frameworks. The algorithms presented here are implemented using
JAX in a software package named JAX-DIPS
(https://github.com/JAX-DIPS/JAX-DIPS), standing for differentiable interfacial
PDE solver. We open sourced JAX-DIPS to facilitate research into the use
of differentiable algorithms for developing hybrid PDE solvers.
Multicore-aware parallel temporal blocking of stencil codes for shared and distributed memory
New algorithms and optimization techniques are needed to balance the
accelerating trend towards bandwidth-starved multicore chips. It is well known
that the performance of stencil codes can be improved by temporal blocking,
lessening the pressure on the memory interface. We introduce a new pipelined
approach that makes explicit use of shared caches in multicore environments and
minimizes synchronization and boundary overhead. For clusters of shared-memory
nodes we demonstrate how temporal blocking can be employed successfully in a
hybrid shared/distributed-memory environment. Comment: 9 pages, 6 figures
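The idea of temporal blocking can be illustrated with a minimal sketch. The example below is not the pipelined approach of the paper; it assumes a simpler overlapped-tile variant on a hypothetical 1D three-point averaging stencil, where each tile is loaded once with a halo as wide as the number of time steps, advanced several steps locally, and only its interior is written back. All function names are illustrative.

```python
import numpy as np

def jacobi_step(u):
    """One sweep of a 3-point averaging stencil; the two end cells stay fixed."""
    v = u.copy()
    v[1:-1] = (u[:-2] + u[1:-1] + u[2:]) / 3.0
    return v

def naive_sweeps(u, steps):
    """Reference: sweep the whole array once per time step."""
    for _ in range(steps):
        u = jacobi_step(u)
    return u

def blocked_sweeps(u, steps, block):
    """Advance `steps` sweeps tile by tile (assumed overlapped-tile scheme)."""
    n = u.size
    out = np.empty_like(u)
    for start in range(0, n, block):
        stop = min(start + block, n)
        # Halo of width `steps` on each side, clipped at the domain boundary.
        lo = max(start - steps, 0)
        hi = min(stop + steps, n)
        tile = u[lo:hi].copy()
        for _ in range(steps):
            tile = jacobi_step(tile)
        # Halo cells are contaminated by the artificial tile edges; keep only
        # the interior, which is still exact after `steps` sweeps.
        out[start:stop] = tile[start - lo:stop - lo]
    return out

rng = np.random.default_rng(0)
u0 = rng.random(64)
same = np.allclose(naive_sweeps(u0, 4), blocked_sweeps(u0, 4, 16))
```

Each tile stays small enough to remain in cache across all local steps, at the price of recomputing the halo cells; the main array is read from memory once per `steps` sweeps instead of once per sweep.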
Proceedings for the ICASE Workshop on Heterogeneous Boundary Conditions
Domain decomposition is a complex problem with many interesting aspects. The choice of decomposition can be made based on many different criteria, and the choices of interface or internal boundary conditions are numerous. The various regions under study may have different dynamical balances, indicating that different physical processes are dominating the flow in these regions. This conference was called in recognition of the need to more clearly define the nature of these complex problems. These proceedings are a collection of the presentations and the discussion groups.
Facilitating the development of stencil applications using the Heterogeneous Programming Library
Stencil computations are very common in scientific codes. Heterogeneous systems achieve good results solving these problems, but their programming is complex because of the ghost regions required in multi-device implementations and the difficulty of properly exploiting their hardware. The Heterogeneous Programming Library (HPL) is a recent framework that improves the programmability of heterogeneous devices. This paper describes two extensions of HPL focused on stencil computations. The first one automatically updates the ghost regions they involve. The second one automates the implementation of the computational kernels of these algorithms. In our evaluation, the first mechanism reduces on average the number of lines of code and the Halstead programming effort of the host code of comparable HPL baselines by 34% and 64.2%, respectively, while the second contribution reduces these metrics by 72% and 79% in the computational kernels, respectively. Also, the first technique has negligible performance overheads, while the second one matches the performance of manually developed kernels. As an added benefit, the facilitation of the development of these codes thanks to these techniques helps programmers experiment with optimizations suited for these applications, such as the ghost cell expansion technique, which provides speedups of up to 13% in our experiments. Ministerio de Economía y Competitividad de España; TIN2013-42148-P. Ministerio de Economía y Competitividad de España; TIN2016-75845-P. Xunta de Galicia; ED431G/0
Multiscale Modeling with Differential Equations
Many physical systems are governed by ordinary or partial differential
equations (see, for example, Chapter ''Differential equations'', ''System of
Differential Equations''). Typically the solutions of such systems are functions
of time or of a single space variable (in the case of ODE's), or they depend on
multidimensional space coordinates or on space and time (in the case of PDE's).
In some cases, the solutions may depend on several time or space scales.
Examples include the damped harmonic oscillator (governed by ODE's) in the two
extreme cases of very small or very large damping; the cardiovascular system,
where the thickness of the arteries and veins varies from centimeters to
microns; the shallow water equations, which are valid when the water depth is
small compared to the typical wavelength of surface waves; and sorption
kinetics, in which the range of interaction of a surfactant with an air bubble
is much smaller than the size of the bubble itself. In all such cases a detailed simulation of the models which
resolves all space or time scales is often inefficient or intractable, and
usually even unnecessary to provide a reasonable description of the behavior of
the system. In the Chapter ''Multiscale modeling with differential equations''
we present examples of systems described by ODE's and PDE's which are
intrinsically multiscale, and illustrate how suitable modeling provides an
effective way to capture the essential behavior of the solutions of such
systems without resolving the small scales. Comment: 40 pages, 20 figures, to be published as a book chapter in a SIAM
book
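The large-damping case of the damped harmonic oscillator mentioned above can be illustrated with a short sketch. It assumes a nondimensionalized form eps*x'' + x' + x = 0 with a small parameter eps (this scaling and the initial conditions x(0) = 1, x'(0) = 0 are assumptions made for illustration, not taken from the chapter) and compares the exact solution with the reduced first-order model x' = -x, which ignores the fast scale of order eps.

```python
import numpy as np

def full_solution(t, eps):
    """Exact solution of eps*x'' + x' + x = 0, x(0)=1, x'(0)=0 (needs eps < 1/4)."""
    disc = np.sqrt(1.0 - 4.0 * eps)
    slow = (-1.0 + disc) / (2.0 * eps)   # slow decay rate, close to -1
    fast = (-1.0 - disc) / (2.0 * eps)   # fast decay rate, close to -1/eps
    a = fast / (fast - slow)             # weights fixed by the initial conditions
    b = -slow / (fast - slow)
    return a * np.exp(slow * t) + b * np.exp(fast * t)

def reduced_solution(t):
    """Solution of the reduced model x' = -x, x(0)=1, with no fast scale."""
    return np.exp(-t)

eps = 0.01
t = np.linspace(0.0, 5.0, 2001)
max_error = np.max(np.abs(full_solution(t, eps) - reduced_solution(t)))
```

For small eps the reduced model tracks the full solution to within an error of order eps, without having to resolve the short initial transient.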