Distributed memory compiler methods for irregular problems: Data copy reuse and runtime partitioning
Outlined here are two methods that we believe will play an important role in any distributed memory compiler able to handle sparse and unstructured problems. We describe how to link runtime partitioners to distributed memory compilers. In our scheme, programmers can implicitly specify how data and loop iterations are to be distributed among processors. This insulates users from having to deal explicitly with potentially complex algorithms that carry out work and data partitioning. We also describe a viable mechanism for tracking and reusing copies of off-processor data. In many programs, several loops access the same off-processor memory locations. As long as it can be verified that the values assigned to off-processor memory locations remain unmodified, we show that stored copies of off-processor data can be reused effectively. We present experimental data from a 3-D unstructured Euler solver run on an iPSC/860 to demonstrate the usefulness of our methods.
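The copy-reuse idea can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions. The classes `RemoteArray` and `CopyCache` are hypothetical. A per-array version counter stands in for the paper's check that off-processor values "remain unmodified", and the sketch does not reproduce the paper's compiler or runtime mechanism. Copies that have already been fetched are reused until the owner's data is written. At that point every copy is discarded and fetched again.

```python
class RemoteArray:
    """Hypothetical stand-in for an array owned by another processor.

    `version` is bumped on every write; `fetches` counts simulated gather
    messages so that reuse can be observed."""

    def __init__(self, values):
        self.values = list(values)
        self.version = 0
        self.fetches = 0

    def gather(self, indices):
        # One simulated communication round for a batch of indices.
        self.fetches += 1
        return {i: self.values[i] for i in indices}

    def write(self, i, value):
        self.values[i] = value
        self.version += 1


class CopyCache:
    """Hypothetical cache of off-processor copies, reused while the owner's
    data is known to be unmodified (tracked here by a version counter)."""

    def __init__(self, remote):
        self.remote = remote
        self.copies = {}
        self.version_seen = remote.version

    def get(self, indices):
        if self.version_seen != self.remote.version:
            # Off-processor data may have changed: discard every stored copy.
            self.copies.clear()
            self.version_seen = self.remote.version
        # Fetch only indices not already held locally (deduplicated, order kept).
        missing = [i for i in dict.fromkeys(indices) if i not in self.copies]
        if missing:
            self.copies.update(self.remote.gather(missing))
        return [self.copies[i] for i in indices]


if __name__ == "__main__":
    remote = RemoteArray([10.0, 20.0, 30.0, 40.0])
    cache = CopyCache(remote)
    print(sum(cache.get([1, 3])), remote.fetches)  # first loop: one gather
    print(sum(cache.get([3, 1])), remote.fetches)  # second loop: copies reused
    remote.write(3, 41.0)
    print(sum(cache.get([1, 3])), remote.fetches)  # after a write: gather again
```

Invalidating the whole cache on any write is the simplest conservative policy. A finer-grained scheme could track modified index ranges instead.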
Group Communication Patterns for High Performance Computing in Scala
We developed a Functional object-oriented Parallel framework (FooPar) for
high-level high-performance computing in Scala. Central to this framework are
Distributed Memory Parallel Data structures (DPDs), i.e., collections of data
distributed in a shared-nothing system together with parallel operations on
these data. In this paper, we first present FooPar's architecture and the idea
of DPDs and group communications. Then, we show how DPDs can be implemented
elegantly and efficiently in Scala based on the Traversable/Builder pattern,
unifying Functional and Object-Oriented Programming. We prove the correctness
and safety of one communication algorithm and show how specification testing
(via ScalaCheck) can be used to bridge the gap between proof and
implementation. Furthermore, we show that the group communication operations of
FooPar outperform those of the MPJ Express open source MPI-bindings for Java,
both asymptotically and empirically. FooPar has already been shown to be
capable of achieving close-to-optimal performance for dense matrix-matrix
multiplication via JNI. In this article, we present results on a parallel
implementation of the Floyd-Warshall algorithm in FooPar, achieving more than
94% efficiency compared to the serial version on a cluster using 100 cores for
matrices of dimension 38,000 × 38,000.
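The parallel Floyd-Warshall result can be made concrete with the textbook 1-D row-block decomposition, simulated here in a single Python process. This is an assumption: the abstract does not say how FooPar distributes the matrix, and this is not FooPar's Scala code. At step `k`, the process that owns row `k` broadcasts it. Every process then updates only its own rows. The k-loop is the inherently sequential part, and one broadcast per step is the group communication involved.

```python
INF = float("inf")


def floyd_warshall(dist):
    """Serial Floyd-Warshall on an n x n distance matrix (list of lists)."""
    n = len(dist)
    d = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            dik = d[i][k]
            for j in range(n):
                if dik + d[k][j] < d[i][j]:
                    d[i][j] = dik + d[k][j]
    return d


def floyd_warshall_row_blocks(dist, p):
    """Simulated row-block parallel Floyd-Warshall over p 'processes'.

    Each process owns a contiguous block of rows; the broadcast of row k
    is simulated by copying it before the local updates."""
    n = len(dist)
    bounds = [(r * n // p, (r + 1) * n // p) for r in range(p)]
    local = [[row[:] for row in dist[lo:hi]] for lo, hi in bounds]
    for k in range(n):
        owner = next(r for r, (lo, hi) in enumerate(bounds) if lo <= k < hi)
        row_k = local[owner][k - bounds[owner][0]][:]  # broadcast payload
        for block in local:                             # each process, in turn
            for row in block:
                dik = row[k]
                for j in range(n):
                    if dik + row_k[j] < row[j]:
                        row[j] = dik + row_k[j]
    return [row for block in local for row in block]
```

For any number of simulated processes, `floyd_warshall_row_blocks(g, p)` should return the same matrix as `floyd_warshall(g)`. Only the work split and the per-step broadcast differ.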
The use of data-mining for the automatic formation of tactics
This paper discusses the use of data-mining for the automatic formation of tactics. It was presented at the Workshop on Computer-Supported Mathematical Theory Development held at IJCAR in 2004. The aim of this project is to evaluate the applicability of data-mining techniques to the automatic formation of tactics from large corpora of proofs. We data-mine information from large proof corpora to find commonly occurring patterns. These patterns are then evolved into tactics using genetic programming techniques.
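One simple way to find "commonly occurring patterns" is to count contiguous sequences of tactic applications across proof scripts. The sketch below makes several assumptions. It uses support counting over n-grams, which may not be the technique used in the paper. The tactic names in the usage example are hypothetical. The genetic-programming step that evolves patterns into tactics is not shown.

```python
from collections import Counter


def frequent_tactic_ngrams(proofs, n=2, min_support=2):
    """Return contiguous length-n tactic sequences that occur in at least
    `min_support` distinct proofs, mapped to their support count.

    proofs: list of proof scripts, each a list of tactic names."""
    support = Counter()
    for proof in proofs:
        # A set, so each proof counts at most once per n-gram.
        grams = {tuple(proof[i:i + n]) for i in range(len(proof) - n + 1)}
        support.update(grams)
    return {g: s for g, s in support.items() if s >= min_support}


if __name__ == "__main__":
    corpus = [["intro", "simp", "auto"],      # hypothetical proof scripts
              ["intro", "simp", "arith"],
              ["induct", "simp", "auto"]]
    print(frequent_tactic_ngrams(corpus))
```

Counting support per proof rather than per occurrence keeps one long, repetitive proof from dominating the patterns.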
A high-performance open-source framework for multiphysics simulation and adjoint-based shape and topology optimization
The first part of this thesis presents the advances made in the open-source software SU2
towards transforming it into a high-performance framework for the design and optimization of
multiphysics problems. Through this work, and in collaboration with other authors, a tenfold
performance improvement was achieved for some problems. More importantly, problems that
had previously been impossible to solve in SU2 can now be used in numerical optimization
with shape or topology variables. Furthermore, it is now considerably simpler to study new
multiphysics applications, and to develop new numerical schemes taking advantage of modern
high-performance-computing systems.
In the second part of this thesis, these capabilities allowed the application of topology
optimization to medium-scale fluid-structure interaction problems using high-fidelity models (nonlinear
elasticity and Reynolds-averaged Navier-Stokes equations), which had not previously been done
in the literature. This showed that topology optimization can be used to target aerodynamic
objectives by tailoring the interaction between fluid and structure. However, it also made
evident the limitations of density-based methods for this type of problem, in particular in reliably
converging to discrete solutions. This was overcome with new strategies to both guarantee and
accelerate (i.e., reduce the overall computational cost of) the convergence to discrete solutions in
fluid-structure interaction problems.
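Why density-based methods struggle to reach discrete (0/1) designs can be seen in two common ingredients. The first is SIMP stiffness interpolation. The second is a tanh-based Heaviside projection whose sharpness `beta` is gradually increased. The sketch below shows only these widely used building blocks. It is not the thesis's strategies for guaranteeing and accelerating discreteness, and the parameter defaults (`penal=3`, `eta=0.5`, `e_min=1e-9`) are assumptions.

```python
import math


def heaviside_projection(rho, beta, eta=0.5):
    """Smoothed Heaviside (tanh) projection of a filtered density rho in [0, 1].

    Maps 0 -> 0 and 1 -> 1 for any beta; larger beta pushes intermediate
    densities towards 0 or 1 around the threshold eta."""
    num = math.tanh(beta * eta) + math.tanh(beta * (rho - eta))
    den = math.tanh(beta * eta) + math.tanh(beta * (1.0 - eta))
    return num / den


def simp_modulus(rho, e_solid=1.0, e_min=1e-9, penal=3.0):
    """SIMP interpolation: intermediate densities give disproportionately
    little stiffness, which discourages 'grey' material."""
    return e_min + rho ** penal * (e_solid - e_min)


if __name__ == "__main__":
    # Continuation on beta: the same density is projected ever closer to 1.
    for beta in (1.0, 4.0, 16.0, 64.0):
        rho_bar = heaviside_projection(0.6, beta)
        print(beta, round(rho_bar, 4), simp_modulus(rho_bar))
```

Raising `beta` too quickly makes the optimization problem stiff, which is one reason convergence to discrete solutions can be slow or unreliable.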
Scaling finite difference methods in large eddy simulation of jet engine noise to the petascale: numerical methods and their efficient and automated implementation
Reduction of jet engine noise has recently become a new arena of competition between aircraft manufacturers. As a relatively new field of research in computational fluid dynamics (CFD), computational aeroacoustics (CAA) prediction of jet engine noise based on large eddy simulation (LES) is a robust and accurate tool that complements existing theoretical and experimental approaches. To satisfy the stringent requirements of CAA on numerical accuracy, finite difference methods in LES-based jet engine noise prediction rely on implicitly formulated compact spatial partial differentiation and spatial filtering schemes. A crucial component of these schemes is an embedded solver for tridiagonal linear systems oriented along each of the three coordinate directions of the computational space. Traditionally, researchers and engineers in CAA have employed manually crafted implementations of solvers including the transposition method, the multiblock method and the Schur complement method. Algorithmically, these solvers force a trade-off between numerical accuracy and parallel scalability. From a programming standpoint, implementing them for each of the three coordinate directions is tediously repetitive and error-prone.
In this study, we attempt to tackle both of these challenges. We first describe an accurate and scalable tridiagonal linear system solver as a specialization of the truncated SPIKE algorithm, together with strategies for efficiently implementing the compact spatial partial differentiation and spatial filtering schemes. We then elaborate on two programming models tailored for composing regular grid-based numerical applications, including finite difference-based LES of jet engine noise: one based on generalized elemental subroutines and the other based on functional array programming. We also describe the accompanying code optimization and generation methodologies.
Through empirical experiments, we demonstrate that truncated SPIKE-based spatial partial differentiation and spatial filtering deliver the theoretically promised optimal scalability under weak scaling and can be implemented using the two programming models with performance on par with handwritten code, while significantly reducing the required programming effort.
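The truncated SPIKE idea can be sketched for a diagonally dominant tridiagonal system split into `p` equal blocks. This is a simplified, single-process illustration, not the specialization described above. It assumes that `p` divides `n`. It also computes the full spikes with the Thomas algorithm, whereas practical implementations compute only the spike tips. Each block is solved independently. The far spike entries decay rapidly for diagonally dominant matrices, so dropping them ("truncation") reduces the coupling between neighbouring blocks to one 2×2 solve per interface.

```python
def thomas(a, b, c, d):
    """Thomas algorithm. a: sub-diagonal (a[0] ignored), b: diagonal,
    c: super-diagonal (c[-1] ignored), d: right-hand side."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0] = c[0] / b[0] if n > 1 else 0.0
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def truncated_spike(a, b, c, f, p):
    """Truncated SPIKE sketch (assumes diagonal dominance and p | len(b))."""
    m = len(b) // p
    g, V, W = [], [], []
    for j in range(p):  # independent per-block work (parallel in practice)
        lo, hi = j * m, (j + 1) * m
        aj, bj, cj = a[lo:hi], b[lo:hi], c[lo:hi]
        g.append(thomas(aj, bj, cj, f[lo:hi]))      # A_j g_j = f_j
        e_last, e_first = [0.0] * m, [0.0] * m
        if j < p - 1:
            e_last[-1] = c[hi - 1]                   # coupling to next block
        if j > 0:
            e_first[0] = a[lo]                       # coupling to previous block
        V.append(thomas(aj, bj, cj, e_last))         # right spike
        W.append(thomas(aj, bj, cj, e_first))        # left spike
    # Truncated reduced system: one 2x2 solve per interface between blocks.
    bottom, top = [0.0] * p, [0.0] * p  # x at last / first row of each block
    for j in range(p - 1):
        vb, wt = V[j][-1], W[j + 1][0]
        det = 1.0 - vb * wt
        bottom[j] = (g[j][-1] - vb * g[j + 1][0]) / det
        top[j + 1] = (g[j + 1][0] - wt * g[j][-1]) / det
    # Retrieval: x_j = g_j - V_j * top_{j+1} - W_j * bottom_{j-1}.
    x = []
    for j in range(p):
        for i in range(m):
            xi = g[j][i]
            if j < p - 1:
                xi -= V[j][i] * top[j + 1]
            if j > 0:
                xi -= W[j][i] * bottom[j - 1]
            x.append(xi)
    return x
```

In the retrieval step, each block needs only its two neighbours' interface values. This is the source of the method's favourable weak-scaling behaviour.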
Combined 3D thinning and greedy algorithm to approximate realistic particles with corrected mechanical properties
The shape of irregular particles has a significant influence on the micro- and
macroscopic behavior of granular systems. This paper presents a combined 3D
thinning and greedy set-covering algorithm to approximate realistic particles
with a clump of overlapping spheres for discrete element method (DEM)
simulations. First, the particle medial surface (or surface skeleton), from
which all candidate (maximal inscribed) spheres can be generated, is computed
by topological 3D thinning. Then, the clump generation procedure is
cast as a set-covering problem (SCP) and solved with a greedy algorithm.
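The greedy set-covering step can be sketched as follows. Several simplifying assumptions apply. The particle is represented by sample points rather than an exact volume. The candidate spheres are supplied directly instead of being generated from the medial surface. The 0.85 default merely echoes the coverage level reported below and is not a prescribed parameter. At each step, the sphere covering the most still-uncovered points is chosen, until the target fraction is covered.

```python
def greedy_sphere_cover(points, candidates, target=0.85):
    """Greedy set cover over candidate spheres.

    points: sample points (x, y, z) inside the particle;
    candidates: spheres (cx, cy, cz, r), e.g. maximal inscribed spheres.
    Returns the chosen spheres and the fraction of points they cover."""
    n = len(points)
    covers = [
        {i for i, (x, y, z) in enumerate(points)
         if (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 <= r * r}
        for cx, cy, cz, r in candidates
    ]
    uncovered = set(range(n))
    chosen = []
    while 1.0 - len(uncovered) / n < target:
        # Pick the sphere that covers the most still-uncovered points.
        best = max(range(len(candidates)), key=lambda k: len(covers[k] & uncovered))
        gain = covers[best] & uncovered
        if not gain:
            break  # the remaining points cannot be covered by any candidate
        chosen.append(candidates[best])
        uncovered -= gain
    return chosen, 1.0 - len(uncovered) / n
```

Greedy set cover is not guaranteed to be optimal, but it is simple and fast. Raising `target` adds spheres, mirroring the trade-off between clump size and approximation accuracy.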
To correct the mass distribution distorted by heavily overlapping spheres inside the
clump, linear programming (LP) is used to adjust the density of each component
sphere, so that the aggregate mass, center of mass and inertia tensor are
identical or sufficiently close to those of the prototypical particle. To
find the optimal approximation accuracy (volume coverage: the ratio of the clump's
volume to the original particle's volume), simulations of particle flow with 3 different shapes
in a rotating drum are conducted. The dynamic angle of repose was observed to
start converging for all particle shapes at 85% volume coverage
(fewer than 30 spheres per clump), which suggests a possible optimal resolution for
capturing the mechanical behavior of the system.
Comment: 34 pages, 13 figures