Analysing Astronomy Algorithms for GPUs and Beyond
Astronomy depends on ever increasing computing power. Processor clock-rates
have plateaued, and increased performance is now appearing in the form of
additional processor cores on a single chip. This poses significant challenges
to the astronomy software community. Graphics Processing Units (GPUs), now
capable of general-purpose computation, exemplify both the difficult
learning-curve and the significant speedups exhibited by massively-parallel
hardware architectures. We present a generalised approach to tackling this
paradigm shift, based on the analysis of algorithms. We describe a small
collection of foundation algorithms relevant to astronomy and explain how they
may be used to ease the transition to massively-parallel computing
architectures. We demonstrate the effectiveness of our approach by applying it
to four well-known astronomy problems: Hogbom CLEAN, inverse ray-shooting for
gravitational lensing, pulsar dedispersion and volume rendering. Algorithms
with well-defined memory access patterns and high arithmetic intensity stand to
receive the greatest performance boost from massively-parallel architectures,
while those that involve a significant amount of decision-making may struggle
to take advantage of the available processing power.
Comment: 10 pages, 3 figures, accepted for publication in MNRAS
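Pulsar dedispersion, one of the four problems named above, illustrates what a well-defined memory access pattern looks like: every output sample is a sum over channels at fixed, precomputable offsets. The following is a minimal brute-force (incoherent) sketch, not the implementation from the paper. The function names, the `data[c][t]` layout and the rounding of delays to whole samples are assumptions made for illustration; the constant is the standard cold-plasma dispersion constant.

```python
K_DM = 4.148808e3  # dispersion constant in s MHz^2 pc^-1 cm^3


def delay_samples(dm, freq_mhz, ref_mhz, dt):
    """Delay of freq_mhz relative to ref_mhz, rounded to whole samples of length dt."""
    delay_s = K_DM * dm * (freq_mhz ** -2 - ref_mhz ** -2)
    return int(round(delay_s / dt))


def dedisperse(data, freqs_mhz, dm, dt):
    """Sum all channels after undoing each channel's dispersion delay.

    data[c][t] is the power in channel c at time sample t (assumed layout).
    The highest frequency is used as the zero-delay reference.
    """
    ref = max(freqs_mhz)
    shifts = [delay_samples(dm, f, ref, dt) for f in freqs_mhz]
    n_out = len(data[0]) - max(shifts)
    out = [0.0] * n_out
    for chan, shift in zip(data, shifts):
        # Each output sample reads one fixed offset per channel: no branching.
        for t in range(n_out):
            out[t] += chan[t + shift]
    return out
```

The inner loops contain no data-dependent decisions, which is the property the abstract associates with good performance on massively-parallel hardware.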
The projector algorithm: a simple parallel algorithm for computing Voronoi diagrams and Delaunay graphs
The Voronoi diagram is a certain geometric data structure which has numerous
applications in various scientific and technological fields. The theory of
algorithms for computing 2D Euclidean Voronoi diagrams of point sites is rich
and useful, with several different and important algorithms. However, this
theory has been quite steady during the last few decades in the sense that no
essentially new algorithms have entered the game. In addition, most of the
known algorithms are serial in nature, which makes it inherently difficult to
compute the diagram in parallel. In this paper we present
the projector algorithm: a new and simple algorithm which enables the
(combinatorial) computation of 2D Voronoi diagrams. The algorithm is
significantly different from previous ones and some of the involved concepts in
it are in the spirit of linear programming and optics. Parallel implementation
is naturally supported since each Voronoi cell can be computed independently of
the other cells. A new combinatorial structure for representing the cells (and
any convex polytope) is described along the way and the computation of the
induced Delaunay graph is obtained almost automatically.
Comment: This is a major revision; re-organization and better presentation of
some parts; correction of several inaccuracies; improvement of some proofs
and figures; added references; modification of the title; the paper is long
but more than half of it is composed of proofs and references: it is
sufficient to look at pages 5, 7--11 in order to understand the algorithm
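The claim that each Voronoi cell can be computed independently of the others can be illustrated with a much simpler technique than the projector algorithm itself. The sketch below is not the projector algorithm: it clips a bounding box against the perpendicular bisector of the site and every other site, which costs O(n) per cell. The function names and the bounding box parameter are assumptions for illustration.

```python
def clip_to_halfplane(poly, a, b, c):
    """Keep the part of the convex polygon poly where a*x + b*y <= c."""
    out = []
    for i, p in enumerate(poly):
        q = poly[(i + 1) % len(poly)]
        fp = a * p[0] + b * p[1] - c
        fq = a * q[0] + b * q[1] - c
        if fp <= 0:
            out.append(p)
        if fp * fq < 0:  # the edge p->q crosses the boundary line
            t = fp / (fp - fq)
            out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out


def voronoi_cell(i, sites, box):
    """Voronoi cell of sites[i], clipped to box = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = box
    cell = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    px, py = sites[i]
    for j, (qx, qy) in enumerate(sites):
        if j == i or not cell:
            continue
        # Points at least as close to p as to q satisfy
        # 2*(q - p).x <= |q|^2 - |p|^2.
        a, b = 2 * (qx - px), 2 * (qy - py)
        c = qx * qx + qy * qy - px * px - py * py
        cell = clip_to_halfplane(cell, a, b, c)
    return cell
```

Because `voronoi_cell` reads only the shared site list and writes nothing shared, the calls for different `i` could be handed to separate workers without synchronisation.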
Computational advances in gravitational microlensing: a comparison of CPU, GPU, and parallel, large data codes
To assess how future progress in gravitational microlensing computation at
high optical depth will rely on both hardware and software solutions, we
compare a direct inverse ray-shooting code implemented on a graphics processing
unit (GPU) with both a widely-used hierarchical tree code on a single-core CPU,
and a recent implementation of a parallel tree code suitable for a CPU-based
cluster supercomputer. We examine the accuracy of the tree codes through
comparison with a direct code over a much wider range of parameter space than
has been feasible before. We demonstrate that all three codes present
comparable accuracy, and choice of approach depends on considerations relating
to the scale and nature of the microlensing problem under investigation. On
current hardware, there is little difference in the processing speed of the
single-core CPU tree code and the GPU direct code; however, the recent plateau
in single-core CPU speeds means the existing tree code is no longer able to
take advantage of Moore's law-like increases in processing speed. Instead, we
anticipate a rapid increase in GPU capabilities in the next few years, which is
advantageous to the direct code. We suggest that progress in other areas of
astrophysical computation may benefit from a transition to GPUs through the use
of "brute force" algorithms, rather than attempting to port the current best
solution directly to a GPU language -- for certain classes of problems, the
simple implementation on GPUs may already be no worse than an optimised
single-core CPU version.
Comment: 11 pages, 4 figures, accepted for publication in New Astronomy
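To make the "brute force" idea concrete, here is a minimal sketch of direct inverse ray-shooting in plain Python. It is not the code compared in the paper. It assumes point-mass lenses in Einstein-radius units, no external convergence or shear, and a source-plane map covering the same square as the shooting region; `deflect`, `shoot_rays` and their parameters are hypothetical names. It also assumes no ray lands exactly on a lens position.

```python
import math


def deflect(x, y, lenses):
    """Map an image-plane point to the source plane by direct summation.

    lenses is a list of (lens_x, lens_y, mass) tuples.
    """
    sx, sy = x, y
    for lx, ly, mass in lenses:
        dx, dy = x - lx, y - ly
        r2 = dx * dx + dy * dy
        sx -= mass * dx / r2
        sy -= mass * dy / r2
    return sx, sy


def shoot_rays(lenses, half_width, n_rays, n_pix):
    """Count rays per pixel of an n_pix x n_pix source-plane map.

    Rays start on a regular n_rays x n_rays grid of cell centres covering
    [-half_width, half_width]^2; rays leaving that square are dropped.
    """
    counts = [[0] * n_pix for _ in range(n_pix)]
    step = 2.0 * half_width / n_rays
    scale = n_pix / (2.0 * half_width)
    for i in range(n_rays):
        for j in range(n_rays):
            x = -half_width + (i + 0.5) * step
            y = -half_width + (j + 0.5) * step
            sx, sy = deflect(x, y, lenses)
            col = math.floor((sx + half_width) * scale)
            row = math.floor((sy + half_width) * scale)
            if 0 <= col < n_pix and 0 <= row < n_pix:
                counts[row][col] += 1
    return counts
```

Every ray is independent and sums over every lens, so the work per ray is uniform. That is what makes the direct method a natural fit for a GPU, in contrast to a tree code whose traversal depends on the data.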
QuickCSG: Fast Arbitrary Boolean Combinations of N Solids
QuickCSG computes the result for general N-polyhedron boolean expressions
without an intermediate tree of solids. We propose a vertex-centric view of the
problem, which simplifies the identification of final geometric contributions,
and facilitates its spatial decomposition. The problem is then cast in a single
KD-tree exploration, geared toward the result by early pruning of any region of
space not contributing to the final surface. We assume strong regularity
properties on the input meshes and that they are in general position. This
simplifying assumption, in combination with our vertex-centric approach,
improves the speed of the approach. Complemented with a task-stealing
parallelization, the algorithm achieves breakthrough performance, one to two
orders of magnitude speedups with respect to state-of-the-art CPU algorithms,
on boolean operations over two to dozens of polyhedra. The algorithm also
outperforms GPU implementations with approximate discretizations, while
producing an output without redundant facets. Despite the restrictive
assumptions on the input, we show the usefulness of QuickCSG for applications
with large CSG problems and strong temporal constraints, e.g. modeling for 3D
printers, reconstruction from visual hulls and collision detection.
Empirical Evaluation of the Parallel Distribution Sweeping Framework on Multicore Architectures
In this paper, we perform an empirical evaluation of the Parallel External
Memory (PEM) model in the context of geometric problems. In particular, we
implement the parallel distribution sweeping framework of Ajwani, Sitchinava
and Zeh to solve the batched 1-dimensional stabbing max problem. While modern
processors consist of sophisticated memory systems (multiple levels of caches,
set associativity, TLB, prefetching), we empirically show that algorithms
designed in simple models that focus on minimizing the I/O transfers between
shared memory and single level cache, can lead to efficient software on current
multicore architectures. Our implementation exhibits significantly fewer
accesses to slow DRAM and, therefore, outperforms traditional approaches based
on plane sweep and two-way divide and conquer.
Comment: Longer version of ESA'13 paper
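For readers unfamiliar with the problem: in the batched 1-dimensional stabbing max problem, each query point asks for the largest weight among the weighted intervals that contain it. The sketch below is a sequential plane sweep of the kind the abstract uses as a baseline, not the parallel distribution sweeping framework. The function name, the `(lo, hi, weight)` tuple layout and the use of closed intervals are assumptions for illustration.

```python
import heapq


def stabbing_max(intervals, queries):
    """For each query point, return the largest weight among the closed
    intervals (lo, hi, weight) containing it, or None if there is none."""
    order = sorted(range(len(queries)), key=lambda k: queries[k])
    pending = sorted(intervals)  # sweep intervals by left endpoint
    active = []  # max-heap via negated weight: entries are (-weight, hi)
    answers = [None] * len(queries)
    nxt = 0
    for k in order:
        q = queries[k]
        # Activate every interval that starts at or before q.
        while nxt < len(pending) and pending[nxt][0] <= q:
            lo, hi, weight = pending[nxt]
            heapq.heappush(active, (-weight, hi))
            nxt += 1
        # Lazily discard heaviest entries that ended before q; buried
        # expired entries are harmless because queries only move right.
        while active and active[0][1] < q:
            heapq.heappop(active)
        if active:
            answers[k] = -active[0][0]
    return answers
```

The sweep runs in O((n + m) log(n + m)) time for n intervals and m queries, but its heap accesses are scattered in memory, which is the kind of behaviour an I/O-efficient design aims to avoid.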
The localized Delaunay triangulation and ad-hoc routing in heterogeneous environments
Ad-hoc wireless routing has become an important area of research in the last few years due to the massive increase in wireless devices. Computational geometry is relevant in attempts to build stable, low-power routing schemes. It is only recently, however, that models have been expanded to consider devices with a non-uniform broadcast range, and few properties are known. In particular, we find, via both theoretical and experimental methods, extremal properties for the Localized Delaunay Triangulation over the Mutual Inclusion Graph. We also provide a distributed, sub-quadratic algorithm for the generation of the structure.