38,673 research outputs found
Parallel Write-Efficient Algorithms and Data Structures for Computational Geometry
In this paper, we design parallel write-efficient geometric algorithms that
perform asymptotically fewer writes than standard algorithms for the same
problem. This is motivated by emerging non-volatile memory technologies with
read performance being close to that of random access memory but writes being
significantly more expensive in terms of energy and latency. We design
algorithms for planar Delaunay triangulation, -d trees, and static and
dynamic augmented trees. Our algorithms are designed in the recently introduced
Asymmetric Nested-Parallel Model, which captures the parallel setting in which
there is a small symmetric memory where reads and writes are unit cost as well
as a large asymmetric memory where writes are times more expensive
than reads. In designing these algorithms, we introduce several techniques for
obtaining write-efficiency, including DAG tracing, prefix doubling,
reconstruction-based rebalancing and -labeling, which we believe will
be useful for designing other parallel write-efficient algorithms
OpenACC Based GPU Parallelization of Plane Sweep Algorithm for Geometric Intersection
Line segment intersection is one of the elementary operations in computational geometry. Complex problems in Geographic Information Systems (GIS) like finding map overlays or spatial joins using polygonal data require solving segment intersections. Plane sweep paradigm is used for finding geometric intersection in an efficient manner. However, it is difficult to parallelize due to its in-order processing of spatial events. We present a new fine-grained parallel algorithm for geometric intersection and its CPU and GPU implementation using OpenMP and OpenACC. To the best of our knowledge, this is the first work demonstrating an effective parallelization of plane sweep on GPUs.
We chose compiler directive based approach for implementation because of its simplicity to parallelize sequential code. Using Nvidia Tesla P100 GPU, our implementation achieves around 40X speedup for line segment intersection problem on 40K and 80K data sets compared to sequential CGAL library
Efficient Representation of Computational Meshes
We present a simple yet general and efficient approach to representation of
computational meshes. Meshes are represented as sets of mesh entities of
different topological dimensions and their incidence relations. We discuss a
straightforward and efficient storage scheme for such mesh representations and
efficient algorithms for computation of arbitrary incidence relations from a
given initial and minimal set of incidence relations. The general
representation may harbor a wide range of computational meshes, and may also be
specialized to provide simple user interfaces for particular meshes, including
simplicial meshes in one, two and three space dimensions where the mesh
entities correspond to vertices, edges, faces and cells. It is elaborated on
how the proposed concepts and data structures may be used for assembly of
variational forms in parallel over distributed finite element meshes.
Benchmarks are presented to demonstrate efficiency in terms of CPU time and
memory usage
GeantV: Results from the prototype of concurrent vector particle transport simulation in HEP
Full detector simulation was among the largest CPU consumer in all CERN
experiment software stacks for the first two runs of the Large Hadron Collider
(LHC). In the early 2010's, the projections were that simulation demands would
scale linearly with luminosity increase, compensated only partially by an
increase of computing resources. The extension of fast simulation approaches to
more use cases, covering a larger fraction of the simulation budget, is only
part of the solution due to intrinsic precision limitations. The remainder
corresponds to speeding-up the simulation software by several factors, which is
out of reach using simple optimizations on the current code base. In this
context, the GeantV R&D project was launched, aiming to redesign the legacy
particle transport codes in order to make them benefit from fine-grained
parallelism features such as vectorization, but also from increased code and
data locality. This paper presents extensively the results and achievements of
this R&D, as well as the conclusions and lessons learnt from the beta
prototype.Comment: 34 pages, 26 figures, 24 table
- …