Potential benefits of a block-space GPU approach for discrete tetrahedral domains
The study of data-parallel domain re-organization and thread-mapping
techniques is a relevant topic, as these techniques can increase the efficiency of GPU
computations when working on spatial discrete domains with non-box-shaped
geometry. In this work we study the potential benefits of applying a succinct
data re-organization of a tetrahedral data-parallel domain of size
combined with an efficient block-space GPU map of the form
. Results from the analysis suggest that
in theory the combination of these two optimizations produces a significant
performance improvement, as block-based data re-organization allows a coalesced
one-to-one correspondence at local thread-space while the map produces an
efficient block-space spatial correspondence between groups of data and groups
of threads, reducing the number of unnecessary threads from to
where is the linear block-size and typically . From the analysis, we obtained that a block-based succinct data
re-organization can provide up to improved performance over a linear
data organization while the map can be up to more efficient than a
bounding box approach. The results from this work can serve as a useful guide
for a more efficient GPU computation on tetrahedral domains found in spin
lattice, finite element and special n-body problems, among others.
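
As a rough illustration of why a bounding-box launch wastes threads on a tetrahedral domain, the sketch below counts the cells of both shapes. It assumes a discrete tetrahedron of side n made of the cells (x, y, z) with x + y + z < n; that definition and all function names are hypothetical choices for this example, and the code does not reproduce the data re-organization or the block-space map studied in the paper.

```python
def tetrahedral_cells(n: int) -> int:
    """Closed form for the number of cells with x + y + z < n (tetrahedral number)."""
    return n * (n + 1) * (n + 2) // 6


def bounding_box_cells(n: int) -> int:
    """Number of cells (and threads) in the enclosing n x n x n box."""
    return n ** 3


def brute_force_cells(n: int) -> int:
    """Count the tetrahedral cells by enumeration, to check the closed form."""
    return sum(
        1
        for x in range(n)
        for y in range(n)
        for z in range(n)
        if x + y + z < n
    )


def box_to_tetrahedron_ratio(n: int) -> float:
    """How many bounding-box threads are launched per useful tetrahedral cell."""
    return bounding_box_cells(n) / tetrahedral_cells(n)


if __name__ == "__main__":
    for n in (8, 64, 1024):
        print(n, tetrahedral_cells(n), bounding_box_cells(n),
              round(box_to_tetrahedron_ratio(n), 3))
```

Under this assumed definition the ratio approaches 6 as n grows, so a bounding-box launch spends most of its threads on cells outside the domain.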
Block-space GPU Mapping for Embedded Sierpiński Gasket Fractals
This work studies the problem of GPU thread mapping for a Sierpiński gasket
fractal embedded in a discrete Euclidean space of . A block-space
map
is proposed, from Euclidean parallel space to embedded fractal
space , that maps in time and uses
no more than threads with being the Hausdorff dimension, making it parallel space efficient.
When compared to a bounding-box map, offers a sub-exponential
improvement in parallel space and a monotonically increasing speedup once . Experimental performance tests show that in practice
can produce performance improvement at any block-size once ,
reaching approximately of speedup for under optimal block
configurations.
Comment: 7 pages, 8 figures
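
To make the parallel-space argument concrete, the sketch below compares the number of cells in an embedded gasket with the size of its bounding box. It assumes one common discrete orientation in which cell (x, y) of an n x n grid, with n a power of two, belongs to the gasket when x AND y is zero; that membership rule and the function names are assumptions for this example, not taken from the paper.

```python
import math

# Hausdorff dimension of the Sierpinski gasket, log2(3), about 1.585.
HAUSDORFF_DIM = math.log2(3)


def in_gasket(x: int, y: int) -> bool:
    """Assumed membership rule: no bit position is set in both coordinates."""
    return (x & y) == 0


def gasket_cells(n: int) -> int:
    """Count gasket cells inside the n x n bounding box by enumeration."""
    return sum(1 for y in range(n) for x in range(n) if in_gasket(x, y))


def bounding_box_waste(n: int) -> int:
    """Threads of an n x n bounding-box launch that land outside the gasket."""
    return n * n - gasket_cells(n)


if __name__ == "__main__":
    for k in range(1, 8):
        n = 2 ** k
        print(n, gasket_cells(n), n * n, bounding_box_waste(n))
```

For n = 2^k the count is 3^k, which equals n raised to the Hausdorff dimension, while the bounding box grows as n^2; the fraction of useful threads in a bounding-box launch therefore shrinks as n grows.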
Efficient GPU Thread Mapping on Embedded 2D Fractals
This work proposes a new approach for mapping GPU threads onto a family of
discrete embedded 2D fractals. A block-space map is proposed,
from Euclidean parallel space to embedded fractal space
, that maps in time and uses no
more than threads with being the
Hausdorff dimension of the fractal, making it parallel space efficient. When
compared to a bounding-box (BB) approach, offers a
sub-exponential improvement in parallel space and a monotonically increasing
speedup . The Sierpinski gasket fractal is used as a particular case
study and the experimental performance results show that
reaches up to of speedup over the bounding-box approach. A
tensor-core based implementation of is also proposed for
modern GPUs, providing up to of extra performance. The results
obtained in this work show that doing efficient GPU thread mapping on fractal
domains can significantly improve the performance of several applications that
work with this type of geometry.
Comment: 20 pages. arXiv admin note: text overlap with arXiv:1706.0455
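
The idea of launching only as many threads as there are fractal cells can be sketched with a simple digit-based enumeration. This is not the block-space map proposed in the paper and makes no claim about matching its cost: it works per element rather than per block, loops once per scale level, and assumes the gasket orientation in which cell (x, y) is valid when x AND y is zero. The function name `compact_to_gasket` is hypothetical.

```python
from typing import Tuple


def compact_to_gasket(i: int, k: int) -> Tuple[int, int]:
    """Map a compact index i in [0, 3**k) to a gasket cell of a 2**k x 2**k grid.

    Each base-3 digit of i selects one of the three sub-triangles at a given
    scale: 0 keeps the origin corner, 1 shifts along x, 2 shifts along y.
    Since a level never sets the same bit in both x and y, x & y stays zero.
    """
    x = y = 0
    for level in range(k):
        i, digit = divmod(i, 3)
        if digit == 1:
            x |= 1 << level
        elif digit == 2:
            y |= 1 << level
    return x, y


if __name__ == "__main__":
    k = 3
    cells = [compact_to_gasket(i, k) for i in range(3 ** k)]
    print(len(cells), len(set(cells)))
```

Every index maps to a distinct valid cell, so a launch of 3^k threads covers the gasket with no discarded threads, in contrast to the 4^k threads of a bounding-box launch.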