3 research outputs found

    Potential benefits of a block-space GPU approach for discrete tetrahedral domains

    The study of data-parallel domain re-organization and thread-mapping techniques is a relevant topic, as these techniques can increase the efficiency of GPU computations when working on spatial discrete domains with non-box-shaped geometry. In this work we study the potential benefits of applying a succinct data re-organization of a tetrahedral data-parallel domain of size $\mathcal{O}(n^3)$ combined with an efficient block-space GPU map of the form $g:\mathbb{N} \rightarrow \mathbb{N}^3$. Results from the analysis suggest that in theory the combination of these two optimizations produces a significant performance improvement, as block-based data re-organization allows a coalesced one-to-one correspondence at local thread-space, while $g(\lambda)$ produces an efficient block-space spatial correspondence between groups of data and groups of threads, reducing the number of unnecessary threads from $\mathcal{O}(n^3)$ to $\mathcal{O}(n^2\rho^3)$, where $\rho$ is the linear block-size and typically $\rho^3 \ll n$. From the analysis, we obtained that a block-based succinct data re-organization can provide up to $2\times$ improved performance over a linear data organization, while the map can be up to $6\times$ more efficient than a bounding-box approach. The results from this work can serve as a useful guide for more efficient GPU computation on tetrahedral domains found in spin lattice, finite element and special n-body problems, among others.
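    To make the idea concrete, below is a minimal CUDA sketch of a block-space map $g:\mathbb{N} \rightarrow \mathbb{N}^3$ for a discrete tetrahedral domain, here taken as $\{(x,y,z) : 0 \le x \le y \le z < n\}$. This is not the paper's exact formulation; the function name `g_map`, the chosen domain orientation, and the root-finding details are assumptions for illustration. The map inverts tetrahedral and triangular numbers from a linear block index, so only the blocks covering the tetrahedron need to be launched.

```cuda
// Hypothetical sketch of a block-space map g : N -> N^3 for the
// discrete tetrahedral domain {(x,y,z) : 0 <= x <= y <= z < n}.
__device__ int3 g_map(unsigned int b) {
    // Invert the tetrahedral number Te(z) = z(z+1)(z+2)/6 to find the
    // slice z containing linear block index b: start from a floating-
    // point cube-root estimate, then correct with exact integer checks.
    unsigned int z = (unsigned int)cbrtf(6.0f * (float)b);
    while ((unsigned long long)z * (z + 1) * (z + 2) / 6 > b) --z;
    while ((unsigned long long)(z + 1) * (z + 2) * (z + 3) / 6 <= b) ++z;
    unsigned int r =
        b - (unsigned int)((unsigned long long)z * (z + 1) * (z + 2) / 6);

    // Invert the triangular number Tr(y) = y(y+1)/2 inside the slice.
    unsigned int y =
        (unsigned int)((sqrtf(8.0f * (float)r + 1.0f) - 1.0f) * 0.5f);
    while (y * (y + 1) / 2 > r) --y;
    while ((y + 1) * (y + 2) / 2 <= r) ++y;
    unsigned int x = r - y * (y + 1) / 2;
    return make_int3((int)x, (int)y, (int)z);
}

__global__ void tetra_kernel(float *data, int n) {
    int3 blk = g_map(blockIdx.x);              // block-space coordinates
    int x = blk.x * blockDim.x + threadIdx.x;  // refine to thread-space
    int y = blk.y * blockDim.y + threadIdx.y;
    int z = blk.z * blockDim.z + threadIdx.z;
    if (x <= y && y <= z && z < n) {
        // ... work on element (x, y, z) of the tetrahedral domain ...
    }
}
```

    A launch would use a one-dimensional grid of `Te(n/rho)` blocks of $\rho^3$ threads each, e.g. `tetra_kernel<<<numBlocks, dim3(rho, rho, rho)>>>(data, n)`; the per-thread guard discards the leftover threads of the diagonal blocks, which is where the $\mathcal{O}(n^2\rho^3)$ bound in the abstract comes from.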

    Block-space GPU Mapping for Embedded Sierpiński Gasket Fractals

    This work studies the problem of GPU thread mapping for a Sierpiński gasket fractal embedded in a discrete Euclidean space of size $n \times n$. A block-space map $\lambda: \mathbb{Z}_{\mathbb{E}}^{2} \mapsto \mathbb{Z}_{\mathbb{F}}^{2}$ is proposed, from Euclidean parallel space $\mathbb{E}$ to embedded fractal space $\mathbb{F}$, that maps in $\mathcal{O}(\log_2 \log_2(n))$ time and uses no more than $\mathcal{O}(n^{\mathbb{H}})$ threads, with $\mathbb{H} \approx 1.58$ being the Hausdorff dimension, making it parallel space efficient. When compared to a bounding-box map, $\lambda(\omega)$ offers a sub-exponential improvement in parallel space and a monotonically increasing speedup once $n > n_0$. Experimental performance tests show that in practice $\lambda(\omega)$ can produce a performance improvement at any block-size once $n > n_0 = 2^8$, reaching approximately $10\times$ of speedup for $n = 2^{16}$ under optimal block configurations.
    Comment: 7 pages, 8 figures
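    As a rough illustration of such a map, the sketch below expands the base-3 digits of a linear block index into gasket-cell coordinates under the Pascal-triangle-mod-2 embedding $\{(x,y) : x \,\&\, y = 0\}$. This is a simplified $\mathcal{O}(\log_2 n)$ variant, not the paper's actual $\lambda$, whose $\mathcal{O}(\log_2 \log_2(n))$ construction is more elaborate; the function names and launch scheme are assumptions.

```cuda
// Hypothetical sketch of a block-space map lambda : N -> N^2 for a
// Sierpinski gasket embedded as {(x,y) : x & y == 0} in an n x n grid.
// Each ternary digit of the index picks one of the 3 sub-triangles.
__device__ uint2 lambda_map(unsigned int w, int k) {
    unsigned int x = 0, y = 0;
    for (int i = 0; i < k; ++i) {       // one base-3 digit per level
        unsigned int d = w % 3;
        w /= 3;
        if (d == 1) x |= 1u << i;       // right sub-triangle: bit in x
        else if (d == 2) y |= 1u << i;  // upper sub-triangle: bit in y
    }
    return make_uint2(x, y);            // x & y == 0 by construction
}

__global__ void gasket_kernel(float *cells, int n, int k) {
    uint2 blk = lambda_map(blockIdx.x, k);         // block coordinates
    unsigned int x = blk.x * blockDim.x + threadIdx.x;
    unsigned int y = blk.y * blockDim.y + threadIdx.y;
    if ((x & y) == 0u && x < (unsigned)n && y < (unsigned)n) {
        // ... process gasket cell (x, y) ...
    }
}
```

    With $\rho \times \rho$ blocks ($\rho$ a power of two) and $n = \rho \cdot 2^k$, the grid launches only the $3^k$ blocks that intersect the gasket; the per-thread `(x & y) == 0` test then filters the finer levels inside each block.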

    Efficient GPU Thread Mapping on Embedded 2D Fractals

    This work proposes a new approach for mapping GPU threads onto a family of discrete embedded 2D fractals. A block-space map $\lambda: \mathbb{Z}_{\mathbb{E}}^{2} \mapsto \mathbb{Z}_{\mathbb{F}}^{2}$ is proposed, from Euclidean parallel space $\mathbb{E}$ to embedded fractal space $\mathbb{F}$, that maps in $\mathcal{O}(\log_2 \log_2(n))$ time and uses no more than $\mathcal{O}(n^{\mathbb{H}})$ threads, with $\mathbb{H}$ being the Hausdorff dimension of the fractal, making it parallel space efficient. When compared to a bounding-box (BB) approach, $\lambda(\omega)$ offers a sub-exponential improvement in parallel space and a monotonically increasing speedup once $n \ge n_0$. The Sierpiński gasket fractal is used as a particular case study, and the experimental performance results show that $\lambda(\omega)$ reaches up to $9\times$ of speedup over the bounding-box approach. A tensor-core based implementation of $\lambda(\omega)$ is also proposed for modern GPUs, providing up to $\sim 40\%$ of extra performance. The results obtained in this work show that efficient GPU thread mapping on fractal domains can significantly improve the performance of several applications that work with this type of geometry.
    Comment: 20 pages. arXiv admin note: text overlap with arXiv:1706.0455
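    To give a sense of the parallel-space saving claimed over the bounding-box approach, here is a small host-side sketch (assumed parameters, not code from the paper) comparing the $(n/\rho)^2$ blocks a BB launch requires against the $3^k$ blocks of a block-space launch, with $k = \log_2(n/\rho)$:

```cuda
// Host-side sketch contrasting parallel space for the Sierpinski case:
// a bounding-box launch needs (n/rho)^2 thread blocks, while the
// block-space map needs only 3^k, i.e. (n/rho)^{log2 3} blocks.
#include <cstdio>

int main() {
    const long long rho = 16;                 // linear block size (16x16)
    for (int e = 8; e <= 16; e += 4) {        // grid sides n = 2^8..2^16
        long long n = 1LL << e;
        long long side = n / rho;             // blocks per side (power of 2)
        int k = 0;
        while ((1LL << k) < side) ++k;        // gasket recursion depth
        long long bb = side * side;           // bounding-box block count
        long long bs = 1;
        for (int i = 0; i < k; ++i) bs *= 3;  // block-space count: 3^k
        std::printf("n=2^%-2d  BB=%10lld  lambda=%8lld  ratio=%5.1fx\n",
                    e, bb, bs, (double)bb / (double)bs);
    }
    return 0;
}
```

    The block-count ratio grows as $(4/3)^k$, which is what makes the speedup over BB increase monotonically with $n$ once the map's constant cost is amortized.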