
Computing the Component-Labeling and the Adjacency Tree of a Binary Digital Image in Near Logarithmic-Time

Author: Fernando Díaz del Río
Helena Molina Abril
Pedro Real Jurado
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

Connected component labeling (CCL) of binary images is one of the fundamental operations in real-time applications. The adjacency tree (AdjT) of the connected components offers a region-based representation in which each node represents a region surrounded by another region of the opposite color. In this paper, a fully parallel algorithm for computing the CCL and AdjT of a binary digital image is described and implemented without using any geometric information. Assuming one processing element per pixel, the time complexity for an image of m × n pixels is nearly O(log(m + n)). Results on a multicore processor show very good scalability up to the point where the so-called memory bandwidth bottleneck is reached. The inherent parallelism of the approach suggests that even better results can be expected on other, less conventional computing architectures.

Funding: Ministerio de Economía y Competitividad MTM2016-81030-P; Ministerio de Economía y Competitividad TEC2012-37868-C04-0
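
The adjacency tree can be made concrete with a small sequential sketch. It is not the authors' fully parallel, near-logarithmic algorithm: it uses plain breadth-first flood fill, and it assumes 8-connectivity for the foreground, 4-connectivity for the background, and a one-pixel background border added around the image so that the outer background becomes the root. The function names `label_regions` and `adjacency_tree` are hypothetical.

```python
from collections import deque

# Hypothetical illustration: sequential BFS labeling, not the paper's parallel method.
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def label_regions(img):
    """Label foreground (1) regions with 8-connectivity and background (0)
    regions with 4-connectivity, after padding the image with a one-pixel
    background border. Returns (padded, labels, number_of_regions)."""
    w = len(img[0]) if img else 0
    pad = [[0] * (w + 2)] + [[0, *row, 0] for row in img] + [[0] * (w + 2)]
    H, W = len(pad), w + 2
    labels = [[-1] * W for _ in range(H)]
    count = 0
    for y in range(H):
        for x in range(W):
            if labels[y][x] != -1:
                continue
            color = pad[y][x]
            nbrs = N8 if color == 1 else N4
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill of one region
                cy, cx = queue.popleft()
                for dy, dx in nbrs:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < H and 0 <= nx < W and labels[ny][nx] == -1
                            and pad[ny][nx] == color):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
            count += 1
    return pad, labels, count


def adjacency_tree(img):
    """Return (parent, labels): parent maps each region to the region of the
    opposite color that surrounds it; the outer background is the root (None)."""
    pad, labels, count = label_regions(img)
    H, W = len(pad), len(pad[0])
    adj = [set() for _ in range(count)]
    for y in range(H):
        for x in range(W):
            for ny, nx in ((y, x + 1), (y + 1, x)):  # each 4-adjacent pair once
                if ny < H and nx < W and pad[ny][nx] != pad[y][x]:
                    a, b = labels[y][x], labels[ny][nx]
                    adj[a].add(b)
                    adj[b].add(a)
    root = labels[0][0]  # the padding border belongs to the outer background
    parent = {root: None}
    queue = deque([root])
    while queue:  # BFS from the root orients each edge child -> parent
        r = queue.popleft()
        for s in adj[r]:
            if s not in parent:
                parent[s] = r
                queue.append(s)
    return parent, labels
```

For a ring enclosing a hole that contains a single dot, the sketch yields the chain outer background, ring, hole, dot, each the parent of the next. With this complementary 8/4 pairing the region adjacency graph of a binary image is a tree, so the breadth-first search only orients edges that already form one.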

idUS. Depósito de Investigación Universidad de Sevilla

Image Processing for Multiple-Target Tracking on a Graphics Processing Unit

Author: Tanner Michael A.
Publication venue: AFIT Scholar
Publication date: 09/03/2009

Multiple-target tracking (MTT) systems have been implemented on many different platforms; however, these solutions are often expensive and have long development times. Such MTT implementations require custom hardware, yet offer very little flexibility as data sets and target-tracking requirements keep changing. This research explores whether and how an existing graphics processing unit (GPU) in a common desktop computer can supplement and enhance MTT performance without requiring costly dedicated MTT hardware and software. Typical computers are already equipped with powerful GPUs to support games and multimedia applications, but such GPUs are not currently used in desktop MTT applications. An MTT system was developed in MATLAB to provide baseline performance metrics for processing 24-bit, 1920×1080 color video footage filmed at 30 frames per second. For a fair comparison and analysis, the baseline MATLAB implementation was further sped up with custom C functions. From this MATLAB implementation, the research identifies areas that could benefit from the GPU. The bottleneck image-processing functions (frame differencing) were converted to execute on the GPU. On average, the GPU code executed 287% faster than the MATLAB implementation, and some individual functions executed 20 times faster than the baseline. These results indicate that the GPU is a viable, low-cost hardware means of significantly increasing MTT performance.
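
Frame differencing suits a GPU because every output pixel depends only on the same pixel in two frames. A minimal sketch follows, assuming grayscale conversion with ITU-R BT.601 luma weights and a fixed threshold; both are assumptions, since the abstract does not describe the exact pipeline, and the function names are hypothetical.

```python
def to_gray(frame):
    """Convert rows of 8-bit (R, G, B) tuples to integer luma (BT.601 weights)."""
    return [[(299 * r + 587 * g + 114 * b) // 1000 for r, g, b in row]
            for row in frame]


def frame_difference(prev, curr, threshold):
    """Binary motion mask: 1 where |curr - prev| > threshold, else 0.
    Each output pixel is independent, i.e. one GPU thread per pixel."""
    if len(prev) != len(curr) or any(len(a) != len(b) for a, b in zip(prev, curr)):
        raise ValueError("frames must have the same shape")
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(row_p, row_c)]
            for row_p, row_c in zip(prev, curr)]
```

At 1920×1080 and 30 frames per second, this per-pixel work covers 62,208,000 pixels per second, which suggests why a one-thread-per-pixel mapping can pay off.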

AFIT Scholar (Air Force Institute of Technology)

GPU-based Swendsen-Wang multi-cluster algorithm for the simulation of two-dimensional classical spin systems

Author: Yukihiro Komura
Yutaka Okabe
Publication venue: Elsevier BV
Publication date: 03/02/2012

We present a GPU implementation, using CUDA (Compute Unified Device Architecture), of the Swendsen-Wang multi-cluster algorithm for two-dimensional classical spin systems. We adapt two recently proposed CUDA connected component labeling algorithms to assign clusters in the Swendsen-Wang algorithm. Starting from the q-state Potts model, we extend the implementation to a system of vector spins, the q-state clock model, using the idea of embedded clusters. In performance tests on a GTX580 at the critical temperature with linear size L=4096, the calculation time is 2.51 ns per spin flip for the q=2 Potts model (Ising model) and 2.42 ns per spin flip for the q=6 clock model. On the GTX580, the computation is 12.4 times as fast as on a current CPU core for the q=2 Potts model and 35.6 times as fast for the q=6 clock model.

Comment: accepted for publication in Comp. Phys. Commun.
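
The role that connected component labeling plays in the Swendsen-Wang algorithm can be seen in a compact sequential sketch for the q-state Potts model. It assumes an L × L periodic square lattice, the Hamiltonian H = -J Σ δ(s_i, s_j), and the standard bond probability p = 1 - exp(-βJ). A CPU union-find stands in for the two CUDA labeling algorithms the authors adapt, and all function names are hypothetical.

```python
import math
import random


def find(parent, i):
    """Return the root of i, halving the path as it goes."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def union(parent, i, j):
    ri, rj = find(parent, i), find(parent, j)
    if ri != rj:
        parent[max(ri, rj)] = min(ri, rj)  # smaller index becomes the root


def swendsen_wang_sweep(spins, L, q, beta_J, rng):
    """One Swendsen-Wang update of a q-state Potts model, H = -J * sum delta(s_i, s_j),
    on an L x L periodic square lattice. `spins` is a flat list, updated in place.
    Returns the number of clusters."""
    p_bond = 1.0 - math.exp(-beta_J)
    parent = list(range(L * L))
    # 1) Bond formation: link equal nearest-neighbor spins with probability p_bond.
    for y in range(L):
        for x in range(L):
            i = y * L + x
            for j in (y * L + (x + 1) % L, ((y + 1) % L) * L + x):
                if spins[i] == spins[j] and rng.random() < p_bond:
                    union(parent, i, j)
    # 2) Cluster labeling (the step GPU CCL algorithms accelerate) and
    # 3) a new random state for every cluster.
    new_state = {}
    for i in range(L * L):
        root = find(parent, i)
        if root not in new_state:
            new_state[root] = rng.randrange(q)
        spins[i] = new_state[root]
    return len(new_state)
```

On the square lattice the Potts critical coupling is βJ_c = ln(1 + √q), about 0.881 for q = 2, which is the natural value to pass as `beta_J` for runs at the critical temperature. The clock-model extension via embedded clusters is not covered by this sketch.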

arXiv.org e-Print Archive

Crossref

Multi-GPU-based Swendsen-Wang multi-cluster algorithm for the simulation of two-dimensional q-state Potts model

Author: Yukihiro Komura
Yutaka Okabe
Publication venue: Elsevier BV
Publication date: 09/08/2012

We present multi-GPU computing with CUDA (Compute Unified Device Architecture) for the Swendsen-Wang multi-cluster algorithm of the two-dimensional (2D) q-state Potts model. Extending our algorithm for single-GPU computing [Comp. Phys. Comm. 183 (2012) 1155], we realize the Swendsen-Wang multi-cluster algorithm on multiple GPUs. We implement our code on the large-scale open science supercomputer TSUBAME 2.0 and test the performance and scalability of the simulation of the 2D Potts model. Using 256 Tesla M2050 GPUs, the performance is 37.3 spin flips per nanosecond for the q=2 Potts model (Ising model) at the critical temperature with linear system size L=65536.

Comment: accepted for publication in Comp. Phys. Commun. arXiv admin note: substantial text overlap with arXiv:1202.063
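
The headline figure can be restated per sweep with a little arithmetic. This assumes that one sweep updates all L² spins and that the lattice is split evenly over the 256 GPUs; the abstract does not state the decomposition.

```latex
\begin{aligned}
N &= L^2 = 65536^2 = 2^{32} = 4\,294\,967\,296 \ \text{spins},\\
t_{\text{sweep}} &= \frac{N}{37.3\ \text{spin flips/ns}} \approx 1.15 \times 10^{8}\ \text{ns} \approx 0.115\ \text{s},\\
N_{\text{GPU}} &= \frac{N}{256} = 2^{24} = 4096^2 \ \text{spins per GPU}.
\end{aligned}
```

Under that even-split assumption, each GPU holds a 4096 × 4096 subdomain, the same linear size as the single-GPU benchmark (L=4096) in the preceding record.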

arXiv.org e-Print Archive

Crossref

How does Connected Components Labeling with Decision Trees perform on GPUs?

Author: Costantino Grana
Federico Bolelli
Federico Pollastri
Laura Canalini
Michele Cancilla
Stefano Allegretti
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

In this paper, the problem of Connected Components Labeling (CCL) in binary images using Graphics Processing Units (GPUs) is tackled from a different perspective. In the last decade, many novel algorithms specifically designed for GPUs have been released. Because the CCL literature on sequential algorithms is very rich and includes many efficient solutions, designers of parallel algorithms have often been inspired by techniques that had already proved successful in a sequential setting, such as the Union-Find paradigm for resolving equivalences between provisional labels. However, the use of decision trees to minimize memory accesses, one of the main features of the best-performing sequential algorithms, was never taken into account when designing parallel CCL solutions. The reason is that branches in the code tend to cause thread divergence, which usually leads to inefficiency. This consideration, however, does not necessarily apply to every possible scenario. Are we sure that the advantages of decision trees do not compensate for the cost of thread divergence? To answer this question, we chose three well-known sequential CCL algorithms that employ decision trees as the cornerstone of their strategy, and we built a data-parallel version of each of them. Experimental tests on real-case datasets show that, in most cases, these solutions outperform state-of-the-art algorithms, demonstrating the effectiveness of decision trees in a parallel environment as well.
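
The kind of decision tree discussed here can be illustrated on the classic scan mask of sequential two-pass labeling (up-left p, up q, up-right r, left s), essentially in the form used by Wu, Otoo and Suzuki's SAUF algorithm. The sketch is sequential Python, not one of the authors' data-parallel versions, and the function names are hypothetical.

```python
def find(parent, i):
    """Root of provisional label i, with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def merge(parent, i, j):
    """Record that labels i and j are equivalent; return the surviving root."""
    ri, rj = find(parent, i), find(parent, j)
    root = min(ri, rj)
    parent[ri] = parent[rj] = root
    return root


def ccl_decision_tree(img):
    """Two-pass 8-connected labeling. Background is 0, components are 1..n.
    Returns (labels, n)."""
    h = len(img)
    w = len(img[0]) if h else 0
    lab = [[0] * w for _ in range(h)]
    parent = [0]  # entry 0 stands for background

    def at(y, x):
        """Provisional label of a mask pixel, 0 if outside or background."""
        return lab[y][x] if 0 <= y < h and 0 <= x < w else 0

    def new_label():
        parent.append(len(parent))
        return len(parent) - 1

    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            # Decision tree over the mask: q first, then r, then p, then s.
            if q := at(y - 1, x):
                lab[y][x] = q  # p, r, s are already equivalent to q: not read
            elif r := at(y - 1, x + 1):
                if p := at(y - 1, x - 1):
                    lab[y][x] = merge(parent, r, p)  # s, if set, already joins p
                elif s := at(y, x - 1):
                    lab[y][x] = merge(parent, r, s)
                else:
                    lab[y][x] = r
            elif p := at(y - 1, x - 1):
                lab[y][x] = p
            elif s := at(y, x - 1):
                lab[y][x] = s
            else:
                lab[y][x] = new_label()
    # Second pass: replace each provisional label by a consecutive final label.
    final = {}
    for y in range(h):
        for x in range(w):
            if lab[y][x]:
                root = find(parent, lab[y][x])
                lab[y][x] = final.setdefault(root, len(final) + 1)
    return lab, len(final)
```

When q is foreground, the tree never reads p, r or s, which is where the saving in memory accesses comes from. On a GPU, neighboring threads that see different pixel patterns take different branches of the same tree, which is the thread divergence the abstract weighs against that saving.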

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

A Block-Based Union-Find Algorithm to Label Connected Components on GPUs

Author: A Eklund
A Rosenfeld
C Grana
D Maltoni
DP Playne
F Bolelli
F Dong
KA Hawick
L He
Laura Canalini
M Andrecut
O Kalentev
S Zavalishin
Stefano Allegretti
T Lelore
Y Komura
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

In this paper, we introduce a novel GPU-based Connected Components Labeling algorithm: the Block-based Union Find. The proposed strategy significantly improves an existing GPU algorithm by taking advantage of a block-based approach. Experimental results on real-case and synthetically generated datasets demonstrate the superiority of the new proposal over the state of the art.
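
The benefit of a block-based approach can be sketched sequentially: with 2 × 2 blocks and 8-connectivity, all foreground pixels inside a block belong to the same component, so union-find needs one node per block rather than one per pixel. The block size, the connectivity and the function names are assumptions of this sketch, which does not reproduce the authors' GPU kernels; on a GPU each loop would presumably become a kernel, with atomic operations guarding concurrent unions.

```python
def find(parent, i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]  # path halving
        i = parent[i]
    return i


def union(parent, i, j):
    ri, rj = find(parent, i), find(parent, j)
    if ri != rj:
        parent[max(ri, rj)] = min(ri, rj)


def block_union_find(img):
    """8-connected labeling with one union-find node per 2x2 block instead of
    one per pixel. Returns (labels, n); background pixels get 0."""
    h = len(img)
    w = len(img[0]) if h else 0
    bw, bh = (w + 1) // 2, (h + 1) // 2  # blocks per row / per column

    def block(y, x):
        return (y // 2) * bw + x // 2

    parent = list(range(bw * bh))
    # Any two foreground pixels in the same 2x2 block are 8-adjacent, so they
    # share a node for free; only pixel pairs crossing a block border merge.
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            for ny, nx in ((y, x + 1), (y + 1, x - 1), (y + 1, x), (y + 1, x + 1)):
                if (0 <= ny < h and 0 <= nx < w and img[ny][nx]
                        and block(y, x) != block(ny, nx)):
                    union(parent, block(y, x), block(ny, nx))
    # Final labels: each pixel takes its block's root, renumbered 1..n.
    labels = [[0] * w for _ in range(h)]
    final = {}
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                root = find(parent, block(y, x))
                labels[y][x] = final.setdefault(root, len(final) + 1)
    return labels, len(final)
```

Compared with per-pixel union-find, this reduces the number of nodes by roughly a factor of four, which shrinks the equivalence trees that the final `find` pass has to traverse.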

Crossref

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia