974 research outputs found

    Computing the Component-Labeling and the Adjacency Tree of a Binary Digital Image in Near Logarithmic-Time

    Get PDF
    Connected component labeling (CCL) of binary images is one of the fundamental operations in real time applications. The adjacency tree (AdjT) of the connected components offers a region-based representation where each node represents a region which is surrounded by another region of the opposite color. In this paper, a fully parallel algorithm for computing the CCL and AdjT of a binary digital image is described and implemented, without the need of using any geometric information. The time complexity order for an image of m × n pixels under the assumption that a processing element exists for each pixel is near O(log(m+ n)). Results for a multicore processor show a very good scalability until the so-called memory bandwidth bottleneck is reached. The inherent parallelism of our approach points to the direction that even better results will be obtained in other less classical computing architectures.Ministerio de Economía y Competitividad MTM2016-81030-PMinisterio de Economía y Competitividad TEC2012-37868-C04-0

    Image Processing for Multiple-Target Tracking on a Graphics Processing Unit

    Get PDF
    Multiple-target tracking (MTT) systems have been implemented on many different platforms, however these solutions are often expensive and have long development times. Such MTT implementations require custom hardware, yet offer very little flexibility with ever changing data sets and target tracking requirements. This research explores how to supplement and enhance MTT performance with an existing graphics processing unit (GPU) on a general computing platform. Typical computers are already equipped with powerful GPUs to support various games and multimedia applications. However, such GPUs are not currently being used in desktop MTT applications. This research explores if and how a GPU can be used to supplement and enhance MTT implementations on a flexible common desktop computer without requiring costly dedicated MTT hardware and software. A MTT system was developed in MATLAB to provide baseline performance metrics for processing 24-bit, 1920x1080 color video footage filmed at 30 frames per second. The baseline MATLAB implementation is further enhanced with various custom C functions to speed up the MTT implementation for fair comparison and analysis. From the MATLAB MTT implementation, this research identifies potential areas of improvement through use of the GPU. The bottleneck image processing functions (frame differencing) were converted to execute on the GPU. On average, the GPU code executed 287% faster than the MATLAB implementation. Some individual functions actually executed 20 times faster than the baseline. These results indicate that the GPU is a viable source to significantly increase the performance of MTT with a low-cost hardware solution

    GPU-based Swendsen-Wang multi-cluster algorithm for the simulation of two-dimensional classical spin systems

    Full text link
    We present the GPU calculation with the common unified device architecture (CUDA) for the Swendsen-Wang multi-cluster algorithm of two-dimensional classical spin systems. We adjust the two connected component labeling algorithms recently proposed with CUDA for the assignment of the cluster in the Swendsen-Wang algorithm. Starting with the q-state Potts model, we extend our implementation to the system of vector spins, the q-state clock model, with the idea of embedded cluster. We test the performance, and the calculation time on GTX580 is obtained as 2.51 nano sec per a spin flip for the q=2 Potts model (Ising model) and 2.42 nano sec per a spin flip for the q=6 clock model with the linear size L=4096 at the critical temperature, respectively. The computational speed for the q=2 Potts model on GTX580 is 12.4 times as fast as the calculation speed on a current CPU core. That for the q=6 clock model on GTX580 is 35.6 times as fast as the calculation speed on a current CPU core.Comment: accepted for publication in Comp. Phys. Commu

    Multi-GPU-based Swendsen-Wang multi-cluster algorithm for the simulation of two-dimensional q-state Potts model

    Full text link
    We present the multiple GPU computing with the common unified device architecture (CUDA) for the Swendsen-Wang multi-cluster algorithm of two-dimensional (2D) q-state Potts model. Extending our algorithm for single GPU computing [Comp. Phys. Comm. 183 (2012) 1155], we realize the GPU computation of the Swendsen-Wang multi-cluster algorithm for multiple GPUs. We implement our code on the large-scale open science supercomputer TSUBAME 2.0, and test the performance and the scalability of the simulation of the 2D Potts model. The performance on Tesla M2050 using 256 GPUs is obtained as 37.3 spin flips per a nano second for the q=2 Potts model (Ising model) at the critical temperature with the linear system size L=65536.Comment: accepted for publication in Comp. Phys. Commun. arXiv admin note: substantial text overlap with arXiv:1202.063

    How does Connected Components Labeling with Decision Trees perform on GPUs?

    Get PDF
    In this paper the problem of Connected Components Labeling (CCL) in binary images using Graphic Processing Units (GPUs) is tackled by a different perspective. In the last decade, many novel algorithms have been released, specifically designed for GPUs. Because CCL literature concerning sequential algorithms is very rich, and includes many efficient solutions, designers of parallel algorithms were often inspired by techniques that had already proved successful in a sequential environment, such as the Union-Find paradigm for solving equivalences between provisional labels. However, the use of decision trees to minimize memory accesses, which is one of the main feature of the best performing sequential algorithms, was never taken into account when designing parallel CCL solutions. In fact, branches in the code tend to cause thread divergence, which usually leads to inefficiency. Anyway, this consideration does not necessarily apply to every possible scenario. Are we sure that the advantages of decision trees do not compensate for the cost of thread divergence? In order to answer this question, we chose three well-known sequential CCL algorithms, which employ decision trees as the cornerstone of their strategy, and we built a data-parallel version of each of them. Experimental tests on real case datasets show that, in most cases, these solutions outperform state-of-the-art algorithms, thus demonstrating the effectiveness of decision trees also in a parallel environment

    A Block-Based Union-Find Algorithm to Label Connected Components on GPUs

    Get PDF
    In this paper, we introduce a novel GPU-based Connected Components Labeling algorithm: the Block-based Union Find. The proposed strategy significantly improves an existing GPU algorithm, taking advantage of a block-based approach. Experimental results on real cases and synthetically generated datasets demonstrate the superiority of the new proposal with respect to state-of-the-art
    • …
    corecore