1,222 research outputs found

    Connected component identification and cluster update on GPU

    Full text link
    Cluster identification tasks occur in a multitude of contexts in physics and engineering such as, for instance, cluster algorithms for simulating spin models, percolation simulations, segmentation problems in image processing, or network analysis. While it has been shown that graphics processing units (GPUs) can result in speedups of two to three orders of magnitude as compared to serial codes on CPUs for the case of local and thus naturally parallelized problems such as single-spin flip update simulations of spin models, the situation is considerably more complicated for the non-local problem of cluster or connected component identification. I discuss the suitability of different approaches of parallelization of cluster labeling and cluster update algorithms for calculations on GPU and compare to the performance of serial implementations.Comment: 15 pages, 14 figures, one table, submitted to PR

    Optimizing GPU-Based Connected Components Labeling Algorithms

    Get PDF
    Connected Components Labeling (CCL) is a fundamental image processing technique, widely used in various application areas. Computational throughput of Graphical Processing Units (GPUs) makes them eligible for such a kind of algorithms. In the last decade, many approaches to compute CCL on GPUs have been proposed. Unfortunately, most of them have focused on 4-way connectivity neglecting the importance of 8-way connectivity. This paper aims to extend state-of-the-art GPU-based algorithms from 4 to 8-way connectivity and to improve them with additional optimizations. Experimental results revealed the effectiveness of the proposed strategies

    How does Connected Components Labeling with Decision Trees perform on GPUs?

    Get PDF
    In this paper the problem of Connected Components Labeling (CCL) in binary images using Graphic Processing Units (GPUs) is tackled by a different perspective. In the last decade, many novel algorithms have been released, specifically designed for GPUs. Because CCL literature concerning sequential algorithms is very rich, and includes many efficient solutions, designers of parallel algorithms were often inspired by techniques that had already proved successful in a sequential environment, such as the Union-Find paradigm for solving equivalences between provisional labels. However, the use of decision trees to minimize memory accesses, which is one of the main feature of the best performing sequential algorithms, was never taken into account when designing parallel CCL solutions. In fact, branches in the code tend to cause thread divergence, which usually leads to inefficiency. Anyway, this consideration does not necessarily apply to every possible scenario. Are we sure that the advantages of decision trees do not compensate for the cost of thread divergence? In order to answer this question, we chose three well-known sequential CCL algorithms, which employ decision trees as the cornerstone of their strategy, and we built a data-parallel version of each of them. Experimental tests on real case datasets show that, in most cases, these solutions outperform state-of-the-art algorithms, thus demonstrating the effectiveness of decision trees also in a parallel environment

    Quest for Speed: The Epic Saga of Record-Breaking on OpenCV Connected Components Extraction

    Get PDF
    Connected Components Labeling (CCL) represents an essential part of many Image Processing and Computer Vision pipelines. Given its relevance on the field, it has been part of most cutting-edge Computer Vision libraries. In this paper, all the algorithms included in the OpenCV during the years are reviewed, from sequential to parallel/GPU-based implementations. Our goal is to provide a better understanding of what has changed and why one algorithm should be preferred to another both in terms of memory usage and execution speed

    The evolution of bits and bottlenecks in a scientific workflow trying to keep up with technology: Accelerating 4D image segmentation applied to nasa data

    Get PDF
    In 2016, a team of earth scientists directly engaged a team of computer scientists to identify cyberinfrastructure (CI) approaches that would speed up an earth science workflow. This paper describes the evolution of that workflow as the two teams bridged CI and an image segmentation algorithm to do large scale earth science research. The Pacific Research Platform (PRP) and The Cognitive Hardware and Software Ecosystem Community Infrastructure (CHASE-CI) resources were used to significantly decreased the earth science workflow's wall-clock time from 19.5 days to 53 minutes. The improvement in wall-clock time comes from the use of network appliances, improved image segmentation, deployment of a containerized workflow, and the increase in CI experience and training for the earth scientists. This paper presents a description of the evolving innovations used to improve the workflow, bottlenecks identified within each workflow version, and improvements made within each version of the workflow, over a three-year time period

    A Block-Based Union-Find Algorithm to Label Connected Components on GPUs

    Get PDF
    In this paper, we introduce a novel GPU-based Connected Components Labeling algorithm: the Block-based Union Find. The proposed strategy significantly improves an existing GPU algorithm, taking advantage of a block-based approach. Experimental results on real cases and synthetically generated datasets demonstrate the superiority of the new proposal with respect to state-of-the-art

    A Heuristic-Based Decision Tree for Connected Components Labeling of 3D Volumes: Implementation and Reproducibility Notes

    Get PDF
    This paper provides a detailed description of how to install, setup, and use the YACCLAB benchmark to test the algorithms published in "A Heuristic-Based Decision Tree for Connected Components Labeling of 3D Volumes," underlying how the parameters affect and influence experimental results
    • …
    corecore