A Block-Based Union-Find Algorithm to Label Connected Components on GPUs
In this paper, we introduce a novel GPU-based Connected Components Labeling algorithm: the Block-based Union Find. The proposed strategy significantly improves an existing GPU algorithm by taking advantage of a block-based approach. Experimental results on real cases and synthetically generated datasets demonstrate the superiority of the new proposal with respect to the state of the art.
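The union-find idea that BUF builds on can be illustrated with a minimal sequential sketch. It assumes a 2D boolean NumPy image and 8-connectivity; the function names (`find`, `union`, `union_find_ccl`) are hypothetical, and the GPU-specific parts (one thread per pixel, linking via an atomic operation such as CUDA's `atomicMin`) are only described in comments. It shows the pixel-level baseline, not the block-based optimization itself.

```python
import numpy as np


def find(parent, i):
    """Return the root of i, halving the path on the way up."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def union(parent, a, b):
    """Merge the trees of a and b, keeping the smaller root as representative."""
    ra, rb = find(parent, a), find(parent, b)
    if ra < rb:
        parent[rb] = ra
    elif rb < ra:
        parent[ra] = rb


def union_find_ccl(img):
    """Label 8-connected foreground components of a 2D boolean image."""
    h, w = img.shape
    parent = np.arange(h * w)
    # Local merge: link each foreground pixel to its already-scanned
    # 8-neighbours (W, NW, N, NE). A GPU version would give each pixel its
    # own thread and perform the link with an atomic operation.
    for y in range(h):
        for x in range(w):
            if not img[y, x]:
                continue
            for dy, dx in ((0, -1), (-1, -1), (-1, 0), (-1, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and img[ny, nx]:
                    union(parent, y * w + x, ny * w + nx)
    # Final labelling: each pixel takes its root, renumbered to 1..n
    # in raster order of first appearance (background stays 0).
    labels = np.zeros((h, w), dtype=np.int64)
    remap = {}
    for y in range(h):
        for x in range(w):
            if img[y, x]:
                r = find(parent, y * w + x)
                labels[y, x] = remap.setdefault(r, len(remap) + 1)
    return labels


img = np.array([[1, 0, 0, 1],
                [0, 1, 0, 1],
                [0, 0, 0, 0],
                [1, 1, 0, 1]], dtype=bool)
labels = union_find_ccl(img)
```

In this small image the two diagonally touching pixels in the top-left share label 1, so the result has four components in total.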
Spectral-spatial classification of n-dimensional images in real-time based on segmentation and mathematical morphology on GPUs
The objective of this thesis is to develop efficient schemes for spectral-spatial n-dimensional image classification. By efficient schemes, we mean schemes that produce good classification results in terms of accuracy, as well as schemes that can be executed in real time on low-cost computing infrastructures, such as the Graphics Processing Units (GPUs) shipped in personal computers. The n-dimensional images include images with two and three dimensions, such as images from the medical domain, and also images with tens to hundreds of dimensions, such as the multi- and hyperspectral images acquired in remote sensing.
In image analysis, classification is a widely used method for information retrieval in areas such as medical diagnosis, surveillance, manufacturing and remote sensing, among others. In addition, as hyperspectral images have become widely available in recent years owing to the reduction in the size and cost of the sensors, the number of lab-scale applications, such as food quality control, art forgery detection, disease diagnosis and forensics, has also increased. Although there are many spectral-spatial classification schemes, most are computationally inefficient in terms of execution time. Moreover, the need for efficient computation on low-cost computing infrastructures is growing as technology is incorporated into everyday applications.
In this thesis we have proposed two spectral-spatial classification schemes: one based on segmentation and the other based on wavelets and mathematical morphology. These schemes were designed to produce good classification results, and in terms of accuracy they outperform other segmentation- and morphology-based schemes found in the literature. Additionally, it was necessary to develop techniques and strategies for efficient GPU computing, for example a block-asynchronous strategy, which resulted in efficient GPU implementations of the aforementioned spectral-spatial classification schemes. The optimal GPU parameters were analyzed, and different data partitioning and thread block arrangements were studied to exploit the GPU resources. The results show that the GPU is an adequate computing platform for on-board processing of hyperspectral information.
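The abstract does not spell out how the segmentation-based scheme combines spectral and spatial information. One common approach, shown here purely as an illustrative assumption, is majority voting: a pixel-wise classifier labels every pixel from its spectrum, and each region of a segmentation map then takes the most frequent class among its pixels. The function `majority_vote` and the toy arrays are hypothetical.

```python
import numpy as np


def majority_vote(pixel_classes, segments):
    """Relabel every pixel with the most frequent class inside its segment.

    pixel_classes: 2D non-negative int array from a pixel-wise classifier.
    segments:      2D int array of region ids from a segmentation step.
    """
    out = np.empty_like(pixel_classes)
    for seg_id in np.unique(segments):
        mask = segments == seg_id
        counts = np.bincount(pixel_classes[mask])
        out[mask] = np.argmax(counts)  # ties resolve to the lowest class id
    return out


pixel_classes = np.array([[0, 0, 1, 2],
                          [0, 1, 1, 2],
                          [2, 1, 1, 1]])
segments = np.array([[0, 0, 1, 1],
                     [0, 0, 1, 1],
                     [2, 2, 2, 2]])
result = majority_vote(pixel_classes, segments)
```

Because each region is processed independently of the others, this regularization step parallelizes naturally across regions.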
Optimized Block-Based Algorithms to Label Connected Components on GPUs
Connected Components Labeling (CCL) is a crucial step of several image processing and computer vision pipelines. Many efficient sequential strategies exist, among which one of the most effective is the use of a block-based mask to drastically cut the number of memory accesses. In the last decade, aided by the fast development of Graphics Processing Units (GPUs), many data-parallel CCL algorithms have been proposed alongside sequential ones. Applications that run entirely on the GPU can benefit from parallel CCL implementations, which avoid expensive memory transfers between host and device. In this paper, two new eight-connectivity CCL algorithms are proposed, namely Block-based Union Find (BUF) and Block-based Komura Equivalence (BKE). These algorithms optimize existing GPU solutions by introducing a block-based approach. Extensions to three-dimensional datasets are also discussed. To produce a fair comparison with previously proposed alternatives, YACCLAB, a public CCL benchmarking framework, has been extended so that it can evaluate GPU algorithms as well. Moreover, three-dimensional datasets have been added to its collection. Experimental results on real cases and synthetically generated datasets demonstrate the superiority of the new proposals with respect to the state of the art, in both 2D and 3D scenarios.
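The key observation behind a block-based approach with 8-connectivity is that all foreground pixels inside a 2x2 block are mutually adjacent, so the whole block can be a single union-find node. The sketch below assumes a 2D boolean NumPy image; the names are hypothetical, and it checks neighbours pixel by pixel for clarity, so it does not reproduce the reduced memory-access pattern that the paper's BUF is designed around.

```python
import numpy as np


def find(parent, i):
    """Return the root of i, halving the path on the way up."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def union(parent, a, b):
    """Merge two trees, keeping the smaller root as representative."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)


def block_union_find_ccl(img):
    """8-connected CCL in which each 2x2 block is one union-find node."""
    h, w = img.shape
    bh, bw = (h + 1) // 2, (w + 1) // 2  # round up to cover partial blocks
    parent = np.arange(bh * bw)

    def block(y, x):
        return (y // 2) * bw + (x // 2)

    # Merge phase: link the block of each foreground pixel to the block of any
    # already-scanned foreground 8-neighbour (W, NW, N, NE) in another block.
    for y in range(h):
        for x in range(w):
            if not img[y, x]:
                continue
            for dy, dx in ((0, -1), (-1, -1), (-1, 0), (-1, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and img[ny, nx]:
                    if block(ny, nx) != block(y, x):
                        union(parent, block(y, x), block(ny, nx))

    # Labelling phase: every foreground pixel inherits its block root's label.
    labels = np.zeros((h, w), dtype=np.int64)
    remap = {}
    for y in range(h):
        for x in range(w):
            if img[y, x]:
                r = find(parent, block(y, x))
                labels[y, x] = remap.setdefault(r, len(remap) + 1)
    return labels


img = np.array([[1, 1, 0, 0, 1],
                [0, 0, 0, 0, 1],
                [0, 0, 1, 0, 0],
                [0, 0, 0, 0, 0],
                [1, 0, 0, 1, 1]], dtype=bool)
labels = block_union_find_ccl(img)
```

The 5x5 example has odd dimensions, so the last block row and column are only partially filled; rounding the block grid up with `(h + 1) // 2` covers them. Only 9 nodes are needed instead of 25.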
Mixing multi-core CPUs and GPUs for scientific simulation software
Recent technological and economic developments have led to the widespread availability of multi-core CPUs and specialist accelerator processors such as graphical processing units (GPUs). The accelerated computational performance possible from these devices can be very high for some application paradigms. Software languages and systems such as NVIDIA's CUDA and the Khronos consortium's Open Computing Language (OpenCL) support a number of individual parallel application programming paradigms. To scale up the performance of some complex systems simulations, a hybrid of multi-core CPUs for coarse-grained parallelism and very-many-core GPUs for data parallelism is necessary. We describe our use of hybrid applications that use threading approaches and multi-core CPUs to control independent GPU devices. We present speed-up data, discuss multi-threading software issues for the application-level programmer, and offer some suggested areas for language development and integration between coarse-grained and fine-grained multi-thread systems. We discuss results from three common simulation algorithmic areas: partial differential equations, graph cluster metric calculations and random number generation. We report on programming experiences and selected performance for these algorithms on single and multiple GPUs, on multi-core CPUs, on a CellBE, and using OpenCL. We discuss programmer usability issues and the outlook and trends in multi-core programming for scientific application developers.
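The hybrid structure described above, with host threads each controlling an independent device, can be mocked up as follows. This is a minimal sketch under loud assumptions: no CUDA or OpenCL is used, each "device" is simulated by a NumPy Monte Carlo task, and `device_task` and `hybrid_run` are hypothetical names. It also touches on the random-number-generation theme: `numpy.random.SeedSequence.spawn` gives each device its own statistically independent stream.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def device_task(device_id, seed_seq, n_samples):
    """Stand-in for the work one host thread would offload to its own GPU.

    Here it is a NumPy Monte Carlo estimate of pi (hypothetical workload);
    a real hybrid code would select device `device_id` and launch kernels.
    """
    rng = np.random.default_rng(seed_seq)  # independent stream per device
    xy = rng.random((n_samples, 2))
    inside = np.count_nonzero((xy ** 2).sum(axis=1) <= 1.0)
    return device_id, inside


def hybrid_run(n_devices=4, n_samples=200_000, seed=2024):
    # Coarse-grained level: one CPU thread per (simulated) device.
    # Fine-grained level: the vectorised data-parallel work inside each task.
    streams = np.random.SeedSequence(seed).spawn(n_devices)
    with ThreadPoolExecutor(max_workers=n_devices) as pool:
        futures = [pool.submit(device_task, d, s, n_samples)
                   for d, s in enumerate(streams)]
        results = [f.result() for f in futures]
    total_inside = sum(inside for _, inside in results)
    return 4.0 * total_inside / (n_devices * n_samples), results


estimate, per_device = hybrid_run()
```

Seeding each device from its own spawned sequence, rather than sharing one generator across threads, keeps the result reproducible regardless of how the threads happen to be scheduled.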
Multi-GPU-based Swendsen-Wang multi-cluster algorithm for the simulation of two-dimensional q-state Potts model
We present multiple-GPU computing with the Compute Unified Device Architecture (CUDA) for the Swendsen-Wang multi-cluster algorithm of the two-dimensional (2D) q-state Potts model. Extending our algorithm for single-GPU computing [Comp. Phys. Comm. 183 (2012) 1155], we realize the GPU computation of the Swendsen-Wang multi-cluster algorithm on multiple GPUs. We implement our code on the large-scale open science supercomputer TSUBAME 2.0 and test the performance and scalability of the simulation of the 2D Potts model. The performance on Tesla M2050 using 256 GPUs is 37.3 spin flips per nanosecond for the q=2 Potts model (Ising model) at the critical temperature with linear system size L=65536.
Comment: accepted for publication in Comp. Phys. Commun. arXiv admin note: substantial text overlap with arXiv:1202.063
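The abstract does not list the algorithm's steps, so here is a minimal single-CPU sketch of one textbook Swendsen-Wang update, not the authors' multi-GPU code. It assumes the Potts Hamiltonian H = -J * sum over nearest-neighbour pairs of delta(s_i, s_j), coupling K = J/(k_B T), and periodic boundaries. Each update (1) activates a bond between equal neighbours with probability p = 1 - exp(-K), (2) identifies the resulting clusters, which is itself a connected-components labeling problem solved here with union-find, and (3) assigns every cluster a uniformly random new state. The square-lattice critical coupling K_c = ln(1 + sqrt(q)) is used in the usage lines; `sw_sweep` is a hypothetical name.

```python
import numpy as np


def find(parent, i):
    """Return the root of i, halving the path on the way up."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def sw_sweep(spins, q, K, rng):
    """One Swendsen-Wang update of an L x L q-state Potts model (periodic)."""
    L = spins.shape[0]
    p_bond = 1.0 - np.exp(-K)
    parent = np.arange(L * L)
    # 1-2. Activate bonds between equal right/down neighbours and merge the
    #      clusters of bonded sites.
    for y in range(L):
        for x in range(L):
            for ny, nx in (((y + 1) % L, x), (y, (x + 1) % L)):
                if spins[y, x] == spins[ny, nx] and rng.random() < p_bond:
                    a, b = find(parent, y * L + x), find(parent, ny * L + nx)
                    if a != b:
                        parent[max(a, b)] = min(a, b)
    # 3. Draw a new state per root; every site copies its cluster root's state.
    new_state = rng.integers(0, q, size=L * L)
    roots = np.array([find(parent, i) for i in range(L * L)])
    return new_state[roots].reshape(L, L)


rng = np.random.default_rng(0)
q, L = 2, 16
K_c = np.log(1.0 + np.sqrt(q))  # critical coupling of the 2D Potts model
spins = rng.integers(0, q, size=(L, L))
for _ in range(10):
    spins = sw_sweep(spins, q, K_c, rng)
```

Step 2 is where the connected-components labeling work discussed in the other entries of this list appears inside a Monte Carlo simulation.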
Optimizing GPU-Based Connected Components Labeling Algorithms
Connected Components Labeling (CCL) is a fundamental image processing technique, widely used in various application areas. The computational throughput of Graphics Processing Units (GPUs) makes them well suited to this kind of algorithm. In the last decade, many approaches to computing CCL on GPUs have been proposed. Unfortunately, most of them have focused on 4-way connectivity, neglecting the importance of 8-way connectivity. This paper aims to extend state-of-the-art GPU-based algorithms from 4-way to 8-way connectivity and to improve them with additional optimizations. Experimental results revealed the effectiveness of the proposed strategies.
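The practical difference between 4-way and 8-way connectivity is easy to see on a diagonal stroke. The sketch below uses a plain breadth-first flood fill (not one of the GPU algorithms discussed) just to make the two neighbourhood definitions concrete; `count_components`, `OFFSETS_4` and `OFFSETS_8` are hypothetical names.

```python
from collections import deque

import numpy as np

OFFSETS_4 = ((-1, 0), (1, 0), (0, -1), (0, 1))
OFFSETS_8 = OFFSETS_4 + ((-1, -1), (-1, 1), (1, -1), (1, 1))


def count_components(img, offsets):
    """Count foreground components by BFS flood fill over the given offsets."""
    h, w = img.shape
    seen = np.zeros((h, w), dtype=bool)
    count = 0
    for y in range(h):
        for x in range(w):
            if img[y, x] and not seen[y, x]:
                count += 1
                seen[y, x] = True
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in offsets:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny, nx] and not seen[ny, nx]):
                            seen[ny, nx] = True
                            queue.append((ny, nx))
    return count


# A diagonal line: one object under 8-way connectivity, four under 4-way.
img = np.eye(4, dtype=bool)
n4 = count_components(img, OFFSETS_4)
n8 = count_components(img, OFFSETS_8)
```

Any algorithm that only inspects the four edge-sharing neighbours would therefore split thin diagonal structures, which is why the extension to 8-way connectivity matters in practice.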
- …