Optimized Block-Based Algorithms to Label Connected Components on GPUs
Connected Components Labeling (CCL) is a crucial step of several image processing and computer vision pipelines. Many efficient sequential strategies exist, among which one of the most effective is the use of a block-based mask to drastically cut the number of memory accesses. In the last decade, aided by the fast development of Graphics Processing Units (GPUs), many data-parallel CCL algorithms have been proposed alongside sequential ones. Applications that run entirely on the GPU can benefit from parallel implementations of CCL, which avoid expensive memory transfers between host and device. In this paper, two new eight-connectivity CCL algorithms are proposed, namely Block-based Union Find (BUF) and Block-based Komura Equivalence (BKE). These algorithms optimize existing GPU solutions by introducing a block-based approach. Extensions for three-dimensional datasets are also discussed. In order to produce a fair comparison with previously proposed alternatives, YACCLAB, a public CCL benchmarking framework, has been extended to evaluate GPU algorithms as well. Moreover, three-dimensional datasets have been added to its collection. Experimental results on real cases and synthetically generated datasets demonstrate the superiority of the new proposals over the state of the art, in both 2D and 3D scenarios.
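The block-based idea in the abstract above rests on a simple observation: under eight-connectivity, all foreground pixels inside one 2x2 block are mutually adjacent, so a whole block can share a single label. The following is a minimal sequential Python sketch of that idea combined with union-find. It is not the paper's BUF or BKE GPU kernel, and the function name `label_components_blockwise` and the list-of-lists image format are assumptions made for illustration only.

```python
def label_components_blockwise(img):
    """Label 8-connected foreground components using one union-find
    node per 2x2 block (illustrative sketch, not the paper's kernel)."""
    h = len(img)
    w = len(img[0]) if h else 0
    bw, bh = (w + 1) // 2, (h + 1) // 2       # number of blocks per row / column
    parent = list(range(bw * bh))             # one union-find node per block

    def find(a):
        # Path halving keeps the trees shallow.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # The smaller root index wins, a common convention in CCL.
            if ra < rb:
                parent[rb] = ra
            else:
                parent[ra] = rb

    def block(y, x):
        return (y // 2) * bw + (x // 2)

    # Adjacency is symmetric, so the four "forward" neighbours suffice.
    forward = ((0, 1), (1, -1), (1, 0), (1, 1))
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            for dy, dx in forward:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and img[ny][nx]:
                    union(block(y, x), block(ny, nx))

    # Final pass: give each root a compact label, assigned in raster order.
    labels = [[0] * w for _ in range(h)]
    compact = {}
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                root = find(block(y, x))
                labels[y][x] = compact.setdefault(root, len(compact) + 1)
    return labels
```

The union-find structure here holds roughly a quarter as many nodes as a per-pixel version would. A real block-based mask would also skip most of the per-pixel neighbour checks performed in this sketch.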
How to Extract the Geometry and Topology from Very Large 3D Segmentations
Segmentation is often an essential intermediate step in image analysis. A
volume segmentation characterizes the underlying volume image in terms of
geometric information--segments, faces between segments, curves in which
several faces meet--as well as a topology on these objects. Existing algorithms
encode this information in designated data structures, but require that these
data structures fit entirely in Random Access Memory (RAM). Today, 3D images
with several billion voxels are acquired, e.g. in structural neurobiology.
Since these large volumes can no longer be processed with existing methods, we
present a new algorithm which performs geometry and topology extraction with a
runtime linear in the number of voxels and log-linear in the number of faces
and curves. The parallelizable algorithm proceeds in a block-wise fashion and
constructs a consistent representation of the entire volume image on the hard
drive, making the structure of very large volume segmentations accessible to
image analysis. The parallelized C++ source code, free command line tools and
MATLAB mex files are available from
http://hci.iwr.uni-heidelberg.de/software.php
Connected component identification and cluster update on GPU
Cluster identification tasks occur in a multitude of contexts in physics and
engineering such as, for instance, cluster algorithms for simulating spin
models, percolation simulations, segmentation problems in image processing, or
network analysis. While it has been shown that graphics processing units (GPUs)
can result in speedups of two to three orders of magnitude as compared to
serial codes on CPUs for the case of local and thus naturally parallelized
problems such as single-spin flip update simulations of spin models, the
situation is considerably more complicated for the non-local problem of cluster
or connected component identification. I discuss the suitability of different
approaches of parallelization of cluster labeling and cluster update algorithms
for calculations on GPU and compare to the performance of serial
implementations.
Comment: 15 pages, 14 figures, one table, submitted to PR
Implicit Decomposition for Write-Efficient Connectivity Algorithms
The future of main memory appears to lie in the direction of new technologies
that provide strong capacity-to-performance ratios, but have write operations
that are much more expensive than reads in terms of latency, bandwidth, and
energy. Motivated by this trend, we propose sequential and parallel algorithms
to solve graph connectivity problems using significantly fewer writes than
conventional algorithms. Our primary algorithmic tool is the construction of an
$o(n)$-sized "implicit decomposition" of a bounded-degree graph $G$ on $n$
nodes, which combined with read-only access to $G$ enables fast answers to
connectivity and biconnectivity queries on $G$. The construction breaks the
linear-write "barrier", resulting in costs that are asymptotically lower than
conventional algorithms while adding only a modest cost to querying time. For
general non-sparse graphs on $m$ edges, we also provide the first $o(m)$ writes
and $O(m)$ operations parallel algorithms for connectivity and biconnectivity.
These algorithms provide insight into how applications can efficiently process
computations on large graphs in systems with read-write asymmetry.
An Algorithmic Framework for Labeling Road Maps
Given an unlabeled road map, we consider, from an algorithmic perspective,
the cartographic problem of placing non-overlapping road labels embedded in their
roads. We first decompose the road network into logically coherent road
sections, e.g., parts of roads between two junctions. Based on this
decomposition, we present and implement a new and versatile framework for
placing labels in road maps such that the number of labeled road sections is
maximized. In an experimental evaluation with road maps of 11 major cities we
show that our proposed labeling algorithm is fast in practice and that it
reaches near-optimal solution quality, where optimal solutions are obtained by
mixed-integer linear programming. In comparison to the standard OpenStreetMap
renderer Mapnik, our algorithm labels 31% more road sections on average.
Comment: extended version of a paper to appear at GIScience 201
Coplanar Repeats by Energy Minimization
This paper proposes an automated method to detect, group and rectify
arbitrarily-arranged coplanar repeated elements via energy minimization. The
proposed energy functional combines several features that model how planes with
coplanar repeats are projected into images and captures global interactions
between different coplanar repeat groups and scene planes. An inference
framework based on a recent variant of $\alpha$-expansion is described and fast
convergence is demonstrated. We compare the proposed method to two widely-used
geometric multi-model fitting methods using a new dataset of annotated images
containing multiple scene planes with coplanar repeats in varied arrangements.
The evaluation shows a significant improvement in the accuracy of
rectifications computed from coplanar repeats detected with the proposed method
versus those detected with the baseline methods.
Comment: 14 pages with supplemental materials attached
Fast connected component labeling algorithm: a non voxel-based approach
This paper presents a new approach to connected component labeling on both binary images and volumes using the Extreme Vertices Model (EVM), a representation model for orthogonal polyhedra that has recently been applied to digital images and volume datasets. In contrast with previous techniques, this method does not use a voxel-based approach but deals with the inner sections of the object.
Postprint (published version)
Space-Time Tradeoffs for Distributed Verification
Verifying that a network configuration satisfies a given boolean predicate is
a fundamental problem in distributed computing. Many variations of this problem
have been studied, for example, in the context of proof labeling schemes (PLS),
locally checkable proofs (LCP), and non-deterministic local decision (NLD). In
all of these contexts, verification time is assumed to be constant. Korman,
Kutten and Masuzawa [PODC 2011] presented a proof-labeling scheme for MST, with
poly-logarithmic verification time, and logarithmic memory at each vertex.
In this paper we introduce the notion of a $t$-PLS, which allows the
verification procedure to run for super-constant time. Our work analyzes the
tradeoffs of $t$-PLS between time, label size, message length, and computation
space. We construct a universal $t$-PLS and prove that it uses the same amount
of total communication as a known one-round universal PLS, and $t$ factor
smaller labels. In addition, we provide a general technique to prove lower
bounds for space-time tradeoffs of $t$-PLS. We use this technique to show an
optimal tradeoff for testing that a network is acyclic (cycle free). Our
optimal $t$-PLS for acyclicity uses label size and computation space
$O((\log n)/t)$. We further describe a recursive $O(\log^* n)$ space verifier for
acyclicity which does not assume previous knowledge of the run-time $t$.
Comment: Pre-proceedings version of paper presented at the 24th International
Colloquium on Structural Information and Communication Complexity (SIROCCO
2017)
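To make the notion of a proof-labeling scheme for acyclicity concrete, here is a minimal Python sketch of the classical one-round scheme, not the multi-round scheme the abstract introduces. A prover labels every node with its distance to a root of its tree, and each node accepts only if exactly one neighbour is one step closer (none, for a root) and all other neighbours are one step farther. The function names, the adjacency-dictionary format, and the restriction to simple undirected graphs are assumptions made for illustration.

```python
from collections import deque


def assign_labels(adj):
    """Prover: label each node with its BFS distance from an arbitrary
    root of its connected component."""
    dist = {}
    for source in adj:
        if source in dist:
            continue
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
    return dist


def node_accepts(v, adj, dist):
    """Verifier at node v: inspects only v's label and its neighbours' labels."""
    d = dist[v]
    if d < 0:
        return False
    parents = 0
    for u in adj[v]:
        if dist[u] == d - 1:
            parents += 1          # candidate parent, one step closer to the root
        elif dist[u] != d + 1:
            return False          # every other neighbour must be a child
    # A root has no parent; every other node has exactly one.
    return parents == (0 if d == 0 else 1)


def verify(adj, dist):
    """The labeling is accepted only if every node accepts."""
    return all(node_accepts(v, adj, dist) for v in adj)
```

On a forest, the labels from `assign_labels` make every node accept. On a graph with a cycle, the node with the largest label on that cycle would need two parents, so at least one node rejects whatever labels are supplied.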