DepthCut: Improved Depth Edge Estimation Using Multiple Unreliable Channels
In the context of scene understanding, a variety of methods exist to
estimate different information channels from mono or stereo images, including
disparity, depth, and normals. Although several advances have been reported in
recent years for these tasks, the estimated information is often imprecise,
particularly near depth discontinuities or creases. However, studies have shown
that precisely such depth edges carry critical cues for the perception of
shape, and play important roles in tasks like depth-based segmentation or
foreground selection. Unfortunately, the currently extracted channels often
carry conflicting signals, making it difficult for subsequent applications to
effectively use them. In this paper, we focus on the problem of obtaining
high-precision depth edges (i.e., depth contours and creases) by jointly
analyzing such unreliable information channels. We propose DepthCut, a
data-driven fusion of the channels using a convolutional neural network trained
on a large dataset with known depth. The resulting depth edges can be used for
segmentation, decomposing a scene into depth layers with relatively flat depth,
or improving the accuracy of the depth estimate near depth edges by
constraining its gradients to agree with these edges. Quantitatively, we
compare against 15 baseline variants and demonstrate that our depth edges
yield improved segmentation performance and a more accurate depth estimate
near depth edges compared to data-agnostic channel fusion. Qualitatively, we
demonstrate that the depth edges result in superior segmentation and depth
orderings.
Comment: 12 pages
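The fusion idea behind DepthCut can be caricatured in a few lines: edge evidence from several unreliable channels is combined with learned per-channel weights. This is only a toy sketch, not the paper's CNN; the channels, the finite-difference gradient, and the weights below are hypothetical stand-ins for what the network would learn.

```python
def gradient_magnitude(img):
    """Finite-difference gradient magnitude of a 2D grid (list of lists)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = img[y][min(x + 1, w - 1)] - img[y][x]
            gy = img[min(y + 1, h - 1)][x] - img[y][x]
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out

def fuse_channels(channels, weights):
    """Weighted per-pixel fusion of several edge-evidence channels."""
    h, w = len(channels[0]), len(channels[0][0])
    return [[sum(wt * ch[y][x] for wt, ch in zip(weights, channels))
             for x in range(w)] for y in range(h)]

# A 4x4 disparity map with a depth discontinuity between columns 1 and 2,
# plus a color channel whose edge evidence is misaligned (unreliable).
disparity = [[1, 1, 5, 5]] * 4
color = [[0, 2, 2, 0]] * 4
edges = fuse_channels([gradient_magnitude(disparity),
                       gradient_magnitude(color)],
                      weights=[0.8, 0.2])  # hypothetical learned weights
```

In the real system the fixed weights are replaced by a convolutional network trained on data with known depth, so the fusion can vary per pixel and per context.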
A Framework for Symmetric Part Detection in Cluttered Scenes
The role of symmetry in computer vision has waxed and waned in importance
during the evolution of the field from its earliest days. At first figuring
prominently in support of bottom-up indexing, it fell out of favor as shape
gave way to appearance and recognition gave way to detection. With a strong
prior in the form of a target object, the role of the weaker priors offered by
perceptual grouping was greatly diminished. However, as the field returns to
the problem of recognition from a large database, the bottom-up recovery of the
parts that make up the objects in a cluttered scene is critical for their
recognition. The medial axis community has long exploited the ubiquitous
regularity of symmetry as a basis for the decomposition of a closed contour
into medial parts. However, today's recognition systems are faced with
cluttered scenes, and the assumption that a closed contour exists, i.e., that
figure-ground segmentation has been solved, renders much of the medial axis
community's work inapplicable. In this article, we review a computational
framework, previously reported in Lee et al. (2013), Levinshtein et al. (2009,
2013), that bridges the representation power of the medial axis and the need to
recover and group an object's parts in a cluttered scene. Our framework is
rooted in the idea that a maximally inscribed disc, the building block of a
medial axis, can be modeled as a compact superpixel in the image. We evaluate
the method on images of cluttered scenes.
Comment: 10 pages, 8 figures
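The building block mentioned above, the maximally inscribed disc, can be located with a distance transform: the pixel farthest from the background is the centre of the largest disc that fits inside the shape. The sketch below is a minimal illustration of that idea (a multi-source BFS under the chessboard metric), not the framework's actual superpixel machinery.

```python
from collections import deque

def chessboard_distance_transform(mask):
    """Multi-source BFS distance (8-connected) from the background into the
    shape; the maximum marks the largest inscribed disc's centre and radius."""
    h, w = len(mask), len(mask[0])
    dist = [[None] * w for _ in range(h)]
    q = deque()
    for y in range(h):
        for x in range(w):
            if mask[y][x] == 0:          # background cells are the sources
                dist[y][x] = 0
                q.append((y, x))
    while q:
        y, x = q.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                    dist[ny][nx] = dist[y][x] + 1
                    q.append((ny, nx))
    return dist

# A 5x5 square of foreground surrounded by a one-pixel background frame.
mask = [[0] * 7] + [[0] + [1] * 5 + [0] for _ in range(5)] + [[0] * 7]
dist = chessboard_distance_transform(mask)
```

The ridge of local maxima in `dist` approximates the medial axis; the framework's insight is that each such disc behaves like a compact superpixel, so no closed contour is needed up front.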
Finding Temporally Consistent Occlusion Boundaries in Videos using Geometric Context
We present an algorithm for finding temporally consistent occlusion
boundaries in videos to support segmentation of dynamic scenes. We learn
occlusion boundaries in a pairwise Markov random field (MRF) framework. We
first estimate the probability of a spatio-temporal edge being an occlusion
boundary by using appearance, flow, and geometric features. Next, we enforce
occlusion boundary continuity in an MRF model by learning pairwise occlusion
probabilities using a random forest. Finally, we temporally smooth boundaries to
remove temporal inconsistencies in occlusion boundary estimation. Our proposed
framework provides an efficient approach for finding temporally consistent
occlusion boundaries in video by utilizing causality, redundancy in videos, and
semantic layout of the scene. We have developed a dataset with fully annotated
ground-truth occlusion boundaries for over 30 videos (about 5,000 frames). This
dataset is used to evaluate temporal occlusion boundaries and provides a much
needed baseline for future studies. We perform experiments to demonstrate the
role of scene layout, and temporal information for occlusion reasoning in
dynamic scenes.
Comment: Applications of Computer Vision (WACV), 2015 IEEE Winter Conference
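The temporal-smoothing step can be illustrated with a minimal sketch: a causal exponential filter over per-edge occlusion probabilities suppresses one-frame flicker. This is an assumed simplification for illustration; the paper's pipeline uses an MRF with learned pairwise terms, not this filter.

```python
def temporally_smooth(prob_per_frame, alpha=0.5):
    """Causal exponential smoothing of per-edge occlusion probabilities
    across frames; damps single-frame flicker in boundary estimates."""
    smoothed = [list(prob_per_frame[0])]
    for frame in prob_per_frame[1:]:
        prev = smoothed[-1]
        smoothed.append([alpha * cur + (1 - alpha) * old
                         for cur, old in zip(frame, prev)])
    return smoothed

# Two candidate edges over three frames; edge 0 flickers off in frame 1.
probs = [[0.9, 0.1], [0.2, 0.1], [0.9, 0.1]]
out = temporally_smooth(probs)  # frame 1, edge 0 is pulled back up to 0.55
```

Exploiting this kind of redundancy across frames is what lets a video-based estimator outperform frame-by-frame occlusion reasoning.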
Computational models for image contour grouping
Contours are one-dimensional curves that may correspond to meaningful entities such as object boundaries. Accurate contour detection simplifies many vision tasks such as object detection and image recognition. Due to the large variety of image content and contour topology, contours are often first detected as edge fragments, followed by a second step known as ``contour grouping'' to connect them. Because of ambiguities in local image patches, contour grouping is essential for constructing a globally coherent contour representation. This thesis aims to group contours so that they are consistent with human perception. We draw inspiration from the Gestalt principles, which describe the perceptual grouping abilities of the human visual system. In particular, our work is most relevant to the principles of closure, similarity, and past experience.

The first part of our contribution is a new computational model for contour closure. Most existing contour grouping methods have focused on pixel-wise detection accuracy and ignored the psychological evidence for topological correctness. This chapter proposes a higher-order CRF model to achieve contour closure in the contour domain. We also propose an efficient inference method that is guaranteed to find integer solutions. Tested on the BSDS benchmark, our method achieves superior contour grouping performance, comparable precision-recall curves, and more visually pleasing results. Our work makes progress towards a better computational model of human perceptual grouping.

The second part is an energy minimization framework for the salient contour detection problem. Region cues, such as color/texture homogeneity, and contour cues, such as local contrast, are both useful for this task. In order to capture both kinds of cues in a joint energy function, topological consistency between region and contour labels must be satisfied. Our technique makes use of the topological concept of winding numbers. By using a fast method for winding number computation, we find that a small number of linear constraints suffice for label consistency. Our method is instantiated with ratio-based energy functions. Due to cue integration, our method obtains improved results, and user interaction can be incorporated to improve them further.

The third part of our contribution is an efficient category-level image contour detector. The objective is to detect the contours most likely to belong to a prescribed category. Our method, based on three levels of shape representation and non-parametric Bayesian learning, can learn from either human-labeled edge images or unlabelled raw images. In both cases, our experiments obtain better contour detection results than competing methods. In addition, our training process is robust even with a limited number of training samples, whereas state-of-the-art methods require more training samples and often need human intervention when training a new category.

Last but not least, in Chapter 7 we show how to leverage contour information for symmetry detection. Our method is simple yet effective for detecting the symmetry axes of bilaterally symmetric objects in unsegmented natural scene images. Compared with methods based on feature points, our model often produces better results for images containing limited texture.
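The winding-number concept used for label consistency is easy to state concretely: summing the signed angles a closed curve subtends at a point gives (after dividing by 2π) how many times the curve wraps around it. Below is a standard textbook computation for a polygonal contour, offered only as an illustration of the concept, not the thesis's fast algorithm or its linear constraints.

```python
import math

def winding_number(point, polygon):
    """Sum of signed angles subtended at `point` by consecutive polygon
    vertices; 0 for a point outside, +/-1 for a point enclosed once."""
    px, py = point
    total = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i][0] - px, polygon[i][1] - py
        x2, y2 = polygon[(i + 1) % n][0] - px, polygon[(i + 1) % n][1] - py
        # Signed angle between successive vertex directions.
        total += math.atan2(x1 * y2 - y1 * x2, x1 * x2 + y1 * y2)
    return round(total / (2 * math.pi))

square = [(0, 0), (4, 0), (4, 4), (0, 4)]  # counter-clockwise contour
```

Requiring that region labels agree with the winding number of the contour labeling is what ties the two cue types together in a single energy function.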
Automated Visual Fin Identification of Individual Great White Sharks
This paper discusses the automated visual identification of individual great
white sharks from dorsal fin imagery. We propose a computer vision photo ID
system and report recognition results over a database of thousands of
unconstrained fin images. To the best of our knowledge this line of work
establishes the first fully automated contour-based visual ID system in the
field of animal biometrics. The approach treats shark fins as
textureless, flexible, and partially occluded objects with an individually
characteristic shape. In order to recover animal identities from an image we
first introduce an open contour stroke model, which extends multi-scale region
segmentation to achieve robust fin detection. Secondly, we show that
combinatorial, scale-space selective fingerprinting can successfully encode fin
individuality. We then measure the species-specific distribution of visual
individuality along the fin contour via an embedding into a global `fin space'.
Exploiting this domain, we finally propose a non-linear model for individual
animal recognition and combine all approaches into a fine-grained
multi-instance framework. We provide a system evaluation, compare results to
prior work, and report performance and properties in detail.
Comment: 17 pages, 16 figures. To be published in IJCV. Article replaced to update first author contact details and to correct a Figure reference on page …
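The contour-fingerprinting idea can be sketched with a classic shape descriptor: sample the distance from the contour's centroid at fixed positions along the curve and normalise by the maximum, giving a scale-invariant signature that can be compared between encounters. This is a deliberately simple stand-in for the paper's combinatorial, scale-space selective fingerprints; `contour_signature` and the sample contour below are hypothetical.

```python
def contour_signature(points, samples=8):
    """Centroid-distance signature sampled along a contour, normalised by
    its maximum so the descriptor is invariant to image scale."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    dists = [((x - cx) ** 2 + (y - cy) ** 2) ** 0.5 for x, y in points]
    step = len(dists) / samples
    sig = [dists[int(i * step)] for i in range(samples)]
    peak = max(sig) or 1.0
    return [d / peak for d in sig]

def signature_distance(a, b):
    """L1 distance between signatures; smaller means more similar fins."""
    return sum(abs(x - y) for x, y in zip(a, b))

fin = [(i, i % 5) for i in range(40)]               # hypothetical fin contour
same_fin_scaled = [(2 * x, 2 * y) for x, y in fin]  # same fin, larger image
```

A real photo-ID system would additionally need the robust fin detection and non-linear matching stages the paper describes; the point here is only that an open contour alone can carry enough individuality to index on.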