3,694 research outputs found
Exploiting surroundedness for saliency detection: a boolean map approach
We demonstrate the usefulness of surroundedness for eye fixation prediction by proposing a Boolean Map based Saliency model (BMS). In our formulation, an image is characterized by a set of binary images, which are generated by randomly thresholding the image's feature maps in a whitened feature space. Based on a Gestalt principle of figure-ground segregation, BMS computes a saliency map by discovering surrounded regions via topological analysis of Boolean maps. Furthermore, we draw a connection between BMS and the Minimum Barrier Distance to provide insight into why and how BMS can properly capture the surroundedness cue via Boolean maps. The strength of BMS is verified by its simplicity, efficiency and superior performance compared with 10 state-of-the-art methods on seven eye tracking benchmark datasets.
US National Science Foundation; 1059218; 1029430
http://cs-people.bu.edu/jmzhang/BMS/BMS_iccv13_preprint.pdf
Accepted manuscript
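As a rough illustration of the surroundedness cue described in this abstract, the sketch below thresholds a single grayscale feature map at several levels and, in each resulting Boolean map, marks as surrounded every pixel whose region is not connected to the image border. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the caller-supplied thresholds (the abstract describes random thresholding in a whitened feature space), the 4-connectivity, and the plain averaging are all illustrative choices.

```python
from collections import deque


def surrounded_mask(bmap):
    """Mark pixels whose connected region (4-connectivity, same Boolean
    value) does not touch the image border."""
    h, w = len(bmap), len(bmap[0])
    reached = [[False] * w for _ in range(h)]
    queue = deque()
    # Seed the flood fill with every border pixel.
    for y in range(h):
        for x in range(w):
            if y in (0, h - 1) or x in (0, w - 1):
                reached[y][x] = True
                queue.append((y, x))
    # Spread only between neighbours that share the same Boolean value.
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w and not reached[ny][nx]
                    and bmap[ny][nx] == bmap[y][x]):
                reached[ny][nx] = True
                queue.append((ny, nx))
    return [[not reached[y][x] for x in range(w)] for y in range(h)]


def boolean_map_saliency(feature, thresholds):
    """Average the surrounded masks over all thresholds (assumed scheme)."""
    h, w = len(feature), len(feature[0])
    acc = [[0.0] * w for _ in range(h)]
    for t in thresholds:
        bmap = [[v > t for v in row] for row in feature]
        mask = surrounded_mask(bmap)
        for y in range(h):
            for x in range(w):
                acc[y][x] += mask[y][x] / len(thresholds)
    return acc
```

A bright blob enclosed by a darker background scores high because it is cut off from the border in most Boolean maps, whereas structures that reach the border are never counted as surrounded.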
Automatic Image Segmentation by Dynamic Region Merging
This paper addresses the automatic image segmentation problem in a region
merging style. With an initially over-segmented image, in which the many
regions (or super-pixels) with homogeneous color are detected, image
segmentation is performed by iteratively merging the regions according to a
statistical test. There are two essential issues in a region merging algorithm:
order of merging and the stopping criterion. In the proposed algorithm, these
two issues are solved by a novel predicate, which is defined by the sequential
probability ratio test (SPRT) and the maximum likelihood criterion. Starting
from an over-segmented image, neighboring regions are progressively merged if
there is evidence for merging according to this predicate. We show that the
merging order follows the principle of dynamic programming. This formulates
image segmentation as an inference problem, where the final segmentation is
established based on the observed image. We also prove that the produced
segmentation satisfies certain global properties. In addition, a faster
algorithm is developed to accelerate the region merging process, which
maintains a nearest neighbor graph in each iteration. Experiments on real
natural images are conducted to demonstrate the performance of the proposed
dynamic region merging algorithm.
Comment: 28 pages. This paper is under review in IEEE TI
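The abstract above names the sequential probability ratio test (SPRT) as the basis of its merging predicate. The sketch below shows only Wald's generic SPRT on a stream of scalar samples under two Gaussian hypotheses. How the paper builds its region-merging predicate from the test is not stated in the abstract, so the function name, the equal-variance Gaussian model, and the parameters `mu0`, `mu1`, `sigma`, `alpha` and `beta` are assumptions made for illustration.

```python
import math


def sprt_gaussian(samples, mu0, mu1, sigma, alpha=0.05, beta=0.05):
    """Wald's SPRT for H0: mean == mu0 versus H1: mean == mu1.

    alpha and beta are the target false-positive and false-negative rates.
    Returns (decision, samples_used), where decision is "H0", "H1",
    or "undecided" if the samples run out first.
    """
    upper = math.log((1 - beta) / alpha)  # accept H1 at or above this
    lower = math.log(beta / (1 - alpha))  # accept H0 at or below this
    llr = 0.0
    used = 0
    for x in samples:
        used += 1
        # log p(x | H1) - log p(x | H0) for equal-variance Gaussians
        llr += ((x - mu0) ** 2 - (x - mu1) ** 2) / (2 * sigma ** 2)
        if llr >= upper:
            return "H1", used
        if llr <= lower:
            return "H0", used
    return "undecided", used
```

The appeal of a sequential test in this setting is that it stops as soon as the accumulated evidence crosses either bound, so clear-cut cases are decided after only a few samples.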
A Framework for Symmetric Part Detection in Cluttered Scenes
The role of symmetry in computer vision has waxed and waned in importance
during the evolution of the field from its earliest days. At first figuring
prominently in support of bottom-up indexing, it fell out of favor as shape
gave way to appearance and recognition gave way to detection. With a strong
prior in the form of a target object, the role of the weaker priors offered by
perceptual grouping was greatly diminished. However, as the field returns to
the problem of recognition from a large database, the bottom-up recovery of the
parts that make up the objects in a cluttered scene is critical for their
recognition. The medial axis community has long exploited the ubiquitous
regularity of symmetry as a basis for the decomposition of a closed contour
into medial parts. However, today's recognition systems are faced with
cluttered scenes, and the assumption that a closed contour exists, i.e. that
figure-ground segmentation has been solved, renders much of the medial axis
community's work inapplicable. In this article, we review a computational
framework, previously reported in Lee et al. (2013), Levinshtein et al. (2009,
2013), that bridges the representation power of the medial axis and the need to
recover and group an object's parts in a cluttered scene. Our framework is
rooted in the idea that a maximally inscribed disc, the building block of a
medial axis, can be modeled as a compact superpixel in the image. We evaluate
the method on images of cluttered scenes.
Comment: 10 pages, 8 figures
Latent Noise Segmentation: How Neural Noise Leads to the Emergence of Segmentation and Grouping
Deep Neural Networks (DNNs) that achieve human-level performance in general
tasks like object segmentation typically require supervised labels. In
contrast, humans are able to perform these tasks effortlessly without
supervision. To accomplish this, the human visual system makes use of
perceptual grouping. Understanding how perceptual grouping arises in an
unsupervised manner is critical for improving both models of the visual system
and computer vision models. In this work, we propose a counterintuitive
approach to unsupervised perceptual grouping and segmentation: that they arise
because of neural noise, rather than in spite of it. We (1) mathematically
demonstrate that under realistic assumptions, neural noise can be used to
separate objects from each other, and (2) show that adding noise in a DNN
enables the network to segment images even though it was never trained on any
segmentation labels. Interestingly, we find that (3) segmenting objects using
noise results in segmentation performance that aligns with the perceptual
grouping phenomena observed in humans. We introduce the Good Gestalt (GG)
datasets -- six datasets designed to specifically test perceptual grouping, and
show that our DNN models reproduce many important phenomena in human
perception, such as illusory contours, closure, continuity, proximity, and
occlusion. Finally, we (4) demonstrate the ecological plausibility of the
method by analyzing the sensitivity of the DNN to different magnitudes of
noise. We find that some model variants consistently succeed with remarkably
low levels of neural noise, and surprisingly, that segmenting
this way requires as few as a handful of samples. Together, our results suggest
a novel unsupervised segmentation method requiring few assumptions, a new
explanation for the formation of perceptual grouping, and a potential benefit
of neural noise in the visual system.
Computational models for image contour grouping
Contours are one-dimensional curves which may correspond to meaningful entities such as object boundaries. Accurate contour detection will simplify many vision tasks such as object detection and image recognition. Due to the large variety of image content and contour topology, contours are often detected as edge fragments at first, followed by a second step known as "contour grouping" to connect them. Due to ambiguities in local image patches, contour grouping is essential for constructing a globally coherent contour representation. This thesis aims to group contours so that they are consistent with human perception. We draw inspiration from Gestalt principles, which describe the perceptual grouping ability of the human visual system. In particular, our work is most relevant to the principles of closure, similarity, and past experience. The first part of our contribution is a new computational model for contour closure. Most existing contour grouping methods have focused on pixel-wise detection accuracy and ignored the psychological evidence for topological correctness. This chapter proposes a higher-order CRF model to achieve contour closure in the contour domain. We also propose an efficient inference method which is guaranteed to find integer solutions. Tested on the BSDS benchmark, our method achieves superior contour grouping performance, comparable precision-recall curves, and more visually pleasing results. Our work makes progress towards a better computational model of human perceptual grouping. The second part is an energy minimization framework for the salient contour detection problem. Region cues such as color/texture homogeneity, and contour cues such as local contrast, are both useful for this task. In order to capture both kinds of cues in a joint energy function, topological consistency between region and contour labels must be satisfied. Our technique makes use of the topological concept of winding numbers. 
By using a fast method for winding number computation, we find that a small number of linear constraints are sufficient for label consistency. Our method is instantiated by ratio-based energy functions. Due to cue integration, our method obtains improved results. User interaction can also be incorporated to further improve the results. The third part of our contribution is an efficient category-level image contour detector. The objective is to detect contours which most likely belong to a prescribed category. Our method, which is based on three levels of shape representation and non-parametric Bayesian learning, shows flexibility in learning from either human-labeled edge images or unlabelled raw images. In both cases, our experiments obtain better contour detection results than competing methods. In addition, our training process is robust even with a considerable size of training samples. In contrast, state-of-the-art methods require more training samples, and human intervention is often required for new category training. Last but not least, in Chapter 7 we also show how to leverage contour information for symmetry detection. Our method is simple yet effective for detecting the symmetric axes of bilaterally symmetric objects in unsegmented natural scene images. Compared with methods based on feature points, our model can often produce better results for images containing limited texture.
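The second part of the thesis abstract above relies on winding numbers to tie region labels to contour labels. For readers unfamiliar with the concept, the sketch below computes the winding number of a closed polygon around a query point using a standard edge-crossing method. This is not necessarily the fast method the thesis refers to, and the function names and the vertex-list representation are assumptions for illustration.

```python
def is_left(p0, p1, p2):
    """Positive if p2 lies left of the directed line p0 -> p1,
    negative if right, zero if on the line."""
    return ((p1[0] - p0[0]) * (p2[1] - p0[1])
            - (p2[0] - p0[0]) * (p1[1] - p0[1]))


def winding_number(point, polygon):
    """Winding number of a closed polygon, given as a list of (x, y)
    vertices, around point. Nonzero means the point is enclosed."""
    wn = 0
    n = len(polygon)
    for i in range(n):
        a, b = polygon[i], polygon[(i + 1) % n]
        if a[1] <= point[1]:
            # Upward crossing with the point strictly to the left.
            if b[1] > point[1] and is_left(a, b, point) > 0:
                wn += 1
        elif b[1] <= point[1] and is_left(a, b, point) < 0:
            # Downward crossing with the point strictly to the right.
            wn -= 1
    return wn
```

A point inside a counter-clockwise contour gets winding number 1, a point inside a clockwise contour gets -1, and a point outside gets 0, which is what makes the quantity usable as a consistency check between a contour and the region it is supposed to bound.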
From holism to compositionality: memes and the evolution of segmentation, syntax, and signification in music and language
Steven Mithen argues that language evolved from an antecedent he terms “Hmmmmm, [meaning it was] Holistic, manipulative, multi-modal, musical and mimetic”. Owing to certain innate and learned factors, a capacity for segmentation and cross-stream mapping in early Homo sapiens broke the continuous line of Hmmmmm, creating discrete replicated units which, with the initial support of Hmmmmm, eventually became the semantically freighted words of modern language. That which remained after what was a bifurcation of Hmmmmm arguably survived as music, existing as a sound stream segmented into discrete units, although one without the explicit and relatively fixed semantic content of language. All three types of utterance – the parent Hmmmmm, language, and music – are amenable to a memetic interpretation which applies Universal Darwinism to what are understood as language and musical memes. On the basis of Peter Carruthers’ distinction between ‘cognitivism’ and ‘communicativism’ in language, and William Calvin’s theories of cortical information encoding, a framework is hypothesized for the semantic and syntactic associations between, on the one hand, the sonic patterns of language memes (‘lexemes’) and of musical memes (‘musemes’) and, on the other hand, ‘mentalese’ conceptual structures, in Chomsky’s ‘Logical Form’ (LF)