
    Exploiting surroundedness for saliency detection: a boolean map approach

    We demonstrate the usefulness of surroundedness for eye fixation prediction by proposing a Boolean Map based Saliency model (BMS). In our formulation, an image is characterized by a set of binary images, which are generated by randomly thresholding the image's feature maps in a whitened feature space. Based on a Gestalt principle of figure-ground segregation, BMS computes a saliency map by discovering surrounded regions via topological analysis of Boolean maps. Furthermore, we draw a connection between BMS and the Minimum Barrier Distance to provide insight into why and how BMS can properly capture the surroundedness cue via Boolean maps. BMS is simple and efficient, and it outperforms 10 state-of-the-art methods on seven eye-tracking benchmark datasets.
    Funding: US National Science Foundation; 1059218; 1029430
    http://cs-people.bu.edu/jmzhang/BMS/BMS_iccv13_preprint.pdf
    Accepted manuscript
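    The core Boolean-map step can be sketched in a few lines. This is a minimal illustration, assuming a single grayscale channel given as a list of rows and uniformly sampled thresholds; the whitened feature space, multiple channels, and any normalization or post-processing used by BMS are omitted, and the function names are hypothetical. A pixel counts as surrounded in a Boolean map when its 4-connected component does not touch the image border, and its saliency is the fraction of Boolean maps (and their complements) in which it is surrounded.

```python
import random
from collections import deque


def surrounded_mask(bmap):
    """Return True for pixels that are True in `bmap` but whose 4-connected
    component does not touch the image border (i.e. they are surrounded)."""
    h, w = len(bmap), len(bmap[0])
    reached = [[False] * w for _ in range(h)]
    queue = deque()
    # Seed a flood fill with every True pixel lying on the border.
    for i in range(h):
        for j in range(w):
            if (i in (0, h - 1) or j in (0, w - 1)) and bmap[i][j]:
                reached[i][j] = True
                queue.append((i, j))
    # Breadth-first flood fill through True pixels.
    while queue:
        i, j = queue.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < h and 0 <= nj < w and bmap[ni][nj] and not reached[ni][nj]:
                reached[ni][nj] = True
                queue.append((ni, nj))
    return [[bmap[i][j] and not reached[i][j] for j in range(w)] for i in range(h)]


def boolean_map_saliency(image, n_thresholds=16, seed=0):
    """Hypothetical simplified BMS: average surroundedness over random
    thresholds of one grayscale channel (no whitening, no normalization)."""
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    acc = [[0.0] * w for _ in range(h)]
    for _ in range(n_thresholds):
        t = rng.uniform(lo, hi)
        bmap = [[v > t for v in row] for row in image]
        complement = [[not b for b in row] for row in bmap]
        # Both a Boolean map and its complement can contain surrounded regions.
        for m in (bmap, complement):
            s = surrounded_mask(m)
            for i in range(h):
                for j in range(w):
                    acc[i][j] += s[i][j]
    total = 2 * n_thresholds
    return [[a / total for a in row] for row in acc]
```

    For a dark 7x7 image with a bright 3x3 square in the middle, every threshold isolates the square, so it is surrounded in each Boolean map but never in the complement, giving it a score of 0.5 while the background scores 0.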

    Automatic Image Segmentation by Dynamic Region Merging

    This paper addresses the automatic image segmentation problem in a region merging style. Starting from an initially over-segmented image, in which many regions (or super-pixels) with homogeneous color are detected, image segmentation is performed by iteratively merging the regions according to a statistical test. There are two essential issues in a region merging algorithm: the order of merging and the stopping criterion. In the proposed algorithm, these two issues are solved by a novel predicate, which is defined by the sequential probability ratio test (SPRT) and the maximum likelihood criterion. Starting from an over-segmented image, neighboring regions are progressively merged if there is evidence for merging according to this predicate. We show that the merging order follows the principle of dynamic programming. This formulates image segmentation as an inference problem, where the final segmentation is established based on the observed image. We also prove that the produced segmentation satisfies certain global properties. In addition, a faster algorithm, which maintains a nearest-neighbor graph in each iteration, is developed to accelerate the region merging process. Experiments on real natural images demonstrate the performance of the proposed dynamic region merging algorithm.
    Comment: 28 pages. This paper is under review in IEEE TI
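    The merging predicate rests on the sequential probability ratio test (SPRT). Below is a generic sketch of Wald's SPRT for the mean of Gaussian samples, not the paper's exact predicate: treating the samples as intensity differences drawn across a shared region boundary, with H0 (mean 0, regions consistent) against H1 (mean `delta`, regions distinct), is an illustrative assumption, as are the function name and the error rates `alpha` and `beta`.

```python
import math
from typing import Iterable, Optional


def sprt(samples: Iterable[float], mu0: float, mu1: float, sigma: float,
         alpha: float = 0.05, beta: float = 0.05) -> Optional[bool]:
    """Wald SPRT for a Gaussian mean, H0: mean = mu0 vs H1: mean = mu1.

    Returns True if H1 is accepted, False if H0 is accepted, and None if the
    samples run out before either decision threshold is crossed.
    """
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this
    llr = 0.0
    for x in samples:
        # Per-sample log-likelihood ratio log N(x; mu1, s) - log N(x; mu0, s).
        llr += ((x - mu0) ** 2 - (x - mu1) ** 2) / (2 * sigma ** 2)
        if llr >= upper:
            return True
        if llr <= lower:
            return False
    return None
```

    Under this hypothetical framing, a pair of regions would be merged when `sprt(diffs, 0.0, delta, sigma)` returns `False` (H0 accepted). The sequential form stops as soon as the evidence is decisive, so often only a few boundary samples need to be examined.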

    A Framework for Symmetric Part Detection in Cluttered Scenes

    The role of symmetry in computer vision has waxed and waned in importance during the evolution of the field from its earliest days. At first figuring prominently in support of bottom-up indexing, it fell out of favor as shape gave way to appearance and recognition gave way to detection. With a strong prior in the form of a target object, the role of the weaker priors offered by perceptual grouping was greatly diminished. However, as the field returns to the problem of recognition from a large database, the bottom-up recovery of the parts that make up the objects in a cluttered scene is critical for their recognition. The medial axis community has long exploited the ubiquitous regularity of symmetry as a basis for the decomposition of a closed contour into medial parts. However, today's recognition systems are faced with cluttered scenes, and the assumption that a closed contour exists, i.e., that figure-ground segmentation has been solved, renders much of the medial axis community's work inapplicable. In this article, we review a computational framework, previously reported in Lee et al. (2013) and Levinshtein et al. (2009, 2013), that bridges the representation power of the medial axis and the need to recover and group an object's parts in a cluttered scene. Our framework is rooted in the idea that a maximally inscribed disc, the building block of a medial axis, can be modeled as a compact superpixel in the image. We evaluate the method on images of cluttered scenes.
    Comment: 10 pages, 8 figures
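    The maximally inscribed disc that the framework models as a superpixel has a classical pixel-level counterpart: a distance transform of a binary shape gives, at each interior pixel, the radius of the largest disc centered there, and ridge points of that map approximate the medial axis. The sketch below only illustrates that building block on an already segmented mask (exactly the closed-contour assumption the framework avoids). It uses the city-block metric, so its "discs" are diamonds, treats local maxima as a crude medial-axis approximation, and its function names are hypothetical.

```python
def city_block_distance(mask):
    """Two-pass city-block distance from each foreground pixel to the nearest
    background pixel; pixels outside the image count as background."""
    h, w = len(mask), len(mask[0])
    big = h + w  # larger than any distance that can occur in this image
    d = [[big if mask[i][j] else 0 for j in range(w)] for i in range(h)]
    # Forward pass: propagate distances from the top and left neighbours.
    for i in range(h):
        for j in range(w):
            if d[i][j]:
                up = d[i - 1][j] if i > 0 else 0
                left = d[i][j - 1] if j > 0 else 0
                d[i][j] = min(d[i][j], up + 1, left + 1)
    # Backward pass: propagate distances from the bottom and right neighbours.
    for i in range(h - 1, -1, -1):
        for j in range(w - 1, -1, -1):
            if d[i][j]:
                down = d[i + 1][j] if i < h - 1 else 0
                right = d[i][j + 1] if j < w - 1 else 0
                d[i][j] = min(d[i][j], down + 1, right + 1)
    return d


def medial_points(d):
    """Foreground pixels whose inscribed-disc radius is not exceeded by any
    4-neighbour: a rough approximation of medial-axis (disc-centre) points."""
    h, w = len(d), len(d[0])

    def at(i, j):
        return d[i][j] if 0 <= i < h and 0 <= j < w else 0

    return [(i, j) for i in range(h) for j in range(w)
            if d[i][j] > 0 and all(d[i][j] >= at(i + di, j + dj)
                                   for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)))]
```

    On a filled 5x5 square the ridge points fall on the two diagonals, which is the square's medial axis; each point's distance value is the radius of the inscribed diamond centered there.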

    Latent Noise Segmentation: How Neural Noise Leads to the Emergence of Segmentation and Grouping

    Deep Neural Networks (DNNs) that achieve human-level performance in general tasks like object segmentation typically require supervised labels. In contrast, humans perform these tasks effortlessly without supervision. To accomplish this, the human visual system makes use of perceptual grouping. Understanding how perceptual grouping arises in an unsupervised manner is critical for improving both models of the visual system and computer vision models. In this work, we propose a counterintuitive approach to unsupervised perceptual grouping and segmentation: that they arise because of neural noise, rather than in spite of it. We (1) mathematically demonstrate that, under realistic assumptions, neural noise can be used to separate objects from each other, and (2) show that adding noise in a DNN enables the network to segment images even though it was never trained on any segmentation labels. Interestingly, we find that (3) segmenting objects using noise yields segmentation performance that aligns with the perceptual grouping phenomena observed in humans. We introduce the Good Gestalt (GG) datasets, six datasets designed specifically to test perceptual grouping, and show that our DNN models reproduce many important phenomena in human perception, such as illusory contours, closure, continuity, proximity, and occlusion. Finally, we (4) demonstrate the ecological plausibility of the method by analyzing the sensitivity of the DNN to different magnitudes of noise. We find that some model variants consistently succeed with remarkably low levels of neural noise (σ < 0.001) and, surprisingly, that segmenting this way requires as few as a handful of samples. Together, our results suggest a novel unsupervised segmentation method requiring few assumptions, a new explanation for the formation of perceptual grouping, and a potential benefit of neural noise in the visual system.

    Computational models for image contour grouping

    Contours are one-dimensional curves that may correspond to meaningful entities such as object boundaries. Accurate contour detection simplifies many vision tasks, such as object detection and image recognition. Because of the large variety of image content and contour topology, contours are often first detected as edge fragments, followed by a second step known as “contour grouping” that connects them. Because local image patches are ambiguous, contour grouping is essential for constructing a globally coherent contour representation. This thesis aims to group contours so that they are consistent with human perception. We draw inspiration from the Gestalt principles, which describe the perceptual grouping ability of the human visual system. In particular, our work is most relevant to the principles of closure, similarity, and past experience. The first part of our contribution is a new computational model for contour closure. Most existing contour grouping methods have focused on pixel-wise detection accuracy and ignored the psychological evidence for topological correctness. This chapter proposes a higher-order CRF model to achieve contour closure in the contour domain. We also propose an efficient inference method that is guaranteed to find integer solutions. Tested on the BSDS benchmark, our method achieves superior contour grouping performance, comparable precision-recall curves, and more visually pleasing results. This work makes progress toward a better computational model of human perceptual grouping. The second part is an energy minimization framework for the salient contour detection problem. Region cues, such as color/texture homogeneity, and contour cues, such as local contrast, are both useful for this task. To capture both kinds of cues in a joint energy function, topological consistency between region and contour labels must be satisfied. Our technique makes use of the topological concept of winding numbers.
    Using a fast method for winding number computation, we find that a small number of linear constraints suffices for label consistency. Our method is instantiated with ratio-based energy functions. By integrating both kinds of cues, it obtains improved results. User interaction can also be incorporated to further improve the results. The third part of our contribution is an efficient category-level image contour detector. The objective is to detect contours that most likely belong to a prescribed category. Our method, which is based on three levels of shape representation and non-parametric Bayesian learning, can learn from either human-labeled edge images or unlabeled raw images. In both cases, our experiments obtain better contour detection results than competing methods. In addition, our training process is robust even with a limited number of training samples, whereas state-of-the-art methods require more training samples and often need human intervention to train on a new category. Last but not least, in Chapter 7 we show how to leverage contour information for symmetry detection. Our method is simple yet effective for detecting the symmetry axes of bilaterally symmetric objects in unsegmented natural scene images. Compared with methods based on feature points, our model often produces better results on images with limited texture.
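    Winding numbers are what tie region labels to a closed contour: a point enclosed once by a counter-clockwise contour has winding number 1, a point outside has 0, and reversing the contour's orientation flips the sign. The thesis's fast computation method is not described in this abstract, so the sketch below is the standard edge-crossing winding number test for a closed polygon, offered only as an illustration; the function name is hypothetical.

```python
def winding_number(point, polygon):
    """Winding number of a closed polygon (list of (x, y) vertices, last edge
    implied back to the first vertex) around `point`."""
    x, y = point
    wn = 0
    n = len(polygon)
    for k in range(n):
        x0, y0 = polygon[k]
        x1, y1 = polygon[(k + 1) % n]
        # Sign of the cross product: > 0 if the point lies left of the edge.
        cross = (x1 - x0) * (y - y0) - (x - x0) * (y1 - y0)
        if y0 <= y < y1 and cross > 0:
            wn += 1   # upward edge crossing the horizontal ray, point on its left
        elif y1 <= y < y0 and cross < 0:
            wn -= 1   # downward edge crossing the horizontal ray, point on its right
    return wn
```

    The half-open comparisons (`<=` on one end, `<` on the other) make a ray passing exactly through a vertex count that vertex once rather than twice.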

    From holism to compositionality: memes and the evolution of segmentation, syntax, and signification in music and language

    Steven Mithen argues that language evolved from an antecedent he terms “Hmmmmm, [meaning it was] Holistic, manipulative, multi-modal, musical and mimetic”. Owing to certain innate and learned factors, a capacity for segmentation and cross-stream mapping in early Homo sapiens broke the continuous line of Hmmmmm, creating discrete replicated units which, with the initial support of Hmmmmm, eventually became the semantically freighted words of modern language. What remained after this bifurcation of Hmmmmm arguably survived as music: a sound stream segmented into discrete units, but without the explicit and relatively fixed semantic content of language. All three types of utterance (the parent Hmmmmm, language, and music) are amenable to a memetic interpretation that applies Universal Darwinism to what are understood as language and musical memes. On the basis of Peter Carruthers’ distinction between ‘cognitivism’ and ‘communicativism’ in language, and William Calvin’s theories of cortical information encoding, a framework is hypothesized for the semantic and syntactic associations between, on the one hand, the sonic patterns of language memes (‘lexemes’) and of musical memes (‘musemes’) and, on the other hand, ‘mentalese’ conceptual structures, in Chomsky’s ‘Logical Form’ (LF).