7,177 research outputs found

    A Framework for Symmetric Part Detection in Cluttered Scenes

    Full text link
    The role of symmetry in computer vision has waxed and waned in importance during the evolution of the field from its earliest days. At first figuring prominently in support of bottom-up indexing, it fell out of favor as shape gave way to appearance and recognition gave way to detection. With a strong prior in the form of a target object, the role of the weaker priors offered by perceptual grouping was greatly diminished. However, as the field returns to the problem of recognition from a large database, the bottom-up recovery of the parts that make up the objects in a cluttered scene is critical for their recognition. The medial axis community has long exploited the ubiquitous regularity of symmetry as a basis for the decomposition of a closed contour into medial parts. However, today's recognition systems are faced with cluttered scenes, and the assumption that a closed contour exists, i.e. that figure-ground segmentation has been solved, renders much of the medial axis community's work inapplicable. In this article, we review a computational framework, previously reported in Lee et al. (2013), Levinshtein et al. (2009, 2013), that bridges the representation power of the medial axis and the need to recover and group an object's parts in a cluttered scene. Our framework is rooted in the idea that a maximally inscribed disc, the building block of a medial axis, can be modeled as a compact superpixel in the image. We evaluate the method on images of cluttered scenes.Comment: 10 pages, 8 figure

    Pushing the Boundaries of Boundary Detection using Deep Learning

    Get PDF
    In this work we show that adapting Deep Convolutional Neural Network training to the task of boundary detection can result in substantial improvements over the current state-of-the-art in boundary detection. Our contributions consist firstly in combining a careful design of the loss for boundary detection training, a multi-resolution architecture and training with external data to improve the detection accuracy of the current state of the art. When measured on the standard Berkeley Segmentation Dataset, we improve theoptimal dataset scale F-measure from 0.780 to 0.808 - while human performance is at 0.803. We further improve performance to 0.813 by combining deep learning with grouping, integrating the Normalized Cuts technique within a deep network. We also examine the potential of our boundary detector in conjunction with the task of semantic segmentation and demonstrate clear improvements over state-of-the-art systems. Our detector is fully integrated in the popular Caffe framework and processes a 320x420 image in less than a second.Comment: The previous version reported large improvements w.r.t. the LPO region proposal baseline, which turned out to be due to a wrong computation for the baseline. The improvements are currently less important, and are omitted. We are sorry if the reported results caused any confusion. We have also integrated reviewer feedback regarding human performance on the BSD benchmar

    Object Edge Contour Localisation Based on HexBinary Feature Matching

    Get PDF
    This paper addresses the issue of localising object edge contours in cluttered backgrounds to support robotics tasks such as grasping and manipulation and also to improve the potential perceptual capabilities of robot vision systems. Our approach is based on coarse-to-fine matching of a new recursively constructed hierarchical, dense, edge-localised descriptor, the HexBinary, based on the HexHog descriptor structure first proposed in [1]. Since Binary String image descriptors [2]– [5] require much lower computational resources, but provide similar or even better matching performance than Histogram of Orientated Gradient (HoG) descriptors, we have replaced the HoG base descriptor fields used in HexHog with Binary Strings generated from first and second order polar derivative approximations. The ALOI [6] dataset is used to evaluate the HexBinary descriptors which we demonstrate to achieve a superior performance to that of HexHoG [1] for pose refinement. The validation of our object contour localisation system shows promising results with correctly labelling ~86% of edgel positions and mis-labelling ~3%

    Neural Dynamics of Motion Perception: Direction Fields, Apertures, and Resonant Grouping

    Full text link
    A neural network model of global motion segmentation by visual cortex is described. Called the Motion Boundary Contour System (BCS), the model clarifies how ambiguous local movements on a complex moving shape are actively reorganized into a coherent global motion signal. Unlike many previous researchers, we analyse how a coherent motion signal is imparted to all regions of a moving figure, not only to regions at which unambiguous motion signals exist. The model hereby suggests a solution to the global aperture problem. The Motion BCS describes how preprocessing of motion signals by a Motion Oriented Contrast Filter (MOC Filter) is joined to long-range cooperative grouping mechanisms in a Motion Cooperative-Competitive Loop (MOCC Loop) to control phenomena such as motion capture. The Motion BCS is computed in parallel with the Static BCS of Grossberg and Mingolla (1985a, 1985b, 1987). Homologous properties of the Motion BCS and the Static BCS, specialized to process movement directions and static orientations, respectively, support a unified explanation of many data about static form perception and motion form perception that have heretofore been unexplained or treated separately. Predictions about microscopic computational differences of the parallel cortical streams V1 --> MT and V1 --> V2 --> MT are made, notably the magnocellular thick stripe and parvocellular interstripe streams. It is shown how the Motion BCS can compute motion directions that may be synthesized from multiple orientations with opposite directions-of-contrast. Interactions of model simple cells, complex cells, hypercomplex cells, and bipole cells are described, with special emphasis given to new functional roles in direction disambiguation for endstopping at multiple processing stages and to the dynamic interplay of spatially short-range and long-range interactions.Air Force Office of Scientific Research (90-0175); Defense Advanced Research Projects Agency (90-0083); Office of Naval Research (N00014-91-J-4100

    The constitution of visual perceptual units in the functional architecture of V1

    Full text link
    Scope of this paper is to consider a mean field neural model which takes into account the functional neurogeometry of the visual cortex modelled as a group of rotations and translations. The model generalizes well known results of Bressloff and Cowan which, in absence of input, accounts for hallucination patterns. The main result of our study consists in showing that in presence of a visual input, the eigenmodes of the linearized operator which become stable represent perceptual units present in the image. The result is strictly related to dimensionality reduction and clustering problems

    Learning long-range spatial dependencies with horizontal gated-recurrent units

    Full text link
    Progress in deep learning has spawned great successes in many engineering applications. As a prime example, convolutional neural networks, a type of feedforward neural networks, are now approaching -- and sometimes even surpassing -- human accuracy on a variety of visual recognition tasks. Here, however, we show that these neural networks and their recent extensions struggle in recognition tasks where co-dependent visual features must be detected over long spatial ranges. We introduce the horizontal gated-recurrent unit (hGRU) to learn intrinsic horizontal connections -- both within and across feature columns. We demonstrate that a single hGRU layer matches or outperforms all tested feedforward hierarchical baselines including state-of-the-art architectures which have orders of magnitude more free parameters. We further discuss the biological plausibility of the hGRU in comparison to anatomical data from the visual cortex as well as human behavioral data on a classic contour detection task.Comment: Published at NeurIPS 2018 https://papers.nips.cc/paper/7300-learning-long-range-spatial-dependencies-with-horizontal-gated-recurrent-unit

    Local and global gestalt laws: A neurally based spectral approach

    Get PDF
    A mathematical model of figure-ground articulation is presented, taking into account both local and global gestalt laws. The model is compatible with the functional architecture of the primary visual cortex (V1). Particularly the local gestalt law of good continuity is described by means of suitable connectivity kernels, that are derived from Lie group theory and are neurally implemented in long range connectivity in V1. Different kernels are compatible with the geometric structure of cortical connectivity and they are derived as the fundamental solutions of the Fokker Planck, the Sub-Riemannian Laplacian and the isotropic Laplacian equations. The kernels are used to construct matrices of connectivity among the features present in a visual stimulus. Global gestalt constraints are then introduced in terms of spectral analysis of the connectivity matrix, showing that this processing can be cortically implemented in V1 by mean field neural equations. This analysis performs grouping of local features and individuates perceptual units with the highest saliency. Numerical simulations are performed and results are obtained applying the technique to a number of stimuli.Comment: submitted to Neural Computatio

    Coplanar Repeats by Energy Minimization

    Full text link
    This paper proposes an automated method to detect, group and rectify arbitrarily-arranged coplanar repeated elements via energy minimization. The proposed energy functional combines several features that model how planes with coplanar repeats are projected into images and captures global interactions between different coplanar repeat groups and scene planes. An inference framework based on a recent variant of α\alpha-expansion is described and fast convergence is demonstrated. We compare the proposed method to two widely-used geometric multi-model fitting methods using a new dataset of annotated images containing multiple scene planes with coplanar repeats in varied arrangements. The evaluation shows a significant improvement in the accuracy of rectifications computed from coplanar repeats detected with the proposed method versus those detected with the baseline methods.Comment: 14 pages with supplemental materials attache
    • …
    corecore