
    Multiscale Fields of Patterns

    We describe a framework for defining high-order image models that can be used in a variety of applications. The approach involves modeling local patterns in a multiscale representation of an image. Local properties of a coarsened image reflect non-local properties of the original image. In the case of binary images, local properties are defined by the binary patterns observed over small neighborhoods around each pixel. With the multiscale representation we capture the frequency of patterns observed at different scales of resolution. This framework leads to expressive priors that depend on a relatively small number of parameters. For inference and learning we use an MCMC method for block sampling with very large blocks. We evaluate the approach with two example applications: one involves contour detection, the other binary segmentation. Comment: In NIPS 201
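    As a concrete illustration of pattern statistics over a multiscale representation, the sketch below coarsens a binary image by majority pooling and histograms the 512 possible 3x3 binary patterns at each level. The pooling rule, pattern size, and function names are our own illustrative choices, not the paper's model.

```python
# Illustrative sketch (not the authors' code): frequencies of binary 3x3
# patterns collected over a multiscale pyramid of a binary image.
import numpy as np

def coarsen(img):
    """Halve resolution by 2x2 majority vote (ties -> 1)."""
    h, w = img.shape
    img = img[: h - h % 2, : w - w % 2]
    blocks = img.reshape(h // 2, 2, w // 2, 2).sum(axis=(1, 3))
    return (blocks >= 2).astype(np.uint8)

def pattern_histogram(img):
    """Count each of the 512 binary 3x3 patterns by its bit code."""
    weights = (2 ** np.arange(9)).reshape(3, 3)
    h, w = img.shape
    hist = np.zeros(512, dtype=np.int64)
    for i in range(h - 2):
        for j in range(w - 2):
            code = int((img[i:i + 3, j:j + 3] * weights).sum())
            hist[code] += 1
    return hist

def multiscale_statistics(img, levels=3):
    """Pattern histograms at successively coarser scales."""
    stats = []
    for _ in range(levels):
        stats.append(pattern_histogram(img))
        img = coarsen(img)
    return stats

rng = np.random.default_rng(0)
image = (rng.random((64, 64)) > 0.5).astype(np.uint8)
for level, hist in enumerate(multiscale_statistics(image)):
    print(f"level {level}: {np.count_nonzero(hist)} distinct patterns")
```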

    Predictive coding: A Possible Explanation of Filling-in at the blind spot

    Filling-in at the blind spot is a perceptual phenomenon in which the visual system fills the informational void, which arises due to the absence of retinal input corresponding to the optic disc, with surrounding visual attributes. Though there is enough evidence to conclude that some kind of neural computation is involved in filling-in at the blind spot, especially in the early visual cortex, knowledge of the actual computational mechanism is far from complete. We have investigated the bar experiments and the associated filling-in phenomenon in light of the hierarchical predictive coding framework, where the blind spot is represented by the absence of early feed-forward connections. We recorded the responses of predictive estimator neurons at the blind-spot region in the V1 area of our three-level (LGN-V1-V2) model network. These responses agree with the results of earlier physiological studies, and using the generative model we also show that these response profiles indeed represent filling-in completion. This demonstrates that the predictive coding framework can account for the filling-in phenomena observed in several psychophysical and physiological experiments involving bar stimuli. These results suggest that filling-in could arise naturally from the computational principle of hierarchical predictive coding (HPC) of natural images. Comment: 23 pages, 9 figures
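    The sketch below is a minimal two-level linear predictive coding loop in the spirit of Rao and Ballard, not the paper's three-level (LGN-V1-V2) network. The "blind spot" is simulated by zeroing the feed-forward error for a few input units; the estimate there is driven purely by top-down prediction, which is the filling-in intuition. All sizes and weights are arbitrary illustrative choices.

```python
# Minimal predictive coding sketch under our own assumptions.
import numpy as np

rng = np.random.default_rng(1)
n_input, n_hidden = 16, 4
W = rng.normal(scale=0.5, size=(n_input, n_hidden))  # generative weights

x = np.zeros(n_input)
x[4:12] = 1.0                      # a "bar" stimulus
blind = np.zeros(n_input, bool)
blind[7:9] = True                  # units with no retinal input
x[blind] = 0.0

r = np.zeros(n_hidden)             # hidden-state estimate
lr = 0.05
for _ in range(200):
    pred = W @ r                   # top-down prediction of the input
    err = x - pred                 # feed-forward prediction error
    err[blind] = 0.0               # no feed-forward signal at the blind spot
    r += lr * (W.T @ err)          # gradient step on the squared error

# The blind-spot units report whatever the top-down prediction supplies.
print("top-down prediction at blind-spot units:", (W @ r)[blind])
```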

    Curve Reconstruction via the Global Statistics of Natural Curves

    Reconstructing the missing parts of a curve has been the subject of much computational research, with applications in image inpainting, object synthesis, etc. Different approaches to this problem are typically based on processes that seek visually pleasing or perceptually plausible completions. In this work we focus on reconstructing the underlying physically likely shape by utilizing the global statistics of natural curves. More specifically, we develop a reconstruction model that seeks the mean physical curve for a given inducer configuration. This simple model is both straightforward to compute and receptive to diverse additional information, but it requires enough samples for all curve configurations, a practical requirement that limits its effective utilization. To address this practical issue we explore and exploit statistical geometrical properties of natural curves; in particular, we show that in many cases the mean curve is scale invariant and oftentimes extensible. This, in turn, allows us to boost the number of examples and thus the robustness and applicability of the statistics. The reconstruction results are not only more physically plausible but also lead to important insights on the reconstruction problem, including an elegant explanation of why certain inducer configurations are more likely to yield consistent perceptual completions than others. Comment: CVPR version
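    To make the mean-curve idea concrete, the following sketch maps each sampled curve into a canonical frame (inducer endpoints at (0,0) and (1,0)), a step justified only when the scale invariance noted in the abstract holds, then resamples to a fixed number of points and averages pointwise. The canonicalization and resampling details are our assumptions, not the paper's procedure.

```python
# Illustrative mean-curve estimator under our own assumptions.
import numpy as np

def to_canonical(curve):
    """Similarity-transform a curve so its endpoints land at (0,0), (1,0)."""
    p, q = curve[0], curve[-1]
    d = q - p
    scale = np.hypot(*d)
    cos, sin = d / scale
    rot = np.array([[cos, sin], [-sin, cos]])    # rotates d onto the x-axis
    return (curve - p) @ rot.T / scale

def mean_curve(samples, n_points=50):
    """Average canonical curves after resampling each to n_points."""
    resampled = []
    for c in samples:
        c = to_canonical(np.asarray(c, float))
        t = np.linspace(0, 1, len(c))            # per-vertex parameter
        s = np.linspace(0, 1, n_points)
        resampled.append(np.column_stack(
            [np.interp(s, t, c[:, 0]), np.interp(s, t, c[:, 1])]))
    return np.mean(resampled, axis=0)

# Two toy "observed" curve fragments with arbitrary pose and scale.
t = np.linspace(0, np.pi, 30)
a = np.column_stack([t, 0.3 * np.sin(t)])
b = 2.0 * np.column_stack([t, 0.5 * np.sin(t)]) + np.array([5.0, 1.0])
print(mean_curve([a, b])[:3])
```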

    Contour Detection from Deep Patch-level Boundary Prediction

    In this paper, we present a novel approach to contour detection with Convolutional Neural Networks. A multi-scale CNN learning framework is designed to automatically learn the most relevant features for contour patch detection. Our method uses patch-level measurements to create contour maps with overlapping patches. We show that the proposed CNN is able to detect large-scale contours in an image efficiently. We further propose a guided filtering method to refine the contour maps produced from large-scale contours. Experimental results on the major contour benchmark databases demonstrate the effectiveness of the proposed technique. We show that our method achieves good detection of both fine-scale and large-scale contours. Comment: IEEE International Conference on Signal and Image Processing 201
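    A minimal sketch of the patch-to-map aggregation step: each overlapping patch receives a single contour score, which is accumulated over the pixels the patch covers and normalized by coverage. The CNN is replaced here by a gradient-energy stub; the patch size, stride, and stub are illustrative assumptions, not the paper's settings.

```python
# Illustrative patch-level aggregation; the scoring stub stands in for a CNN.
import numpy as np

def patch_score_stub(patch):
    """Stand-in for the CNN: score a patch by its mean gradient magnitude."""
    gy, gx = np.gradient(patch.astype(float))
    return float(np.mean(np.hypot(gx, gy)))

def contour_map(image, patch=16, stride=4):
    """Accumulate per-patch scores over covered pixels, normalize by coverage."""
    h, w = image.shape
    acc = np.zeros((h, w))
    cover = np.zeros((h, w))
    for i in range(0, h - patch + 1, stride):
        for j in range(0, w - patch + 1, stride):
            s = patch_score_stub(image[i:i + patch, j:j + patch])
            acc[i:i + patch, j:j + patch] += s
            cover[i:i + patch, j:j + patch] += 1.0
    return acc / np.maximum(cover, 1.0)

img = np.zeros((64, 64))
img[:, 32:] = 1.0                       # a vertical step edge
cmap = contour_map(img)
print("peak response column:", int(cmap.mean(axis=0).argmax()))
```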

    Computational models for image contour grouping

    Contours are one-dimensional curves which may correspond to meaningful entities such as object boundaries. Accurate contour detection simplifies many vision tasks such as object detection and image recognition. Due to the large variety of image content and contour topology, contours are often detected as edge fragments at first, followed by a second step known as "contour grouping" to connect them. Due to ambiguities in local image patches, contour grouping is essential for constructing a globally coherent contour representation. This thesis aims to group contours so that they are consistent with human perception. We draw inspiration from Gestalt principles, which describe the perceptual grouping ability of the human visual system. In particular, our work is most relevant to the principles of closure, similarity, and past experiences.

    The first part of our contribution is a new computational model for contour closure. Most existing contour grouping methods have focused on pixel-wise detection accuracy and ignored the psychological evidence for topological correctness. We propose a higher-order CRF model to achieve contour closure in the contour domain, together with an efficient inference method that is guaranteed to find integer solutions. Tested on the BSDS benchmark, our method achieves superior contour grouping performance, comparable precision-recall curves, and more visually pleasing results. Our work makes progress towards a better computational model of human perceptual grouping.

    The second part is an energy minimization framework for the salient contour detection problem. Region cues such as color/texture homogeneity and contour cues such as local contrast are both useful for this task. In order to capture both kinds of cues in a joint energy function, topological consistency between region and contour labels must be satisfied. Our technique makes use of the topological concept of winding numbers. By using a fast method for winding number computation, we find that a small number of linear constraints are sufficient for label consistency. Our method is instantiated by ratio-based energy functions. Due to cue integration, our method obtains improved results, and user interaction can be incorporated to improve them further.

    The third part of our contribution is an efficient category-level image contour detector. The objective is to detect contours which most likely belong to a prescribed category. Our method, which is based on three levels of shape representation and non-parametric Bayesian learning, shows flexibility in learning from either human-labeled edge images or unlabeled raw images. In both cases, our experiments obtain better contour detection results than competing methods. In addition, our training process is robust even with a limited number of training samples, whereas state-of-the-art methods require more training samples and often human intervention for new category training.

    Last but not least, in Chapter 7 we also show how to leverage contour information for symmetry detection. Our method is simple yet effective for detecting the symmetric axes of bilaterally symmetric objects in unsegmented natural scene images. Compared with methods based on feature points, our model often produces better results for images containing limited texture.
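    The winding number mentioned in the second part is a standard topological quantity; a minimal sketch for a polygonal contour follows. This is textbook code under our own assumptions (a simple closed polygon), not the thesis's fast computation method.

```python
# Winding number of a closed polygonal contour around a point.
import numpy as np

def winding_number(point, contour):
    """Sum the signed angles subtended by consecutive contour vertices."""
    v = np.asarray(contour, float) - np.asarray(point, float)
    v_next = np.roll(v, -1, axis=0)
    cross = v[:, 0] * v_next[:, 1] - v[:, 1] * v_next[:, 0]
    dot = (v * v_next).sum(axis=1)
    return np.arctan2(cross, dot).sum() / (2 * np.pi)

square = [(0, 0), (2, 0), (2, 2), (0, 2)]            # CCW closed contour
print(round(winding_number((1, 1), square)))          # 1: point is inside
print(round(winding_number((3, 3), square)))          # 0: point is outside
```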

    Solving Bongard Problems with a Visual Language and Pragmatic Reasoning

    More than 50 years ago, Bongard introduced 100 visual concept learning problems as a testbed for intelligent vision systems. These problems are now known as Bongard problems. Although they are well known in the cognitive science and AI communities, only moderate progress has been made towards building systems that can solve a substantial subset of them. In the system presented here, visual features are extracted through image processing and then translated into a symbolic visual vocabulary. We introduce a formal language that allows representing complex visual concepts based on this vocabulary. Using this language and Bayesian inference, complex visual concepts can be induced from the examples that are provided in each Bongard problem. Unlike in other concept learning problems, the examples from which concepts are induced in Bongard problems are not random; instead, they are carefully chosen to communicate the concept, hence requiring pragmatic reasoning. Taking pragmatic reasoning into account, we find good agreement between the concepts with high posterior probability and the solutions formulated by Bongard himself. While this approach is far from solving all Bongard problems, it solves the largest fraction yet.
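    A toy sketch of Bayesian concept induction with a stand-in for pragmatic reasoning: the size principle, under which deliberately chosen examples favor concepts with smaller extensions. The hypothesis space, scene representation, and scoring below are our simplifications, not the paper's formal visual language.

```python
# Toy Bayesian concept induction with the size principle.
from fractions import Fraction

# Toy "scenes": each object is a (shape, size) pair.
universe = [(s, z) for s in ("circle", "triangle", "square")
                   for z in ("small", "large")]

hypotheses = {
    "circle":       lambda o: o[0] == "circle",
    "small":        lambda o: o[1] == "small",
    "small circle": lambda o: o == ("circle", "small"),
    "anything":     lambda o: True,
}

def posterior(examples):
    """Posterior over hypotheses under a uniform prior and the size principle."""
    scores = {}
    for name, h in hypotheses.items():
        ext = [o for o in universe if h(o)]
        if all(o in ext for o in examples):
            # Each example is assumed drawn uniformly from the extension,
            # so smaller (more specific) concepts score higher.
            scores[name] = Fraction(1, len(ext)) ** len(examples)
        else:
            scores[name] = Fraction(0)
    total = sum(scores.values())
    return {n: s / total for n, s in scores.items()}

examples = [("circle", "small"), ("circle", "small")]
for name, p in sorted(posterior(examples).items(), key=lambda kv: -kv[1]):
    print(f"{name:13s} {float(p):.3f}")
```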

    SurfNet: Generating 3D shape surfaces using deep residual networks

    3D shape models are naturally parameterized using vertices and faces, i.e., composed of polygons forming a surface. However, current 3D learning paradigms for predictive and generative tasks using convolutional neural networks focus on a voxelized representation of the object. Lifting convolution operators from the traditional 2D to 3D results in high computational overhead with little additional benefit, as most of the geometry information is contained on the surface boundary. Here we study the problem of directly generating the 3D shape surface of rigid and non-rigid shapes using deep convolutional neural networks. We develop a procedure to create consistent "geometry images" representing the shape surface of a category of 3D objects. We then use this consistent representation for category-specific shape surface generation from a parametric representation or an image, by developing novel extensions of deep residual networks for the task of geometry image generation. Our experiments indicate that our network learns a meaningful representation of shape surfaces, allowing it to interpolate between shape orientations and poses, invent new shape surfaces, and reconstruct 3D shape surfaces from previously unseen images. Comment: CVPR 2017 paper
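    For readers unfamiliar with the building blocks, here is a minimal residual block of the kind such networks extend, applied to a 3-channel geometry image (a regular grid storing x, y, z surface coordinates, so ordinary 2D convolutions apply). Channel counts and depth are arbitrary illustrative choices; this is not SurfNet's architecture.

```python
# Minimal residual block applied to a geometry-image tensor (PyTorch).
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))   # identity skip connection

# 3-channel geometry image -> features -> 3-channel geometry image.
net = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1),
    ResBlock(), ResBlock(),
    nn.Conv2d(64, 3, 3, padding=1),
)
geom = torch.randn(1, 3, 64, 64)              # batch of one 64x64 geometry image
print(net(geom).shape)                        # torch.Size([1, 3, 64, 64])
```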