500 research outputs found

    Deformable kernels for early vision

    Early vision algorithms often have a first stage of linear filtering that 'extracts' from the image information at multiple scales of resolution and multiple orientations. A common difficulty in the design and implementation of such schemes is that one feels compelled to discretize coarsely the space of scales and orientations in order to reduce computation and storage costs. A technique is presented that allows: 1) computing the best approximation of a given family using linear combinations of a small number of 'basis' functions; and 2) describing all finite-dimensional families, i.e., the families of filters for which a finite-dimensional representation is possible with no error. The technique is based on singular value decomposition and may be applied to generating filters in arbitrary dimensions and subject to arbitrary deformations. The relevant functional analysis results are reviewed and precise conditions for the decomposition to be feasible are stated. Experimental results are presented that demonstrate the applicability of the technique to generating multi-orientation, multi-scale 2D edge-detection kernels. The implementation issues are also discussed.
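
    The SVD-based approximation described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the kernel family (a first derivative of a Gaussian rotated to each orientation), the kernel size, `sigma`, and the function names `oriented_kernel` and `deformable_basis` are all assumptions chosen for illustration. The sketch stacks one flattened kernel per orientation into a matrix, takes its SVD, and keeps the leading right singular vectors as basis kernels.

```python
import numpy as np

def oriented_kernel(theta, size=15, sigma=2.0):
    """Hypothetical test family: first derivative of a Gaussian along direction theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    u = x * np.cos(theta) + y * np.sin(theta)      # coordinate along theta
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2))    # isotropic Gaussian envelope
    return -u / sigma**2 * g

def deformable_basis(thetas, rank, size=15, sigma=2.0):
    """Best rank-`rank` approximation (in the least-squares sense) of the sampled family."""
    # One row per orientation, one column per pixel.
    F = np.stack([oriented_kernel(t, size, sigma).ravel() for t in thetas])
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    basis = Vt[:rank].reshape(rank, size, size)    # basis kernels
    coeffs = U[:, :rank] * s[:rank]                # per-orientation weights
    # Relative approximation error from the discarded singular values.
    rel_err = np.sqrt(np.sum(s[rank:] ** 2) / np.sum(s ** 2))
    return basis, coeffs, rel_err

thetas = np.linspace(0, 2 * np.pi, 36, endpoint=False)
basis, coeffs, rel_err = deformable_basis(thetas, rank=2)
print(f"relative error with 2 basis kernels: {rel_err:.2e}")
```

    For this particular family two basis kernels reproduce every orientation up to floating-point error, which makes it a small instance of the finite-dimensional case named in point 2. Families that also vary in scale would generally need more basis kernels, with the error read off the discarded singular values.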

    Vision of a Visipedia

    The web is not perfect: while text is easily searched and organized, pictures (the vast majority of the bits that one can find online) are not. In order to see how one could improve the web and make pictures first-class citizens of the web, I explore the idea of Visipedia, a visual interface for Wikipedia that is able to answer visual queries and enables experts to contribute and organize visual knowledge. Five distinct groups of humans would interact through Visipedia: users, experts, editors, visual workers, and machine vision scientists. The latter would gradually build automata able to interpret images. I explore some of the technical challenges involved in making Visipedia happen. I argue that Visipedia will likely grow organically, combining state-of-the-art machine vision with human labor.

    Detecting and localizing edges composed of steps, peaks and roofs

    It is well known that the projection of depth or orientation discontinuities in a physical scene results in image intensity edges which are not ideal step edges but are more typically a combination of step, peak and roof profiles. However, most edge detection schemes ignore the composite nature of these edges, resulting in systematic errors in detection and localization. We address the problem of detecting and localizing these edges, while at the same time also solving the problem of false responses in smoothly shaded regions with constant gradient of the image brightness. We show that a class of nonlinear filters, known as quadratic filters, are appropriate for this task, while linear filters are not. A series of performance criteria are derived for characterizing the SNR, localization and multiple responses of these filters in a manner analogous to Canny's criteria for linear filters. A two-dimensional version of the approach is developed which has the property of being able to represent multiple edges at the same location and determine the orientation of each to any desired precision. This permits junctions to be localized without rounding. Experimental results are presented.

    Measuring and Predicting Importance of Objects in Our Visual World

    Associating keywords with images automatically is an approachable and useful goal for visual recognition researchers. Keywords are distinctive and informative objects. We argue that keywords need to be sorted by 'importance', which we define as the probability of being mentioned first by an observer. We propose a method for measuring the 'importance' of words using the object labels that multiple human observers give an everyday scene photograph. We model object naming as drawing balls from an urn, and fit this model to estimate 'importance'; this combines order and frequency, enabling precise prediction under limited human labeling. We explore the relationship between the importance of an object in a particular image and the area, centrality, and saliency of the corresponding image patches. Furthermore, our data shows that many words are associated with even simple environments, and that few frequently appearing objects are shared across environments.

    Depth from Brightness of Moving Images

    In this note we describe a method for recursively estimating the depth of a scene from a sequence of images. The inputs to the estimator are brightness values at a number of locations of a grid in a video image, and the output is the relative (scaled) depth corresponding to each image point. The estimator is invariant with respect to the motion of the viewer, in the sense that the motion parameters are not part of the state of the estimator and therefore the estimates do not depend on motion as long as there is enough parallax (the translational velocity is nonzero). This scheme is a "direct" version of another algorithm previously presented by the authors for estimating depth from point-feature correspondence independent of motion.

    Robust and Efficient Recovery of Rigid Motion from Subspace Constraints Solved using Recursive Identification of Nonlinear Implicit Systems

    The problem of estimating rigid motion from projections may be characterized using a nonlinear dynamical system, composed of the rigid motion transformation and the perspective map. The time derivative of the output of such a system, which is also called the "motion field", is bilinear in the motion parameters, and may be used to specify a subspace constraint on either the direction of translation or the inverse depth of the observed points. Estimating motion may then be formulated as an optimization task constrained on such a subspace. Heeger and Jepson [5], who first introduced this constraint, solve the optimization task using an extensive search over the possible directions of translation. We reformulate the optimization problem in a systems-theoretic framework as the identification of a dynamic system in exterior differential form with parameters on a differentiable manifold, and use techniques which pertain to nonlinear estimation and identification theory to perform the optimization task in a principled manner. The general technique for addressing such identification problems [14] has been used successfully in addressing other problems in computational vision [13, 12]. The application of the general method [14] results in a recursive and pseudo-optimal solution of the motion problem, which has robustness properties far superior to other existing techniques we have implemented. By releasing the constraint that the visible points lie in front of the observer, we may explain some psychophysical effects on the nonrigid percept of rigidly moving shapes. Experiments on real and synthetic image sequences show very promising results in terms of robustness, accuracy and computational efficiency.

    Reducing “Structure from Motion”: a general framework for dynamic vision. 1. Modeling

    The literature on recursive estimation of structure and motion from monocular image sequences comprises a large number of apparently unrelated models and estimation techniques. We propose a framework that allows us to derive and compare all models by following the idea of dynamical system reduction. The “natural” dynamic model, derived from the rigidity constraint and the projection model, is first reduced by explicitly decoupling structure (depth) from motion. Then, implicit decoupling techniques are explored, which consist of imposing that some function of the unknown parameters is held constant. By appropriately choosing such a function, not only can we account for models seen so far in the literature, but we can also derive novel ones.

    Motion from "X" by Compensating "Y"

    This paper analyzes the geometry of the visual motion estimation problem in relation to transformations of the input (images) that stabilize particular output functions such as the motion of a point, a line and a plane in the image. By casting the problem within the popular "epipolar geometry", we provide a common framework for including constraints such as point, line or plane fixation by just considering "slices" of the parameter manifold. The models we provide can be used for estimating motion from a batch using the preferred optimization techniques, or for defining dynamic filters that estimate motion from a causal sequence. We discuss methods for performing the necessary compensation by either controlling the support of the camera or by pre-processing the images. The compensation algorithms may also be used for recursively fitting a plane in 3-D either from point features or directly from brightness. Conversely, they may be used for estimating motion relative to the plane independent of its parameters.

    A network for multiscale image segmentation

    Detecting edges of objects in their images is a basic problem in computational vision. The scale-space technique introduced by Witkin [11] provides a means of using local and global reasoning in locating edges. This approach has a major drawback: it is difficult to obtain accurately the locations of the 'semantically meaningful' edges. We have refined the definition of scale-space, and introduced a class of algorithms for implementing it based on anisotropic diffusion [9]. The algorithms involve simple, local operations replicated over the image, making parallel hardware implementation feasible. In this paper we present the major ideas behind the use of scale space and anisotropic diffusion for edge detection, show that anisotropic diffusion can enhance edges, suggest a network implementation of anisotropic diffusion, and provide design criteria for obtaining networks performing scale space and edge detection. The results of a software implementation are shown.
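
    The "simple, local operations replicated over the image" can be sketched as a four-neighbour diffusion update. This is a minimal illustration, not the network or software implementation from the paper: the exponential conduction function, the parameters `kappa` and `step`, the replicated-border handling, and the function name `anisotropic_diffusion` are assumptions made for the example.

```python
import numpy as np

def anisotropic_diffusion(img, n_iter=20, kappa=0.1, step=0.2):
    """Edge-preserving smoothing with a 4-neighbour update (illustrative sketch).

    kappa sets the contrast above which differences are treated as edges;
    step should stay at or below 0.25 for this explicit scheme to be stable.
    """
    u = np.asarray(img, dtype=float).copy()
    for _ in range(n_iter):
        p = np.pad(u, 1, mode="edge")   # replicate borders: no flow across them
        # Differences to the north, south, west and east neighbours.
        diffs = (
            p[:-2, 1:-1] - u,
            p[2:, 1:-1] - u,
            p[1:-1, :-2] - u,
            p[1:-1, 2:] - u,
        )
        # Conduction is near 1 for small differences and near 0 across strong edges.
        flux = sum(np.exp(-(d / kappa) ** 2) * d for d in diffs)
        u += step * flux
    return u

# Noisy vertical step edge: left half near 0, right half near 1.
rng = np.random.default_rng(0)
image = np.zeros((32, 32))
image[:, 16:] = 1.0
image += rng.normal(scale=0.02, size=image.shape)
smoothed = anisotropic_diffusion(image)
print(image[:, :16].std(), smoothed[:, :16].std())
```

    Each pixel reads only its four neighbours, so the same update could in principle run in parallel at every image location. In this example the noise inside each half is smoothed while the step between the halves stays sharp, because the conduction term is close to zero for differences much larger than `kappa`.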