6,191 research outputs found

    GASP : Geometric Association with Surface Patches

    Full text link
    A fundamental challenge to sensory processing tasks in perception and robotics is the problem of obtaining data associations across views. We present a robust solution for ascertaining potentially dense surface patch (superpixel) associations, requiring just range information. Our approach involves decomposition of a view into regularized surface patches. We represent them as sequences expressing geometry invariantly over their superpixel neighborhoods, as uniquely consistent partial orderings. We match these representations through an optimal sequence comparison metric based on the Damerau-Levenshtein distance - enabling robust association with quadratic complexity (in contrast to hitherto employed joint matching formulations which are NP-complete). The approach is able to perform under wide baselines, heavy rotations, partial overlaps, significant occlusions and sensor noise. The technique does not require any priors -- motion or otherwise, and does not make restrictive assumptions on scene structure and sensor movement. It does not require appearance -- is hence more widely applicable than appearance reliant methods, and invulnerable to related ambiguities such as textureless or aliased content. We present promising qualitative and quantitative results under diverse settings, along with comparatives with popular approaches based on range as well as RGB-D data.Comment: International Conference on 3D Vision, 201

    On the Design and Analysis of Multiple View Descriptors

    Full text link
    We propose an extension of popular descriptors based on gradient orientation histograms (HOG, computed in a single image) to multiple views. It hinges on interpreting HOG as a conditional density in the space of sampled images, where the effects of nuisance factors such as viewpoint and illumination are marginalized. However, such marginalization is performed with respect to a very coarse approximation of the underlying distribution. Our extension leverages on the fact that multiple views of the same scene allow separating intrinsic from nuisance variability, and thus afford better marginalization of the latter. The result is a descriptor that has the same complexity of single-view HOG, and can be compared in the same manner, but exploits multiple views to better trade off insensitivity to nuisance variability with specificity to intrinsic variability. We also introduce a novel multi-view wide-baseline matching dataset, consisting of a mixture of real and synthetic objects with ground truthed camera motion and dense three-dimensional geometry

    Vision, Action, and Make-Perceive

    Get PDF
    In this paper, I critically assess the enactive account of visual perception recently defended by Alva NoĂ« (2004). I argue inter alia that the enactive account falsely identifies an object’s apparent shape with its 2D perspectival shape; that it mistakenly assimilates visual shape perception and volumetric object recognition; and that it seriously misrepresents the constitutive role of bodily action in visual awareness. I argue further that noticing an object’s perspectival shape involves a hybrid experience combining both perceptual and imaginative elements – an act of what I call ‘make-perceive.

    Single-Image Depth Prediction Makes Feature Matching Easier

    Get PDF
    Good local features improve the robustness of many 3D re-localization and multi-view reconstruction pipelines. The problem is that viewing angle and distance severely impact the recognizability of a local feature. Attempts to improve appearance invariance by choosing better local feature points or by leveraging outside information, have come with pre-requisites that made some of them impractical. In this paper, we propose a surprisingly effective enhancement to local feature extraction, which improves matching. We show that CNN-based depths inferred from single RGB images are quite helpful, despite their flaws. They allow us to pre-warp images and rectify perspective distortions, to significantly enhance SIFT and BRISK features, enabling more good matches, even when cameras are looking at the same scene but in opposite directions.Comment: 14 pages, 7 figures, accepted for publication at the European conference on computer vision (ECCV) 202

    The robot's vista space : a computational 3D scene analysis

    Get PDF
    Swadzba A. The robot's vista space : a computational 3D scene analysis. Bielefeld (Germany): Bielefeld University; 2011.The space that can be explored quickly from a fixed view point without locomotion is known as the vista space. In indoor environments single rooms and room parts follow this definition. The vista space plays an important role in situations with agent-agent interaction as it is the directly surrounding environment in which the interaction takes place. A collaborative interaction of the partners in and with the environment requires that both partners know where they are, what spatial structures they are talking about, and what scene elements they are going to manipulate. This thesis focuses on the analysis of a robot's vista space. Mechanisms for extracting relevant spatial information are developed which enable the robot to recognize in which place it is, to detect the scene elements the human partner is talking about, and to segment scene structures the human is changing. These abilities are addressed by the proposed holistic, aligned, and articulated modeling approach. For a smooth human-robot interaction, the computed models should be aligned to the partner's representations. Therefore, the design of the computational models is based on the combination of psychological results from studies on human scene perception with basic physical properties of the perceived scene and the perception itself. The holistic modeling realizes a categorization of room percepts based on the observed 3D spatial layout. Room layouts have room type specific features and fMRI studies have shown that some of the human brain areas being active in scene recognition are sensitive to the 3D geometry of a room. With the aligned modeling, the robot is able to extract the hierarchical scene representation underlying a scene description given by a human tutor. Furthermore, it is able to ground the inferred scene elements in its own visual perception of the scene. This modeling follows the assumption that cognition and language schematize the world in the same way. This is visible in the fact that a scene depiction mainly consists of relations between an object and its supporting structure or between objects located on the same supporting structure. Last, the articulated modeling equips the robot with a methodology for articulated scene part extraction and fast background learning under short and disturbed observation conditions typical for human-robot interaction scenarios. Articulated scene parts are detected model-less by observing scene changes caused by their manipulation. Change detection and background learning are closely coupled because change is defined phenomenologically as variation of structure. This means that change detection involves a comparison of currently visible structures with a representation in memory. In range sensing this comparison can be nicely implement as subtraction of these two representations. The three modeling approaches enable the robot to enrich its visual perceptions of the surrounding environment, the vista space, with semantic information about meaningful spatial structures useful for further interaction with the environment and the human partner
    • 

    corecore