628 research outputs found

    Optical flow by multi-scale annotated keypoints: A biological approach

    Get PDF
    Optical flow is the pattern of apparent motion of objects in a visual scene and the relative motion, or egomotion, of the observer in the scene. In this paper we present a new cortical model for optical flow. This model is based on simple, complex and end-stopped cells. Responses of end-stopped cells serve to detect keypoints and those of simple cells are used to detect orientations of underlying structures and to classify the junction type. By combining a hierarchical, multi-scale tree structure and saliency maps, moving objects can be segregated, their movement can be obtained, and they can be tracked over time. We also show that optical flow at coarse scales suffices to determine egomotion. The model is discussed in the context of an integrated cortical architecture which includes disparity in stereo vision

    Multi-scale cortical keypoints for realtime hand tracking and gesture recognition

    Get PDF
    Human-robot interaction is an interdisciplinary research area which aims at integrating human factors, cognitive psychology and robot technology. The ultimate goal is the development of social robots. These robots are expected to work in human environments, and to understand behavior of persons through gestures and body movements. In this paper we present a biological and realtime framework for detecting and tracking hands. This framework is based on keypoints extracted from cortical V1 end-stopped cells. Detected keypoints and the cells’ responses are used to classify the junction type. By combining annotated keypoints in a hierarchical, multi-scale tree structure, moving and deformable hands can be segregated, their movements can be obtained, and they can be tracked over time. By using hand templates with keypoints at only two scales, a hand’s gestures can be recognized

    The SmartVision local navigation aid for blind and visually impaired persons

    Get PDF
    The SmartVision prototype is a small, cheap and easily wearable navigation aid for blind and visually impaired persons. Its functionality addresses global navigation for guiding the user to some destiny, and local navigation for negotiating paths, sidewalks and corridors, with avoidance of static as well as moving obstacles. Local navigation applies to both in- and outdoor situations. In this article we focus on local navigation: the detection of path borders and obstacles in front of the user and just beyond the reach of the white cane, such that the user can be assisted in centering on the path and alerted to looming hazards. Using a stereo camera worn at chest height, a portable computer in a shoulder-strapped pouch or pocket and only one earphone or small speaker, the system is inconspicuous, it is no hindrence while walking with the cane, and it does not block normal surround sounds. The vision algorithms are optimised such that the system can work at a few frames per second

    Unsupervised learning of object landmarks by factorized spatial embeddings

    Full text link
    Learning automatically the structure of object categories remains an important open problem in computer vision. In this paper, we propose a novel unsupervised approach that can discover and learn landmarks in object categories, thus characterizing their structure. Our approach is based on factorizing image deformations, as induced by a viewpoint change or an object deformation, by learning a deep neural network that detects landmarks consistently with such visual effects. Furthermore, we show that the learned landmarks establish meaningful correspondences between different object instances in a category without having to impose this requirement explicitly. We assess the method qualitatively on a variety of object types, natural and man-made. We also show that our unsupervised landmarks are highly predictive of manually-annotated landmarks in face benchmark datasets, and can be used to regress these with a high degree of accuracy.Comment: To be published in ICCV 201

    Slim DensePose: Thrifty Learning from Sparse Annotations and Motion Cues

    Get PDF
    DensePose supersedes traditional landmark detectors by densely mapping image pixels to body surface coordinates. This power, however, comes at a greatly increased annotation time, as supervising the model requires to manually label hundreds of points per pose instance. In this work, we thus seek methods to significantly slim down the DensePose annotations, proposing more efficient data collection strategies. In particular, we demonstrate that if annotations are collected in video frames, their efficacy can be multiplied for free by using motion cues. To explore this idea, we introduce DensePose-Track, a dataset of videos where selected frames are annotated in the traditional DensePose manner. Then, building on geometric properties of the DensePose mapping, we use the video dynamic to propagate ground-truth annotations in time as well as to learn from Siamese equivariance constraints. Having performed exhaustive empirical evaluation of various data annotation and learning strategies, we demonstrate that doing so can deliver significantly improved pose estimation results over strong baselines. However, despite what is suggested by some recent works, we show that merely synthesizing motion patterns by applying geometric transformations to isolated frames is significantly less effective, and that motion cues help much more when they are extracted from videos.Comment: CVPR 201

    Deep Graph Matching via Blackbox Differentiation of Combinatorial Solvers

    Full text link
    Building on recent progress at the intersection of combinatorial optimization and deep learning, we propose an end-to-end trainable architecture for deep graph matching that contains unmodified combinatorial solvers. Using the presence of heavily optimized combinatorial solvers together with some improvements in architecture design, we advance state-of-the-art on deep graph matching benchmarks for keypoint correspondence. In addition, we highlight the conceptual advantages of incorporating solvers into deep learning architectures, such as the possibility of post-processing with a strong multi-graph matching solver or the indifference to changes in the training setting. Finally, we propose two new challenging experimental setups. The code is available at https://github.com/martius-lab/blackbox-deep-graph-matchingComment: ECCV 2020 conference pape
    • …
    corecore