Optical flow by multi-scale annotated keypoints: A biological approach
Optical flow is the pattern of apparent motion in a visual scene, caused both by moving objects and by the relative motion, or egomotion, of the observer. In this paper we present a new cortical model for optical flow. The model is based on simple, complex and end-stopped cells: responses of end-stopped cells serve to detect keypoints, while those of simple cells are used to detect the orientations of underlying structures and to classify the junction type. By combining a hierarchical, multi-scale tree structure with saliency maps, moving objects can be segregated, their movement can be obtained, and they can be tracked over time. We also show that optical flow at coarse scales suffices to determine egomotion. The model is discussed in the context of an integrated cortical architecture which includes disparity in stereo vision.
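The claim that coarse-scale flow suffices for egomotion can be illustrated with a minimal sketch (hypothetical, not the authors' code): given keypoints matched between two frames at a coarse scale, a robust average of their displacements approximates the observer's dominant image motion, since independently moving objects contribute only outlier displacements.

```python
import numpy as np

def egomotion_from_coarse_flow(kp_prev, kp_curr):
    """Estimate the dominant image translation (an egomotion proxy) from
    keypoints matched at a coarse scale between consecutive frames.

    kp_prev, kp_curr: (N, 2) arrays of matched keypoint positions.
    """
    flow = kp_curr - kp_prev           # per-keypoint displacement vectors
    return np.median(flow, axis=0)     # median is robust to moving objects
```

The median, rather than the mean, keeps a few independently moving objects from biasing the egomotion estimate.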
Multi-scale cortical keypoints for realtime hand tracking and gesture recognition
Human-robot interaction is an interdisciplinary research area which aims at integrating human factors, cognitive psychology and robot technology, with the ultimate goal of developing social robots. These robots are expected to work in human environments and to understand the behavior of persons through gestures and body movements. In this paper we present a biological and realtime framework for detecting and tracking hands. The framework is based on keypoints extracted from cortical V1 end-stopped cells; detected keypoints and the cells' responses are used to classify the junction type. By combining annotated keypoints in a hierarchical, multi-scale tree structure, moving and deformable hands can be segregated, their movements can be obtained, and they can be tracked over time. By using hand templates with keypoints at only two scales, a hand's gestures can be recognized.
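Recognition against keypoint templates at two scales can be sketched roughly as follows (a hypothetical illustration; the function names and the Chamfer-style set distance are our own, not the paper's):

```python
import numpy as np

def template_distance(kps, template):
    # symmetric nearest-neighbour (Chamfer-style) distance between
    # two sets of 2-D keypoints, shapes (N, 2) and (M, 2)
    d = np.linalg.norm(kps[:, None, :] - template[None, :, :], axis=2)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def classify_gesture(kps_by_scale, templates):
    """kps_by_scale: {scale: (N, 2) keypoints detected at that scale};
    templates: {gesture: {scale: (M, 2) template keypoints}}, two scales each.
    Returns the gesture whose template matches best across both scales."""
    scores = {g: sum(template_distance(kps_by_scale[s], t[s]) for s in t)
              for g, t in templates.items()}
    return min(scores, key=scores.get)
```

Summing the match score over only two scales keeps the comparison cheap enough for realtime use, which is the point the abstract emphasizes.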
The SmartVision local navigation aid for blind and visually impaired persons
The SmartVision prototype is a small, cheap and easily wearable navigation aid for blind and visually impaired persons. Its functionality addresses global navigation for guiding the user to some destination, and local navigation for negotiating paths, sidewalks and corridors, with avoidance of static as well as moving obstacles. Local navigation applies to both in- and outdoor situations. In this article we focus on local navigation: the detection of path borders and obstacles in front of the user and just beyond the reach of the white cane, such that the user can be assisted in centering on the path and alerted to looming hazards. Using a stereo camera worn at chest height, a portable computer in a shoulder-strapped pouch or pocket and only one earphone or small speaker, the system is inconspicuous, it is no hindrance while walking with the cane, and it does not block normal surround sounds. The vision algorithms are optimised such that the system can work at a few frames per second.
Unsupervised learning of object landmarks by factorized spatial embeddings
Learning automatically the structure of object categories remains an
important open problem in computer vision. In this paper, we propose a novel
unsupervised approach that can discover and learn landmarks in object
categories, thus characterizing their structure. Our approach is based on
factorizing image deformations, as induced by a viewpoint change or an object
deformation, by learning a deep neural network that detects landmarks
consistently with such visual effects. Furthermore, we show that the learned
landmarks establish meaningful correspondences between different object
instances in a category without having to impose this requirement explicitly.
We assess the method qualitatively on a variety of object types, natural and
man-made. We also show that our unsupervised landmarks are highly predictive of
manually-annotated landmarks in face benchmark datasets, and can be used to
regress these with a high degree of accuracy.Comment: To be published in ICCV 201
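The core constraint behind "detecting landmarks consistently with such visual effects" can be stated as an equivariance penalty (a simplified sketch in our own notation, not the authors' implementation): landmarks detected on a deformed image should coincide with the deformation applied to landmarks detected on the original image.

```python
import numpy as np

def equivariance_loss(landmarks_orig, landmarks_warped, warp):
    """Penalize a landmark detector for being inconsistent with a known
    image deformation.

    landmarks_orig:   (K, 2) landmarks detected on the original image
    landmarks_warped: (K, 2) landmarks detected on the warped image
    warp: callable mapping a (K, 2) point array through the deformation
    """
    return float(np.mean((landmarks_warped - warp(landmarks_orig)) ** 2))
```

Minimizing such a loss over random warps is what lets structure emerge without any manual landmark labels.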
Slim DensePose: Thrifty Learning from Sparse Annotations and Motion Cues
DensePose supersedes traditional landmark detectors by densely mapping image
pixels to body surface coordinates. This power, however, comes at a greatly
increased annotation time, as supervising the model requires to manually label
hundreds of points per pose instance. In this work, we thus seek methods to
significantly slim down the DensePose annotations, proposing more efficient
data collection strategies. In particular, we demonstrate that if annotations
are collected in video frames, their efficacy can be multiplied for free by
using motion cues. To explore this idea, we introduce DensePose-Track, a
dataset of videos where selected frames are annotated in the traditional
DensePose manner. Then, building on geometric properties of the DensePose
mapping, we use the video dynamic to propagate ground-truth annotations in time
as well as to learn from Siamese equivariance constraints. Having performed
exhaustive empirical evaluation of various data annotation and learning
strategies, we demonstrate that doing so can deliver significantly improved
pose estimation results over strong baselines. However, despite what is
suggested by some recent works, we show that merely synthesizing motion
patterns by applying geometric transformations to isolated frames is
significantly less effective, and that motion cues help much more when they are
extracted from videos.Comment: CVPR 201
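Propagating ground-truth annotations in time rests on a simple observation: a point's body-surface (UV) coordinate is invariant as the body moves, so only its image position needs updating. A minimal sketch under that assumption (hypothetical code, with optical flow assumed given, not the paper's pipeline):

```python
import numpy as np

def propagate_annotations(points, uv_labels, flow):
    """Carry sparse ground-truth annotations from an annotated frame to a
    neighbouring frame using a dense optical-flow field.

    points:    (N, 2) annotated (x, y) pixel coordinates
    uv_labels: (N, 2) body-surface (UV) coordinates, invariant under motion
    flow:      (H, W, 2) per-pixel displacement from this frame to the next
    """
    idx = np.round(points).astype(int)
    moved = points + flow[idx[:, 1], idx[:, 0]]  # move each point along the flow
    return moved, uv_labels                      # surface labels transfer unchanged
```

Each annotated frame thus supervises its neighbours for free, which is how the abstract's "efficacy multiplied by motion cues" cashes out.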
Deep Graph Matching via Blackbox Differentiation of Combinatorial Solvers
Building on recent progress at the intersection of combinatorial optimization
and deep learning, we propose an end-to-end trainable architecture for deep
graph matching that contains unmodified combinatorial solvers. Using the
presence of heavily optimized combinatorial solvers together with some
improvements in architecture design, we advance state-of-the-art on deep graph
matching benchmarks for keypoint correspondence. In addition, we highlight the
conceptual advantages of incorporating solvers into deep learning
architectures, such as the possibility of post-processing with a strong
multi-graph matching solver or the indifference to changes in the training
setting. Finally, we propose two new challenging experimental setups. The code
is available at https://github.com/martius-lab/blackbox-deep-graph-matchingComment: ECCV 2020 conference pape
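The trick that makes an unmodified combinatorial solver trainable end-to-end is the blackbox-differentiation gradient: since the solver's output is piecewise constant, its true gradient is zero almost everywhere, so the backward pass instead calls the solver a second time on a perturbed input and takes a finite difference of the two discrete solutions. A minimal numpy sketch (the `solver` interface here is our own simplification):

```python
import numpy as np

def blackbox_grad(solver, w, grad_y, lam=10.0):
    """Informative gradient of a piecewise-constant combinatorial solver.

    solver:  maps a continuous cost vector w to a discrete solution y(w)
    grad_y:  incoming gradient of the loss w.r.t. the solver output
    lam:     interpolation strength trading informativeness vs. fidelity
    """
    y = solver(w)                      # forward solution
    y_lam = solver(w + lam * grad_y)   # solution of the perturbed problem
    return -(y - y_lam) / lam          # finite-difference gradient w.r.t. w
```

The returned vector is nonzero exactly when the perturbation flips the solver's discrete decision, which is what gives the architecture a useful learning signal.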