156 research outputs found

    Estimating Sensor Motion from Wide-Field Optical Flow on a Log-Dipolar Sensor

    Full text link
    Log-polar image architectures, motivated by the structure of the human visual field, have long been investigated in computer vision for use in estimating motion parameters from an optical flow vector field. Practical problems with this approach have been: (i) dependence on assumed alignment of the visual and motion axes; (ii) sensitivity to occlusion form moving and stationary objects in the central visual field, where much of the numerical sensitivity is concentrated; and (iii) inaccuracy of the log-polar architecture (which is an approximation to the central 20°) for wide-field biological vision. In the present paper, we show that an algorithm based on generalization of the log-polar architecture; termed the log-dipolar sensor, provides a large improvement in performance relative to the usual log-polar sampling. Specifically, our algorithm: (i) is tolerant of large misalignmnet of the optical and motion axes; (ii) is insensitive to significant occlusion by objects of unknown motion; and (iii) represents a more correct analogy to the wide-field structure of human vision. Using the Helmholtz-Hodge decomposition to estimate the optical flow vector field on a log-dipolar sensor, we demonstrate these advantages, using synthetic optical flow maps as well as natural image sequences

    Study of Convolutional Neural Networks for Global Parametric Motion Estimation on Log-Polar Imagery

    Get PDF
    [EN] The problem of motion estimation from images has been widely studied in the past. Although many mature solutions exist, there are still open issues and challenges to be addressed. For instance, in spite of the well-known performance of convolutional neural networks (CNNs) in many computer vision problems, only very recent work has started to explore CNNs to learning to estimate motion, as an alternative to manually-designed algorithms. These few initial efforts, however, have focused on conventional Cartesian images, while other imaging models have not been studied. This work explores the yet unknown role of CNNs in estimating global parametric motion in log-polar images. Despite its favourable properties, estimating some motion components in this model has proven particularly challenging with past approaches. It is therefore highly important to understand how CNNs behave when their input are log-polar images, since they involve a complex mapping in the motion model, a polar image geometry, and space-variant resolution. To this end, a CNN is considered in this work for regressing the motion parameters. Experiments on existing image datasets using synthetic image deformations reveal that, interestingly, standard CNNs can successfully learn to estimate global parametric motion on log-polar images with accuracies comparable to or better than with Cartesian images.This work was supported in part by the Universitat Jaume I, Castellon, Spain, through the Pla de promocio de la investigacio, under Project UJI-B2018-44; and in part by the Spanish Ministerio de Ciencia, Innovacion y Universidades through the Research Network under Grant RED2018-102511-T.Traver, VJ.; Paredes Palacios, R. (2020). Study of Convolutional Neural Networks for Global Parametric Motion Estimation on Log-Polar Imagery. IEEE Access. 8:149122-149132. https://doi.org/10.1109/ACCESS.2020.3016030S149122149132

    Cross Pixel Optical Flow Similarity for Self-Supervised Learning

    Full text link
    We propose a novel method for learning convolutional neural image representations without manual supervision. We use motion cues in the form of optical flow, to supervise representations of static images. The obvious approach of training a network to predict flow from a single image can be needlessly difficult due to intrinsic ambiguities in this prediction task. We instead propose a much simpler learning goal: embed pixels such that the similarity between their embeddings matches that between their optical flow vectors. At test time, the learned deep network can be used without access to video or flow information and transferred to tasks such as image classification, detection, and segmentation. Our method, which significantly simplifies previous attempts at using motion for self-supervision, achieves state-of-the-art results in self-supervision using motion cues, competitive results for self-supervision in general, and is overall state of the art in self-supervised pretraining for semantic image segmentation, as demonstrated on standard benchmarks

    Event-Based Algorithms For Geometric Computer Vision

    Get PDF
    Event cameras are novel bio-inspired sensors which mimic the function of the human retina. Rather than directly capturing intensities to form synchronous images as in traditional cameras, event cameras asynchronously detect changes in log image intensity. When such a change is detected at a given pixel, the change is immediately sent to the host computer, where each event consists of the x,y pixel position of the change, a timestamp, accurate to tens of microseconds, and a polarity, indicating whether the pixel got brighter or darker. These cameras provide a number of useful benefits over traditional cameras, including the ability to track extremely fast motions, high dynamic range, and low power consumption. However, with a new sensing modality comes the need to develop novel algorithms. As these cameras do not capture photometric intensities, novel loss functions must be developed to replace the photoconsistency assumption which serves as the backbone of many classical computer vision algorithms. In addition, the relative novelty of these sensors means that there does not exist the wealth of data available for traditional images with which we can train learning based methods such as deep neural networks. In this work, we address both of these issues with two foundational principles. First, we show that the motion blur induced when the events are projected into the 2D image plane can be used as a suitable substitute for the classical photometric loss function. Second, we develop self-supervised learning methods which allow us to train convolutional neural networks to estimate motion without any labeled training data. We apply these principles to solve classical perception problems such as feature tracking, visual inertial odometry, optical flow and stereo depth estimation, as well as recognition tasks such as object detection and human pose estimation. We show that these solutions are able to utilize the benefits of event cameras, allowing us to operate in fast moving scenes with challenging lighting which would be incredibly difficult for traditional cameras

    Multiple Integrated Navigation Sensors for Improving Occupancy Grid FastSLAM

    Get PDF
    An autonomous vehicle must accurately observe its location within the environment to interact with objects and accomplish its mission. When its environment is unknown, the vehicle must construct a map detailing its surroundings while using it to maintain an accurate location. Such a vehicle is faced with the circularly defined Simultaneous Localization and Mapping (SLAM) problem. However difficult, SLAM is a critical component of autonomous vehicle exploration with applications to search and rescue. To current knowledge, this research presents the first SLAM solution to integrate stereo cameras, inertial measurements, and vehicle odometry into a Multiple Integrated Navigation Sensor (MINS) path. The implementation combines the MINS path with LIDAR to observe and map the environment using the FastSLAM algorithm. In real-world tests, a mobile ground vehicle equipped with these sensors completed a 140 meter loop around indoor hallways. This SLAM solution produces a path that closes the loop and remains within 1 meter of truth, reducing the error 92% from an image-inertial navigation system and 79% from odometry FastSLAM

    DART: Distribution Aware Retinal Transform for Event-based Cameras

    Full text link
    We introduce a generic visual descriptor, termed as distribution aware retinal transform (DART), that encodes the structural context using log-polar grids for event cameras. The DART descriptor is applied to four different problems, namely object classification, tracking, detection and feature matching: (1) The DART features are directly employed as local descriptors in a bag-of-features classification framework and testing is carried out on four standard event-based object datasets (N-MNIST, MNIST-DVS, CIFAR10-DVS, NCaltech-101). (2) Extending the classification system, tracking is demonstrated using two key novelties: (i) For overcoming the low-sample problem for the one-shot learning of a binary classifier, statistical bootstrapping is leveraged with online learning; (ii) To achieve tracker robustness, the scale and rotation equivariance property of the DART descriptors is exploited for the one-shot learning. (3) To solve the long-term object tracking problem, an object detector is designed using the principle of cluster majority voting. The detection scheme is then combined with the tracker to result in a high intersection-over-union score with augmented ground truth annotations on the publicly available event camera dataset. (4) Finally, the event context encoded by DART greatly simplifies the feature correspondence problem, especially for spatio-temporal slices far apart in time, which has not been explicitly tackled in the event-based vision domain.Comment: 12 pages, revision submitted to TPAMI in Nov 201
    corecore