5,602 research outputs found

    Pseudo-labels for Supervised Learning on Dynamic Vision Sensor Data, Applied to Object Detection under Ego-motion

    Full text link
    In recent years, dynamic vision sensors (DVS), also known as event-based cameras or neuromorphic sensors, have seen increased use due to various advantages over conventional frame-based cameras. Using principles inspired by the retina, its high temporal resolution overcomes motion blurring, its high dynamic range overcomes extreme illumination conditions and its low power consumption makes it ideal for embedded systems on platforms such as drones and self-driving cars. However, event-based data sets are scarce and labels are even rarer for tasks such as object detection. We transferred discriminative knowledge from a state-of-the-art frame-based convolutional neural network (CNN) to the event-based modality via intermediate pseudo-labels, which are used as targets for supervised learning. We show, for the first time, event-based car detection under ego-motion in a real environment at 100 frames per second with a test average precision of 40.3% relative to our annotated ground truth. The event-based car detector handles motion blur and poor illumination conditions despite not explicitly trained to do so, and even complements frame-based CNN detectors, suggesting that it has learnt generalized visual representations

    Self-supervised Blur Detection from Synthetically Blurred Scenes

    Get PDF
    Blur detection aims at segmenting the blurred areas of a given image. Recent deep learning-based methods approach this problem by learning an end-to-end mapping between the blurred input and a binary mask representing the localization of its blurred areas. Nevertheless, the effectiveness of such deep models is limited due to the scarcity of datasets annotated in terms of blur segmentation, as blur annotation is labour intensive. In this work, we bypass the need for such annotated datasets for end-to-end learning, and instead rely on object proposals and a model for blur generation in order to produce a dataset of synthetically blurred images. This allows us to perform self-supervised learning over the generated image and ground truth blur mask pairs using CNNs, defining a framework that can be employed in purely self-supervised, weakly supervised or semi-supervised configurations. Interestingly, experimental results of such setups over the largest blur segmentation datasets available show that this approach achieves state of the art results in blur segmentation, even without ever observing any real blurred image.This research was partially funded by the Basque Government’s Industry Department under the ELKARTEK program’s project ONKOIKER under agreement KK2018/00090. We thank the Spanish project TIN2016- 79717-R and mention Generalitat de Catalunya CERCA Program

    Online Feature Selection for Visual Tracking

    Get PDF
    Object tracking is one of the most important tasks in many applications of computer vision. Many tracking methods use a fixed set of features ignoring that appearance of a target object may change drastically due to intrinsic and extrinsic factors. The ability to dynamically identify discriminative features would help in handling the appearance variability by improving tracking performance. The contribution of this work is threefold. Firstly, this paper presents a collection of several modern feature selection approaches selected among filter, embedded, and wrapper methods. Secondly, we provide extensive tests regarding the classification task intended to explore the strengths and weaknesses of the proposed methods with the goal to identify the right candidates for online tracking. Finally, we show how feature selection mechanisms can be successfully employed for ranking the features used by a tracking system, maintaining high frame rates. In particular, feature selection mounted on the Adaptive Color Tracking (ACT) system operates at over 110 FPS. This work demonstrates the importance of feature selection in online and realtime applications, resulted in what is clearly a very impressive performance, our solutions improve by 3% up to 7% the baseline ACT while providing superior results compared to 29 state-of-the-art tracking methods

    Sparse-to-Continuous: Enhancing Monocular Depth Estimation using Occupancy Maps

    Full text link
    This paper addresses the problem of single image depth estimation (SIDE), focusing on improving the quality of deep neural network predictions. In a supervised learning scenario, the quality of predictions is intrinsically related to the training labels, which guide the optimization process. For indoor scenes, structured-light-based depth sensors (e.g. Kinect) are able to provide dense, albeit short-range, depth maps. On the other hand, for outdoor scenes, LiDARs are considered the standard sensor, which comparatively provides much sparser measurements, especially in areas further away. Rather than modifying the neural network architecture to deal with sparse depth maps, this article introduces a novel densification method for depth maps, using the Hilbert Maps framework. A continuous occupancy map is produced based on 3D points from LiDAR scans, and the resulting reconstructed surface is projected into a 2D depth map with arbitrary resolution. Experiments conducted with various subsets of the KITTI dataset show a significant improvement produced by the proposed Sparse-to-Continuous technique, without the introduction of extra information into the training stage.Comment: Accepted. (c) 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other work
    • …
    corecore