
    Dropout Distillation for Efficiently Estimating Model Confidence

    We propose an efficient way to output better calibrated uncertainty scores from neural networks. The Distilled Dropout Network (DDN) makes standard (non-Bayesian) neural networks more introspective by adding a new training loss which prevents them from being overconfident. Our method is more efficient than Bayesian neural networks or model ensembles which, despite providing more reliable uncertainty scores, are more cumbersome to train and slower to test. We evaluate DDN on the task of image classification on the CIFAR-10 dataset and show that our calibration results are competitive with 100 Monte Carlo samples from a dropout network while also increasing classification accuracy. We also propose better calibration within the state-of-the-art Faster R-CNN object detection framework and show, using the COCO dataset, that DDN helps train better calibrated object detectors.
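    A minimal, hypothetical sketch of the distillation idea described in this abstract, in PyTorch-style Python: a deterministic student network is trained against the averaged predictive distribution of Monte Carlo dropout samples from a teacher, alongside the usual label loss. The function names, the weighting `alpha`, and the exact combination of terms are illustrative assumptions, not the paper's published formulation.

```python
# Hypothetical sketch of dropout distillation for calibration (not the paper's exact loss).
# A deterministic "student" matches the averaged predictive distribution of
# Monte Carlo dropout samples from a "teacher", in addition to the usual label loss.
import torch
import torch.nn.functional as F

def mc_dropout_targets(teacher, x, n_samples=100):
    """Average softmax over n_samples stochastic forward passes (dropout kept on)."""
    teacher.train()  # keep dropout active at inference time
    with torch.no_grad():
        probs = torch.stack([F.softmax(teacher(x), dim=1) for _ in range(n_samples)])
    return probs.mean(dim=0)

def ddn_loss(student_logits, labels, teacher_probs, alpha=0.5):
    """Cross-entropy on labels plus KL to the MC dropout predictive distribution."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(F.log_softmax(student_logits, dim=1), teacher_probs, reduction="batchmean")
    return (1 - alpha) * ce + alpha * kl
```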

    Predicting and improving perception performance for robotics applications

    Perception systems are often the core component of a robotics framework as their ability to accurately interpret sensor data is essential for autonomy. The goal of this thesis is to estimate and improve the perception performance of a mobile robot across large areas of operation, particularly when there are no guarantees that the testing data distribution will match the training distribution. Such situations are prevalent for autonomous mobile robots operating outdoors under a variety of environmental conditions. This thesis explores the adaptability of vision systems by training place-specific models which outperform generic ones. We show that it is possible to train such models in a self-supervised fashion using geometric scene constraints without relying on costly image annotations. This thesis also explores the awareness that vision systems have of their own capability to make correct predictions at any given moment in time. We approach this problem from two different vantage points: firstly, through performance records which model perception performance as a function of location and appearance and, secondly, through intrinsic model uncertainty, or introspection as introduced by [Grimmett et al., 2016]. Performance records allow an autonomous agent to estimate the likelihood of making a mistake during future traversals of the same place. In a use-case scenario regarding offering or denying autonomy, we show that an agent is able to estimate when its confidence levels are low, deny autonomy, and reduce the number of perception mistakes made. Introspection refers to the ability of a model to associate an appropriate assessment of confidence with any test case. We introduce an efficient way to obtain well-calibrated and reliable uncertainty scores from neural networks. Our method is more computationally efficient than Bayesian neural networks or model ensembles which, despite being well-calibrated, are more cumbersome to train and slower to test. Additionally, we believe that we are the first to propose more introspective detectors within a state-of-the-art object detection framework such as Faster R-CNN. This thesis proposes vision systems that are not only more accurate but also whose failures can be more reliably predicted. In doing so, we advocate practical solutions that often make use of tools specific to robotics such as additional sensing modalities or localisation maps pertaining to an autonomous vehicle, but we also touch upon machine learning techniques such as Bayesian deep learning. While striving for high accuracy remains a crucial endeavour, given the safety-critical nature of robot perception, we believe that estimating reliability, introspection, and diagnosing failure are indispensable when operating in cluttered, complex, and ever-changing environments.
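    As a hedged illustration of the "performance records" idea mentioned in this abstract (not the thesis's actual implementation), the sketch below keeps a per-place tally of past perception successes and denies autonomy when the smoothed success estimate falls below a threshold. The class name, smoothing prior, and threshold value are all assumptions made for the example.

```python
# Illustrative sketch: a per-place performance record that estimates the probability
# of a perception mistake at a given place and gates the offer of autonomy on it.
from collections import defaultdict

class PerformanceRecord:
    def __init__(self):
        self.stats = defaultdict(lambda: {"correct": 0, "total": 0})

    def update(self, place_id, was_correct):
        s = self.stats[place_id]
        s["total"] += 1
        s["correct"] += int(was_correct)

    def success_rate(self, place_id, prior=0.5, prior_weight=2.0):
        # Smoothed estimate so rarely visited places fall back towards the prior.
        s = self.stats[place_id]
        return (s["correct"] + prior * prior_weight) / (s["total"] + prior_weight)

def offer_autonomy(record, place_id, threshold=0.9):
    """Deny autonomy when the estimated perception success rate is below threshold."""
    return record.success_rate(place_id) >= threshold
```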

    Probabilistic Future Prediction for Video Scene Understanding

    We present a novel deep learning architecture for probabilistic future prediction from video. We predict the future semantics, geometry and motion of complex real-world urban scenes and use this representation to control an autonomous vehicle. This work is the first to jointly predict ego-motion, static scene, and the motion of dynamic agents in a probabilistic manner, which allows sampling consistent, highly probable futures from a compact latent space. Our model learns a representation from RGB video with a spatio-temporal convolutional module. The learned representation can be explicitly decoded to future semantic segmentation, depth, and optical flow, in addition to being an input to a learnt driving policy. To model the stochasticity of the future, we introduce a conditional variational approach which minimises the divergence between the present distribution (what could happen given what we have seen) and the future distribution (what we observe actually happens). During inference, diverse futures are generated by sampling from the present distribution.
    Toshiba Europe, grant G10045
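    The conditional variational objective described above can be sketched, under assumptions, as a divergence between a "future" distribution (conditioned on observed and future frames) and a "present" distribution (conditioned on observed frames only), with futures sampled from the present distribution at test time. The Gaussian parameterisation, the KL direction, and the function signature below are illustrative guesses, not the paper's exact formulation.

```python
# Hedged sketch of the divergence term described above (not the paper's code).
# "Present" is conditioned on observed frames only; "future" also sees what happens next.
# Training pulls the present distribution towards observed futures; at test time,
# diverse futures are sampled from the present distribution alone.
import torch.distributions as D

def present_future_divergence(mu_present, sigma_present, mu_future, sigma_future):
    present = D.Normal(mu_present, sigma_present)   # conditioned on observed frames
    future = D.Normal(mu_future, sigma_future)      # conditioned on observed + future frames
    # Sum the per-dimension KL over the latent dimensions, average over the batch.
    return D.kl_divergence(future, present).sum(dim=-1).mean()
```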

    HerbDisc: Towards Lifelong Robotic Object Discovery

    Our long-term goal is to develop a general solution to the Lifelong Robotic Object Discovery (LROD) problem: to discover new objects in the environment while the robot operates, for as long as the robot operates. In this paper, we consider the first step towards LROD: we automatically process the raw data stream of an entire workday of a robotic agent to discover objects. Our key contribution to achieve this goal is to incorporate domain knowledge—robotic metadata—in the discovery process, in addition to visual data. We propose a general graph-based formulation for LROD in which generic domain knowledge is encoded as constraints. To make long-term object discovery feasible, we encode into our formulation the natural constraints and non-visual sensory information in service robotics. A key advantage of our generic formulation is that we can add or modify sources of domain knowledge dynamically, as they become available or as conditions change.

    Exploiting Domain Knowledge for Object Discovery

    In this paper, we consider the problem of Lifelong Robotic Object Discovery (LROD) as the long-term goal of discovering novel objects in the environment while the robot operates, for as long as the robot operates. As a first step towards LROD, we automatically process the raw video stream of an entire workday of a robotic agent to discover objects. We claim that the key to achieving this goal is to incorporate domain knowledge whenever available, in order to detect and adapt to changes in the environment. We propose a general graph-based formulation for LROD in which generic domain knowledge is encoded as constraints. Our formulation enables new sources of domain knowledge—metadata—to be added dynamically to the system, as they become available or as conditions change. By adding domain knowledge, we discover 2.7× more objects and decrease processing time 190 times. Our optimized implementation, HerbDisc, processes 6 h 20 min of RGBD video of real human environments in 18 min 30 s, and discovers 121 correct novel objects with their 3D models.
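    A rough, hypothetical sketch of such a graph-based formulation with metadata constraints might look like the following: candidate object observations become nodes, and an edge is kept only when visual similarity is high and every available domain-knowledge constraint agrees, after which connected components yield object candidates. The candidate fields (`room`, `timestamp`), the constraint functions, and the use of networkx are assumptions for illustration, not HerbDisc's actual encoding.

```python
# Illustrative sketch (not HerbDisc's implementation): candidate object observations become
# graph nodes; an edge is kept only if visual similarity is high AND every available
# metadata constraint (e.g. same room, close in time) is satisfied.
import itertools
import networkx as nx

def build_discovery_graph(candidates, visual_similarity, constraints, sim_threshold=0.7):
    """candidates: list of dicts with the fields used by the constraint functions (assumed)."""
    g = nx.Graph()
    g.add_nodes_from(range(len(candidates)))
    for i, j in itertools.combinations(range(len(candidates)), 2):
        if visual_similarity(candidates[i], candidates[j]) < sim_threshold:
            continue
        if all(c(candidates[i], candidates[j]) for c in constraints):
            g.add_edge(i, j)
    return g

# Example metadata constraints that can be added or removed as conditions change:
same_room = lambda a, b: a["room"] == b["room"]
close_in_time = lambda a, b: abs(a["timestamp"] - b["timestamp"]) < 300.0
# clusters = nx.connected_components(build_discovery_graph(cands, sim, [same_room, close_in_time]))
```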
