Dropout Distillation for Efficiently Estimating Model Confidence
We propose an efficient way to output better calibrated uncertainty scores
from neural networks. The Distilled Dropout Network (DDN) makes standard
(non-Bayesian) neural networks more introspective by adding a new training loss
which prevents them from being overconfident. Our method is more efficient than
Bayesian neural networks or model ensembles which, despite providing more
reliable uncertainty scores, are more cumbersome to train and slower to test.
We evaluate DDN on the task of image classification on the CIFAR-10 dataset
and show that our calibration results are competitive even when compared to 100
Monte Carlo samples from a dropout network, while also increasing the
classification accuracy. We also propose better calibration within the state-of-the-art
Faster R-CNN object detection framework and show, using the COCO
dataset, that DDN helps train better-calibrated object detectors.
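To make the baseline mentioned in this abstract concrete, below is a minimal sketch of Monte Carlo dropout: dropout is kept active at test time and the softmax outputs of several stochastic forward passes are averaged to obtain a predictive distribution. The network, layer sizes, and sample count are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Illustrative CIFAR-10 classifier with dropout; the architecture is an
# assumption, not the network used in the paper.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 512), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(512, 10),
)

def mc_dropout_predict(model, x, num_samples=100):
    """Average softmax over stochastic forward passes (MC dropout)."""
    model.train()  # keep dropout active at test time
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(num_samples)]
        )
    return probs.mean(dim=0)  # predictive distribution over classes

# Example: a batch of four CIFAR-10-sized images.
x = torch.randn(4, 3, 32, 32)
p = mc_dropout_predict(model, x)        # shape (4, 10)
confidence, prediction = p.max(dim=-1)  # per-image confidence and class
```

The DDN described above instead trains a standard network with an additional loss so that a single deterministic forward pass already produces well-calibrated scores, avoiding the many stochastic passes at test time.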
Predicting and improving perception performance for robotics applications
Perception systems are often the core component of a robotics framework as their
ability to accurately interpret sensor data is essential for autonomy. The goal of
this thesis is to estimate and improve the perception performance of a mobile robot
across large areas of operation, particularly when there are no guarantees that
the testing data distribution will match the training distribution. Such situations
are prevalent for autonomous mobile robots operating outdoors under a variety
of environmental conditions.
This thesis explores the adaptability of vision systems by training place-specific
models which outperform generic ones. We show that it is possible to train such
models in a self-supervised fashion using geometric scene constraints without relying
on costly image annotations.
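One way to read the self-supervision from geometric scene constraints mentioned above: a registered depth image and a known ground plane can label pixels as drivable surface or obstacle without human annotation, and those pseudo-labels can then fine-tune a place-specific model. The plane alignment, thresholds, and label convention below are assumptions made for illustration, not the thesis's exact pipeline.

```python
import numpy as np

def pseudo_labels_from_depth(depth, fx, fy, cx, cy, ground_height=0.0, tol=0.15):
    """Label each pixel as drivable ground (1) or obstacle (0) from geometry.

    Assumes the camera pose is known so that the y-axis of the back-projected
    points is aligned with gravity (an assumption made to keep the example short).
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    y = (v - cy) * z / fy                    # height coordinate after alignment
    labels = (np.abs(y - ground_height) < tol).astype(np.uint8)
    labels[z <= 0] = 255                     # ignore pixels with no valid depth
    return labels

# These geometric pseudo-labels can then supervise a place-specific
# segmentation network with an ordinary cross-entropy loss, ignoring label 255.
depth = np.random.uniform(0.5, 20.0, size=(480, 640)).astype(np.float32)
labels = pseudo_labels_from_depth(depth, fx=500.0, fy=500.0, cx=320.0, cy=240.0,
                                  ground_height=1.4)
```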
This thesis also explores the awareness that vision systems have of their own
capability to make correct predictions at any given moment in time. We approach
this problem from two different vantage points: firstly, through performance records
which model perception performance as a function of location and appearance
and, secondly, through intrinsic model uncertainty, or introspection as introduced
by [Grimmett et al., 2016].
Performance records allow an autonomous agent to estimate the likelihood
of making a mistake during future traversals of the same place. In a use case
in which autonomy is offered or denied, we show that an agent is able to
estimate when its confidence is low, deny autonomy, and thereby reduce the
number of perception mistakes it makes.
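As an illustration of the performance-record idea, the sketch below keeps a per-place tally of past successes and failures and denies autonomy when the estimated probability of a mistake exceeds a threshold. The data structure, smoothing, and threshold are assumptions for the example; the thesis models performance as a function of both location and appearance.

```python
from collections import defaultdict

class PerformanceRecord:
    """Per-place record of past perception outcomes (illustrative only)."""

    def __init__(self, max_mistake_rate=0.2):
        self.successes = defaultdict(int)
        self.failures = defaultdict(int)
        self.max_mistake_rate = max_mistake_rate

    def update(self, place_id, made_mistake):
        """Log the outcome of one traversal of a place."""
        if made_mistake:
            self.failures[place_id] += 1
        else:
            self.successes[place_id] += 1

    def mistake_probability(self, place_id):
        """Estimate P(mistake | place) with add-one (Laplace) smoothing."""
        f, s = self.failures[place_id], self.successes[place_id]
        return (f + 1) / (f + s + 2)

    def offer_autonomy(self, place_id):
        """Deny autonomy where the predicted mistake rate is too high."""
        return self.mistake_probability(place_id) <= self.max_mistake_rate

record = PerformanceRecord()
record.update("junction_12", made_mistake=True)
record.update("junction_12", made_mistake=False)
print(record.offer_autonomy("junction_12"))  # False with these counts
```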
Introspection refers to the ability of a model to associate an appropriate
assessment of confidence with any test case. We introduce an efficient way to obtain
well-calibrated and reliable uncertainty scores from neural networks. Our method is
more computationally efficient than Bayesian neural networks or model ensembles
which, despite being well-calibrated, are more cumbersome to train and slower to
test. Additionally, we believe that we are the first to propose more introspective
detectors within a state-of-the-art object detection framework such as Faster R-CNN.
This thesis proposes vision systems that are not only more accurate but whose
failures can also be more reliably predicted. In doing so, we advocate practical
solutions that often make use of tools specific to robotics such as additional sensing
modalities or localisation maps pertaining to an autonomous vehicle, but we also
touch upon machine learning techniques such as Bayesian deep learning.
While striving for high accuracy remains a crucial endeavour, we believe that,
given the safety-critical nature of robot perception, estimating reliability,
introspection, and failure diagnosis are indispensable when operating in
cluttered, complex, and ever-changing environments.
Probabilistic Future Prediction for Video Scene Understanding
We present a novel deep learning architecture for probabilistic future prediction from video. We predict the future semantics, geometry and motion of complex real-world urban scenes and use this representation to control an autonomous vehicle. This work is the first to jointly
predict ego-motion, static scene, and the motion of dynamic agents in a
probabilistic manner, which allows sampling consistent, highly probable
futures from a compact latent space. Our model learns a representation from RGB video with a spatio-temporal convolutional module. The
learned representation can be explicitly decoded to future semantic segmentation, depth, and optical flow, in addition to being an input to a
learnt driving policy. To model the stochasticity of the future, we introduce a conditional variational approach which minimises the divergence
between the present distribution (what could happen given what we have
seen) and the future distribution (what we observe actually happens).
During inference, diverse futures are generated by sampling from the
present distribution.
Toshiba Europe, grant G10045
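The "divergence between the present distribution and the future distribution" mentioned above can be made concrete with a small sketch: both are modelled here as diagonal Gaussians over a latent space, and the training loss penalises the KL divergence from the future (which also sees what actually happened) to the present (which sees only the past). The dimensionalities and the networks producing the distribution parameters are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

LATENT_DIM = 32  # illustrative latent size, not the paper's

class DistributionHead(nn.Module):
    """Maps features to the mean and log-variance of a diagonal Gaussian."""
    def __init__(self, feature_dim):
        super().__init__()
        self.fc = nn.Linear(feature_dim, 2 * LATENT_DIM)

    def forward(self, features):
        mu, log_var = self.fc(features).chunk(2, dim=-1)
        return mu, log_var

def kl_divergence(mu_q, log_var_q, mu_p, log_var_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians."""
    var_q, var_p = log_var_q.exp(), log_var_p.exp()
    kl = 0.5 * (log_var_p - log_var_q
                + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)
    return kl.sum(dim=-1).mean()

# Training: pull the present distribution (past observations only) towards the
# future distribution (past plus the observed future).
present_head, future_head = DistributionHead(256), DistributionHead(256)
past_feats, past_and_future_feats = torch.randn(8, 256), torch.randn(8, 256)
mu_p, lv_p = present_head(past_feats)
mu_f, lv_f = future_head(past_and_future_feats)
loss = kl_divergence(mu_f, lv_f, mu_p, lv_p)

# Inference: sample diverse futures from the present distribution alone.
z = mu_p + (0.5 * lv_p).exp() * torch.randn_like(mu_p)
```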
HerbDisc: Towards Lifelong Robotic Object Discovery
Our long-term goal is to develop a general solution to the Lifelong Robotic Object Discovery (LROD) problem: to discover new objects in the environment while the robot operates, for as long as the robot operates. In this paper, we consider the first step towards LROD: we automatically process the raw data stream of an entire workday of a robotic agent to discover objects. Our key contribution to achieve this goal is to incorporate domain knowledge—robotic metadata—in the discovery process, in addition to visual data. We propose a general graph-based formulation for LROD in which generic domain knowledge is encoded as constraints. To make long-term object discovery feasible, we encode into our formulation the natural constraints and non-visual sensory information in service robotics. A key advantage of our generic formulation is that we can add, modify
Exploiting Domain Knowledge for Object Discovery
In this paper, we consider the problem of Lifelong Robotic Object Discovery (LROD) as the long-term goal of discovering novel objects in the environment while the robot operates, for as long as the robot operates. As a first step towards LROD, we automatically process the raw video stream of an entire workday of a robotic agent to discover objects. We claim that the key to achieve this goal is to incorporate domain knowledge whenever available, in order to detect and adapt to changes in the environment. We propose a general graph-based formulation for LROD in which generic domain knowledge is encoded as constraints. Our formulation enables new sources of domain knowledge (metadata) to be added dynamically to the system, as they become available or as conditions change. By adding domain knowledge, we discover 2.7x more objects and decrease processing time 190 times. Our optimized implementation, HerbDisc, processes 6 h 20 min of RGBD video of real human environments in 18 min 30 s, and discovers 121 correct novel objects with their 3D models.
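A rough sketch of the graph-based formulation described in these abstracts: object candidates become nodes, visual similarity gives edges, and domain knowledge (metadata such as timestamps or robot location) is applied as constraints that remove implausible edges before the graph is grouped into objects. The candidate attributes, thresholds, and the use of union-find grouping are assumptions for illustration, not HerbDisc's actual algorithm.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass
class Candidate:
    """An object candidate; the fields are illustrative metadata."""
    cid: int
    feature: tuple      # stand-in for a visual descriptor
    timestamp: float    # seconds since the start of the workday
    location: str       # e.g. a room identifier from the robot's map

def visual_similarity(a, b):
    # Toy similarity on the stand-in descriptors.
    return sum(x * y for x, y in zip(a.feature, b.feature))

def satisfies_constraints(a, b, max_dt=300.0):
    """Domain-knowledge constraints: same place and observed close in time."""
    return a.location == b.location and abs(a.timestamp - b.timestamp) <= max_dt

def discover_objects(candidates, sim_threshold=0.5):
    """Link candidates that are visually similar AND pass the constraints,
    then group them with a union-find; each group is a discovered object."""
    parent = {c.cid: c.cid for c in candidates}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(candidates, 2):
        if satisfies_constraints(a, b) and visual_similarity(a, b) >= sim_threshold:
            parent[find(a.cid)] = find(b.cid)

    groups = {}
    for c in candidates:
        groups.setdefault(find(c.cid), []).append(c.cid)
    return list(groups.values())

cands = [Candidate(0, (1.0, 0.0), 10.0, "kitchen"),
         Candidate(1, (0.9, 0.1), 40.0, "kitchen"),
         Candidate(2, (1.0, 0.0), 50.0, "office")]
print(discover_objects(cands))   # [[0, 1], [2]] with these toy values
```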
Learn from experience: Probabilistic prediction of perception performance to avoid failure
Model-Based Imitation Learning for Urban Driving
An accurate model of the environment and the dynamic agents acting in it offers great potential for improving motion planning. We present MILE: a Model-based Imitation LEarning approach to jointly learn a model of the world and a policy for autonomous driving. Our method leverages 3D geometry as an inductive bias and learns a highly compact latent space directly from high-resolution videos of expert demonstrations. Our model is trained on an offline corpus of urban driving data, without any online interaction with the environment. MILE improves upon prior state-of-the-art by 31% in driving score on the CARLA simulator when deployed in a completely new town and new weather conditions. Our model can predict diverse and plausible states and actions that can be interpretably decoded to bird's-eye view semantic segmentation. Further, we demonstrate that it can execute complex driving manoeuvres from plans entirely predicted in imagination. Our approach is the first camera-only method that models static scene, dynamic scene, and ego-behaviour in an urban driving environment. The code and model weights are available at https://github.com/wayveai/mile.
Toshiba Europe grant G10045
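To give a flavour of model-based imitation learning as summarised above, the sketch below rolls a recurrent latent state forward from camera features and decodes it to both an action (imitation loss against the expert) and a bird's-eye-view segmentation (reconstruction loss), trained purely offline. Every module, dimension, and loss term here is an assumption for illustration; for the actual architecture see the repository linked above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyWorldModelPolicy(nn.Module):
    """Toy latent world model plus driving policy (illustrative, not MILE)."""
    def __init__(self, feat_dim=128, latent_dim=64, action_dim=2, bev_classes=8):
        super().__init__()
        self.latent_dim, self.bev_classes = latent_dim, bev_classes
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, feat_dim))
        self.transition = nn.GRUCell(feat_dim, latent_dim)   # latent dynamics
        self.policy = nn.Linear(latent_dim, action_dim)      # e.g. steer, accel
        self.bev_decoder = nn.Linear(latent_dim, bev_classes * 32 * 32)

    def forward(self, video):                                 # (B, T, 3, H, W)
        B, T = video.shape[:2]
        h = video.new_zeros(B, self.latent_dim)
        actions, bevs = [], []
        for t in range(T):                                    # roll the latent state
            h = self.transition(self.encoder(video[:, t]), h)
            actions.append(self.policy(h))
            bevs.append(self.bev_decoder(h).view(B, self.bev_classes, 32, 32))
        return torch.stack(actions, 1), torch.stack(bevs, 1)

# One offline training step on expert demonstrations (no environment interaction).
model = TinyWorldModelPolicy()
video = torch.randn(2, 4, 3, 96, 96)                 # short expert clips
expert_actions = torch.randn(2, 4, 2)
bev_labels = torch.randint(0, 8, (2, 4, 32, 32))
pred_actions, pred_bev = model(video)
loss = (F.mse_loss(pred_actions, expert_actions)      # imitation loss
        + F.cross_entropy(pred_bev.flatten(0, 1), bev_labels.flatten(0, 1)))
loss.backward()
```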