Multimodal learning from visual and remotely sensed data
Autonomous vehicles are often deployed to perform exploration and monitoring missions in unseen environments. In such applications, there is often a compromise between the information richness and the acquisition cost of different sensor modalities. Visual data is usually very information-rich, but requires in-situ acquisition with the robot. In contrast, remotely sensed data has a larger range and footprint, and may be available prior to a mission. In order to effectively and efficiently explore and monitor the environment, it is critical to make use of all of the sensory information available to the robot. One important application is the use of an Autonomous Underwater Vehicle (AUV) to survey the ocean floor. AUVs can take high resolution in-situ photographs of the sea floor, which can be used to classify different regions into various habitat classes that summarise the observed physical and biological properties. This is known as benthic habitat mapping. However, since AUVs can only image a tiny fraction of the ocean floor, habitat mapping is usually performed with remotely sensed bathymetry (ocean depth) data, obtained from shipborne multibeam sonar. With the recent surge in unsupervised feature learning and deep learning techniques, a number of previous techniques have investigated the concept of multimodal learning: capturing the relationship between different sensor modalities in order to perform classification and other inference tasks. This thesis proposes related techniques for visual and remotely sensed data, applied to the task of autonomous exploration and monitoring with an AUV. Doing so enables more accurate classification of the benthic environment, and also assists autonomous survey planning. The first contribution of this thesis is to apply unsupervised feature learning techniques to marine data. 
The proposed techniques are used to extract features from image and bathymetric data separately, and their performance is compared to that of features traditionally used for each sensor modality. The second contribution is the development of a multimodal learning architecture that captures the relationship between the two modalities. The model is robust to missing modalities, which means it can extract better features for large-scale benthic habitat mapping, where only bathymetry is available. The model is used to perform classification with various combinations of modalities, demonstrating that multimodal learning provides a large performance improvement over the baseline case. The third contribution is an extension of the standard learning architecture using a gated feature learning model, which enables the model to better capture the "one-to-many" relationship between visual and bathymetric data. This opens up further inference capabilities, with the ability to predict visual features from bathymetric data, which allows image-based queries. Such queries are useful for AUV survey planning, especially when supervised labels are unavailable. The final contribution is the novel derivation of a number of information-theoretic measures to aid survey planning. The proposed measures predict the utility of unobserved areas, in terms of the amount of expected additional visual information. As such, they are able to produce utility maps over a large region that can be used by the AUV to determine the most informative locations from a set of candidate missions. The models proposed in this thesis are validated through extensive experiments on real marine data. Furthermore, the introduced techniques have applications in various other areas within robotics. As such, this thesis concludes with a discussion on the broader implications of these contributions, and the future research directions that arise as a result of this work.
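The cross-modal idea above — predicting visual features from bathymetry where only remotely sensed data is available — can be illustrated with a deliberately simplified sketch. The linear ridge regressor, feature dimensions, and synthetic data below are illustrative assumptions for exposition only, not the gated multimodal architecture the thesis actually proposes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 co-located samples of low-dimensional bathymetry
# features (e.g. depth, rugosity, slope) and higher-dimensional visual features.
bathy = rng.normal(size=(200, 3))
W_true = rng.normal(size=(3, 8))
visual = bathy @ W_true + 0.1 * rng.normal(size=(200, 8))

# Cross-modal ridge regression: learn to predict visual features from
# bathymetry alone, so inference still works over the large areas where
# only the remotely sensed modality has been observed.
ridge = 1e-3
W = np.linalg.solve(bathy.T @ bathy + ridge * np.eye(3), bathy.T @ visual)

pred = bathy @ W
r2 = 1 - ((visual - pred) ** 2).sum() / ((visual - visual.mean(0)) ** 2).sum()
```

A map of such predicted visual features could then back image-based queries over unsurveyed regions, which is the role the thesis's gated model plays.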
Attention-Privileged Reinforcement Learning
Image-based Reinforcement Learning is known to suffer from poor sample
efficiency and generalisation to unseen visuals such as distractors
(task-independent aspects of the observation space). Visual domain
randomisation encourages transfer by training over visual factors of variation
that may be encountered in the target domain. This increases learning
complexity, can negatively impact learning rate and performance, and requires
knowledge of potential variations during deployment. In this paper, we
introduce Attention-Privileged Reinforcement Learning (APRiL) which uses a
self-supervised attention mechanism to significantly alleviate these drawbacks:
by focusing on task-relevant aspects of the observations, attention provides
robustness to distractors as well as significantly increased learning
efficiency. APRiL trains two attention-augmented actor-critic agents: one
purely based on image observations, available across training and transfer
domains; and one with access to privileged information (such as environment
states) available only during training. Experience is shared between both
agents and their attention mechanisms are aligned. The image-based policy can
then be deployed without access to privileged information. We experimentally
demonstrate accelerated and more robust learning on a diverse set of domains,
leading to improved final performance for environments both within and outside
the training distribution.
Comment: Published at Conference on Robot Learning (CoRL) 202
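The alignment of the two agents' attention mechanisms can be sketched in miniature: a KL divergence term pulls the image-based agent's attention distribution toward the privileged agent's, and the resulting mask gates the observation. The grid size, logits, and KL form below are illustrative assumptions, not APRiL's exact losses or architecture.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(1)

# Hypothetical attention logits over an 8x8 grid of observation regions,
# one set from the privileged (state-based) agent, one from the image agent.
logits_state = rng.normal(size=64)
logits_image = rng.normal(size=64)

att_state = softmax(logits_state)
att_image = softmax(logits_image)

# Alignment term (KL divergence): minimised during training so the image
# agent attends to the same task-relevant regions as the privileged agent.
kl = float(np.sum(att_state * (np.log(att_state) - np.log(att_image))))

# The attention mask then gates the observation before the policy sees it,
# suppressing distractor regions.
obs = rng.normal(size=64)
masked_obs = att_image * obs
```

At deployment only the image-based branch is needed, so the privileged information never has to be available outside training.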
Continual Unsupervised Representation Learning
Continual learning aims to improve the ability of modern learning systems to
deal with non-stationary distributions, typically by attempting to learn a
series of tasks sequentially. Prior art in the field has largely considered
supervised or reinforcement learning tasks, and often assumes full knowledge of
task labels and boundaries. In this work, we propose an approach (CURL) to
tackle a more general problem that we will refer to as unsupervised continual
learning. The focus is on learning representations without any knowledge about
task identity, and we explore scenarios when there are abrupt changes between
tasks, smooth transitions from one task to another, or even when the data is
shuffled. The proposed approach performs task inference directly within the
model, is able to dynamically expand to capture new concepts over its lifetime,
and incorporates additional rehearsal-based techniques to deal with
catastrophic forgetting. We demonstrate the efficacy of CURL in an unsupervised
learning setting with MNIST and Omniglot, where the lack of labels ensures no
information is leaked about the task. Further, we demonstrate strong
performance compared to prior art in an i.i.d. setting, or when adapting the
technique to supervised tasks such as incremental class learning.
Comment: NeurIPS 201
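The dynamic-expansion behaviour described above — spawning capacity for a new concept when incoming data is poorly explained — can be caricatured with a tiny nearest-mean model. The distance threshold, fixed means, and expansion rule below are illustrative assumptions, not CURL's actual latent mixture or expansion criterion.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sketch of dynamic expansion: components in a 2-D latent space,
# with a new component spawned whenever no existing one fits an incoming
# point well enough.
means = [np.zeros(2)]      # start with a single concept
threshold = 4.0            # squared-distance threshold for "explained"

def assign_or_expand(x):
    d2 = [float(((x - m) ** 2).sum()) for m in means]
    k = int(np.argmin(d2))
    if d2[k] > threshold:  # poor fit under every component: new concept
        means.append(x.copy())
        return len(means) - 1
    return k

# Stream points from two well-separated clusters, mimicking an abrupt,
# unannounced task change — no task labels or boundaries are given.
for x in rng.normal(0.0, 0.3, size=(50, 2)):
    assign_or_expand(x)
for x in rng.normal(5.0, 0.3, size=(50, 2)):
    assign_or_expand(x)
```

After the stream, the model holds two components: the second cluster forced an expansion, which is the mechanism that lets capacity grow over a lifetime without task identity.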
Towards Compute-Optimal Transfer Learning
The field of transfer learning is undergoing a significant shift with the
introduction of large pretrained models which have demonstrated strong
adaptability to a variety of downstream tasks. However, the high computational
and memory requirements to finetune or use these models can be a hindrance to
their widespread use. In this study, we present a solution to this issue by
proposing a simple yet effective way to trade computational efficiency for
asymptotic performance which we define as the performance a learning algorithm
achieves as compute tends to infinity. Specifically, we argue that zero-shot
structured pruning of pretrained models allows them to increase compute
efficiency with minimal reduction in performance. We evaluate our method on the
Nevis'22 continual learning benchmark that offers a diverse set of transfer
scenarios. Our results show that pruning convolutional filters of pretrained
models can lead to more than 20% performance improvement in low computational
regimes.
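Structured pruning of convolutional filters, as used above, can be sketched in a few lines: whole output filters are ranked by a cheap zero-shot criterion and the weakest are dropped before any finetuning. The L1-norm criterion, tensor shape, and keep ratio below are common illustrative choices, not necessarily the exact recipe evaluated on Nevis'22.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical pretrained conv layer: (out_filters, in_channels, kH, kW).
weights = rng.normal(size=(64, 3, 3, 3))
keep_ratio = 0.5

# Zero-shot criterion: L1 norm per output filter, computed from the
# pretrained weights alone (no data, no gradients).
norms = np.abs(weights).reshape(64, -1).sum(axis=1)

# Keep the filters with the largest norms; dropping whole filters (rather
# than scattered weights) shrinks the layer's actual compute and memory.
n_keep = int(64 * keep_ratio)
keep = np.sort(np.argsort(norms)[-n_keep:])
pruned = weights[keep]
```

Because entire filters are removed, the next layer's input channels shrink correspondingly, which is what makes the efficiency gain realisable on real hardware rather than only on paper.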
Health, education, and social care provision after diagnosis of childhood visual disability
Aim: To investigate the health, education, and social care provision for children newly diagnosed with visual disability.
Method: This was a national prospective study, the British Childhood Visual Impairment and Blindness Study 2 (BCVIS2), ascertaining new diagnoses of visual impairment or severe visual impairment and blindness (SVIBL), or equivalent vision. Data collection was performed by managing clinicians up to 1-year follow-up, and included health and developmental needs, and health, education, and social care provision.
Results: BCVIS2 identified 784 children newly diagnosed with visual impairment/SVIBL (313 with visual impairment, 471 with SVIBL). Most children had associated systemic disorders (559 [71%]; 167 [54%] with visual impairment, and 392 [84%] with SVIBL). Care from multidisciplinary teams was provided for 549 children (70%). Two-thirds (515) had not received an Education, Health, and Care Plan (EHCP). Fewer children with visual impairment had seen a specialist teacher (SVIBL 35%, visual impairment 28%, χ² p < 0.001), or had an EHCP (11% vs 7%, χ² p < 0.01).
Interpretation: Families need additional support from managing clinicians to access recommended complex interventions such as the use of multidisciplinary teams and educational support. This need is pressing, as the population of children with visual impairment/SVIBL is expected to grow in size and complexity.
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
CurveSLAM: utilizing higher level structure in stereo vision-based navigation
Existing approaches to visual Simultaneous Localization and Mapping (SLAM)
typically utilize points as visual feature primitives to represent landmarks in the
environment. Since these techniques mostly use image points from a standard
feature point detector, they do not explicitly map objects or regions of interest.
Further, previous SLAM techniques that propose the use of higher level structures
often place constraints on the environment, such as requiring orthogonal lines and
planes. Our work is motivated by the need for different SLAM techniques in path
and riverine settings, where feature points can be scarce and may not adequately
represent the environment. Accordingly, the proposed approach uses Bézier polynomial
curves as stereo vision primitives and offers a novel SLAM formulation to
update the curve parameters and vehicle pose. This method eliminates the need
for point-based stereo matching, with an optimization procedure to directly extract
the curve information in the world frame from noisy edge measurements.
Further, the proposed algorithm enables navigation with fewer feature states than
most point-based techniques, and is able to produce a map which only provides
detail in key areas. Results in simulation and with vision data validate that the
proposed method can be effective in estimating the 6DOF pose of the stereo camera,
and can produce structured, uncluttered maps. Monte Carlo simulations of
the algorithm are also provided to analyze its consistency.
CurveSLAM: An approach for vision-based navigation without point features
Existing approaches to visual Simultaneous Localization and Mapping (SLAM) typically utilize points as visual feature primitives to represent landmarks in the environment. Since these techniques mostly use image points from a standard feature point detector, they do not explicitly map objects or regions of interest. Our work is motivated by the need for different SLAM techniques in path and riverine settings, where feature points can be scarce or may not adequately represent the environment. Accordingly, the proposed approach uses cubic Bézier curves as stereo vision primitives and offers a novel SLAM formulation to update the curve parameters and vehicle pose. This method eliminates the need for point-based stereo matching, with an optimization procedure to directly extract the curve information in the world frame from noisy edge measurements. Further, the proposed algorithm enables navigation with fewer feature states than most point-based techniques, and is able to produce a map which only provides detail in key areas. Results in simulation and with vision data validate that the proposed method can be effective in estimating the 6DOF pose of the stereo camera, and can produce structured, uncluttered maps.
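The curve primitive at the heart of CurveSLAM is the cubic Bézier: four control points define the whole curve, which is why the state vector stays small compared to point-based maps. The sketch below only evaluates such a curve from its control points; the SLAM update of curve parameters and vehicle pose is not shown, and the example control points are arbitrary.

```python
import numpy as np

def bezier_cubic(p0, p1, p2, p3, t):
    """Evaluate a cubic Bézier curve at parameter t in [0, 1].

    Four control points replace the many individual feature points a
    point-based map would need to describe the same path edge.
    """
    t = np.asarray(t, dtype=float)[..., None]
    return ((1 - t) ** 3 * p0
            + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2
            + t ** 3 * p3)

# Arbitrary example control points for a single 2-D curve segment.
p0, p1, p2, p3 = (np.array([0.0, 0.0]), np.array([1.0, 2.0]),
                  np.array([3.0, 2.0]), np.array([4.0, 0.0]))

# Endpoints are interpolated exactly; interior points bend toward p1 and p2.
start = bezier_cubic(p0, p1, p2, p3, 0.0)
mid = bezier_cubic(p0, p1, p2, p3, 0.5)
end = bezier_cubic(p0, p1, p2, p3, 1.0)
```

In the SLAM formulation, it is these control points (expressed in the world frame) that are estimated from noisy edge measurements, so one curve contributes only eight scalars to the state in 2-D rather than one state entry per matched feature point.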