27 research outputs found

    Multimodal learning from visual and remotely sensed data

    Autonomous vehicles are often deployed to perform exploration and monitoring missions in unseen environments. In such applications, there is often a compromise between the information richness and the acquisition cost of different sensor modalities. Visual data is usually very information-rich, but requires in-situ acquisition with the robot. In contrast, remotely sensed data has a larger range and footprint, and may be available prior to a mission. In order to effectively and efficiently explore and monitor the environment, it is critical to make use of all of the sensory information available to the robot. One important application is the use of an Autonomous Underwater Vehicle (AUV) to survey the ocean floor. AUVs can take high resolution in-situ photographs of the sea floor, which can be used to classify different regions into various habitat classes that summarise the observed physical and biological properties. This is known as benthic habitat mapping. However, since AUVs can only image a tiny fraction of the ocean floor, habitat mapping is usually performed with remotely sensed bathymetry (ocean depth) data, obtained from shipborne multibeam sonar. With the recent surge in unsupervised feature learning and deep learning techniques, a number of previous techniques have investigated the concept of multimodal learning: capturing the relationship between different sensor modalities in order to perform classification and other inference tasks. This thesis proposes related techniques for visual and remotely sensed data, applied to the task of autonomous exploration and monitoring with an AUV. Doing so enables more accurate classification of the benthic environment, and also assists autonomous survey planning. The first contribution of this thesis is to apply unsupervised feature learning techniques to marine data. 
    The proposed techniques are used to extract features from image and bathymetric data separately, and the performance is compared with that of more traditionally used features for each sensor modality. The second contribution is the development of a multimodal learning architecture that captures the relationship between the two modalities. The model is robust to missing modalities, which means it can extract better features for large-scale benthic habitat mapping, where only bathymetry is available. The model is used to perform classification with various combinations of modalities, demonstrating that multimodal learning provides a large performance improvement over the baseline case. The third contribution is an extension of the standard learning architecture using a gated feature learning model, which enables the model to better capture the 'one-to-many' relationship between visual and bathymetric data. This opens up further inference capabilities, with the ability to predict visual features from bathymetric data, which allows image-based queries. Such queries are useful for AUV survey planning, especially when supervised labels are unavailable. The final contribution is the novel derivation of a number of information-theoretic measures to aid survey planning. The proposed measures predict the utility of unobserved areas, in terms of the amount of expected additional visual information. As such, they are able to produce utility maps over a large region that can be used by the AUV to determine the most informative locations from a set of candidate missions. The models proposed in this thesis are validated through extensive experiments on real marine data. Furthermore, the introduced techniques have applications in various other areas within robotics. As such, this thesis concludes with a discussion on the broader implications of these contributions, and the future research directions that arise as a result of this work.
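
    The missing-modality idea above can be illustrated with a minimal NumPy sketch. Everything here is hypothetical (the dimensions, the random linear encoders, and the averaging fusion stand in for the thesis's learned deep models): a shared feature is built from whichever modalities are present, so the same representation can be computed from bathymetry alone during large-scale mapping.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical feature dimensions (not from the thesis).
    D_IMG, D_BATHY, D_SHARED = 8, 4, 6

    # Random linear encoders stand in for the learned feature extractors.
    W_img = rng.normal(size=(D_SHARED, D_IMG))
    W_bathy = rng.normal(size=(D_SHARED, D_BATHY))

    def shared_feature(img=None, bathy=None):
        """Fuse whichever modalities are present by averaging their projections."""
        parts = []
        if img is not None:
            parts.append(W_img @ img)
        if bathy is not None:
            parts.append(W_bathy @ bathy)
        if not parts:
            raise ValueError("at least one modality required")
        return np.mean(parts, axis=0)

    img = rng.normal(size=D_IMG)
    bathy = rng.normal(size=D_BATHY)

    both = shared_feature(img=img, bathy=bathy)    # in-situ survey: both modalities
    bathy_only = shared_feature(bathy=bathy)       # large-scale mapping: bathymetry only
    ```

    The point of the sketch is the interface: downstream classification consumes the same shared feature vector regardless of which modalities were observed.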

    Attention-Privileged Reinforcement Learning

    Image-based Reinforcement Learning is known to suffer from poor sample efficiency and generalisation to unseen visuals such as distractors (task-independent aspects of the observation space). Visual domain randomisation encourages transfer by training over visual factors of variation that may be encountered in the target domain. This increases learning complexity, can negatively impact learning rate and performance, and requires knowledge of potential variations during deployment. In this paper, we introduce Attention-Privileged Reinforcement Learning (APRiL) which uses a self-supervised attention mechanism to significantly alleviate these drawbacks: by focusing on task-relevant aspects of the observations, attention provides robustness to distractors as well as significantly increased learning efficiency. APRiL trains two attention-augmented actor-critic agents: one purely based on image observations, available across training and transfer domains; and one with access to privileged information (such as environment states) available only during training. Experience is shared between both agents and their attention mechanisms are aligned. The image-based policy can then be deployed without access to privileged information. We experimentally demonstrate accelerated and more robust learning on a diverse set of domains, leading to improved final performance for environments both within and outside the training distribution. Comment: Published at Conference on Robot Learning (CoRL) 202
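
    The attention-alignment step described above can be sketched in a few lines of NumPy. This is a hedged illustration, not APRiL's actual loss: the attention maps are random softmax distributions over hypothetical spatial positions, and a KL divergence stands in for whatever alignment objective the paper uses to pull the image agent's attention toward the privileged agent's.

    ```python
    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def kl(p, q, eps=1e-8):
        """KL divergence between two discrete attention distributions."""
        return float(np.sum(p * np.log((p + eps) / (q + eps))))

    rng = np.random.default_rng(1)

    # Hypothetical flattened 4x4 spatial attention maps for each agent.
    att_state = softmax(rng.normal(size=16))   # privileged (state-based) agent
    att_image = softmax(rng.normal(size=16))   # image-based agent

    # Minimising this would drive the image agent's attention toward
    # the privileged agent's, which is robust to visual distractors.
    alignment_loss = kl(att_state, att_image)
    ```

    In training, a term like this would be added to the image agent's objective; at deployment only the image-based policy (and its attention) is needed.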

    Continual Unsupervised Representation Learning

    Continual learning aims to improve the ability of modern learning systems to deal with non-stationary distributions, typically by attempting to learn a series of tasks sequentially. Prior art in the field has largely considered supervised or reinforcement learning tasks, and often assumes full knowledge of task labels and boundaries. In this work, we propose an approach (CURL) to tackle a more general problem that we will refer to as unsupervised continual learning. The focus is on learning representations without any knowledge about task identity, and we explore scenarios when there are abrupt changes between tasks, smooth transitions from one task to another, or even when the data is shuffled. The proposed approach performs task inference directly within the model, is able to dynamically expand to capture new concepts over its lifetime, and incorporates additional rehearsal-based techniques to deal with catastrophic forgetting. We demonstrate the efficacy of CURL in an unsupervised learning setting with MNIST and Omniglot, where the lack of labels ensures no information is leaked about the task. Further, we demonstrate strong performance compared to prior art in an i.i.d. setting, or when adapting the technique to supervised tasks such as incremental class learning. Comment: NeurIPS 201
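
    The "dynamic expansion" idea above can be caricatured with a toy sketch. This is not CURL (which is a generative latent-variable model); it is a deliberately simplified stand-in in which each "concept" is a running mean, a sample is assigned to its nearest concept, and a new concept is spawned whenever no existing one explains the sample well. The distance threshold and update rate are hypothetical.

    ```python
    import numpy as np

    class ExpandingMixture:
        """Toy sketch of task-free expansion: spawn a new component when
        no existing component fits the incoming sample (threshold is made up)."""

        def __init__(self, threshold=2.0):
            self.means = []            # one mean per discovered "concept"
            self.threshold = threshold

        def infer(self, x):
            """Nearest existing component and its distance."""
            if not self.means:
                return None, np.inf
            d = [np.linalg.norm(np.asarray(x) - m) for m in self.means]
            k = int(np.argmin(d))
            return k, d[k]

        def observe(self, x):
            k, dist = self.infer(x)
            if dist > self.threshold:  # poor fit -> dynamic expansion
                self.means.append(np.asarray(x, dtype=float))
                return len(self.means) - 1
            # otherwise, update the matched concept with a small step
            self.means[k] = 0.9 * self.means[k] + 0.1 * np.asarray(x, dtype=float)
            return k

    m = ExpandingMixture()
    for x in [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]:
        m.observe(x)   # discovers two clusters without any task labels
    ```

    The same expand-or-update decision is what lets such a model grow over its lifetime as the data distribution drifts, with no task boundaries supplied.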

    Towards Compute-Optimal Transfer Learning

    The field of transfer learning is undergoing a significant shift with the introduction of large pretrained models which have demonstrated strong adaptability to a variety of downstream tasks. However, the high computational and memory requirements to finetune or use these models can be a hindrance to their widespread use. In this study, we present a solution to this issue by proposing a simple yet effective way to trade computational efficiency for asymptotic performance, which we define as the performance a learning algorithm achieves as compute tends to infinity. Specifically, we argue that zero-shot structured pruning of pretrained models allows them to increase compute efficiency with minimal reduction in performance. We evaluate our method on the Nevis'22 continual learning benchmark that offers a diverse set of transfer scenarios. Our results show that pruning convolutional filters of pretrained models can lead to more than 20% performance improvement in low computational regimes.
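
    Structured pruning of convolutional filters can be sketched as follows. This is a generic magnitude-based version (drop whole output filters with the smallest L1 norms, with no retraining, hence "zero-shot"); the paper's exact pruning criterion may differ.

    ```python
    import numpy as np

    def prune_filters(weight, keep_frac=0.5):
        """Structured pruning sketch: keep only the output filters with the
        largest L1 norms. `weight` has shape (out_ch, in_ch, kH, kW); removing
        whole filters shrinks the layer, unlike unstructured weight sparsity."""
        n_out = weight.shape[0]
        norms = np.abs(weight).reshape(n_out, -1).sum(axis=1)   # one L1 norm per filter
        n_keep = max(1, int(round(n_out * keep_frac)))
        keep = np.sort(np.argsort(norms)[::-1][:n_keep])        # indices of top filters
        return weight[keep], keep

    rng = np.random.default_rng(0)
    conv_w = rng.normal(size=(8, 3, 3, 3))          # a hypothetical conv layer
    pruned, kept = prune_filters(conv_w, keep_frac=0.25)
    ```

    Because entire filters are removed, the pruned layer is genuinely smaller and faster (its output channel count drops), which is what buys compute efficiency in the low-compute regimes the abstract mentions.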

    Health, education, and social care provision after diagnosis of childhood visual disability

    Aim: To investigate the health, education, and social care provision for children newly diagnosed with visual disability. Method: This was a national prospective study, the British Childhood Visual Impairment and Blindness Study 2 (BCVIS2), ascertaining new diagnoses of visual impairment or severe visual impairment and blindness (SVIBL), or equivalent vision. Data collection was performed by managing clinicians up to 1-year follow-up, and included health and developmental needs, and health, education, and social care provision. Results: BCVIS2 identified 784 children newly diagnosed with visual impairment/SVIBL (313 with visual impairment, 471 with SVIBL). Most children had associated systemic disorders (559 [71%]: 167 [54%] with visual impairment, and 392 [84%] with SVIBL). Care from multidisciplinary teams was provided for 549 children (70%). Two-thirds (515) had not received an Education, Health, and Care Plan (EHCP). Fewer children with visual impairment than with SVIBL had seen a specialist teacher (SVIBL 35%, visual impairment 28%, χ² p < 0.001), or had an EHCP (11% vs 7%, χ² p < 0.01). Interpretation: Families need additional support from managing clinicians to access recommended complex interventions such as the use of multidisciplinary teams and educational support. This need is pressing, as the population of children with visual impairment/SVIBL is expected to grow in size and complexity. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

    CurveSLAM: utilizing higher level structure in stereo vision-based navigation

    Existing approaches to visual Simultaneous Localization and Mapping (SLAM) typically utilize points as visual feature primitives to represent landmarks in the environment. Since these techniques mostly use image points from a standard feature point detector, they do not explicitly map objects or regions of interest. Further, previous SLAM techniques that propose the use of higher level structures often place constraints on the environment, such as requiring orthogonal lines and planes. Our work is motivated by the need for different SLAM techniques in path and riverine settings, where feature points can be scarce and may not adequately represent the environment. Accordingly, the proposed approach uses BƩzier polynomial curves as stereo vision primitives and offers a novel SLAM formulation to update the curve parameters and vehicle pose. This method eliminates the need for point-based stereo matching, with an optimization procedure to directly extract the curve information in the world frame from noisy edge measurements. Further, the proposed algorithm enables navigation with fewer feature states than most point-based techniques, and is able to produce a map which only provides detail in key areas. Results in simulation and with vision data validate that the proposed method can be effective in estimating the 6DOF pose of the stereo camera, and can produce structured, uncluttered maps. Monte Carlo simulations of the algorithm are also provided to analyze its consistency.
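
    The curve primitive at the heart of the approach can be illustrated with a short sketch: evaluating a cubic BƩzier curve from its four control points, and recovering control points from noisy samples by linear least squares. This is a simplified stand-in for the paper's optimization (it assumes known curve parameter values for each measurement, which the real edge measurements would not provide); the control points and noise level are made up.

    ```python
    import numpy as np

    def bernstein_basis(t):
        """Cubic Bernstein basis evaluated at parameters t, shape (N, 4)."""
        t = np.asarray(t, dtype=float)[:, None]
        return np.hstack([(1 - t) ** 3,
                          3 * t * (1 - t) ** 2,
                          3 * t ** 2 * (1 - t),
                          t ** 3])

    def bezier(ctrl, t):
        """Points on the cubic Bezier curve defined by 4 control points."""
        return bernstein_basis(t) @ ctrl                # (N, 2)

    def fit_bezier(points, t):
        """Least-squares control points from noisy curve samples."""
        ctrl, *_ = np.linalg.lstsq(bernstein_basis(t), points, rcond=None)
        return ctrl

    # Hypothetical ground-truth curve, sampled with small Gaussian "edge" noise.
    true_ctrl = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 2.0], [4.0, 0.0]])
    t = np.linspace(0.0, 1.0, 50)
    noisy = bezier(true_ctrl, t) + np.random.default_rng(0).normal(scale=0.01, size=(50, 2))

    est = fit_bezier(noisy, t)   # recovers the 4 control points from 50 noisy samples
    ```

    The compression is the point: 50 noisy edge measurements reduce to 4 control points (8 state variables), which is why curve primitives need far fewer feature states than point-based maps.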

    CurveSLAM: An approach for vision-based navigation without point features

    Existing approaches to visual Simultaneous Localization and Mapping (SLAM) typically utilize points as visual feature primitives to represent landmarks in the environment. Since these techniques mostly use image points from a standard feature point detector, they do not explicitly map objects or regions of interest. Our work is motivated by the need for different SLAM techniques in path and riverine settings, where feature points can be scarce or may not adequately represent the environment. Accordingly, the proposed approach uses cubic BƩzier curves as stereo vision primitives and offers a novel SLAM formulation to update the curve parameters and vehicle pose. This method eliminates the need for point-based stereo matching, with an optimization procedure to directly extract the curve information in the world frame from noisy edge measurements. Further, the proposed algorithm enables navigation with fewer feature states than most point-based techniques, and is able to produce a map which only provides detail in key areas. Results in simulation and with vision data validate that the proposed method can be effective in estimating the 6DOF pose of the stereo camera, and can produce structured, uncluttered maps.