Search CORE

25 research outputs found

Efficient Supervision for Robot Learning via Imitation, Simulation, and Adaptation

Author: Wulfmeier Markus
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

Recent successes in machine learning have led to a shift in the design of autonomous systems, improving performance on existing tasks and making new applications possible. Data-focused approaches gain relevance across diverse, intricate applications when developing data collection and curation pipelines becomes more effective than manual behaviour design. The following work aims to increase the efficiency of this pipeline in two principal ways: by utilising more powerful sources of informative data and by extracting additional information from existing data. In particular, we target three orthogonal fronts: imitation learning, domain adaptation, and transfer from simulation. Comment: Dissertation Summary

arXiv.org e-Print Archive

Oxford University Research Archive

Incremental Adversarial Domain Adaptation for Continually Changing Environments

Authors: Bewley Alex; Posner Ingmar; Wulfmeier Markus
Publication date: 01/01/2018

Continuous appearance shifts, such as changes in weather and lighting conditions, can impair the performance of deployed machine learning models. While unsupervised domain adaptation aims to address this challenge, current approaches do not exploit the continuity of the occurring shifts. Many robotics applications exhibit such gradual shifts, which creates the opportunity to incrementally adapt a learnt model over minor shifts that accumulate into large differences over time. Our work presents an adversarial approach for lifelong, incremental domain adaptation that benefits from unsupervised alignment to a series of intermediate domains, each diverging further from the labelled source domain. We empirically demonstrate that, on a traversable-path segmentation task, our incremental approach handles large appearance changes (e.g. day to night) better than a direct, single-step alignment. Furthermore, by approximating the source-domain feature distribution with a generative adversarial network, the deployment module no longer needs to retain potentially large amounts of source training data, at the cost of only a minor reduction in performance. Comment: International Conference on Robotics and Automation 201

arXiv.org e-Print Archive

Crossref

Oxford University Research Archive
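
The incremental idea can be illustrated with a toy sketch under heavy simplifying assumptions: features are scalars, the adapter is a single additive offset, mean matching stands in for the adversarial alignment loss, and each intermediate domain allows an assumed fixed budget of two gradient steps. All function names and numbers are hypothetical and do not reproduce the paper's experiments.

```python
def adapt(offset, domain_mean, source_mean, steps=2, lr=0.25):
    """Hypothetical adapter update: move an additive feature offset so the
    mean of adapted target features approaches the source mean.
    Mean matching stands in for the paper's adversarial alignment loss."""
    for _ in range(steps):
        gap = (domain_mean + offset) - source_mean
        offset -= lr * 2.0 * gap  # gradient of gap**2 with respect to offset
    return offset


def incremental_residual(domain_means, source_mean, **kwargs):
    """Adapt through each intermediate domain in temporal order,
    warm-starting from the previous offset; return the remaining gap."""
    offset = 0.0
    for mean in domain_means:
        offset = adapt(offset, mean, source_mean, **kwargs)
    return abs(domain_means[-1] + offset - source_mean)


def direct_residual(final_mean, source_mean, **kwargs):
    """Single alignment step from the source straight to the final domain."""
    offset = adapt(0.0, final_mean, source_mean, **kwargs)
    return abs(final_mean + offset - source_mean)


# Hypothetical drift: thirty small shifts of 0.1 that add up to 3.0.
drift = [0.1 * k for k in range(1, 31)]
print(incremental_residual(drift, 0.0))  # about 0.033
print(direct_residual(3.0, 0.0))         # 0.75
```

With the same per-domain budget, following many small shifts keeps the residual near a small steady state, while a single jump to the final domain leaves a quarter of the full shift unaligned.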

Addressing Appearance Change in Outdoor Robotics with Adversarial Domain Adaptation

Authors: Bewley Alex; Posner Ingmar; Wulfmeier Markus
Publication date: 01/01/2017

Appearance changes due to weather and seasonal conditions are a strong impediment to the robust implementation of machine learning systems in outdoor robotics. Supervised learning optimises a model for the training domain, so the model delivers degraded performance in application domains that are subject to distributional shifts caused by these changes. Traditionally, this problem has been addressed by collecting labelled data in multiple domains or by imposing priors on the type of shift between the two domains. We frame the problem in the context of unsupervised domain adaptation and develop a framework for applying adversarial techniques to adapt popular, state-of-the-art network architectures with the additional objective of aligning features across domains. Moreover, as adversarial training is notoriously unstable, we first perform an extensive ablation study, adapting many techniques known to stabilise generative adversarial networks, and evaluate on a surrogate classification task with the same appearance change. The distilled insights are applied to the problem of free-space segmentation for motion planning in autonomous driving. Comment: In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2017)

arXiv.org e-Print Archive

Crossref

Oxford University Research Archive
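
Adversarial feature alignment can be shown in a deliberately tiny sketch, assuming scalar features, a target encoder that is just one learnable offset, and a logistic discriminator. It uses the common inverted-label (ADDA-style) objective rather than the specific architectures, stabilisation techniques, or segmentation networks studied in the paper; all names, data, and hyperparameters are hypothetical.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def adversarial_align(source, target_raw, outer_steps=100, inner_steps=50,
                      lr_disc=0.5, lr_enc=0.5, weight_decay=0.1):
    """Toy 1-D adversarial feature alignment (all settings hypothetical).

    Source features stay fixed. The target "encoder" is z = x - c with a
    single learnable offset c. A logistic discriminator D(z) = sigmoid(w*z + b)
    is trained to output 1 for source and 0 for target features; the encoder
    is then updated with inverted labels so target features look like source.
    """
    w, b, c = 0.0, 0.0, 0.0
    for _ in range(outer_steps):
        target = [x - c for x in target_raw]
        labelled = [(z, 1.0) for z in source] + [(z, 0.0) for z in target]
        n = len(labelled)
        # Discriminator: gradient descent on L2-regularised cross-entropy.
        for _ in range(inner_steps):
            grad_w, grad_b = weight_decay * w, 0.0
            for z, y in labelled:
                err = sigmoid(w * z + b) - y
                grad_w += err * z / n
                grad_b += err / n
            w -= lr_disc * grad_w
            b -= lr_disc * grad_b
        # Encoder: minimise mean(-log D(z_t)). Since dz_t/dc = -1,
        # the gradient with respect to c is mean((1 - D(z_t)) * w).
        grad_c = sum((1.0 - sigmoid(w * z + b)) * w for z in target) / len(target)
        c -= lr_enc * grad_c
    target = [x - c for x in target_raw]
    gap = sum(target) / len(target) - sum(source) / len(source)
    return c, gap


# Hypothetical toy data: same spread, target mean shifted by 3.
source = [-1.0, -0.5, 0.0, 0.5, 1.0]
target_raw = [2.0, 2.5, 3.0, 3.5, 4.0]
offset, gap = adversarial_align(source, target_raw)
print(round(offset, 2), round(gap, 3))
```

The L2 penalty on the discriminator weight (`weight_decay`) is one simple damping choice for this toy; the abstract stresses that adversarial training is unstable and describes adapting many GAN stabilisation techniques, which this sketch does not attempt to cover.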

Mutual Alignment Transfer Learning

Authors: Abbeel Pieter; Posner Ingmar; Wulfmeier Markus
Publication date: 01/01/2017

Training robots for operation in the real world is a complex, time-consuming and potentially expensive task. Despite the significant success of reinforcement learning in games and simulations, research on real robot applications has not matched this progress. While sample complexity can be reduced by training policies in simulation, such policies can perform sub-optimally on the real platform because of imperfect calibration of the model dynamics. We present an approach, supplementary to fine-tuning on the real robot, that further benefits from parallel access to a simulator during training and reduces sample requirements on the real robot. The approach harnesses auxiliary rewards to guide the exploration of the real-world agent based on the proficiency of the agent in simulation, and vice versa. In this context, we demonstrate empirically that the reciprocal alignment of both agents provides a further benefit, as the agent in simulation can adjust to optimise its behaviour for states commonly visited by the real-world agent.

arXiv.org e-Print Archive

Oxford University Research Archive
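
The abstract only states that auxiliary rewards couple the two agents. The sketch below assumes an adversarial form: a hypothetical discriminator estimates whether a state sequence came from simulation, and each agent is rewarded for resembling the other platform. The function name, the log-probability form, and the default weight of 0.1 are assumptions for illustration, not the paper's definition.

```python
import math


def mutual_alignment_reward(env_reward, p_sim, is_real_agent, weight=0.1, eps=1e-8):
    """Task reward plus an assumed adversarial auxiliary reward.

    p_sim: a hypothetical discriminator's estimate that a state sequence
    came from the simulated agent. The real agent is rewarded for resembling
    the simulated one and the simulated agent for resembling the real one.
    """
    p = min(max(p_sim, eps), 1.0 - eps)  # clip so that log() stays finite
    auxiliary = math.log(p) if is_real_agent else math.log(1.0 - p)
    return env_reward + weight * auxiliary


# The real agent earns more auxiliary reward when its states look "simulated".
print(mutual_alignment_reward(1.0, 0.9, is_real_agent=True) >
      mutual_alignment_reward(1.0, 0.1, is_real_agent=True))  # True
```

Because the two agents receive opposite-sided terms from the same discriminator, each one's exploration is pulled toward the states the other visits.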

Scrutinizing and De-Biasing Intuitive Physics with Neural Stethoscopes

Authors: Bewley Alex; Fuchs Fabian B.; Groth Oliver; Kosiorek Adam R.; Posner Ingmar; Vedaldi Andrea; Wulfmeier Markus
Publication date: 01/01/2019

Visually predicting the stability of block towers is a popular task in the domain of intuitive physics. While previous work focuses on prediction accuracy, a one-dimensional performance measure, we provide a broader analysis of the learned physical understanding of the final model and of how the learning process can be guided. To this end, we introduce neural stethoscopes as a general-purpose framework for quantifying the importance of specific factors of influence in deep neural networks and for actively promoting or suppressing information as appropriate. In doing so, we unify concepts from multitask learning and from training with auxiliary and adversarial losses. We apply neural stethoscopes to analyse the state-of-the-art neural network for stability prediction. We show that the baseline model is susceptible to being misled by incorrect visual cues, which leads to a performance breakdown to the level of random guessing when training on scenarios where visual cues are inversely correlated with stability. Using stethoscopes to promote meaningful feature extraction increases prediction accuracy from 51% to 90%. Conversely, when trained on an easy dataset where visual cues are positively correlated with stability, the baseline model learns a bias that leads to poor performance on a harder dataset. Using an adversarial stethoscope, the network is successfully de-biased, increasing performance from 66% to 88%.

arXiv.org e-Print Archive

Oxford University Research Archive
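
The gradient routing behind a stethoscope can be sketched under the assumption, consistent with the abstract's mention of auxiliary and adversarial losses, that a single weight `lam` scales the stethoscope-loss gradient reaching the main network. The scalar "network", squared-error losses, and all names below are hypothetical; the paper analyses a full stability-prediction network.

```python
def stethoscope_grads(x, y, s, w_e, w_m, w_s, lam):
    """Gradients for a scalar toy network with a stethoscope head.

    Encoder h = w_e * x feeds a main head (prediction w_m * h, target y)
    and a stethoscope head (prediction w_s * h, supplementary target s).
    Both use squared error. The stethoscope head always minimises its own
    loss; only lam decides what reaches the encoder:
      lam == 0 -> analytic (the stethoscope only probes the features)
      lam > 0  -> auxiliary (promote information about s)
      lam < 0  -> adversarial (suppress information about s)
    """
    h = w_e * x
    main_err = w_m * h - y
    st_err = w_s * h - s
    grad_w_m = 2.0 * main_err * h          # main head
    grad_w_s = 2.0 * st_err * h            # stethoscope head, independent of lam
    grad_enc_main = 2.0 * main_err * w_m * x
    grad_enc_st = 2.0 * st_err * w_s * x
    grad_w_e = grad_enc_main + lam * grad_enc_st  # encoder sees scaled stethoscope signal
    return grad_w_e, grad_w_m, grad_w_s


# Same inputs, three modes: only the encoder gradient changes
# (-2.0, -6.0 and 2.0); both head gradients stay at -2.0 and -4.0.
for lam in (0.0, 1.0, -1.0):
    print(lam, stethoscope_grads(1.0, 2.0, 3.0, 1.0, 1.0, 1.0, lam))
```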

Attention-Privileged Reinforcement Learning

Authors: Hadsell Raia; Posner Ingmar; Rao Dushyant; Salter Sasha; Wulfmeier Markus
Publication date: 18/11/2020

Image-based reinforcement learning is known to suffer from poor sample efficiency and poor generalisation to unseen visuals such as distractors (task-independent aspects of the observation space). Visual domain randomisation encourages transfer by training over visual factors of variation that may be encountered in the target domain. This increases learning complexity, can negatively impact learning rate and performance, and requires knowledge of potential variations during deployment. In this paper, we introduce Attention-Privileged Reinforcement Learning (APRiL), which uses a self-supervised attention mechanism to significantly alleviate these drawbacks: by focusing on task-relevant aspects of the observations, attention provides robustness to distractors as well as significantly increased learning efficiency. APRiL trains two attention-augmented actor-critic agents: one based purely on image observations, which are available across training and transfer domains, and one with access to privileged information (such as environment states) available only during training. Experience is shared between both agents, and their attention mechanisms are aligned. The image-based policy can then be deployed without access to privileged information. We experimentally demonstrate accelerated and more robust learning on a diverse set of domains, leading to improved final performance for environments both within and outside the training distribution. Comment: Published at Conference on Robot Learning (CoRL) 202

arXiv.org e-Print Archive

Oxford University Research Archive

TACO: Learning Task Decomposition via Temporal Alignment for Control

Authors: Posner Ingmar; Salter Sasha; Shiarlis Kyriacos; Whiteson Shimon; Wulfmeier Markus
Publication date: 01/01/2018

Many advanced Learning from Demonstration (LfD) methods consider the decomposition of complex, real-world tasks into simpler sub-tasks. By reusing the corresponding sub-policies within and between tasks, they provide training data for each policy from different high-level tasks and compose them to perform novel ones. Existing approaches to modular LfD either focus on learning a single high-level task or depend on domain knowledge and temporal segmentation. In contrast, we propose a weakly supervised, domain-agnostic approach based on task sketches, which include only the sequence of sub-tasks performed in each demonstration. Our approach simultaneously aligns the sketches with the observed demonstrations and learns the required sub-policies, which improves generalisation compared with optimising the two separately. We evaluate the approach on multiple domains, including a simulated 3D robot arm control task using purely image-based observations. The results show that our approach performs on par with fully supervised approaches while requiring significantly less annotation effort. Comment: 12 pages. Published at ICML 201

arXiv.org e-Print Archive

Oxford University Research Archive
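
Aligning a task sketch with a demonstration can be illustrated by a CTC-style forward recursion that sums over all ways of splitting the demonstration into consecutive segments, one per sketch element, in sketch order. This simplified sketch assumes the per-step sub-policy likelihoods are already given and omits any explicit modelling of when a sub-task terminates, so it is not TACO's exact objective; the function name is hypothetical.

```python
def sketch_likelihood(likelihoods):
    """Sum, over all monotonic alignments, of the product of per-step
    likelihoods of a demonstration under a task sketch (simplified).

    likelihoods[t][j] is the (hypothetical) probability that the sub-policy
    for sketch element j produced the action at time step t. An alignment
    assigns every step to one element so that elements appear in sketch
    order, each covers at least one contiguous step, and all are used.
    """
    steps, parts = len(likelihoods), len(likelihoods[0])
    if steps < parts:
        return 0.0  # not enough steps to visit every sub-task
    # alpha[j]: total probability of alignments of steps 0..t that end in element j
    alpha = [likelihoods[0][0]] + [0.0] * (parts - 1)
    for t in range(1, steps):
        prev = alpha
        alpha = [
            # stay in element j, or advance from element j - 1
            likelihoods[t][j] * (prev[j] + (prev[j - 1] if j > 0 else 0.0))
            for j in range(parts)
        ]
    return alpha[-1]


demo = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]]  # 3 steps, 2-element sketch
print(sketch_likelihood(demo))  # 0.9*0.2*0.7 + 0.9*0.8*0.7, about 0.63
```

With all likelihoods set to 1, the result simply counts the admissible alignments: five steps and three sketch elements give C(4, 2) = 6.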

Electronic structure of fully epitaxial Co2TiSn thin films

Authors: A. Thomson; Claudia Felser; Elke Arenholz; Günter Reiss; H. Ebert; Hendrik Wulfmeier; Jan Schmalhorst; L. V. Bekenov; Markus Meinert; Tanja Graf
Publication venue: American Physical Society (APS)
Publication date: 01/01/2010

In this article we report on the properties of thin films of the full Heusler compound Co2TiSn prepared by DC magnetron co-sputtering. Fully epitaxial, stoichiometric films were obtained by deposition on MgO (001) substrates at substrate temperatures above 600 °C. The films are well ordered in the L2₁ structure, and the Curie temperature slightly exceeds the bulk value. They show a significant, isotropic magnetoresistance, and the resistivity becomes strongly anomalous in the paramagnetic state. The films are weakly ferrimagnetic, with nearly 1 μB on the Co atoms and a small antiparallel Ti moment, in agreement with theoretical expectations. From a comparison of x-ray absorption spectra at the Co L3/L2 edges, including circular and linear magnetic dichroism, with ab initio calculations of the x-ray absorption and circular dichroism spectra, we infer that the electronic structure of Co2TiSn has an essentially non-localized character. Spectral features that have not previously been explained in detail are explained here in terms of the final-state band structure. Comment: 11 pages, 8 figures

arXiv.org e-Print Archive

CiteSeerX

Crossref

Publications at Bielefeld University

UNT Digital Library

MPG.PuRe

Equivariant Data Augmentation for Generalization in Offline Reinforcement Learning

Authors: Bechtle Sarah; Byravan Arunkumar; Pinneri Cristina; Riedmiller Martin; Whitney William F.; Wulfmeier Markus; Zhang Jingwei
Publication date: 14/09/2023

We present a novel approach to address the challenge of generalization in offline reinforcement learning (RL), where the agent learns from a fixed dataset without any additional interaction with the environment. Specifically, we aim to improve the agent's ability to generalize to out-of-distribution goals. To achieve this, we propose to learn a dynamics model and check whether it is equivariant with respect to a fixed type of transformation, namely translations in the state space. We then use an entropy regularizer to enlarge the equivariant set and augment the dataset with the resulting transformed samples. Finally, we learn a new policy offline on the augmented dataset with an off-the-shelf offline RL algorithm. Our experimental results demonstrate that our approach can greatly improve the test performance of the policy on the considered environments.

arXiv.org e-Print Archive
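
The check-then-augment step can be sketched minimally, assuming deterministic dynamics models over state tuples and a fixed list of candidate translations. The hand-written `point_mass` and `damped` models are hypothetical stand-ins for a learned dynamics model; rewards, goals, the entropy regularizer, and the offline RL algorithm are omitted, and all names are illustrative.

```python
def translate(vector, shift):
    return tuple(v + d for v, d in zip(vector, shift))


def is_translation_equivariant(model, state, action, shift, tol=1e-6):
    """True if model(s + d, a) matches model(s, a) + d within tol."""
    lhs = model(translate(state, shift), action)
    rhs = translate(model(state, action), shift)
    return all(abs(p - q) <= tol for p, q in zip(lhs, rhs))


def augment(dataset, model, shifts, tol=1e-6):
    """Append translated copies (s + d, a, s' + d) of each transition for
    every shift d under which the model is equivariant at that transition."""
    augmented = list(dataset)
    for state, action, next_state in dataset:
        for shift in shifts:
            if is_translation_equivariant(model, state, action, shift, tol):
                augmented.append(
                    (translate(state, shift), action, translate(next_state, shift))
                )
    return augmented


def point_mass(state, action):  # hypothetical model: s' = s + a (equivariant)
    return tuple(s + a for s, a in zip(state, action))


def damped(state, action):  # hypothetical model: s' = 0.5 s + a (not equivariant)
    return tuple(0.5 * s + a for s, a in zip(state, action))


data = [((0.0, 0.0), (1.0, 0.0), (1.0, 0.0))]  # (state, action, next_state)
shifts = [(2.0, -1.0)]
print(len(augment(data, point_mass, shifts)))  # 2: original plus one translated copy
print(len(augment(data, damped, shifts)))      # 1: translation rejected
```

Gating each translated copy on the model's equivariance keeps augmentation to regions where shifting the transition is consistent with the (modelled) dynamics.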