A cascaded supervised learning approach to inverse reinforcement learning
This paper considers the Inverse Reinforcement Learning (IRL) problem, that is, inferring a reward function for which a demonstrated expert policy is optimal. We propose to break the IRL problem down into two generic supervised learning steps: this is the Cascaded Supervised IRL (CSI) approach. A classification step that defines a score function is followed by a regression step providing a reward function. A theoretical analysis shows that the demonstrated expert policy is near-optimal for the computed reward function. Not needing to repeatedly solve a Markov Decision Process (MDP) and the ability to leverage existing techniques for classification and regression are two important advantages of the CSI approach. It is furthermore empirically demonstrated to compare favorably to state-of-the-art approaches when using only transitions sampled according to the expert policy, up to the use of some heuristics. This is exemplified on two classical benchmarks (the mountain car problem and a highway driving simulator).
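A minimal tabular sketch of the two CSI steps, assuming discrete states and actions. The count-based scores and the Bellman-style regression targets below are illustrative stand-ins for whatever classifier and regressor one would actually plug in; they are not the paper's exact estimators.

```python
import numpy as np

def csi(transitions, n_states, n_actions, gamma=0.9):
    """Cascaded Supervised IRL sketch (hypothetical tabular variant).

    Step 1 (classification): a score function q(s, a) from expert action
    frequencies -- a stand-in for any multi-class classifier's scores.
    Step 2 (regression): reward targets r(s, a) = q(s, a) - gamma * max_a' q(s', a'),
    tabulated here as a stand-in for any regressor fit to those targets.
    """
    counts = np.zeros((n_states, n_actions))
    for s, a, _ in transitions:
        counts[s, a] += 1
    # Classification step: Laplace-smoothed log-frequency scores.
    q = np.log(counts + 1.0)
    # Regression step: Bellman-style targets define the recovered reward.
    r = np.full((n_states, n_actions), np.nan)
    for s, a, s_next in transitions:
        r[s, a] = q[s, a] - gamma * q[s_next].max()
    return q, r
```

On a toy two-state chain where the expert always takes action 0, the recovered reward is positive for the demonstrated action, which is the qualitative behavior CSI's guarantee is about.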
Contributions à l'apprentissage par renforcement inverse
This thesis, "Contributions à l'apprentissage par renforcement inverse" (Contributions to inverse reinforcement learning), brings three major contributions to the community. The first is a method for estimating the feature expectation, a quantity involved in most state-of-the-art approaches, which were thus extended to a batch, off-policy setting. The second major contribution is an Inverse Reinforcement Learning algorithm, Structured Classification for Inverse Reinforcement Learning (SCIRL), which relaxes a standard constraint in the field, the repeated solving of a Markov Decision Process (MDP), by introducing the temporal structure of this process (via the feature expectation) into a structured-margin classification algorithm. Its theoretical guarantee and good empirical performance allowed it to be presented at a major international conference (NIPS). Finally, the third contribution is Cascaded Supervised learning for Inverse reinforcement learning (CSI), a method that learns the expert's behavior via a supervised learning approach and then introduces the temporal structure of the MDP via a regression involving the score function of the classifier. This method offers the same type of theoretical guarantee as SCIRL but uses standard components for classification and regression, which makes it simpler to use. This work will be presented at another major international conference (ECML).
Machine Learning for Metasurfaces Design and Their Applications
Metasurfaces (MTSs) are increasingly emerging as enabling technologies to
meet the demands for multi-functional, small form-factor, efficient,
reconfigurable, tunable, and low-cost radio-frequency (RF) components because
of their ability to manipulate waves in a sub-wavelength thickness through
modified boundary conditions. They enable the design of reconfigurable
intelligent surfaces (RISs) for adaptable wireless channels and smart radio
environments, wherein the inherently stochastic nature of the wireless
environment is transformed into a programmable propagation channel. In
particular, space-limited RF applications, such as communications and radar,
that have strict radiation requirements are currently being investigated for
potential RIS deployment. The RIS comprises sub-wavelength units or meta-atoms,
which are independently controlled and whose geometry and material determine
the spectral response of the RIS. Conventionally, designing RIS to yield the
desired EM response requires trial and error, iteratively investigating a
large space of candidate geometries and materials through thousands of
full-wave EM simulations. In this context, machine/deep learning (ML/DL)
techniques are proving critical in reducing the computational cost and time of
RIS inverse design. Instead of explicitly solving Maxwell's equations, DL
models learn physics-based relationships through supervised training data. The
ML/DL techniques also aid in RIS deployment for numerous wireless applications,
which requires dealing with multiple channel links between the base station
(BS) and the users. As a result, the BS and RIS beamformers require a joint
design, wherein the RIS elements must be rapidly reconfigured. This chapter
provides a synopsis of DL techniques for both inverse RIS design and
RIS-assisted wireless systems.
Comment: Book chapter, 70 pages, 12 figures, 2 tables. arXiv admin note: substantial text overlap with arXiv:2101.09131, arXiv:2009.0254
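The surrogate-modeling idea in this chapter can be sketched very simply: fit a cheap learned model to a handful of solver evaluations, then run the inverse-design search against the surrogate instead of the solver. Everything below is hypothetical (a fake one-line "solver", a linear least-squares surrogate standing in for a deep network, and made-up geometry parameters); it only illustrates the workflow.

```python
import numpy as np

def expensive_em_simulation(width, gap):
    """Stand-in for a full-wave EM solver: resonant frequency (GHz) of a
    hypothetical meta-atom as a function of two geometry parameters."""
    return 10.0 / (width + 0.5 * gap)

def fit_surrogate(samples):
    """Least-squares surrogate f(w, g) ~ c0 + c1*w + c2*g, a toy stand-in
    for the DL forward models used in RIS inverse design."""
    X = np.array([[1.0, w, g] for w, g, _ in samples])
    y = np.array([f for _, _, f in samples])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def inverse_design(coef, target_freq, grid):
    """Search the cheap surrogate (not the solver) for the geometry whose
    predicted response is closest to the target frequency."""
    return min(grid, key=lambda wg: abs(coef @ [1.0, wg[0], wg[1]] - target_freq))
```

The payoff is that the candidate grid is scored with dot products rather than full-wave simulations; the solver is only called to generate the (small) training set.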
Overcoming Exploration in Reinforcement Learning with Demonstrations
Exploration in environments with sparse rewards has been a persistent problem
in reinforcement learning (RL). Many tasks are natural to specify with a sparse
reward, and manually shaping a reward function can result in suboptimal
performance. However, finding a non-zero reward is exponentially more difficult
with increasing task horizon or action dimensionality. This puts many
real-world tasks out of practical reach of RL methods. In this work, we use
demonstrations to overcome the exploration problem and successfully learn to
perform long-horizon, multi-step robotics tasks with continuous control such as
stacking blocks with a robot arm. Our method, which builds on top of Deep
Deterministic Policy Gradients and Hindsight Experience Replay, provides an
order-of-magnitude speedup over RL on simulated robotics tasks. It is simple
to implement and makes only the additional assumption that we can collect a
small set of demonstrations. Furthermore, our method is able to solve tasks not
solvable by either RL or behavior cloning alone, and often ends up
outperforming the demonstrator policy.
Comment: 8 pages, ICRA 201
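The core mechanism, mixing a fixed set of demonstrations into the replay data the agent learns from, can be sketched with a toy buffer. The class and its fixed demo-sampling ratio are hypothetical simplifications, not the paper's DDPG/HER implementation.

```python
import random

class DemoSeededBuffer:
    """Replay buffer that mixes agent experience with a fixed demonstration
    set -- a minimal sketch of combining off-policy RL with demonstrations.
    The demo_ratio knob is an illustrative assumption."""

    def __init__(self, demos, demo_ratio=0.25, seed=0):
        self.demos = list(demos)
        self.agent = []
        self.demo_ratio = demo_ratio
        self.rng = random.Random(seed)

    def add(self, transition):
        """Store a transition gathered by the learning agent."""
        self.agent.append(transition)

    def sample(self, batch_size):
        """Each batch contains a fixed share of demonstration transitions,
        so the sparse-reward signal in the demos is never crowded out."""
        n_demo = int(batch_size * self.demo_ratio)
        batch = [self.rng.choice(self.demos) for _ in range(n_demo)]
        pool = self.agent or self.demos  # fall back to demos before any experience
        batch += [self.rng.choice(pool) for _ in range(batch_size - n_demo)]
        return batch
```

Early in training, when the agent has found no reward of its own, the guaranteed demonstration share is what supplies a learning signal.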
Knowledge Transfer Between Robots with Similar Dynamics for High-Accuracy Impromptu Trajectory Tracking
In this paper, we propose an online learning approach that enables the
inverse dynamics model learned for a source robot to be transferred to a target
robot (e.g., from one quadrotor to another quadrotor with different mass or
aerodynamic properties). The goal is to leverage knowledge from the source
robot such that the target robot achieves high-accuracy trajectory tracking on
arbitrary trajectories from the first attempt with minimal data recollection
and training. Most existing approaches for multi-robot knowledge transfer are
based on post-analysis of datasets collected from both robots. In this work, we
study the feasibility of impromptu transfer of models across robots by learning
an error prediction module online. In particular, we analytically derive the
form of the mapping to be learned by the online module for exact tracking,
propose an approach for characterizing similarity between robots, and use these
results to analyze the stability of the overall system. The proposed approach
is illustrated in simulation and verified experimentally on two different
quadrotors performing impromptu trajectory tracking tasks, where the quadrotors
are required to accurately track arbitrary hand-drawn trajectories from the
first attempt.
Comment: European Control Conference (ECC) 201
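The online error-prediction idea can be sketched in one loop: apply the source robot's inverse model on the target, observe the tracking error, and update a correction term on the fly. The paper learns a full error-prediction mapping; the scalar bias below is the simplest hypothetical instance, with a made-up one-dimensional "plant".

```python
def online_transfer(source_model, target_plant, refs, lr=0.5):
    """Sketch of impromptu knowledge transfer: a source robot's inverse
    model is corrected online by a bias fit to the target's tracking error.
    All models here are toy scalar stand-ins."""
    bias = 0.0
    errors = []
    for ref in refs:
        u = source_model(ref) + bias   # source-model command plus learned correction
        y = target_plant(u)            # target robot's actual response
        err = ref - y                  # tracking error on this step
        bias += lr * err               # online update from the observed error
        errors.append(abs(err))
    return bias, errors
```

With a target plant that differs from the source by a constant offset, the correction converges and the tracking error shrinks within a few steps, which is the "first attempt" flavor of the approach.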
EMOTE: An Explainable architecture for Modelling the Other Through Empathy
We can usually assume others have goals analogous to our own. This assumption
can also, at times, be applied to multi-agent games - e.g. Agent 1's attraction
to green pellets is analogous to Agent 2's attraction to red pellets. This
"analogy" assumption is tied closely to the cognitive process known as empathy.
Inspired by empathy, we design a simple and explainable architecture to model
another agent's action-value function. This involves learning an "Imagination
Network" to transform the other agent's observed state in order to produce a
human-interpretable "empathetic state" which, when presented to the learning
agent, produces behaviours that mimic the other agent. Our approach is
applicable to multi-agent scenarios consisting of a single learning agent and
other (independent) agents acting according to fixed policies. This
architecture is particularly beneficial for (but not limited to) algorithms
using a composite value or reward function. We show our method produces better
performance in multi-agent games, where it robustly estimates the other's model
in different environment configurations. Additionally, we show that the
empathetic states are human-interpretable, and thus verifiable.
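The architecture's core step can be sketched in a few lines: transform the other agent's observed state into an "empathetic state", then reuse the learning agent's own action-value function to predict the other's action. The state encoding and value function below are hypothetical stand-ins, not the learned Imagination Network.

```python
def empathetic_action(own_q, imagination, other_state, actions):
    """EMOTE-style sketch: an Imagination Network maps the other agent's
    state into an 'empathetic state' the learning agent understands; the
    agent's own action-value function then predicts the other's action."""
    empathetic_state = imagination(other_state)
    return max(actions, key=lambda a: own_q(empathetic_state, a))
```

In the paper's pellet example, an agent attracted to green pellets can model a red-pellet agent by imagining red as green, which is exactly what a state-to-state transform captures.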
Difference of Convex Functions Programming Applied to Control with Expert Data
This paper reports applications of Difference of Convex functions (DC)
programming to Learning from Demonstrations (LfD) and Reinforcement Learning
(RL) with expert data. This is made possible because the norm of the Optimal
Bellman Residual (OBR), which is at the heart of many RL and LfD algorithms, is
DC. Improvement in performance is demonstrated on two specific algorithms,
namely Reward-regularized Classification for Apprenticeship Learning (RCAL) and
Reinforcement Learning with Expert Demonstrations (RLED), through experiments
on generic Markov Decision Processes (MDPs), called Garnets.
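Why the OBR norm is DC can be sketched from standard convexity facts; the decomposition below is a generic identity, stated as a sketch rather than the paper's exact derivation.

```latex
% The optimal Bellman operator is convex in Q: a pointwise max of affine maps.
(T^*Q)(s,a) = r(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, \max_{a'} Q(s', a')
% Hence the residual is DC: convex minus convex (Q itself is linear in Q).
(T^*Q - Q)(s,a) = \underbrace{(T^*Q)(s,a)}_{\text{convex in } Q} \;-\; \underbrace{Q(s,a)}_{\text{linear in } Q}
% And the absolute value stays DC via the identity |g - h| = 2\max(g, h) - (g + h):
\left| T^*Q - Q \right| = 2\max(T^*Q,\, Q) - (T^*Q + Q)
```

Since $\max(T^*Q, Q)$ and $T^*Q + Q$ are both convex in $Q$, the absolute residual, and hence its norm, admits an explicit DC decomposition, which is what DC programming algorithms exploit.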
- …