Unsupervised state representation learning with robotic priors: a robustness benchmark
Our understanding of the world depends heavily on our capacity to produce
intuitive and simplified representations which can be easily used to solve
problems. We reproduce this simplification process using a neural network to
build a low dimensional state representation of the world from images acquired
by a robot. As in Jonschkowski et al. 2015, we learn in an unsupervised way
using prior knowledge about the world as loss functions called robotic priors,
and we extend this approach to richer, higher-dimensional images in order to
learn a 3D representation of the hand position of a robot from RGB images. We
propose a quantitative evaluation of the learned representation using nearest
neighbors in the state space, which allows us to assess its quality and to show
both the potential and the limitations of robotic priors in realistic
environments. We increase the image size and add distractors and domain
randomization, all crucial components for achieving transfer learning to real
robots. Finally, we also contribute a new prior to improve the robustness of
the representation. The applications of such a low dimensional state
representation range from easing reinforcement learning (RL) and knowledge
transfer across tasks to facilitating learning from raw data with more
efficient and compact high level representations. The results show that the
robotic prior approach is able to extract a high level representation, such as
the 3D position of an arm, and to organize it into a compact and coherent space
of states on a challenging dataset.
Comment: ICRA 2018 submission
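The robotic-prior idea can be made concrete as simple loss terms applied to an image encoder's outputs. Below is a minimal, hedged sketch in PyTorch of two such priors, temporal coherence and proportionality, following the general formulation of Jonschkowski et al. 2015; the encoder, tensor names, and batching scheme are illustrative assumptions rather than the authors' actual code.

```python
import torch
import torch.nn as nn

# Placeholder encoder: any CNN mapping an image batch [B, 3, H, W] to states
# [B, state_dim]; here a lazy linear head stands in for the paper's network.
encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(3))

def temporal_coherence_loss(s_t, s_next):
    """States should vary slowly: penalize large jumps between consecutive states."""
    return (s_next - s_t).pow(2).sum(dim=1).mean()

def proportionality_loss(delta_a, delta_b):
    """Two transitions sharing the same action should change the state by
    similar magnitudes, wherever in state space they occur."""
    return (delta_a.norm(dim=1) - delta_b.norm(dim=1)).pow(2).mean()

def training_step(img_t, img_next, img_u, img_u_next, optimizer):
    """One optimisation step over a batch of consecutive image pairs and a
    second batch of transitions known to share the same action."""
    s_t, s_next = encoder(img_t), encoder(img_next)
    s_u, s_u_next = encoder(img_u), encoder(img_u_next)
    loss = temporal_coherence_loss(s_t, s_next) + \
           proportionality_loss(s_next - s_t, s_u_next - s_u)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```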
HIGhER: Improving instruction following with Hindsight Generation for Experience Replay
Language creates a compact representation of the world and allows the
description of unlimited situations and objectives through compositionality.
While these characterizations may foster instructing, conditioning or
structuring interactive agent behavior, it remains an open problem to correctly
relate language understanding and reinforcement learning, even in simple
instruction-following scenarios. This joint learning problem is alleviated
through expert demonstrations, auxiliary losses, or neural inductive biases. In
this paper, we propose an orthogonal approach called Hindsight Generation for
Experience Replay (HIGhER) that extends the Hindsight Experience Replay (HER)
approach to the language-conditioned policy setting. Whenever the agent does
not fulfill its instruction, HIGhER learns to output a new directive that
matches the agent's trajectory, and it relabels the episode with a positive
reward. To do so, HIGhER learns to map a state into an instruction by using
past successful trajectories, which removes the need for external expert
interventions to relabel episodes, as required in vanilla HER. We show the efficiency of
our approach in the BabyAI environment, and demonstrate how it complements
other instruction-following methods.
Comment: Accepted at ADPRL'2
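To make the relabeling mechanism concrete, here is a minimal sketch of the hindsight step described above. The instruction generator, replay buffer, and episode structure are hypothetical placeholders used for illustration, not the paper's implementation.

```python
def store_with_hindsight(episode, instruction_generator, replay_buffer):
    """Store an episode; if the agent failed its instruction, relabel it with a
    directive generated from the reached state and a positive final reward."""
    if episode.success:
        replay_buffer.add(episode.transitions, episode.instruction)
        return
    # Map the final state back to language using a generator trained on
    # (state, instruction) pairs taken from past successful trajectories.
    hindsight_instruction = instruction_generator(episode.final_state)
    relabeled = []
    for obs, action, _, next_obs, done in episode.transitions:
        reward = 1.0 if done else 0.0  # the trajectory now "succeeds" at its new goal
        relabeled.append((obs, action, reward, next_obs, done))
    replay_buffer.add(relabeled, hindsight_instruction)
```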
A Machine of Few Words -- Interactive Speaker Recognition with Reinforcement Learning
Speaker recognition is a well-known and well-studied task in the speech processing
domain. It has many applications, ranging from security to speaker adaptation of
personal devices. In this paper, we present a new paradigm for automatic
speaker recognition that we call Interactive Speaker Recognition (ISR). In this
paradigm, the recognition system aims to incrementally build a representation
of the speakers by requesting personalized utterances to be spoken, in contrast
to the standard text-dependent or text-independent schemes. To do so, we cast
the speaker recognition task into a sequential decision-making problem that we
solve with Reinforcement Learning. Using a standard dataset, we show that our
method achieves excellent performance while requiring only small amounts of
speech signal. This method could also be applied as an utterance selection
mechanism for building speech synthesis systems.
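To illustrate the sequential decision-making framing, the sketch below spells out one interactive episode: a policy repeatedly chooses which word to request, the returned utterance embeddings are pooled into a speaker representation, and a guess among the candidate speakers yields the reward. All objects (policy, speaker, embeddings) are hypothetical placeholders; the actual ISR architecture and dataset are not reproduced here.

```python
import torch
import torch.nn.functional as F

def isr_episode(policy, speaker, candidate_embeddings, vocabulary, budget=3):
    """One interactive speaker-recognition episode with a fixed word budget."""
    collected = []
    state = torch.zeros(candidate_embeddings.size(1))    # running speaker representation
    for _ in range(budget):
        word = policy.select_word(vocabulary, state)      # sequential decision
        utterance_emb = speaker.pronounce(word)           # embedding of the requested word
        collected.append(utterance_emb)
        state = torch.stack(collected).mean(dim=0)        # incrementally built representation
    scores = F.cosine_similarity(candidate_embeddings, state.unsqueeze(0), dim=1)
    prediction = scores.argmax().item()
    return 1.0 if prediction == speaker.index else 0.0    # sparse reward for RL training
```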
Deep unsupervised state representation learning with robotic priors: a robustness analysis
Our understanding of the world depends heavily on our capacity to produce intuitive and simplified representations which can be easily used to solve problems. We reproduce this simplification process using a neural network to build a low dimensional state representation of the world from images acquired by a robot. As in Jonschkowski et al. 2015, we learn in an unsupervised way using prior knowledge about the world as loss functions called robotic priors, and we extend this approach to richer, higher-dimensional images in order to learn a 3D representation of the hand position of a robot from RGB images. We propose a quantitative evaluation metric of the learned representation that uses nearest neighbors in the state space and allows us to assess its quality, showing both the potential and the limitations of robotic priors in realistic environments. We increase the image size and add distractors and domain randomization, all crucial components for achieving transfer learning to real robots. Finally, we also contribute a new prior to improve the robustness of the representation. The applications of such a low dimensional state representation range from easing reinforcement learning (RL) and knowledge transfer across tasks to facilitating learning from raw data with more efficient and compact high level representations. The results show that the robotic prior approach is able to extract a high level representation, such as the 3D position of an arm, and to organize it into a compact and coherent space of states on a challenging dataset.
"I'm sorry Dave, I'm afraid I can't do that" Deep Q-Learning From Forbidden Actions
The use of Reinforcement Learning (RL) is still restricted to simulation or to enhancing human-operated systems through recommendations. Real-world environments (e.g. industrial robots or power grids) are generally designed with safety constraints in mind, implemented in the form of valid-action masks or contingency controllers. For example, the range of motion and the angles of the motors of a robot can be limited to physical boundaries. Violating constraints thus results in rejected actions or in entering a safe mode driven by an external controller, making RL agents incapable of learning from their mistakes. In this paper, we propose a simple modification of a state-of-the-art deep RL algorithm (DQN) that enables learning from forbidden actions. To do so, the standard Q-learning update is enhanced with an extra safety loss inspired by structured classification. We empirically show that it reduces the number of constraint violations during the learning phase and accelerates convergence to near-optimal policies compared to standard DQN. Experiments are conducted on a Visual Grid World environment and a TextWorld domain.
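A hedged sketch of the kind of loss described above: the usual DQN temporal-difference term plus a margin term, in the spirit of structured classification, that pushes the Q-value of an action the environment rejected below the best alternative action. The exact margin, weighting, and batch layout are assumptions for illustration, not the paper's precise formulation.

```python
import torch
import torch.nn.functional as F

def dqn_with_forbidden_action_loss(q_net, target_net, batch,
                                   gamma=0.99, margin=1.0, safety_weight=1.0):
    # batch tensors: obs [B, ...], actions [B], rewards [B], dones [B],
    # rejected [B] = 1.0 when the chosen action was forbidden by the environment.
    obs, actions, rewards, next_obs, dones, rejected = batch

    q = q_net(obs)                                          # [B, num_actions]
    q_taken = q.gather(1, actions.unsqueeze(1)).squeeze(1)

    # Standard DQN temporal-difference target.
    with torch.no_grad():
        next_q = target_net(next_obs).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * next_q
    td_loss = F.smooth_l1_loss(q_taken, target)

    # Safety margin term: a rejected action's Q-value should sit at least
    # `margin` below the best of the other actions in that state.
    q_best_other = q.scatter(1, actions.unsqueeze(1), float("-inf")).max(dim=1).values
    safety_loss = (rejected * F.relu(q_taken + margin - q_best_other)).mean()

    return td_loss + safety_weight * safety_loss
```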
HIGhER: Improving instruction following with Hindsight Generation for Experience Replay
Language creates a compact representation of the world and allows the description of unlimited situations and objectives through compositionality. While these characterizations may foster instructing, conditioning or structuring interactive agent behavior, it remains an open problem to correctly relate language understanding and reinforcement learning, even in simple instruction-following scenarios. This joint learning problem is alleviated through expert demonstrations, auxiliary losses, or neural inductive biases. In this paper, we propose an orthogonal approach called Hindsight Generation for Experience Replay (HIGhER) that extends the Hindsight Experience Replay approach to the language-conditioned policy setting. Whenever the agent does not fulfill its instruction, HIGhER learns to output a new directive that matches the agent's trajectory, and it relabels the episode with a positive reward. To do so, HIGhER learns to map a state into an instruction by using past successful trajectories, which removes the need for external expert interventions to relabel episodes, as required in vanilla HER. We show the efficiency of our approach in the BabyAI environment, and demonstrate how it complements other instruction-following methods.
A Machine of Few Words: Interactive Speaker Recognition with Reinforcement Learning
Speaker recognition is a well-known and well-studied task in the speech processing domain. It has many applications, ranging from security to speaker adaptation of personal devices. In this paper, we present a new paradigm for automatic speaker recognition that we call Interactive Speaker Recognition (ISR). In this paradigm, the recognition system aims to incrementally build a representation of the speakers by requesting personalized utterances to be spoken, in contrast to the standard text-dependent or text-independent schemes. To do so, we cast the speaker recognition task into a sequential decision-making problem that we solve with Reinforcement Learning. Using a standard dataset, we show that our method achieves excellent performance while requiring only small amounts of speech signal. This method could also be applied as an utterance selection mechanism for building speech synthesis systems.
Strategies for Using Proximal Policy Optimization in Mobile Puzzle Games
While traditionally a labour-intensive task, the testing of game content is
progressively becoming more automated. Among the many directions in which this
automation is taking shape, automatic play-testing is one of the most promising,
thanks in part to advances in supervised and reinforcement learning (RL)
algorithms. However, these types of algorithms, while extremely powerful, often
suffer in production environments due to issues with reliability and
transparency in their training and usage.
In this work, we investigate and evaluate strategies for applying the popular
RL method Proximal Policy Optimization (PPO) to a casual mobile puzzle game,
with a specific focus on improving its reliability in training and its
generalization during game playing.
We have implemented and tested a number of different strategies against a
real-world mobile puzzle game (Lily's Garden from Tactile Games). We isolated
the conditions that lead to a failure in either training or generalization
during testing, and we identified a few strategies that ensure more stable
behaviour of the algorithm in this game genre.
Comment: 10 pages, 8 figures, to be published in the 2020 Foundations of Digital Games conference
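As a rough illustration of what such a setup can look like in practice, the snippet below trains PPO with the open-source stable-baselines3 library on a hypothetical gym-style wrapper of a puzzle level; the environment id, hyperparameters, and vectorization choices are placeholder assumptions, not the configuration used in the paper.

```python
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# "PuzzleLevelEnv-v0" is a hypothetical registered wrapper exposing board
# observations and move actions for a single puzzle level.
env = make_vec_env(lambda: gym.make("PuzzleLevelEnv-v0"), n_envs=8)

model = PPO(
    "MlpPolicy",
    env,
    n_steps=256,        # rollout length per parallel environment
    batch_size=512,
    ent_coef=0.01,      # entropy bonus to keep exploring board configurations
    verbose=1,
)
model.learn(total_timesteps=1_000_000)
model.save("ppo_puzzle_agent")
```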
Visual Reasoning with Multi-hop Feature Modulation
Recent breakthroughs in computer vision and natural language processing have spurred interest in challenging multi-modal tasks such as visual question-answering and visual dialogue. For such tasks, one successful approach is to condition image-based convolutional network computation on language via Feature-wise Linear Modulation (FiLM) layers, i.e., per-channel scaling and shifting. We propose to generate the parameters of FiLM layers going up the hierarchy of a convolutional network in a multi-hop fashion rather than all at once, as in prior work. By alternating between attending to the language input and generating FiLM layer parameters, this approach is better able to scale to settings with longer input sequences such as dialogue. We demonstrate that multi-hop FiLM generation achieves state-of-the-art performance on the short-input-sequence task ReferIt (on par with single-hop FiLM generation), while significantly outperforming the prior state of the art and single-hop FiLM generation on the GuessWhat?! visual dialogue task.
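For readers unfamiliar with FiLM, the sketch below shows a single language-conditioned FiLM block (per-channel scale and shift of a convolutional feature map); a multi-hop generator as proposed above would recompute gamma and beta at each block from an updated language context, which is omitted here for brevity. The module layout is a generic illustration, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class FiLMBlock(nn.Module):
    def __init__(self, lang_dim, num_channels):
        super().__init__()
        self.film = nn.Linear(lang_dim, 2 * num_channels)  # predicts gamma and beta

    def forward(self, feature_map, lang_embedding):
        # feature_map: [B, C, H, W], lang_embedding: [B, lang_dim]
        gamma, beta = self.film(lang_embedding).chunk(2, dim=1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)           # [B, C, 1, 1]
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return gamma * feature_map + beta                   # per-channel scale and shift
```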