Using Monte Carlo Search With Data Aggregation to Improve Robot Soccer Policies
RoboCup soccer competitions are considered among the most challenging
multi-robot adversarial environments, due to their high dynamism and the
partial observability of the environment. In this paper we introduce a method
based on a combination of Monte Carlo search and data aggregation (MCSDA) to
adapt discrete-action soccer policies for a defender robot to the strategy of
the opponent team. By exploiting a simple representation of the domain, a
supervised learning algorithm is trained over an initial collection of data
consisting of several simulations of human expert policies. Monte Carlo policy
rollouts are then generated and aggregated to previous data to improve the
learned policy over multiple epochs and games. The proposed approach has been
extensively tested both on a soccer-dedicated simulator and on real robots.
Using this method, our learning robot soccer team achieves an improvement in
ball interceptions, as well as a reduction in the number of opponents' goals.
Alongside the better performance, the team as a whole achieves more efficient
positioning on the field.
Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces
While recent advances in deep reinforcement learning have allowed autonomous
learning agents to succeed at a variety of complex tasks, existing algorithms
generally require a lot of training data. One way to increase the speed at
which agents are able to learn to perform tasks is by leveraging the input of
human trainers. Although such input can take many forms, real-time,
scalar-valued feedback is especially useful in situations where it proves
difficult or impossible for humans to provide expert demonstrations. Previous
approaches have shown the usefulness of human input provided in this fashion
(e.g., the TAMER framework), but they have thus far not considered
high-dimensional state spaces or employed the use of deep learning. In this
paper, we do both: we propose Deep TAMER, an extension of the TAMER framework
that leverages the representational power of deep neural networks in order to
learn complex tasks in just a short amount of time with a human trainer. We
demonstrate Deep TAMER's success by using it and just 15 minutes of
human-provided feedback to train an agent that performs better than humans on
the Atari game of Bowling - a task that has proven difficult for even
state-of-the-art reinforcement learning methods.
Comment: 9 pages, 6 figure
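The idea of learning from real-time, scalar-valued human feedback, as described in the Deep TAMER abstract above, can be illustrated with a minimal sketch. This is not the Deep TAMER method itself: it assumes a linear model over hand-made one-hot features instead of a deep network, and the names `predict`, `update`, `greedy_action`, and the toy "left"/"right" task are hypothetical.

```python
# Minimal sketch (assumption: linear model, not the deep network from the paper).
# The agent regresses a human-feedback estimate H(s, a) onto scalar signals
# given by a trainer, then acts greedily with respect to that estimate.

def predict(weights, feats):
    """Estimated human feedback for one (state, action) feature vector."""
    return sum(w * x for w, x in zip(weights, feats))

def update(weights, feats, human_feedback, lr=0.1):
    """One gradient step on the squared error between estimate and feedback."""
    error = human_feedback - predict(weights, feats)
    return [w + lr * error * x for w, x in zip(weights, feats)]

def greedy_action(weights, feats_by_action):
    """Pick the action whose estimated human feedback is highest."""
    return max(feats_by_action, key=lambda a: predict(weights, feats_by_action[a]))

# Toy task (hypothetical): the trainer gives +1 for "right" and -1 for "left".
feats = {"left": [1.0, 0.0], "right": [0.0, 1.0]}
weights = [0.0, 0.0]
for _ in range(50):
    for action, signal in (("left", -1.0), ("right", 1.0)):
        weights = update(weights, feats[action], signal)

best = greedy_action(weights, feats)
```

After training, `best` is the action the simulated trainer rewarded. A real system would replace the one-hot features with a learned representation of the high-dimensional state.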
Cooperative Inverse Reinforcement Learning
For an autonomous system to be helpful to humans and to pose no unwarranted
risks, it needs to align its values with those of the humans in its environment
in such a way that its actions contribute to the maximization of value for the
humans. We propose a formal definition of the value alignment problem as
cooperative inverse reinforcement learning (CIRL). A CIRL problem is a
cooperative, partial-information game with two agents, human and robot; both
are rewarded according to the human's reward function, but the robot does not
initially know what this is. In contrast to classical IRL, where the human is
assumed to act optimally in isolation, optimal CIRL solutions produce behaviors
such as active teaching, active learning, and communicative actions that are
more effective in achieving value alignment. We show that computing optimal
joint policies in CIRL games can be reduced to solving a POMDP, prove that
optimality in isolation is suboptimal in CIRL, and derive an approximate CIRL
algorithm
Cycle-of-Learning for Autonomous Systems from Human Interaction
We discuss different types of human-robot interaction paradigms in the
context of training end-to-end reinforcement learning algorithms. We provide a
taxonomy to categorize the types of human interaction and present our
Cycle-of-Learning framework for autonomous systems that combines different
human-interaction modalities with reinforcement learning. Two key concepts
provided by our Cycle-of-Learning framework are how it integrates the
different human-interaction modalities (demonstration, intervention, and
evaluation) and how it defines the switching criteria between them.
Comment: Presented at AI-HRI AAAI-FSS, 2018 (arXiv:1809.06606
Internal Model from Observations for Reward Shaping
Reinforcement learning methods require careful design involving a reward
function to obtain the desired action policy for a given task. In the absence
of hand-crafted reward functions, prior work on the topic has proposed several
methods for reward estimation by using expert state trajectories and action
pairs. However, there are cases where complete or good action information
cannot be obtained from expert demonstrations. We propose a novel reinforcement
learning method in which the agent learns an internal model of observation on
the basis of expert-demonstrated state trajectories to estimate rewards without
completely learning the dynamics of the external environment from state-action
pairs. The internal model is obtained in the form of a predictive model for the
given expert state distribution. During reinforcement learning, the agent
predicts the reward as a function of the difference between the actual state
and the state predicted by the internal model. We conducted multiple
experiments in environments of varying complexity, including the Super Mario
Bros and Flappy Bird games. We show our method successfully trains good
policies directly from expert game-play videos.
Comment: 7 pages, 6 figures, ICML workshop (ALA 2018
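The abstract above describes predicting the reward as a function of the difference between the actual state and the state predicted by an internal model. A minimal sketch of that idea follows. The specific choices here are assumptions, not taken from the abstract: the Euclidean distance, the `tanh` squashing, and the constant-velocity `predicted_next_state` function, which stands in for a model trained on expert state trajectories.

```python
import math

def predicted_next_state(history):
    """Hypothetical stand-in for the learned internal model.

    Extrapolates the last two states at constant velocity; a real internal
    model would be trained on expert state trajectories.
    """
    prev, curr = history[-2], history[-1]
    return [2 * c - p for p, c in zip(prev, curr)]

def shaped_reward(actual_state, predicted_state):
    """Reward that shrinks as the agent's state departs from the prediction.

    Assumption: negative tanh of the Euclidean distance, giving values in (-1, 0].
    """
    dist = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual_state, predicted_state)))
    return -math.tanh(dist)

history = [[0.0, 0.0], [1.0, 1.0]]
guess = predicted_next_state(history)          # where an expert would likely be next
on_track = shaped_reward([2.0, 2.0], guess)    # agent matches the prediction
off_track = shaped_reward([5.0, 6.0], guess)   # agent is far from the prediction
```

States that match the internal model's prediction receive the highest reward, so no action labels from the expert are needed.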
Expert-augmented actor-critic for ViZDoom and Montezumas Revenge
We propose an expert-augmented actor-critic algorithm, which we evaluate on
two environments with sparse rewards: Montezumas Revenge and a demanding maze
from the ViZDoom suite. In the case of Montezumas Revenge, an agent trained
with our method achieves very good results, consistently scoring above 27,000
points (in many experiments beating the first world). With an appropriate
choice of hyperparameters, our algorithm surpasses the performance of the
expert data. In a number of experiments, we have observed an unreported bug in
Montezumas Revenge which allowed the agent to score more than 800,000 points
Hybrid Reinforcement Learning with Expert State Sequences
Existing imitation learning approaches often require that the complete
demonstration data, including sequences of actions and states, are available.
In this paper, we consider a more realistic and difficult scenario where a
reinforcement learning agent only has access to the state sequences of an
expert, while the expert actions are unobserved. We propose a novel
tensor-based model to infer the unobserved actions of the expert state
sequences. The policy of the agent is then optimized via a hybrid objective
combining reinforcement learning and imitation learning. We evaluated our
hybrid approach on an illustrative domain and Atari games. The empirical
results show that (1) the agents are able to leverage expert state sequences to
learn faster than pure reinforcement learning baselines, (2) our tensor-based
action inference model is advantageous compared to standard deep neural
networks in inferring expert actions, and (3) the hybrid policy optimization
objective is robust against noise in expert state sequences.
Comment: AAAI 2019; https://github.com/XiaoxiaoGuo/tensor4r
Real-time Automatic Emotion Recognition from Body Gestures
Although psychological research indicates that bodily expressions convey
important affective information, research in emotion recognition has to date
focused mainly on facial expression or voice analysis. In this paper we propose
an approach to real-time automatic emotion recognition from body movements. A
set of postural, kinematic, and geometrical features is extracted from
sequences of 3D skeletons and fed to a multi-class SVM classifier. The proposed
method has been assessed on data acquired through two different systems: a
professional-grade optical motion capture system and Microsoft Kinect. The
system has been assessed on a "six emotions" recognition problem and, using a
leave-one-subject-out cross-validation strategy, reached an overall recognition
rate of 61.3%, which is very close to the recognition rate of 61.9% obtained by
human observers. To provide further testing of the system, two games were
developed, where one or two users have to interact to understand and express
emotions with their body
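The leave-one-subject-out strategy mentioned in the abstract above holds out all recordings of one subject per fold, so the classifier is never tested on a person it was trained on. A minimal sketch of the fold construction follows. The function name `leave_one_subject_out` and the subject labels are hypothetical, and the classifier itself is omitted.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) per distinct subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test

# Hypothetical data: six recordings from three subjects.
subjects = ["s1", "s1", "s2", "s3", "s3", "s3"]
folds = list(leave_one_subject_out(subjects))
```

Each fold would train on `train` and score on `test`. The overall recognition rate is then computed over the held-out predictions from all folds.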
Deep learning for video game playing
In this article, we review recent Deep Learning advances in the context of
how they have been applied to play different types of video games such as
first-person shooters, arcade games, and real-time strategy games. We analyze
the unique requirements that different game genres pose to a deep learning
system and highlight important open challenges in the context of applying these
machine learning methods to video games, such as general game playing, dealing
with extremely large decision spaces and sparse rewards
Action Assembly: Sparse Imitation Learning for Text Based Games with Combinatorial Action Spaces
We propose a computationally efficient algorithm that combines compressed
sensing with imitation learning to solve text-based games with combinatorial
action spaces. Specifically, we introduce a new compressed sensing algorithm,
named IK-OMP, which can be seen as an extension of Orthogonal Matching
Pursuit (OMP). We incorporate IK-OMP into a supervised imitation learning
setting and show that the combined approach (Sparse Imitation Learning,
Sparse-IL) solves the entire text-based game of Zork1 with an action space of
approximately 10 million actions given both perfect and noisy demonstrations.
Comment: Under review at IJCAI 202