Deep Ordinal Reinforcement Learning
Reinforcement learning usually makes use of numerical rewards, which have
nice properties but also come with drawbacks and difficulties. Using rewards on
an ordinal scale (ordinal rewards) is an alternative to numerical rewards that
has received more attention in recent years. In this paper, a general approach
to adapting reinforcement learning problems to the use of ordinal rewards is
presented and motivated. We show how to convert common reinforcement learning
algorithms to an ordinal variant, using Q-learning as an example, and introduce
Ordinal Deep Q-Networks, which adapt deep reinforcement learning to ordinal
rewards. Additionally, we run evaluations on problems provided by the OpenAI
Gym framework, showing that our ordinal variants exhibit a performance that is
comparable to the numerical variants for a number of problems. We also give
initial evidence that our ordinal variant is able to produce better results for
problems with less engineered and simpler-to-design reward signals.
Comment: replaced figures for better visibility, added GitHub repository, more
details about source of experimental results, updated target value
calculation for standard and ordinal Deep Q-Networ
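
To make the idea of ordinal rewards in the abstract above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's algorithm: the helper names `to_ordinal` and `prob_better` are hypothetical, and comparing two actions by the estimated probability that one yields a higher-ranked reward than the other is just one plausible ordinal criterion.

```python
from collections import Counter
from itertools import product


def to_ordinal(rewards):
    """Map each distinct numeric reward to its rank (0 = worst).

    Only the ordering of the rewards is kept; their magnitudes are discarded.
    """
    ranks = {value: rank for rank, value in enumerate(sorted(set(rewards)))}
    return [ranks[r] for r in rewards]


def prob_better(samples_a, samples_b):
    """Estimate P(a > b) + 0.5 * P(a == b) from observed ordinal rewards.

    Assumption: this pairwise comparison stands in for whatever ordinal
    action-comparison measure a real implementation would use.
    """
    count_a, count_b = Counter(samples_a), Counter(samples_b)
    total = len(samples_a) * len(samples_b)
    score = 0.0
    for (rank_a, n_a), (rank_b, n_b) in product(count_a.items(), count_b.items()):
        if rank_a > rank_b:
            score += n_a * n_b
        elif rank_a == rank_b:
            score += 0.5 * n_a * n_b  # ties count as half a win
    return score / total


# Numeric rewards such as -100, 0 and 10 become the ranks 0, 1 and 2.
ordinal = to_ordinal([-100, 0, 10, 0])
# Compare the ordinal rewards observed for two hypothetical actions.
preference = prob_better([2, 2, 1], [0, 1, 1])
```

A value of `preference` above 0.5 would suggest that the first action tends to receive higher-ranked rewards than the second, without relying on how far apart the numeric rewards are.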
Using Monte Carlo Search With Data Aggregation to Improve Robot Soccer Policies
RoboCup soccer competitions are considered among the most challenging
multi-robot adversarial environments, due to their high dynamism and the
partial observability of the environment. In this paper we introduce a method
based on a combination of Monte Carlo search and data aggregation (MCSDA) to
adapt discrete-action soccer policies for a defender robot to the strategy of
the opponent team. By exploiting a simple representation of the domain, a
supervised learning algorithm is trained over an initial collection of data
consisting of several simulations of human expert policies. Monte Carlo policy
rollouts are then generated and aggregated to previous data to improve the
learned policy over multiple epochs and games. The proposed approach has been
extensively tested both on a soccer-dedicated simulator and on real robots.
Using this method, our learning robot soccer team achieves an improvement in
ball interceptions, as well as a reduction in the number of opponents' goals.
Along with better performance, the whole team achieves an overall more
efficient positioning within the field.
Measuring collaborative emergent behavior in multi-agent reinforcement learning
Multi-agent reinforcement learning (RL) has important implications for the
future of human-agent teaming. We show that improved performance with
multi-agent RL is not a guarantee of the collaborative behavior thought to be
important for solving multi-agent tasks. To address this, we present a novel
approach for quantitatively assessing collaboration in continuous spatial tasks
with multi-agent RL. Such a metric is useful for measuring collaboration
between computational agents and may serve as a training signal for
collaboration in future RL paradigms involving humans.
Comment: 1st International Conference on Human Systems Engineering and Design,
6 pages, 2 figures, 1 tabl
The Dreaming Variational Autoencoder for Reinforcement Learning Environments
Reinforcement learning has shown great potential in generalizing over raw
sensory data using only a single neural network for value optimization. There
are several challenges in the current state-of-the-art reinforcement learning
algorithms that prevent them from converging towards the global optimum. It is
likely that the solution to these problems lies in short- and long-term
planning, exploration and memory management for reinforcement learning
algorithms. Games are often used to benchmark reinforcement learning algorithms
as they provide a flexible, reproducible, and easy-to-control environment.
However, few games feature a state-space where results in exploration,
memory, and planning are easily perceived. This paper presents The Dreaming
Variational Autoencoder (DVAE), a neural network based generative modeling
architecture for exploration in environments with sparse feedback. We further
present Deep Maze, a novel and flexible maze engine that challenges DVAE in
partially and fully observable state-spaces, long-horizon tasks, and
deterministic and stochastic problems. We show initial findings and encourage
further work in reinforcement learning driven by generative exploration.
Comment: Best Student Paper Award, Proceedings of the 38th SGAI International
Conference on Artificial Intelligence, Cambridge, UK, 2018, Artificial
Intelligence XXXV, 201
Crawling in Rogue's dungeons with (partitioned) A3C
Rogue is a famous dungeon-crawling video game of the 1980s, the ancestor of
its genre. Rogue-like games are known for the necessity to explore partially
observable and always different randomly-generated labyrinths, preventing any
form of level replay. As such, they serve as a very natural and challenging
task for reinforcement learning, requiring the acquisition of complex,
non-reactive behaviors involving memory and planning. In this article we show
how, exploiting a version of A3C partitioned on different situations, the agent
is able to reach the stairs and descend to the next level in 98% of cases.
Comment: Accepted at the Fourth International Conference on Machine Learning,
Optimization, and Data Science (LOD 2018
Can Machines Think in Radio Language?
People can think in auditory, visual and tactile forms of language, and so, in
principle, can machines. But is it possible for them to think in radio language?
According to a first principle presented for general intelligence, i.e. the
principle of language's relativity, the answer may give an exceptional solution
for robot astronauts to talk with each other in space exploration.
Comment: 4 pages, 1 figur
Exploiting Scene-specific Features for Object Goal Navigation
Can the intrinsic relation between an object and the room in which it is
usually located help agents in the Visual Navigation Task? We study this
question in the context of Object Navigation, a problem in which an agent has
to reach an object of a specific class while moving in a complex domestic
environment. In this paper, we introduce a new reduced dataset that speeds up
the training of navigation models, a notoriously complex task. Our proposed
dataset permits the training of models that do not exploit online-built maps in
reasonable times even without the use of huge computational resources.
This reduced dataset therefore provides a meaningful benchmark and can
be used to identify promising models that could then be tried on bigger and
more challenging datasets. Subsequently, we propose the SMTSC model, an
attention-based model capable of exploiting the correlation between scenes and
the objects contained in them, and show quantitatively that the idea is sound.
Comment: Accepted at ACVR2020 ECCV2020 Worksho
A Gray-Box Approach for Curriculum Learning
Curriculum learning is often employed in deep reinforcement learning to let the agent progress more quickly towards better behaviors. Numerical methods for curriculum learning in the literature provide only initial heuristic solutions, with little to no guarantee on their quality. We define a new gray-box function that, including a suitable scheduling problem, can be effectively used to reformulate the curriculum learning problem. We propose different efficient numerical methods to address this gray-box reformulation. Preliminary numerical results on a benchmark task in the curriculum learning literature show the viability of the proposed approach.
Visual Rationalizations in Deep Reinforcement Learning for Atari Games
Due to the capability of deep learning to perform well in high dimensional
problems, deep reinforcement learning agents perform well in challenging tasks
such as Atari 2600 games. However, clearly explaining why a certain action is
taken by the agent can be as important as the decision itself. Deep
reinforcement learning models, as other deep learning models, tend to be opaque
in their decision-making process. In this work, we propose to make deep
reinforcement learning more transparent by visualizing the evidence on which
the agent bases its decision. We also emphasize the importance of
producing a justification for an observed action, which could be applied to a
black-box decision agent.
Comment: presented as oral talk at BNAIC 201
Toward Interpretable Deep Reinforcement Learning with Linear Model U-Trees
Deep Reinforcement Learning (DRL) has achieved impressive success in many
applications. A key component of many DRL models is a neural network
representing a Q function, to estimate the expected cumulative reward following
a state-action pair. The Q function neural network contains a lot of implicit
knowledge about the RL problems, but often remains unexamined and
uninterpreted. To our knowledge, this work develops the first mimic learning
framework for Q functions in DRL. We introduce Linear Model U-trees (LMUTs) to
approximate neural network predictions. An LMUT is learned using a novel
on-line algorithm that is well-suited for an active play setting, where the
mimic learner observes an ongoing interaction between the neural net and the
environment. Empirical evaluation shows that an LMUT mimics a Q function
substantially better than five baseline methods. The transparent tree structure
of an LMUT facilitates understanding the network's learned knowledge by
analyzing feature influence, extracting rules, and highlighting the
super-pixels in image inputs.
Comment: This paper is accepted by ECML-PKDD 201
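
The abstract above describes a Q function as an estimate of the expected cumulative reward following a state-action pair. As a rough illustration of what such an estimate looks like in the simplest case, the sketch below applies the standard tabular Q-learning update to a dictionary. It does not reproduce the LMUT method or a neural Q network; the function name `q_update`, the state and action labels, and the values of `alpha` and `gamma` are assumptions chosen for the example.

```python
def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update and return the new estimate.

    q maps (state, action) pairs to estimated discounted cumulative reward;
    pairs that have not been seen yet are treated as 0.0.
    """
    # Value of the best action available in the next state.
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    # Bootstrapped target: immediate reward plus discounted future value.
    target = reward + gamma * best_next
    old = q.get((state, action), 0.0)
    # Move the old estimate a fraction alpha towards the target.
    q[(state, action)] = old + alpha * (target - old)
    return q[(state, action)]


# Hypothetical two-step chain: s0 -> s1 -> goal, reward only at the goal.
actions = ["left", "right"]
q_table = {}
near_goal = q_update(q_table, "s1", "right", 1.0, "goal", actions)
far_from_goal = q_update(q_table, "s0", "right", 0.0, "s1", actions)
```

After these two updates the estimate for the pair nearer the goal is larger than the one further away, which is the kind of implicit knowledge a mimic model would try to expose in an interpretable form.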