Dot-to-Dot: Explainable Hierarchical Reinforcement Learning for Robotic Manipulation
Robotic systems are ever more capable of automation and fulfilment of complex
tasks, particularly with reliance on recent advances in intelligent systems,
deep learning and artificial intelligence. However, as robots and humans come
closer in their interactions, the matter of interpretability, or explainability
of robot decision-making processes for the human grows in importance. A
successful interaction and collaboration will only take place through mutual
understanding of underlying representations of the environment and the task at
hand. This is currently a challenge in deep learning systems. We present a
hierarchical deep reinforcement learning system consisting of a low-level
agent that efficiently handles the large action and state spaces of a robotic
system by following the directives of a high-level agent, which learns the
high-level dynamics of the environment and task. This high-level agent forms a
representation of the world and task at hand that is interpretable for a human
operator. The method, which we call Dot-to-Dot, is evaluated on a MuJoCo-based
model of the Fetch Robotics Manipulator, as well as on a Shadow Hand. Results
show efficient learning of complex action and state spaces by the low-level
agent, and an interpretable representation of the task and decision-making
process learned by the high-level agent.
An Individual-based Probabilistic Model for Fish Stock Simulation
We define an individual-based probabilistic model of the behaviour of the sole
(Solea solea). The individual model is given in terms of an Extended Probabilistic
Discrete Timed Automaton (EPDTA), a new formalism that is introduced in the
paper and that is shown to be interpretable as a Markov decision process. A
given EPDTA model can be probabilistically model-checked by translating it into
the input syntax of existing model checkers. In order to
simulate the dynamics of a given population of soles in different environmental
scenarios, an agent-based simulation environment is defined in which each agent
implements the behaviour of the given EPDTA model. By varying the probabilities
and the characteristic functions embedded in the EPDTA model it is possible to
represent different scenarios and to tune the model itself by comparing the
results of the simulations with real data about the sole stock in the North
Adriatic Sea, available from the recent SoleMon project. The simulator is
presented and made available for adaptation to other species.
Comment: In Proceedings AMCA-POP 2010, arXiv:1008.314
Automatic Programming of Cellular Automata and Artificial Neural Networks Guided by Philosophy
Many computer models such as cellular automata and artificial neural networks
have been developed and successfully applied. However, in some cases, these
models might be restrictive on the possible solutions or their solutions might
be difficult to interpret. To overcome this problem, we outline a new approach,
the so-called allagmatic method, that automatically programs and executes
models with as few limitations as possible while maintaining human
interpretability. Earlier we described a metamodel and its building blocks
according to the philosophical concepts of structure (spatial dimension) and
operation (temporal dimension). They are entity, milieu, and update function
that together abstractly describe cellular automata, artificial neural
networks, and possibly any kind of computer model. By automatically combining
these building blocks in an evolutionary computation, interpretability might be
increased by the relationship to the metamodel, and models might be translated
into more interpretable models via the metamodel. We propose generic and
object-oriented programming to implement the entities and their milieus as
dynamic and generic arrays and the update function as a method. We show two
experiments where a simple cellular automaton and an artificial neural network
are automatically programmed, compiled, and executed. A target state is
successfully evolved and learned in the cellular automaton and artificial
neural network, respectively. We conclude that the allagmatic method can create
and execute cellular automaton and artificial neural network models in an
automated manner with the guidance of philosophy.
Comment: 12 pages, 1 figure
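To make the entity / milieu / update-function decomposition above concrete, here is a minimal Python sketch, assuming a one-dimensional elementary cellular automaton with periodic boundaries. The class `Metamodel`, the helper `elementary_rule`, and the choice of rule 90 are hypothetical illustrations, not the authors' implementation (which programs and compiles its models automatically). Entities live in a generic dynamic array, the milieu is a neighbourhood lookup, and the update function is a callable applied by a method.

```python
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")

# Hypothetical signature: an update function maps (entity state, milieu states) to a new state.
UpdateFunction = Callable[[T, List[T]], T]


class Metamodel(Generic[T]):
    """Hypothetical sketch: entities in a dynamic generic array, milieus as
    neighbourhood lookups, and the update function applied by a method."""

    def __init__(self, entities: List[T], radius: int, update_function: UpdateFunction) -> None:
        self.entities: List[T] = list(entities)  # structure (spatial dimension)
        self.radius = radius
        self.update_function = update_function   # operation (temporal dimension)

    def milieu(self, i: int) -> List[T]:
        """States of the neighbours of entity i (periodic boundary, entity itself excluded)."""
        n = len(self.entities)
        return [self.entities[(i + k) % n]
                for k in range(-self.radius, self.radius + 1) if k != 0]

    def step(self) -> None:
        """Synchronously update every entity from its own state and its milieu."""
        self.entities = [self.update_function(e, self.milieu(i))
                         for i, e in enumerate(self.entities)]


def elementary_rule(rule_number: int) -> Callable[[int, List[int]], int]:
    """Build a Wolfram-style elementary CA update function (radius 1, binary states)."""
    def update(entity: int, milieu: List[int]) -> int:
        left, right = milieu
        index = (left << 2) | (entity << 1) | right
        return (rule_number >> index) & 1
    return update


if __name__ == "__main__":
    ca = Metamodel([0, 0, 0, 1, 0, 0, 0], radius=1, update_function=elementary_rule(90))
    for _ in range(3):
        print(ca.entities)
        ca.step()
```

Keeping the update function as a plain callable means an artificial neural network could, in principle, reuse the same `Metamodel` with weighted-sum neurons as entities and their inputs as the milieu. That is an assumption about how the metamodel generalises, not a claim about the paper's code.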
Learning Policies from Self-Play with Policy Gradients and MCTS Value Estimates
In recent years, state-of-the-art game-playing agents have often relied on policies
trained through self-play, in which Monte Carlo tree search (MCTS)
algorithms and trained policies iteratively improve each other. The strongest
results have been obtained when policies are trained to mimic the search
behaviour of MCTS by minimising a cross-entropy loss. Because MCTS, by design,
includes an element of exploration, policies trained in this manner are also
likely to exhibit a similar extent of exploration. In this paper, we are
interested in learning policies for a project with future goals including the
extraction of interpretable strategies, rather than state-of-the-art
game-playing performance. For these goals, we argue that such an extent of
exploration is undesirable, and we propose a novel objective function for
training policies that are not exploratory. We derive a policy gradient
expression for maximising this objective function, which can be estimated using
MCTS value estimates, rather than MCTS visit counts. We empirically evaluate
various properties of the resulting policies in a variety of board games.
Comment: Accepted at the IEEE Conference on Games (CoG) 201
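The contrast drawn above can be illustrated on a single state with a softmax policy. One approach mimics MCTS visit counts with a cross-entropy loss. The other follows a policy gradient driven by MCTS value estimates. The sketch below assumes, for illustration only, that the objective is the expected MCTS value J = Σ_a π(a) Q̂(a); the paper's actual objective and estimator may differ. The functions `cross_entropy_grad`, `expected_value_grad`, and `optimise` are hypothetical, and the visit counts and value estimates are made-up numbers.

```python
import math
from typing import Callable, List

Vector = List[float]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_grad(logits: Vector, visit_counts: Vector) -> Vector:
    """Gradient w.r.t. the logits of -sum_a p(a) log pi(a), p = normalised visit counts."""
    pi = softmax(logits)
    total = sum(visit_counts)
    return [p - n / total for p, n in zip(pi, visit_counts)]


def expected_value_grad(logits: Vector, q_values: Vector) -> Vector:
    """Gradient w.r.t. the logits of J = sum_a pi(a) Q(a), i.e. pi(b) * (Q(b) - J)."""
    pi = softmax(logits)
    j = sum(p * q for p, q in zip(pi, q_values))
    return [p * (q - j) for p, q in zip(pi, q_values)]


def optimise(grad_fn: Callable[[Vector, Vector], Vector], target: Vector,
             ascend: bool, steps: int = 2000, lr: float = 0.5) -> Vector:
    """Plain gradient descent (or ascent) on the logits, starting from a uniform policy."""
    logits = [0.0] * len(target)
    sign = 1.0 if ascend else -1.0
    for _ in range(steps):
        grad = grad_fn(logits, target)
        logits = [z + sign * lr * g for z, g in zip(logits, grad)]
    return softmax(logits)


if __name__ == "__main__":
    visit_counts = [60.0, 30.0, 10.0]  # hypothetical MCTS visit counts
    q_values = [0.5, 0.4, 0.1]         # hypothetical MCTS value estimates
    print("cross-entropy policy:", optimise(cross_entropy_grad, visit_counts, ascend=False))
    print("expected-value policy:", optimise(expected_value_grad, q_values, ascend=True))
```

Under these assumptions, the cross-entropy policy converges to the visit-count distribution, exploration included. The expected-value policy instead drifts toward the single highest-valued action, which is the kind of non-exploratory behaviour the abstract argues for.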
Deep Decision Trees for Discriminative Dictionary Learning with Adversarial Multi-Agent Trajectories
With the explosion in the availability of spatio-temporal tracking data in
modern sports, there is an enormous opportunity to better analyse, learn and
predict important events in adversarial group environments. In this paper, we
propose a deep decision tree architecture for discriminative dictionary
learning from adversarial multi-agent trajectories. We first build up a
hierarchy for the tree structure by adding each layer and performing
feature-weight-based clustering in the forward pass. We then fine-tune the
player role weights using backpropagation. The hierarchical architecture ensures the
interpretability and the integrity of the group representation. The resulting
architecture is a decision tree, with leaf-nodes capturing a dictionary of
multi-agent group interactions. Due to the ample volume of data available, we
focus on soccer tracking data, although our approach can be used in any
adversarial multi-agent domain. We present applications of the proposed method for
simulating soccer games as well as evaluating and quantifying team strategies.
Comment: To appear in 4th International Workshop on Computer Vision in Sports
(CVsports) at CVPR 201
