
    Dot-to-Dot: Explainable Hierarchical Reinforcement Learning for Robotic Manipulation

    Robotic systems are ever more capable of automation and fulfilment of complex tasks, particularly with reliance on recent advances in intelligent systems, deep learning and artificial intelligence. However, as robots and humans come closer in their interactions, the interpretability, or explainability, of robot decision-making processes for the human operator grows in importance. Successful interaction and collaboration can only take place through a mutual understanding of the underlying representations of the environment and the task at hand, which is currently a challenge for deep learning systems. We present a hierarchical deep reinforcement learning system consisting of a low-level agent that handles the large action/state space of a robotic system efficiently by following the directives of a high-level agent, which learns the high-level dynamics of the environment and task. This high-level agent forms a representation of the world and the task at hand that is interpretable for a human operator. The method, which we call Dot-to-Dot, is evaluated on MuJoCo-based models of the Fetch Robotics Manipulator and the Shadow Hand. Results show efficient learning of complex action/state spaces by the low-level agent, and an interpretable representation of the task and decision-making process learned by the high-level agent.
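
    The abstract gives no implementation detail, so the following is only a toy Python sketch of the two-level control loop it describes: a high-level agent proposes intermediate sub-goals and a low-level agent acts towards them. All class names are hypothetical, no learning takes place, and a one-dimensional point stands in for the manipulator.

        # Toy stand-in for the manipulator: a point on a line that must reach `goal`.
        class LineEnv:
            def __init__(self, goal=10.0):
                self.state, self.goal = 0.0, goal

            def step(self, action):                  # action in {-1, 0, +1}
                self.state += action
                return self.state

        class HighLevelAgent:
            """Proposes interpretable intermediate sub-goals (the 'dots')."""
            def propose_subgoal(self, state, goal):
                # Midpoint heuristic here; in the paper this policy is learned.
                return state + 0.5 * (goal - state)

        class LowLevelAgent:
            """Acts in the raw action space to reach the current sub-goal."""
            def act(self, state, subgoal):
                return 1 if subgoal > state else -1 if subgoal < state else 0

        env, high, low = LineEnv(), HighLevelAgent(), LowLevelAgent()
        state = env.state
        while abs(state - env.goal) > 0.5:
            subgoal = high.propose_subgoal(state, env.goal)   # coarse, human-readable
            for _ in range(3):                                # short low-level rollout
                state = env.step(low.act(state, subgoal))
        print("reached", state)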

    An Individual-based Probabilistic Model for Fish Stock Simulation

    We define an individual-based probabilistic model of sole (Solea solea) behaviour. The individual model is given in terms of an Extended Probabilistic Discrete Timed Automaton (EPDTA), a new formalism introduced in the paper that is shown to be interpretable as a Markov decision process. A given EPDTA model can be probabilistically model-checked via a suitable translation into syntax accepted by existing model checkers. In order to simulate the dynamics of a given population of soles in different environmental scenarios, an agent-based simulation environment is defined in which each agent implements the behaviour of the given EPDTA model. By varying the probabilities and the characteristic functions embedded in the EPDTA model, it is possible to represent different scenarios and to tune the model itself by comparing the results of the simulations with real data about the sole stock in the North Adriatic Sea, available from the recent SoleMon project. The simulator is presented and made available for adaptation to other species.
    Comment: In Proceedings AMCA-POP 2010, arXiv:1008.314
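
    As a rough illustration of the agent-based set-up described above (not of the EPDTA formalism itself), the sketch below steps a population of per-fish probabilistic automata whose transition probabilities depend on an environmental scenario. The state names, probabilities and temperature profile are invented for the example.

        import random

        # Each fish is a small probabilistic automaton; names and numbers are invented.
        STATES = ("feeding", "migrating", "spawning")

        def transition_probs(state, temperature):
            # Characteristic function: probabilities depend on the environmental scenario.
            if state == "feeding":
                p_migrate = 0.1 + 0.02 * max(0.0, temperature - 15)
                return {"feeding": 1 - p_migrate, "migrating": p_migrate, "spawning": 0.0}
            if state == "migrating":
                return {"feeding": 0.2, "migrating": 0.5, "spawning": 0.3}
            return {"feeding": 0.8, "migrating": 0.1, "spawning": 0.1}

        def step(state, temperature):
            probs = transition_probs(state, temperature)
            return random.choices(list(probs), weights=list(probs.values()))[0]

        # Simulate a small population over 12 monthly ticks in one scenario.
        population = ["feeding"] * 1000
        for month in range(12):
            temperature = 12 + 8 * abs(6 - month) / 6      # crude seasonal profile
            population = [step(s, temperature) for s in population]

        print({s: population.count(s) for s in STATES})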

    Automatic Programming of Cellular Automata and Artificial Neural Networks Guided by Philosophy

    Many computer models, such as cellular automata and artificial neural networks, have been developed and successfully applied. However, in some cases these models might restrict the possible solutions, or their solutions might be difficult to interpret. To overcome this problem, we outline a new approach, the so-called allagmatic method, that automatically programs and executes models with as few limitations as possible while maintaining human interpretability. Earlier we described a metamodel and its building blocks according to the philosophical concepts of structure (spatial dimension) and operation (temporal dimension). These building blocks are entity, milieu, and update function, which together abstractly describe cellular automata, artificial neural networks, and possibly any kind of computer model. By automatically combining these building blocks in an evolutionary computation, interpretability might be increased through the relationship to the metamodel, and models might be translated into more interpretable models via the metamodel. We propose generic and object-oriented programming to implement the entities and their milieus as dynamic and generic arrays, and the update function as a method. We show two experiments in which a simple cellular automaton and an artificial neural network are automatically programmed, compiled, and executed. A target state is successfully evolved and learned in the cellular automaton and artificial neural network, respectively. We conclude that the allagmatic method can create and execute cellular automaton and artificial neural network models in an automated manner with the guidance of philosophy.
    Comment: 12 pages, 1 figure
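
    The following is a loose Python sketch of the metamodel vocabulary used above (entities, a milieu function, and an update function), instantiated here as a hand-coded elementary cellular automaton (rule 110) rather than one evolved as in the paper's experiments.

        class Metamodel:
            def __init__(self, entities, milieu_fn, update_fn):
                self.entities = list(entities)   # spatial structure
                self.milieu_fn = milieu_fn       # which neighbours an entity sees
                self.update_fn = update_fn       # temporal operation

            def step(self):
                self.entities = [
                    self.update_fn(e, self.milieu_fn(self.entities, i))
                    for i, e in enumerate(self.entities)
                ]

        # Instantiation as an elementary cellular automaton (rule 110).
        RULE_110 = {(1, 1, 1): 0, (1, 1, 0): 1, (1, 0, 1): 1, (1, 0, 0): 0,
                    (0, 1, 1): 1, (0, 1, 0): 1, (0, 0, 1): 1, (0, 0, 0): 0}

        def ring_milieu(cells, i):
            return (cells[(i - 1) % len(cells)], cells[i], cells[(i + 1) % len(cells)])

        ca = Metamodel([0] * 30 + [1] + [0] * 30, ring_milieu,
                       lambda cell, milieu: RULE_110[milieu])
        for _ in range(20):
            print("".join("#" if c else "." for c in ca.entities))
            ca.step()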

    Learning Policies from Self-Play with Policy Gradients and MCTS Value Estimates

    In recent years, state-of-the-art game-playing agents have often involved policies trained in self-play processes where Monte Carlo tree search (MCTS) algorithms and trained policies iteratively improve each other. The strongest results have been obtained when policies are trained to mimic the search behaviour of MCTS by minimising a cross-entropy loss. Because MCTS, by design, includes an element of exploration, policies trained in this manner are also likely to exhibit a similar extent of exploration. In this paper, we are interested in learning policies for a project whose future goals include the extraction of interpretable strategies, rather than state-of-the-art game-playing performance. For these goals, we argue that such an extent of exploration is undesirable, and we propose a novel objective function for training policies that are not exploratory. We derive a policy gradient expression for maximising this objective function, which can be estimated using MCTS value estimates rather than MCTS visit counts. We empirically evaluate various properties of the resulting policies in a variety of board games.
    Comment: Accepted at the IEEE Conference on Games (CoG) 201
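
    The abstract does not state the objective function, so the sketch below only illustrates the general idea of driving a softmax policy with per-action value estimates instead of visit counts: one state, four moves, made-up numbers standing in for MCTS value estimates, and the standard gradient of the expected value under a softmax policy, not necessarily the paper's expression.

        import numpy as np

        def softmax(z):
            e = np.exp(z - z.max())
            return e / e.sum()

        logits = np.zeros(4)                       # tabular policy for one state, 4 moves
        q_mcts = np.array([0.1, 0.6, -0.2, 0.3])   # hypothetical MCTS value estimates

        for _ in range(200):
            pi = softmax(logits)
            expected_value = pi @ q_mcts
            # Gradient of E_{a~pi}[Q(a)] with respect to the logits of a softmax policy.
            grad = pi * (q_mcts - expected_value)
            logits += 0.5 * grad                   # gradient ascent, step size 0.5

        print(np.round(softmax(logits), 3))        # mass concentrates on the best move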

    Deep Decision Trees for Discriminative Dictionary Learning with Adversarial Multi-Agent Trajectories

    With the explosion in the availability of spatio-temporal tracking data in modern sports, there is an enormous opportunity to better analyse, learn and predict important events in adversarial group environments. In this paper, we propose a deep decision tree architecture for discriminative dictionary learning from adversarial multi-agent trajectories. We first build up a hierarchy for the tree structure by adding each layer and performing feature-weight-based clustering in the forward pass. We then fine-tune the player role weights using backpropagation. The hierarchical architecture ensures the interpretability and the integrity of the group representation. The resulting architecture is a decision tree, with leaf nodes capturing a dictionary of multi-agent group interactions. Due to the ample volume of data available, we focus on soccer tracking data, although our approach can be used in any adversarial multi-agent domain. We present applications of the proposed method for simulating soccer games as well as evaluating and quantifying team strategies.
    Comment: To appear in 4th International Workshop on Computer Vision in Sports (CVsports) at CVPR 201
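
    As a rough, self-contained illustration of layer-wise tree building by feature-weighted clustering (not the authors' architecture: the weights here are fixed rather than fine-tuned by backpropagation, and the data is random), consider the sketch below, whose leaves play the role of a small dictionary of group patterns.

        import numpy as np

        rng = np.random.default_rng(0)

        def two_means(X, weights, iters=20):
            """Cluster rows of X into two groups using feature-weighted distances."""
            Xw = X * weights
            centres = Xw[rng.choice(len(Xw), 2, replace=False)]
            for _ in range(iters):
                d = np.linalg.norm(Xw[:, None] - centres[None], axis=2)
                labels = d.argmin(axis=1)
                for k in range(2):
                    if (labels == k).any():
                        centres[k] = Xw[labels == k].mean(axis=0)
            return labels

        def build_tree(X, weights, depth):
            if depth == 0 or len(X) < 4:
                return {"leaf": X.mean(axis=0)}        # dictionary atom for this node
            labels = two_means(X, weights)
            if labels.min() == labels.max():           # degenerate split: stop here
                return {"leaf": X.mean(axis=0)}
            return {"children": [build_tree(X[labels == k], weights, depth - 1)
                                 for k in range(2)]}

        def count_leaves(node):
            return 1 if "leaf" in node else sum(count_leaves(c) for c in node["children"])

        # 200 fake trajectory descriptors with 10 features (e.g. per-role statistics).
        X = rng.normal(size=(200, 10))
        weights = np.ones(10)                          # uniform weights for illustration
        tree = build_tree(X, weights, depth=2)
        print("dictionary size:", count_leaves(tree))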