7 research outputs found
Curiosity Driven Exploration of Learned Disentangled Goal Spaces
Intrinsically motivated goal exploration processes enable agents to autonomously sample goals in order to efficiently explore complex environments with high-dimensional continuous action spaces. They have been applied successfully to real-world robots to discover repertoires of policies producing a wide diversity of effects. These algorithms have often relied on engineered goal spaces, but it was recently shown that deep representation learning algorithms can learn an adequate goal space in simple environments. However, in more complex environments containing multiple objects or distractors, efficient exploration requires that the structure of the goal space reflect that of the environment. In this paper we show that using a disentangled goal space leads to better exploration performance than an entangled goal space. We further show that when the representation is disentangled, one can leverage it by sampling goals that maximize learning progress in a modular manner. Finally, we show that the measure of learning progress used to drive curiosity-driven exploration can simultaneously be used to discover abstract, independently controllable features of the environment. The code used in the experiments is available at https://github.com/flowersteam/Curiosity_Driven_Goal_Exploration
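The modular goal sampling described above can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: each module tracks a history of competence scores for one latent dimension of the goal space, learning progress is estimated as the change in mean competence over a sliding window, and modules are sampled in proportion to their absolute learning progress (the class name, window scheme, and epsilon-greedy mixing are all illustrative choices).

```python
import random


class ModularGoalSampler:
    """Sketch of learning-progress-driven modular goal sampling.

    Each module corresponds to one latent factor of a disentangled
    goal space; modules showing more recent learning progress are
    sampled more often.
    """

    def __init__(self, n_modules, window=10, epsilon=0.2):
        self.histories = [[] for _ in range(n_modules)]
        self.window = window
        self.epsilon = epsilon  # fraction of purely random module picks

    def record(self, module, competence):
        """Log the competence achieved on a goal from this module."""
        self.histories[module].append(competence)

    def learning_progress(self, module):
        """Absolute change in mean competence over the recent window."""
        h = self.histories[module][-self.window:]
        if len(h) < 2:
            return 0.0
        half = len(h) // 2
        early = sum(h[:half]) / half
        recent = sum(h[half:]) / (len(h) - half)
        return abs(recent - early)

    def sample_module(self):
        """Sample a module proportionally to its learning progress."""
        n = len(self.histories)
        if random.random() < self.epsilon:
            return random.randrange(n)
        lp = [self.learning_progress(m) for m in range(n)]
        total = sum(lp)
        if total == 0.0:
            return random.randrange(n)
        r = random.uniform(0.0, total)
        acc = 0.0
        for m, v in enumerate(lp):
            acc += v
            if r <= acc:
                return m
        return n - 1
```

A goal would then be sampled uniformly (or by some other scheme) within the chosen module's latent dimension; that step is environment-specific and omitted here.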
Intrinsically Motivated Exploration of Learned Goal Spaces
Learning Parameterized Skills
One of the defining characteristics of human intelligence is the ability to acquire and refine skills. Skills are behaviors for solving problems that an agent encounters often—sometimes in different contexts and situations—throughout its lifetime. Identifying important problems that recur and retaining their solutions as skills allows agents to more rapidly solve novel problems by adjusting and combining their existing skills.
In this thesis we introduce a general framework for learning reusable parameterized skills. Reusable skills are parameterized procedures that—given a description of a problem to be solved—produce appropriate behaviors or policies. They can be sequentially and hierarchically combined with other skills to produce progressively more abstract and temporally extended behaviors.
We identify three major challenges involved in the construction of such skills. First, an agent should be capable of solving a small number of problems and generalizing these experiences to construct a single reusable skill. The skill should be capable of producing appropriate behaviors even when applied to previously unseen variations of a problem. We introduce a method for estimating properties of the lower-dimensional manifold on which problem solutions lie. This allows for the construction of unified models for predicting policies from task parameters.
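The idea of predicting a policy from task parameters can be sketched in a few lines. This is a hypothetical stand-in, not the thesis's manifold-estimation method: ordinary least squares fits a map from task parameters to policy parameters, treating each solved problem as a sample from the low-dimensional solution manifold; a real implementation would model the manifold's local structure rather than assume a single global linear map.

```python
import numpy as np


def fit_skill(task_params, policy_params):
    """Fit a linear map from task parameters to policy parameters.

    task_params: (n, d_task) array, one row per solved problem.
    policy_params: (n, d_policy) array of the corresponding solutions.
    Returns a weight matrix including a bias row.
    """
    n = task_params.shape[0]
    X = np.hstack([task_params, np.ones((n, 1))])  # append bias column
    W, *_ = np.linalg.lstsq(X, policy_params, rcond=None)
    return W


def predict_policy(W, task):
    """Predict policy parameters for a new, unseen task description."""
    x = np.append(task, 1.0)
    return x @ W
```

Under this sketch, a "skill" is just the fitted map: given a new task description, it produces a policy directly, without solving the task from scratch.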
Secondly, the agent should be able to identify when a skill can be hierarchically decomposed into specialized sub-skills. We observe that the policy manifold may be composed of disjoint, piecewise-smooth charts, each one encoding solutions for a subclass of problems. Identifying and modeling sub-skills allows for the aggregation of related behaviors into single, more abstract skills.
Finally, the agent should be able to actively select on which problems to practice in order to more rapidly become competent in a skill. Thoughtful and deliberate practice is one of the defining characteristics of human expert performance. By carefully choosing on which problems to practice the agent might more rapidly construct a skill that performs well over a wide range of problems.
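Active selection of practice problems can be framed as a multi-armed bandit, where each candidate problem is an arm and the reward is the competence gain observed after practicing it. The UCB1 rule below is an illustrative stand-in for the selection strategy, not the thesis's actual method; it balances practicing problems that have yielded large gains against revisiting problems that have been tried only rarely.

```python
import math


def select_problem(counts, gains, t):
    """Pick the next practice problem via a UCB1-style score.

    counts[i]: times problem i has been practiced.
    gains[i]: total competence gain observed from practicing problem i.
    t: total practice rounds completed so far.
    """
    # Practice every problem at least once before scoring.
    for i, c in enumerate(counts):
        if c == 0:
            return i
    scores = [
        gains[i] / counts[i] + math.sqrt(2.0 * math.log(t) / counts[i])
        for i in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

As competence on a problem saturates, its observed gains shrink and the agent's practice naturally shifts toward problems where improvement is still possible.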
We address these challenges via a general framework for skill acquisition. We evaluate it on simulated decision problems and on a physical humanoid robot, and demonstrate that it allows for the efficient and active construction of reusable skills.
Learning and planning in videogames via task decomposition
Artificial intelligence (AI) methods have come a long way in tabletop games, with computer programs having now surpassed human experts in the challenging games of chess, Go and heads-up no-limit Texas hold'em. However, a significant simplifying factor in these games is that individual decisions have a relatively large impact on the state of the game. The real world, by contrast, is granular. Human beings are continually presented with new information and are faced with making a multitude of tiny decisions every second. Viewed in these terms, feedback is often sparse, meaning that it only arrives after one has made a great number of decisions. Moreover, in many real-world problems there is a continuous range of actions to choose from, and attaining meaningful feedback from the environment often requires a strong degree of action coordination. Videogames, in which players must likewise contend with granular time scales and continuous action spaces, are in this sense a better proxy for real-world problems, and have thus become regarded by many as the new frontier in games AI. Seemingly, the way in which human players approach granular decision-making in videogames is by decomposing complex tasks into high-level subproblems, thereby allowing them to focus on the "big picture". For example, in Super Mario World, human players seem to look ahead in extended steps, such as climbing a vine or jumping over a pit, rather than planning one frame at a time. Currently, though, this type of reasoning does not come easily to machines, leaving many open research problems related to task decomposition. This thesis focuses on three such problems in particular: (1) The challenge of learning subgoals autonomously, so as to lessen the issue of sparse feedback. (2) The challenge of combining discrete planning techniques with extended actions whose durations and effects on the environment are uncertain.
(3) The questions of when and why it is beneficial to reason over high-level continuous control variables, such as the velocity of a player-controlled ship, rather than over the most low-level actions available. We address these problems via new algorithms and novel experimental design, demonstrating empirically that our algorithms are more efficient than strong baselines that do not leverage task decomposition, and yielding insight into the types of environment where task decomposition is likely to be beneficial.
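The extended actions discussed above are commonly formalized as "options" (in the sense of the options framework from hierarchical reinforcement learning): an option bundles an internal policy with a termination condition, and its duration is whatever it takes to terminate, which may vary in a stochastic environment. The sketch below is illustrative only; all names are hypothetical and the environment is reduced to a bare transition function.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Option:
    """A temporally extended action: act until a termination test fires."""
    name: str
    policy: Callable[[int], int]       # state -> primitive action
    terminated: Callable[[int], bool]  # state -> should the option stop?


def run_option(option, state, step):
    """Execute one option to termination.

    `step` is the environment's transition function,
    (state, action) -> next state. Returns the final state and the
    option's duration, which a planner over options cannot assume is
    fixed in advance.
    """
    duration = 0
    while not option.terminated(state):
        state = step(state, option.policy(state))
        duration += 1
    return state, duration
```

A planner can then search over sequences of options instead of individual frames, which is one way to make the granular time scales of videogames tractable.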