
    Computationally Efficient Relational Reinforcement Learning

    Relational Reinforcement Learning (RRL) is a technique that enables Reinforcement Learning (RL) agents to generalize from their experience, allowing them to learn over large or potentially infinite state spaces, to learn context-sensitive behaviors, to solve variable goals, and to transfer knowledge between similar situations. Prior RRL architectures are not sufficiently computationally efficient to see use outside of small, niche roles within larger Artificial Intelligence (AI) architectures. I present a novel online, incremental RRL architecture and an implementation that is orders of magnitude faster than its predecessors. The first aspect of this architecture that I explore is a computationally efficient implementation of an adaptive Hierarchical Tile Coding (aHTC), a kind of Adaptive Tile Coding (ATC) in which more general tiles covering larger portions of the state-action space are retained even as tiles covering smaller portions are introduced, using k-dimensional tries (k-d tries) to implement the value function for non-relational Temporal Difference (TD) methods. To achieve comparable performance for RRL, I implement the Rete algorithm to replace the k-d tries, since it efficiently handles both the variable binding problem and variable numbers of actions. Tying aHTCs and Rete together, I present a rule grammar that both maps aHTCs onto Rete and allows the architecture to automatically extract relational features in order to support adaptation of the value function over time. I experiment with several refinement criteria and with additional functionality that lets my agents determine whether re-refinement using different features might allow them to better learn a near-optimal policy. I present optimal results using a value criterion for several variants of BlocksWorld. I provide transfer results for BlocksWorld and a scalable Taxicab domain. I additionally introduce a Higher Order Grammar (HOG) that grants online, incremental RRL agents the flexibility to introduce additional variables and corresponding relations as needed in order to learn effective value functions. I evaluate agents that use the HOG on a version of Blocks World and on an Adventure task. In summary, I present a new online, incremental RRL architecture, a grammar to map aHTCs onto the Rete, and an implementation that is orders of magnitude faster than its predecessors.
    PhD thesis, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/145859/1/bazald_1.pd
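    The following is a minimal sketch of the core idea behind a hierarchical tile coding: a value estimate that sums contributions from general (coarse) and specific (fine) tiles, updated by TD error. The class name, the fixed resolution levels, and the uniform error split are illustrative assumptions; the dissertation's aHTC additionally refines the tiling over time and is implemented over k-d tries and Rete rather than a hash map.

```python
# Minimal sketch of a hierarchical tile-coding value function with TD updates.
# Assumptions: fixed coarse-to-fine tilings over a 1-D feature in [0, 1);
# adaptive refinement and the k-d trie / Rete machinery are omitted.
from collections import defaultdict

class HierarchicalTileCoder:
    def __init__(self, resolutions=(1, 2, 4, 8), alpha=0.1):
        self.resolutions = resolutions      # coarse-to-fine tilings kept simultaneously
        self.alpha = alpha
        self.weights = defaultdict(float)   # (resolution, tile_index) -> weight

    def tiles(self, x):
        # Map a feature in [0, 1) to one active tile per resolution level.
        return [(r, int(x * r)) for r in self.resolutions]

    def value(self, x):
        # The estimate sums contributions from general and specific tiles alike.
        return sum(self.weights[t] for t in self.tiles(x))

    def td_update(self, x, target):
        # Distribute the TD error evenly across all active tiles.
        error = target - self.value(x)
        step = self.alpha * error / len(self.resolutions)
        for t in self.tiles(x):
            self.weights[t] += step

vf = HierarchicalTileCoder()
vf.td_update(0.3, target=1.0)
print(round(vf.value(0.3), 3))  # 0.1 after one step toward the target
```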

    Using the online cross-entropy method to learn relational policies for playing different games

    By defining a video-game environment as a collection of objects, relations, actions and rewards, the relational reinforcement learning algorithm presented in this paper generates and optimises a set of concise, human-readable relational rules for achieving maximal reward. Rule learning is achieved using a combination of incremental specialisation of rules and a modified online cross-entropy method, which dynamically adjusts the rate of learning as the agent progresses. The algorithm is tested on the Ms. Pac-Man and Mario environments, with results indicating that the agent learns an effective policy for acting within each environment.
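    A minimal sketch of the cross-entropy method over rule-selection probabilities may make the optimisation step concrete. The rule encoding, the toy evaluation function, and the fixed learning rate are assumptions; the paper's online variant dynamically adjusts the learning rate and interleaves incremental rule specialisation.

```python
# Sketch of the cross-entropy method over a set of candidate relational rules:
# each policy is a binary mask selecting rules; the sampling distribution is
# pulled toward the elite samples each iteration. Constants are illustrative.
import numpy as np

def cross_entropy_method(n_rules, evaluate_policy, iters=50,
                         samples=30, elite_frac=0.2, lr=0.7):
    probs = np.full(n_rules, 0.5)            # P(rule is included in the policy)
    n_elite = max(1, int(samples * elite_frac))
    for _ in range(iters):
        # Sample candidate rule subsets from the current distribution.
        population = np.random.rand(samples, n_rules) < probs
        scores = np.array([evaluate_policy(ind) for ind in population])
        elite = population[np.argsort(scores)[-n_elite:]]
        # Move the distribution toward the elite samples.
        probs = (1 - lr) * probs + lr * elite.mean(axis=0)
    return probs

# Toy evaluation: reward policies that include rules 0 and 2 but not the rest.
target = np.array([1, 0, 1, 0, 0], dtype=bool)
best = cross_entropy_method(5, lambda ind: -(ind ^ target).sum())
print(best.round(2))  # probabilities concentrate near the target mask
```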

    Towards Practical Multi-Object Manipulation using Relational Reinforcement Learning

    Learning robotic manipulation tasks using reinforcement learning with sparse rewards is currently impractical due to the outrageous data requirements. Many practical tasks require manipulation of multiple objects, and the complexity of such tasks increases with the number of objects. Learning from a curriculum of increasingly complex tasks appears to be a natural solution, but unfortunately, does not work for many scenarios. We hypothesize that the inability of the state-of-the-art algorithms to effectively utilize a task curriculum stems from the absence of inductive biases for transferring knowledge from simpler to complex tasks. We show that graph-based relational architectures overcome this limitation and enable learning of complex tasks when provided with a simple curriculum of tasks with increasing numbers of objects. We demonstrate the utility of our framework on a simulated block stacking task. Starting from scratch, our agent learns to stack six blocks into a tower. Despite using step-wise sparse rewards, our method is orders of magnitude more data-efficient and outperforms the existing state-of-the-art method that utilizes human demonstrations. Furthermore, the learned policy exhibits zero-shot generalization, successfully stacking blocks into taller towers and previously unseen configurations such as pyramids, without any further training.
    Comment: 10 pages, 4 figures and 1 table in main article, 3 figures and 3 tables in appendix. Supplementary website and videos at https://richardrl.github.io/relational-rl
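    The inductive bias credited here is, in essence, a relational (self-attention) block whose parameters are shared across objects, so the same network handles any number of blocks in the scene. Below is a minimal single-head sketch in that spirit; the dimensions, initialisation, and single-head design are assumptions, not the paper's architecture.

```python
# Minimal relational self-attention block over per-object features.
# Parameter count is independent of the number of objects, which is what
# lets a curriculum grow the object count without changing the network.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relational_block(objects, wq, wk, wv):
    # objects: (n_objects, d) state features, one row per block in the scene.
    q, k, v = objects @ wq, objects @ wk, objects @ wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # pairwise object relations
    return objects + attn @ v                        # residual update per object

rng = np.random.default_rng(0)
d = 8
objs = rng.normal(size=(6, d))                       # e.g. six blocks to stack
params = [rng.normal(size=(d, d)) * 0.1 for _ in range(3)]
out = relational_block(objs, *params)
print(out.shape)  # (6, 8): the same weights would handle 7 or 10 objects
```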

    Relational reinforcement learning with guided demonstrations

    Model-based reinforcement learning is a powerful paradigm for learning tasks in robotics. However, in-depth exploration is usually required and the actions have to be known in advance. Thus, we propose a novel algorithm that integrates the option of requesting teacher demonstrations to learn new domains with fewer action executions and no previous knowledge. Demonstrations allow new actions to be learned and they greatly reduce the amount of exploration required, but they are only requested when they are expected to yield a significant improvement, because the teacher's time is considered to be more valuable than the robot's time. Moreover, selecting the appropriate action to demonstrate is not an easy task, and thus some guidance is provided to the teacher. The rule-based model is analyzed to determine the parts of the state that may be incomplete, and to provide the teacher with a set of possible problems for which a demonstration is needed. Rule analysis is also used to find better alternative models and to complete subgoals before requesting help, thereby minimizing the number of requested demonstrations. These improvements were demonstrated in a set of experiments, which included domains from the international planning competition and a robotic task. Adding teacher demonstrations and rule analysis reduced the amount of exploration required by up to 60% in some domains, and improved the success ratio by 35% in other domains.
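    The core trade-off, asking the teacher only when a demonstration is expected to pay off given that the teacher's time is more valuable than the robot's, can be sketched as a simple expected-cost comparison. The cost model, function name, and constants below are illustrative assumptions, not the paper's rule-analysis machinery.

```python
# Sketch of a "request a demonstration only when it should pay off" decision.
# Compares the expected cost of exploring alone against the (higher) cost of
# a teacher demonstration, weighted by each option's chance of success.
def should_request_demo(expected_explore_steps, robot_step_cost,
                        teacher_demo_cost, success_prob_with_demo,
                        success_prob_alone):
    # Expected cost per success for each option (guard against zero probability).
    cost_alone = expected_explore_steps * robot_step_cost / max(success_prob_alone, 1e-6)
    cost_demo = teacher_demo_cost / max(success_prob_with_demo, 1e-6)
    return cost_demo < cost_alone

# Ask for help when exploration is long and unlikely to succeed unaided.
print(should_request_demo(200, 1.0, 50.0, 0.9, 0.2))  # True
```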

    Robot pain: a speculative review of its functions

    Given the scarce bibliography dealing explicitly with robot pain, this chapter enriches its review with related research on robot behaviours and capacities in which pain could play a role. It is shown that all such roles, ranging from punishment to intrinsic motivation and planning knowledge, can be formulated within the unified framework of reinforcement learning.
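    As a concrete illustration of that unifying claim, a pain signal can enter the reward in at least two of the surveyed roles: as punishment (an extrinsic penalty) and as intrinsic motivation (a bonus for reducing pain). The weighting scheme below is an illustrative assumption, not a model proposed in the chapter.

```python
# Sketch of a pain signal folded into a reinforcement-learning reward.
# pain_now / pain_prev are scalar pain intensities; the weights are assumed.
def shaped_reward(task_reward, pain_now, pain_prev,
                  punishment_weight=1.0, relief_weight=0.5):
    punishment = -punishment_weight * pain_now              # pain as negative reward
    relief_bonus = relief_weight * (pain_prev - pain_now)   # drive to reduce pain
    return task_reward + punishment + relief_bonus

print(shaped_reward(task_reward=1.0, pain_now=0.2, pain_prev=0.6))  # 1.0
```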

    Intelligent Business Process Optimization for the Service Industry

    The company's sustainable competitive advantage derives from its capacity to create value for customers and to adapt its operational practices to changing situations. Business processes are the heart of each company; therefore, process excellence has become a key issue. This book introduces a novel approach focusing on the autonomous optimization of business processes by applying sophisticated machine learning techniques such as Relational Reinforcement Learning and Particle Swarm Optimization.
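    Of the two techniques named, Particle Swarm Optimization is the more self-contained to sketch. The minimal version below optimises a toy objective; the mapping from business-process parameters to particle positions, and the constants used, are illustrative assumptions rather than the book's configuration.

```python
# Minimal Particle Swarm Optimization: particles track personal bests and are
# pulled toward the swarm's global best. The quadratic objective is a stand-in.
import numpy as np

def pso(objective, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    rng = np.random.default_rng(1)
    pos = rng.uniform(-5, 5, (n_particles, dim))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([objective(p) for p in pos])
    gbest = pbest[pbest_val.argmin()]
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Blend inertia, attraction to personal best, and attraction to global best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos += vel
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()]
    return gbest, pbest_val.min()

best, val = pso(lambda x: float((x ** 2).sum()), dim=3)  # minimum at the origin
print(best.round(3), round(val, 6))
```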