2 research outputs found

    Automating abstraction for potential-based reward shaping

    Within the field of Reinforcement Learning (RL), the successful application of abstraction can play a substantial role in reducing the time agents need to learn competent policies, and many examples of this speed-up have been reported throughout the literature. Reward shaping is one technique for exploiting abstractions in this way. This thesis focuses on how an agent can learn its own abstractions from its own experience for use in Potential-Based Reward Shaping. As the thesis progresses, the environments for which abstraction construction is automated grow in complexity and scope, while relying on progressively less external knowledge of the domains. This culminates in the approaches Uniform Property State Abstraction (UPSA) and Latent Property State Abstraction (LPSA), both of which augment existing RL algorithms, allowing them to construct abstractions from their own experience and then make effective use of those abstractions to improve convergence time. Empirical results in this thesis demonstrate that this approach can outperform existing deep RL algorithms such as Deep Q-Networks across a range of domains.
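    For reference, the standard shaping term in potential-based reward shaping is F(s, s') = gamma * phi(s') - phi(s), which is added to the environment reward. The sketch below shows this term inside a generic tabular Q-learning update; it is a minimal illustration under assumed names (phi, q_update), not the thesis's UPSA/LPSA method, and phi here is a hand-supplied stand-in for the abstraction the agent would learn.

```python
import numpy as np

def shaped_reward(reward, state, next_state, phi, gamma=0.99):
    # Potential-based shaping: r + gamma * phi(s') - phi(s).
    return reward + gamma * phi(next_state) - phi(state)

def q_update(Q, state, action, reward, next_state, phi, alpha=0.1, gamma=0.99):
    # Tabular Q-learning step that uses the shaped reward in place of the raw one.
    target = shaped_reward(reward, state, next_state, phi, gamma) + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (target - Q[state, action])
    return Q
```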

    Supervised learning in N-tuple neural networks

    An N-tuple Neural Network (NNN) is described in which each node fires selectively to its own table of binary trigger patterns. Each node receives input from k input terminals. Supervised learning is used with specially constructed problems: the system is taught to map specific instances of an input set onto specific instances of an output set. Learning is achieved by: (1) calculating a global error term (how far the set of actual outputs differs from the desired set of outputs); (2) either changing the connections between input terminals and N-tuple nodes, or changing the trigger patterns that the node fires to; (3) re-calculating the global error term and retaining the changes to the network if the error is less than in (1). The steepest-descent optimisation described in step (3) is compared with simulated annealing optimisation. Simulated annealing gives better solutions. Other results are that as connectivity k increases, the number of possible solutions increases, but the number of possible non-solutions increases even faster. Simulated annealing is particularly helpful when the relative difficulty (ratio of search to solution) increases. In randomly chosen network configurations, there is less entropy in the output than in the input to the system. When output is recycled as input, the NNN either cycles or reaches an end-point. When solving complex I/O maps, the system counteracts this trend by systematically increasing its sensitivity. Predictive accuracy can be improved by combining the results of two or more independent NNN models.
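    The learning loop in steps (1)-(3) can be read as an accept-if-better search over each node's connections and its trigger-pattern table. The sketch below illustrates one such node and a single greedy step under assumed data structures; names such as NTupleNode, global_error and greedy_step are illustrative rather than taken from the original work, and the simulated-annealing variant would differ only in sometimes accepting error-increasing changes with a temperature-dependent probability.

```python
import random

class NTupleNode:
    def __init__(self, n_inputs, k, n_triggers, rng):
        self.rng = rng
        # Which k input terminals feed this node.
        self.connections = rng.sample(range(n_inputs), k)
        # Table of binary trigger patterns the node fires to.
        self.triggers = {tuple(rng.randint(0, 1) for _ in range(k))
                         for _ in range(n_triggers)}

    def fire(self, inputs):
        # The node outputs 1 only if the k-bit pattern it sees is in its table.
        pattern = tuple(inputs[i] for i in self.connections)
        return 1 if pattern in self.triggers else 0

def global_error(nodes, io_pairs):
    # Hamming distance between actual and desired output bits over the training set.
    return sum(abs(node.fire(x) - y_bit)
               for x, y in io_pairs
               for node, y_bit in zip(nodes, y))

def greedy_step(nodes, io_pairs, n_inputs, rng):
    # Mutate either one node's connections or its trigger patterns,
    # then keep the change only if the global error did not increase.
    before = global_error(nodes, io_pairs)
    node = rng.choice(nodes)
    if rng.random() < 0.5:
        old = node.connections[:]
        node.connections[rng.randrange(len(node.connections))] = rng.randrange(n_inputs)
        if global_error(nodes, io_pairs) > before:
            node.connections = old   # revert rejected change
    else:
        old = set(node.triggers)
        node.triggers = {tuple(rng.randint(0, 1) for _ in range(len(node.connections)))
                         for _ in range(len(node.triggers))}
        if global_error(nodes, io_pairs) > before:
            node.triggers = old      # revert rejected change

if __name__ == "__main__":
    # Toy run: 8 nodes mapping 16-bit inputs to 8-bit outputs on random data.
    rng = random.Random(0)
    nodes = [NTupleNode(n_inputs=16, k=4, n_triggers=3, rng=rng) for _ in range(8)]
    io_pairs = [([rng.randint(0, 1) for _ in range(16)],
                 [rng.randint(0, 1) for _ in range(8)]) for _ in range(20)]
    for _ in range(200):
        greedy_step(nodes, io_pairs, n_inputs=16, rng=rng)
    print("final global error:", global_error(nodes, io_pairs))
```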