
    Towards Optimistic, Imaginative, and Harmonious Reinforcement Learning in Single-Agent and Multi-Agent Environments

    Reinforcement Learning (RL) has recently gained tremendous attention from the research community, and different algorithms have been proposed to tackle a variety of single-agent and multi-agent problems. This fast pace of growth has primarily been driven by the availability of several simplistic toy simulation environments, such as Atari and the DeepMind Control Suite. The capability of most of those algorithms to solve complex problems in partially observable real-world 3D environments, such as visual navigation and autonomous driving, however, remains limited. In real-world problems, the evaluation environment is often unseen during training, which imposes further challenges. Developing robust and efficient RL algorithms for real-world problems that can generalise to unseen environments remains an open problem.

    One such limitation of RL algorithms is their inability to remain optimistic in the face of tasks that require longer trajectories to complete. This lack of optimism in agents trained with previous RL methods often leads to a lower evaluated success rate; for instance, such an agent gives up searching for an object after only a few steps, even though a longer search would likely succeed. We hypothesise that this lack of optimism manifests as the agent's underestimation of the expected future reward, i.e. the state-value function. To alleviate the issue, we propose to enhance the agent's state-value function approximator with more global information. In visual navigation, we do so by learning the spatio-temporal relationships between objects present in the environment.

    Another limitation of previously introduced RL algorithms is their lack of explicit modelling of the outcome of an action before committing to it, i.e. a lack of imagination. Model-based RL algorithms have recently been successful in alleviating this limitation in simple toy environments, but building an accurate model of the environment dynamics in visually complex 3D scenes remains infeasible. In our second contribution, we therefore hypothesise that a simpler dynamics model that only imagines the (sub-)goal state can achieve the best of both worlds: it avoids complicated per-timestep modelling of the future while still alleviating the shortcomings that result from a lack of imagination.

    Finally, in our third contribution, we step beyond single-agent problems to learn multi-agent interactions. In many real-world problems, e.g. autonomous driving, an agent needs to learn to interact with other, potentially learning, agents while maximising its own individual reward. Such selfish reward optimisation by every agent often leads to aggressive behaviour. We hypothesise that introducing an intrinsic reward for each agent that encourages caring for its neighbours can alleviate this problem. We therefore introduce a new optimisation objective that uses information theory to promote less selfish behaviour across the population of agents.

    Overall, our three contributions address three main limitations of single-agent and multi-agent RL algorithms for solving real-world problems. Through empirical studies, we validate our three hypotheses and show that our proposed methods outperform the previous state of the art.

    Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 202
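
    For context, the state-value function referred to in the first contribution is, in standard RL notation, the expected discounted return obtained by following a policy $\pi$ from a state $s$; the abstract's first hypothesis is that prior methods systematically underestimate this quantity. The formulation below is the textbook definition, not the thesis's specific approximator:

    $$
    V^{\pi}(s) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_{t+1} \,\middle|\, s_{0} = s\right], \qquad 0 \le \gamma < 1 .
    $$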