
    Optimistic Agent: Accurate Graph-Based Value Estimation for More Successful Visual Navigation

    We humans can impeccably search for a target object, given only its name, even in an unseen environment. We argue that this ability is largely due to three main factors: the incorporation of prior knowledge (or experience), its adaptation to the new environment using the observed visual cues, and, most importantly, optimistically searching without giving up early. This is currently missing in state-of-the-art visual navigation methods based on Reinforcement Learning (RL). In this paper, we propose to use externally learned prior knowledge of relative object locations and integrate it into our model by constructing a neural graph. In order to efficiently incorporate the graph without increasing the state-space complexity, we propose our Graph-based Value Estimation (GVE) module. GVE provides a more accurate baseline for estimating the advantage function in the actor-critic RL algorithm. This results in reduced value estimation error and, consequently, convergence to a better policy. Through empirical studies, we show that our agent, dubbed the optimistic agent, has a more realistic estimate of the state value during a navigation episode, which leads to a higher success rate. Our extensive ablation studies show the efficacy of our simple method, which achieves state-of-the-art results measured by the conventional visual navigation metrics, e.g. Success Rate (SR) and Success weighted by Path Length (SPL), in the AI2THOR environment. Comment: Accepted for publication at WACV 202
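    The abstract describes using a value baseline, informed by a graph of object relations, to reduce advantage-estimation error in actor-critic RL. A minimal sketch of that idea, assuming a simple fixed blend coefficient `beta` between the critic's values and graph-derived values (the paper's GVE module learns this from a neural graph; the function names and blending scheme here are hypothetical):

    ```python
    import numpy as np

    def advantage_with_graph_baseline(rewards, values, graph_values,
                                      gamma=0.99, beta=0.5):
        """Advantage estimates using a baseline blended from the critic's
        state values and a graph-informed value estimate (illustrative
        stand-in for a learned GVE-style module)."""
        T = len(rewards)
        returns = np.zeros(T)
        g = 0.0
        # Discounted returns, computed backwards over the episode.
        for t in reversed(range(T)):
            g = rewards[t] + gamma * g
            returns[t] = g
        baseline = (1 - beta) * np.array(values) + beta * np.array(graph_values)
        return returns - baseline

    # Toy 3-step episode with a terminal reward of 1.0.
    adv = advantage_with_graph_baseline([0.0, 0.0, 1.0],
                                        [0.2, 0.4, 0.8],
                                        [0.3, 0.5, 0.9])
    ```

    A more accurate baseline shrinks the advantage magnitudes, which is the mechanism the abstract credits for lower value estimation error.
    
    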

    Fast exploration and learning of latent graphs with aliased observations

    We consider the problem of recovering a latent graph where the observations at each node are \emph{aliased} and transitions are stochastic. Observations are gathered by an agent traversing the graph. Aliasing means that multiple nodes emit the same observation, so the agent cannot know which node it is in. The agent needs to uncover the hidden topology as accurately as possible and in as few steps as possible. This is equivalent to efficient recovery of the transition probabilities of a partially observable Markov decision process (POMDP) in which the observation probabilities are known. An algorithm for efficiently exploring (and ultimately recovering) the latent graph is provided. Our approach is exponentially faster than naive exploration in a variety of challenging topologies with aliased observations, while remaining competitive with existing baselines in the unaliased regime.
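    In the unaliased regime the abstract mentions, transition probabilities can be recovered by simple counting over the agent's trajectory. A minimal sketch of that easy case (with aliasing, node identities must first be disambiguated, which is the paper's actual contribution; this snippet assumes true node labels are observed):

    ```python
    from collections import defaultdict

    def estimate_transitions(trajectory):
        """Count-based estimate of transition probabilities from a
        sequence of visited node labels (unaliased case only)."""
        counts = defaultdict(lambda: defaultdict(int))
        for a, b in zip(trajectory, trajectory[1:]):
            counts[a][b] += 1
        # Normalise counts into per-node successor distributions.
        return {a: {b: c / sum(nbrs.values()) for b, c in nbrs.items()}
                for a, nbrs in counts.items()}

    # Toy walk on a 3-node graph: node 0 transitions to 1 twice and to 2 once.
    P = estimate_transitions([0, 1, 0, 2, 0, 1])
    ```

    With aliased observations this estimator breaks down, since distinct nodes sharing an observation get their counts merged; that is exactly why efficient exploration under aliasing is nontrivial.
    
    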

    Autonomous exploration of hierarchical scene graphs

    Robotic autonomous exploration is an active field of research, where robot perception pipelines abound. Graph-based pipelines, in particular, are a way to represent the environment efficiently, and provide grounds for high-level reasoning to solve robotics tasks. We propose a framework to generate hierarchical scene graphs automatically from photo-realistic environments. In this thesis, a graph perception pipeline, Hydra, is employed in combination with Habitat-Sim, a 3D simulator, to explore and generate 3D scene graph representations from the simulated 3D maps. This framework and data have provided the grounds to establish a general pipeline for solving exploration tasks in 3D environments using Graph Neural Networks and Reinforcement Learning.
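    Hierarchical scene graphs of the kind Hydra builds organise an environment into layers (e.g. buildings, rooms, places, objects) connected by parent-child edges. A toy sketch of that layered structure, assuming illustrative layer names and a minimal node type rather than Hydra's actual data model:

    ```python
    from dataclasses import dataclass, field

    @dataclass
    class SceneNode:
        """One node of a layered 3D scene graph (hypothetical minimal form)."""
        node_id: int
        layer: str                      # e.g. "building", "room", "object"
        children: list = field(default_factory=list)

    def add_child(parent, child):
        parent.children.append(child)

    # A toy hierarchy in the spirit of a Hydra-style scene graph:
    # one building containing one room containing one object.
    building = SceneNode(0, "building")
    kitchen = SceneNode(1, "room")
    mug = SceneNode(2, "object")
    add_child(building, kitchen)
    add_child(kitchen, mug)
    ```

    Representing the environment this way is what lets a Graph Neural Network consume the map directly and an RL policy reason over it for exploration.
    
    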

    Incorporating Linear Dependencies into Graph Gaussian Processes

    Graph Gaussian processes are an important technique for learning unknown functions on graphs while quantifying uncertainty. These processes encode prior information by using kernels that reflect the structure of the graph, allowing function values at nearby nodes to be correlated. However, there are limited choices for kernels on graphs, and most existing graph kernels can be shown to rely on the graph Laplacian and behave in a manner that resembles Euclidean radial basis functions. In many applications, additional prior information is available that goes beyond the graph structure encoded in the Laplacian: in this work, we study the case where the dependencies between nodes in the target function are known to be linear, possibly up to some noise. We propose a type of kernel for graph Gaussian processes that incorporates linear dependencies between nodes, based on an inter-domain-type construction. We show that this construction results in kernels that can encode directed information and are robust under misspecified linear dependencies. We also show that the graph Matérn kernel, one of the commonly used Laplacian-based kernels, can be obtained as a special case of this construction. We illustrate the properties of these kernels on a set of synthetic examples. We then evaluate these kernels in a real-world traffic speed prediction task, and show that they easily outperform the baseline kernels. We also use these kernels to learn offline reinforcement learning policies in maze environments. We show that they are significantly more stable and data-efficient than strong baselines, and they can incorporate prior information to generalize to unseen tasks.
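    One common way to bake linear dependencies into a kernel is to transform a base kernel matrix `K` by a matrix `B` encoding those dependencies, giving `B K Bᵀ`. The sketch below shows that generic construction only; it is an assumption for illustration, not the paper's exact inter-domain operator:

    ```python
    import numpy as np

    def linear_dependency_kernel(K, B):
        """Transform a base graph kernel K by a linear operator B that
        encodes assumed linear dependencies between node values.
        The result B @ K @ B.T is again a valid (PSD) kernel matrix."""
        return B @ K @ B.T

    K = np.eye(3)                        # trivial base kernel over 3 nodes
    B = np.array([[1.0, 0.0, 0.0],
                  [0.5, 0.5, 0.0],       # node 1 modelled as an average
                  [0.0, 0.0, 1.0]])      # of nodes 0 and 1
    K_lin = linear_dependency_kernel(K, B)
    ```

    Because `B` need not be symmetric, a construction of this shape can encode directed relationships between nodes, which ordinary Laplacian-based kernels cannot.
    
    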

    Towards Optimistic, Imaginative, and Harmonious Reinforcement Learning in Single-Agent and Multi-Agent Environments

    Reinforcement Learning (RL) has recently gained tremendous attention from the research community. Different algorithms have been proposed to tackle a variety of single-agent and multi-agent problems. The fast pace of growth has primarily been driven by the availability of several simplistic toy simulation environments, such as Atari and the DeepMind Control Suite. The capability of most of those algorithms to solve complex problems in partially-observable real-world 3D environments, such as visual navigation and autonomous driving, however, remains limited. In real-world problems, the evaluation environment is often unseen during training, which imposes further challenges. Developing robust and efficient RL algorithms for real-world problems that can generalise to unseen environments remains an open problem. One such limitation of RL algorithms is their inability to remain optimistic in the face of tasks that require longer trajectories to complete. That lack of optimism in agents trained using previous RL methods often leads to a lower evaluated success rate. For instance, such an agent gives up on finding an object after only a few steps of searching for it, while a longer search is likely to be successful. We hypothesise that such a lack of optimism is manifested in the agent's underestimation of the expected future reward, i.e. the state-value function. To alleviate the issue, we propose to enhance the agent's state-value function approximator with more global information. In visual navigation, we do so by learning the spatio-temporal relationship between objects present in the environment. Another limitation of previously introduced RL algorithms is their lack of explicit modelling of the outcome of an action before committing to it, i.e. lack of imagination. Model-based RL algorithms have recently been successful in alleviating such limitations in simple toy environments.
    Building an accurate model of the environment dynamics in 3D visually complex scenes, however, remains infeasible. Therefore, in our second contribution, we hypothesise that a simpler dynamics model that only imagines the (sub-)goal state can achieve the best of both worlds; it avoids complicated modelling of the future per timestep while still alleviating the shortcomings resulting from the lack of imagination. Finally, in our third contribution, we take a step beyond single-agent problems to learn multi-agent interactions. In many real-world problems, e.g. autonomous driving, an agent needs to learn to interact with other, potentially learning, agents while maximising its own individual reward. Such selfish reward optimisation by every agent often leads to aggressive behaviour. We hypothesise that introducing an intrinsic reward for each agent that encourages caring for neighbours can alleviate this problem. As such, we introduce a new optimisation objective that uses information theory to promote less selfish behaviour across the population of agents. Overall, our three contributions address three main limitations of single-agent and multi-agent RL algorithms for solving real-world problems. Through empirical studies, we validate our three hypotheses and show our proposed methods outperform the previous state-of-the-art. Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 202
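    The third contribution's idea of an intrinsic reward that encourages caring for neighbours can be sketched as a simple blend of an agent's own reward with its neighbours' average reward. This is a hypothetical stand-in for the thesis's information-theoretic objective; the function, the averaging scheme, and the caring coefficient `alpha` are all illustrative assumptions:

    ```python
    def caring_reward(own_reward, neighbour_rewards, alpha=0.3):
        """Blend an agent's individual reward with the mean reward of its
        neighbours. alpha controls how much the agent 'cares': alpha=0
        recovers fully selfish optimisation."""
        if not neighbour_rewards:
            return own_reward
        neighbour_mean = sum(neighbour_rewards) / len(neighbour_rewards)
        return (1 - alpha) * own_reward + alpha * neighbour_mean

    # An agent earning 1.0 while its two neighbours earn 0.0 and 0.5
    # receives a shaped reward pulled toward the group's outcome.
    r = caring_reward(1.0, [0.0, 0.5])
    ```

    Shaping each agent's objective this way changes the equilibrium the population converges to, discouraging the aggressive behaviour that pure per-agent reward maximisation tends to produce.
    
    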