
    Optimistic Agent: Accurate Graph-Based Value Estimation for More Successful Visual Navigation

    We humans can impeccably search for a target object, given only its name, even in an unseen environment. We argue that this ability is largely due to three main factors: the incorporation of prior knowledge (or experience), its adaptation to the new environment using the observed visual cues, and, most importantly, optimistically searching without giving up early. This is currently missing in state-of-the-art visual navigation methods based on Reinforcement Learning (RL). In this paper, we propose to use externally learned prior knowledge of relative object locations and integrate it into our model by constructing a neural graph. In order to efficiently incorporate the graph without increasing the state-space complexity, we propose our Graph-based Value Estimation (GVE) module. GVE provides a more accurate baseline for estimating the advantage function in the actor-critic RL algorithm. This results in reduced value estimation error and, consequently, convergence to a better policy. Through empirical studies, we show that our agent, dubbed the optimistic agent, has a more realistic estimate of the state value during a navigation episode, which leads to a higher success rate. Our extensive ablation studies show the efficacy of our simple method, which achieves state-of-the-art results measured by the conventional visual navigation metrics, e.g. Success Rate (SR) and Success weighted by Path Length (SPL), in the AI2THOR environment. Comment: Accepted for publication at WACV 202
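    The abstract describes using a value baseline, informed by a graph of object relations, to reduce advantage-estimation error in actor-critic RL. A minimal sketch of that idea, assuming a simple fixed blend coefficient `beta` between the critic's values and graph-derived values (the paper's GVE module learns this from a neural graph; the function names and blending scheme here are hypothetical):

    ```python
    import numpy as np

    def advantage_with_graph_baseline(rewards, values, graph_values,
                                      gamma=0.99, beta=0.5):
        """Advantage estimates using a baseline blended from the critic's
        state values and a graph-informed value estimate (illustrative
        stand-in for a learned GVE-style module)."""
        T = len(rewards)
        returns = np.zeros(T)
        g = 0.0
        # Discounted returns, computed backwards over the episode.
        for t in reversed(range(T)):
            g = rewards[t] + gamma * g
            returns[t] = g
        baseline = (1 - beta) * np.array(values) + beta * np.array(graph_values)
        return returns - baseline

    # Toy 3-step episode with a terminal reward of 1.0.
    adv = advantage_with_graph_baseline([0.0, 0.0, 1.0],
                                        [0.2, 0.4, 0.8],
                                        [0.3, 0.5, 0.9])
    ```

    A more accurate baseline shrinks the advantage magnitudes, which is the mechanism the abstract credits for lower value estimation error.
    
    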

    Fast exploration and learning of latent graphs with aliased observations

    We consider the problem of recovering a latent graph where the observations at each node are \emph{aliased} and transitions are stochastic. Observations are gathered by an agent traversing the graph. Aliasing means that multiple nodes emit the same observation, so the agent cannot know which node it is in. The agent needs to uncover the hidden topology as accurately as possible and in as few steps as possible. This is equivalent to efficient recovery of the transition probabilities of a partially observable Markov decision process (POMDP) in which the observation probabilities are known. An algorithm for efficiently exploring (and ultimately recovering) the latent graph is provided. Our approach is exponentially faster than naive exploration in a variety of challenging topologies with aliased observations, while remaining competitive with existing baselines in the unaliased regime.
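    In the unaliased regime the abstract mentions, transition probabilities can be recovered by simple counting over the agent's trajectory. A minimal sketch of that easy case (with aliasing, node identities must first be disambiguated, which is the paper's actual contribution; this snippet assumes true node labels are observed):

    ```python
    from collections import defaultdict

    def estimate_transitions(trajectory):
        """Count-based estimate of transition probabilities from a
        sequence of visited node labels (unaliased case only)."""
        counts = defaultdict(lambda: defaultdict(int))
        for a, b in zip(trajectory, trajectory[1:]):
            counts[a][b] += 1
        # Normalise counts into per-node successor distributions.
        return {a: {b: c / sum(nbrs.values()) for b, c in nbrs.items()}
                for a, nbrs in counts.items()}

    # Toy walk on a 3-node graph: node 0 transitions to 1 twice and to 2 once.
    P = estimate_transitions([0, 1, 0, 2, 0, 1])
    ```

    With aliased observations this estimator breaks down, since distinct nodes sharing an observation get their counts merged; that is exactly why efficient exploration under aliasing is nontrivial.
    
    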

    Autonomous exploration of hierarchical scene graphs

    Robotic autonomous exploration is an active field of research, where robot perception pipelines abound. Graph-based pipelines, in particular, are a way to represent the environment efficiently, and provide grounds for high-level reasoning to solve robotics tasks. We propose a framework to generate hierarchical scene graphs automatically from photo-realistic environments. In this thesis, a graph perception pipeline, Hydra, is employed in combination with Habitat-Sim, a 3D simulator, to explore and generate 3D scene graph representations from the simulated 3D maps. This framework and data have provided the grounds to establish a general pipeline for solving exploration tasks in 3D environments using Graph Neural Networks and Reinforcement Learning.
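    Hierarchical scene graphs of the kind Hydra builds organise an environment into layers (e.g. buildings, rooms, places, objects) connected by parent-child edges. A toy sketch of that layered structure, assuming illustrative layer names and a minimal node type rather than Hydra's actual data model:

    ```python
    from dataclasses import dataclass, field

    @dataclass
    class SceneNode:
        """One node of a layered 3D scene graph (hypothetical minimal form)."""
        node_id: int
        layer: str                      # e.g. "building", "room", "object"
        children: list = field(default_factory=list)

    def add_child(parent, child):
        parent.children.append(child)

    # A toy hierarchy in the spirit of a Hydra-style scene graph:
    # one building containing one room containing one object.
    building = SceneNode(0, "building")
    kitchen = SceneNode(1, "room")
    mug = SceneNode(2, "object")
    add_child(building, kitchen)
    add_child(kitchen, mug)
    ```

    Representing the environment this way is what lets a Graph Neural Network consume the map directly and an RL policy reason over it for exploration.
    
    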

    Incorporating Linear Dependencies into Graph Gaussian Processes

    Graph Gaussian processes are an important technique for learning unknown functions on graphs while quantifying uncertainty. These processes encode prior information by using kernels that reflect the structure of the graph, allowing function values at nearby nodes to be correlated. However, there are limited choices for kernels on graphs, and most existing graph kernels can be shown to rely on the graph Laplacian and behave in a manner that resembles Euclidean radial basis functions. In many applications, additional prior information is available that goes beyond the graph structure encoded in the Laplacian: in this work, we study the case where the dependencies between nodes in the target function are known to be linear, possibly up to some noise. We propose a type of kernel for graph Gaussian processes that incorporates linear dependencies between nodes, based on an inter-domain-type construction. We show that this construction results in kernels that can encode directed information and are robust under misspecified linear dependencies. We also show that the graph Matérn kernel, one of the commonly used Laplacian-based kernels, can be obtained as a special case of this construction. We illustrate the properties of these kernels on a set of synthetic examples. We then evaluate these kernels in a real-world traffic speed prediction task, and show that they easily outperform the baseline kernels. We also use these kernels to learn offline reinforcement learning policies in maze environments. We show that they are significantly more stable and data-efficient than strong baselines, and they can incorporate prior information to generalize to unseen tasks.
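    One common way to bake linear dependencies into a kernel is to transform a base kernel matrix `K` by a matrix `B` encoding those dependencies, giving `B K Bᵀ`. The sketch below shows that generic construction only; it is an assumption for illustration, not the paper's exact inter-domain operator:

    ```python
    import numpy as np

    def linear_dependency_kernel(K, B):
        """Transform a base graph kernel K by a linear operator B that
        encodes assumed linear dependencies between node values.
        The result B @ K @ B.T is again a valid (PSD) kernel matrix."""
        return B @ K @ B.T

    K = np.eye(3)                        # trivial base kernel over 3 nodes
    B = np.array([[1.0, 0.0, 0.0],
                  [0.5, 0.5, 0.0],       # node 1 modelled as an average
                  [0.0, 0.0, 1.0]])      # of nodes 0 and 1
    K_lin = linear_dependency_kernel(K, B)
    ```

    Because `B` need not be symmetric, a construction of this shape can encode directed relationships between nodes, which ordinary Laplacian-based kernels cannot.
    
    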

    Towards Optimistic, Imaginative, and Harmonious Reinforcement Learning in Single-Agent and Multi-Agent Environments

    Reinforcement Learning (RL) has recently gained tremendous attention from the research community. Different algorithms have been proposed to tackle a variety of single-agent and multi-agent problems. The fast pace of growth has primarily been driven by the availability of several simplistic toy simulation environments, such as Atari and the DeepMind Control Suite. The capability of most of those algorithms to solve complex problems in partially-observable real-world 3D environments, such as visual navigation and autonomous driving, however, remains limited. In real-world problems, the evaluation environment is often unseen during training, which imposes further challenges. Developing robust and efficient RL algorithms for real-world problems that can generalise to unseen environments remains an open problem. One such limitation of RL algorithms is their inability to remain optimistic in the face of tasks that require longer trajectories to complete. That lack of optimism in agents trained using previous RL methods often leads to a lower evaluated success rate. For instance, such an agent gives up on finding an object after only a few steps of searching for it, while a longer search is likely to be successful. We hypothesise that such a lack of optimism is manifested in the agent's underestimation of the expected future reward, i.e. the state-value function. To alleviate the issue, we propose to enhance the agent's state-value function approximator with more global information. In visual navigation, we do so by learning the spatio-temporal relationship between objects present in the environment. Another limitation of previously introduced RL algorithms is their lack of explicit modelling of the outcome of an action before committing to it, i.e. lack of imagination. Model-based RL algorithms have recently been successful in alleviating such limitations in simple toy environments.
    Building an accurate model of the environment dynamics in 3D visually complex scenes, however, remains infeasible. Therefore, in our second contribution, we hypothesise that a simpler dynamics model that only imagines the (sub-)goal state can achieve the best of both worlds; it avoids complicated modelling of the future per timestep while still alleviating the shortcomings resulting from the lack of imagination. Finally, in our third contribution, we take a step beyond single-agent problems to learn multi-agent interactions. In many real-world problems, e.g. autonomous driving, an agent needs to learn to interact with other, potentially learning, agents while maximising its own individual reward. Such selfish reward optimisation by every agent often leads to aggressive behaviour. We hypothesise that introducing an intrinsic reward for each agent that encourages caring for neighbours can alleviate this problem. As such, we introduce a new optimisation objective that uses information theory to promote less selfish behaviour across the population of agents. Overall, our three contributions address three main limitations of single-agent and multi-agent RL algorithms for solving real-world problems. Through empirical studies, we validate our three hypotheses and show our proposed methods outperform the previous state-of-the-art. Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 202
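    The third contribution's idea of an intrinsic reward that encourages caring for neighbours can be sketched as a simple blend of an agent's own reward with its neighbours' average reward. This is a hypothetical stand-in for the thesis's information-theoretic objective; the function, the averaging scheme, and the caring coefficient `alpha` are all illustrative assumptions:

    ```python
    def caring_reward(own_reward, neighbour_rewards, alpha=0.3):
        """Blend an agent's individual reward with the mean reward of its
        neighbours. alpha controls how much the agent 'cares': alpha=0
        recovers fully selfish optimisation."""
        if not neighbour_rewards:
            return own_reward
        neighbour_mean = sum(neighbour_rewards) / len(neighbour_rewards)
        return (1 - alpha) * own_reward + alpha * neighbour_mean

    # An agent earning 1.0 while its two neighbours earn 0.0 and 0.5
    # receives a shaped reward pulled toward the group's outcome.
    r = caring_reward(1.0, [0.0, 0.5])
    ```

    Shaping each agent's objective this way changes the equilibrium the population converges to, discouraging the aggressive behaviour that pure per-agent reward maximisation tends to produce.
    
    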