1,343 research outputs found

    The robot routing problem for collecting aggregate stochastic rewards

    Get PDF
    We propose a new model for formalizing reward collection problems on graphs with dynamically generated rewards which may appear and disappear based on a stochastic model. The robot routing problem is modeled as a graph whose nodes are stochastic processes generating potential rewards over discrete time. The rewards are generated according to the stochastic process, but at each step, an existing reward disappears with a given probability. The edges in the graph encode the (unit-distance) paths between the rewards' locations. On visiting a node, the robot collects the accumulated reward at the node at that time, but traveling between the nodes takes time. The optimization question asks to compute an optimal (or epsilon-optimal) path that maximizes the expected collected rewards. We consider the finite and infinite-horizon robot routing problems. For finite-horizon, the goal is to maximize the total expected reward, while for infinite horizon we consider limit-average objectives. We study the computational and strategy complexity of these problems, establish NP-lower bounds and show that optimal strategies require memory in general. We also provide an algorithm for computing epsilon-optimal infinite paths for arbitrary epsilon > 0

    Heuristics for the traveling repairman problem with profits

    Get PDF
    In the traveling repairman problem with profits, a repairman (also known as the server) visits a subset of nodes in order to collect time-dependent profits. The objective consists of maximizing the total collected revenue. We restrict our study to the case of a single server with nodes located in the Euclidean plane. We investigate properties of this problem, and we derive a mathematical model assuming that the number of visited nodes is known in advance. We describe a tabu search algorithm with multiple neighborhoods, and we test its performance by running it on instances based on TSPLIB. We conclude that the tabu search algorithm finds good-quality solutions fast, even for large instances

    Can Differentiable Decision Trees Learn Interpretable Reward Functions?

    Full text link
    There is an increasing interest in learning reward functions that model human intent and human preferences. However, many frameworks use blackbox learning methods that, while expressive, are difficult to interpret. We propose and evaluate a novel approach for learning expressive and interpretable reward functions from preferences using Differentiable Decision Trees (DDTs) for both low- and high-dimensional state inputs. We explore and discuss the viability of learning interpretable reward functions using DDTs by evaluating our algorithm on Cartpole, Visual Gridworld environments, and Atari games. We provide evidence that that the tree structure of our learned reward function is useful in determining the extent to which a reward function is aligned with human preferences. We visualize the learned reward DDTs and find that they are capable of learning interpretable reward functions but that the discrete nature of the trees hurts the performance of reinforcement learning at test time. However, we also show evidence that using soft outputs (averaged over all leaf nodes) results in competitive performance when compared with larger capacity deep neural network reward functions

    Advances in Reinforcement Learning

    Get PDF
    Reinforcement Learning (RL) is a very dynamic area in terms of theory and application. This book brings together many different aspects of the current research on several fields associated to RL which has been growing rapidly, producing a wide variety of learning algorithms for different applications. Based on 24 Chapters, it covers a very broad variety of topics in RL and their application in autonomous systems. A set of chapters in this book provide a general overview of RL while other chapters focus mostly on the applications of RL paradigms: Game Theory, Multi-Agent Theory, Robotic, Networking Technologies, Vehicular Navigation, Medicine and Industrial Logistic

    Online planning for multi-robot active perception with self-organising maps

    Full text link
    © 2017, Springer Science+Business Media, LLC, part of Springer Nature. We propose a self-organising map (SOM) algorithm as a solution to a new multi-goal path planning problem for active perception and data collection tasks. We optimise paths for a multi-robot team that aims to maximally observe a set of nodes in the environment. The selected nodes are observed by visiting associated viewpoint regions defined by a sensor model. The key problem characteristics are that the viewpoint regions are overlapping polygonal continuous regions, each node has an observation reward, and the robots are constrained by travel budgets. The SOM algorithm jointly selects and allocates nodes to the robots and finds favourable sequences of sensing locations. The algorithm has a runtime complexity that is polynomial in the number of nodes to be observed and the magnitude of the relative weighting of rewards. We show empirically the runtime is sublinear in the number of robots. We demonstrate feasibility for the active perception task of observing a set of 3D objects. The viewpoint regions consider sensing ranges and self-occlusions, and the rewards are measured as discriminability in the ensemble of shape functions feature space. Exploration objectives for online tasks where the environment is only partially known in advance are modelled by introducing goal regions in unexplored space. Online replanning is performed efficiently by adapting previous solutions as new information becomes available. Simulations were performed using a 3D point-cloud dataset from a real robot in a large outdoor environment. Our results show the proposed methods enable multi-robot planning for online active perception tasks with continuous sets of candidate viewpoints and long planning horizons

    Strategies for Scaleable Communication and Coordination in Multi-Agent (UAV) Systems

    Get PDF
    A system is considered in which agents (UAVs) must cooperatively discover interest-points (i.e., burning trees, geographical features) evolving over a grid. The objective is to locate as many interest-points as possible in the shortest possible time frame. There are two main problems: a control problem, where agents must collectively determine the optimal action, and a communication problem, where agents must share their local states and infer a common global state. Both problems become intractable when the number of agents is large. This survey/concept paper curates a broad selection of work in the literature pointing to a possible solution; a unified control/communication architecture within the framework of reinforcement learning. Two components of this architecture are locally interactive structure in the state-space, and hierarchical multi-level clustering for system-wide communication. The former mitigates the complexity of the control problem and the latter adapts to fundamental throughput constraints in wireless networks. The challenges of applying reinforcement learning to multi-agent systems are discussed. The role of clustering is explored in multi-agent communication. Research directions are suggested to unify these components

    Energy-Constrained Active Exploration Under Incremental-Resolution Symbolic Perception

    Full text link
    In this work, we consider the problem of autonomous exploration in search of targets while respecting a fixed energy budget. The robot is equipped with an incremental-resolution symbolic perception module wherein the perception of targets in the environment improves as the robot's distance from targets decreases. We assume no prior information about the total number of targets, their locations as well as their possible distribution within the environment. This work proposes a novel decision-making framework for the resulting constrained sequential decision-making problem by first converting it into a reward maximization problem on a product graph computed offline. It is then solved online as a Mixed-Integer Linear Program (MILP) where the knowledge about the environment is updated at each step, combining automata-based and MILP-based techniques. We demonstrate the efficacy of our approach with the help of a case study and present empirical evaluation in terms of expected regret. Furthermore, the runtime performance shows that online planning can be efficiently performed for moderately-sized grid environments
    • …
    corecore