
    Hierarchical Linearly-Solvable Markov Decision Problems

    Full text link
    We present a hierarchical reinforcement learning framework that formulates each task in the hierarchy as a special type of Markov decision process for which the Bellman equation is linear and has an analytical solution. Problems of this type, called linearly-solvable MDPs (LMDPs), have interesting properties that can be exploited in a hierarchical setting, such as efficient learning of the optimal value function or task compositionality. The proposed hierarchical approach can also be seen as a novel alternative for solving LMDPs with large state spaces. We derive a hierarchical version of the so-called Z-learning algorithm that learns different tasks simultaneously and show empirically that it significantly outperforms state-of-the-art learning methods in two classical hierarchical reinforcement learning domains: the taxi domain and an autonomous guided vehicle task.
    Comment: 11 pages, 6 figures, 26th International Conference on Automated Planning and Scheduling
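    As context for the abstract above: in an LMDP with passive dynamics p(s'|s) and state cost q(s), the exponentiated value function z(s) = exp(-v(s)) satisfies the linear Bellman equation z(s) = exp(-q(s)) E_{s'~p(·|s)}[z(s')], and Z-learning is its stochastic-approximation counterpart. Below is a minimal tabular sketch of flat (non-hierarchical) Z-learning under that standard formulation; the matrix P, cost vector q, and learning-rate schedule are illustrative assumptions, not the paper's code.

```python
import numpy as np

def z_learning(P, q, terminal, episodes=5000, alpha0=1.0, seed=0):
    """Tabular Z-learning sketch for an LMDP (illustrative, not the
    paper's hierarchical variant).

    P        : (n, n) passive-dynamics matrix, P[s, s'] = p(s' | s)
    q        : (n,) per-state cost
    terminal : set of absorbing goal states
    Returns the desirability z, from which the value is v = -log z.
    """
    rng = np.random.default_rng(seed)
    n = len(q)
    z = np.ones(n)
    for s in terminal:
        z[s] = np.exp(-q[s])                 # boundary condition at goals
    visits = np.zeros(n)
    for _ in range(episodes):
        s = int(rng.integers(n))
        while s not in terminal:
            s_next = int(rng.choice(n, p=P[s]))   # sample passive dynamics
            visits[s] += 1
            alpha = alpha0 / (1.0 + visits[s])    # decaying step size
            # Stochastic counterpart of the linear Bellman equation:
            # z(s) = exp(-q(s)) * E_{s' ~ p(.|s)}[z(s')]
            z[s] = (1 - alpha) * z[s] + alpha * np.exp(-q[s]) * z[s_next]
            s = s_next
    return z
```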

    Nonparametric Infinite Horizon Kullback-Leibler Stochastic Control

    Full text link
    We present two nonparametric approaches to Kullback-Leibler (KL) control, or the linearly-solvable Markov decision problem (LMDP), based on Gaussian processes (GP) and the Nyström approximation. Compared to recently developed parametric methods, the proposed data-driven frameworks feature accurate function approximation and efficient on-line operations. Theoretically, we derive the mathematical connection between KL control based on dynamic programming and earlier work in control theory that relies on information-theoretic dualities, for the infinite-time-horizon case. Algorithmically, we give explicit optimal control policies in nonparametric form and propose on-line update schemes with budgeted computational costs. Numerical results demonstrate the effectiveness and usefulness of the proposed frameworks.
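    The abstract names the Nyström approximation as the device that keeps the nonparametric (GP-based) controller computationally tractable. The sketch below shows the generic Nyström step of building a low-rank surrogate for a kernel matrix from m landmark points; the kernel choice and the uniform landmark selection are illustrative assumptions, not the paper's specific on-line scheme.

```python
import numpy as np

def rbf(A, B, ell=1.0):
    """Squared-exponential kernel between row-vector sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

def nystrom(X, kernel, m, seed=0):
    """Low-rank Nystrom surrogate for the full kernel matrix K(X, X):
    K is approximated by C @ W_pinv @ C.T using m landmark points,
    cutting storage from O(n^2) to O(n m)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)  # landmarks
    C = kernel(X, X[idx])                            # (n, m) cross-kernel
    W_pinv = np.linalg.pinv(kernel(X[idx], X[idx]))  # (m, m) inverse block
    return C, W_pinv
```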

    Reinforcement Learning: A Survey

    Full text link
    This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
    Comment: See http://www.jair.org/ for any accompanying files
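    As a concrete instance of several of the issues the survey covers (trial-and-error learning, delayed reinforcement, and the exploration/exploitation trade-off), here is a minimal tabular Q-learning loop with epsilon-greedy exploration. The env object with reset()/step() methods is a hypothetical Gym-like interface assumed for the sketch.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, eps=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration.

    `env` is a hypothetical interface: reset() -> state,
    step(a) -> (next_state, reward, done)."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Exploration/exploitation trade-off: random action w.p. eps
            if rng.random() < eps:
                a = int(rng.integers(n_actions))
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)
            # Bootstrapping propagates delayed reward back through time
            target = r + (0.0 if done else gamma * np.max(Q[s_next]))
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```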

    See What the Robot Can't See: Learning Cooperative Perception for Visual Navigation

    Full text link
    We consider the problem of navigating a mobile robot towards a target in an unknown environment that is endowed with visual sensors, where neither the robot nor the sensors have access to global positioning information and only use first-person-view images. To overcome the need for positioning, we train the sensors to encode and communicate relevant viewpoint information to the mobile robot, whose objective is to use this information to navigate to the target as efficiently as possible. We overcome the challenge of enabling all the sensors (even those that cannot directly see the target) to predict the direction along the shortest path to the target by implementing a neighborhood-based feature aggregation module using a Graph Neural Network (GNN) architecture. In our experiments, we first demonstrate generalizability to previously unseen environments with various sensor layouts. Our results show that by using communication between the sensors and the robot, we achieve up to a 2.0x improvement in SPL (Success weighted by Path Length) compared to a communication-free baseline. This is done without requiring a global map, positioning data, or pre-calibration of the sensor network. Second, we perform a zero-shot transfer of our model from simulation to the real world. Laboratory experiments demonstrate the feasibility of our approach in various cluttered environments. Finally, we showcase examples of successful navigation to the target while the sensor network layout is dynamically reconfigured.
    Comment: Reformatting for IROS with updated results
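    The neighborhood-based feature aggregation the abstract describes can be illustrated with a single generic message-passing layer: each sensor node fuses its own encoding with the mean of its neighbors' encodings, so even nodes that cannot see the target receive target-relevant information. This is a minimal sketch of that idea, not the paper's exact GNN architecture; the weight matrices and activation are illustrative.

```python
import numpy as np

def gnn_layer(H, A, W_self, W_nbr):
    """One mean-aggregation message-passing step.

    H : (n, d) per-node (sensor) feature matrix
    A : (n, n) 0/1 adjacency of the communication graph
    Returns updated features mixing self and neighborhood views."""
    deg = A.sum(axis=1, keepdims=True).clip(min=1)  # guard isolated nodes
    nbr_mean = (A @ H) / deg                        # average neighbor view
    return np.tanh(H @ W_self + nbr_mean @ W_nbr)
```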

    Approximate multi-agent planning in dynamic and uncertain environments

    Get PDF
    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, February 2012. "December 2011." Cataloged from PDF version of thesis. Includes bibliographical references (p. 120-131).
    Teams of autonomous mobile robotic agents will play an important role in the future of robotics. Efficient coordination of these agents within large, cooperative teams is an important characteristic of any system utilizing multiple autonomous vehicles. Applications of such cooperative technology stretch beyond multi-robot systems to include satellite formations, networked systems, traffic flow, and many others. The diversity of capabilities offered by a team, as opposed to an individual, has attracted the attention of both researchers and practitioners, in part due to the associated challenges, such as the combinatorial nature of joint action selection among interdependent agents. This thesis aims to address the issues of scalability and adaptability within teams of such interdependent agents while planning, coordinating, and learning in a decentralized environment. In doing so, the first focus is the integration of learning and adaptation algorithms into a multi-agent planning architecture to enable online adaptation of planner parameters. A second focus is the development of approximation algorithms to reduce the computational complexity of decentralized multi-agent planning methods. Such a reduction improves problem scalability and ultimately enables much larger robot teams. Finally, we are interested in implementing these algorithms in meaningful, real-world scenarios. As robots and unmanned systems continue to advance technologically, enabling a self-awareness as to their physical state of health will become critical. In this context, the architecture and algorithms developed in this thesis are implemented in both hardware and software flight experiments under a class of cooperative multi-agent systems we call persistent health management scenarios.
    by Joshua David Redding. Ph.D.

    Navigation with uncertain spatio-temporal resources

    Get PDF
    Supporting people with intelligent navigation instructions enables users to efficiently achieve trip-related objectives (e.g., minimum travel time or fuel consumption) and keeps them from making unnecessary detours. This, in turn, enables them to save time and money and, additionally, to minimize CO2 emissions. For these reasons, manufacturers integrate navigation systems into almost all modern automobiles. Nevertheless, most of them support only simple routing instructions, i.e., how to drive from location A to B. However, people are regularly faced with more complex decisions, e.g., navigating to a cheap gas station on the route while incorporating dynamic gas price changes. Another example scenario: after reaching the destination, an available parking facility needs to be found. So far, people cruise almost randomly around the goal area in search of a parking space. As a consequence, valuable time is consumed and unnecessary traffic arises. Besides private persons, transportation companies have to make complex mobility decisions. For instance, taxi drivers have to decide where to move next whenever the taxi is idle. There are plenty of possibilities for where the taxi driver could go. In case the last drop-off was in a sparsely populated region, waiting for a call from the taxi office will likely result in a longer drive to the next customer. In turn, customer satisfaction decreases with a longer waiting time, which implies a potential loss of customers. Recently, the number of data sources that can potentially improve these mobility decisions has increased. For instance, on-street parking sensors track the current state of the spaces (e.g., in Melbourne), mobile applications collect taxi requests from customers, and gas stations publish their current prices, all in real time. This thesis investigates the question of how to design algorithms such that they exploit this volatile data. Standard routing algorithms assume a static world, but the availability of passengers, gas prices, and the availability of parking spots change over time in a non-deterministic manner. Hence, we model multiple real-world applications as Markov decision processes (MDPs), i.e., within a framework for sequential decision making under uncertainty. Depending on the task, we propose to solve the MDP with dynamic programming, replanning and hindsight planning, or reinforcement learning. Ultimately, we combine all applications in a single problem domain. Subsequently, we propose a reinforcement learning approach that solves all applications in this domain without modification. Furthermore, it decouples the routing task from solving the application itself. Hence, it is transferable to previously unseen street networks without further training.
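    Of the solution methods listed above, dynamic programming on a finite MDP is the most self-contained to illustrate. Below is a minimal value-iteration sketch for a generic finite MDP of the kind used to model such navigation decisions; the transition tensor, reward layout, and discount are illustrative assumptions, not the thesis's models.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    """Dynamic-programming solution of a finite MDP.

    P : (A, S, S) transition tensor, P[a, s, s'] = p(s' | s, a)
    R : (A, S)    expected immediate reward for action a in state s
    Returns the optimal value function and a greedy policy."""
    V = np.zeros(P.shape[1])
    while True:
        Q = R + gamma * (P @ V)          # (A, S) one-step lookahead
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new
```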

    Enhancing Exploration and Safety in Deep Reinforcement Learning

    Get PDF
    A Deep Reinforcement Learning (DRL) agent tries to learn a policy maximizing a long-term objective by trial and error in large state spaces. However, this learning paradigm requires a non-trivial amount of interaction with the environment to achieve good performance. Moreover, critical applications, such as robotics, typically involve safety criteria that must be considered while designing novel DRL solutions. Hence, devising safe learning approaches with efficient exploration is crucial to avoid getting stuck in local optima, failing to learn properly, or causing damage to the surrounding environment. This thesis focuses on developing Deep Reinforcement Learning algorithms that foster efficient exploration and safer behaviors in simulated and real domains of interest, ranging from robotics to multi-agent systems. To this end, we rely both on standard benchmarks, such as SafetyGym, and on robotic tasks widely adopted in the literature (e.g., manipulation, navigation). This variety of problems is crucial to assess the statistical significance of our empirical studies and the generalization skills of our approaches. We initially benchmark the sample efficiency versus performance trade-off between value-based and policy-gradient algorithms. This part highlights the benefits of using non-standard simulation environments (i.e., Unity), which also facilitates the development of further optimizations for DRL. We also discuss the limitations of standard evaluation metrics (e.g., return) in characterizing the actual behaviors of a policy, proposing the use of Formal Verification (FV) as a practical methodology to evaluate behaviors over desired specifications. The second part introduces Evolutionary Algorithms (EAs) as a gradient-free, complementary optimization strategy. In detail, we combine population-based and gradient-based DRL to diversify exploration and improve performance in both single- and multi-agent applications. For the latter, we discuss how prior Multi-Agent (Deep) Reinforcement Learning (MARL) approaches hinder exploration, proposing an architecture that favors cooperation without affecting exploration.
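    The combination of population-based and gradient-based DRL mentioned above can be sketched, under assumptions, as an evolutionary loop that periodically injects a gradient-trained policy into the population. This is a generic hybrid in the spirit of evolutionary reinforcement learning, not the thesis's specific architecture; all names and the elite/child scheme are illustrative.

```python
import numpy as np

def hybrid_generation(pop, fitness_fn, grad_params, sigma=0.05,
                      n_elite=4, seed=0):
    """One generation of a generic EA + gradient-DRL hybrid (a sketch,
    not the thesis's architecture).

    pop         : list of flat policy-parameter vectors (the population)
    fitness_fn  : maps a parameter vector to an episodic return
    grad_params : parameters trained separately by gradient-based DRL,
                  injected so both optimizers share one population."""
    rng = np.random.default_rng(seed)
    elites = sorted(pop, key=fitness_fn, reverse=True)[:n_elite]
    n_children = len(pop) - n_elite - 1       # keep population size fixed
    parents = rng.integers(n_elite, size=n_children)
    # Gaussian-perturbed elites diversify exploration; the gradient
    # policy competes with them on the same fitness measure.
    children = [elites[i] + sigma * rng.standard_normal(elites[i].shape)
                for i in parents]
    return elites + children + [grad_params.copy()]
```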
    • 

    corecore