
    Reinforcement Learning: A Survey

    This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning. (Comment: See http://www.jair.org/ for any accompanying file.)
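
    The exploration-exploitation trade-off and learning from delayed reinforcement that the survey discusses are often illustrated with tabular Q-learning. The sketch below is a minimal illustration of that idea, not code from the paper; the `env` interface (`reset`, `step`, `actions`) is an assumed convention.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Minimal tabular Q-learning with epsilon-greedy exploration.

    `env` is assumed to expose reset() -> state and
    step(action) -> (next_state, reward, done), with a discrete
    `env.actions` list -- an illustrative interface, not one taken
    from the surveyed work.
    """
    Q = defaultdict(float)  # Q[(state, action)] -> value estimate

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Explore with probability epsilon, otherwise exploit.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # Temporal-difference update toward the one-step target.
            best_next = max(Q[(next_state, a)] for a in env.actions)
            target = reward + gamma * best_next * (not done)
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    return Q
```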

    Adaptive dynamic programming with eligibility traces and complexity reduction of high-dimensional systems

    This dissertation investigates the application of a variety of computational intelligence techniques, particularly clustering and adaptive dynamic programming (ADP) designs, especially heuristic dynamic programming (HDP) and dual heuristic programming (DHP). Moreover, one-step temporal-difference (TD(0)) and n-step TD (TD(λ)) methods, with their gradients, are utilized as learning algorithms to train and online-adapt the families of ADP. The dissertation is organized into seven papers. The first paper demonstrates the robustness of model order reduction (MOR) for simulating complex dynamical systems. Agglomerative hierarchical clustering based on performance evaluation is introduced for MOR. This method computes the reduced-order denominator of the transfer function by clustering system poles in a hierarchical dendrogram. Several numerical examples of reduction techniques are taken from the literature for comparison with our work. In the second paper, an HDP is combined with the Dyna algorithm for path planning. The third paper uses DHP with an eligibility trace parameter (λ) to track a reference trajectory under uncertainties for a nonholonomic mobile robot, using a first-order Sugeno fuzzy neural network structure for the critic and actor networks. In the fourth and fifth papers, a stability analysis for a model-free action-dependent HDP(λ) is demonstrated with batch and online implementations of learning, respectively. The sixth paper combines two different gradient prediction levels of critic networks and provides convergence proofs. The seventh paper develops two hybrid recurrent fuzzy neural network structures for the critic and actor networks. They use a novel n-step gradient temporal difference (gradient of TD(λ)) of an advanced ADP algorithm called value-gradient learning (VGL(λ)), and convergence proofs are given. Furthermore, the seventh paper is the first to combine the single network adaptive critic with VGL(λ). --Abstract, page iv
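
    The TD(λ) learning rule with eligibility traces that the dissertation uses to train its ADP critics can be sketched in tabular form. The following is a generic illustration under an assumed `env`/`policy` interface, not the dissertation's neural critic implementation.

```python
from collections import defaultdict

def td_lambda(env, policy, episodes=500, alpha=0.05, gamma=0.99, lam=0.9):
    """TD(lambda) state-value prediction with accumulating eligibility
    traces. A tabular sketch of the learning rule; the dissertation's
    critics are neural networks, not tables.
    """
    V = defaultdict(float)
    for _ in range(episodes):
        e = defaultdict(float)          # eligibility traces
        state, done = env.reset(), False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)
            # One-step TD error.
            delta = reward + gamma * V[next_state] * (not done) - V[state]
            e[state] += 1.0             # accumulate trace for visited state
            for s in list(e):
                V[s] += alpha * delta * e[s]
                e[s] *= gamma * lam     # decay all traces
            state = next_state
    return V
```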

    Adaptive and intelligent navigation of autonomous planetary rovers - A survey

    The application of robotics and autonomous systems in space has increased dramatically. The ongoing Mars rover mission involving the Curiosity rover, along with the success of its predecessors, is a key milestone that showcases the existing capabilities of robotic technology. Nevertheless, there is still a heavy reliance on human tele-operators to drive these systems. Reducing the reliance on human experts for navigational tasks on Mars remains a major challenge due to the harsh and complex nature of the Martian terrain. Developing a truly autonomous rover capable of navigating effectively in such environments requires intelligent and adaptive methods fitting for a system with limited resources. This paper surveys a representative selection of work applicable to autonomous planetary rover navigation, discussing ongoing challenges and promising future research directions from the perspectives of the authors.

    Planning with neural networks and reinforcement learning

    This thesis presents the design, implementation and investigation of some predictive-planning controllers built with neural networks and inspired by Dyna-PI architectures (Sutton, 1990). Dyna-PI architectures are planning systems based on actor-critic reinforcement learning methods and a model of the environment. The controllers are tested with a simulated robot that solves a stochastic path-finding landmark navigation task. A critical review of ideas and models proposed by the literature on problem solving, planning, reinforcement learning, and neural networks precedes the presentation of the controllers. The review isolates ideas relevant to the design of planners based on neural networks. A "neural forward planner" is implemented that, unlike the Dyna-PI architectures, is taskable in a strong sense. This planner is capable of building a "partial policy" focused on efficient start-goal paths, and is capable of deciding to re-plan if "unexpected" states are encountered. Planning iteratively generates "chains of predictions" starting from the current state and using the model of the environment. This model is made up of neural networks trained to predict the next input when an action is executed. A "neural bidirectional planner" that generates trajectories backward from the goal and forward from the current state is also implemented. This planner exploits the knowledge (image) of the goal, further focuses planning around efficient start-goal paths, and produces a quicker updating of evaluations. In several experiments the generalisation capacity of neural networks proves important for learning, but it also causes problems of interference. To deal with these problems a modular neural architecture is implemented, which uses a mixture-of-experts network for the critic and a simple hierarchical modular network for the actor. The research also implements a simple form of neural abstract planning named "coarse planning", and investigates its strengths in terms of exploration and evaluations' updating. Some experiments with coarse planning and with other controllers suggest that discounted reinforcement learning may have problems dealing with long-lasting tasks.
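
    The forward-planning idea of generating "chains of predictions" through a learned model can be sketched as a recursive rollout. This is a schematic illustration only; `model(state, action) -> (next_state, reward)` and `value_fn` are assumed interfaces standing in for the thesis's predictive neural networks and critic.

```python
def plan_forward(model, value_fn, state, actions, depth=5, gamma=0.95):
    """Pick the first action of the best imagined trajectory obtained
    by rolling the learned model forward from the current state.
    """
    def rollout_value(s, d):
        # Value of the best imagined trajectory of length d from s.
        if d == 0:
            return value_fn(s)
        returns = []
        for a in actions:
            s2, r = model(s, a)          # predicted next state and reward
            returns.append(r + gamma * rollout_value(s2, d - 1))
        return max(returns)

    def action_value(a):
        s2, r = model(state, a)
        return r + gamma * rollout_value(s2, depth - 1)

    return max(actions, key=action_value)
```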

    Speeding-up Action Learning in a Social Robot with Dyna-Q+: A Bioinspired Probabilistic Model Approach

    Robotic systems developed for social and dynamic environments require adaptive mechanisms to operate successfully. Consequently, learning from rewards has produced meaningful results in applications involving human-robot interaction. Where the robot's state space and set of actions are extensive, the dimensionality becomes intractable and this drastically slows down the learning process. This effect is especially pronounced in one-step temporal-difference methods because just one update is performed per robot-environment interaction. In this paper, we show how the action-based learning of a social robot can be improved by combining classical temporal-difference reinforcement learning methods, such as Q-learning or Q(λ), with a probabilistic model of the environment. This architecture, which we have called Dyna, allows the robot to simultaneously act and plan using the experience obtained during real human-robot interactions. Principally, Dyna improves on classical algorithms in terms of convergence speed and stability, which strengthens the learning process. Hence, in this work we have embedded a Dyna architecture in our social robot, Mini, to endow it with the ability to autonomously maintain an optimal internal state while living in a dynamic environment.
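
    The core Dyna loop the paper builds on (act, update the model from real experience, then replay simulated experience between real steps) can be sketched in tabular form. Note the deterministic last-transition model below is a simplification of the paper's probabilistic environment model, and the `env` interface is an assumed convention.

```python
import random
from collections import defaultdict

def dyna_q(env, episodes=200, planning_steps=20,
           alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Dyna-Q: direct RL from real interactions plus extra
    value updates replayed from a learned transition model.
    """
    Q = defaultdict(float)
    model = {}  # (state, action) -> (reward, next_state)

    def act(s):
        if random.random() < epsilon:
            return random.choice(env.actions)
        return max(env.actions, key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = act(state)
            next_state, reward, done = env.step(action)
            # Direct RL update from real experience.
            best_next = max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (
                reward + gamma * best_next * (not done) - Q[(state, action)])
            model[(state, action)] = (reward, next_state)
            # Planning: replay randomly chosen remembered transitions.
            for _ in range(planning_steps):
                (s, a), (r, s2) = random.choice(list(model.items()))
                best = max(Q[(s2, b)] for b in env.actions)
                Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])
            state = next_state
    return Q
```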

    A Deterministic Improved Q-Learning for Path Planning of a Mobile Robot

    This paper presents a new deterministic Q-learning algorithm with presumed knowledge of the distance from the current state to both the next state and the goal. This knowledge is used to update the entries in the Q-table only once, by exploiting four derived properties of Q-learning, instead of updating them repeatedly as classical Q-learning does. Consequently, the proposed algorithm has a much smaller time complexity than its classical counterpart. Furthermore, the proposed algorithm stores the Q-value only for the best possible action at each state and thus saves significant storage. Experiments undertaken on simulated mazes and real platforms confirm that the Q-table obtained by the proposed Q-learning, when used for the path-planning application of mobile robots, outperforms both the classical and the extended Q-learning with respect to three metrics: traversal time, number of states traversed, and 90° turns required. The reduction in 90° turns lowers energy consumption and is thus of practical importance in robotics.
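
    The abstract does not reproduce the four derived properties, so the following only sketches the general idea of a one-pass Q-table fill: in a deterministic gridworld with a known distance metric, the optimal Q-value of a state-action pair is determined by the distance from the successor state to the goal. All names and constants here are illustrative assumptions, not the paper's method.

```python
def one_shot_q(states, actions, next_state, distance_to_goal,
               gamma=0.9, goal_reward=100.0):
    """Fill the Q-table in a single pass from known goal distances,
    instead of iterating updates as classical Q-learning does.
    `next_state(s, a)` and `distance_to_goal(s)` are assumed to be
    available, as the paper presumes distance knowledge.
    """
    Q = {}
    for s in states:
        for a in actions:
            s2 = next_state(s, a)        # deterministic transition
            d = distance_to_goal(s2)     # presumed known distance
            # Discounted goal reward over the remaining d steps.
            Q[(s, a)] = (gamma ** d) * goal_reward
    return Q
```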

    Online Learning and Planning for Crowd-aware Service Robot Navigation

    Mobile service robots are increasingly used in indoor environments (e.g., shopping malls or museums) among large crowds of people. To navigate efficiently in these environments, such a robot should be able to exhibit a variety of behaviors. It should avoid crowded areas and not oppose the flow of the crowd. It should be able to identify and avoid specific crowds that cause additional delays (e.g., children in a particular area might slow down the robot), and to seek out a crowd if its task requires it to interact with as many people as possible. These behaviors require the ability to learn and model crowd behavior in an environment. Earlier work used a dataset of paths navigated by people to solve this problem. That approach is expensive, risks privacy violations, and can become outdated as the environment evolves. To overcome these drawbacks, this thesis proposes a new approach in which the robot learns models of crowd behavior online, relying only on local onboard sensors. The work develops and tests multiple planners that leverage these models in simulated environments and demonstrates statistically significant improvements in performance. The results reported here are applicable not only to navigation to target locations but also to a variety of other services.
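
    One simple way to learn a crowd model online from local detections is an exponentially decayed per-cell density estimate, which a planner can then use as a traversal cost. This is a schematic stand-in; the thesis's actual models and planners are not specified in the abstract.

```python
from collections import defaultdict

class OnlineCrowdModel:
    """Per-cell crowd-density estimate built incrementally from
    onboard detections, with exponential forgetting so the model
    tracks a changing environment.
    """
    def __init__(self, decay=0.99):
        self.density = defaultdict(float)
        self.decay = decay

    def observe(self, detected_cells):
        for cell in self.density:
            self.density[cell] *= self.decay   # forget stale evidence
        for cell in detected_cells:
            self.density[cell] += 1.0          # reinforce fresh detections

    def cost(self, cell):
        # Crowded cells cost more to traverse.
        return 1.0 + self.density[cell]
```

    The resulting `cost` can serve as the edge weight in any graph planner (e.g., A* or Dijkstra), steering paths away from dense cells, or its negation can reward crowd-seeking when the task calls for interaction.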

    Stochastic Search Methods for Mobile Manipulators

    Mobile manipulators are a potential solution to the increasing need for additional flexibility and mobility in industrial applications. However, they tend to lack the accuracy and precision achieved by fixed manipulators, especially in scenarios where both the manipulator and the autonomous vehicle move simultaneously. This paper analyzes the problem of dynamically evaluating the positioning error of mobile manipulators. In particular, it investigates the use of Bayesian methods to predict the position of the end-effector in the presence of uncertainty propagated from the mobile platform. The precision of the mobile manipulator is evaluated through its ability to intercept retroreflective markers using a photoelectric sensor attached to the end-effector. Compared with a deterministic search approach, the proposed method showed improved robustness with comparable search times, thereby enabling effective calibration of the mobile manipulator.
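
    The Bayesian idea of fusing a platform-propagated position prediction with a sensor measurement can be illustrated in one dimension under a Gaussian assumption. This is only a sketch of the principle; the paper's actual estimator is not specified in the abstract.

```python
def gaussian_fuse(prior_mean, prior_var, meas_mean, meas_var):
    """Fuse a predicted end-effector position (uncertainty propagated
    from the mobile platform) with a sensor measurement, assuming
    both are Gaussian. Returns the posterior mean and variance.
    """
    k = prior_var / (prior_var + meas_var)      # Kalman-style gain
    mean = prior_mean + k * (meas_mean - prior_mean)
    var = (1.0 - k) * prior_var
    return mean, var

# Example: an uncertain platform-based prediction corrected by a
# more precise marker detection (values in meters, illustrative).
pos, var = gaussian_fuse(prior_mean=0.50, prior_var=0.04,
                         meas_mean=0.47, meas_var=0.01)
```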

    Energy efficient path planning: the effectiveness of Q-learning algorithm in saving energy

    In this thesis the author investigated how effective a Q-learning-based path-planning algorithm is at saving energy. Saving energy is worth pursuing by any means, both because natural resources are being excessively exploited and because a mobile robot running out of energy can be costly or even disastrous: industrial environments require minimal downtime, and applications such as search-and-rescue operations or navigation of dangerous environments raise the stakes further. The study was undertaken by implementing a Q-learning-based path-planning algorithm in several unstructured and unknown environments. A cell decomposition method was used to generate the search-space representation of the environments within which the algorithm operated. The results show that the Q-learning planner's paths consumed on average 3.04% less energy than those of the A* path-planning algorithm in a square environment with 20% obstacle density, and on average 5.79% more energy than the least-energy paths for the same environment. In rectangular environments, the Q-learning path-planning algorithm uses 1.68% less energy than the A* algorithm and 3.26% more energy than the least-energy paths. The study highlights the value of applying learning algorithms to problems whose existing solutions are not learning-based, in order to obtain better solutions.
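
    An energy comparison of this kind hinges on the per-move energy model fed to the learner as reward. The sketch below shows one plausible shape for such a reward, where turns cost more than straight moves; the costs and the model itself are illustrative assumptions, not the thesis's own energy model.

```python
def reward(action, next_state, goal, prev_heading,
           step_energy=1.0, turn_energy=0.5, goal_reward=100.0):
    """Energy-based reward for a grid-world Q-learning path planner:
    each move is penalized by an estimate of the energy it consumes
    (changing heading costs extra, as 90-degree turns do), and
    reaching the goal pays a terminal bonus.
    """
    energy = step_energy + (turn_energy if action != prev_heading else 0.0)
    bonus = goal_reward if next_state == goal else 0.0
    return bonus - energy
```

    Under such a reward, the learned policy trades path length against turn count, which is one way a Q-learning path can consume less energy than a shortest-distance A* path.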