24 research outputs found

    Neuroevolutionary reinforcement learning for generalized control of simulated helicopters

    Get PDF
    This article presents an extended case study in the application of neuroevolution to generalized simulated helicopter hovering, an important challenge problem for reinforcement learning. While neuroevolution is well suited to coping with the domain’s complex transition dynamics and high-dimensional state and action spaces, the need to explore efficiently and learn on-line poses unusual challenges. We propose and evaluate several methods for three increasingly challenging variations of the task, including the method that won first place in the 2008 Reinforcement Learning Competition. The results demonstrate that (1) neuroevolution can be effective for complex on-line reinforcement learning tasks such as generalized helicopter hovering; (2) neuroevolution excels at finding effective helicopter hovering policies but not at learning helicopter models; (3) due to the difficulty of learning reliable models, model-based approaches to helicopter hovering are feasible only when domain expertise is available to aid the design of a suitable model representation; and (4) recent advances in efficient resampling can enable neuroevolution to tackle more aggressively generalized reinforcement learning tasks.
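    As a rough illustration of neuroevolutionary policy search, the sketch below evolves a flat vector of policy weights (which could be the weights of a small neural network controller) using truncation selection, elitism and Gaussian mutation. It is a generic sketch under stated assumptions, not the competition-winning method: the function `evolve`, its parameters and the toy fitness function are all hypothetical, and in a helicopter task fitness would instead be computed from episode returns in the simulator.

```python
import random
from typing import Callable, List

Genome = List[float]  # flat vector of policy (e.g. network) weights


def evolve(
    fitness: Callable[[Genome], float],
    n_params: int,
    pop_size: int = 20,
    n_elite: int = 5,
    sigma: float = 0.1,
    generations: int = 50,
    seed: int = 0,
) -> Genome:
    """Hypothetical elitist neuroevolution loop; returns the fittest genome found."""
    rng = random.Random(seed)
    population = [[rng.gauss(0.0, 1.0) for _ in range(n_params)] for _ in range(pop_size)]
    for _ in range(generations):
        # Rank genomes by fitness (higher is better) and keep the best ones unchanged.
        elites = sorted(population, key=fitness, reverse=True)[:n_elite]
        # Refill the population with Gaussian-mutated copies of randomly chosen elites.
        children = [
            [w + rng.gauss(0.0, sigma) for w in rng.choice(elites)]
            for _ in range(pop_size - n_elite)
        ]
        population = elites + children
    return max(population, key=fitness)


def toy_fitness(weights: Genome) -> float:
    """Hypothetical stand-in for an episode return: rewards weights close to 0.5."""
    return -sum((w - 0.5) ** 2 for w in weights)


best = evolve(toy_fitness, n_params=4)
```

    Because the elites survive unchanged, the best fitness in the population never decreases from one generation to the next. In an on-line setting every fitness evaluation would consume real episodes, which is one reason efficient exploration matters so much in this domain.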

    Safe Reinforcement Learning with Contrastive Risk Prediction

    Full text link
    As safety violations can lead to severe consequences in real-world robotic applications, the increasing deployment of Reinforcement Learning (RL) in robotic domains has propelled the study of safe exploration for reinforcement learning (safe RL). In this work, we propose a risk-preventive training method for safe RL, which learns a statistical contrastive classifier to predict the probability that a state-action pair leads to unsafe states. Based on the predicted risk probabilities, we can collect risk-preventive trajectories and reshape the reward function with risk penalties to induce safe RL policies. We conduct experiments in robotic simulation environments. The results show that the proposed approach achieves performance comparable to state-of-the-art model-based methods and outperforms conventional model-free safe RL approaches.
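    The reward-reshaping step can be sketched as follows, assuming the classifier outputs a probability p(s, a) that a state-action pair leads to an unsafe state, and that the penalty is a fixed multiple of that probability, r' = r - lambda * p(s, a). A plain logistic model stands in for the contrastive classifier, whose training is not shown; the function names and the linear penalty form are assumptions for illustration only.

```python
import math
from typing import Sequence


def risk_probability(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Hypothetical logistic stand-in for the learned risk classifier.

    Maps a feature vector describing a state-action pair to an estimate of
    P(unsafe | s, a) via a sigmoid of a linear score.
    """
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def shaped_reward(reward: float, risk: float, penalty: float = 1.0) -> float:
    """Subtract a penalty proportional to the predicted probability of reaching an unsafe state."""
    return reward - penalty * risk
```

    Larger values of `penalty` push the learned policy further away from pairs the classifier considers risky, at the possible cost of task reward.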

    Prediction of On-Disk Velocity Across a Coaxial Rotor with XGBoost

    Get PDF
    Recent updates to finite-state inflow models for solving multi-rotor systems have come at the expense of extra computation time, especially for higher-harmonic cases. A potential way to reduce these lengthy run times is to apply machine learning algorithms that fit velocity distributions and predict future distributions. In this paper, we examine XGBoost as a potential machine learning approach for predicting accurate velocity distributions across the rotor disk.

    Do Artificial Reinforcement-Learning Agents Matter Morally?

    Full text link
    Artificial reinforcement learning (RL) is a widely used technique in artificial intelligence that provides a general method for training agents to perform a wide variety of behaviours. RL as used in computer science has striking parallels to reward and punishment learning in animal and human brains. I argue that present-day artificial RL agents have a very small but nonzero degree of ethical importance. This is particularly plausible for views according to which sentience comes in degrees based on the abilities and complexities of minds, but even binary views on consciousness should assign nonzero probability to RL programs having morally relevant experiences. While RL programs are not a top ethical priority today, they may become more significant in the coming decades as RL is increasingly applied to industry, robotics, video games, and other areas. I encourage scientists, philosophers, and citizens to begin a conversation about our ethical duties to reduce the harm that we inflict on powerless, voiceless RL agents. Comment: 37 pages

    Probabilistic policy reuse for safe reinforcement learning

    Get PDF
    This work introduces Policy Reuse for Safe Reinforcement Learning, an algorithm that combines Probabilistic Policy Reuse and teacher advice for safe exploration in dangerous reinforcement learning problems with continuous state and action spaces, in which the dynamic behavior is reasonably smooth and the space is Euclidean. The algorithm uses a continuous, monotonically increasing risk function that identifies the probability of ending up in failure from a given state. This risk function is defined in terms of how far a state is from the state space already known to the learning agent. Probabilistic Policy Reuse is used to safely balance the exploitation of the knowledge learned so far, the exploration of new actions, and requests for teacher advice in parts of the state space considered dangerous. Specifically, the π-reuse exploration strategy is used. Using experiments in the helicopter hover task and a business management problem, we show that the π-reuse exploration strategy can completely avoid visits to undesirable situations while maintaining the performance (in terms of the classical long-term accumulated reward) of the final policy. This paper has been partially supported by the Spanish Ministerio de Economía y Competitividad TIN2015-65686-C5-1-R and the European Union’s Horizon 2020 Research and Innovation programme under Grant Agreement No. 730086 (ERGO). Javier García is partially supported by the Comunidad de Madrid (Spain) funds under the project 2016-T2/TIC-1712.
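    The action-selection logic described in this abstract can be sketched as below, under several assumptions. The risk function uses the hypothetical form 1 - exp(-d / scale), where d is the Euclidean distance to the nearest known state: it is 0 on known states and rises monotonically toward 1. States whose risk exceeds a threshold defer to the teacher; otherwise the past (teacher) policy is reused with probability psi, a new action is explored with probability epsilon, and the current learned policy is exploited in the remaining cases. All names and the exact risk formula are illustrative assumptions, not the paper's definitions.

```python
import math
import random
from typing import Any, Callable, List, Sequence

State = Sequence[float]
Policy = Callable[[State], Any]


def risk(state: State, known_states: List[State], scale: float = 1.0) -> float:
    """Hypothetical monotonic risk: 0 on a known state, approaching 1 far from all known states."""
    d = min(math.dist(state, k) for k in known_states)
    return 1.0 - math.exp(-d / scale)


def select_action(
    state: State,
    known_states: List[State],
    teacher: Policy,
    learned: Policy,
    explore: Policy,
    psi: float,
    epsilon: float,
    risk_threshold: float,
    rng: random.Random,
) -> Any:
    """One pi-reuse-style choice among teacher advice, reuse, exploration and exploitation."""
    if risk(state, known_states) >= risk_threshold:
        return teacher(state)  # dangerous region: request teacher advice
    if rng.random() < psi:
        return teacher(state)  # reuse the past (teacher) policy with probability psi
    if rng.random() < epsilon:
        return explore(state)  # try a new action
    return learned(state)      # exploit the current learned policy
```

    In the original π-reuse strategy, psi is decayed within an episode (psi is multiplied by a factor upsilon after each step), so that reliance on the past policy fades as the episode progresses.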

    Contextualize Me -- The Case for Context in Reinforcement Learning

    Full text link
    While Reinforcement Learning (RL) has made great strides toward solving increasingly complicated problems, many algorithms are still brittle to even slight environmental changes. Contextual Reinforcement Learning (cRL) provides a framework to model such changes in a principled manner, thereby enabling flexible, precise and interpretable task specification and generation. Our goal is to show how the framework of cRL contributes to improving zero-shot generalization in RL through meaningful benchmarks and structured reasoning about generalization tasks. We confirm the insight that optimal behavior in cRL requires context information, as in other related areas of partial observability. To empirically validate this in the cRL framework, we provide various context-extended versions of common RL environments. They are part of CARL, the first benchmark library designed for generalization based on cRL extensions of popular benchmarks, which we propose as a testbed to further study general agents. We show that in the contextual setting, even simple RL environments become challenging, and that naive solutions are not enough to generalize across complex context spaces. Comment: arXiv admin note: substantial text overlap with arXiv:2110.0210
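    The claim that optimal behaviour depends on context can be seen in a toy example. Below, a hypothetical one-dimensional point mass has its dynamics set by a context (its mass), and the context is either appended to the observation (a context-aware agent) or hidden, which makes the problem partially observable. This is not the CARL API; the class and field names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Context:
    mass: float = 1.0  # context feature that changes the dynamics
    dt: float = 0.1    # integration time step


class ContextualPointMass:
    """Hypothetical 1-D point mass that should be pushed toward position 0."""

    def __init__(self, context: Context, hide_context: bool = False) -> None:
        self.context = context
        self.hide_context = hide_context
        self.pos = 0.0
        self.vel = 0.0

    def reset(self) -> List[float]:
        self.pos, self.vel = 1.0, 0.0
        return self._obs()

    def step(self, force: float) -> Tuple[List[float], float]:
        # The same force produces a different acceleration in each context.
        acc = force / self.context.mass
        self.vel += acc * self.context.dt
        self.pos += self.vel * self.context.dt
        return self._obs(), -abs(self.pos)

    def _obs(self) -> List[float]:
        obs = [self.pos, self.vel]
        if not self.hide_context:
            obs.append(self.context.mass)  # context-aware observation
        return obs
```

    Evaluating one fixed policy on environments built from several contexts, for example `[ContextualPointMass(Context(mass=m)) for m in (0.5, 1.0, 2.0)]`, is one way to probe zero-shot generalization across a context space; with `hide_context=True`, the agent cannot tell these environments apart from its observations alone.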

    Evolving developmental, recurrent and convolutional neural networks for deliberate motion planning in sparse reward tasks

    Get PDF
    Motion planning algorithms have seen a diverse set of approaches in a variety of disciplines. In the domain of artificial evolutionary systems, motion planning has been included in models to achieve sophisticated deliberate behaviours. These algorithms rely on fixed rules or allow little evolutionary influence, which compels behaviours to conform to those specific policies rather than allowing the model to establish its own specialised behaviour. To advance these models, the constraints imposed by planning algorithms must be removed to grant evolution greater control over behaviours. That is the focus of this thesis. An examination of prevailing neuroevolution methods led to the use of two distinct approaches, NEAT and HyperNEAT, both of which were used to understand the components necessary for neuroevolutionary planning. The findings culminated in a novel convolutional neural network architecture with a recurrent convolution process, whose goal is to iteratively disperse local activations to greater regions of the feature space. Experiments showed significantly improved robustness over contemporary neuroevolution techniques as well as an efficiency increase over a static rule set. Greater evolutionary responsibility is given to the model through multiple network combinations, all of which consistently demonstrated the necessary behaviours. In comparison, these behaviours proved difficult to achieve with a state-of-the-art deep convolutional network. Finally, the recurrent convolution mechanism is transferred to a larger convolutional architecture on an established benchmarking platform. Performance improvements on a number of domains illustrate that this recurrent mechanism can be exploited in areas outside of planning.
    By presenting a viable neuroevolution method for motion planning, this work opens the way for further systems to adopt and examine its capabilities in prospective domains, as well as for further experimentation with convolutional architectures.
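    A minimal way to see how repeatedly applying one convolution can iteratively disperse local activations is sketched below. The same 3x3 kernel (shared weights) is applied for several iterations with zero padding, so a single active cell influences a region that grows by one cell in each direction per iteration. This is a pure-Python illustration under stated assumptions, not the thesis architecture; the kernel values and function names are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def conv3x3(grid: Grid, kernel: Grid) -> Grid:
    """Same-size 3x3 cross-correlation with zero padding (as in most deep learning libraries)."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:  # cells outside the grid count as zero
                        acc += kernel[di + 1][dj + 1] * grid[ii][jj]
            out[i][j] = acc
    return out


def recurrent_conv(grid: Grid, kernel: Grid, steps: int) -> Grid:
    """Apply the same kernel `steps` times, feeding each output back in as the next input."""
    for _ in range(steps):
        grid = conv3x3(grid, kernel)
    return grid


start = [[0.0] * 5 for _ in range(5)]
start[2][2] = 1.0  # one local activation in the centre
spread = recurrent_conv(start, [[1 / 9] * 3 for _ in range(3)], steps=2)
```

    After one iteration the activation covers the central 3x3 block; after two it reaches every cell of the 5x5 grid. Reusing one kernel keeps the parameter count fixed while the receptive field grows with the number of iterations.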
    corecore