Continual Reinforcement Learning in 3D Non-stationary Environments
High-dimensional, always-changing environments constitute a hard challenge for current reinforcement learning techniques. Artificial agents are nowadays often trained offline, in very static and controlled simulated conditions, so that training observations can be thought of as sampled i.i.d. from the entire observation space. However, in real-world settings, the environment is often non-stationary and subject to unpredictable, frequent changes. In this paper we propose and openly release CRLMaze, a new benchmark for learning continually through reinforcement in a complex 3D non-stationary task based on ViZDoom and subject to several environmental changes. We then introduce an end-to-end, model-free continual reinforcement learning strategy that shows competitive results with respect to four different baselines while not requiring any access to additional supervised signals, previously encountered environmental conditions, or observations.
Comment: Accepted at the CLVision Workshop at CVPR 2020; 13 pages, 4 figures, 5 tables
Covert Perceptual Capability Development
In this paper, we propose a model to develop robots’ covert perceptual capability using reinforcement learning. Covert perceptual behavior is treated as an action selected by a motivational system. We apply this model to vision-based navigation, with the goal of enabling a robot to learn road boundary types. Instead of dealing with problems in controlled environments with a low-dimensional state space, we test the model on images captured in non-stationary environments. Incremental Hierarchical Discriminant Regression is used to generate states on the fly; its coarse-to-fine tree structure guarantees real-time retrieval in high-dimensional state spaces. A k-nearest-neighbor strategy is adopted to further reduce the training time complexity.
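As an illustration of the retrieval step described above, the sketch below shows nearest-neighbor lookup over stored state prototypes in a high-dimensional feature space. It is only a minimal stand-in for the paper's IHDR-plus-k-NN pipeline: the feature dimension, number of prototypes, and use of scikit-learn's KDTree are assumptions made for illustration.

```python
import numpy as np
from sklearn.neighbors import KDTree

# Hypothetical setup: 500 stored state prototypes in a 64-dimensional feature
# space (stand-ins for the states an IHDR tree would generate on the fly).
rng = np.random.default_rng(0)
prototypes = rng.normal(size=(500, 64))

# A spatial tree gives sub-linear nearest-neighbor retrieval, mimicking the
# real-time lookup that the coarse-to-fine tree structure is meant to provide.
tree = KDTree(prototypes)

def retrieve_state(observation_features, k=5):
    """Return indices and distances of the k stored states closest to the query."""
    dist, idx = tree.query(observation_features.reshape(1, -1), k=k)
    return idx[0], dist[0]

# Usage: map a new image's feature vector to its nearest stored states.
query = rng.normal(size=64)
neighbors, distances = retrieve_state(query, k=3)
```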
Low-Reynolds-number locomotion via reinforcement learning
This dissertation summarizes computational results from applying reinforcement learning and deep neural networks to the design of artificial microswimmers in the inertialess regime, where viscous dissipation in the surrounding fluid environment dominates and the swimmer’s inertia is completely negligible. In particular, the dissertation consists of four interrelated studies of the design of microswimmers for different tasks: (1) a one-dimensional microswimmer in free space that moves towards the target via translation, (2) a one-dimensional microswimmer in a periodic domain that rotates to reach the target, (3) a two-dimensional microswimmer that switches gaits to navigate to designated targets in a plane, and (4) a two-dimensional microswimmer trained to navigate in a non-stationary environment.
The first and second studies focus on how reinforcement learning (specifically model-free, off-policy Q-learning) can be applied to generate one-dimensional translation (part 1) or net rotation (part 2) in low Reynolds number fluids. Through the interaction with the surrounding viscous fluid, the swimmer learns to break the time-reversal symmetry of Stokes flow in order to achieve the maximum displacement (reward) either in free-space or in a periodic domain.
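A minimal sketch of the tabular, off-policy Q-learning loop described above is given below, for a toy one-dimensional swimmer with a small discrete set of configurations. The state/action encoding, the reward (net displacement per stroke), and the hyperparameters are illustrative assumptions, not the dissertation's actual swimmer model.

```python
import numpy as np

rng = np.random.default_rng(1)

n_states, n_actions = 4, 2          # assumed discrete body configurations / strokes
alpha, gamma, eps = 0.1, 0.9, 0.1   # learning rate, discount, exploration rate
Q = np.zeros((n_states, n_actions))

def step(state, action):
    """Toy environment: reward is the net displacement produced by the stroke.
    This stands in for the Stokes-flow hydrodynamics of the real swimmer."""
    next_state = (state + (1 if action == 1 else -1)) % n_states
    reward = 1.0 if action == 1 else -0.5   # asymmetric strokes break time-reversal symmetry
    return next_state, reward

state = 0
for t in range(5000):
    # epsilon-greedy, off-policy action selection
    action = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[state]))
    next_state, reward = step(state, action)
    # standard Q-learning update towards the maximum-displacement policy
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state
```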
In the third part of the dissertation, a deep reinforcement learning approach (proximal policy optimization) is utilized to train a two-dimensional swimmer to develop complex strategies, such as run-and-tumble, to navigate through environments and move towards specific targets. Proximal policy optimization uses an actor-critic model: the critic estimates the value function, while the actor updates the policy distribution in the direction suggested by the critic. Results show that the trained artificial swimmer can develop effective policies (gaits) such as translation and rotation, and that it can move to specific targets by combining these gaits in an intelligent way. The simulation results also show that, without being explicitly programmed, the trained swimmer is able to perform target navigation even under flow perturbations.
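For readers unfamiliar with the actor-critic structure mentioned above, the snippet below sketches the clipped surrogate objective at the core of proximal policy optimization. It is a generic PyTorch illustration: the coefficient values and the simple squared-error critic loss are assumptions, not the dissertation's training code.

```python
import torch

def ppo_clipped_loss(logp_new, logp_old, advantages, values, returns,
                     clip_eps=0.2, value_coef=0.5):
    """Clipped surrogate objective (actor term) plus a value-function loss (critic term)."""
    ratio = torch.exp(logp_new - logp_old)                 # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()    # actor: maximize clipped surrogate
    value_loss = value_coef * (returns - values).pow(2).mean()  # critic: regress to returns
    return policy_loss + value_loss
```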
Finally, in the last part of the dissertation, a generalized step-up reinforcement method with deep learning is developed for an environment that changes in time. In this work, traditional reinforcement learning is combined with high-confidence context detection, allowing the swimmer to be trained to navigate amphibious non-stationary environments that consist of two distinct regions. Computational results show that the swimmer trained by this algorithm adapts to the environments faster, while developing more effective locomotory strategies in both environments, than traditional reinforcement learning approaches. Furthermore, the effective policies are compared with and analyzed against those obtained by traditional strategies. This work illustrates how deep reinforcement learning methods can be conveniently adapted to a broader class of problems, such as a microswimmer in a non-stationary environment. Results from this part highlight a powerful alternative to current traditional methods for applications in unpredictable, complex fluid environments and open a route towards future designs of “smart” microswimmers with trainable artificial intelligence.
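The abstract does not spell out the context-detection mechanism, so the sketch below only illustrates the general idea of pairing reinforcement learning with a confidence-based context detector: keep one policy per environment region and switch when recent experience disagrees strongly with the active policy's expectations. All names, thresholds, and the detection rule are hypothetical.

```python
import numpy as np

class ContextSwitchingAgent:
    """Toy agent holding one Q-table per detected context (e.g. two amphibious regions)."""

    def __init__(self, n_states, n_actions, n_contexts=2, threshold=1.0):
        self.Q = np.zeros((n_contexts, n_states, n_actions))
        self.context = 0
        self.threshold = threshold          # hypothetical confidence threshold
        self.errors = []                    # recent TD errors under the active context

    def act(self, state, eps=0.1):
        if np.random.random() < eps:
            return np.random.randint(self.Q.shape[2])
        return int(np.argmax(self.Q[self.context, state]))

    def update(self, state, action, reward, next_state, alpha=0.1, gamma=0.9):
        q = self.Q[self.context]
        td_error = reward + gamma * q[next_state].max() - q[state, action]
        q[state, action] += alpha * td_error
        # Crude change detector: persistently large TD errors suggest the current
        # context model no longer explains the environment, so switch policies.
        self.errors.append(abs(td_error))
        if len(self.errors) >= 20 and np.mean(self.errors[-20:]) > self.threshold:
            self.context = (self.context + 1) % self.Q.shape[0]
            self.errors.clear()
```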
Neurosymbolic Meta-Reinforcement Lookahead Learning Achieves Safe Self-Driving in Non-Stationary Environments
In the area of learning-driven artificial intelligence advancement, the
integration of machine learning (ML) into self-driving (SD) technology stands
as an impressive engineering feat. Yet, in real-world applications outside the
confines of controlled laboratory scenarios, the deployment of self-driving
technology assumes a life-critical role, necessitating heightened attention
from researchers towards both safety and efficiency. To illustrate, when a
self-driving model encounters an unfamiliar environment in real-time execution,
the focus must not solely revolve around enhancing its anticipated performance;
equal consideration must be given to ensuring that its execution or real-time adaptation maintains a requisite level of safety. This study introduces an algorithm for online meta-reinforcement learning with lookahead symbolic constraints, Neurosymbolic Meta-Reinforcement Lookahead Learning (NUMERLA). NUMERLA proposes a lookahead updating mechanism that harmonizes the efficiency of online adaptations with the overarching goal of ensuring long-term safety. Experimental results demonstrate that NUMERLA endows the self-driving agent with the capacity for real-time adaptability, leading to safe and self-adaptive driving under non-stationary urban human-vehicle interaction scenarios.
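NUMERLA's lookahead updating mechanism is described only at a high level here; as a rough, hypothetical illustration of the idea, the sketch below accepts an online policy adaptation only if a short lookahead rollout satisfies a symbolic safety predicate, and keeps the previous policy otherwise. The rollout horizon, the predicate, and all function names are assumptions, not the paper's algorithm.

```python
from typing import Callable, Sequence

def safe_to_adopt(candidate_policy: Callable, simulate: Callable,
                  safety_predicate: Callable, horizon: int = 10) -> bool:
    """Check a candidate adaptation against a lookahead rollout.

    simulate(policy, horizon) -> sequence of predicted states (from a learned or
    analytic model); safety_predicate(state) -> bool encodes the symbolic
    constraint (e.g. a minimum distance to surrounding vehicles or pedestrians)."""
    predicted_states: Sequence = simulate(candidate_policy, horizon)
    return all(safety_predicate(s) for s in predicted_states)

def adapt_online(current_policy, candidate_policy, simulate, safety_predicate):
    """Adopt the adapted policy only when the lookahead safety check passes."""
    if safe_to_adopt(candidate_policy, simulate, safety_predicate):
        return candidate_policy
    return current_policy   # otherwise keep the known-safe policy
```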