Continual Reinforcement Learning in 3D Non-stationary Environments
High-dimensional, always-changing environments constitute a hard challenge for current reinforcement learning techniques. Artificial agents are nowadays often trained offline, in very static and controlled simulated conditions, so that training observations can be thought of as sampled i.i.d. from the entire observation space. However, in real-world settings, the environment is often non-stationary and subject to unpredictable, frequent changes. In this paper we propose and openly release CRLMaze, a new benchmark for learning continually through reinforcement in a complex 3D non-stationary task based on ViZDoom and subject to several environmental changes. We then introduce an end-to-end, model-free continual reinforcement learning strategy that shows competitive results with respect to four different baselines while not requiring any access to additional supervised signals, previously encountered environmental conditions, or observations.
Comment: Accepted at the CLVision Workshop at CVPR 2020; 13 pages, 4 figures, 5 tables
Covert Perceptual Capability Development
In this paper, we propose a model to develop robots’ covert perceptual capability using reinforcement learning. Covert perceptual behavior is treated as an action selected by a motivational system. We apply this model to vision-based navigation, with the goal of enabling a robot to learn road boundary types. Instead of dealing with problems in controlled environments with a low-dimensional state space, we test the model on images captured in non-stationary environments. Incremental Hierarchical Discriminant Regression is used to generate states on the fly; its coarse-to-fine tree structure guarantees real-time retrieval in high-dimensional state spaces. A k-nearest-neighbor strategy is adopted to further reduce the training time complexity.
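As an illustration of the retrieval step described above, the sketch below shows nearest-neighbor lookup over stored state prototypes in a high-dimensional feature space. It is only a minimal stand-in for the paper's IHDR-plus-k-NN pipeline: the feature dimension, number of prototypes, and use of scikit-learn's KDTree are assumptions made for illustration.

```python
import numpy as np
from sklearn.neighbors import KDTree

# Hypothetical setup: 500 stored state prototypes in a 64-dimensional feature
# space (stand-ins for the states an IHDR tree would generate on the fly).
rng = np.random.default_rng(0)
prototypes = rng.normal(size=(500, 64))

# A spatial tree gives sub-linear nearest-neighbor retrieval, mimicking the
# real-time lookup that the coarse-to-fine tree structure is meant to provide.
tree = KDTree(prototypes)

def retrieve_state(observation_features, k=5):
    """Return indices and distances of the k stored states closest to the query."""
    dist, idx = tree.query(observation_features.reshape(1, -1), k=k)
    return idx[0], dist[0]

# Usage: map a new image's feature vector to its nearest stored states.
query = rng.normal(size=64)
neighbors, distances = retrieve_state(query, k=3)
```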
Low-Reynolds-number locomotion via reinforcement learning
This dissertation summarizes computational results from applying reinforcement learning and deep neural networks to the design of artificial microswimmers in the inertialess regime, where viscous dissipation in the surrounding fluid environment dominates and the swimmer’s inertia is completely negligible. In particular, the dissertation consists of four interrelated studies of the design of microswimmers for different tasks: (1) a one-dimensional microswimmer in free space that moves towards the target via translation, (2) a one-dimensional microswimmer in a periodic domain that rotates to reach the target, (3) a two-dimensional microswimmer that switches gaits to navigate to designated targets in a plane, and (4) a two-dimensional microswimmer trained to navigate in a non-stationary environment.
The first and second studies focus on how reinforcement learning (specifically model-free, off-policy Q-learning) can be applied to generate one-dimensional translation (part 1) or net rotation (part 2) in low Reynolds number fluids. Through the interaction with the surrounding viscous fluid, the swimmer learns to break the time-reversal symmetry of Stokes flow in order to achieve the maximum displacement (reward) either in free-space or in a periodic domain.
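A minimal sketch of the tabular, off-policy Q-learning loop described above is given below, for a toy one-dimensional swimmer with a small discrete set of configurations. The state/action encoding, the reward (net displacement per stroke), and the hyperparameters are illustrative assumptions, not the dissertation's actual swimmer model.

```python
import numpy as np

rng = np.random.default_rng(1)

n_states, n_actions = 4, 2          # assumed discrete body configurations / strokes
alpha, gamma, eps = 0.1, 0.9, 0.1   # learning rate, discount, exploration rate
Q = np.zeros((n_states, n_actions))

def step(state, action):
    """Toy environment: reward is the net displacement produced by the stroke.
    This stands in for the Stokes-flow hydrodynamics of the real swimmer."""
    next_state = (state + (1 if action == 1 else -1)) % n_states
    reward = 1.0 if action == 1 else -0.5   # asymmetric strokes break time-reversal symmetry
    return next_state, reward

state = 0
for t in range(5000):
    # epsilon-greedy, off-policy action selection
    action = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[state]))
    next_state, reward = step(state, action)
    # standard Q-learning update towards the maximum-displacement policy
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state
```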
In the third part of the dissertation, a deep reinforcement learning approach (proximal policy optimization) is utilized to train a two-dimensional swimmer to develop complex strategies, such as run-and-tumble, to navigate through environments and move towards specific targets. Proximal policy optimization uses an actor-critic model: the critic estimates the value function, while the actor updates the policy distribution in the direction suggested by the critic. Results show that the trained artificial swimmer can develop effective policies (gaits) such as translation and rotation, and that it can move to specific targets by combining these gaits in an intelligent way. The simulation results also show that, without being explicitly programmed, the trained swimmer is able to perform target navigation even under flow perturbations.
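For readers unfamiliar with the actor-critic structure mentioned above, the snippet below sketches the clipped surrogate objective at the core of proximal policy optimization. It is a generic PyTorch illustration: the coefficient values and the simple squared-error critic loss are assumptions, not the dissertation's training code.

```python
import torch

def ppo_clipped_loss(logp_new, logp_old, advantages, values, returns,
                     clip_eps=0.2, value_coef=0.5):
    """Clipped surrogate objective (actor term) plus a value-function loss (critic term)."""
    ratio = torch.exp(logp_new - logp_old)                 # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()    # actor: maximize clipped surrogate
    value_loss = value_coef * (returns - values).pow(2).mean()  # critic: regress to returns
    return policy_loss + value_loss
```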
Finally, in the last part of the dissertation, a generalized step-up reinforcement method with deep learning is developed for an environment that changes in time. In this work, traditional reinforcement learning is combined with high-confidence context detection, allowing the swimmer to be trained to navigate amphibious non-stationary environments that consist of two distinct regions. Computational results show that the swimmer trained by this algorithm adapts to the environments faster, while developing more effective locomotory strategies in both environments, than traditional reinforcement learning approaches. Furthermore, the effective policies are compared with and analyzed against those obtained by traditional strategies. This work illustrates how deep reinforcement learning methods can be conveniently adapted to a broader class of problems, such as a microswimmer in a non-stationary environment. Results from this part highlight a powerful alternative to current traditional methods for applications in unpredictable, complex fluid environments and open a route towards future designs of “smart” microswimmers with trainable artificial intelligence.
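The abstract does not spell out the context-detection mechanism, so the sketch below only illustrates the general idea of pairing reinforcement learning with a confidence-based context detector: keep one policy per environment region and switch when recent experience disagrees strongly with the active policy's expectations. All names, thresholds, and the detection rule are hypothetical.

```python
import numpy as np

class ContextSwitchingAgent:
    """Toy agent holding one Q-table per detected context (e.g. two amphibious regions)."""

    def __init__(self, n_states, n_actions, n_contexts=2, threshold=1.0):
        self.Q = np.zeros((n_contexts, n_states, n_actions))
        self.context = 0
        self.threshold = threshold          # hypothetical confidence threshold
        self.errors = []                    # recent TD errors under the active context

    def act(self, state, eps=0.1):
        if np.random.random() < eps:
            return np.random.randint(self.Q.shape[2])
        return int(np.argmax(self.Q[self.context, state]))

    def update(self, state, action, reward, next_state, alpha=0.1, gamma=0.9):
        q = self.Q[self.context]
        td_error = reward + gamma * q[next_state].max() - q[state, action]
        q[state, action] += alpha * td_error
        # Crude change detector: persistently large TD errors suggest the current
        # context model no longer explains the environment, so switch policies.
        self.errors.append(abs(td_error))
        if len(self.errors) >= 20 and np.mean(self.errors[-20:]) > self.threshold:
            self.context = (self.context + 1) % self.Q.shape[0]
            self.errors.clear()
```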
Neurosymbolic Meta-Reinforcement Lookahead Learning Achieves Safe Self-Driving in Non-Stationary Environments
In the area of learning-driven artificial intelligence advancement, the
integration of machine learning (ML) into self-driving (SD) technology stands
as an impressive engineering feat. Yet, in real-world applications outside the
confines of controlled laboratory scenarios, the deployment of self-driving
technology assumes a life-critical role, necessitating heightened attention
from researchers towards both safety and efficiency. To illustrate, when a
self-driving model encounters an unfamiliar environment in real-time execution,
the focus must not solely revolve around enhancing its anticipated performance;
equal consideration must be given to ensuring that its execution or real-time adaptation maintains a requisite level of safety. This study introduces an algorithm for online meta-reinforcement learning with lookahead symbolic constraints, Neurosymbolic Meta-Reinforcement Lookahead Learning (NUMERLA). NUMERLA proposes a lookahead updating mechanism that harmonizes the efficiency of online adaptations with the overarching goal of ensuring long-term safety. Experimental results demonstrate that NUMERLA endows the self-driving agent with the capacity for real-time adaptability, leading to safe and self-adaptive driving under non-stationary urban human-vehicle interaction scenarios.
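NUMERLA's lookahead updating mechanism is described only at a high level here; as a rough, hypothetical illustration of the idea, the sketch below accepts an online policy adaptation only if a short lookahead rollout satisfies a symbolic safety predicate, and keeps the previous policy otherwise. The rollout horizon, the predicate, and all function names are assumptions, not the paper's algorithm.

```python
from typing import Callable, Sequence

def safe_to_adopt(candidate_policy: Callable, simulate: Callable,
                  safety_predicate: Callable, horizon: int = 10) -> bool:
    """Check a candidate adaptation against a lookahead rollout.

    simulate(policy, horizon) -> sequence of predicted states (from a learned or
    analytic model); safety_predicate(state) -> bool encodes the symbolic
    constraint (e.g. a minimum distance to surrounding vehicles or pedestrians)."""
    predicted_states: Sequence = simulate(candidate_policy, horizon)
    return all(safety_predicate(s) for s in predicted_states)

def adapt_online(current_policy, candidate_policy, simulate, safety_predicate):
    """Adopt the adapted policy only when the lookahead safety check passes."""
    if safe_to_adopt(candidate_policy, simulate, safety_predicate):
        return candidate_policy
    return current_policy   # otherwise keep the known-safe policy
```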