Budgeted Reinforcement Learning in Continuous State Space
A Budgeted Markov Decision Process (BMDP) is an extension of a Markov
Decision Process to critical applications requiring safety constraints. It
relies on a notion of risk implemented in the shape of a cost signal
constrained to lie below an adjustable threshold. So far, BMDPs could only
be solved in the case of finite state spaces with known dynamics. This work
extends the state of the art to continuous-space environments and unknown
dynamics. We show that the solution to a BMDP is a fixed point of a novel
Budgeted Bellman Optimality operator. This observation allows us to introduce
natural extensions of Deep Reinforcement Learning algorithms to address
large-scale BMDPs. We validate our approach on two simulated applications:
spoken dialogue and autonomous driving.
Comment: N. Carrara and E. Leurent have equally contribute
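For orientation, the constraint described in this abstract can be written in the usual constrained-MDP form. This is a sketch under assumed notation, not the paper's own statement: the reward $R$, cost $C$, discount $\gamma$, and budget $\beta$ are symbols introduced here for illustration.

```latex
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t) \right]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} C(s_t, a_t) \right] \le \beta
```

Here $\beta$ plays the role of the adjustable threshold: raising it permits riskier policies, lowering it forces more conservative ones.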
Certified Reinforcement Learning with Logic Guidance
This paper proposes the first model-free Reinforcement Learning (RL)
framework to synthesise policies for unknown, continuous-state Markov
Decision Processes (MDPs), such that a given linear temporal property is
satisfied. We convert the given property into a Limit Deterministic Büchi
Automaton (LDBA), namely a finite-state machine expressing the property.
Exploiting the structure of the LDBA, we shape a synchronous reward function
on-the-fly, so that an RL algorithm can synthesise a policy resulting in traces
that probabilistically satisfy the linear temporal property. This probability
(certificate) is also calculated in parallel with policy learning when the
state space of the MDP is finite: as such, the RL algorithm produces a policy
that is certified with respect to the property. Under the assumption of finite
state space, theoretical guarantees are provided on the convergence of the RL
algorithm to an optimal policy, maximising the above probability. We also show
that our method produces "best available" control policies when the logical
property cannot be satisfied. In the general case of a continuous state space,
we propose a neural network architecture for RL and we empirically show that
the algorithm finds satisfying policies, if such policies exist. The
performance of the proposed framework is evaluated via a set of numerical
examples and benchmarks, where we observe an improvement of one order of
magnitude in the number of iterations required for the policy synthesis,
compared to existing approaches whenever available.
Comment: This article draws from arXiv:1801.08099, arXiv:1809.0782
Reinforcement Learning with Frontier-Based Exploration via Autonomous Environment
Active Simultaneous Localisation and Mapping (SLAM) is a critical problem in
autonomous robotics, enabling robots to navigate to new regions while building
an accurate model of their surroundings. Visual SLAM is a popular technique
that uses camera images for localisation and mapping. However, existing
frontier-based exploration strategies can lead to a non-optimal path in
scenarios where there are multiple frontiers with similar distance. This issue
can impact the efficiency and accuracy of Visual SLAM, which is crucial for a
wide range of robotic applications, such as search and rescue, exploration, and
mapping. To address this issue, this research combines an existing
Visual-Graph SLAM system known as ExploreORB with reinforcement learning. The proposed
algorithm allows the robot to learn and optimize exploration routes through a
reward-based system to create an accurate map of the environment with proper
frontier selection. Frontier-based exploration is used to detect unexplored
areas, while reinforcement learning optimizes the robot's movement by assigning
rewards for optimal frontier points. Graph SLAM is then used to integrate the
robot's sensory data and build an accurate map of the environment. The proposed
algorithm aims to improve the efficiency and accuracy of ExploreORB by
optimizing the exploration process of frontiers to build a more accurate map.
To evaluate the effectiveness of the proposed approach, experiments will be
conducted in various virtual environments using Gazebo, a robot simulation
software. Results of these experiments will be compared with existing methods
to demonstrate the potential of the proposed approach as an optimal solution
for SLAM in autonomous robotics.
Comment: 23 pages, Journa
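The frontier-detection step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a 2D occupancy grid with a hypothetical cell encoding (0 free, 1 occupied, -1 unknown) and treats a frontier as a free cell with at least one unknown 4-connected neighbour. It is not ExploreORB's implementation, and the function name is made up for illustration.

```python
import numpy as np

# Hypothetical cell encoding, chosen for this sketch only.
FREE, OCCUPIED, UNKNOWN = 0, 1, -1


def find_frontiers(grid):
    """Return (row, col) pairs of free cells that touch unknown space."""
    grid = np.asarray(grid)
    # Pad with False so border cells have no out-of-map "unknown" neighbours.
    unknown = np.pad(grid == UNKNOWN, 1, constant_values=False)
    # True where the up, down, left, or right neighbour is unknown.
    near_unknown = (
        unknown[:-2, 1:-1]
        | unknown[2:, 1:-1]
        | unknown[1:-1, :-2]
        | unknown[1:-1, 2:]
    )
    rows, cols = np.nonzero((grid == FREE) & near_unknown)
    return [(int(r), int(c)) for r, c in zip(rows, cols)]


example_grid = [
    [0, 0, -1],
    [0, 1, -1],
    [0, 0, 0],
]
frontiers = find_frontiers(example_grid)
```

A learned policy would then choose among the returned cells; when several frontiers lie at similar distances, that choice is where the abstract says distance-based selection can yield a non-optimal path.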
Deep learning based surrogate modeling and optimization for Microalgal biofuel production and photobioreactor design
Identifying optimal photobioreactor configurations and process operating conditions is
critical to industrialize microalgae-derived biorenewables. Traditionally, this was addressed
by testing numerous design scenarios from integrated physical models coupling
computational fluid dynamics and kinetic modelling. However, this approach presents
computational intractability and numerical instabilities when simulating large-scale systems,
causing time-intensive computing efforts and infeasibility in mathematical optimization.
Therefore, we propose an innovative data-driven surrogate modelling framework which
considerably reduces computing time from months to days by exploiting state-of-the-art deep
learning technology. The framework builds upon a few simulated results from the physical
model to learn the sophisticated hydrodynamic and biochemical kinetic mechanisms; then
adopts a hybrid stochastic optimization algorithm to explore untested processes and find
optimal solutions. Through verification, this framework was demonstrated to have
comparable accuracy to the physical model. Moreover, multi-objective optimization was
incorporated to generate a Pareto-frontier for decision-making, advancing its applications in
complex biosystems modelling and optimization.
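As a small illustration of what generating a Pareto frontier involves, the sketch below filters a set of candidate designs down to the non-dominated ones. It assumes every objective is to be minimised and uses made-up data; it is a generic dominance filter, not the hybrid stochastic optimization algorithm from the paper.

```python
import numpy as np


def pareto_front(points):
    """Return indices of non-dominated rows, assuming all objectives are minimised."""
    pts = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(pts):
        # p is dominated if another point is no worse in every objective
        # and strictly better in at least one.
        no_worse = np.all(pts <= p, axis=1)
        strictly_better = np.any(pts < p, axis=1)
        if not np.any(no_worse & strictly_better):
            keep.append(i)
    return keep


# Illustrative objective values only (e.g. cost vs. negative yield).
candidates = [[1, 4], [2, 2], [3, 3], [4, 1]]
front = pareto_front(candidates)
```

The pairwise comparison is quadratic in the number of candidates, which is usually acceptable for the modest design sets a surrogate model evaluates.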
Weakly Supervised Reinforcement Learning for Autonomous Highway Driving via Virtual Safety Cages
The use of neural networks and reinforcement learning has become increasingly
popular in autonomous vehicle control. However, the opaqueness of the resulting
control policies presents a significant barrier to deploying neural
network-based control in autonomous vehicles. In this paper, we present a
reinforcement learning based approach to autonomous vehicle longitudinal
control, where rule-based safety cages provide enhanced safety for the
vehicle as well as weak supervision to the reinforcement learning agent. By
guiding the agent to meaningful states and actions, this weak supervision
improves the convergence during training and enhances the safety of the final
trained policy. This rule-based supervisory controller has the further
advantage of being fully interpretable, thereby enabling traditional validation
and verification approaches to ensure the safety of the vehicle. We compare
models with and without safety cages, as well as models with optimal and
constrained model parameters, and show that the weak supervision consistently
improves the safety of exploration, speed of convergence, and model
performance. Additionally, we show that when the model parameters are
constrained or sub-optimal, the safety cages can enable a model to learn a safe
driving policy even when the model could not be trained to drive through
reinforcement learning alone.
Comment: Published in Sensor
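The idea of a rule-based safety cage that both overrides unsafe actions and weakly supervises the agent can be sketched as follows. The time-headway rule, the thresholds, the penalty, and the function names are all assumptions made for illustration; the paper's actual rules and values are not given in this abstract.

```python
def safety_cage(rl_accel, gap_m, ego_speed_mps, min_headway_s=1.0, brake_accel=-3.0):
    """Return (acceleration to apply, whether the cage intervened).

    Hypothetical rule: if the time headway to the lead vehicle drops below
    min_headway_s, apply at least brake_accel of braking.
    """
    headway_s = gap_m / ego_speed_mps if ego_speed_mps > 0 else float("inf")
    if headway_s < min_headway_s:
        return min(rl_accel, brake_accel), True
    return rl_accel, False


def supervised_reward(reward, intervened, penalty=1.0):
    """Weak supervision: penalise steps on which the cage had to intervene."""
    return reward - penalty if intervened else reward


# The agent requests acceleration while following too closely (0.5 s headway).
accel, intervened = safety_cage(rl_accel=1.0, gap_m=10.0, ego_speed_mps=20.0)
shaped = supervised_reward(0.5, intervened)
```

Because the override is a plain rule, it can be inspected and validated independently of the learned policy, which is the interpretability advantage the abstract describes.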
Generating and Detecting True Ambiguity: A Forgotten Danger in DNN Supervision Testing
Deep Neural Networks (DNNs) are becoming a crucial component of modern
software systems, but they are prone to fail under conditions that are
different from the ones observed during training (out-of-distribution inputs)
or on inputs that are truly ambiguous, i.e., inputs that admit multiple classes
with nonzero probability in their labels. Recent work proposed DNN supervisors
to detect high-uncertainty inputs before their possible misclassification leads
to any harm. To test and compare the capabilities of DNN supervisors,
researchers proposed test generation techniques, to focus the testing effort on
high-uncertainty inputs that should be recognized as anomalous by supervisors.
However, existing test generators aim to produce out-of-distribution inputs. No
existing model- and supervisor-independent technique targets the generation of
truly ambiguous test inputs, i.e., inputs that admit multiple classes according
to expert human judgment.
In this paper, we propose a novel way to generate ambiguous inputs to test
DNN supervisors and use it to empirically compare several existing supervisor
techniques. In particular, we propose AmbiGuess to generate ambiguous samples
for image classification problems. AmbiGuess is based on gradient-guided
sampling in the latent space of a regularized adversarial autoencoder.
Moreover, we conduct what is, to the best of our knowledge, the most
extensive comparative study of DNN supervisors, considering their capabilities
to detect 4 distinct types of high-uncertainty inputs, including truly
ambiguous ones. We find that the tested supervisors' capabilities are
complementary: those best suited to detect true ambiguity perform worse on
invalid, out-of-distribution and adversarial inputs and vice versa.
Comment: Accepted for publication at Springer's "Empirical Software
Engineering" (EMSE
Nonlinear brain dynamics and many-body field dynamics
We report measurements of the brain activity of subjects engaged in
behavioral exchanges with their environments. We observe brain states which are
characterized by coordinated oscillation of populations of neurons that are
changing rapidly with the evolution of the meaningful relationship between the
subject and its environment, established and maintained by active perception.
Sequential spatial patterns of neural activity with high information content
found in sensory cortices of trained animals between onsets of conditioned
stimuli and conditioned responses resemble cinematographic frames. They are not
readily amenable to description either with classical integrodifferential
equations or with the matrix algebras of neural networks. Their modeling is
provided by field theory from condensed matter physics.
Comment: 8 pages, Invited talk presented at Fröhlich Centenary International
Symposium "Coherence and Electromagnetic Fields in Biological Systems", July
1-4, 2005, Prague, Czech Republi