15,977 research outputs found

    Budgeted Reinforcement Learning in Continuous State Space

    A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented in the shape of a cost signal constrained to lie below an adjustable threshold. So far, BMDPs could only be solved in the case of finite state spaces with known dynamics. This work extends the state of the art to continuous state spaces and unknown dynamics. We show that the solution to a BMDP is a fixed point of a novel Budgeted Bellman Optimality operator. This observation allows us to introduce natural extensions of Deep Reinforcement Learning algorithms to address large-scale BMDPs. We validate our approach on two simulated applications: spoken dialogue and autonomous driving.
    Comment: N. Carrara and E. Leurent have contributed equally
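
    To make the idea concrete, here is a minimal tabular sketch of one application of a budgeted Bellman backup, assuming a discretised budget grid and a greedy deterministic budget-respecting policy (the paper's operator is more general, e.g. it allows stochastic policies); all names (Qr, Qc, budgets) are illustrative, not the paper's API.

        import numpy as np

        def budgeted_backup(Qr, Qc, P, R, C, budgets, gamma=0.95):
            """One sweep of a (simplified) budgeted Bellman operator.

            Qr[s, b, a] / Qc[s, b, a]: expected return / expected cost on the
            budget-augmented state space; P[s, a, s']: transition kernel;
            R, C: reward and cost signals; budgets: 1-D array of budget levels.
            """
            nS, nB, nA = Qr.shape
            Vr, Vc = np.empty((nS, nB)), np.empty((nS, nB))
            for s in range(nS):
                for bi, b in enumerate(budgets):
                    # Best-return action whose expected cost respects the budget
                    # (fallback: the cheapest action if none is feasible).
                    feasible = [a for a in range(nA) if Qc[s, bi, a] <= b]
                    a_star = (max(feasible, key=lambda a: Qr[s, bi, a])
                              if feasible else int(np.argmin(Qc[s, bi])))
                    Vr[s, bi], Vc[s, bi] = Qr[s, bi, a_star], Qc[s, bi, a_star]
            new_Qr, new_Qc = np.empty_like(Qr), np.empty_like(Qc)
            for s in range(nS):
                for bi, b in enumerate(budgets):
                    for a in range(nA):
                        # Budget left after paying the immediate cost, snapped
                        # to the nearest point of the budget grid.
                        nb = int(np.abs(budgets - (b - C[s, a]) / gamma).argmin())
                        new_Qr[s, bi, a] = R[s, a] + gamma * P[s, a] @ Vr[:, nb]
                        new_Qc[s, bi, a] = C[s, a] + gamma * P[s, a] @ Vc[:, nb]
            return new_Qr, new_Qc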

    Certified Reinforcement Learning with Logic Guidance

    This paper proposes the first model-free Reinforcement Learning (RL) framework to synthesise policies for unknown, continuous-state Markov Decision Processes (MDPs) such that a given linear temporal property is satisfied. We convert the given property into a Limit Deterministic Büchi Automaton (LDBA), namely a finite-state machine expressing the property. Exploiting the structure of the LDBA, we shape a synchronous reward function on-the-fly, so that an RL algorithm can synthesise a policy resulting in traces that probabilistically satisfy the linear temporal property. This probability (certificate) is also calculated in parallel with policy learning when the state space of the MDP is finite: as such, the RL algorithm produces a policy that is certified with respect to the property. Under the assumption of finite state space, theoretical guarantees are provided on the convergence of the RL algorithm to an optimal policy maximising the above probability. We also show that our method produces "best available" control policies when the logical property cannot be satisfied. In the general case of a continuous state space, we propose a neural network architecture for RL and we empirically show that the algorithm finds satisfying policies, if such policies exist. The performance of the proposed framework is evaluated via a set of numerical examples and benchmarks, where we observe an order-of-magnitude improvement in the number of iterations required for policy synthesis, compared to existing approaches whenever available.
    Comment: This article draws from arXiv:1801.08099, arXiv:1809.0782
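
    As a concrete illustration of the on-the-fly reward shaping, the sketch below wraps an environment in a product construction with an LDBA and pays reward only when an accepting automaton transition fires. The interface (delta, label_fn, the old Gym-style step signature) is an assumption for illustration, not the paper's implementation.

        class ProductEnv:
            """MDP x LDBA product with a synchronous, accepting-transition reward."""

            def __init__(self, env, delta, q0, accepting, label_fn, r_accept=1.0):
                self.env, self.delta, self.q0 = env, delta, q0
                self.accepting = accepting    # set of accepting (q, q') pairs (assumed encoding)
                self.label_fn = label_fn      # maps MDP states to atomic propositions
                self.r_accept = r_accept

            def reset(self):
                self.q = self.q0
                return (self.env.reset(), self.q)

            def step(self, action):
                s, _, done, info = self.env.step(action)        # ignore the native reward
                q_next = self.delta(self.q, self.label_fn(s))   # synchronise the automaton
                # Shaped reward: pay out only on accepting LDBA transitions, so a
                # standard RL algorithm is driven toward satisfying traces.
                reward = self.r_accept if (self.q, q_next) in self.accepting else 0.0
                self.q = q_next
                return (s, self.q), reward, done, info

    Any off-the-shelf RL algorithm can then be run unchanged on the augmented state (s, q).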

    Reinforcement Learning with Frontier-Based Exploration via Autonomous Environment

    Active Simultaneous Localisation and Mapping (SLAM) is a critical problem in autonomous robotics, enabling robots to navigate to new regions while building an accurate model of their surroundings. Visual SLAM is a popular technique that uses camera imagery to estimate the robot's pose and reconstruct a map of its surroundings. However, existing frontier-based exploration strategies can lead to a non-optimal path in scenarios where there are multiple frontiers at similar distances. This issue can impact the efficiency and accuracy of Visual SLAM, which is crucial for a wide range of robotic applications, such as search and rescue, exploration, and mapping. To address this issue, this research combines an existing Visual-Graph SLAM known as ExploreORB with reinforcement learning. The proposed algorithm allows the robot to learn and optimize exploration routes through a reward-based system to create an accurate map of the environment with proper frontier selection. Frontier-based exploration is used to detect unexplored areas, while reinforcement learning optimizes the robot's movement by assigning rewards for optimal frontier points. Graph SLAM is then used to integrate the robot's sensory data and build an accurate map of the environment. The proposed algorithm aims to improve the efficiency and accuracy of ExploreORB by optimizing the exploration of frontiers to build a more accurate map. To evaluate the effectiveness of the proposed approach, experiments will be conducted in various virtual environments using Gazebo, a robot simulation tool. Results of these experiments will be compared with existing methods to demonstrate the potential of the proposed approach as an optimal solution for SLAM in autonomous robotics.
    Comment: 23 pages, Journal
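
    A minimal sketch of the reward-based frontier selection described above, assuming frontiers are cells of an occupancy grid and that an unexplored_area estimate of information gain is available; the weights and helper names are illustrative assumptions, not ExploreORB's API.

        import math

        def frontier_reward(robot_xy, frontier_xy, unexplored_area, w_gain=1.0, w_dist=0.5):
            """Score a frontier by expected information gain minus weighted travel cost."""
            dist = math.hypot(frontier_xy[0] - robot_xy[0],
                              frontier_xy[1] - robot_xy[1])
            return w_gain * unexplored_area(frontier_xy) - w_dist * dist

        def select_frontier(robot_xy, frontiers, unexplored_area):
            # Greedy choice over the shaped reward; an RL agent would instead
            # learn these values from the rewards it collects while exploring.
            return max(frontiers,
                       key=lambda f: frontier_reward(robot_xy, f, unexplored_area))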

    Deep learning based surrogate modeling and optimization for Microalgal biofuel production and photobioreactor design

    Identifying optimal photobioreactor configurations and process operating conditions is critical to the industrialization of microalgae-derived biorenewables. Traditionally, this was addressed by testing numerous design scenarios with integrated physical models coupling computational fluid dynamics and kinetic modelling. However, this approach suffers from computational intractability and numerical instabilities when simulating large-scale systems, causing time-intensive computing efforts and making mathematical optimization infeasible. Therefore, we propose an innovative data-driven surrogate modelling framework which considerably reduces computing time from months to days by exploiting state-of-the-art deep learning technology. The framework builds upon a few simulated results from the physical model to learn the sophisticated hydrodynamic and biochemical kinetic mechanisms, then adopts a hybrid stochastic optimization algorithm to explore untested processes and find optimal solutions. Through verification, this framework was demonstrated to have accuracy comparable to the physical model. Moreover, multi-objective optimization was incorporated to generate a Pareto frontier for decision-making, advancing its applications in complex biosystems modelling and optimization.
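
    The workflow can be pictured with a toy surrogate-then-optimise loop: fit a small neural network on a handful of simulator outputs, then run a stochastic global search over the cheap surrogate. The synthetic data and the scikit-learn / SciPy choices below are illustrative assumptions, not the paper's implementation.

        import numpy as np
        from scipy.optimize import differential_evolution
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(0)
        X = rng.uniform(0.0, 1.0, size=(200, 3))    # e.g. light, dilution rate, depth
        y = X[:, 0] * np.sin(3 * X[:, 1]) - (X[:, 2] - 0.4) ** 2  # stand-in for the CFD-kinetic model

        # Surrogate trained on the few available "simulations".
        surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000).fit(X, y)

        # Stochastic global search over the surrogate instead of the expensive model.
        result = differential_evolution(
            lambda x: -surrogate.predict(x.reshape(1, -1))[0],
            bounds=[(0.0, 1.0)] * 3, seed=0,
        )
        print("candidate optimum:", result.x, "predicted productivity:", -result.fun)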

    Weakly Supervised Reinforcement Learning for Autonomous Highway Driving via Virtual Safety Cages

    The use of neural networks and reinforcement learning has become increasingly popular in autonomous vehicle control. However, the opaqueness of the resulting control policies presents a significant barrier to deploying neural network-based control in autonomous vehicles. In this paper, we present a reinforcement learning-based approach to autonomous vehicle longitudinal control, in which rule-based safety cages provide enhanced safety for the vehicle as well as weak supervision to the reinforcement learning agent. By guiding the agent to meaningful states and actions, this weak supervision improves convergence during training and enhances the safety of the final trained policy. The rule-based supervisory controller has the further advantage of being fully interpretable, thereby enabling traditional validation and verification approaches to ensure the safety of the vehicle. We compare models with and without safety cages, as well as models with optimal and constrained model parameters, and show that the weak supervision consistently improves the safety of exploration, the speed of convergence, and model performance. Additionally, we show that when the model parameters are constrained or sub-optimal, the safety cages can enable a model to learn a safe driving policy even when the model could not be trained to drive through reinforcement learning alone.
    Comment: Published in Sensors
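
    A minimal sketch of such a safety cage around a longitudinal controller: a time-headway rule overrides the agent's acceleration, and the intervention doubles as a weak supervision signal. The threshold, braking value, and reward penalty are illustrative assumptions, not the paper's calibrated rules.

        def safety_cage(action, ego_speed, gap_to_lead, min_headway=2.0, max_brake=-3.0):
            """Override the commanded acceleration when time headway is unsafe.

            Returns the possibly corrected action plus an intervention flag that
            can be fed back to the learner as a penalty (weak supervision).
            """
            headway = gap_to_lead / max(ego_speed, 0.1)  # seconds to the lead vehicle
            if headway < min_headway:
                return max_brake, True                   # rule fires: brake firmly
            return action, False

        # Toy usage: penalise the agent whenever the cage intervenes.
        agent_action, env_reward = 1.2, 1.0              # stand-in values
        action, intervened = safety_cage(agent_action, ego_speed=20.0, gap_to_lead=30.0)
        reward = env_reward - (1.0 if intervened else 0.0)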

    Generating and Detecting True Ambiguity: A Forgotten Danger in DNN Supervision Testing

    Deep Neural Networks (DNNs) are becoming a crucial component of modern software systems, but they are prone to fail under conditions that differ from those observed during training (out-of-distribution inputs) or on inputs that are truly ambiguous, i.e., inputs that admit multiple classes with nonzero probability in their labels. Recent work proposed DNN supervisors to detect high-uncertainty inputs before their possible misclassification leads to any harm. To test and compare the capabilities of DNN supervisors, researchers proposed test generation techniques that focus the testing effort on high-uncertainty inputs which should be recognized as anomalous by supervisors. However, existing test generators aim to produce out-of-distribution inputs. No existing model- and supervisor-independent technique targets the generation of truly ambiguous test inputs, i.e., inputs that admit multiple classes according to expert human judgment. In this paper, we propose a novel way to generate ambiguous inputs to test DNN supervisors and use it to empirically compare several existing supervisor techniques. In particular, we propose AmbiGuess, which generates ambiguous samples for image classification problems. AmbiGuess is based on gradient-guided sampling in the latent space of a regularized adversarial autoencoder. Moreover, we conducted what is, to the best of our knowledge, the most extensive comparative study of DNN supervisors, considering their capabilities to detect four distinct types of high-uncertainty inputs, including truly ambiguous ones. We find that the tested supervisors' capabilities are complementary: those best suited to detect true ambiguity perform worse on invalid, out-of-distribution, and adversarial inputs, and vice versa.
    Comment: Accepted for publication at Springer's "Empirical Software Engineering" (EMSE)
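
    The gradient-guided latent search can be sketched as follows, assuming a trained decoder and classifier are available (both hypothetical here, as is the num_classes attribute); driving two class probabilities toward 0.5 each is one plausible ambiguity objective, not necessarily AmbiGuess's exact loss.

        import torch

        def search_ambiguous(decoder, classifier, z_dim, c1, c2, steps=200, lr=0.05):
            """Nudge a latent code until the decoded image is ambiguous between c1 and c2."""
            z = torch.randn(1, z_dim, requires_grad=True)
            opt = torch.optim.Adam([z], lr=lr)
            target = torch.zeros(1, classifier.num_classes)  # num_classes: assumed attribute
            target[0, c1] = target[0, c2] = 0.5              # equal mass on two classes
            for _ in range(steps):
                probs = torch.softmax(classifier(decoder(z)), dim=1)
                loss = torch.nn.functional.mse_loss(probs, target)
                opt.zero_grad()
                loss.backward()
                opt.step()
            return decoder(z).detach()                       # candidate ambiguous sample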

    Nonlinear brain dynamics and many-body field dynamics

    We report measurements of the brain activity of subjects engaged in behavioral exchanges with their environments. We observe brain states characterized by coordinated oscillation of populations of neurons that change rapidly with the evolution of the meaningful relationship between the subject and its environment, established and maintained by active perception. Sequential spatial patterns of neural activity with high information content, found in the sensory cortices of trained animals between the onsets of conditioned stimuli and conditioned responses, resemble cinematographic frames. They are not readily amenable to description either with classical integrodifferential equations or with the matrix algebras of neural networks. Their modeling is instead provided by field theory from condensed matter physics.
    Comment: 8 pages. Invited talk presented at the Fröhlich Centenary International Symposium "Coherence and Electromagnetic Fields in Biological Systems", July 1-4, 2005, Prague, Czech Republic