1,851 research outputs found
Certified Reinforcement Learning with Logic Guidance
This paper proposes the first model-free Reinforcement Learning (RL)
framework to synthesise policies for unknown, and continuous-state Markov
Decision Processes (MDPs), such that a given linear temporal property is
satisfied. We convert the given property into a Limit Deterministic Buchi
Automaton (LDBA), namely a finite-state machine expressing the property.
Exploiting the structure of the LDBA, we shape a synchronous reward function
on-the-fly, so that an RL algorithm can synthesise a policy resulting in traces
that probabilistically satisfy the linear temporal property. This probability
(certificate) is also calculated in parallel with policy learning when the
state space of the MDP is finite: as such, the RL algorithm produces a policy
that is certified with respect to the property. Under the assumption of finite
state space, theoretical guarantees are provided on the convergence of the RL
algorithm to an optimal policy, maximising the above probability. We also show
that our method produces ''best available'' control policies when the logical
property cannot be satisfied. In the general case of a continuous state space,
we propose a neural network architecture for RL and we empirically show that
the algorithm finds satisfying policies, if there exist such policies. The
performance of the proposed framework is evaluated via a set of numerical
examples and benchmarks, where we observe an improvement of one order of
magnitude in the number of iterations required for the policy synthesis,
compared to existing approaches whenever available.Comment: This article draws from arXiv:1801.08099, arXiv:1809.0782
Chance-Constrained Trajectory Optimization for Safe Exploration and Learning of Nonlinear Systems
Learning-based control algorithms require data collection with abundant
supervision for training. Safe exploration algorithms ensure the safety of this
data collection process even when only partial knowledge is available. We
present a new approach for optimal motion planning with safe exploration that
integrates chance-constrained stochastic optimal control with dynamics learning
and feedback control. We derive an iterative convex optimization algorithm that
solves an \underline{Info}rmation-cost \underline{S}tochastic
\underline{N}onlinear \underline{O}ptimal \underline{C}ontrol problem
(Info-SNOC). The optimization objective encodes both optimal performance and
exploration for learning, and the safety is incorporated as distributionally
robust chance constraints. The dynamics are predicted from a robust regression
model that is learned from data. The Info-SNOC algorithm is used to compute a
sub-optimal pool of safe motion plans that aid in exploration for learning
unknown residual dynamics under safety constraints. A stable feedback
controller is used to execute the motion plan and collect data for model
learning. We prove the safety of rollout from our exploration method and
reduction in uncertainty over epochs, thereby guaranteeing the consistency of
our learning method. We validate the effectiveness of Info-SNOC by designing
and implementing a pool of safe trajectories for a planar robot. We demonstrate
that our approach has higher success rate in ensuring safety when compared to a
deterministic trajectory optimization approach.Comment: Submitted to RA-L 2020, review-
Reinforcement Learning for Racecar Control
This thesis investigates the use of reinforcement learning to learn to drive a racecar in the simulated environment of the Robot Automobile Racing Simulator. Real-life race driving is known to be difficult for humans, and expert human drivers use complex sequences of actions. There are a large number of variables, some of which change stochastically and all of which may affect the outcome. This makes driving a promising domain for testing and developing Machine Learning techniques that have the potential to be robust enough to work in the real world. Therefore the principles of the algorithms from this work may be applicable to a range of problems.
The investigation starts by finding a suitable data structure to represent the information learnt. This is tested using supervised learning. Reinforcement learning is added and roughly tuned, and the supervised learning is then removed. A simple tabular representation is found satisfactory, and this avoids difficulties with more complex methods and allows the investigation to concentrate on the essentials of learning. Various reward sources are tested and a combination of three are found to produce the best performance. Exploration of the problem space is investigated. Results show exploration is essential but controlling how much is done is also important. It turns out the learning episodes need to be very long and because of this the task needs to be treated as continuous by using discounting to limit the size of the variables stored. Eligibility traces are used with success to make the learning more efficient. The tabular representation is made more compact by hashing and more accurate by using smaller buckets. This slows the learning but produces better driving. The improvement given by a rough form of generalisation indicates the replacement of the tabular method by a function approximator is warranted. These results show reinforcement learning can work within the Robot Automobile Racing Simulator, and lay the foundations for building a more efficient and competitive agent
Learning to Prevent Monocular SLAM Failure using Reinforcement Learning
Monocular SLAM refers to using a single camera to estimate robot ego motion
while building a map of the environment. While Monocular SLAM is a well studied
problem, automating Monocular SLAM by integrating it with trajectory planning
frameworks is particularly challenging. This paper presents a novel formulation
based on Reinforcement Learning (RL) that generates fail safe trajectories
wherein the SLAM generated outputs do not deviate largely from their true
values. Quintessentially, the RL framework successfully learns the otherwise
complex relation between perceptual inputs and motor actions and uses this
knowledge to generate trajectories that do not cause failure of SLAM. We show
systematically in simulations how the quality of the SLAM dramatically improves
when trajectories are computed using RL. Our method scales effectively across
Monocular SLAM frameworks in both simulation and in real world experiments with
a mobile robot.Comment: Accepted at the 11th Indian Conference on Computer Vision, Graphics
and Image Processing (ICVGIP) 2018 More info can be found at the project page
at https://robotics.iiit.ac.in/people/vignesh.prasad/SLAMSafePlanner.html and
the supplementary video can be found at
https://www.youtube.com/watch?v=420QmM_Z8v
Hierarchical relative entropy policy search
Many reinforcement learning (RL) tasks, especially in robotics, consist of multiple sub-tasks that
are strongly structured. Such task structures can be exploited by incorporating hierarchical policies
that consist of gating networks and sub-policies. However, this concept has only been partially explored
for real world settings and complete methods, derived from first principles, are needed. Real
world settings are challenging due to large and continuous state-action spaces that are prohibitive
for exhaustive sampling methods. We define the problem of learning sub-policies in continuous
state action spaces as finding a hierarchical policy that is composed of a high-level gating policy to
select the low-level sub-policies for execution by the agent. In order to efficiently share experience
with all sub-policies, also called inter-policy learning, we treat these sub-policies as latent variables
which allows for distribution of the update information between the sub-policies. We present three
different variants of our algorithm, designed to be suitable for a wide variety of real world robot
learning tasks and evaluate our algorithms in two real robot learning scenarios as well as several
simulations and comparisons
Intrinsic motivation and episodic memories for robot exploration of high-dimensional sensory spaces
Dieser Beitrag ist mit Zustimmung des Rechteinhabers aufgrund einer überregionalen Konsortiallizenz frei zugänglich.This work presents an architecture that generates curiosity-driven goal-directed exploration behaviours for an image sensor of a microfarming robot. A combination of deep neural networks for offline unsupervised learning of low-dimensional features from images and of online learning of shallow neural networks representing the inverse and forward kinematics of the system have been used. The artificial curiosity system assigns interest values to a set of pre-defined goals and drives the exploration towards those that are expected to maximise the learning progress. We propose the integration of an episodic memory in intrinsic motivation systems to face catastrophic forgetting issues, typically experienced when performing online updates of artificial neural networks. Our results show that adopting an episodic memory system not only prevents the computational models from quickly forgetting knowledge that has been previously acquired but also provides new avenues for modulating the balance between plasticity and stability of the models.H2020 Marie Skłodowska-Curie Actionshttps://doi.org/10.13039/100010665Horizon 2020 Framework Programmehttps://doi.org/10.13039/100010661Peer Reviewe
- …