Episodic learning

Author: Kibler Dennis F.
Porter Bruce W.
Publication venue: eScholarship, University of California
Publication date: 01/01/1983

A system is described which learns to compose sequences of operators into episodes for problem solving. The system incrementally learns when and why operators are applied. Episodes are segmented so that they are generalizable and reusable. The idea of augmenting the instance language with higher-level concepts is introduced. The technique of perturbation is described for discovering the essential features of a rule with minimal teacher guidance. The approach is applied to the domain of solving simultaneous linear equations.

eScholarship - University of California

Certified Reinforcement Learning with Logic Guidance

Author: Abate Alessandro
Hasanbeig Mohammadhosein
Kroening Daniel
Publication date: 10/02/2020

This paper proposes the first model-free Reinforcement Learning (RL) framework to synthesise policies for unknown, continuous-state Markov Decision Processes (MDPs), such that a given linear temporal property is satisfied. We convert the given property into a Limit Deterministic Büchi Automaton (LDBA), namely a finite-state machine expressing the property. Exploiting the structure of the LDBA, we shape a synchronous reward function on-the-fly, so that an RL algorithm can synthesise a policy resulting in traces that probabilistically satisfy the linear temporal property. This probability (certificate) is also calculated in parallel with policy learning when the state space of the MDP is finite: as such, the RL algorithm produces a policy that is certified with respect to the property. Under the assumption of a finite state space, theoretical guarantees are provided on the convergence of the RL algorithm to an optimal policy, maximising the above probability. We also show that our method produces "best available" control policies when the logical property cannot be satisfied. In the general case of a continuous state space, we propose a neural network architecture for RL and we empirically show that the algorithm finds satisfying policies, if such policies exist. The performance of the proposed framework is evaluated via a set of numerical examples and benchmarks, where we observe an improvement of one order of magnitude in the number of iterations required for the policy synthesis, compared to existing approaches whenever available.

Comment: This article draws from arXiv:1801.08099, arXiv:1809.0782
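The idea of shaping a reward from an automaton that tracks the property can be illustrated with a much simplified sketch. It is not the paper's construction: the automaton below is a small hand-written deterministic machine for "avoid `unsafe` until `goal` is reached", not a true LDBA, and every name in it (`automaton_step`, `shaped_reward`, `run_trace`, the labels) is a hypothetical chosen for illustration.

```python
# Assumption: a hand-written automaton for "avoid 'unsafe' until 'goal'".
# States: 0 = still searching, 1 = accepting (goal seen), 2 = rejecting sink.
ACCEPTING = {1}


def automaton_step(q, label):
    """Advance the automaton by one label read from the environment."""
    if q == 0:
        if label == "unsafe":
            return 2
        if label == "goal":
            return 1
        return 0
    return q  # states 1 and 2 are absorbing


def shaped_reward(q_next):
    """Reward 1 whenever the automaton is in an accepting state, else 0."""
    return 1.0 if q_next in ACCEPTING else 0.0


def run_trace(labels):
    """Run the automaton alongside a trace of labels and sum the rewards."""
    q, total = 0, 0.0
    for label in labels:
        q = automaton_step(q, label)
        total += shaped_reward(q)
    return q, total
```

In an RL loop, the learner would observe the pair (environment state, automaton state) and receive `shaped_reward` at each step, so that traces satisfying the property accumulate more reward than traces that violate it.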

arXiv.org e-Print Archive

Value Propagation Networks

Author: Kohli Pushmeet
Lin Zeming
Nardelli Nantas
Synnaeve Gabriel
Torr Philip H. S.
Usunier Nicolas
Publication date: 01/01/2019

We present Value Propagation (VProp), a set of parameter-efficient differentiable planning modules built on Value Iteration which can be trained successfully using reinforcement learning to solve unseen tasks, can generalize to larger map sizes, and can learn to navigate in dynamic environments. We show that the modules enable learning to plan when the environment also includes stochastic elements, providing a cost-efficient learning system to build low-level size-invariant planners for a variety of interactive navigation problems. We evaluate on static and dynamic configurations of MazeBase grid-worlds, with randomly generated environments of several different sizes, and on a StarCraft navigation scenario, with more complex dynamics, and pixels as input.

Comment: Updated to match ICLR 2019 OpenReview's version
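For readers unfamiliar with Value Iteration, on which the modules are built, the following is a minimal tabular sketch on a small grid. It is classical value iteration, not the differentiable VProp module. The function name, the reward values, and the grid encoding (0 = free cell, 1 = wall) are assumptions made for illustration.

```python
# Assumption: deterministic four-way moves, a fixed step cost, and a terminal goal.
def value_iteration(grid, goal, gamma=0.9, step_reward=-1.0,
                    goal_reward=10.0, iters=100):
    """Return a table of state values for a grid with 0 = free, 1 = wall."""
    rows, cols = len(grid), len(grid[0])
    values = [[0.0] * cols for _ in range(rows)]
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    for _ in range(iters):
        new_values = [row[:] for row in values]
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] == 1 or (r, c) == goal:
                    continue  # walls and the terminal goal keep value 0
                best = float("-inf")
                for dr, dc in moves:
                    nr, nc = r + dr, c + dc
                    if not (0 <= nr < rows and 0 <= nc < cols):
                        continue  # move would leave the grid
                    if grid[nr][nc] == 1:
                        continue  # move would hit a wall
                    reward = goal_reward if (nr, nc) == goal else step_reward
                    # Bellman backup: immediate reward plus discounted value.
                    best = max(best, reward + gamma * values[nr][nc])
                if best != float("-inf"):
                    new_values[r][c] = best
        values = new_values
    return values
```

Values increase toward the goal, so a greedy policy that always moves to the highest-valued neighbor follows a shortest path. The same update applies unchanged to grids of any size.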

arXiv.org e-Print Archive

Oxford University Research Archive