Certified Reinforcement Learning with Logic Guidance

Author: Abate Alessandro
Hasanbeig Mohammadhosein
Kroening Daniel
Publication date: 10/02/2020

This paper proposes the first model-free Reinforcement Learning (RL) framework to synthesise policies for unknown, continuous-state Markov Decision Processes (MDPs) such that a given linear temporal property is satisfied. We convert the given property into a Limit Deterministic Büchi Automaton (LDBA), namely a finite-state machine expressing the property. Exploiting the structure of the LDBA, we shape a synchronous reward function on-the-fly, so that an RL algorithm can synthesise a policy resulting in traces that probabilistically satisfy the linear temporal property. This probability (certificate) is also calculated in parallel with policy learning when the state space of the MDP is finite: as such, the RL algorithm produces a policy that is certified with respect to the property. Under the assumption of a finite state space, theoretical guarantees are provided on the convergence of the RL algorithm to an optimal policy, maximising the above probability. We also show that our method produces "best available" control policies when the logical property cannot be satisfied. In the general case of a continuous state space, we propose a neural network architecture for RL and we empirically show that the algorithm finds satisfying policies, if such policies exist. The performance of the proposed framework is evaluated via a set of numerical examples and benchmarks, where we observe an improvement of one order of magnitude in the number of iterations required for the policy synthesis, compared to existing approaches, whenever available. Comment: This article draws from arXiv:1801.08099, arXiv:1809.0782
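The idea of rewarding an agent through an automaton that tracks the property can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a hypothetical five-state chain MDP, a hand-written deterministic three-state automaton for the simple property "visit a, then visit b" (rather than a full LDBA with a Büchi acceptance condition), and plain tabular Q-learning with a uniformly random behaviour policy. All names, labels, and parameter values are illustrative assumptions, and no satisfaction probability (certificate) is computed.

```python
import random

# Hypothetical toy setup (not from the paper): a 5-state chain MDP.
N_STATES = 5
ACTIONS = (-1, +1)            # action 0 = move left, action 1 = move right
LABELS = {4: "a", 0: "b"}     # atomic propositions observed in MDP states
ACCEPTING = 2                 # automaton state reached once "a then b" holds


def automaton_step(q, label):
    """Deterministic automaton for 'eventually a, and afterwards eventually b'."""
    if q == 0 and label == "a":
        return 1
    if q == 1 and label == "b":
        return 2
    return q


def env_step(s, action, rng, slip=0.1):
    """Move along the chain; with probability `slip` the move is reversed."""
    move = ACTIONS[action]
    if rng.random() < slip:
        move = -move
    return min(max(s + move, 0), N_STATES - 1)


def train(episodes=3000, max_steps=100, alpha=0.1, gamma=0.9, seed=0):
    """Tabular Q-learning over (MDP state, automaton state) pairs."""
    rng = random.Random(seed)
    Q = {(s, q): [0.0, 0.0] for s in range(N_STATES) for q in range(3)}
    for _ in range(episodes):
        s, q = 2, 0
        for _ in range(max_steps):
            a = rng.randrange(2)          # random exploration (off-policy)
            s2 = env_step(s, a, rng)
            # The automaton is advanced in lockstep with the MDP.
            q2 = automaton_step(q, LABELS.get(s2))
            done = q2 == ACCEPTING
            # Reward is generated on the fly from the automaton transition.
            target = 1.0 if done else gamma * max(Q[(s2, q2)])
            Q[(s, q)][a] += alpha * (target - Q[(s, q)][a])
            s, q = s2, q2
            if done:
                break
    return Q


def greedy_rollout(Q, max_steps=20):
    """Follow the greedy policy without slips; return the trace and success."""
    rng = random.Random(1)
    s, q, trace = 2, 0, [2]
    for _ in range(max_steps):
        a = max((0, 1), key=lambda i: Q[(s, q)][i])
        s = env_step(s, a, rng, slip=0.0)
        q = automaton_step(q, LABELS.get(s))
        trace.append(s)
        if q == ACCEPTING:
            break
    return trace, q == ACCEPTING


if __name__ == "__main__":
    trace, satisfied = greedy_rollout(train())
    print(trace, satisfied)
```

Because the learner indexes its Q-table by the pair (MDP state, automaton state), the same chain position can call for different actions depending on how much of the property has already been satisfied; that is the role the product with the automaton plays in this sketch.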

arXiv.org e-Print Archive

Recommended from our members

Considerations in designing a cybernetic simple 'learning' model; and an overview of the problem of modelling learning

Author: Der-Kureghian Emin
Publication venue: Brunel University Theses
Publication date: 01/01/1988

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Learning is viewed as a central feature of living systems and must be manifested in any artifact that claims to exhibit general intelligence. The central aims of the thesis are twofold: (1) to review and critically assess the empirical and theoretical aspects of learning as addressed in a multitude of disciplines, with the aim of extracting fundamental features and elements; (2) to develop a more systematic approach to the cybernetic modelling of learning than has been achieved hitherto. In pursuit of aim (1), the following discussions are included: historical and philosophical backgrounds; natural learning, in both its physiological and psychological aspects; and hierarchies of learning identified in the evolutionary, functional and developmental senses. An extensive section on the general problem of modelling learning, and on the formal tools, is included as a link between aims (1) and (2). Following this, a systematic and historically oriented study of cybernetic and other related approaches to the problem of modelling learning is presented. This then leads to the development of a state-of-the-art general-purpose experimental cybernetic learning model. The programming and use of this model are also fully described, including an elaborate scheme for the manifestation of simple learning

Brunel University Research Archive

Evolutionary Computation Applied to Urban Traffic Optimization

Author: Enrique Rubio Royo
Javier J. Sá
Manuel J. Galá
Publication venue: IntechOpen
Publication date: 31/10/2008

At the present time, many signs seem to indicate that we are living through a global energy and environmental crisis. The scientific community argues that the global warming process is, at least to some degree, a consequence of modern societies' unsustainable development. A key area in that situation is citizens' mobility. World economies seem to require fast and efficient transportation infrastructures for a significant fraction of the population. The non-stop overloading that traffic networks are suffering calls for new solutions. In the vast majority of cases it is not viable to extend those infrastructures due to costs, lack of available space, and environmental impacts. Thus, traffic departments all around the world are very interested in optimizing the existing infrastructures to obtain the very best service they can provide. In the last decade many initiatives have been developed to give the traffic network new management facilities for its better exploitation. They are grouped under the so-called Intelligent Transportation Systems. Examples of these approaches are the Advanced Traveler Information Systems (ATIS) and Advanced Traffic Management Systems (ATMS). Most of them provide drivers or traffic engineers with the current real or simulated traffic situation, or with traffic forecasts. They may even suggest actions to improve the traffic flow. To do so, researchers have done a lot of work improving traffic simulations, especially through the development of accurate microscopic simulators. In the last decades the application of that family of simulators was restricted to small test cases due to its high computing requirements. Currently, the availability of cheap, faster computers has changed this situation. Some famous microsimulators are MITSIM (Yang, Q., 1997), INTEGRATION (Rakha, H., et al., 1998), AIMSUN2 (Barcelo, J., et al., 1996), TRANSIMS (Nagel, K. & Barrett, C., 1997), etc. They will be briefly explained in the following section.
Although traffic research is mainly targeted at obtaining accurate simulations, there are few groups focused on the optimization or improvement of traffic in an automatic manner, not dependent on traffic engineers' experience and "art".

IntechOpen

Crossref

Scipedia

Stochastic arrays and learning networks

Author: Leaver Richard A.
Publication date: 01/01/1988

This thesis presents a study of stochastic arrays and learning networks. These arrays will be shown to consist of simple elements utilising probabilistic coding techniques which may interact with a random and noisy environment to produce useful results. Such networks have generated considerable interest since it is possible to design large parallel self-organising arrays of these elements which are trained by example rather than explicit instruction. Once the learning process has been completed, they have the potential ability to form generalisations, to perform global optimisation of traditionally difficult problems such as routing, and to incorporate an associative memory capability which can enable such tasks as image recognition and reconstruction to be performed, even when given a partial or noisy view of the target. Since the method of operation of such elements is thought to emulate the basic properties of the neurons of the brain, these arrays have been termed 'neural networks'. The research demonstrates the use of stochastic elements for digital signal processing by presenting a novel systolic array, utilising a simple, replicated cell structure, which is shown to perform the operations of Cyclic Correlation and the Discrete Fourier Transform on inherently random and noisy probabilistic single-bit inputs. This work is then extended into the field of stochastic learning automata and to neural networks by examining the Associative Reward-Punish (A_R-P) pattern-recognising learning automaton. The thesis concludes that all the networks described may potentially be generalised to simple variations of one standard probabilistic element utilising stochastic coding, whose properties resemble those of biological neurons. A novel study is presented which describes how a powerful deterministic algorithm, previously considered to be biologically unviable due to its nature, may be represented in this way.
It is expected that combinations of these methods may lead to a series of useful hybrid techniques for training networks. The nature of the element generalisation is particularly important as it reveals the potential for encoding successful algorithms in cheap, simple hardware with single-bit interconnections. No claim is made that the particular algorithms described are those actually utilised by the brain; the aim is only to demonstrate that the properties observed in biological neurons are capable of endowing collective computational ability, and that actual biological algorithms may perhaps then become apparent when viewed in this light.
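As a rough illustration of what probabilistic single-bit coding can look like, the sketch below uses a standard stochastic-computing construction: a value in [0, 1] is encoded as a random bit stream whose fraction of ones equals that value, and a bitwise AND of two independent streams estimates the product. This is a generic example under that assumption, not the array or element design from the thesis; the function names and stream length are hypothetical.

```python
import random


def encode(p, n_bits, rng):
    """Encode probability p as a stream of independent Bernoulli(p) bits."""
    return [1 if rng.random() < p else 0 for _ in range(n_bits)]


def decode(bits):
    """Recover the encoded value as the fraction of ones in the stream."""
    return sum(bits) / len(bits)


def stochastic_multiply(p, q, n_bits=100_000, seed=0):
    """Estimate p * q with one AND gate per bit pair.

    For independent streams, P(x AND y = 1) = p * q, so the decoded
    output stream approximates the product.
    """
    rng = random.Random(seed)
    a = encode(p, n_bits, rng)
    b = encode(q, n_bits, rng)
    return decode([x & y for x, y in zip(a, b)])


if __name__ == "__main__":
    print(stochastic_multiply(0.3, 0.5))  # close to 0.15, up to sampling noise
```

The accuracy of the estimate improves only with the square root of the stream length, which is the usual trade-off of this kind of coding: very simple hardware per element in exchange for longer bit streams.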

Durham e-Theses