2,019 research outputs found

    Certified Reinforcement Learning with Logic Guidance

    This paper proposes the first model-free Reinforcement Learning (RL) framework to synthesise policies for unknown, continuous-state Markov Decision Processes (MDPs) such that a given linear temporal property is satisfied. We convert the given property into a Limit Deterministic Büchi Automaton (LDBA), namely a finite-state machine expressing the property. Exploiting the structure of the LDBA, we shape a synchronous reward function on-the-fly, so that an RL algorithm can synthesise a policy resulting in traces that probabilistically satisfy the linear temporal property. This probability (certificate) is also calculated in parallel with policy learning when the state space of the MDP is finite: as such, the RL algorithm produces a policy that is certified with respect to the property. Under the assumption of finite state space, theoretical guarantees are provided on the convergence of the RL algorithm to an optimal policy, maximising the above probability. We also show that our method produces "best available" control policies when the logical property cannot be satisfied. In the general case of a continuous state space, we propose a neural network architecture for RL and we empirically show that the algorithm finds satisfying policies, if there exist such policies. The performance of the proposed framework is evaluated via a set of numerical examples and benchmarks, where we observe an improvement of one order of magnitude in the number of iterations required for the policy synthesis, compared to existing approaches whenever available. Comment: This article draws from arXiv:1801.08099, arXiv:1809.0782

    Evolutionary Computation Applied to Urban Traffic Optimization

    At the present time, many signs seem to indicate that we are living through a global energy and environmental crisis. The scientific community argues that the global warming process is, at least to some degree, a consequence of modern societies' unsustainable development. A key area in that situation is citizens' mobility. World economies seem to require fast and efficient transportation infrastructures for a significant fraction of the population. The non-stop overloading that traffic networks are suffering calls for new solutions. In the vast majority of cases it is not viable to extend that infrastructure due to costs, lack of available space, and environmental impacts. Thus, traffic departments all around the world are very interested in optimizing the existing infrastructure to obtain the very best service it can provide. In the last decade many initiatives have been developed to give the traffic network new management facilities for its better exploitation. They are grouped under the so-called Intelligent Transportation Systems. Examples of these approaches are the Advanced Traveler Information Systems (ATIS) and Advanced Traffic Management Systems (ATMS). Most of them provide drivers or traffic engineers with the current real or simulated traffic situation, or with traffic forecasts. They may even suggest actions to improve the traffic flow. To do so, researchers have done a lot of work improving traffic simulations, especially through the development of accurate microscopic simulators. In past decades the application of that family of simulators was restricted to small test cases due to its high computing requirements. Currently, the availability of cheap, faster computers has changed this situation. Some famous microsimulators are MITSIM (Yang, Q., 1997), INTEGRATION (Rakha, H., et al., 1998), AIMSUN2 (Barcelo, J., et al., 1996), TRANSIMS (Nagel, K. & Barrett, C., 1997), etc. They will be briefly explained in the following section.
Although traffic research is mainly targeted at obtaining accurate simulations, there are few groups focused on the optimization or improvement of traffic in an automatic manner, not dependent on traffic engineers' experience and "art".

    Stochastic arrays and learning networks

    This thesis presents a study of stochastic arrays and learning networks. These arrays will be shown to consist of simple elements utilising probabilistic coding techniques which may interact with a random and noisy environment to produce useful results. Such networks have generated considerable interest since it is possible to design large parallel self-organising arrays of these elements which are trained by example rather than explicit instruction. Once the learning process has been completed, they then have the potential ability to form generalisations, perform global optimisation of traditionally difficult problems such as routing and incorporate an associative memory capability which can enable such tasks as image recognition and reconstruction to be performed, even when given a partial or noisy view of the target. Since the method of operation of such elements is thought to emulate the basic properties of the neurons of the brain, these arrays have been termed 'neural networks'. The research demonstrates the use of stochastic elements for digital signal processing by presenting a novel systolic array, utilising a simple, replicated cell structure, which is shown to perform the operations of Cyclic Correlation and the Discrete Fourier Transform on inherently random and noisy probabilistic single bit inputs. This work is then extended into the field of stochastic learning automata and to neural networks by examining the Associative Reward-Punish (A_R-P) pattern recognising learning automaton. The thesis concludes that all the networks described may potentially be generalised to simple variations of one standard probabilistic element utilising stochastic coding, whose properties resemble those of biological neurons. A novel study is presented which describes how a powerful deterministic algorithm, previously considered to be biologically unviable due to its nature, may be represented in this way.
It is expected that combinations of these methods may lead to a series of useful hybrid techniques for training networks. The nature of the element generalisation is particularly important as it reveals the potential for encoding successful algorithms in cheap, simple hardware with single bit interconnections. No claim is made that the particular algorithms described are those actually utilised by the brain; the aim is only to demonstrate that those properties observed of biological neurons are capable of endowing collective computational ability, and that actual biological algorithms may perhaps then become apparent when viewed in this light.