1,270 research outputs found

    Towards Thompson Sampling for Complex Bayesian Reasoning

    Get PDF
    Paper III, IV, and VI are not available as a part of the dissertation due to the copyright.Thompson Sampling (TS) is a state-of-art algorithm for bandit problems set in a Bayesian framework. Both the theoretical foundation and the empirical efficiency of TS is wellexplored for plain bandit problems. However, the Bayesian underpinning of TS means that TS could potentially be applied to other, more complex, problems as well, beyond the bandit problem, if suitable Bayesian structures can be found. The objective of this thesis is the development and analysis of TS-based schemes for more complex optimization problems, founded on Bayesian reasoning. We address several complex optimization problems where the previous state-of-art relies on a relatively myopic perspective on the problem. These includes stochastic searching on the line, the Goore game, the knapsack problem, travel time estimation, and equipartitioning. Instead of employing Bayesian reasoning to obtain a solution, they rely on carefully engineered rules. In all brevity, we recast each of these optimization problems in a Bayesian framework, introducing dedicated TS based solution schemes. For all of the addressed problems, the results show that besides being more effective, the TS based approaches we introduce are also capable of solving more adverse versions of the problems, such as dealing with stochastic liars.publishedVersio

    A Risk-Averse Preview-based QQ-Learning Algorithm: Application to Highway Driving of Autonomous Vehicles

    Full text link
    A risk-averse preview-based QQ-learning planner is presented for navigation of autonomous vehicles. To this end, the multi-lane road ahead of a vehicle is represented by a finite-state non-stationary Markov decision process (MDP). A risk assessment unit module is then presented that leverages the preview information provided by sensors along with a stochastic reachability module to assign reward values to the MDP states and update them as scenarios develop. A sampling-based risk-averse preview-based QQ-learning algorithm is finally developed that generates samples using the preview information and reward function to learn risk-averse optimal planning strategies without actual interaction with the environment. The risk factor is imposed on the objective function to avoid fluctuation of the QQ values, which can jeopardize the vehicle's safety and/or performance. The overall hybrid automaton model of the system is leveraged to develop a feasibility check unit module that detects unfeasible plans and enables the planner system to proactively react to the changes of the environment. Theoretical results are provided to bound the number of samples required to guarantee ϵ\epsilon-optimal planning with a high probability. Finally, to verify the efficiency of the presented algorithm, its implementation on highway driving of an autonomous vehicle in a varying traffic density is considered

    Stochastic arrays and learning networks

    Get PDF
    This thesis presents a study of stochastic arrays and learning networks. These arrays will be shown to consist of simple elements utilising probabilistic coding techniques which may interact with a random and noisy environment to produce useful results. Such networks have generated considerable interest since it is possible to design large parallel self-organising arrays of these elements which are trained by example rather than explicit instruction. Once the learning process has been completed, they then have the potential ability to form generalisations, perform global optimisation of traditionally difficult problems such as routing and incorporate an associative memory capability which can enable such tasks as image recognition and reconstruction to be performed, even when given a partial or noisy view of the target. Since the method of operation of such elements is thought to emulate the basic properties of the neurons of the brain, these arrays have been termed neural 'networks. The research demonstrates the use of stochastic elements for digital signal processing by presenting a novel systolic array, utilising a simple, replicated cell structure, which is shown to perform the operations of Cyclic Correlation and the Discrete Fourier Transform on inherently random and noisy probabilistic single bit inputs. This work is then extended into the field of stochastic learning automata and to neural networks by examining the Associative Reward-Punish (A(_R-P)) pattern recognising learning automaton. The thesis concludes that all the networks described may potentially be generalised to simple variations of one standard probabilistic element utilising stochastic coding, whose properties resemble those of biological neurons. A novel study is presented which describes how a powerful deterministic algorithm, previously considered to be biologically unviable due to its nature, may be represented in this way. It is expected that combinations of these methods may lead to a series of useful hybrid techniques for training networks. The nature of the element generalisation is particularly important as it reveals the potential for encoding successful algorithms in cheap, simple hardware with single bit interconnections. No claim is made that the particular algorithms described are those actually utilised by the brain, only to demonstrate that those properties observed of biological neurons are capable of endowing collective computational ability and that actual biological algorithms may perhaps then become apparent when viewed in this light

    Reinforcement Learning using Gaussian Processes for Discretely Controlled Continuous Processes

    Get PDF
    In many application domains such as autonomous avionics, power electronics and process systems engineering there exist discretely controlled continuous processes (DCCPs) which constitute a special subclass of hybrid dynamical systems. We introduce a novel simulation-based approach for DDCPs optimization under uncertainty using Rein-forcement Learning with Gaussian Process models to learn the transitions dynamics descriptive of mode execution and an optimal switching policy for mode selection. Each mode implements a parameterized feedback control law until a stopping condition trig-gers. To deal with the size/dimension of the state space and a continuum of control mode parameters, Bayesian active learning is proposed using a utility function that trades off information content with policy improvement. Throughput maximization in a buffer tank subject to an uncertain schedule of sev-eral inflow discharges is used as case study address-ing supply chain control in manufacturing systemsFil: de Paula, Mariano. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológio - CONICET - Santa Fe. Instituto de Desarrollo y Diseño (i); ArgentinaFil: Martinez, Ernesto Carlos. Consejo Nacional de Invest.cientif.y Tecnicas. Centro Cientifico Tecnol.conicet - Santa Fe. Instituto de Desarrollo y Dise?o (i)
    corecore