    Towards Thompson Sampling for Complex Bayesian Reasoning

    Papers III, IV, and VI are not available as part of the dissertation due to copyright.
    Thompson Sampling (TS) is a state-of-the-art algorithm for bandit problems set in a Bayesian framework. Both the theoretical foundation and the empirical efficiency of TS are well explored for plain bandit problems. However, the Bayesian underpinning of TS means that it could potentially be applied to other, more complex problems beyond the bandit problem, if suitable Bayesian structures can be found. The objective of this thesis is the development and analysis of TS-based schemes for more complex optimization problems, founded on Bayesian reasoning. We address several complex optimization problems where the previous state of the art relies on a relatively myopic perspective on the problem. These include stochastic searching on the line, the Goore game, the knapsack problem, travel time estimation, and equipartitioning. Instead of employing Bayesian reasoning to obtain a solution, existing approaches rely on carefully engineered rules. In brief, we recast each of these optimization problems in a Bayesian framework, introducing dedicated TS-based solution schemes. For all of the addressed problems, the results show that besides being more effective, the TS-based approaches we introduce are also capable of solving more adverse versions of the problems, such as dealing with stochastic liars.

    Certified Reinforcement Learning with Logic Guidance

    This paper proposes the first model-free Reinforcement Learning (RL) framework to synthesise policies for unknown, continuous-state Markov Decision Processes (MDPs), such that a given linear temporal property is satisfied. We convert the given property into a Limit Deterministic Büchi Automaton (LDBA), namely a finite-state machine expressing the property. Exploiting the structure of the LDBA, we shape a synchronous reward function on-the-fly, so that an RL algorithm can synthesise a policy resulting in traces that probabilistically satisfy the linear temporal property. This probability (certificate) is also calculated in parallel with policy learning when the state space of the MDP is finite: as such, the RL algorithm produces a policy that is certified with respect to the property. Under the assumption of a finite state space, theoretical guarantees are provided on the convergence of the RL algorithm to an optimal policy, maximising the above probability. We also show that our method produces "best available" control policies when the logical property cannot be satisfied. In the general case of a continuous state space, we propose a neural network architecture for RL and we empirically show that the algorithm finds satisfying policies, if such policies exist. The performance of the proposed framework is evaluated via a set of numerical examples and benchmarks, where we observe an improvement of one order of magnitude in the number of iterations required for the policy synthesis, compared to existing approaches whenever available.
    Comment: This article draws from arXiv:1801.08099, arXiv:1809.0782

    On Optimal Allocation of Indivisibles Under Uncertainty

    The optimal use of indivisible resources is often the central issue in economics and management. One of the main difficulties is the discontinuous nature of the resulting resource allocation problems, which may lead to the failure of competitive market allocation mechanisms (unless we agree to "divide" the indivisibles in some indirect way). The problem becomes even more acute when uncertainty of the outcomes of decisions is present. In this paper we formalize the problem as a stochastic optimization problem involving discrete decision variables and uncertainties. By using some concrete examples, we illustrate how some problems of "dividing indivisibles" under uncertainty can be formalized in such terms. Next, we develop a general methodology to solve such problems based on the concept of the branch and bound method. The main idea of the approach is to process large collections of possible solutions and to devote more attention to the most promising groups. By gathering more information to reduce the uncertainty and by specializing the solution, the optimal decision can be found.

    Time and Cost Optimization of Cyber-Physical Systems by Distributed Reachability Analysis

    Intelligent Simulation Modeling of a Flexible Manufacturing System with Automated Guided Vehicles

    Although simulation is a very flexible and cost-effective problem-solving technique, it has traditionally been limited to building models which are merely descriptive of the system under study. Relatively new approaches combine improvement heuristics and artificial intelligence with simulation to provide prescriptive power in simulation modeling. This study demonstrates the synergy obtained by bringing together the "learning automata theory" and simulation analysis. Intelligent objects are embedded in the simulation model of a Flexible Manufacturing System (FMS), in which Automated Guided Vehicles (AGVs) serve as the material handling system between four unique workcenters. The objective of the study is to find satisfactory AGV routing patterns along available paths to minimize the mean time spent by different kinds of parts in the system. System parameters such as different part routing and processing time requirements, arrivals distribution, number of pallets, available paths between workcenters, and number and speed of AGVs can be defined by the user. The network of learning automata acts as the decision maker driving the simulation, and the FMS model acts as the training environment for the automata network, providing realistic yet cost-effective and risk-free feedback. Object-oriented design and implementation of the simulation model with a process-oriented world view, graphical animation, and visually interactive simulation (using GUI objects such as windows, menus, and dialog boxes; mouse-sensitive dynamic automaton trace charts; and dynamic graphical statistical monitoring) are other issues dealt with in the study.

    A novel learning automata game with local feedback for parallel optimization of hydropower production

    Master's thesis Information- and communication technology IKT590 - University of Agder 2017
    Hydropower optimization for multi-reservoir systems is classified as a combinatorial optimization problem with a large state space that is particularly difficult to solve. There exists no gold standard for solving such problems, and many proposed algorithms are domain specific. The literature describes several different techniques; linear programming approaches are extensively discussed, but they tend to succumb to the curse of dimensionality when the state vector dimensions increase. This thesis introduces LA LCS, a novel learning automata algorithm that utilizes a parallel form of local feedback. This enables each individual automaton to receive direct feedback, resulting in faster convergence. In addition, the algorithm is implemented using a parallel architecture on a CUDA-enabled GPU, along with exhaustive and random search. LA LCS has been verified through several scenarios. Experiments show that the algorithm is able to quickly adapt and find optimal production strategies for problems of variable complexity. The algorithm is empirically verified and shown to hold great promise for solving optimization problems, including hydropower production strategies.