Search CORE

89 research outputs found

Sequential Decision Making under Uncertainty for Sensor Management in Mobile Robotics

Author: Lauri Mikko
Publication venue: Tampere University of Technology
Publication date: 01/01/2016
Field of study

Sensor management refers to the control of the degrees of freedom in a sensing system. The objective of sensor management is to improve performance e.g. by obtaining more accurate information or by achieving other operational goals. Sensor management is viewed as a sequential decision making process, where decisions at any time are made conditional on the past decisions and measurement data. At the time of deciding a control action for a sensing system the measurement data that will be obtained are unknown. Thus, informally speaking, a solution to a sensor management problem is a policy that determines which sensing action to undertake given the current information on the state of the process under investigation and contingent on any possible realisation of future measurement data outcomes.This thesis studies sensor management framing the contingent planning problem in the partially observable Markov decision process (POMDP) framework. In particular, applications in mobile robotics are considered. Mobile robots are viewed as controllable sensor platforms.Based on earlier work on POMDP based robot control, and distinguishing between the two cases of either exploiting or gathering information, we deﬁne four canonical sensor management problem types in mobile robotics. In each of the problem types, we exploit the structural properties of their inputs to improve eﬃciency of applicable contingent planning algorithms.In particular, we consider sensor management problems for information gathering where the utility of the possible control policies is quantiﬁed by mutual information (MI). We identify the relationship between the POMDP formulation of an environment monitoring problem and another contingent planning problem known as a multi-armed bandit (MAB). In a robotic exploration task, we derive a novel approximation for MI.Through both simulation and real-world experiments in mobile robotics domains, we determine the applicability, advantages, and disadvantages of a POMDP based approach to sensor management in mobile robotics

Trepo - Institutional Repository of Tampere University

A Sampling-Based Method for Gittins Index Approximation

Author: Baas Stef
Boucherie Richard J.
Braaksma Aleida
Publication venue: ArXiv.org
Publication date: 24/07/2023
Field of study

University of Twente Research Information

A Sampling-Based Method for Gittins Index Approximation

Author: Baas Stef
Boucherie Richard J.
Braaksma Aleida
Publication venue
Publication date: 21/07/2023
Field of study

A sampling-based method is introduced to approximate the Gittins index for a general family of alternative bandit processes. The approximation consists of a truncation of the optimization horizon and support for the immediate rewards, an optimal stopping value approximation, and a stochastic approximation procedure. Finite-time error bounds are given for the three approximations, leading to a procedure to construct a confidence interval for the Gittins index using a finite number of Monte Carlo samples, as well as an epsilon-optimal policy for the Bayesian multi-armed bandit. Proofs are given for almost sure convergence and convergence in distribution for the sampling based Gittins index approximation. In a numerical study, the approximation quality of the proposed method is verified for the Bernoulli bandit and Gaussian bandit with known variance, and the method is shown to significantly outperform Thompson sampling and the Bayesian Upper Confidence Bound algorithms for a novel random effects multi-armed bandit

arXiv.org e-Print Archive

Approximate Dynamic Programming: Health Care Applications

Author: Nasrollahzadeh Amir Ali
Publication venue: Clemson University Libraries
Publication date: 01/05/2019
Field of study

This dissertation considers different approximate solutions to Markov decision problems formulated within the dynamic programming framework in two health care applications. Dynamic formulations are appropriate for problems which require optimization over time and a variety of settings for different scenarios and policies. This is similar to the situation in a lot of health care applications for which because of the curses of dimensionality, exact solutions do not always exist. Thus, approximate analysis to find near optimal solutions are motivated. To check the quality of approximation, additional evidence such as boundaries, consistency analysis, or asymptotic behavior evaluation are required. Emergency vehicle management and dose-finding clinical trials are the two heath care applications considered here in order to investigate dynamic formulations, approximate solutions, and solution quality assessments. The dynamic programming formulation for real-time ambulance dispatching and relocation policies, response-adaptive dose-finding clinical trial, and optimal stopping of adaptive clinical trials is presented. Approximate solutions are derived by multiple methods such as basis function regression, one-step look-ahead policy, simulation-based gridding algorithm, and diffusion approximation. Finally, some boundaries to assess the optimality gap and a proof of consistency for approximate solutions are presented to ensure the quality of approximation

Clemson University: TigerPrints

Why Non-myopic Bayesian Optimization is Promising and How Far Should We Look-ahead? A Study via Rollout

Author: Kontar Raed Al
Yue Xubo
Publication venue
Publication date: 20/01/2020
Field of study

Lookahead, also known as non-myopic, Bayesian optimization (BO) aims to find optimal sampling policies through solving a dynamic programming (DP) formulation that maximizes a long-term reward over a rolling horizon. Though promising, lookahead BO faces the risk of error propagation through its increased dependence on a possibly mis-specified model. In this work we focus on the rollout approximation for solving the intractable DP. We first prove the improving nature of rollout in tackling lookahead BO and provide a sufficient condition for the used heuristic to be rollout improving. We then provide both a theoretical and practical guideline to decide on the rolling horizon stagewise. This guideline is built on quantifying the negative effect of a mis-specified model. To illustrate our idea, we provide case studies on both single and multi-information source BO. Empirical results show the advantageous properties of our method over several myopic and non-myopic BO algorithms.Comment: 12 pages, 1 figure Accepted by AISTATS 202

arXiv.org e-Print Archive

Distributed Planning for Self-Organizing Production Systems

Author: Pfrommer Julius
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 21/04/2021
Field of study

Für automatisierte Produktionsanlagen gibt es einen fundamentalen Tradeoff zwischen Effizienz und Flexibilität. In den meisten Fällen sind die Abläufe nicht nur durch den physischen Aufbau der Produktionsanlage, sondern auch durch die spezielle zugeschnittene Programmierung der Anlagensteuerung fest vorgegeben. Änderungen müssen aufwändig in einer Vielzahl von Systemen nachgezogen werden. Das macht die Herstellung kleiner Stückzahlen unrentabel. In dieser Dissertation wird ein Ansatz entwickelt, um eine automatische Anpassung des Verhaltens von Produktionsanlagen an wechselnde Aufträge und Rahmenbedingungen zu erreichen. Dabei kommt das Prinzip der Selbstorganisation durch verteilte Planung zum Einsatz. Die aufeinander aufbauenden Ergebnisse der Dissertation sind wie folgt: 1. Es wird ein Modell von Produktionsanlagen entwickelt, dass nahtlos von der detaillierten Betrachtung physikalischer Produktionsprozesse bis hin zu Lieferbeziehungen zwischen Unternehmen skaliert. Im Vergleich zu existierenden Modellen von Produktionsanlagen werden weniger limitierende Annahmen gestellt. In diesem Sinne ist der Modellierungsansatz ein Kandidat für eine häufig geforderte "Theorie der Produktion". 2. Für die so modellierten Szenarien wird ein Algorithmus zur Optimierung der nebenläufigen Abläufe entwickelt. Der Algorithmus verbindet Techniken für die kombinatorische und die kontinuierliche Optimierung: Je nach Detailgrad und Ausgestaltung des modellierten Szenarios kann der identische Algorithmus kombinatorische Fertigungsfeinplanung (Scheduling) vornehmen, weltweite Lieferbeziehungen unter Einbezug von Unsicherheiten und Risiko optimieren und physikalische Prozesse prädiktiv regeln. Dafür werden Techniken der Monte-Carlo Baumsuche (die auch bei Deepminds Alpha Go zum Einsatz kommen) weiterentwickelt. Durch Ausnutzung zusätzlicher Struktur in den Modellen skaliert der Ansatz auch auf große Szenarien. 3. Der Planungsalgorithmus wird auf die verteilte Optimierung durch unabhängige Agenten übertragen. Dafür wird die sogenannte "Nutzen-Propagation" als Koordinations-Mechanismus entwickelt. Diese ist von der Belief-Propagation zur Inferenz in Probabilistischen Graphischen Modellen inspiriert. Jeder teilnehmende Agent hat einen lokalen Handlungsraum, in dem er den Systemzustand beobachten und handelnd eingreifen kann. Die Agenten sind an der Maximierung der Gesamtwohlfahrt über alle Agenten hinweg interessiert. Die dafür notwendige Kooperation entsteht über den Austausch von Nachrichten zwischen benachbarten Agenten. Die Nachrichten beschreiben den erwarteten Nutzen für ein angenommenes Verhalten im Handlungsraum beider Agenten. 4. Es wird eine Beschreibung der wiederverwendbaren Fähigkeiten von Maschinen und Anlagen auf Basis formaler Beschreibungslogiken entwickelt. Ausgehend von den beschriebenen Fähigkeiten, sowie der vorliegenden Aufträge mit ihren notwendigen Produktionsschritten, werden ausführbare Aktionen abgeleitet. Die ausführbaren Aktionen, mit wohldefinierten Vorbedingungen und Effekten, kapseln benötigte Parametrierungen, programmierte Abläufe und die Synchronisation von Maschinen zur Laufzeit. Die Ergebnisse zusammenfassend werden Grundlagen für flexible automatisierte Produktionssysteme geschaffen -- in einer Werkshalle, aber auch über Standorte und Organisationen verteilt -- welche die ihnen innewohnenden Freiheitsgrade durch Planung zur Laufzeit und agentenbasierte Koordination gezielt einsetzen können. Der Bezug zur Praxis wird durch Anwendungsbeispiele hergestellt. Die Machbarkeit des Ansatzes wurde mit realen Maschinen im Rahmen des EU-Projekts SkillPro und in einer Simulationsumgebung mit weiteren Szenarien demonstriert

KITopen