151 research outputs found

Numerical method for expectations of piecewise-deterministic Markov processes

Author: Brandejsky Adrien
de Saporta Benoîte
Dufour François
Publication date: 01/01/2012

We present a numerical method to compute expectations of functionals of a piecewise-deterministic Markov process. We discuss time-dependent functionals as well as deterministic time-horizon problems. Our approach is based on the quantization of an underlying discrete-time Markov chain. We obtain bounds for the rate of convergence of the algorithm. The approximation we propose is easily computable and is flexible with respect to some of the parameters defining the problem. Two examples illustrate the paper. Comment: 41 pages

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Oskar Bordeaux

Safe Learning and Optimization Techniques: Towards a Survey of the State of the Art

Author: Allmendinger Richard
Kim Youngmin
López-Ibáñez Manuel
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2021

Safe learning and optimization deals with learning and optimization problems that avoid, as much as possible, the evaluation of non-safe input points, which are solutions, policies, or strategies that cause an irrecoverable loss (e.g., breakage of a machine or equipment, or a threat to life). Although a comprehensive survey of safe reinforcement learning algorithms was published in 2015, a number of new algorithms have been proposed since then, and related works in active learning and in optimization were not considered. This paper reviews those algorithms from a number of domains including reinforcement learning, Gaussian process regression and classification, evolutionary algorithms, and active learning. We provide the fundamental concepts on which the reviewed algorithms are based and a characterization of the individual algorithms. We conclude by explaining how the algorithms are connected and offering suggestions for future research. Comment: The final authenticated publication appears in: Heintz F., Milano M., O'Sullivan B. (eds) Trustworthy AI - Integrating Learning, Optimization and Reasoning. TAILOR 2020. Lecture Notes in Computer Science, vol 12641. Springer, Cham. The final authenticated publication is available online at https://doi.org/10.1007/978-3-030-73959-1_12

arXiv.org e-Print Archive

A CONTINUITY QUESTION OF DUBINS AND SAVAGE

Author: Laraki R
Sudderth W
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/06/2017

University of Liverpool Repository

Evolutionary game of coalition building under external pressure

Author: A Saichev
CH Papadimitriou
DO Pushkin
E Weese
H Inal
J Norris
JB Jouida
JM Lasry
JN Tsitsiklis
M Finus
MF Chen
MG Crandall
N Gast
PL Krapivsky
VN Kolokoltsov
W Chen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

We study the fragmentation-coagulation (or merging and splitting) evolutionary control model as introduced recently by one of the authors, where N small players can form coalitions to resist the pressure exerted by the principal. It is a Markov chain in continuous time and the players have a common reward to optimize. We study the behavior as N grows and show that the problem converges to a (one player) deterministic optimization problem in continuous time, in the infinite dimensional state space

arXiv.org e-Print Archive

Warwick Research Archives Portal Repository

Archivio istituzionale della ricerca - Università di Padova

ρ-POMDPs have Lipschitz-Continuous ϵ-Optimal Value Functions

Author: Buffet Olivier
Dibangoye Jilles
Fehr Mathieu
Thomas Vincent
Publication venue: HAL CCSD
Publication date: 03/12/2018

Many state-of-the-art algorithms for solving Partially Observable Markov Decision Processes (POMDPs) rely on turning the problem into a "fully observable" problem (a belief MDP) and exploiting the piecewise linearity and convexity (PWLC) of the optimal value function in this new state space (the belief simplex ∆). This approach has been extended to solving ρ-POMDPs (i.e., for information-oriented criteria) when the reward ρ is convex in ∆. General ρ-POMDPs can also be turned into "fully observable" problems, but with no means to exploit the PWLC property. In this paper, we focus on POMDPs and ρ-POMDPs with a λ_ρ-Lipschitz reward function, and demonstrate that, for finite horizons, the optimal value function is Lipschitz-continuous. Then, value function approximators are proposed for both upper- and lower-bounding the optimal value function, which are shown to provide uniformly improvable bounds. This allows proposing two algorithms derived from HSVI, which are empirically evaluated on various benchmark problems.

INRIA a CCSD electronic archive server

Finite horizon analysis of Markov automata

Author: Hatefi Ardakani Hassan
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/01/2016

Markov automata constitute an expressive continuous-time compositional modelling formalism, featuring stochastic timing and nondeterministic as well as probabilistic branching, all supported in one model. They subsume, as special cases, discrete- and continuous-time Markov chains, as well as interactive Markov chains and probabilistic automata. Moreover, they can be equipped with reward and resource structures in order to analyse quantitative aspects of systems, such as performance metrics, energy consumption, and repair and maintenance costs. Due to their expressive nature, they serve as semantic backbones of engineering frameworks, control applications and safety-critical systems. The Architecture Analysis and Design Language (AADL), Dynamic Fault Trees (DFT) and Generalised Stochastic Petri Nets (GSPN) are just some examples. Their expressiveness has thus far prevented efficient analysis by stochastic solvers and probabilistic model checkers. A major problem context of this thesis lies in their analysis under budget constraints, i.e. when only a finite budget of resources can be spent by the model. We study the mathematical foundations of Markov automata, since these are essential for the analysis addressed in this thesis. This includes, in particular, understanding their measurability and establishing their probability measure. Furthermore, we address the analysis of Markov automata in the presence of both reward acquisition and resource consumption within a finite budget of resources. More specifically, we focus on the problem of computing the optimal expected resource-bounded reward. In our general setting, we support transient, instantaneous and final reward collection as well as transient resource consumption. Our general formulation of the problem encompasses, in particular, the optimal time-bounded reward and reachability as well as resource-bounded reachability.
We develop a sound theory together with a stable approximation scheme with a strict error bound to solve the problem efficiently. We report on an implementation of our approach in a supporting tool and also demonstrate its effectiveness and usability over an extensive collection of industrial and academic case studies.

Proving Expected Sensitivity of Probabilistic Programs with Randomized Variable-Dependent Termination Time

Author: Chatterjee Krishnendu
Deng Yuxin
Fu Hongfei
Wang Peixin
Xu Ming
Publication date: 01/01/2019

The notion of program sensitivity (aka Lipschitz continuity) specifies that changes in the program input result in proportional changes to the program output. For probabilistic programs the notion is naturally extended to expected sensitivity. A previous approach develops a relational program logic framework for proving expected sensitivity of probabilistic while loops, where the number of iterations is fixed and bounded. In this work, we consider probabilistic while loops where the number of iterations is not fixed, but randomized and dependent on the initial input values. We present a sound approach for proving expected sensitivity of such programs. Our approach is martingale-based and can be automated through existing martingale-synthesis algorithms. Furthermore, our approach is compositional for sequential composition of while loops under a mild side condition. We demonstrate the effectiveness of our approach on several classical examples from Gambler's Ruin, stochastic hybrid systems and stochastic gradient descent. We also present experimental results showing that our automated approach can handle various probabilistic programs in the literature.

arXiv.org e-Print Archive

The evolutionary game of pressure (or interference), resistance and collaboration

Author: Kolokoltsov V. N. (Vasiliĭ Nikitich)
Publication venue: 'Institute for Operations Research and the Management Sciences (INFORMS)'
Publication date: 01/11/2017

In the past few years, volunteers have produced geographic information of different kinds, using a variety of different crowdsourcing platforms, within a broad range of contexts. However, there is still a lack of clarity about the specific types of tasks that volunteers can perform for deriving geographic information from remotely sensed imagery, and how the quality of the produced information can be assessed for particular task types. To fill this gap, we analyse the existing literature and propose a typology of tasks in geographic information crowdsourcing, which distinguishes between classification, digitisation and conflation tasks. We then present a case study related to the “Missing Maps” project aimed at crowdsourced classification to support humanitarian aid. We use our typology to distinguish between the different types of crowdsourced tasks in the project and choose classification tasks related to identifying roads and settlements for an evaluation of the crowdsourced classification. This evaluation shows that the volunteers achieved a satisfactory overall performance (accuracy: 89%; sensitivity: 73%; and precision: 89%). We also analyse different factors that could influence the performance, concluding that volunteers were more likely to incorrectly classify tasks with small objects. Furthermore, agreement among volunteers was shown to be a very good predictor of the reliability of crowdsourced classification: tasks with the highest agreement level were 41 times more probable to be correctly classified by volunteers. The results thus show that the crowdsourced classification of remotely sensed imagery is able to generate geographic information about human settlements with a high level of quality. This study also makes clear the different sophistication levels of tasks that can be performed by volunteers and reveals some factors that may have an impact on their performance. 
In this paper we extend the framework of the evolutionary inspection game put forward recently by the author and coworkers to a large class of conflict interactions dealing with the pressure exerted by the major player (or principal) on a large group of small players that can resist this pressure or collaborate with the major player. We prove rigorous results on the convergence of various Markov decision models of interacting small agents (including evolutionary growth), namely pairwise, in groups and by coalition formation, to a deterministic evolution on the distributions of the state spaces of small players, paying particular attention to situations with an infinite state space of small players. We supply rather precise rates of convergence. The theoretical results of the paper are applied to the analysis of the processes of inspection, corruption, cyber-security, counter-terrorism, banks and firms merging, strategically enhanced preferential attachment and many other

Warwick Research Archives Portal Repository