4 research outputs found

    Model and Reinforcement Learning for Markov Games with Risk Preferences

    We motivate and propose a new model for non-cooperative Markov games that accounts for the interactions of risk-aware players. This model characterizes the time-consistent dynamic "risk" arising from both stochastic state transitions (inherent to the game) and randomized mixed strategies (due to all other players). An appropriate risk-aware equilibrium concept is proposed, and the existence of such equilibria in stationary strategies is demonstrated by an application of Kakutani's fixed point theorem. We further propose a simulation-based Q-learning type algorithm for risk-aware equilibrium computation. This algorithm works with a special form of minimax risk measures which can naturally be written as saddle-point stochastic optimization problems, and covers many widely investigated risk measures. Finally, the almost sure convergence of this simulation-based algorithm to an equilibrium is demonstrated under some mild conditions. Our numerical experiments on a two-player queuing game validate the properties of our model and algorithm, and demonstrate their worth and applicability in real-life competitive decision-making. Comment: 38 pages, 6 tables, 5 figures
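As a rough illustration of a risk measure that can be written as an optimization problem, the sketch below evaluates the conditional value-at-risk (CVaR) of an empirical loss sample through the Rockafellar-Uryasev minimization formula and checks it against a direct tail average. Choosing CVaR, and the function names `cvar_by_minimization` and `cvar_by_tail_average`, are assumptions made for illustration; the abstract does not say which risk measures or formulations the paper uses.

```python
def cvar_by_minimization(losses, alpha):
    """Empirical CVaR at level alpha via the Rockafellar-Uryasev formula:
    min over eta of  eta + E[max(loss - eta, 0)] / (1 - alpha)."""
    n = len(losses)

    def objective(eta):
        expected_excess = sum(max(x - eta, 0.0) for x in losses) / n
        return eta + expected_excess / (1.0 - alpha)

    # The objective is convex and piecewise linear with kinks at the samples,
    # so a minimizer is attained at one of the sample points.
    return min(objective(eta) for eta in losses)


def cvar_by_tail_average(losses, alpha):
    """Average of the worst (1 - alpha) fraction of the losses.
    Assumes n * (1 - alpha) is a whole number (illustrative simplification)."""
    k = round(len(losses) * (1.0 - alpha))
    worst = sorted(losses)[-k:]
    return sum(worst) / k


# Hypothetical loss sample, not data from the paper.
sample_losses = [float(v) for v in range(1, 11)]
cvar_opt = cvar_by_minimization(sample_losses, alpha=0.5)
cvar_tail = cvar_by_tail_average(sample_losses, alpha=0.5)
```

On this sample both routes give the average of the five largest losses. The optimization form is the one that lends itself to simulation-based updates, since the expectation can be replaced by sampled losses.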

    Advancing stability analysis of mean-risk stochastic programs: Bilevel and two-stage models

    Measuring and managing risk has become crucial in modern decision making under stochastic uncertainty. In two-stage stochastic programming, mean-risk models are essentially defined by a parametric recourse problem and a quantification of risk. The thesis addresses sufficient conditions for weak continuity of the resulting objective functions with respect to perturbations of the underlying probability measure. The approach is based on so-called psi-weak topologies, which are finer than the topology of weak convergence, and makes it possible to unify and extend known results for a comprehensive class of risk measures and recourse problems. In particular, stability of mean-risk models with mixed-integer quadratic and general mixed-integer convex recourse problems is derived for any law-invariant, convex and nondecreasing quantification of risk. From a conceptual point of view, two-stage stochastic programs and bilevel problems under stochastic uncertainty are closely related. Assuming that only the follower can observe the realization of the randomness, the optimistic and pessimistic settings both give rise to two-stage problems where only optimal solutions of the lower level are feasible for the recourse problem. So far, stability in stochastic bilevel programming has only been examined for a specific model based on a quantile criterion. The novel approach makes it possible to identify sufficient conditions for stability of stochastic bilevel problems with quadratic lower level and is applicable to a comprehensive class of risk measures.
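For orientation, a generic two-stage mean-risk model can be written as follows. All symbols here ($X$, $c$, $q$, $W$, $T$, $Y$, $\rho$, $\mathcal{R}$, $\mu$) are notation assumed for illustration and are not taken from the thesis; in particular, the linear recourse objective shown is only a placeholder for the quadratic and convex mixed-integer recourse problems the abstract mentions.

```latex
% Recourse (second-stage) cost for first-stage decision x and realization z
f(x, z) = c^\top x + \min_{y} \{\, q^\top y \;:\; W y = z - T x,\ y \in Y \,\}

% Mean-risk objective under the probability measure \mu of the random data,
% with risk weight \rho \ge 0 and risk measure \mathcal{R}
Q(x, \mu) = \mathbb{E}_{\mu}\big[ f(x, \cdot) \big] + \rho \, \mathcal{R}_{\mu}\big[ f(x, \cdot) \big]

% Mean-risk model
\min_{x \in X} \; Q(x, \mu)
```

In this notation, the stability question is whether $Q(x, \mu)$ depends continuously on $\mu$ when the measure is perturbed.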

    Invariant manifold theory for impulsive functional differential equations with applications

    The primary contribution of this thesis is a development of invariant manifold theory for impulsive functional differential equations. We begin with an in-depth analysis of linear systems, immersed in a nonautonomous dynamical systems framework. We prove a variation-of-constants formula, introduce appropriate generalizations of stable, centre and unstable subspaces, and develop a Floquet theory for periodic systems. Using the Lyapunov-Perron method, we prove the existence of local centre manifolds at a nonhyperbolic equilibrium of nonlinear impulsive functional differential equations. Using a formal differentiation procedure in conjunction with machinery from functional analysis -- specifically, contraction mappings on scales of Banach spaces -- we prove that the centre manifold is smooth in the state space. By introducing a coordinate system, we are able to prove that the coefficients of any Taylor expansion of the local centre manifold are unique and sufficiently regular in the time and lag arguments that they can be computed by solving an impulsive boundary-value problem. After proving a reduction principle, this leads naturally to explorations into bifurcation theory, where we establish generalizations of the classical fold and Hopf bifurcations for impulsive delay differential equations. Aside from the centre manifold, we demonstrate the existence and smoothness of stable and unstable manifolds and prove a linearized stability theorem. One of the applications of the theory above is an analysis of an SIR model with pulsed vaccination and finite temporary immunity modeled by a discrete delay. We determine an analytical stability criterion for the disease-free equilibrium and prove the existence of a transcritical bifurcation of periodic solutions at some critical vaccination coverage level for generic system parameters. Then, using numerical continuation and a monodromy operator discretization scheme, we track the bifurcating endemic periodic solution until a Hopf point is identified. A cylinder bifurcation is observed; the periodic orbit expands into a cylinder in the extended phase space before eventually contracting onto a periodic orbit as the vaccination coverage vanishes. The other application is an impulsive stabilization method based on centre manifold reduction and optimization principles. Assuming a cost structure on the impulsive controller and a desired convergence rate target, we prove that under certain conditions there is always an impulsive controller that can stabilize a nonhyperbolic equilibrium with a trivial unstable subspace, robustly with respect to parameter perturbation, while guaranteeing a minimal cost. We then exploit the low-dimensionality of the centre manifold to develop a two-stage program that can be implemented to compute the optimal controller. To demonstrate the effectiveness of the two-stage program, which we call the centre probe method, we use the method to stabilize a complex network of 100 diffusively coupled nodes at a Hopf point. The cost structure is one that assigns higher cost to controlling nodes that have more neighbours, while the jump functionals are required to be diagonal -- that is, they do not introduce further coupling. We also introduce a secondary goal, which is that the number of nodes that are controlled is minimized
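To give a feel for what a Floquet multiplier means in the impulsive setting, here is a toy sketch for a scalar linear equation with periodic impulses and no delay: x' = a x between impulses, with a jump x -> (1 + b) x at the end of every period. This is a deliberate simplification and not the thesis's framework, which treats functional (delay) equations. The parameters `a`, `b`, `period` and both function names are hypothetical.

```python
import math


def floquet_multiplier(a, b, period):
    """Multiplier of the period map for x' = a*x with jump x -> (1 + b)*x
    applied once per period (scalar case, no delay)."""
    return (1.0 + b) * math.exp(a * period)


def state_after_periods(a, b, period, x0, n_periods):
    """Apply the exact flow over one period, then the impulse, n_periods times."""
    x = x0
    for _ in range(n_periods):
        x = x * math.exp(a * period)  # continuous growth or decay between impulses
        x = (1.0 + b) * x             # impulsive jump at the end of the period
    return x


# Hypothetical values: an unstable continuous part (a > 0) damped by impulses.
a, b, period = 0.5, -0.5, 1.0
multiplier = floquet_multiplier(a, b, period)
x_after_5 = state_after_periods(a, b, period, x0=1.0, n_periods=5)
is_stable = abs(multiplier) < 1.0
```

In this scalar case the state after n periods is the multiplier raised to the n-th power times the initial state, so the origin is asymptotically stable exactly when the multiplier has modulus below one. The example also shows how an impulse can stabilize an otherwise unstable flow.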