533 research outputs found
Model and Reinforcement Learning for Markov Games with Risk Preferences
We motivate and propose a new model for non-cooperative Markov games which
considers the interactions of risk-aware players. This model characterizes the
time-consistent dynamic "risk" from both stochastic state transitions (inherent
to the game) and randomized mixed strategies (due to all other players). An
appropriate risk-aware equilibrium concept is proposed and the existence of
such equilibria is demonstrated in stationary strategies by an application of
Kakutani's fixed point theorem. We further propose a simulation-based
Q-learning type algorithm for risk-aware equilibrium computation. This
algorithm works with a special form of minimax risk measures which can
naturally be written as saddle-point stochastic optimization problems, and
covers many widely investigated risk measures. Finally, the almost sure
convergence of this simulation-based algorithm to an equilibrium is
demonstrated under some mild conditions. Our numerical experiments on a
two-player queuing game validate the properties of our model and algorithm, and
demonstrate their worth and applicability in real-life competitive
decision-making. Comment: 38 pages, 6 tables, 5 figures
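The abstract above refers to risk measures that can be written as stochastic optimization problems. As a minimal sketch, assuming conditional value-at-risk (CVaR) as the example (the abstract does not name a specific measure), the code below evaluates the well-known Rockafellar-Uryasev variational form on an empirical sample and compares it with a direct tail average. The function names and the sample data are hypothetical and are not taken from the paper.

```python
def cvar_variational(costs, alpha):
    """Empirical CVaR via min over eta of eta + E[(X - eta)^+] / (1 - alpha).

    The objective is convex and piecewise linear in eta, so on a finite
    sample its minimum is attained at one of the sample points.
    """
    n = len(costs)

    def objective(eta):
        excess = sum(max(c - eta, 0.0) for c in costs) / n
        return eta + excess / (1.0 - alpha)

    return min(objective(eta) for eta in costs)


def cvar_tail_average(costs, alpha):
    """Average of the worst (1 - alpha) fraction of costs.

    Assumes n * (1 - alpha) is a positive integer (illustrative case only).
    """
    n = len(costs)
    k = round(n * (1.0 - alpha))
    worst = sorted(costs)[n - k:]
    return sum(worst) / k


sample_costs = [float(c) for c in range(1, 11)]  # hypothetical cost samples
print(cvar_variational(sample_costs, 0.8), cvar_tail_average(sample_costs, 0.8))
```

Both functions should agree on this sample; the variational form is the one that lends itself to simulation-based updates, since it only needs expectations of sampled costs.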
Operational Decision Making under Uncertainty: Inferential, Sequential, and Adversarial Approaches
Modern security threats are characterized by a stochastic, dynamic, partially observable, and ambiguous operational environment. This dissertation addresses such complex security threats using operations research techniques for decision making under uncertainty in operations planning, analysis, and assessment. First, this research develops a new method for robust queue inference with partially observable, stochastic arrival and departure times, motivated by cybersecurity and terrorism applications. In the dynamic setting, this work develops a new variant of Markov decision processes and an algorithm for robust information collection in dynamic, partially observable and ambiguous environments, with an application to a cybersecurity detection problem. In the adversarial setting, this work presents a new application of counterfactual regret minimization and robust optimization to a multi-domain cyber and air defense problem in a partially observable environment.
Cost-aware Defense for Parallel Server Systems against Reliability and Security Failures
Parallel server systems in transportation, manufacturing, and computing
heavily rely on dynamic routing using connected cyber components for
computation and communication. Yet, these components remain vulnerable to
random malfunctions and malicious attacks, motivating the need for
fault-tolerant dynamic routing that is both traffic-stabilizing and
cost-efficient. In this paper, we consider a parallel server system with
dynamic routing subject to reliability and security failures. For the
reliability setting, we consider an infinite-horizon Markov decision process
where the system operator strategically activates a protection mechanism upon
each job arrival based on traffic state observations. We prove an optimal
deterministic threshold protecting policy exists based on dynamic programming
recursion of the HJB equation. For the security setting, we extend the model to
an infinite-horizon stochastic game where the attacker strategically
manipulates routing assignment. We show that both players follow a threshold
strategy at every Markov perfect equilibrium. For both failure settings, we
also analyze the stability of the traffic queues under control. Finally, we
develop approximate dynamic programming algorithms to compute the
optimal/equilibrium policies, supplemented with numerical examples and
experiments for validation and illustration. Comment: Major Revision in Automatic
Entropy-Regularized Stochastic Games
In zero-sum stochastic games, where two competing players make decisions under uncertainty, a pair of optimal strategies is traditionally described by Nash equilibrium and computed under the assumption that the players have perfect information about the stochastic transition model of the environment. However, implementing such strategies may make the players vulnerable to unforeseen changes in the environment. In this paper, we introduce entropy-regularized stochastic games where each player aims to maximize the causal entropy of its strategy in addition to its expected payoff. The regularization term balances each player's rationality with its belief about the level of misinformation about the transition model. We consider both entropy-regularized N-stage and entropy-regularized discounted stochastic games, and establish the existence of a value in both games. Moreover, we prove the sufficiency of Markovian and stationary mixed strategies to attain the value, respectively, in N-stage and discounted games. Finally, we present algorithms, which are based on convex optimization problems, to compute the optimal strategies. In a numerical example, we demonstrate the proposed method on a motion planning scenario and illustrate the effect of the regularization term on the expected payoff.
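To illustrate how an entropy term trades off payoff against randomization, here is a minimal sketch for a single decision stage, assuming plain Shannon entropy and a fixed opponent strategy rather than the causal entropy and full stochastic game of the paper. The weight `beta`, the function names, and the payoff vector are hypothetical. For one player, maximizing expected payoff plus entropy divided by `beta` has a closed form: the maximizer is a softmax of the payoffs and the optimal value is a scaled log-sum-exp.

```python
import math


def soft_best_response(payoffs, beta):
    """Maximizer and value of  sum_i x_i * q_i + H(x) / beta  over the simplex.

    Returns the softmax policy and the log-sum-exp value; the maximum payoff
    is subtracted before exponentiating for numerical stability.
    """
    m = max(payoffs)
    weights = [math.exp(beta * (q - m)) for q in payoffs]
    total = sum(weights)
    policy = [w / total for w in weights]
    soft_value = m + math.log(total) / beta
    return policy, soft_value


def regularized_objective(policy, payoffs, beta):
    """Expected payoff plus Shannon entropy scaled by 1 / beta."""
    expected = sum(p * q for p, q in zip(policy, payoffs))
    entropy = -sum(p * math.log(p) for p in policy if p > 0.0)
    return expected + entropy / beta


payoffs = [1.0, 0.0, -1.0]  # hypothetical expected payoffs of three actions
for beta in (0.5, 2.0, 50.0):
    policy, value = soft_best_response(payoffs, beta)
    print(beta, [round(p, 3) for p in policy], round(value, 4))
```

Small `beta` pushes the policy toward uniform randomization, while large `beta` recovers the unregularized best response, which mirrors the balance between rationality and robustness described in the abstract.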
Entropy-Regularized Stochastic Games
In two-player zero-sum stochastic games, where two competing players make
decisions under uncertainty, a pair of optimal strategies is traditionally
described by Nash equilibrium and computed under the assumption that the
players have perfect information about the stochastic transition model of the
environment. However, implementing such strategies may make the players
vulnerable to unforeseen changes in the environment. In this paper, we
introduce entropy-regularized stochastic games where each player aims to
maximize the causal entropy of its strategy in addition to its expected payoff.
The regularization term balances each player's rationality with its belief
about the level of misinformation about the transition model. We consider both
entropy-regularized N-stage and entropy-regularized discounted stochastic
games, and establish the existence of a value in both games. Moreover, we prove
the sufficiency of Markovian and stationary mixed strategies to attain the
value, respectively, in N-stage and discounted games. Finally, we present
algorithms, which are based on convex optimization problems, to compute the
optimal strategies. In a numerical example, we demonstrate the proposed method
on a motion planning scenario and illustrate the effect of the regularization
term on the expected payoff. Comment: Corrected typo
Learning and Management for Internet-of-Things: Accounting for Adaptivity and Scalability
Internet-of-Things (IoT) envisions an intelligent infrastructure of networked
smart devices offering task-specific monitoring and control services. The
unique features of IoT include extreme heterogeneity, a massive number of
devices, and unpredictable dynamics partially due to human interaction. These
call for foundational innovations in network design and management. Ideally, such
designs should allow efficient adaptation to changing environments, and low-cost
implementation scalable to a massive number of devices, subject to stringent
latency constraints. To this end, the overarching goal of this paper is to
outline a unified framework for online learning and management policies in IoT
through joint advances in communication, networking, learning, and
optimization. From the network architecture vantage point, the unified
framework leverages a promising fog architecture that enables smart devices to
have proximity access to cloud functionalities at the network edge, along the
cloud-to-things continuum. From the algorithmic perspective, key innovations
target online approaches adaptive to different degrees of nonstationarity in
IoT dynamics, and their scalable model-free implementation under limited
feedback that motivates blind or bandit approaches. The proposed framework
aspires to offer a stepping stone that leads to systematic designs and analysis
of task-specific learning and management schemes for IoT, along with a host of
new research directions to build on. Comment: Submitted on June 15 to Proceedings of the IEEE Special Issue on Adaptive
and Scalable Communication Networks
Optimal Control of Parallel Queues for Managing Volunteer Convergence
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/163497/2/poms13224.pdf
http://deepblue.lib.umich.edu/bitstream/2027.42/163497/1/poms13224_am.pd