795 research outputs found

    Stochastic scheduling games with Markov Decision Arrival Processes

    In Hordijk and Koole [1,2], a new type of arrival process, the Markov Decision Arrival Process (MDAP), was introduced, which can be used to model certain dependencies between arrival streams and the system at which the arrivals occur. This arrival process was used to solve control problems with several controllers sharing a common objective, where the output of one controlled node is fed into a second one, as in tandems of multi-server queues. When the objectives of the controllers differ, one may choose a min-max (worst-case) approach, in which a controller typically tries to obtain the best performance under the worst possible (unknown) strategies of the other controllers. We use the MDAP to model such situations, as well as situations of control in an unknown environment. We apply this approach to several scheduling problems, including the scheduling of customers and the scheduling of servers, and we consider different information patterns, including delayed information. For all these models, we obtain several structural results on the optimal policies.
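The worst-case control setting described above can be illustrated with a toy min-max value iteration for a single queue facing an adversarial arrival stream; the capacity, holding cost, and dynamics below are illustrative assumptions, not the paper's MDAP model.

```python
# Toy min-max value iteration: the controller (minimizer) chooses whether
# to serve, the adversary (maximizer) chooses whether a job arrives.
# All numbers here are illustrative assumptions, not the MDAP of the paper.
N = 10          # queue capacity (assumed)
GAMMA = 0.95    # discount factor (assumed)
SERVE = [0, 1]  # controller: idle or serve one job
ARRIVE = [0, 1] # adversary: withhold or inject one arrival

def step(x, a, b):
    """Next queue length after serving a jobs and receiving b arrivals."""
    return min(max(x - a, 0) + b, N)

# Bellman recursion V(x) = min_a max_b [x + gamma * V(x')], holding cost x.
V = [0.0] * (N + 1)
for _ in range(500):
    V = [min(max(x + GAMMA * V[step(x, a, b)] for b in ARRIVE)
             for a in SERVE)
         for x in range(N + 1)]

# Worst-case optimal policy: serving dominates idling on interior states.
policy = [min(SERVE, key=lambda a: max(x + GAMMA * V[step(x, a, b)]
                                       for b in ARRIVE))
          for x in range(N + 1)]
print(policy)
```

The structural result the sketch exhibits is monotonicity: the worst-case optimal action is to serve whenever the queue is non-empty (at the capacity cap the two actions tie, so the tie-break picks idle).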

    Model and Reinforcement Learning for Markov Games with Risk Preferences

    We motivate and propose a new model for non-cooperative Markov games that considers the interactions of risk-aware players. This model characterizes the time-consistent dynamic "risk" arising both from stochastic state transitions (inherent to the game) and from randomized mixed strategies (due to all other players). An appropriate risk-aware equilibrium concept is proposed, and the existence of such equilibria in stationary strategies is demonstrated by an application of Kakutani's fixed-point theorem. We further propose a simulation-based Q-learning-type algorithm for risk-aware equilibrium computation. This algorithm works with a special form of minimax risk measures that can naturally be written as saddle-point stochastic optimization problems, and it covers many widely investigated risk measures. Finally, the almost sure convergence of this simulation-based algorithm to an equilibrium is demonstrated under mild conditions. Our numerical experiments on a two-player queueing game validate the properties of our model and algorithm, and demonstrate their worth and applicability in real-life competitive decision-making.
    Comment: 38 pages, 6 tables, 5 figures
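The simulation-based Q-learning idea can be sketched in its risk-neutral, zero-sum form (minimax-Q); the risk measures of the paper are omitted here, and the two-state game, dynamics, and constants are assumptions for illustration only.

```python
import random

# Minimax-Q-style learning sketch for a two-player zero-sum Markov game.
# This is a risk-neutral simplification of the paper's risk-aware algorithm;
# the toy 2-state game and all numbers below are illustrative assumptions.
STATES, ACTS = [0, 1], [0, 1]
random.seed(0)

def sample(s, a, b):
    """Hypothetical dynamics: cost paid by the minimizer, and next state."""
    r = (a - b) * (1 if s == 0 else -1) + random.gauss(0, 0.1)
    return r, (s + a + b) % 2

Q = {(s, a, b): 0.0 for s in STATES for a in ACTS for b in ACTS}

def value(s):
    # Pure-strategy minimax value; a mixed-strategy value would need an LP.
    return min(max(Q[s, a, b] for b in ACTS) for a in ACTS)

alpha, gamma = 0.1, 0.9
s = 0
for t in range(20000):
    a, b = random.choice(ACTS), random.choice(ACTS)  # exploratory play
    r, s2 = sample(s, a, b)
    Q[s, a, b] += alpha * (r + gamma * value(s2) - Q[s, a, b])
    s = s2
print(value(0), value(1))
```

The update is the standard temporal-difference step, except that the bootstrap target uses the minimax value of the next state rather than a single-agent maximum; the paper replaces this value with a saddle-point risk measure.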

    Markov Games: Receding Horizon Approach

    We consider a receding horizon approach as an approximate solution to two-person zero-sum Markov games with infinite-horizon discounted cost and average cost criteria. We first present error bounds from the optimal equilibrium value of the game when both players take correlated equilibrium receding horizon policies that are based on exact or approximate solutions of receding finite-horizon subgames. Motivated by the worst-case optimal control of queueing systems by Altman, we then analyze error bounds when the minimizer plays the (approximate) receding horizon control and the maximizer plays the worst-case policy. We give three heuristic examples of the approximate receding horizon control. We extend "rollout" by Bertsekas and Castanon and "parallel rollout" and "hindsight optimization" by Chang et al. into the Markov game setting within the framework of the approximate receding horizon approach and analyze their performance. In the rollout/parallel rollout approaches, the minimizing player seeks to improve the performance of a single heuristic policy it rolls out, or to dynamically combine multiple heuristic policies in a set so as to improve the performance of all of the heuristic policies simultaneously, under the guess that the maximizing player has chosen a fixed worst-case policy. Given ε > 0, we give the value of the receding horizon which guarantees that the parallel rollout policy with that horizon, played by the minimizer, dominates any heuristic policy in the set by ε. In the hindsight optimization approach, the minimizing player makes a decision based on its expected optimal hindsight performance over a finite horizon. We finally discuss practical implementations of the receding horizon approaches via simulation.
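The receding horizon idea can be sketched on a toy zero-sum queueing game: at each stage the minimizer solves an H-step subgame by backward induction and commits only to the first action. The dynamics, horizon, and parameters below are assumptions, not those of the paper.

```python
# Receding-horizon sketch for a zero-sum Markov game: at every stage the
# minimizer solves an H-step subgame exactly and plays only its first move.
# Toy deterministic queue dynamics and all constants are assumptions.
N, H, GAMMA = 10, 4, 0.95
ACTS = [0, 1]   # minimizer: serve or not; maximizer: inject arrival or not

def step(x, a, b):
    return min(max(x - a, 0) + b, N)

def horizon_value(x, h):
    """Exact h-step minimax value by backward induction (holding cost x)."""
    if h == 0:
        return 0.0
    return min(max(x + GAMMA * horizon_value(step(x, a, b), h - 1)
                   for b in ACTS)
               for a in ACTS)

def receding_action(x):
    return min(ACTS, key=lambda a: max(
        x + GAMMA * horizon_value(step(x, a, b), H - 1) for b in ACTS))

# Simulate the receding horizon minimizer against the worst-case
# (always-inject) maximizer: serving every stage holds the queue steady.
x, cost = 5, 0.0
for t in range(50):
    a = receding_action(x)
    cost += GAMMA ** t * x
    x = step(x, a, 1)
print(x, round(cost, 2))
```

Against the always-inject maximizer, the receding horizon minimizer serves at every stage and the queue stays at its initial level, which is the worst-case-stabilizing behavior the error bounds in the abstract quantify.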

    Cost-aware Defense for Parallel Server Systems against Reliability and Security Failures

    Parallel server systems in transportation, manufacturing, and computing rely heavily on dynamic routing using connected cyber components for computation and communication. Yet these components remain vulnerable to random malfunctions and malicious attacks, motivating the need for fault-tolerant dynamic routing schemes that are both traffic-stabilizing and cost-efficient. In this paper, we consider a parallel server system with dynamic routing subject to reliability and security failures. For the reliability setting, we consider an infinite-horizon Markov decision process in which the system operator strategically activates a protection mechanism upon each job arrival, based on traffic state observations. We prove that an optimal deterministic threshold protection policy exists, based on the dynamic programming recursion of the HJB equation. For the security setting, we extend the model to an infinite-horizon stochastic game in which the attacker strategically manipulates the routing assignment. We show that both players follow a threshold strategy at every Markov perfect equilibrium. For both failure settings, we also analyze the stability of the traffic queues under control. Finally, we develop approximate dynamic programming algorithms to compute the optimal/equilibrium policies, supplemented with numerical examples and experiments for validation and illustration.
    Comment: Major Revision in Automatic
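The threshold structure of an optimal protection policy can be reproduced on a toy model by value iteration; the single-queue dynamics, failure probability, and protection cost below are illustrative assumptions, not the paper's calibration.

```python
# Value-iteration sketch of a threshold protection policy: a queue receives
# one arrival per period; paying a protection cost C guarantees service,
# while an unprotected period loses its service with probability P.
# All parameters are illustrative assumptions, not the paper's model.
N, GAMMA, P, C = 20, 0.95, 0.5, 1.0

def succ(x):
    """Queue length after a successful service plus the new arrival."""
    return max(x - 1, 0) + 1

V = [0.0] * (N + 1)
for _ in range(800):
    newV = []
    for x in range(N + 1):
        q_protect = x + C + GAMMA * V[min(succ(x), N)]
        q_skip = x + GAMMA * ((1 - P) * V[min(succ(x), N)]
                              + P * V[min(x + 1, N)])
        newV.append(min(q_protect, q_skip))
    V = newV

# Recover the policy: protect (1) exactly where it strictly beats skipping.
policy = []
for x in range(N + 1):
    q_protect = x + C + GAMMA * V[min(succ(x), N)]
    q_skip = x + GAMMA * ((1 - P) * V[min(succ(x), N)]
                          + P * V[min(x + 1, N)])
    policy.append(1 if q_protect < q_skip else 0)
print(policy)  # protection is used on the congested interior states only
```

The computed policy activates protection on a contiguous band of queue lengths: never at an empty queue (both actions lead to the same state) and never near the capacity cap (where an extra arrival is absorbed anyway), which is the deterministic threshold shape the paper proves.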

    Distributed Algorithms for Learning and Cognitive Medium Access with Logarithmic Regret

    The problem of distributed learning and channel access is considered in a cognitive network with multiple secondary users. The availability statistics of the channels are initially unknown to the secondary users and are estimated using sensing decisions. There is no explicit information exchange or prior agreement among the secondary users. We propose policies for distributed learning and access which achieve order-optimal cognitive system throughput (number of successful secondary transmissions) under self play, i.e., when implemented at all the secondary users. Equivalently, our policies minimize the regret in distributed learning and access. We first consider the scenario in which the number of secondary users is known to the policy, and prove that the total regret is logarithmic in the number of transmission slots. Our distributed learning and access policy achieves order-optimal regret, as shown by comparison with an asymptotic lower bound for regret under any uniformly good learning and access policy. We then consider the case in which the number of secondary users is fixed but unknown, and is estimated through feedback. We propose a policy for this scenario whose asymptotic sum regret grows slightly faster than logarithmically in the number of transmission slots.
    Comment: Submitted to IEEE JSAC on Advances in Cognitive Radio Networking and Communications, Dec. 2009, Revised May 201
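The logarithmic-regret flavor of this learning problem can be illustrated with a single-user UCB1 sketch for channel selection; the paper's policies extend such index rules to multiple non-coordinating users, and the channel availability probabilities below are assumptions.

```python
import math
import random

# UCB1 sketch for sensing-based channel selection by one secondary user.
# The per-channel idle probabilities are illustrative assumptions; the
# paper's multi-user, unknown-user-count policies build on this idea.
random.seed(1)
AVAIL = [0.9, 0.5, 0.3]          # assumed per-channel idle probabilities
counts = [0] * len(AVAIL)        # times each channel was sensed/used
means = [0.0] * len(AVAIL)       # empirical success rates

def pick(t):
    for i, n in enumerate(counts):
        if n == 0:
            return i             # sample every channel once first
    # Optimism in the face of uncertainty: mean + exploration bonus.
    return max(range(len(AVAIL)),
               key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))

for t in range(1, 5001):
    i = pick(t)
    reward = 1 if random.random() < AVAIL[i] else 0  # successful transmission
    counts[i] += 1
    means[i] += (reward - means[i]) / counts[i]

print(counts)  # the best channel accumulates the vast majority of plays
```

Because the exploration bonus shrinks like sqrt(log t / n), suboptimal channels are sampled only O(log t) times, which is exactly the logarithmic-regret scaling the abstract refers to.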

    Games for the Optimal Deployment of Security Forces

    In this thesis, we develop mathematical models for the optimal deployment of security forces, addressing two main challenges: the adaptive behavior of the adversary and uncertainty in the model. We address several security applications and model them as agent-intruder games. The agent represents the security forces, which can be the coast guard, airport control, or military assets, while the intruder represents the agent's adversary, such as illegal fishermen, terrorists, or enemy submarines. To determine the agent's optimal deployment strategy, we assume that we deal with an intelligent intruder, meaning that the intruder is able to deduce the strategy of the agent. To take this into account, for example by using randomized strategies, we use game-theoretic models, which were developed to model situations in which two or more players interact. Additionally, uncertainty may arise in several places: for example, there might be uncertainty in sensor observations, in the risk levels of certain areas, or in travel times. We address this uncertainty by combining game-theoretic models with stochastic modeling, such as queueing theory, Bayesian beliefs, and stochastic game theory. This thesis consists of three parts. In the first part, we introduce two game-theoretic models on networks of queues. First, we develop an interdiction game on a network of queues in which the intruder enters the network as a regular customer and aims to route to a target node. The agent is modeled as a negative customer that can inspect the queues and remove intruders. By modeling this as a queueing network, stochastic arrivals and travel times can be taken into account. The second model considers a non-cooperative game on a queueing network in which multiple players decide on a route that minimizes their sojourn time. We discuss the existence of pure Nash equilibria for games with continuous and discrete strategy spaces and describe how such equilibria can be found. 
The second part of this thesis considers dynamic games in which information that becomes available during the game plays a role. First, we consider partially observable agent-intruder games (POAIGs). In these games, neither the agent nor the intruder has full information about the state space; however, they partially observe it, for example by using sensors. We prove the existence of approximate Nash equilibria for POAIGs with an infinite time horizon and provide methods to find (approximate) solutions for POAIGs with both finite and infinite time horizons. Second, we consider anti-submarine warfare operations with time-dependent strategies, in which parts of the agent's strategy become available to the intruder during the game. The intruder represents an enemy submarine that aims to attack a high-value unit. The agent tries to prevent this by deploying both frigates and helicopters. In the last part of this thesis, we discuss games with restrictions on the agent's strategy. We consider a special case of security games dealing with the protection of large areas over a given planning period. An intruder decides which cell to attack, and an agent selects a patrol route visiting multiple cells from a finite set of patrol routes, such that given operational conditions on the agent's mobility are met. First, this problem is modeled as a two-player zero-sum game with probabilistic constraints, so that the operational conditions are met with high probability. Second, we develop a dynamic variant of this game using stochastic games, which ensures that the constructed strategies consider both past actions and expected future risk levels. In the last chapter, we consider Stackelberg security games with a large number of pure strategies. In order to construct operationalizable strategies, we limit the number of pure strategies that is allowed in the agent's optimal mixed strategy. 
We investigate the cost of these restrictions by introducing the price of usability, and we develop algorithmic approaches to calculate such strategies efficiently.