11 research outputs found

    Search for an Immobile Hider on a Stochastic Network

    Full text link
    Harry hides on an edge of a graph and does not move from there. Sally, starting from a known origin, tries to find him as soon as she can. Harry's goal is to be found as late as possible. At any given time, each edge of the graph is either active or inactive, independently of the other edges, with a known probability of being active. This situation can be modeled as a zero-sum two-person stochastic game. We show that the game has a value and we provide upper and lower bounds for this value. Finally, by generalizing optimal strategies of the deterministic case, we provide more refined results for trees and Eulerian graphs. Comment: 28 pages, 9 figures
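The effect of randomly active edges on travel time can be illustrated with a small sketch. It is not the model or method of the paper: it assumes discrete time steps, a searcher who crosses one edge per step when that edge is active and waits otherwise, and a single activation probability shared by all edges. The function names are hypothetical.

```python
import random


def expected_crossing_time(num_edges: int, p_active: float) -> float:
    """Expected number of steps to cross num_edges edges in sequence.

    Under the assumptions above, each crossing is a geometric waiting
    time with mean 1 / p_active, so the total mean is num_edges / p_active.
    """
    if not 0.0 < p_active <= 1.0:
        raise ValueError("p_active must be in (0, 1]")
    return num_edges / p_active


def simulate_crossing_time(num_edges: int, p_active: float, rng: random.Random) -> int:
    """Simulate one walk along a path, waiting whenever the next edge is inactive."""
    steps = 0
    for _ in range(num_edges):
        while True:
            steps += 1
            if rng.random() < p_active:  # edge is active at this step
                break
    return steps


def monte_carlo_mean(num_edges: int, p_active: float, trials: int = 20000, seed: int = 0) -> float:
    """Average simulated crossing time over many independent runs."""
    rng = random.Random(seed)
    total = sum(simulate_crossing_time(num_edges, p_active, rng) for _ in range(trials))
    return total / trials
```

Comparing `monte_carlo_mean(3, 0.5)` with `expected_crossing_time(3, 0.5)` gives a quick check that the simulation agrees with the closed form under these simplifying assumptions.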

    Sur les jeux dynamiques : jeux stochastiques, recherche-dissimulation et transmission d'information

    No full text
    In this thesis, we study various models of dynamic games. These model decision-making processes carried out by rational agents in strategic interaction, whose situation changes over time. The first chapter is devoted to stochastic games. In these, the current game depends on a state of nature, which evolves randomly from one stage to the next depending on the current state as well as the actions of the players, who observe these elements. We study communication properties between states when the state space is a product X × Y and each player controls the dynamics on their own component of the state space. The existence of optimal strategies in any sufficiently long repeated game, i.e., the existence of the uniform value, is proved under the assumption of strong communication on one side. By contrast, we prove the non-convergence of the value of the discounted game, which implies the non-existence of the asymptotic value, under the assumption of weak communication on both sides. The next two chapters are devoted to models of search games. A searcher and a hider act on a search space. The searcher's objective is typically to find the hider as quickly as possible, or to maximize the probability of finding him in a given time. The challenge is then to calculate the value and the optimal strategies of the players as a function of the geometry of the search space. In a patrolling game, an attacker chooses a time and place to attack, while a patroller walks continuously. When the attack occurs, the patroller has a fixed amount of time to locate the attacker. In a stochastic search game, players act on a graph. The novelty of the model is that, due to various events, some edges may be unavailable at each stage, so the graph evolves randomly over time. Finally, the last chapter is devoted to a model of repeated games with incomplete information called dynamic control of information. An advisor has private knowledge of the state of nature, which changes randomly over time. Every day, the advisor chooses the amount of information he discloses to an investor through messages. In turn, the investor chooses whether or not to invest in order to maximize her expected daily payoff. In the event of an investment, the advisor receives a fixed commission from the investor. His objective is then to maximize the discounted frequency of days on which investment takes place. We are interested in a specific information disclosure strategy of the advisor called the greedy strategy. It is a stationary strategy with the property of minimizing the amount of information disclosed under the constraint of maximizing the advisor's current payoff.

    Continuous patrolling and hiding games

    No full text
    We present two zero-sum games modeling situations where one player attacks (or hides in) a finite-dimensional nonempty compact set, and the other tries to prevent the attack (or find him). The first game, called the patrolling game, corresponds to a dynamic formulation of this situation in the sense that the attacker chooses a time and a point to attack and the patroller chooses a continuous trajectory to maximize the probability of finding the attack point in a given time. The second game, called the hiding game, corresponds to a static formulation in which the searcher and the hider simultaneously choose a point and the searcher maximizes the probability of being within a given threshold distance of the hider. Comment: 20 pages, 6 figures

    On Dynamic Games : Stochastic Games, Search Games and Information Provision

    No full text

    Making the most of your day: online learning for optimal allocation of time

    Full text link
    We study online learning for optimal allocation when the resource to be allocated is time. Examples of possible applications include job scheduling for a computing server, a driver filling a day with rides, a landlord renting an estate, etc. An agent receives task proposals sequentially according to a Poisson process and can either accept or reject a proposed task. If she accepts the proposal, she is busy for the duration of the task and obtains a reward that depends on the task duration. If she rejects it, she remains on hold until a new task proposal arrives. We study the regret incurred by the agent, first when she knows her reward function but does not know the distribution of the task duration, and then when she does not know her reward function either. This natural setting bears similarities with contextual (one-armed) bandits, but with the crucial difference that the normalized reward associated with a context depends on the whole distribution of contexts. Comment: NeurIPS 2021 camera ready
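The accept-or-reject setting described in this abstract can be sketched as a short simulation. This is only an illustration under stated assumptions, not the algorithm of the paper: the agent is assumed to use a fixed rule that accepts a task when its reward is at least `threshold` times its duration, a task accepted before the horizon is paid in full even if it ends afterwards, and proposals that arrive while the agent is busy are lost. The function and parameter names are hypothetical.

```python
import random
from typing import Callable


def simulate_day(
    horizon: float,
    arrival_rate: float,
    sample_duration: Callable[[random.Random], float],
    reward: Callable[[float], float],
    threshold: float,
    seed: int = 0,
) -> float:
    """Total reward collected up to `horizon` under a fixed threshold rule."""
    rng = random.Random(seed)
    now = 0.0
    total = 0.0
    while True:
        # Poisson arrivals: the wait for the next proposal is exponential.
        now += rng.expovariate(arrival_rate)
        if now >= horizon:
            break
        duration = sample_duration(rng)
        # Assumed rule: accept when reward per unit of time meets the threshold.
        if reward(duration) >= threshold * duration:
            total += reward(duration)
            now += duration  # the agent is busy until the task ends
    return total
```

For instance, `simulate_day(10.0, 1.0, lambda rng: rng.uniform(0.5, 2.0), lambda d: d ** 0.5, threshold=0.8)` runs one simulated day with a concave reward; varying `threshold` shows the trade-off between accepting long tasks and waiting for better ones.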

    Effect of Tocilizumab vs Usual Care in Adults Hospitalized With COVID-19 and Moderate or Severe Pneumonia

    No full text
    Importance: Severe pneumonia with hyperinflammation and elevated interleukin-6 is a common presentation of coronavirus disease 2019 (COVID-19).
    Objective: To determine whether tocilizumab (TCZ) improves outcomes of patients hospitalized with moderate-to-severe COVID-19 pneumonia.
    Design, Setting, and Participants: This cohort-embedded, investigator-initiated, multicenter, open-label, Bayesian randomized clinical trial investigating patients with COVID-19 and moderate or severe pneumonia requiring at least 3 L/min of oxygen but without ventilation or admission to the intensive care unit was conducted between March 31, 2020, and April 18, 2020, with follow-up through 28 days. Patients were recruited from 9 university hospitals in France. Analyses were performed on an intention-to-treat basis with no correction for multiplicity for secondary outcomes.
    Interventions: Patients were randomly assigned to receive TCZ, 8 mg/kg, intravenously plus usual care on day 1 and on day 3 if clinically indicated (TCZ group) or to receive usual care alone (UC group). Usual care included antibiotic agents, antiviral agents, corticosteroids, vasopressor support, and anticoagulants.
    Main Outcomes and Measures: Primary outcomes were scores higher than 5 on the World Health Organization 10-point Clinical Progression Scale (WHO-CPS) on day 4 and survival without need of ventilation (including noninvasive ventilation) at day 14. Secondary outcomes were clinical status assessed with the WHO-CPS scores at day 7 and day 14, overall survival, time to discharge, time to oxygen supply independency, biological factors such as C-reactive protein level, and adverse events.
    Results: Of 131 patients, 64 patients were randomly assigned to the TCZ group and 67 to the UC group; 1 patient in the TCZ group withdrew consent and was not included in the analysis. Of the 130 patients, 42 were women (32%), and median (interquartile range) age was 64 (57.1-74.3) years. In the TCZ group, 12 patients had a WHO-CPS score greater than 5 at day 4 vs 19 in the UC group (median posterior absolute risk difference [ARD] −9.0%; 90% credible interval [CrI], −21.0 to 3.1), with a posterior probability of negative ARD of 89.0%, not achieving the 95% predefined efficacy threshold. At day 14, 12% (95% CI −28% to 4%) fewer patients needed noninvasive ventilation (NIV) or mechanical ventilation (MV) or died in the TCZ group than in the UC group (24% vs 36%, median posterior hazard ratio [HR] 0.58; 90% CrI, 0.33-1.00), with a posterior probability of HR less than 1 of 95.0%, achieving the predefined efficacy threshold. The HR for MV or death was 0.58 (90% CrI, 0.30 to 1.09). At day 28, 7 patients had died in the TCZ group and 8 in the UC group (adjusted HR, 0.92; 95% CI 0.33-2.53). Serious adverse events occurred in 20 (32%) patients in the TCZ group and 29 (43%) in the UC group (P = .21).
    Conclusions and Relevance: In this randomized clinical trial of patients with COVID-19 and pneumonia requiring oxygen support but not admitted to the intensive care unit, TCZ did not reduce WHO-CPS scores lower than 5 at day 4 but might have reduced the risk of NIV, MV, or death by day 14. No difference in day 28 mortality was found. Further studies are necessary to confirm these preliminary results.
    Trial Registration: ClinicalTrials.gov Identifier: NCT0433180

    Effect of anakinra versus usual care in adults in hospital with COVID-19 and mild-to-moderate pneumonia (CORIMUNO-ANA-1): a randomised controlled trial

    No full text

    Sarilumab in adults hospitalised with moderate-to-severe COVID-19 pneumonia (CORIMUNO-SARI-1): An open-label randomised controlled trial

    No full text