11 research outputs found
Search for an Immobile Hider on a Stochastic Network
Harry hides on an edge of a graph and does not move from there. Sally,
starting from a known origin, tries to find him as soon as she can. Harry's
goal is to be found as late as possible. At any given time, each edge of the
graph is either active or inactive, independently of the other edges, with a
known probability of being active. This situation can be modeled as a zero-sum
two-person stochastic game. We show that the game has a value and we provide
upper and lower bounds for this value. Finally, by generalizing optimal
strategies of the deterministic case, we provide more refined results for trees
and Eulerian graphs.
Comment: 28 pages, 9 figures
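To make the random availability of edges concrete, here is a minimal Python sketch. It assumes a simplified discrete-time model that is not taken from the paper: Sally follows a fixed path, at each step the next edge is active with probability p independently of the past, and crossing an active edge takes one step. Under these assumptions the waiting time per edge is geometric, so crossing k edges takes k/p steps on average. The function names are hypothetical.

```python
import random


def traversal_time(num_edges, p, rng):
    """Steps needed to cross num_edges edges in sequence (assumed model).

    At every step the next edge is active with probability p, independently
    of previous steps; an active edge is crossed in that same step.
    """
    steps = 0
    for _ in range(num_edges):
        while True:
            steps += 1
            if rng.random() < p:  # edge is active this step: cross it
                break
    return steps


def mean_traversal_time(num_edges, p, trials=20000, seed=0):
    """Monte Carlo estimate of the expected traversal time."""
    rng = random.Random(seed)
    total = sum(traversal_time(num_edges, p, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    # Under the stated assumptions the estimate should be close to 4 / 0.5.
    print(mean_traversal_time(4, 0.5))
```

The sketch only illustrates how inactive edges slow the searcher down; it does not compute the value of the game or any optimal strategy.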
Sur les jeux dynamiques : jeux stochastiques, recherche-dissimulation et transmission d'information
In this thesis, we study various models of dynamic games. These games model decision processes of rational agents in strategic interaction whose situation changes over time. The first chapter is devoted to stochastic games. In these, the current game depends on a state of nature, which evolves randomly from one stage to the next depending on the current state and on the actions of the players, who observe these elements. We study communication properties between states when the state space is a product X × Y and each player controls the dynamics on his own component of the state space. We prove the existence of optimal strategies in any sufficiently long repeated game, i.e., the existence of the uniform value, under the assumption of strong communication on one side. In contrast, we prove the non-convergence of the value of the discounted game, which implies the non-existence of the asymptotic value, under the assumption of weak communication on both sides. The next two chapters are devoted to models of search games. A searcher and a hider act on a search space. The searcher's objective is typically to find the hider as quickly as possible, or to maximize the probability of finding him within a given time. The challenge is then to compute the value and the optimal strategies of the players as a function of the geometry of the search space. In a patrolling game, an attacker chooses a time and a place to attack, while a patroller walks continuously. When the attack occurs, the patroller has a fixed amount of time to locate the attacker. In a stochastic search game, the players act on a graph. The novelty of the model is that, due to various events, some edges may be unavailable at each stage, so the graph evolves randomly over time. Finally, the last chapter is devoted to a model of repeated games with incomplete information called dynamic control of information. An advisor has private knowledge of the state of nature, which changes randomly over time. Every day, the advisor chooses the amount of information he discloses to an investor through messages. In turn, the investor chooses whether or not to invest in order to maximize her expected daily payoff. When an investment takes place, the advisor receives a fixed commission from the investor. His objective is then to maximize the discounted frequency of days on which investment takes place. We are interested in a specific information disclosure strategy of the advisor called the greedy strategy. It is a stationary strategy that minimizes the amount of information disclosed subject to maximizing the advisor's current payoff.
Continuous patrolling and hiding games
We present two zero-sum games modeling situations where one player attacks
(or hides in) a finite-dimensional nonempty compact set, and the other tries to
prevent the attack (or find him). The first game, called the patrolling game,
is a dynamic formulation of this situation: the attacker chooses a time and a
point to attack, and the patroller chooses a continuous trajectory to maximize
the probability of finding the attack point within a given time. The second
game, called the hiding game, is a static formulation in which the searcher and
the hider simultaneously choose a point, and the searcher maximizes the
probability of being within a given threshold distance of the hider.
Comment: 20 pages, 6 figures
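As an illustration of the hiding game's payoff, the following Python sketch estimates the probability that the searcher's point lies within distance r of the hider's point. It assumes, purely for illustration, that the search space is the unit interval [0, 1] and that both players choose their point uniformly at random; these strategies are not claimed to be optimal. In that special case the probability has the closed form 2r - r^2 for 0 <= r <= 1, which the simulation should approximate. The function name is hypothetical.

```python
import random


def detection_probability(r, trials=100000, seed=0):
    """Estimate P(|searcher - hider| < r) for uniform points on [0, 1]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        searcher = rng.random()  # assumed uniform strategy of the searcher
        hider = rng.random()     # assumed uniform strategy of the hider
        if abs(searcher - hider) < r:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    r = 0.5
    print(detection_probability(r), 2 * r - r * r)
```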
On Dynamic Games : Stochastic Games, Search Games and Information Provision
In this thesis, we study various models of dynamic games. These games model decision processes of rational agents in strategic interaction whose situation changes over time. The first chapter is devoted to stochastic games. In these, the current game depends on a state of nature, which evolves randomly from one stage to the next depending on the current state and on the actions of the players, who observe these elements. We study communication properties between states when the state space is a product X × Y and each player controls the dynamics on his own component of the state space. We prove the existence of optimal strategies in any sufficiently long repeated game, i.e., the existence of the uniform value, under the assumption of strong communication on one side. In contrast, we prove the non-convergence of the value of the discounted game, which implies the non-existence of the asymptotic value, under the assumption of weak communication on both sides. The next two chapters are devoted to models of search games. A searcher and a hider act on a search space. The searcher's objective is typically to find the hider as quickly as possible, or to maximize the probability of finding him within a given time. The challenge is then to compute the value and the optimal strategies of the players as a function of the geometry of the search space. In a patrolling game, an attacker chooses a time and a place to attack, while a patroller walks continuously. When the attack occurs, the patroller has a fixed amount of time to locate the attacker. In a stochastic search game, the players act on a graph. The novelty of the model is that, due to various events, some edges may be unavailable at each stage, so the graph evolves randomly over time. Finally, the last chapter is devoted to a model of repeated games with incomplete information called dynamic control of information. An advisor has private knowledge of the state of nature, which changes randomly over time. Every day, the advisor chooses the amount of information he discloses to an investor through messages. In turn, the investor chooses whether or not to invest in order to maximize her expected daily payoff. When an investment takes place, the advisor receives a fixed commission from the investor. His objective is then to maximize the discounted frequency of days on which investment takes place. We are interested in a specific information disclosure strategy of the advisor called the greedy strategy. It is a stationary strategy that minimizes the amount of information disclosed subject to maximizing the advisor's current payoff.
Making the most of your day: online learning for optimal allocation of time
We study online learning for optimal allocation when the resource to be
allocated is time. Examples of possible applications include job scheduling
for a computing server, a driver filling a day with rides, a landlord renting
an estate, etc. An agent receives task proposals sequentially according to a
Poisson process and can either accept or reject a proposed task. If she accepts
the proposal, she is busy for the duration of the task and obtains a reward
that depends on the task duration. If she rejects it, she remains on hold until
a new task proposal arrives. We study the regret incurred by the agent, first
when she knows her reward function but does not know the distribution of the
task duration, and then when she does not know her reward function, either.
This natural setting bears similarities with contextual (one-armed) bandits,
but with the crucial difference that the normalized reward associated to a
context depends on the whole distribution of contexts.
Comment: NeurIPS 2021 camera ready
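The accept-or-reject dynamics described above can be sketched in Python as follows. This is a minimal simulation under stated assumptions, not the learning algorithm of the paper: the agent uses a fixed threshold rule (accept a task of duration d when reward(d) >= threshold * d), which is chosen here only for illustration, and proposals arriving while she is busy are lost. The names `simulate`, `threshold`, `rate`, `reward` and `draw_duration` are hypothetical.

```python
import random


def simulate(threshold, horizon, rate, reward, draw_duration, seed=0):
    """Average reward per unit of time of a fixed threshold policy.

    Proposals arrive according to a Poisson process with intensity `rate`.
    A task of duration d is accepted when reward(d) >= threshold * d
    (an assumed decision rule, used here only for illustration).
    """
    rng = random.Random(seed)
    clock, total = 0.0, 0.0
    while True:
        # Idle period: exponential waiting time until the next proposal.
        clock += rng.expovariate(rate)
        if clock >= horizon:
            break
        duration = draw_duration(rng)
        if reward(duration) >= threshold * duration:
            total += reward(duration)
            # Busy period; a task started before the horizon is counted in full.
            clock += duration
    return total / horizon


if __name__ == "__main__":
    # Reward equal to duration, unit-length tasks, one proposal per time unit:
    # accepting everything keeps the agent busy about half of the time.
    print(simulate(0.0, 20000.0, 1.0, lambda d: d, lambda rng: 1.0))
```

Comparing several values of `threshold` in such a simulation shows why the best acceptance rule depends on the whole distribution of task durations: the right threshold is tied to the reward rate the agent can achieve overall, which the sketch does not attempt to learn.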
Effect of Tocilizumab vs Usual Care in Adults Hospitalized With COVID-19 and Moderate or Severe Pneumonia
Importance: Severe pneumonia with hyperinflammation and elevated interleukin-6 is a common presentation of coronavirus disease 2019 (COVID-19).
Objective: To determine whether tocilizumab (TCZ) improves outcomes of patients hospitalized with moderate-to-severe COVID-19 pneumonia.
Design, Setting, and Participants: This cohort-embedded, investigator-initiated, multicenter, open-label, Bayesian randomized clinical trial of patients with COVID-19 and moderate or severe pneumonia requiring at least 3 L/min of oxygen but without ventilation or admission to the intensive care unit was conducted from March 31, 2020, to April 18, 2020, with follow-up through 28 days. Patients were recruited from 9 university hospitals in France. Analyses were performed on an intention-to-treat basis with no correction for multiplicity for secondary outcomes.
Interventions: Patients were randomly assigned to receive TCZ, 8 mg/kg, intravenously plus usual care on day 1 and on day 3 if clinically indicated (TCZ group) or to receive usual care alone (UC group). Usual care included antibiotic agents, antiviral agents, corticosteroids, vasopressor support, and anticoagulants.
Main Outcomes and Measures: Primary outcomes were scores higher than 5 on the World Health Organization 10-point Clinical Progression Scale (WHO-CPS) on day 4 and survival without need of ventilation (including noninvasive ventilation) at day 14. Secondary outcomes were clinical status assessed with the WHO-CPS scores at day 7 and day 14, overall survival, time to discharge, time to oxygen supply independency, biological factors such as C-reactive protein level, and adverse events.
Results: Of 131 patients, 64 were randomly assigned to the TCZ group and 67 to the UC group; 1 patient in the TCZ group withdrew consent and was not included in the analysis. Of the 130 patients, 42 were women (32%), and median (interquartile range) age was 64 (57.1-74.3) years. In the TCZ group, 12 patients had a WHO-CPS score greater than 5 at day 4 vs 19 in the UC group (median posterior absolute risk difference [ARD] −9.0%; 90% credible interval [CrI], −21.0 to 3.1), with a posterior probability of negative ARD of 89.0%, not achieving the 95% predefined efficacy threshold. At day 14, 12% (95% CI −28% to 4%) fewer patients needed noninvasive ventilation (NIV) or mechanical ventilation (MV) or died in the TCZ group than in the UC group (24% vs 36%, median posterior hazard ratio [HR] 0.58; 90% CrI, 0.33-1.00), with a posterior probability of HR less than 1 of 95.0%, achieving the predefined efficacy threshold. The HR for MV or death was 0.58 (90% CrI, 0.30 to 1.09). At day 28, 7 patients had died in the TCZ group and 8 in the UC group (adjusted HR, 0.92; 95% CI 0.33-2.53). Serious adverse events occurred in 20 (32%) patients in the TCZ group and 29 (43%) in the UC group (P = .21).
Conclusions and Relevance: In this randomized clinical trial of patients with COVID-19 and pneumonia requiring oxygen support but not admitted to the intensive care unit, TCZ did not reduce WHO-CPS scores lower than 5 at day 4 but might have reduced the risk of NIV, MV, or death by day 14. No difference in day 28 mortality was found. Further studies are necessary to confirm these preliminary results.
Trial Registration: ClinicalTrials.gov Identifier: NCT0433180
Effect of anakinra versus usual care in adults in hospital with COVID-19 and mild-to-moderate pneumonia (CORIMUNO-ANA-1): a randomised controlled trial
Sarilumab in adults hospitalised with moderate-to-severe COVID-19 pneumonia (CORIMUNO-SARI-1): An open-label randomised controlled trial