
    Generalised Entropy MDPs and Minimax Regret

    Bayesian methods suffer from the problem of how to specify prior beliefs. One interesting idea is to consider worst-case priors. This requires solving a stochastic zero-sum game. In this paper, we extend well-known results from bandit theory in order to discover minimax-Bayes policies and discuss when they are practical. Comment: 7 pages, NIPS workshop "From bad models to good policies"
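
    In the finite case, the minimax-Bayes idea reduces to solving a matrix game: nature picks a prior, the learner picks a policy, and the payoff is the Bayesian regret. Below is a minimal sketch of that computation; the 3x2 regret matrix is invented for illustration and the linear program is the standard one for zero-sum matrix games, not code from the paper.

```python
# Minimax-Bayes as a finite zero-sum game: nature picks a prior (column),
# the learner picks a policy (row), payoff = Bayesian regret of that policy
# under that prior.  The regret numbers are made up for illustration.
import numpy as np
from scipy.optimize import linprog

regret = np.array([[0.10, 0.40],    # regret[i, j]: policy i vs. prior j
                   [0.35, 0.15],
                   [0.30, 0.30]])
n_policies, n_priors = regret.shape

# min_x max_j (regret^T x)_j over distributions x, written as the LP:
#   minimise v  s.t.  regret^T x <= v * 1,  sum(x) = 1,  x >= 0
c = np.r_[np.zeros(n_policies), 1.0]                      # objective: v
A_ub = np.hstack([regret.T, -np.ones((n_priors, 1))])     # regret^T x - v <= 0
A_eq = np.r_[np.ones(n_policies), 0.0].reshape(1, -1)     # sum(x) = 1
res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n_priors),
              A_eq=A_eq, b_eq=[1.0],
              bounds=[(0, None)] * n_policies + [(None, None)])
x, v = res.x[:-1], res.x[-1]
print("minimax-Bayes mixture over policies:", x.round(3))
print("worst-case Bayesian regret:", round(v, 3))
```

    Here the optimal mixture (0.4 on the first policy, 0.6 on the second) attains worst-case Bayesian regret 0.25, strictly better than any single policy; this is the sense in which a randomised minimax-Bayes policy can beat every deterministic choice.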

    Minimax regret and strategic uncertainty

    This paper introduces a new solution concept, a minimax regret equilibrium, which allows for the possibility that players are uncertain about the rationality and conjectures of their opponents. We provide several applications of our concept. In particular, we consider price-setting environments and show that the optimal pricing policy follows a non-degenerate distribution. The induced price dispersion is consistent with experimental and empirical observations (Baye and Morgan, 2004). Keywords: minimax regret; rationality; conjectures; price dispersion; auction
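
    To make the price-dispersion claim concrete, here is a toy computation (our own construction, not the paper's model): a seller posts a price against an unknown buyer valuation, and we tabulate ex-post regret for each price/valuation pair. Extreme prices have large worst-case regret, intermediate prices have less, and randomising over prices lowers it further, which is how minimax regret generates a non-degenerate price distribution.

```python
# Toy illustration of the minimax-regret criterion in a pricing problem.
# A seller posts a price p; the buyer's valuation v is unknown.  Profit is
# p if p <= v, else 0; ex-post regret is v minus realised profit (what the
# seller could have earned knowing v, minus what was earned).  All numbers
# are made up for illustration.
import numpy as np

prices = np.array([0.25, 0.50, 0.75, 1.00])        # candidate posted prices
valuations = np.array([0.25, 0.50, 0.75, 1.00])    # possible buyer valuations

profit = np.where(prices[:, None] <= valuations[None, :], prices[:, None], 0.0)
regret = valuations[None, :] - profit              # regret[i, j]: price i, valuation j

worst_case = regret.max(axis=1)                    # max regret of each pure price
print("worst-case regret per price:", worst_case)
print("pure minimax-regret price:", prices[worst_case.argmin()])
# Randomising over prices reduces worst-case regret below the best pure
# price, which is the sense in which minimax regret induces price dispersion.
```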


    On the Saddle-point Solution and the Large-Coalition Asymptotics of Fingerprinting Games

    We study a fingerprinting game in which the number of colluders and the collusion channel are unknown. The encoder embeds fingerprints into a host sequence and provides the decoder with the capability to trace back pirated copies to the colluders. Fingerprinting capacity has recently been derived as the limit value of a sequence of maximin games with mutual information as their payoff functions. However, these games generally do not admit saddle-point solutions and are very hard to solve numerically. Here, under the so-called Boneh-Shaw marking assumption, we reformulate the capacity as the value of a single two-person zero-sum game and show that it is achieved by a saddle-point solution. If the maximal coalition size is k and the fingerprinting alphabet is binary, we show that capacity decays quadratically with k: we prove rigorously that the asymptotic capacity is 1/(2 k^2 ln 2), and we confirm our earlier conjecture that Tardos' choice of the arcsine distribution asymptotically maximizes the mutual information payoff function while the interleaving attack minimizes it. Along with the asymptotic behavior, numerical solutions to the game for small k are also presented. Comment: submitted to IEEE Trans. on Information Forensics and Security
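
    The asymptotic claim is easy to probe numerically. Under the interleaving attack, the pirated bit is Y = X_J for a uniformly random colluder J, so conditioned on the bias p the single-segment mutual information payoff reduces to h(p) - E[h(Z/k)], where Z ~ Binomial(k, p) and h is the binary entropy in bits. The sketch below (our own quadrature choices, assuming this payoff formulation) averages the payoff over Tardos' arcsine density and divides by k; the result should approach 1/(2 k^2 ln 2) as k grows.

```python
# Numerical probe of the asymptotic fingerprinting capacity under the
# interleaving attack and Tardos' arcsine bias density.  Assumes the
# single-segment payoff I(X^k; Y | P=p) = h(p) - E[h(Z/k)], Z ~ Bin(k, p).
import numpy as np
from scipy.stats import binom
from scipy.integrate import quad

def h(q):
    """Binary entropy in bits, safe at the endpoints."""
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return -(q * np.log2(q) + (1 - q) * np.log2(1 - q))

def payoff(p, k):
    z = np.arange(k + 1)
    return h(p) - binom.pmf(z, k, p) @ h(z / k)

def rate(k):
    # Substituting p = sin^2(t) turns the arcsine density into the uniform
    # density 2/pi on [0, pi/2], removing the endpoint singularities.
    val, _ = quad(lambda t: (2 / np.pi) * payoff(np.sin(t) ** 2, k), 0, np.pi / 2)
    return val / k

for k in (2, 4, 8, 16, 32):
    print(f"k={k:2d}  rate={rate(k):.5f}  asymptote={1 / (2 * k**2 * np.log(2)):.5f}")
```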

    Towards Optimal Algorithms For Online Decision Making Under Practical Constraints

    Artificial Intelligence is increasingly being used in real-life applications such as driving with autonomous cars, deliveries with autonomous drones, customer support with chat-bots, and personal assistance with smart speakers. An Artificial Intelligence (AI) agent can be trained to become expert at a task through a system of rewards and punishments, an approach known as Reinforcement Learning (RL). However, since the AI will deal with human beings, it also has to follow some moral rules to accomplish any task. For example, the AI should be fair to the other agents and not destroy the environment. Moreover, the AI should not leak the private data of the users it processes. Those rules represent significant challenges in designing AI, which we tackle in this thesis through mathematically rigorous solutions.

    More precisely, we start by considering the basic RL problem modeled as a discrete Markov Decision Process. We propose three simple algorithms (UCRL-V, BUCRL and TSUCRL) using two different paradigms: frequentist (UCRL-V) and Bayesian (BUCRL and TSUCRL). Through a unified theoretical analysis, we show that all three algorithms are near-optimal. Experiments confirm the superiority of our methods compared to existing techniques.

    Afterwards, we address the issue of fairness in the stateless version of reinforcement learning, also known as the multi-armed bandit. To concentrate our effort on the key challenges, we focus on the two-agent multi-armed bandit. We propose a novel objective that has been shown to be connected to fairness and justice, derive an algorithm, UCRG, to solve it, and show theoretically that UCRG is near-optimal.

    Next, we tackle the issue of privacy by using the recently introduced notion of Differential Privacy. We design multi-armed bandit algorithms that preserve differential privacy. Theoretical analyses show that, for the same level of privacy, our newly developed algorithms achieve better performance than existing techniques.
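
    The abstract does not spell out the thesis's differentially private bandit algorithms, but the standard entry point is the Laplace mechanism: perturb each arm's empirical reward sum before computing an optimistic index. The sketch below is a generic illustration of that idea, not UCRL-V, BUCRL, TSUCRL, or UCRG; note that releasing fresh noisy statistics every round is naive from a privacy-accounting standpoint, and practical DP bandits use more careful mechanisms (e.g., tree-based aggregation) to keep the total privacy cost bounded.

```python
# Generic sketch of a differentially private UCB-style bandit: Laplace noise
# (scale 1/epsilon, since one bounded reward changes each sum by at most 1)
# is added to per-arm reward sums before forming the index.  Illustration
# only; this is not the thesis's algorithm, and a full analysis must account
# for the privacy cost of repeated releases.
import numpy as np

rng = np.random.default_rng(0)

def dp_ucb(arm_means, horizon, epsilon):
    n_arms = len(arm_means)
    pulls = np.zeros(n_arms)
    sums = np.zeros(n_arms)
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1                                   # pull each arm once
        else:
            noisy_mean = (sums + rng.laplace(0, 1 / epsilon, n_arms)) / pulls
            bonus = np.sqrt(2 * np.log(t) / pulls)        # usual UCB bonus
            arm = int(np.argmax(noisy_mean + bonus))
        reward = rng.binomial(1, arm_means[arm])          # rewards in [0, 1]
        pulls[arm] += 1
        sums[arm] += reward
    return pulls

# The best arm (mean 0.6) should dominate the pull counts:
print(dp_ucb(arm_means=[0.4, 0.6, 0.5], horizon=5000, epsilon=1.0))
```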