
    Probabilistic inverse reinforcement learning in unknown environments

    We consider the problem of learning by demonstration from agents acting in unknown stochastic Markov environments or games. Our aim is to estimate agent preferences in order to construct improved policies for the same task that the agents are trying to solve. To do so, we extend previous probabilistic approaches for inverse reinforcement learning in known MDPs to the case of unknown dynamics or opponents. We do this by deriving two simplified probabilistic models of the demonstrator's policy and utility. For tractability, we use maximum a posteriori estimation rather than full Bayesian inference. Under a flat prior, this results in a convex optimisation problem. We find that the resulting algorithms are highly competitive against a variety of other methods for inverse reinforcement learning that do have knowledge of the dynamics.
    Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI 2013).
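
    A minimal sketch of the step the abstract highlights: under a flat prior, maximum a posteriori estimation reduces to maximum likelihood, and with a softmax demonstrator model whose preference scores are linear in unknown reward weights, the negative log-likelihood is convex. The model choice, array shapes, and function names below are illustrative assumptions, not the paper's exact formulation.

        import numpy as np
        from scipy.optimize import minimize

        def neg_log_likelihood(w, features, actions):
            # features: (T, A, d) array of feature vectors, one per
            #           (demonstration step, available action) pair
            # actions:  (T,) indices of the actions the demonstrator took
            logits = features @ w                        # (T, A) preference scores
            log_z = np.logaddexp.reduce(logits, axis=1)  # per-step log partition
            chosen = logits[np.arange(len(actions)), actions]
            return -(chosen - log_z).sum()               # convex in w

        def map_reward_weights(features, actions):
            # Under a flat prior, the MAP estimate coincides with the
            # maximum-likelihood estimate, so a convex solver suffices.
            d = features.shape[-1]
            result = minimize(neg_log_likelihood, np.zeros(d),
                              args=(features, actions), method="L-BFGS-B")
            return result.x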

    Algorithms for Differentially Private Multi-Armed Bandits

    We present differentially private algorithms for the stochastic Multi-Armed Bandit (MAB) problem. This problem arises in applications such as adaptive clinical trials, experiment design, and user-targeted advertising, where private information is connected to individual rewards. Our major contribution is to show that there exist $(\epsilon, \delta)$-differentially private variants of Upper Confidence Bound algorithms which have optimal regret, $O(\epsilon^{-1} + \log T)$. This is a significant improvement over previous results, which only achieve poly-log regret $O(\epsilon^{-2} \log^{2} T)$, because of our use of a novel interval-based mechanism. We also substantially improve the bounds of a previous family of algorithms which use a continual release mechanism. Experiments clearly validate our theoretical bounds.
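
    As a rough illustration of how privacy enters a bandit algorithm, the sketch below perturbs each arm's reward sum with Laplace noise before computing a UCB-style index. This is a toy under stated assumptions (rewards in [0, 1], a single release of each sum), not the paper's interval-based or continual-release mechanism; a complete private algorithm must also account for how often each sum is published.

        import numpy as np

        rng = np.random.default_rng(0)

        def private_ucb_indices(sums, counts, t, epsilon):
            # With rewards in [0, 1], one reward changes an arm's sum by at
            # most 1, so Laplace(1/epsilon) noise is the standard calibration
            # for releasing that sum once.
            noisy_sums = sums + rng.laplace(scale=1.0 / epsilon, size=len(sums))
            pulls = np.maximum(counts, 1)                 # avoid division by zero
            means = noisy_sums / pulls
            bonus = np.sqrt(2.0 * np.log(t + 1) / pulls)  # usual UCB exploration term
            return means + bonus

        # Usage: each round, pull the arm with the largest noisy index.
        # arm = int(np.argmax(private_ucb_indices(sums, counts, t, epsilon=1.0)))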

    Towards Optimal Algorithms For Online Decision Making Under Practical Constraints

    Artificial intelligence is increasingly being used in real-life applications such as driving with autonomous cars, deliveries with autonomous drones, customer support with chat-bots, and personal assistance through smart speakers. An artificial intelligence (AI) agent can be trained to become expert at a task through a system of rewards and punishments, an approach known as Reinforcement Learning (RL). However, since the AI will deal with human beings, it also has to follow some moral rules to accomplish any task. For example, the AI should be fair to other agents and should not destroy the environment. Moreover, the AI should not leak the private data of the users it serves. These rules represent significant challenges in AI design, which we tackle in this thesis through mathematically rigorous solutions.
    More precisely, we start by considering the basic RL problem modeled as a discrete Markov Decision Process. We propose three simple algorithms (UCRL-V, BUCRL and TSUCRL) using two different paradigms: frequentist (UCRL-V) and Bayesian (BUCRL and TSUCRL). Through a unified theoretical analysis, we show that all three algorithms are near-optimal. Experiments confirm the superiority of our methods over existing techniques. Afterwards, we address the issue of fairness in the stateless version of reinforcement learning, also known as the multi-armed bandit. To concentrate our effort on the key challenges, we focus on the two-agent multi-armed bandit. We propose a novel objective that has been shown to be connected to fairness and justice, derive an algorithm, UCRG, to solve this novel objective, and show theoretically that it is near-optimal. Finally, we tackle the issue of privacy using the recently introduced notion of Differential Privacy, and design multi-armed bandit algorithms that preserve differential privacy. Theoretical analyses show that, for the same level of privacy, our newly developed algorithms achieve better performance than existing techniques.
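
    The Bayesian paradigm mentioned above can be illustrated on the stateless (bandit) case with Thompson sampling over Bernoulli arms. The sketch below conveys the general idea only; the thesis's BUCRL and TSUCRL operate on full Markov Decision Processes, and the arm means and function name here are hypothetical.

        import numpy as np

        def thompson_sampling(true_means, horizon, seed=0):
            # Beta(1, 1) priors over each Bernoulli arm's unknown mean.
            rng = np.random.default_rng(seed)
            k = len(true_means)
            alpha = np.ones(k)  # 1 + observed successes per arm
            beta = np.ones(k)   # 1 + observed failures per arm
            total_reward = 0
            for _ in range(horizon):
                theta = rng.beta(alpha, beta)  # one posterior sample per arm
                arm = int(np.argmax(theta))    # act greedily on the sample
                reward = int(rng.random() < true_means[arm])
                alpha[arm] += reward
                beta[arm] += 1 - reward
                total_reward += reward
            return total_reward

        # Example: thompson_sampling([0.2, 0.5, 0.8], horizon=10_000)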

    Technical characteristics and socio-economic importance of beekeeping in north-western Benin: the case of the commune of Cobly

    In Benin, honey production is a significant potential source of monetary income for the rural population. A survey was conducted in north-western Benin among 35 beekeepers to assess the technical characteristics and socio-economic importance of beekeeping. The beekeepers surveyed were between 20 and 79 years old. Most of those interviewed (74.29%) practised honey hunting before being trained in modern beekeeping. The known hive types are the Kenyan hive, used exclusively by 68.57% of beekeepers, and the traditional hive, used by only 8.57%. The number of colonised hives per beekeeper or group ranges from 3 to 46. Annual honey production averages 10.55 ± 3.56 litres per hive and 148.57 ± 77.01 litres per beekeeper or group. The selling price of honey is between 1,200 and 2,000 CFA francs per litre. Gross annual revenue per beekeeper or group ranges from 9,000 to 580,000 CFA francs. Honey is used in the treatment of 28 illnesses, of which burns and coughs are the most frequently cited.
    Keywords: honey, beekeeping techniques, monetary income, uses, Benin