6 research outputs found

    SOaN: an algorithm for the coordination of learning, non-communicating agents.

    No full text
    National audience. Reinforcement learning in multi-agent systems is a very active research area, as recent surveys attest [Busoniu et al., 2008, Sandholm, 2007, Bab & Brafman, 2008, Vlassis, 2007]. Lauer and Riedmiller notably showed that, under certain assumptions, simultaneously learning agents can coordinate their actions without any communication and without perceiving the actions of their peers [Lauer & Riedmiller, 2000]. This property is particularly interesting for finding cooperation strategies in large multi-agent systems.
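The coordination result cited above rests on an optimistic value update in the spirit of Lauer and Riedmiller's distributed Q-learning: each independent agent only ever raises its value estimate, so temporarily low returns caused by teammates' exploration are ignored. A minimal sketch, assuming deterministic rewards; the state and action names below are illustrative, not from the paper:

```python
# Optimistic independent Q-update (distributed Q-learning style):
# the estimate is only ever increased, never decreased, so noise from
# other agents' exploration cannot corrupt it in deterministic settings.
from collections import defaultdict

GAMMA = 0.9  # discount factor (illustrative choice)

def optimistic_update(q, state, action, reward, next_state, next_actions):
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + GAMMA * best_next
    q[(state, action)] = max(q[(state, action)], target)  # optimistic: keep the max
    return q

q = defaultdict(float)
# taking 'a' in s0 once yields reward 1.0 into terminal state s1
q = optimistic_update(q, "s0", "a", 1.0, "s1", [])
# a later, lower return (e.g. a teammate explored badly) is ignored
q = optimistic_update(q, "s0", "a", 0.2, "s1", [])
print(q[("s0", "a")])  # estimate stays at 1.0
```

The `max` in the update is the whole trick: it filters out the pessimistic samples that other agents' exploration induces.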

    The world of Independent learners is not Markovian.

    No full text
    International audience. In multi-agent systems, the presence of learning agents can cause the environment to be non-Markovian from an agent's perspective, thus violating the property that traditional single-agent learning methods rely upon. This paper formalizes some known intuition about concurrently learning agents by providing formal conditions that make the environment non-Markovian from an independent (non-communicative) learner's perspective. New concepts are introduced, such as divergent learning paths and the observability of the effects of others' actions. To illustrate the formal concepts, a case study is also presented. These findings are significant because they both help to understand failures and successes of existing learning algorithms and are suggestive for future work.
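The core intuition can be made concrete in a few lines: from agent A's point of view, the outcome of its own action depends on agent B's policy, and B's policy changes as B learns, so A observes different outcomes for the same state and action. A toy sketch, with an illustrative coordination reward not taken from the paper:

```python
# From A's perspective the environment is non-stationary: the same
# (state, action) pair yields different rewards as B's policy shifts.
def joint_reward(a_action, b_action):
    return 1.0 if a_action == b_action else 0.0  # coordinate to earn reward

# Early in training B plays 0; later, B's learning has moved it to 1.
early = joint_reward(0, 0)  # A's action 0 looks optimal: reward 1.0
late = joint_reward(0, 1)   # same action, same state, reward now 0.0
print(early, late)
```

A single-agent learner treating this as a stationary Markov decision process would average the two samples and misestimate the value of action 0.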

    A learning approach for nonlinear pricing problem

    Get PDF
    Quantity discounts are frequent both in everyday life and in business. Take, for example, product pricing, gas and electricity pricing, transportation and postage pricing, telecommunications, cable TV and Internet access pricing. These are all examples of nonlinear pricing, where the selling firm designs differentiated products and prices them according to the firm's marketing strategy. Nonlinear pricing is also a general model of incomplete information, with plenty of applications such as regulation, taxation and the design of labor contracts. This dissertation develops a new learning approach for the nonlinear pricing problem, where the selling firm has limited information about the buyers' preferences. The main contributions are i) to show how the firm can learn what kind of products should be put up for sale, and what information the firm needs to do this, ii) to introduce a new approach to modeling incomplete information using optimality conditions, iii) to analyze mathematically the general pricing problem with many buyer types and multiple quality dimensions, and iv) to examine the computational issues of solving the pricing problem. The learning method is based on selling the product repeatedly. The firm sets linear tariffs, from which the buyers select the product they wish to consume. This reveals the buyers' marginal valuations, which is exactly the information that is needed to evaluate the optimality conditions. By evaluating the different optimality conditions, the firm learns which buyers get the same product at the optimum and which buyers are excluded. Different learning paths are examined in terms of profit, learning time and the buyers' preferences.
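The revelation step described in the abstract has a simple mechanism behind it: under a linear tariff with unit price p, a buyer maximising u(q) = v(q) - p*q picks the quantity where v'(q) = p, so the observed purchase reveals one point of the buyer's marginal valuation curve. A hedged sketch of that step; the quadratic valuation v(q) = a*q - (b/2)*q^2 is an illustrative assumption, not the dissertation's model:

```python
# If the buyer's first-order condition is v'(q) = p, the posted price
# itself is the buyer's marginal valuation at the chosen quantity.
def buyer_choice(p, a=10.0, b=1.0):
    # v(q) = a*q - (b/2)*q^2  =>  v'(q) = a - b*q ; FOC gives q = (a - p)/b
    return max((a - p) / b, 0.0)

def revealed_marginal_valuation(p):
    q = buyer_choice(p)
    return p if q > 0 else None  # no purchase reveals nothing

print(buyer_choice(4.0))  # 6.0 units under a=10, b=1
print(revealed_marginal_valuation(4.0))  # 4.0: one point of v'(q) learned
```

Repeating this with different prices traces out the marginal valuation curve, which is exactly the information the optimality conditions require.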

    The Optimal Strategy against Hedge Algorithm in Repeated Games

    Full text link
    This paper aims to solve for the optimal strategy against a well-known adaptive algorithm, the Hedge algorithm, in a finitely repeated 2×2 zero-sum game. In the literature, related theoretical results are very rare. To this end, we analyze the evolution of the resulting dynamical game system and build the action recurrence relation based on the Bellman optimality equation. First, we define the state and the State Transition Triangle Graph (STTG); then, we prove that the game system behaves in a periodic-like way when the opponent adopts the myopic best response. Further, based on the myopic path and the recurrence relation between the optimal actions at time-adjacent states, we solve for the opponent's optimal strategy, which is proved to be periodic on the time interval truncated by a tiny segment and to have the same period as the myopic path. Results in this paper are rigorous and inspiring, and the method might help solve for the optimal strategy in general games and against general algorithms.

    Opponent Modelling in Multi-Agent Systems

    Get PDF
    Reinforcement Learning (RL) formalises a problem where an intelligent agent needs to learn and achieve certain goals by maximising a long-term return in an environment. Multi-agent reinforcement learning (MARL) extends traditional RL to multiple agents. Many RL algorithms lose their convergence guarantees in non-stationary environments due to adaptive opponents. Partial observability caused by agents' different private observations introduces high variance during training, which exacerbates data inefficiency. In MARL, training an agent to perform well against one set of opponents often leads to poor performance against another set. Non-stationarity, partial observability and unclear learning objectives are three critical problems in MARL that hinder agents' learning, and they all share a common cause: the lack of knowledge of the other agents. Therefore, in this thesis, we propose to solve these problems with opponent modelling methods. We tailor our solutions by combining opponent modelling with other techniques according to the characteristics of the problems we face. Specifically, we first propose ROMMEO, an algorithm inspired by Bayesian inference, as a solution to alleviate non-stationarity in cooperative games. Then we study the partial observability problem caused by agents' private observations and design an implicit communication training method named PBL. Lastly, we investigate solutions to the non-stationarity and unclear learning objective problems in zero-sum games, proposing a solution named EPSOM which aims to find safe exploitation strategies against non-stationary opponents. We verify our proposed methods through varied experiments and show they can achieve the desired performance. Limitations and future work are discussed in the last chapter of this thesis.
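As an illustrative baseline for the opponent-modelling theme above (not the ROMMEO, PBL or EPSOM methods from the thesis), the simplest opponent model is a frequency count over observed actions combined with a best response to the resulting belief, as in fictitious play:

```python
# Frequency-count opponent model with a best response to the belief.
# Game, payoffs and the Laplace prior are illustrative assumptions.
from collections import Counter

class OpponentModel:
    def __init__(self, actions):
        self.counts = Counter({a: 1 for a in actions})  # Laplace prior

    def observe(self, action):
        self.counts[action] += 1

    def predict(self):
        total = sum(self.counts.values())
        return {a: c / total for a, c in self.counts.items()}

def best_response(payoff, belief):
    # payoff[my_action][opp_action]; pick the action maximising expected utility
    return max(payoff, key=lambda m: sum(payoff[m][o] * p for o, p in belief.items()))

# Matching pennies (matcher's payoffs): after seeing mostly "H", match with "H".
model = OpponentModel(["H", "T"])
for _ in range(10):
    model.observe("H")
payoff = {"H": {"H": 1, "T": -1}, "T": {"H": -1, "T": 1}}
print(best_response(payoff, model.predict()))  # "H"
```

Such a stationary model is exactly what breaks down against adaptive opponents, which motivates the richer Bayesian and safe-exploitation models the thesis develops.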