
    Social Optimum Equilibrium Selection for Distributed Multi-Agent Optimization

    We study the open question of how players learn to play a social optimum pure-strategy Nash equilibrium (PSNE) through repeated interactions in general-sum coordination games. A social optimum of a game is a stable Pareto-optimal state that maximises the sum of all players' payoffs (social welfare), and it always exists. We consider finite repeated games where each player only has access to its own utility (or payoff) function but is able to exchange information with other players. We develop a novel regret matching (RM) based algorithm for computing an efficient PSNE solution that can approach a desired Pareto-optimal outcome yielding the highest social welfare among all the attainable equilibria in the long run. Our proposed learning procedure follows the regret minimization framework but extends it in three major ways: (1) agents use global, instead of local, utility for calculating regrets; (2) each agent maintains a small and diminishing exploration probability in order to explore various PSNEs; and (3) agents stay with the actions that achieve the best global utility thus far, regardless of regrets. We prove that these three extensions enable the algorithm to select the stable social optimum equilibrium instead of converging to an arbitrary or cyclic equilibrium as in the conventional RM approach. We demonstrate the effectiveness of our approach through a set of applications in multi-agent distributed control, including a large-scale resource allocation game and a hard combinatorial task assignment problem for which no efficient (polynomial) solution exists. Comment: Appears at the 5th Games, Agents, and Incentives Workshop (GAIW 2023), held as part of the Workshops at the AAMAS 2023 Conference
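    The three extensions can be sketched on a toy two-agent coordination game. Everything below (the payoff matrix, the 0.2 exploration floor, the update schedule) is an illustrative assumption, not the paper's construction:

    ```python
    import random

    # Toy 2-agent coordination game: both agents receive PAYOFF[a0][a1].
    # Joint action (1, 1) is the social optimum; (0, 0) is an inferior
    # pure Nash equilibrium that conventional RM could settle into.
    PAYOFF = [[2.0, 0.0],
              [0.0, 5.0]]

    def social_welfare(joint):
        a0, a1 = joint
        return 2 * PAYOFF[a0][a1]  # sum of the two (equal) payoffs

    def select_social_optimum(steps=3000, seed=1):
        rng = random.Random(seed)
        regrets = [[0.0, 0.0], [0.0, 0.0]]  # extension (1): global-welfare regrets
        best_joint = (0, 0)                 # extension (3): best joint action so far
        best_welfare = social_welfare(best_joint)
        for t in range(1, steps + 1):
            eps = max(0.2, 1.0 / t)         # extension (2): diminishing exploration
                                            # (the 0.2 floor is a toy choice)
            joint = tuple(
                rng.randrange(2) if rng.random() < eps else best_joint[i]
                for i in range(2)
            )
            w = social_welfare(joint)
            # Regret against each unilateral deviation, measured in *global*
            # welfare; in this small toy the stay-with-best rule dominates.
            for i in range(2):
                for a in range(2):
                    alt = list(joint)
                    alt[i] = a
                    regrets[i][a] += social_welfare(tuple(alt)) - w
            if w > best_welfare:            # lock in the best outcome found
                best_joint, best_welfare = joint, w
        return best_joint
    ```

    Once exploration stumbles onto (1, 1), the stay-with-best rule holds it there, which is the selection effect the abstract claims over conventional RM.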

    Sustainable Cooperative Coevolution with a Multi-Armed Bandit

    This paper proposes a self-adaptation mechanism to manage the resources allocated to the different species comprising a cooperative coevolutionary algorithm. The proposed approach relies on a dynamic extension of the well-known multi-armed bandit framework. At each iteration, the dynamic multi-armed bandit decides which species to evolve for a generation, using the history of progress made by the different species to guide its decisions. We show experimentally, on a benchmark and a real-world problem, that evolving the different populations at different paces not only allows solutions to be identified more rapidly, but also improves the capacity of cooperative coevolution to solve more complex problems. Comment: Accepted at GECCO 201
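    The scheduling idea can be sketched with a plain UCB1 bandit over species, where the reward is the fitness progress observed after evolving a species for one generation. The toy species model and the use of UCB1 are stand-in assumptions, not the paper's dynamic bandit:

    ```python
    import math
    import random

    def make_toy_species(rng, gain):
        """A species whose per-generation progress is noisy and diminishing."""
        state = {"gain": gain}
        def evolve():
            step = state["gain"] * rng.random()
            state["gain"] *= 0.99            # diminishing returns over time
            return step                      # progress made = bandit reward
        return evolve

    def schedule(n_generations=500, seed=0):
        rng = random.Random(seed)
        species = [make_toy_species(rng, g) for g in (0.1, 1.0, 0.5)]
        counts = [0] * len(species)
        totals = [0.0] * len(species)
        for t in range(1, n_generations + 1):
            def ucb(i):                      # favour species with high mean progress
                if counts[i] == 0:
                    return float("inf")
                return totals[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])
            i = max(range(len(species)), key=ucb)
            reward = species[i]()            # evolve species i for one generation
            counts[i] += 1
            totals[i] += reward
        return counts
    ```

    Species that make faster progress tend to accumulate more generations, which is the "different paces" effect the abstract describes.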

    A Parameterisation of Algorithms for Distributed Constraint Optimisation via Potential Games

    This paper introduces a parameterisation of learning algorithms for distributed constraint optimisation problems (DCOPs). This parameterisation encompasses many algorithms developed in both the computer science and game theory literatures. It is built on our insight that, when formulated as noncooperative games, DCOPs form a subset of the class of potential games. This result allows us to prove convergence properties of algorithms developed in the computer science literature using game-theoretic methods. Furthermore, our parameterisation can assist system designers by making clear the pros and cons of, and the synergies between, the various DCOP algorithm components.
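    The potential-game insight is what yields the convergence guarantees: every unilateral utility improvement raises a global potential function by the same amount, so best-response dynamics cannot cycle and must terminate in a Nash equilibrium. A minimal sketch on a toy distributed graph-colouring DCOP (the graph and colour count are illustrative assumptions):

    ```python
    # Each agent's utility is the number of neighbours it differs from;
    # the potential is the number of properly coloured edges. Utility
    # changes and potential changes coincide, making this an exact
    # potential game, so the improvement loop below always terminates.
    EDGES = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]  # a small toy graph
    N_AGENTS, N_COLOURS = 4, 3

    def utility(agent, colours):
        return sum(1 for u, v in EDGES
                   if agent in (u, v) and colours[u] != colours[v])

    def potential(colours):
        return sum(1 for u, v in EDGES if colours[u] != colours[v])

    def best_response_dynamics(colours):
        colours = list(colours)
        improved = True
        while improved:                  # terminates: potential is bounded above
            improved = False
            for agent in range(N_AGENTS):
                def u_with(c):
                    return utility(agent, colours[:agent] + [c] + colours[agent + 1:])
                best = max(range(N_COLOURS), key=u_with)
                if u_with(best) > utility(agent, colours):
                    colours[agent] = best
                    improved = True
        return colours
    ```

    Starting from any assignment, the dynamics reach a state where no agent can improve unilaterally, i.e. a pure Nash equilibrium of the colouring game.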

    Distributed Channel and Power Level Selection in VANET Based on SINR using Game Model

    This paper proposes a scheme for channel selection and transmission power adjustment in Vehicular Ad hoc Networks (VANETs) using a game-theoretic approach. The VANET paradigm enables groups of vehicles to establish a mesh-like communication network. However, vehicle mobility, the highly dynamic network environment, and the shared-spectrum concept used in VANET pose challenges such as interference that can degrade signal quality. Channel selection and transmit power adjustment aim to obtain a higher signal-to-interference-and-noise ratio (SINR). In this paper, game theory is used to model channel and power level selection in VANET. Each vehicle represents a player, and a combination of channel and power level represents the strategy used by the player to obtain the utility, i.e. the SINR. Strategy selection is carried out distributively by each player using the Regret Matching Learning (RML) algorithm. Each vehicle evaluates the current utility obtained by selecting a strategy to define the probability of that strategy being selected next time. However, RML has a shortcoming: it relies on assumptions that are hard to satisfy in a real VANET environment. Therefore, a modification of RML devised for this application is also proposed. A simulation model of channel and power level selection is built to evaluate the performance of the proposed scheme. The simulation results show improved VANET performance in terms of SINR and throughput under the proposed scheme.
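    The core RML update can be sketched as follows. The regret-to-probability rule is the standard regret-matching one (probability proportional to positive regret); the counterfactual utilities (the SINR a vehicle would have obtained under a different channel/power pair) are exactly the hard-to-observe quantity that motivates the paper's modification. Function names and the running-average form are illustrative assumptions:

    ```python
    def regret_matching_probs(avg_regrets):
        """Map average regrets to a mixed strategy over channel/power pairs."""
        positive = [max(r, 0.0) for r in avg_regrets]
        total = sum(positive)
        if total == 0.0:                 # no positive regret: play uniformly
            return [1.0 / len(avg_regrets)] * len(avg_regrets)
        return [p / total for p in positive]

    def update_regrets(avg_regrets, counterfactual_utils, played_util, t):
        """Running average of regret against each alternative strategy.

        counterfactual_utils[i] is the SINR the vehicle *would* have seen
        under strategy i -- the quantity a real VANET node cannot observe
        directly, hence the need to modify RML for this application.
        """
        return [r + (u - played_util - r) / t
                for r, u in zip(avg_regrets, counterfactual_utils)]
    ```

    Strategies that would have yielded a higher SINR than the one actually played accumulate positive regret and are selected more often in subsequent rounds.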

    DR9.3 Final report of the JRRM and ASM activities

    Deliverable of the European project NEWCOM++. This deliverable provides the final report summarising the activities carried out in NEWCOM++ WPR9, with a particular focus on those obtained during the last year. They address, on the one hand, RRM and JRRM strategies in heterogeneous scenarios and, on the other hand, spectrum management and opportunistic spectrum access to achieve efficient spectrum usage. The main outcomes of the workpackage, as well as integration indicators, are also summarised. Postprint (published version)

    Q-CP: Learning Action Values for Cooperative Planning

    Research on multi-robot systems has demonstrated promising results in manifold applications and domains. Still, efficiently learning effective robot behaviors is very difficult, due to unstructured scenarios, high uncertainty, and large state dimensionality (e.g. hyper-redundant robots and groups of robots). To alleviate this problem, we present Q-CP, a cooperative model-based reinforcement learning algorithm which exploits action values to both (1) guide the exploration of the state space and (2) generate effective policies. Specifically, we exploit Q-learning to attack the curse of dimensionality in the iterations of a Monte-Carlo Tree Search. We implement and evaluate Q-CP on different stochastic cooperative (general-sum) games: (1) a simple cooperative navigation problem among 3 robots, (2) a cooperation scenario between a pair of KUKA YouBots performing hand-overs, and (3) a coordination task between two mobile robots entering a door. The obtained results show the effectiveness of Q-CP in the chosen applications, where action values drive the exploration and reduce the computational demand of the planning process while achieving good performance.
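    One common way to let learned action values guide tree search, in the spirit described above, is to blend a Q-value prior into the UCT selection score and let the prior's weight fade as a node accumulates visits. The blending scheme and constants below are illustrative choices, not the paper's formulation:

    ```python
    import math

    def uct_with_q(total_value, visits, parent_visits, q_prior, c=1.4, k=10.0):
        """Selection score for one child node in MCTS.

        total_value / visits : Monte-Carlo value estimate from simulations
        q_prior              : action value learned beforehand by Q-learning
        k / (k + visits)     : prior weight, decaying with experience
        """
        if visits == 0:
            return float("inf")          # always try unvisited children first
        mc_estimate = total_value / visits
        w = k / (k + visits)
        blended = (1 - w) * mc_estimate + w * q_prior
        return blended + c * math.sqrt(math.log(parent_visits) / visits)

    def select_child(children):
        """children: list of (total_value, visits, q_prior) tuples."""
        parent_visits = sum(v for _, v, _ in children) or 1
        scores = [uct_with_q(tv, v, parent_visits, q) for tv, v, q in children]
        return max(range(len(children)), key=scores.__getitem__)
    ```

    Early in the search the Q-value prior dominates and steers exploration toward promising joint actions; as simulation statistics accumulate, the Monte-Carlo estimate takes over, which is how action values can cut the planning cost without fixing the final policy.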

    Peer-to-Peer Energy Trading in Smart Residential Environment with User Behavioral Modeling

    Electric power systems are transforming from a centralized unidirectional market to a decentralized open market. With this shift, end-users have the possibility to actively participate in local energy exchanges, with or without the involvement of the main grid. Rapidly falling prices for Renewable Energy Technologies (RETs), supported by their ease of installation and operation, together with Electric Vehicle (EV) and Smart Grid (SG) technologies that make bidirectional flow of energy possible, have contributed to this changing landscape on the distribution side of the traditional power grid. Trading energy among users in a decentralized fashion has been referred to as Peer-to-Peer (P2P) Energy Trading, which has attracted significant attention from the research and industry communities in recent times. However, previous research has mostly focused on engineering aspects of P2P energy trading systems, often neglecting the central role of users in such systems. P2P trading mechanisms require active participation from users to decide factors such as selling prices, storing versus trading energy, and selection of energy sources, among others. The complexity of these tasks, paired with the limited cognitive and time capabilities of human users, can result in sub-optimal decisions or even abandonment of such systems if performance is not satisfactory. Therefore, it is of paramount importance for P2P energy trading systems to incorporate user behavioral modeling that captures users' individual trading behaviors, preferences, and perceived utility in a realistic and accurate manner. Often, such user behavioral models are not known a priori in real-world settings, and therefore need to be learned online as the P2P system is operating. In this thesis, we design novel algorithms for P2P energy trading.
By exploiting a variety of statistical, algorithmic, machine learning, and behavioral economics tools, we propose solutions that jointly optimize system performance while taking into account and learning realistic models of user behavior. The results in this dissertation have been published in IEEE Transactions on Green Communications and Networking 2021, the Proceedings of the IEEE Global Communications Conference 2022, the Proceedings of the IEEE Conference on Pervasive Computing and Communications 2023, and ACM Transactions on Evolutionary Learning and Optimization 2023.