
    Planning Against Fictitious Players in Repeated Normal Form Games

    Planning how to interact with bounded memory and unbounded memory learning opponents requires different treatment. Thus far, however, work in this area has shown how to design plans against bounded memory learning opponents, but no work has dealt with the unbounded memory case. This paper tackles this gap. In particular, we frame this as a planning problem using the framework of repeated matrix games, where the planner's objective is to compute the best exploiting sequence of actions against a learning opponent. The particular class of opponent we study uses a fictitious play process to update her beliefs, but the analysis generalizes to many forms of Bayesian learning agents. Our analysis is inspired by Banerjee and Peng's AIM framework, which works for planning and learning against bounded memory opponents (e.g., an adaptive player). Building on this, we show how an unbounded memory opponent (specifically a fictitious player) can also be modelled as a finite MDP, and we present a new efficient algorithm that computes, in polynomial time, a sequence of play obtaining a higher average reward than playing a game-theoretic (Nash or correlated) equilibrium.
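
    The exploitation mechanism is easy to illustrate. The following Python sketch, with made-up 2x2 payoff matrices and policy names (nothing here is taken from the paper), shows a fictitious player best-responding to the empirical frequency of the planner's past actions, and contrasts a myopic one-step exploiter with a "teaching" sequence that steers the learner's belief:

        import numpy as np

        # Illustrative 2x2 payoff matrices (rows: planner's action,
        # columns: opponent's action). Made-up numbers, not the paper's.
        PLANNER = np.array([[4.0, 0.0],
                            [2.0, 1.0]])
        OPPONENT = np.array([[2.0, 1.0],
                             [0.0, 3.0]])

        class FictitiousPlayer:
            """Unbounded-memory learner: best-responds to the empirical
            frequency of the planner's past actions."""
            def __init__(self, n_actions):
                self.counts = np.ones(n_actions)  # uniform prior over planner actions

            def act(self):
                belief = self.counts / self.counts.sum()
                return int(np.argmax(belief @ OPPONENT))  # best response to belief

            def observe(self, planner_action):
                self.counts[planner_action] += 1

        def run(policy, horizon=50):
            """Average reward of a planner policy against a fresh fictitious player."""
            opp, total = FictitiousPlayer(2), 0.0
            for _ in range(horizon):
                b = opp.act()              # deterministic, hence predictable
                a = policy(b)
                total += PLANNER[a, b]
                opp.observe(a)
            return total / horizon

        myopic = lambda b: int(np.argmax(PLANNER[:, b]))  # one-step best response
        teach = lambda b: 0  # always play action 0 to steer the opponent's belief

        print(f"myopic exploitation: {run(myopic):.2f}")  # stays at 1.00
        print(f"teaching sequence:   {run(teach):.2f}")   # ~3.84: early losses, then 4.0 per round

    The myopic policy locks into the low-reward equilibrium, while the teaching sequence sacrifices early payoffs until the learner's best response flips; the paper's contribution is computing such exploiting sequences optimally, in polynomial time, through a finite-MDP model of the opponent's belief state.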

    Differential evolution strategies for large-scale energy resource management in smart grids

    Smart Grid (SG) technologies are leading the transformation of power grids worldwide. Energy Resource Management (ERM) in SGs is a highly complex problem that needs to be addressed efficiently to maximize income while minimizing operational costs. Due to the nature of the problem, which includes mixed-integer variables and non-linear constraints, Evolutionary Algorithms (EAs) are considered a good tool for finding optimal and near-optimal solutions to large-scale problems. In this paper, we analyze the application of Differential Evolution (DE) to the large-scale ERM problem in SGs through extensive experimentation on a case study using a 33-bus power network with high penetration of Distributed Energy Resources (DER) and Electric Vehicles (EVs), as well as advanced features such as energy stock exchanges and Demand Response (DR) programs. We analyze the impact of DE parameter setting on four state-of-the-art DE strategies. Moreover, the DE strategies are compared with other well-known EAs and with a deterministic approach based on MINLP. Results suggest that, even though DE strategies are very sensitive to the setting of their parameters, they can find better solutions than other EAs, and near-optimal solutions in acceptable times compared with the MINLP approach. The present work was done and funded in the scope of the following projects: the Sustainability Fund CONACYT-SENER of the Consejo Nacional de Ciencia y Tecnología (CONACYT) and the National Center of Innovation in Energy (CEMIE-Eolico); the H2020 DREAM-GO Project (Marie Sklodowska-Curie grant agreement No 641794); and UID/EEA/00760/2013, funded by FEDER Funds through the COMPETE program and by National Funds through FCT.
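
    For reference, here is a minimal Python sketch of DE/rand/1/bin, a classic strategy of the kind compared in the study; the sphere objective is a toy stand-in for the ERM cost function (which is mixed-integer and non-linearly constrained), and F and CR are exactly the kind of parameters whose setting the results show DE to be sensitive to:

        import numpy as np

        rng = np.random.default_rng(0)

        def sphere(x):
            """Toy stand-in objective; the real ERM cost is far more complex."""
            return float(np.sum(x ** 2))

        def de_rand_1_bin(obj, dim=10, pop_size=40, F=0.5, CR=0.9,
                          bounds=(-5.0, 5.0), generations=200):
            """Classic DE/rand/1/bin with greedy one-to-one selection."""
            lo, hi = bounds
            pop = rng.uniform(lo, hi, (pop_size, dim))
            fit = np.array([obj(ind) for ind in pop])
            for _ in range(generations):
                for i in range(pop_size):
                    # Pick three distinct individuals other than i.
                    r1, r2, r3 = rng.choice(
                        [j for j in range(pop_size) if j != i],
                        size=3, replace=False)
                    # Differential mutation, clipped to the box constraints.
                    mutant = np.clip(pop[r1] + F * (pop[r2] - pop[r3]), lo, hi)
                    # Binomial crossover with one guaranteed mutant component.
                    mask = rng.random(dim) < CR
                    mask[rng.integers(dim)] = True
                    trial = np.where(mask, mutant, pop[i])
                    f_trial = obj(trial)
                    if f_trial <= fit[i]:  # keep the better of parent and trial
                        pop[i], fit[i] = trial, f_trial
            best = int(np.argmin(fit))
            return pop[best], fit[best]

        x, f = de_rand_1_bin(sphere)
        print(f"best fitness: {f:.6f}")

    Greedy one-to-one selection keeps the population size fixed; in the actual ERM problem, integer variable handling and constraint repair or penalties would sit on top of this loop.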

    Evolutionary framework for multi-dimensional signaling method applied to energy dispatch problems in smart grids

    In the smart grid (SG) era, energy resource management (ERM) in power systems is facing an increase in complexity, mainly due to the high penetration of distributed resources such as renewable energy and electric vehicles (EVs). Therefore, advanced control techniques and sophisticated planning tools are required to take advantage of the benefits that SG technologies can provide. In this paper, we introduce a new approach called the multi-dimensional signaling evolutionary algorithm (MDS-EA) to solve the large-scale ERM problem in SGs. The proposed method uses the general framework of evolutionary algorithms (EAs), combined with a previously proposed rule-based mechanism called multi-dimensional signaling (MDS). In this way, the proposed MDS-EA evolves a population of solutions by modifying variables of interest identified during the evaluation process. Results show that the proposed method can reduce the complexity of implementing metaheuristics while achieving, in acceptable times, solutions competitive with those of EAs and deterministic approaches. The present work was done and funded in the scope of the following projects: Project NetEffiCity (ANI—P2020 18015), with FEDER Funds through the COMPETE program and National Funds through FCT under project UID/EEA/00760/2013; and the Sustainability Fund CONACYT-SENER of the Consejo Nacional de Ciencia y Tecnología (CONACYT) and the National Center of Innovation in Energy (CEMIE-Eolico, Project No. 206842).
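
    The abstract does not spell out the MDS rules (the paper's rules are domain-specific), so the following Python sketch only illustrates the general idea under stated assumptions: evaluation returns, alongside the fitness, a per-variable signal flagging the dimensions that contributed most to the cost, and mutation is restricted to those flagged variables. All names and the toy objective are hypothetical:

        import numpy as np

        rng = np.random.default_rng(1)

        def evaluate(x):
            """Toy stand-in for the ERM evaluation: returns a fitness value and
            a per-variable signal marking the costliest dimensions, which the
            MDS operator then targets."""
            contrib = x ** 2
            fitness = float(contrib.sum())
            signal = contrib > np.median(contrib)  # variables of interest
            return fitness, signal

        def mds_ea(dim=10, pop_size=30, generations=300, step=0.3):
            """Evolutionary loop where mutation is applied only to the
            variables flagged by the evaluation signal."""
            pop = rng.uniform(-5, 5, (pop_size, dim))
            evals = [evaluate(ind) for ind in pop]
            for _ in range(generations):
                for i in range(pop_size):
                    fit_i, sig_i = evals[i]
                    child = pop[i].copy()
                    # Perturb only the signaled variables.
                    noise = rng.normal(0.0, step, dim)
                    child[sig_i] += noise[sig_i]
                    fit_c, sig_c = evaluate(child)
                    if fit_c <= fit_i:  # greedy replacement
                        pop[i], evals[i] = child, (fit_c, sig_c)
            best = min(range(pop_size), key=lambda i: evals[i][0])
            return pop[best], evals[best][0]

        x, f = mds_ea()
        print(f"best fitness: {f:.6f}")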

    An exploration strategy for non-stationary opponents

    The success or failure of any learning algorithm is partially due to the exploration strategy it exerts. However, most exploration strategies assume that the environment is stationary and non-strategic. In this work we shed light on how to design exploration strategies in non-stationary and adversarial environments. Our proposed adversarial drift exploration (DE) is able to efficiently explore the state space while keeping track of regions of the environment that have changed. This exploration is general enough to be applied in single-agent non-stationary environments as well as in multiagent settings where the opponent changes its strategy over time. We use a two-agent strategic interaction setting to test this new type of exploration, where the opponent switches between different behavioral patterns to emulate a non-deterministic, stochastic and adversarial environment. The agent’s objective is to learn a model of the opponent’s strategy in order to act optimally. Our contribution is twofold. First, we present DE as a strategy for switch detection. Second, we propose a new algorithm called R-max# for learning and planning against non-stationary opponents. To handle such opponents, R-max# reasons and acts in terms of two objectives: (1) to maximize utilities in the short term while learning and (2) to eventually explore opponent behavioral changes. We provide theoretical results showing that R-max# is guaranteed to detect the opponent’s switch and learn a new model with finite sample complexity. R-max# makes efficient use of exploration experiences, which results in rapid adaptation and efficient drift exploration when dealing with the non-stationary nature of the opponent. We show experimentally that using DE outperforms state-of-the-art algorithms explicitly designed for modeling opponents (in terms of average rewards) in two complementary domains.
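
    A minimal Python sketch of the drift-exploration idea, assuming an R-max-style known/unknown bookkeeping: state-action pairs whose outcomes have been observed often enough count as known, a violated prediction resets learning in that region, and known pairs are periodically forgotten so even a seemingly stationary opponent keeps being probed. The class name, parameters, and revisit rule are illustrative, not the paper's exact R-max# algorithm:

        import random

        class DriftExplorer:
            """Sketch of drift exploration for switch detection."""
            def __init__(self, revisit_rate=0.01, known_threshold=5):
                self.revisit_rate = revisit_rate      # how aggressively to re-explore
                self.known_threshold = known_threshold
                self.visits = {}   # (state, action) -> observation count
                self.model = {}    # (state, action) -> predicted opponent action

            def is_known(self, state, action):
                return self.visits.get((state, action), 0) >= self.known_threshold

            def observe(self, state, action, opponent_action):
                key = (state, action)
                self.visits[key] = self.visits.get(key, 0) + 1
                if self.model.get(key) not in (None, opponent_action):
                    # Prediction violated: evidence of an opponent switch,
                    # so mark this region unknown and relearn it.
                    self.visits[key] = 0
                self.model[key] = opponent_action

            def decay(self):
                """Drift exploration step: randomly forget known pairs so the
                agent keeps probing for changes it has not yet observed."""
                for key in list(self.visits):
                    if random.random() < self.revisit_rate:
                        self.visits[key] = 0

        agent = DriftExplorer(known_threshold=3)
        for _ in range(5):
            agent.observe("s0", "left", "follow")  # opponent behaves consistently
        print(agent.is_known("s0", "left"))        # True: pair considered learned
        agent.observe("s0", "left", "avoid")       # opponent switches strategy
        print(agent.is_known("s0", "left"))        # False: region marked for relearning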