132 research outputs found

    An Exchange Mechanism to Coordinate Flexibility in Residential Energy Cooperatives

    Energy cooperatives (ECs) such as residential and industrial microgrids have the potential to mitigate increasing fluctuations in renewable electricity generation, but only if their joint response is coordinated. However, the coordination and control of independently operated flexible resources (e.g., storage, demand response) poses critical challenges arising from the heterogeneity of the resources, conflicts of interest, and impact on the grid. Overcoming these challenges with a general and fair yet efficient exchange mechanism that coordinates these distributed resources will accommodate renewable fluctuations on a local level, thereby supporting the energy transition. In this paper, we introduce such an exchange mechanism. It incorporates a payment structure that encourages prosumers to participate in the exchange by increasing their utility above baseline alternatives. The allocation from the proposed mechanism increases system efficiency (utilitarian social welfare) and distributes profits more fairly (measured by Nash social welfare) than individual flexibility activation. A case study analyzing the mechanism's performance and the resulting payments in numerical experiments over real demand and generation profiles from the Pecan Street dataset demonstrates its efficacy in promoting cooperation between co-located flexibilities in residential cooperatives through local exchange. Comment: Accepted in IEEE ICIT 201
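The two welfare measures named in this abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions (utilitarian welfare as the sum of utilities, Nash welfare as their product). The paper's exact formulation may differ, and the utility values below are hypothetical numbers chosen only for illustration.

```python
from math import prod


def utilitarian_welfare(utilities):
    """Sum of individual utilities: a measure of total system efficiency."""
    return sum(utilities)


def nash_welfare(utilities):
    """Product of individual utilities: rewards more even distributions."""
    return prod(utilities)


# Hypothetical utilities for three prosumers (not taken from the paper).
even_allocation = [3.0, 3.0, 3.0]
skewed_allocation = [7.0, 1.0, 1.0]

for name, allocation in [("even", even_allocation), ("skewed", skewed_allocation)]:
    print(name, utilitarian_welfare(allocation), nash_welfare(allocation))
```

Both allocations have the same utilitarian welfare, but the even one has the higher Nash welfare, which is why the product form is commonly read as a fairness measure.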

    A Survey and Critique of Multiagent Deep Reinforcement Learning

    Deep reinforcement learning (RL) has achieved outstanding results in recent years. This has led to a dramatic increase in the number of applications and methods. Recent works have explored learning beyond single-agent scenarios and have considered multiagent learning (MAL) scenarios. Initial results report successes in complex multiagent domains, although there are several challenges to be addressed. The primary goal of this article is to provide a clear overview of current multiagent deep reinforcement learning (MDRL) literature. Additionally, we complement the overview with a broader analysis: (i) we revisit previous key components, originally presented in MAL and RL, and highlight how they have been adapted to multiagent deep reinforcement learning settings; (ii) we provide general guidelines to new practitioners in the area, describing lessons learned from MDRL works, pointing to recent benchmarks, and outlining open avenues of research; and (iii) we take a more critical tone, raising practical challenges of MDRL (e.g., implementation and computational demands). We expect this article will help unify and motivate future research to take advantage of the abundant literature that exists (e.g., RL and MAL) in a joint effort to promote fruitful research in the multiagent community. Comment: Under review since Oct 2018. Earlier versions of this work had the title: "Is multiagent deep reinforcement learning the answer or the question? A brief survey

    Strategic interactions against non-stationary agents

    Designing an agent that is capable of interacting with another agent is an open problem. An interaction happens when two or more agents perform an action in an environment and obtain a utility based on the joint action performed. Current multiagent learning techniques do not fare well with agents that change their behavior during a repeated interaction. This happens because they usually do not model the other agents' behavior and instead make assumptions that are too restrictive for real scenarios. Furthermore, since many applications require different types of agents to work together, this is an important problem to solve. Whether the domain is cooperative (where agents have a common goal) or competitive (where objectives differ), there is one common aspect: agents must learn how their counterpart is acting and react quickly to changes in behavior.

    Algoritmo de aprendizaje para redes bayesianas de nodos temporales [Learning algorithm for temporal nodes Bayesian networks]

    Bayesian networks have become the reference model for dealing with uncertainty due to their ease of interpretation and the variety of available inference and learning algorithms. However, traditional Bayesian networks cannot deal with temporal information. The model known as Temporal Nodes Bayesian Networks (TNBN) is an extension that combines uncertainty reasoning with temporal information, but it has not been used extensively due to a lack of learning algorithms for this type of network. In this thesis we propose a learning algorithm for Temporal Nodes Bayesian Networks that obtains the structure, the intervals and the associated parameters. The algorithm has three main steps: an initial discretization of the temporal nodes, learning of an initial structure, and a refinement of the intervals using the structure information. The interval learning algorithm uses a clustering technique to obtain the temporal intervals, and the set of intervals that obtains the best predictive score is selected. The algorithm was evaluated with synthetic data from three TNBNs of different sizes, with two distributions used to generate the temporal data. In the experiments the algorithm obtained better scores than the baselines, particularly in structural quality and temporal error. The algorithm was also applied to real data. On the one hand, it was applied to prediction and fault diagnosis in a subsystem of a power plant. For this application the algorithm was evaluated using different numbers of cases in terms of predictive score, temporal error and number of intervals. On the other hand, it was applied to data from patients with HIV in order to obtain mutational networks, i.e., networks that show the temporal evolution of the mutations with respect to certain drugs. For these experiments, the models were qualitatively evaluated by experts.

    Terminal Prediction as an Auxiliary Task for Deep Reinforcement Learning

    Deep reinforcement learning has achieved great successes in recent years, but there are still open challenges, such as convergence to locally optimal policies and sample inefficiency. In this paper, we contribute a novel self-supervised auxiliary task, i.e., Terminal Prediction (TP), estimating temporal closeness to terminal states for episodic tasks. The intuition is to help representation learning by letting the agent predict how close it is to a terminal state while learning its control policy. Although TP could be integrated with multiple algorithms, this paper focuses on Asynchronous Advantage Actor-Critic (A3C) and on demonstrating the advantages of A3C-TP. Our extensive evaluation includes a set of Atari games, the BipedalWalker domain, and a mini version of the recently proposed multi-agent Pommerman game. Our results on Atari games and the BipedalWalker domain suggest that A3C-TP outperforms standard A3C in most of the tested domains and has similar performance in the others. In Pommerman, our proposed method provides significant improvement both in learning efficiency and in converging to better policies against different opponents. Comment: AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE'19). arXiv admin note: text overlap with arXiv:1812.0004
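One way to picture the Terminal Prediction idea is a rough sketch of how self-supervised targets could be built once an episode ends. The abstract only says that TP estimates temporal closeness to terminal states. The linear target (step index divided by episode length) and the mean-squared-error loss used here are assumptions for illustration, and the function names are hypothetical.

```python
def terminal_prediction_targets(episode_length):
    """Assumed target: fraction of the episode elapsed at each step (1.0 at the end)."""
    if episode_length <= 0:
        raise ValueError("episode_length must be positive")
    return [step / episode_length for step in range(1, episode_length + 1)]


def terminal_prediction_loss(predictions, targets):
    """Assumed auxiliary loss: mean squared error between predictions and targets."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared_errors) / len(squared_errors)


# Targets are only known after the episode terminates, so they are computed
# retrospectively for every step of a finished 4-step episode.
targets = terminal_prediction_targets(4)
print(targets)
print(terminal_prediction_loss([0.0, 0.0, 0.0, 0.0], targets))
```

In an actor-critic setup, such a loss would typically be added to the policy and value losses with a weighting coefficient, so that the shared representation is also trained to encode how far the episode has progressed.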

    Agent Modeling as Auxiliary Task for Deep Reinforcement Learning

    In this paper we explore how actor-critic methods in deep reinforcement learning, in particular Asynchronous Advantage Actor-Critic (A3C), can be extended with agent modeling. Inspired by recent works on representation learning and multiagent deep reinforcement learning, we propose two architectures to perform agent modeling: the first based on parameter sharing, and the second based on agent policy features. Both architectures aim to learn other agents' policies as auxiliary tasks, in addition to the standard actor (policy) and critic (values). We performed experiments in both cooperative and competitive domains. The former is a problem of coordinated multiagent object transportation and the latter is a two-player mini version of the Pommerman game. Our results show that the proposed architectures stabilize learning and outperform the standard A3C architecture when learning a best response in terms of expected rewards. Comment: AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE'19).