An Exchange Mechanism to Coordinate Flexibility in Residential Energy Cooperatives
Energy cooperatives (ECs) such as residential and industrial microgrids have
the potential to mitigate increasing fluctuations in renewable electricity
generation, but only if their joint response is coordinated. However, the
coordination and control of independently operated flexible resources (e.g.,
storage, demand response) imposes critical challenges arising from the
heterogeneity of the resources, conflict of interests, and impact on the grid.
Correspondingly, overcoming these challenges with a general and fair yet
efficient exchange mechanism that coordinates these distributed resources will
accommodate renewable fluctuations on a local level, thereby supporting the
energy transition. In this paper, we introduce such an exchange mechanism. It
incorporates a payment structure that encourages prosumers to participate in
the exchange by increasing their utility above baseline alternatives. The
allocation from the proposed mechanism increases the system efficiency
(utilitarian social welfare) and distributes profits more fairly (measured by
Nash social welfare) than individual flexibility activation. A case study
analyzing the mechanism's performance and resulting payments in numerical
experiments over real demand and generation profiles from the Pecan Street
dataset demonstrates its efficacy in promoting cooperation among co-located
flexible resources in residential cooperatives through local exchange.
Comment: Accepted in IEEE ICIT 201
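The two welfare measures mentioned above can be illustrated with a small, self-contained sketch (this is not the authors' mechanism itself); the prosumer utilities and function names below are assumptions chosen for the example.

```python
import numpy as np

def utilitarian_welfare(utilities):
    """Utilitarian social welfare: the sum of individual utilities."""
    return float(np.sum(utilities))

def nash_welfare(utilities):
    """Nash social welfare: the product of individual utilities (assumed positive)."""
    u = np.asarray(utilities, dtype=float)
    if np.any(u <= 0):
        raise ValueError("Nash welfare requires strictly positive utilities")
    return float(np.prod(u))

# Hypothetical utilities of three prosumers under two allocations.
individual = [1.0, 1.0, 4.0]   # e.g., individual flexibility activation
coordinated = [2.0, 2.0, 2.0]  # e.g., coordinated exchange

print(utilitarian_welfare(individual), utilitarian_welfare(coordinated))  # 6.0 6.0
print(nash_welfare(individual), nash_welfare(coordinated))                # 4.0 8.0
```

Both allocations have the same utilitarian welfare, but the more evenly distributed one scores higher on Nash welfare, which is why the latter serves as a fairness measure.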
A Survey and Critique of Multiagent Deep Reinforcement Learning
Deep reinforcement learning (RL) has achieved outstanding results in recent
years. This has led to a dramatic increase in the number of applications and
methods. Recent works have explored learning beyond single-agent scenarios and
have considered multiagent learning (MAL) scenarios. Initial results report
successes in complex multiagent domains, although there are several challenges
to be addressed. The primary goal of this article is to provide a clear
overview of current multiagent deep reinforcement learning (MDRL) literature.
Additionally, we complement the overview with a broader analysis: (i) we
revisit previous key components, originally presented in MAL and RL, and
highlight how they have been adapted to multiagent deep reinforcement learning
settings. (ii) We provide general guidelines to new practitioners in the area:
describing lessons learned from MDRL works, pointing to recent benchmarks, and
outlining open avenues of research. (iii) We take a more critical tone raising
practical challenges of MDRL (e.g., implementation and computational demands).
We expect this article will help unify and motivate future research to take
advantage of the abundant literature that exists (e.g., RL and MAL) in a joint
effort to promote fruitful research in the multiagent community.
Comment: Under review since Oct 2018. Earlier versions of this work had the
title: "Is multiagent deep reinforcement learning the answer or the question?
A brief survey"
Strategic interactions against non-stationary agents
Designing an agent that is capable of interacting with another agent is an open problem. An
interaction happens when two or more agents perform actions in an environment and each obtains
a utility based on the resulting joint action. Current multiagent learning techniques do not fare
well against agents that change their behavior during a repeated interaction. This happens because
they usually do not model the other agents' behavior and instead rely on assumptions that are
too restrictive for real scenarios. Furthermore, since many applications demand that different
types of agents work together, this is an important problem to solve. Whether the domain is
cooperative (agents share a common goal) or competitive (their objectives differ), one aspect is
common: agents must learn how their counterparts are acting and react quickly to changes in
behavior.
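As a minimal illustration of the problem described above (not the thesis' own algorithm), the sketch below tracks an opponent's recent actions in a repeated matrix game with a sliding window and best-responds to the current empirical estimate, so a change in the opponent's behavior is picked up quickly. The payoff matrix, window size, and class name are assumptions for the example.

```python
from collections import deque
import numpy as np

# Hypothetical payoff matrix for the row player: PAYOFF[my_action, opp_action]
# (a matching-pennies style game, so the best response depends on the opponent).
PAYOFF = np.array([[1.0, -1.0],
                   [-1.0, 1.0]])

class WindowedBestResponder:
    """Best-respond to the opponent's empirical action distribution over a sliding window.

    Using a short window instead of the full history lets the agent react quickly
    when a non-stationary opponent changes its behavior.
    """
    def __init__(self, n_opp_actions=2, window=20):
        self.history = deque(maxlen=window)
        self.n_opp_actions = n_opp_actions

    def observe(self, opp_action):
        self.history.append(opp_action)

    def act(self):
        if not self.history:
            return 0  # arbitrary default before any observation
        counts = np.bincount(np.array(self.history), minlength=self.n_opp_actions)
        belief = counts / counts.sum()   # estimated opponent strategy
        expected = PAYOFF @ belief       # expected payoff of each of my actions
        return int(np.argmax(expected))

agent = WindowedBestResponder()
for opp_action in [0] * 30 + [1] * 30:   # the opponent switches behavior halfway through
    my_action = agent.act()
    agent.observe(opp_action)
```

After the switch, the best response flips as soon as the window is dominated by the new behavior, which is the kind of fast adaptation the thesis argues current techniques lack.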
A Learning Algorithm for Temporal Nodes Bayesian Networks (Algoritmo de aprendizaje para redes bayesianas de nodos temporales)
Bayesian networks have become the reference model for dealing with uncertainty due to their interpretability and the variety of inference and learning algorithms available. However, standard Bayesian networks cannot handle temporal information. The model known as Temporal Nodes Bayesian Networks (TNBN) is an extension that combines uncertainty reasoning with temporal information, but it has not been used extensively due to a lack of learning algorithms for this type of network.

In this thesis we propose a learning algorithm for Temporal Nodes Bayesian Networks that obtains the structure, the intervals, and the associated parameters. The algorithm has three main steps: an initial discretization of the temporal nodes, learning of an initial structure, and a refinement of the intervals using the structure information. The interval-learning step uses a clustering technique to obtain the temporal intervals, and the set of intervals with the best predictive score is selected.

The algorithm was evaluated with synthetic data from three TNBNs of different sizes, using two distributions to generate the temporal data. In these experiments the algorithm outperformed the baselines, particularly in structural quality and temporal error. The algorithm was also applied to real data: on the one hand, to prediction and fault diagnosis in a subsystem of a power plant, where it was evaluated with different numbers of input cases in terms of predictive score, temporal error, and number of intervals; on the other hand, to data from patients with HIV in order to obtain mutational networks, i.e., networks that show the temporal evolution of mutations with respect to certain drugs. For these experiments, the models were qualitatively evaluated by experts.
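The abstract describes a clustering-based refinement of the temporal intervals; the following self-contained sketch illustrates that general idea only (it is not the thesis' actual algorithm): hypothetical 1-D event times are clustered with a simple k-means and the cluster centers are turned into interval boundaries. In the proposed algorithm, candidate interval sets obtained this way are then compared and the one with the best predictive score is kept.

```python
import numpy as np

def intervals_from_clustering(times, k, iters=50, seed=0):
    """Derive k temporal intervals by clustering 1-D event times with a simple k-means.

    Interval boundaries are the midpoints between adjacent cluster centers, so every
    observed time of the temporal event falls into exactly one of the k intervals.
    """
    rng = np.random.default_rng(seed)
    centers = rng.choice(times, size=k, replace=False).astype(float)
    for _ in range(iters):
        # Assign each time to its nearest center, then recompute the centers.
        assign = np.argmin(np.abs(times[:, None] - centers[None, :]), axis=1)
        centers = np.array([times[assign == j].mean() if np.any(assign == j) else centers[j]
                            for j in range(k)])
    centers = np.sort(centers)
    cuts = (centers[:-1] + centers[1:]) / 2.0
    # Intervals: (-inf, cuts[0]], (cuts[0], cuts[1]], ..., (cuts[-1], +inf)
    return list(zip(np.r_[-np.inf, cuts], np.r_[cuts, np.inf]))

# Hypothetical occurrence times of one temporal event (e.g., hours until a fault symptom),
# drawn from two well-separated modes so two intervals are expected.
rng = np.random.default_rng(1)
times = np.concatenate([rng.normal(5, 1, 50), rng.normal(20, 2, 50)])
print(intervals_from_clustering(times, k=2))
```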
Terminal Prediction as an Auxiliary Task for Deep Reinforcement Learning
Deep reinforcement learning has achieved great successes in recent years, but
there are still open challenges, such as convergence to locally optimal
policies and sample inefficiency. In this paper, we contribute a novel
self-supervised auxiliary task, i.e., Terminal Prediction (TP), estimating
temporal closeness to terminal states for episodic tasks. The intuition is to
help representation learning by letting the agent predict how close it is to a
terminal state, while learning its control policy. Although TP could be
integrated with multiple algorithms, this paper focuses on Asynchronous
Advantage Actor-Critic (A3C) and demonstrates the advantages of A3C-TP. Our
extensive evaluation includes: a set of Atari games, the BipedalWalker domain,
and a mini version of the recently proposed multi-agent Pommerman game. Our
results on Atari games and the BipedalWalker domain suggest that A3C-TP
outperforms standard A3C in most of the tested domains and performs comparably
in the others. In Pommerman, our proposed method provides significant
improvements both in learning efficiency and in convergence to better policies
against different opponents.
Comment: AAAI Conference on Artificial Intelligence and Interactive Digital
Entertainment (AIIDE'19). arXiv admin note: text overlap with arXiv:1812.0004
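The abstract does not spell out the implementation, but the PyTorch sketch below shows one plausible way to attach a terminal-prediction head and auxiliary loss to an actor-critic network; the t/T target, the loss weights, and all shapes are assumptions made for the example rather than the paper's exact setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ACWithTerminalPrediction(nn.Module):
    """Actor-critic network with an extra terminal-prediction (TP) head.

    The TP head regresses how close the current state is to the episode's end;
    here the target is assumed to be t / T for step t of an episode of length T.
    """
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy = nn.Linear(hidden, n_actions)   # actor head
        self.value = nn.Linear(hidden, 1)            # critic head
        self.terminal = nn.Linear(hidden, 1)         # auxiliary TP head

    def forward(self, obs):
        h = self.body(obs)
        return self.policy(h), self.value(h).squeeze(-1), self.terminal(h).squeeze(-1)

# Dummy rollout of one episode (hypothetical shapes and loss coefficients).
T, obs_dim, n_actions = 10, 8, 4
obs = torch.randn(T, obs_dim)
actions = torch.randint(n_actions, (T,))
returns = torch.randn(T)                                       # stand-in for discounted returns
tp_targets = torch.arange(1, T + 1, dtype=torch.float32) / T   # closeness to the terminal state

net = ACWithTerminalPrediction(obs_dim, n_actions)
logits, values, tp_pred = net(obs)
advantages = (returns - values).detach()

policy_loss = -(F.log_softmax(logits, dim=-1).gather(1, actions[:, None]).squeeze(1) * advantages).mean()
value_loss = F.mse_loss(values, returns)
tp_loss = F.mse_loss(tp_pred, tp_targets)   # the auxiliary terminal-prediction loss
loss = policy_loss + 0.5 * value_loss + 0.5 * tp_loss
loss.backward()
```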
Agent Modeling as Auxiliary Task for Deep Reinforcement Learning
In this paper we explore how actor-critic methods in deep reinforcement
learning, in particular Asynchronous Advantage Actor-Critic (A3C), can be
extended with agent modeling. Inspired by recent works on representation
learning and multiagent deep reinforcement learning, we propose two
architectures to perform agent modeling: the first one based on parameter
sharing, and the second one based on agent policy features. Both architectures
aim to learn other agents' policies as auxiliary tasks, besides the standard
actor (policy) and critic (values). We performed experiments in both
cooperative and competitive domains. The former is a problem of coordinated
multiagent object transportation and the latter is a two-player mini version of
the Pommerman game. Our results show that the proposed architectures stabilize
learning and outperform the standard A3C architecture when learning a best
response in terms of expected rewards.
Comment: AAAI Conference on Artificial Intelligence and Interactive Digital
Entertainment (AIIDE'19)
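The abstract only names the two architectures; as a rough illustration of the parameter-sharing idea (an auxiliary head that predicts the other agent's action from shared features), the PyTorch sketch below adds an opponent-policy head trained with cross-entropy next to the usual actor and critic heads. Layer sizes, loss weights, and batch shapes are assumptions for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ACWithOpponentModeling(nn.Module):
    """Actor-critic network with an auxiliary head that predicts the other agent's action.

    A sketch of the parameter-sharing flavor of agent modeling: one shared body feeds
    the actor, the critic, and an opponent-policy head trained against the opponent's
    observed actions.
    """
    def __init__(self, obs_dim, n_actions, n_opp_actions, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy = nn.Linear(hidden, n_actions)        # actor head
        self.value = nn.Linear(hidden, 1)                 # critic head
        self.opponent = nn.Linear(hidden, n_opp_actions)  # auxiliary opponent-policy head

    def forward(self, obs):
        h = self.body(obs)
        return self.policy(h), self.value(h).squeeze(-1), self.opponent(h)

# Dummy batch with hypothetical shapes and loss weights.
B, obs_dim, n_actions, n_opp_actions = 32, 16, 5, 5
obs = torch.randn(B, obs_dim)
actions = torch.randint(n_actions, (B,))
opp_actions = torch.randint(n_opp_actions, (B,))   # observed actions of the other agent
returns = torch.randn(B)

net = ACWithOpponentModeling(obs_dim, n_actions, n_opp_actions)
logits, values, opp_logits = net(obs)
advantages = (returns - values).detach()

policy_loss = -(F.log_softmax(logits, dim=-1).gather(1, actions[:, None]).squeeze(1) * advantages).mean()
value_loss = F.mse_loss(values, returns)
opponent_loss = F.cross_entropy(opp_logits, opp_actions)   # auxiliary agent-modeling loss
loss = policy_loss + 0.5 * value_loss + 0.1 * opponent_loss
loss.backward()
```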