UCB1 Based Reinforcement Learning Model for Adaptive Energy Management in Buildings
This paper proposes a reinforcement learning model for intelligent energy management in buildings, using a UCB1-based approach. Energy management in buildings has become a critical task in recent years, driven by incentives to increase energy efficiency and the penetration of renewable energy sources. Managing energy consumption, generation and storage in this domain is, however, an arduous task, due to the large uncertainty of the different resources combined with the dynamic characteristics of the environment. In this scope, reinforcement learning is a promising way to make energy management methods adaptive, by learning from the ongoing changes in the environment. The model proposed in this paper aims at supporting decisions on the best actions to take at each moment regarding building energy management. A UCB1-based algorithm is applied, and the results are compared to those of an EXP3 approach and a simple reinforcement learning algorithm. Results show that the proposed approach achieves higher-quality results, reaching a higher rate of successful action identification than the other reference approaches. This work has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 641794 (project DREAM-GO) and from Project SIMOCE (ANI|P2020 17690).
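As background for the comparison above, the generic UCB1 rule plays every action once and then always picks the action maximising its empirical mean reward plus an exploration bonus. A minimal sketch, with a hypothetical three-action Bernoulli environment standing in for the paper's building-management actions:

```python
import math
import random

def ucb1(n_actions, n_rounds, reward_fn):
    """Play each action once, then pick argmax of mean + sqrt(2 ln t / n_i)."""
    counts = [0] * n_actions
    sums = [0.0] * n_actions
    for t in range(1, n_rounds + 1):
        if t <= n_actions:
            a = t - 1                 # initialisation: try every action once
        else:
            a = max(range(n_actions),
                    key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2.0 * math.log(t) / counts[i]))
        counts[a] += 1
        sums[a] += reward_fn(a)
    return counts

random.seed(0)
success = [0.3, 0.5, 0.8]             # hypothetical success rate per action
counts = ucb1(3, 2000, lambda a: 1.0 if random.random() < success[a] else 0.0)
```

Over enough rounds the best action accumulates most of the plays, which loosely corresponds to the "rate of successful action identification" the abstract reports.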
Contextual Simulated Annealing Q-Learning for Pre-negotiation of Agent-Based Bilateral Negotiations
Electricity markets are complex environments which have been undergoing continuous transformation due to the increase of renewable-based generation and the introduction of new players in the system. In this context, players are forced to rethink their behavior and learn how to act in this dynamic environment in order to benefit as much as possible from market negotiations. This paper introduces a new learning model that enables players to identify the expected prices of future bilateral agreements, as a way to improve the decision-making process of choosing which opponent players to approach for actual negotiations. The proposed model introduces a contextual dimension into the well-known Q-Learning algorithm, and includes a simulated annealing process to accelerate convergence. The model is integrated in a multi-agent decision support system for electricity market players' negotiations, enabling experimentation using real data from the Iberian electricity market. This work has received funding from the European Union's Horizon 2020 research and innovation programme under project DOMINOES (grant agreement No 771066) and from FEDER Funds through the COMPETE program and from National Funds through FCT under the project UID/EEA/00760/2019.
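The combination the abstract describes can be illustrated generically: tabular Q-learning whose softmax (Boltzmann) exploration temperature decays like a simulated-annealing schedule, with the context as the first key of the Q-table. The contexts, opponents and reward values below are hypothetical stand-ins, not the paper's market model:

```python
import math
import random

def sa_q_learning(contexts, actions, reward_fn, episodes=5000,
                  alpha=0.1, t0=1.0, cooling=0.999, t_min=0.05):
    """Tabular Q-learning with Boltzmann exploration whose temperature decays
    geometrically, in the spirit of a simulated-annealing schedule."""
    q = {(c, a): 0.0 for c in contexts for a in actions}
    temp = t0
    for _ in range(episodes):
        c = random.choice(contexts)               # observe a context
        weights = [math.exp(q[(c, a)] / temp) for a in actions]
        a = random.choices(actions, weights=weights)[0]
        r = reward_fn(c, a)
        q[(c, a)] += alpha * (r - q[(c, a)])      # one-step (bandit-style) update
        temp = max(temp * cooling, t_min)         # cool down: explore -> exploit
    return q

random.seed(0)
contexts = ["peak", "offpeak"]
actions = ["opponent_A", "opponent_B"]
# hypothetical expected negotiation outcomes per (context, opponent)
expected = {("peak", "opponent_A"): 0.8, ("peak", "opponent_B"): 0.4,
            ("offpeak", "opponent_A"): 0.3, ("offpeak", "opponent_B"): 0.7}
q = sa_q_learning(contexts, actions,
                  lambda c, a: expected[(c, a)] + random.gauss(0.0, 0.1))
```

After training, the Q-values should rank opponent_A highest in the peak context and opponent_B highest off-peak, which is the kind of opponent pre-selection the abstract targets.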
Correlated Bandits for Dynamic Pricing via the ARC algorithm
The Asymptotic Randomised Control (ARC) algorithm provides a rigorous
approximation to the optimal strategy for a wide class of Bayesian bandits,
while retaining reasonable computational complexity. In particular, it allows a
decision maker to observe signals in addition to their rewards, to incorporate
correlations between the outcomes of different choices, and to have nontrivial
dynamics for their estimates. The algorithm is guaranteed to asymptotically
optimise the expected discounted payoff, with error depending on the initial
uncertainty of the bandit. In this paper, we consider a batched bandit problem
where observations arrive from a generalised linear model; we extend the ARC
algorithm to this setting. We apply this to a classic dynamic pricing problem
based on a Bayesian hierarchical model and demonstrate that the ARC algorithm
outperforms alternative approaches.
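The ARC algorithm itself solves an asymptotic control approximation and is beyond a short sketch; as a plain illustration of the dynamic-pricing bandit setting it targets, here is a standard Beta-Bernoulli Thompson-sampling pricer over a discrete price grid (the prices and demand curve are made up, and this is not the ARC method):

```python
import random

def thompson_pricing(prices, demand_fn, n_rounds=3000):
    """Beta-Bernoulli Thompson sampling over a discrete price grid.
    Each candidate price is an arm; expected revenue is price * P(sale)."""
    alpha = [1.0] * len(prices)       # Beta(1, 1) prior on P(sale) per price
    beta = [1.0] * len(prices)
    revenue = 0.0
    for _ in range(n_rounds):
        # sample a purchase probability per price; post the revenue-maximiser
        draws = [random.betavariate(alpha[i], beta[i]) * prices[i]
                 for i in range(len(prices))]
        i = draws.index(max(draws))
        sale = demand_fn(prices[i])   # 1 if the customer buys, else 0
        revenue += prices[i] * sale
        alpha[i] += sale
        beta[i] += 1 - sale
    return revenue, alpha, beta

random.seed(1)
prices = [2.0, 5.0, 8.0]              # hypothetical price grid
# hypothetical linear demand: P(sale) = 1 - price / 10
revenue, a, b = thompson_pricing(
    prices, lambda p: 1 if random.random() < 1 - p / 10 else 0)
pulls = [a[i] + b[i] - 2 for i in range(3)]   # times each price was posted
```

Under this demand curve the middle price maximises expected revenue, so its pull count should dominate as the posterior concentrates.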
TSEC: a framework for online experimentation under experimental constraints
Thompson sampling is a popular algorithm for solving multi-armed bandit
problems, and has been applied in a wide range of applications, from website
design to portfolio optimization. In such applications, however, the number of
choices (or arms) can be large, and the data needed to make adaptive
decisions requires expensive experimentation. One is then faced with the
constraint of experimenting on only a small subset of arms within
each time period, which poses a problem for traditional Thompson sampling. We
propose a new Thompson Sampling under Experimental Constraints (TSEC) method,
which addresses this so-called "arm budget constraint". TSEC makes use of a
Bayesian interaction model with effect hierarchy priors, to model correlations
between rewards on different arms. This fitted model is then integrated within
Thompson sampling, to jointly identify a good subset of arms for
experimentation and to allocate resources over these arms. We demonstrate the
effectiveness of TSEC in two problems with arm budget constraints. The first is
a simulated website optimization study, where TSEC shows noticeable
improvements over industry benchmarks. The second is a portfolio optimization
application on industry-based exchange-traded funds, where TSEC provides more
consistent and greater wealth accumulation than standard investment strategies.
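TSEC's Bayesian interaction model with effect-hierarchy priors is not reproduced here; the following sketch only illustrates the arm-budget idea (sample once from each arm's posterior, then experiment only on the top-k draws each period), using independent Beta-Bernoulli arms with hypothetical means:

```python
import random

def budgeted_thompson(true_means, k, n_periods=500):
    """Each period, draw one posterior sample per arm and experiment only on
    the k arms with the highest draws (the 'arm budget constraint')."""
    n = len(true_means)
    alpha = [1.0] * n                 # Beta(1, 1) prior on each arm's mean
    beta = [1.0] * n
    for _ in range(n_periods):
        ranked = sorted(range(n),
                        key=lambda i: random.betavariate(alpha[i], beta[i]),
                        reverse=True)
        for i in ranked[:k]:          # only k observations this period
            r = 1 if random.random() < true_means[i] else 0
            alpha[i] += r
            beta[i] += 1 - r
    return alpha, beta

random.seed(2)
means = [0.2, 0.3, 0.9, 0.4, 0.5]     # hypothetical arm means; arm 2 is best
a, b = budgeted_thompson(means, k=2)
pulls = [a[i] + b[i] - 2 for i in range(len(means))]
```

Even with only two observations per period, the sampling concentrates on the best arm; TSEC improves on this independent-arms baseline by sharing information across arms through its correlation model.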