
    UCB1 Based Reinforcement Learning Model for Adaptive Energy Management in Buildings

    This paper proposes a reinforcement learning model for intelligent energy management in buildings, using a UCB1-based approach. Energy management in buildings has become a critical task in recent years, due to incentives to increase energy efficiency and the penetration of renewable energy sources. Managing energy consumption, generation, and storage in this domain is, however, an arduous task, due to the large uncertainty of the different resources combined with the dynamic characteristics of this environment. In this scope, reinforcement learning is a promising way to make energy management methods adaptive, by learning from the ongoing changes in the environment. The model proposed in this paper aims to support decisions on the best actions to take at each moment in building energy management. A UCB1-based algorithm is applied, and the results are compared to those of an EXP3 approach and a simple reinforcement learning algorithm. Results show that the proposed approach achieves higher-quality results, reaching a higher rate of successful action identification than the other reference approaches. This work has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 641794 (project DREAM-GO) and from Project SIMOCE (ANI|P2020 17690).
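    As an illustration of the UCB1 rule this abstract builds on, here is a minimal sketch, assuming a small discrete action set, a reward_fn callback, and toy success probabilities as hypothetical stand-ins for the paper's building energy-management actions. UCB1 plays the action maximising the empirical mean plus an exploration bonus.

```python
import math
import random

def ucb1(n_actions, n_rounds, reward_fn):
    """Pick actions with the UCB1 rule: empirical mean plus exploration bonus."""
    counts = [0] * n_actions            # times each action has been tried
    means = [0.0] * n_actions           # running mean reward per action
    for t in range(1, n_rounds + 1):
        if t <= n_actions:              # play each action once to initialise
            a = t - 1
        else:
            a = max(range(n_actions),
                    key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        r = reward_fn(a)                # observe a reward in [0, 1]
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]
    return means

# Toy usage: three energy-management actions with hidden success rates.
probs = [0.2, 0.5, 0.8]
print(ucb1(3, 1000, lambda a: 1.0 if random.random() < probs[a] else 0.0))
```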

    Contextual Simulated Annealing Q-Learning for Pre-negotiation of Agent-Based Bilateral Negotiations

    Electricity markets are complex environments that have been undergoing continuous transformation due to the increase of renewable-based generation and the introduction of new players in the system. In this context, players are forced to rethink their behavior and learn how to act in this dynamic environment in order to gain as much benefit as possible from market negotiations. This paper introduces a new learning model that enables players to identify the expected prices of future bilateral agreements, as a way to improve the decision-making process when choosing which opponent players to approach for actual negotiations. The proposed model introduces a contextual dimension into the well-known Q-Learning algorithm and includes a simulated annealing process to accelerate convergence. The proposed model is integrated into a multi-agent decision support system for electricity market negotiations, enabling experimentation with real data from the Iberian electricity market. This work has received funding from the European Union's Horizon 2020 research and innovation programme under project DOMINOES (grant agreement No 771066) and from FEDER Funds through the COMPETE program and from National Funds through FCT under the project UID/EEA/00760/2019.
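    The abstract combines Q-Learning with a contextual dimension and a simulated annealing process; a minimal sketch of that combination follows, assuming a Boltzmann exploration policy with an annealed temperature, a discrete context set, and a stateless (bandit-style) Q update. The context names, reward_fn, and hyperparameters are illustrative; the paper's actual contextual dimension and cooling schedule are not specified here.

```python
import math
import random
from collections import defaultdict

def sa_q_learning(contexts, actions, reward_fn, episodes=5000,
                  alpha=0.1, t0=1.0, cooling=0.999):
    """Contextual Q-values learned under a Boltzmann policy whose
    temperature is annealed, so exploration fades into exploitation."""
    q = defaultdict(float)              # q[(context, action)] -> value estimate
    temp = t0
    for _ in range(episodes):
        c = random.choice(contexts)     # e.g. negotiation period or opponent type
        weights = [math.exp(q[(c, a)] / temp) for a in actions]
        a = random.choices(actions, weights=weights)[0]
        r = reward_fn(c, a)             # outcome of the simulated negotiation
        q[(c, a)] += alpha * (r - q[(c, a)])   # stateless, bandit-style update
        temp = max(0.01, temp * cooling)       # simulated-annealing cooling
    return q

# Toy usage: two contexts, three candidate opponents with context-dependent value.
q = sa_q_learning(["peak", "off_peak"], [0, 1, 2],
                  lambda c, a: random.gauss(a if c == "peak" else 2 - a, 0.1))
```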

    Correlated Bandits for Dynamic Pricing via the ARC algorithm

    The Asymptotic Randomised Control (ARC) algorithm provides a rigorous approximation to the optimal strategy for a wide class of Bayesian bandits, while retaining reasonable computational complexity. In particular, it allows a decision maker to observe signals in addition to their rewards, to incorporate correlations between the outcomes of different choices, and to have nontrivial dynamics for their estimates. The algorithm is guaranteed to asymptotically optimise the expected discounted payoff, with error depending on the initial uncertainty of the bandit. In this paper, we consider a batched bandit problem where observations arrive from a generalised linear model; we extend the ARC algorithm to this setting. We apply this to a classic dynamic pricing problem based on a Bayesian hierarchical model and demonstrate that the ARC algorithm outperforms alternative approaches.
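    The ARC index itself is an asymptotic approximation to the Bayes-optimal policy and is beyond a short sketch; as a hedged stand-in, the snippet below shows only the correlated Bayesian bandit structure the abstract relies on, with a joint Gaussian prior over price arms so that one observation updates beliefs about all arms, and with Thompson sampling swapped in for the ARC rule. The prices, correlation kernel, and true means are made up for illustration.

```python
import numpy as np

# Joint Gaussian prior over the expected revenue of each price arm:
# nearby prices are assumed to have correlated demand.
prices = np.array([1.0, 2.0, 3.0, 4.0])
n = len(prices)
cov = np.exp(-np.abs(prices[:, None] - prices[None, :]))
mu = np.zeros(n)
noise = 0.5                              # known observation-noise variance

true_means = np.array([0.5, 1.1, 0.9, 0.3])   # hidden expected revenues
rng = np.random.default_rng(0)

for t in range(500):
    sample = rng.multivariate_normal(mu, cov)  # Thompson draw from the posterior
    a = int(sample.argmax())                   # post the price with the best draw
    r = true_means[a] + rng.normal(0.0, noise ** 0.5)
    # Conjugate Gaussian update of the *joint* posterior after observing arm a,
    # so correlated arms are updated too.
    gain = cov[:, a] / (cov[a, a] + noise)
    mu = mu + gain * (r - mu[a])
    cov = cov - np.outer(gain, cov[a, :])

print(prices[mu.argmax()], mu.round(2))
```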

    TSEC: a framework for online experimentation under experimental constraints

    Thompson sampling is a popular algorithm for solving multi-armed bandit problems, and has been applied in a wide range of applications, from website design to portfolio optimization. In such applications, however, the number of choices (or arms) N can be large, and the data needed to make adaptive decisions require expensive experimentation. One is then faced with the constraint of experimenting on only a small subset of K ≪ N arms within each time period, which poses a problem for traditional Thompson sampling. We propose a new Thompson Sampling under Experimental Constraints (TSEC) method, which addresses this so-called "arm budget constraint". TSEC makes use of a Bayesian interaction model with effect hierarchy priors to model correlations between rewards on different arms. This fitted model is then integrated within Thompson sampling, to jointly identify a good subset of arms for experimentation and to allocate resources over these arms. We demonstrate the effectiveness of TSEC in two problems with arm budget constraints. The first is a simulated website optimization study, where TSEC shows noticeable improvements over industry benchmarks. The second is a portfolio optimization application on industry-based exchange-traded funds, where TSEC provides more consistent and greater wealth accumulation over standard investment strategies.
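    As a hedged illustration of the "arm budget constraint", the sketch below runs Thompson sampling while experimenting on only the top-k posterior draws each period. It uses independent Beta posteriors rather than TSEC's Bayesian interaction model with effect hierarchy priors, so it shows the constraint, not the full method; all parameters are toy values.

```python
import numpy as np

def budgeted_thompson(true_probs, k, n_rounds, seed=0):
    """Thompson sampling when only k of the N arms may be tried per period."""
    rng = np.random.default_rng(seed)
    n = len(true_probs)
    wins = np.ones(n)                   # Beta(1, 1) prior on each arm
    losses = np.ones(n)
    for _ in range(n_rounds):
        theta = rng.beta(wins, losses)          # one posterior draw per arm
        chosen = np.argsort(theta)[-k:]         # spend the budget on the top-k draws
        for a in chosen:
            r = rng.random() < true_probs[a]    # Bernoulli reward
            wins[a] += r
            losses[a] += 1 - r
    return wins / (wins + losses)               # posterior mean per arm

# Toy usage: N = 20 arms, budget of k = 3 experiments per period.
print(budgeted_thompson(np.linspace(0.1, 0.6, 20), k=3, n_rounds=2000).round(2))
```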