5 research outputs found

    Remote Renewable Energy Hubs

    6. Clean water and sanitation; 7. Affordable and clean energy; 8. Decent work and economic growth; 9. Industry, innovation and infrastructure; 12. Responsible consumption and production; 13. Climate action

    Distributional Reinforcement Learning-based Energy Arbitrage Strategies in Imbalance Settlement Mechanism

    Growth in the penetration of renewable energy sources makes supply more uncertain and leads to an increase in the system imbalance. This trend, together with single imbalance pricing, opens an opportunity for balance responsible parties (BRPs) to perform energy arbitrage in the imbalance settlement mechanism. To this end, we propose a battery control framework based on distributional reinforcement learning (DRL). Our proposed control framework takes a risk-sensitive perspective, allowing BRPs to adjust their risk preferences: we aim to optimize a weighted sum of the arbitrage profit and a risk measure while constraining the daily number of cycles for the battery. We assess the performance of our proposed control framework using the Belgian imbalance prices of 2022 and compare two state-of-the-art RL methods, deep Q-learning and soft actor-critic. Results reveal that the distributional soft actor-critic method can outperform other methods. Moreover, we note that our fully risk-averse agent appropriately learns to hedge against the risk related to the unknown imbalance price by (dis)charging the battery only when the agent is more certain about the price.
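
The weighted objective described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract only says "a risk measure", so the use of conditional value-at-risk (CVaR) here is an assumption, as are the function names, the `beta` weight, and the toy return samples. The battery cycle constraint is not modelled.

```python
import numpy as np

def cvar(samples, alpha=0.2):
    """Mean of the worst alpha-fraction of return samples (lower tail).

    CVaR is an assumed choice of risk measure for this illustration.
    """
    ordered = np.sort(np.asarray(samples, dtype=float))
    k = max(1, int(np.ceil(alpha * ordered.size)))
    return ordered[:k].mean()

def risk_sensitive_score(samples, beta, alpha=0.2):
    """Weighted sum of expected profit and the risk measure.

    beta = 0 is risk-neutral, beta = 1 is fully risk-averse (hypothetical
    parametrization).
    """
    samples = np.asarray(samples, dtype=float)
    return (1.0 - beta) * samples.mean() + beta * cvar(samples, alpha)

def pick_action(return_samples_per_action, beta, alpha=0.2):
    """Choose the action whose return distribution has the best score."""
    scores = {
        action: risk_sensitive_score(samples, beta, alpha)
        for action, samples in return_samples_per_action.items()
    }
    return max(scores, key=scores.get)

# Toy return distributions: charging is profitable on average but risky.
toy_returns = {
    "charge": [-10, -10, 20, 20, 20, 20, 20, 20, 20, 20],
    "idle": [0] * 10,
}
risk_neutral_choice = pick_action(toy_returns, beta=0.0)
risk_averse_choice = pick_action(toy_returns, beta=1.0)
```

With these toy numbers the risk-neutral setting prefers to charge, while the fully risk-averse setting stays idle because the lower tail of the charging returns is negative.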

    Learning optimal environments using projected stochastic gradient ascent

    In this work, we propose a new methodology for jointly sizing a dynamical system and designing its control law. First, the problem is formalized by considering parametrized reinforcement learning environments and parametrized policies. The objective of the optimization problem is to jointly find a control policy and an environment over the joint hypothesis space of parameters such that the sum of rewards gathered by the policy in this environment is maximal. The optimization problem is then addressed by generalizing the direct policy search algorithms to an algorithm we call Direct Environment Search with (projected stochastic) Gradient Ascent (DESGA). We illustrate the performance of DESGA on two benchmarks. First, we consider a parametrized space of Mass-Spring-Damper (MSD) environments and control policies. Then, we use our algorithm for optimizing the size of the components and the operation of a small-scale autonomous energy system, i.e., a solar off-grid microgrid, composed of photovoltaic panels, batteries, etc. On both benchmarks, we compare the results of the execution of DESGA with a theoretical upper bound on the expected return. Furthermore, the performance of DESGA is compared to an alternative algorithm. The latter performs a grid discretization of the environment's hypothesis space and applies the REINFORCE algorithm to identify pairs of environments and policies resulting in a high expected return. The choice of this algorithm is also discussed and motivated. On both benchmarks, we show that DESGA and the alternative algorithm result in a set of parameters for which the expected return is nearly equal to its theoretical upper bound. Nevertheless, the execution of DESGA is much less computationally costly.
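
The core idea of projected stochastic gradient ascent over a joint (environment, policy) parameter vector can be sketched on a toy problem. This is not DESGA itself: the quadratic return, the analytic gradient with added Gaussian noise, the box constraint on the environment parameter `psi`, and all names below are assumptions made for illustration only.

```python
import numpy as np

def projected_sga(grad_fn, project, x0, step=0.05, n_steps=2000, seed=0):
    """Projected stochastic gradient ascent: step along a noisy gradient,
    then project back onto the feasible set."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = project(x + step * grad_fn(x, rng))
    return x

def noisy_grad(x, rng):
    """Noisy gradient of the toy return -(psi - 3)^2 - (theta - 1)^2.

    psi stands in for an environment (sizing) parameter and theta for a
    policy parameter; both are hypothetical.
    """
    psi, theta = x
    grad = np.array([-2.0 * (psi - 3.0), -2.0 * (theta - 1.0)])
    return grad + rng.normal(scale=0.1, size=2)

def project(x):
    """Keep the environment parameter in the assumed feasible box [0, 2];
    the policy parameter is left unconstrained."""
    return np.array([np.clip(x[0], 0.0, 2.0), x[1]])

psi_star, theta_star = projected_sga(noisy_grad, project, x0=[0.5, -1.0])
```

In this toy setting the unconstrained optimum for `psi` lies outside the feasible box, so the projection pins it to the boundary while `theta` settles near its unconstrained optimum.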

    A Deep Reinforcement Learning Framework for Continuous Intraday Market Bidding

    The large-scale integration of variable energy resources is expected to shift a large part of the energy exchanges closer to real-time, where more accurate forecasts are available. In this context, the short-term electricity markets and in particular the intraday market are considered a suitable trading floor for these exchanges to occur. A key component of successful renewable energy integration is the use of energy storage. In this paper, we propose a novel modelling framework for the strategic participation of energy storage in the European continuous intraday market where exchanges occur through a centralized order book. The goal of the storage device operator is the maximization of the profits received over the entire trading horizon, while taking into account the operational constraints of the unit. The sequential decision-making problem of trading in the intraday market is modelled as a Markov Decision Process. An asynchronous version of the fitted Q iteration algorithm is chosen for solving this problem due to its sample efficiency. The large and variable number of existing orders in the order book motivates the use of high-level actions and an alternative state representation. Historical data are used for the generation of a large number of artificial trajectories in order to address exploration issues during the learning process. The resulting policy is back-tested and compared against a number of benchmark strategies. Finally, the impact of the storage characteristics on the total revenues collected in the intraday market is evaluated.
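
For readers unfamiliar with fitted Q iteration, the following is a minimal sketch of the plain batch version on a toy chain problem. It is not the asynchronous variant used in the paper, and it does not model an order book or a storage unit: the one-hot features, the least-squares regressor, the three-state chain, and all names are assumptions chosen to keep the example self-contained.

```python
import numpy as np

def fitted_q_iteration(transitions, n_states, n_actions, gamma=0.9, n_iters=50):
    """Batch fitted Q iteration with a linear least-squares regressor.

    transitions: list of (state, action, reward, next_state, done) tuples.
    Returns an array Q of shape (n_states, n_actions).
    """
    def features(s, a):
        # One-hot encoding of the (state, action) pair.
        v = np.zeros(n_states * n_actions)
        v[s * n_actions + a] = 1.0
        return v

    X = np.array([features(s, a) for s, a, _, _, _ in transitions])
    w = np.zeros(n_states * n_actions)
    for _ in range(n_iters):
        q = w.reshape(n_states, n_actions)
        # Regression targets from the previous Q estimate (Bellman backup).
        y = np.array([
            r if done else r + gamma * q[s2].max()
            for _, _, r, s2, done in transitions
        ])
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w.reshape(n_states, n_actions)

# Toy chain: states 0 and 1 are non-terminal, state 2 is terminal.
# Action 0 moves left (or stays at state 0), action 1 moves right.
batch = [
    (0, 0, 0.0, 0, False),
    (0, 1, 0.0, 1, False),
    (1, 0, 0.0, 0, False),
    (1, 1, 1.0, 2, True),
]
Q = fitted_q_iteration(batch, n_states=3, n_actions=2)
greedy_policy = Q[:2].argmax(axis=1)
```

Each iteration refits the regressor on targets built from the previous Q estimate, which is what lets the method reuse a fixed batch of historical or artificially generated transitions.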