5 research outputs found
Remote Renewable Energy Hubs
Sustainable Development Goals: 6. Clean water and sanitation; 7. Affordable and clean energy; 8. Decent work and economic growth; 9. Industry, innovation and infrastructure; 12. Responsible consumption and production; 13. Climate action
Distributional Reinforcement Learning-based Energy Arbitrage Strategies in Imbalance Settlement Mechanism
Growth in the penetration of renewable energy sources makes supply more
uncertain and leads to an increase in the system imbalance. This trend,
together with the single imbalance pricing, opens an opportunity for balance
responsible parties (BRPs) to perform energy arbitrage in the imbalance
settlement mechanism. To this end, we propose a battery control framework based
on distributional reinforcement learning (DRL). Our proposed control framework
takes a risk-sensitive perspective, allowing BRPs to adjust their risk
preferences: we aim to optimize a weighted sum of the arbitrage profit and a
risk measure while constraining the daily number of cycles for the battery. We
assess the performance of our proposed control framework using the Belgian
imbalance prices of 2022 and compare two state-of-the-art RL methods: deep
Q-learning and soft actor-critic. Results reveal that the distributional soft
actor-critic method outperforms the other methods. Moreover, we note that our
fully risk-averse agent appropriately learns to hedge against the risk related
to the unknown imbalance price by (dis)charging the battery only when the agent
is more certain about the price.
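The risk-sensitive objective above, a weighted sum of the arbitrage profit and a risk measure, can be sketched from the quantile estimates of a distributional critic. The snippet below is a minimal illustration, not the paper's implementation: it assumes CVaR as the risk measure, invented quantile values, and a hypothetical `beta` weight standing in for the BRP's risk preference.

```python
import numpy as np

def risk_sensitive_objective(quantiles, beta=0.5, alpha=0.25):
    """Weighted sum of expected return and a risk measure, computed from
    the quantile estimates of a distributional critic.

    quantiles : per-action return quantiles, shape (n_quantiles,)
    beta      : risk-aversion weight in [0, 1]; beta=1 is fully risk-averse
    alpha     : CVaR level, i.e. the mean of the worst alpha-fraction
    """
    expected = quantiles.mean()                       # risk-neutral term
    k = max(1, int(np.ceil(alpha * len(quantiles))))  # size of the worst tail
    cvar = np.sort(quantiles)[:k].mean()              # CVaR_alpha estimate
    return (1.0 - beta) * expected + beta * cvar

# Invented quantiles: charging has a high expected profit but wide
# uncertainty about the imbalance price; idling is a certain zero.
q = {"charge": np.array([-5.0, 1.0, 2.0, 3.0, 10.0]),
     "idle":   np.array([0.0, 0.0, 0.0, 0.0, 0.0])}

best = max(q, key=lambda a: risk_sensitive_objective(q[a], beta=1.0))
```

With `beta=1.0` (fully risk-averse) the certain zero-profit action wins over the uncertain charging action, mirroring the hedging behaviour noted above; with `beta=0.0` the agent reverts to maximizing expected profit and charges.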
Learning optimal environments using projected stochastic gradient ascent
In this work, we propose a new methodology for jointly sizing a dynamical
system and designing its control law. First, the problem is formalized by
considering parametrized reinforcement learning environments and parametrized
policies. The objective of the optimization problem is to jointly find a
control policy and an environment over the joint hypothesis space of parameters
such that the sum of rewards gathered by the policy in this environment is
maximal. The optimization problem is then addressed by generalizing the direct
policy search algorithms to an algorithm we call Direct Environment Search with
(projected stochastic) Gradient Ascent (DESGA). We illustrate the performance
of DESGA on two benchmarks. First, we consider a parametrized space of
Mass-Spring-Damper (MSD) environments and control policies. Then, we use our
algorithm for optimizing the size of the components and the operation of a
small-scale autonomous energy system, i.e. a solar off-grid microgrid, composed
of photovoltaic panels, batteries, etc. On both benchmarks, we compare the
results of the execution of DESGA with a theoretical upper-bound on the
expected return. Furthermore, the performance of DESGA is compared to an
alternative algorithm. The latter performs a grid discretization of the
environment's hypothesis space and applies the REINFORCE algorithm to identify
pairs of environments and policies resulting in a high expected return. The
choice of this algorithm is also discussed and motivated. On both benchmarks,
we show that DESGA and the alternative algorithm result in a set of parameters
for which the expected return is nearly equal to its theoretical upper-bound.
Nevertheless, the execution of DESGA is much less computationally costly.
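As a rough illustration of the joint ascent that DESGA performs, the sketch below runs projected gradient ascent over a scalar environment parameter (a component sizing) and a scalar policy parameter. The quadratic surrogate `expected_return`, the box bounds, and all numbers are invented for illustration; the actual algorithm would use stochastic gradient estimates obtained from rollouts rather than analytic gradients.

```python
import numpy as np

def project(theta, lo, hi):
    """Euclidean projection onto the box [lo, hi] of feasible sizings."""
    return np.clip(theta, lo, hi)

def expected_return(env, pi):
    # Hypothetical smooth surrogate for the expected sum of rewards of
    # policy parameter `pi` in the environment sized by `env`.
    return -(env - 2.0) ** 2 - (pi - env) ** 2

def desga(lo=0.0, hi=1.5, lr=0.1, steps=500):
    """Joint (projected stochastic) gradient ascent over environment and
    policy parameters; here analytic gradients of the toy surrogate stand
    in for the stochastic estimates."""
    env, pi = 0.5, 0.0
    for _ in range(steps):
        g_env = -2.0 * (env - 2.0) + 2.0 * (pi - env)  # d return / d env
        g_pi = -2.0 * (pi - env)                       # d return / d pi
        env = project(env + lr * g_env, lo, hi)  # sizing is constrained
        pi = pi + lr * g_pi                      # policy params are free
    return env, pi

env_opt, pi_opt = desga()
```

The unconstrained optimum (`env = 2`) is infeasible here, so the projected ascent settles on the boundary `env = 1.5` with the policy parameter matched to it, which is the kind of sizing/operation coupling the benchmarks exercise.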
A Deep Reinforcement Learning Framework for Continuous Intraday Market Bidding
The large integration of variable energy resources is expected to shift a large part of the energy exchanges closer to real-time, where more accurate forecasts are available. In this context, the short-term electricity markets, and in particular the intraday market, are considered a suitable trading floor for these exchanges to occur. A key component of the successful integration of renewable energy sources is the use of energy storage. In this paper, we propose a novel modelling framework for the strategic participation of energy storage in the European continuous intraday market, where exchanges occur through a centralized order book. The goal of the storage device operator is the maximization of the profits received over the entire trading horizon, while taking into account the operational constraints of the unit. The sequential decision-making problem of trading in the intraday market is modelled as a Markov Decision Process. An asynchronous version of the fitted Q iteration algorithm is chosen for solving this problem due to its sample efficiency. The large and variable number of existing orders in the order book motivates the use of high-level actions and an alternative state representation. Historical data are used to generate a large number of artificial trajectories in order to address exploration issues during the learning process. The resulting policy is back-tested and compared against a number of benchmark strategies. Finally, the impact of the storage characteristics on the total revenues collected in the intraday market is evaluated.
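The core of fitted Q iteration can be sketched in a few lines. The toy below is a tabular stand-in, not the paper's asynchronous implementation: the discretized price states, the high-level actions, and the transitions are all invented, and the per-(state, action) mean plays the role of the supervised regressor that function approximation would use.

```python
import numpy as np

# Invented one-step transitions (state, action, reward, next_state) standing
# in for the artificial trajectories generated from historical data. States
# are discretized price levels; actions are high-level
# {0: idle, 1: buy, 2: sell}.
transitions = [
    (0, 1, -1.0, 1),  # buy at a low price
    (1, 0,  0.0, 2),  # hold while the price rises
    (2, 2,  3.0, 0),  # sell at a high price
    (0, 0,  0.0, 0),
    (1, 2,  1.0, 0),
    (2, 0,  0.0, 2),
]

def fitted_q_iteration(transitions, n_states=3, n_actions=3,
                       gamma=0.95, iterations=200):
    """Each sweep rebuilds the Q-function by regressing on the bootstrapped
    targets r + gamma * max_a' Q(s', a'). Here the 'regressor' is a table
    (the per-(s, a) mean); with function approximation, a supervised
    learner would be fitted to the same targets."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(iterations):
        targets = {}
        for s, a, r, s_next in transitions:
            targets.setdefault((s, a), []).append(r + gamma * Q[s_next].max())
        Q_new = np.zeros_like(Q)
        for (s, a), ys in targets.items():
            Q_new[s, a] = np.mean(ys)  # least-squares fit reduces to the mean
        Q = Q_new
    return Q

Q = fitted_q_iteration(transitions)
policy = Q.argmax(axis=1)  # greedy high-level action per price state
```

On this toy data the greedy policy recovers the expected buy-low, hold, sell-high behaviour; the paper's contribution lies in making this loop work on the continuous intraday order book with high-level actions and an alternative state representation.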