Optimal Hour-Ahead Bidding in the Real-Time Electricity Market with Battery Storage using Approximate Dynamic Programming
There is growing interest in the use of grid-level storage to smooth
variations in supply that are likely to arise with increased use of wind and
solar energy. Energy arbitrage, the process of buying, storing, and selling
electricity to exploit variations in electricity spot prices, is becoming an
important way of paying for expensive investments into grid-level storage.
Independent system operators such as the NYISO (New York Independent System
Operator) require that battery storage operators place bids into an hour-ahead
market (although settlements may occur in increments as small as 5 minutes,
which is considered near "real-time"). The operator has to place these bids
without knowing the energy level in the battery at the beginning of the hour,
while simultaneously accounting for the value of leftover energy at the end of
the hour. The problem is formulated as a dynamic program. We describe and
employ a convergent approximate dynamic programming (ADP) algorithm that
exploits monotonicity of the value function to find a revenue-generating
bidding policy; using optimal benchmarks, we empirically show the computational
benefits of the algorithm. Furthermore, we propose a distribution-free variant
of the ADP algorithm that does not require any knowledge of the distribution of
the price process (and makes no assumptions regarding a specific real-time
price model). We demonstrate that a policy trained on historical real-time
price data from the NYISO using this distribution-free approach is indeed
effective. Comment: 28 pages, 11 figures
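As a rough illustration of what "exploiting monotonicity of the value function" can look like, the sketch below applies a monotonicity-preserving projection to a one-dimensional table of value estimates. It assumes the value is nondecreasing in a single scalar state (for example, the battery's energy level). The function name `monotone_project` and the one-dimensional setting are assumptions made for this example and are not taken from the abstract, and the paper's actual algorithm may differ.

```python
def monotone_project(values, idx, new_value):
    """Write new_value at position idx, then restore nondecreasing order.

    Hypothetical helper: `values` holds value estimates indexed by a scalar
    state (e.g. discretized energy level), assumed nondecreasing in the index.
    """
    out = list(values)
    out[idx] = new_value
    for j in range(len(out)):
        if j < idx and out[j] > new_value:
            # Lower states may not be worth more than the updated state.
            out[j] = new_value
        elif j > idx and out[j] < new_value:
            # Higher states may not be worth less than the updated state.
            out[j] = new_value
    return out


# A single noisy observation at index 2 also corrects its neighbours.
estimates = [0.0, 1.0, 2.0, 3.0, 4.0]
raised = monotone_project(estimates, 2, 3.5)
lowered = monotone_project(estimates, 2, 0.5)
```

The point of such a step is that one observation updates many states at once, which is one plausible source of the computational benefit the abstract mentions.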
Policy Search: Any Local Optimum Enjoys a Global Performance Guarantee
Local Policy Search is a popular reinforcement learning approach for handling
large state spaces. Formally, it searches locally in a parameterized policy
space in order to maximize the associated value function averaged over some
predefined distribution. It is probably commonly believed that the best one
can hope for in general from such an approach is a local optimum of this
criterion. In this article, we show the following surprising result:
\emph{any} (approximate) \emph{local optimum} enjoys a \emph{global performance
guarantee}. We compare this guarantee with the one that is satisfied by Direct
Policy Iteration, an approximate dynamic programming algorithm that does some
form of Policy Search: although the approximation error of Local Policy Search may
generally be bigger (because local search requires considering a space of
stochastic policies), we argue that the concentrability coefficient that appears
in the performance bound is much nicer. Finally, we discuss several practical
and theoretical consequences of our analysis.
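To make the setting concrete, here is a minimal sketch of local search over a parameterized space of stochastic policies, maximizing the value function averaged over a fixed distribution. Everything in it is an assumption chosen for illustration: the two-state, two-action MDP (`P`, `R`), the weighting distribution `MU`, the discount `GAMMA`, and the coordinate-wise hill-climbing rule. It is not the procedure analysed in the paper, only an instance of the kind of search the abstract describes.

```python
GAMMA = 0.9
# Toy MDP (assumed numbers): P[s][a] lists next-state probabilities,
# R[s][a] is the immediate reward for action a in state s.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.7, 0.3], [0.1, 0.9]]]
R = [[0.0, 1.0],
     [0.5, 2.0]]
MU = [0.5, 0.5]  # predefined distribution used to average the value function


def evaluate(theta, sweeps=500):
    """Iterative policy evaluation; theta[s] is the probability of action 1 in state s."""
    v = [0.0, 0.0]
    for _ in range(sweeps):
        new_v = []
        for s in range(2):
            total = 0.0
            for a, prob_a in ((0, 1.0 - theta[s]), (1, theta[s])):
                q = R[s][a] + GAMMA * sum(P[s][a][t] * v[t] for t in range(2))
                total += prob_a * q
            new_v.append(total)
        v = new_v
    return v


def objective(theta):
    """Value function of the policy, averaged over MU."""
    return sum(m * x for m, x in zip(MU, evaluate(theta)))


def local_search(theta, step=0.05, max_iters=1000):
    """Coordinate-wise hill climbing; stops when no neighbouring policy improves."""
    theta = list(theta)
    best = objective(theta)
    for _ in range(max_iters):
        improved = False
        for i in range(len(theta)):
            for delta in (step, -step):
                cand = list(theta)
                cand[i] = min(1.0, max(0.0, cand[i] + delta))
                val = objective(cand)
                if val > best + 1e-12:
                    theta, best, improved = cand, val, True
        if not improved:
            break
    return theta, best


start = [0.5, 0.5]
theta_star, j_star = local_search(start)
```

The search only ever accepts improving moves, so the returned policy is a local optimum at the chosen step size; the abstract's claim concerns how good such a point must be globally.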