
    Reinforcement learning for power scheduling in a grid-tied PV-battery electric vehicle charging station

    Grid-tied renewable energy source (RES) based electric vehicle (EV) charging stations are an example of a distributed generator behind-the-meter system (DGBMS), which characterizes much of modern power infrastructure. To perform power scheduling in such a DGBMS, stochastic variables such as the load profile of the charging station, the output profile of the RES, and the tariff profile of the utility must be considered at every decision step. The stochasticity of this kind of optimization environment makes power scheduling a challenging task that deserves substantial research attention.

    This dissertation investigates the application of reinforcement learning (RL) techniques to the power scheduling problem in a grid-tied, PV-powered EV charging station that incorporates a battery energy storage system. RL is a reward-motivated optimization technique derived from the way animals learn to optimize their behaviour in a new environment. Unlike other optimization methods, such as numerical and soft computing techniques, RL does not require an accurate model of the optimization environment to arrive at an optimal solution. This study developed two RL algorithms, namely an asynchronous Q-learning algorithm and an advantage actor-critic (A2C) algorithm, and evaluated their feasibility for power scheduling in the EV charging station under static conditions. To assess the performance of the proposed algorithms, the conventional Q-learning and actor-critic algorithms were implemented for comparison of global cost convergence and learning characteristics.

    First, the power scheduling problem was expressed as a sequential decision-making process, and an asynchronous Q-learning algorithm was developed to solve it. An advantage actor-critic (A2C) algorithm was then developed and applied to the same problem. The two algorithms were tested using 24-hour load, generation, and utility grid tariff profiles under static optimization conditions. The asynchronous Q-learning algorithm was compared with the conventional Q-learning method in terms of global cost, stability, and scalability; likewise, the A2C was compared with the conventional actor-critic method in terms of stability, scalability, and convergence.

    Simulation results showed that both developed algorithms (the asynchronous Q-learning algorithm and the A2C) converged to lower global costs and displayed more stable learning characteristics than their conventional counterparts. This research established that properly restricting the action space of a Q-learning algorithm improves its stability and convergence, although such a restriction may come at the cost of computational speed and scalability. Of the four algorithms analysed, the A2C produced the power schedule with the lowest global cost and the best usage of the battery energy storage system.
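    The asynchronous variant itself is not detailed in the abstract, but the core idea of tabular Q-learning over a 24-hour scheduling horizon with a restricted (feasibility-checked) action space can be sketched as below. All profiles, dimensions, and hyperparameters are illustrative assumptions, not the dissertation's actual configuration.

```python
# Illustrative sketch only: tabular Q-learning for a 24-step scheduling
# horizon with a restricted battery action space. Profiles and parameters
# are hypothetical; the dissertation's encoding is not specified above.
import numpy as np

HOURS = 24                 # one decision step per hour
SOC_LEVELS = 11            # discretized battery state of charge (0..10)
ACTIONS = [-1, 0, 1]       # discharge, idle, charge (restricted set)
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

# Hypothetical 24-hour load, PV, and tariff profiles (arbitrary units).
load = rng.uniform(5, 15, HOURS)
pv = np.clip(10 * np.sin(np.linspace(0, np.pi, HOURS)), 0, None)
tariff = np.where((np.arange(HOURS) >= 17) & (np.arange(HOURS) <= 20), 2.0, 1.0)

Q = np.zeros((HOURS, SOC_LEVELS, len(ACTIONS)))

def step_cost(t, a):
    """Grid import cost after PV and battery contribution at hour t."""
    battery_power = 2.0 * ACTIONS[a]          # +2 kW charging, -2 kW discharging
    grid_import = max(load[t] - pv[t] + battery_power, 0.0)
    return tariff[t] * grid_import

for episode in range(5000):
    soc = SOC_LEVELS // 2
    for t in range(HOURS):
        # Action-space restriction: only moves that keep the SOC in bounds.
        feasible = [a for a in range(len(ACTIONS))
                    if 0 <= soc + ACTIONS[a] < SOC_LEVELS]
        if rng.random() < EPSILON:
            a = int(rng.choice(feasible))
        else:
            a = max(feasible, key=lambda i: Q[t, soc, i])
        soc2 = soc + ACTIONS[a]
        target = -step_cost(t, a)             # reward = negative cost
        if t + 1 < HOURS:
            target += GAMMA * Q[t + 1, soc2].max()
        Q[t, soc, a] += ALPHA * (target - Q[t, soc, a])
        soc = soc2
```

    Under this restriction, infeasible charge/discharge moves are never explored, which is one way the stability benefit noted in the abstract can arise.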

    Optimal energy management for a grid-tied solar PV-battery microgrid: A reinforcement learning approach

    There has been a shift towards energy sustainability in recent years, and this shift should continue. The steady growth of energy demand due to population growth, heightened concerns about the amount of anthropogenic gases released into the atmosphere, and the deployment of advanced grid technologies have spurred the penetration of renewable energy resources (RERs) at different locations and scales in the power grid. As a result, the energy system is moving away from the centralized paradigm of large, controllable power plants and toward a decentralized network based on renewables. Microgrids, whether grid-connected or islanded, provide a key solution for integrating RERs, load demand flexibility, and energy storage systems within this framework. Nonetheless, renewable energy resources such as solar and wind can be extremely stochastic because they are weather dependent. Coupled with load demand uncertainties, these resources lead to random variations on both the generation and load sides, challenging optimal energy management.

    This thesis develops an optimal energy management system (EMS) for a grid-tied solar PV-battery microgrid. The goal of the EMS is to minimize operational costs (the cost of power exchanged with the utility plus battery wear cost) while respecting network constraints, which ensure that grid violations are avoided. A reinforcement learning (RL) approach is proposed to minimize the operational cost of the microgrid in this stochastic setting. RL is a reward-motivated optimization technique derived from how animals learn to optimize their behaviour in new environments. Unlike other conventional model-based optimization approaches, RL does not need an explicit model of the optimization system to obtain optimal solutions. The EMS is modelled as a Markov decision process (MDP) defined by its state, action, and reward function. Two RL algorithms, namely a conventional Q-learning algorithm and a deep Q network algorithm, are developed, and their efficacy in performing optimal energy management for the designed system is evaluated in this thesis.

    First, the energy management problem is expressed as a sequential decision-making process, after which two variants, a trading and a non-trading algorithm, are developed. In the trading case, the microgrid's excess energy can be sold back to the utility to increase revenue, while in the non-trading case constraining rules are embedded in the designed EMS to ensure that no excess energy is sold back to the utility. A Q-learning algorithm is then developed to minimize the operational cost of the microgrid under unknown future information. Finally, to evaluate the performance of the proposed EMS, the trading and non-trading EMS models are compared using a typical commercial load curve and PV generation profile over a 24-hour horizon. Numerical simulation results indicated that the algorithm learned to select an optimized energy schedule that minimizes energy cost (the cost of power purchased from the utility under the time-varying tariff plus battery wear cost) in both summer and winter case studies. Comparing the operational costs of the two models, the trading EMS reduced cost by 4.033% in the summer season and 2.199% in the winter season relative to the non-trading EMS.
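    The trading/non-trading distinction amounts to a different one-step cost signal seen by the learner. The sketch below shows one plausible formulation; the feed-in price, battery wear model, and one-hour step are assumptions for illustration, not the thesis's exact cost function.

```python
# Hedged sketch of a trading vs. non-trading one-step operational cost.
# All parameter names and values here are hypothetical.
def operational_cost(net_power_kw, buy_tariff, sell_tariff,
                     battery_power_kw, wear_cost_per_kwh, trading=True):
    """One-step operational cost: grid exchange plus battery wear.

    net_power_kw > 0 means the microgrid imports from the utility;
    net_power_kw < 0 means there is excess PV/battery energy.
    """
    if net_power_kw >= 0:
        grid_cost = buy_tariff * net_power_kw
    elif trading:
        grid_cost = -sell_tariff * (-net_power_kw)  # revenue from export
    else:
        grid_cost = 0.0  # non-trading: excess energy earns no revenue
    wear = wear_cost_per_kwh * abs(battery_power_kw)  # 1-hour step assumed
    return grid_cost + wear

# Example: 3 kW of excess generation while the battery charges at 2 kW.
print(operational_cost(-3.0, buy_tariff=1.5, sell_tariff=0.8,
                       battery_power_kw=2.0, wear_cost_per_kwh=0.05,
                       trading=True))   # negative grid term = export revenue
```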
    Secondly, a deep Q network (DQN) method that uses recent learning-algorithm enhancements, including experience replay and a target network, is developed to learn the system uncertainties, including load demand, grid prices, and the volatile power supply from the renewables, and to solve the optimal energy management problem. Unlike the Q-learning method, which updates the Q-function using a lookup table (limiting its scalability and overall performance in stochastic optimization), the DQN method uses a deep neural network that approximates the Q-function via statistical regression. The performance of the proposed method is evaluated with differently fluctuating load profiles, i.e., slow, medium, and fast. Simulation results substantiated the efficacy of the proposed method: the algorithm learned from experience to raise the battery state of charge and optimally shift loads in time, thus supporting the utility grid by reducing the aggregate peak load. Furthermore, the performance of the proposed DQN approach was compared to the conventional Q-learning algorithm in terms of achieving a minimum global cost. Simulation results showed that the DQN algorithm outperformed the conventional Q-learning approach, reducing system operational costs by 15%, 24%, and 26% for the slow, medium, and fast fluctuating load profiles in the studied cases, respectively.
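    The two enhancements named above, experience replay and a target network, are standard DQN machinery. The following minimal PyTorch sketch shows how they fit together in one training step; the state encoding, network sizes, and hyperparameters are illustrative assumptions rather than the thesis's actual configuration.

```python
# Minimal DQN training-step sketch: experience replay plus a target network.
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM = 4    # e.g., [hour, SOC, load, tariff] (assumed encoding)
N_ACTIONS = 3    # discharge / idle / charge
GAMMA = 0.95

def make_net():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                         nn.Linear(64, 64), nn.ReLU(),
                         nn.Linear(64, N_ACTIONS))

policy_net = make_net()
target_net = make_net()
target_net.load_state_dict(policy_net.state_dict())  # start in sync

optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)  # experience replay buffer of (s, a, r, s2, done)

def train_step(batch_size=64):
    """One gradient step on a random minibatch drawn from the replay buffer."""
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)
    s, a, r, s2, done = map(torch.as_tensor, zip(*batch))
    s, s2, r = s.float(), s2.float(), r.float()

    # Q(s, a) from the online (policy) network.
    q = policy_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)

    # Bootstrapped target from the frozen target network.
    with torch.no_grad():
        q_next = target_net(s2).max(dim=1).values
        target = r + GAMMA * q_next * (1 - done.float())

    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Periodically (e.g., every few hundred steps) re-sync the target network:
# target_net.load_state_dict(policy_net.state_dict())
```

    Sampling minibatches from the replay buffer breaks the temporal correlation of consecutive experiences, and the periodically updated target network keeps the regression target stable between syncs, which is why these two enhancements are typically paired.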