
    Q-learning with Experience Replay in a Dynamic Environment

    Most research in reinforcement learning has focused on stationary environments. In this paper, we propose several adaptations of Q-learning for a dynamic environment, for both single and multiple agents. The environment consists of a grid of random rewards, where every reward is removed after a visit. We focus on experience replay, a technique that has recently received considerable attention, and combine it with Q-learning. We compare two variations of experience replay, in which experiences are reused based either on time or on the obtained reward. For multi-agent reinforcement learning we compare two variations of policy representation: in the first the agents share a Q-function, while in the second each agent has its own Q-function. Furthermore, in both variations we test the effect of reward sharing between the agents. This leads to four different multi-agent reinforcement learning algorithms, of which sharing a Q-function and sharing the rewards is the most cooperative method. The results show that in the single-agent environment both experience replay algorithms significantly outperform standard Q-learning and a greedy benchmark agent. In the multi-agent environment the highest maximum reward sum in a trial is achieved by using one Q-function and reward sharing, while the highest mean reward sum is obtained with separate Q-functions and separate rewards.
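
    The following is a minimal sketch, not the authors' implementation, of tabular Q-learning with a reward-based experience replay buffer on a grid whose rewards disappear once collected. The grid size, hyperparameters, episode lengths, and reward placement are illustrative assumptions.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up
SIZE = 5

def step(state, action, rewards):
    x, y = state
    dx, dy = action
    nx = min(max(x + dx, 0), SIZE - 1)
    ny = min(max(y + dy, 0), SIZE - 1)
    r = rewards.pop((nx, ny), 0.0)            # reward is removed after a visit
    return (nx, ny), r

def update(Q, s, a, r, s2):
    best_next = max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

Q = defaultdict(float)
buffer = []                                    # stored transitions (s, a, r, s')

for episode in range(200):
    # Random rewards scattered over the grid (assumed setup).
    rewards = {(random.randrange(SIZE), random.randrange(SIZE)): random.random()
               for _ in range(10)}
    state = (0, 0)
    for t in range(50):
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward = step(state, action, rewards)
        buffer.append((state, action, reward, next_state))
        update(Q, state, action, reward, next_state)
        state = next_state

    # Reward-based replay: reuse the highest-reward experiences first.
    # (A time-based variant would instead replay the most recent experiences.)
    for s, a, r, s2 in sorted(buffer, key=lambda e: e[2], reverse=True)[:100]:
        update(Q, s, a, r, s2)
```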

    Reinforcement Learning Exploration Algorithms for Energy Harvesting Communications Systems

    Prolonging the lifetime and maximizing the throughput are important factors in designing an efficient communications system, especially for energy harvesting-based systems. In this work, the problem of maximizing the throughput of a point-to-point energy harvesting communications system while prolonging its lifetime is investigated. This work considers a more realistic communications system that has no a priori knowledge of the environment. The system consists of a transmitter and a receiver. The transmitter is equipped with an infinite buffer to store data and with energy harvesting capability to harvest renewable energy and store it in a finite battery. The problem of finding an efficient power allocation policy is formulated as a reinforcement learning problem. Two different exploration algorithms are used: the convergence-based and the epsilon-greedy algorithms. The first algorithm uses the action-value function convergence error and the exploration time threshold to balance exploration and exploitation, while the second algorithm achieves this balance through the exploration probability (i.e., epsilon). Simulation results show that the convergence-based algorithm outperforms the epsilon-greedy algorithm. Finally, the effects of the parameters of each algorithm are investigated.
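
    A minimal sketch of the two exploration rules compared in this work, applied to a generic Q-learning action choice such as selecting a transmit power level. The threshold values, variable names, and Q-table representation are assumptions for illustration, not the paper's exact parameters.

```python
import random

EPSILON = 0.1              # exploration probability for epsilon-greedy (assumed)
CONV_ERROR = 1e-3          # convergence error threshold on the Q-value update (assumed)
EXPLORE_TIME = 1000        # exploration time threshold in steps (assumed)

def epsilon_greedy(Q, state, actions):
    """Explore with fixed probability epsilon, otherwise exploit the best action."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

def convergence_based(Q, state, actions, last_update_error, t):
    """Explore while the action-value function is still changing noticeably and
    the exploration time budget has not been exhausted; afterwards exploit."""
    if last_update_error > CONV_ERROR and t < EXPLORE_TIME:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))
```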