4 research outputs found
Online Learning Schemes for Power Allocation in Energy Harvesting Communications
We consider the problem of power allocation over a time-varying channel with
unknown distribution in energy harvesting communication systems. In this
problem, the transmitter has to choose the transmit power based on the amount
of stored energy in its battery with the goal of maximizing the average rate
obtained over time. We model this problem as a Markov decision process (MDP)
with the transmitter as the agent, the battery status as the state, the
transmit power as the action and the rate obtained as the reward. The average
reward maximization problem over the MDP can be solved by a linear program (LP)
that uses the transition probabilities for the state-action pairs and their
reward values to choose a power allocation policy. Since the rewards associated
with the state-action pairs are unknown, we propose two online learning
algorithms, UCLP and Epoch-UCLP, that learn these rewards and adapt their policies along the
way. The UCLP algorithm solves the LP at each step to decide its current policy
using the upper confidence bounds on the rewards, while the Epoch-UCLP
algorithm divides the time into epochs, solves the LP only at the beginning of
the epochs and follows the obtained policy in that epoch. We prove that the
reward losses or regrets incurred by both these algorithms are upper bounded by
constants. Epoch-UCLP incurs higher regret than UCLP, but substantially reduces
the computational requirements. We also show that, with minor changes, the
presented algorithms work for online learning in cost minimization problems
such as packet scheduling with a power-delay tradeoff.
Comment: This paper is under submission to the IEEE Transactions on Information Theory
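The average-reward LP that UCLP re-solves with optimistic reward estimates can be sketched as follows. This is a simplified illustration, not the paper's implementation: the two-state, two-action MDP, its transition probabilities, the running reward averages, and the confidence-bonus form are all hypothetical choices made for the example.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy MDP: 2 battery states, 2 power actions.
# P[s, a, s'] = transition probability (assumed known, as in the abstract).
P = np.array([[[0.7, 0.3], [0.4, 0.6]],
              [[0.5, 0.5], [0.2, 0.8]]])
n_s, n_a = 2, 2

def solve_average_reward_lp(ucb):
    """Maximize sum_{s,a} x[s,a]*ucb[s,a] over occupation measures x,
    subject to flow balance and normalization."""
    c = -ucb.flatten()                      # linprog minimizes, so negate
    # Flow balance: for each s', sum_a x[s',a] = sum_{s,a} P[s,a,s'] x[s,a]
    A_eq = np.zeros((n_s + 1, n_s * n_a))
    for sp in range(n_s):
        for s in range(n_s):
            for a in range(n_a):
                A_eq[sp, s * n_a + a] = P[s, a, sp]
        for a in range(n_a):
            A_eq[sp, sp * n_a + a] -= 1.0
    A_eq[n_s, :] = 1.0                      # occupation measure sums to 1
    b_eq = np.zeros(n_s + 1)
    b_eq[n_s] = 1.0
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * (n_s * n_a))
    return res.x.reshape(n_s, n_a)

# Optimistic index on the unknown rewards: empirical mean + exploration bonus.
counts = np.ones((n_s, n_a))                # hypothetical pull counts
means = np.array([[0.2, 0.5], [0.4, 0.9]])  # hypothetical running averages
t = 100
ucb = means + np.sqrt(2 * np.log(t) / counts)
occupancy = solve_average_reward_lp(ucb)
policy = occupancy.argmax(axis=1)           # action with positive occupancy per state
```

In UCLP this solve would happen every step; Epoch-UCLP's saving comes precisely from calling `solve_average_reward_lp` only once per epoch.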
Learning and Fairness in Energy Harvesting: A Maximin Multi-Armed Bandits Approach
Recent advances in wireless radio frequency (RF) energy harvesting allow
sensor nodes to increase their lifespans by remotely charging their batteries.
The amount of energy harvested by a node varies with its ambient environment
and its proximity to the source. The lifespan of the sensor network
depends on the minimum amount of energy a node can harvest in the network. It
is thus important to learn the least amount of energy harvested by nodes so
that the source can transmit on a frequency band that maximizes this amount. We
model this learning problem as a novel stochastic Maximin Multi-Armed Bandits
(Maximin MAB) problem and propose an Upper Confidence Bound (UCB) based
algorithm named Maximin UCB. Maximin MAB generalizes the standard MAB problem,
and Maximin UCB enjoys the same performance guarantee as the UCB1 algorithm.
Experimental results validate the performance guarantees of our algorithm.
Comment: To be presented at SPCOM 202
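A minimal sketch of the maximin idea described above: pull the arm (frequency band) whose worst per-node upper confidence bound is largest. The arm/node counts, the Gaussian noise model, and the exact index form are assumptions for illustration, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 frequency bands (arms), 4 sensor nodes.
# Pulling an arm yields one noisy energy sample per node.
true_means = np.array([[0.8, 0.2, 0.6, 0.7],
                       [0.5, 0.5, 0.6, 0.5],
                       [0.9, 0.1, 0.9, 0.9]])
K, N = true_means.shape

def maximin_ucb(T):
    counts = np.zeros(K)
    sums = np.zeros((K, N))
    for a in range(K):                      # initialization: pull each arm once
        sums[a] += rng.normal(true_means[a], 0.1)
        counts[a] = 1
    for t in range(K, T):
        means = sums / counts[:, None]
        bonus = np.sqrt(2 * np.log(t + 1) / counts)
        # An arm's index is the minimum, over nodes, of its per-node UCBs.
        index = (means + bonus[:, None]).min(axis=1)
        a = index.argmax()
        sums[a] += rng.normal(true_means[a], 0.1)
        counts[a] += 1
    # Recommend the arm with the best empirical worst-node mean.
    return int((sums / counts[:, None]).min(axis=1).argmax())

best = maximin_ucb(2000)
```

Here arm 1 has the largest worst-node mean (0.5, versus 0.2 and 0.1), so it maximizes the minimum energy harvested across nodes, which is the quantity the abstract ties to network lifespan.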
Optimal Power Control for Transmitting Correlated Sources with Energy Harvesting Constraints
We investigate the weighted-sum distortion minimization problem in
transmitting two correlated Gaussian sources over Gaussian channels using two
energy harvesting nodes. To this end, we develop offline and online power
control policies to optimize the transmit power of the two nodes. In the
offline case, we cast the problem as a convex optimization problem and investigate the
structure of the optimal solution. We also develop a generalized water-filling
based power allocation algorithm to obtain the optimal solution efficiently.
For the online case, we quantify the distortion of the system using a cost
function and show that the expected cost equals the expected weighted-sum
distortion. Based on Banach's fixed point theorem, we further propose a
geometrically converging algorithm to find the minimum cost via simple
iterations. Simulation results show that our online power control outperforms
greedy power control, in which each node uses all of its available energy in
each slot, and performs close to the proposed offline power control.
Moreover, the performance of our offline power control almost coincides with
the performance limit of the system.
Comment: 15 pages, 12 figures
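The role of Banach's fixed point theorem in the online scheme above can be illustrated generically: a Bellman-style operator with modulus gamma < 1 is a contraction, so iterating it converges geometrically to a unique fixed point. The costs, transition probabilities, and discount below are hypothetical stand-ins, not the paper's cost function.

```python
import numpy as np

gamma = 0.9                                        # hypothetical contraction modulus
cost = np.array([[1.0, 2.0],
                 [2.0, 0.5]])                      # cost[a, s], hypothetical
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.1, 0.9]]])           # P[a, s, s'], hypothetical

def T(V):
    """Contraction mapping: minimal expected discounted cost over actions."""
    return (cost + gamma * (P @ V)).min(axis=0)

# By Banach's theorem, V_{k+1} = T(V_k) converges geometrically: the error
# shrinks by at least a factor gamma per iteration from any starting point.
V = np.zeros(2)
for _ in range(500):
    V_next = T(V)
    done = np.abs(V_next - V).max() < 1e-12
    V = V_next
    if done:
        break
```

The geometric rate is what makes the "simple iterations" of the abstract cheap: reaching tolerance 1e-12 needs only on the order of log(1e-12)/log(gamma) iterations.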
Reinforcement Learning based Multi-Access Control and Battery Prediction with Energy Harvesting in IoT Systems
Energy harvesting (EH) is a promising technique to fulfill the long-term and
self-sustainable operations for Internet of things (IoT) systems. In this
paper, we study the joint access control and battery prediction problems in a
small-cell IoT system including multiple EH user equipments (UEs) and one base
station (BS) with limited uplink access channels. Each UE has a rechargeable
battery with finite capacity. The system control is modeled as a Markov
decision process in which the BS lacks complete prior knowledge and must cope
with large state and action spaces. First, to handle the
access control problem assuming causal battery and channel state information,
we propose a scheduling algorithm that maximizes the uplink transmission sum
rate based on reinforcement learning (RL) with deep Q-network (DQN)
enhancement. Second, for the battery prediction problem, with a fixed
round-robin access control policy adopted, we develop an RL-based algorithm to
minimize the prediction loss (error) without any model knowledge about the
energy source and energy arrival process. Finally, the joint access control and
battery prediction problem is investigated, where we propose a two-layer RL
network to simultaneously deal with maximizing the sum rate and minimizing the
prediction loss: the first layer performs battery prediction, and the second
layer generates the access policy based on the output of the first layer.
Experimental results show that the three proposed RL algorithms achieve
better performance than existing benchmarks.
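As a heavily simplified stand-in for the DQN-based access control above, the following sketches tabular Q-learning for a single EH device choosing transmit or idle from its battery level. The battery size, harvesting process, reward, and learning parameters are all hypothetical toy choices, not the paper's system model; the DQN replaces the table `Q` with a neural network when the state space is too large to enumerate.

```python
import numpy as np

rng = np.random.default_rng(1)

B, alpha, gamma, eps = 3, 0.1, 0.95, 0.1   # hypothetical battery cap / hyperparameters
Q = np.zeros((B + 1, 2))                   # Q[battery_level, action]

def step(b, a):
    """Toy environment: transmitting spends one energy unit for unit reward;
    one energy unit arrives with probability 0.5 each slot."""
    transmit = (a == 1 and b > 0)
    r = 1.0 if transmit else 0.0
    b_next = min(B, b - int(transmit) + int(rng.integers(0, 2)))
    return b_next, r

b = B
for _ in range(20000):
    # Epsilon-greedy action selection.
    a = int(rng.integers(0, 2)) if rng.random() < eps else int(Q[b].argmax())
    b_next, r = step(b, a)
    # Standard Q-learning temporal-difference update.
    Q[b, a] += alpha * (r + gamma * Q[b_next].max() - Q[b, a])
    b = b_next

policy = Q.argmax(axis=1)                  # learned action per battery level
```

In these toy dynamics, idling at a full battery wastes every harvested unit to overflow, so the learned policy should transmit at the full-battery state; the paper's two-layer network additionally feeds a battery prediction into this access decision.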