4 research outputs found
Online Learning Schemes for Power Allocation in Energy Harvesting Communications
We consider the problem of power allocation over a time-varying channel with
unknown distribution in energy harvesting communication systems. In this
problem, the transmitter has to choose the transmit power based on the amount
of stored energy in its battery with the goal of maximizing the average rate
obtained over time. We model this problem as a Markov decision process (MDP)
with the transmitter as the agent, the battery status as the state, the
transmit power as the action and the rate obtained as the reward. The average
reward maximization problem over the MDP can be solved by a linear program (LP)
that uses the transition probabilities for the state-action pairs and their
reward values to choose a power allocation policy. Since the rewards associated
with the state-action pairs are unknown, we propose two online learning
algorithms, UCLP and Epoch-UCLP, that learn these rewards and adapt their policies along the
way. The UCLP algorithm solves the LP at each step to decide its current policy
using the upper confidence bounds on the rewards, while the Epoch-UCLP
algorithm divides the time into epochs, solves the LP only at the beginning of
the epochs and follows the obtained policy in that epoch. We prove that the
reward losses or regrets incurred by both these algorithms are upper bounded by
constants. Epoch-UCLP incurs higher regret than UCLP, but substantially reduces
the computational requirements. We also show that, with minor changes, the
presented algorithms work for online learning in cost minimization problems
such as packet scheduling with a power-delay tradeoff.
Comment: This paper is under submission to the IEEE Transactions on Information Theory
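The average-reward LP that UCLP re-solves with optimistic reward estimates can be sketched as follows. This is a simplified illustration, not the paper's implementation: the two-state, two-action MDP, its transition probabilities, the running reward averages, and the confidence-bonus form are all hypothetical choices made for the example.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy MDP: 2 battery states, 2 power actions.
# P[s, a, s'] = transition probability (assumed known, as in the abstract).
P = np.array([[[0.7, 0.3], [0.4, 0.6]],
              [[0.5, 0.5], [0.2, 0.8]]])
n_s, n_a = 2, 2

def solve_average_reward_lp(ucb):
    """Maximize sum_{s,a} x[s,a]*ucb[s,a] over occupation measures x,
    subject to flow balance and normalization."""
    c = -ucb.flatten()                      # linprog minimizes, so negate
    # Flow balance: for each s', sum_a x[s',a] = sum_{s,a} P[s,a,s'] x[s,a]
    A_eq = np.zeros((n_s + 1, n_s * n_a))
    for sp in range(n_s):
        for s in range(n_s):
            for a in range(n_a):
                A_eq[sp, s * n_a + a] = P[s, a, sp]
        for a in range(n_a):
            A_eq[sp, sp * n_a + a] -= 1.0
    A_eq[n_s, :] = 1.0                      # occupation measure sums to 1
    b_eq = np.zeros(n_s + 1)
    b_eq[n_s] = 1.0
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * (n_s * n_a))
    return res.x.reshape(n_s, n_a)

# Optimistic index on the unknown rewards: empirical mean + exploration bonus.
counts = np.ones((n_s, n_a))                # hypothetical pull counts
means = np.array([[0.2, 0.5], [0.4, 0.9]])  # hypothetical running averages
t = 100
ucb = means + np.sqrt(2 * np.log(t) / counts)
occupancy = solve_average_reward_lp(ucb)
policy = occupancy.argmax(axis=1)           # action with positive occupancy per state
```

In UCLP this solve would happen every step; Epoch-UCLP's saving comes precisely from calling `solve_average_reward_lp` only once per epoch.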
Learning and Fairness in Energy Harvesting: A Maximin Multi-Armed Bandits Approach
Recent advances in wireless radio frequency (RF) energy harvesting allow
sensor nodes to increase their lifespans by remotely charging their batteries.
The amount of energy harvested by a node varies with its ambient environment
and its proximity to the source. The lifespan of the sensor network
depends on the minimum amount of energy a node can harvest in the network. It
is thus important to learn the least amount of energy harvested by nodes so
that the source can transmit on a frequency band that maximizes this amount. We
model this learning problem as a novel stochastic Maximin Multi-Armed Bandits
(Maximin MAB) problem and propose an Upper Confidence Bound (UCB) based
algorithm named Maximin UCB. Maximin MAB generalizes the standard MAB problem,
and Maximin UCB enjoys the same performance guarantee as the UCB1 algorithm.
Experimental results validate the performance guarantees of our algorithm.
Comment: To be presented at SPCOM 202
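A minimal sketch of the maximin idea described above: pull the arm (frequency band) whose worst per-node upper confidence bound is largest. The arm/node counts, the Gaussian noise model, and the exact index form are assumptions for illustration, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 frequency bands (arms), 4 sensor nodes.
# Pulling an arm yields one noisy energy sample per node.
true_means = np.array([[0.8, 0.2, 0.6, 0.7],
                       [0.5, 0.5, 0.6, 0.5],
                       [0.9, 0.1, 0.9, 0.9]])
K, N = true_means.shape

def maximin_ucb(T):
    counts = np.zeros(K)
    sums = np.zeros((K, N))
    for a in range(K):                      # initialization: pull each arm once
        sums[a] += rng.normal(true_means[a], 0.1)
        counts[a] = 1
    for t in range(K, T):
        means = sums / counts[:, None]
        bonus = np.sqrt(2 * np.log(t + 1) / counts)
        # An arm's index is the minimum, over nodes, of its per-node UCBs.
        index = (means + bonus[:, None]).min(axis=1)
        a = index.argmax()
        sums[a] += rng.normal(true_means[a], 0.1)
        counts[a] += 1
    # Recommend the arm with the best empirical worst-node mean.
    return int((sums / counts[:, None]).min(axis=1).argmax())

best = maximin_ucb(2000)
```

Here arm 1 has the largest worst-node mean (0.5, versus 0.2 and 0.1), so it maximizes the minimum energy harvested across nodes, which is the quantity the abstract ties to network lifespan.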
Optimal Power Control for Transmitting Correlated Sources with Energy Harvesting Constraints
We investigate the weighted-sum distortion minimization problem in
transmitting two correlated Gaussian sources over Gaussian channels using two
energy harvesting nodes. To this end, we develop offline and online power
control policies to optimize the transmit power of the two nodes. In the
offline case, we cast the problem as a convex optimization problem and investigate the
structure of the optimal solution. We also develop a generalized water-filling
based power allocation algorithm to obtain the optimal solution efficiently.
For the online case, we quantify the distortion of the system using a cost
function and show that the expected cost equals the expected weighted-sum
distortion. Based on Banach's fixed point theorem, we further propose a
geometrically converging algorithm to find the minimum cost via simple
iterations. Simulation results show that our online power control outperforms
greedy power control, in which each node uses all of its available energy in
each slot, and performs close to the proposed offline power control.
Moreover, the performance of our offline power control almost coincides with
the performance limit of the system.
Comment: 15 pages, 12 figures
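The role of Banach's fixed point theorem in the online scheme above can be illustrated generically: a Bellman-style operator with modulus gamma < 1 is a contraction, so iterating it converges geometrically to a unique fixed point. The costs, transition probabilities, and discount below are hypothetical stand-ins, not the paper's cost function.

```python
import numpy as np

gamma = 0.9                                        # hypothetical contraction modulus
cost = np.array([[1.0, 2.0],
                 [2.0, 0.5]])                      # cost[a, s], hypothetical
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.1, 0.9]]])           # P[a, s, s'], hypothetical

def T(V):
    """Contraction mapping: minimal expected discounted cost over actions."""
    return (cost + gamma * (P @ V)).min(axis=0)

# By Banach's theorem, V_{k+1} = T(V_k) converges geometrically: the error
# shrinks by at least a factor gamma per iteration from any starting point.
V = np.zeros(2)
for _ in range(500):
    V_next = T(V)
    done = np.abs(V_next - V).max() < 1e-12
    V = V_next
    if done:
        break
```

The geometric rate is what makes the "simple iterations" of the abstract cheap: reaching tolerance 1e-12 needs only on the order of log(1e-12)/log(gamma) iterations.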
Reinforcement Learning based Multi-Access Control and Battery Prediction with Energy Harvesting in IoT Systems
Energy harvesting (EH) is a promising technique to fulfill the long-term and
self-sustainable operations for Internet of things (IoT) systems. In this
paper, we study the joint access control and battery prediction problems in a
small-cell IoT system including multiple EH user equipments (UEs) and one base
station (BS) with limited uplink access channels. Each UE has a rechargeable
battery with finite capacity. The system control is modeled as a Markov
decision process in which the BS lacks complete prior knowledge and must cope
with large state and action spaces. First, to handle the
access control problem assuming causal battery and channel state information,
we propose a scheduling algorithm that maximizes the uplink transmission sum
rate based on reinforcement learning (RL) with deep Q-network (DQN)
enhancement. Second, for the battery prediction problem, with a fixed
round-robin access control policy adopted, we develop an RL-based algorithm to
minimize the prediction loss (error) without any model knowledge about the
energy source and energy arrival process. Finally, the joint access control and
battery prediction problem is investigated, where we propose a two-layer RL
network to simultaneously deal with maximizing the sum rate and minimizing the
prediction loss: the first layer performs battery prediction, and the second
layer generates the access policy based on the output of the first layer.
Experimental results show that the three proposed RL algorithms achieve
better performance than existing benchmarks.
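As a heavily simplified stand-in for the DQN-based access control above, the following sketches tabular Q-learning for a single EH device choosing transmit or idle from its battery level. The battery size, harvesting process, reward, and learning parameters are all hypothetical toy choices, not the paper's system model; the DQN replaces the table `Q` with a neural network when the state space is too large to enumerate.

```python
import numpy as np

rng = np.random.default_rng(1)

B, alpha, gamma, eps = 3, 0.1, 0.95, 0.1   # hypothetical battery cap / hyperparameters
Q = np.zeros((B + 1, 2))                   # Q[battery_level, action]

def step(b, a):
    """Toy environment: transmitting spends one energy unit for unit reward;
    one energy unit arrives with probability 0.5 each slot."""
    transmit = (a == 1 and b > 0)
    r = 1.0 if transmit else 0.0
    b_next = min(B, b - int(transmit) + int(rng.integers(0, 2)))
    return b_next, r

b = B
for _ in range(20000):
    # Epsilon-greedy action selection.
    a = int(rng.integers(0, 2)) if rng.random() < eps else int(Q[b].argmax())
    b_next, r = step(b, a)
    # Standard Q-learning temporal-difference update.
    Q[b, a] += alpha * (r + gamma * Q[b_next].max() - Q[b, a])
    b = b_next

policy = Q.argmax(axis=1)                  # learned action per battery level
```

In these toy dynamics, idling at a full battery wastes every harvested unit to overflow, so the learned policy should transmit at the full-battery state; the paper's two-layer network additionally feeds a battery prediction into this access decision.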