A weighted Markov decision process
The two most commonly considered reward criteria for Markov decision processes are the discounted reward and the long-term average reward. The first tends to "neglect" the future by concentrating on short-term rewards, while the second tends to do the opposite. We consider a new reward criterion consisting of a weighted combination of these two criteria. By varying the weights, the decision maker can place more or less emphasis on short-term versus long-term rewards. The mathematical implications of the new criterion include the following: deterministic stationary policies can be outperformed by randomized stationary policies, which in turn can be outperformed by nonstationary policies, and an optimal policy might not exist. We present an iterative algorithm for computing an ε-optimal nonstationary policy with a very simple structure.
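To make the weighted criterion concrete, the sketch below evaluates it for one fixed stationary policy, represented by its induced Markov chain `P` and per-state rewards `r`. It assumes the criterion has the form w·V_β(s0) + (1 − w)·g, where V_β is the β-discounted value from start state `s0` and g is the long-run average reward. The paper may normalize the two terms differently, so treat this form as an assumption. The average reward is computed from the stationary distribution, assuming the chain is irreducible and aperiodic. The sketch only evaluates a given policy. It does not search for the ε-optimal nonstationary policy described above. The function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def discounted_value(P: Matrix, r: List[float], beta: float, s0: int, tol: float = 1e-12) -> float:
    """Solve V = r + beta * P V by fixed-point iteration (a contraction for beta < 1)."""
    n = len(r)
    v = [0.0] * n
    while True:
        new = [r[i] + beta * sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, v)) < tol:
            return new[s0]
        v = new


def average_reward(P: Matrix, r: List[float], iters: int = 1000) -> float:
    """Long-run average reward g = sum_i pi_i * r_i, with pi found by power iteration.

    Assumes the chain is irreducible and aperiodic, so pi is unique and the iteration converges.
    """
    n = len(r)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return sum(p * x for p, x in zip(pi, r))


def weighted_reward(P: Matrix, r: List[float], beta: float, w: float, s0: int) -> float:
    """Assumed weighted criterion: w * discounted value + (1 - w) * average reward."""
    return w * discounted_value(P, r, beta, s0) + (1 - w) * average_reward(P, r)


if __name__ == "__main__":
    P = [[0.5, 0.5], [0.5, 0.5]]  # toy two-state chain
    r = [1.0, 0.0]
    for w in (0.0, 0.5, 1.0):
        print(w, weighted_reward(P, r, beta=0.9, w=w, s0=0))
```

For this toy chain, solving the equations by hand gives V_β(0) = 5.5 and g = 0.5. Moving w from 0 to 1 therefore shifts the score from the long-run view to the discounted one.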
Age-Energy Tradeoff in Fading Channels with Packet-Based Transmissions
This paper studies the optimal transmission strategy that minimizes a weighted combination of the age of information (AoI) and the total energy consumption. The status update information is assumed to be obtained and transmitted at a fixed rate over a Rayleigh fading channel in a packet-based wireless communication system. A maximum number of transmission rounds per packet is enforced to guarantee a certain level of reliability for the update packets. Given a fixed average transmission power, the age-energy tradeoff can be formulated as a constrained Markov decision process (CMDP) problem that also accounts for the sensing power consumption. Using Lagrangian relaxation, the CMDP problem is transformed into a Markov decision process (MDP) problem, and an algorithm is proposed to obtain the optimal power allocation policy. Simulation results show that the proposed optimal policy improves both age and energy efficiency compared with two benchmark schemes. In addition, age can be effectively reduced at the expense of higher energy cost, and placing more emphasis on energy consumption leads to a higher average age at the same energy efficiency. Overall, the tradeoff between average age and energy efficiency is identified.
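The Lagrangian step can be illustrated on a deliberately simplified toy model. This is not the paper's system model. Here the state is the current age (capped at `A_MAX`), the action is a transmit power from `POWERS`, and each slot costs w·age + (1 − w)·p. The average-power constraint is moved into the cost as μ·p. With Rayleigh fading at a fixed rate, the success probability is exp(−θ/p), where θ = (2^R − 1)·noise / mean gain. The unconstrained MDP for each μ is solved by relative value iteration, and μ is then found by bisection until the average power meets the budget. Several parts of the paper's setting are omitted: retransmission rounds, the sensing power, and randomization between two deterministic policies at the critical μ. All names and numbers (`POWERS`, `A_MAX`, `THETA`, `lagrangian_search`, and so on) are hypothetical.

```python
import math

# Hypothetical toy parameters (not taken from the paper)
POWERS = (0.0, 1.0, 2.0, 4.0)  # available transmit power levels
A_MAX = 10                     # age of information is capped at A_MAX
THETA = 1.0                    # (2**R - 1) * noise power / mean channel gain


def success_prob(p):
    # Rayleigh fading at fixed rate R: P(no outage) = exp(-THETA / p); zero power never succeeds
    return math.exp(-THETA / p) if p > 0 else 0.0


SUCC = {p: success_prob(p) for p in POWERS}


def solve_lagrangian_mdp(mu, w=0.5, iters=1000):
    """Relative value iteration for the per-slot cost w*age + (1-w)*p + mu*p."""
    h = {a: 0.0 for a in range(1, A_MAX + 1)}

    def q(a, p):
        s = SUCC[p]
        # success resets the age to 1; failure increments it (capped at A_MAX)
        return w * a + (1 - w) * p + mu * p + s * h[1] + (1 - s) * h[min(a + 1, A_MAX)]

    for _ in range(iters):
        new = {a: min(q(a, p) for p in POWERS) for a in h}
        h = {a: new[a] - new[1] for a in new}  # normalize relative to reference state 1
    return {a: min(POWERS, key=lambda p: q(a, p)) for a in h}


def average_power(policy, iters=2000):
    """Long-run average power of the age chain induced by `policy` (power iteration on pi)."""
    pi = {a: 1.0 / A_MAX for a in range(1, A_MAX + 1)}
    for _ in range(iters):
        new = dict.fromkeys(pi, 0.0)
        for a, mass in pi.items():
            s = SUCC[policy[a]]
            new[1] += mass * s
            new[min(a + 1, A_MAX)] += mass * (1 - s)
        pi = new
    return sum(pi[a] * policy[a] for a in pi)


def lagrangian_search(p_budget, lo=0.0, hi=100.0, steps=25):
    """Bisection on the multiplier mu; assumes the policy at mu=hi meets the power budget."""
    best = solve_lagrangian_mdp(hi)
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        pol = solve_lagrangian_mdp(mid)
        if average_power(pol) <= p_budget:
            hi, best = mid, pol  # feasible: try a smaller penalty on power
        else:
            lo = mid             # infeasible: penalize power more
    return hi, best


if __name__ == "__main__":
    mu, policy = lagrangian_search(p_budget=0.5)
    print("mu ~", round(mu, 3), "avg power", round(average_power(policy), 3))
    print(policy)
```

The bisection relies on the fact that, in a Lagrangian relaxation, a larger μ never increases the average power of the optimal policy. The returned deterministic policy is therefore feasible but can be slightly conservative.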
An Inverse Method for Policy-Iteration Based Algorithms
We present an extension of two policy-iteration based algorithms on weighted graphs (viz., Markov Decision Problems and Max-Plus Algebras). This extension allows us to solve the following inverse problem. We treat the weights of the graph as unknown constants or parameters and assume that a reference instantiation of those weights is given. We then compute a constraint on the parameters under which an optimal policy for the reference instantiation remains optimal. The original algorithm is thus guaranteed to behave well around the reference instantiation, which provides some robustness criteria. We present an application of both methods to simple examples. A prototype implementation has been developed.
- …