
    A weighted Markov decision process

    The two most commonly considered reward criteria for Markov decision processes are the discounted reward and the long-term average reward. The first tends to "neglect" the future, concentrating on short-term rewards, while the second tends to do the opposite. We consider a new reward criterion consisting of a weighted combination of these two criteria, thereby allowing the decision maker to place more or less emphasis on short-term versus long-term rewards by varying their weights. The mathematical implications of the new criterion include the following: deterministic stationary policies can be outperformed by randomized stationary policies, which in turn can be outperformed by nonstationary policies, and an optimal policy might not exist. We present an iterative algorithm for computing an ε-optimal nonstationary policy with a very simple structure.
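As a rough illustration of the weighted criterion described in this abstract, the sketch below evaluates one fixed stationary policy on a small Markov chain. The exact form of the combination (a weight `lam` on the normalized discounted value and `1 - lam` on the average reward), the function names, and the assumption that the chain has a unique stationary distribution are all assumptions made here for illustration; the abstract does not give them.

```python
import numpy as np

def discounted_value(P, r, beta):
    """Solve v = r + beta * P v for the policy-induced chain P and rewards r."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - beta * P, r)

def average_reward(P, r):
    """Long-run average reward, assuming P has a unique stationary distribution."""
    n = P.shape[0]
    # Stack the balance equations mu P = mu with the normalization sum(mu) = 1.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    mu, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(mu @ r)

def weighted_value(P, r, beta, lam, s0):
    """Assumed form: lam * (1 - beta) * v_beta(s0) + (1 - lam) * average reward."""
    v = discounted_value(P, r, beta)
    return lam * (1.0 - beta) * v[s0] + (1.0 - lam) * average_reward(P, r)
```

Setting `lam = 1` recovers the (normalized) discounted criterion and `lam = 0` the average criterion, so intermediate values shift emphasis between short-term and long-term rewards. The sketch only scores a given stationary policy; it does not implement the paper's algorithm for ε-optimal nonstationary policies.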

    Age-Energy Tradeoff in Fading Channels with Packet-Based Transmissions

    The optimal transmission strategy to minimize the weighted combination of age of information (AoI) and total energy consumption is studied in this paper. It is assumed that the status update information is obtained and transmitted at a fixed rate over a Rayleigh fading channel in a packet-based wireless communication system. A maximum number of transmission rounds per packet is enforced to guarantee a certain reliability of the update packets. Given a fixed average transmission power, the age-energy tradeoff can be formulated as a constrained Markov decision process (CMDP) problem that also accounts for the sensing power consumption. Employing Lagrangian relaxation, the CMDP problem is transformed into a Markov decision process (MDP) problem. An algorithm is proposed to obtain the optimal power allocation policy. Simulation results show that both age and energy efficiency can be improved by the proposed optimal policy compared with two benchmark schemes. Also, age can be effectively reduced at the expense of higher energy cost, and more emphasis on energy consumption leads to higher average age at the same energy efficiency. Overall, the tradeoff between average age and energy efficiency is identified.
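The Lagrangian relaxation step mentioned in this abstract can be sketched on a toy model: the power constraint is folded into the per-step cost as `age + lam * power`, and the resulting unconstrained MDP is solved for a given multiplier `lam`. Everything specific below is a hypothetical simplification and not the paper's model: the power levels, the success probabilities, the age cap, the absence of retransmission rounds and sensing power, and the use of a discounted cost solved by value iteration.

```python
import numpy as np

def solve_lagrangian_mdp(lam, max_age=10, powers=(0.0, 1.0, 2.0),
                         success=(0.0, 0.6, 0.9), gamma=0.9, tol=1e-9):
    """Value iteration on a toy age/power MDP with cost age + lam * power.

    State index i stands for age i + 1. A successful update resets the age
    to 1; otherwise the age grows by one, capped at max_age.
    """
    ages = np.arange(1, max_age + 1)
    # Index of the next state after a failed transmission (age + 1, capped).
    nxt = np.minimum(np.arange(max_age) + 1, max_age - 1)
    V = np.zeros(max_age)
    while True:
        Q = np.empty((max_age, len(powers)))
        for a, (pw, ps) in enumerate(zip(powers, success)):
            # Lagrangian cost now, plus discounted expected cost-to-go.
            Q[:, a] = ages + lam * pw + gamma * (ps * V[0] + (1 - ps) * V[nxt])
        V_new = Q.min(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmin(axis=1)
        V = V_new
```

With `lam = 0` the energy term vanishes and the sketch picks the highest power in every state, while a very large `lam` makes transmission too expensive to use at all. In a full CMDP treatment, `lam` would be tuned (for example by bisection) until the average power constraint is met; that outer loop is omitted here.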

    An Inverse Method for Policy-Iteration Based Algorithms

    We present an extension of two policy-iteration based algorithms on weighted graphs (viz., Markov Decision Problems and Max-Plus Algebras). This extension allows us to solve the following inverse problem: treating the weights of the graph as unknown constants or parameters, we suppose that a reference instantiation of those weights is given, and we aim to compute a constraint on the parameters under which an optimal policy for the reference instantiation remains optimal. The original algorithm is thus guaranteed to behave well around the reference instantiation, which provides us with some criteria of robustness. We present an application of both methods to simple examples. A prototype implementation has been developed.
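To make the inverse problem in this abstract concrete, the sketch below runs ordinary policy iteration on a discounted MDP and then checks numerically whether a reference policy stays optimal for one perturbed set of rewards. This is a pointwise check under assumed conventions (array layouts `P[a, s, s']` and `r[s, a]`, a discount factor `gamma`, hypothetical function names), not the paper's method, which derives a symbolic constraint on the parameters.

```python
import numpy as np

def evaluate(P, r, gamma, policy):
    """Value of a deterministic policy; P[a, s, s'] transitions, r[s, a] rewards."""
    n = r.shape[0]
    idx = np.arange(n)
    P_pi = P[policy, idx]          # row s is P[policy[s], s, :]
    r_pi = r[idx, policy]
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)

def q_values(P, r, gamma, v):
    """Q[s, a] = r[s, a] + gamma * sum over s' of P[a, s, s'] * v[s']."""
    return r + gamma * (P @ v).T

def policy_iteration(P, r, gamma):
    """Standard policy iteration; switches actions only on strict improvement."""
    policy = np.zeros(r.shape[0], dtype=int)
    while True:
        v = evaluate(P, r, gamma, policy)
        q = q_values(P, r, gamma, v)
        improved = np.where(q.max(axis=1) - v > 1e-12, q.argmax(axis=1), policy)
        if np.array_equal(improved, policy):
            return policy, v
        policy = improved

def still_optimal(P, r, gamma, policy, tol=1e-9):
    """True if no single-state action change improves on `policy` under rewards r."""
    v = evaluate(P, r, gamma, policy)
    return bool(np.all(q_values(P, r, gamma, v).max(axis=1) <= v + tol))
```

Calling `still_optimal` over a grid of perturbed rewards gives a sampled picture of the region in which the reference policy remains optimal, which is the kind of region the inverse method describes by an explicit constraint.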