
    Some notes on iterative optimization of structured Markov decision processes with discounted rewards

    The paper compares solution techniques for Markov decision processes under the total reward criterion. Examples illustrate that the effect of a number of improvements to the standard iterative method advocated in the literature is limited in some realistic situations. Numerical evidence shows that exploiting the structure of the problem under consideration often yields a more substantial reduction of the required computational effort than some of the existing acceleration procedures. We advocate that this structure should be analyzed and used in choosing the appropriate solution procedure, which may be constructed by combining several of the acceleration concepts described in the literature. Four test problems are sketched and solved with several successive approximation methods, each constructed after analyzing the structure of the problem, and the required computational efforts are compared.
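    As a rough illustration of the successive approximation (value iteration) baseline such comparisons start from, here is a minimal sketch for a discounted MDP with an optional Gauss-Seidel sweep as one example of the kind of acceleration discussed; the toy problem, discount factor, and tolerance are illustrative assumptions, not the paper's test problems.

```python
import numpy as np

def successive_approximation(P, r, gamma=0.95, tol=1e-8, gauss_seidel=False):
    """Standard successive approximation (value iteration) for a discounted MDP.

    P: transition probabilities, shape (A, S, S).
    r: expected one-step rewards, shape (A, S).
    gauss_seidel: if True, reuse values updated earlier in the same sweep,
    one of the classic accelerations of the standard iterative method.
    """
    A, S, _ = P.shape
    v = np.zeros(S)
    while True:
        if gauss_seidel:
            v_new = v.copy()
            for s in range(S):
                v_new[s] = max(r[a, s] + gamma * P[a, s] @ v_new for a in range(A))
        else:
            v_new = (r + gamma * np.einsum('ast,t->as', P, v)).max(axis=0)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new

# Illustrative random MDP with 2 actions and 4 states.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(4), size=(2, 4))
r = rng.uniform(size=(2, 4))
print(successive_approximation(P, r))
print(successive_approximation(P, r, gauss_seidel=True))
```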

    An Optimal Policy for a Two Depot Inventory Problem with Stock Transfer


    DYNAMIC PROGRAMMING: HAS ITS DAY ARRIVED?

    Research Methods / Statistical Methods

    Evaluating Callable and Putable Bonds: An Eigenfunction Expansion Approach

    We propose an efficient method to evaluate callable and putable bonds under a wide class of interest rate models, including the popular short rate diffusion models as well as their time-changed versions with jumps. The method is based on the eigenfunction expansion of the pricing operator. Given the set of call and put dates, the callable and putable bond pricing function is the value function of a stochastic game with stopping times. Under some technical conditions, it is shown to have an eigenfunction expansion in eigenfunctions of the pricing operator, with the expansion coefficients determined through a backward recursion. For popular short rate diffusion models, such as CIR, Vasicek, and 3/2, the method is orders of magnitude faster than the alternative approaches in the literature. In contrast to those approaches, which have so far been limited to diffusions, the method is equally applicable to short rate jump-diffusion and pure jump models constructed from diffusion models by Bochner's subordination with a Lévy subordinator.
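    A schematic of the backward recursion described above may help; the notation below (eigenpairs (λ_n, φ_n) of the pricing operator, decision dates t_1 < ... < t_K with t_{K+1} = T, call prices K_k, put prices P_k, terminal payoff F) is an illustrative assumption rather than the paper's exact formulation.

```latex
% Sketch of the backward recursion for a callable and putable bond priced by
% eigenfunction expansion (assumed generic notation, coupons omitted for brevity).
\begin{align*}
  V_{T}(x)   &= F, \\
  C_{t_k}(x) &= \sum_{n} e^{-\lambda_n (t_{k+1}-t_k)}\, v^{(k+1)}_n \,\varphi_n(x),
  \qquad v^{(k+1)}_n = \big\langle V_{t_{k+1}}, \varphi_n \big\rangle, \\
  V_{t_k}(x) &= \max\Big\{ P_k,\ \min\big\{ K_k,\ C_{t_k}(x) \big\} \Big\},
  \qquad k = K, \dots, 1.
\end{align*}
```

    Only the expansion coefficients v_n^{(k)} need to be carried backward between decision dates, which is what makes such a recursion fast once the eigenpairs of the pricing operator are known.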

    Update or Wait: How to Keep Your Data Fresh

    In this work, we study how to optimally manage the freshness of information updates sent from a source node to a destination via a channel. A proper metric for data freshness at the destination is the age-of-information, or simply age, defined as the time elapsed since the freshest received update was generated at the source node (e.g., a sensor). A reasonable update policy is the zero-wait policy, i.e., the source node submits a fresh update as soon as the previous update is delivered and the channel becomes free, which achieves the maximum throughput and the minimum delay. Surprisingly, this zero-wait policy does not always minimize the age. This counter-intuitive phenomenon motivates us to study how to optimally control information updates to keep the data fresh and to understand when the zero-wait policy is optimal. We introduce a general age penalty function to characterize the level of dissatisfaction with data staleness and formulate the average age penalty minimization problem as a constrained semi-Markov decision problem (SMDP) with an uncountable state space. We develop efficient algorithms to find the optimal update policy among all causal policies, and establish sufficient and necessary conditions for the optimality of the zero-wait policy. Our investigation shows that the zero-wait policy is far from optimal if (i) the age penalty function grows quickly with the age, (ii) the packet transmission times over the channel are positively correlated over time, or (iii) the packet transmission times are highly random (e.g., following a heavy-tailed distribution).
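    The phenomenon highlighted above is easy to reproduce in a small simulation; the sketch below compares the zero-wait policy with a simple age-threshold waiting policy under a deliberately "highly random" two-point transmission-time distribution. The distribution, the threshold of 1.5, and all function names are illustrative assumptions, not the paper's model or algorithms.

```python
import numpy as np

def average_age(transmission_times, wait_fn):
    """Time-average age of information on a single source-channel-destination link.

    After update i is delivered with transmission time Y_i, the source waits
    wait_fn(Y_i) before generating update i+1 (zero-wait: always 0).
    """
    area, elapsed = 0.0, 0.0
    gen, deliver = 0.0, transmission_times[0]      # update 0 generated at t = 0
    for y_next in transmission_times[1:]:
        wait = wait_fn(deliver - gen)              # age of the just-delivered update
        gen_next = deliver + wait
        deliver_next = gen_next + y_next
        # Age equals (deliver - gen) when update i arrives and grows linearly
        # until update i+1 arrives at deliver_next.
        dt = deliver_next - deliver
        area += dt * (deliver - gen) + 0.5 * dt**2
        elapsed += dt
        gen, deliver = gen_next, deliver_next
    return area / elapsed

rng = np.random.default_rng(1)
# Highly random transmission times: 0 with prob. 1/2, 2 with prob. 1/2 (illustrative).
Y = rng.choice([0.0, 2.0], size=200_000)
zero_wait = average_age(Y, lambda age: 0.0)
threshold = average_age(Y, lambda age: max(0.0, 1.5 - age))  # wait until age reaches ~1.5
# For this distribution the threshold policy achieves a lower average age than zero-wait.
print(zero_wait, threshold)
```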

    Time representation in reinforcement learning models of the basal ganglia

    Reinforcement learning (RL) models have been influential in understanding many aspects of basal ganglia function, from reward prediction to action selection. Time plays an important role in these models, but there is still no theoretical consensus about what kind of time representation is used by the basal ganglia. We review several theoretical accounts and their supporting evidence. We then discuss the relationship between RL models and the timing mechanisms that have been attributed to the basal ganglia. We hypothesize that a single computational system may underlie both RL and interval timing, the perception of duration in the range of seconds to hours. This hypothesis, which extends earlier models by incorporating a time-sensitive action selection mechanism, may have important implications for understanding disorders like Parkinson's disease in which both decision making and timing are impaired.
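    To make concrete what a "time representation" looks like in such models, here is a minimal sketch of a TD(0) critic with the tapped-delay-line (complete serial compound) representation often used in RL accounts of dopamine; the trial structure, timings, and parameters are illustrative assumptions and this is not the specific model proposed in the paper.

```python
import numpy as np

# Minimal TD(0) critic with a tapped-delay-line ("complete serial compound") time
# representation: the cue at t = 5 spawns one feature per elapsed time step, so the
# value function can assign credit to specific delays. Parameters are illustrative.
T = 30                      # time steps per trial
cue_t, reward_t = 5, 20     # cue onset and reward delivery (fixed delay)
alpha, gamma = 0.1, 0.98
w = np.zeros(T)             # one weight per post-cue delay

def features(t):
    x = np.zeros(T)
    if t >= cue_t:
        x[t - cue_t] = 1.0  # "how long ago did the cue occur?"
    return x

for trial in range(500):
    for t in range(T - 1):
        r = 1.0 if t == reward_t else 0.0
        v, v_next = w @ features(t), w @ features(t + 1)
        delta = r + gamma * v_next - v          # prediction error (dopamine-like signal)
        w += alpha * delta * features(t)
# After learning, the prediction error has moved from reward time to cue onset.
```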

    Optimal Online Transmission Policy for Energy-Constrained Wireless-Powered Communication Networks

    This work considers the design of an online transmission policy in a wireless-powered communication system with a given energy budget. The design objective is to maximize the long-term throughput of the system by exploiting the energy storage capability at the wireless-powered node. We formulate the design problem as a constrained Markov decision process (CMDP) problem and obtain the optimal policy of transmit power and time allocation in each fading block via the Lagrangian approach. To investigate the system performance in different scenarios, numerical simulations are conducted with various system parameters. Our simulation results show that the optimal policy significantly outperforms a myopic policy which only maximizes the throughput in the current fading block. Moreover, the optimal allocation of transmit power and time is shown to be insensitive to the change of modulation and coding schemes, which facilitates its practical implementation.
    Comment: 7 pages, accepted by ICC 2019. An extended version of this paper has been accepted by IEEE TW
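    As a generic reminder of what the Lagrangian approach to a CMDP of this kind looks like (the symbols below are illustrative and not the paper's exact formulation: R is the per-block throughput, E the per-block energy draw, and B the energy budget):

```latex
% Sketch: Lagrangian relaxation of a throughput-maximizing CMDP with an energy budget.
\begin{align*}
  \max_{\pi}\ & \mathbb{E}_\pi\Big[\textstyle\sum_{t} R(s_t, a_t)\Big]
  \quad \text{s.t.}\quad \mathbb{E}_\pi\Big[\textstyle\sum_{t} E(s_t, a_t)\Big] \le B, \\
  L(\pi, \lambda) &= \mathbb{E}_\pi\Big[\textstyle\sum_{t} \big(R(s_t, a_t) - \lambda\, E(s_t, a_t)\big)\Big] + \lambda B .
\end{align*}
```

    For each multiplier λ ≥ 0 the inner maximization is an ordinary unconstrained MDP with per-block reward R - λE, and λ is then tuned (e.g., by bisection or a subgradient step) until the energy constraint holds with equality, which is the usual structure of the Lagrangian approach mentioned above.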

    Partial policy iteration for L1-robust Markov decision processes

    Robust Markov decision processes (MDPs) compute reliable solutions for dynamic decision problems with partially-known transition probabilities. Unfortunately, accounting for uncertainty in the transition probabilities significantly increases the computational complexity of solving robust MDPs, which limits their scalability. This paper describes new, efficient algorithms for solving the common class of robust MDPs with s- and sa-rectangular ambiguity sets defined by weighted L1 norms. We propose partial policy iteration, a new, efficient, flexible, and general policy iteration scheme for robust MDPs. We also propose fast methods for computing the robust Bellman operator in quasi-linear time, nearly matching the ordinary Bellman operator's linear complexity. Our experimental results indicate that the proposed methods are many orders of magnitude faster than the state-of-the-art approach, which uses linear programming solvers combined with robust value iteration.
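    For intuition about why the robust Bellman operator can be evaluated so cheaply under L1 ambiguity, here is a sketch of a sort-based worst-case computation for the unweighted, sa-rectangular special case; the paper's algorithms handle weighted norms and s-rectangular sets and are more refined, so the names, the budget kappa, and the discount factor below are illustrative assumptions.

```python
import numpy as np

def worst_case_l1(p_bar, v, kappa):
    """min { p.v : p in simplex, ||p - p_bar||_1 <= kappa }, computed by sorting.

    Up to kappa/2 probability mass is moved onto the state with the smallest
    value and taken away from the states with the largest values.
    """
    p = np.array(p_bar, dtype=float)
    i_min = int(np.argmin(v))
    budget = min(kappa / 2.0, 1.0 - p[i_min])   # mass that can be added to the minimizer
    p[i_min] += budget
    for j in np.argsort(v)[::-1]:               # remove mass from the largest values first
        if budget <= 0.0:
            break
        if j == i_min:
            continue
        removed = min(p[j], budget)
        p[j] -= removed
        budget -= removed
    return float(p @ v)

def robust_bellman(P_bar, r, v, kappa, gamma=0.95):
    """One sa-rectangular robust Bellman backup with unweighted L1 ambiguity sets.

    P_bar: nominal transitions, shape (A, S, S); r: rewards, shape (A, S).
    """
    A, S, _ = P_bar.shape
    q = np.array([[r[a, s] + gamma * worst_case_l1(P_bar[a, s], v, kappa)
                   for s in range(S)]
                  for a in range(A)])
    return q.max(axis=0)
```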