Some notes on iterative optimization of structured Markov decision processes with discounted rewards
The paper contains a comparison of solution techniques for Markov decision processes with respect to the total reward criterion. It is illustrated by examples that the effect of a number of improvements of the standard iterative method advocated in the literature is limited in some realistic situations. Numerical evidence is provided to show that exploiting the structure of the problem under consideration often yields a more substantial reduction of the required computational effort than some of the existing acceleration procedures. We advocate that this structure should be analyzed and used in choosing an appropriate solution procedure, which might be composed by blending several of the acceleration concepts described in the literature. Four test problems are sketched and solved with several successive approximation methods, constructed after analyzing the structure of each problem. The required computational efforts are compared.
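The standard iterative method the abstract refers to is successive approximation (value iteration), and a classic structure-dependent acceleration is the Gauss-Seidel sweep. A minimal sketch, on a small hypothetical MDP with made-up transition and reward numbers:

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-10, gauss_seidel=False):
    """Successive approximation for a discounted MDP.

    P has shape (A, S, S) (P[a, s] is the next-state distribution) and
    R has shape (A, S). With gauss_seidel=True each sweep updates states
    in place, so later states already see the freshest values -- one of
    the classic accelerations whose payoff depends on problem structure.
    Returns the value function and the number of sweeps used.
    """
    A, S, _ = P.shape
    v = np.zeros(S)
    for it in range(1, 100_000):
        v_old = v.copy()
        if gauss_seidel:
            for s in range(S):
                v[s] = max(R[a, s] + gamma * P[a, s] @ v for a in range(A))
        else:
            v = np.max(R + gamma * np.einsum('asj,j->as', P, v_old), axis=0)
        if np.max(np.abs(v - v_old)) < tol:
            return v, it
    raise RuntimeError("did not converge")

# Hypothetical 3-state, 2-action problem (numbers chosen only for illustration).
P = np.zeros((2, 3, 3))
P[0] = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]
P[1] = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
R = np.array([[0.0, 1.0, 2.0], [1.0, 0.0, 3.0]])

v, n_jacobi = value_iteration(P, R)
_, n_gs = value_iteration(P, R, gauss_seidel=True)
print(n_jacobi, n_gs)   # sweep counts for the plain and Gauss-Seidel variants
```

Both variants converge to the same fixed point; whether the Gauss-Seidel sweep (or any other acceleration) pays off depends on the transition structure, which is the abstract's point.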
DYNAMIC PROGRAMMING: HAS ITS DAY ARRIVED?
Research Methods / Statistical Methods
Evaluating Callable and Putable Bonds: An Eigenfunction Expansion Approach
We propose an efficient method to evaluate callable and putable bonds under a
wide class of interest rate models, including the popular short rate diffusion
models, as well as their time changed versions with jumps. The method is based
on the eigenfunction expansion of the pricing operator. Given the set of call
and put dates, the callable and putable bond pricing function is the value
function of a stochastic game with stopping times. Under some technical
conditions, it is shown to have an eigenfunction expansion in eigenfunctions of
the pricing operator with the expansion coefficients determined through a
backward recursion. For popular short rate diffusion models, such as CIR,
Vasicek, 3/2, the method is orders of magnitude faster than the alternative
approaches in the literature. In contrast to the alternative approaches in the
literature that have so far been limited to diffusions, the method is equally
applicable to short rate jump-diffusion and pure jump models constructed from
diffusion models by Bochner's subordination with a Lévy subordinator.
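The backward recursion the abstract describes can be sketched in generic notation; the symbols Δ (time between decision dates), λ_n, φ_n, m, C_k, and P_k below are illustrative assumptions, not the paper's exact formulas. The recursion runs from maturity, where the terminal payoff fixes the initial coefficients, back to the valuation date:

```latex
% Continuation value at decision date t_k (with t_{k+1} = t_k + \Delta),
% propagated through the eigenbasis \{\varphi_n\} of the pricing operator,
% whose eigenvalues over one period are e^{-\lambda_n \Delta}:
V_k^{-}(x) = \sum_{n} c_n^{(k+1)} \, e^{-\lambda_n \Delta} \, \varphi_n(x)

% Stochastic-game step: the issuer calls at price C_k, the holder puts at P_k:
V_k(x) = \min\bigl\{ C_k,\; \max\bigl\{ P_k,\; V_k^{-}(x) \bigr\} \bigr\}

% Re-projection onto the eigenbasis (m is the weight measure under which the
% \varphi_n are orthonormal), yielding the coefficients for the next step:
c_n^{(k)} = \int V_k(x)\, \varphi_n(x)\, m(x)\, dx
```

Truncating the expansion after finitely many eigenfunctions turns each step into a small matrix-vector computation, which is where the reported speed advantage comes from.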
Update or Wait: How to Keep Your Data Fresh
In this work, we study how to optimally manage the freshness of information
updates sent from a source node to a destination via a channel. A proper metric
for data freshness at the destination is the age-of-information, or simply age,
which is defined as the time elapsed since the moment the freshest received
update was generated at the source node (e.g., a sensor). A
reasonable update policy is the zero-wait policy, i.e., the source node submits
a fresh update once the previous update is delivered and the channel becomes
free, which achieves the maximum throughput and the minimum delay.
Surprisingly, this zero-wait policy does not always minimize the age. This
counter-intuitive phenomenon motivates us to study how to optimally control
information updates to keep the data fresh and to understand when the zero-wait
policy is optimal. We introduce a general age penalty function to characterize
the level of dissatisfaction with data staleness and formulate the average age
penalty minimization problem as a constrained semi-Markov decision problem
(SMDP) with an uncountable state space. We develop efficient algorithms to find
the optimal update policy among all causal policies, and establish necessary
and sufficient conditions for the optimality of the zero-wait policy. Our
investigation shows that the zero-wait policy is far from the optimum if (i)
the age penalty function grows quickly with respect to the age, (ii) the packet
transmission times over the channel are positively correlated over time, or
(iii) the packet transmission times are highly random (e.g., following a
heavy-tailed distribution).
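Case (iii) can be reproduced in a toy simulation. The sketch below is not the paper's algorithm; it is a minimal stylized model with hypothetical numbers, in which a constant-wait policy beats zero-wait when transmission times are highly random:

```python
import random

def avg_age(service_times, wait=0.0):
    """Time-average age for a stylized wait-then-update policy.

    After each delivery the source idles `wait` seconds, then generates a
    fresh update whose transmission takes the next service time. At each
    delivery the age resets to that update's transmission delay and grows
    linearly until the next delivery (trapezoid area / elapsed time).
    """
    area = total = 0.0
    prev = service_times[0]            # delay of the update just delivered
    for s in service_times[1:]:
        gap = wait + s                 # time between consecutive deliveries
        area += prev * gap + gap * gap / 2
        total += gap
        prev = s
    return area / total

# Highly random service: instantaneous with prob. 0.9, slow (10 s) with
# prob. 0.1, so the mean is 1 s. Numbers chosen only to make the effect visible.
random.seed(0)
svc = [10.0 if random.random() < 0.1 else 0.0 for _ in range(100_000)]
print(avg_age(svc, wait=0.0))   # zero-wait policy
print(avg_age(svc, wait=1.0))   # waiting after each delivery does better here
```

Intuitively, after a fast delivery the source holds an already-fresh update at the destination; firing the next update immediately risks it sitting behind a slow transmission, so a short wait lowers the average age.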
Time representation in reinforcement learning models of the basal ganglia
Reinforcement learning (RL) models have been influential in understanding many aspects of basal ganglia function, from reward prediction to action selection. Time plays an important role in these models, but there is still no theoretical consensus about what kind of time representation is used by the basal ganglia. We review several theoretical accounts and their supporting evidence. We then discuss the relationship between RL models and the timing mechanisms that have been attributed to the basal ganglia. We hypothesize that a single computational system may underlie both RL and interval timing, the perception of duration in the range of seconds to hours. This hypothesis, which extends earlier models by incorporating a time-sensitive action selection mechanism, may have important implications for understanding disorders like Parkinson's disease in which both decision making and timing are impaired.
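One of the candidate time representations in this literature is the "complete serial compound": one feature per time step elapsed since the cue. A minimal tabular TD(0) sketch with that code (all parameter values hypothetical; this is a textbook stand-in, not the review's model):

```python
# Tabular TD(0) with a complete-serial-compound time code: one unit per time
# step since the cue, so the value function V(t) = w[t] can ramp up toward
# the expected reward time. The TD error delta is the quantity commonly
# compared to phasic dopamine responses.
ALPHA, GAMMA = 0.1, 0.95
T = 5                       # reward arrives T steps after the cue
w = [0.0] * T               # one weight per time-since-cue unit

for trial in range(500):
    for t in range(T):
        r = 1.0 if t == T - 1 else 0.0          # reward at the final step
        v_next = 0.0 if t == T - 1 else w[t + 1]
        delta = r + GAMMA * v_next - w[t]       # TD prediction error
        w[t] += ALPHA * delta

print([round(x, 2) for x in w])   # learned value ramps up toward reward time
```

After training, the weights approximate the discounted values γ^(T-1-t), i.e., anticipation builds as the reward time approaches; the debated question the review addresses is which neural time code plays the role of this one-hot clock.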
Optimal Online Transmission Policy for Energy-Constrained Wireless-Powered Communication Networks
This work considers the design of online transmission policy in a
wireless-powered communication system with a given energy budget. The system
design objective is to maximize the long-term throughput of the system
exploiting the energy storage capability at the wireless-powered node. We
formulate the design problem as a constrained Markov decision process (CMDP)
problem and obtain the optimal policy of transmit power and time allocation in
each fading block via the Lagrangian approach. To investigate the system
performance in different scenarios, numerical simulations are conducted with
various system parameters. Our simulation results show that the optimal policy
significantly outperforms a myopic policy which only maximizes the throughput
in the current fading block. Moreover, the optimal allocation of transmit power
and time is shown to be insensitive to the change of modulation and coding
schemes, which facilitates its practical implementation.
Comment: 7 pages, accepted by ICC 2019. An extended version of this paper is accepted by IEEE TW
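The Lagrangian approach mentioned in the abstract reduces the CMDP to an unconstrained MDP with reward r − λc and searches over the multiplier λ. A toy sketch on a hypothetical two-state channel with made-up throughput/energy numbers (not the paper's system model; note also that the exact CMDP optimum may randomize between two deterministic policies at the boundary, which this sketch omits):

```python
import numpy as np

GAMMA = 0.9
P = np.array([[0.7, 0.3], [0.3, 0.7]])   # channel transitions (action-independent)
R = np.array([[3.0, 1.0], [1.5, 0.5]])   # throughput R[s, a]; a=0 high power, a=1 low
C = np.array([[2.0, 0.5], [2.0, 0.5]])   # energy cost per action
BUDGET = 13.0                             # cap on expected discounted energy

def greedy_policy(lam):
    """Greedy policy for the Lagrangian reward R - lam * C.

    Transitions are action-independent in this toy model, so the
    continuation value cancels and the greedy action maximizes the
    immediate Lagrangian reward."""
    return np.argmax(R - lam * C, axis=1)

def discounted(policy, per_step):
    """Expected discounted sum of per_step[s, policy[s]] from a uniform start."""
    step = per_step[np.arange(2), policy]
    v = np.linalg.solve(np.eye(2) - GAMMA * P, step)
    return v.mean()

lo, hi = 0.0, 10.0
best = None
for _ in range(60):                       # bisect the Lagrange multiplier
    lam = (lo + hi) / 2
    pi = greedy_policy(lam)
    cost = discounted(pi, C)
    if cost <= BUDGET:                    # feasible: try a smaller multiplier
        best = (pi, cost, discounted(pi, R))
        hi = lam
    else:
        lo = lam

pi, cost, thr = best
print(pi, round(cost, 2), round(thr, 2))
```

The bisection settles on a policy that spends high power only when the channel is good, meeting the energy budget while clearly outperforming the always-low-power policy, mirroring the abstract's comparison against a myopic baseline.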
Partial policy iteration for L1-robust Markov decision processes
Robust Markov decision processes (MDPs) compute reliable solutions for dynamic decision problems with partially-known transition probabilities. Unfortunately, accounting for uncertainty in the transition probabilities significantly increases the computational complexity of solving robust MDPs, which limits their scalability. This paper describes new, efficient algorithms for solving the common class of robust MDPs with s- and sa-rectangular ambiguity sets defined by weighted L1 norms. We propose partial policy iteration, a new, efficient, flexible, and general policy iteration scheme for robust MDPs. We also propose fast methods for computing the robust Bellman operator in quasi-linear time, nearly matching the ordinary Bellman operator's linear complexity. Our experimental results indicate that the proposed methods are many orders of magnitude faster than the state-of-the-art approach, which combines linear programming solvers with robust value iteration.
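For the unweighted L1 case, the inner minimization in the robust Bellman operator has a well-known greedy solution that avoids a generic LP solver. A sketch of that one backup step (unweighted only; the paper's weighted-L1 algorithms add a per-state exchange rate to the ordering, omitted here):

```python
import numpy as np

def robust_backup(pbar, v, kappa):
    """Worst-case expectation of v over {p in simplex: ||p - pbar||_1 <= kappa}.

    Greedy solution for an (unweighted) L1 sa-rectangular ambiguity set:
    shift up to kappa/2 probability mass from the highest-value states to
    the single lowest-value state. O(n log n) for the sort, versus solving
    the same problem with a general-purpose LP solver.
    """
    p = pbar.astype(float).copy()
    k = int(np.argmin(v))            # cheapest destination state
    budget = kappa / 2               # moving eps mass changes the L1 norm by 2*eps
    for j in np.argsort(v)[::-1]:    # drain mass from the largest values first
        if j == k or budget <= 0:
            continue
        move = min(p[j], budget)
        p[j] -= move
        p[k] += move
        budget -= move
    return float(p @ v), p

pbar = np.array([0.25, 0.25, 0.25, 0.25])   # nominal transition estimate
v = np.array([1.0, 2.0, 3.0, 4.0])          # value function at next states
print(robust_backup(pbar, v, 0.5)[0])        # robust value, below the nominal 2.5
```

Running this backup for every state-action pair inside value or policy iteration is what the robust Bellman operator requires, which is why its per-call cost dominates the overall running time the paper optimizes.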