Some notes on iterative optimization of structured Markov decision processes with discounted rewards
The paper contains a comparison of solution techniques for Markov decision processes with respect to the total reward criterion. It is illustrated by examples that the effect of a number of improvements of the standard iterative method advocated in the literature is limited in some realistic situations. Numerical evidence is provided to show that exploiting the structure of the problem under consideration often yields a more substantial reduction of the required computational effort than some of the existing acceleration procedures. We advocate that this structure should be analyzed and used in choosing an appropriate solution procedure, which might be composed by blending several of the acceleration concepts described in the literature. Four test problems are sketched and solved with several successive approximation methods, constructed after analyzing the structure of each problem. The required computational efforts are compared.
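The standard iterative method the abstract refers to is successive approximation (value iteration), and a classic structure-dependent acceleration is the Gauss-Seidel sweep. A minimal sketch, on a small hypothetical MDP with made-up transition and reward numbers:

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-10, gauss_seidel=False):
    """Successive approximation for a discounted MDP.

    P has shape (A, S, S) (P[a, s] is the next-state distribution) and
    R has shape (A, S). With gauss_seidel=True each sweep updates states
    in place, so later states already see the freshest values -- one of
    the classic accelerations whose payoff depends on problem structure.
    Returns the value function and the number of sweeps used.
    """
    A, S, _ = P.shape
    v = np.zeros(S)
    for it in range(1, 100_000):
        v_old = v.copy()
        if gauss_seidel:
            for s in range(S):
                v[s] = max(R[a, s] + gamma * P[a, s] @ v for a in range(A))
        else:
            v = np.max(R + gamma * np.einsum('asj,j->as', P, v_old), axis=0)
        if np.max(np.abs(v - v_old)) < tol:
            return v, it
    raise RuntimeError("did not converge")

# Hypothetical 3-state, 2-action problem (numbers chosen only for illustration).
P = np.zeros((2, 3, 3))
P[0] = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]
P[1] = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
R = np.array([[0.0, 1.0, 2.0], [1.0, 0.0, 3.0]])

v, n_jacobi = value_iteration(P, R)
_, n_gs = value_iteration(P, R, gauss_seidel=True)
print(n_jacobi, n_gs)   # sweep counts for the plain and Gauss-Seidel variants
```

Both variants converge to the same fixed point; whether the Gauss-Seidel sweep (or any other acceleration) pays off depends on the transition structure, which is the abstract's point.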
DYNAMIC PROGRAMMING: HAS ITS DAY ARRIVED?
Research Methods / Statistical Methods
Evaluating Callable and Putable Bonds: An Eigenfunction Expansion Approach
We propose an efficient method to evaluate callable and putable bonds under a
wide class of interest rate models, including the popular short rate diffusion
models, as well as their time changed versions with jumps. The method is based
on the eigenfunction expansion of the pricing operator. Given the set of call
and put dates, the callable and putable bond pricing function is the value
function of a stochastic game with stopping times. Under some technical
conditions, it is shown to have an eigenfunction expansion in eigenfunctions of
the pricing operator with the expansion coefficients determined through a
backward recursion. For popular short rate diffusion models, such as CIR,
Vasicek, 3/2, the method is orders of magnitude faster than the alternative
approaches in the literature. In contrast to the alternative approaches in the
literature that have so far been limited to diffusions, the method is equally
applicable to short rate jump-diffusion and pure jump models constructed from
diffusion models by Bochner's subordination with a Lévy subordinator.
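The backward recursion the abstract describes can be sketched in generic notation; the symbols Δ (time between decision dates), λ_n, φ_n, m, C_k, and P_k below are illustrative assumptions, not the paper's exact formulas. The recursion runs from maturity, where the terminal payoff fixes the initial coefficients, back to the valuation date:

```latex
% Continuation value at decision date t_k (with t_{k+1} = t_k + \Delta),
% propagated through the eigenbasis \{\varphi_n\} of the pricing operator,
% whose eigenvalues over one period are e^{-\lambda_n \Delta}:
V_k^{-}(x) = \sum_{n} c_n^{(k+1)} \, e^{-\lambda_n \Delta} \, \varphi_n(x)

% Stochastic-game step: the issuer calls at price C_k, the holder puts at P_k:
V_k(x) = \min\bigl\{ C_k,\; \max\bigl\{ P_k,\; V_k^{-}(x) \bigr\} \bigr\}

% Re-projection onto the eigenbasis (m is the weight measure under which the
% \varphi_n are orthonormal), yielding the coefficients for the next step:
c_n^{(k)} = \int V_k(x)\, \varphi_n(x)\, m(x)\, dx
```

Truncating the expansion after finitely many eigenfunctions turns each step into a small matrix-vector computation, which is where the reported speed advantage comes from.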
Update or Wait: How to Keep Your Data Fresh
In this work, we study how to optimally manage the freshness of information
updates sent from a source node to a destination via a channel. A proper metric
for data freshness at the destination is the age-of-information, or simply age,
which is defined as the time elapsed since the moment the freshest received
update was generated at the source node (e.g., a sensor). A
reasonable update policy is the zero-wait policy, i.e., the source node submits
a fresh update once the previous update is delivered and the channel becomes
free, which achieves the maximum throughput and the minimum delay.
Surprisingly, this zero-wait policy does not always minimize the age. This
counter-intuitive phenomenon motivates us to study how to optimally control
information updates to keep the data fresh and to understand when the zero-wait
policy is optimal. We introduce a general age penalty function to characterize
the level of dissatisfaction with data staleness and formulate the average age
penalty minimization problem as a constrained semi-Markov decision problem
(SMDP) with an uncountable state space. We develop efficient algorithms to find
the optimal update policy among all causal policies, and establish necessary
and sufficient conditions for the optimality of the zero-wait policy. Our
investigation shows that the zero-wait policy is far from the optimum if (i)
the age penalty function grows quickly with respect to the age, (ii) the packet
transmission times over the channel are positively correlated over time, or
(iii) the packet transmission times are highly random (e.g., following a
heavy-tailed distribution).
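Case (iii) can be reproduced in a toy simulation. The sketch below is not the paper's algorithm; it is a minimal stylized model with hypothetical numbers, in which a constant-wait policy beats zero-wait when transmission times are highly random:

```python
import random

def avg_age(service_times, wait=0.0):
    """Time-average age for a stylized wait-then-update policy.

    After each delivery the source idles `wait` seconds, then generates a
    fresh update whose transmission takes the next service time. At each
    delivery the age resets to that update's transmission delay and grows
    linearly until the next delivery (trapezoid area / elapsed time).
    """
    area = total = 0.0
    prev = service_times[0]            # delay of the update just delivered
    for s in service_times[1:]:
        gap = wait + s                 # time between consecutive deliveries
        area += prev * gap + gap * gap / 2
        total += gap
        prev = s
    return area / total

# Highly random service: instantaneous with prob. 0.9, slow (10 s) with
# prob. 0.1, so the mean is 1 s. Numbers chosen only to make the effect visible.
random.seed(0)
svc = [10.0 if random.random() < 0.1 else 0.0 for _ in range(100_000)]
print(avg_age(svc, wait=0.0))   # zero-wait policy
print(avg_age(svc, wait=1.0))   # waiting after each delivery does better here
```

Intuitively, after a fast delivery the source holds an already-fresh update at the destination; firing the next update immediately risks it sitting behind a slow transmission, so a short wait lowers the average age.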
Time representation in reinforcement learning models of the basal ganglia
Reinforcement learning (RL) models have been influential in understanding many aspects of basal ganglia function, from reward prediction to action selection. Time plays an important role in these models, but there is still no theoretical consensus about what kind of time representation is used by the basal ganglia. We review several theoretical accounts and their supporting evidence. We then discuss the relationship between RL models and the timing mechanisms that have been attributed to the basal ganglia. We hypothesize that a single computational system may underlie both RL and interval timing, the perception of duration in the range of seconds to hours. This hypothesis, which extends earlier models by incorporating a time-sensitive action selection mechanism, may have important implications for understanding disorders like Parkinson's disease in which both decision making and timing are impaired.
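One of the candidate time representations in this literature is the "complete serial compound": one feature per time step elapsed since the cue. A minimal tabular TD(0) sketch with that code (all parameter values hypothetical; this is a textbook stand-in, not the review's model):

```python
# Tabular TD(0) with a complete-serial-compound time code: one unit per time
# step since the cue, so the value function V(t) = w[t] can ramp up toward
# the expected reward time. The TD error delta is the quantity commonly
# compared to phasic dopamine responses.
ALPHA, GAMMA = 0.1, 0.95
T = 5                       # reward arrives T steps after the cue
w = [0.0] * T               # one weight per time-since-cue unit

for trial in range(500):
    for t in range(T):
        r = 1.0 if t == T - 1 else 0.0          # reward at the final step
        v_next = 0.0 if t == T - 1 else w[t + 1]
        delta = r + GAMMA * v_next - w[t]       # TD prediction error
        w[t] += ALPHA * delta

print([round(x, 2) for x in w])   # learned value ramps up toward reward time
```

After training, the weights approximate the discounted values γ^(T-1-t), i.e., anticipation builds as the reward time approaches; the debated question the review addresses is which neural time code plays the role of this one-hot clock.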
Optimal Online Transmission Policy for Energy-Constrained Wireless-Powered Communication Networks
This work considers the design of online transmission policy in a
wireless-powered communication system with a given energy budget. The system
design objective is to maximize the long-term throughput of the system
exploiting the energy storage capability at the wireless-powered node. We
formulate the design problem as a constrained Markov decision process (CMDP)
problem and obtain the optimal policy of transmit power and time allocation in
each fading block via the Lagrangian approach. To investigate the system
performance in different scenarios, numerical simulations are conducted with
various system parameters. Our simulation results show that the optimal policy
significantly outperforms a myopic policy which only maximizes the throughput
in the current fading block. Moreover, the optimal allocation of transmit power
and time is shown to be insensitive to the change of modulation and coding
schemes, which facilitates its practical implementation.
Comment: 7 pages, accepted by ICC 2019. An extended version of this paper is accepted by IEEE TW
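The Lagrangian approach mentioned in the abstract reduces the CMDP to an unconstrained MDP with reward r − λc and searches over the multiplier λ. A toy sketch on a hypothetical two-state channel with made-up throughput/energy numbers (not the paper's system model; note also that the exact CMDP optimum may randomize between two deterministic policies at the boundary, which this sketch omits):

```python
import numpy as np

GAMMA = 0.9
P = np.array([[0.7, 0.3], [0.3, 0.7]])   # channel transitions (action-independent)
R = np.array([[3.0, 1.0], [1.5, 0.5]])   # throughput R[s, a]; a=0 high power, a=1 low
C = np.array([[2.0, 0.5], [2.0, 0.5]])   # energy cost per action
BUDGET = 13.0                             # cap on expected discounted energy

def greedy_policy(lam):
    """Greedy policy for the Lagrangian reward R - lam * C.

    Transitions are action-independent in this toy model, so the
    continuation value cancels and the greedy action maximizes the
    immediate Lagrangian reward."""
    return np.argmax(R - lam * C, axis=1)

def discounted(policy, per_step):
    """Expected discounted sum of per_step[s, policy[s]] from a uniform start."""
    step = per_step[np.arange(2), policy]
    v = np.linalg.solve(np.eye(2) - GAMMA * P, step)
    return v.mean()

lo, hi = 0.0, 10.0
best = None
for _ in range(60):                       # bisect the Lagrange multiplier
    lam = (lo + hi) / 2
    pi = greedy_policy(lam)
    cost = discounted(pi, C)
    if cost <= BUDGET:                    # feasible: try a smaller multiplier
        best = (pi, cost, discounted(pi, R))
        hi = lam
    else:
        lo = lam

pi, cost, thr = best
print(pi, round(cost, 2), round(thr, 2))
```

The bisection settles on a policy that spends high power only when the channel is good, meeting the energy budget while clearly outperforming the always-low-power policy, mirroring the abstract's comparison against a myopic baseline.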
Partial policy iteration for L1-robust Markov decision processes
Robust Markov decision processes (MDPs) compute reliable solutions for dynamic decision problems with partially-known transition probabilities. Unfortunately, accounting for uncertainty in the transition probabilities significantly increases the computational complexity of solving robust MDPs, which limits their scalability. This paper describes new, efficient algorithms for solving the common class of robust MDPs with s- and sa-rectangular ambiguity sets defined by weighted L1 norms. We propose partial policy iteration, a new, efficient, flexible, and general policy iteration scheme for robust MDPs. We also propose fast methods for computing the robust Bellman operator in quasi-linear time, nearly matching the ordinary Bellman operator's linear complexity. Our experimental results indicate that the proposed methods are many orders of magnitude faster than the state-of-the-art approach, which combines linear programming solvers with robust value iteration.
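For the unweighted L1 case, the inner minimization in the robust Bellman operator has a well-known greedy solution that avoids a generic LP solver. A sketch of that one backup step (unweighted only; the paper's weighted-L1 algorithms add a per-state exchange rate to the ordering, omitted here):

```python
import numpy as np

def robust_backup(pbar, v, kappa):
    """Worst-case expectation of v over {p in simplex: ||p - pbar||_1 <= kappa}.

    Greedy solution for an (unweighted) L1 sa-rectangular ambiguity set:
    shift up to kappa/2 probability mass from the highest-value states to
    the single lowest-value state. O(n log n) for the sort, versus solving
    the same problem with a general-purpose LP solver.
    """
    p = pbar.astype(float).copy()
    k = int(np.argmin(v))            # cheapest destination state
    budget = kappa / 2               # moving eps mass changes the L1 norm by 2*eps
    for j in np.argsort(v)[::-1]:    # drain mass from the largest values first
        if j == k or budget <= 0:
            continue
        move = min(p[j], budget)
        p[j] -= move
        p[k] += move
        budget -= move
    return float(p @ v), p

pbar = np.array([0.25, 0.25, 0.25, 0.25])   # nominal transition estimate
v = np.array([1.0, 2.0, 3.0, 4.0])          # value function at next states
print(robust_backup(pbar, v, 0.5)[0])        # robust value, below the nominal 2.5
```

Running this backup for every state-action pair inside value or policy iteration is what the robust Bellman operator requires, which is why its per-call cost dominates the overall running time the paper optimizes.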