
    Portfolio allocation under the vendor managed inventory: A Markov decision process

    Markov decision processes have been applied to a wide range of optimization problems over the years. This study reviews Markov decision processes and investigates their suitability for solving portfolio allocation problems under vendor managed inventory in an uncertain market environment. The problem was formulated in the framework of a Markov decision process, and a value iteration algorithm was implemented to obtain the expected reward and the optimal policy that maps each state to an action. Two challenges were examined: the uncertainty about the value of the item, which follows a stochastic model, and the small state/action spaces that can be solved via value iteration. It was observed that the optimal policy always shorts the stock in state 0 because of its large return. In state 2, although the return is not as large as in state 0, the probability of remaining in state 2 is high enough that the vendor should go long on the stock, expecting a high reward over several periods. We also obtained the expected reward for each state every ten iterations using a discount factor of λ = 0.95. Despite the small state/action spaces, the vendor is able to optimize its reward through the use of a Markov decision process.
    Keywords: Portfolio Allocation, Vendor Managed Inventory, Markov Decision Process, Value Iteration, Expected Reward, Optimal Policy
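
    A minimal value-iteration sketch of the kind of model the abstract describes, assuming three market states (0, 1, 2), two actions ("long" and "short"), and the reported discount factor of 0.95. The transition probabilities and rewards are illustrative placeholders, not the paper's data.

```python
import numpy as np

GAMMA = 0.95  # discount factor reported in the abstract

STATES = [0, 1, 2]
ACTIONS = ["long", "short"]

# P[a][s, s'] = probability of moving from state s to s' under action a (hypothetical values)
P = {
    "long":  np.array([[0.6, 0.3, 0.1],
                       [0.2, 0.5, 0.3],
                       [0.1, 0.2, 0.7]]),
    "short": np.array([[0.7, 0.2, 0.1],
                       [0.3, 0.5, 0.2],
                       [0.2, 0.3, 0.5]]),
}
# R[a][s] = expected one-step reward of action a in state s (hypothetical values)
R = {
    "long":  np.array([-1.0, 0.5, 2.0]),
    "short": np.array([ 3.0, 0.2, -0.5]),
}

def value_iteration(tol=1e-6, report_every=10):
    """Iterate the Bellman optimality update until convergence, printing the
    expected reward of each state every `report_every` iterations."""
    V = np.zeros(len(STATES))
    iteration = 0
    while True:
        iteration += 1
        Q = np.stack([R[a] + GAMMA * P[a] @ V for a in ACTIONS])  # shape (|A|, |S|)
        V_new = Q.max(axis=0)
        if iteration % report_every == 0:
            print(f"iteration {iteration}: V = {np.round(V_new, 3)}")
        if np.max(np.abs(V_new - V)) < tol:
            policy = {s: ACTIONS[i] for s, i in zip(STATES, Q.argmax(axis=0))}
            return V_new, policy
        V = V_new

values, policy = value_iteration()
print("optimal policy:", policy)
```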

    Online algorithms for POMDPs with continuous state, action, and observation spaces

    Online solvers for partially observable Markov decision processes have been applied to problems with large discrete state spaces, but continuous state, action, and observation spaces remain a challenge. This paper begins by investigating double progressive widening (DPW) as a solution to this challenge. However, we prove that this modification alone is not sufficient, because the belief representations in the search tree collapse to a single particle, causing the algorithm to converge to a suboptimal policy regardless of the computation time. This paper proposes and evaluates two new algorithms, POMCPOW and PFT-DPW, that overcome this deficiency by using weighted particle filtering. Simulation results show that these modifications allow the algorithms to succeed where previous approaches fail.
    Comment: Added Multilane section
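
    A minimal sketch of the weighted particle-filter belief update that the abstract credits with preventing belief collapse, assuming a user-supplied generative model. The one-dimensional Gaussian transition and observation densities below are hypothetical stand-ins, not the paper's benchmark problems or the full POMCPOW/PFT-DPW tree search.

```python
import numpy as np

rng = np.random.default_rng(0)

def transition(s, a):
    # hypothetical continuous dynamics: drift by the action plus Gaussian noise
    return s + a + rng.normal(0.0, 0.1, size=s.shape)

def obs_likelihood(o, s):
    # hypothetical Gaussian observation model p(o | s)
    return np.exp(-0.5 * ((o - s) / 0.2) ** 2)

def update_belief(particles, weights, a, o, n_particles=1000):
    """Propagate particles through the dynamics, reweight by the observation
    likelihood, and resample. Keeping many weighted particles is what keeps
    the belief from collapsing to a single state."""
    new_particles = transition(particles, a)
    new_weights = weights * obs_likelihood(o, new_particles)
    new_weights /= new_weights.sum()
    idx = rng.choice(len(new_particles), size=n_particles, p=new_weights)
    return new_particles[idx], np.full(n_particles, 1.0 / n_particles)

# usage: start from a diffuse belief and fold in one action/observation pair
particles = rng.normal(0.0, 1.0, size=1000)
weights = np.full(1000, 1.0 / 1000)
particles, weights = update_belief(particles, weights, a=0.5, o=0.6)
print("belief mean:", particles.mean())
```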

    Reinforcement Learning in Rich-Observation MDPs using Spectral Methods

    Reinforcement learning (RL) in Markov decision processes (MDPs) with large state spaces is a challenging problem. The performance of standard RL algorithms degrades drastically with the dimensionality of the state space. In practice, however, these large MDPs typically incorporate a latent or hidden low-dimensional structure. In this paper, we study the setting of rich-observation Markov decision processes (ROMDPs), where a small number of hidden states possess an injective mapping to the observation states. In other words, every observation state is generated through a single hidden state, and this mapping is unknown a priori. We introduce a spectral decomposition method that consistently learns this mapping and, more importantly, does so with low regret. The estimated mapping is integrated into an optimistic RL algorithm (UCRL), which operates on the estimated hidden space. We derive finite-time regret bounds for our algorithm with a weak dependence on the dimensionality of the observed space. In fact, our algorithm asymptotically achieves the same average regret as the oracle UCRL algorithm, which has knowledge of the mapping from hidden to observed spaces. Thus, we obtain an efficient spectral RL algorithm for ROMDPs.
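
    An illustrative sketch of the spectral idea behind the mapping step: observations emitted by the same hidden state have correlated co-occurrence statistics, so a low-rank factorization of the empirical co-occurrence matrix followed by clustering can recover the observation-to-hidden-state map. This is a simplified spectral-clustering stand-in under that assumption, not the paper's estimator or its regret analysis.

```python
import numpy as np

def estimate_hidden_map(obs_sequence, n_obs, n_hidden):
    """Estimate which hidden state generates each observation from a trajectory
    of observation indices."""
    # empirical co-occurrence counts of consecutive observations
    C = np.zeros((n_obs, n_obs))
    for o, o_next in zip(obs_sequence[:-1], obs_sequence[1:]):
        C[o, o_next] += 1.0
    C /= max(C.sum(), 1.0)

    # rank-n_hidden spectral decomposition of the co-occurrence matrix
    U, S, _ = np.linalg.svd(C)
    embedding = U[:, :n_hidden] * S[:n_hidden]

    # assign each observation to the nearest of n_hidden centroids (simple k-means)
    rng = np.random.default_rng(0)
    centers = embedding[rng.choice(n_obs, n_hidden, replace=False)]
    for _ in range(50):
        labels = np.argmin(
            np.linalg.norm(embedding[:, None] - centers[None], axis=2), axis=1)
        for k in range(n_hidden):
            if np.any(labels == k):
                centers[k] = embedding[labels == k].mean(axis=0)
    return labels  # labels[o] = estimated hidden state of observation o

# usage with a synthetic trajectory over 6 observations and 2 hidden states
seq = [0, 1, 2, 0, 1, 2, 3, 4, 5, 3, 4, 5] * 50
print(estimate_hidden_map(seq, n_obs=6, n_hidden=2))
```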

    Application of Markov decision processes to search problems

    Many decision problems contain, in some form, an NP-hard combinatorial problem. Decision support systems therefore have to solve such combinatorial problems in a reasonable time. Many combinatorial problems can be solved by a search method. The search methods used in decision support systems have to be robust in the sense that they can handle a large variety of (user defined) constraints and allow user interaction, i.e., they allow a decision maker to control the search process manually. In this paper we show how Markov decision processes can be used to guide a random search process. We first formulate search problems as a special class of Markov decision processes such that the search space of a search problem is the state space of the Markov decision process. In general it is not possible to compute an optimal control procedure for these Markov decision processes in a reasonable time. We therefore define several simplifications of the original problem that have much smaller state spaces. For these simplifications, decompositions, and abstractions, we find optimal strategies and use the exact solutions of these simplified problems to guide a randomized search process. The search process selects states for further search at random, with probabilities based on the optimal strategies of the simplified problems. This randomization is a substitute for explicit backtracking and avoids problems with local extrema. These randomized search procedures are repeated for as long as time remains to solve the problem, and the best solution generated during that time is accepted. We illustrate the approach with two examples: the N-puzzle and a job shop scheduling problem.
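
    A minimal sketch of the guided random search described above, assuming a user-supplied successor function and a value function obtained from a simplified (abstracted) problem. Both successors and simplified_value are hypothetical placeholders for the problem-specific components (e.g. an N-puzzle or job shop instance).

```python
import math
import random

def randomized_search(start, is_goal, successors, simplified_value,
                      max_steps=1000, temperature=1.0):
    """One randomized descent: at each step, sample the next state with
    probability proportional to exp(value / temperature). Randomization
    replaces explicit backtracking and helps escape local extrema."""
    state, path = start, [start]
    for _ in range(max_steps):
        if is_goal(state):
            return path
        succ = successors(state)
        if not succ:
            return None
        weights = [math.exp(simplified_value(s) / temperature) for s in succ]
        state = random.choices(succ, weights=weights, k=1)[0]
        path.append(state)
    return None

def repeated_search(start, is_goal, successors, simplified_value, restarts=100):
    """Restart the randomized search as long as the budget allows and keep the
    best (shortest) solution found so far."""
    best = None
    for _ in range(restarts):
        sol = randomized_search(start, is_goal, successors, simplified_value)
        if sol is not None and (best is None or len(sol) < len(best)):
            best = sol
    return best
```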

    Symblicit algorithms for optimal strategy synthesis in monotonic Markov decision processes

    When treating Markov decision processes (MDPs) with large state spaces, using explicit representations quickly becomes infeasible. Recently, Wimmer et al. proposed a so-called symblicit algorithm for the synthesis of optimal strategies in MDPs, in the quantitative setting of expected mean-payoff. This algorithm, based on the strategy iteration algorithm of Howard and Veinott, efficiently combines symbolic and explicit data structures, using binary decision diagrams as the symbolic representation. The aim of this paper is to show that the new data structure of pseudo-antichains (an extension of antichains) provides another interesting alternative, especially for the class of monotonic MDPs. We design efficient pseudo-antichain based symblicit algorithms (with open source implementations) for two quantitative settings: the expected mean-payoff and the stochastic shortest path. For two practical applications coming from automated planning and LTL synthesis, we report promising experimental results with respect to both run time and memory consumption.
    Comment: In Proceedings SYNT 2014, arXiv:1407.493
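
    For concreteness, a minimal explicit-state strategy-iteration sketch for the stochastic shortest path setting (evaluate the current strategy by solving a linear system, then improve it greedily), without any of the symbolic or pseudo-antichain machinery that makes the paper's algorithms scale. The small MDP below is a hypothetical example.

```python
import numpy as np

# states 0..2 are non-goal, state 3 is the absorbing goal (cost 0).
# P[a][s, s'] and cost[a][s] are illustrative placeholders.
P = np.array([
    [[0.5, 0.3, 0.1, 0.1],   # action 0
     [0.2, 0.5, 0.2, 0.1],
     [0.1, 0.1, 0.4, 0.4],
     [0.0, 0.0, 0.0, 1.0]],
    [[0.2, 0.2, 0.3, 0.3],   # action 1
     [0.1, 0.3, 0.3, 0.3],
     [0.2, 0.2, 0.2, 0.4],
     [0.0, 0.0, 0.0, 1.0]],
])
cost = np.array([[2.0, 1.0, 1.0, 0.0],
                 [4.0, 2.0, 0.5, 0.0]])

n_states, goal = 4, 3

def evaluate(strategy):
    """Expected cost-to-goal of a fixed strategy: solve (I - P_sigma) v = c_sigma
    on the non-goal states."""
    idx = [s for s in range(n_states) if s != goal]
    P_sigma = np.array([P[strategy[s], s] for s in idx])[:, idx]
    c_sigma = np.array([cost[strategy[s], s] for s in idx])
    v = np.zeros(n_states)
    v[idx] = np.linalg.solve(np.eye(len(idx)) - P_sigma, c_sigma)
    return v

def strategy_iteration():
    strategy = np.zeros(n_states, dtype=int)
    while True:
        v = evaluate(strategy)
        q = cost + P @ v          # q[a, s] = one-step cost plus expected cost-to-go
        improved = q.argmin(axis=0)
        if np.array_equal(improved, strategy):
            return strategy, v
        strategy = improved

print(strategy_iteration())
```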

    A limit theorem for Markov decision processes

    Staudigl M. A limit theorem for Markov decision processes. Center for Mathematical Economics Working Papers, Vol. 475. Bielefeld: Center for Mathematical Economics; 2013.
    In this paper we prove a deterministic approximation theorem for a sequence of Markov decision processes with finitely many actions and general state spaces, as they appear frequently in economics, game theory, and operations research. Using viscosity solution methods, no a priori differentiability assumptions are imposed on the value function. Applications of this result can be found in large deviation theory and in some simple economic problems.