
    Portfolio allocation under the vendor managed inventory: A Markov decision process

    Markov decision processes have been applied to a wide range of optimization problems over the years. This study reviews Markov decision processes and investigates their suitability for solving portfolio allocation problems under vendor managed inventory in an uncertain market environment. The problem was formulated in the framework of a Markov decision process, and a value iteration algorithm was implemented to obtain the expected reward and the optimal policy that maps each state to an action. Two challenges were examined: the uncertainty about the value of the item, which follows a stochastic model, and the small state/action spaces that can be solved via value iteration. It was observed that the optimal policy always shorts the stock in state 0 because of its large return. In state 2, although the return is not as large as in state 0, the probability of remaining in state 2 is high enough that the vendor should go long on the stock, expecting a high reward over several periods. We also obtained the expected reward for each state every ten iterations using a discount factor of λ = 0.95. Despite the small state/action spaces, the vendor is able to optimize its reward through the use of a Markov decision process.
    Keywords: Portfolio Allocation, Vendor Managed Inventory, Markov Decision Process, Value Iteration, Expected Reward, Optimal Policy
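
    A minimal value-iteration sketch of the kind of model the abstract describes, assuming three market states (0, 1, 2), two actions ("long" and "short"), and the reported discount factor of 0.95. The transition probabilities and rewards are illustrative placeholders, not the paper's data.

```python
import numpy as np

GAMMA = 0.95  # discount factor reported in the abstract

STATES = [0, 1, 2]
ACTIONS = ["long", "short"]

# P[a][s, s'] = probability of moving from state s to s' under action a (hypothetical values)
P = {
    "long":  np.array([[0.6, 0.3, 0.1],
                       [0.2, 0.5, 0.3],
                       [0.1, 0.2, 0.7]]),
    "short": np.array([[0.7, 0.2, 0.1],
                       [0.3, 0.5, 0.2],
                       [0.2, 0.3, 0.5]]),
}
# R[a][s] = expected one-step reward of action a in state s (hypothetical values)
R = {
    "long":  np.array([-1.0, 0.5, 2.0]),
    "short": np.array([ 3.0, 0.2, -0.5]),
}

def value_iteration(tol=1e-6, report_every=10):
    """Iterate the Bellman optimality update until convergence, printing the
    expected reward of each state every `report_every` iterations."""
    V = np.zeros(len(STATES))
    iteration = 0
    while True:
        iteration += 1
        Q = np.stack([R[a] + GAMMA * P[a] @ V for a in ACTIONS])  # shape (|A|, |S|)
        V_new = Q.max(axis=0)
        if iteration % report_every == 0:
            print(f"iteration {iteration}: V = {np.round(V_new, 3)}")
        if np.max(np.abs(V_new - V)) < tol:
            policy = {s: ACTIONS[i] for s, i in zip(STATES, Q.argmax(axis=0))}
            return V_new, policy
        V = V_new

values, policy = value_iteration()
print("optimal policy:", policy)
```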

    Online algorithms for POMDPs with continuous state, action, and observation spaces

    Online solvers for partially observable Markov decision processes have been applied to problems with large discrete state spaces, but continuous state, action, and observation spaces remain a challenge. This paper begins by investigating double progressive widening (DPW) as a solution to this challenge. However, we prove that this modification alone is not sufficient, because the belief representations in the search tree collapse to a single particle, causing the algorithm to converge to a suboptimal policy regardless of the computation time. This paper proposes and evaluates two new algorithms, POMCPOW and PFT-DPW, that overcome this deficiency by using weighted particle filtering. Simulation results show that these modifications allow the algorithms to succeed where previous approaches fail.
    Comment: Added Multilane section
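
    A minimal sketch of the weighted particle-filter belief update that the abstract credits with preventing belief collapse, assuming a user-supplied generative model. The one-dimensional Gaussian transition and observation densities below are hypothetical stand-ins, not the paper's benchmark problems or the full POMCPOW/PFT-DPW tree search.

```python
import numpy as np

rng = np.random.default_rng(0)

def transition(s, a):
    # hypothetical continuous dynamics: drift by the action plus Gaussian noise
    return s + a + rng.normal(0.0, 0.1, size=s.shape)

def obs_likelihood(o, s):
    # hypothetical Gaussian observation model p(o | s)
    return np.exp(-0.5 * ((o - s) / 0.2) ** 2)

def update_belief(particles, weights, a, o, n_particles=1000):
    """Propagate particles through the dynamics, reweight by the observation
    likelihood, and resample. Keeping many weighted particles is what keeps
    the belief from collapsing to a single state."""
    new_particles = transition(particles, a)
    new_weights = weights * obs_likelihood(o, new_particles)
    new_weights /= new_weights.sum()
    idx = rng.choice(len(new_particles), size=n_particles, p=new_weights)
    return new_particles[idx], np.full(n_particles, 1.0 / n_particles)

# usage: start from a diffuse belief and fold in one action/observation pair
particles = rng.normal(0.0, 1.0, size=1000)
weights = np.full(1000, 1.0 / 1000)
particles, weights = update_belief(particles, weights, a=0.5, o=0.6)
print("belief mean:", particles.mean())
```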

    Reinforcement Learning in Rich-Observation MDPs using Spectral Methods

    Reinforcement learning (RL) in Markov decision processes (MDPs) with large state spaces is a challenging problem. The performance of standard RL algorithms degrades drastically with the dimensionality of the state space. In practice, however, these large MDPs typically incorporate a latent or hidden low-dimensional structure. In this paper, we study the setting of rich-observation Markov decision processes (ROMDPs), where a small number of hidden states possess an injective mapping to the observation states. In other words, every observation state is generated through a single hidden state, and this mapping is unknown a priori. We introduce a spectral decomposition method that consistently learns this mapping and, more importantly, does so with low regret. The estimated mapping is integrated into an optimistic RL algorithm (UCRL), which operates on the estimated hidden space. We derive finite-time regret bounds for our algorithm with a weak dependence on the dimensionality of the observed space. In fact, our algorithm asymptotically achieves the same average regret as the oracle UCRL algorithm, which has knowledge of the mapping from hidden to observed spaces. Thus, we obtain an efficient spectral RL algorithm for ROMDPs.
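
    An illustrative sketch of the spectral idea behind the mapping step: observations emitted by the same hidden state have correlated co-occurrence statistics, so a low-rank factorization of the empirical co-occurrence matrix followed by clustering can recover the observation-to-hidden-state map. This is a simplified spectral-clustering stand-in under that assumption, not the paper's estimator or its regret analysis.

```python
import numpy as np

def estimate_hidden_map(obs_sequence, n_obs, n_hidden):
    """Estimate which hidden state generates each observation from a trajectory
    of observation indices."""
    # empirical co-occurrence counts of consecutive observations
    C = np.zeros((n_obs, n_obs))
    for o, o_next in zip(obs_sequence[:-1], obs_sequence[1:]):
        C[o, o_next] += 1.0
    C /= max(C.sum(), 1.0)

    # rank-n_hidden spectral decomposition of the co-occurrence matrix
    U, S, _ = np.linalg.svd(C)
    embedding = U[:, :n_hidden] * S[:n_hidden]

    # assign each observation to the nearest of n_hidden centroids (simple k-means)
    rng = np.random.default_rng(0)
    centers = embedding[rng.choice(n_obs, n_hidden, replace=False)]
    for _ in range(50):
        labels = np.argmin(
            np.linalg.norm(embedding[:, None] - centers[None], axis=2), axis=1)
        for k in range(n_hidden):
            if np.any(labels == k):
                centers[k] = embedding[labels == k].mean(axis=0)
    return labels  # labels[o] = estimated hidden state of observation o

# usage with a synthetic trajectory over 6 observations and 2 hidden states
seq = [0, 1, 2, 0, 1, 2, 3, 4, 5, 3, 4, 5] * 50
print(estimate_hidden_map(seq, n_obs=6, n_hidden=2))
```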

    Application of Markov decision processes to search problems

    Many decision problems contain, in some form, an NP-hard combinatorial problem. Decision support systems therefore have to solve such combinatorial problems in a reasonable time. Many combinatorial problems can be solved by a search method. The search methods used in decision support systems have to be robust in the sense that they can handle a large variety of (user defined) constraints and allow user interaction, i.e., they allow a decision maker to control the search process manually. In this paper we show how Markov decision processes can be used to guide a random search process. We first formulate search problems as a special class of Markov decision processes such that the search space of a search problem is the state space of the Markov decision process. In general it is not possible to compute an optimal control procedure for these Markov decision processes in a reasonable time. We therefore define several simplifications of the original problem that have much smaller state spaces. For these simplifications, decompositions, and abstractions, we find optimal strategies and use the exact solutions of these simplified problems to guide a randomized search process. The search process selects states for further search at random, with probabilities based on the optimal strategies of the simplified problems. This randomization is a substitute for explicit backtracking and avoids problems with local extrema. These randomized search procedures are repeated for as long as time remains to solve the problem, and the best solution generated during that time is accepted. We illustrate the approach with two examples: the N-puzzle and a job shop scheduling problem.
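
    A minimal sketch of the guided random search described above, assuming a user-supplied successor function and a value function obtained from a simplified (abstracted) problem. Both successors and simplified_value are hypothetical placeholders for the problem-specific components (e.g. an N-puzzle or job shop instance).

```python
import math
import random

def randomized_search(start, is_goal, successors, simplified_value,
                      max_steps=1000, temperature=1.0):
    """One randomized descent: at each step, sample the next state with
    probability proportional to exp(value / temperature). Randomization
    replaces explicit backtracking and helps escape local extrema."""
    state, path = start, [start]
    for _ in range(max_steps):
        if is_goal(state):
            return path
        succ = successors(state)
        if not succ:
            return None
        weights = [math.exp(simplified_value(s) / temperature) for s in succ]
        state = random.choices(succ, weights=weights, k=1)[0]
        path.append(state)
    return None

def repeated_search(start, is_goal, successors, simplified_value, restarts=100):
    """Restart the randomized search as long as the budget allows and keep the
    best (shortest) solution found so far."""
    best = None
    for _ in range(restarts):
        sol = randomized_search(start, is_goal, successors, simplified_value)
        if sol is not None and (best is None or len(sol) < len(best)):
            best = sol
    return best
```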

    Symblicit algorithms for optimal strategy synthesis in monotonic Markov decision processes

    When treating Markov decision processes (MDPs) with large state spaces, using explicit representations quickly becomes infeasible. Recently, Wimmer et al. proposed a so-called symblicit algorithm for the synthesis of optimal strategies in MDPs, in the quantitative setting of expected mean-payoff. This algorithm, based on the strategy iteration algorithm of Howard and Veinott, efficiently combines symbolic and explicit data structures, using binary decision diagrams as the symbolic representation. The aim of this paper is to show that the new data structure of pseudo-antichains (an extension of antichains) provides another interesting alternative, especially for the class of monotonic MDPs. We design efficient pseudo-antichain based symblicit algorithms (with open source implementations) for two quantitative settings: the expected mean-payoff and the stochastic shortest path. For two practical applications coming from automated planning and LTL synthesis, we report promising experimental results with respect to both run time and memory consumption.
    Comment: In Proceedings SYNT 2014, arXiv:1407.493
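
    For concreteness, a minimal explicit-state strategy-iteration sketch for the stochastic shortest path setting (evaluate the current strategy by solving a linear system, then improve it greedily), without any of the symbolic or pseudo-antichain machinery that makes the paper's algorithms scale. The small MDP below is a hypothetical example.

```python
import numpy as np

# states 0..2 are non-goal, state 3 is the absorbing goal (cost 0).
# P[a][s, s'] and cost[a][s] are illustrative placeholders.
P = np.array([
    [[0.5, 0.3, 0.1, 0.1],   # action 0
     [0.2, 0.5, 0.2, 0.1],
     [0.1, 0.1, 0.4, 0.4],
     [0.0, 0.0, 0.0, 1.0]],
    [[0.2, 0.2, 0.3, 0.3],   # action 1
     [0.1, 0.3, 0.3, 0.3],
     [0.2, 0.2, 0.2, 0.4],
     [0.0, 0.0, 0.0, 1.0]],
])
cost = np.array([[2.0, 1.0, 1.0, 0.0],
                 [4.0, 2.0, 0.5, 0.0]])

n_states, goal = 4, 3

def evaluate(strategy):
    """Expected cost-to-goal of a fixed strategy: solve (I - P_sigma) v = c_sigma
    on the non-goal states."""
    idx = [s for s in range(n_states) if s != goal]
    P_sigma = np.array([P[strategy[s], s] for s in idx])[:, idx]
    c_sigma = np.array([cost[strategy[s], s] for s in idx])
    v = np.zeros(n_states)
    v[idx] = np.linalg.solve(np.eye(len(idx)) - P_sigma, c_sigma)
    return v

def strategy_iteration():
    strategy = np.zeros(n_states, dtype=int)
    while True:
        v = evaluate(strategy)
        q = cost + P @ v          # q[a, s] = one-step cost plus expected cost-to-go
        improved = q.argmin(axis=0)
        if np.array_equal(improved, strategy):
            return strategy, v
        strategy = improved

print(strategy_iteration())
```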

    A limit theorem for Markov decision processes

    Staudigl M. A limit theorem for Markov decision processes. Center for Mathematical Economics Working Papers, Vol. 475. Bielefeld: Center for Mathematical Economics; 2013.
    In this paper we prove a deterministic approximation theorem for a sequence of Markov decision processes with finitely many actions and general state spaces, as they appear frequently in economics, game theory, and operations research. Using viscosity solution methods, no a priori differentiability assumptions are imposed on the value function. Applications of this result can be found in large deviation theory and in some simple economic problems.