Ranking and Selection under Input Uncertainty: Fixed Confidence and Fixed Budget
In stochastic simulation, input uncertainty (IU) is caused by the error in
estimating the input distributions using finite real-world data. When it comes
to simulation-based Ranking and Selection (R&S), ignoring IU could lead to the
failure of many existing selection procedures. In this paper, we study R&S
under IU by allowing the possibility of acquiring additional data. Two
classical R&S formulations are extended to account for IU: (i) for fixed
confidence, we consider the setting where data arrive sequentially so that IU can be reduced
over time; (ii) for fixed budget, a joint budget is assumed to be available for
both collecting input data and running simulations. New procedures are proposed
for each formulation using the frameworks of Sequential Elimination and Optimal
Computing Budget Allocation, with theoretical guarantees provided accordingly
(e.g., upper bound on the expected running time and finite-sample bound on the
probability of false selection). Numerical results demonstrate the
effectiveness of our procedures through a multi-stage production-inventory
problem.
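As a rough illustration of the fixed-budget formulation above (a joint budget split between collecting input data and running simulations), the sketch below alternates between the two uses of the budget and allocates replications with a simple variance-scaled heuristic. The functions simulate and collect_input_data, and the allocation rule itself, are hypothetical placeholders; this is not the paper's Sequential Elimination or OCBA-based procedure.

```python
import numpy as np

def fixed_budget_select(simulate, collect_input_data, total_budget, k, batch=5):
    """Toy fixed-budget ranking-and-selection loop under input uncertainty.

    simulate(i, data)     -> one noisy performance sample of design i under
                             the input distribution estimated from `data`
    collect_input_data(n) -> n additional real-world input observations
    (Illustrative heuristic only; not the paper's procedure.)
    """
    data = np.asarray(collect_input_data(batch), dtype=float)
    samples = [[simulate(i, data)] for i in range(k)]
    spent = batch + k
    while spent + batch + k <= total_budget:
        # spend one batch of the budget on reducing input uncertainty ...
        data = np.concatenate([data, collect_input_data(batch)])
        # ... and give replications to the designs hardest to separate
        means = np.array([np.mean(s) for s in samples])
        stds = np.array([np.std(s) + 1e-8 for s in samples])
        gaps = (means.max() - means) / stds   # 0 for the current best design
        for i in np.argsort(gaps)[: max(1, k // 2)]:
            samples[i].append(simulate(i, data))
        spent += batch + max(1, k // 2)
    return int(np.argmax([np.mean(s) for s in samples]))
```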
Fast Estimation of True Bounds on Bermudan Option Prices under Jump-diffusion Processes
Fast pricing of American-style options has been a difficult problem since such
options were first introduced to financial markets in the 1970s, especially when
the underlying stocks' prices follow jump-diffusion processes. In this paper,
we propose a new algorithm to generate tight upper bounds on the Bermudan
option price without nested simulation, under the jump-diffusion setting. By
exploiting the martingale representation theorem for jump processes on the dual
martingale, we are able to explore the unique structure of the optimal dual
martingale and construct an approximation that preserves the martingale
property. The resulting upper bound estimator avoids the nested Monte Carlo
simulation required by the original primal-dual algorithm and therefore
significantly improves computational efficiency. Theoretical analysis is
provided to guarantee the quality of the martingale approximation. Numerical
experiments are conducted to verify the efficiency of our proposed algorithm.
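For context, such upper bounds rest on the standard martingale duality for Bermudan options; in generic notation (not tied to the paper's jump-diffusion construction):

```latex
% Bermudan price V_0 with discounted exercise values Z_t over dates t = 0,...,T;
% M is any martingale started at M_0 = 0:
\[
V_0 \;=\; \sup_{\tau}\,\mathbb{E}\!\left[Z_\tau\right]
    \;\le\; \mathbb{E}\!\left[\max_{0 \le t \le T}\bigl(Z_t - M_t\bigr)\right],
\]
% with equality when M is the martingale part of the Doob decomposition of the
% Snell envelope of Z; nested simulation typically arises when estimating that
% optimal M, which is what the proposed construction avoids.
```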
Particle Filtering for Stochastic Control and Global Optimization
This thesis explores new algorithms and results in stochastic control and global optimization through the use of particle filtering. Stochastic control and global optimization are two areas with many applications, but their problems are often difficult to solve.
In stochastic control, an important class of problems, namely, partially observable Markov decision processes (POMDPs), provides an ideal paradigm to model discrete-time sequential decision making under uncertainty and partial observation. However, POMDPs usually do not admit analytical solutions, and are computationally very expensive to solve most of the time. While many efficient numerical algorithms have been developed for finite-state POMDPs, only a few have been proposed for continuous-state POMDPs, and relevant analytical results regarding convergence and error bounds are even sparser. From the modeling viewpoint, many application problems are modeled more naturally by continuous-state POMDPs than by finite-state POMDPs. Therefore, one part of the thesis is devoted to developing a new efficient algorithm for continuous-state POMDPs and studying the performance of the algorithm both analytically and numerically. Based on the idea of density projection with particle filtering, the proposed algorithm reduces the infinite-dimensional problem to a finite, low-dimensional one, and also has the flexibility and scalability for better approximation if given more computational power. Error bounds are proved for the algorithm, and numerical experiments are carried out on an inventory control problem.
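A minimal sketch of the kind of belief-state approximation described above: a bootstrap particle filter whose particle cloud is projected, after each update, onto a low-dimensional parametric family (a Gaussian here, as an illustrative assumption). The transition and obs_loglik models are placeholders; this is not the thesis's exact projection filter.

```python
import numpy as np

def projected_belief_update(particles, action, obs, transition, obs_loglik, rng):
    """One belief update for a continuous-state POMDP (scalar state, for brevity):
    propagate, reweight by the observation, resample, then project the particle
    cloud onto a Gaussian summary of the belief.

    transition(x, a, rng) -> next hidden-state sample
    obs_loglik(y, x)      -> log-likelihood of observation y given state x
    (The Gaussian family is an illustrative choice of projection target.)
    """
    n = len(particles)
    prop = np.array([transition(x, action, rng) for x in particles])
    logw = np.array([obs_loglik(obs, x) for x in prop])
    w = np.exp(logw - logw.max())
    w /= w.sum()
    resampled = prop[rng.choice(n, size=n, p=w)]
    # density projection: the infinite-dimensional belief is summarized by two
    # parameters, keeping downstream value-function approximation tractable
    mu, sigma = resampled.mean(), resampled.std() + 1e-8
    return rng.normal(mu, sigma, size=n), (mu, sigma)
```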
In global optimization, many problems are very difficult to solve due to the presence of multiple local optima or badly scaled objective functions. Many approximate solution methods have been developed and studied. Among them, a recent class of simulation-based methods shares the common characteristic of repeatedly drawing candidate solutions from an intermediate probability distribution and then updating the distribution using these candidate solutions, until the probability distribution becomes concentrated on the optimal solution. The efficiency and accuracy of these algorithms depend very much on the choice of the intermediate probability distributions and the updating schemes. Using a novel interpretation of particle filtering, these algorithms are unified under one framework, and hence, many new insights are revealed. By better understanding these existing algorithms, the framework also holds the promise for developing new, improved algorithms. Some directions for such algorithms are proposed, and numerical experiments are carried out on a few benchmark problems.
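One well-known member of the algorithm class described above (sample candidates from an intermediate distribution, then refit the distribution to the best of them) is the cross-entropy method; a minimal sketch follows, with a Gaussian sampling distribution chosen purely for illustration.

```python
import numpy as np

def cross_entropy_minimize(f, dim, iters=100, pop=200, elite_frac=0.1, seed=0):
    """Minimal cross-entropy method: sample candidates from a Gaussian,
    keep the elite fraction, and refit the Gaussian to the elites."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), 5.0 * np.ones(dim)
    n_elite = max(1, int(elite_frac * pop))
    for _ in range(iters):
        x = rng.normal(mu, sigma, size=(pop, dim))
        elite = x[np.argsort([f(xi) for xi in x])[:n_elite]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu

# usage: a badly scaled quadratic with minimum at the origin
best = cross_entropy_minimize(lambda x: (100 * x[0]) ** 2 + x[1] ** 2, dim=2)
```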
Optimal stopping under partial observation: Near-value iteration
We propose a new approximate value iteration method, namely near-value iteration (NVI), to solve continuous-state optimal stopping problems under partial observation, which in general cannot be solved analytically and also pose a great challenge to numerical solutions. NVI is motivated by the expression of the value function as the supremum over an uncountable set of linear functions in the belief state. After a smart manipulation of the operations in the updating equation for the value function, we reduce the set to only two functions at every time step, so as to achieve significant computational savings. NVI yields a value function approximation bounded by the tightest lower and upper bounds that can be achieved by existing algorithms in the same class, so the NVI approximation is closer to the true value function than at least one of these bounds. We demonstrate the effectiveness of our approach on an example of pricing American options under stochastic volatility.
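In generic notation (not necessarily the paper's), the structure being exploited is the following: the stopping value over belief states satisfies a Bellman recursion whose solution is a supremum of functions linear in the belief.

```latex
% Belief state b (a density over the hidden state), exercise reward g,
% belief-update operator \Psi(b, y) after observing y:
\[
V_t(b) \;=\; \max\Bigl\{\,\langle g, b\rangle,\;
             \mathbb{E}_{y \mid b}\bigl[V_{t+1}\bigl(\Psi(b, y)\bigr)\bigr]\Bigr\},
\qquad
V_t(b) \;=\; \sup_{\alpha \in \Gamma_t}\langle \alpha, b\rangle,
\]
% each \alpha is linear in b; NVI keeps only two functions from this set at
% every time step, which is the source of the computational savings.
```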
Reusing Historical Trajectories in Natural Policy Gradient via Importance Sampling: Convergence and Convergence Rate
Reinforcement learning provides a mathematical framework for learning-based
control, whose success largely depends on the amount of data it can utilize.
The efficient utilization of historical trajectories obtained from previous
policies is essential for expediting policy optimization. Empirical evidence
has shown that policy gradient methods based on importance sampling work well.
However, the existing literature often neglects the interdependence between
trajectories from different iterations, and the good empirical performance
lacks a rigorous theoretical justification. In this paper, we study a variant
of the natural policy gradient method with reusing historical trajectories via
importance sampling. We show that the bias of the proposed estimator of the
gradient is asymptotically negligible, the resultant algorithm is convergent,
and reusing past trajectories helps improve the convergence rate. We further
apply the proposed estimator to popular policy optimization algorithms such as
trust region policy optimization. Our theoretical results are verified on
classical benchmarks.
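In generic notation (the paper's exact weighting and preconditioning may differ), an importance-sampling reuse estimator of the policy gradient has the form:

```latex
% Gradient estimate at the current parameter \theta_k, reusing trajectories
% \tau generated under earlier policies \pi_{\theta_i}, i <= k:
\[
\widehat{\nabla_\theta J}(\theta_k)
  \;=\; \frac{1}{N} \sum_{i \le k} \;\sum_{\tau \sim \pi_{\theta_i}}
        \frac{p_{\theta_k}(\tau)}{p_{\theta_i}(\tau)}\,
        R(\tau)\, \nabla_\theta \log p_{\theta_k}(\tau),
\]
% where N is the total number of reused trajectories and R(\tau) the return;
% the natural-gradient step preconditions this with the inverse Fisher matrix.
% Because each \theta_i depends on data from earlier iterations, the reused
% trajectories are not independent; this is the interdependence noted above.
```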
- …