Ranking and Selection under Input Uncertainty: Fixed Confidence and Fixed Budget
In stochastic simulation, input uncertainty (IU) is caused by the error in
estimating the input distributions using finite real-world data. When it comes
to simulation-based Ranking and Selection (R&S), ignoring IU could lead to the
failure of many existing selection procedures. In this paper, we study R&S
under IU by allowing the possibility of acquiring additional data. Two
classical R&S formulations are extended to account for IU: (i) for fixed
confidence, we consider the setting where data arrive sequentially so that IU can be reduced
over time; (ii) for fixed budget, a joint budget is assumed to be available for
both collecting input data and running simulations. New procedures are proposed
for each formulation using the frameworks of Sequential Elimination and Optimal
Computing Budget Allocation, with theoretical guarantees provided accordingly
(e.g., upper bound on the expected running time and finite-sample bound on the
probability of false selection). Numerical results demonstrate the
effectiveness of our procedures through a multi-stage production-inventory
problem.
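As a rough illustration of the fixed-budget formulation above (a joint budget split between collecting input data and running simulations), the sketch below alternates between the two uses of the budget and allocates replications with a simple variance-scaled heuristic. The functions simulate and collect_input_data, and the allocation rule itself, are hypothetical placeholders; this is not the paper's Sequential Elimination or OCBA-based procedure.

```python
import numpy as np

def fixed_budget_select(simulate, collect_input_data, total_budget, k, batch=5):
    """Toy fixed-budget ranking-and-selection loop under input uncertainty.

    simulate(i, data)     -> one noisy performance sample of design i under
                             the input distribution estimated from `data`
    collect_input_data(n) -> n additional real-world input observations
    (Illustrative heuristic only; not the paper's procedure.)
    """
    data = np.asarray(collect_input_data(batch), dtype=float)
    samples = [[simulate(i, data)] for i in range(k)]
    spent = batch + k
    while spent + batch + k <= total_budget:
        # spend one batch of the budget on reducing input uncertainty ...
        data = np.concatenate([data, collect_input_data(batch)])
        # ... and give replications to the designs hardest to separate
        means = np.array([np.mean(s) for s in samples])
        stds = np.array([np.std(s) + 1e-8 for s in samples])
        gaps = (means.max() - means) / stds   # 0 for the current best design
        for i in np.argsort(gaps)[: max(1, k // 2)]:
            samples[i].append(simulate(i, data))
        spent += batch + max(1, k // 2)
    return int(np.argmax([np.mean(s) for s in samples]))
```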
Fast Estimation of True Bounds on Bermudan Option Prices under Jump-diffusion Processes
Fast pricing of American-style options has been a difficult problem since such
options were first introduced to financial markets in the 1970s, especially when
the underlying stocks' prices follow jump-diffusion processes. In this paper,
we propose a new algorithm to generate tight upper bounds on the Bermudan
option price without nested simulation, under the jump-diffusion setting. By
exploiting the martingale representation theorem for jump processes on the dual
martingale, we are able to explore the unique structure of the optimal dual
martingale and construct an approximation that preserves the martingale
property. The resulting upper bound estimator avoids the nested Monte Carlo
simulation required by the original primal-dual algorithm and therefore
significantly improves computational efficiency. Theoretical analysis is
provided to guarantee the quality of the martingale approximation. Numerical
experiments are conducted to verify the efficiency of our proposed algorithm.
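For context, such upper bounds rest on the standard martingale duality for Bermudan options; in generic notation (not tied to the paper's jump-diffusion construction):

```latex
% Bermudan price V_0 with discounted exercise values Z_t over dates t = 0,...,T;
% M is any martingale started at M_0 = 0:
\[
V_0 \;=\; \sup_{\tau}\,\mathbb{E}\!\left[Z_\tau\right]
    \;\le\; \mathbb{E}\!\left[\max_{0 \le t \le T}\bigl(Z_t - M_t\bigr)\right],
\]
% with equality when M is the martingale part of the Doob decomposition of the
% Snell envelope of Z; nested simulation typically arises when estimating that
% optimal M, which is what the proposed construction avoids.
```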
Particle Filtering for Stochastic Control and Global Optimization
This thesis explores new algorithms and results in stochastic control and global optimization through the use of particle filtering. Stochastic control and global optimization are two areas with many applications, but their problems are often difficult to solve.
In stochastic control, an important class of problems, namely, partially observable Markov decision processes (POMDPs), provides an ideal paradigm to model discrete-time sequential decision making under uncertainty and partial observation. However, POMDPs usually do not admit analytical solutions, and are computationally very expensive to solve most of the time. While many efficient numerical algorithms have been developed for finite-state POMDPs, only a few have been proposed for continuous-state POMDPs, and relevant analytical results regarding convergence and error bounds are even sparser. From the modeling viewpoint, many application problems are modeled more naturally by continuous-state POMDPs than by finite-state POMDPs. Therefore, one part of the thesis is devoted to developing a new efficient algorithm for continuous-state POMDPs and studying the performance of the algorithm both analytically and numerically. Based on the idea of density projection with particle filtering, the proposed algorithm reduces the infinite-dimensional problem to a finite, low-dimensional one, and also has the flexibility and scalability for better approximation if given more computational power. Error bounds are proved for the algorithm, and numerical experiments are carried out on an inventory control problem.
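A minimal sketch of the kind of belief-state approximation described above: a bootstrap particle filter whose particle cloud is projected, after each update, onto a low-dimensional parametric family (a Gaussian here, as an illustrative assumption). The transition and obs_loglik models are placeholders; this is not the thesis's exact projection filter.

```python
import numpy as np

def projected_belief_update(particles, action, obs, transition, obs_loglik, rng):
    """One belief update for a continuous-state POMDP (scalar state, for brevity):
    propagate, reweight by the observation, resample, then project the particle
    cloud onto a Gaussian summary of the belief.

    transition(x, a, rng) -> next hidden-state sample
    obs_loglik(y, x)      -> log-likelihood of observation y given state x
    (The Gaussian family is an illustrative choice of projection target.)
    """
    n = len(particles)
    prop = np.array([transition(x, action, rng) for x in particles])
    logw = np.array([obs_loglik(obs, x) for x in prop])
    w = np.exp(logw - logw.max())
    w /= w.sum()
    resampled = prop[rng.choice(n, size=n, p=w)]
    # density projection: the infinite-dimensional belief is summarized by two
    # parameters, keeping downstream value-function approximation tractable
    mu, sigma = resampled.mean(), resampled.std() + 1e-8
    return rng.normal(mu, sigma, size=n), (mu, sigma)
```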
In global optimization, many problems are very difficult to solve due to the presence of multiple local optima or badly scaled objective functions. Many approximate solution methods have been developed and studied. Among them, a recent class of simulation-based methods shares the common characteristic of repeatedly drawing candidate solutions from an intermediate probability distribution and then updating the distribution using these candidate solutions, until the probability distribution becomes concentrated on the optimal solution. The efficiency and accuracy of these algorithms depend very much on the choice of the intermediate probability distributions and the updating schemes. Using a novel interpretation of particle filtering, these algorithms are unified under one framework, and hence, many new insights are revealed. By better understanding these existing algorithms, the framework also holds the promise for developing new, improved algorithms. Some directions for such algorithms are proposed, and numerical experiments are carried out on a few benchmark problems.
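One well-known member of the algorithm class described above (sample candidates from an intermediate distribution, then refit the distribution to the best of them) is the cross-entropy method; a minimal sketch follows, with a Gaussian sampling distribution chosen purely for illustration.

```python
import numpy as np

def cross_entropy_minimize(f, dim, iters=100, pop=200, elite_frac=0.1, seed=0):
    """Minimal cross-entropy method: sample candidates from a Gaussian,
    keep the elite fraction, and refit the Gaussian to the elites."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), 5.0 * np.ones(dim)
    n_elite = max(1, int(elite_frac * pop))
    for _ in range(iters):
        x = rng.normal(mu, sigma, size=(pop, dim))
        elite = x[np.argsort([f(xi) for xi in x])[:n_elite]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu

# usage: a badly scaled quadratic with minimum at the origin
best = cross_entropy_minimize(lambda x: (100 * x[0]) ** 2 + x[1] ** 2, dim=2)
```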
Optimal stopping under partial observation: Near-value iteration
We propose a new approximate value iteration method, namely near-value iteration (NVI), to solve continuous-state optimal stopping problems under partial observation, which in general cannot be solved analytically and also pose a great challenge to numerical solutions. NVI is motivated by the expression of the value function as the supremum over an uncountable set of linear functions in the belief state. After a smart manipulation of the operations in the updating equation for the value function, we reduce the set to only two functions at every time step, so as to achieve significant computational savings. NVI yields a value function approximation bounded by the tightest lower and upper bounds that can be achieved by existing algorithms in the same class, so the NVI approximation is closer to the true value function than at least one of these bounds. We demonstrate the effectiveness of our approach on an example of pricing American options under stochastic volatility.
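In generic notation (not necessarily the paper's), the structure being exploited is the following: the stopping value over belief states satisfies a Bellman recursion whose solution is a supremum of functions linear in the belief.

```latex
% Belief state b (a density over the hidden state), exercise reward g,
% belief-update operator \Psi(b, y) after observing y:
\[
V_t(b) \;=\; \max\Bigl\{\,\langle g, b\rangle,\;
             \mathbb{E}_{y \mid b}\bigl[V_{t+1}\bigl(\Psi(b, y)\bigr)\bigr]\Bigr\},
\qquad
V_t(b) \;=\; \sup_{\alpha \in \Gamma_t}\langle \alpha, b\rangle,
\]
% each \alpha is linear in b; NVI keeps only two functions from this set at
% every time step, which is the source of the computational savings.
```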
Reusing Historical Trajectories in Natural Policy Gradient via Importance Sampling: Convergence and Convergence Rate
Reinforcement learning provides a mathematical framework for learning-based
control, whose success largely depends on the amount of data it can utilize.
The efficient utilization of historical trajectories obtained from previous
policies is essential for expediting policy optimization. Empirical evidence
has shown that policy gradient methods based on importance sampling work well.
However, the existing literature often neglects the interdependence between
trajectories from different iterations, and the good empirical performance
lacks a rigorous theoretical justification. In this paper, we study a variant
of the natural policy gradient method with reusing historical trajectories via
importance sampling. We show that the bias of the proposed estimator of the
gradient is asymptotically negligible, the resultant algorithm is convergent,
and reusing past trajectories helps improve the convergence rate. We further
apply the proposed estimator to popular policy optimization algorithms such as
trust region policy optimization. Our theoretical results are verified on
classical benchmarks.
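In generic notation (the paper's exact weighting and preconditioning may differ), an importance-sampling reuse estimator of the policy gradient has the form:

```latex
% Gradient estimate at the current parameter \theta_k, reusing trajectories
% \tau generated under earlier policies \pi_{\theta_i}, i <= k:
\[
\widehat{\nabla_\theta J}(\theta_k)
  \;=\; \frac{1}{N} \sum_{i \le k} \;\sum_{\tau \sim \pi_{\theta_i}}
        \frac{p_{\theta_k}(\tau)}{p_{\theta_i}(\tau)}\,
        R(\tau)\, \nabla_\theta \log p_{\theta_k}(\tau),
\]
% where N is the total number of reused trajectories and R(\tau) the return;
% the natural-gradient step preconditions this with the inverse Fisher matrix.
% Because each \theta_i depends on data from earlier iterations, the reused
% trajectories are not independent; this is the interdependence noted above.
```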
- …