Randomized Search Methods for Solving Markov Decision Processes and Global Optimization
Markov decision process (MDP) models provide a unified framework for modeling and describing
sequential decision making problems that arise in engineering, economics, and computer science.
However, when the underlying problem is modeled by MDPs, the size of the resultant MDP model typically grows exponentially with the size of the original problem, which makes practical solution of the MDP model intractable, especially for large problems.
Moreover, for complex systems, it is often the case that some of the parameters of the MDP models cannot be obtained in a feasible way; only simulation samples are available. In the first part of this thesis, we develop two sampling/simulation-based numerical algorithms to address the computational difficulties arising from these settings. The proposed algorithms have somewhat different emphases: one algorithm focuses on MDPs with large state spaces but relatively small action spaces, and emphasizes the efficient allocation of simulation
samples to find good value function estimates, whereas the other algorithm targets problems with large action spaces but small state spaces, and invokes a population-based approach
to avoid carrying out an optimization over the entire action space. We study the convergence properties of these algorithms and report on computational results to illustrate their performance.
The second part of this thesis is devoted to the development of a general framework called Model Reference Adaptive Search (MRAS) for solving
global optimization problems. The method iteratively updates a parameterized probability distribution on the solution space, so that the sequence of candidate solutions generated from this distribution will converge
asymptotically to the global optimum. We provide a particular instantiation of the framework and establish its convergence properties in both continuous and discrete domains. In addition, we explore the relationship between the
recently proposed Cross-Entropy (CE)
method and MRAS, and show that the model reference framework can also be used to describe the CE method and study its properties. Finally, we formally discuss the extension of the MRAS framework to stochastic optimization problems and carry out numerical experiments to
investigate the performance of the method.
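The model-based idea underlying MRAS can be illustrated with a minimal cross-entropy-style sketch (function names and parameter values below are illustrative, not from the thesis): a parameterized Gaussian sampling distribution is repeatedly refit to the elite fraction of each sampled population, so that successive populations concentrate near the optimum.

```python
import numpy as np

def model_based_search(f, dim, iters=100, pop=200, elite_frac=0.1, seed=0):
    """Minimize f by iteratively refitting a Gaussian sampling distribution
    to the elite (best-scoring) fraction of each sampled population."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.full(dim, 2.0), np.full(dim, 2.0)  # initial sampling distribution
    n_elite = max(1, int(elite_frac * pop))
    for _ in range(iters):
        x = rng.normal(mu, sigma, size=(pop, dim))    # candidate solutions
        vals = np.apply_along_axis(f, 1, x)
        elite = x[np.argsort(vals)[:n_elite]]         # keep the best candidates
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu

best = model_based_search(lambda x: np.sum(x**2), dim=3)  # minimum at the origin
```

As the elite set concentrates, the sampling distribution degenerates onto the minimizer; MRAS itself guides this update through a sequence of implicit reference distributions rather than the plain elite refit shown here.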
COMBINING GRADIENT-BASED OPTIMIZATION WITH STOCHASTIC SEARCH
ABSTRACT We propose a stochastic search algorithm for solving non-differentiable optimization problems. At each iteration, the algorithm searches the solution space by generating a population of candidate solutions from a parameterized sampling distribution. The basic idea is to convert the original optimization problem into a differentiable problem in terms of the parameters of the sampling distribution, and then use a quasi-Newton-like method on the reformulated problem to find improved sampling distributions. The algorithm combines the strength of stochastic search from considering a population of candidate solutions to explore the solution space with the rapid convergence behavior of gradient methods by exploiting local differentiable structures. We provide numerical examples to illustrate its performance.
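The reformulation idea can be sketched as follows (a minimal illustration with hypothetical names; the paper's actual update is quasi-Newton-like rather than the plain gradient step used here): replace minimization of f(x) by minimization over the mean theta of a Gaussian sampling distribution, which is differentiable in theta even when f is not, and estimate the gradient with the score-function (likelihood-ratio) trick over a population of samples.

```python
import numpy as np

def smoothed_gradient_search(f, dim, iters=500, pop=100, sigma=1.0, lr=0.05, seed=0):
    """Gradient descent on theta -> E[f(X)], X ~ N(theta, sigma^2 I),
    using the score-function gradient estimator; f itself is never differentiated."""
    rng = np.random.default_rng(seed)
    theta = np.full(dim, 3.0)
    for _ in range(iters):
        eps = rng.normal(size=(pop, dim))
        x = theta + sigma * eps                        # population of candidates
        vals = np.apply_along_axis(f, 1, x)
        vals = vals - vals.mean()                      # baseline reduces variance
        grad = (vals[:, None] * eps).mean(axis=0) / sigma
        theta = theta - lr * grad                      # plain gradient step
    return theta

# works on a non-differentiable objective such as the L1 norm
theta_star = smoothed_gradient_search(lambda x: np.sum(np.abs(x)), dim=2)
```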
Quantile Optimization via Multiple Timescale Local Search for Black-box Functions
We consider quantile optimization of black-box functions that are estimated
with noise. We propose two new iterative three-timescale local search
algorithms. The first algorithm uses an appropriately modified
finite-difference-based gradient estimator that requires d + 1 samples of the black-box function per iteration of the algorithm, where d is the number of decision variables (the dimension of the input vector). For higher-dimensional
problems, this algorithm may not be practical if the black-box function
estimates are expensive. The second algorithm employs a
simultaneous-perturbation-based gradient estimator that uses only three samples
for each iteration regardless of problem dimension. Under appropriate
conditions, we show the almost sure convergence of both algorithms. In
addition, for the class of strongly convex functions, we further establish
their (finite-time) convergence rate through a novel fixed-point argument.
Simulation experiments indicate that the algorithms work well on a variety of
test problems and compare well with recently proposed alternative methods.
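The simultaneous-perturbation idea behind the second algorithm can be illustrated with a plain SPSA sketch (illustrative only: the paper's algorithm uses three timescales and targets quantiles of noisy black-box functions, whereas this toy version minimizes a deterministic function with two evaluations per iteration):

```python
import numpy as np

def spsa_minimize(f, x0, iters=2000, a=0.1, c=0.1, seed=0):
    """Minimize f with two evaluations per iteration: perturb all coordinates
    simultaneously with a random +/-1 vector, so the cost of each gradient
    estimate does not grow with the problem dimension."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for k in range(1, iters + 1):
        ak, ck = a / k**0.602, c / k**0.101            # standard SPSA gain decay
        delta = rng.choice([-1.0, 1.0], size=x.shape)  # Rademacher perturbation
        # for +/-1 entries, dividing by delta equals multiplying by delta
        ghat = (f(x + ck * delta) - f(x - ck * delta)) / (2.0 * ck) * delta
        x = x - ak * ghat
    return x

x_min = spsa_minimize(lambda v: float(np.sum(v**2)), [2.0, -2.0])
```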
Controlled Optimal Design Program for the Logit Dose Response Model
The assessment of dose-response is an integral component of the drug development process. Parallel dose-response studies are customarily conducted in preclinical studies and phase 1 and 2 clinical trials for this purpose. Practical constraints on dose range, dose levels, and dose proportions are intrinsic issues in the design of dose-response studies because of drug toxicity, efficacy, FDA regulations, protocol requirements, clinical trial logistics, and marketing issues. We provide a free online software package called Controlled Optimal Design 2.0 for generating controlled optimal designs that can incorporate prior information and multiple objectives, and meet multiple practical constraints at the same time. Researchers can either run the web-based design program or download its stand-alone version to construct the desired multiple-objective controlled Bayesian optimal designs. Because researchers often adopt ad-hoc design schemes such as equal allocation rules without knowing how efficient such designs would be for the design problem, the program also evaluates the efficiency of user-supplied designs.
Variance Reduction for Generalized Likelihood Ratio Method By Conditional Monte Carlo and Randomized Quasi-Monte Carlo
The generalized likelihood ratio (GLR) method is a recently introduced gradient estimation method for handling discontinuities in a wide scope of sample performances. We put the GLR methods from previous work into a single framework, simplify regularity conditions for justifying unbiasedness of GLR, and relax some of those conditions that are difficult to verify in practice. Moreover, we combine GLR with conditional Monte Carlo methods and randomized quasi-Monte Carlo methods to reduce the variance. Numerical experiments show that the variance reduction could be significant in various applications.
Generalized Likelihood Ratio Method for Stochastic Models with Uniform Random Numbers As Inputs
We propose a new unbiased stochastic gradient estimator for a family of stochastic models with uniform random numbers as inputs. By extending the generalized likelihood ratio (GLR) method, the proposed estimator applies to discontinuous sample performances with structural parameters without requiring that the tails of the density of the input random variables go down to zero smoothly, an assumption in Peng et al. (2018) and Peng et al. (2020a) that precludes a direct formulation in terms of uniform random numbers as inputs. By overcoming this limitation, our new estimator greatly expands the applicability of the GLR method, which we demonstrate for several general classes of uniform input random numbers, including independent inverse transform random variates and dependent input random variables governed by an Archimedean copula. We show how the new derivative estimator works in specific settings such as density estimation, distribution sensitivity for quantiles, and sensitivity analysis for Markov chain stopping time problems, which we illustrate with applications to statistical quality control, stochastic activity networks, and credit risk derivatives. Numerical experiments substantiate broad applicability and flexibility in dealing with discontinuities in sample performance.
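A one-line example shows the kind of discontinuity that likelihood-ratio-style estimators such as GLR are built to handle (a toy illustration of the underlying idea, not the GLR estimator of the paper): for the sensitivity of P(X <= c) with X ~ N(theta, 1), the pathwise derivative of the indicator is zero almost surely, yet weighting the indicator by the score (x - theta) gives an unbiased estimate.

```python
import numpy as np

def indicator_sensitivity(theta, c=0.0, n=200_000, seed=0):
    """Estimate d/dtheta E[1{X <= c}] for X ~ N(theta, 1).
    Differentiating the discontinuous indicator pathwise gives 0;
    multiplying by the score of N(theta, 1), namely (x - theta), works."""
    rng = np.random.default_rng(seed)
    x = rng.normal(theta, 1.0, n)
    return float(np.mean((x <= c) * (x - theta)))

est = indicator_sensitivity(0.0)  # true value is -phi(0) = -1/sqrt(2*pi) ~ -0.399
```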
An ϵ-Greedy Multiarmed Bandit Approach to Markov Decision Processes
We present REGA, a new adaptive-sampling-based algorithm for the control of finite-horizon Markov decision processes (MDPs) with very large state spaces and small action spaces. We apply a variant of the ϵ-greedy multiarmed bandit algorithm to each stage of the MDP in a recursive manner, thus computing an estimation of the “reward-to-go” value at each stage of the MDP. We provide a finite-time analysis of REGA. In particular, we provide a bound on the probability that the approximation error exceeds a given threshold, where the bound is given in terms of the number of samples collected at each stage of the MDP. We empirically compare REGA against another sampling-based algorithm called RASA by running simulations against the SysAdmin benchmark problem with 2^10 states. The results show that REGA and RASA achieved similar performance. Moreover, REGA and RASA empirically outperformed an implementation of the algorithm that uses the “original” ϵ-greedy algorithm that commonly appears in the literature.
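The ϵ-greedy rule applied at each stage is simple to state (a generic multiarmed-bandit sketch with running-average value estimates; the function names are illustrative, not REGA's): with probability ϵ pick a uniformly random action, otherwise pick the action with the best current estimate.

```python
import random

def epsilon_greedy(q, epsilon, rng):
    """Explore uniformly with probability epsilon, else exploit the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return max(range(len(q)), key=q.__getitem__)

def run_bandit(means, pulls=5000, epsilon=0.1, seed=0):
    """Play a Gaussian multiarmed bandit, tracking running-average reward
    estimates q and pull counts n for each arm; returns the pull counts."""
    rng = random.Random(seed)
    q = [0.0] * len(means)
    n = [0] * len(means)
    for _ in range(pulls):
        a = epsilon_greedy(q, epsilon, rng)
        r = rng.gauss(means[a], 1.0)     # noisy reward sample
        n[a] += 1
        q[a] += (r - q[a]) / n[a]        # incremental mean update
    return n

counts = run_bandit([0.0, 0.2, 1.0])     # arm 2 has the highest mean reward
```

With a fixed ϵ the rule keeps exploring forever; decaying ϵ over time, as in the variants discussed above, trades this off against exploitation.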