89 research outputs found

Randomized Search Methods for Solving Markov Decision Processes and Global Optimization

Author: Hu Jiaqiao
Publication date: 06/07/2006

Markov decision process (MDP) models provide a unified framework for modeling and describing sequential decision making problems that arise in engineering, economics, and computer science. However, when the underlying problem is modeled by MDPs, the size of the resulting MDP model typically grows exponentially with the size of the original problem, which makes practical solution of the MDP models intractable, especially for large problems. Moreover, for complex systems, it is often the case that some of the parameters of the MDP models cannot be obtained in a feasible way, and only simulation samples are available. In the first part of this thesis, we develop two sampling/simulation-based numerical algorithms to address the computational difficulties arising from these settings. The proposed algorithms have somewhat different emphases: one algorithm focuses on MDPs with large state spaces but relatively small action spaces, and emphasizes the efficient allocation of simulation samples to find good value function estimates, whereas the other algorithm targets problems with large action spaces but small state spaces, and invokes a population-based approach to avoid carrying out an optimization over the entire action space. We study the convergence properties of these algorithms and report on computational results to illustrate their performance. The second part of this thesis is devoted to the development of a general framework called Model Reference Adaptive Search (MRAS) for solving global optimization problems. The method iteratively updates a parameterized probability distribution on the solution space, so that the sequence of candidate solutions generated from this distribution will converge asymptotically to the global optimum. We provide a particular instantiation of the framework and establish its convergence properties in both continuous and discrete domains.
In addition, we explore the relationship between the recently proposed Cross-Entropy (CE) method and MRAS, and show that the model reference framework can also be used to describe the CE method and study its properties. Finally, we formally discuss the extension of the MRAS framework to stochastic optimization problems and carry out numerical experiments to investigate the performance of the method.
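The idea of iteratively updating a parameterized sampling distribution can be illustrated with a minimal sketch. The code below is not the thesis's MRAS algorithm; it is a simple cross-entropy-style search, assuming a one-dimensional Gaussian sampling distribution that is refit to the best-scoring candidates at each iteration. The function names, the test objective `two_peaks`, and all parameter values are hypothetical choices made for illustration.

```python
import math
import random
import statistics


def model_based_search(objective, mean=0.0, std=5.0, pop_size=200,
                       elite_frac=0.1, iterations=40, seed=0):
    """Maximize a 1-D objective by refitting a Gaussian to elite samples.

    Illustrative cross-entropy-style update, not the MRAS update rule.
    """
    rng = random.Random(seed)
    n_elite = max(2, int(pop_size * elite_frac))
    for _ in range(iterations):
        # Draw a population of candidates from the current distribution.
        candidates = [rng.gauss(mean, std) for _ in range(pop_size)]
        # Keep the candidates with the highest objective values.
        candidates.sort(key=objective, reverse=True)
        elite = candidates[:n_elite]
        # Refit the distribution parameters to the elite set.
        mean = statistics.fmean(elite)
        std = statistics.stdev(elite) + 1e-9  # small floor keeps std positive
    return mean


def two_peaks(x):
    """Hypothetical objective: global peak near x = 2, lower peak near x = -2."""
    return math.exp(-(x - 2.0) ** 2) + 0.5 * math.exp(-(x + 2.0) ** 2)


best = model_based_search(two_peaks)
print(f"estimated maximizer: {best:.3f}")
```

As the sampling distribution concentrates, the candidates cluster around the higher peak. A multivariate or discrete distribution family would follow the same sample, select, refit loop.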

Digital Repository at the University of Maryland

COMBINING GRADIENT-BASED OPTIMIZATION WITH STOCHASTIC SEARCH

Author: Enlu Zhou
Jiaqiao Hu
Publication date: 24/04/2020

We propose a stochastic search algorithm for solving non-differentiable optimization problems. At each iteration, the algorithm searches the solution space by generating a population of candidate solutions from a parameterized sampling distribution. The basic idea is to convert the original optimization problem into a differentiable problem in terms of the parameters of the sampling distribution, and then use a quasi-Newton-like method on the reformulated problem to find improved sampling distributions. The algorithm combines the strength of stochastic search, which explores the solution space with a population of candidate solutions, with the rapid convergence of gradient methods, which exploit local differentiable structure. We provide numerical examples to illustrate its performance.

CiteSeerX

Quantile Optimization via Multiple Timescale Local Search for Black-box Functions

Author: Fu Michael C.
Hu Jiaqiao
Song Meichen
Publication date: 15/08/2023

We consider quantile optimization of black-box functions that are estimated with noise. We propose two new iterative three-timescale local search algorithms. The first algorithm uses an appropriately modified finite-difference-based gradient estimator that requires 2d + 1 samples of the black-box function per iteration of the algorithm, where d is the number of decision variables (dimension of the input vector). For higher-dimensional problems, this algorithm may not be practical if the black-box function estimates are expensive. The second algorithm employs a simultaneous-perturbation-based gradient estimator that uses only three samples for each iteration regardless of problem dimension. Under appropriate conditions, we show the almost sure convergence of both algorithms. In addition, for the class of strongly convex functions, we further establish their (finite-time) convergence rate through a novel fixed-point argument. Simulation experiments indicate that the algorithms work well on a variety of test problems and compare well with recently proposed alternative methods.
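The difference in sampling cost between the two estimator families can be sketched with their textbook forms. The code below is not the paper's quantile gradient estimators. It assumes a plain deterministic function and shows only that central finite differences need two evaluations per coordinate (2d in total), while a simultaneous-perturbation estimate needs two evaluations regardless of dimension. The abstract's counts (2d + 1 and three) include one more sample per iteration than these textbook versions. All names here (`CountingFunction`, `sum_of_squares`, and the step size `c`) are hypothetical.

```python
import random


def central_difference_gradient(f, x, c=1e-3):
    """Central finite differences: two evaluations per coordinate."""
    grad = []
    for i in range(len(x)):
        plus = list(x)
        minus = list(x)
        plus[i] += c
        minus[i] -= c
        grad.append((f(plus) - f(minus)) / (2 * c))
    return grad


def simultaneous_perturbation_gradient(f, x, c=1e-3, rng=random):
    """Perturb every coordinate at once: two evaluations in total."""
    delta = [rng.choice((-1.0, 1.0)) for _ in x]  # random +/-1 directions
    plus = [xi + c * di for xi, di in zip(x, delta)]
    minus = [xi - c * di for xi, di in zip(x, delta)]
    diff = (f(plus) - f(minus)) / (2 * c)
    return [diff / di for di in delta]


class CountingFunction:
    """Wraps a function and counts how many times it is evaluated."""

    def __init__(self, f):
        self.f = f
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.f(x)


def sum_of_squares(x):
    return sum(xi * xi for xi in x)


x = [1.0, -2.0, 3.0]
fd_f = CountingFunction(sum_of_squares)
fd_grad = central_difference_gradient(fd_f, x)
sp_f = CountingFunction(sum_of_squares)
sp_grad = simultaneous_perturbation_gradient(sp_f, x, rng=random.Random(0))
print("finite-difference evaluations:", fd_f.calls)
print("simultaneous-perturbation evaluations:", sp_f.calls)
```

A single simultaneous-perturbation estimate is noisy, since each component picks up cross terms from the other coordinates. It is correct only on average over the random directions, which is the price paid for the dimension-free evaluation count.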

arXiv.org e-Print Archive

Controlled Optimal Design Program for the Logit Dose Response Model

Author: Hu Jiaqiao
Su Yi
Wong Weng Kee
Zhu Wei
Publication venue: 'Foundation for Open Access Statistics'
Publication date: 16/07/2010

The assessment of dose-response is an integral component of the drug development process. Parallel dose-response studies are customarily conducted in preclinical and phase 1 and 2 clinical trials for this purpose. Practical constraints on dose range, dose levels and dose proportions are intrinsic issues in the design of dose-response studies because of drug toxicity, efficacy, FDA regulations, protocol requirements, clinical trial logistics, and marketing issues. We provide a free on-line software package called Controlled Optimal Design 2.0 for generating controlled optimal designs that can incorporate prior information and multiple objectives, and meet multiple practical constraints at the same time. Researchers can either run the web-based design program or download its stand-alone version to construct the desired multiple-objective controlled Bayesian optimal designs. Because researchers often adopt ad hoc design schemes such as equal allocation rules without knowing how efficient such designs would be for the design problem, the program also evaluates the efficiency of user-supplied designs.

Directory of Open Access Journals

Journal of Statistical Software

A Model Reference Adaptive Search Algorithm for Global Optimization

Author: Fu Michael C.
Hu Jiaqiao
Marcus Steven I.
Publication date: 01/01/2005

Digital Repository at the University of Maryland

Variance Reduction for Generalized Likelihood Ratio Method By Conditional Monte Carlo and Randomized Quasi-Monte Carlo

Author: Fu Michael
Hu Jiaqiao
L'Ecuyer Pierre
Peng Yijie
Tuffin Bruno
Publication venue: Elsevier B.V. on behalf of KeAi Communications Co., Ltd.
Publication date: 01/01/2022

The generalized likelihood ratio (GLR) method is a recently introduced gradient estimation method for handling discontinuities for a wide scope of sample performances. We put the GLR methods from previous work into a single framework, simplify regularity conditions for justifying unbiasedness of GLR, and relax some of those conditions that are difficult to verify in practice. Moreover, we combine GLR with conditional Monte Carlo methods and randomized quasi-Monte Carlo methods to reduce the variance. Numerical experiments show that the variance reduction could be significant in various applications.

INRIA a CCSD electronic archive server

Generalized Likelihood Ratio Method for Stochastic Models with Uniform Random Numbers As Inputs

Author: Fu Michael
Hu Jiaqiao
L'Ecuyer Pierre
Peng Yijie
Tuffin Bruno
Publication venue: HAL CCSD
Publication date: 29/05/2020

We propose a new unbiased stochastic gradient estimator for a family of stochastic models with uniform random numbers as inputs. By extending the generalized likelihood ratio (GLR) method, the proposed estimator applies to discontinuous sample performances with structural parameters without requiring that the tails of the density of the input random variables go down to zero smoothly, an assumption in Peng et al. (2018) and Peng et al. (2020a) that precludes a direct formulation in terms of uniform random numbers as inputs. By overcoming this limitation, our new estimator greatly expands the applicability of the GLR method, which we demonstrate for several general classes of uniform input random numbers, including independent inverse transform random variates and dependent input random variables governed by an Archimedean copula. We show how the new derivative estimator works in specific settings such as density estimation, distribution sensitivity for quantiles, and sensitivity analysis for Markov chain stopping time problems, which we illustrate with applications to statistical quality control, stochastic activity networks, and credit risk derivatives. Numerical experiments substantiate broad applicability and flexibility in dealing with discontinuities in sample performance.

INRIA a CCSD electronic archive server

An Evolutionary Random Search Algorithm for Solving Markov Decision Processes

Author: Fu Michael C.
Hu Jiaqiao
Marcus Steven I.
Ramezani Vahid
Publication date: 01/01/2005

Digital Repository at the University of Maryland

An ϵ-Greedy Multiarmed Bandit Approach to Markov Decision Processes

Author: Isa Muqattash
Jiaqiao Hu
Publication venue: 'MDPI AG'
Publication date: 01/01/2023

We present REGA, a new adaptive-sampling-based algorithm for the control of finite-horizon Markov decision processes (MDPs) with very large state spaces and small action spaces. We apply a variant of the ϵ-greedy multiarmed bandit algorithm to each stage of the MDP in a recursive manner, thus computing an estimate of the “reward-to-go” value at each stage of the MDP. We provide a finite-time analysis of REGA. In particular, we provide a bound on the probability that the approximation error exceeds a given threshold, where the bound is given in terms of the number of samples collected at each stage of the MDP. We empirically compare REGA against another sampling-based algorithm called RASA by running simulations on the SysAdmin benchmark problem with 2^10 states. The results show that REGA and RASA achieved similar performance. Moreover, REGA and RASA empirically outperformed an implementation of the algorithm that uses the “original” ϵ-greedy algorithm that commonly appears in the literature.
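For readers unfamiliar with the building block mentioned above, the following is a minimal sketch of the standard ϵ-greedy multiarmed bandit rule, not REGA or the variant analyzed in the paper. It assumes Bernoulli rewards with fixed success probabilities; the function name `epsilon_greedy`, the arm probabilities, and the parameter values are hypothetical.

```python
import random


def epsilon_greedy(arm_means, pulls=5000, epsilon=0.1, seed=0):
    """Standard epsilon-greedy on Bernoulli arms (illustrative only).

    Returns the running mean reward estimate and pull count for each arm.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    estimates = [0.0] * k
    for _ in range(pulls):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(k)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(k), key=lambda a: estimates[a])
        # Simulate a Bernoulli reward for the chosen arm.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample mean.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


arm_means = [0.2, 0.5, 0.8]  # hypothetical success probabilities
estimates, counts = epsilon_greedy(arm_means)
best_arm = max(range(len(arm_means)), key=lambda a: estimates[a])
print("estimated best arm:", best_arm)
print("pull counts:", counts)
```

In a recursive, stage-wise scheme like the one the abstract describes, each "arm" would correspond to an action at a given state and stage, and the reward would itself be a sampled estimate of the reward-to-go; that nesting is omitted here.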

Multidisciplinary Digital Publishing Institute