33 research outputs found
Semidiscrete optimal transport with unknown costs
Semidiscrete optimal transport is a challenging generalization of the
classical transportation problem in linear programming. The goal is to design a
joint distribution for two random variables (one continuous, one discrete) with
fixed marginals, in a way that minimizes expected cost. We formulate a novel
variant of this problem in which the cost functions are unknown, but can be
learned through noisy observations; however, only one function can be sampled
at a time. We develop a semi-myopic algorithm that couples online learning with
stochastic approximation, and prove that it achieves optimal convergence rates,
despite the non-smoothness of the stochastic gradient and the lack of strong
concavity in the objective function.
A New Optimal Stepsize For Approximate Dynamic Programming
Approximate dynamic programming (ADP) has proven itself in a wide range of
applications spanning large-scale transportation problems, health care, revenue
management, and energy systems. The design of effective ADP algorithms has many
dimensions, but one crucial factor is the stepsize rule used to update a value
function approximation. Many operations research applications are
computationally intensive, and it is important to obtain good results quickly.
Furthermore, the most popular stepsize formulas use tunable parameters and can
produce very poor results if tuned improperly. We derive a new stepsize rule
that optimizes the prediction error in order to improve the short-term
performance of an ADP algorithm. With only one, relatively insensitive tunable
parameter, the new rule adapts to the level of noise in the problem and
produces faster convergence in numerical experiments. Comment: Matlab files are included with the paper source.
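To make the role of a stepsize rule concrete, here is a minimal Python sketch of a smoothing update driven by a generalized harmonic stepsize. This is a common textbook formula with one tunable parameter, not the new rule derived in the paper. The function names and the parameter `a` are illustrative assumptions.

```python
def harmonic_stepsize(n, a=10.0):
    """Generalized harmonic stepsize a / (a + n - 1).

    `a` is the tunable parameter (an illustrative choice, not from the paper):
    larger values keep the stepsize large for longer.
    """
    return a / (a + n - 1)


def smooth(observations, a=10.0):
    """Apply v <- (1 - alpha_n) * v + alpha_n * obs to each observation in turn."""
    v = 0.0  # overwritten at n = 1, because the first stepsize is always 1
    for n, obs in enumerate(observations, start=1):
        alpha = harmonic_stepsize(n, a)
        v = (1 - alpha) * v + alpha * obs
    return v
```

With `a = 1` the stepsize reduces to `1/n` and the estimate is the plain sample mean. Larger `a` puts more weight on recent observations, which is the kind of tuning sensitivity the abstract refers to.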
Optimal Information Blending with Measurements in the L2 Sphere
Optimal learning with non-Gaussian rewards
We propose a novel theoretical characterization of the optimal “Gittins index” policy in multi-armed bandit problems with non-Gaussian, infinitely divisible reward distributions. We first construct a continuous-time, conditional Lévy process which probabilistically interpolates the sequence of discrete-time rewards. When the rewards are Gaussian, this approach enables an easy connection to the convenient time-change properties of Brownian motion. Although no such device is available in general for the non-Gaussian case, we use optimal stopping theory to characterize the value of the optimal policy as the solution to a free-boundary partial integro-differential equation (PIDE). We provide the free-boundary PIDE in explicit form under the specific settings of exponential and Poisson rewards. We also prove continuity and monotonicity properties of the Gittins index in these two problems, and discuss how the PIDE can be solved numerically to find the optimal index value of a given belief state.
Approximate Dynamic Programming With Correlated Bayesian Beliefs
In approximate dynamic programming, we can represent our uncertainty about the value function using a Bayesian model with correlated beliefs. Thus, a decision made at a single state can provide us with information about many states, making each individual observation much more powerful. We propose a new exploration strategy based on the knowledge gradient concept from the optimal learning literature, which is currently the only method capable of handling correlated belief structures. The proposed method outperforms several other heuristics in numerical experiments conducted on two broad problem classes.
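The claim that one observation can inform beliefs about many states can be illustrated with the standard conjugate update for a multivariate normal belief. The Python sketch below assumes Gaussian observation noise with known variance. It shows only the belief update, not the knowledge gradient policy proposed in the paper, and the function and argument names are hypothetical.

```python
def update_correlated_beliefs(mu, sigma, x, y, noise_var):
    """Update a multivariate normal belief after observing y at state x.

    mu        : list of prior means, one per state
    sigma     : symmetric prior covariance matrix (list of lists)
    x         : index of the state that was observed
    y         : noisy observation of the value at state x
    noise_var : known variance of the observation noise (an assumption)
    """
    n = len(mu)
    denom = noise_var + sigma[x][x]
    col = [sigma[i][x] for i in range(n)]  # covariance of every state with x
    # Every state correlated with x has its mean shifted by the observation.
    new_mu = [mu[i] + (y - mu[x]) * col[i] / denom for i in range(n)]
    # Uncertainty also shrinks for every state correlated with x.
    new_sigma = [
        [sigma[i][j] - col[i] * col[j] / denom for j in range(n)]
        for i in range(n)
    ]
    return new_mu, new_sigma
```

For example, with prior means `[0, 0]`, covariance `[[1, 0.5], [0.5, 1]]`, and noise variance 1, observing `y = 2` at state 0 moves the mean of state 1 to 0.5 even though state 1 was never visited.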