33 research outputs found

Semidiscrete optimal transport with unknown costs

Author: Ryzhov Ilya O.
Zhu Yinchu
Publication date: 14/11/2023

Semidiscrete optimal transport is a challenging generalization of the classical transportation problem in linear programming. The goal is to design a joint distribution for two random variables (one continuous, one discrete) with fixed marginals, in a way that minimizes expected cost. We formulate a novel variant of this problem in which the cost functions are unknown, but can be learned through noisy observations; however, only one function can be sampled at a time. We develop a semi-myopic algorithm that couples online learning with stochastic approximation, and prove that it achieves optimal convergence rates, despite the non-smoothness of the stochastic gradient and the lack of strong concavity in the objective function.

arXiv.org e-Print Archive

A New Optimal Stepsize For Approximate Dynamic Programming

Author: Frazier Peter I.
Powell Warren B.
Ryzhov Ilya O.
Publication date: 13/07/2014

Approximate dynamic programming (ADP) has proven itself in a wide range of applications spanning large-scale transportation problems, health care, revenue management, and energy systems. The design of effective ADP algorithms has many dimensions, but one crucial factor is the stepsize rule used to update a value function approximation. Many operations research applications are computationally intensive, and it is important to obtain good results quickly. Furthermore, the most popular stepsize formulas use tunable parameters and can produce very poor results if tuned improperly. We derive a new stepsize rule that optimizes the prediction error in order to improve the short-term performance of an ADP algorithm. With only one, relatively insensitive tunable parameter, the new rule adapts to the level of noise in the problem and produces faster convergence in numerical experiments. Comment: Matlab files are included with the paper source.

arXiv.org e-Print Archive

Princeton University Open Access Repository

Crossref

Optimal Information Blending with Measurements in the L2 Sphere

Author: Boris Defourny
Ilya O. Ryzhov
Warren B. Powell
Publication date: 03/11/2015

CiteSeerX

The Knowledge Gradient Algorithm for a General Class of Online Learning Problems

Author: DeGroot M. H.
Ginebra J.
Gittins J. C.
Ilya O. Ryzhov
Kaelbling L. P.
Pandey S.
Peter I. Frazier
Steele J. M.
Sutton R. S.
Tesauro G.
Tewari A.
Warren B. Powell
Whittle P.
Publication venue: 'Institute for Operations Research and the Management Sciences (INFORMS)'

Crossref

Optimal learning with non-Gaussian rewards

Author: Ilya O. Ryzhov
Zi Ding
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013

We propose a novel theoretical characterization of the optimal “Gittins index” policy in multi-armed bandit problems with non-Gaussian, infinitely divisible reward distributions. We first construct a continuous-time, conditional Lévy process which probabilistically interpolates the sequence of discrete-time rewards. When the rewards are Gaussian, this approach enables an easy connection to the convenient time-change properties of Brownian motion. Although no such device is available in general for the non-Gaussian case, we use optimal stopping theory to characterize the value of the optimal policy as the solution to a free-boundary partial integro-differential equation (PIDE). We provide the free-boundary PIDE in explicit form under the specific settings of exponential and Poisson rewards. We also prove continuity and monotonicity properties of the Gittins index in these two problems, and discuss how the PIDE can be solved numerically to find the optimal index value of a given belief state.

CiteSeerX

Crossref

Approximate Dynamic Programming With Correlated Bayesian Beliefs

Author: Ilya O. Ryzhov
Warren B. Powell
Publication date: 01/01/2010

In approximate dynamic programming, we can represent our uncertainty about the value function using a Bayesian model with correlated beliefs. Thus, a decision made at a single state can provide us with information about many states, making each individual observation much more powerful. We propose a new exploration strategy based on the knowledge gradient concept from the optimal learning literature, which is currently the only method capable of handling correlated belief structures. The proposed method outperforms several other heuristics in numerical experiments conducted on two broad problem classes.

CiteSeerX

Crossref