Correlated Stochastic Knapsack with a Submodular Objective
We study the correlated stochastic knapsack problem with a submodular target function, with optional additional constraints. We utilize the multilinear extension of the submodular function and bundle it with an adaptation of the relaxed linear constraints from Ma [Mathematics of Operations Research, Volume 43(3), 2018] for the correlated stochastic knapsack problem. The relaxation is then solved by the stochastic continuous greedy algorithm and rounded by a novel method to fit the contention resolution scheme (Feldman et al. [FOCS 2011]). We obtain a pseudo-polynomial time (1 - e^{-1/2})/2 ≈ 0.1967 approximation algorithm with or without those additional constraints, eliminating the need for a key assumption and improving on the (1 - e^{-1/4})/2 ≈ 0.1106 approximation by Fukunaga et al. [AAAI 2019].
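As a point of reference, the multilinear extension F of a submodular function f, the object that continuous greedy algorithms optimize, can be estimated by simple Monte Carlo sampling: F(x) is the expected value of f on a random set that contains each element i independently with probability x_i. A minimal sketch (the coverage function and sampling parameters below are illustrative, not from the paper):

```python
import random

def multilinear_extension(f, x, samples=2000, rng=None):
    """Monte Carlo estimate of the multilinear extension F(x) = E[f(S)],
    where S contains each element i independently with probability x[i]."""
    rng = rng or random.Random(0)
    n = len(x)
    total = 0.0
    for _ in range(samples):
        s = {i for i in range(n) if rng.random() < x[i]}
        total += f(s)
    return total / samples

# Toy coverage function (coverage functions are submodular):
# f(S) = size of the union of the ground sets indexed by S.
universe_sets = [{0, 1}, {1, 2}, {2, 3}]

def coverage(s):
    covered = set()
    for i in s:
        covered |= universe_sets[i]
    return len(covered)

est = multilinear_extension(coverage, [0.5, 0.5, 0.5])
```

For this toy instance the exact value is 2.5 (each universe element is covered with probability 0.5 or 0.75), so the estimate concentrates there as the sample count grows.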
Algorithms and Adaptivity Gaps for Stochastic k-TSP
Given a metric (V, d) and a root vertex r in V, the classic
\textsf{k-TSP} problem is to find a tour originating at the root
of minimum length that visits at least k nodes in V. In this work,
motivated by applications where the input to an optimization problem is
uncertain, we study two stochastic versions of \textsf{k-TSP}.
In Stoch-Reward k-TSP, originally defined by Ene-Nagarajan-Saket [ENS17],
each vertex v in the given metric contains a stochastic reward R_v.
The goal is to adaptively find a tour of minimum expected length that collects
at least reward k; here "adaptively" means our next decision may depend on
previous outcomes. Ene et al. give an O(log k)-approximation adaptive
algorithm for this problem, and left open if there is an O(1)-approximation
algorithm. We fully resolve their open question and even give an
O(1)-approximation \emph{non-adaptive} algorithm for this problem.
We also introduce and obtain similar results for the Stoch-Cost k-TSP
problem. In this problem each vertex v has a stochastic cost C_v, and the
goal is to visit and select at least k vertices to minimize the expected
\emph{sum} of tour length and cost of selected vertices. This problem
generalizes the Price of Information framework [Singla18] from deterministic
probing costs to metric probing costs.
Our techniques are based on two crucial ideas: "repetitions" and "critical
scaling". We show using Freedman's and Jogdeo-Samuels' inequalities that for
our problems, if we truncate the random variables at an ideal threshold and
repeat, then their expected values form a good surrogate. Unfortunately, this
ideal threshold is adaptive as it depends on how far we are from achieving our
target k, so we truncate at various different scales and identify a
"critical" scale.
Comment: ITCS 2020
Dynamic, data-driven decision-making in revenue management
Thesis: Ph.D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2018. Cataloged from PDF version of thesis. Includes bibliographical references (pages 233-241).
Motivated by applications in Revenue Management (RM), this thesis studies various problems in sequential decision-making and demand learning. In the first module, we consider a personalized RM setting, where items with limited inventories are recommended to heterogeneous customers sequentially visiting an e-commerce platform. We take the perspective of worst-case competitive ratio analysis, and aim to develop algorithms whose performance guarantees do not depend on the customer arrival process. We provide the first solution to this problem when there are both multiple items and multiple prices at which they could be sold, framing it as a general online resource allocation problem and developing a system of forecast-independent bid prices (Chapter 2). Second, we study a related assortment planning problem faced by Walmart Online Grocery, where before checkout, customers are recommended "add-on" items that are complementary to their current shopping cart (Chapter 3). Third, we derive inventory-dependent price-skimming policies for the single-leg RM problem, extending existing competitive ratio results to non-independent demand (Chapter 4). In this module, we test our algorithms using a publicly-available data set from a major hotel chain. In the second module, we study bundling, which is the practice of selling different items together, and show how to learn and price using bundles. First, we introduce bundling as a new, alternate method for learning the price elasticities of items, which does not require any changing of prices; we validate our method on data from a large online retailer (Chapter 5).
Second, we show how to sell bundles of goods profitably even when the goods have high production costs, and derive both distribution-dependent and distribution-free guarantees on the profitability (Chapter 6). In the final module, we study the Markovian multi-armed bandit problem under an undiscounted finite time horizon (Chapter 7). We improve existing approximation algorithms using LP rounding and random sampling techniques, which result in a (1/2 - ε)-approximation for the correlated stochastic knapsack problem that is tight relative to the LP. In this work, we introduce a framework for designing self-sampling algorithms, which is also used in our subsequent work on add-on recommendation and single-leg RM.
by Will (Wei) Ma. Ph.D.
Efficient Approximation Schemes for Stochastic Probing and Prophet Problems
Our main contribution is a general framework to design efficient polynomial
time approximation schemes (EPTAS) for fundamental classes of stochastic
combinatorial optimization problems. Given an error parameter ε > 0,
such algorithmic schemes attain a (1 - ε)-approximation in only
f(ε) · poly(n) time, where f is some function that depends
only on ε. Technically speaking, our approach relies on presenting
tailor-made reductions to a newly-introduced multi-dimensional extension of the
Santa Claus problem [Bansal-Sviridenko, STOC'06]. Even though the
single-dimensional problem is already known to be APX-Hard, we prove that an
EPTAS can be designed under certain structural assumptions, which hold for our
applications.
To demonstrate the versatility of our framework, we obtain an EPTAS for the
adaptive ProbeMax problem as well as for its non-adaptive counterpart; in both
cases, state-of-the-art approximability results have been inefficient
polynomial time approximation schemes (PTAS) [Chen et al., NIPS'16; Fu et al.,
ICALP'18]. Turning our attention to selection-stopping settings, we further
derive an EPTAS for the Free-Order Prophets problem [Agrawal et al., EC'20] and
for its cost-driven generalization, Pandora's Box with Commitment [Fu et al.,
ICALP'18]. These results improve on known PTASes for their adaptive variants,
and constitute the first non-trivial approximations in the non-adaptive
setting.
Comment: 33 pages
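For concreteness, in the non-adaptive ProbeMax problem one commits up front to k of n independent distributions and collects the maximum of their realizations; the objective E[max] can be estimated by sampling and optimized heuristically. A toy sketch (the instance and the greedy heuristic are illustrative only; the paper's EPTAS is a far stronger guarantee):

```python
import random

def expected_max(dists, subset, samples=3000, rng=None):
    """Monte Carlo estimate of E[max_{i in subset} X_i], the non-adaptive
    ProbeMax objective, where X_i is drawn from dists[i]."""
    rng = rng or random.Random(0)
    if not subset:
        return 0.0
    total = 0.0
    for _ in range(samples):
        total += max(dists[i](rng) for i in subset)
    return total / samples

def greedy_probemax(dists, k):
    """Greedily add the distribution with the best estimated marginal
    contribution to E[max] until k distributions are chosen."""
    chosen = []
    for _ in range(k):
        best = max((i for i in range(len(dists)) if i not in chosen),
                   key=lambda i: expected_max(dists, chosen + [i]))
        chosen.append(best)
    return chosen

# Hypothetical instance: one safe reward, one risky reward, one small reward.
dists = [lambda r: 1.0,                                # deterministic 1
         lambda r: 5.0 if r.random() < 0.2 else 0.0,   # mean 1, upside 5
         lambda r: 0.5]                                # deterministic 0.5
picked = greedy_probemax(dists, k=2)
```

On this instance the safe and risky rewards complement each other (E[max] = 1.8 together, versus at most 1.4 for any other pair), so the greedy sketch selects them both.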
Essays in Problems of Optimal Sequential Decisions
In this dissertation, we study several Markovian problems of optimal sequential decisions by focusing on research questions that are driven by probabilistic and operations-management considerations. Our probabilistic interest is in understanding the distribution of the total reward that one obtains when implementing a policy that maximizes its expected value. In this respect, we study the sequential selection of unimodal and alternating subsequences from a random sample, and we prove accurate bounds for the expected values and exact asymptotics. In the unimodal problem, we also note that the variance of the optimal total reward can be bounded in terms of its expected value. This fact then motivates a much broader analysis that characterizes a class of Markov decision problems that share this important property. In the alternating subsequence problem, we also outline how one could prove a Central Limit Theorem for the number of alternating selections in a finite random sample, as the size of the sample grows to infinity. Our operations-management interest is in studying the interaction of on-the-job learning and learning-by-doing in a workforce-related problem. Specifically, we study the sequential hiring and retention of heterogeneous workers who learn over time. We model the hiring and retention problem as a Bayesian infinite-armed bandit, and we characterize the optimal policy in detail. Through an extensive set of numerical examples, we gain insights into the managerial nature of the problem, and we demonstrate that the value of active monitoring and screening of employees can be substantial.