The Price of Information in Combinatorial Optimization
Consider a network design application where we wish to lay down a
minimum-cost spanning tree in a given graph; however, we only have stochastic
information about the edge costs. To learn the precise cost of any edge, we
have to conduct a study that incurs a price. Our goal is to find a spanning
tree while minimizing the disutility, which is the sum of the tree cost and the
total price that we spend on the studies. In a different application, each edge
gives a stochastic reward value. Our goal is to find a spanning tree while
maximizing the utility, which is the tree reward minus the prices that we pay.
Situations such as the above two often arise in practice where we wish to
find a good solution to an optimization problem, but we start with only some
partial knowledge about the parameters of the problem. The missing information
can be found only after paying a probing price, which we call the price of
information. What strategy should we adopt to optimize our expected
utility/disutility?
A classical example of the above setting is Weitzman's "Pandora's box"
problem where we are given probability distributions on values of
independent random variables. The goal is to choose a single variable with a
large value, but we can find the actual outcomes only after paying a price. Our
work is a generalization of this model to other combinatorial optimization
problems such as matching, set cover, facility location, and prize-collecting
Steiner tree. We give a technique that reduces such problems to their non-price
counterparts, and use it to design exact/approximation algorithms to optimize
our utility/disutility. Our techniques extend to situations where there are
additional constraints on what parameters can be probed or when we can
simultaneously probe a subset of the parameters. Comment: SODA 201
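As an illustration of the Pandora's box setting mentioned above, the following is a minimal sketch of Weitzman's classical index rule for the single-variable case: each box gets a reservation value sigma solving E[(X - sigma)^+] = price, boxes are opened in decreasing order of sigma, and probing stops once the best observed value reaches the next reservation value. The function names, the discrete-distribution encoding, and the default outside option of 0 are assumptions made for this example; it is not the reduction technique of the paper.

```python
from typing import List, Tuple

Dist = List[Tuple[float, float]]  # hypothetical encoding: (value, probability) pairs


def expected_excess(dist: Dist, sigma: float) -> float:
    """E[(X - sigma)^+] for a discrete distribution."""
    return sum(p * max(v - sigma, 0.0) for v, p in dist)


def reservation_value(dist: Dist, price: float, tol: float = 1e-9) -> float:
    """Solve E[(X - sigma)^+] = price by bisection (the excess is decreasing in sigma)."""
    lo = min(v for v, _ in dist) - price  # excess at lo is at least price
    hi = max(v for v, _ in dist)          # excess at hi is 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_excess(dist, mid) > price:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def pandora_policy(boxes: List[Tuple[Dist, float]], outcomes: List[float]):
    """Run the index rule on realized outcomes; returns (utility, opened box indices)."""
    sigmas = [reservation_value(dist, price) for dist, price in boxes]
    order = sorted(range(len(boxes)), key=lambda i: sigmas[i], reverse=True)
    best, paid, opened = 0.0, 0.0, []     # assumed outside option of value 0
    for i in order:
        if best >= sigmas[i]:             # no remaining box is worth its price
            break
        paid += boxes[i][1]
        opened.append(i)
        best = max(best, outcomes[i])
    return best - paid, opened


boxes = [([(0.0, 0.5), (10.0, 0.5)], 1.0), ([(6.0, 1.0)], 1.0)]
print(pandora_policy(boxes, [10.0, 6.0]))
print(pandora_policy(boxes, [0.0, 6.0]))
```

In this toy instance the first box has reservation value 8 and the second has 5, so the risky box is opened first and the safe box is opened only if the first one turns out to be worth 0.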
Approximation Algorithms for Stochastic k-TSP
This paper studies the stochastic variant of the classical k-TSP problem where rewards at the vertices are independent random variables which are instantiated upon the tour's visit. The objective is to minimize the expected length of a tour that collects reward at least k. The solution is a policy describing the tour which may (adaptive) or may not (non-adaptive) depend on the observed rewards.
Our work presents an adaptive O(log k)-approximation algorithm for Stochastic k-TSP, along with a non-adaptive O(log^2 k)-approximation algorithm which also upper bounds the adaptivity gap by O(log^2 k). We also show that the adaptivity gap of Stochastic k-TSP is at least e, even in the special case of stochastic knapsack cover.
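To make the notion of a non-adaptive policy concrete, here is a small sketch for the stochastic knapsack cover special case, under the assumption that each item has a fixed cost and an independent discrete random reward, and that a non-adaptive policy is a fixed order that is followed until the collected reward reaches k. The function name and the instance are hypothetical; the code only evaluates the expected cost of a given order by exhaustive enumeration and is not one of the approximation algorithms from the paper.

```python
from itertools import product


def expected_cost_nonadaptive(items, order, k):
    """Expected cost of probing items in a fixed order until reward >= k.

    items[i] = (cost, [(reward, probability), ...]) is an assumed encoding.
    If the whole order is exhausted without reaching k, all costs are paid.
    """
    dists = [items[i][1] for i in order]
    total = 0.0
    for combo in product(*dists):          # enumerate every joint outcome
        prob = 1.0
        for _, p in combo:
            prob *= p
        collected, cost = 0.0, 0.0
        for i, (reward, _) in zip(order, combo):
            if collected >= k:             # target reached, stop paying
                break
            cost += items[i][0]
            collected += reward
        total += prob * cost
    return total


items = [
    (1.0, [(2.0, 0.5), (0.0, 0.5)]),       # cheap but unreliable
    (3.0, [(2.0, 1.0)]),                   # expensive but certain
]
print(expected_cost_nonadaptive(items, [0, 1], k=2.0))
print(expected_cost_nonadaptive(items, [1, 0], k=2.0))
```

Trying the cheap item first costs 1 + 0.5 * 3 = 2.5 in expectation, versus 3 for the reverse order. An adaptive policy may additionally choose its next item based on the rewards seen so far, which is where the adaptivity gap comes from.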
Efficient Approximation Schemes for Stochastic Probing and Prophet Problems
Our main contribution is a general framework to design efficient polynomial
time approximation schemes (EPTAS) for fundamental classes of stochastic
combinatorial optimization problems. Given an error parameter ε > 0,
such algorithmic schemes attain a (1 + ε)-approximation in only
t(ε) · poly(n) time, where t(·) is some function that depends
only on ε. Technically speaking, our approach relies on presenting
tailor-made reductions to a newly-introduced multi-dimensional extension of the
Santa Claus problem [Bansal-Sviridenko, STOC'06]. Even though the
single-dimensional problem is already known to be APX-Hard, we prove that an
EPTAS can be designed under certain structural assumptions, which hold for our
applications.
To demonstrate the versatility of our framework, we obtain an EPTAS for the
adaptive ProbeMax problem as well as for its non-adaptive counterpart; in both
cases, state-of-the-art approximability results have been inefficient
polynomial time approximation schemes (PTAS) [Chen et al., NIPS'16; Fu et al.,
ICALP'18]. Turning our attention to selection-stopping settings, we further
derive an EPTAS for the Free-Order Prophets problem [Agrawal et al., EC'20] and
for its cost-driven generalization, Pandora's Box with Commitment [Fu et al.,
ICALP'18]. These results improve on known PTASes for their adaptive variants,
and constitute the first non-trivial approximations in the non-adaptive
setting. Comment: 33 page
Dynamic, data-driven decision-making in revenue management
Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2018. Cataloged from PDF version of thesis. Includes bibliographical references (pages 233-241). Motivated by applications in Revenue Management (RM), this thesis studies various problems in sequential decision-making and demand learning. In the first module, we consider a personalized RM setting, where items with limited inventories are recommended to heterogeneous customers sequentially visiting an e-commerce platform. We take the perspective of worst-case competitive ratio analysis, and aim to develop algorithms whose performance guarantees do not depend on the customer arrival process. We provide the first solution to this problem when there are both multiple items and multiple prices at which they could be sold, framing it as a general online resource allocation problem and developing a system of forecast-independent bid prices (Chapter 2). Second, we study a related assortment planning problem faced by Walmart Online Grocery, where before checkout, customers are recommended "add-on" items that are complementary to their current shopping cart (Chapter 3). Third, we derive inventory-dependent price-skimming policies for the single-leg RM problem, which extends existing competitive ratio results to non-independent demand (Chapter 4). In this module, we test our algorithms using a publicly-available data set from a major hotel chain. In the second module, we study bundling, which is the practice of selling different items together, and show how to learn and price using bundles. First, we introduce bundling as a new, alternate method for learning the price elasticities of items, which does not require any changing of prices; we validate our method on data from a large online retailer (Chapter 5).
Second, we show how to sell bundles of goods profitably even when the goods have high production costs, and derive both distribution-dependent and distribution-free guarantees on the profitability (Chapter 6). In the final module, we study the Markovian multi-armed bandit problem under an undiscounted finite time horizon (Chapter 7). We improve existing approximation algorithms using LP rounding and random sampling techniques, which result in a (1/2 - eps)-approximation for the correlated stochastic knapsack problem that is tight relative to the LP. In this work, we introduce a framework for designing self-sampling algorithms, which is also used in our chronologically-later-to-appear work on add-on recommendation and single-leg RM. by Will (Wei) Ma. Ph. D
On the adaptivity gap of stochastic orienteering
The input to the stochastic orienteering problem [14] consists of a budget B and a metric (V,d) where each vertex v ∈ V has a job with a deterministic reward and a random processing time (drawn from a known distribution). The processing times are independent across vertices. The goal is to obtain a non-anticipatory policy (originating from a given root vertex) to run jobs at different vertices, that maximizes expected reward, subject to the total distance traveled plus processing times being at most B. An adaptive policy is one that can choose the next vertex to visit based on observed random instantiations, whereas a non-adaptive policy is just given by a fixed ordering of vertices. The adaptivity gap is the worst-case ratio of the expected rewards of the optimal adaptive and non-adaptive policies. We prove an Ω((log log B)^(1/2)) lower bound on the adaptivity gap of stochastic orienteering. This provides a negative answer to the O(1)-adaptivity gap conjectured in [14] and comes close to the O(log log B) upper bound proved there. This result holds even on a line metric. We also show an O(log log B) upper bound on the adaptivity gap for the correlated stochastic orienteering problem, where the reward of each job is random and possibly correlated to its processing time. Using this, we obtain an improved quasi-polynomial time min{log n, log B} · Õ(log^2 log B)-approximation algorithm for correlated stochastic orienteering.
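For reference, the adaptivity gap described in words above can be written as a formula. The symbols ADAPT(I) and NONADAPT(I), denoting the expected reward of an optimal adaptive and an optimal non-adaptive policy on instance I, are notation assumed here for illustration and do not come from the abstract.

```latex
% Adaptivity gap for a maximization problem: worst case over instances I
\[
  \mathrm{gap} \;=\; \sup_{I} \frac{\mathrm{ADAPT}(I)}{\mathrm{NONADAPT}(I)} \;\ge\; 1 .
\]
% Bounds for stochastic orienteering stated in the abstract:
% the lower bound is from this work, the upper bound is the one from [14]
\[
  \Omega\!\left(\sqrt{\log\log B}\right) \;\le\; \mathrm{gap} \;\le\; O(\log\log B) .
\]
```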