
    The Price of Information in Combinatorial Optimization

    Consider a network design application where we wish to lay down a minimum-cost spanning tree in a given graph; however, we only have stochastic information about the edge costs. To learn the precise cost of any edge, we have to conduct a study that incurs a price. Our goal is to find a spanning tree while minimizing the disutility, which is the sum of the tree cost and the total price that we spend on the studies. In a different application, each edge gives a stochastic reward value. Our goal is to find a spanning tree while maximizing the utility, which is the tree reward minus the prices that we pay. Situations such as the above two often arise in practice where we wish to find a good solution to an optimization problem, but we start with only some partial knowledge about the parameters of the problem. The missing information can be found only after paying a probing price, which we call the price of information. What strategy should we adopt to optimize our expected utility/disutility? A classical example of the above setting is Weitzman's "Pandora's box" problem where we are given probability distributions on values of n independent random variables. The goal is to choose a single variable with a large value, but we can find the actual outcomes only after paying a price. Our work is a generalization of this model to other combinatorial optimization problems such as matching, set cover, facility location, and prize-collecting Steiner tree. We give a technique that reduces such problems to their non-price counterparts, and use it to design exact/approximation algorithms to optimize our utility/disutility. Our techniques extend to situations where there are additional constraints on what parameters can be probed or when we can simultaneously probe a subset of the parameters. Comment: SODA 201
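The classical single-item case mentioned in this abstract can be made concrete with a small sketch of Weitzman's index rule: each box gets a reservation value sigma solving E[max(X - sigma, 0)] = price, boxes are opened in decreasing order of sigma, and probing stops once the best value seen is at least the next reservation value. This illustrates only the classical Pandora's box rule, not the paper's reduction technique. The names `Dist`, `Box`, `expected_excess`, `reservation_value` and `run_policy` are hypothetical, and the sketch assumes discrete distributions with nonnegative values, probabilities summing to 1, and an outside option worth 0.

```python
from typing import List, Tuple

# Hypothetical representation (an assumption of this sketch): a discrete
# distribution is a list of (value, probability) pairs, and a box is a
# (price, distribution) pair.
Dist = List[Tuple[float, float]]
Box = Tuple[float, Dist]


def expected_excess(dist: Dist, sigma: float) -> float:
    """Return E[max(X - sigma, 0)] for a discrete distribution."""
    return sum(p * max(v - sigma, 0.0) for v, p in dist)


def reservation_value(price: float, dist: Dist, tol: float = 1e-9) -> float:
    """Solve E[max(X - sigma, 0)] = price for sigma by bisection."""
    hi = max(v for v, _ in dist)                # expected excess is 0 here
    lo = min(v for v, _ in dist) - price - 1.0  # expected excess exceeds price here
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_excess(dist, mid) > price:
            lo = mid   # sigma is too small, move right
        else:
            hi = mid   # sigma is large enough, move left
    return (lo + hi) / 2.0


def run_policy(boxes: List[Box], realized: List[float]) -> float:
    """Run the index rule on one realization; return value kept minus prices paid."""
    sigmas = [reservation_value(price, dist) for price, dist in boxes]
    order = sorted(range(len(boxes)), key=lambda i: sigmas[i], reverse=True)
    best, paid = 0.0, 0.0  # outside option worth 0 (assumption)
    for i in order:
        if best >= sigmas[i]:
            break          # no remaining box is worth its price
        paid += boxes[i][0]
        best = max(best, realized[i])
    return best - paid
```

For instance, with one box worth 0 or 10 with equal probability and another worth 5 for sure, both priced at 1, the rule opens the risky box first and opens the safe box only if the risky one turns out empty.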

    Approximation Algorithms for Stochastic k-TSP

    This paper studies the stochastic variant of the classical k-TSP problem where rewards at the vertices are independent random variables which are instantiated upon the tour's visit. The objective is to minimize the expected length of a tour that collects reward at least k. The solution is a policy describing the tour which may (adaptive) or may not (non-adaptive) depend on the observed rewards. Our work presents an adaptive O(log k)-approximation algorithm for Stochastic k-TSP, along with a non-adaptive O(log^2 k)-approximation algorithm which also upper bounds the adaptivity gap by O(log^2 k). We also show that the adaptivity gap of Stochastic k-TSP is at least e, even in the special case of stochastic knapsack cover.

    Efficient Approximation Schemes for Stochastic Probing and Prophet Problems

    Our main contribution is a general framework to design efficient polynomial time approximation schemes (EPTAS) for fundamental classes of stochastic combinatorial optimization problems. Given an error parameter $\epsilon > 0$, such algorithmic schemes attain a $(1+\epsilon)$-approximation in only $t(\epsilon) \cdot \mathrm{poly}(n)$ time, where $t(\cdot)$ is some function that depends only on $\epsilon$. Technically speaking, our approach relies on presenting tailor-made reductions to a newly-introduced multi-dimensional extension of the Santa Claus problem [Bansal-Sviridenko, STOC'06]. Even though the single-dimensional problem is already known to be APX-Hard, we prove that an EPTAS can be designed under certain structural assumptions, which hold for our applications. To demonstrate the versatility of our framework, we obtain an EPTAS for the adaptive ProbeMax problem as well as for its non-adaptive counterpart; in both cases, state-of-the-art approximability results have been inefficient polynomial time approximation schemes (PTAS) [Chen et al., NIPS'16; Fu et al., ICALP'18]. Turning our attention to selection-stopping settings, we further derive an EPTAS for the Free-Order Prophets problem [Agrawal et al., EC'20] and for its cost-driven generalization, Pandora's Box with Commitment [Fu et al., ICALP'18]. These results improve on known PTASes for their adaptive variants, and constitute the first non-trivial approximations in the non-adaptive setting. Comment: 33 page

    Dynamic, data-driven decision-making in revenue management

    Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2018. Cataloged from PDF version of thesis. Includes bibliographical references (pages 233-241). Motivated by applications in Revenue Management (RM), this thesis studies various problems in sequential decision-making and demand learning. In the first module, we consider a personalized RM setting, where items with limited inventories are recommended to heterogeneous customers sequentially visiting an e-commerce platform. We take the perspective of worst-case competitive ratio analysis, and aim to develop algorithms whose performance guarantees do not depend on the customer arrival process. We provide the first solution to this problem when there are both multiple items and multiple prices at which they could be sold, framing it as a general online resource allocation problem and developing a system of forecast-independent bid prices (Chapter 2). Second, we study a related assortment planning problem faced by Walmart Online Grocery, where before checkout, customers are recommended "add-on" items that are complementary to their current shopping cart (Chapter 3). Third, we derive inventory-dependent price-skimming policies for the single-leg RM problem, which extends existing competitive ratio results to non-independent demand (Chapter 4). In this module, we test our algorithms using a publicly-available data set from a major hotel chain. In the second module, we study bundling, which is the practice of selling different items together, and show how to learn and price using bundles. First, we introduce bundling as a new, alternate method for learning the price elasticities of items, which does not require any changing of prices; we validate our method on data from a large online retailer (Chapter 5). Second, we show how to sell bundles of goods profitably even when the goods have high production costs, and derive both distribution-dependent and distribution-free guarantees on the profitability (Chapter 6). In the final module, we study the Markovian multi-armed bandit problem under an undiscounted finite time horizon (Chapter 7). We improve existing approximation algorithms using LP rounding and random sampling techniques, which result in a (1/2 - eps)-approximation for the correlated stochastic knapsack problem that is tight relative to the LP. In this work, we introduce a framework for designing self-sampling algorithms, which is also used in our chronologically-later-to-appear work on add-on recommendation and single-leg RM. by Will (Wei) Ma. Ph. D

    On the Adaptivity Gap of Stochastic Orienteering

    The input to the stochastic orienteering problem consists of a budget B and metric (V,d) where each vertex v has a job with deterministic reward and random processing time (drawn from a known distribution). The processing times are independent across vertices. The goal is to obtain a non-anticipatory policy to run jobs at different vertices, that maximizes expected reward, subject to the total distance traveled plus processing times being at most B. An adaptive policy is one that can choose the next vertex to visit based on observed random instantiations. Whereas, a non-adaptive policy is just given by

    On the adaptivity gap of stochastic orienteering

    The input to the stochastic orienteering problem [14] consists of a budget B and metric (V,d) where each vertex v ∈ V has a job with a deterministic reward and a random processing time (drawn from a known distribution). The processing times are independent across vertices. The goal is to obtain a non-anticipatory policy (originating from a given root vertex) to run jobs at different vertices, that maximizes expected reward, subject to the total distance traveled plus processing times being at most B. An adaptive policy is one that can choose the next vertex to visit based on observed random instantiations, whereas a non-adaptive policy is just given by a fixed ordering of vertices. The adaptivity gap is the worst-case ratio of the expected rewards of the optimal adaptive and non-adaptive policies. We prove an Ω((log log B)^{1/2}) lower bound on the adaptivity gap of stochastic orienteering. This provides a negative answer to the O(1)-adaptivity gap conjectured in [14] and comes close to the O(log log B) upper bound proved there. This result holds even on a line metric. We also show an O(log log B) upper bound on the adaptivity gap for the correlated stochastic orienteering problem, where the reward of each job is random and possibly correlated to its processing time. Using this, we obtain an improved quasi-polynomial time min{log n, log B} · Õ(log^2 log B)-approximation algorithm for correlated stochastic orienteering.
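In symbols, the definition of the adaptivity gap and the two bounds for stochastic orienteering described in this abstract can be written as below. The names $\mathrm{Adapt}(I)$ and $\mathrm{NonAdapt}(I)$, for the optimal expected rewards of adaptive and non-adaptive policies on an instance $I$, are notation assumed here for illustration and are not taken from the paper.

```latex
\[
\text{adaptivity gap} \;=\; \sup_{I}\, \frac{\mathrm{Adapt}(I)}{\mathrm{NonAdapt}(I)},
\qquad
\Omega\!\big((\log\log B)^{1/2}\big) \;\le\; \text{adaptivity gap} \;\le\; O(\log\log B).
\]
```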