
    Upfront Commitment in Online Resource Allocation with Patient Customers

    In many on-demand online platforms such as ride-sharing, grocery delivery, or shipping, some arriving agents are patient and willing to wait a short amount of time for the resource or service as long as there is an upfront guarantee that service will ultimately be provided within a certain delay. Motivated by this, we present a setting with patient and impatient agents who seek a resource or service that replenishes periodically. Impatient agents demand the resource immediately upon arrival, while patient agents are willing to wait a short period conditioned on an upfront commitment to receive the resource. We study this setting under adversarial arrival models using a relaxed notion of competitive ratio. We present a class of POLYtope-based Resource Allocation (POLYRA) algorithms that achieve optimal or near-optimal competitive ratios. POLYRA algorithms work by consulting a particular polytope and only making decisions that keep the algorithm's state feasible in this polytope. When the number of agent types is either two or three, POLYRA algorithms obtain the optimal competitive ratio. To design these polytopes, we construct an upper bound on the competitive ratio of any algorithm, characterized via a linear program (LP) that considers a collection of overlapping worst-case input sequences. Our POLYRA algorithms then mimic the optimal solution of this upper-bound LP via their polytopes' definitions, obtaining the optimal competitive ratio. When there are more than three types, our overlapping worst-case input sequences do not necessarily yield an attainable competitive ratio, so we present a class of simple and interpretable POLYRA algorithms that achieve at least 80% of the optimal competitive ratio. We complement our theoretical studies with numerical analysis showing the efficiency of our algorithms beyond adversarial arrival models.
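
    As a rough illustration of the polytope-feasibility idea described above, the sketch below accepts an arriving agent only when the updated state stays inside a given polytope {x : Ax <= b}. The two-dimensional state and the specific polytope are toy assumptions for the demo, not the constructions designed in the paper.

```python
import numpy as np

def polyra_accept(state, update, A, b):
    """Accept the arriving agent only if the post-acceptance state stays
    inside the polytope {x : A x <= b} (the polytope is taken as given)."""
    candidate = state + update
    return bool(np.all(A @ candidate <= b + 1e-9)), candidate

# Toy instance (illustrative numbers, NOT the paper's construction):
# state = (resource committed to impatient agents, resource committed to patient agents).
A = np.array([[1.0, 0.0],   # impatient commitments capped at 0.6 of the unit budget
              [1.0, 1.0]])  # total commitments capped at the unit budget
b = np.array([0.6, 1.0])

state = np.zeros(2)
for agent in ["impatient", "patient", "impatient", "patient"]:
    update = np.array([0.3, 0.0]) if agent == "impatient" else np.array([0.0, 0.3])
    ok, candidate = polyra_accept(state, update, A, b)
    if ok:
        state = candidate
    print(f"{agent:9s} -> {'accepted' if ok else 'rejected'}, state = {state}")
```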

    Improved Revenue Bounds for Posted-Price and Second-Price Mechanisms

    We study revenue maximization through sequential posted-price (SPP) mechanisms in single-dimensional settings with $n$ buyers and independent but not necessarily identical value distributions. We construct the SPP mechanisms by considering the best of two simple pricing rules: one that imitates the revenue-optimal mechanism, namely the Myersonian mechanism, via the taxation principle, and the other that posts a uniform price. Our pricing rules are rather generalizable and yield the first improvement over long-established approximation factors in several settings. We design factor-revealing mathematical programs that crisply capture the approximation factor of our SPP mechanism. In the single-unit setting, our SPP mechanism yields a better approximation factor than the state of the art prior to our work (Azar, Chiplunkar & Kaplan, 2018). In the multi-unit setting, our SPP mechanism yields the first improved approximation factor over the state of the art in over nine years (Yan, 2011 and Chakraborty et al., 2010). Our results on SPP mechanisms immediately imply improved performance guarantees for the equivalent free-order prophet inequality problem. In the position auction setting, our SPP mechanism yields the first approximation factor strictly greater than $1-1/e$. In eager second-price (ESP) auctions, our two simple pricing rules lead to the first improved approximation factor that is strictly greater than what is obtained by the SPP mechanism in the single-unit setting. Comment: Accepted to Operations Research.
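
    For intuition only, the sketch below simulates a sequential posted-price mechanism on a toy single-unit instance and compares a single uniform price against a vector of per-buyer prices. The value distributions and prices are arbitrary illustrations, not the Myersonian-imitating construction from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def spp_revenue(values, prices, k=1):
    """Revenue of a sequential posted-price mechanism: buyers arrive in order,
    and buyer i purchases one of the k identical units iff values[i] >= prices[i]
    and a unit is still available."""
    revenue, remaining = 0.0, k
    for v, p in zip(values, prices):
        if remaining > 0 and v >= p:
            revenue += p
            remaining -= 1
    return revenue

# Toy experiment (assumed numbers): three buyers with independent uniform values
# on different ranges, and a single unit for sale.
highs = np.array([1.0, 2.0, 3.0])
uniform_price = 1.2                # one candidate rule: a single uniform price
personalized = 0.5 * highs         # another candidate: per-buyer posted prices

n_sims = 10_000
rev_uniform = np.mean([spp_revenue(rng.uniform(0, highs), [uniform_price] * 3)
                       for _ in range(n_sims)])
rev_personal = np.mean([spp_revenue(rng.uniform(0, highs), personalized)
                        for _ in range(n_sims)])
print(f"uniform-price SPP revenue ~ {rev_uniform:.3f}")
print(f"personalized SPP revenue  ~ {rev_personal:.3f}")
```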

    Contextual Bandits with Cross-learning

    In the classical contextual bandits problem, in each round $t$, a learner observes some context $c$, chooses some action $a$ to perform, and receives some reward $r_{a,t}(c)$. We consider the variant of this problem where, in addition to receiving the reward $r_{a,t}(c)$, the learner also learns the values of $r_{a,t}(c')$ for all other contexts $c'$; i.e., the rewards that would have been achieved by performing that action under different contexts. This variant arises in several strategic settings, such as learning how to bid in non-truthful repeated auctions (in this setting the context is the decision maker's private valuation for each auction). We call this problem the contextual bandits problem with cross-learning. The best algorithms for the classical contextual bandits problem achieve $\tilde{O}(\sqrt{CKT})$ regret against all stationary policies, where $C$ is the number of contexts, $K$ the number of actions, and $T$ the number of rounds. We demonstrate algorithms for the contextual bandits problem with cross-learning that remove the dependence on $C$ and achieve regret $O(\sqrt{KT})$ (when contexts are stochastic with known distribution), $\tilde{O}(K^{1/3}T^{2/3})$ (when contexts are stochastic with unknown distribution), and $\tilde{O}(\sqrt{KT})$ (when contexts are adversarial but rewards are stochastic). Comment: 48 pages, 5 figures.
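
    The following sketch illustrates the cross-learning feedback structure with a UCB-style index that keeps a single pull counter per action shared across all contexts, which is what removes the dependence on $C$. It is a simplified illustration, not necessarily the paper's algorithms; the reward model, context distribution, and constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
C, K, T = 4, 3, 5000
# Unknown mean rewards mu_true[c, a]; rewards are Bernoulli (illustrative setup).
mu_true = rng.uniform(0.2, 0.8, size=(C, K))

mu_hat = np.zeros((C, K))
n_pulls = np.zeros(K)          # one counter per action, shared across contexts
regret = 0.0

for t in range(1, T + 1):
    c = rng.integers(C)                          # stochastic context
    if t <= K:                                   # pull each action once to initialize
        a = t - 1
    else:
        ucb = mu_hat[c] + np.sqrt(2 * np.log(t) / n_pulls)
        a = int(np.argmax(ucb))
    # Cross-learning feedback: the reward of action a is revealed for EVERY context.
    rewards_all_contexts = (rng.random(C) < mu_true[:, a]).astype(float)
    n_pulls[a] += 1
    mu_hat[:, a] += (rewards_all_contexts - mu_hat[:, a]) / n_pulls[a]
    regret += mu_true[c].max() - mu_true[c, a]

print(f"cumulative regret after {T} rounds: {regret:.1f}")
```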

    Multi-Platform Budget Management in Ad Markets with Non-IC Auctions

    In online advertising markets, budget-constrained advertisers acquire ad placements through repeated bidding in auctions on various platforms. We present a strategy for bidding optimally in a set of auctions that may or may not be incentive-compatible, in the presence of budget constraints. Our strategy maximizes the expected total utility across auctions while satisfying the advertiser's budget constraints in expectation. Additionally, we investigate the online setting, in which the advertiser must submit bids across platforms while learning about other bidders' bids over time. Our algorithm has $O(T^{3/4})$ regret under the full-information setting. Finally, we demonstrate that our algorithms achieve superior cumulative regret on both synthetic and real-world datasets of ad placement auctions, compared to existing adaptive pacing algorithms. Comment: 34 pages, 5 figures.
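
    As a generic illustration of bidding across platforms under an expected-budget constraint, the sketch below uses plain Lagrangian pacing with bisection on the budget multiplier: shade a truthful bid on a second-price (IC) platform and grid-search a bid on a first-price (non-IC) platform. This is the kind of baseline the paper compares against, not the paper's algorithm, and all distributions and constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed setup: one second-price and one first-price platform, value v per
# impression, and a budget B that must hold in expectation over competing bids.
v, B = 1.0, 0.4
d_sp = rng.uniform(0.0, 1.0, 20_000)    # highest competing bids, second-price platform
d_fp = rng.uniform(0.0, 1.0, 20_000)    # highest competing bids, first-price platform
fp_grid = np.linspace(0.0, 1.0, 201)    # discretized first-price bids

def best_response(lam):
    """Bid on each platform to maximize E[value - (1 + lam) * payment]."""
    b_sp = v / (1.0 + lam)                                  # shaded truthful bid
    win = d_fp[None, :] < fp_grid[:, None]                  # first-price win events
    obj = ((v - (1.0 + lam) * fp_grid[:, None]) * win).mean(axis=1)
    b_fp = fp_grid[np.argmax(obj)]
    spend = (d_sp * (d_sp < b_sp)).mean() + (b_fp * (d_fp < b_fp)).mean()
    return b_sp, b_fp, spend

lo, hi = 0.0, 50.0                      # bisection on the budget multiplier
for _ in range(60):
    lam = 0.5 * (lo + hi)
    _, _, spend = best_response(lam)
    lo, hi = (lam, hi) if spend > B else (lo, lam)

b_sp, b_fp, spend = best_response(0.5 * (lo + hi))
print(f"second-price bid {b_sp:.3f}, first-price bid {b_fp:.3f}, expected spend {spend:.3f}")
```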

    Learning in Repeated Multi-Unit Pay-As-Bid Auctions

    Motivated by Carbon Emissions Trading Schemes, Treasury Auctions, and Procurement Auctions, all of which involve the auctioning of homogeneous multiple units, we consider the problem of learning how to bid in repeated multi-unit pay-as-bid auctions. In each of these auctions, a large number of (identical) items are allocated to the largest submitted bids, where the price of each winning bid is equal to the bid itself. The problem of learning how to bid in pay-as-bid auctions is challenging due to the combinatorial nature of the action space. We overcome this challenge by focusing on the offline setting, where the bidder optimizes their vector of bids while only having access to the past bids submitted by other bidders. We show that the optimal solution to the offline problem can be obtained using a polynomial-time dynamic programming (DP) scheme. We leverage the structure of the DP scheme to design online learning algorithms with polynomial time and space complexity under the full-information and bandit feedback settings. We achieve upper bounds on regret of $O(M\sqrt{T\log |\mathcal{B}|})$ and $O(M\sqrt{|\mathcal{B}|T\log |\mathcal{B}|})$, respectively, where $M$ is the number of units demanded by the bidder, $T$ is the total number of auctions, and $|\mathcal{B}|$ is the size of the discretized bid space. We accompany these results with a regret lower bound that matches the linear dependence on $M$. Our numerical results suggest that when all agents behave according to our proposed no-regret learning algorithms, the resulting market dynamics mainly converge to a welfare-maximizing equilibrium where bidders submit uniform bids. Lastly, our experiments demonstrate that the pay-as-bid auction consistently generates significantly higher revenue than its popular alternative, the uniform-price auction. Comment: 51 pages, 12 figures.
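
    The sketch below reconstructs, under simplifying assumptions (ties broken in the bidder's favor, a uniform discretized bid grid, made-up values and competing bids), the kind of polynomial-time dynamic program the abstract describes: because the bidder's $m$-th highest bid wins an auction exactly when it clears a per-unit threshold determined by the competing bids, the objective separates across units, and a suffix-maximum DP enforces the decreasing-bid constraint. It is an illustrative reconstruction, not the paper's exact scheme.

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed offline instance: K identical items per auction, M units demanded with
# decreasing marginal values, and competing bids observed in T past auctions.
K, M, T = 5, 3, 200
values = np.array([0.9, 0.7, 0.5])            # marginal values v_1 >= ... >= v_M
others = rng.uniform(0.0, 1.0, size=(T, 8))   # 8 competing bids per past auction
grid = np.linspace(0.0, 1.0, 101)             # discretized bid space B

# thresh[t, m]: our (m+1)-th highest bid wins auction t iff it is at least the
# (K - m)-th highest competing bid (threshold 0 if fewer competing bids exist).
sorted_d = -np.sort(-others, axis=1)
thresh = np.zeros((T, M))
for m in range(M):
    idx = K - m - 1
    thresh[:, m] = sorted_d[:, idx] if idx < others.shape[1] else 0.0

# reward[m, j]: total utility of unit m over all auctions when bid grid[j] is used.
reward = np.zeros((M, grid.size))
for m in range(M):
    wins = grid[None, :] >= thresh[:, m][:, None]          # ties broken in our favor
    reward[m] = ((values[m] - grid)[None, :] * wins).sum(axis=0)

# DP over units with the monotonicity constraint b_1 >= b_2 >= ... >= b_M.
best = reward[0]
for m in range(1, M):
    suffix_max = np.maximum.accumulate(best[::-1])[::-1]   # best over bids >= grid[j]
    best = reward[m] + suffix_max

print(f"optimal offline utility over {T} auctions ~ {best.max():.2f}")
```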

    Fair Assortment Planning

    Many online platforms, ranging from online retail stores to social media platforms, employ algorithms to optimize their offered assortment of items (e.g., products and content). These algorithms tend to prioritize the platforms' short-term goals by solely featuring items with the highest popularity or revenue. However, this practice can lead to undesirable outcomes for the rest of the items, prompting them to leave the platform and, in turn, hurting the platform's long-term goals. Motivated by this, we introduce and study a fair assortment planning problem, which requires any two items with similar quality/merits to be offered similar outcomes. We show that the problem can be formulated as a linear program (LP), called (FAIR), that optimizes over the distribution of all feasible assortments. To find a near-optimal solution to (FAIR), we propose a framework based on the Ellipsoid method, which requires a polynomial-time separation oracle for the dual of the LP. We show that finding an optimal separation oracle for the dual problem is NP-complete, and hence we propose a series of approximate separation oracles, which then result in a $1/2$-approximation algorithm and a PTAS for the original Problem (FAIR). The approximate separation oracles are designed by (i) showing that the separation oracle for the dual of the LP is equivalent to solving an infinite series of parameterized knapsack problems, and (ii) taking advantage of the structure of the parameterized knapsack problems. Finally, we conduct a case study using the MovieLens dataset, which demonstrates the efficacy of our algorithms and further sheds light on the price of fairness. Comment: 86 pages, 7 figures.
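
    To make the LP-over-assortments formulation concrete, the toy sketch below enumerates all assortments of a four-item instance, assumes an MNL choice model and purchase probability as the "outcome" (both assumptions; the abstract does not fix these choices), and solves a small LP that maximizes expected revenue subject to similar-quality items receiving similar outcomes. Real instances have exponentially many assortments, which is exactly why the paper resorts to the Ellipsoid method with approximate separation oracles rather than explicit enumeration.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

# Tiny assumed instance: n items, MNL weights w ("quality"), revenues r,
# and feasible assortments of size at most 2.
w = np.array([1.0, 0.9, 0.8, 0.7])
r = np.array([1.0, 0.6, 1.2, 0.5])
n, cap = len(w), 2
assortments = [S for k in range(1, cap + 1) for S in itertools.combinations(range(n), k)]

def purchase_prob(S):                    # MNL purchase probabilities within assortment S
    denom = 1.0 + sum(w[i] for i in S)
    return {i: w[i] / denom for i in S}

# Objective: maximize expected revenue of the randomized assortment (minimize -rev).
rev = np.array([sum(r[i] * p for i, p in purchase_prob(S).items()) for S in assortments])
# Outcome of item i = its expected purchase probability under the randomization.
outcome = np.zeros((n, len(assortments)))
for j, S in enumerate(assortments):
    for i, p in purchase_prob(S).items():
        outcome[i, j] = p

# Fairness: items with similar quality must receive similar outcomes (tolerance eps).
eps, rows = 0.05, []
for i, j in itertools.combinations(range(n), 2):
    if abs(w[i] - w[j]) <= 0.15:
        rows.append(outcome[i] - outcome[j])
        rows.append(outcome[j] - outcome[i])
A_ub, b_ub = np.array(rows), np.full(len(rows), eps)

res = linprog(-rev, A_ub=A_ub, b_ub=b_ub,
              A_eq=np.ones((1, len(assortments))), b_eq=[1.0],
              bounds=[(0, 1)] * len(assortments))
print("expected revenue:", round(-res.fun, 4))
print("assortment distribution:", dict(zip(assortments, np.round(res.x, 3))))
```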

    Optimal Learning for Structured Bandits

    We study structured multi-armed bandits, the problem of online decision-making under uncertainty in the presence of structural information. In this problem, the decision-maker needs to discover the best course of action despite observing only uncertain rewards over time. The decision-maker is aware of certain structural information regarding the reward distributions and would like to minimize regret by exploiting this information, where regret is the performance difference against a benchmark policy that knows the best action ahead of time. In the absence of structural information, the classical upper confidence bound (UCB) and Thompson sampling algorithms are well known to incur only minimal regret. As recently pointed out, however, neither algorithm is capable of exploiting structural information that is commonly available in practice. We propose a novel learning algorithm, called DUSA, whose worst-case regret matches the information-theoretic regret lower bound up to a constant factor and which can handle a wide range of structural information. DUSA solves a dual counterpart of the regret lower bound at the empirical reward distribution and follows its suggested play. Our proposed algorithm is the first computationally viable learning policy for structured bandit problems with asymptotically minimal regret.
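
    To give a sense of the object DUSA works with, the sketch below writes down a Graves-Lai-style regret lower-bound LP for a toy structured bandit in which the mean-reward vector is known to lie in a small finite set and rewards are unit-variance Gaussian. Both the structure and the numbers are assumptions for illustration; per the abstract, DUSA solves a dual counterpart of such a program at the empirical reward distribution.

```python
import numpy as np
from scipy.optimize import linprog

# Assumed structure: the true mean-reward vector lies in the finite set Theta;
# rewards are Gaussian with unit variance, so KL(x, y) = (x - y)^2 / 2.
Theta = np.array([
    [0.8, 0.6, 0.4],
    [0.8, 0.9, 0.3],
    [0.5, 0.6, 0.7],
])
theta = Theta[0]                       # the (empirical) reward vector plugged in
K = theta.size
best = int(np.argmax(theta))
gaps = theta.max() - theta

# "Confusing" alternatives: agree with theta on its best arm but have a different
# best arm; enough exploration is needed to rule every one of them out.
constraints = []
for alt in Theta:
    if np.isclose(alt[best], theta[best]) and np.argmax(alt) != best:
        constraints.append((theta - alt) ** 2 / 2.0)       # per-arm KL divergences

# LP: minimize sum_a eta_a * gap_a  s.t.  sum_a eta_a * KL_a(theta, alt) >= 1.
A_ub = -np.array(constraints)          # flip sign to express ">=" as "<="
b_ub = -np.ones(len(constraints))
res = linprog(gaps, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * K)
print("asymptotic exploration rates eta:", np.round(res.x, 3))
print("regret lower-bound constant     :", round(res.fun, 3))
```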

    Dynamic Bandits with an Auto-Regressive Temporal Structure

    Multi-armed bandit (MAB) problems are mainly studied under two extreme settings known as stochastic and adversarial. These two settings, however, do not capture realistic environments such as search engines, marketing, and advertising, in which rewards change stochastically over time. Motivated by this, we introduce and study a dynamic MAB problem with stochastic temporal structure, where the expected reward of each arm is governed by an auto-regressive (AR) model. Due to the dynamic nature of the rewards, simple "explore and commit" policies fail, as all arms have to be explored continuously over time. We formalize this by characterizing a per-round regret lower bound, where the regret is measured against a strong (dynamic) benchmark. We then present an algorithm whose per-round regret almost matches our regret lower bound. Our algorithm relies on two mechanisms: (i) alternating between recently pulled arms and unpulled arms with potential, and (ii) restarting. These mechanisms enable the algorithm to dynamically adapt to changes and discard irrelevant past information at a suitable rate. In numerical studies, we further demonstrate the strength of our algorithm under non-stationary settings. Comment: 41 pages, 4 figures.
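
    The simulation below is a simplified illustration, not the paper's algorithm, of why continual exploration is needed when arm means follow an AR(1) process: a policy that decays stale observations and adds a staleness bonus keeps alternating between recently pulled arms and arms whose information has gone out of date. All model constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed dynamics: each arm's expected reward follows mu_{t+1} = gamma * mu_t + noise.
K, T, gamma, sigma = 3, 2000, 0.99, 0.1
mu = rng.normal(0.0, 1.0, K)

last_obs = np.zeros(K)       # last observed reward of each arm
age = np.full(K, np.inf)     # rounds since each arm was last pulled
total, best_total = 0.0, 0.0

for t in range(T):
    # Forecast each arm by decaying its last observation; stale arms get an
    # exploration bonus, which forces the policy to keep revisiting them.
    forecast = np.where(np.isfinite(age), (gamma ** age) * last_obs, np.inf)
    bonus = np.where(np.isfinite(age), sigma * np.sqrt(np.minimum(age, 50)), np.inf)
    a = int(np.argmax(forecast + bonus))

    reward = mu[a] + rng.normal(0.0, sigma)
    last_obs[a], age[a] = reward, 0
    age += 1
    total += mu[a]
    best_total += mu.max()
    mu = gamma * mu + rng.normal(0.0, sigma, K)   # the environment drifts every round

print(f"per-round regret against the dynamic benchmark ~ {(best_total - total) / T:.3f}")
```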