381 research outputs found
LIPIcs, Volume 251, ITCS 2023, Complete Volume
Autobidders with Budget and ROI Constraints: Efficiency, Regret, and Pacing Dynamics
We study a game between autobidding algorithms that compete in an online
advertising platform. Each autobidder is tasked with maximizing its
advertiser's total value over multiple rounds of a repeated auction, subject to
budget and/or return-on-investment constraints. We propose a gradient-based
learning algorithm that is guaranteed to satisfy all constraints and achieves
vanishing individual regret. Our algorithm uses only bandit feedback and can be
used with the first- or second-price auction, as well as with any
"intermediate" auction format. Our main result is that when these autobidders
play against each other, the resulting expected liquid welfare over all rounds
is at least half of the expected optimal liquid welfare achieved by any
allocation. This holds whether or not the bidding dynamics converge to an
equilibrium and regardless of the correlation structure between advertiser
valuations.
LIPIcs, Volume 261, ICALP 2023, Complete Volume
Contextual Bandits with Budgeted Information Reveal
Contextual bandit algorithms are commonly used in digital health to recommend
personalized treatments. However, to ensure the effectiveness of the
treatments, patients are often requested to take actions that have no immediate
benefit to them, which we refer to as pro-treatment actions. In practice,
clinicians have a limited budget to encourage patients to take these actions
and collect additional information. We introduce a novel optimization and
learning algorithm to address this problem. This algorithm seamlessly combines
the strengths of two algorithmic approaches: 1)
an online primal-dual algorithm for deciding the optimal timing to reach out to
patients, and 2) a contextual bandit learning algorithm to deliver personalized
treatment to the patient. We prove that this algorithm admits a sub-linear
regret bound. We illustrate the usefulness of this algorithm on both synthetic
and real-world data.
Learning to Bid in Repeated First-Price Auctions with Budgets
Budget management strategies in repeated auctions have received growing
attention in online advertising markets. However, previous work on budget
management in online bidding mainly focused on second-price auctions. The rapid
shift from second-price auctions to first-price auctions for online ads in
recent years has motivated the challenging question of how to bid in repeated
first-price auctions while controlling budgets.
In this work, we study the problem of learning in repeated first-price
auctions with budgets. We design a dual-based algorithm that can achieve a
near-optimal $\widetilde{O}(\sqrt{T})$ regret with full information feedback
where the maximum competing bid is always revealed after each auction. We
further consider the setting with one-sided information feedback where only the
winning bid is revealed after each auction. We show that our modified algorithm
can still achieve an $\widetilde{O}(\sqrt{T})$ regret with mild assumptions on
the bidder's value distribution. Finally, we complement the theoretical results
with numerical experiments to confirm the effectiveness of our budget
management policy.
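The dual-based idea can be illustrated with a minimal sketch. This is a generic dual-descent pacing loop under full information feedback, not the authors' algorithm: the function names `choose_bid` and `run_pacing`, the `step_size` parameter, the empirical-CDF bid selection, and the tie-breaking rule (ties go to the bidder) are all assumptions made for illustration.

```python
import bisect
import random


def choose_bid(value, lam, past_bids):
    """Return the bid maximizing (value - (1 + lam) * bid) * estimated P(win).

    past_bids is a sorted list of previously observed maximum competing bids;
    the win probability is estimated from their empirical CDF (an assumption).
    """
    best_bid, best_obj = 0.0, 0.0
    # On an empirical CDF it is enough to consider the observed bids as candidates.
    for cand in past_bids:
        win_prob = bisect.bisect_right(past_bids, cand) / len(past_bids)
        obj = (value - (1.0 + lam) * cand) * win_prob
        if obj > best_obj:
            best_bid, best_obj = cand, obj
    return best_bid


def run_pacing(values, competing_bids, budget, step_size):
    """Run the pacing loop; return (total value won, total spend, final multiplier)."""
    rho = budget / len(values)  # target spend per round
    lam = 0.0                   # dual variable (pacing multiplier) for the budget
    remaining = budget
    past = []                   # sorted history of competing bids
    total_value = 0.0
    for v, d in zip(values, competing_bids):
        bid = choose_bid(v, lam, past) if past else 0.0
        bid = min(bid, remaining)          # never bid more than the remaining budget
        cost = 0.0
        if bid > 0.0 and bid >= d:         # first-price: the winner pays its own bid
            cost = bid
            total_value += v
            remaining -= cost
        # Dual step: raise lam after overspending, lower it after underspending.
        lam = max(0.0, lam - step_size * (rho - cost))
        bisect.insort(past, d)             # full feedback: d is revealed every round
    return total_value, budget - remaining, lam


if __name__ == "__main__":
    random.seed(0)
    vals = [random.random() for _ in range(200)]
    bids = [random.random() for _ in range(200)]
    print(run_pacing(vals, bids, budget=10.0, step_size=0.05))
```

In this sketch the multiplier `lam` shades bids more aggressively whenever spending runs ahead of the per-round target `budget / T`, which is the basic mechanism behind dual-based budget management.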
A Tight Competitive Ratio for Online Submodular Welfare Maximization
In this paper we consider the online Submodular Welfare (SW) problem. In this
problem we are given bidders each equipped with a general (not necessarily
monotone) submodular utility and items that arrive online. The goal is to
assign each item, once it arrives, to a bidder or discard it, while maximizing
the sum of utilities. When an adversary determines the items' arrival order we
present a simple randomized algorithm that achieves a tight competitive ratio
of $1/4$. The algorithm is a specialization of an algorithm due to
[Harshaw-Kazemi-Feldman-Karbasi MOR'22], who presented the previously best
known competitive ratio of $3-2\sqrt{2} \approx 0.171573$ for the problem. When
the items' arrival order is uniformly random, we present a competitive ratio of
$\approx 0.27493$, improving the previously known $1/4$ guarantee.
Our approach for the latter result is based on a better analysis of the
(offline) Residual Random Greedy (RRG) algorithm of
[Buchbinder-Feldman-Naor-Schwartz SODA'14], which we believe might be of
independent interest.
Trading-off price for data quality to achieve fair online allocation
We consider the problem of online allocation subject to a long-term fairness
penalty. Contrary to existing works, however, we do not assume that the
decision-maker observes the protected attributes -- which is often unrealistic
in practice. Instead they can purchase data that help estimate them from
sources of different quality; and hence reduce the fairness penalty at some
cost. We model this problem as a multi-armed bandit problem where each arm
corresponds to the choice of a data source, coupled with the online allocation
problem. We propose an algorithm that jointly solves both problems and show
that it has a regret bounded by $\mathcal{O}(\sqrt{T})$. A key difficulty is
that the rewards received by selecting a source are correlated by the fairness
penalty, which leads to a need for randomization (despite a stochastic
setting). Our algorithm takes into account contextual information available
before the source selection, and can adapt to many different fairness notions.
We also show that in some instances, the estimates used can be learned on the
fly.
Privacy in resource allocation problems
Collaborative decision-making processes help parties optimize their operations, remain competitive in their markets, and improve their performance on environmental issues. However, those parties also want to keep their data private, both to meet their obligations under various regulations and to avoid disclosing strategic information to competitors. In this thesis, we study collaborative capacity allocation among multiple parties and show that (near) optimal allocations can be realized while respecting the parties' privacy concerns. We first attempt to solve the multi-party resource sharing problem by constructing a single model that is available to all parties. We propose an equivalent data-private model that meets the parties' data privacy requirements while ensuring optimal solutions for each party. We show that when the proposed model is solved, each party can only obtain its own optimal decisions and cannot observe the others' solutions. We support our findings with a simulation study. The third and fourth chapters of this thesis approach the problem from a different perspective, using a reformulation that distributes the problem among the involved parties. This decomposition lets us eliminate almost all the information-sharing requirements. In Chapter 3, together with the reformulated model, we use a secure multi-party computation protocol that allows parties to disguise their shared information while attaining optimal allocation decisions. We conduct a simulation study on a planning problem and demonstrate our proposed algorithm in practice. We use the decomposition approach in Chapter 4 with a different privacy notion. We employ differential privacy as our privacy definition and design a differentially private algorithm for solving the multi-party resource sharing problem. Differential privacy brings formal data privacy guarantees at the cost of deviating slightly from optimality.
We provide bounds on this deviation and discuss the consequences of these theoretical results. We demonstrate the proposed algorithm on a planning problem and present insights about its efficiency.
Contextual Bandits with Packing and Covering Constraints: A Modular Lagrangian Approach via Regression
We consider contextual bandits with linear constraints (CBwLC), a variant of
contextual bandits in which the algorithm consumes multiple resources subject
to linear constraints on total consumption. This problem generalizes contextual
bandits with knapsacks (CBwK), allowing for packing and covering constraints,
as well as positive and negative resource consumption. We provide the first
algorithm for CBwLC (or CBwK) that is based on regression oracles. The
algorithm is simple, computationally efficient, and admits vanishing regret. It
is statistically optimal for the variant of CBwK in which the algorithm must
stop once some constraint is violated. Further, we provide the first
vanishing-regret guarantees for CBwLC (or CBwK) that extend beyond the
stochastic environment. We side-step strong impossibility results from prior
work by identifying a weaker (and, arguably, fairer) benchmark to compare
against. Our algorithm builds on LagrangeBwK (Immorlica et al., FOCS 2019), a
Lagrangian-based technique for CBwK, and SquareCB (Foster and Rakhlin, ICML
2020), a regression-based technique for contextual bandits. Our analysis
leverages the inherent modularity of both techniques.
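For readers unfamiliar with SquareCB, its core action-selection rule is inverse gap weighting over the rewards predicted by a regression oracle. The sketch below shows that rule only, as a minimal illustration; the function name `inverse_gap_weighting` and the fixed `gamma` are assumptions, and the Lagrangian layer and resource constraints described in the abstract are not modeled.

```python
def inverse_gap_weighting(predicted_rewards, gamma):
    """Map predicted rewards to action probabilities via inverse gap weighting.

    Each non-greedy action a gets probability 1 / (K + gamma * gap_a), where
    gap_a is the predicted reward gap to the greedy action; the greedy action
    receives the remaining probability mass.
    """
    k = len(predicted_rewards)
    best = max(range(k), key=lambda a: predicted_rewards[a])
    probs = [0.0] * k
    for a in range(k):
        if a != best:
            gap = predicted_rewards[best] - predicted_rewards[a]
            probs[a] = 1.0 / (k + gamma * gap)
    probs[best] = 1.0 - sum(probs)  # leftover mass goes to the greedy action
    return probs


if __name__ == "__main__":
    print(inverse_gap_weighting([0.9, 0.5, 0.1], gamma=10.0))
```

A larger `gamma` concentrates probability on the greedy action, while `gamma = 0` yields uniform exploration over all actions.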