36 research outputs found
Upfront Commitment in Online Resource Allocation with Patient Customers
In many on-demand online platforms such as ride-sharing, grocery delivery, or
shipping, some arriving agents are patient and willing to wait a short amount
of time for the resource or service as long as there is an upfront guarantee
that service will be ultimately provided within a certain delay. Motivated by
this, we present a setting with patient and impatient agents who seek a
resource or service that replenishes periodically. Impatient agents demand the
resource immediately upon arrival while patient agents are willing to wait a
short period conditioned on an upfront commitment to receive the resource. We
study this setting under adversarial arrival models using a relaxed notion of
competitive ratio. We present a class of POLYtope-based Resource Allocation
(POLYRA) algorithms that achieve optimal or near-optimal competitive ratios.
Such POLYRA algorithms work by consulting a particular polytope and only making
decisions that guarantee the algorithm's state remains feasible in this
polytope. When the number of agent types is either two or three, POLYRA
algorithms can obtain the optimal competitive ratio. To design these polytopes,
we construct an upper bound on the competitive ratio of any algorithm, which is
characterized via a linear program (LP) that considers a collection of
overlapping worst-case input sequences. Our designed POLYRA algorithms then
mimic the optimal solution of this upper bound LP via its polytope's
definition, obtaining the optimal competitive ratio. When there are more than
three types, our overlapping worst-case input sequences do not necessarily
result in an attainable competitive ratio, and so we present a class of simple
and interpretable POLYRA algorithms that achieve at least 80% of the optimal
competitive ratio. We complement our theoretical studies with numerical
analysis that shows the efficiency of our algorithms beyond adversarial
arrivals.
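To illustrate the mechanism behind POLYRA-style algorithms, the following minimal sketch maintains a state vector and accepts an arriving agent only if the updated state stays inside a designed polytope {x : A x <= b}. The constraint matrix, capacities, and two-type setup here are illustrative assumptions, not the paper's construction.

```python
# Hedged sketch of a polytope-based allocation rule: keep a state vector x
# (units committed per agent type) and accept a request only if the new
# state remains feasible for every linear constraint A[r] . x <= b[r].

def feasible(x, A, b, tol=1e-9):
    # check every linear constraint sum_i A[r][i] * x[i] <= b[r]
    return all(sum(a * xi for a, xi in zip(row, x)) <= br + tol
               for row, br in zip(A, b))

def polyra_step(x, agent_type, A, b):
    """Accept the arriving agent iff the new state stays in the polytope."""
    candidate = list(x)
    candidate[agent_type] += 1  # commit one unit to this agent type
    if feasible(candidate, A, b):
        return candidate, True
    return x, False

# Toy polytope: 2 types share 5 units, with at most 3 committed to type 0.
A = [[1.0, 1.0], [1.0, 0.0]]
b = [5.0, 3.0]
x = [0, 0]
decisions = []
for t in [0, 0, 0, 0, 1, 1, 1]:  # arrival sequence of agent types
    x, accepted = polyra_step(x, t, A, b)
    decisions.append(accepted)
```

In this toy run the fourth type-0 arrival and the last type-1 arrival are rejected because accepting them would leave the polytope; the paper's contribution is designing the polytope so that any policy keeping the state feasible attains the target competitive ratio.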
Learning in Repeated Multi-Unit Pay-As-Bid Auctions
Motivated by Carbon Emissions Trading Schemes, Treasury Auctions, and
Procurement Auctions, which all involve the auctioning of homogeneous multiple
units, we consider the problem of learning how to bid in repeated multi-unit
pay-as-bid auctions. In each of these auctions, a large number of (identical)
items are to be allocated to the largest submitted bids, where the price of
each of the winning bids is equal to the bid itself. The problem of learning
how to bid in pay-as-bid auctions is challenging due to the combinatorial
nature of the action space. We overcome this challenge by focusing on the
offline setting, where the bidder optimizes their vector of bids while only
having access to the past submitted bids by other bidders. We show that the
optimal solution to the offline problem can be obtained using a polynomial time
dynamic programming (DP) scheme. We leverage the structure of the DP scheme to
design online learning algorithms with polynomial time and space complexity
under full-information and bandit feedback settings. We establish upper bounds
on regret under both feedback settings, stated in terms of the number of units
demanded by the bidder, the total number of auctions, and the size of the
discretized bid space. We accompany these results with a regret lower bound
that matches the linear dependency of our upper bounds. Our numerical results suggest
that when all agents behave according to our proposed no regret learning
algorithms, the resulting market dynamics mainly converge to a welfare
maximizing equilibrium where bidders submit uniform bids. Lastly, our
experiments demonstrate that the pay-as-bid auction consistently generates
significantly higher revenue compared to its popular alternative, the uniform
price auction.
Comment: 51 pages, 12 figures
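The offline dynamic program described above can be sketched as follows. The win rule below (your j-th highest bid must beat the (K - j + 1)-th highest competing bid), the common per-unit value, and the monotone discretized bid grid are simplifying assumptions for illustration; the paper's DP handles the general pay-as-bid allocation.

```python
# Hedged sketch: offline pay-as-bid bid optimization via DP. Assumptions:
# the bidder values each unit at v, demands M units, K identical units are
# sold, competing bids are known (offline setting), and own bids come from
# a discretized grid and must be non-increasing.

def optimal_bids(v, M, K, others_bids, grid):
    """Max total utility of a non-increasing bid vector, by DP."""
    others = sorted(others_bids, reverse=True)

    def threshold(j):
        # to win a j-th unit, beat the (K - j + 1)-th highest competing bid
        idx = K - j
        return others[idx] if 0 <= idx < len(others) else 0.0

    levels = sorted(set(grid), reverse=True)  # descending bid levels
    B, NEG = len(levels), float("-inf")
    # dp[j][k]: best utility of the first j bids with the j-th bid at
    # levels[k]; monotonicity = later bids use an index >= k.
    dp = [[NEG] * B for _ in range(M + 1)]
    dp[0] = [0.0] * B
    for j in range(1, M + 1):
        best_prev = NEG
        for k in range(B):
            best_prev = max(best_prev, dp[j - 1][k])  # best over higher bids
            b = levels[k]
            gain = (v - b) if b > threshold(j) else 0.0  # pay your own bid
            dp[j][k] = best_prev + gain
    return max(dp[M])

best = optimal_bids(v=10, M=2, K=2, others_bids=[5], grid=[0, 4, 6, 8])
```

The DP runs in time polynomial in M and the grid size, mirroring the polynomial-time scheme the abstract describes.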
Fair Assortment Planning
Many online platforms, ranging from online retail stores to social media
platforms, employ algorithms to optimize their offered assortment of items
(e.g., products and contents). These algorithms tend to prioritize the
platforms' short-term goals by solely featuring items with the highest
popularity or revenue. However, this practice can then lead to undesirable
outcomes for the rest of the items, making them leave the platform, and in turn
hurting the platform's long-term goals. Motivated by that, we introduce and
study a fair assortment planning problem, which requires any two items with
similar quality/merits to be offered similar outcomes. We show that the problem
can be formulated as a linear program (LP), called (FAIR), that optimizes over
the distribution of all feasible assortments. To find a near-optimal solution
to (FAIR), we propose a framework based on the Ellipsoid method, which requires
a polynomial-time separation oracle to the dual of the LP. We show that finding
an optimal separation oracle to the dual problem is an NP-complete problem, and
hence we propose a series of approximate separation oracles, which then result
in a constant-factor approximation algorithm and a PTAS for the original Problem (FAIR). The
approximate separation oracles are designed by (i) showing the separation
oracle to the dual of the LP is equivalent to solving an infinite series of
parameterized knapsack problems, and (ii) taking advantage of the structure of
the parameterized knapsack problems. Finally, we conduct a case study using the
MovieLens dataset, which demonstrates the efficacy of our algorithms and
further sheds light on the price of fairness.
Comment: 86 pages, 7 figures
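The separation-oracle reduction can be made concrete through its core building block, a knapsack solver. The 0/1 instance below is purely illustrative; the paper solves an infinite, parameterized family of knapsack problems whose weights and values depend on the dual variables.

```python
# Hedged sketch: a standard 0/1 knapsack DP of the kind that could serve
# as the building block inside an approximate separation oracle. The
# integer weights and specific numbers are illustrative assumptions.

def knapsack(values, weights, capacity):
    """Best achievable value of a 0/1 knapsack with integer weights."""
    dp = [0] * (capacity + 1)  # dp[c] = best value with weight budget c
    for v, w in zip(values, weights):
        for c in range(capacity, w - 1, -1):  # reverse scan: use item once
            dp[c] = max(dp[c], dp[c - w] + v)
    return dp[capacity]

best = knapsack(values=[6, 10, 12], weights=[1, 2, 3], capacity=5)
```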
Multi-Platform Budget Management in Ad Markets with Non-IC Auctions
In online advertising markets, budget-constrained advertisers acquire ad
placements through repeated bidding in auctions on various platforms. We
present a strategy for bidding optimally in a set of auctions that may or may
not be incentive-compatible under the presence of budget constraints. Our
strategy maximizes the expected total utility across auctions while satisfying
the advertiser's budget constraints in expectation. Additionally, we
investigate the online setting where the advertiser must submit bids across
platforms while learning about other bidders' bids over time. Our algorithm
attains sublinear regret under the full-information setting. Finally, we demonstrate
that our algorithms have superior cumulative regret on both synthetic and
real-world datasets of ad placement auctions, compared to existing adaptive
pacing algorithms.
Comment: 34 pages, 5 figures
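As a point of reference for the budget-management problem above, here is a minimal sketch of Lagrangian bid pacing, the general idea behind the adaptive pacing baselines the paper compares against: bids are shaded by a multiplier that rises when spend runs ahead of the per-round budget. The second-price payment rule and step size are illustrative assumptions, not the paper's multi-platform strategy.

```python
# Hedged sketch of Lagrangian bid pacing for a budget-constrained bidder:
# bid value / (1 + mu) and adjust mu by how far spend deviates from the
# per-round budget target.

def paced_bids(values, competing, budget, rounds, eta=0.1):
    mu, spend, wins = 0.0, 0.0, 0
    target = budget / rounds  # intended spend per round
    for v, d in zip(values, competing):
        bid = v / (1.0 + mu)          # shaded (paced) bid
        pay = d if bid > d else 0.0   # second-price payment if we win
        if pay and spend + pay <= budget:
            spend += pay
            wins += 1
        else:
            pay = 0.0                 # lost or budget-blocked: no spend
        # raise mu when overspending the per-round target, lower otherwise
        mu = max(0.0, mu + eta * (pay - target))
    return wins, spend

wins, spend = paced_bids(values=[10.0] * 20, competing=[5.0] * 20,
                         budget=30.0, rounds=20)
```

The multiplier makes the bidder alternate between winning and sitting out so that average spend tracks the budget, which is the behavior non-IC auctions complicate in the paper's setting.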
Optimal Learning for Structured Bandits
We study structured multi-armed bandits, which is the problem of online
decision-making under uncertainty in the presence of structural information. In
this problem, the decision-maker needs to discover the best course of action
despite observing only uncertain rewards over time. The decision-maker is aware
of certain structural information regarding the reward distributions and would
like to minimize their regret by exploiting this information, where the regret
is their performance difference against a benchmark policy that knows the best
action ahead of time. In the absence of structural information, the classical
upper confidence bound (UCB) and Thompson sampling algorithms are well known to
suffer only minimal regret. As recently pointed out, however, neither algorithm
is capable of exploiting structural information that is commonly
available in practice. We propose a novel learning algorithm that we call DUSA
whose worst-case regret matches the information-theoretic regret lower bound up
to a constant factor and can handle a wide range of structural information. Our
algorithm DUSA solves a dual counterpart of the regret lower bound at the
empirical reward distribution and follows its suggested play. Our proposed
algorithm is the first computationally viable learning policy for structured
bandit problems that achieves asymptotically minimal regret.
Contextual Bandits with Cross-learning
In the classical contextual bandits problem, in each round t, a learner
observes some context c, chooses some action a to perform, and receives
some reward r(a, c). We consider the variant of this problem where, in
addition to receiving the reward r(a, c), the learner also learns the
values of r(a, c') for all other contexts c'; i.e., the rewards that
would have been achieved by performing that action under different contexts.
This variant arises in several strategic settings, such as learning how to bid
in non-truthful repeated auctions (in this setting the context is the decision
maker's private valuation for each auction). We call this problem the
contextual bandits problem with cross-learning. The best algorithms for the
classical contextual bandits problem achieve regret of order sqrt(CKT)
against all stationary policies, where C is the number of contexts, K the
number of actions, and T the number of rounds. We demonstrate algorithms for
the contextual bandits problem with cross-learning that remove the dependence
on C, achieving regret of order sqrt(KT) when contexts are stochastic with
known distribution, and correspondingly improved regret when contexts are
stochastic with unknown distribution and when contexts are adversarial but
rewards are stochastic.
Comment: 48 pages, 5 figures
Dynamic Bandits with an Auto-Regressive Temporal Structure
Multi-armed bandit (MAB) problems are mainly studied under two extreme
settings known as stochastic and adversarial. These two settings, however, do
not capture realistic environments such as search engines and marketing and
advertising, in which rewards stochastically change in time. Motivated by that,
we introduce and study a dynamic MAB problem with stochastic temporal
structure, where the expected reward of each arm is governed by an
auto-regressive (AR) model. Due to the dynamic nature of the rewards, simple
"explore and commit" policies fail, as all arms have to be explored
continuously over time. We formalize this by characterizing a per-round regret
lower bound, where the regret is measured against a strong (dynamic) benchmark.
We then present an algorithm whose per-round regret almost matches our regret
lower bound. Our algorithm relies on two mechanisms: (i) alternating between
recently pulled arms and unpulled arms with potential, and (ii) restarting.
These mechanisms enable the algorithm to dynamically adapt to changes and
discard irrelevant past information at a suitable rate. In numerical studies,
we further demonstrate the strength of our algorithm under non-stationary
settings.
Comment: 41 pages, 4 figures
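The auto-regressive temporal structure can be sketched in a few lines: an arm's expected reward follows an AR(1) recursion mu_t = gamma * mu_{t-1} + eps_t, so the mean drifts and old observations go stale. The coefficient and noise scale below are illustrative assumptions.

```python
import random

# Hedged sketch of an AR(1) expected-reward process for one arm.

def simulate_ar1(gamma, sigma, rounds, mu0=1.0, seed=0):
    """Path of mu_t = gamma * mu_{t-1} + eps_t with Gaussian noise eps_t."""
    rng = random.Random(seed)
    mu, path = mu0, []
    for _ in range(rounds):
        mu = gamma * mu + rng.gauss(0.0, sigma)  # AR(1) update
        path.append(mu)
    return path

path = simulate_ar1(gamma=0.9, sigma=0.1, rounds=100)
```

With sigma = 0 the path decays geometrically toward zero, which illustrates why "explore and commit" fails here: an arm that looked best early need not stay best.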
Improved Revenue Bounds for Posted-Price and Second-Price Mechanisms
We study revenue maximization through sequential posted-price (SPP)
mechanisms in single-dimensional settings with n buyers and independent but
not necessarily identical value distributions. We construct the SPP mechanisms
by considering the best of two simple pricing rules: one that imitates the
revenue-optimal mechanism, namely the Myersonian mechanism, via the taxation
principle and the other that posts a uniform price. Our pricing rules are
rather generalizable and yield the first improvement over long-established
approximation factors in several settings. We design factor-revealing
mathematical programs that crisply capture the approximation factor of our SPP
mechanism. In the single-unit setting, our SPP mechanism yields a better
approximation factor than the state of the art prior to our work (Azar,
Chiplunkar & Kaplan, 2018). In the multi-unit setting, our SPP mechanism yields
the first improved approximation factor over the state of the art after over
nine years (Yan, 2011 and Chakraborty et al., 2010). Our results on SPP
mechanisms immediately imply improved performance guarantees for the equivalent
free-order prophet inequality problem. In the position auction setting, our SPP
mechanism yields the first approximation factor exceeding the previously best-known guarantee. In eager
second-price (ESP) auctions, our two simple pricing rules lead to the first
improved approximation factor that is strictly greater than what is obtained by
the SPP mechanism in the single-unit setting.
Comment: Accepted to Operations Research
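A sequential posted-price mechanism itself is simple to state in code; the paper's contribution lies in choosing the prices (the better of a Myersonian, taxation-principle rule and a uniform price). The single-unit setting, values, and uniform price below are illustrative assumptions.

```python
# Hedged sketch of a single-unit sequential posted-price (SPP) mechanism:
# buyers arrive in order, each is quoted a personal price, and the first
# buyer whose value meets their price purchases.

def sequential_posted_price(values, prices):
    """Return (winning buyer index, revenue), or (None, 0.0) if no sale."""
    for i, (v, p) in enumerate(zip(values, prices)):
        if v >= p:          # buyer accepts any price at or below their value
            return i, p     # sale closes at the posted price
    return None, 0.0        # every buyer declined

buyer, revenue = sequential_posted_price(values=[3.0, 8.0, 6.0],
                                         prices=[5.0, 5.0, 5.0])
```

Note that the mechanism is non-adaptive in the values: the prices are fixed up front, which is what makes proving approximation guarantees against the Myersonian optimum nontrivial.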