8 research outputs found
How to Sell Information Optimally: An Algorithmic Study
We investigate the algorithmic problem of selling information to agents who face a decision-making problem under uncertainty. We adopt the model recently proposed by Bergemann et al. [2018], in which information is revealed through signaling schemes called experiments. In the single-agent setting, any mechanism can be represented as a menu of experiments. Our results show that the computational complexity of designing the revenue-optimal menu depends heavily on how the model is specified. When all parameters of the problem are given explicitly, we provide a polynomial-time algorithm that computes the revenue-optimal menu. When the model is specified by a succinct implicit description, we show that the tractability of the problem is tightly related to the efficient implementation of a best response oracle: when such an oracle can be implemented efficiently, we provide an additive FPTAS whose running time is independent of the number of actions. On the other hand, we exhibit a family of problems for which constructing a best response oracle is computationally intractable, and we show that it is NP-hard to obtain even a constant fraction of the optimal revenue. Moreover, we investigate a generalization of the original model of Bergemann et al. [2018] that allows multiple agents to compete for useful information. We leverage techniques developed in the study of auction design (see, e.g., [Cai et al., 2012a; 2012b; Alaei et al., 2012; Cai et al., 2013a; 2013b]) to design a polynomial-time algorithm that computes the revenue-optimal mechanism for selling information.
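The decision-theoretic model behind this line of work can be made concrete with a small numerical sketch. The snippet below is illustrative only: the prior, the payoff matrix, and the function names are invented for the example rather than taken from the paper. It computes a buyer's value for an experiment, i.e., how much expected utility a signaling scheme adds over acting on the prior alone, and in particular the buyer's value for the fully informative experiment.

    import numpy as np

    # Toy single-agent instance (hypothetical numbers): 2 states of the world, 3 actions.
    prior = np.array([0.6, 0.4])            # buyer's prior over the states
    payoff = np.array([[1.0, 0.0],          # payoff[action, state]
                       [0.0, 1.0],
                       [0.7, 0.7]])

    def value_of_experiment(experiment, prior, payoff):
        """Expected utility from best-responding to each signal of an experiment.

        experiment[state, signal] is the probability of sending `signal` when the
        state is `state` (rows sum to 1), i.e., an experiment is a signaling scheme.
        """
        value = 0.0
        for signal in range(experiment.shape[1]):
            joint = prior * experiment[:, signal]    # joint probability of (state, signal)
            value += (payoff @ joint).max()          # best response to the induced posterior
        return value

    no_info = (payoff @ prior).max()                           # act on the prior alone
    full_info = value_of_experiment(np.eye(2), prior, payoff)  # fully informative experiment
    print(full_info - no_info)   # the buyer's value for full information (0.3 in this toy instance)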
Multiclass Learnability Beyond the PAC Framework: Universal Rates and Partial Concept Classes
In this paper we study the problem of multiclass classification with a
bounded number of different labels, in the realizable setting. We extend
the traditional PAC model to a) distribution-dependent learning rates, and b)
learning rates under data-dependent assumptions. First, we consider the
universal learning setting (Bousquet, Hanneke, Moran, van Handel and
Yehudayoff, STOC '21), for which we provide a complete characterization of the
achievable learning rates that holds for every fixed distribution. In
particular, we show the following trichotomy: for any concept class, the
optimal learning rate is either exponential, linear or arbitrarily slow.
Additionally, we provide complexity measures of the underlying hypothesis class
that characterize when these rates occur. Second, we consider the problem of
multiclass classification with structured data (such as data lying on a low
dimensional manifold or satisfying margin conditions), a setting which is
captured by partial concept classes (Alon, Hanneke, Holzman and Moran, FOCS
'21). Partial concepts are functions that can be undefined in certain parts of
the input space. We extend the traditional PAC learnability of total concept
classes to partial concept classes in the multiclass setting and investigate
differences between partial and total concepts.
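A partial concept can be illustrated with a small, purely hypothetical sketch: the multiclass predictor below is treated as a partial concept that is undefined (returns None) on points violating a margin assumption, which is exactly the kind of structured-data condition the partial-concept framework is meant to capture. The class, the margin value, and the test points are invented for illustration.

    import numpy as np

    def partial_concept(w, margin):
        """A partial multiclass concept: defined only outside a margin region.

        Labels a point by the row of `w` with the largest score, but is undefined
        (returns None) when the top two scores are within `margin` of each other.
        """
        def h(x):
            scores = w @ x
            s = np.sort(scores)
            if s[-1] - s[-2] < margin:
                return None          # undefined: x violates the margin assumption
            return int(np.argmax(scores))
        return h

    # Hypothetical 3-class linear scorer in 2D with margin 0.5.
    h = partial_concept(np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]), margin=0.5)
    print(h(np.array([2.0, 0.0])))   # defined: label 0
    print(h(np.array([0.4, 0.5])))   # undefined on this point -> None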
Is Selling Complete Information (Approximately) Optimal?
We study the problem of selling information to a data buyer who faces a decision problem under uncertainty. We consider the classic Bayesian decision-theoretic model pioneered by Blackwell [Bla51, Bla53]. Initially, the data buyer has only partial information about the payoff-relevant state of the world. A data seller offers additional information about the state of the world. The information is revealed through signaling schemes, also referred to as experiments. In the single-agent setting, any mechanism can be represented as a menu of experiments. A recent paper by Bergemann et al. [BBS18] presents a complete characterization of the revenue-optimal mechanism in a binary-state, binary-action environment. By contrast, no characterization is known for the case with more actions. In this paper, we consider more general environments and study arguably the simplest mechanism, which only sells the fully informative experiment. In the environment with binary state and m ≥ 3 actions, we provide an O(m)-approximation to the optimal revenue by selling only the fully informative experiment and show that the approximation ratio is tight up to an absolute constant factor. An important corollary of our lower bound is that the size of the optimal menu must grow at least linearly in the number of available actions, so no universal upper bound exists for the size of the optimal menu in the general single-dimensional setting. We also provide a sufficient condition under which selling only the fully informative experiment achieves the optimal revenue.
For multi-dimensional environments, we prove that even in arguably the simplest matching utility environment with 3 states and 3 actions, the ratio between the optimal revenue and the revenue from selling only the fully informative experiment can grow polynomially in the number of agent types. Nonetheless, if the type distribution is uniform, we show that selling only the fully informative experiment is indeed the optimal mechanism.
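As a minimal sketch of the mechanism discussed above, under invented numbers and a uniform distribution over a handful of buyer types: each type's willingness to pay for the fully informative experiment is its gain from learning the state exactly, and the seller posts a single price for that experiment. This only illustrates the object being approximated, not the paper's constructions or bounds.

    import numpy as np

    payoff = np.array([[1.0, 0.0],      # payoff[action, state], hypothetical matching-utility numbers
                       [0.0, 1.0]])

    def value_of_full_information(prior):
        """Gain of a buyer (identified with its prior) from learning the state exactly."""
        learn_state = prior @ payoff.max(axis=0)   # pick the best action in each state
        act_on_prior = (payoff @ prior).max()      # best single action under the prior
        return learn_state - act_on_prior

    # A few hypothetical buyer types (priors over the binary state), drawn uniformly.
    types = [np.array([p, 1 - p]) for p in (0.1, 0.3, 0.5, 0.8)]
    values = sorted(value_of_full_information(t) for t in types)

    # Revenue of posting price v for the fully informative experiment:
    # every type whose value is at least v buys it.
    best = max((v * sum(u >= v for u in values), v) for v in values)
    print(values, best)   # (revenue, price) of the best posted price for full information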
Replicability in Reinforcement Learning
We initiate the mathematical study of replicability as an algorithmic
property in the context of reinforcement learning (RL). We focus on the
fundamental setting of discounted tabular MDPs with access to a generative
model. Inspired by Impagliazzo et al. [2022], we say that an RL algorithm is
replicable if, with high probability, it outputs the exact same policy after
two executions on i.i.d. samples drawn from the generator when its internal
randomness is the same. We first provide an efficient replicable algorithm for near-optimal policy estimation whose sample and time complexity are polynomial in the number of state-action pairs, the inverse accuracy and replicability parameters, and the effective horizon. Next, for the subclass of deterministic algorithms, we provide a lower bound on the sample complexity of replicable policy estimation. Then, we study a relaxed version of replicability proposed by Kalavasis et al. [2023] called TV indistinguishability. We design a computationally efficient TV indistinguishable algorithm for policy estimation and bound its sample complexity. At an additional cost in running time, we transform these TV indistinguishable algorithms into replicable ones without increasing their sample complexity. Finally, we introduce the notion of approximate-replicability, where we only require that the two output policies are close under an appropriate statistical divergence (e.g., Renyi divergence), and show that it permits an improved sample complexity.
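The replicability property itself (identical outputs across two executions on i.i.d. samples when the internal randomness is shared) can be seen in a toy example that is unrelated to the paper's actual algorithms: estimate a mean and round it to a grid whose offset comes from the shared randomness. With a grid spacing that dominates the estimation error, both executions round to the same value with high probability. All constants below are arbitrary.

    import numpy as np

    def replicable_mean(samples, rng):
        """Toy replicable estimator: round the empirical mean to a randomly shifted grid.

        The grid offset is drawn from `rng`, the *shared* internal randomness; the
        samples differ between executions, but if the grid spacing dominates the
        estimation error, both executions land in the same grid cell w.h.p.
        """
        spacing = 0.1                          # arbitrary; should exceed the estimation error
        offset = rng.uniform(0, spacing)       # shared random shift of the grid
        mean = samples.mean()
        return offset + spacing * np.round((mean - offset) / spacing)

    seed = 0                                   # the shared internal randomness
    data = np.random.default_rng(1)            # generates two independent samples below
    run1 = replicable_mean(data.normal(0.5, 1.0, 100_000), np.random.default_rng(seed))
    run2 = replicable_mean(data.normal(0.5, 1.0, 100_000), np.random.default_rng(seed))
    print(run1, run2, run1 == run2)            # identical outputs across the two executions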
Replicable Clustering
We design replicable algorithms in the context of statistical clustering
under the recently introduced notion of replicability from Impagliazzo et al.
[2022]. According to this definition, a clustering algorithm is replicable if,
with high probability, its output induces the exact same partition of the
sample space after two executions on different inputs drawn from the same
distribution, when its internal randomness is shared across the executions. We
propose such algorithms for the statistical k-medians, statistical k-means, and statistical k-centers problems by utilizing approximation routines for their combinatorial counterparts in a black-box manner. In particular, we demonstrate a replicable approximation algorithm for statistical Euclidean k-medians (and k-means) and bound its sample complexity. We also describe an approximation algorithm, with an additional additive error, for statistical Euclidean k-centers, albeit with a larger sample complexity. In addition, we provide experiments on synthetic distributions in 2D, using the k-means++ implementation from sklearn as a black box, that validate our theoretical results.
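To make the measured quantity concrete, here is a hedged sketch of the kind of experimental setup the abstract mentions, with invented parameters: two i.i.d. samples from the same 2D mixture are clustered with sklearn's k-means++ as a black box, and the two runs are compared by how they partition a common reference grid. Plain k-means is not replicable in general; the sketch only shows how replicability of the induced partition would be checked, not the paper's algorithm.

    import numpy as np
    from sklearn.cluster import KMeans

    def sample(rng, n=500):
        # Hypothetical 2D mixture: two Gaussian blobs.
        a = rng.normal([0, 0], 0.5, size=(n // 2, 2))
        b = rng.normal([3, 3], 0.5, size=(n // 2, 2))
        return np.vstack([a, b])

    rng = np.random.default_rng(0)
    grid = rng.uniform(-1, 4, size=(1000, 2))        # reference points for comparing partitions

    labels = []
    for execution in range(2):
        X = sample(rng)                               # fresh i.i.d. sample per execution
        km = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=7).fit(X)
        labels.append(km.predict(grid))               # partition induced on the sample space

    # Fraction of reference points on which the two runs agree (up to swapping the 2 cluster labels).
    agree = (labels[0] == labels[1]).mean()
    print(max(agree, 1 - agree))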
Replicable Bandits
In this paper, we introduce the notion of replicable policies in the context
of stochastic bandits, one of the canonical problems in interactive learning. A
policy in the bandit environment is called replicable if it pulls, with high
probability, the exact same sequence of arms in two different and independent
executions (i.e., under independent reward realizations). We show that not only
do replicable policies exist, but also they achieve almost the same optimal
(non-replicable) regret bounds in terms of the time horizon. More specifically,
in the stochastic multi-armed bandits setting, we develop a policy with an
optimal problem-dependent regret bound whose dependence on the replicability
parameter is also optimal. Similarly, for stochastic linear bandits (with
finitely and infinitely many arms) we develop replicable policies that achieve
the best-known problem-independent regret bounds with an optimal dependency on
the replicability parameter. Our results show that even though randomization is
crucial for the exploration-exploitation trade-off, an optimal balance can
still be achieved while pulling the exact same arms across two different executions.
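A toy two-armed sketch (not the paper's policy; all constants arbitrary) conveys both the difficulty and the flavor of the shared-randomness fix: the two executions see independent rewards, so their empirical means differ, but snapping the means to a grid with a shared random offset before comparing arms makes both executions commit to the same arm with high probability.

    import numpy as np

    def choose_arm(rewards, shared_rng):
        """Toy decision rule: explore both arms, then commit to the better-looking one,
        after snapping the empirical means to a grid with a shared random offset."""
        spacing = 0.2                                   # arbitrary; should dominate estimation error
        offset = shared_rng.uniform(0, spacing)         # shared internal randomness
        means = rewards.mean(axis=1)
        snapped = offset + spacing * np.round((means - offset) / spacing)
        return int(np.argmax(snapped))

    true_means = [0.5, 0.9]                             # hypothetical Bernoulli arms
    shared_seed = 42

    choices = []
    for execution in range(2):
        env = np.random.default_rng(execution)          # independent reward realizations
        rewards = np.array([env.binomial(1, m, 2000) for m in true_means])
        choices.append(choose_arm(rewards, np.random.default_rng(shared_seed)))
    print(choices)                                      # same committed arm in both executions w.h.p.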
Optimal Learners for Realizable Regression: PAC Learning and Online Learning
In this work, we aim to characterize the statistical complexity of realizable
regression both in the PAC learning setting and the online learning setting.
Previous work had established the sufficiency of finiteness of the fat
shattering dimension for PAC learnability and the necessity of finiteness of
the scaled Natarajan dimension, but little progress had been made towards a
more complete characterization since the work of Simon (SICOMP '97). To
this end, we first introduce a minimax instance optimal learner for realizable
regression and propose a novel dimension that both qualitatively and
quantitatively characterizes which classes of real-valued predictors are
learnable. We then identify a combinatorial dimension related to the Graph
dimension that characterizes ERM learnability in the realizable setting.
Finally, we establish a necessary condition for learnability based on a
combinatorial dimension related to the DS dimension, and conjecture that it may
also be sufficient in this context.
Additionally, in the context of online learning we provide a dimension that
characterizes the minimax instance optimal cumulative loss up to a constant
factor and design an optimal online learner for realizable regression, thus
resolving an open question raised by Daskalakis and Golowich in STOC '22.
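One quantity mentioned above, the fat-shattering dimension, admits a direct brute-force check on finite toy instances. The sketch below is purely illustrative (hypothetical class, scale parameter, and witnesses): it tests, for a given choice of witnesses, whether a set of points is gamma-fat-shattered by a finite set of real-valued predictors.

    from itertools import product

    def fat_shattered(points, predictors, gamma, witnesses):
        """Check gamma-fat-shattering of `points` by a finite class, for given witnesses.

        For every sign pattern there must be a predictor that is gamma above the
        witness where the sign is +1 and gamma below it where the sign is -1.
        """
        for signs in product([+1, -1], repeat=len(points)):
            if not any(
                all(
                    f(x) >= r + gamma if s == +1 else f(x) <= r - gamma
                    for x, r, s in zip(points, witnesses, signs)
                )
                for f in predictors
            ):
                return False
        return True

    # Hypothetical finite class of threshold-like real-valued predictors on the line.
    predictors = [lambda x, t=t: 1.0 if x >= t else 0.0 for t in (0.25, 0.75)]
    print(fat_shattered([0.5], predictors, gamma=0.4, witnesses=[0.5]))               # True
    print(fat_shattered([0.2, 0.5], predictors, gamma=0.4, witnesses=[0.5, 0.5]))     # False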