Optimal Scoring Rule Design
This paper introduces an optimization problem for proper scoring rule design.
Consider a principal who wants to collect an agent's prediction about an
unknown state. The agent can either report his prior prediction or access a
costly signal and report the posterior prediction. Given a collection of
possible distributions containing the agent's posterior prediction
distribution, the principal's objective is to design a bounded scoring rule to
maximize the agent's worst-case payoff increment between reporting his
posterior prediction and reporting his prior prediction.
We study this optimization problem for proper scoring rules in two settings:
static and asymptotic. In the static setting, where the agent can access one
signal, we propose an efficient algorithm to compute an optimal scoring rule
when the collection of distributions is finite. In the asymptotic setting, the
agent can adaptively and indefinitely refine his prediction. We first consider
a sequence of collections of posterior distributions with vanishing covariance,
which emulates general estimators with large samples, and show the optimality
of the quadratic scoring rule. Then, when the agent's posterior distribution is
a Beta-Bernoulli process, we find that the log scoring rule is optimal. We also
prove the optimality of the log scoring rule over a smaller set of functions
for categorical distributions with Dirichlet priors.
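The quadratic and log scoring rules discussed above are both proper: reporting the true belief maximizes the agent's expected score. A minimal numerical sketch for a binary state (purely illustrative, not the paper's optimization):

```python
import math

def quadratic_score(p, outcome):
    # Quadratic (Brier-style) score for a binary report p = P(state = 1)
    dist = [1 - p, p]
    return 2 * dist[outcome] - sum(x * x for x in dist)

def log_score(p, outcome):
    # Logarithmic score for the same binary report
    dist = [1 - p, p]
    return math.log(dist[outcome])

def expected_score(score, report, truth):
    # Expected payoff of reporting `report` when the true chance
    # of state 1 is `truth`
    return (1 - truth) * score(report, 0) + truth * score(report, 1)

# Properness check: over a fine grid of reports, the truthful report
# maximizes expected score under both rules.
truth = 0.3
grid = [i / 100 for i in range(1, 100)]
best = {}
for score in (quadratic_score, log_score):
    best[score.__name__] = max(grid, key=lambda r: expected_score(score, r, truth))
```

Both entries of `best` come out at the truthful report 0.3, which is the property the principal's design problem builds on.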
On Three-Layer Data Markets
We study a three-layer data market comprising users (data owners), platforms,
and a data buyer. Each user benefits from platform services in exchange for
data, incurring privacy loss when their data, albeit noisily, is shared with
the buyer. The user chooses platforms to share data with, while platforms
decide on data noise levels and pricing before selling to the buyer. The buyer
selects platforms to purchase data from. We model these interactions via a
multi-stage game, focusing on the subgame Nash equilibrium. We find that when
the buyer places a high value on user data (and platforms can command high
prices), all platforms offer services to the user who joins and shares data
with every platform. Conversely, when the buyer's valuation of user data is
low, only large platforms with low service costs can afford to serve users. In
this scenario, users exclusively join and share data with these low-cost
platforms. Interestingly, increased competition benefits the buyer, not the
user: as the number of platforms increases, the user utility does not
necessarily improve while the buyer utility improves. However, increasing the
competition improves the overall utilitarian welfare. Building on our analysis,
we then study regulations to improve the user utility. We discover that banning
data sharing maximizes user utility only when all platforms are low-cost. In
mixed markets of high- and low-cost platforms, users prefer a minimum noise
mandate over a sharing ban. Imposing this mandate on high-cost platforms and
banning data sharing for low-cost ones further enhances user utility.
The Fair Value of Data Under Heterogeneous Privacy Constraints in Federated Learning
Modern data aggregation often involves a platform collecting data from a
network of users with various privacy options. Platforms must solve the problem
of how to allocate incentives to users to convince them to share their data.
This paper puts forth an idea for a *fair* amount to compensate users
for their data at a given privacy level based on an axiomatic definition of
fairness, along the lines of the celebrated Shapley value. To the best of our
knowledge, these are the first fairness concepts for data that explicitly
consider privacy constraints. We also formulate a heterogeneous federated
learning problem for the platform with privacy level options for users. By
studying this problem, we investigate the amount of compensation users receive
under fair allocations with different privacy levels, amounts of data, and
degrees of heterogeneity. We also discuss what happens when the platform is
forced to design fair incentives. Under certain conditions we find that when
privacy sensitivity is low, the platform will set incentives to ensure that it
collects all the data with the lowest privacy options. When the privacy
sensitivity is above a given threshold, the platform will provide no incentives
to users. Between these two extremes, the platform will set the incentives so
that some fraction of the users choose the higher privacy option and the rest
choose the lower privacy option.
Comment: 29 pages, 5 figures, Accepted to TML
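The fairness axioms above follow the lines of the Shapley value, which has a direct computational form. A minimal sketch with a hypothetical value function (the data/noise model below is illustrative, not the paper's):

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    # Exact Shapley value: weighted average of each player's marginal
    # contribution over all coalitions of the other players.
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            for S in combinations(others, k):
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[p] += w * (v(frozenset(S) | {p}) - v(frozenset(S)))
    return phi

# Hypothetical setup: platform value grows with data shared, discounted
# by each user's chosen privacy noise level.
data = {"u1": 4.0, "u2": 2.0, "u3": 1.0}
noise = {"u1": 0.5, "u2": 0.1, "u3": 0.1}

def v(S):
    return sum(data[u] / (1 + noise[u]) for u in S)

phi = shapley_values(list(data), v)
```

Because this toy value function is additive, each user's Shapley payment equals their individual contribution, and the payments sum to the grand-coalition value (the efficiency axiom).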
Trading-off price for data quality to achieve fair online allocation
We consider the problem of online allocation subject to a long-term fairness
penalty. Contrary to existing works, however, we do not assume that the
decision-maker observes the protected attributes -- which is often unrealistic
in practice. Instead, the decision-maker can purchase data that help estimate
them from sources of different quality, and hence reduce the fairness penalty at some
cost. We model this problem as a multi-armed bandit problem where each arm
corresponds to the choice of a data source, coupled with the online allocation
problem. We propose an algorithm that jointly solves both problems and prove
a regret bound for it. A key difficulty is
that the rewards received by selecting a source are correlated by the fairness
penalty, which leads to a need for randomization (despite a stochastic
setting). Our algorithm takes into account contextual information available
before the source selection, and can adapt to many different fairness notions.
We also show that in some instances, the estimates used can be learned on the
fly.
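The bandit structure of the source-selection problem can be sketched with a standard index policy. The following UCB1 toy is illustrative only: the paper's algorithm additionally uses contextual information and requires randomization over sources, which plain UCB1 does not capture.

```python
import math
import random

def ucb1(reward_fn, n_arms, horizon, seed=0):
    # Standard UCB1: pull each arm once, then pick the arm with the
    # highest optimistic index (empirical mean + exploration bonus).
    rng = random.Random(seed)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1
        else:
            arm = max(range(n_arms),
                      key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        r = reward_fn(arm, rng)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]
    return counts, means

# Hypothetical data sources: each arm's reward is the allocation value
# achieved with that source's estimates, minus its cost, plus noise.
true_value = [0.5, 0.7, 0.6]

def reward(arm, rng):
    return true_value[arm] + rng.gauss(0, 0.1)

counts, means = ucb1(reward, 3, 5000)
```

Over 5000 rounds the policy concentrates its pulls on the best source (arm 1 here), illustrating how source selection and estimation can be interleaved online.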
Algorithmic Bayesian Epistemology
One aspect of the algorithmic lens in theoretical computer science is a view on other scientific disciplines that focuses on satisfactory solutions that adhere to real-world constraints, as opposed to solutions that would be optimal ignoring such constraints. The algorithmic lens has provided a unique and important perspective on many academic fields, including molecular biology, ecology, neuroscience, quantum physics, economics, and social science.
This thesis applies the algorithmic lens to Bayesian epistemology. Traditional Bayesian epistemology provides a comprehensive framework for how an individual's beliefs should evolve upon receiving new information. However, these methods typically assume an exhaustive model of such information, including the correlation structure between different pieces of evidence. In reality, individuals might lack such an exhaustive model, while still needing to form beliefs. Beyond such informational constraints, an individual may be bounded by limited computation, or by limited communication with agents that have access to information, or by the strategic behavior of such agents. Even when these restrictions prevent the formation of a *perfectly* accurate belief, arriving at a *reasonably* accurate belief remains crucial. In this thesis, we establish fundamental possibility and impossibility results about belief formation under a variety of restrictions, and lay the groundwork for further exploration.
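The unconstrained Bayesian updating that serves as the baseline here can be made concrete with the standard conjugate example (a minimal sketch, not a construction from the thesis):

```python
def beta_bernoulli_update(alpha, beta, observations):
    # Conjugate Bayesian update: a Beta(alpha, beta) prior over an
    # unknown Bernoulli bias, revised one 0/1 observation at a time.
    for x in observations:
        alpha += x        # count successes
        beta += 1 - x     # count failures
    return alpha, beta

# Uniform prior Beta(1, 1), then 7 successes and 3 failures.
a, b = beta_bernoulli_update(1.0, 1.0, [1] * 7 + [0] * 3)
posterior_mean = a / (a + b)  # shifts from 1/2 toward the data
```

This is the fully specified, computationally trivial case; the thesis asks what remains achievable when the correlation structure, computation, communication, or agents' incentives are restricted.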