
    Optimal Scoring Rule Design

    This paper introduces an optimization problem for proper scoring rule design. Consider a principal who wants to collect an agent's prediction about an unknown state. The agent can either report his prior prediction or access a costly signal and report the posterior prediction. Given a collection of possible distributions containing the agent's posterior prediction distribution, the principal's objective is to design a bounded scoring rule that maximizes the agent's worst-case payoff increment between reporting his posterior prediction and reporting his prior prediction. We study this optimization problem in two settings: static and asymptotic. In the static setting, where the agent can access a single signal, we propose an efficient algorithm to compute an optimal scoring rule when the collection of distributions is finite. In the asymptotic setting, the agent can adaptively and indefinitely refine his prediction. We first consider a sequence of collections of posterior distributions with vanishing covariance, which emulates general estimators with large samples, and show the optimality of the quadratic scoring rule. Then, when the agent's posterior distribution is a Beta-Bernoulli process, we find that the log scoring rule is optimal. We also prove the optimality of the log scoring rule over a smaller set of functions for categorical distributions with Dirichlet priors.
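    The incentive at the heart of this abstract can be illustrated with a minimal sketch (not the paper's algorithm or its worst-case optimization): under a proper scoring rule such as the bounded quadratic (Brier) rule, an agent maximizes expected score by reporting his true belief, so refining a prediction with a signal strictly increases the expected payoff. The binary state, prior, and posteriors below are hypothetical numbers chosen for illustration.

```python
# Sketch: expected payoff increment from acquiring a signal under the
# quadratic (Brier) scoring rule. All numbers are illustrative.

def quadratic_score(report: float, outcome: int) -> float:
    """Bounded quadratic scoring rule S(r, w) = 1 - (w - r)^2, in [0, 1]."""
    return 1.0 - (outcome - report) ** 2

def expected_score(report: float, belief: float) -> float:
    """Expected score of reporting `report` when P(w = 1) = `belief`."""
    return belief * quadratic_score(report, 1) + (1 - belief) * quadratic_score(report, 0)

# Prior belief 0.5; a signal moves the posterior to 0.8 or 0.2, each w.p. 1/2.
prior = 0.5
posteriors = [(0.8, 0.5), (0.2, 0.5)]

# Properness: reporting the true posterior beats sticking with the prior,
# so the expected increment below is strictly positive.
increment = sum(w * (expected_score(q, q) - expected_score(prior, q))
                for q, w in posteriors)
print(round(increment, 4))  # -> 0.09
```

    The principal's design problem in the paper is then to choose the rule (within a boundedness constraint) that makes this increment as large as possible in the worst case over a collection of posterior distributions.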

    On Three-Layer Data Markets

    We study a three-layer data market comprising users (data owners), platforms, and a data buyer. Each user benefits from platform services in exchange for data, incurring privacy loss when their data, albeit noisily, is shared with the buyer. The user chooses platforms to share data with, while platforms decide on data noise levels and pricing before selling to the buyer. The buyer selects platforms to purchase data from. We model these interactions via a multi-stage game, focusing on the subgame Nash equilibrium. We find that when the buyer places a high value on user data (and platforms can command high prices), all platforms offer services to the user, who joins and shares data with every platform. Conversely, when the buyer's valuation of user data is low, only large platforms with low service costs can afford to serve users. In this scenario, users exclusively join and share data with these low-cost platforms. Interestingly, increased competition benefits the buyer, not the user: as the number of platforms increases, the user utility does not necessarily improve while the buyer utility improves. However, increasing the competition improves the overall utilitarian welfare. Building on our analysis, we then study regulations to improve the user utility. We discover that banning data sharing maximizes user utility only when all platforms are low-cost. In mixed markets of high- and low-cost platforms, users prefer a minimum noise mandate over a sharing ban. Imposing this mandate on high-cost platforms and banning data sharing for low-cost ones further enhances user utility.
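    The qualitative split between high- and low-valuation regimes can be sketched with a toy participation condition (hypothetical numbers, not the paper's equilibrium computation): a platform can afford to serve the user only if the price it commands from the buyer covers its service cost.

```python
# Toy sketch of the market's qualitative finding: which platforms serve
# the user at a given buyer valuation. Costs and values are hypothetical.

platforms = {"P1": 0.2, "P2": 0.8}  # service cost per platform

def serving_platforms(buyer_value: float):
    """Platforms that can profitably serve when the buyer pays `buyer_value`."""
    return [p for p, cost in platforms.items() if buyer_value >= cost]

print(serving_platforms(1.0))  # high valuation: all platforms serve -> ['P1', 'P2']
print(serving_platforms(0.5))  # low valuation: only the low-cost platform -> ['P1']
```

    The paper's equilibrium analysis layers the noise and pricing choices, and the user's joining decision, on top of this basic profitability condition.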

    The Fair Value of Data Under Heterogeneous Privacy Constraints in Federated Learning

    Modern data aggregation often involves a platform collecting data from a network of users with various privacy options. Platforms must solve the problem of how to allocate incentives to users to convince them to share their data. This paper puts forth an idea for a fair amount to compensate users for their data at a given privacy level, based on an axiomatic definition of fairness along the lines of the celebrated Shapley value. To the best of our knowledge, these are the first fairness concepts for data that explicitly consider privacy constraints. We also formulate a heterogeneous federated learning problem for the platform with privacy level options for users. By studying this problem, we investigate the amount of compensation users receive under fair allocations with different privacy levels, amounts of data, and degrees of heterogeneity. We also discuss what happens when the platform is forced to design fair incentives. Under certain conditions, we find that when privacy sensitivity is low, the platform will set incentives to ensure that it collects all the data with the lowest privacy options. When the privacy sensitivity is above a given threshold, the platform will provide no incentives to users. Between these two extremes, the platform will set the incentives so that some fraction of the users choose the higher privacy option and the others choose the lower privacy option.
    Comment: 29 pages, 5 figures, Accepted to TML
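    A Shapley-style valuation with privacy discounting can be sketched as follows. This is a hypothetical toy, not the paper's federated-learning objective: each user's data is weighted by a privacy multiplier (more noise, lower weight), and each user is paid their average marginal contribution over all join orders.

```python
from itertools import permutations

# Toy Shapley-style data valuation under heterogeneous privacy levels.
# The privacy multipliers and value function are illustrative assumptions.

users = {"A": 1.0, "B": 0.5, "C": 0.25}  # privacy multiplier in (0, 1]

def v(coalition: frozenset) -> float:
    """Toy coalition value: diminishing returns in total privacy-weighted data."""
    total = sum(users[u] for u in coalition)
    return total / (1.0 + 0.1 * total)

def shapley(user: str) -> float:
    """Average marginal contribution of `user` over all join orders."""
    total, n_orders = 0.0, 0
    for order in permutations(users):
        before = frozenset(order[: order.index(user)])
        total += v(before | {user}) - v(before)
        n_orders += 1
    return total / n_orders

payments = {u: shapley(u) for u in users}
# Efficiency axiom: the payments sum exactly to the grand-coalition value,
# and users sharing at lower privacy (higher weight) are paid more.
print(payments)
```

    The exact-permutation computation above is exponential in the number of users; at federated-learning scale one would sample permutations instead, but the fairness axioms it illustrates are the same.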

    Trading-off price for data quality to achieve fair online allocation

    We consider the problem of online allocation subject to a long-term fairness penalty. Contrary to existing works, however, we do not assume that the decision-maker observes the protected attributes, an assumption that is often unrealistic in practice. Instead, the decision-maker can purchase data that help estimate them from sources of different quality, and hence reduce the fairness penalty at some cost. We model this problem as a multi-armed bandit problem, where each arm corresponds to the choice of a data source, coupled with the online allocation problem. We propose an algorithm that jointly solves both problems and show that it has regret bounded by $\mathcal{O}(\sqrt{T})$. A key difficulty is that the rewards received by selecting a source are correlated through the fairness penalty, which leads to a need for randomization (despite a stochastic setting). Our algorithm takes into account contextual information available before the source selection and can adapt to many different fairness notions. We also show that in some instances, the estimates used can be learned on the fly.
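    The source-selection layer alone can be sketched with a generic UCB1 bandit. This is a simplification, not the paper's algorithm: it drops the coupled allocation problem and the fairness penalty, and assumes i.i.d. rewards, which is precisely the setting the paper goes beyond. Source qualities below are hypothetical.

```python
import math
import random

# Generic UCB1 sketch for choosing among data sources of unknown quality.
# UCB1 achieves O(sqrt(T log T))-type regret in the i.i.d. setting.

random.seed(0)
source_quality = [0.3, 0.5, 0.7]  # hypothetical mean reward of each source
T = 5000

counts = [0] * len(source_quality)
means = [0.0] * len(source_quality)

for t in range(1, T + 1):
    if t <= len(source_quality):          # initialize: play each source once
        arm = t - 1
    else:                                  # UCB index: mean + confidence radius
        arm = max(range(len(source_quality)),
                  key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
    reward = 1.0 if random.random() < source_quality[arm] else 0.0
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update

# The best source dominates the pull counts as T grows.
print(counts)
```

    The paper's setting complicates this picture: because the fairness penalty correlates rewards across rounds, a deterministic index policy like the one above no longer suffices, which is why the proposed algorithm randomizes.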