
    Linear Regression from Strategic Data Sources

    Linear regression is a fundamental building block of statistical data analysis. It amounts to estimating the parameters of a linear model that maps input features to corresponding outputs. In the classical setting where the precision of each data point is fixed, the famous Aitken/Gauss-Markov theorem in statistics states that generalized least squares (GLS) is a so-called "Best Linear Unbiased Estimator" (BLUE). In modern data science, however, one often faces strategic data sources, namely, individuals who incur a cost for providing high-precision data. In this paper, we study a setting in which features are public but individuals choose the precision of the outputs they reveal to an analyst. We assume that the analyst performs linear regression on this dataset, and individuals benefit from the outcome of this estimation. We model this scenario as a game where individuals minimize a cost comprising two components: (a) an (agent-specific) disclosure cost for providing high-precision data; and (b) a (global) estimation cost representing the inaccuracy in the linear model estimate. In this game, the linear model estimate is a public good that benefits all individuals. We establish that this game has a unique non-trivial Nash equilibrium. We study the efficiency of this equilibrium and we prove tight bounds on the price of stability for a large class of disclosure and estimation costs. Finally, we study the estimator accuracy achieved at equilibrium. We show that, in general, Aitken's theorem does not hold under strategic data sources, though it does hold if individuals have identical disclosure costs (up to a multiplicative factor). When individuals have non-identical costs, we derive a bound on the improvement of the equilibrium estimation cost that can be achieved by deviating from GLS, under mild assumptions on the disclosure cost functions. Comment: This version (v3) extends the results on the sub-optimality of GLS (Section 6) and improves the writing in multiple places compared to v2. Compared to the initial version v1, it also fixes an error in Theorem 6 (now Theorem 5) and extends many of the results.
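    For context on the classical baseline the abstract contrasts against, here is a minimal numerical sketch of the GLS estimator, beta_hat = (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y, with Sigma collecting heteroscedastic output-noise variances (as when each individual picks a precision). It does not model the strategic game itself, and all names and numbers are illustrative, not taken from the paper.

```python
import numpy as np

def gls_estimate(X, y, Sigma):
    """Generalized least squares: (X' S^{-1} X)^{-1} X' S^{-1} y.

    X     : (n, d) public feature matrix
    y     : (n,)   reported outputs
    Sigma : (n, n) covariance of the output noise; diagonal if agents
            choose independent precisions (precision_i = 1 / Sigma[i, i])
    """
    Si = np.linalg.inv(Sigma)
    return np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)

# Toy example: 100 agents report outputs with different (made-up) precisions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
beta_true = np.array([1.0, -2.0, 0.5])
variances = rng.uniform(0.1, 2.0, size=100)          # chosen by the agents
y = X @ beta_true + rng.normal(scale=np.sqrt(variances))
print(gls_estimate(X, y, np.diag(variances)))         # close to beta_true
```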

    Regression games

    The solution of a TU cooperative game can be a distribution of the value of the grand coalition, i.e. it can be a distribution of the payoff (utility) that all the players together achieve. In a regression model, the evaluation of the explanatory variables can be a distribution of the overall fit, i.e. the fit of the model in which every regressor variable is involved. Furthermore, we can take regression models as TU cooperative games where the explanatory (regressor) variables are the players. In this paper we introduce the class of regression games, characterize it, and apply the Shapley value to evaluate the explanatory variables in regression models. In order to support our approach we consider Young's (1985) axiomatization of the Shapley value, and conclude that the Shapley value is a reasonable tool to evaluate the explanatory variables of regression models.
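    As a hedged illustration of the idea rather than the paper's formal characterization: one natural regression game assigns each coalition of regressors the R^2 of an OLS fit on that subset, and the Shapley value then splits the full model's fit among the regressors. The choice of R^2 as the coalition value and all names below are assumptions; the sketch computes the Shapley value by exact enumeration, so it only scales to a handful of regressors.

```python
import numpy as np
from itertools import combinations
from math import factorial

def r2(X, y, cols):
    """R^2 of an OLS fit of y on the columns in `cols` (plus an intercept)."""
    Z = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

def shapley_r2(X, y):
    """Exact Shapley values of the game v(S) = R^2 of the regressors in S."""
    d = X.shape[1]
    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):
            w = factorial(k) * factorial(d - k - 1) / factorial(d)
            for S in combinations(others, k):
                phi[i] += w * (r2(X, y, list(S) + [i]) - r2(X, y, list(S)))
    return phi

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 2 * X[:, 0] + X[:, 1] + rng.normal(size=200)      # column 2 is irrelevant
print(shapley_r2(X, y))                 # sums to the full-model R^2 (efficiency)
print(r2(X, y, range(3)))
```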

    Cheating for Problem Solving: A Genetic Algorithm with Social Interactions

    We propose a variation of the standard genetic algorithm that incorporates social interaction between the individuals in the population. Our goal is to understand the evolutionary role of social systems and their possible application as a new, non-genetic step in evolutionary algorithms. In biological populations (animals, human beings, and even microorganisms), social interactions often affect the fitness of individuals. It is conceivable that perturbing the fitness via social interactions is an evolutionary strategy to avoid becoming trapped in a local optimum, thus avoiding premature convergence of the population. We model the social interactions using game theory: the population is composed of cooperator and defector individuals whose interactions produce payoffs according to well-known game models (the prisoner's dilemma, the chicken game, and others). Our results on knapsack problems show, for some game models, a significant performance improvement compared to a standard genetic algorithm. Comment: 7 pages, 5 figures, 5 tables, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2009), Montreal, Canada
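    A minimal sketch of how such a social-interaction step might perturb fitness, assuming random pairwise prisoner's-dilemma encounters between cooperators and defectors. The payoff matrix, pairing scheme, and mixing weight are illustrative assumptions, and the surrounding GA (selection, crossover, mutation on the knapsack encoding) is omitted.

```python
import numpy as np

# Prisoner's dilemma payoffs to the row player; rows/cols = (cooperate, defect).
PD = np.array([[3.0, 0.0],
               [5.0, 1.0]])

def socially_adjusted_fitness(fitness, strategies, weight=0.1, rng=None):
    """Perturb raw fitness with payoffs from random pairwise games.

    fitness    : (N,) raw fitness of each individual (e.g. knapsack value)
    strategies : (N,) 0 = cooperator, 1 = defector (a non-genetic trait)
    weight     : how strongly the social payoff perturbs the fitness
    """
    rng = rng or np.random.default_rng()
    partners = rng.permutation(len(fitness))        # random pairing
    payoff = PD[strategies, strategies[partners]]   # row-player payoffs
    return fitness + weight * payoff

# Example: 6 individuals, half cooperators, half defectors.
fit = np.array([10.0, 12.0, 9.0, 11.0, 8.0, 13.0])
strat = np.array([0, 1, 0, 1, 0, 1])
print(socially_adjusted_fitness(fit, strat, rng=np.random.default_rng(2)))
```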

    Variance Allocation and Shapley Value

    Motivated by the problem of utility allocation in a portfolio under a Markowitz mean-variance choice paradigm, we propose an allocation criterion for the variance of the sum of n possibly dependent random variables. This criterion, the Shapley value, requires translating the problem into a cooperative game. The Shapley value has nice properties but is, in general, computationally demanding. The main result of this paper shows that in our particular case the Shapley value has a very simple form that can be easily computed. The same criterion is also used to allocate the standard deviation of the sum of n random variables, and a conjecture about the relation between the values in the two games is formulated. Comment: 20 pages
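    To make the cooperative game concrete: take v(C) = Var(sum of X_i for i in C). By linearity of covariance, the Shapley value of player i in this game works out to Cov(X_i, sum_j X_j), i.e. the i-th row sum of the covariance matrix, which is presumably the kind of "very simple form" the abstract refers to. The sketch below checks that closed form against brute-force enumeration on a toy covariance matrix; the matrix and names are made up.

```python
import numpy as np
from itertools import combinations
from math import factorial

def variance_game(Sigma, coalition):
    """v(C) = Var(sum of X_i for i in C), given the covariance matrix Sigma."""
    idx = list(coalition)
    return Sigma[np.ix_(idx, idx)].sum() if idx else 0.0

def shapley_bruteforce(Sigma):
    """Exact Shapley values of the variance game by enumerating coalitions."""
    n = Sigma.shape[0]
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            w = factorial(k) * factorial(n - k - 1) / factorial(n)
            for C in combinations(others, k):
                phi[i] += w * (variance_game(Sigma, C + (i,)) - variance_game(Sigma, C))
    return phi

# Toy covariance matrix of 3 dependent asset returns.
Sigma = np.array([[ 2.0, 0.5, -0.3],
                  [ 0.5, 1.0,  0.2],
                  [-0.3, 0.2,  1.5]])
print(shapley_bruteforce(Sigma))   # brute force over all coalitions
print(Sigma.sum(axis=1))           # closed form: Cov(X_i, sum_j X_j) -- matches
```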

    Learning the Structure and Parameters of Large-Population Graphical Games from Behavioral Data

    We consider learning, from strictly behavioral data, the structure and parameters of linear influence games (LIGs), a class of parametric graphical games introduced by Irfan and Ortiz (2014). LIGs facilitate causal strategic inference (CSI): making inferences from causal interventions on stable behavior in strategic settings. Applications include the identification of the most influential individuals in large (social) networks. Such tasks can also support policy-making analysis. Motivated by the computational work on LIGs, we cast the learning problem as maximum-likelihood estimation (MLE) of a generative model defined by pure-strategy Nash equilibria (PSNE). Our simple formulation uncovers the fundamental interplay between goodness-of-fit and model complexity: good models capture equilibrium behavior within the data while controlling the true number of equilibria, including those unobserved. We provide a generalization bound establishing the sample complexity for MLE in our framework. We propose several algorithms, including convex loss minimization (CLM) and sigmoidal approximations. We prove that the number of exact PSNE in LIGs is small with high probability; thus, CLM is sound. We illustrate our approach on synthetic data and real-world U.S. congressional voting records. We briefly discuss our learning framework's generality and potential applicability to general graphical games. Comment: Journal of Machine Learning Research (accepted, pending publication). Last conference version: submitted March 30, 2012 to UAI 2012. First conference version, entitled "Learning Influence Games", initially submitted on June 1, 2010 to NIPS 2010.
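    For readers unfamiliar with LIGs, the sketch below spells out the equilibrium condition the generative model is built on, as we read the Irfan-Ortiz definition: each player i has influence weights W[i, j] and a threshold b[i], actions live in {-1, +1}, and a joint action is a PSNE when every player's action agrees in sign with its net incoming influence. The weights and thresholds are made up; the paper's algorithms estimate them from behavioral data, which this sketch does not attempt.

```python
import numpy as np
from itertools import product

def is_psne(x, W, b):
    """x is a pure-strategy Nash equilibrium of the linear influence game
    (W, b) if x_i * (sum_j W[i, j] * x_j - b[i]) >= 0 for every player i."""
    influence = W @ x - b
    return np.all(x * influence >= 0)

def enumerate_psne(W, b):
    """List all PSNE of a small game by checking every joint action."""
    n = len(b)
    return [x for x in product([-1, 1], repeat=n)
            if is_psne(np.array(x), W, b)]

# A made-up 3-player game (zero diagonal: no self-influence).
W = np.array([[ 0.0, 0.8, -0.2],
              [ 0.5, 0.0,  0.4],
              [-0.3, 0.6,  0.0]])
b = np.array([0.1, -0.2, 0.0])
print(enumerate_psne(W, b))
```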