4,294 research outputs found

    Populations in statistical genetic modelling and inference

    Full text link
    What is a population? This review considers how a population may be defined in terms of understanding the structure of the underlying genetics of the individuals involved. The main approach is to consider statistically identifiable groups of randomly mating individuals, which is well defined in theory for any type of (sexual) organism. We discuss generative models using drift, admixture and spatial structure, and the ancestral recombination graph. These are contrasted with statistical models for inference, principle component analysis and other `non-parametric' methods. The relationships between these approaches are explored with both simulated and real-data examples. The state-of-the-art practical software tools are discussed and contrasted. We conclude that populations are a useful theoretical construct that can be well defined in theory and often approximately exist in practice

    Bayesian nonparametric Plackett-Luce models for the analysis of preferences for college degree programmes

    Full text link
    In this paper we propose a Bayesian nonparametric model for clustering partial ranking data. We start by developing a Bayesian nonparametric extension of the popular Plackett-Luce choice model that can handle an infinite number of choice items. Our framework is based on the theory of random atomic measures, with the prior specified by a completely random measure. We characterise the posterior distribution given data, and derive a simple and effective Gibbs sampler for posterior simulation. We then develop a Dirichlet process mixture extension of our model and apply it to investigate the clustering of preferences for college degree programmes amongst Irish secondary school graduates. The existence of clusters of applicants who have similar preferences for degree programmes is established and we determine that subject matter and geographical location of the third level institution characterise these clusters.Comment: Published in at http://dx.doi.org/10.1214/14-AOAS717 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    A survey of statistical network models

    Full text link
    Networks are ubiquitous in science and have become a focal point for discussion in everyday life. Formal statistical models for the analysis of network data have emerged as a major topic of interest in diverse areas of study, and most of these involve a form of graphical representation. Probability models on graphs date back to 1959. Along with empirical studies in social psychology and sociology from the 1960s, these early works generated an active network community and a substantial literature in the 1970s. This effort moved into the statistical literature in the late 1970s and 1980s, and the past decade has seen a burgeoning network literature in statistical physics and computer science. The growth of the World Wide Web and the emergence of online networking communities such as Facebook, MySpace, and LinkedIn, and a host of more specialized professional network communities has intensified interest in the study of networks and network data. Our goal in this review is to provide the reader with an entry point to this burgeoning literature. We begin with an overview of the historical development of statistical network modeling and then we introduce a number of examples that have been studied in the network literature. Our subsequent discussion focuses on a number of prominent static and dynamic network models and their interconnections. We emphasize formal model descriptions, and pay special attention to the interpretation of parameters and their estimation. We end with a description of some open problems and challenges for machine learning and statistics.Comment: 96 pages, 14 figures, 333 reference

    Social Choice for Partial Preferences Using Imputation

    Get PDF
    Within the field of multiagent systems, the area of computational social choice considers the problems arising when decisions must be made collectively by a group of agents. Usually such systems collect a ranking of the alternatives from each member of the group in turn, and aggregate these individual rankings to arrive at a collective decision. However, when there are many alternatives to consider, individual agents may be unwilling, or unable, to rank all of them, leading to decisions that must be made on the basis of incomplete information. While earlier approaches attempt to work with the provided rankings by making assumptions about the nature of the missing information, this can lead to undesirable outcomes when the assumptions do not hold, and is ill-suited to certain problem domains. In this thesis, we propose a new approach that uses machine learning algorithms (both conventional and purpose-built) to generate plausible completions of each agent’s rankings on the basis of the partial rankings the agent provided (imputations), in a way that reflects the agents’ true preferences. We show that the combination of existing social choice functions with certain classes of imputation algorithms, which forms the core of our proposed solution, is equivalent to a form of social choice. Our system then undergoes an extensive empirical validation under 40 different test conditions, involving more than 50,000 group decision problems generated from real-world electoral data, and is found to outperform existing competitors significantly, leading to better group decisions overall. Detailed empirical findings are also used to characterize the behaviour of the system, and illustrate the circumstances in which it is most advantageous. A general testbed for comparing solutions using real-world and artificial data (Prefmine) is then described, in conjunction with results that justify its design decisions. We move on to propose a new machine learning algorithm intended specifically to learn and impute the preferences of agents, and validate its effectiveness. This Markov-Tree approach is demonstrated to be superior to imputation using conventional machine learning, and has a simple interpretation that characterizes the problems on which it will perform well. Later chapters contain an axiomatic validation of both of our new approaches, as well as techniques for mitigating their manipulability. The thesis concludes with a discussion of the applicability of its contributions, both for multiagent systems and for settings involving human elections. In all, we reveal an interesting connection between machine learning and computational social choice, and introduce a testbed which facilitates future research efforts on computational social choice for partial preferences, by allowing empirical comparisons between competing approaches to be conducted easily, accurately, and quickly. Perhaps most importantly, we offer an important and effective new direction for enabling group decision making when preferences are not completely specified, using imputation methods
    • …
    corecore