Populations in statistical genetic modelling and inference
What is a population? This review considers how a population may be defined
in terms of understanding the structure of the underlying genetics of the
individuals involved. The main approach is to consider statistically
identifiable groups of randomly mating individuals, a notion that is well defined in
theory for any type of (sexual) organism. We discuss generative models using
drift, admixture and spatial structure, and the ancestral recombination graph.
These are contrasted with statistical models for inference, principal component
analysis and other 'non-parametric' methods. The relationships between these
approaches are explored with both simulated and real-data examples. The
state-of-the-art practical software tools are discussed and contrasted. We
conclude that populations are a useful theoretical construct that can be well
defined in theory and often approximately exist in practice.
Bayesian nonparametric Plackett-Luce models for the analysis of preferences for college degree programmes
In this paper we propose a Bayesian nonparametric model for clustering
partial ranking data. We start by developing a Bayesian nonparametric extension
of the popular Plackett-Luce choice model that can handle an infinite number of
choice items. Our framework is based on the theory of random atomic measures,
with the prior specified by a completely random measure. We characterise the
posterior distribution given data, and derive a simple and effective Gibbs
sampler for posterior simulation. We then develop a Dirichlet process mixture
extension of our model and apply it to investigate the clustering of
preferences for college degree programmes amongst Irish secondary school
graduates. The existence of clusters of applicants who have similar preferences
for degree programmes is established and we determine that subject matter and
geographical location of the third level institution characterise these
clusters. Comment: Published at http://dx.doi.org/10.1214/14-AOAS717 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
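As a rough illustration of the Plackett-Luce choice model that the abstract above builds on, the sketch below computes the probability of a partial (top-k) ranking under the ordinary finite-item model. It is not the paper's Bayesian nonparametric extension or its Gibbs sampler; the function name, item labels and weights are hypothetical and chosen only for illustration.

```python
import math

def plackett_luce_log_likelihood(weights, top_ranking):
    """Log-probability of a top-k partial ranking under a finite Plackett-Luce model.

    weights: dict mapping item -> positive weight (hypothetical example values).
    top_ranking: list of distinct items, most preferred first; items not listed
    are treated as ranked below every listed item.
    """
    remaining = sum(weights.values())  # total weight of items not yet chosen
    log_lik = 0.0
    for item in top_ranking:
        # Each choice picks an item with probability proportional to its weight
        # among the items that are still available.
        log_lik += math.log(weights[item]) - math.log(remaining)
        remaining -= weights[item]
    return log_lik

# Hypothetical weights for three degree programmes.
weights = {"A": 2.0, "B": 1.0, "C": 1.0}
p_top2 = math.exp(plackett_luce_log_likelihood(weights, ["A", "B"]))
```

Here the top-2 ranking (A, B) has probability (2/4) x (1/2) = 0.25: A is chosen first from all three items, then B from the remaining two.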
A survey of statistical network models
Networks are ubiquitous in science and have become a focal point for
discussion in everyday life. Formal statistical models for the analysis of
network data have emerged as a major topic of interest in diverse areas of
study, and most of these involve a form of graphical representation.
Probability models on graphs date back to 1959. Along with empirical studies in
social psychology and sociology from the 1960s, these early works generated an
active network community and a substantial literature in the 1970s. This effort
moved into the statistical literature in the late 1970s and 1980s, and the past
decade has seen a burgeoning network literature in statistical physics and
computer science. The growth of the World Wide Web and the emergence of online
networking communities such as Facebook, MySpace, and LinkedIn, and a host of
more specialized professional network communities have intensified interest in
the study of networks and network data. Our goal in this review is to provide
the reader with an entry point to this burgeoning literature. We begin with an
overview of the historical development of statistical network modeling and then
we introduce a number of examples that have been studied in the network
literature. Our subsequent discussion focuses on a number of prominent static
and dynamic network models and their interconnections. We emphasize formal
model descriptions, and pay special attention to the interpretation of
parameters and their estimation. We end with a description of some open
problems and challenges for machine learning and statistics. Comment: 96 pages, 14 figures, 333 references
Social Choice for Partial Preferences Using Imputation
Within the field of multiagent systems, the area of computational social choice considers
the problems arising when decisions must be made collectively by a group of agents.
Usually such systems collect a ranking of the alternatives from each member of the group
in turn, and aggregate these individual rankings to arrive at a collective decision. However,
when there are many alternatives to consider, individual agents may be unwilling, or
unable, to rank all of them, leading to decisions that must be made on the basis of incomplete
information. While earlier approaches attempt to work with the provided rankings
by making assumptions about the nature of the missing information, this can lead to undesirable
outcomes when the assumptions do not hold, and is ill-suited to certain problem
domains. In this thesis, we propose a new approach that uses machine learning algorithms
(both conventional and purpose-built) to generate plausible completions of each agent’s
rankings on the basis of the partial rankings the agent provided (imputations), in a way
that reflects the agents’ true preferences. We show that the combination of existing social
choice functions with certain classes of imputation algorithms, which forms the core of our
proposed solution, is equivalent to a form of social choice. Our system then undergoes
an extensive empirical validation under 40 different test conditions, involving more than
50,000 group decision problems generated from real-world electoral data, and is found
to outperform existing competitors significantly, leading to better group decisions overall.
Detailed empirical findings are also used to characterize the behaviour of the system,
and illustrate the circumstances in which it is most advantageous. A general testbed for
comparing solutions using real-world and artificial data (Prefmine) is then described, in
conjunction with results that justify its design decisions. We move on to propose a new
machine learning algorithm intended specifically to learn and impute the preferences of
agents, and validate its effectiveness. This Markov-Tree approach is demonstrated to be
superior to imputation using conventional machine learning, and has a simple interpretation
that characterizes the problems on which it will perform well. Later chapters contain
an axiomatic validation of both of our new approaches, as well as techniques for mitigating
their manipulability. The thesis concludes with a discussion of the applicability of its
contributions, both for multiagent systems and for settings involving human elections. In
all, we reveal an interesting connection between machine learning and computational social
choice, and introduce a testbed which facilitates future research efforts on computational
social choice for partial preferences, by allowing empirical comparisons between competing
approaches to be conducted easily, accurately, and quickly. Perhaps most importantly, we
offer an important and effective new direction for enabling group decision making when
preferences are not completely specified, using imputation methods.
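The general idea in the abstract above, completing each partial ranking and then applying an existing social choice function, can be sketched as follows. This is only a toy illustration: the imputation step is a simple popularity heuristic standing in for the thesis's machine learning and Markov-Tree methods, Borda count is assumed as the social choice function, and all names and ballots are hypothetical.

```python
def impute_ranking(partial, alternatives, popularity):
    """Complete a partial ranking by appending the unranked alternatives,
    most popular first (ties broken alphabetically). A stand-in heuristic,
    not the learning-based imputation described in the thesis."""
    ranked = set(partial)
    missing = [a for a in alternatives if a not in ranked]
    missing.sort(key=lambda a: (-popularity[a], a))
    return list(partial) + missing

def borda_winner(rankings, alternatives):
    """Borda count: with m alternatives, position i (0-based) earns m - 1 - i points."""
    m = len(alternatives)
    scores = {a: 0 for a in alternatives}
    for ranking in rankings:
        for position, a in enumerate(ranking):
            scores[a] += m - 1 - position
    # Ties are broken alphabetically because max keeps the first maximum.
    winner = max(sorted(alternatives), key=lambda a: scores[a])
    return winner, scores

# Hypothetical partial ballots over three alternatives.
alternatives = ["x", "y", "z"]
partial_ballots = [["x"], ["y", "x"], ["z", "x"]]

# Popularity = number of ballots that mention each alternative.
popularity = {a: sum(a in b for b in partial_ballots) for a in alternatives}
completed = [impute_ranking(b, alternatives, popularity) for b in partial_ballots]
winner, scores = borda_winner(completed, alternatives)
```

With these ballots the completed rankings are (x, y, z), (y, x, z) and (z, x, y), giving Borda scores x = 4, y = 3, z = 2, so x wins.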