Bayesian nonparametric models for ranked data
We develop a Bayesian nonparametric extension of the popular Plackett-Luce
choice model that can handle an infinite number of choice items. Our framework
is based on the theory of random atomic measures, with the prior specified by a
gamma process. We derive a posterior characterization and a simple and
effective Gibbs sampler for posterior simulation. We develop a time-varying
extension of our model, and apply it to the New York Times lists of weekly
bestselling books. Comment: NIPS - Neural Information Processing Systems (2012)
Bayesian Plackett--Luce Mixture Models for Partially Ranked Data
The elicitation of ordinal judgments on multiple alternatives is often required in psychological
and behavioral experiments to investigate the preference or choice orientation of a specific population. The
Plackett–Luce model is one of the most popular and frequently applied parametric distributions to analyze
rankings of a finite set of items. The present work introduces a Bayesian finite mixture of Plackett–Luce
models to account for unobserved sample heterogeneity of partially ranked data. We describe an efficient
way to incorporate the latent group structure in the data augmentation approach and the derivation of existing
maximum likelihood procedures as special instances of the proposed Bayesian method. Inference can
be conducted with the combination of the Expectation-Maximization algorithm for maximum a posteriori
estimation and the Gibbs sampling iterative procedure. We additionally investigate several Bayesian criteria
for selecting the optimal mixture configuration and describe diagnostic tools for assessing the fitness of
ranking distributions conditionally and unconditionally on the number of ranked items. The utility of the
novel Bayesian parametric Plackett–Luce mixture for characterizing sample heterogeneity is illustrated
with several applications to simulated and real preference ranked data. We compare our method with the
frequentist approach and a Bayesian nonparametric mixture model both assuming the Plackett–Luce model
as a mixture component. Our analysis on real datasets reveals the importance of an accurate diagnostic
check for an appropriate in-depth understanding of the heterogeneous nature of the partial ranking data.
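As a rough illustration of the model named in the abstract above, the Plackett–Luce probability of a partial (top-k) ranking is a product of sequential choice probabilities: at each stage the chosen item's support parameter is divided by the total support of the items not yet ranked. The sketch below is not taken from the paper; the function name `plackett_luce_log_prob` and the dictionary-of-supports representation are assumptions made for this example, and it covers a single Plackett–Luce component only, not the mixture or its Bayesian inference.

```python
import math


def plackett_luce_log_prob(ranking, support):
    """Log-probability of a (possibly partial) ranking under Plackett-Luce.

    ranking: list of item ids, most preferred first; it may contain only
             the top-k items (hypothetical representation for this sketch).
    support: dict mapping every item id to a positive support parameter.
    """
    remaining = sum(support.values())  # total support of unranked items
    log_prob = 0.0
    for item in ranking:
        # Probability of picking `item` among the items still available.
        log_prob += math.log(support[item]) - math.log(remaining)
        remaining -= support[item]
    return log_prob


# Illustrative supports, chosen arbitrarily for the example.
support = {"a": 3.0, "b": 2.0, "c": 1.0}
top1 = math.exp(plackett_luce_log_prob(["a"], support))       # 3/6
top2 = math.exp(plackett_luce_log_prob(["a", "b"], support))  # 3/6 * 2/3
```

With these supports, the top-1 ranking `["a"]` has probability 0.5. A top-2 ranking over three items has the same probability as the full ranking it implies, since the last position is then determined.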
A nonparametric Bayesian approach to the rare type match problem
The "rare type match problem" is the situation in which the suspect's DNA
profile, matching the DNA profile of the crime stain, is not in the database of
reference. The evaluation of this match in the light of the two competing
hypotheses (the crime stain has been left by the suspect or by another person)
is based on the calculation of the likelihood ratio and depends on the
population proportions of the DNA profiles, which are unknown. We propose a
Bayesian nonparametric method that uses a two-parameter Poisson-Dirichlet
distribution as a prior over the ranked population proportions, and discards
the information about the names of the different DNA profiles. This fits the
data coming from European Y-STR DNA profiles very well, and the calculation of
the likelihood ratio becomes quite simple thanks to a justified Empirical Bayes
approach. Comment: arXiv admin note: text overlap with arXiv:1506.0844
Scalable Bayesian nonparametric regression via a Plackett-Luce model for conditional ranks
We present a novel Bayesian nonparametric regression model for covariates X
and a continuous, real response variable Y. The model is parametrized in terms of
marginal distributions for Y and X and a regression function which tunes the
stochastic ordering of the conditional distributions F(y|x). By adopting an
approximate composite likelihood approach, we show that the resulting posterior
inference can be decoupled for the separate components of the model. This
procedure scales to very large datasets and allows standard, existing
software for Bayesian nonparametric density estimation and Plackett-Luce
ranking estimation to be applied. As an illustration, we show an
application of our approach to a US Census dataset, with over 1,300,000 data
points and more than 100 covariates.
Incremental Learning of Nonparametric Bayesian Mixture Models
Clustering is a fundamental task in many vision applications.
To date, most clustering algorithms work in a
batch setting, in which training examples must be gathered in a
large group before learning can begin. Here we explore
incremental clustering, in which data can arrive continuously.
We present a novel incremental model-based clustering
algorithm based on nonparametric Bayesian methods,
which we call Memory Bounded Variational Dirichlet
Process (MB-VDP). The number of clusters is determined
flexibly by the data, and the approach can be used to automatically
discover object categories. The computational requirements
for producing model updates are bounded
and do not grow with the amount of data processed. The
technique is well suited to very large datasets, and we show
that our approach outperforms existing online alternatives
for learning nonparametric Bayesian mixture models.
Evolution of statistical analysis in empirical software engineering research: Current state and steps forward
Software engineering research is evolving and papers are increasingly based
on empirical data from a multitude of sources, using statistical tests to
determine if and to what degree empirical evidence supports their hypotheses.
To investigate the practices and trends of statistical analysis in empirical
software engineering (ESE), this paper presents a review of a large pool of
papers from top-ranked software engineering journals. First, we manually
reviewed 161 papers; in the second phase of our method, we conducted a more
extensive semi-automatic classification of 5,196 papers spanning the years
2001--2015. Results from both review steps were used to: i) identify and
analyze the predominant practices in ESE (e.g., using t-test or ANOVA), as well
as relevant trends in usage of specific statistical methods (e.g.,
nonparametric tests and effect size measures) and ii) develop a conceptual
model for a statistical analysis workflow with suggestions on how to apply
different statistical methods as well as guidelines to avoid pitfalls. Lastly,
we confirm existing claims that current ESE practices lack a standard to report
practical significance of results. We illustrate how practical significance can
be discussed in terms of both the statistical analysis and the
practitioner's context. Comment: journal submission, 34 pages, 8 figures