10,174 research outputs found

    Bayesian nonparametric models for ranked data

    Get PDF
    We develop a Bayesian nonparametric extension of the popular Plackett-Luce choice model that can handle an infinite number of choice items. Our framework is based on the theory of random atomic measures, with the prior specified by a gamma process. We derive a posterior characterization and a simple and effective Gibbs sampler for posterior simulation. We develop a time-varying extension of our model, and apply it to the New York Times lists of weekly bestselling books.Comment: NIPS - Neural Information Processing Systems (2012

    Bayesian Plackett--Luce Mixture Models for Partially Ranked Data

    Get PDF
    The elicitation of an ordinal judgment on multiple alternatives is often required in many psychological and behavioral experiments to investigate preference/choice orientation of a specific population. The Plackett–Luce model is one of the most popular and frequently applied parametric distributions to analyze rankings of a finite set of items. The present work introduces a Bayesian finite mixture of Plackett–Luce models to account for unobserved sample heterogeneity of partially ranked data. We describe an efficient way to incorporate the latent group structure in the data augmentation approach and the derivation of existing maximum likelihood procedures as special instances of the proposed Bayesian method. Inference can be conducted with the combination of the Expectation-Maximization algorithm for maximum a posteriori estimation and the Gibbs sampling iterative procedure.We additionally investigate several Bayesian criteria for selecting the optimal mixture configuration and describe diagnostic tools for assessing the fitness of ranking distributions conditionally and unconditionally on the number of ranked items. The utility of the novel Bayesian parametric Plackett–Luce mixture for characterizing sample heterogeneity is illustrated with several applications to simulated and real preference ranked data. We compare our method with the frequentist approach and a Bayesian nonparametric mixture model both assuming the Plackett–Luce model as a mixture component. Our analysis on real datasets reveals the importance of an accurate diagnostic check for an appropriate in-depth understanding of the heterogenous nature of the partial ranking data

    A nonparametric Bayesian approach to the rare type match problem

    Full text link
    The "rare type match problem" is the situation in which the suspect's DNA profile, matching the DNA profile of the crime stain, is not in the database of reference. The evaluation of this match in the light of the two competing hypotheses (the crime stain has been left by the suspect or by another person) is based on the calculation of the likelihood ratio and depends on the population proportions of the DNA profiles, that are unknown. We propose a Bayesian nonparametric method that uses a two-parameter Poisson Dirichlet distribution as a prior over the ranked population proportions, and discards the information about the names of the different DNA profiles. This fits very well the data coming from European Y-STR DNA profiles, and the calculation of the likelihood ratio becomes quite simple thanks to a justified Empirical Bayes approach.Comment: arXiv admin note: text overlap with arXiv:1506.0844

    Scalable Bayesian nonparametric regression via a Plackett-Luce model for conditional ranks

    Full text link
    We present a novel Bayesian nonparametric regression model for covariates X and continuous, real response variable Y. The model is parametrized in terms of marginal distributions for Y and X and a regression function which tunes the stochastic ordering of the conditional distributions F(y|x). By adopting an approximate composite likelihood approach, we show that the resulting posterior inference can be decoupled for the separate components of the model. This procedure can scale to very large datasets and allows for the use of standard, existing, software from Bayesian nonparametric density estimation and Plackett-Luce ranking estimation to be applied. As an illustration, we show an application of our approach to a US Census dataset, with over 1,300,000 data points and more than 100 covariates

    Incremental Learning of Nonparametric Bayesian Mixture Models

    Get PDF
    Clustering is a fundamental task in many vision applications. To date, most clustering algorithms work in a batch setting and training examples must be gathered in a large group before learning can begin. Here we explore incremental clustering, in which data can arrive continuously. We present a novel incremental model-based clustering algorithm based on nonparametric Bayesian methods, which we call Memory Bounded Variational Dirichlet Process (MB-VDP). The number of clusters are determined flexibly by the data and the approach can be used to automatically discover object categories. The computational requirements required to produce model updates are bounded and do not grow with the amount of data processed. The technique is well suited to very large datasets, and we show that our approach outperforms existing online alternatives for learning nonparametric Bayesian mixture models

    Evolution of statistical analysis in empirical software engineering research: Current state and steps forward

    Full text link
    Software engineering research is evolving and papers are increasingly based on empirical data from a multitude of sources, using statistical tests to determine if and to what degree empirical evidence supports their hypotheses. To investigate the practices and trends of statistical analysis in empirical software engineering (ESE), this paper presents a review of a large pool of papers from top-ranked software engineering journals. First, we manually reviewed 161 papers and in the second phase of our method, we conducted a more extensive semi-automatic classification of papers spanning the years 2001--2015 and 5,196 papers. Results from both review steps was used to: i) identify and analyze the predominant practices in ESE (e.g., using t-test or ANOVA), as well as relevant trends in usage of specific statistical methods (e.g., nonparametric tests and effect size measures) and, ii) develop a conceptual model for a statistical analysis workflow with suggestions on how to apply different statistical methods as well as guidelines to avoid pitfalls. Lastly, we confirm existing claims that current ESE practices lack a standard to report practical significance of results. We illustrate how practical significance can be discussed in terms of both the statistical analysis and in the practitioner's context.Comment: journal submission, 34 pages, 8 figure
    corecore