60 research outputs found

    Top-Rank Enhanced Listwise Optimization for Statistical Machine Translation

    Full text link
    Pairwise ranking methods are the basis of many widely used discriminative training approaches for structure prediction problems in natural language processing (NLP). Decomposing the problem of ranking hypotheses into pairwise comparisons enables simple and efficient solutions. However, neglecting the global ordering of the hypothesis list may hinder learning. We propose a listwise learning framework for structure prediction problems such as machine translation. Our framework directly models the ordering of the entire translation list to learn parameters that may better fit the given listwise samples. Furthermore, we propose top-rank enhanced loss functions, which are more sensitive to ranking errors at higher positions. Experiments on a large-scale Chinese-English translation task show that both our listwise learning framework and top-rank enhanced listwise losses lead to significant improvements in translation quality.
    Comment: Accepted to CoNLL 201
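    A listwise objective of this kind can be sketched as the negative log Plackett-Luce likelihood of the target ordering, with optional per-position weights standing in for top-rank enhancement. This is a minimal illustration under assumed conventions (scores listed best-first; the weighting scheme is hypothetical, not the authors' exact loss):

```python
import math

def listmle_loss(scores, weights=None):
    """Negative log Plackett-Luce likelihood of the given ordering.

    `scores` are model scores for hypotheses listed in their target
    (best-first) order; `weights` optionally up-weights errors at the
    top ranks (an assumed, illustrative form of top-rank enhancement).
    """
    n = len(scores)
    if weights is None:
        weights = [1.0] * n
    loss = 0.0
    for i in range(n):
        # log-sum-exp over the hypotheses not yet placed in the ranking
        tail = scores[i:]
        m = max(tail)
        lse = m + math.log(sum(math.exp(s - m) for s in tail))
        loss += weights[i] * (lse - scores[i])
    return loss
```

    A list whose scores already agree with the target order incurs a smaller loss than the same scores reversed, which is the property the listwise framework optimizes directly rather than pair by pair.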

    Bayesian nonparametric models for ranked data

    Get PDF
    We develop a Bayesian nonparametric extension of the popular Plackett-Luce choice model that can handle an infinite number of choice items. Our framework is based on the theory of random atomic measures, with the prior specified by a gamma process. We derive a posterior characterization and a simple and effective Gibbs sampler for posterior simulation. We develop a time-varying extension of our model and apply it to the New York Times lists of weekly bestselling books.
    Comment: NIPS - Neural Information Processing Systems (2012)
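    The finite-dimensional Plackett-Luce model that this work extends can be sketched as sequential sampling without replacement, where each position in the ranking is filled with probability proportional to the remaining items' weights (a minimal sketch of the base model only, not the nonparametric gamma-process construction):

```python
import random

def plackett_luce_sample(weights, rng=random):
    """Draw a full ranking from a Plackett-Luce model.

    Items are chosen one at a time with probability proportional to
    their (positive) weight among the items not yet ranked.
    """
    items = list(range(len(weights)))
    w = list(weights)
    ranking = []
    while items:
        total = sum(w)
        r = rng.uniform(0.0, total)
        acc = 0.0
        for k in range(len(items)):
            acc += w[k]
            if r <= acc:
                ranking.append(items.pop(k))
                w.pop(k)
                break
    return ranking
```

    An item with overwhelmingly large weight is almost surely ranked first, which matches the intuition that Plackett-Luce weights measure item "skill" or popularity.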

    Minimax-optimal Inference from Partial Rankings

    Full text link
    This paper studies the problem of inferring a global preference based on the partial rankings provided by many users over different subsets of items according to the Plackett-Luce model. A question of particular interest is how to optimally assign items to users for ranking and how many item assignments are needed to achieve a target estimation error. For a given assignment of items to users, we first derive an oracle lower bound of the estimation error that holds even for the more general Thurstone models. Then we show that the Cramér-Rao lower bound and our upper bounds inversely depend on the spectral gap of the Laplacian of an appropriately defined comparison graph. When the system is allowed to choose the item assignment, we propose a random assignment scheme. Our oracle lower bound and upper bounds imply that it is minimax-optimal up to a logarithmic factor among all assignment schemes, and the lower bound can be achieved by the maximum likelihood estimator as well as by popular rank-breaking schemes that decompose partial rankings into pairwise comparisons. The numerical experiments corroborate our theoretical findings.
    Comment: 16 pages, 2 figures
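    Full rank-breaking, one of the schemes the bounds cover, simply expands a partial ranking into every pairwise comparison it implies; a sketch:

```python
from itertools import combinations

def rank_break(partial_ranking):
    """Decompose a partial ranking (item ids listed best-to-worst)
    into the implied pairwise comparisons as (winner, loser) pairs."""
    # combinations() preserves input order, so the first element of
    # each pair is always the higher-ranked item
    return [(w, l) for w, l in combinations(partial_ranking, 2)]
```

    A ranking over k items yields k(k-1)/2 comparisons, which pairwise estimators such as Bradley-Terry fits can then consume directly.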

    Efficient Bayesian Inference for Generalized Bradley-Terry Models

    Full text link
    The Bradley-Terry model is a popular approach to describe probabilities of the possible outcomes when elements of a set are repeatedly compared with one another in pairs. It has found many applications, including animal behaviour, chess ranking and multiclass classification. Numerous extensions of the basic model have also been proposed in the literature, including models with ties, multiple comparisons, group comparisons and random graphs. From a computational point of view, Hunter (2004) has proposed efficient iterative MM (minorization-maximization) algorithms to perform maximum likelihood estimation for these generalized Bradley-Terry models, whereas Bayesian inference is typically performed using MCMC (Markov chain Monte Carlo) algorithms based on tailored Metropolis-Hastings (M-H) proposals. We show here that these MM algorithms can be reinterpreted as special instances of Expectation-Maximization (EM) algorithms associated with suitable sets of latent variables, and we propose some original extensions. These latent variables allow us to derive simple Gibbs samplers for Bayesian inference. We demonstrate experimentally the efficiency of these algorithms on a variety of applications.
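    For the basic model without ties, Hunter's MM iteration has a closed-form update: each strength is the item's win count divided by a weighted sum over its opponents. A minimal sketch (assuming a connected comparison graph so the MLE exists):

```python
def bradley_terry_mm(wins, n_iter=200):
    """MM iteration for Bradley-Terry strengths (Hunter, 2004).

    `wins[i][j]` is the number of times item i beat item j.  Returns
    strengths normalised to sum to 1.  Assumes every item wins and
    loses at least once and the comparison graph is connected.
    """
    n = len(wins)
    pi = [1.0] * n
    for _ in range(n_iter):
        new = []
        for i in range(n):
            total_wins = sum(wins[i])
            denom = sum((wins[i][j] + wins[j][i]) / (pi[i] + pi[j])
                        for j in range(n) if j != i)
            new.append(total_wins / denom if denom > 0 else pi[i])
        s = sum(new)
        pi = [p / s for p in new]  # normalise for identifiability
    return pi
```

    The EM reinterpretation in the paper recovers exactly this update from latent exponential variables, which is what opens the door to the Gibbs samplers for Bayesian inference.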

    Multi-Level Spatial Comparative Judgement Models To Map Deprivation

    Get PDF
    While current comparative judgement models provide strong algorithmic efficiency, they remain data inefficient, often requiring days or weeks of extensive data collection to provide sufficient pairwise comparisons for stable and accurate parameter estimation. This disparity between data and algorithmic efficiency is preventing widespread adoption, especially in challenging data-collection environments such as mapping human rights abuses. We address the data-inefficiency challenge by introducing the finite element Gaussian process Bradley–Terry mixture model, an approach that significantly reduces the number of pairwise comparisons required by comparative judgement models. This is achieved via the integration of prior spatial assumptions, encoded as a mixture of functions, each function introducing a spatial smoothness constraint at a specific resolution. These functions are modelled nonparametrically, through Gaussian process prior distributions. We use our method to map deprivation in the city of Dar es Salaam, Tanzania, and to locate slums in the city where poverty reduction measures can be carried out.

    Inferring from an imprecise Plackett–Luce model: application to label ranking

    Get PDF
    Learning ranking models is a difficult task, in which data may be scarce and cautious predictions desirable. To address such issues, we explore the extension of the popular parametric probabilistic Plackett–Luce model, often used to model rankings, to the imprecise setting where estimated parameters are set-valued. In particular, we study how to achieve cautious or conservative inference with it, and we illustrate its application on label ranking problems, a specific supervised learning task.
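    One consequence of set-valued parameters is that choice probabilities become intervals rather than point values. As a purely illustrative sketch (my own construction, not the paper's inference procedure), bounds on the probability that an item is ranked first follow from the monotonicity of the Plackett-Luce choice rule in each weight:

```python
def pl_first_choice_bounds(intervals, i):
    """Bounds on the Plackett-Luce probability that item i is ranked
    first, when each weight is only known to lie in an interval.

    `intervals` is a list of (lo, hi) weight bounds.  The probability
    w_i / sum_j w_j increases in w_i and decreases in every other w_j,
    so the extremes are attained at interval endpoints.
    """
    lo_i, hi_i = intervals[i]
    others_hi = sum(hi for k, (lo, hi) in enumerate(intervals) if k != i)
    others_lo = sum(lo for k, (lo, hi) in enumerate(intervals) if k != i)
    lower = lo_i / (lo_i + others_hi)
    upper = hi_i / (hi_i + others_lo)
    return lower, upper
```

    A cautious predictor can then abstain, or return a set of labels, whenever such intervals overlap, which is the kind of conservative inference the abstract describes.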