Joint Geo-Spatial Preference and Pairwise Ranking for Point-of-Interest Recommendation
Recommending preferred points of interest (POIs) to users has become an important task for location-based social networks; it supports users' urban exploration by helping them filter out unattractive locations. Although the influence of geographical neighborhood has been studied in the rating prediction task (i.e. regression), few studies have exploited it to develop a ranking-oriented objective function that improves top-N item recommendation. To solve this task, we conduct a manual inspection of real-world datasets and find that each individual's traits are likely to cluster around multiple centers. Hence, we propose a co-pairwise ranking model based on the assumption that users prefer to assign higher ranks to the POIs near previously rated ones. The proposed method can learn preference ordering from non-observed rating pairs, and thus can alleviate the sparsity problem of matrix factorization. Evaluation on two publicly available datasets shows that our method performs significantly better than state-of-the-art techniques for the top-N item recommendation task.
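The ranking assumption in this abstract can be illustrated with a small sketch. It is not the authors' model: the logistic pairwise loss is a generic BPR-style choice, and the names `pairwise_loss`, `neighbor_pairs`, `coords` and `radius` are hypothetical, introduced only for illustration. Distances are planar rather than geodesic, for brevity.

```python
import math


def pairwise_loss(score_preferred, score_other):
    """Logistic pairwise loss, -log(sigmoid(difference)).

    Small when the preferred POI already scores higher than the other one.
    """
    return math.log1p(math.exp(-(score_preferred - score_other)))


def neighbor_pairs(rated, unrated, coords, radius):
    """Build (near, far) pairs among unrated POIs (hypothetical helper).

    A POI is 'near' if it lies within `radius` of any rated POI; the
    assumption is that near POIs should rank above far ones.
    """
    def is_near(poi):
        return any(math.dist(coords[poi], coords[r]) <= radius for r in rated)

    near = [p for p in unrated if is_near(p)]
    far = [p for p in unrated if not is_near(p)]
    return [(n, f) for n in near for f in far]


coords = {"a": (0.0, 0.0), "b": (0.1, 0.0), "c": (5.0, 5.0)}
pairs = neighbor_pairs(rated=["a"], unrated=["b", "c"], coords=coords, radius=1.0)
```

Here `pairs` holds preference pairs inferred purely from geography, which is how unobserved ratings could still contribute training signal under the stated assumption.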
Probabilistic performance estimators for computational chemistry methods: Systematic Improvement Probability and Ranking Probability Matrix. I. Theory
The comparison of benchmark error sets is an essential tool for the
evaluation of theories in computational chemistry. The standard ranking of
methods by their Mean Unsigned Error is unsatisfactory for several reasons
linked to the non-normality of the error distributions and the presence of
underlying trends. Complementary statistics have recently been proposed to
palliate such deficiencies, such as quantiles of the absolute errors
distribution or the mean prediction uncertainty. We introduce here a new score,
the systematic improvement probability (SIP), based on the direct system-wise
comparison of absolute errors. Independently of the chosen scoring rule, the
uncertainty of the statistics due to the incompleteness of the benchmark data
sets is also generally overlooked. However, this uncertainty is essential to
appreciate the robustness of rankings. In the present article, we develop two
indicators based on robust statistics to address this problem: $P_{inv}$, the
inversion probability between two values of a statistic, and $\mathbf{P}_{r}$,
the ranking probability matrix. We also demonstrate the essential contribution
of the correlations between error sets in these score comparisons.
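A rough sketch of the two ideas in this abstract follows. The exact definitions are in the paper; here SIP is assumed to be the fraction of systems on which one method has a strictly smaller absolute error than the other, and the inversion probability is approximated with a paired bootstrap of the Mean Unsigned Error. The bootstrap is a stand-in chosen for illustration and is not claimed to be the authors' estimator. All function names are hypothetical.

```python
import random


def sip(errors_a, errors_b):
    """Assumed SIP: share of systems where |error of B| < |error of A|."""
    pairs = list(zip(errors_a, errors_b))
    return sum(abs(b) < abs(a) for a, b in pairs) / len(pairs)


def mue(errors):
    """Mean Unsigned Error of one error set."""
    return sum(abs(e) for e in errors) / len(errors)


def inversion_probability(errors_a, errors_b, n_boot=2000, seed=0):
    """Paired-bootstrap estimate of how often the MUE ordering flips.

    Systems are resampled as pairs so that correlations between the
    two error sets are preserved.
    """
    rng = random.Random(seed)
    pairs = list(zip(errors_a, errors_b))
    observed = mue(errors_a) - mue(errors_b)
    inversions = 0
    for _ in range(n_boot):
        sample = [rng.choice(pairs) for _ in pairs]
        diff = mue([a for a, _ in sample]) - mue([b for _, b in sample])
        if diff * observed < 0:
            inversions += 1
    return inversions / n_boot


errors_a = [1.0, -2.0, 0.5, 3.0]
errors_b = [0.5, -1.0, 1.0, 2.0]
sip_value = sip(errors_a, errors_b)
p_inv = inversion_probability(errors_a, errors_b)
```

Resampling pairs rather than each error set separately is the design choice that lets correlated error sets produce a more stable ranking than independent ones would.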
Efficient Bayesian Inference for Generalized Bradley-Terry Models
The Bradley-Terry model is a popular approach to describe probabilities of
the possible outcomes when elements of a set are repeatedly compared with one
another in pairs. It has found many applications including animal behaviour,
chess ranking and multiclass classification. Numerous extensions of the basic
model have also been proposed in the literature including models with ties,
multiple comparisons, group comparisons and random graphs. From a computational
point of view, Hunter (2004) has proposed efficient iterative MM
(minorization-maximization) algorithms to perform maximum likelihood estimation
for these generalized Bradley-Terry models whereas Bayesian inference is
typically performed using MCMC (Markov chain Monte Carlo) algorithms based on
tailored Metropolis-Hastings (M-H) proposals. We show here that these MM
algorithms can be reinterpreted as special instances of
Expectation-Maximization (EM) algorithms associated with suitable sets of latent
variables and propose some original extensions. These latent variables allow us
to derive simple Gibbs samplers for Bayesian inference. We demonstrate
experimentally the efficiency of these algorithms on a variety of applications.
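For readers unfamiliar with the MM iteration mentioned in this abstract, a minimal sketch for the basic Bradley-Terry model is given below. It covers only the plain maximum likelihood update (each skill is set to the item's win count divided by a sum over its comparisons), not the generalized models or the Gibbs samplers of the paper. The function name and the `wins` matrix layout are assumptions made for illustration, and every item is assumed to have at least one win and one comparison.

```python
def bradley_terry_mm(wins, iterations=200):
    """Estimate Bradley-Terry skills by MM iterations.

    wins[i][j] is the number of times item i beat item j.
    Returns skills normalized to sum to 1.
    """
    n = len(wins)
    skill = [1.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Each pair contributes (number of comparisons) / (sum of skills).
            denom = sum(
                (wins[i][j] + wins[j][i]) / (skill[i] + skill[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom)
        norm = sum(updated)
        skill = [s / norm for s in updated]
    return skill


two_items = bradley_terry_mm([[0, 3], [1, 0]])
three_items = bradley_terry_mm([[0, 3, 2], [1, 0, 3], [1, 1, 0]])
```

With two items the estimate reduces to the observed win fractions, which gives a quick sanity check on the update.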
A Bayesian inference approach for determining player abilities in football
We consider the task of determining a football player's ability for a given
event type, for example, scoring a goal. We propose an interpretable Bayesian
model which is fit using variational inference methods. We implement a Poisson
model to capture occurrences of event types, from which we infer player
abilities. Our approach also allows the visualisation of differences between
players, for a specific ability, through the marginal posterior variational
densities. We then use these inferred player abilities to extend the Bayesian
hierarchical model of Baio and Blangiardo (2010) which captures a team's
scoring rate (the rate at which they score goals). We apply the resulting
scheme to the English Premier League, capturing player abilities over the
2013/2014 season, before using output from the hierarchical model to predict
whether over or under 2.5 goals will be scored in a given game in the 2014/2015
season. This validates our model as a way of providing insights into team
formation and the individual success of sports teams. Comment: 31 pages, 14 figures
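The over/under 2.5 goals prediction mentioned in this abstract can be illustrated with a short calculation. The sketch assumes that, given their scoring rates, the two teams' goal counts are independent Poisson variables, so the match total is Poisson with the summed rate. The rates used below are made-up numbers, not values from the paper, and the function names are hypothetical.

```python
import math


def prob_over(total_rate, line=2.5):
    """P(total goals > line) when the total is Poisson(total_rate)."""
    max_under = math.floor(line)
    p_under = sum(
        math.exp(-total_rate) * total_rate**k / math.factorial(k)
        for k in range(max_under + 1)
    )
    return 1.0 - p_under


def prob_over_match(home_rate, away_rate, line=2.5):
    """Assumes independent Poisson goal counts, so the rates add."""
    return prob_over(home_rate + away_rate, line)


p_over = prob_over_match(home_rate=1.5, away_rate=1.0)
```

In a full Bayesian treatment this probability would be averaged over posterior draws of the two rates rather than evaluated at single point values.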
Bayesian analysis of wandering vector models for displaying ranking data
In a process of examining k objects, each judge provides a ranking of them. The aim of this paper is to investigate a probabilistic model for ranking data - the wandering vector model. The model represents objects by points in a d-dimensional space, and the judges are represented by latent vectors emanating from the origin in the same space. Each judge samples a vector from a multivariate normal distribution; given this vector, the judge's utility assigned to an object is taken to be the length of the orthogonal projection of the object point onto the judge vector, plus a normally distributed random error. The ordering of the k utilities given by the judge determines the judge's ranking. A Bayesian approach and the Gibbs sampling technique are used for parameter estimation. The method of computing the marginal likelihood proposed by Chib (1995) is used to select the dimensionality of the model. Simulations are done to demonstrate the proposed estimation and model selection method. We then analyze the Goldberg data, in which 10 occupations are ranked according to the degree of social prestige.
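The generative process described in this abstract can be sketched as a small simulation. This is only the forward model for one judge, not the Gibbs sampler or the marginal likelihood computation. An isotropic covariance for the judge vector is assumed for simplicity, and the names `simulate_ranking`, `vector_sd` and `error_sd` are hypothetical.

```python
import math
import random


def simulate_ranking(objects, mean_vector, vector_sd=1.0, error_sd=0.1, rng=None):
    """Draw one judge's ranking under a wandering-vector-style model.

    objects: list of d-dimensional points, one per object.
    Returns object indices ordered from highest to lowest utility.
    """
    rng = rng or random.Random()
    # The judge's vector wanders around the mean direction (isotropic noise assumed).
    judge = [m + rng.gauss(0.0, vector_sd) for m in mean_vector]
    norm = math.sqrt(sum(c * c for c in judge))
    utilities = []
    for point in objects:
        # Signed length of the projection of the object point onto the judge vector.
        projection = sum(p * c for p, c in zip(point, judge)) / norm
        utilities.append(projection + rng.gauss(0.0, error_sd))
    return sorted(range(len(objects)), key=lambda i: utilities[i], reverse=True)


objects = [[0.0, 0.0], [2.0, 1.0], [1.0, 5.0]]
noiseless = simulate_ranking(objects, [1.0, 0.0], vector_sd=0.0, error_sd=0.0)
noisy = simulate_ranking(objects, [1.0, 0.0], rng=random.Random(1))
```

With both noise terms switched off, the ranking depends only on each object's coordinate along the mean vector, which makes the role of the projection easy to see.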
Methods for Ordinal Peer Grading
MOOCs have the potential to revolutionize higher education with their wide
outreach and accessibility, but they require instructors to come up with
scalable alternatives to traditional student evaluation. Peer grading -- having
students assess each other -- is a promising approach to tackling the problem
of evaluation at scale, since the number of "graders" naturally scales with the
number of students. However, students are not trained in grading, which means
that one cannot expect the same level of grading skills as in traditional
settings. Drawing on broad evidence that ordinal feedback is easier to provide
and more reliable than cardinal feedback, it is therefore desirable to allow
peer graders to make ordinal statements (e.g. "project X is better than project
Y") and not require them to make cardinal statements (e.g. "project X is a
B-"). Thus, in this paper we study the problem of automatically inferring
student grades from ordinal peer feedback, as opposed to existing methods that
require cardinal peer feedback. We formulate the ordinal peer grading problem
as a type of rank aggregation problem, and explore several probabilistic models
under which to estimate student grades and grader reliability. We study the
applicability of these methods using peer grading data collected from a real
class -- with instructor and TA grades as a baseline -- and demonstrate the
efficacy of ordinal feedback techniques in comparison to existing cardinal peer
grading methods. Finally, we compare these peer-grading techniques to
traditional evaluation techniques. Comment: Submitted to KDD 201
Multilingual Part-of-Speech Tagging: Two Unsupervised Approaches
We demonstrate the effectiveness of multilingual learning for unsupervised
part-of-speech tagging. The central assumption of our work is that by combining
cues from multiple languages, the structure of each becomes more apparent. We
consider two ways of applying this intuition to the problem of unsupervised
part-of-speech tagging: a model that directly merges tag structures for a pair
of languages into a single sequence and a second model which instead
incorporates multilingual context using latent variables. Both approaches are
formulated as hierarchical Bayesian models, using Markov Chain Monte Carlo
sampling techniques for inference. Our results demonstrate that by
incorporating multilingual evidence we can achieve impressive performance gains
across a range of scenarios. We also found that performance improves steadily
as the number of available languages increases.