54,966 research outputs found
Ranking Median Regression: Learning to Order through Local Consensus
This article is devoted to the problem of predicting the value taken by a
random permutation , describing the preferences of an individual over a
set of numbered items say, based on the observation of
an input/explanatory r.v. e.g. characteristics of the individual), when
error is measured by the Kendall distance. In the probabilistic
formulation of the 'Learning to Order' problem we propose, which extends the
framework for statistical Kemeny ranking aggregation developped in
\citet{CKS17}, this boils down to recovering conditional Kemeny medians of
given from i.i.d. training examples . For this reason, this statistical learning problem is
referred to as \textit{ranking median regression} here. Our contribution is
twofold. We first propose a probabilistic theory of ranking median regression:
the set of optimal elements is characterized, the performance of empirical risk
minimizers is investigated in this context and situations where fast learning
rates can be achieved are also exhibited. Next we introduce the concept of
local consensus/median, in order to derive efficient methods for ranking median
regression. The major advantage of this local learning approach lies in its
close connection with the widely studied Kemeny aggregation problem. From an
algorithmic perspective, this permits to build predictive rules for ranking
median regression by implementing efficient techniques for (approximate) Kemeny
median computations at a local level in a tractable manner. In particular,
versions of -nearest neighbor and tree-based methods, tailored to ranking
median regression, are investigated. Accuracy of piecewise constant ranking
median regression rules is studied under a specific smoothness assumption for
's conditional distribution given
Finding Academic Experts on a MultiSensor Approach using Shannon's Entropy
Expert finding is an information retrieval task concerned with the search for
the most knowledgeable people, in some topic, with basis on documents
describing peoples activities. The task involves taking a user query as input
and returning a list of people sorted by their level of expertise regarding the
user query. This paper introduces a novel approach for combining multiple
estimators of expertise based on a multisensor data fusion framework together
with the Dempster-Shafer theory of evidence and Shannon's entropy. More
specifically, we defined three sensors which detect heterogeneous information
derived from the textual contents, from the graph structure of the citation
patterns for the community of experts, and from profile information about the
academic experts. Given the evidences collected, each sensor may define
different candidates as experts and consequently do not agree in a final
ranking decision. To deal with these conflicts, we applied the Dempster-Shafer
theory of evidence combined with Shannon's Entropy formula to fuse this
information and come up with a more accurate and reliable final ranking list.
Experiments made over two datasets of academic publications from the Computer
Science domain attest for the adequacy of the proposed approach over the
traditional state of the art approaches. We also made experiments against
representative supervised state of the art algorithms. Results revealed that
the proposed method achieved a similar performance when compared to these
supervised techniques, confirming the capabilities of the proposed framework
Learning to Rank Academic Experts in the DBLP Dataset
Expert finding is an information retrieval task that is concerned with the
search for the most knowledgeable people with respect to a specific topic, and
the search is based on documents that describe people's activities. The task
involves taking a user query as input and returning a list of people who are
sorted by their level of expertise with respect to the user query. Despite
recent interest in the area, the current state-of-the-art techniques lack in
principled approaches for optimally combining different sources of evidence.
This article proposes two frameworks for combining multiple estimators of
expertise. These estimators are derived from textual contents, from
graph-structure of the citation patterns for the community of experts, and from
profile information about the experts. More specifically, this article explores
the use of supervised learning to rank methods, as well as rank aggregation
approaches, for combing all of the estimators of expertise. Several supervised
learning algorithms, which are representative of the pointwise, pairwise and
listwise approaches, were tested, and various state-of-the-art data fusion
techniques were also explored for the rank aggregation framework. Experiments
that were performed on a dataset of academic publications from the Computer
Science domain attest the adequacy of the proposed approaches.Comment: Expert Systems, 2013. arXiv admin note: text overlap with
arXiv:1302.041
RankMerging: A supervised learning-to-rank framework to predict links in large social network
Uncovering unknown or missing links in social networks is a difficult task
because of their sparsity and because links may represent different types of
relationships, characterized by different structural patterns. In this paper,
we define a simple yet efficient supervised learning-to-rank framework, called
RankMerging, which aims at combining information provided by various
unsupervised rankings. We illustrate our method on three different kinds of
social networks and show that it substantially improves the performances of
unsupervised metrics of ranking. We also compare it to other combination
strategies based on standard methods. Finally, we explore various aspects of
RankMerging, such as feature selection and parameter estimation and discuss its
area of relevance: the prediction of an adjustable number of links on large
networks.Comment: 43 pages, published in Machine Learning Journa
Fast and Robust Rank Aggregation against Model Misspecification
In rank aggregation, preferences from different users are summarized into a
total order under the homogeneous data assumption. Thus, model misspecification
arises and rank aggregation methods take some noise models into account.
However, they all rely on certain noise model assumptions and cannot handle
agnostic noises in the real world. In this paper, we propose CoarsenRank, which
rectifies the underlying data distribution directly and aligns it to the
homogeneous data assumption without involving any noise model. To this end, we
define a neighborhood of the data distribution over which Bayesian inference of
CoarsenRank is performed, and therefore the resultant posterior enjoys
robustness against model misspecification. Further, we derive a tractable
closed-form solution for CoarsenRank making it computationally efficient.
Experiments on real-world datasets show that CoarsenRank is fast and robust,
achieving consistent improvement over baseline methods
- …