Learning to Rank Academic Experts in the DBLP Dataset
Expert finding is an information retrieval task that is concerned with the
search for the most knowledgeable people with respect to a specific topic, and
the search is based on documents that describe people's activities. The task
involves taking a user query as input and returning a list of people who are
sorted by their level of expertise with respect to the user query. Despite
recent interest in the area, the current state-of-the-art techniques lack
principled approaches for optimally combining different sources of evidence.
This article proposes two frameworks for combining multiple estimators of
expertise. These estimators are derived from textual contents, from
the graph structure of the citation patterns for the community of experts, and from
profile information about the experts. More specifically, this article explores
the use of supervised learning-to-rank methods, as well as rank aggregation
approaches, for combining all of the estimators of expertise. Several supervised
learning algorithms, which are representative of the pointwise, pairwise and
listwise approaches, were tested, and various state-of-the-art data fusion
techniques were also explored for the rank aggregation framework. Experiments
that were performed on a dataset of academic publications from the Computer
Science domain attest to the adequacy of the proposed approaches.
Comment: Expert Systems, 2013. arXiv admin note: text overlap with
arXiv:1302.041
Voting for candidates: adapting data fusion techniques for an expert search task
In an expert search task, the users' need is to identify people who have relevant expertise to a topic of interest. An expert search system predicts and ranks the expertise of a set of candidate persons with respect to the users' query. In this paper, we propose a novel approach for predicting and ranking candidate expertise with respect to a query. We see the problem of ranking experts as a voting problem, which we model by adapting eleven data fusion techniques. We investigate the effectiveness of the voting approach and the associated data fusion techniques across a range of document weighting models, in the context of the TREC 2005 Enterprise track. The evaluation results show that the voting paradigm is very effective, without using any collection-specific heuristics. Moreover, we show that improving the quality of the underlying document representation can significantly improve the retrieval performance of the data fusion techniques on an expert search task. In particular, we demonstrate that applying field-based weighting models improves the ranking of candidates. Finally, we demonstrate that the relative performance of the adapted data fusion techniques for the proposed approach is stable regardless of the weighting models used.
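To make the voting idea above concrete, here is a minimal sketch, assuming each retrieved document "votes" for every candidate associated with it. The abstract does not list its eleven fusion techniques, so the three variants shown (a plain vote count, CombSUM, and CombMNZ) are standard data fusion rules used for illustration only. The function name `vote_for_candidates` and the input dictionaries are hypothetical.

```python
from collections import defaultdict


def vote_for_candidates(doc_scores, doc_candidates, method="combsum"):
    """Turn a document ranking into a candidate ranking by voting.

    doc_scores:     dict mapping document id -> retrieval score for the query
    doc_candidates: dict mapping document id -> candidates associated with it
    method:         "votes" (count), "combsum" (sum of scores),
                    or "combmnz" (count * sum of scores)
    """
    votes = defaultdict(int)        # number of retrieved documents per candidate
    score_sum = defaultdict(float)  # summed document scores per candidate

    for doc_id, score in doc_scores.items():
        for candidate in doc_candidates.get(doc_id, ()):
            votes[candidate] += 1
            score_sum[candidate] += score

    if method == "votes":
        final = {c: float(n) for c, n in votes.items()}
    elif method == "combsum":
        final = dict(score_sum)
    elif method == "combmnz":
        final = {c: votes[c] * score_sum[c] for c in votes}
    else:
        raise ValueError(f"unknown fusion method: {method}")

    # Highest score first; ties broken by candidate id for determinism.
    return sorted(final.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data (invented for illustration): one strong document for alice,
# two weaker documents for bob.
doc_scores = {"d1": 4.0, "d2": 2.0, "d3": 1.5}
doc_candidates = {"d1": ["alice"], "d2": ["bob"], "d3": ["bob"]}

for method in ("votes", "combsum", "combmnz"):
    print(method, vote_for_candidates(doc_scores, doc_candidates, method))
```

On this toy input, CombSUM favours the candidate with the single high-scoring document, while the vote count and CombMNZ favour the candidate with more supporting documents, which is the kind of trade-off that separates one fusion rule from another.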
Modeling Documents as Mixtures of Persons for Expert Finding
In this paper we address the problem of searching for knowledgeable
persons within the enterprise, known as the expert finding (or
expert search) task. We present a probabilistic algorithm using the assumption
that terms in documents are produced by people who are mentioned
in them. We represent the documents retrieved for a query as mixtures
of candidate experts' language models. Two methods for extracting personal
language models are proposed, as well as a way of combining
them with other evidence of expertise. Experiments conducted with the
TREC Enterprise collection demonstrate the superiority of our approach
in comparison with the best one among existing solutions.
Finding Academic Experts on a MultiSensor Approach using Shannon's Entropy
Expert finding is an information retrieval task concerned with the search for
the most knowledgeable people on some topic, based on documents
describing people's activities. The task involves taking a user query as input
and returning a list of people sorted by their level of expertise regarding the
user query. This paper introduces a novel approach for combining multiple
estimators of expertise based on a multisensor data fusion framework together
with the Dempster-Shafer theory of evidence and Shannon's entropy. More
specifically, we defined three sensors which detect heterogeneous information
derived from the textual contents, from the graph structure of the citation
patterns for the community of experts, and from profile information about the
academic experts. Given the evidence collected, each sensor may identify
different candidates as experts, and consequently the sensors may not agree on a final
ranking decision. To deal with these conflicts, we applied the Dempster-Shafer
theory of evidence combined with Shannon's Entropy formula to fuse this
information and come up with a more accurate and reliable final ranking list.
Experiments performed on two datasets of academic publications from the Computer
Science domain attest to the adequacy of the proposed approach over the
traditional state-of-the-art approaches. We also ran experiments against
representative supervised state-of-the-art algorithms. Results revealed that
the proposed method achieved similar performance when compared to these
supervised techniques, confirming the capabilities of the proposed framework.
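The fusion step described above can be illustrated with a minimal sketch of Dempster's rule of combination, which merges two mass functions and renormalises by the conflicting mass. This is the textbook rule, not necessarily the paper's exact formulation: the abstract does not say how Shannon's entropy enters the fusion, so `shannon_entropy` appears only as a generic helper, and the two "sensors" and their mass values are invented for illustration.

```python
import math
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses (a focal element)
    to its mass; the masses in each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass assigned to contradictory hypotheses.
            conflict += mass_a * mass_b

    if 1.0 - conflict < 1e-12:
        raise ValueError("sources are in total conflict; the rule is undefined")

    normaliser = 1.0 - conflict
    return {focal: mass / normaliser for focal, mass in combined.items()}


def shannon_entropy(probabilities):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical example: two sensors judging which of two candidates is the expert.
A, B = frozenset({"alice"}), frozenset({"bob"})
EITHER = A | B  # ignorance: the sensor cannot tell the candidates apart

text_sensor = {A: 0.6, EITHER: 0.4}
citation_sensor = {B: 0.5, EITHER: 0.5}

fused = dempster_combine(text_sensor, citation_sensor)
print(fused)
print(shannon_entropy([0.5, 0.5]))
```

Here the conflicting mass is 0.6 × 0.5 = 0.3, so the remaining products are divided by 0.7, and the fused masses again sum to 1.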
Broad expertise retrieval in sparse data environments
Expertise retrieval has been largely unexplored on data other than the W3C collection. At the same time, many intranets of universities and other knowledge-intensive organisations offer examples of relatively small but clean multilingual expertise data, covering broad ranges of expertise areas. We first present two main expertise retrieval tasks, along with a set of baseline approaches based on generative language modeling, aimed at finding expertise relations between topics and people. For our experimental evaluation, we introduce (and release) a new test set based on a crawl of a university site. Using this test set, we conduct two series of experiments. The first is aimed at determining the effectiveness of baseline expertise retrieval methods applied to the new test set. The second is aimed at assessing refined models that exploit characteristic features of the new test set, such as the organizational structure of the university, and the hierarchical structure of the topics in the test set. Expertise retrieval models are shown to be robust with respect to environments smaller than the W3C collection, and current techniques appear to be generalizable to other settings
Unsupervised, Efficient and Semantic Expertise Retrieval
We introduce an unsupervised discriminative model for the task of retrieving
experts in online document collections. We exclusively employ textual evidence
and avoid explicit feature engineering by learning distributed word
representations in an unsupervised way. We compare our model to
state-of-the-art unsupervised statistical vector space and probabilistic
generative approaches. Our proposed log-linear model achieves the retrieval
performance levels of state-of-the-art document-centric methods with the low
inference cost of so-called profile-centric approaches. It yields a
statistically significant improved ranking over vector space and generative
models in most cases, matching the performance of supervised methods on various
benchmarks. That is, by using solely text we can do as well as methods that
work with external evidence and/or relevance feedback. A contrastive analysis
of rankings produced by discriminative and generative approaches shows that
they have complementary strengths due to the ability of the unsupervised
discriminative model to perform semantic matching.
Comment: WWW2016, Proceedings of the 25th International Conference on World
Wide Web. 201
Coauthor prediction for junior researchers
Research collaboration can bring in different perspectives and generate more productive results. However, finding an appropriate collaborator can be difficult due to a lack of sufficient information. Link prediction is a related technique for collaborator discovery, but its focus has been mostly on core authors who have relatively more publications. We argue that junior researchers actually need more help in finding collaborators. Thus, in this paper, we focus on coauthor prediction for junior researchers. Most previous work on coauthor prediction considered global network features and local network features separately, or tried to combine local network features and content features. However, we found a significant improvement by simply combining local and global network features. We further developed a regularization-based approach to incorporate multiple features simultaneously. Experimental results demonstrated that this approach outperformed the simple linear combination of multiple features. We further showed that content features, which have proved useful in link prediction, can be easily integrated into our regularization approach. © 2013 Springer-Verlag
Being Omnipresent To Be Almighty: The Importance of The Global Web Evidence for Organizational Expert Finding
Modern expert finding algorithms are developed under the
assumption that all possible expertise evidence for a person
is concentrated in a company that currently employs the
person. The evidence that can be acquired outside of an
enterprise is traditionally overlooked. At the same time, the
Web is full of personal information that is sufficiently detailed to judge a person's skills and knowledge. In this work, we review various sources of expertise evidence outside of an organization and experiment with rankings built on the data acquired from six different sources, accessible through APIs of two major web search engines. We show that these rankings and their combinations are often more realistic and of higher quality than rankings built on organizational data only.