29 research outputs found
The right expert at the right time and place: From expertise identification to expertise selection
We propose a unified and complete solution for expert finding in organizations, including not only expertise identification, but also expertise selection functionality. The latter two include the use of implicit and explicit preferences of users on meeting each other, as well as localization and planning as important auxiliary processes. We also propose a solution for privacy protection, which is urgently required in view of the huge amount of privacy sensitive data involved. Various parts are elaborated elsewhere, and we look forward to a realization and usage of the proposed system as a whole
Unsupervised, Efficient and Semantic Expertise Retrieval
We introduce an unsupervised discriminative model for the task of retrieving
experts in online document collections. We exclusively employ textual evidence
and avoid explicit feature engineering by learning distributed word
representations in an unsupervised way. We compare our model to
state-of-the-art unsupervised statistical vector space and probabilistic
generative approaches. Our proposed log-linear model achieves the retrieval
performance levels of state-of-the-art document-centric methods with the low
inference cost of so-called profile-centric approaches. It yields a
statistically significant improved ranking over vector space and generative
models in most cases, matching the performance of supervised methods on various
benchmarks. That is, by using solely text we can do as well as methods that
work with external evidence and/or relevance feedback. A contrastive analysis
of rankings produced by discriminative and generative approaches shows that
they have complementary strengths due to the ability of the unsupervised
discriminative model to perform semantic matching.Comment: WWW2016, Proceedings of the 25th International Conference on World
Wide Web. 201
Broad expertise retrieval in sparse data environments
Expertise retrieval has been largely unexplored on data other than the W3C collection. At the same time, many intranets of universities and other knowledge-intensive organisations offer examples of relatively small but clean multilingual expertise data, covering broad ranges of expertise areas. We first present two main expertise retrieval tasks, along with a set of baseline approaches based on generative language modeling, aimed at finding expertise relations between topics and people. For our experimental evaluation, we introduce (and release) a new test set based on a crawl of a university site. Using this test set, we conduct two series of experiments. The first is aimed at determining the effectiveness of baseline expertise retrieval methods applied to the new test set. The second is aimed at assessing refined models that exploit characteristic features of the new test set, such as the organizational structure of the university, and the hierarchical structure of the topics in the test set. Expertise retrieval models are shown to be robust with respect to environments smaller than the W3C collection, and current techniques appear to be generalizable to other settings
Finding Academic Experts on a MultiSensor Approach using Shannon's Entropy
Expert finding is an information retrieval task concerned with the search for
the most knowledgeable people, in some topic, with basis on documents
describing peoples activities. The task involves taking a user query as input
and returning a list of people sorted by their level of expertise regarding the
user query. This paper introduces a novel approach for combining multiple
estimators of expertise based on a multisensor data fusion framework together
with the Dempster-Shafer theory of evidence and Shannon's entropy. More
specifically, we defined three sensors which detect heterogeneous information
derived from the textual contents, from the graph structure of the citation
patterns for the community of experts, and from profile information about the
academic experts. Given the evidences collected, each sensor may define
different candidates as experts and consequently do not agree in a final
ranking decision. To deal with these conflicts, we applied the Dempster-Shafer
theory of evidence combined with Shannon's Entropy formula to fuse this
information and come up with a more accurate and reliable final ranking list.
Experiments made over two datasets of academic publications from the Computer
Science domain attest for the adequacy of the proposed approach over the
traditional state of the art approaches. We also made experiments against
representative supervised state of the art algorithms. Results revealed that
the proposed method achieved a similar performance when compared to these
supervised techniques, confirming the capabilities of the proposed framework
Learning to Rank Academic Experts in the DBLP Dataset
Expert finding is an information retrieval task that is concerned with the
search for the most knowledgeable people with respect to a specific topic, and
the search is based on documents that describe people's activities. The task
involves taking a user query as input and returning a list of people who are
sorted by their level of expertise with respect to the user query. Despite
recent interest in the area, the current state-of-the-art techniques lack in
principled approaches for optimally combining different sources of evidence.
This article proposes two frameworks for combining multiple estimators of
expertise. These estimators are derived from textual contents, from
graph-structure of the citation patterns for the community of experts, and from
profile information about the experts. More specifically, this article explores
the use of supervised learning to rank methods, as well as rank aggregation
approaches, for combing all of the estimators of expertise. Several supervised
learning algorithms, which are representative of the pointwise, pairwise and
listwise approaches, were tested, and various state-of-the-art data fusion
techniques were also explored for the rank aggregation framework. Experiments
that were performed on a dataset of academic publications from the Computer
Science domain attest the adequacy of the proposed approaches.Comment: Expert Systems, 2013. arXiv admin note: text overlap with
arXiv:1302.041
Mining Experts in Technical Online Forums
Many organizations use or host discussion lists, in the form of online forums and email lists. Analyzing the content of those discussion lists is an effective solution to the task of expert finding, since experts tend to participate often by giving advice, and receive the best feedback. We present a novel method to identify positive comments that helps to identify experts by combining author statistics with polarity mining. Our method is able to distinguish experts from flamers and other people that simply participates frequently in discussions. We demonstrate the validity of our approach by evaluating it with an online discussion forum in Spanish.Sociedad Argentina de Informática e Investigación Operativ
Information Retrieval and Machine Learning Methods for Academic Expert Finding
In the context of academic expert finding, this paper investigates and compares the
performance of information retrieval (IR) and machine learning (ML) methods, including deep
learning, to approach the problem of identifying academic figures who are experts in different
domains when a potential user requests their expertise. IR-based methods construct multifaceted
textual profiles for each expert by clustering information from their scientific publications. Several
methods fully tailored for this problem are presented in this paper. In contrast, ML-based methods
treat expert finding as a classification task, training automatic text classifiers using publications
authored by experts. By comparing these approaches, we contribute to a deeper understanding of
academic-expert-finding techniques and their applicability in knowledge discovery. These methods
are tested with two large datasets from the biomedical field: PMSC-UGR and CORD-19. The results
show how IR techniques were, in general, more robust with both datasets and more suitable than the
ML-based ones, with some exceptions showing good performance.Spanish “Agencia Estatal de Investigación” under
grants PID2019-106758GB-C31 and PID2020-113230RB-C22Spanish “FEDER/Junta de Andalucía-Consejería de Transformación Económica, Industria, Conocimiento y Universidades” under
grant A-TIC-146-UGR20European Regional Development Fund (ERDF-FEDER