10,430 research outputs found
Learning Reputation in an Authorship Network
The problem of searching for experts in a given academic field is hugely
important in both industry and academia. We study exactly this issue with
respect to a database of authors and their publications. The idea is to use
Latent Semantic Indexing (LSI) and Latent Dirichlet Allocation (LDA) to perform
topic modelling in order to find authors who have worked in a query field. We
then construct a coauthorship graph and motivate the use of influence
maximisation and a variety of graph centrality measures to obtain a ranked list
of experts. The ranked lists are further improved using a Markov Chain-based
rank aggregation approach. The complete method is readily scalable to large
datasets. To demonstrate the efficacy of the approach we report on an extensive
set of computational simulations using the Arnetminer dataset. An improvement
in mean average precision is demonstrated over the baseline case of simply
using the order of authors found by the topic models
Topic based language models for ad hoc information retrieval
We propose a topic based approach lo language
modelling for ad-hoc Information Retrieval (IR). Many smoothed estimators used for the multinomial query model in IR rely upon the estimated background collection probabilities. In this paper, we propose a topic based language modelling approach, that uses a more informative prior based on the topical content of a document. In our experiments, the proposed model provides comparable IR performance to the standard models, but when combined in a two stage language model, it outperforms all other estimated models
Joint Modeling of Topics, Citations, and Topical Authority in Academic Corpora
Much of scientific progress stems from previously published findings, but
searching through the vast sea of scientific publications is difficult. We
often rely on metrics of scholarly authority to find the prominent authors but
these authority indices do not differentiate authority based on research
topics. We present Latent Topical-Authority Indexing (LTAI) for jointly
modeling the topics, citations, and topical authority in a corpus of academic
papers. Compared to previous models, LTAI differs in two main aspects. First,
it explicitly models the generative process of the citations, rather than
treating the citations as given. Second, it models each author's influence on
citations of a paper based on the topics of the cited papers, as well as the
citing papers. We fit LTAI to four academic corpora: CORA, Arxiv Physics, PNAS,
and Citeseer. We compare the performance of LTAI against various baselines,
starting with the latent Dirichlet allocation, to the more advanced models
including author-link topic model and dynamic author citation topic model. The
results show that LTAI achieves improved accuracy over other similar models
when predicting words, citations and authors of publications.Comment: Accepted by Transactions of the Association for Computational
Linguistics (TACL); to appea
Characterizing Geo-located Tweets in Brazilian Megacities
This work presents a framework for collecting, processing and mining
geo-located tweets in order to extract meaningful and actionable knowledge in
the context of smart cities. We collected and characterized more than 9M tweets
from the two biggest cities in Brazil, Rio de Janeiro and S\~ao Paulo. We
performed topic modeling using the Latent Dirichlet Allocation model to produce
an unsupervised distribution of semantic topics over the stream of geo-located
tweets as well as a distribution of words over those topics. We manually
labeled and aggregated similar topics obtaining a total of 29 different topics
across both cities. Results showed similarities in the majority of topics for
both cities, reflecting similar interests and concerns among the population of
Rio de Janeiro and S\~ao Paulo. Nevertheless, some specific topics are more
predominant in one of the cities
- …