140,907 research outputs found
Bibliographic Analysis on Research Publications using Authors, Categorical Labels and the Citation Network
Bibliographic analysis considers the author's research areas, the citation
network and the paper content among other things. In this paper, we combine
these three in a topic model that produces a bibliographic model of authors,
topics and documents, using a nonparametric extension of a combination of the
Poisson mixed-topic link model and the author-topic model. This gives rise to
the Citation Network Topic Model (CNTM). We propose a novel and efficient
inference algorithm for the CNTM to explore subsets of research publications
from CiteSeerX. The publication datasets are organised into three corpora,
totalling to about 168k publications with about 62k authors. The queried
datasets are made available online. In three publicly available corpora in
addition to the queried datasets, our proposed model demonstrates an improved
performance in both model fitting and document clustering, compared to several
baselines. Moreover, our model allows extraction of additional useful knowledge
from the corpora, such as the visualisation of the author-topics network.
Additionally, we propose a simple method to incorporate supervision into topic
modelling to achieve further improvement on the clustering task.Comment: Preprint for Journal Machine Learnin
Joint Modeling of Topics, Citations, and Topical Authority in Academic Corpora
Much of scientific progress stems from previously published findings, but
searching through the vast sea of scientific publications is difficult. We
often rely on metrics of scholarly authority to find the prominent authors but
these authority indices do not differentiate authority based on research
topics. We present Latent Topical-Authority Indexing (LTAI) for jointly
modeling the topics, citations, and topical authority in a corpus of academic
papers. Compared to previous models, LTAI differs in two main aspects. First,
it explicitly models the generative process of the citations, rather than
treating the citations as given. Second, it models each author's influence on
citations of a paper based on the topics of the cited papers, as well as the
citing papers. We fit LTAI to four academic corpora: CORA, Arxiv Physics, PNAS,
and Citeseer. We compare the performance of LTAI against various baselines,
starting with the latent Dirichlet allocation, to the more advanced models
including author-link topic model and dynamic author citation topic model. The
results show that LTAI achieves improved accuracy over other similar models
when predicting words, citations and authors of publications.Comment: Accepted by Transactions of the Association for Computational
Linguistics (TACL); to appea
Using Text Similarity to Detect Social Interactions not Captured by Formal Reply Mechanisms
In modeling social interaction online, it is important to understand when
people are reacting to each other. Many systems have explicit indicators of
replies, such as threading in discussion forums or replies and retweets in
Twitter. However, it is likely these explicit indicators capture only part of
people's reactions to each other, thus, computational social science approaches
that use them to infer relationships or influence are likely to miss the mark.
This paper explores the problem of detecting non-explicit responses, presenting
a new approach that uses tf-idf similarity between a user's own tweets and
recent tweets by people they follow. Based on a month's worth of posting data
from 449 ego networks in Twitter, this method demonstrates that it is likely
that at least 11% of reactions are not captured by the explicit reply and
retweet mechanisms. Further, these uncaptured reactions are not evenly
distributed between users: some users, who create replies and retweets without
using the official interface mechanisms, are much more responsive to followees
than they appear. This suggests that detecting non-explicit responses is an
important consideration in mitigating biases and building more accurate models
when using these markers to study social interaction and information diffusion.Comment: A final version of this work was published in the 2015 IEEE 11th
International Conference on e-Science (e-Science
Transforming Graph Representations for Statistical Relational Learning
Relational data representations have become an increasingly important topic
due to the recent proliferation of network datasets (e.g., social, biological,
information networks) and a corresponding increase in the application of
statistical relational learning (SRL) algorithms to these domains. In this
article, we examine a range of representation issues for graph-based relational
data. Since the choice of relational data representation for the nodes, links,
and features can dramatically affect the capabilities of SRL algorithms, we
survey approaches and opportunities for relational representation
transformation designed to improve the performance of these algorithms. This
leads us to introduce an intuitive taxonomy for data representation
transformations in relational domains that incorporates link transformation and
node transformation as symmetric representation tasks. In particular, the
transformation tasks for both nodes and links include (i) predicting their
existence, (ii) predicting their label or type, (iii) estimating their weight
or importance, and (iv) systematically constructing their relevant features. We
motivate our taxonomy through detailed examples and use it to survey and
compare competing approaches for each of these tasks. We also discuss general
conditions for transforming links, nodes, and features. Finally, we highlight
challenges that remain to be addressed
- …