Joint Modeling of Topics, Citations, and Topical Authority in Academic Corpora
Much of scientific progress stems from previously published findings, but
searching through the vast sea of scientific publications is difficult. We
often rely on metrics of scholarly authority to find the prominent authors but
these authority indices do not differentiate authority based on research
topics. We present Latent Topical-Authority Indexing (LTAI) for jointly
modeling the topics, citations, and topical authority in a corpus of academic
papers. Compared to previous models, LTAI differs in two main aspects. First,
it explicitly models the generative process of the citations, rather than
treating the citations as given. Second, it models each author's influence on
citations of a paper based on the topics of the cited papers, as well as the
citing papers. We fit LTAI to four academic corpora: CORA, Arxiv Physics, PNAS,
and Citeseer. We compare the performance of LTAI against various baselines,
ranging from latent Dirichlet allocation to more advanced models, including
the author-link topic model and the dynamic author citation topic model. The
results show that LTAI achieves improved accuracy over other similar models
when predicting words, citations and authors of publications.
Comment: Accepted by Transactions of the Association for Computational
Linguistics (TACL); to appear
Longitudinal Citation Prediction using Temporal Graph Neural Networks
Citation count prediction is the task of predicting the number of citations a
paper has gained after a period of time. Prior work viewed this as a static
prediction task. As papers and their citations evolve over time, considering
the dynamics of the number of citations a paper will receive would seem
logical. Here, we introduce the task of sequence citation prediction, where the
goal is to accurately predict the trajectory of the number of citations a
scholarly work receives over time. We propose to view papers as a structured
network of citations, allowing us to use topological information as a learning
signal. Additionally, we learn how this dynamic citation network changes over
time and the impact of paper meta-data such as authors, venues and abstracts.
To approach the introduced task, we derive a dynamic citation network from
Semantic Scholar which spans over 42 years. We present a model which exploits
topological and temporal information using graph convolution networks paired
with sequence prediction, and compare it against multiple baselines, testing
the importance of topological and temporal information and analyzing model
performance. Our experiments show that leveraging both the temporal and
topological information greatly increases the performance of predicting
citation counts over time.
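The prediction target described in this abstract, a paper's citation count over time, can be illustrated with a minimal sketch. It only builds the cumulative trajectory from dated citations and is not the authors' model; the function name `citation_trajectory`, its parameters, and the toy years are assumptions made for illustration.

```python
from collections import Counter


def citation_trajectory(citation_years, pub_year, horizon):
    """Return cumulative citation counts for each year offset after publication.

    citation_years: years in which citing papers appeared (hypothetical input).
    pub_year: publication year of the cited paper.
    horizon: number of yearly steps in the returned sequence.
    """
    per_offset = Counter(year - pub_year for year in citation_years)
    total = 0
    trajectory = []
    for offset in range(horizon):
        total += per_offset.get(offset, 0)
        trajectory.append(total)
    return trajectory


# Toy example: a paper published in 2010 and cited six times by 2014.
trajectory = citation_trajectory(
    [2011, 2011, 2012, 2014, 2014, 2014], pub_year=2010, horizon=5
)
```

A static formulation would predict only the last element of this sequence, whereas the sequence task asks for every element.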
Predicting Scientific Success Based on Coauthorship Networks
We address the question of to what extent the success of scientific articles is
due to social influence. Analyzing a data set of over 100,000 publications from
the field of Computer Science, we study how centrality in the coauthorship
network differs between authors who have highly cited papers and those who do
not. We further show that a machine learning classifier, based only on
coauthorship network centrality measures at time of publication, is able to
predict with high precision whether an article will be highly cited five years
after publication. By this we provide quantitative insight into the social
dimension of scientific publishing - challenging the perception of citations as
an objective, socially unbiased measure of scientific success.
Comment: 21 pages, 2 figures, incl. Supplementary Material
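As a rough illustration of a coauthorship centrality feature measured at time of publication, the sketch below computes degree centrality from papers that appeared before a cutoff year. The abstract does not say which centrality measures or classifier the authors used, so the choice of degree centrality, the function name, and the toy data are all assumptions.

```python
from collections import defaultdict
from itertools import combinations


def degree_centrality(papers, before_year):
    """Degree centrality of each author in the coauthorship network.

    papers: list of (year, [author, ...]) tuples (hypothetical input format).
    before_year: only papers published strictly before this year are used,
    so the feature reflects the network at time of publication.
    """
    neighbors = defaultdict(set)
    for year, authors in papers:
        if year >= before_year:
            continue
        for author in authors:
            neighbors[author]  # register the author even without coauthors
        for a, b in combinations(authors, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {author: 0.0 for author in neighbors}
    return {author: len(nb) / (n - 1) for author, nb in neighbors.items()}


# Toy data: the 2005 paper is excluded when scoring a paper published in 2004.
papers = [
    (2001, ["alice", "bob"]),
    (2002, ["alice", "carol", "dave"]),
    (2003, ["bob", "carol"]),
    (2005, ["erin", "alice"]),
]
centrality = degree_centrality(papers, before_year=2004)
```

Values like these could then serve as input features to any standard classifier that predicts whether a paper becomes highly cited.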
The Child is Father of the Man: Foresee the Success at the Early Stage
Understanding the dynamic mechanisms that drive high-impact scientific
work (e.g., research papers, patents) is a long-debated research topic and has
many important implications, ranging from personal career development and
recruitment search, to the jurisdiction of research resources. Recent advances
in characterizing and modeling scientific success have made it possible to
forecast the long-term impact of scientific work, where data mining techniques,
supervised learning in particular, play an essential role. Despite much
progress, several key algorithmic challenges in relation to predicting
long-term scientific impact have largely remained open. In this paper, we
propose a joint predictive model to forecast the long-term scientific impact at
the early stage, which simultaneously addresses a number of these open
challenges, including the scholarly feature design, the non-linearity, the
domain-heterogeneity and dynamics. In particular, we formulate it as a
regularized optimization problem and propose effective and scalable algorithms
to solve it. We perform extensive empirical evaluations on large, real
scholarly data sets to validate the effectiveness and the efficiency of our
method.
Comment: Correct some typos in our KDD paper
Exploring Features for Predicting Policy Citations
In this study we performed an initial investigation and evaluation of
altmetrics and their relationship with public policy citation of research
papers. We examined methods for using altmetrics and other data to predict
whether a research paper is cited in public policy and applied receiver
operating characteristic (ROC) curve analysis to various feature groups in
order to evaluate their potential usefulness. Of the methods we tested,
classifying based on tweet count provided the best results, achieving an area
under the ROC curve of 0.91.
Comment: 2 pages, accepted to JCDL '1
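To make the evaluation concrete, here is a minimal sketch of how the area under the ROC curve can be computed when a single feature, such as tweet count, is used directly as the classification score. It uses the pairwise-ranking formulation of AUC. The function name and the toy data are hypothetical and do not reproduce the study's data or its 0.91 result.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example scores
    higher than a randomly chosen negative one; ties count as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: tweet counts and whether each paper is policy-cited.
tweet_counts = [0, 1, 3, 8, 2, 15, 0, 5]
cited_in_policy = [0, 0, 1, 1, 0, 1, 0, 0]
auc = roc_auc(tweet_counts, cited_in_policy)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.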
Measuring academic influence: Not all citations are equal
The importance of a research article is routinely measured by counting how
many times it has been cited. However, treating all citations with equal weight
ignores the wide variety of functions that citations perform. We want to
automatically identify the subset of references in a bibliography that have a
central academic influence on the citing paper. For this purpose, we examine
the effectiveness of a variety of features for determining the academic
influence of a citation. By asking authors to identify the key references in
their own work, we created a data set in which citations were labeled according
to their academic influence. Using automatic feature selection with supervised
machine learning, we found a model for predicting academic influence that
achieves good performance on this data set using only four features. The best
features, among those we evaluated, were those based on the number of times a
reference is mentioned in the body of a citing paper. The performance of these
features inspired us to design an influence-primed h-index (the hip-index).
Unlike the conventional h-index, it weights citations by how many times a
reference is mentioned. According to our experiments, the hip-index is a better
indicator of researcher performance than the conventional h-index.
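The contrast between the two indices can be sketched as follows. The abstract states only that the hip-index weights citations by how many times a reference is mentioned, so this sketch assumes the weight is the raw mention count in each citing paper; the data layout and the paper names are hypothetical.

```python
def h_index(values):
    """Largest h such that at least h papers each have a value of at least h."""
    ranked = sorted(values, reverse=True)
    h = 0
    for rank, value in enumerate(ranked, start=1):
        if value >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical data for one researcher: for each paper, the number of times
# it is mentioned in the body of each citing paper (one entry per citation).
mentions = {
    "paper_a": [1, 1, 4],
    "paper_b": [2, 3],
    "paper_c": [3],
    "paper_d": [],
}

# Conventional h-index: every citation counts once.
h = h_index([len(m) for m in mentions.values()])

# hip-index under the stated assumption: each citation counts as many
# times as the reference is mentioned in the citing paper.
hip = h_index([sum(m) for m in mentions.values()])
```

In this toy case the conventional index is 2 while the mention-weighted index is 3, because paper_c is cited once but mentioned three times in the citing paper.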