Prediction of highly cited papers
In an article written five years ago [arXiv:0809.0522], we described a method
for predicting which scientific papers will be highly cited in the future, even
if they are currently not highly cited. Applying the method to real citation
data we made predictions about papers we believed would end up being well
cited. Here we revisit those predictions, five years on, to see how well we
did. Among the over 2000 papers in our original data set, we examine the fifty
that, by the measures of our previous study, were predicted to do best and we
find that they have indeed received substantially more citations in the
intervening years than other papers, even after controlling for the number of
prior citations. On average these top fifty papers have received 23 times as
many citations in the last five years as the average paper in the data set as a
whole, and 15 times as many as the average paper in a randomly drawn control
group that started out with the same number of citations. Applying our
prediction technique to current data, we also make new predictions of papers
that we believe will be well cited in the next few years.
Comment: 6 pages, 3 figures, 2 tables
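The matched-control comparison described in the abstract can be sketched as follows. Everything here is a hypothetical stand-in: the data is random, and the simple "most prior citations" ranking replaces the paper's actual predictor, purely to show the shape of the evaluation.

```python
import random

random.seed(0)

# Hypothetical data: (prior_citations, new_citations) per paper.
# These numbers are invented; the paper's real predictor and data differ.
papers = [(random.randint(0, 50), random.randint(0, 20)) for _ in range(2000)]

def mean_new(group):
    return sum(new for _, new in group) / len(group)

# Stand-in "prediction": take the 50 papers with the most prior citations.
top50 = sorted(papers, key=lambda p: p[0], reverse=True)[:50]

# Control group: for each top paper, draw a random paper with the same
# number of prior citations, so both groups start out matched.
by_prior = {}
for p in papers:
    by_prior.setdefault(p[0], []).append(p)
control = [random.choice(by_prior[prior]) for prior, _ in top50]

ratio = mean_new(top50) / mean_new(control)
print(f"top-50 vs matched-control citation ratio: {ratio:.2f}")
```

Because the control group is matched on prior citations, any gap in subsequent citations reflects the ranking itself rather than a head start.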
Why so many published sensitivity analyses are false: a systematic review of sensitivity analysis practices
Sensitivity analysis provides information on the relative importance of model input parameters and assumptions. It is distinct from uncertainty analysis, which addresses the question ‘How uncertain is the prediction?’ Uncertainty analysis needs to map what a model does when selected input assumptions and parameters are left free to vary over their range of existence, and this is equally true of a sensitivity analysis. Despite this, many uncertainty and sensitivity analyses still explore the input space by moving along one-dimensional corridors, leaving the space of the input factors mostly unexplored. Our extensive systematic literature review shows that many highly cited papers (42% in the present analysis) fail the elementary requirement to properly explore the space of the input factors. The results, while discipline-dependent, point to a worrying lack of standards and recognized good practices. We end by exploring possible reasons for this problem, and suggest some guidelines for the proper use of these methods.
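The "one-dimensional corridors" critique can be made concrete with a small sketch (all numbers illustrative): a one-at-a-time (OAT) design changes each factor alone around a baseline, so no sample ever varies two factors together, whereas a global Monte Carlo sample of the same size does.

```python
import random

random.seed(1)
k = 3          # number of input factors
levels = 4     # grid resolution per factor, for measuring exploration

def cell(point):
    # Sub-cell of the [0,1]^k grid that a point falls into.
    return tuple(min(int(x * levels), levels - 1) for x in point)

# One-at-a-time (OAT) design: vary each factor alone from a baseline.
baseline = [0.5] * k
oat = []
for i in range(k):
    for v in (0.1, 0.3, 0.7, 0.9):
        p = list(baseline)
        p[i] = v
        oat.append(p)

# Global design: the same budget of points, drawn uniformly at random.
mc = [[random.random() for _ in range(k)] for _ in oat]

base_cell = cell(baseline)
def factors_moved(p):
    # How many factors left their baseline grid cell in this sample.
    return sum(a != b for a, b in zip(cell(p), base_cell))

print(max(factors_moved(p) for p in oat))  # always 1: interactions unprobed
print(max(factors_moved(p) for p in mc))   # typically k
```

The OAT samples lie on axes through the baseline, so regions where two or more factors deviate at once, exactly where interaction effects live, are never visited.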
How do we use computational models of cognitive processes?
Previously I outlined a scheme for understanding the usefulness of computational models. This scheme was accompanied by two specific proposals. Firstly, that although models have diverse purposes, the purposes of individual modelling efforts should be made explicit. Secondly, that the best use of modelling is in establishing the correspondence between model elements and empirical objects in the form of certain 'explanatory' relationships: prediction, testing, existence proofs and proofs of sufficiency and insufficiency. The current work concerns itself with empirical tests of these two claims. I survey highly cited modelling papers and from an analysis of this corpus conclude that although a diverse range of purposes are represented, neither being accompanied by an explicit statement of purpose nor being a model of my 'explanatory' type is necessary for a modelling paper to become highly cited. Neither are these factors associated with higher rates of citation. The results are situated within a philosophy of science, and it is concluded that computational modelling in the cognitive sciences does not consist of a simple Popperian prediction-and-falsification dynamic. Although there may be common principles underlying model construction, they are not captured by this scheme and it is difficult to imagine how they could be captured by any simple formula.
Predicting Scientific Success Based on Coauthorship Networks
We ask to what extent the success of scientific articles is due to social
influence. Analyzing a data set of over 100,000 publications from
the field of Computer Science, we study how centrality in the coauthorship
network differs between authors who have highly cited papers and those who do
not. We further show that a machine learning classifier, based only on
coauthorship network centrality measures at time of publication, is able to
predict with high precision whether an article will be highly cited five years
after publication. In doing so, we provide quantitative insight into the social
dimension of scientific publishing, challenging the perception of citations as
an objective, socially unbiased measure of scientific success.
Comment: 21 pages, 2 figures, incl. Supplementary Material
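As a toy illustration of "coauthorship network centrality at time of publication" (authors and edges invented; the paper uses multiple centrality measures and a trained classifier), normalized degree centrality can be computed directly:

```python
from collections import defaultdict

# Invented toy data: each paper is a list of its authors; an edge links
# any two authors who share a paper.
papers = [
    ["alice", "bob"],
    ["alice", "carol", "dave"],
    ["bob", "carol"],
    ["eve"],
]

coauthors = defaultdict(set)
for authors in papers:
    for a in authors:
        coauthors[a]            # ensure solo authors appear as nodes
        for b in authors:
            if a != b:
                coauthors[a].add(b)

n = len(coauthors)
# Normalized degree centrality: the fraction of the other authors one
# has coauthored with. Per-author features like this, taken at
# publication time, are what such a classifier could be trained on.
centrality = {a: len(nbrs) / (n - 1) for a, nbrs in coauthors.items()}
print(sorted(centrality.items(), key=lambda kv: -kv[1]))
```

Degree centrality is only the simplest such feature; betweenness or eigenvector centrality capture different notions of an author's position in the network.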
Understanding the Impact of Early Citers on Long-Term Scientific Impact
This paper explores an interesting new dimension to the challenging problem
of predicting long-term scientific impact (LTSI) usually measured by the number
of citations accumulated by a paper in the long term. It is well known that
early citations (within 1-2 years after publication) acquired by a paper
positively affect its LTSI. However, no prior work investigates whether the
set of authors who bring in these early citations to a paper also affects its
LTSI. In this paper, we demonstrate for the first time the impact of these
authors, whom we call early citers (EC), on the LTSI of a paper. Note that this
study of the complex dynamics of EC introduces a brand new paradigm in citation
behavior analysis. Using a massive computer science bibliographic dataset, we
identify two distinct categories of EC: authors with a high overall
publication/citation count in the dataset are termed influential, and the rest
are termed non-influential. We investigate three characteristic
properties of EC and present an extensive analysis of how each category
correlates with LTSI in terms of these properties. In contrast to popular
perception, we find that influential EC negatively affect LTSI, possibly owing
to attention stealing. To illustrate this, we present several representative
examples from the dataset. A closer inspection of the collaboration network
reveals that this stealing effect is more profound if an EC is nearer to the
authors of the paper being investigated. As an intuitive use case, we show that
incorporating EC properties into state-of-the-art supervised citation
prediction models yields substantial performance gains. In closing, we present
an online portal that visualizes EC statistics along with the prediction
results for a given query paper.
Will This Paper Increase Your h-index? Scientific Impact Prediction
Scientific impact plays a central role in the evaluation of the output of
scholars, departments, and institutions. A widely used measure of scientific
impact is citations, with a growing body of literature focused on predicting
the number of citations obtained by any given publication. The effectiveness of
such predictions, however, is fundamentally limited by the power-law
distribution of citations, whereby publications with few citations are
extremely common and publications with many citations are relatively rare.
Given this limitation, in this work we instead address a related question asked
by many academic researchers in the course of writing a paper, namely: "Will
this paper increase my h-index?" Using a real academic dataset with over 1.7
million authors, 2 million papers, and 8 million citation relationships from
the premier online academic service ArnetMiner, we formalize a novel scientific
impact prediction problem to examine several factors that can drive a paper to
increase the primary author's h-index. We find that the researcher's authority
on the publication topic and the venue in which the paper is published are
crucial factors to the increase of the primary author's h-index, while the
topic popularity and the co-authors' h-indices are of surprisingly little
relevance. By leveraging relevant factors, we find a greater than 87.5%
potential predictability for whether a paper will contribute to an author's
h-index within five years. As a further experiment, we generate a
self-prediction for this paper, estimating that there is a 76% probability that
it will contribute to the h-index of the co-author with the highest current
h-index in five years. We conclude that our findings on the quantification of
scientific impact can help researchers to expand their influence and more
effectively leverage their position of "standing on the shoulders of giants."
Comment: Proc. of the 8th ACM International Conference on Web Search and Data Mining (WSDM'15)
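A minimal sketch of the h-index mechanics behind the question "Will this paper increase my h-index?" (the citation record below is invented; the paper's actual model predicts this from many features rather than assuming the new paper's citations are known):

```python
def h_index(citations):
    # Largest h such that at least h papers have >= h citations each.
    counts = sorted(citations, reverse=True)
    h = 0
    while h < len(counts) and counts[h] >= h + 1:
        h += 1
    return h

def increases_h_index(citations, predicted):
    # Would one more paper, with the predicted citation count, raise h?
    return h_index(citations + [predicted]) > h_index(citations)

author = [10, 8, 5, 5, 1]                # hypothetical record, h = 4
print(h_index(author))                   # 4
print(increases_h_index(author, 3))      # False: 3 < 5, h stays at 4
print(increases_h_index(author, 6))      # True: five papers now have >= 5
```

Note the threshold effect the abstract exploits: a new paper moves the index only if it reaches h+1 citations and enough existing papers sit at that level too, which is a much rarer event than merely being cited.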
Joint Modeling of Topics, Citations, and Topical Authority in Academic Corpora
Much of scientific progress stems from previously published findings, but
searching through the vast sea of scientific publications is difficult. We
often rely on metrics of scholarly authority to find the prominent authors but
these authority indices do not differentiate authority based on research
topics. We present Latent Topical-Authority Indexing (LTAI) for jointly
modeling the topics, citations, and topical authority in a corpus of academic
papers. Compared to previous models, LTAI differs in two main aspects. First,
it explicitly models the generative process of the citations, rather than
treating the citations as given. Second, it models each author's influence on
citations of a paper based on the topics of the cited papers, as well as the
citing papers. We fit LTAI to four academic corpora: CORA, Arxiv Physics, PNAS,
and Citeseer. We compare the performance of LTAI against various baselines,
ranging from latent Dirichlet allocation to more advanced models, including
the author-link topic model and the dynamic author-citation topic model. The
results show that LTAI achieves improved accuracy over other similar models
when predicting the words, citations, and authors of publications.
Comment: Accepted by Transactions of the Association for Computational Linguistics (TACL); to appear