A new algorithm for zero-modified models applied to citation counts
This is an accepted manuscript of an article published by Springer Nature in Scientometrics on 17/08/2020, available online: https://doi.org/10.1007/s11192-020-03654-8.
The accepted version of the publication may differ from the final published version.
Finding statistical models for citation count data is important for those seeking to understand the citing process or
when using regression to identify factors associated with citation rates. As sets of citation counts often include
more or fewer zeros (uncited articles) than would be expected under the base distribution, it is essential to deal
appropriately with them. This article proposes a new algorithm to fit zero-modified versions of discretised log-normal, hooked power-law and Weibull models to citation count data from 23 different Scopus categories from
2012. The new algorithm allows the standard errors of all parameter estimates to be calculated, and hence also
confidence intervals and p-values. This algorithm can also estimate negative zero-modification parameters
corresponding to zero-deflation (fewer uncited articles than expected). The results show no universal best model
for the 23 categories. A given dataset may be zero-inflated relative to one model, but zero-deflated relative to
another. We suggest circumstances in which one of the models under consideration may be the best-fitting model.
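To make zero-modification concrete, here is a minimal Python sketch of a zero-modified discretised Weibull model. It is not the article's algorithm. It assumes the common parameterisation P(X = 0) = ω + (1 − ω)p₀ and P(X = k) = (1 − ω)p_k for k ≥ 1, where p_k is the base probability; ω > 0 means zero-inflation and negative ω, down to −p₀/(1 − p₀), means zero-deflation. The function names and the scale/shape parameterisation are illustrative assumptions. Because this likelihood splits into a zero/non-zero part and a part for the non-zero counts, the maximum-likelihood ω for a given base p₀ simply matches the observed share of uncited articles.

```python
import math


def weibull_pmf(k, scale, shape):
    """Discretised Weibull (assumed form): P(X = k) = S(k) - S(k + 1), S(x) = exp(-(x/scale)**shape)."""
    def survival(x):
        return math.exp(-((x / scale) ** shape))
    return survival(k) - survival(k + 1)


def zero_modified_pmf(k, omega, base_pmf):
    """Zero-modified pmf: omega > 0 adds mass at zero, omega < 0 removes it."""
    p0 = base_pmf(0)
    lower = -p0 / (1.0 - p0)  # smallest omega that keeps P(X = 0) >= 0
    if not lower <= omega < 1.0:
        raise ValueError("omega outside the valid range [-p0/(1-p0), 1)")
    if k == 0:
        return omega + (1.0 - omega) * p0
    return (1.0 - omega) * base_pmf(k)


def omega_mle(counts, p0):
    """MLE of omega for a given base P(0): it matches the observed share of zeros."""
    zero_share = sum(1 for c in counts if c == 0) / len(counts)
    return (zero_share - p0) / (1.0 - p0)


base = lambda k: weibull_pmf(k, scale=5.0, shape=0.8)  # hypothetical fitted base model
print(omega_mle([0, 0, 1, 3, 7, 12, 2, 0, 5, 9], base(0)))  # positive: zero-inflated
print(omega_mle([1, 2, 3, 4, 0], base(0)))                  # negative: zero-deflated
```

With this hypothetical base model p₀ ≈ 0.24, so a sample with 30% uncited articles yields a positive ω and one with 20% uncited yields a negative ω. The same data can therefore look zero-inflated or zero-deflated depending on which base model supplies p₀.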
Measuring academic influence: Not all citations are equal
The importance of a research article is routinely measured by counting how
many times it has been cited. However, treating all citations with equal weight
ignores the wide variety of functions that citations perform. We want to
automatically identify the subset of references in a bibliography that have a
central academic influence on the citing paper. For this purpose, we examine
the effectiveness of a variety of features for determining the academic
influence of a citation. By asking authors to identify the key references in
their own work, we created a data set in which citations were labeled according
to their academic influence. Using automatic feature selection with supervised
machine learning, we found a model for predicting academic influence that
achieves good performance on this data set using only four features. The best
features, among those we evaluated, were those based on the number of times a
reference is mentioned in the body of a citing paper. The performance of these
features inspired us to design an influence-primed h-index (the hip-index).
Unlike the conventional h-index, it weights citations by how many times a
reference is mentioned. According to our experiments, the hip-index is a better
indicator of researcher performance than the conventional h-index.
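The contrast between the conventional h-index and a mention-weighted variant can be sketched in a few lines of Python. This is a hypothetical reading of the hip-index, not the authors' exact definition: it assumes each citation contributes the number of times the reference is mentioned in the citing paper's body, and that the usual h-index rule is then applied to these weighted totals. The published index may normalise or weight differently.

```python
def h_index(scores):
    """Largest h such that h items each have a score of at least h."""
    h = 0
    for rank, score in enumerate(sorted(scores, reverse=True), start=1):
        if score >= rank:
            h = rank
        else:
            break
    return h


def hip_index_sketch(mention_counts_per_paper):
    """Hypothetical hip-index: each citation counts as its number of in-text mentions.

    mention_counts_per_paper[j] lists, for each paper citing the author's paper j,
    how many times paper j is mentioned in that citing paper's body.
    """
    weighted = [sum(mentions) for mentions in mention_counts_per_paper]
    return h_index(weighted)


papers = [[1, 1, 1], [3, 2], [1], [4, 4, 1]]  # made-up mention counts
print(h_index([len(m) for m in papers]))  # conventional h-index: 2
print(hip_index_sketch(papers))           # mention-weighted version: 3
```

Here a paper cited three times in passing counts the same as before, while papers that citing authors discuss repeatedly gain weight, which raises the index from 2 to 3.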
Modelling Citation Networks
The distribution of the number of academic publications as a function of
citation count for a given year is remarkably similar from year to year. We
measure this similarity using the width of the distribution and find it to be
approximately constant from year to year. We show that simple citation models
fail to capture this behaviour. We then provide a simple three-parameter
citation network model using a mixture of local and global search processes
which can reproduce the correct distribution over time. We use the citation
network of papers from the hep-th section of arXiv to test our model. For this
data, around 20% of citations use global information to reference recently
published papers, while the remaining 80% are found using local searches. We
note that this is consistent with other studies, though our motivation is very
different from that of previous work. Finally, we also find that fluctuations in
the size of an academic publication's bibliography are important for the model.
This is not addressed in most models and needs further work.
Comment: 29 pages, 22 figures
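A toy version of a citation process that mixes global and local search can be sketched as follows. This is not the authors' three-parameter model: the parameter names (`p_global`, `recent_window`, `refs_per_paper`) are hypothetical, global search is assumed to pick uniformly among recently published papers, and local search is assumed to follow a reference of a paper already cited. Bibliography size is capped rather than fluctuating, which the abstract notes matters for the real model.

```python
import random


def grow_network(n_papers, refs_per_paper, p_global, recent_window, seed=0):
    """Toy citation growth mixing global and local search (illustrative only).

    Returns refs, where refs[i] is the sorted list of earlier papers cited by paper i.
    """
    rng = random.Random(seed)
    refs = [[] for _ in range(n_papers)]
    for new in range(1, n_papers):
        target = min(refs_per_paper, new)
        cited = set()
        attempts = 0
        # Stop after a fixed number of attempts, so some bibliographies end up shorter.
        while len(cited) < target and attempts < 50 * refs_per_paper:
            attempts += 1
            if not cited or rng.random() < p_global:
                # Global search: pick uniformly among the most recent papers.
                cited.add(rng.randrange(max(0, new - recent_window), new))
            else:
                # Local search: follow a reference of a paper already cited.
                via = rng.choice(sorted(cited))
                if refs[via]:
                    cited.add(rng.choice(refs[via]))
        refs[new] = sorted(cited)
    return refs


def citation_counts(refs):
    """Number of times each paper is cited."""
    counts = [0] * len(refs)
    for cited in refs:
        for j in cited:
            counts[j] += 1
    return counts


refs = grow_network(n_papers=2000, refs_per_paper=10, p_global=0.2, recent_window=100)
counts = citation_counts(refs)
print(max(counts), sum(c == 0 for c in counts))  # most-cited paper, uncited papers
```

Setting `p_global=0.2` loosely mirrors the roughly 20%/80% split reported for hep-th, but the sketch has not been fitted to that data; comparing the width of the resulting citation distribution across cohorts would be the natural next step.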
Scalable Recommendation with Poisson Factorization
We develop a Bayesian Poisson matrix factorization model for forming
recommendations from sparse user behavior data. These data are large user/item
matrices where each user has provided feedback on only a small subset of items,
either explicitly (e.g., through star ratings) or implicitly (e.g., through
views or purchases). In contrast to traditional matrix factorization
approaches, Poisson factorization implicitly models each user's limited
attention to consume items. Moreover, because of the mathematical form of the
Poisson likelihood, the model needs to explicitly consider only the observed
entries in the matrix, leading to both scalable computation and good predictive
performance. We develop a variational inference algorithm for approximate
posterior inference that scales up to massive data sets. This is an efficient
algorithm that iterates over the observed entries and adjusts an approximate
posterior over the user/item representations. We apply our method to large
real-world user data containing users rating movies, users listening to songs,
and users reading scientific papers. In all these settings, Bayesian Poisson
factorization outperforms state-of-the-art matrix factorization methods.
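Why the Poisson likelihood lets the model skip unobserved (zero) entries can be checked numerically. The sketch below assumes, as in Poisson factorization, that each cell's rate is the inner product of nonnegative user and item factor vectors. It omits the gamma priors and the variational updates entirely, so it illustrates only the likelihood identity, not the paper's inference algorithm. The key step is that the sum of all rates factorises as (Σ_u θ_u)·(Σ_i β_i), so zero cells contribute only through that cheap total.

```python
import math


def poisson_loglik_dense(Y, theta, beta):
    """Poisson log-likelihood summed over every user/item cell, zeros included."""
    total = 0.0
    for u, row in enumerate(Y):
        for i, y in enumerate(row):
            rate = sum(a * b for a, b in zip(theta[u], beta[i]))
            total += y * math.log(rate) - rate - math.lgamma(y + 1)
    return total


def poisson_loglik_sparse(nonzeros, theta, beta):
    """Same quantity, visiting only nonzero cells.

    sum_{u,i} theta_u . beta_i equals (sum_u theta_u) . (sum_i beta_i),
    so the zero cells never need to be visited individually.
    """
    n_factors = len(theta[0])
    theta_sum = [sum(t[k] for t in theta) for k in range(n_factors)]
    beta_sum = [sum(b[k] for b in beta) for k in range(n_factors)]
    total = -sum(a * b for a, b in zip(theta_sum, beta_sum))
    for (u, i), y in nonzeros.items():
        rate = sum(a * b for a, b in zip(theta[u], beta[i]))
        total += y * math.log(rate) - math.lgamma(y + 1)
    return total


Y = [[0, 2, 0], [1, 0, 0]]
theta = [[0.5, 1.0], [0.2, 0.3]]             # hypothetical user factors
beta = [[1.0, 0.1], [0.4, 0.9], [0.3, 0.2]]  # hypothetical item factors
nonzeros = {(u, i): y for u, row in enumerate(Y) for i, y in enumerate(row) if y > 0}
print(poisson_loglik_dense(Y, theta, beta), poisson_loglik_sparse(nonzeros, theta, beta))
```

Both functions return the same value, but the sparse one costs time proportional to the number of nonzero entries plus the factor sums, which is what makes the approach practical for large, sparse user/item matrices.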
A Biased Review of Sociophysics
Various aspects of recent sociophysics research are briefly reviewed: the
Schelling model as an example of a lack of interdisciplinary cooperation; opinion
dynamics; combat; and citation statistics as an example of strong
interdisciplinarity.
Comment: 16 pages for J. Stat. Phys., including 2 figures and numerous
references