    A new algorithm for zero-modified models applied to citation counts

    This is an accepted manuscript of an article published by Springer Nature in Scientometrics on 17/08/2020, available online: https://doi.org/10.1007/s11192-020-03654-8. The accepted version of the publication may differ from the final published version.
    Finding statistical models for citation count data is important for those seeking to understand the citing process or using regression to identify factors associated with citation rates. Because sets of citation counts often include more or fewer zeros (uncited articles) than the base distribution would predict, these zeros must be handled appropriately. This article proposes a new algorithm to fit zero-modified versions of discretised lognormal, hooked power-law and Weibull models to citation count data from 23 different Scopus categories from 2012. The new algorithm allows the standard errors of all parameter estimates to be calculated, and hence also confidence intervals and p-values. It can also estimate negative zero-modification parameters, corresponding to zero-deflation (fewer uncited articles than expected). The results show no universal best model for the 23 categories: a given dataset may be zero-inflated relative to one model but zero-deflated relative to another. We suggest circumstances in which one of the models under consideration may be the best-fitting model.
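
    To illustrate what a zero-modified model with a possibly negative modification parameter looks like, here is a minimal sketch. It uses the standard zero-modified construction P(0) = p + (1 - p) f(0) and P(k) = (1 - p) f(k) for k >= 1. Here f is a discretised lognormal, and p may go below zero down to -f(0)/(1 - f(0)), which produces zero-deflation. The discretisation K = floor(X), with X lognormal, is an assumption made for illustration and is not necessarily the one used in the article. The helper names are hypothetical. The sketch only evaluates the model and its log-likelihood. It does not reproduce the article's fitting algorithm or its standard errors.

```python
import math

def lognormal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of a lognormal X whose logarithm has mean mu and sd sigma."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def base_pmf(k: int, mu: float, sigma: float) -> float:
    """Assumed discretisation: P(K = k) where K = floor(X), X lognormal."""
    return lognormal_cdf(k + 1, mu, sigma) - lognormal_cdf(k, mu, sigma)

def zero_modified_pmf(k: int, p: float, mu: float, sigma: float) -> float:
    """Zero-modified pmf: p > 0 inflates zeros, p < 0 deflates them."""
    f0 = base_pmf(0, mu, sigma)
    lower = -f0 / (1.0 - f0)  # smallest p that keeps P(K = 0) >= 0
    if not lower <= p <= 1.0:
        raise ValueError(f"p must lie in [{lower:.4f}, 1]")
    if k == 0:
        return p + (1.0 - p) * f0
    return (1.0 - p) * base_pmf(k, mu, sigma)

def log_likelihood(counts, p: float, mu: float, sigma: float) -> float:
    """Log-likelihood of observed citation counts under the sketch model."""
    return sum(math.log(zero_modified_pmf(c, p, mu, sigma)) for c in counts)
```

    A fitting routine would maximise `log_likelihood` over (p, mu, sigma) while keeping p inside the valid range. With mu = 1 and sigma = 1, f(0) is about 0.159. In that case p = -0.1 lowers the probability of an uncited article below what the base distribution predicts.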

    Measuring academic influence: Not all citations are equal

    The importance of a research article is routinely measured by counting how many times it has been cited. However, treating all citations with equal weight ignores the wide variety of functions that citations perform. We aim to automatically identify the subset of references in a bibliography that have a central academic influence on the citing paper. For this purpose, we examine the effectiveness of a variety of features for determining the academic influence of a citation. By asking authors to identify the key references in their own work, we created a data set in which citations were labeled according to their academic influence. Using automatic feature selection with supervised machine learning, we found a model for predicting academic influence that achieves good performance on this data set using only four features. Among the features we evaluated, the best were those based on the number of times a reference is mentioned in the body of a citing paper. The performance of these features inspired us to design an influence-primed h-index (the hip-index). Unlike the conventional h-index, it weights citations by how many times a reference is mentioned. According to our experiments, the hip-index is a better indicator of researcher performance than the conventional h-index.
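
    The abstract does not give the exact formula for the hip-index. The sketch below is therefore only one hypothetical reading of "weights citations by how many times a reference is mentioned". Each of a researcher's papers gets a score equal to the total number of in-text mentions across its citing papers. The usual h-index rule is then applied to those scores. The function names and the input layout are assumptions.

```python
from typing import List, Sequence

def h_index(scores: Sequence[float]) -> int:
    """Largest h such that at least h papers have a score >= h."""
    ranked = sorted(scores, reverse=True)
    h = 0
    for rank, score in enumerate(ranked, start=1):
        if score >= rank:
            h = rank
        else:
            break
    return h

def hip_index_sketch(mentions_per_paper: List[List[int]]) -> int:
    """Hypothetical hip-index: mentions_per_paper[j] lists, for each paper
    citing paper j, how many times paper j is mentioned in its body."""
    weighted = [sum(mentions) for mentions in mentions_per_paper]
    return h_index(weighted)

# Four papers; each inner list holds one mention count per citing paper.
papers = [[4, 3], [2, 2], [1], [1, 1, 1]]
plain = h_index([len(m) for m in papers])  # conventional citation counts
weighted = hip_index_sketch(papers)
```

    In this toy example the plain citation counts are 2, 2, 1 and 3, which gives an h-index of 2. The mention-weighted scores are 7, 4, 1 and 3, which gives 3. Papers that are cited centrally, and so mentioned repeatedly, therefore count for more.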

    Modelling Citation Networks

    The distribution of the number of academic publications as a function of citation count is remarkably similar from one publication year to the next. We characterise this distribution by its width and find the width to be approximately constant across years. We show that simple citation models fail to capture this behaviour. We then provide a simple three-parameter citation network model, using a mixture of local and global search processes, which can reproduce the correct distribution over time. We use the citation network of papers from the hep-th section of arXiv to test our model. For these data, around 20% of citations use global information to reference recently published papers, while the remaining 80% are found using local searches. We note that this is consistent with other studies, though our motivation is very different from previous work. Finally, we also find that fluctuations in the size of an academic publication's bibliography are important for the model. This is not addressed in most models and needs further work.
    Comment: 29 pages, 22 figures
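
    The local/global mixture can be illustrated with a toy simulation. This is a minimal sketch and not the authors' three-parameter model. In a global step, a new paper picks a reference uniformly from the most recent `recent_window` papers. In a local step, it copies a reference from the bibliography of a paper it has already chosen. All parameter names and defaults are hypothetical. `p_global` = 0.2 only echoes the 20%/80% split reported for hep-th. Because the first reference is always found globally, the realised global fraction is somewhat higher than `p_global`.

```python
import random
from typing import List

def simulate_citations(n_papers: int, refs_per_paper: int = 5,
                       p_global: float = 0.2, recent_window: int = 50,
                       seed: int = 0) -> List[List[int]]:
    """Toy citation network; bibliographies[i] lists the papers i cites."""
    rng = random.Random(seed)
    bibliographies: List[List[int]] = [[]]  # paper 0 cites nothing
    for new in range(1, n_papers):
        refs = set()
        target = min(refs_per_paper, new)
        attempts = 0
        # Cap the attempts so local steps that find no new papers still terminate.
        while len(refs) < target and attempts < 100 * refs_per_paper:
            attempts += 1
            if not refs or rng.random() < p_global:
                # Global step: uniform choice among recently published papers.
                lo = max(0, new - recent_window)
                refs.add(rng.randrange(lo, new))
            else:
                # Local step: follow a reference of an already-chosen paper.
                seed_paper = rng.choice(sorted(refs))
                candidates = bibliographies[seed_paper]
                if candidates:
                    refs.add(rng.choice(candidates))
        bibliographies.append(sorted(refs))
    return bibliographies

def citation_counts(bibliographies: List[List[int]]) -> List[int]:
    """Number of times each paper is cited in the simulated network."""
    counts = [0] * len(bibliographies)
    for bib in bibliographies:
        for ref in bib:
            counts[ref] += 1
    return counts
```

    A histogram of `citation_counts(simulate_citations(5000))`, grouped by publication year, would let one check the width statistic described above. This toy version keeps the bibliography size fixed. The abstract notes that bibliography-size fluctuations matter, so a closer model would draw `refs_per_paper` from a distribution.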

    Scalable Recommendation with Poisson Factorization

    We develop a Bayesian Poisson matrix factorization model for forming recommendations from sparse user behavior data. These data are large user/item matrices in which each user has provided feedback on only a small subset of items, either explicitly (e.g., through star ratings) or implicitly (e.g., through views or purchases). In contrast to traditional matrix factorization approaches, Poisson factorization implicitly models each user's limited attention to consume items. Moreover, because of the mathematical form of the Poisson likelihood, the model needs to consider explicitly only the observed entries in the matrix, leading to both scalable computation and good predictive performance. We develop a variational inference algorithm for approximate posterior inference that scales up to massive data sets. This efficient algorithm iterates over the observed entries and adjusts an approximate posterior over the user/item representations. We apply our method to large real-world user data containing users rating movies, users listening to songs, and users reading scientific papers. In all these settings, Bayesian Poisson factorization outperforms state-of-the-art matrix factorization methods.
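
    The reason only the observed entries are needed can be seen directly in the likelihood. With y_ui ~ Poisson(theta_u . beta_i) and nonnegative latent vectors, the log-likelihood is a sum over nonzero entries of y log(rate) - log(y!), minus the total of all rates. That total factorises as (sum_u theta_u) . (sum_i beta_i), so no zero entry has to be visited. The sketch below checks this identity against a dense computation. It covers only the likelihood. The hierarchical priors and the variational inference algorithm described in the abstract are not shown. The function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def poisson_loglik_sparse(observed: Dict[Tuple[int, int], int],
                          theta: List[List[float]],
                          beta: List[List[float]]) -> float:
    """observed maps (user, item) -> positive count; zeros are never touched."""
    ll = 0.0
    for (u, i), y in observed.items():
        ll += y * math.log(dot(theta[u], beta[i])) - math.lgamma(y + 1)
    # Sum of all rates: sum_u sum_i theta_u . beta_i = (sum theta) . (sum beta)
    k = len(theta[0])
    theta_sum = [sum(t[j] for t in theta) for j in range(k)]
    beta_sum = [sum(b[j] for b in beta) for j in range(k)]
    return ll - dot(theta_sum, beta_sum)

def poisson_loglik_dense(y: List[List[int]], theta: List[List[float]],
                         beta: List[List[float]]) -> float:
    """Reference computation that visits every user/item cell."""
    ll = 0.0
    for u, row in enumerate(y):
        for i, count in enumerate(row):
            rate = dot(theta[u], beta[i])
            ll += count * math.log(rate) - rate - math.lgamma(count + 1)
    return ll

theta = [[0.5, 1.0], [2.0, 0.1], [0.3, 0.3]]            # 3 users, 2 factors
beta = [[1.0, 0.2], [0.4, 0.9], [0.7, 0.7], [0.1, 2.0]]  # 4 items
y = [[3, 0, 0, 1], [0, 0, 2, 0], [0, 1, 0, 0]]
observed = {(u, i): c for u, row in enumerate(y) for i, c in enumerate(row) if c > 0}
```

    For a matrix with U users, I items and N nonzero entries, the sparse version costs O(NK + (U + I)K) rather than O(UIK). This is the property the abstract credits for scalability.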

    A Biased Review of Sociophysics

    Various aspects of recent sociophysics research are briefly reviewed: the Schelling model as an example of a lack of interdisciplinary cooperation, opinion dynamics, combat, and citation statistics as an example of strong interdisciplinarity.
    Comment: 16 pages for J. Stat. Phys., including 2 figures and numerous references