26 research outputs found

    Runaway Events Dominate the Heavy Tail of Citation Distributions

    Statistical distributions with heavy tails are ubiquitous in natural and social phenomena. Since the entries in the heavy tail have disproportionate significance, knowledge of its exact shape is very important. Citations of scientific papers form one of the best-known heavy-tailed distributions. Even in this case there is considerable debate over whether the citation distribution follows a log-normal or a power-law fit. The goal of our study is to resolve this debate by measuring the citation distribution for a very large and homogeneous dataset. We measured the citation distribution for 418,438 Physics papers published in 1980-1989 and cited by 2008. While the log-normal fit deviates too strongly from the data, the discrete power-law function with the exponent γ = 3.15 does better and fits 99.955% of the data. However, the extreme tail of the distribution deviates upward even from the power-law fit and exhibits a dramatic "runaway" behavior. The onset of the runaway regime is revealed macroscopically as the paper garners 1000-1500 citations; however, microscopic measurements of autocorrelation in citation rates are able to predict this behavior in advance. Comment: 6 pages, 5 Figure
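
    The abstract above reports a discrete power-law exponent but does not say how it was estimated. As a rough illustration only, the sketch below uses a commonly cited closed-form approximation to the maximum-likelihood exponent of a discrete power law, γ ≈ 1 + n / Σ ln(k_i / (k_min − 1/2)), and checks it on synthetic data. The function names, the cutoff k_min = 10, and the synthetic sample are assumptions made for this example, not details taken from the paper.

```python
import math
import random


def estimate_exponent(citations, k_min):
    """Approximate MLE of gamma for p(k) ~ k^(-gamma), using only k >= k_min.

    Uses the closed-form approximation for discrete data; it is an
    illustration, not the fitting procedure of the paper.
    """
    tail = [k for k in citations if k >= k_min]
    if not tail:
        raise ValueError("no observations at or above k_min")
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1.0 + len(tail) / log_sum


def sample_power_law(n, gamma, k_min, rng):
    """Draw approximate discrete power-law samples (hypothetical test data).

    A continuous Pareto variate is drawn by inverse transform and then
    rounded to the nearest integer, so every sample is >= k_min.
    """
    samples = []
    for _ in range(n):
        u = rng.random()  # u in [0, 1), so 1 - u is never zero
        x = (k_min - 0.5) * (1.0 - u) ** (-1.0 / (gamma - 1.0)) + 0.5
        samples.append(int(math.floor(x)))
    return samples


if __name__ == "__main__":
    rng = random.Random(0)
    synthetic = sample_power_law(50_000, 3.15, 10, rng)
    print("estimated exponent:", estimate_exponent(synthetic, 10))
```

    The approximation is generally considered reliable only when k_min is not too small; for small cutoffs an exact discrete likelihood would be the safer choice.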

    Unsupervised Classification of SAR Images using Hierarchical Agglomeration and EM

    We implement an unsupervised classification algorithm for high-resolution Synthetic Aperture Radar (SAR) images. The algorithm is based on Classification Expectation-Maximization (CEM). To address two drawbacks of EM-type algorithms, namely initialization and model order selection, we combine the CEM algorithm with a hierarchical agglomeration strategy and a model order selection criterion called Integrated Completed Likelihood (ICL). We exploit amplitude statistics in a Finite Mixture Model (FMM), and a Multinomial Logistic (MnL) latent class label model for a mixture density to obtain spatially smooth class segments. We test our algorithm on TerraSAR-X data

    A reverse engineering approach to the suppression of citation biases reveals universal properties of citation distributions

    The large amount of information contained in bibliographic databases has recently boosted the use of citations, and other indicators based on citation numbers, as tools for the quantitative assessment of scientific research. Citation counts are often interpreted as proxies for the scientific influence of papers, journals, scholars, and institutions. However, a rigorous and scientifically grounded methodology for the correct use of citation counts is still missing. In particular, cross-disciplinary comparisons in terms of raw citation counts systematically favor scientific disciplines with higher citation and publication rates. Here we perform an exhaustive study of the citation patterns of millions of papers, and derive a simple transformation of citation counts able to suppress the disproportionate citation counts among scientific domains. We find that the transformation is well described by a power-law function, and that the parameter values of the transformation are typical features of each scientific discipline. Universal properties of citation patterns therefore descend from the fact that citation distributions for papers in a specific field are all part of the same family of univariate distributions. Comment: 9 pages, 6 figures. Supporting information files available at http://filrad.homelinux.or

    Non-Markovian polymer reaction kinetics

    Describing the kinetics of polymer reactions, such as the formation of loops and hairpins in nucleic acids or polypeptides, is complicated by the structural dynamics of their chains. Although both intramolecular reactions, such as cyclization, and intermolecular reactions have been studied extensively, both experimentally and theoretically, there is to date no exact explicit analytical treatment of transport-limited polymer reaction kinetics, even in the case of the simplest (Rouse) model of monomers connected by linear springs. We introduce a new analytical approach to calculate the mean reaction time of polymer reactions that encompasses the non-Markovian dynamics of monomer motion. This requires that the conformational statistics of the polymer at the very instant of reaction be determined, which provides, as a by-product, new information on the reaction path. We show that the typical reactive conformation of the polymer is more extended than the equilibrium conformation, which leads to reaction times significantly shorter than predicted by the existing classical Markovian theory. Comment: Main text (7 pages, 5 figures) + Supplementary Information (13 pages, 2 figures

    Supporting systematic reviews using LDA-based document representations

    BACKGROUND: Identifying relevant studies for inclusion in a systematic review (i.e. screening) is a complex, laborious and expensive task. Recently, a number of studies have shown that the use of machine learning and text mining methods to automatically identify relevant studies has the potential to drastically decrease the workload involved in the screening phase. The vast majority of these machine learning methods exploit the same underlying principle, i.e. a study is modelled as a bag-of-words (BOW). METHODS: We explore the use of topic modelling methods to derive a more informative representation of studies. We apply Latent Dirichlet allocation (LDA), an unsupervised topic modelling approach, to automatically identify topics in a collection of studies. We then represent each study as a distribution of LDA topics. Additionally, we enrich topics derived using LDA with multi-word terms identified by using an automatic term recognition (ATR) tool. For evaluation purposes, we carry out automatic identification of relevant studies using support vector machine (SVM)-based classifiers that employ both our novel topic-based representation and the BOW representation. RESULTS: Our results show that the SVM classifier is able to identify a greater number of relevant studies when using the LDA representation than when using the BOW representation. These observations hold for two systematic reviews of the clinical domain and three reviews of the social science domain. CONCLUSIONS: A topic-based feature representation of documents outperforms the BOW representation when applied to the task of automatic citation screening. The proposed term-enriched topics are more informative and less ambiguous to systematic reviewers. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13643-015-0117-0) contains supplementary material, which is available to authorized users

    Characterizing and modeling citation dynamics

    Citation distributions are crucial for the analysis and modeling of the activity of scientists. We investigated bibliometric data of papers published in journals of the American Physical Society, searching for the type of function which best describes the observed citation distributions. We used the goodness of fit with Kolmogorov-Smirnov statistics for three classes of functions: log-normal, simple power law and shifted power law. The shifted power law turns out to be the most reliable hypothesis for all citation networks we derived, which correspond to different time spans. We find that citation dynamics is characterized by bursts, usually occurring within a few years of publication of a paper, and the burst size spans several orders of magnitude. We also investigated the microscopic mechanisms for the evolution of citation networks, by proposing a linear preferential attachment with time-dependent initial attractiveness. The model successfully reproduces the empirical citation distributions and accounts for the presence of citation bursts as well. Comment: 8 pages, 5 figure
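
    To make the goodness-of-fit comparison mentioned above concrete, the following minimal sketch computes a Kolmogorov-Smirnov distance between an empirical citation distribution and a discrete shifted power law p(k) ∝ (k + k0)^(−γ). The functional form with shift k0, the truncation to a finite support [k_min, k_max], the function names, and the toy data are all assumptions made for illustration; the abstract does not specify the authors' exact procedure. Setting k0 = 0 gives a simple power law for comparison.

```python
def shifted_power_law_cdf(gamma, k0, k_min, k_max):
    """CDF of p(k) ~ (k + k0)^(-gamma), normalized on k_min..k_max.

    The finite upper cutoff k_max is an assumption that keeps the
    normalization a plain finite sum.
    """
    weights = [(k + k0) ** (-gamma) for k in range(k_min, k_max + 1)]
    total = sum(weights)
    cdf, running = [], 0.0
    for w in weights:
        running += w
        cdf.append(running / total)
    return cdf


def ks_distance(data, model_cdf, k_min):
    """Largest absolute gap between the empirical CDF and the model CDF."""
    k_max = k_min + len(model_cdf) - 1
    counts = [0] * len(model_cdf)
    for k in data:
        if k < k_min or k > k_max:
            raise ValueError("observation outside the model support")
        counts[k - k_min] += 1
    n = len(data)
    running, worst = 0, 0.0
    for count, model_value in zip(counts, model_cdf):
        running += count
        worst = max(worst, abs(running / n - model_value))
    return worst


if __name__ == "__main__":
    # Toy citation counts (hypothetical, not real bibliometric data).
    toy = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 12]
    for k0 in (0.0, 2.0):
        model = shifted_power_law_cdf(2.5, k0, 1, 12)
        print("k0 =", k0, "KS distance =", ks_distance(toy, model, 1))
```

    A smaller distance indicates a closer fit; in practice the parameters would be fitted first and the significance of the distance assessed separately.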

    Co-Authorship and Bibliographic Coupling Network Effects on Citations

    Climate change adaptation (CCA) has recently emerged as a new fundamental dimension to be considered in the planning and management of water resources. Because of the need to consider the already perceived changes in climate trends, variability and extremes, and their interactions with evolving social and ecological systems, water management is now facing new challenges. The research community is expected to contribute innovative methods and tools to support decision- and policy-makers. Decision Support Systems (DSSs) have a relatively long history in the water management sector. They are usually built on pre-existing hydrologic simulation models, providing interfaces that facilitate use beyond the limited group of model developers, and specific routines for decision making (e.g. optimization methods). In recent years, the traditional focus of DSS research has shifted away from the software component, towards the process of structuring problems and aiding decisions, thus including in particular robust methods for stakeholders' participation. The paper analyses the scientific literature, identifies the main open issues, and proposes an innovative operational approach for the implementation of participatory planning and decision-making processes for CCA in the water domain