26 research outputs found
Runaway Events Dominate the Heavy Tail of Citation Distributions
Statistical distributions with heavy tails are ubiquitous in natural and
social phenomena. Since the entries in the heavy tail have disproportionate
significance, knowledge of its exact shape is very important. Citations of
scientific papers form one of the best-known heavy tail distributions. Even in
this case there is considerable debate over whether the citation distribution
follows a log-normal or a power-law form. The goal of our study is to resolve
this debate by measuring the citation distribution for a very large and
homogeneous data set. We measured the citation distribution for 418,438 Physics
papers published in 1980-1989 and cited by 2008. While the log-normal fit
deviates too strongly from
the data, the discrete power-law function with the exponent does
better and fits 99.955% of the data. However, the extreme tail of the
distribution deviates upward even from the power-law fit and exhibits a
dramatic "runaway" behavior. The onset of the runaway regime is revealed
macroscopically as the paper garners 1000-1500 citations; however, the
microscopic measurements of autocorrelation in citation rates are able to
predict this behavior in advance. Comment: 6 pages, 5 figures
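The exponent of a discrete power-law tail, as discussed in the abstract above, can be estimated by maximum likelihood. The following is a minimal sketch, not the authors' fitting procedure: it uses the well-known continuous approximation to the discrete maximum-likelihood estimator, which is generally considered reasonable when the lower cutoff is not too small. The function names, the cutoff `k_min`, and the synthetic data generator are assumptions introduced here for illustration.

```python
import math
import random


def sample_discrete_power_law(n, alpha, k_min, seed=0):
    """Draw n approximate discrete power-law samples (hypothetical helper).

    A continuous power law with lower bound k_min - 0.5 is sampled by
    inverse transform and rounded to the nearest integer.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        u = rng.random()  # uniform in [0, 1)
        x = (k_min - 0.5) * (1.0 - u) ** (-1.0 / (alpha - 1.0))
        samples.append(int(math.floor(x + 0.5)))
    return samples


def powerlaw_exponent_mle(counts, k_min):
    """Approximate MLE of the exponent for the tail k >= k_min.

    Uses alpha ~= 1 + n / sum(ln(k_i / (k_min - 0.5))), the continuous
    approximation to the discrete estimator.
    """
    tail = [k for k in counts if k >= k_min]
    if not tail:
        raise ValueError("no observations at or above k_min")
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1.0 + len(tail) / log_sum


if __name__ == "__main__":
    # Synthetic citation counts with a known exponent, for a sanity check only.
    data = sample_discrete_power_law(20000, alpha=3.0, k_min=10)
    print(powerlaw_exponent_mle(data, k_min=10))
```

On real citation counts the choice of `k_min` matters, and a goodness-of-fit check would be needed before preferring the power law over a log-normal.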
Unsupervised Classification of SAR Images using Hierarchical Agglomeration and EM
We implement an unsupervised classification algorithm for high resolution Synthetic Aperture Radar (SAR) images. The algorithm is based on Classification Expectation-Maximization (CEM). To overcome two drawbacks of EM-type algorithms, namely initialization and model order selection, we combine the CEM algorithm with a hierarchical agglomeration strategy and a model order selection criterion called Integrated Completed Likelihood (ICL). We exploit amplitude statistics in a Finite Mixture Model (FMM), and a Multinomial Logistic (MnL) latent class label model for a mixture density to obtain spatially smooth class segments. We test our algorithm on TerraSAR-X data.
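The core CEM loop named in the abstract above alternates a hard classification step with a parameter update. The sketch below is a heavily simplified illustration under stated assumptions: it uses one-dimensional Gaussian classes instead of SAR amplitude distributions, and it omits the hierarchical agglomeration, the ICL criterion, and the MnL spatial model. The function name `cem_1d` and its parameters are hypothetical.

```python
import math


def cem_1d(data, init_means, n_iter=100, min_var=1e-6):
    """Classification EM for a 1-D Gaussian mixture (simplified sketch).

    Returns (labels, means, variances, weights).
    """
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    labels = []

    def log_score(x, j):
        # log(weight * Gaussian density), with the constant term dropped
        return (math.log(weights[j]) - 0.5 * math.log(variances[j])
                - (x - means[j]) ** 2 / (2.0 * variances[j]))

    for _ in range(n_iter):
        # C-step: hard-assign each point to its most probable non-empty class
        active = [j for j in range(k) if weights[j] > 0.0]
        new_labels = [max(active, key=lambda j: log_score(x, j)) for x in data]
        if new_labels == labels:
            break  # assignments are stable, so the parameters are too
        labels = new_labels
        # M-step: re-estimate parameters from the hard assignments
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            weights[j] = len(members) / len(data)
            if members:
                means[j] = sum(members) / len(members)
                var = sum((x - means[j]) ** 2 for x in members) / len(members)
                variances[j] = max(min_var, var)
    return labels, means, variances, weights


if __name__ == "__main__":
    values = [0.9, 1.0, 1.1, 1.2, 4.8, 5.0, 5.1, 5.3]
    print(cem_1d(values, init_means=[0.0, 6.0]))
```

The result depends on `init_means`, which illustrates the initialization sensitivity that the abstract says the agglomeration strategy is meant to remove.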
A reverse engineering approach to the suppression of citation biases reveals universal properties of citation distributions
The large amount of information contained in bibliographic databases has
recently boosted the use of citations, and other indicators based on citation
numbers, as tools for the quantitative assessment of scientific research.
Citation counts are often interpreted as proxies for the scientific influence
of papers, journals, scholars, and institutions. However, a rigorous and
scientifically grounded methodology for a correct use of citation counts is
still missing. In particular, cross-disciplinary comparisons in terms of raw
citation counts systematically favor scientific disciplines with higher
citation and publication rates. Here we perform an exhaustive study of the
citation patterns of millions of papers, and derive a simple transformation of
citation counts able to suppress the disproportionate citation counts among
scientific domains. We find that the transformation is well described by a
power-law function, and that the parameter values of the transformation are
typical features of each scientific discipline. Universal properties of
citation patterns descend therefore from the fact that citation distributions
for papers in a specific field are all part of the same family of univariate
distributions. Comment: 9 pages, 6 figures. Supporting information files available at
http://filrad.homelinux.or
Non-Markovian polymer reaction kinetics
Describing the kinetics of polymer reactions, such as the formation of loops
and hairpins in nucleic acids or polypeptides, is complicated by the structural
dynamics of their chains. Although both intramolecular reactions, such as
cyclization, and intermolecular reactions have been studied extensively, both
experimentally and theoretically, there is to date no exact explicit analytical
treatment of transport-limited polymer reaction kinetics, even in the case of
the simplest (Rouse) model of monomers connected by linear springs. We
introduce a new analytical approach to calculate the mean reaction time of
polymer reactions that encompasses the non-Markovian dynamics of monomer
motion. This requires that the conformational statistics of the polymer at the
very instant of reaction be determined, which provides, as a by-product, new
information on the reaction path. We show that the typical reactive
conformation of the polymer is more extended than the equilibrium conformation,
which leads to reaction times significantly shorter than predicted by the
existing classical Markovian theory. Comment: Main text (7 pages, 5 figures) + Supplementary Information (13 pages,
2 figures)
Supporting systematic reviews using LDA-based document representations
BACKGROUND: Identifying relevant studies for inclusion in a systematic review (i.e. screening) is a complex, laborious and expensive task. Recently, a number of studies have shown that the use of machine learning and text mining methods to automatically identify relevant studies has the potential to drastically decrease the workload involved in the screening phase. The vast majority of these machine learning methods exploit the same underlying principle, i.e. a study is modelled as a bag-of-words (BOW). METHODS: We explore the use of topic modelling methods to derive a more informative representation of studies. We apply Latent Dirichlet Allocation (LDA), an unsupervised topic modelling approach, to automatically identify topics in a collection of studies. We then represent each study as a distribution of LDA topics. Additionally, we enrich topics derived using LDA with multi-word terms identified by using an automatic term recognition (ATR) tool. For evaluation purposes, we carry out automatic identification of relevant studies using support vector machine (SVM)-based classifiers that employ both our novel topic-based representation and the BOW representation. RESULTS: Our results show that the SVM classifier is able to identify a greater number of relevant studies when using the LDA representation than the BOW representation. These observations hold for two systematic reviews of the clinical domain and three reviews of the social science domain. CONCLUSIONS: A topic-based feature representation of documents outperforms the BOW representation when applied to the task of automatic citation screening. The proposed term-enriched topics are more informative and less ambiguous to systematic reviewers. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13643-015-0117-0) contains supplementary material, which is available to authorized users.
Characterizing and modeling citation dynamics
Citation distributions are crucial for the analysis and modeling of the
activity of scientists. We investigated bibliometric data of papers published
in journals of the American Physical Society, searching for the type of
function which best describes the observed citation distributions. We used the
goodness of fit with Kolmogorov-Smirnov statistics for three classes of
functions: log-normal, simple power law and shifted power law. The shifted
power law turns out to be the most reliable hypothesis for all citation
networks we derived, which correspond to different time spans. We find that
citation dynamics is characterized by bursts, usually occurring within a few
years since publication of a paper, and the burst size spans several orders of
magnitude. We also investigated the microscopic mechanisms for the evolution of
citation networks, by proposing a linear preferential attachment with time
dependent initial attractiveness. The model successfully reproduces the
empirical citation distributions and accounts for the presence of citation
bursts as well. Comment: 8 pages, 5 figures
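The growth mechanism named in the abstract above, linear preferential attachment with a time-dependent initial attractiveness, can be illustrated with a toy simulation. This is a sketch under assumptions and not the authors' model: the exponential decay of the attractiveness with paper age, and the names `simulate_citations`, `a0`, and `tau`, are hypothetical choices made here.

```python
import math
import random


def simulate_citations(n_papers, refs_per_paper, a0, tau, seed=0):
    """Grow a citation network one paper at a time (toy model).

    Paper i is cited by new paper t with probability proportional to
    citations[i] + a0 * exp(-(t - i) / tau), i.e. its current citation
    count plus an attractiveness term that decays with age (assumed form).
    """
    rng = random.Random(seed)
    citations = [0]  # the first paper starts with no citations
    for t in range(1, n_papers):
        weights = [citations[i] + a0 * math.exp(-(t - i) / tau)
                   for i in range(t)]
        m = min(refs_per_paper, t)
        # Sampling is with replacement, so a new paper may cite the same
        # older paper more than once; acceptable for a toy model.
        for i in rng.choices(range(t), weights=weights, k=m):
            citations[i] += 1
        citations.append(0)
    return citations


if __name__ == "__main__":
    counts = simulate_citations(n_papers=200, refs_per_paper=3, a0=1.0, tau=10.0)
    print(max(counts), sum(counts) / len(counts))
```

Because the attractiveness term is largest for recent papers, young papers can collect citations quickly before the count-proportional term takes over, which is one simple way to obtain early bursts.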
Co-Authorship and Bibliographic Coupling Network Effects on Citations
Climate change adaptation (CCA) has recently emerged as a new fundamental dimension to be considered in the planning and management of water resources. Because of the need to consider the already perceived changes in climate trends, variability and extremes, and their interactions with evolving social and ecological systems, water management is now facing new challenges. The research community is expected to contribute innovative methods and tools to support decision- and policy-makers. Decision Support Systems (DSSs) have a relatively long history in the water management sector. They are usually developed upon pre-existing hydrologic simulation models, providing interfaces for facilitated use beyond the limited group of model developers, and specific routines for decision making (e.g. optimization methods). In recent years, the traditional focus of DSS research has shifted away from the software component, towards the process of structuring problems and aiding decisions, thus including in particular robust methods for stakeholders' participation. The paper analyses the scientific literature, identifies the main open issues, and proposes an innovative operational approach for the implementation of participatory planning and decision-making processes for CCA in the water domain.