Citation Statistics from 110 Years of Physical Review
Publicly available data reveal long-term systematic features about citation
statistics and how papers are referenced. The data also tell fascinating
citation histories of individual articles. Comment: This is essentially identical to the article that appeared in the
June 2005 issue of Physics Today
Tagging Scientific Publications using Wikipedia and Natural Language Processing Tools. Comparison on the ArXiv Dataset
In this work, we compare two simple methods of tagging scientific
publications with labels reflecting their content. Wikipedia is employed as the
first source of labels; the second label set is constructed from the noun phrases
occurring in the analyzed corpus. We examine the statistical properties and the
effectiveness of both approaches on a dataset consisting of abstracts from
0.7 million scientific documents deposited in the ArXiv preprint collection.
We believe that the obtained tags can later be applied as useful document
features in various machine learning tasks (document similarity, clustering,
topic modelling, etc.).
Heavy-Tailed Distribution of Cyber-Risks
With the development of the Internet, new kinds of massive epidemics,
distributed attacks, virtual conflicts and criminality have emerged. We present
a study of some striking statistical properties of cyber-risks that quantify
the distribution and time evolution of information risks on the Internet, to
understand their mechanisms, and create opportunities to mitigate, control,
predict and insure them at a global scale. First, we report an exceptionally
stable power-law tail distribution of personal identity losses per event, , with . This result is
robust against a surprisingly strong non-stationary growth of ID losses
culminating in July 2006 followed by a more stationary phase. Moreover, this
distribution is identical for different types and sizes of targeted
organizations. Since , the cumulative number of all losses over all events
up to time increases faster-than-linear with time according to
, suggesting that privacy, characterized by personal
identities, is necessarily becoming more and more insecure. We also show the
existence of a size effect, such that the largest possible ID losses per event
grow faster-than-linearly as with the organization size . The
small value of the power law distribution of ID losses is
explained by the interplay between Zipf's law and the size effect. We also
infer that compromised entities exhibit basically the same probability to incur
a small or large loss. Comment: 9 pages, 3 figures
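The tail exponent of a power-law loss distribution like the one this abstract describes can be estimated by maximum likelihood. The sketch below is only an illustration under stated assumptions, not the authors' procedure: it assumes a pure Pareto tail Pr(X >= x) = (x_min / x)^b above a known threshold `x_min`, and it runs on synthetic data. The function names, the threshold, and the exponent 0.8 are hypothetical and are not taken from the paper.

```python
import math
import random


def sample_pareto(n, b, x_min=1.0, seed=0):
    """Draw n synthetic losses with Pr(X >= x) = (x_min / x) ** b for x >= x_min."""
    rng = random.Random(seed)
    # Inverse-transform sampling: 1 - rng.random() lies in (0, 1], so the
    # negative power is always defined.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / b) for _ in range(n)]


def hill_estimator(values, x_min):
    """Maximum-likelihood (Hill) estimate of the tail exponent b above x_min."""
    tail = [v for v in values if v >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(v / x_min) for v in tail)
    if log_sum == 0.0:
        raise ValueError("all observations equal x_min; exponent is undefined")
    return len(tail) / log_sum


if __name__ == "__main__":
    # Illustrative exponent only; chosen arbitrarily, not the paper's value.
    losses = sample_pareto(20000, b=0.8, seed=1)
    print("estimated tail exponent:", hill_estimator(losses, x_min=1.0))
```

On real data the threshold `x_min` is not known in advance and has to be chosen, which this sketch does not attempt.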
Dragon-kings: mechanisms, statistical methods and empirical evidence
This introductory article presents the special Discussion and Debate volume
"From black swans to dragon-kings, is there life beyond power laws?" published
in Eur. Phys. J. Special Topics in May 2012. We summarize the contributions and
put them in perspective under three main themes: (i) mechanisms for
dragon-kings, (ii) detection of dragon-kings and statistical tests and (iii)
empirical evidence in a large variety of natural and social systems. Overall,
we are pleased to witness significant advances both in the introduction and
clarification of underlying mechanisms and in the development of novel
efficient tests that demonstrate clear evidence for the presence of
dragon-kings in many systems. However, this positive view should be balanced by
the fact that this remains a very delicate and difficult field, if only due to
the scarcity of data as well as the extraordinarily important implications with
respect to hazard assessment, risk control and predictability. Comment: 20 pages
Characterizing and modeling citation dynamics
Citation distributions are crucial for the analysis and modeling of the
activity of scientists. We investigated bibliometric data of papers published
in journals of the American Physical Society, searching for the type of
function which best describes the observed citation distributions. We used the
goodness of fit with Kolmogorov-Smirnov statistics for three classes of
functions: log-normal, simple power law and shifted power law. The shifted
power law turns out to be the most reliable hypothesis for all citation
networks we derived, which correspond to different time spans. We find that
citation dynamics is characterized by bursts, usually occurring within a few
years of a paper's publication, and the burst size spans several orders of
magnitude. We also investigated the microscopic mechanisms for the evolution of
citation networks, by proposing a linear preferential attachment with time
dependent initial attractiveness. The model successfully reproduces the
empirical citation distributions and accounts for the presence of citation
bursts as well. Comment: 8 pages, 5 figures
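The Kolmogorov-Smirnov comparison mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' fitting procedure: it assumes a discrete shifted power law P(k) proportional to (k + k0)^(-gamma) truncated to the support 0..k_max, and it only computes the KS distance for given parameters rather than fitting them. The function names and the parameter values are hypothetical.

```python
def shifted_power_law_cdf(k_max, gamma, k0):
    """CDF of P(k) ~ (k + k0) ** (-gamma) on k = 0..k_max (requires k0 > 0)."""
    if k0 <= 0:
        raise ValueError("k0 must be positive so that k = 0 has finite weight")
    weights = [(k + k0) ** (-gamma) for k in range(k_max + 1)]
    total = sum(weights)
    cdf, running = [], 0.0
    for w in weights:
        running += w
        cdf.append(running / total)
    return cdf


def ks_distance(citation_counts, model_cdf):
    """Largest absolute gap between the empirical CDF and the model CDF."""
    if not citation_counts:
        raise ValueError("citation_counts must not be empty")
    k_max = len(model_cdf) - 1
    histogram = [0] * (k_max + 1)
    for c in citation_counts:
        if not 0 <= c <= k_max:
            raise ValueError("citation count outside the model support")
        histogram[c] += 1
    n = len(citation_counts)
    running, worst = 0, 0.0
    for k in range(k_max + 1):
        running += histogram[k]
        worst = max(worst, abs(running / n - model_cdf[k]))
    return worst


if __name__ == "__main__":
    # Toy citation counts and arbitrary parameters, for illustration only.
    counts = [0, 0, 1, 1, 2, 3, 5, 8, 13, 40]
    model = shifted_power_law_cdf(k_max=50, gamma=2.5, k0=1.0)
    print("KS distance:", ks_distance(counts, model))
```

A smaller distance indicates a closer match between the candidate function and the observed citation counts. Comparing log-normal, simple power-law and shifted power-law candidates, as the abstract describes, would repeat this computation with each model's CDF.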
Urban road networks -- Spatial networks with universal geometric features? A case study on Germany's largest cities
Urban road networks have distinct geometric properties that are partially
determined by their (quasi-) two-dimensional structure. In this work, we study
these properties for 20 of the largest German cities. We find that the
small-scale geometry of all examined road networks is extremely similar. The
object-size distributions of road segments and the resulting cellular
structures are characterised by heavy tails. As a specific feature, a large
degree of rectangularity is observed in all networks, with link angle
distributions approximately described by stretched exponential functions. We
present a rigorous statistical analysis of the main geometric characteristics
and discuss their mutual interrelationships. Our results demonstrate the
fundamental importance of cost-efficiency constraints for the time evolution of
urban road networks. Comment: 16 pages; 8 figures
Characterizing Web Syndication Behavior and Content
Abstract. We are witnessing the widespread adoption of web syndication technologies such as RSS or Atom for the timely delivery of frequently updated Web content. Almost every personal weblog, news portal, or discussion forum nowadays employs RSS/Atom feeds to enhance pull-oriented searching and browsing of web pages with push-oriented protocols for web content. Social media applications such as Twitter or Facebook also employ RSS to notify users about newly available posts from their preferred friends. Unfortunately, previous work on the statistical characteristics of RSS/Atom feeds does not provide a precise and up-to-date characterization of feeds' behavior and content, a characterization that can be used to benchmark the effectiveness and efficiency of various RSS processing and analysis techniques. In this paper, we present the first thorough analysis of three complementary features of real-scale RSS feeds, namely publication activity, item structure and length, and the vocabulary of their content, which we believe are crucial for Web 2.0 applications.
Keywords: RSS/Atom feeds, publication activity, item structure and length, textual vocabulary composition and evolution