39 research outputs found

    Citation Statistics from 110 Years of Physical Review

    Publicly available data reveal long-term systematic features about citation statistics and how papers are referenced. The data also tell fascinating citation histories of individual articles. Comment: This is essentially identical to the article that appeared in the June 2005 issue of Physics Today.

    Tagging Scientific Publications using Wikipedia and Natural Language Processing Tools. Comparison on the ArXiv Dataset

    In this work, we compare two simple methods of tagging scientific publications with labels reflecting their content. Wikipedia serves as the first source of labels; the second label set is constructed from the noun phrases occurring in the analyzed corpus. We examine the statistical properties and the effectiveness of both approaches on a dataset consisting of abstracts from 0.7 million scientific documents deposited in the ArXiv preprint collection. We believe that the obtained tags can later be applied as useful document features in various machine learning tasks (document similarity, clustering, topic modelling, etc.).

    Heavy-Tailed Distribution of Cyber-Risks

    With the development of the Internet, new kinds of massive epidemics, distributed attacks, virtual conflicts and criminality have emerged. We present a study of some striking statistical properties of cyber-risks that quantify the distribution and time evolution of information risks on the Internet, to understand their mechanisms, and create opportunities to mitigate, control, predict and insure them at a global scale. First, we report an exceptionally stable power-law tail distribution of personal identity losses per event, ${\rm Pr}({\rm ID\ loss} \geq V) \sim 1/V^b$, with $b = 0.7 \pm 0.1$. This result is robust against a surprisingly strong non-stationary growth of ID losses culminating in July 2006 followed by a more stationary phase. Moreover, this distribution is identical for different types and sizes of targeted organizations. Since $b<1$, the cumulative number of all losses over all events up to time $t$ increases faster than linearly with time according to $\simeq t^{1/b}$, suggesting that privacy, characterized by personal identities, is necessarily becoming more and more insecure. We also show the existence of a size effect, such that the largest possible ID losses per event grow faster than linearly as $\sim S^{1.3}$ with the organization size $S$. The small value $b \simeq 0.7$ of the power law distribution of ID losses is explained by the interplay between Zipf's law and the size effect. We also infer that compromised entities exhibit basically the same probability to incur a small or large loss. Comment: 9 pages, 3 figures
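
The tail exponent reported in this abstract can be illustrated with a small simulation. The following Python sketch is not the authors' procedure: it assumes a pure Pareto tail with an arbitrary lower cutoff `v_min`, draws synthetic losses, and recovers $b$ with the standard Hill (maximum-likelihood) estimator. The function names, the cutoff, the sample size and the seed are all illustrative assumptions.

```python
import math
import random


def sample_pareto(n, b, v_min=1.0, seed=0):
    """Draw n values with Pr(X >= v) = (v_min / v)**b by inverse-transform sampling."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so the power below never divides by zero.
    return [v_min * (1.0 - rng.random()) ** (-1.0 / b) for _ in range(n)]


def hill_estimate(values, v_min=1.0):
    """Maximum-likelihood (Hill) estimate of the tail exponent b for values >= v_min."""
    tail = [v for v in values if v >= v_min]
    return len(tail) / sum(math.log(v / v_min) for v in tail)


# Synthetic "losses per event" with the exponent quoted in the abstract (assumed b = 0.7).
losses = sample_pareto(20000, b=0.7)
b_hat = hill_estimate(losses)

# Fraction of the total loss carried by the single largest event.
largest_share = max(losses) / sum(losses)

print(f"estimated b = {b_hat:.3f}")
print(f"share of total loss in the largest event = {largest_share:.3f}")
```

For $b<1$ the mean of such a distribution is infinite, so the running total is dominated by a few very large events. This is consistent with the faster-than-linear growth of cumulative losses described in the abstract.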

    Dragon-kings: mechanisms, statistical methods and empirical evidence

    This introductory article presents the special Discussion and Debate volume "From black swans to dragon-kings, is there life beyond power laws?" published in Eur. Phys. J. Special Topics in May 2012. We summarize the contributions and put them in perspective under three main themes: (i) mechanisms for dragon-kings, (ii) detection of dragon-kings and statistical tests and (iii) empirical evidence in a large variety of natural and social systems. Overall, we are pleased to witness significant advances both in the introduction and clarification of underlying mechanisms and in the development of novel efficient tests that demonstrate clear evidence for the presence of dragon-kings in many systems. However, this positive view should be balanced by the fact that this remains a very delicate and difficult field, if only due to the scarcity of data as well as the extraordinarily important implications with respect to hazard assessment, risk control and predictability. Comment: 20 pages

    Characterizing and modeling citation dynamics

    Citation distributions are crucial for the analysis and modeling of the activity of scientists. We investigated bibliometric data of papers published in journals of the American Physical Society, searching for the type of function which best describes the observed citation distributions. We used the goodness of fit with Kolmogorov-Smirnov statistics for three classes of functions: log-normal, simple power law and shifted power law. The shifted power law turns out to be the most reliable hypothesis for all citation networks we derived, which correspond to different time spans. We find that citation dynamics is characterized by bursts, usually occurring within a few years of a paper's publication, and the burst size spans several orders of magnitude. We also investigated the microscopic mechanisms for the evolution of citation networks, by proposing a linear preferential attachment with time-dependent initial attractiveness. The model successfully reproduces the empirical citation distributions and accounts for the presence of citation bursts as well. Comment: 8 pages, 5 figures
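
The goodness-of-fit comparison mentioned in this abstract can be sketched in a few lines. The Python example below is a simplified illustration, not the paper's method: it assumes a continuous shifted power law with density proportional to $(x + k_0)^{-\gamma}$ on $x \geq 0$, whereas citation counts are discrete, and the parameter values, function names and synthetic data are hypothetical. It computes the Kolmogorov-Smirnov distance between a sample and two candidate models.

```python
import random


def shifted_power_law_cdf(x, gamma, k0):
    """CDF of a continuous shifted power law, density ~ (x + k0)**(-gamma), x >= 0."""
    return 1.0 - (1.0 + x / k0) ** (1.0 - gamma)


def sample_shifted_power_law(n, gamma, k0, seed=0):
    """Inverse-transform sampling from the CDF above (requires gamma > 1)."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1]; the negative exponent maps it to [1, inf).
    return [k0 * ((1.0 - rng.random()) ** (1.0 / (1.0 - gamma)) - 1.0)
            for _ in range(n)]


def ks_distance(samples, cdf):
    """Largest gap between the empirical CDF of the samples and a model CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs(f - i / n), abs((i + 1) / n - f))
    return d


# Synthetic data from an assumed model (gamma = 3, k0 = 10).
data = sample_shifted_power_law(5000, gamma=3.0, k0=10.0)

d_true = ks_distance(data, lambda x: shifted_power_law_cdf(x, 3.0, 10.0))
d_wrong = ks_distance(data, lambda x: shifted_power_law_cdf(x, 3.0, 1.0))

print(f"KS distance, generating model: {d_true:.4f}")
print(f"KS distance, mismatched shift: {d_wrong:.4f}")
```

A smaller distance indicates a better match between the model and the data. Ranking candidate families such as the log-normal or the simple power law would follow the same pattern, with each family's CDF passed to `ks_distance` after fitting its parameters.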

    Urban road networks -- Spatial networks with universal geometric features? A case study on Germany's largest cities

    Urban road networks have distinct geometric properties that are partially determined by their (quasi-) two-dimensional structure. In this work, we study these properties for 20 of the largest German cities. We find that the small-scale geometry of all examined road networks is extremely similar. The object-size distributions of road segments and the resulting cellular structures are characterised by heavy tails. As a specific feature, a large degree of rectangularity is observed in all networks, with link angle distributions approximately described by stretched exponential functions. We present a rigorous statistical analysis of the main geometric characteristics and discuss their mutual interrelationships. Our results demonstrate the fundamental importance of cost-efficiency constraints for the time evolution of urban road networks. Comment: 16 pages; 8 figures

    Characterizing Web Syndication Behavior and Content

    Abstract. We are witnessing the widespread adoption of web syndication technologies such as RSS or Atom for the timely delivery of frequently updated Web content. Almost every personal weblog, news portal, or discussion forum nowadays employs RSS/Atom feeds to enhance pull-oriented searching and browsing of web pages with push-oriented protocols for web content. Social media applications such as Twitter or Facebook also employ RSS to notify users about newly available posts from their preferred friends. Unfortunately, previous works on the statistical characteristics of RSS/Atom do not provide a precise and up-to-date characterization of feeds' behavior and content, a characterization that can be used to benchmark the effectiveness and efficiency of various RSS processing and analysis techniques. In this paper, we present the first thorough analysis of three complementary features of real-scale RSS feeds, namely publication activity, item structure and length, and the vocabulary of their content, which we believe are crucial for Web 2.0 applications. Keywords: RSS/Atom Feeds, Publication activity, Items structure and length, textual vocabulary composition and evolution