281 research outputs found

    Searching for superspreaders of information in real-world social media

    A number of predictors have been suggested to detect the most influential spreaders of information in online social media across various domains such as Twitter or Facebook. In particular, degree, PageRank, k-core and other centralities have been adopted to rank the spreading capability of users in information dissemination media. So far, validation of the proposed predictors has been done by simulating the spreading dynamics rather than following real information flow in social networks. Consequently, only model-dependent, contradictory results have been achieved so far for the best predictor. Here, we address this issue directly. We search for influential spreaders by following the real spreading dynamics in a wide range of networks. We find that the widely used degree and PageRank fail in ranking users' influence. We find that the best spreaders are consistently located in the k-core across dissimilar social platforms such as Twitter, Facebook, Livejournal and scientific publishing in the American Physical Society. Furthermore, when the complete global network structure is unavailable, we find that the sum of the nearest neighbors' degree is a reliable local proxy for a user's influence. Our analysis provides practical instructions for optimal design of strategies for "viral" information dissemination in relevant applications. Comment: 12 pages, 7 figures
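The two structural predictors named in this abstract, the k-core (k-shell) index and the sum of the nearest neighbors' degrees, can be illustrated on a toy graph. The following is a minimal sketch, not the authors' code: the function names and the example graph are assumptions made for illustration, and the peeling loop is a simple quadratic-time version of the standard k-shell decomposition.

```python
from collections import defaultdict


def build_graph(edges):
    """Build undirected adjacency sets from an edge list (self-loops ignored)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def core_numbers(adj):
    """Return the k-shell index of every node.

    Repeatedly remove a node of minimum remaining degree; the shell index
    of a node is the largest minimum degree seen up to its removal.
    """
    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        u = min(remaining, key=degree.get)
        k = max(k, degree[u])
        core[u] = k
        remaining.remove(u)
        for v in adj[u]:
            if v in remaining:
                degree[v] -= 1
    return core


def neighbor_degree_sum(adj):
    """Local proxy: for each node, the sum of its neighbors' degrees."""
    return {u: sum(len(adj[v]) for v in adj[u]) for u in adj}


# Hypothetical toy graph: a 4-clique (a, b, c, d) plus a hub "h" with five leaves.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("h", "a")] + [("h", f"leaf{i}") for i in range(5)]
graph = build_graph(edges)
shell = core_numbers(graph)
local = neighbor_degree_sum(graph)
```

In this toy graph the hub has the highest degree but sits in the outermost shell, while the clique members sit in the innermost shell, which is the kind of disagreement between degree and k-core that the abstract refers to.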

    Ultimate periodicity of b-recognisable sets : a quasilinear procedure

    It is decidable whether a set of numbers, whose representation in a base b is a regular language, is ultimately periodic. This was established by Honkala in 1986. We give here a structural description of minimal automata that accept an ultimately periodic set of numbers. We then show that it can be verified in linear time whether a given minimal automaton meets this description. This yields an O(n log(n)) procedure for deciding whether a general deterministic automaton accepts an ultimately periodic set of numbers. Comment: presented at DLT 201

    Improving the presentation of search results by multipartite graph clustering of multiple reformulated queries and a novel document representation

    The goal of clustering web search results is to reveal the semantics of the retrieved documents. The main challenge is to make the clustering partition relevant to a user's query. In this paper, we describe a method of clustering search results using a similarity measure between documents retrieved by multiple reformulated queries. The method produces clusters of documents that are most relevant to the original query and, at the same time, represent a more diverse set of semantically related queries. In order to cluster thousands of documents in real time, we designed a novel multipartite graph clustering algorithm that has low polynomial complexity and no manually adjusted hyper-parameters. The loss of semantics resulting from the stem-based document representation is a common problem in information retrieval. To address this problem, we propose an alternative novel document representation, under which words are represented by their synonymy groups. This work was supported by Yandex grant 110104

    Two-dimensional ranking of Wikipedia articles

    The Library of Babel, described by Jorge Luis Borges, stores an enormous amount of information. The Library exists ab aeterno. Wikipedia, a free online encyclopaedia, becomes a modern analogue of such a Library. Information retrieval and ranking of Wikipedia articles become the challenge of modern society. While PageRank highlights very well-known nodes with many ingoing links, CheiRank highlights very communicative nodes with many outgoing links. In this way the ranking becomes two-dimensional. Using CheiRank and PageRank we analyze the properties of two-dimensional ranking of all English Wikipedia articles and show that it gives a reliable classification of them with rich and nontrivial features. Detailed studies are done for countries, universities, personalities, physicists, chess players, Dow-Jones companies and other categories. Comment: RevTex 9 pages, data, discussion added, more data at http://www.quantware.ups-tlse.fr/QWLIB/2drankwikipedia
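The contrast between the two rankings can be sketched with a small power iteration. CheiRank is computed here as PageRank on the same network with every link reversed. This is a minimal sketch under assumptions: the damping factor 0.85, the uniform treatment of dangling nodes, the function names and the toy link table are illustrative choices, not details taken from the abstract.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict mapping node -> list of link targets."""
    nodes = sorted(set(links) | {v for targets in links.values() for v in targets})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            targets = links.get(u, [])
            if targets:
                # Split this node's rank evenly among its outgoing links.
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
            else:
                # Dangling node (no outgoing links): spread its rank uniformly.
                share = damping * rank[u] / n
                for v in nodes:
                    new[v] += share
        rank = new
    return rank


def reverse_links(links):
    """Return the link table with every link direction inverted."""
    reversed_table = {}
    for u, targets in links.items():
        reversed_table.setdefault(u, [])
        for v in targets:
            reversed_table.setdefault(v, []).append(u)
    return reversed_table


def cheirank(links, **kwargs):
    """CheiRank: PageRank of the network with inverted link directions."""
    return pagerank(reverse_links(links), **kwargs)


# Hypothetical toy network: "hub" links out to many pages, "auth" is linked to by many.
toy = {"hub": ["a", "b", "c"], "a": ["auth"], "b": ["auth"], "c": ["auth"], "auth": []}
pr = pagerank(toy)
cr = cheirank(toy)
```

Plotting each node at the point (PageRank position, CheiRank position) gives the two-dimensional ranking described in the abstract; in the toy network, "auth" leads one axis and "hub" leads the other.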

    Worldwide spreading of economic crisis

    We model the spreading of a crisis by constructing a global economic network and applying the Susceptible-Infected-Recovered (SIR) epidemic model with a variable probability of infection. The probability of infection depends on the strength of economic relations between the pair of countries, and the strength of the target country. It is expected that a crisis which originates in a large country, such as the USA, has the potential to spread globally, like the recent crisis. Surprisingly, we show that countries with much lower GDP, such as Belgium, are also able to initiate a global crisis. Using the k-shell decomposition method to quantify the spreading power (of a node), we obtain a measure of "centrality" as a spreader of each country in the economic network. We thus rank the different countries according to the shell they belong to, and find the 12 most central countries. These countries are the most likely to spread a crisis globally. Of these 12 only six are large economies, while the other six are medium/small ones, a result that could not have been otherwise anticipated. Furthermore, we use our model to predict the crisis spreading potential of countries belonging to different shells according to the crisis magnitude. Comment: 13 pages, 4 figures and Supplementary Material

    Origins of power-law degree distribution in the heterogeneity of human activity in social networks

    The probability distribution of the number of ties of an individual in a social network follows a scale-free power law. However, how this distribution arises has not been conclusively demonstrated in direct analyses of people's actions in social networks. Here, we perform a causal inference analysis and find an underlying cause for this phenomenon. Our analysis indicates that the heavy-tailed degree distribution is causally determined by a similarly skewed distribution of human activity. Specifically, the degree of an individual is entirely random - following a "maximum entropy attachment" model - except for its mean value, which depends deterministically on the volume of the user's activity. This relation cannot be explained by interactive models, like preferential attachment, since the observed actions are not likely to be caused by interactions with other people. Comment: 23 pages, 5 figures

    An output-sensitive algorithm for the minimization of 2-dimensional String Covers

    String covers are a powerful tool for analyzing the quasi-periodicity of 1-dimensional data and find applications in automata theory, computational biology, coding and the analysis of transactional data. A cover of a string T is a string C for which every letter of T lies within some occurrence of C. String covers have been generalized in many ways, leading to k-covers, λ-covers and approximate covers, and have been studied in different contexts such as indeterminate strings. In this paper we generalize string covers to the context of 2-dimensional data, such as images. We show how they can be used for the extraction of textures from images and identification of primitive cells in lattice data. This has interesting applications in image compression, procedural terrain generation and crystallography.
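The 1-dimensional definition given in this abstract (every letter of T lies within some occurrence of C) can be checked directly. The following is a minimal sketch of that definition only; it is a brute-force check, not the output-sensitive 2-dimensional algorithm of the paper, and the function names are assumptions made for illustration.

```python
def occurrences(text, pattern):
    """Return the start index of every (possibly overlapping) occurrence of pattern."""
    starts = []
    i = text.find(pattern)
    while i != -1:
        starts.append(i)
        i = text.find(pattern, i + 1)
    return starts


def is_cover(text, candidate):
    """True if every position of text lies inside some occurrence of candidate."""
    if not candidate or len(candidate) > len(text):
        return False
    covered_up_to = 0  # positions [0, covered_up_to) are covered so far
    for start in occurrences(text, candidate):
        if start > covered_up_to:
            return False  # gap: position covered_up_to lies in no occurrence
        covered_up_to = start + len(candidate)
    return covered_up_to == len(text)


def shortest_cover(text):
    """Return the shortest cover of text by trying its prefixes in order of length.

    Any cover must be a prefix of text, because the first letter has to lie
    inside an occurrence that starts at position 0.
    """
    for length in range(1, len(text) + 1):
        if is_cover(text, text[:length]):
            return text[:length]
    return text
```

For example, `is_cover("abaababaaba", "aba")` holds because the occurrences at positions 0, 3, 5 and 8 overlap or touch and together span the whole string, while `is_cover("abcab", "ab")` fails because the letter `c` lies in no occurrence.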

    Studies of the limit order book around large price changes

    We study the dynamics of the limit order book of liquid stocks after experiencing large intra-day price changes. In the data we find large variations in several microscopical measures, e.g., the volatility, the bid-ask spread, the bid-ask imbalance, the number of queuing limit orders, the activity (number and volume) of limit orders placed and canceled, etc. The relaxation of these quantities is generally very slow and can be described by a power law with an exponent of approximately 0.4. We introduce a numerical model in order to understand the empirical results better. We find that with a zero intelligence deposition model of the order flow the empirical results can be reproduced qualitatively. This suggests that the slow relaxations might not be results of agents' strategic behaviour. Studying the difference between the exponents found empirically and numerically helps us to better identify the role of strategic behaviour in the phenomena. Comment: 19 pages, 7 figures

    A complementary view on the growth of directory trees

    Trees are a special sub-class of networks with unique properties, such as the level distribution, which has often been overlooked. We analyse a general tree growth model proposed by Klemm et al. (2005) to explain the growth of user-generated directory structures in computers. The model has a single parameter q which interpolates between preferential attachment and random growth. Our analysis results in three contributions: First, we propose a more efficient estimation method for q based on the degree distribution, which is one specific representation of the model. Next, we introduce the concept of a level distribution and analytically solve the model for this representation. This allows for an alternative and independent measure of q. We argue that, to capture real growth processes, the q estimations from the degree and the level distributions should coincide. Thus, we finally apply both representations to validate the model with synthetically generated tree structures, as well as with collected data of user directories. In the case of real directory structures, we show that the values of q measured from the level distribution are incompatible with those measured from the degree distribution. In contrast to this, we find perfect agreement in the case of simulated data. Thus, we conclude that the model is an incomplete description of the growth of real directory structures as it fails to reproduce the level distribution. This insight can be generalised to point out the importance of the level distribution for modeling tree growth. Comment: 16 pages, 7 figures
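The two representations compared in this abstract, the degree distribution and the level distribution of a growing tree, can be illustrated with a toy simulation. The growth rule below is a hypothetical stand-in and not the model of Klemm et al.: the abstract does not give the attachment rule, so the parameter `mix` (the probability of a preferential step rather than a uniformly random one) and all function names are assumptions made for illustration.

```python
import random
from collections import Counter


def grow_tree(n_nodes, mix, seed=0):
    """Grow a rooted tree one node at a time.

    Hypothetical rule: with probability `mix` the parent is chosen in
    proportion to its current number of children (preferential step),
    otherwise uniformly among existing nodes (random step).
    """
    rng = random.Random(seed)
    parent = {0: None}
    level = {0: 0}
    # A node appears in this list once per child it has, so a uniform draw
    # from the list is a draw proportional to child count.
    child_slots = []
    for new in range(1, n_nodes):
        if child_slots and rng.random() < mix:
            chosen = rng.choice(child_slots)
        else:
            chosen = rng.randrange(new)
        parent[new] = chosen
        level[new] = level[chosen] + 1
        child_slots.append(chosen)
    return parent, level


def level_distribution(level):
    """Number of nodes at each depth below the root."""
    return Counter(level.values())


def degree_distribution(parent):
    """Number of nodes having each child count."""
    children = Counter(p for p in parent.values() if p is not None)
    return Counter(children.get(node, 0) for node in parent)


parent, level = grow_tree(200, mix=0.5)
levels = level_distribution(level)
degrees = degree_distribution(parent)
```

Both histograms are computed from the same simulated tree, so a parameter estimated from one can be compared with a parameter estimated from the other, which is the consistency check the abstract describes.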