Searching for superspreaders of information in real-world social media
A number of predictors have been suggested to detect the most influential
spreaders of information in online social media across various domains such as
Twitter or Facebook. In particular, degree, PageRank, k-core and other
centralities have been adopted to rank the spreading capability of users in
information dissemination media. So far, validation of the proposed predictors
has been done by simulating the spreading dynamics rather than following real
information flow in social networks. Consequently, only model-dependent,
contradictory results have been obtained for the best predictor. Here,
we address this issue directly. We search for influential spreaders by
following the real spreading dynamics in a wide range of networks. We find that
the widely-used degree and PageRank fail in ranking users' influence. We find
that the best spreaders are consistently located in the k-core across
dissimilar social platforms such as Twitter, Facebook, Livejournal and
scientific publishing in the American Physical Society. Furthermore, when the
complete global network structure is unavailable, we find that the sum of the
nearest neighbors' degree is a reliable local proxy for a user's influence. Our
analysis provides practical guidance for the optimal design of strategies for
"viral" information dissemination in relevant applications.
Comment: 12 pages, 7 figures
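The k-shell peeling that locates these best spreaders, together with the neighbour-degree-sum proxy for when the global structure is unavailable, can be sketched in a few lines (a pure-Python illustration; the graph and node names are invented for the example):

```python
def core_numbers(adj):
    """k-shell decomposition by iterative peeling: repeatedly remove a
    minimum-degree node; its core number is the running peeling threshold."""
    deg = {v: len(ns) for v, ns in adj.items()}
    alive = set(adj)
    core, k = {}, 0
    while alive:
        v = min(alive, key=lambda u: deg[u])  # O(n^2) sketch; bucket queues give O(m)
        k = max(k, deg[v])
        core[v] = k
        alive.remove(v)
        for u in adj[v]:
            if u in alive:
                deg[u] -= 1
    return core

def neighbor_degree_sum(adj, v):
    """Local proxy for spreading influence when only the neighbourhood is
    visible: the sum of the degrees of v's nearest neighbours."""
    return sum(len(adj[u]) for u in adj[v])

# A 4-clique (core number 3) with one pendant node "e" (core number 1).
adj = {
    "a": {"b", "c", "d", "e"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c"},
    "e": {"a"},
}
```

The clique members share the innermost shell even though the pendant's neighbour "a" has the highest degree, which is the kind of degree-versus-core disagreement the abstract describes.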
Ultimate periodicity of b-recognisable sets: a quasilinear procedure
It is decidable if a set of numbers, whose representation in a base b is a
regular language, is ultimately periodic. This was established by Honkala in
1986.
We give here a structural description of the minimal automata that accept an
ultimately periodic set of numbers. We then show that it can be verified in
linear time whether a given minimal automaton meets this description.
Since a deterministic automaton with n states can be minimised in O(n log n)
time, this yields an O(n log n) procedure for deciding whether a general
deterministic automaton accepts an ultimately periodic set of numbers.
Comment: presented at DLT 201
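The notion being decided can be illustrated directly from its definition. The sketch below brute-forces a threshold/period pair over a finite membership window; it is a didactic check of the definition, not the paper's quasilinear automaton-based procedure:

```python
def is_ultimately_periodic(bits, max_threshold, max_period):
    """Brute-force check of the definition on a finite window: a set S is
    ultimately periodic if there exist n0 and p > 0 such that for all
    n >= n0, n is in S iff n + p is in S. `bits[n]` is the membership
    indicator of n. The check only sees the window, so a positive answer
    is heuristic; returns (n0, p) if a witness is found, else None."""
    N = len(bits)
    for n0 in range(max_threshold + 1):
        for p in range(1, max_period + 1):
            if all(bits[n] == bits[n + p] for n in range(n0, N - p)):
                return (n0, p)
    return None
```

For the multiples of 3 this finds threshold 0 and period 3; for a set with an irregular head, such as {1} together with the even numbers from 4 on, it finds a positive threshold.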
Improving the presentation of search results by multipartite graph clustering of multiple reformulated queries and a novel document representation
The goal of clustering web search results is to reveal the semantics of the retrieved documents. The main challenge is to make the clustering partition relevant to a user's query. In this paper, we describe a method of clustering search results using a similarity measure between documents retrieved by multiple reformulated queries. The method produces clusters of documents that are most relevant to the original query and, at the same time, represent a more diverse set of semantically related queries. In order to cluster thousands of documents in real time, we designed a novel multipartite graph clustering algorithm that has low polynomial complexity and no manually adjusted hyperparameters. The loss of semantics resulting from stem-based document representation is a common problem in information retrieval. To address this problem, we propose an alternative novel document representation, under which words are represented by their synonymy groups.
This work was supported by Yandex grant 110104.
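The abstract does not spell out the similarity measure, so the sketch below uses one plausible stand-in: documents are scored by the Jaccard overlap of the reformulated queries that retrieved them (query and document identifiers are invented for the example, and the paper's actual measure and clustering algorithm may differ):

```python
from collections import defaultdict

def co_retrieval_similarity(query_results):
    """Given {query: set_of_retrieved_doc_ids}, score each document pair by
    the Jaccard overlap of the queries that retrieved them. Documents pulled
    in by the same reformulations end up close, which is the raw material a
    multipartite clustering step could then partition."""
    docs_to_queries = defaultdict(set)
    for q, docs in query_results.items():
        for d in docs:
            docs_to_queries[d].add(q)
    sims = {}
    ids = sorted(docs_to_queries)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            qa, qb = docs_to_queries[a], docs_to_queries[b]
            sims[(a, b)] = len(qa & qb) / len(qa | qb)
    return sims
```

Documents retrieved by exactly the same reformulations get similarity 1.0; documents sharing only some queries get a proportionally lower score.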
Two-dimensional ranking of Wikipedia articles
The Library of Babel, described by Jorge Luis Borges, stores an enormous
amount of information. The Library exists ab aeterno. Wikipedia, a free
online encyclopaedia, has become a modern analogue of such a Library. Information
retrieval and ranking of Wikipedia articles have become a challenge for modern
society. While PageRank highlights very well known nodes with many ingoing
links, CheiRank highlights very communicative nodes with many outgoing links.
In this way the ranking becomes two-dimensional. Using CheiRank and PageRank we
analyze the properties of two-dimensional ranking of all Wikipedia English
articles and show that it gives their reliable classification with rich and
nontrivial features. Detailed studies are done for countries, universities,
personalities, physicists, chess players, Dow-Jones companies and other
categories.
Comment: RevTeX, 9 pages; data and discussion added; more data at
http://www.quantware.ups-tlse.fr/QWLIB/2drankwikipedia
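Two-dimensional ranking can be sketched by running PageRank twice, once on the original link graph and once on the graph with every link inverted (the second run is CheiRank); the toy graph below is illustrative:

```python
def pagerank(links, damping=0.85, iters=100):
    """Power iteration for PageRank on {node: list_of_out_links}.
    Dangling nodes spread their mass uniformly over all nodes."""
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            out = links[v]
            if out:
                share = damping * rank[v] / len(out)
                for u in out:
                    new[u] += share
            else:  # dangling node
                for u in nodes:
                    new[u] += damping * rank[v] / n
        rank = new
    return rank

def cheirank(links, **kw):
    """CheiRank is PageRank computed on the graph with all links inverted,
    so it highlights communicative nodes with many outgoing links."""
    inv = {v: [] for v in links}
    for v, outs in links.items():
        for u in outs:
            inv[u].append(v)
    return pagerank(inv, **kw)
```

On a toy graph where "a" links out to everything and "d" only receives links, PageRank tops out at "d" while CheiRank tops out at "a": each article gets a coordinate on both axes, which is the two-dimensional classification the abstract describes.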
Worldwide spreading of economic crisis
We model the spreading of a crisis by constructing a global economic network
and applying the Susceptible-Infected-Recovered (SIR) epidemic model with a
variable probability of infection. The probability of infection depends on the
strength of economic relations between the pair of countries, and the strength
of the target country. It is expected that a crisis which originates in a large
country, such as the USA, has the potential to spread globally, like the recent
crisis. Surprisingly, we show that countries with much lower GDP, such as
Belgium, can also initiate a global crisis. Using the k-shell
decomposition method to quantify the spreading power of a node, we obtain a
measure of "centrality" as a spreader for each country in the economic
network. We thus rank the different countries according to the shell they
belong to, and find the 12 most central countries. These countries are the most
likely to spread a crisis globally. Of these 12 only six are large economies,
while the other six are medium/small ones, a result that could not have been
otherwise anticipated. Furthermore, we use our model to predict the crisis
spreading potential of countries belonging to different shells according to the
crisis magnitude.
Comment: 13 pages, 4 figures and Supplementary Material
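A minimal sketch of such an SIR spread on a weighted economic network is given below; the infection probability p = 1 - exp(-beta * w_ij / s_j), growing with the link weight and shrinking with the target country's strength, is an assumed functional form, not the paper's calibrated one, and the country graph is invented:

```python
import math
import random

def sir_on_weighted_network(weights, strength, seed, beta=0.5, rng=None):
    """SIR with a unit infectious period: each step, every infected country i
    tries to infect each susceptible trade partner j with probability
    p = 1 - exp(-beta * w_ij / s_j), then recovers. `weights` maps each
    country to {partner: trade_weight}; `strength` gives each country's
    economic strength. Returns the set of countries the crisis reached."""
    rng = rng or random.Random(0)
    infected = {seed}
    recovered = set()
    while infected:
        new_infected = set()
        for i in infected:
            for j, w in weights[i].items():
                if j in infected or j in recovered or j in new_infected:
                    continue
                p = 1.0 - math.exp(-beta * w / strength[j])
                if rng.random() < p:
                    new_infected.add(j)
        recovered |= infected
        infected = new_infected
    return recovered
```

With a very large beta the crisis deterministically reaches every connected country, and with beta = 0 it never leaves the seed, which brackets the stochastic regime in between.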
Origins of power-law degree distribution in the heterogeneity of human activity in social networks
The probability distribution of the number of ties of an individual in a social
network follows a scale-free power law. However, how this distribution arises
has not been conclusively demonstrated in direct analyses of people's actions
in social networks. Here, we perform a causal inference analysis and find an
underlying cause for this phenomenon. Our analysis indicates that the heavy-tailed
degree distribution is causally determined by a similarly skewed distribution of
human activity. Specifically, the degree of an individual is entirely random -
following a "maximum entropy attachment" model - except for its mean value
which depends deterministically on the volume of the users' activity. This
relation cannot be explained by interactive models, like preferential
attachment, since the observed actions are not likely to be caused by
interactions with other people.
Comment: 23 pages, 5 figures
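The "degree is random except for its mean" picture can be sketched as follows; the geometric law (a maximum-entropy distribution on the non-negative integers for a given mean) and the linear mean-activity relation are illustrative assumptions, not necessarily the paper's exact attachment model:

```python
import random

def sample_degrees(activities, c=1.0, rng=None):
    """Draw each user's degree from a geometric distribution on {0, 1, 2, ...}
    whose mean is c * activity: the degree is otherwise entirely random, but
    its mean is fixed deterministically by the user's activity volume. If the
    activity distribution is heavy-tailed, the degrees inherit the heavy tail."""
    rng = rng or random.Random(42)
    degrees = []
    for a in activities:
        mean = c * a
        p = 1.0 / (1.0 + mean)   # success prob giving mean failures (1-p)/p = mean
        k = 0
        while rng.random() > p:  # count failures before the first success
            k += 1
        degrees.append(k)
    return degrees
```

Feeding in a skewed activity list (e.g. Pareto-distributed values) produces a correspondingly skewed degree sample without any interaction between users, which is the point of contrast with preferential attachment.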
An output-sensitive algorithm for the minimization of 2-dimensional String Covers
String covers are a powerful tool for analyzing the quasi-periodicity of
1-dimensional data and find applications in automata theory, computational
biology, coding, and the analysis of transactional data. A cover of a
string S is a string C for which every letter of S lies within some
occurrence of C. String covers have been generalized in many ways, leading to
k-covers, λ-covers and approximate covers, and have been
studied in different contexts such as indeterminate strings.
In this paper we generalize string covers to the context of 2-dimensional
data, such as images. We show how they can be used for the extraction of
textures from images and identification of primitive cells in lattice data.
This has interesting applications in image compression, procedural terrain
generation and crystallography.
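The 1-dimensional cover relation underlying these generalizations can be checked naively (the 2-dimensional case replaces substrings with sub-arrays; this sketch handles only the 1-D definition, with quadratic rather than output-sensitive complexity):

```python
def covers(c, s):
    """True iff c is a cover of s: every position of s lies inside some
    occurrence of c in s. Naive O(|s| * |c|) check for illustration."""
    m, n = len(c), len(s)
    if m == 0 or m > n:
        return False
    covered = [False] * n
    for i in range(n - m + 1):
        if s[i:i + m] == c:          # occurrence of c starting at i
            for j in range(i, i + m):
                covered[j] = True
    return all(covered)
```

For example, "aba" covers "ababa" (its two overlapping occurrences reach every position), while "ab" does not (the final "a" is uncovered).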
Studies of the limit order book around large price changes
We study the dynamics of the limit order book of liquid stocks after
experiencing large intra-day price changes. In the data we find large
variations in several microscopic measures, e.g., the volatility, the bid-ask
spread, the bid-ask imbalance, the number of queuing limit orders, and the
activity (number and volume) of limit orders placed and canceled. The
relaxation of these quantities is generally very slow and can be described by
a power law. We introduce a numerical model in order to understand
the empirical results better. We find that with a zero intelligence deposition
model of the order flow the empirical results can be reproduced qualitatively.
This suggests that the slow relaxations might not be the result of agents'
strategic behaviour. Studying the difference between the exponents found
empirically and numerically helps us to better identify the role of strategic
behaviour in the phenomenon.
Comment: 19 pages, 7 figures
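A stripped-down zero-intelligence deposition model, with random limit-order placement and independent cancellation, can be sketched as below; the price grid, the rates, and the fixed reference price are invented simplifications of the paper's model:

```python
import random

def zero_intelligence_spread(steps=2000, band=50, cancel_p=0.05, rng=None):
    """Minimal zero-intelligence order book: each step one buy limit order
    arrives at a uniform random price below a fixed reference price and one
    sell order above it, and every resting order is cancelled independently
    with probability cancel_p. No strategic behaviour enters anywhere.
    Returns the time series of the bid-ask spread."""
    rng = rng or random.Random(1)
    bids, asks = set(), set()        # price levels holding resting orders
    ref = band                       # reference mid price on the grid [0, 2*band]
    spreads = []
    for _ in range(steps):
        bids.add(rng.randrange(0, ref))                # deposition: one buy...
        asks.add(rng.randrange(ref + 1, 2 * band + 1)) # ...and one sell
        bids = {p for p in bids if rng.random() > cancel_p}
        asks = {p for p in asks if rng.random() > cancel_p}
        if bids and asks:
            spreads.append(min(asks) - max(bids))
    return spreads
```

Running this after artificially clearing one side of the book gives a purely mechanical relaxation of the spread, the kind of non-strategic baseline against which the empirical exponents can be compared.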
A complementary view on the growth of directory trees
Trees are a special sub-class of networks with unique properties, such as the
level distribution, which has often been overlooked. We analyse a general tree
growth model proposed by Klemm et al. (2005) to explain the growth of
user-generated directory structures in computers. The model has a single
parameter which interpolates between preferential attachment and random
growth. Our analysis results in three contributions: First, we propose a more
efficient estimation method for this parameter based on the degree distribution, which is
one specific representation of the model. Next, we introduce the concept of a
level distribution and analytically solve the model for this representation.
This allows for an alternative and independent measure of the parameter. We argue that,
to capture real growth processes, the estimations from the degree and the
level distributions should coincide. Thus, we finally apply both
representations to validate the model with synthetically generated tree
structures, as well as with collected data of user directories. In the case of
real directory structures, we show that the parameter values measured from the
level distribution are incompatible with those measured from the degree
distribution.
In contrast to this, we find perfect agreement in the case of simulated data.
Thus, we conclude that the model is an incomplete description of the growth of
real directory structures as it fails to reproduce the level distribution. This
insight can be generalised to point out the importance of the level
distribution for modeling tree growth.
Comment: 16 pages, 7 figures
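A generic one-parameter tree growth of this kind can be sketched as follows; choosing the parent preferentially by degree with probability q and uniformly at random otherwise is an illustrative interpolation, not necessarily Klemm et al.'s exact attachment rule:

```python
import random

def grow_tree(n, q, rng=None):
    """Grow a rooted tree of n nodes. Each new node picks its parent
    preferentially (weight = current degree + 1) with probability q, and
    uniformly over all existing nodes otherwise. Returns (parent, level)
    dicts, where level[v] is the depth of v below the root: the degree
    distribution and the level distribution give two independent views
    of the same growth process."""
    rng = rng or random.Random(7)
    parent = {0: None}
    level = {0: 0}
    pool = [0]                       # node v appears (degree + 1) times
    for v in range(1, n):
        if rng.random() < q:
            p = rng.choice(pool)     # preferential attachment
        else:
            p = rng.randrange(v)     # uniform random attachment
        parent[v] = p
        level[v] = level[p] + 1
        pool.extend([p, v])          # p gained a child; v enters with weight 1
    return parent, level
```

Fitting q separately from the degree histogram and from the level histogram of such simulated trees should agree, which is exactly the consistency check the paper applies to real directory data.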