Google matrix analysis of DNA sequences
For DNA sequences of various species we construct the Google matrix G of
Markov transitions between nearby words composed of several letters. The
statistical distribution of matrix elements of this matrix is shown to be
described by a power law with the exponent being close to those of outgoing
links in such scale-free networks as the World Wide Web (WWW). At the same time
the sum of ingoing matrix elements is characterized by the exponent being
significantly larger than those typical for WWW networks. This results in a
slow algebraic decay of the PageRank probability determined by the distribution
of ingoing elements. The spectrum of G is characterized by a large gap leading
to a rapid relaxation process on the DNA sequence networks. We introduce the
PageRank proximity correlator between different species which determines their
statistical similarity from the view point of Markov chains. The properties of
other eigenstates of the Google matrix are also discussed. Our results
establish scale-free features of DNA sequence networks showing their
similarities and distinctions with the WWW and linguistic networks.
Comment: latex, 11 fig
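A minimal sketch of the construction described in this abstract, not the authors' code: it counts transitions between consecutive words of a DNA string, builds a column-stochastic Google matrix, and obtains the PageRank vector by power iteration. The function names, the non-overlapping word split, the damping factor alpha = 0.85, and the uniform replacement of empty (dangling) columns are all assumptions made for illustration.

```python
from itertools import product


def google_matrix(sequence, word_len=2, alpha=0.85):
    """Build a column-stochastic Google matrix from word transitions.

    Assumptions (not taken from the abstract): words are non-overlapping
    chunks of `word_len` letters over the alphabet ACGT, alpha is the usual
    damping factor, and columns with no outgoing transitions are replaced
    by a uniform column.
    """
    words = ["".join(p) for p in product("ACGT", repeat=word_len)]
    index = {w: i for i, w in enumerate(words)}
    n = len(words)

    # counts[i][j] = number of observed transitions from word j to word i
    counts = [[0.0] * n for _ in range(n)]
    chunks = [sequence[k:k + word_len]
              for k in range(0, len(sequence) - word_len + 1, word_len)]
    for src, dst in zip(chunks, chunks[1:]):
        if src in index and dst in index:  # skip words with unknown letters
            counts[index[dst]][index[src]] += 1.0

    G = [[0.0] * n for _ in range(n)]
    for j in range(n):
        col_sum = sum(counts[i][j] for i in range(n))
        for i in range(n):
            s = counts[i][j] / col_sum if col_sum > 0 else 1.0 / n
            G[i][j] = alpha * s + (1.0 - alpha) / n
    return words, G


def pagerank(G, tol=1e-12, max_iter=1000):
    """Power iteration for the leading right eigenvector of G (eigenvalue 1)."""
    n = len(G)
    p = [1.0 / n] * n
    for _ in range(max_iter):
        q = [sum(G[i][j] * p[j] for j in range(n)) for i in range(n)]
        if sum(abs(a - b) for a, b in zip(p, q)) < tol:
            return q
        p = q
    return p
```

For example, `words, G = google_matrix("ACGTACGGTTAACCGGTA")` followed by `pagerank(G)` yields a probability vector over the 16 two-letter words. Real genomes and longer words would call for sparse storage, which this sketch omits.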
Towards the characterization of individual users through Web analytics
We perform an analysis of the way individual users navigate the Web. We
focus primarily on the temporal patterns of their returns to a given page. The
return probability as a function of time as well as the distribution of time
intervals between consecutive visits are measured and found to be independent
of the level of activity of single users. The results indicate a rich variety
of individual behaviors and seem to preclude the possibility of defining a
characteristic frequency for each user in his/her visits to a single site.
Comment: 8 pages, 4 figures. To appear in Proceeding of Complex'0
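The quantity measured in this abstract, the time interval between consecutive visits of a user to the same page, can be sketched as follows. The log format, a sequence of (user, page, timestamp) tuples, and the function name are hypothetical; the abstract does not describe the actual data layout.

```python
from collections import defaultdict


def return_intervals(visits):
    """Map each user to the intervals between consecutive visits to a page.

    `visits` is an iterable of (user, page, timestamp) tuples; this layout
    is an assumption made for illustration. Intervals are pooled over all
    pages visited by the same user.
    """
    times = defaultdict(list)
    for user, page, t in visits:
        times[(user, page)].append(t)

    intervals = defaultdict(list)
    for (user, _page), stamps in times.items():
        stamps.sort()
        # Differences between successive visits to the same page.
        intervals[user].extend(b - a for a, b in zip(stamps, stamps[1:]))
    return dict(intervals)
```

A histogram of each user's list then gives an empirical inter-visit distribution that can be compared across users with different activity levels.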
Centrality measures for graphons: Accounting for uncertainty in networks
As relational datasets modeled as graphs keep increasing in size and their
data-acquisition is permeated by uncertainty, graph-based analysis techniques
can become computationally and conceptually challenging. In particular, node
centrality measures rely on the assumption that the graph is perfectly known --
a premise not necessarily fulfilled for large, uncertain networks. Accordingly,
centrality measures may fail to faithfully extract the importance of nodes in
the presence of uncertainty. To mitigate these problems, we suggest a
statistical approach based on graphon theory: we introduce formal definitions
of centrality measures for graphons and establish their connections to
classical graph centrality measures. A key advantage of this approach is that
centrality measures defined at the modeling level of graphons are inherently
robust to stochastic variations of specific graph realizations. Using the
theory of linear integral operators, we define degree, eigenvector, Katz and
PageRank centrality functions for graphons and establish concentration
inequalities demonstrating that graphon centrality functions arise naturally as
limits of their counterparts defined on sequences of graphs of increasing size.
The same concentration inequalities also provide high-probability bounds
between the graphon centrality functions and the centrality measures on any
sampled graph, thereby establishing a measure of uncertainty of the measured
centrality score.
Comment: Authors ordered alphabetically, all authors contributed equally. 21
pages, 7 figure
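As a rough illustration of the limit statement in this abstract, the sketch below compares a degree centrality function of a graphon W, taken here to be d(x) = integral of W(x, y) over y in [0, 1], with the normalized degrees of a graph sampled from W. The integral form of d(x), the sampling scheme (uniform latent positions, independent Bernoulli edges), and all function names are assumptions chosen for illustration, not the paper's definitions or code.

```python
import random


def graphon_degree(W, x, grid=200):
    """Approximate d(x) = integral_0^1 W(x, y) dy with the midpoint rule."""
    return sum(W(x, (k + 0.5) / grid) for k in range(grid)) / grid


def sample_graph(W, n, rng):
    """Sample an undirected graph on n nodes from the graphon W.

    Each node gets a uniform latent position in [0, 1]; nodes i and j are
    joined independently with probability W(x_i, x_j).
    """
    latents = [rng.random() for _ in range(n)]
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < W(latents[i], latents[j]):
                adj[i][j] = adj[j][i] = 1
    return latents, adj


def max_degree_gap(W, n, seed=0):
    """Largest gap between normalized sampled degrees and d(x_i)."""
    rng = random.Random(seed)
    latents, adj = sample_graph(W, n, rng)
    return max(abs(sum(adj[i]) / (n - 1) - graphon_degree(W, latents[i]))
               for i in range(n))
```

With the product graphon `W = lambda x, y: x * y`, for which d(x) = x/2, calling `max_degree_gap(W, n)` for growing `n` should show the gap shrinking, in line with the concentration behavior the abstract describes.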
- …