490 research outputs found

    A framework for evaluating statistical dependencies and rank correlations in power law graphs

    We analyze dependencies in power law graph data (Web sample, Wikipedia sample and a preferential attachment graph) using statistical inference for multivariate regular variation. To the best of our knowledge, this is the first attempt to apply the well-developed theory of regular variation to graph data. The new insights this yields are striking: the three above-mentioned data sets are shown to have a totally different dependence structure between different graph parameters, such as in-degree and PageRank. Based on the proposed methodology, we suggest a new measure for rank correlations. Unlike most known methods, this measure is especially sensitive to rank permutations for top-ranked nodes. Using this method, we demonstrate that the PageRank ranking is not sensitive to moderate changes in the damping factor.

    Degree-degree correlations in random graphs with heavy-tailed degrees

    We investigate degree-degree correlations for scale-free graph sequences. The main conclusion of this paper is that the assortativity coefficient is not the appropriate way to describe degree-dependences in scale-free random graphs. Indeed, we study the infinite volume limit of the assortativity coefficient, and show that this limit is always non-negative when the degrees have finite first but infinite third moment, i.e., when the degree exponent $\gamma + 1$ of the density satisfies $\gamma \in (1,3)$. More generally, our results show that the correlation coefficient is inappropriate to describe dependencies between random variables having infinite variance. We start with a simple model of the sample correlation of random variables $X$ and $Y$, which are linear combinations with non-negative coefficients of the same infinite variance random variables. In this case, the correlation coefficient of $X$ and $Y$ is not defined, and the sample covariance converges to a proper random variable with support that is a subinterval of $(-1,1)$. Further, for any joint distribution $(X,Y)$ with equal marginals being non-negative power-law distributions with infinite variance (as in the case of degree-degree correlations), we show that the limit is non-negative. We next adapt these results to the assortativity in networks as described by the degree-degree correlation coefficient, and show that it is non-negative in the large graph limit when the degree distribution has an infinite third moment. We illustrate these results with several examples where the assortativity behaves in a non-sensible way. We further discuss alternatives for describing assortativity in networks based on rank correlations that are appropriate for infinite variance variables. We support these mathematical results by simulations.

    Degree-degree correlations in random graphs with heavy-tailed degrees

    Mixing patterns in large self-organizing networks, such as the Internet, the World Wide Web, social and biological networks, are often characterized by degree-degree dependencies between neighbouring nodes. One of the problems with the commonly used Pearson's correlation coefficient (termed the assortativity coefficient) is that in disassortative networks its magnitude decreases with the network size. This makes it impossible to compare mixing patterns, for example, in two web crawls of different size. We start with a simple model of two heavy-tailed, highly correlated random variables $X$ and $Y$, and show that the sample correlation coefficient converges in distribution either to a proper random variable on $[-1,1]$, or to zero, and if $X, Y \ge 0$ then the limit is non-negative. We next show that it is non-negative in the large graph limit when the degree distribution has an infinite third moment. We consider an alternative degree-degree dependency measure, based on Spearman's rho, and prove that it converges to an appropriate limit under very general conditions. We verify that these conditions hold in common network models, such as the configuration model and the preferential attachment model. We conclude that rank correlations provide a suitable and informative method for uncovering network mixing patterns.
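
The contrast between the two dependency measures in this abstract can be illustrated with a minimal sketch. It computes Pearson's coefficient (the assortativity coefficient) and Spearman's rho over the degree pairs at the two ends of every edge. The function names, the use of average ranks for ties, and the convention of counting each undirected edge in both directions are assumptions made for illustration, not details taken from the paper.

```python
def pearson(xs, ys):
    """Sample Pearson correlation; undefined (division by zero) if a variance is 0."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def average_ranks(values):
    """Ranks 1..n in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the rank-transformed samples."""
    return pearson(average_ranks(xs), average_ranks(ys))


def degree_pairs(edges):
    """Degrees at both ends of each undirected edge, counted in both directions."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    xs, ys = [], []
    for u, v in edges:
        xs += [deg[u], deg[v]]
        ys += [deg[v], deg[u]]
    return xs, ys


# A star graph: one hub of degree 4 attached to four leaves of degree 1.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
xs, ys = degree_pairs(star)
print(pearson(xs, ys), spearman(xs, ys))
```

On this tiny star both measures agree that the graph is perfectly disassortative. The abstract's point concerns large heavy-tailed graphs, where Pearson's coefficient is dominated by a few very large degrees while the rank-based measure depends only on the ordering of the degrees.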

    Stochastic analysis of web page ranking

    Today, the study of the World Wide Web is one of the most challenging subjects. In this work we consider the Web from a probabilistic point of view. We analyze the relations between various characteristics of the Web. In particular, we are interested in the Web properties that affect the Web page ranking, which is a measure of popularity and importance of a page in the Web. Mainly we restrict our attention to two widely used algorithms for ranking: the number of references to a page (in-degree), and Google's PageRank. For the majority of self-organizing networks, such as the Web and the Wikipedia, the in-degree and the PageRank are observed to follow power laws. In this thesis we present a new methodology for analyzing the probabilistic behavior of the PageRank distribution and the dependence between various power law parameters of the Web. Our approach is based on techniques from the theory of regular variation and extreme value theory. We start Chapter 2 with models for distributions of the number of incoming (in-degree) and outgoing (out-degree) links of a page. Next, we define the PageRank as a solution of the stochastic equation $R \stackrel{d}{=} \sum_{i=1}^{N} A_i R_i + B$, where the $R_i$'s are distributed as $R$. This equation is inspired by the original definition of the PageRank. In particular, $N$ models the in-degree of a page, and $B$ stands for the user preference. We use a probabilistic approach to show that the equation has a unique non-trivial solution with a fixed finite mean. Our analysis is based on a recurrent stochastic model for the power iteration algorithm commonly used in PageRank computations. Further, we obtain that the PageRank asymptotics after each iteration are determined by the asymptotics of the random variable with the heaviest tail among $N$ and $B$. If the tails of $N$ and $B$ are equally heavy, then in fact we get the sum of two asymptotic expressions. We predict the tail behavior of the limiting distribution of the PageRank as the limit of the results for the iterations.
To prove the predicted behavior we use other techniques in Chapter 3. In Chapter 3 we define the tail behavior for the models of the in-degree and the PageRank distribution using Laplace-Stieltjes transforms and the Tauberian theorem. We derive the equation for the Laplace-Stieltjes transforms that corresponds to the general stochastic equation, and obtain our main result, which establishes the tail behavior of the solution of the stochastic equation. In Chapter 4 we perform a number of experiments on the Web and the Wikipedia data sets, and on preferential attachment graphs, in order to justify the results obtained in Chapters 2 and 3. The numerical results show a good agreement with our stochastic model for the PageRank distribution. Moreover, in Section 4.1 we also address the problem of evaluating power laws in real data sets. We define several state-of-the-art techniques from the statistical analysis of heavy tails, and we provide empirical evidence on the asymptotic similarity between in-degree and PageRank. Inspired by the minor effect of the out-degree distribution on the asymptotics of the PageRank, in Section 4.4 we introduce a new ranking scheme, called PAR, which combines features of the HITS and PageRank ranking schemes. In Chapter 5 we examine the dependence structure in power law graphs. First, we analytically define the tail dependencies between in-degree and PageRank of one particular page by using the stochastic equation of the PageRank. We formally establish the relative importance of the two main factors for high ranking: large in-degree and a high rank of one of the ancestors. Second, we compute the angular measures for in-degrees, out-degrees and PageRank scores in three large data sets. The analysis of extremal dependence leads us to propose a new rank correlation measure which is particularly plausible for power law data.
Finally, in Chapter 6 we apply the new rank correlation measure from Chapter 5 to various problems of rank aggregation. From numerical results we conclude that methods that are defined by the angular measure can provide good precision for the top nodes in large data sets; however, they can fail on small data sets.
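
The power iteration algorithm that the stochastic equation above is modeled on can be sketched as follows. This is the standard deterministic PageRank iteration, not the thesis's stochastic model. The function name, the uniform user preference, the uniform redistribution of rank from pages without out-links, and the default damping factor of 0.85 are all assumptions made for illustration. Roughly, each in-neighbour's rank scaled by the damping factor and divided by its out-degree plays the role of a term $A_i R_i$, and the uniform teleportation term plays the role of $B$.

```python
def pagerank(out_links, damping=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank on a dict {node: list of link targets}.

    Assumes every link target also appears as a key of out_links.
    Rank of pages without out-links is spread uniformly over all pages.
    """
    nodes = sorted(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Total rank currently held by pages with no out-links.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        # Uniform preference term plus the redistributed dangling mass.
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = out_links[v]
            if targets:
                share = damping * rank[v] / len(targets)
                for w in targets:
                    new[w] += share
        diff = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if diff < tol:
            break
    return rank


# Hypothetical three-page example: "a" and "b" both link to "c", "c" links to "a".
scores = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(sorted(scores, key=scores.get, reverse=True))
```

Each pass preserves the total rank of 1, so the result can be read as a probability distribution over pages. Rerunning the function with different `damping` values gives a simple way to explore how the ordering of the top pages responds to that parameter.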

    Measuring extremal dependencies in web graphs

    We analyze dependencies in power law graph data (Web sample, Wikipedia sample and a preferential attachment graph) using statistical inference for multivariate regular variation. The well-developed theory of regular variation is widely applied in extreme value theory, telecommunications and mathematical finance, and it provides a natural mathematical formalism for analyzing dependencies between variables with power laws. However, most of the proposed methods have never been used in Web graph data mining. The present work fills this gap. The new insights this yields are striking: the three above-mentioned data sets are shown to have a totally different dependence structure between different graph parameters, such as in-degree and PageRank.
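
One common way to inspect extremal dependence between two heavy-tailed variables is to estimate the angular measure after a rank transform. The sketch below follows that general recipe from the heavy-tail literature and is not claimed to be the exact procedure of this paper. The function names, the threshold parameter `k`, and the arbitrary tie-breaking are assumptions made for illustration.

```python
import math


def descending_ranks(values):
    """Rank 1 for the largest value; ties are broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for pos, i in enumerate(order, start=1):
        ranks[i] = pos
    return ranks


def extremal_angles(xs, ys, k):
    """Angles (in [0, pi/2]) of the rank-transformed points with radius >= 1.

    Each observation is mapped to (k / rank_x, k / rank_y), which puts both
    coordinates on a comparable scale; only the most extreme points survive
    the radius threshold. k is a tuning parameter chosen by the analyst.
    """
    rx = descending_ranks(xs)
    ry = descending_ranks(ys)
    angles = []
    for a, b in zip(rx, ry):
        u, v = k / a, k / b
        if math.hypot(u, v) >= 1.0:
            angles.append(math.atan2(v, u))
    return angles


# Hypothetical data: two identical samples, i.e. complete dependence.
xs = list(range(1, 101))
angles = extremal_angles(xs, xs, k=10)
print(len(angles), angles[0])
```

Angles concentrated near pi/4 indicate that large values of the two variables tend to occur together, while angles concentrated near 0 and pi/2 indicate that extremes occur separately. A histogram of the returned angles gives a rough picture of the dependence structure.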