42 research outputs found

    Stochastic analysis of web page ranking

    Get PDF
    Today, the study of the World Wide Web is one of the most challenging subjects. In this work we consider the Web from a probabilistic point of view. We analyze the relations between various characteristics of the Web. In particular, we are interested in the Web properties that affect the Web page ranking, which is a measure of popularity and importance of a page in the Web. Mainly we restrict our attention on two widely-used algorithms for ranking: the number of references on a page (indegree), and Googleā€™s PageRank. For the majority of self-organizing networks, such as the Web and the Wikipedia, the in-degree and the PageRank are observed to follow power laws. In this thesis we present a new methodology for analyzing the probabilistic behavior of the PageRank distribution and the dependence between various power law parameters of the Web. Our approach is based on the techniques from the theory of regular variations and the extreme value theory. We start Chapter 2 with models for distributions of the number of incoming (indegree) and outgoing (out-degree) links of a page. Next, we define the PageRank as a solution of a stochastic equation R d= PN i=1 AiRi+B, where Riā€™s are distributed as R. This equation is inspired by the original definition of the PageRank. In particular, N models in-degree of a page, and B stays for the user preference. We use a probabilistic approach to show that the equation has a unique non-trivial solution with fixed finite mean. Our analysis based on a recurrent stochastic model for the power iteration algorithm commonly used in PageRank computations. Further, we obtain that the PageRank asymptotics after each iteration are determined by the asymptotics of the random variable with the heaviest tail among N and B. If the tails of N and B are equally heavy, then in fact we get the sum of two asymptotic expressions. We predict the tail behavior of the limiting distribution of the PageRank as a convergence of the results for iterations. To prove the predicted behavior we use another techniques in Chapter 3. In Chapter 3 we define the tail behavior for the models of the in-degree and the PageRank distribution using Laplace-Stieltjes transforms and the Tauberian theorem. We derive the equation for the Laplace-Stieltjes transforms, that corresponds to the general stochastic equation, and obtain our main result that establishes the tail behavior of the solution of the stochastic equation. In Chapter 4 we perform a number of experiments on the Web and the Wikipedia data sets, and on preferential attachment graphs in order to justify the results obtained in Chapters 2 and 3. The numerical results show a good agreement with our stochastic model for the PageRank distribution. Moreover, in Section 4.1 we also address the problem of evaluating power laws in the real data sets. We define several state of the art techniques from the statistical analysis of heavy tails, and we provide empirical evidence on the asymptotic similarity between in-degree and PageRank. Inspired by the minor effect of the out-degree distribution on the asymptotics of the PageRank, in Section 4.4 we introduce a new ranking scheme, called PAR, which combines features of HITS and PageRank ranking schemes. In Chapter 5 we examine the dependence structure in the power law graphs. First, we analytically define the tail dependencies between in-degree and PageRank of a one particular page by using the stochastic equation of the PageRank. We formally establish the relative importance of the two main factors for high ranking: large in-degree and a high rank of one of the ancestors. Second, we compute the angular measures for in-degrees, out-degrees and PageRank scores in three large data sets. The analysis of extremal dependence leads us to propose a new rank correlation measure which is particularly plausible for power law data. Finally, in Chapter 6 we apply the new rank correlation measure from Chapter 5 to various problems of rank aggregation. From numerical results we conclude that methods that are defined by the angular measure can provide good precision for the top nodes in large data sets, however they can fail in a small data sets

    Not all paths lead to Rome: Analysing the network of sister cities

    Full text link
    This work analyses the practice of sister city pairing. We investigate structural properties of the resulting city and country networks and present rankings of the most central nodes in these networks. We identify different country clusters and find that the practice of sister city pairing is not influenced by geographical proximity but results in highly assortative networks.Comment: 7 pages, 4 figure

    On the Accuracy of Hyper-local Geotagging of Social Media Content

    Full text link
    Social media users share billions of items per year, only a small fraction of which is geotagged. We present a data- driven approach for identifying non-geotagged content items that can be associated with a hyper-local geographic area by modeling the location distributions of hyper-local n-grams that appear in the text. We explore the trade-off between accuracy, precision and coverage of this method. Further, we explore differences across content received from multiple platforms and devices, and show, for example, that content shared via different sources and applications produces significantly different geographic distributions, and that it is best to model and predict location for items according to their source. Our findings show the potential and the bounds of a data-driven approach to geotag short social media texts, and offer implications for all applications that use data-driven approaches to locate content.Comment: 10 page

    Determining factors behind the PageRank log-log plot

    Get PDF
    We study the relation between PageRank and other parameters of information networks such as in-degree, out-degree, and the fraction of dangling nodes. We model this relation through a stochastic equation inspired by the original definition of PageRank. Further, we use the theory of regular variation to prove that PageRank and in-degree follow power laws with the same exponent. The difference between these two power laws is in a multiple coefficient, which depends mainly on the fraction of dangling nodes, average in-degree, the power law exponent, and damping factor. The out-degree distribution has a minor effect, which we explicitly quantify. Our theoretical predictions show a good agreement with experimental data on three different samples of the Web

    Jointly they edit: examining the impact of community identification on political interaction in Wikipedia

    Get PDF
    In their 2005 study, Adamic and Glance coined the memorable phrase "divided they blog", referring to a trend of cyberbalkanization in the political blogosphere, with liberal and conservative blogs tending to link to other blogs with a similar political slant, and not to one another. As political discussion and activity increasingly moves online, the power of framing political discourses is shifting from mass media to social media. Continued examination of political interactions online is critical, and we extend this line of research by examining the activities of political users within the Wikipedia community. First, we examined how users in Wikipedia choose to display (or not to display) their political affiliation. Next, we more closely examined the patterns of cross-party interaction and community participation among those users proclaiming a political affiliation. In contrast to previous analyses of other social media, we did not find strong trends indicating a preference to interact with members of the same political party within the Wikipedia community. Our results indicate that users who proclaim their political affiliation within the community tend to proclaim their identity as a "Wikipedian" even more loudly. It seems that the shared identity of "being Wikipedian" may be strong enough to triumph over other potentially divisive facets of personal identity, such as political affiliation.Comment: 33 pages, 5 figure

    Measuring extremal dependencies in web graphs

    Get PDF
    We analyze dependencies in power law graph data (Web sample, Wikipedia sample and a preferential attachment graph) using statistical inference for multivariate regular variation. The well developed theory of regular variation is widely applied in extreme value theory, telecommunications and mathematical finance, and it provides a natural mathematical formalism for analyzing dependencies between variables with power laws. However, most of the proposed methods have never been used in the Web graph data mining. The present work fills this gap. The new insights this yields are striking: the three above-mentioned data sets are shown to have a totally different dependence structure between different graph parameters, such as in-degree and PageRank

    Thermodynamics and Separation Factor of Uranium from Fission Products in ā€œLiquid Metal-Molten Saltā€ System

    Get PDF
    The present chapter contains the results of studying electrochemical and thermodynamic properties of La, Nd, and U in ā€œliquid metal-molten saltā€ systems, where liquid metals were binary Ga-Al and Ga-In alloys of various compositions. The apparent standard potentials of ternary U-Ga-In, U-Ga-Al, La-Ga-In, La-Ga-Al, Nd-Ga-In, and Nd-Ga-Al alloys at various temperatures were determined, and the temperature dependencies were obtained. Primary thermodynamic properties (activity coefficients, partial excess Gibbs free energy change, partial enthalpy change of mixing, and excess entropy change) were calculated. The influence of the bimetallic alloy composition and the nature of lanthanide on thermodynamic properties of compounds are discussed. The values of U/Nd separation factors on gallium-aluminum and U/La on gallium-indium alloys were calculated. The value of the separation factors strongly depends on the alloy composition. Uranium in this case is accumulating in the metallic phase and lanthanides in the salt melt. Analysis of the data obtained showed the perspective use of the active cathodes (Ga-Al and Ga-In instead of single Cd) in future innovative methods for reprocessing spent nuclear fuels (SNF) and high-active nuclear wastes in the future of closed nuclear fuel cycle
    corecore