Stochastic analysis of web page ranking
Today, the study of the World Wide Web is one of the most challenging subjects. In this work we consider the Web from a probabilistic point of view. We analyze the relations between various characteristics of the Web. In particular, we are interested in the Web properties that affect Web page ranking, which is a measure of the popularity and importance of a page in the Web. We mainly restrict our attention to two widely used ranking algorithms: the number of references to a page (in-degree), and Google's PageRank. For the majority of self-organizing networks, such as the Web and Wikipedia, the in-degree and the PageRank are observed to follow power laws. In this thesis we present a new methodology for analyzing the probabilistic behavior of the PageRank distribution and the dependence between various power law parameters of the Web. Our approach is based on techniques from the theory of regular variation and extreme value theory. We start Chapter 2 with models for the distributions of the number of incoming (in-degree) and outgoing (out-degree) links of a page. Next, we define the PageRank as a solution of the stochastic equation $R \stackrel{d}{=} \sum_{i=1}^{N} A_i R_i + B$, where the $R_i$'s are distributed as $R$. This equation is inspired by the original definition of the PageRank. In particular, $N$ models the in-degree of a page, and $B$ stands for the user preference. We use a probabilistic approach to show that the equation has a unique non-trivial solution with fixed finite mean. Our analysis is based on a recurrent stochastic model for the power iteration algorithm commonly used in PageRank computations. Further, we obtain that the PageRank asymptotics after each iteration are determined by the asymptotics of the random variable with the heaviest tail among $N$ and $B$. If the tails of $N$ and $B$ are equally heavy, then in fact we get the sum of two asymptotic expressions. We predict the tail behavior of the limiting distribution of the PageRank as the limit of the results for the iterations. 
To prove the predicted behavior we use different techniques in Chapter 3. There we determine the tail behavior for the models of the in-degree and the PageRank distribution using Laplace-Stieltjes transforms and the Tauberian theorem. We derive the equation for the Laplace-Stieltjes transforms that corresponds to the general stochastic equation, and obtain our main result, which establishes the tail behavior of the solution of the stochastic equation. In Chapter 4 we perform a number of experiments on the Web and Wikipedia data sets, and on preferential attachment graphs, in order to justify the results obtained in Chapters 2 and 3. The numerical results show good agreement with our stochastic model for the PageRank distribution. Moreover, in Section 4.1 we also address the problem of evaluating power laws in real data sets. We describe several state-of-the-art techniques from the statistical analysis of heavy tails, and we provide empirical evidence on the asymptotic similarity between in-degree and PageRank. Inspired by the minor effect of the out-degree distribution on the asymptotics of the PageRank, in Section 4.4 we introduce a new ranking scheme, called PAR, which combines features of the HITS and PageRank ranking schemes. In Chapter 5 we examine the dependence structure in power law graphs. First, we analytically describe the tail dependencies between the in-degree and the PageRank of a particular page by using the stochastic equation of the PageRank. We formally establish the relative importance of the two main factors for high ranking: a large in-degree and a high rank of one of the ancestors. Second, we compute the angular measures for in-degrees, out-degrees and PageRank scores in three large data sets. The analysis of extremal dependence leads us to propose a new rank correlation measure which is particularly suitable for power law data. 
Finally, in Chapter 6 we apply the new rank correlation measure from Chapter 5 to various problems of rank aggregation. From the numerical results we conclude that methods defined by the angular measure can provide good precision for the top nodes in large data sets; however, they can fail on small data sets.
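The power iteration algorithm mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function name `pagerank`, the damping factor of 0.85, the uniform user preference, the uniform treatment of dangling nodes, and the four-node example graph are all assumptions made for illustration.

```python
def pagerank(out_links, damping=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank on a dict {node: [targets]}.

    Assumptions (not from the source): uniform user preference and
    uniform redistribution of the rank held by dangling nodes.
    """
    nodes = sorted(set(out_links) | {t for ts in out_links.values() for t in ts})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank mass sitting on nodes without out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = out_links.get(v, [])
            for t in targets:
                # Each page passes an equal share of its rank to every target.
                new[t] += damping * rank[v] / len(targets)
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical toy graph: "c" has the largest in-degree, "d" has none.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
```

In this toy graph the page with the most incoming links ends up with the highest score, and the page without incoming links keeps only the share that comes from the user preference term.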
Not all paths lead to Rome: Analysing the network of sister cities
This work analyses the practice of sister city pairing. We investigate
structural properties of the resulting city and country networks and present
rankings of the most central nodes in these networks. We identify different
country clusters and find that the practice of sister city pairing is not
influenced by geographical proximity but results in highly assortative
networks. Comment: 7 pages, 4 figures
On the Accuracy of Hyper-local Geotagging of Social Media Content
Social media users share billions of items per year, only a small fraction of
which is geotagged. We present a data-driven approach for identifying
non-geotagged content items that can be associated with a hyper-local
geographic area by modeling the location distributions of hyper-local n-grams
that appear in the text. We explore the trade-off between accuracy, precision
and coverage of this method. Further, we explore differences across content
received from multiple platforms and devices, and show, for example, that
content shared via different sources and applications produces significantly
different geographic distributions, and that it is best to model and predict
location for items according to their source. Our findings show the potential
and the bounds of a data-driven approach to geotag short social media texts,
and offer implications for all applications that use data-driven approaches to
locate content. Comment: 10 pages
Determining factors behind the PageRank log-log plot
We study the relation between PageRank and other parameters of information
networks such as in-degree, out-degree, and the fraction of dangling nodes. We
model this relation through a stochastic equation inspired by the original
definition of PageRank. Further, we use the theory of regular variation to
prove that PageRank and in-degree follow power laws with the same exponent. The
difference between these two power laws is in a multiplicative coefficient, which
depends mainly on the fraction of dangling nodes, the average in-degree, the power
law exponent, and the damping factor. The out-degree distribution has a minor
effect, which we explicitly quantify. Our theoretical predictions show good
agreement with experimental data on three different samples of the Web.
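A claim that two quantities follow power laws with the same exponent is usually checked by estimating the tail index of each sample. The sketch below uses the Hill estimator, a standard tool for heavy tails; the abstract does not say which estimator the authors used, so the choice is an assumption, as are the function name `hill_estimator`, the synthetic Pareto sample, and the values of `alpha_true` and `k`.

```python
import math
import random


def hill_estimator(sample, k):
    """Hill estimate of the tail index alpha from the k largest observations."""
    xs = sorted(sample, reverse=True)
    if not 0 < k < len(xs):
        raise ValueError("k must satisfy 0 < k < len(sample)")
    threshold = xs[k]  # the (k+1)-th largest value serves as the threshold
    mean_log_excess = sum(math.log(x / threshold) for x in xs[:k]) / k
    return 1.0 / mean_log_excess


# Synthetic stand-in for in-degree or PageRank data (assumption):
# an exact Pareto sample with P(X > x) = x ** -alpha_true for x >= 1.
rng = random.Random(0)
alpha_true = 1.1
pareto_sample = [rng.paretovariate(alpha_true) for _ in range(20000)]
alpha_hat = hill_estimator(pareto_sample, k=2000)
```

On real graph data one would apply the same function to the in-degree sample and to the PageRank sample and compare the two estimates over a range of `k`, since the result is sensitive to how many upper order statistics are used.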
Jointly they edit: examining the impact of community identification on political interaction in Wikipedia
In their 2005 study, Adamic and Glance coined the memorable phrase "divided
they blog", referring to a trend of cyberbalkanization in the political
blogosphere, with liberal and conservative blogs tending to link to other blogs
with a similar political slant, and not to one another. As political discussion
and activity increasingly move online, the power of framing political
discourses is shifting from mass media to social media. Continued examination
of political interactions online is critical, and we extend this line of
research by examining the activities of political users within the Wikipedia
community. First, we examined how users in Wikipedia choose to display (or not
to display) their political affiliation. Next, we more closely examined the
patterns of cross-party interaction and community participation among those
users proclaiming a political affiliation. In contrast to previous analyses of
other social media, we did not find strong trends indicating a preference to
interact with members of the same political party within the Wikipedia
community. Our results indicate that users who proclaim their political
affiliation within the community tend to proclaim their identity as a
"Wikipedian" even more loudly. It seems that the shared identity of "being
Wikipedian" may be strong enough to triumph over other potentially divisive
facets of personal identity, such as political affiliation. Comment: 33 pages, 5 figures
Measuring extremal dependencies in web graphs
We analyze dependencies in power law graph data (a Web sample, a Wikipedia sample and a preferential attachment graph) using statistical inference for multivariate regular variation. The well-developed theory of regular variation is widely applied in extreme value theory, telecommunications and mathematical finance, and it provides a natural mathematical formalism for analyzing dependencies between variables with power laws. However, most of the proposed methods have never been used in Web graph data mining. The present work fills this gap. The new insights this yields are striking: the three above-mentioned data sets are shown to have totally different dependence structures between graph parameters such as in-degree and PageRank.
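One common way to look at extremal dependence between two heavy-tailed variables is to rank-transform each margin to an approximately unit-Pareto scale, switch to polar coordinates, and inspect the angles of the points with the largest radius. The Python sketch below follows that recipe in a simplified form; it is an illustration under stated assumptions, not the procedure of the paper. The function names, the synthetic data, and the values of `n` and `k` are all hypothetical.

```python
import math
import random


def pareto_rank_transform(values):
    """Map values to approximately unit-Pareto margins via their ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = 1.0 / (1.0 - rank / (n + 1))
    return out


def extreme_angles(xs, ys, k):
    """Angles, scaled to [0, 1], of the k points with the largest radius."""
    px, py = pareto_rank_transform(xs), pareto_rank_transform(ys)
    polar = sorted(
        ((math.hypot(a, b), math.atan2(b, a) / (math.pi / 2)) for a, b in zip(px, py)),
        reverse=True,
    )
    return [theta for _, theta in polar[:k]]


def middle_fraction(angles, lo=0.25, hi=0.75):
    """Share of angles away from both axes (a crude dependence indicator)."""
    return sum(lo <= t <= hi for t in angles) / len(angles)


rng = random.Random(1)
n, k = 5000, 100
x = [rng.paretovariate(1.5) for _ in range(n)]
y_indep = [rng.paretovariate(1.5) for _ in range(n)]

dependent_angles = extreme_angles(x, x, k)          # identical variables
independent_angles = extreme_angles(x, y_indep, k)  # independent variables
```

For fully dependent variables the extreme angles sit at 0.5, while for independent variables they concentrate near 0 and 1, so `middle_fraction` separates the two cases. Real graph data would fall somewhere in between.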
Thermodynamics and Separation Factor of Uranium from Fission Products in "Liquid Metal-Molten Salt" System
The present chapter contains the results of studying the electrochemical and thermodynamic properties of La, Nd, and U in "liquid metal-molten salt" systems, where the liquid metals were binary Ga-Al and Ga-In alloys of various compositions. The apparent standard potentials of ternary U-Ga-In, U-Ga-Al, La-Ga-In, La-Ga-Al, Nd-Ga-In, and Nd-Ga-Al alloys at various temperatures were determined, and the temperature dependencies were obtained. Primary thermodynamic properties (activity coefficients, partial excess Gibbs free energy change, partial enthalpy change of mixing, and excess entropy change) were calculated. The influence of the bimetallic alloy composition and the nature of the lanthanide on the thermodynamic properties of the compounds is discussed. The values of the U/Nd separation factors on gallium-aluminum alloys and the U/La separation factors on gallium-indium alloys were calculated. The value of the separation factor strongly depends on the alloy composition. In this case uranium accumulates in the metallic phase and the lanthanides in the salt melt. Analysis of the data obtained showed that active cathodes (Ga-Al and Ga-In instead of single Cd) are promising for future innovative methods of reprocessing spent nuclear fuels (SNF) and high-level nuclear wastes in a closed nuclear fuel cycle.