42 research outputs found

Stochastic analysis of web page ranking

Author: Volkovich Yana
Publication venue: University of Twente
Publication date: 01/01/2009

Today, the study of the World Wide Web is one of the most challenging subjects. In this work we consider the Web from a probabilistic point of view. We analyze the relations between various characteristics of the Web. In particular, we are interested in the Web properties that affect Web page ranking, which is a measure of the popularity and importance of a page on the Web. We mainly restrict our attention to two widely used ranking measures: the number of references to a page (in-degree), and Google’s PageRank. For the majority of self-organizing networks, such as the Web and Wikipedia, the in-degree and the PageRank are observed to follow power laws. In this thesis we present a new methodology for analyzing the probabilistic behavior of the PageRank distribution and the dependence between various power law parameters of the Web. Our approach is based on techniques from the theory of regular variation and extreme value theory. We start Chapter 2 with models for the distributions of the number of incoming (in-degree) and outgoing (out-degree) links of a page. Next, we define the PageRank as a solution of the stochastic equation $R \stackrel{d}{=} \sum_{i=1}^{N} A_i R_i + B$, where the $R_i$ are distributed as $R$. This equation is inspired by the original definition of the PageRank. In particular, $N$ models the in-degree of a page, and $B$ stands for the user preference. We use a probabilistic approach to show that the equation has a unique non-trivial solution with fixed finite mean. Our analysis is based on a recurrent stochastic model for the power iteration algorithm commonly used in PageRank computations. Further, we find that the PageRank asymptotics after each iteration are determined by the asymptotics of the random variable with the heaviest tail among $N$ and $B$. If the tails of $N$ and $B$ are equally heavy, then we get the sum of two asymptotic expressions. We predict the tail behavior of the limiting distribution of the PageRank as the limit of the results for the iterations.
To prove the predicted behavior, we use different techniques in Chapter 3, where we characterize the tail behavior of the models for the in-degree and the PageRank distribution using Laplace-Stieltjes transforms and the Tauberian theorem. We derive the equation for the Laplace-Stieltjes transforms that corresponds to the general stochastic equation, and obtain our main result, which establishes the tail behavior of the solution of the stochastic equation. In Chapter 4 we perform a number of experiments on the Web and Wikipedia data sets, and on preferential attachment graphs, in order to justify the results obtained in Chapters 2 and 3. The numerical results show good agreement with our stochastic model for the PageRank distribution. Moreover, in Section 4.1 we address the problem of evaluating power laws in real data sets. We describe several state-of-the-art techniques from the statistical analysis of heavy tails, and we provide empirical evidence of the asymptotic similarity between in-degree and PageRank. Inspired by the minor effect of the out-degree distribution on the asymptotics of the PageRank, in Section 4.4 we introduce a new ranking scheme, called PAR, which combines features of the HITS and PageRank ranking schemes. In Chapter 5 we examine the dependence structure in power law graphs. First, we analytically characterize the tail dependencies between the in-degree and the PageRank of one particular page by using the stochastic equation of the PageRank. We formally establish the relative importance of the two main factors for high ranking: a large in-degree and a high rank of one of the ancestors. Second, we compute the angular measures for in-degrees, out-degrees and PageRank scores in three large data sets. The analysis of extremal dependence leads us to propose a new rank correlation measure which is particularly well suited to power law data.
Finally, in Chapter 6 we apply the new rank correlation measure from Chapter 5 to various problems of rank aggregation. From the numerical results we conclude that methods defined by the angular measure can provide good precision for the top nodes in large data sets; however, they can fail on small data sets.
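
The abstract above refers to the power iteration algorithm commonly used in PageRank computations. The following is a minimal sketch of that iteration on a toy graph, not code from the thesis. The function name `pagerank`, the damping factor of 0.85, the uniform treatment of dangling nodes, and the four-node graph are all assumptions made for illustration.

```python
def pagerank(out_links, damping=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank on a dict {node: [successors]}.

    Assumes every successor also appears as a key of out_links.
    """
    nodes = sorted(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from the uniform distribution
    for _ in range(max_iter):
        # Rank held by dangling nodes (no out-links) is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        # Each page passes an equal share of its damped rank to its successors.
        for v in nodes:
            for w in out_links[v]:
                new_rank[w] += damping * rank[v] / len(out_links[v])
        change = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical toy graph: page "c" has the largest in-degree.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_graph)
```

In this toy graph the page with the most incoming links also receives the highest score, which loosely mirrors the in-degree and PageRank relation studied in the thesis.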

University of Twente Research Information

Not all paths lead to Rome: Analysing the network of sister cities

Author: Aragón Pablo
Kaltenbrunner Andreas
Laniado David
Volkovich Yana
Publication date: 29/01/2013

This work analyses the practice of sister city pairing. We investigate structural properties of the resulting city and country networks and present rankings of the most central nodes in these networks. We identify different country clusters and find that the practice of sister city pairing is not influenced by geographical proximity but results in highly assortative networks. Comment: 7 pages, 4 figures

arXiv.org e-Print Archive

On the Accuracy of Hyper-local Geotagging of Social Media Content

Author: Flatow David
Kanza Yaron
Naaman Mor
Volkovich Yana
Xie Ke Eddie
Publication date: 01/02/2015

Social media users share billions of items per year, only a small fraction of which is geotagged. We present a data-driven approach for identifying non-geotagged content items that can be associated with a hyper-local geographic area by modeling the location distributions of hyper-local n-grams that appear in the text. We explore the trade-off between accuracy, precision and coverage of this method. Further, we explore differences across content received from multiple platforms and devices, and show, for example, that content shared via different sources and applications produces significantly different geographic distributions, and that it is best to model and predict location for items according to their source. Our findings show the potential and the bounds of a data-driven approach to geotag short social media texts, and offer implications for all applications that use data-driven approaches to locate content. Comment: 10 pages

arXiv.org e-Print Archive

CiteSeerX

Determining factors behind the PageRank log-log plot

Author: Bonato Anthony
Chung Fan R.K.
Donato Debora
Litvak Nelly
Volkovich Yana
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2007

We study the relation between PageRank and other parameters of information networks such as in-degree, out-degree, and the fraction of dangling nodes. We model this relation through a stochastic equation inspired by the original definition of PageRank. Further, we use the theory of regular variation to prove that PageRank and in-degree follow power laws with the same exponent. The difference between these two power laws is in a multiplicative coefficient, which depends mainly on the fraction of dangling nodes, the average in-degree, the power law exponent, and the damping factor. The out-degree distribution has a minor effect, which we explicitly quantify. Our theoretical predictions show good agreement with experimental data on three different samples of the Web.
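
One common way to check a power law exponent empirically is the Hill estimator from heavy-tail statistics. The abstract does not say which estimator the authors used, so the sketch below is an assumption, shown on synthetic Pareto samples rather than on Web data. The function name `hill_estimator`, the exponent 2.1, the sample size, and the choice of `k` are illustrative only.

```python
import math
import random


def hill_estimator(samples, k):
    """Hill estimate of the tail index alpha from the k largest samples."""
    ordered = sorted(samples, reverse=True)
    if not 0 < k < len(ordered):
        raise ValueError("k must satisfy 0 < k < len(samples)")
    threshold = ordered[k]  # the (k+1)-th largest value
    # Average log-excess of the k largest values over the threshold.
    mean_log_excess = sum(math.log(x / threshold) for x in ordered[:k]) / k
    return 1.0 / mean_log_excess


# Synthetic data with a known tail index (an assumption, not Web data).
rng = random.Random(0)
alpha_true = 2.1
pareto_samples = [rng.paretovariate(alpha_true) for _ in range(20000)]
alpha_hat = hill_estimator(pareto_samples, k=2000)
```

Applying the same estimator to in-degree and PageRank samples of one graph, and comparing the two estimates over a range of `k`, would be one way to examine the claim that both follow power laws with the same exponent.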

arXiv.org e-Print Archive

CiteSeerX

University of Twente Research Information

Jointly they edit: examining the impact of community identification on political interaction in Wikipedia

Author: Aragón Pablo
Kaltenbrunner Andreas
Kappler Karolin
Laniado David
Neff Jessica G.
Volkovich Yana
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 05/11/2012

In their 2005 study, Adamic and Glance coined the memorable phrase "divided they blog", referring to a trend of cyberbalkanization in the political blogosphere, with liberal and conservative blogs tending to link to other blogs with a similar political slant, and not to one another. As political discussion and activity increasingly move online, the power of framing political discourses is shifting from mass media to social media. Continued examination of political interactions online is critical, and we extend this line of research by examining the activities of political users within the Wikipedia community. First, we examined how users in Wikipedia choose to display (or not to display) their political affiliation. Next, we more closely examined the patterns of cross-party interaction and community participation among those users proclaiming a political affiliation. In contrast to previous analyses of other social media, we did not find strong trends indicating a preference to interact with members of the same political party within the Wikipedia community. Our results indicate that users who proclaim their political affiliation within the community tend to proclaim their identity as a "Wikipedian" even more loudly. It seems that the shared identity of "being Wikipedian" may be strong enough to triumph over other potentially divisive facets of personal identity, such as political affiliation. Comment: 33 pages, 5 figures

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

Measuring extremal dependencies in web graphs

Author: Bert Zwart
Nelly Litvak
Yana Volkovich
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/11/2007

We analyze dependencies in power law graph data (a Web sample, a Wikipedia sample and a preferential attachment graph) using statistical inference for multivariate regular variation. The well-developed theory of regular variation is widely applied in extreme value theory, telecommunications and mathematical finance, and it provides a natural mathematical formalism for analyzing dependencies between variables with power laws. However, most of the proposed methods have never been used in Web graph data mining. The present work fills this gap. The new insights this yields are striking: the three above-mentioned data sets are shown to have totally different dependence structures between graph parameters such as in-degree and PageRank.
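
A standard recipe for inspecting extremal dependence between two heavy-tailed variables is to rank-transform each coordinate, switch to polar coordinates, and look at the angles of the points with the largest radii. The sketch below follows that recipe under stated assumptions and is not the authors' code. The helper names, the threshold parameter `k`, and the deterministic toy data are all hypothetical.

```python
import math


def descending_ranks(values):
    """Rank 1 goes to the largest value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def extremal_angles(xs, ys, k):
    """Angles (in radians) of rank-transformed points with radius above 1."""
    rank_x = descending_ranks(xs)
    rank_y = descending_ranks(ys)
    angles = []
    for a, b in zip(rank_x, rank_y):
        u, v = k / a, k / b  # large observations map to large coordinates
        if math.hypot(u, v) > 1.0:
            angles.append(math.atan2(v, u))
    return angles


# Deterministic toy data (an assumption): 1000 distinct decreasing values.
xs = [1.0 / (i + 1) for i in range(1000)]
same_angles = extremal_angles(xs, xs, k=50)           # large values coincide
opposed_angles = extremal_angles(xs, xs[::-1], k=50)  # large values never coincide
```

Angles concentrated near pi/4 indicate that extreme values of the two variables occur together, while angles concentrated near 0 and pi/2 indicate that they do not. The empirical distribution of these angles is the usual estimate of the angular measure.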

CiteSeerX

Crossref

CWI's Institutional Repository

University of Twente Research Information

Thermodynamics and Separation Factor of Uranium from Fission Products in “Liquid Metal-Molten Salt” System

Author: Bychkov Alexander
Luk’yanova Yana
Novoselova Alena
Osipenko Alexander
Smolenski Valeri
Volkovich Vladimir
Publication venue: 'IntechOpen'
Publication date: 20/12/2017

The present chapter contains the results of studying the electrochemical and thermodynamic properties of La, Nd, and U in “liquid metal-molten salt” systems, where the liquid metals were binary Ga-Al and Ga-In alloys of various compositions. The apparent standard potentials of ternary U-Ga-In, U-Ga-Al, La-Ga-In, La-Ga-Al, Nd-Ga-In, and Nd-Ga-Al alloys at various temperatures were determined, and the temperature dependencies were obtained. Primary thermodynamic properties (activity coefficients, partial excess Gibbs free energy change, partial enthalpy change of mixing, and excess entropy change) were calculated. The influence of the bimetallic alloy composition and the nature of the lanthanide on the thermodynamic properties of the compounds is discussed. The values of the U/Nd separation factors on gallium-aluminum alloys and the U/La separation factors on gallium-indium alloys were calculated. The value of the separation factor strongly depends on the alloy composition. In this case uranium accumulates in the metallic phase and the lanthanides in the salt melt. Analysis of the data obtained showed that active cathodes (Ga-Al and Ga-In instead of Cd alone) are promising for future innovative methods of reprocessing spent nuclear fuel (SNF) and high-activity nuclear waste in a closed nuclear fuel cycle.

IntechOpen

Crossref

Asymptotic analysis for personalized Web search

Author: Nelly Litvak
Yana Volkovich
Publication venue: 'Applied Probability Trust'

Crossref