106,508 research outputs found

    Local Ranking Problem on the BrowseGraph

    Full text link
    The "Local Ranking Problem" (LRP) is related to the computation of a centrality-like rank on a local graph, where the scores of the nodes could significantly differ from the ones computed on the global graph. Previous work has studied LRP on the hyperlink graph but never on the BrowseGraph, namely a graph where nodes are webpages and edges are browsing transitions. Recently, this graph has received more and more attention in many different tasks such as ranking, prediction and recommendation. However, a web-server has only the browsing traffic performed on its pages (local BrowseGraph) and, as a consequence, the local computation can lead to estimation errors, which hinders the increasing number of applications in the state of the art. Also, although the divergence between the local and global ranks has been measured, the possibility of estimating such divergence using only local knowledge has been mainly overlooked. These aspects are of great interest for online service providers who want to: (i) gauge their ability to correctly assess the importance of their resources only based on their local knowledge, and (ii) take into account real user browsing fluxes that better capture the actual user interest than the static hyperlink network. We study the LRP problem on a BrowseGraph from a large news provider, considering as subgraphs the aggregations of browsing traces of users coming from different domains. We show that the distance between rankings can be accurately predicted based only on structural information of the local graph, being able to achieve an average rank correlation as high as 0.8

    Cut Size Statistics of Graph Bisection Heuristics

    Full text link
    We investigate the statistical properties of cut sizes generated by heuristic algorithms which solve approximately the graph bisection problem. On an ensemble of sparse random graphs, we find empirically that the distribution of the cut sizes found by ``local'' algorithms becomes peaked as the number of vertices in the graphs becomes large. Evidence is given that this distribution tends towards a Gaussian whose mean and variance scales linearly with the number of vertices of the graphs. Given the distribution of cut sizes associated with each heuristic, we provide a ranking procedure which takes into account both the quality of the solutions and the speed of the algorithms. This procedure is demonstrated for a selection of local graph bisection heuristics.Comment: 17 pages, 5 figures, submitted to SIAM Journal on Optimization also available at http://ipnweb.in2p3.fr/~martin

    Term-Specific Eigenvector-Centrality in Multi-Relation Networks

    Get PDF
    Fuzzy matching and ranking are two information retrieval techniques widely used in web search. Their application to structured data, however, remains an open problem. This article investigates how eigenvector-centrality can be used for approximate matching in multi-relation graphs, that is, graphs where connections of many different types may exist. Based on an extension of the PageRank matrix, eigenvectors representing the distribution of a term after propagating term weights between related data items are computed. The result is an index which takes the document structure into account and can be used with standard document retrieval techniques. As the scheme takes the shape of an index transformation, all necessary calculations are performed during index tim

    Link prediction in very large directed graphs: Exploiting hierarchical properties in parallel

    Get PDF
    Link prediction is a link mining task that tries to find new edges within a given graph. Among the targets of link prediction there is large directed graphs, which are frequent structures nowadays. The typical sparsity of large graphs demands of high precision predictions in order to obtain usable results. However, the size of those graphs only permits the execution of scalable algorithms. As a trade-off between those two problems we recently proposed a link prediction algorithm for directed graphs that exploits hierarchical properties. The algorithm can be classified as a local score, which entails scalability. Unlike the rest of local scores, our proposal assumes the existence of an underlying model for the data which allows it to produce predictions with a higher precision. We test the validity of its hierarchical assumptions on two clearly hierarchical data sets, one of them based on RDF. Then we test it on a non-hierarchical data set based on Wikipedia to demonstrate its broad applicability. Given the computational complexity of link prediction in very large graphs we also introduce some general recommendations useful to make of link prediction an efficiently parallelized problem.Peer ReviewedPostprint (published version
    corecore