934 research outputs found
PRSim: Sublinear Time SimRank Computation on Large Power-Law Graphs
{\it SimRank} is a classic measure of the similarities of nodes in a graph.
Given a node in graph , a {\em single-source SimRank query}
returns the SimRank similarities between node and each node . This type of queries has numerous applications in web search and social
networks analysis, such as link prediction, web mining, and spam detection.
Existing methods for single-source SimRank queries, however, incur query cost
at least linear to the number of nodes , which renders them inapplicable for
real-time and interactive analysis.
{ This paper proposes \prsim, an algorithm that exploits the structure of
graphs to efficiently answer single-source SimRank queries. \prsim uses an
index of size , where is the number of edges in the graph, and
guarantees a query time that depends on the {\em reverse PageRank} distribution
of the input graph. In particular, we prove that \prsim runs in sub-linear time
if the degree distribution of the input graph follows the power-law
distribution, a property possessed by many real-world graphs. Based on the
theoretical analysis, we show that the empirical query time of all existing
SimRank algorithms also depends on the reverse PageRank distribution of the
graph.} Finally, we present the first experimental study that evaluates the
absolute errors of various SimRank algorithms on large graphs, and we show that
\prsim outperforms the state of the art in terms of query time, accuracy, index
size, and scalability.Comment: ACM SIGMOD 201
Exact Single-Source SimRank Computation on Large Graphs
SimRank is a popular measurement for evaluating the node-to-node similarities
based on the graph topology. In recent years, single-source and top- SimRank
queries have received increasing attention due to their applications in web
mining, social network analysis, and spam detection. However, a fundamental
obstacle in studying SimRank has been the lack of ground truths. The only exact
algorithm, Power Method, is computationally infeasible on graphs with more than
nodes. Consequently, no existing work has evaluated the actual
trade-offs between query time and accuracy on large real-world graphs. In this
paper, we present ExactSim, the first algorithm that computes the exact
single-source and top- SimRank results on large graphs. With high
probability, this algorithm produces ground truths with a rigorous theoretical
guarantee. We conduct extensive experiments on real-world datasets to
demonstrate the efficiency of ExactSim. The results show that ExactSim provides
the ground truth for any single-source SimRank query with a precision up to 7
decimal places within a reasonable query time.Comment: ACM SIGMOD 202
Non-Conservative Diffusion and its Application to Social Network Analysis
The random walk is fundamental to modeling dynamic processes on networks.
Metrics based on the random walk have been used in many applications from image
processing to Web page ranking. However, how appropriate are random walks to
modeling and analyzing social networks? We argue that unlike a random walk,
which conserves the quantity diffusing on a network, many interesting social
phenomena, such as the spread of information or disease on a social network,
are fundamentally non-conservative. When an individual infects her neighbor
with a virus, the total amount of infection increases. We classify diffusion
processes as conservative and non-conservative and show how these differences
impact the choice of metrics used for network analysis, as well as our
understanding of network structure and behavior. We show that Alpha-Centrality,
which mathematically describes non-conservative diffusion, leads to new
insights into the behavior of spreading processes on networks. We give a
scalable approximate algorithm for computing the Alpha-Centrality in a massive
graph. We validate our approach on real-world online social networks of Digg.
We show that a non-conservative metric, such as Alpha-Centrality, produces
better agreement with empirical measure of influence than conservative metrics,
such as PageRank. We hope that our investigation will inspire further
exploration into the realms of conservative and non-conservative metrics in
social network analysis
Term-Specific Eigenvector-Centrality in Multi-Relation Networks
Fuzzy matching and ranking are two information retrieval techniques widely used in web search. Their application to structured data, however, remains an open problem. This article investigates how eigenvector-centrality can be used for approximate matching in multi-relation graphs, that is, graphs where connections of many different types may exist. Based on an extension of the PageRank matrix, eigenvectors representing the distribution of a term after propagating term weights between related data items are computed. The result is an index which takes the document structure into account and can be used with standard document retrieval techniques. As the scheme takes the shape of an index transformation, all necessary calculations are performed during index tim
- …