103,582 research outputs found
Outward Influence and Cascade Size Estimation in Billion-scale Networks
Estimating cascade size and nodes' influence is a fundamental task in social,
technological, and biological networks. Yet this task is extremely challenging
due to the sheer size and structural heterogeneity of networks. We
investigate a new influence measure, termed outward influence (OI), defined as
the expected number of nodes that a subset S of nodes will activate,
excluding the nodes in S themselves. Thus, OI equals the influence spread of
S, the de facto standard measure, minus |S|. OI is not only more informative
for nodes with small influence, but also critical in designing new, effective
sampling and statistical estimation methods.
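As a concrete, if naive, illustration, outward influence can be estimated by plain Monte Carlo simulation under the independent-cascade model: simulate cascades from S, average the cascade sizes to get the influence spread, and subtract |S|. The sketch below assumes a simple adjacency-list representation with per-edge activation probabilities; it is not the paper's SIEA/SOIEA algorithm, whose sampling scheme is far more sophisticated, and all names are illustrative.

```python
import random

def simulate_cascade(graph, seeds):
    """One Monte Carlo cascade under the independent-cascade (IC) model.

    graph: dict mapping node -> list of (neighbor, activation_probability).
    Returns the set of all activated nodes, including the seeds.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        node = frontier.pop()
        for neighbor, p in graph.get(node, []):
            if neighbor not in active and random.random() < p:
                active.add(neighbor)
                frontier.append(neighbor)
    return active

def estimate_influence_and_oi(graph, seeds, num_samples=10_000):
    """Estimate influence spread I(S) and outward influence OI(S) = I(S) - |S|."""
    total = sum(len(simulate_cascade(graph, seeds)) for _ in range(num_samples))
    spread = total / num_samples
    return spread, spread - len(seeds)
```

On a two-node graph where the seed activates its neighbor with probability 1, the spread is exactly 2 and the outward influence exactly 1, which makes the OI definition easy to check by hand.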
Based on OI, we propose SIEA/SOIEA, novel methods to estimate influence
spread/outward influence at scale and with rigorous theoretical guarantees. The
proposed methods are built on two novel components: 1) IICP, an importance
sampling method for outward influence; and 2) RSA, a robust mean estimation
method that minimizes the number of samples by analyzing the variance and range
of the random variables. Compared to the state of the art for influence
estimation, SIEA is provably faster in theory and up to several orders of
magnitude faster in practice. For the first time, the influence of nodes in
networks of billions of edges can be estimated with high accuracy within a few
minutes. Our comprehensive experiments on real-world networks also give
evidence against the popular practice of using a fixed number of samples, e.g.
10K or 20K, to compute the "ground truth" for influence spread.

Comment: 16 pages, SIGMETRICS 201
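Robust mean estimation itself can be illustrated with the classic median-of-means estimator, which tolerates heavy-tailed samples far better than a plain average. To be clear, this is only a stand-in for intuition: RSA's sample-size analysis via the variance and range of the random variables is specific to the paper and is not reproduced here.

```python
import statistics

def median_of_means(samples, num_groups=10):
    """Robust mean estimate: partition the samples into groups, average each
    group, and return the median of the group means. A few extreme samples
    corrupt at most a few group means, leaving the median stable."""
    k = max(1, len(samples) // num_groups)
    group_means = [statistics.fmean(samples[i:i + k])
                   for i in range(0, k * num_groups, k)]
    return statistics.median(group_means)
```

With 90 samples of value 1.0 and 10 outliers of value 1000.0, the plain mean exceeds 100 while median-of-means still returns 1.0.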
Importance Sketching of Influence Dynamics in Billion-scale Networks
The blooming availability of traces for social, biological, and communication
networks opens up unprecedented opportunities for analyzing diffusion processes
in networks. However, the sheer size of today's networks raises serious
challenges in computational efficiency and scalability.
In this paper, we propose a new hyper-graph sketching framework for influence
dynamics in networks. The core of our sketching framework, called SKIS, is
an efficient importance sampling algorithm that returns only non-singular
reverse cascades in the network. Compared to previously developed sketches
like RIS and SKIM, our sketch significantly enhances estimation quality while
substantially reducing processing time and memory footprint. Further, we
present general strategies for using SKIS to enhance existing algorithms for
influence estimation and influence maximization, which are motivated by
practical applications like viral marketing. Using SKIS, we design a
high-quality influence oracle for seed sets with average estimation error up to
10x smaller than with RIS and 6x smaller than with SKIM. In addition, our
influence maximization using SKIS substantially improves the quality of
solutions found by greedy algorithms. It achieves up to a 10x speed-up and 4x
memory reduction for the fastest RIS-based algorithm, DSSA, while maintaining
the same theoretical guarantees.

Comment: 12 pages, to appear in ICDM 2017 as a regular paper
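For intuition, the reverse-sketch idea behind RIS-style methods, which SKIS refines by keeping only non-singular reverse cascades, can be sketched as follows. The representation and function names are illustrative assumptions, not the paper's API: sample reverse reachable sets from random roots, then estimate a seed set's spread from the fraction of sets it intersects.

```python
import random

def reverse_reachable_set(reverse_graph, node):
    """Sample one reverse cascade from `node` under the IC model.

    reverse_graph: dict mapping node -> list of (in_neighbor, edge_probability).
    """
    reached = {node}
    frontier = [node]
    while frontier:
        v = frontier.pop()
        for u, p in reverse_graph.get(v, []):
            if u not in reached and random.random() < p:
                reached.add(u)
                frontier.append(u)
    return reached

def build_sketch(reverse_graph, nodes, num_sets):
    """Build a collection of reverse reachable sets from uniformly random roots."""
    return [reverse_reachable_set(reverse_graph, random.choice(nodes))
            for _ in range(num_sets)]

def estimate_spread(sketch, seeds, num_nodes):
    """I(S) is approximately n times the fraction of sampled sets hit by S."""
    hits = sum(1 for rr in sketch if rr & seeds)
    return num_nodes * hits / len(sketch)
```

On a two-node graph where "a" always reaches "b", every reverse set contains "a", so the estimator returns the full network size for seed set {"a"} regardless of which roots were drawn.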
Kirchhoff Index As a Measure of Edge Centrality in Weighted Networks: Nearly Linear Time Algorithms
Most previous work on centrality focuses on metrics of vertex importance
and methods for identifying powerful vertices, while related work for edges is
much scarcer, especially for weighted networks, due to the computational
challenge. In this paper, we propose to use the well-known Kirchhoff index as
the measure of edge centrality in weighted networks, called the θ-Kirchhoff
edge centrality. The Kirchhoff index of a network is defined as the sum of
effective resistances over all vertex pairs. The centrality of an edge is
reflected in the increase of the Kirchhoff index of the network when the edge
is partially deactivated, characterized by a parameter θ. We define two
equivalent measures for θ-Kirchhoff edge centrality. Both are global
metrics and have better discriminating power than commonly used measures
based on local or partial structural information of networks, e.g. edge
betweenness and spanning edge centrality.
Despite the strong advantages of the Kirchhoff index as a centrality measure
and its wide applications, computing the exact value of Kirchhoff edge
centrality for each edge in a graph is computationally demanding. To solve this
problem, for each of the θ-Kirchhoff edge centrality metrics, we present an
efficient algorithm to compute its (1 ± ε)-approximation for all the m
edges in nearly linear time in m. The proposed θ-Kirchhoff edge
centrality is the first global metric of edge importance that can be provably
approximated in nearly linear time. Moreover, based on the
θ-Kirchhoff edge centrality, we present a θ-Kirchhoff vertex
centrality measure, as well as a fast algorithm that can compute a
(1 ± ε)-approximate Kirchhoff vertex centrality for all the vertices in
nearly linear time in m.
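On small graphs the Kirchhoff index is straightforward to compute exactly via the Laplacian pseudoinverse, which makes a useful sanity check for any fast approximation; the cubic cost of this direct route is exactly what nearly-linear-time algorithms avoid. In the sketch below, "partially deactivating" an edge is modeled by scaling its weight, which is an illustrative assumption rather than the paper's precise definition, and it assumes NumPy is available.

```python
import numpy as np

def kirchhoff_index(weights):
    """Kirchhoff index: sum of effective resistances over all vertex pairs.

    weights: symmetric (n, n) array of edge weights (conductances), zero diagonal.
    Uses r(u, v) = L+[u, u] + L+[v, v] - 2 * L+[u, v], where L+ is the
    Moore-Penrose pseudoinverse of the weighted graph Laplacian.
    """
    W = np.asarray(weights, dtype=float)
    L = np.diag(W.sum(axis=1)) - W           # weighted graph Laplacian
    Lp = np.linalg.pinv(L)                   # Laplacian pseudoinverse
    d = np.diag(Lp)
    R = d[:, None] + d[None, :] - 2 * Lp     # pairwise effective resistances
    return R[np.triu_indices(W.shape[0], k=1)].sum()

def edge_centrality(weights, u, v, theta=0.5):
    """Increase in Kirchhoff index when edge (u, v) is partially deactivated,
    modeled here (an assumption) by scaling its weight by 1 - theta."""
    W = np.array(weights, dtype=float)
    base = kirchhoff_index(W)
    W[u, v] *= 1.0 - theta
    W[v, u] *= 1.0 - theta
    return kirchhoff_index(W) - base
```

For a 3-vertex unit-weight path, the effective resistances are 1, 1, and 2, so the Kirchhoff index is 4, and weakening any edge strictly increases it.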
Bidirectional PageRank Estimation: From Average-Case to Worst-Case
We present a new algorithm for estimating the Personalized PageRank (PPR)
between a source and target node on undirected graphs, with sublinear
running-time guarantees over the worst-case choice of source and target nodes.
Our work builds on a recent line of work on bidirectional estimators for PPR,
which obtained sublinear running-time guarantees but in an average-case sense,
for a uniformly random choice of target node. Crucially, we show how the
reversibility of random walks on undirected networks can be exploited to
convert average-case to worst-case guarantees. While past bidirectional methods
combine forward random walks with reverse local pushes, our algorithm combines
forward local pushes with reverse random walks. We also discuss how to modify
our methods to estimate random-walk probabilities for any length distribution,
thereby obtaining fast algorithms for estimating general graph diffusions,
including the heat kernel, on undirected networks.

Comment: Workshop on Algorithms and Models for the Web-Graph (WAW) 201
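The random-walk half of such estimators is easy to sketch: run geometric-length walks from the source and count how often they terminate at the target. This is only the Monte Carlo component, with illustrative names; the paper's contribution lies in combining it with local push (forward pushes, reverse walks) to obtain worst-case sublinear guarantees.

```python
import random

def ppr_monte_carlo(graph, source, target, alpha=0.2, num_walks=20_000, rng=None):
    """Estimate personalized PageRank pi_source(target) by forward random walks.

    At each step the walk terminates with probability alpha; the PPR value is
    the fraction of walks that end at `target`.

    graph: dict mapping node -> list of out-neighbors.
    """
    rng = rng or random.Random(0)
    hits = 0
    for _ in range(num_walks):
        node = source
        while rng.random() > alpha:
            neighbors = graph.get(node)
            if not neighbors:
                break  # dangling node: walk stops here
            node = rng.choice(neighbors)
        hits += node == target
    return hits / num_walks
```

On the two-node cycle a-b with alpha = 0.5, a walk ends at its source exactly when it takes an even number of steps, so pi_a(a) = sum over even k of 0.5 * 0.5^k = 2/3, which the estimate recovers to within sampling error.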
Non-linear regression models for Approximate Bayesian Computation
Approximate Bayesian inference on the basis of summary statistics is
well suited to complex problems for which the likelihood is either
mathematically or computationally intractable. However, the methods that use
rejection suffer from the curse of dimensionality when the number of summary
statistics is increased. Here we propose a machine-learning approach to the
estimation of the posterior density by introducing two innovations. The new
method fits a nonlinear conditional heteroscedastic regression of the parameter
on the summary statistics, and then adaptively improves estimation using
importance sampling. The new algorithm is compared to state-of-the-art
approximate Bayesian methods, and achieves a considerable reduction of the
computational burden in two examples of inference, in statistical genetics and
in a queueing model.

Comment: 4 figures; version 3 minor changes; to appear in Statistics and
Computing
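For orientation, the rejection baseline that the paper improves upon can be sketched in a few lines: draw parameters from the prior, simulate a summary statistic, and keep draws landing close to the observed value. The regression adjustment and importance sampling that constitute the paper's innovations are deliberately omitted, and all names here are illustrative.

```python
import random
import statistics

def abc_rejection(simulate, observed, prior_sample, tolerance,
                  num_draws=50_000, rng=None):
    """Basic rejection ABC.

    simulate(theta, rng) -> simulated summary statistic for parameter theta.
    prior_sample(rng) -> one draw from the prior.
    Accepted draws approximate the posterior given the observed summary.
    """
    rng = rng or random.Random(0)
    accepted = []
    for _ in range(num_draws):
        theta = prior_sample(rng)
        if abs(simulate(theta, rng) - observed) <= tolerance:
            accepted.append(theta)
    return accepted
```

A toy usage: infer the mean of a Gaussian from the sample mean of 20 observations, with a flat prior on [-5, 5]. The accepted draws concentrate around the observed summary, and shrinking the tolerance sharpens the approximation at the cost of more rejections, which is precisely the curse-of-dimensionality trade-off the abstract mentions.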