Propagation of contagion through networks is a fundamental process. It is
used to model the spread of information, influence, or a viral infection.
Diffusion patterns can be specified by a probabilistic model, such as
Independent Cascade (IC), or captured by a set of representative traces.
Basic computational problems in the study of diffusion are influence queries
(determining the potency of a specified seed set of nodes) and Influence
Maximization (identifying the most influential seed set of a given size).
Answering each influence query involves many edge traversals, and does not
scale when there are many queries on very large graphs. The gold standard for
Influence Maximization is the greedy algorithm, which iteratively adds to the
seed set a node maximizing the marginal gain in influence. Greedy has a
guaranteed approximation ratio of at least (1-1/e) and actually produces a
sequence of nodes, with each prefix having approximation guarantee with respect
to the same-size optimum. Since Greedy does not scale well beyond a few million
edges, for larger inputs one must currently use either heuristics or
alternative algorithms designed for a pre-specified small seed set size.
We develop a novel sketch-based design for influence computation. Our greedy
Sketch-based Influence Maximization (SKIM) algorithm scales to graphs with
billions of edges, with one to two orders of magnitude speedup over the best
greedy methods. It still has a guaranteed approximation ratio, and in practice
its quality nearly matches that of exact greedy. We also present influence
oracles, which use linear-time preprocessing to generate a small sketch for
each node, allowing the influence of any seed set to be quickly answered from
the sketches of its nodes.Comment: 10 pages, 5 figures. Appeared at the 23rd Conference on Information
and Knowledge Management (CIKM 2014) in Shanghai, Chin