13,921 research outputs found
Importance Sketching of Influence Dynamics in Billion-scale Networks
The blooming availability of traces for social, biological, and communication
networks opens up unprecedented opportunities in analyzing diffusion processes
in networks. However, the sheer sizes of the nowadays networks raise serious
challenges in computational efficiency and scalability.
In this paper, we propose a new hyper-graph sketching framework for inflence
dynamics in networks. The central of our sketching framework, called SKIS, is
an efficient importance sampling algorithm that returns only non-singular
reverse cascades in the network. Comparing to previously developed sketches
like RIS and SKIM, our sketch significantly enhances estimation quality while
substantially reducing processing time and memory-footprint. Further, we
present general strategies of using SKIS to enhance existing algorithms for
influence estimation and influence maximization which are motivated by
practical applications like viral marketing. Using SKIS, we design high-quality
influence oracle for seed sets with average estimation error up to 10x times
smaller than those using RIS and 6x times smaller than SKIM. In addition, our
influence maximization using SKIS substantially improves the quality of
solutions for greedy algorithms. It achieves up to 10x times speed-up and 4x
memory reduction for the fastest RIS-based DSSA algorithm, while maintaining
the same theoretical guarantees.Comment: 12 pages, to appear in ICDM 2017 as a regular pape
From Micro to Macro: Uncovering and Predicting Information Cascading Process with Behavioral Dynamics
Cascades are ubiquitous in various network environments. How to predict these
cascades is highly nontrivial in several vital applications, such as viral
marketing, epidemic prevention and traffic management. Most previous works
mainly focus on predicting the final cascade sizes. As cascades are typical
dynamic processes, it is always interesting and important to predict the
cascade size at any time, or predict the time when a cascade will reach a
certain size (e.g. an threshold for outbreak). In this paper, we unify all
these tasks into a fundamental problem: cascading process prediction. That is,
given the early stage of a cascade, how to predict its cumulative cascade size
of any later time? For such a challenging problem, how to understand the micro
mechanism that drives and generates the macro phenomenons (i.e. cascading
proceese) is essential. Here we introduce behavioral dynamics as the micro
mechanism to describe the dynamic process of a node's neighbors get infected by
a cascade after this node get infected (i.e. one-hop subcascades). Through
data-driven analysis, we find out the common principles and patterns lying in
behavioral dynamics and propose a novel Networked Weibull Regression model for
behavioral dynamics modeling. After that we propose a novel method for
predicting cascading processes by effectively aggregating behavioral dynamics,
and propose a scalable solution to approximate the cascading process with a
theoretical guarantee. We extensively evaluate the proposed method on a large
scale social network dataset. The results demonstrate that the proposed method
can significantly outperform other state-of-the-art baselines in multiple tasks
including cascade size prediction, outbreak time prediction and cascading
process prediction.Comment: 10 pages, 11 figure
Sketch-based Influence Maximization and Computation: Scaling up with Guarantees
Propagation of contagion through networks is a fundamental process. It is
used to model the spread of information, influence, or a viral infection.
Diffusion patterns can be specified by a probabilistic model, such as
Independent Cascade (IC), or captured by a set of representative traces.
Basic computational problems in the study of diffusion are influence queries
(determining the potency of a specified seed set of nodes) and Influence
Maximization (identifying the most influential seed set of a given size).
Answering each influence query involves many edge traversals, and does not
scale when there are many queries on very large graphs. The gold standard for
Influence Maximization is the greedy algorithm, which iteratively adds to the
seed set a node maximizing the marginal gain in influence. Greedy has a
guaranteed approximation ratio of at least (1-1/e) and actually produces a
sequence of nodes, with each prefix having approximation guarantee with respect
to the same-size optimum. Since Greedy does not scale well beyond a few million
edges, for larger inputs one must currently use either heuristics or
alternative algorithms designed for a pre-specified small seed set size.
We develop a novel sketch-based design for influence computation. Our greedy
Sketch-based Influence Maximization (SKIM) algorithm scales to graphs with
billions of edges, with one to two orders of magnitude speedup over the best
greedy methods. It still has a guaranteed approximation ratio, and in practice
its quality nearly matches that of exact greedy. We also present influence
oracles, which use linear-time preprocessing to generate a small sketch for
each node, allowing the influence of any seed set to be quickly answered from
the sketches of its nodes.Comment: 10 pages, 5 figures. Appeared at the 23rd Conference on Information
and Knowledge Management (CIKM 2014) in Shanghai, Chin
- …