36,797 research outputs found

    Sketch-based Influence Maximization and Computation: Scaling up with Guarantees

    Full text link
    Propagation of contagion through networks is a fundamental process. It is used to model the spread of information, influence, or a viral infection. Diffusion patterns can be specified by a probabilistic model, such as Independent Cascade (IC), or captured by a set of representative traces. Basic computational problems in the study of diffusion are influence queries (determining the potency of a specified seed set of nodes) and Influence Maximization (identifying the most influential seed set of a given size). Answering each influence query involves many edge traversals, and does not scale when there are many queries on very large graphs. The gold standard for Influence Maximization is the greedy algorithm, which iteratively adds to the seed set a node maximizing the marginal gain in influence. Greedy has a guaranteed approximation ratio of at least (1-1/e) and actually produces a sequence of nodes, with each prefix having approximation guarantee with respect to the same-size optimum. Since Greedy does not scale well beyond a few million edges, for larger inputs one must currently use either heuristics or alternative algorithms designed for a pre-specified small seed set size. We develop a novel sketch-based design for influence computation. Our greedy Sketch-based Influence Maximization (SKIM) algorithm scales to graphs with billions of edges, with one to two orders of magnitude speedup over the best greedy methods. It still has a guaranteed approximation ratio, and in practice its quality nearly matches that of exact greedy. We also present influence oracles, which use linear-time preprocessing to generate a small sketch for each node, allowing the influence of any seed set to be quickly answered from the sketches of its nodes.Comment: 10 pages, 5 figures. Appeared at the 23rd Conference on Information and Knowledge Management (CIKM 2014) in Shanghai, Chin

    Identifying influencers in a social network : the value of real referral data

    Get PDF
    Individuals influence each other through social interactions and marketers aim to leverage this interpersonal influence to attract new customers. It still remains a challenge to identify those customers in a social network that have the most influence on their social connections. A common approach to the influence maximization problem is to simulate influence cascades through the network based on the existence of links in the network using diffusion models. Our study contributes to the literature by evaluating these principles using real-life referral behaviour data. A new ranking metric, called Referral Rank, is introduced that builds on the game theoretic concept of the Shapley value for assigning each individual in the network a value that reflects the likelihood of referring new customers. We also explore whether these methods can be further improved by looking beyond the one-hop neighbourhood of the influencers. Experiments on a large telecommunication data set and referral data set demonstrate that using traditional simulation based methods to identify influencers in a social network can lead to suboptimal decisions as the results overestimate actual referral cascades. We also find that looking at the influence of the two-hop neighbours of the customers improves the influence spread and product adoption. Our findings suggest that companies can take two actions to improve their decision support system for identifying influential customers: (1) improve the data by incorporating data that reflects the actual referral behaviour of the customers or (2) extend the method by looking at the influence of the connections in the two-hop neighbourhood of the customers

    Sample Complexity Bounds for Influence Maximization

    Get PDF
    Influence maximization (IM) is the problem of finding for a given s ? 1 a set S of |S|=s nodes in a network with maximum influence. With stochastic diffusion models, the influence of a set S of seed nodes is defined as the expectation of its reachability over simulations, where each simulation specifies a deterministic reachability function. Two well-studied special cases are the Independent Cascade (IC) and the Linear Threshold (LT) models of Kempe, Kleinberg, and Tardos [Kempe et al., 2003]. The influence function in stochastic diffusion is unbiasedly estimated by averaging reachability values over i.i.d. simulations. We study the IM sample complexity: the number of simulations needed to determine a (1-?)-approximate maximizer with confidence 1-?. Our main result is a surprising upper bound of O(s ? ?^{-2} ln (n/?)) for a broad class of models that includes IC and LT models and their mixtures, where n is the number of nodes and ? is the number of diffusion steps. Generally ? ? n, so this significantly improves over the generic upper bound of O(s n ?^{-2} ln (n/?)). Our sample complexity bounds are derived from novel upper bounds on the variance of the reachability that allow for small relative error for influential sets and additive error when influence is small. Moreover, we provide a data-adaptive method that can detect and utilize fewer simulations on models where it suffices. Finally, we provide an efficient greedy design that computes an (1-1/e-?)-approximate maximizer from simulations and applies to any submodular stochastic diffusion model that satisfies the variance bounds
    • …
    corecore