66 research outputs found

    Scaling up Group Closeness Maximization

    Get PDF
    Closeness is a widely-used centrality measure in social network analysis. For a node it indicates the inverse average shortest-path distance to the other nodes of the network. While the identification of the k nodes with highest closeness received significant attention, many applications are actually interested in finding a group of nodes that is central as a whole. For this problem, only recently a greedy algorithm with approximation ratio (1−1/e) has been proposed [Chen et al., ADC 2016]. Since this algorithm’s running time is still expensive for large networks, a heuristic without approximation guarantee has also been proposed in the same paper. In the present paper we develop new techniques to speed up the greedy algorithm without losing its theoretical guarantee. Compared to a straightforward implementation, our approach is orders of magnitude faster and, compared to the heuristic proposed by Chen et al., we always find a solution with better quality in a comparable running time in our experiments. Our method Greedy++ allows us to approximate the group with maximum closeness on networks with up to hundreds of millions of edges in minutes or at most a few hours. To have the same theoretical guarantee, the greedy approach by [Chen et al., ADC 2016] would take several days already on networks with hundreds of thousands of edges. In a comparison with the optimum, our experiments show that the solution found by Greedy++ is actually much better than the theoretical guarantee. Over all tested networks, the empirical approximation ratio is never lower than 0.97. Finally, we study for the first time the correlation between the top-k nodes with highest closeness and an approximation of the most central group in large complex networks and show that the overlap between the two is relatively small

    Influential Billboard Slot Selection using Pruned Submodularity Graph

    Full text link
    Billboard Advertisement has emerged as an effective out-of-home advertisement technique and adopted by many commercial houses. In this case, the billboards are owned by some companies and they are provided to the commercial houses slot\mbox{-}wise on a payment basis. Now, given the database of billboards along with their slot information which kk slots should be chosen to maximize the influence. Formally, we call this problem as the \textsc{Influential Billboard Slot Selection} Problem. In this paper, we pose this problem as a combinatorial optimization problem. Under the `triggering model of influence', the influence function is non-negative, monotone, and submodular. However, as the incremental greedy approach for submodular function maximization does not scale well along with the size of the problem instances, there is a need to develop efficient solution methodologies for this problem.Comment: 15 Pages, 6 Figure

    Influence Maximization Meets Efficiency and Effectiveness: A Hop-Based Approach

    Full text link
    Influence Maximization is an extensively-studied problem that targets at selecting a set of initial seed nodes in the Online Social Networks (OSNs) to spread the influence as widely as possible. However, it remains an open challenge to design fast and accurate algorithms to find solutions in large-scale OSNs. Prior Monte-Carlo-simulation-based methods are slow and not scalable, while other heuristic algorithms do not have any theoretical guarantee and they have been shown to produce poor solutions for quite some cases. In this paper, we propose hop-based algorithms that can easily scale to millions of nodes and billions of edges. Unlike previous heuristics, our proposed hop-based approaches can provide certain theoretical guarantees. Experimental evaluations with real OSN datasets demonstrate the efficiency and effectiveness of our algorithms.Comment: Extended version of the conference paper at ASONAM 2017, 11 page

    Sample Complexity Bounds for Influence Maximization

    Get PDF
    Influence maximization (IM) is the problem of finding for a given s ? 1 a set S of |S|=s nodes in a network with maximum influence. With stochastic diffusion models, the influence of a set S of seed nodes is defined as the expectation of its reachability over simulations, where each simulation specifies a deterministic reachability function. Two well-studied special cases are the Independent Cascade (IC) and the Linear Threshold (LT) models of Kempe, Kleinberg, and Tardos [Kempe et al., 2003]. The influence function in stochastic diffusion is unbiasedly estimated by averaging reachability values over i.i.d. simulations. We study the IM sample complexity: the number of simulations needed to determine a (1-?)-approximate maximizer with confidence 1-?. Our main result is a surprising upper bound of O(s ? ?^{-2} ln (n/?)) for a broad class of models that includes IC and LT models and their mixtures, where n is the number of nodes and ? is the number of diffusion steps. Generally ? ? n, so this significantly improves over the generic upper bound of O(s n ?^{-2} ln (n/?)). Our sample complexity bounds are derived from novel upper bounds on the variance of the reachability that allow for small relative error for influential sets and additive error when influence is small. Moreover, we provide a data-adaptive method that can detect and utilize fewer simulations on models where it suffices. Finally, we provide an efficient greedy design that computes an (1-1/e-?)-approximate maximizer from simulations and applies to any submodular stochastic diffusion model that satisfies the variance bounds

    The Solution Distribution of Influence Maximization: A High-level Experimental Study on Three Algorithmic Approaches

    Full text link
    Influence maximization is among the most fundamental algorithmic problems in social influence analysis. Over the last decade, a great effort has been devoted to developing efficient algorithms for influence maximization, so that identifying the ``best'' algorithm has become a demanding task. In SIGMOD'17, Arora, Galhotra, and Ranu reported benchmark results on eleven existing algorithms and demonstrated that there is no single state-of-the-art offering the best trade-off between computational efficiency and solution quality. In this paper, we report a high-level experimental study on three well-established algorithmic approaches for influence maximization, referred to as Oneshot, Snapshot, and Reverse Influence Sampling (RIS). Different from Arora et al., our experimental methodology is so designed that we examine the distribution of random solutions, characterize the relation between the sample number and the actual solution quality, and avoid implementation dependencies. Our main findings are as follows: 1. For a sufficiently large sample number, we obtain a unique solution regardless of algorithms. 2. The average solution quality of Oneshot, Snapshot, and RIS improves at the same rate up to scaling of sample number. 3. Oneshot requires more samples than Snapshot, and Snapshot requires fewer but larger samples than RIS. We discuss the time efficiency when conditioning Oneshot, Snapshot, and RIS to be of identical accuracy. Our conclusion is that Oneshot is suitable only if the size of available memory is limited, and RIS is more efficient than Snapshot for large networks; Snapshot is preferable for small, low-probability networks.Comment: To appear in SIGMOD 202

    Holistic Influence Maximization: Combining Scalability and Efficiency with Opinion-Aware Models

    Full text link
    The steady growth of graph data from social networks has resulted in wide-spread research in finding solutions to the influence maximization problem. In this paper, we propose a holistic solution to the influence maximization (IM) problem. (1) We introduce an opinion-cum-interaction (OI) model that closely mirrors the real-world scenarios. Under the OI model, we introduce a novel problem of Maximizing the Effective Opinion (MEO) of influenced users. We prove that the MEO problem is NP-hard and cannot be approximated within a constant ratio unless P=NP. (2) We propose a heuristic algorithm OSIM to efficiently solve the MEO problem. To better explain the OSIM heuristic, we first introduce EaSyIM - the opinion-oblivious version of OSIM, a scalable algorithm capable of running within practical compute times on commodity hardware. In addition to serving as a fundamental building block for OSIM, EaSyIM is capable of addressing the scalability aspect - memory consumption and running time, of the IM problem as well. Empirically, our algorithms are capable of maintaining the deviation in the spread always within 5% of the best known methods in the literature. In addition, our experiments show that both OSIM and EaSyIM are effective, efficient, scalable and significantly enhance the ability to analyze real datasets.Comment: ACM SIGMOD Conference 2016, 18 pages, 29 figure
    • …
    corecore