Scaling up Group Closeness Maximization
Closeness is a widely used centrality measure in social network analysis. For a node, it is the inverse of the average shortest-path distance to the other nodes of the network. While the identification of the k nodes with highest closeness has received significant attention, many applications are actually interested in finding a group of nodes that is central as a whole. For this problem, a greedy algorithm with approximation ratio (1−1/e) has only recently been proposed [Chen et al., ADC 2016]. Since this algorithm's running time is still expensive for large networks, the same paper also proposed a heuristic without approximation guarantee.
In the present paper we develop new techniques to speed up the greedy algorithm without losing its theoretical guarantee. Compared to a straightforward implementation, our approach is orders of magnitude faster and, compared to the heuristic proposed by Chen et al., in our experiments it always finds a solution of better quality in comparable running time.
Our method, Greedy++, allows us to approximate the group with maximum closeness on networks with up to hundreds of millions of edges in minutes or at most a few hours. To achieve the same theoretical guarantee, the greedy approach of [Chen et al., ADC 2016] would already take several days on networks with hundreds of thousands of edges.
In a comparison with the optimum, our experiments show that the solution found by Greedy++ is actually much better than the theoretical guarantee. Over all tested networks, the empirical approximation ratio is never lower than 0.97.
Finally, we study for the first time the correlation between the top-k nodes with highest closeness and an approximation of the most central group in large complex networks, and show that the overlap between the two is relatively small.
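For concreteness, the standard greedy baseline that such work accelerates can be sketched in a few lines. This is a naive illustration under assumptions of our own (unweighted connected graph stored as an adjacency dict; the function names are ours), not Greedy++ itself:

```python
from collections import deque

def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def greedy_group_closeness(adj, k):
    """Naive (1 - 1/e) greedy: repeatedly add the node that most
    decreases the total distance from all nodes to the group."""
    nodes = list(adj)
    n = len(nodes)
    dist = {v: n for v in nodes}  # n exceeds any distance in a connected graph
    group = []
    for _ in range(k):
        best_u, best_gain, best_du = None, -1, None
        for u in nodes:
            if u in group:
                continue
            du = bfs_distances(adj, u)
            gain = sum(dist[v] - min(dist[v], du[v]) for v in nodes)
            if gain > best_gain:
                best_u, best_gain, best_du = u, gain, du
        group.append(best_u)
        for v in nodes:  # update distances to the enlarged group
            dist[v] = min(dist[v], best_du[v])
    return group
```

On a path 0-1-2-3-4 the first pick is the middle node 2, which minimizes the average distance on its own. Each round costs one BFS per remaining candidate, and this per-round cost is exactly what makes the straightforward implementation expensive on large networks.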
Influential Billboard Slot Selection using Pruned Submodularity Graph
Billboard advertisement has emerged as an effective out-of-home advertisement technique and has been adopted by many commercial houses. The billboards are owned by companies and provided to the commercial houses slot-wise on a payment basis. Now, given a database of billboards along with their slot information, which slots should be chosen to maximize the influence? Formally, we call this problem the Influential Billboard Slot Selection Problem. In this paper, we pose it as a combinatorial optimization problem. Under the triggering model of influence, the influence function is non-negative, monotone, and submodular. However, as the incremental greedy approach for submodular function maximization does not scale well with the size of the problem instances, there is a need to develop efficient solution methodologies for this problem.
Comment: 15 pages, 6 figures
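A common way to mitigate the scaling problem of incremental greedy for a monotone submodular objective is lazy evaluation: cached marginal gains can only shrink as the solution grows, so most candidates never need re-scoring. A minimal sketch of this generic CELF-style trick (illustrative names; not the paper's pruned-submodularity-graph method):

```python
import heapq

def lazy_greedy(items, f, k):
    """Lazy greedy for a monotone submodular set function f.
    Stale heap entries are re-scored only when they surface."""
    selected = []
    f_sel = f(frozenset())
    # Max-heap of (-marginal_gain, item); gains w.r.t. the empty set.
    heap = [(-(f(frozenset([x])) - f_sel), x) for x in items]
    heapq.heapify(heap)
    while heap and len(selected) < k:
        neg_gain, x = heapq.heappop(heap)
        fresh = f(frozenset(selected + [x])) - f_sel  # re-evaluate the gain
        if not heap or fresh >= -heap[0][0]:
            selected.append(x)  # still best even after re-scoring
            f_sel += fresh
        else:
            heapq.heappush(heap, (-fresh, x))  # push back with updated gain
    return selected
```

With a coverage objective (a classic submodular function), e.g. slots "a", "b", "c" reaching audience segments {1,2,3}, {3,4}, {4,5}, `lazy_greedy` picks "a" and then "c", matching plain greedy while typically evaluating far fewer marginal gains.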
Influence Maximization Meets Efficiency and Effectiveness: A Hop-Based Approach
Influence Maximization is an extensively studied problem that aims at selecting a set of initial seed nodes in Online Social Networks (OSNs) so as to spread influence as widely as possible. However, it remains an open challenge to design fast and accurate algorithms that find solutions in large-scale OSNs. Prior Monte-Carlo-simulation-based methods are slow and not scalable, while other heuristic algorithms have no theoretical guarantee and have been shown to produce poor solutions in many cases. In this paper, we propose hop-based algorithms that easily scale to millions of nodes and billions of edges. Unlike previous heuristics, our hop-based approaches provide certain theoretical guarantees. Experimental evaluations on real OSN datasets demonstrate the efficiency and effectiveness of our algorithms.
Comment: Extended version of the conference paper at ASONAM 2017, 11 pages
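The flavor of a hop-based estimate is easiest to see in the one-hop case under the Independent Cascade model: a seed set is worth its own size plus, for each out-neighbour, the probability that at least one seed activates it. This is a toy illustration with our own naming and graph encoding, not the paper's algorithm (which extends beyond one hop and carries the guarantees mentioned above):

```python
def one_hop_influence(adj_p, seeds):
    """One-hop spread estimate under Independent Cascade:
    |seeds| plus, for each non-seed out-neighbour, the probability
    it is activated by at least one seed edge."""
    seeds = set(seeds)
    activation = {}  # node -> prob. activated by >= 1 seed
    for s in seeds:
        for v, p in adj_p.get(s, []):
            if v not in seeds:
                # complement rule: 1 - prod(1 - p_i) over incoming seed edges
                activation[v] = 1.0 - (1.0 - activation.get(v, 0.0)) * (1.0 - p)
    return len(seeds) + sum(activation.values())
```

Such an estimate needs no Monte-Carlo simulation at all, which is the source of the speed; the price is that it only accounts for influence within the chosen hop radius.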
Sample Complexity Bounds for Influence Maximization
Influence maximization (IM) is the problem of finding, for a given s ≥ 1, a set S of |S| = s nodes in a network with maximum influence. With stochastic diffusion models, the influence of a set S of seed nodes is defined as the expectation of its reachability over simulations, where each simulation specifies a deterministic reachability function. Two well-studied special cases are the Independent Cascade (IC) and the Linear Threshold (LT) models of Kempe, Kleinberg, and Tardos [Kempe et al., 2003]. The influence function in stochastic diffusion is unbiasedly estimated by averaging reachability values over i.i.d. simulations. We study the IM sample complexity: the number of simulations needed to determine a (1−ε)-approximate maximizer with confidence 1−δ. Our main result is a surprising upper bound of O(s τ ε^{−2} ln(n/δ)) for a broad class of models that includes the IC and LT models and their mixtures, where n is the number of nodes and τ is the number of diffusion steps. Generally τ ≪ n, so this significantly improves over the generic upper bound of O(s n ε^{−2} ln(n/δ)). Our sample complexity bounds are derived from novel upper bounds on the variance of the reachability that allow for small relative error for influential sets and additive error when influence is small. Moreover, we provide a data-adaptive method that can detect and utilize fewer simulations on models where it suffices. Finally, we provide an efficient greedy design that computes a (1−1/e−ε)-approximate maximizer from simulations and applies to any submodular stochastic diffusion model that satisfies the variance bounds.
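The estimator whose sample complexity is being bounded, an average of reachability values over i.i.d. simulations, can be written down directly. A minimal sketch under the IC model (the graph encoding and function names are our own; each simulation is one live-edge sample):

```python
import random
from collections import deque

def ic_reachability(adj_p, seeds, rng):
    """One IC simulation: keep each edge (u, v) with probability p,
    return how many nodes the seed set reaches over kept edges."""
    active = set(seeds)
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        for v, p in adj_p.get(u, []):
            if v not in active and rng.random() < p:
                active.add(v)
                queue.append(v)
    return len(active)

def estimate_influence(adj_p, seeds, num_sims, seed=0):
    """Unbiased estimate: average reachability over i.i.d. simulations."""
    rng = random.Random(seed)
    return sum(ic_reachability(adj_p, seeds, rng)
               for _ in range(num_sims)) / num_sims
```

The question the paper answers is how large `num_sims` must be: roughly O(s τ ε^{−2} ln(n/δ)) simulations suffice for the stated class of models, far fewer than the generic O(s n ε^{−2} ln(n/δ)).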
The Solution Distribution of Influence Maximization: A High-level Experimental Study on Three Algorithmic Approaches
Influence maximization is among the most fundamental algorithmic problems in
social influence analysis. Over the last decade, a great effort has been
devoted to developing efficient algorithms for influence maximization, so that
identifying the "best" algorithm has become a demanding task. In SIGMOD'17,
Arora, Galhotra, and Ranu reported benchmark results on eleven existing
algorithms and demonstrated that there is no single state-of-the-art offering
the best trade-off between computational efficiency and solution quality.
In this paper, we report a high-level experimental study on three
well-established algorithmic approaches for influence maximization, referred to
as Oneshot, Snapshot, and Reverse Influence Sampling (RIS). Different from
Arora et al., our experimental methodology is so designed that we examine the
distribution of random solutions, characterize the relation between the sample
number and the actual solution quality, and avoid implementation dependencies.
Our main findings are as follows: 1. For a sufficiently large sample number, we obtain a unique solution regardless of the algorithm. 2. The average solution quality of Oneshot, Snapshot, and RIS improves at the same rate up to a scaling of the sample number. 3. Oneshot requires more samples than Snapshot, and Snapshot requires fewer but larger samples than RIS. We discuss time efficiency when Oneshot, Snapshot, and RIS are tuned to identical accuracy. Our conclusion is that Oneshot is suitable only if the available memory is limited, that RIS is more efficient than Snapshot for large networks, and that Snapshot is preferable for small, low-probability networks.
Comment: To appear in SIGMOD 202
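For readers unfamiliar with the third approach, RIS proceeds in two phases: sample random reverse-reachable (RR) sets, then run greedy max-coverage over them; a node that appears in many RR sets is likely influential. A minimal sketch under the Independent Cascade model with our own naming (`radj_p` maps a node to its incoming edges):

```python
import random
from collections import deque

def random_rr_set(radj_p, nodes, rng):
    """Sample one RR set: pick a uniform target, then walk incoming
    edges backwards, keeping each with its propagation probability."""
    target = rng.choice(nodes)
    rr = {target}
    queue = deque([target])
    while queue:
        v = queue.popleft()
        for u, p in radj_p.get(v, []):
            if u not in rr and rng.random() < p:
                rr.add(u)
                queue.append(u)
    return rr

def ris_seeds(radj_p, nodes, k, num_sets, seed=0):
    """Phase 2: greedy max-coverage over the sampled RR sets."""
    rng = random.Random(seed)
    rr_sets = [random_rr_set(radj_p, nodes, rng) for _ in range(num_sets)]
    seeds = []
    for _ in range(k):
        counts = {}
        for rr in rr_sets:
            for u in rr:
                counts[u] = counts.get(u, 0) + 1
        if not counts:
            break
        best = max(counts, key=counts.get)
        seeds.append(best)
        rr_sets = [rr for rr in rr_sets if best not in rr]  # drop covered sets
    return seeds
```

The "sample number" studied above is `num_sets` here; the study's point is how solution quality behaves as that number grows, independently of implementation details.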
Holistic Influence Maximization: Combining Scalability and Efficiency with Opinion-Aware Models
The steady growth of graph data from social networks has resulted in widespread research into solutions to the influence maximization
problem. In this paper, we propose a holistic solution to the influence
maximization (IM) problem. (1) We introduce an opinion-cum-interaction (OI)
model that closely mirrors real-world scenarios. Under the OI model, we
introduce a novel problem of Maximizing the Effective Opinion (MEO) of
influenced users. We prove that the MEO problem is NP-hard and cannot be
approximated within a constant ratio unless P=NP. (2) We propose a heuristic
algorithm OSIM to efficiently solve the MEO problem. To better explain the OSIM
heuristic, we first introduce EaSyIM - the opinion-oblivious version of OSIM, a
scalable algorithm capable of running within practical compute times on
commodity hardware. In addition to serving as a fundamental building block for OSIM, EaSyIM also addresses the scalability aspects of the IM problem: memory consumption and running time.
Empirically, our algorithms always keep the deviation in spread within 5% of the best-known methods in the literature. In addition, our experiments show that both OSIM and EaSyIM are effective, efficient, and scalable, and significantly enhance the ability to analyze real datasets.
Comment: ACM SIGMOD Conference 2016, 18 pages, 29 figures
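To make an opinion-aware objective concrete, here is one plausible, hypothetical reading: average, over diffusion simulations, the total non-negative opinion value of the activated users, so that reaching a negative-opinion user contributes nothing. This is our own sketch under the Independent Cascade model, not the paper's OI model or the OSIM/EaSyIM algorithms:

```python
import random
from collections import deque

def effective_opinion(adj_p, opinion, seeds, num_sims, seed=0):
    """Monte-Carlo estimate of opinion-weighted spread: run IC
    simulations and sum the clipped opinions of activated users."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_sims):
        active = set(seeds)
        queue = deque(seeds)
        while queue:
            u = queue.popleft()
            for v, p in adj_p.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    queue.append(v)
        # Clip at zero: an activated but negative-opinion user adds no value.
        total += sum(max(opinion[v], 0.0) for v in active)
    return total / num_sims
```

Unlike plain spread, this objective distinguishes reaching many users from reaching many favorably opinionated users, which is the gap the opinion-aware line of work targets.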