4 research outputs found

    Sample Complexity Bounds for Influence Maximization

    Influence maximization (IM) is the problem of finding, for a given s ≥ 1, a set S of |S| = s nodes in a network with maximum influence. With stochastic diffusion models, the influence of a set S of seed nodes is defined as the expectation of its reachability over simulations, where each simulation specifies a deterministic reachability function. Two well-studied special cases are the Independent Cascade (IC) and the Linear Threshold (LT) models of Kempe, Kleinberg, and Tardos [Kempe et al., 2003]. The influence function in stochastic diffusion is unbiasedly estimated by averaging reachability values over i.i.d. simulations. We study the IM sample complexity: the number of simulations needed to determine a (1-ε)-approximate maximizer with confidence 1-δ. Our main result is a surprising upper bound of O(s τ ε^{-2} ln(n/δ)) for a broad class of models that includes IC and LT models and their mixtures, where n is the number of nodes and τ is the number of diffusion steps. Generally τ ≪ n, so this significantly improves over the generic upper bound of O(s n ε^{-2} ln(n/δ)). Our sample complexity bounds are derived from novel upper bounds on the variance of the reachability that allow for small relative error for influential sets and additive error when influence is small. Moreover, we provide a data-adaptive method that can detect and utilize fewer simulations on models where it suffices. Finally, we provide an efficient greedy design that computes a (1-1/e-ε)-approximate maximizer from simulations and applies to any submodular stochastic diffusion model that satisfies the variance bounds.
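The estimator described in this abstract (averaging reachability over i.i.d. simulations) and a greedy seed selection built on it can be illustrated with a minimal sketch. This is plain Monte Carlo under the IC model, not the paper's algorithm or its data-adaptive method. The function names, the adjacency-dict graph format, and the single uniform activation probability `p` are assumptions made for illustration. Unlike the paper's setting, where each simulation fixes one reachability function that is evaluated for all sets, this sketch draws fresh simulations for every candidate set.

```python
import random

def simulate_ic(graph, seeds, p, rng):
    """Run one Independent Cascade simulation and return the number of nodes reached.

    graph: dict mapping node -> list of out-neighbours (assumed format).
    p: activation probability applied uniformly to every edge (assumption).
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                # Each newly active node gets one chance to activate each neighbour.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)

def estimate_influence(graph, seeds, p, num_sims, rng):
    """Average reachability over num_sims independent simulations."""
    total = sum(simulate_ic(graph, seeds, p, rng) for _ in range(num_sims))
    return total / num_sims

def greedy_seeds(graph, s, p, num_sims, seed=0):
    """Pick s seeds greedily, each time adding the node with the best estimated influence."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    chosen = []
    for _ in range(s):
        candidates = sorted(nodes - set(chosen))
        best = max(
            candidates,
            key=lambda v: estimate_influence(graph, chosen + [v], p, num_sims, rng),
        )
        chosen.append(best)
    return chosen
```

In this sketch `num_sims` plays the role of the sample size whose required magnitude the abstract bounds; the sketch itself makes no claim about how large it must be.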

    Seeding with Differentially Private Network Information

    When designing interventions in public health, development, and education, decision makers rely on social network data to target a small number of people, capitalizing on peer effects and social contagion to bring about the most welfare benefits to the population. Developing new methods that are privacy-preserving for network data collection and targeted interventions is critical for designing sustainable public health and development interventions on social networks. In a similar vein, social media platforms rely on network data and information from past diffusions to organize their ad campaigns and improve the efficacy of targeted advertising. Ensuring that these network operations do not violate users' privacy is critical to the sustainability of social media platforms and their ad economies. We study privacy guarantees for influence maximization algorithms when the social network is unknown, and the inputs are samples of prior influence cascades that are collected at random. Building on recent results that address seeding with costly network information, our privacy-preserving algorithms introduce randomization in the collected data or the algorithm output, and can bound each node's (or group of nodes') privacy loss in deciding whether or not their data should be included in the algorithm input. We provide theoretical guarantees of the seeding performance with a limited sample size subject to differential privacy budgets in both central and local privacy regimes. Simulations on synthetic and empirical network datasets reveal the diminishing value of network information with decreasing privacy budget in both regimes.
    Comment: Preliminary version in AAMAS 2023: https://dl.acm.org/doi/10.5555/3545946.3599081 -- Code and data: https://github.com/aminrahimian/dp-inf-ma
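The idea of randomizing the collected data in a local privacy regime can be illustrated with classic randomized response. The abstract does not say which mechanism the authors use, so this is a textbook stand-in rather than their method. The function names, and the encoding of each node's cascade participation as a single 0/1 bit, are assumptions for illustration. Each bit is kept with probability e^ε / (1 + e^ε) and flipped otherwise, which gives ε-local differential privacy per bit; the analyst then debiases the noisy average.

```python
import math
import random

def keep_probability(epsilon):
    """Probability of reporting the true bit under randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))

def randomize_bits(bits, epsilon, rng):
    """Flip each 0/1 bit independently with probability 1 - keep_probability(epsilon)."""
    q = keep_probability(epsilon)
    return [b if rng.random() < q else 1 - b for b in bits]

def debiased_mean(noisy_bits, epsilon):
    """Unbiased estimate of the true mean from randomized bits (requires epsilon > 0)."""
    q = keep_probability(epsilon)
    observed = sum(noisy_bits) / len(noisy_bits)
    # E[observed] = (1 - q) + true_mean * (2q - 1), so invert that relation.
    return (observed - (1.0 - q)) / (2.0 * q - 1.0)
```

A smaller ε pushes the keep probability toward 1/2, so the debiasing step divides by a factor near zero and the estimate gets noisier. That matches the abstract's observation that network information loses value as the privacy budget decreases.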

    Complexity, Algorithms, and Heuristics of Influence Maximization

    People often adopt improved behaviors, products, or ideas through the influence of friends. This is modeled by cascades. One way to spread such positive elements through society is to identify the most influential agents, those that cause the maximum spread, and initiate the spread by seeding them. However, this strategy has a key difficulty: finding these influential seed nodes. This is difficult even if both the network structure and the way the cascade spreads are known. In the influence maximization problem, a central planner is given a graph and a limited budget k, and must pick k seeds such that the expected total number of infected vertices in the graph at the end of the cascade is maximized. This problem plays a central role in viral marketing, outbreak detection, rumor control, etc. This thesis focuses on the computational complexity, approximability, and algorithm/heuristic design aspects of the influence maximization problem, with both submodular and nonsubmodular diffusion models. The first part of the thesis studies submodular influence maximization, mainly its computational complexity and algorithm analysis, and includes some breakthroughs in understanding the approximability of submodular influence maximization and the theoretical performance of the well-studied greedy algorithm. The second part of the thesis focuses on nonsubmodular influence maximization. New sociologically founded nonsubmodular diffusion models are proposed, and we show how the seeding strategy for nonsubmodular diffusion models is fundamentally different compared to submodular diffusion models.
    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/155221/1/bstao_1.pd