Overexposure-aware influence maximization
Viral marketing campaigns are often negatively affected by overexposure. Overexposure occurs when users become less likely to favor a promoted product after receiving information about the product from too large a fraction of their friends. Yet, existing influence diffusion models do not take overexposure into account, effectively overestimating the number of users who favor the product and diffuse information about it. In this work, we propose the first influence diffusion model that captures overexposure. In our model, LAICO (Latency Aware Independent Cascade Model with Overexposure), the activation probability of a node representing a user is multiplied (discounted) by an overexposure score, which is calculated based on the ratio between the estimated and the maximum possible number of attempts performed to activate the node. We also study the influence maximization problem under LAICO. Since the spread function in LAICO is non-submodular, algorithms for submodular maximization are not appropriate to address the problem. Therefore, we develop an approximation algorithm which exploits monotone submodular upper and lower bound functions of spread, and a heuristic which aims to maximize a proxy function of spread iteratively. Our experiments show the effectiveness and efficiency of our algorithms.
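To make the discount concrete, below is a minimal Python sketch of an activation probability multiplied by an overexposure score derived from the attempt ratio. The linear form of the score and all function names are illustrative assumptions, not the exact formulation used in LAICO.

```python
# Hypothetical sketch of the overexposure discount described above.
# The linear score (1 - attempt ratio) is an assumption, not the paper's formula.

def overexposure_score(estimated_attempts: float, max_attempts: int) -> float:
    """The larger the fraction of possible activation attempts already made
    on a node, the smaller the score (i.e., the stronger the discount)."""
    if max_attempts == 0:
        return 1.0
    ratio = min(estimated_attempts / max_attempts, 1.0)
    return 1.0 - ratio  # assumed linear discount in the attempt ratio

def discounted_activation_probability(p_uv: float,
                                      estimated_attempts: float,
                                      max_attempts: int) -> float:
    """Base activation probability of edge (u, v) multiplied by the
    overexposure score of the target node."""
    return p_uv * overexposure_score(estimated_attempts, max_attempts)

# Example: a node with 10 in-neighbors, 4 of which are estimated to have
# already attempted activation, and a base edge probability of 0.3.
print(discounted_activation_probability(0.3, 4, 10))  # 0.3 * (1 - 0.4) = 0.18
```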
Heavy Nodes in a Small Neighborhood: Algorithms and applications
We introduce a weighted and unconstrained variant of the well-known minimum k union problem: Given a bipartite graph (U, V, E) with weights for all nodes in V, find a set S ⊆ V such that the ratio between the total weight of the nodes in S and the number of their distinct incident nodes in U is maximized. Our problem, which we term Heavy Nodes in a Small Neighborhood (HNSN), finds applications in marketing, team formation, and money laundering detection. For example, in the latter application, S represents bank account holders who obtain illicit money from some peers of a criminal and route it through their accounts to a target account belonging to the criminal. We prove that HNSN can be solved exactly in polynomial time via linear programming. As the linear program can be very large in practice, we also develop a near linear-time greedy heuristic. In addition, we formalize a money laundering scenario involving multiple target accounts and show how our algorithms can be extended to deal with it. Our experiments on real and synthetic datasets show that our algorithms find optimal or near-optimal solutions, outperforming a natural baseline, and that they can detect money laundering much more effectively and efficiently than a state-of-the-art method.
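For illustration, the sketch below computes the HNSN ratio and runs a naive greedy that keeps adding nodes while the ratio improves. It is an assumed toy version for clarity, not the paper's LP formulation or its near linear-time heuristic.

```python
# Illustrative sketch of the HNSN objective and a simple greedy heuristic.
# The greedy rule here is an assumption and differs from the paper's algorithm.

def hnsn_ratio(S, weight, neighbors):
    """Total weight of the selected nodes in V divided by the number of
    distinct incident nodes in U."""
    if not S:
        return 0.0
    covered = set()
    for v in S:
        covered.update(neighbors[v])
    return sum(weight[v] for v in S) / len(covered)

def greedy_hnsn(weight, neighbors):
    """Add, at each step, the node whose inclusion yields the best ratio;
    stop as soon as no addition improves it."""
    S, best = set(), 0.0
    candidates = set(weight)
    while candidates:
        v_best, r_best = None, best
        for v in candidates:
            r = hnsn_ratio(S | {v}, weight, neighbors)
            if r > r_best:
                v_best, r_best = v, r
        if v_best is None:
            break
        S.add(v_best)
        candidates.remove(v_best)
        best = r_best
    return S, best

# Example: v1 alone gives 3/1 = 3.0; adding v2 would lower the ratio to 5/2.
weight = {"v1": 3.0, "v2": 2.0}
neighbors = {"v1": {"u1"}, "v2": {"u1", "u2"}}
print(greedy_hnsn(weight, neighbors))  # ({'v1'}, 3.0)
```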
Discovering episodes with compact minimal windows
Discovering the most interesting patterns is the key problem in the field of pattern mining. While ranking or selecting patterns is well-studied for itemsets, it is surprisingly under-researched for other, more complex, pattern types. In this paper we propose a new quality measure for episodes. An episode is essentially a set of events with possible restrictions on the order of events. We say that an episode is significant if its occurrence is abnormally compact, that is, only a few gap events occur between the actual episode events, when compared to the expected length according to the independence model. We can apply this measure as a post-pruning step by first discovering frequent episodes and then ranking them according to this measure. In order to compute the score we need to compute the mean and the variance according to the independence model. As a main technical contribution we introduce a technique that allows us to compute these values. Such a task is surprisingly complex, and in order to solve it we develop intricate finite state machines that allow us to compute the needed statistics. We also show that asymptotically our score can be interpreted as a p-value. In our experiments we demonstrate that despite its intricacy our ranking is fast: we can rank tens of thousands of episodes in seconds. Our experiments with text data demonstrate that our measure ranks interpretable episodes high.
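As an illustration of this kind of ranking, the sketch below standardizes an episode's observed minimal-window length against its mean and variance under the independence model and ranks the most compact episodes first. The actual measure and the finite-state-machine computation of these statistics are the paper's contribution; here mu and sigma are simply assumed to be given, and all numbers are made up.

```python
# Hedged sketch of a compactness-based ranking: compare the observed
# minimal-window length of an episode with its expected length under an
# independence model, normalized by the standard deviation.
# mu and sigma are placeholders for the statistics the paper computes exactly.

def compactness_score(observed_window: float, mu: float, sigma: float) -> float:
    """Standardized score: large negative values mean the episode occurs in
    abnormally compact windows (few gap events between episode events)."""
    if sigma == 0:
        return 0.0
    return (observed_window - mu) / sigma

def rank_episodes(episodes):
    """episodes: iterable of (name, observed_window, mu, sigma).
    Most compact (most significant) episodes come first."""
    scored = [(name, compactness_score(w, mu, sigma))
              for name, w, mu, sigma in episodes]
    return sorted(scored, key=lambda t: t[1])

# Example with made-up statistics for three episodes.
print(rank_episodes([("A->B", 4.0, 9.0, 2.0),
                     ("C,D",  7.0, 8.0, 1.5),
                     ("E->F", 3.0, 3.5, 1.0)]))
# [('A->B', -2.5), ('C,D', -0.666...), ('E->F', -0.5)]
```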