16 research outputs found
Cascade-based community detection in complex networks
Note: supplementary material is available in a separate file
When Less is More: Systematic Analysis of Cascade-based Community Detection
Information diffusion, spreading of infectious diseases, and spreading of
rumors are fundamental processes occurring in real-life networks. In many
practical cases, one can observe when nodes become infected, but the underlying
network, over which a contagion or information propagates, is hidden. Inferring
properties of the underlying network is important since these properties can be
used for constraining infections, forecasting, viral marketing, etc. Moreover,
for many applications, it is sufficient to recover only coarse high-level
properties of this network rather than all its edges. In this paper, we conduct
a systematic and extensive analysis of the following problem: given only the
infection times, find communities of highly interconnected nodes. We carry out
a thorough comparison between existing and new approaches on several large
datasets and cover methodological challenges that are specific to this problem.
One of the main conclusions is that the most stable performance and the most
significant improvement on the current state-of-the-art are achieved by our
proposed simple heuristic approaches that are agnostic to a particular graph
structure and epidemic model. We also show that some well-known community
detection algorithms can be enhanced by including edge weights based on the
cascade data.
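As a rough illustration of what "edge weights based on the cascade data" could look like, the sketch below counts how often two nodes appear in the same cascade and then groups nodes whose co-occurrence count reaches a threshold. This is an assumed toy heuristic, not the method evaluated in the paper: the function names `cooccurrence_weights` and `threshold_communities`, the dictionary representation of a cascade, and the thresholding step are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_weights(cascades):
    """Count, for each node pair, the cascades in which both nodes were infected.

    Each cascade is a dict mapping node -> infection time (assumed format).
    """
    weights = defaultdict(int)
    for cascade in cascades:
        for u, v in combinations(sorted(cascade), 2):
            weights[(u, v)] += 1
    return dict(weights)


def threshold_communities(weights, min_weight):
    """Group nodes linked by pairs with weight >= min_weight (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (u, v), w in weights.items():
        find(u)
        find(v)  # register both nodes even if the pair is too weak
        if w >= min_weight:
            parent[find(u)] = find(v)

    groups = defaultdict(set)
    for node in parent:
        groups[find(node)].add(node)
    return sorted(sorted(g) for g in groups.values())


# Toy cascades (made-up data): only the infection times are observed.
cascades = [
    {"a": 0.0, "b": 1.0, "c": 2.0},
    {"a": 0.5, "b": 0.7},
    {"d": 0.1, "e": 0.3},
    {"d": 1.0, "e": 2.0, "c": 2.5},
]
weights = cooccurrence_weights(cascades)
communities = threshold_communities(weights, min_weight=2)
```

The resulting weights could also be passed to a weighted community detection algorithm instead of being thresholded; the threshold is used here only to keep the example self-contained.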
Flow-based Influence Graph Visual Summarization
Visually mining a large influence graph is appealing yet challenging. People
are amazed by pictures of newscasting graphs on Twitter and engaged by hidden
citation networks in academia, yet they are often troubled by the poor
readability of the underlying visualization. Existing summarization methods
enhance the graph visualization with blocked views, but have an adverse effect on
the latent influence structure. How can we visually summarize a large graph to
maximize influence flows? In particular, how can we illustrate the impact of an
individual node through the summarization? Can we maintain the appealing graph
metaphor while preserving both the overall influence pattern and fine
readability?
To answer these questions, we first formally define the influence graph
summarization problem. Second, we propose an end-to-end framework to solve the
new problem. Our method not only highlights the flow-based influence
patterns in the visual summarization, but also inherently supports rich graph
attributes. Last, we present a theoretical analysis and report our experimental
results. Both lines of evidence demonstrate that our framework can effectively
approximate the proposed influence graph summarization objective while
outperforming previous methods in a typical scenario of visually mining
academic citation networks.
Comment: to appear in IEEE International Conference on Data Mining (ICDM),
Shen Zhen, China, December 201
Community Detection in Networks with Node Attributes
Community detection algorithms are fundamental tools that allow us to uncover
organizational principles in networks. When detecting communities, there are
two possible sources of information one can use: the network structure, and the
features and attributes of nodes. Even though communities form around nodes
that have common edges and common attributes, typically, algorithms have only
focused on one of these two data modalities: community detection algorithms
traditionally focus only on the network structure, while clustering algorithms
mostly consider only node attributes. In this paper, we develop Communities
from Edge Structure and Node Attributes (CESNA), an accurate and scalable
algorithm for detecting overlapping communities in networks with node
attributes. CESNA statistically models the interaction between the network
structure and the node attributes, which leads to more accurate community
detection as well as improved robustness in the presence of noise in the
network structure. CESNA has a linear runtime in the network size and is able
to process networks an order of magnitude larger than comparable approaches.
Last, CESNA also helps with the interpretation of detected communities by
finding relevant node attributes for each community.
Comment: Published in the proceedings of IEEE ICDM '1
Validating Network Value of Influencers by means of Explanations
Recently, there has been significant interest in social influence analysis.
One of the central problems in this area is the problem of identifying
influencers, such that by convincing these users to perform a certain action
(like buying a new product), a large number of other users get influenced to
follow the action. The client of such an application is a marketer who would
target these influencers for marketing a given new product, say by providing
free samples or discounts. It is natural that before committing resources for
targeting an influencer the marketer would be interested in validating the
influence (or network value) of influencers returned. This requires digging
deeper into such analytical questions as: who are their followers, on what
actions (or products) they are influential, etc. However, the current
approaches to identifying influencers largely work as a black box in this
respect. The goal of this paper is to open up the black box, address these
questions and provide informative and crisp explanations for validating the
network value of influencers.
We formulate the problem of providing explanations (called PROXI) as a
discrete optimization problem of feature selection. We show that PROXI is not
only NP-hard to solve exactly, but also NP-hard to approximate within any
reasonable factor. Nevertheless, we show interesting properties of the
objective function and develop an intuitive greedy heuristic. We perform
detailed experimental analysis on two real-world datasets, Twitter and
Flixster, and show that our approach is useful in generating concise and
insightful explanations of the influence distribution of users and that our
greedy algorithm is effective and efficient with respect to several baselines.
Holistic Influence Maximization: Combining Scalability and Efficiency with Opinion-Aware Models
The steady growth of graph data from social networks has resulted in
widespread research in finding solutions to the influence maximization
problem. In this paper, we propose a holistic solution to the influence
maximization (IM) problem. (1) We introduce an opinion-cum-interaction (OI)
model that closely mirrors real-world scenarios. Under the OI model, we
introduce a novel problem of Maximizing the Effective Opinion (MEO) of
influenced users. We prove that the MEO problem is NP-hard and cannot be
approximated within a constant ratio unless P=NP. (2) We propose a heuristic
algorithm OSIM to efficiently solve the MEO problem. To better explain the OSIM
heuristic, we first introduce EaSyIM - the opinion-oblivious version of OSIM, a
scalable algorithm capable of running within practical compute times on
commodity hardware. In addition to serving as a fundamental building block for
OSIM, EaSyIM also addresses the scalability aspect of the IM problem, namely
memory consumption and running time.
Empirically, our algorithms keep the deviation in the spread within 5% of the
best known methods in the literature. In addition, our experiments show that
both OSIM and EaSyIM are effective, efficient, scalable and significantly
enhance the ability to analyze real datasets.
Comment: ACM SIGMOD Conference 2016, 18 pages, 29 figures
Almost Exact Recovery in Gossip Opinion Dynamics over Stochastic Block Models
We study community detection based on state observations from gossip opinion
dynamics over stochastic block models (SBM). It is assumed that a network is
generated from a two-community SBM where each agent has a community label and
each edge exists with probability depending on its endpoints' labels. A gossip
process then evolves over the sampled network. We propose two algorithms to
detect the communities from a single trajectory of the process. It is shown
that, when the influence of stubborn agents is small and the link probability
within communities is large, an algorithm based on clustering transient agent
states can achieve almost exact recovery of the communities. That is, the
algorithm can recover all but a vanishing part of community labels with high
probability. In contrast, when the influence of stubborn agents is large,
another algorithm based on clustering the time averages of agent states can
achieve almost exact recovery. Numerical experiments illustrate the two
algorithms and the theoretical results of the paper.
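The setup described in this abstract can be sketched in a few lines, under strong simplifying assumptions. The code below samples a two-community SBM, runs a symmetric pairwise-averaging gossip process (an assumed update rule, with no stubborn agents), and splits the agents by whether their transient state lies below or above the mean. The function names `sample_sbm`, `run_gossip`, and `split_by_mean`, the parameter values, and the mean-split clustering step are hypothetical and are not taken from the paper; the sketch makes no claim about recovery guarantees.

```python
import random


def sample_sbm(n, p_in, p_out, rng):
    """Sample a two-community SBM: the first half of the nodes gets label 0,
    the rest label 1; each edge exists with a probability that depends on
    whether its endpoints share a label."""
    labels = [0 if i < n // 2 else 1 for i in range(n)]
    edges = [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if rng.random() < (p_in if labels[i] == labels[j] else p_out)
    ]
    return labels, edges


def run_gossip(states, edges, steps, rng):
    """Assumed gossip rule: at each step a uniformly random edge is chosen
    and both endpoints move to the average of their two states."""
    x = list(states)
    for _ in range(steps):
        if not edges:
            break
        i, j = rng.choice(edges)
        x[i] = x[j] = (x[i] + x[j]) / 2.0
    return x


def split_by_mean(states):
    """Crude two-way clustering of agent states: below vs. at or above the mean."""
    mean = sum(states) / len(states)
    return [0 if s < mean else 1 for s in states]


rng = random.Random(0)
n = 20
labels, edges = sample_sbm(n, p_in=0.8, p_out=0.05, rng=rng)
x0 = [rng.random() for _ in range(n)]          # random initial opinions
x = run_gossip(x0, edges, steps=200, rng=rng)  # one trajectory, transient state
guess = split_by_mean(x)                       # estimated community labels
```

Because pairwise averaging conserves the sum of the states, the network-wide average stays fixed while states within a densely connected community tend to move toward each other first; this is the intuition behind clustering transient states, although the quality of the split in this toy version depends on the initial opinions.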