75,053 research outputs found
Mining Density Contrast Subgraphs
Dense subgraph discovery is a key primitive in many graph mining
applications, such as detecting communities in social networks and mining gene
correlation from biological data. Most studies on dense subgraph mining only
deal with one graph. However, in many applications, we have more than one graph
describing relations among a same group of entities. In this paper, given two
graphs sharing the same set of vertices, we investigate the problem of
detecting subgraphs that contrast the most with respect to density. We call
such subgraphs Density Contrast Subgraphs, or DCS in short. Two widely used
graph density measures, average degree and graph affinity, are considered. For
both density measures, mining DCS is equivalent to mining the densest subgraph
from a "difference" graph, which may have both positive and negative edge
weights. Due to the existence of negative edge weights, existing dense subgraph
detection algorithms cannot identify the subgraph we need. We prove the
computational hardness of mining DCS under the two graph density measures and
develop efficient algorithms to find DCS. We also conduct extensive experiments
on several real-world datasets to evaluate our algorithms. The experimental
results show that our algorithms are both effective and efficient.Comment: Full version of an ICDE'18 pape
Detecting the community structure and activity patterns of temporal networks: a non-negative tensor factorization approach
The increasing availability of temporal network data is calling for more
research on extracting and characterizing mesoscopic structures in temporal
networks and on relating such structure to specific functions or properties of
the system. An outstanding challenge is the extension of the results achieved
for static networks to time-varying networks, where the topological structure
of the system and the temporal activity patterns of its components are
intertwined. Here we investigate the use of a latent factor decomposition
technique, non-negative tensor factorization, to extract the community-activity
structure of temporal networks. The method is intrinsically temporal and allows
to simultaneously identify communities and to track their activity over time.
We represent the time-varying adjacency matrix of a temporal network as a
three-way tensor and approximate this tensor as a sum of terms that can be
interpreted as communities of nodes with an associated activity time series. We
summarize known computational techniques for tensor decomposition and discuss
some quality metrics that can be used to tune the complexity of the factorized
representation. We subsequently apply tensor factorization to a temporal
network for which a ground truth is available for both the community structure
and the temporal activity patterns. The data we use describe the social
interactions of students in a school, the associations between students and
school classes, and the spatio-temporal trajectories of students over time. We
show that non-negative tensor factorization is capable of recovering the class
structure with high accuracy. In particular, the extracted tensor components
can be validated either as known school classes, or in terms of correlated
activity patterns, i.e., of spatial and temporal coincidences that are
determined by the known school activity schedule
A framework for community detection in heterogeneous multi-relational networks
There has been a surge of interest in community detection in homogeneous
single-relational networks which contain only one type of nodes and edges.
However, many real-world systems are naturally described as heterogeneous
multi-relational networks which contain multiple types of nodes and edges. In
this paper, we propose a new method for detecting communities in such networks.
Our method is based on optimizing the composite modularity, which is a new
modularity proposed for evaluating partitions of a heterogeneous
multi-relational network into communities. Our method is parameter-free,
scalable, and suitable for various networks with general structure. We
demonstrate that it outperforms the state-of-the-art techniques in detecting
pre-planted communities in synthetic networks. Applied to a real-world Digg
network, it successfully detects meaningful communities.Comment: 27 pages, 10 figure
A Network Topology Approach to Bot Classification
Automated social agents, or bots, are increasingly becoming a problem on
social media platforms. There is a growing body of literature and multiple
tools to aid in the detection of such agents on online social networking
platforms. We propose that the social network topology of a user would be
sufficient to determine whether the user is a automated agent or a human. To
test this, we use a publicly available dataset containing users on Twitter
labelled as either automated social agent or human. Using an unsupervised
machine learning approach, we obtain a detection accuracy rate of 70%
Metric projection for dynamic multiplex networks
Evolving multiplex networks are a powerful model for representing the
dynamics along time of different phenomena, such as social networks, power
grids, biological pathways. However, exploring the structure of the multiplex
network time series is still an open problem. Here we propose a two-steps
strategy to tackle this problem based on the concept of distance (metric)
between networks. Given a multiplex graph, first a network of networks is built
for each time steps, and then a real valued time series is obtained by the
sequence of (simple) networks by evaluating the distance from the first element
of the series. The effectiveness of this approach in detecting the occurring
changes along the original time series is shown on a synthetic example first,
and then on the Gulf dataset of political events
Online Human-Bot Interactions: Detection, Estimation, and Characterization
Increasing evidence suggests that a growing amount of social media content is
generated by autonomous entities known as social bots. In this work we present
a framework to detect such entities on Twitter. We leverage more than a
thousand features extracted from public data and meta-data about users:
friends, tweet content and sentiment, network patterns, and activity time
series. We benchmark the classification framework by using a publicly available
dataset of Twitter bots. This training data is enriched by a manually annotated
collection of active Twitter users that include both humans and bots of varying
sophistication. Our models yield high accuracy and agreement with each other
and can detect bots of different nature. Our estimates suggest that between 9%
and 15% of active Twitter accounts are bots. Characterizing ties among
accounts, we observe that simple bots tend to interact with bots that exhibit
more human-like behaviors. Analysis of content flows reveals retweet and
mention strategies adopted by bots to interact with different target groups.
Using clustering analysis, we characterize several subclasses of accounts,
including spammers, self promoters, and accounts that post content from
connected applications.Comment: Accepted paper for ICWSM'17, 10 pages, 8 figures, 1 tabl
- …