Clustering and Community Detection in Directed Networks: A Survey
Networks (or graphs) appear as dominant structures in diverse domains,
including sociology, biology, neuroscience and computer science. In most of the
aforementioned cases graphs are directed - in the sense that there is
directionality on the edges, making the semantics of the edges non-symmetric.
An interesting feature that real networks present is the clustering or
community structure property, under which the graph topology is organized into
modules commonly called communities or clusters. The essence here is that nodes
of the same community are highly similar while on the contrary, nodes across
communities present low similarity. Revealing the underlying community
structure of directed complex networks has become a crucial and
interdisciplinary topic with a plethora of applications. As a result, there
has been a recent wealth of research on mining directed graphs, with
clustering being the primary method and tool for community detection and
evaluation. The goal of this paper is to offer an in-depth review
of the methods presented so far for clustering directed networks along with the
necessary methodological background and related applications. The
survey commences by offering a concise review of the fundamental concepts and
methodological base on which graph clustering algorithms capitalize. Then we
present the relevant work along two orthogonal classifications. The first one
is mostly concerned with the methodological principles of the clustering
algorithms, while the second one approaches the methods from the viewpoint
of the properties of a good cluster in a directed network. Further, we
present methods and metrics for evaluating graph clustering results,
demonstrate interesting application domains and provide promising future
research directions.
Comment: 86 pages, 17 figures. Physics Reports Journal (To Appear)
Characterization of complex networks: A survey of measurements
Each complex network (or class of networks) presents specific topological
features which characterize its connectivity and highly influence the dynamics
of processes executed on the network. The analysis, discrimination, and
synthesis of complex networks therefore rely on the use of measurements capable
of expressing the most relevant topological features. This article presents a
survey of such measurements. It includes general considerations about complex
network characterization, a brief review of the principal models, and the
presentation of the main existing measurements. Important related issues
covered in this work comprise the representation of the evolution of complex
networks in terms of trajectories in several measurement spaces, the analysis
of the correlations between some of the most traditional measurements,
perturbation analysis, as well as the use of multivariate statistics for
feature selection and network classification. Depending on the network and the
analysis task one has in mind, a specific set of features may be chosen. It is
hoped that the present survey will help the proper application and
interpretation of measurements.
Comment: A working manuscript with 78 pages, 32 figures. Suggestions of
measurements for inclusion are welcomed by the author
Detecting Core-Periphery Structures by Surprise
Detecting the presence of mesoscale structures in complex networks is of
primary importance. This is especially true for financial networks, whose
structural organization deeply affects their resilience to events like default
cascades, shock propagation, etc. Several methods have been proposed so far
to detect communities, i.e. groups of nodes whose connectivity is significantly
large. Communities, however, do not represent the only kind of mesoscale
structures characterizing real-world networks: other examples are provided by
bow-tie structures, core-periphery structures and bipartite structures. Here we
propose a novel method to detect statistically significant bimodular structures,
i.e. either bipartite or core-periphery ones. It is based on a modification of
the surprise, recently proposed for detecting communities. Our variant allows
for bimodular node partitions to be revealed, by letting links be placed
either 1) within the core part and between the core and the periphery parts or
2) just between the (empty) layers of a bipartite network. From a technical
point of view, this is achieved by employing a multinomial hypergeometric
distribution instead of the traditional (binomial) hypergeometric one; as in
the latter case, this allows a p-value to be assigned to any given
(bi)partition of the nodes. To illustrate the performance of our method, we
report the results of its application to several real-world networks, including
social, economic and financial ones.
Comment: 11 pages, 10 figures. Python code freely available at
https://github.com/jeroenvldj/bimodular_surpris
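As a rough illustration of the "traditional (binomial) hypergeometric" surprise that the abstract above takes as its starting point, the sketch below computes the p-value of a node partition as the upper tail of a hypergeometric distribution: the probability of observing at least as many intra-community links as were actually seen when links are placed uniformly at random among all node pairs. It is not the authors' multinomial variant and is not taken from their repository; the function name and argument names are assumptions made for this example.

```python
from math import comb


def surprise_pvalue(n_pairs, n_intra_pairs, n_links, n_intra_links):
    """Upper-tail hypergeometric p-value of a partition (hypothetical helper).

    n_pairs        -- total number of node pairs in the network
    n_intra_pairs  -- node pairs that fall inside a community
    n_links        -- observed number of links
    n_intra_links  -- observed links that fall inside a community
    """
    # At most min(n_links, n_intra_pairs) links can be intra-community.
    upper = min(n_links, n_intra_pairs)
    total = comb(n_pairs, n_links)
    tail = sum(
        comb(n_intra_pairs, i) * comb(n_pairs - n_intra_pairs, n_links - i)
        for i in range(n_intra_links, upper + 1)
    )
    return tail / total


# Toy case: 4 nodes (6 pairs) split into two communities of 2 nodes each,
# so 2 intra-community pairs; both observed links are intra-community.
p = surprise_pvalue(n_pairs=6, n_intra_pairs=2, n_links=2, n_intra_links=2)
```

A smaller p-value means the partition concentrates links inside communities more strongly than random placement would; in the toy case only one of the 15 possible placements of two links is fully intra-community.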
Mal-Netminer: Malware Classification Approach based on Social Network Analysis of System Call Graph
As the security landscape evolves, with thousands of species of malicious
code seen every day, antivirus vendors strive to detect and
classify malware families for efficient and effective responses against malware
campaigns. To enrich this effort, and by capitalizing on ideas from the social
network analysis domain, we build a tool that can help classify malware
families using features derived from the graph structure of their system calls.
To achieve that, we first construct a system call graph that consists of system
calls found in the execution of the individual malware families. To explore
distinguishing features of various malware species, we study social network
properties as applied to the call graph, including the degree distribution,
degree centrality, average distance, clustering coefficient, network density,
and component ratio. We utilize features derived from those properties to build
a classifier for malware families. Our experimental results show that
influence-based graph metrics such as the degree centrality are effective for
classifying malware, whereas general structural metrics are less
effective. Our experiments demonstrate that the
proposed system performs well in detecting and classifying malware families
within each malware class with accuracy greater than 96%.
Comment: Mathematical Problems in Engineering, Vol 201
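To make a few of the graph properties named in the abstract above concrete, here is a minimal sketch that computes degree centrality, network density and the average clustering coefficient from an edge list. It is not the Mal-Netminer implementation: it assumes an undirected, unweighted simplification of a call graph, and the function name and the system call names in the example are hypothetical.

```python
from itertools import combinations


def graph_features(edges):
    """Basic structural features of an undirected graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops in this simplified sketch
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    n = len(adj)
    if n < 2:
        raise ValueError("need at least two connected nodes")
    m = sum(len(nb) for nb in adj.values()) // 2

    # Degree centrality: fraction of the other nodes each node touches.
    degree_centrality = {u: len(nb) / (n - 1) for u, nb in adj.items()}

    # Density: observed edges over the number of possible edges.
    density = 2 * m / (n * (n - 1))

    def local_clustering(u):
        nb = adj[u]
        k = len(nb)
        if k < 2:
            return 0.0
        closed = sum(1 for a, b in combinations(nb, 2) if b in adj[a])
        return 2 * closed / (k * (k - 1))

    avg_clustering = sum(local_clustering(u) for u in adj) / n
    return {
        "degree_centrality": degree_centrality,
        "density": density,
        "avg_clustering": avg_clustering,
    }


# Hypothetical toy call graph: a triangle plus one pendant node.
features = graph_features(
    [("open", "read"), ("read", "close"), ("open", "close"), ("close", "exit")]
)
```

A feature vector of this kind, one per sample, could then be passed to any standard classifier; which classifier the paper uses is not stated in the abstract.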
Identifying communities by influence dynamics in social networks
Communities are not static; they evolve, split and merge, appear and
disappear, i.e. they are the product of dynamical processes that govern the
evolution of the network. A good algorithm for community detection should not
only quantify the topology of the network, but incorporate the dynamical
processes that take place on the network. We present a novel algorithm for
community detection that combines network structure with processes that support
creation and/or evolution of communities. The algorithm does not aim to be
universal; instead it focuses on social networks and models the
dynamic social interactions that occur on those networks. It identifies
leaders, and communities that form around those leaders. It naturally supports
overlapping communities by associating each node with a membership vector that
describes the node's involvement in each community. This way, in addition to
overlapping communities, we can identify nodes that are good followers of their
leader, and also nodes with no clear community involvement that serve as a
proxy between several communities and are equally important. We run the
algorithm on several real social networks which we believe represent a good
fraction of the wide body of social networks and discuss the results including
other possible applications.
Comment: 10 pages, 6 figures
Overlapping modularity at the critical point of k-clique percolation
One of the most remarkable social phenomena is the formation of communities
in social networks corresponding to families, friendship circles, work teams,
etc. Since people usually belong to several different communities at the same
time, the induced overlaps result in an extremely complicated web of the
communities themselves. Thus, uncovering the intricate community structure of
social networks is a non-trivial task with great potential for practical
applications, and it has attracted notable interest in recent years. The Clique
Percolation Method (CPM) is one of the earliest overlapping community finding
methods, which has already been used in the analysis of several different social
networks. In this approach the communities correspond to k-clique percolation
clusters, and the general heuristic for setting the parameters of the method is
to tune the system just below the critical point of k-clique percolation.
However, this rule is based on simple physical principles and its validity has
never been subjected to quantitative analysis. Here we examine the quality of the
partitioning in the vicinity of the critical point using recently introduced
overlapping modularity measures. According to our results on real social and
other networks, the overlapping modularities show a maximum close to the
critical point, justifying the original criterion for the optimal parameter
settings.
Comment: 20 pages, 6 figures
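The idea of communities as k-clique percolation clusters, described in the abstract above, can be sketched in a few lines: two k-cliques are adjacent when they share k-1 nodes, and a community is the union of all k-cliques reachable from one another through such adjacencies. The code below is a brute-force illustration for small graphs only, not the CPM reference implementation; the function name is an assumption, and nodes are assumed to be sortable (integers here).

```python
from itertools import combinations


def k_clique_communities(edges, k):
    """Brute-force k-clique percolation on a small undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Enumerate every k-clique by checking all k-node subsets.
    cliques = [
        frozenset(c)
        for c in combinations(sorted(adj), k)
        if all(b in adj[a] for a, b in combinations(c, 2))
    ]

    # Union-find over cliques: merge cliques that share k-1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    # Each community is the union of the nodes of its merged cliques.
    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return list(groups.values())


# Two triangles sharing an edge, plus a third triangle attached at node 3.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]
communities = k_clique_communities(edges, k=3)
```

In this toy graph node 3 belongs to both resulting communities, which is the kind of overlap the method is designed to capture. Larger k gives stricter, smaller communities, which is why the choice of parameters discussed in the abstract matters.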