Motif Clustering and Overlapping Clustering for Social Network Analysis
Motivated by applications in social network community analysis, we introduce
a new clustering paradigm termed motif clustering. Unlike classical clustering,
motif clustering aims to minimize the number of clustering errors associated
with both edges and certain higher order graph structures (motifs) that
represent "atomic units" of social organizations. Our contributions are
two-fold: We first introduce motif correlation clustering, in which the goal is
to agnostically partition the vertices of a weighted complete graph so that
certain predetermined "important" social subgraphs mostly lie within the same
cluster, while "less relevant" social subgraphs are allowed to lie across
clusters. We then proceed to introduce the notion of motif covers, in which the
goal is to cover the vertices of motifs via the smallest number of (near)
cliques in the graph. Motif cover algorithms provide a natural solution for
overlapping clustering and they also play an important role in latent feature
inference of networks. For both motif correlation clustering and its extension
introduced via the covering problem, we provide hardness results, algorithmic
solutions and community detection results for two well-studied social networks.
Community structure and ethnic preferences in school friendship networks
Recently developed concepts and techniques of analyzing complex systems
provide new insight into the structure of social networks. Uncovering recurrent
preferences and organizational principles in such networks is a key issue to
characterize them. We investigate school friendship networks from the Add
Health database. Applying threshold analysis, we find that the friendship
networks do not form a single connected component through mutual strong
nominations within a school, while under weaker conditions such
interconnectedness is present. We extract the networks of overlapping
communities at the schools (c-networks) and find that they are scale free and
disassortative in contrast to the direct friendship networks, which have an
exponential degree distribution and are assortative. Based on the network
analysis we study the ethnic preferences in friendship selection. The clique
percolation method we use reveals that when in minority, the students tend to
build more densely interconnected groups of friends. We also find an asymmetry
in the behavior of black minorities in a white majority as compared to that of
white minorities in a black majority.
Comment: submitted to Physica
Structure of n-clique networks embedded in a complex network
We propose the n-clique network as a powerful tool for understanding global
structures of combined highly-interconnected subgraphs, and provide theoretical
predictions for statistical properties of the n-clique networks embedded in a
complex network using the degree distribution and the clustering spectrum.
Furthermore, using our theoretical predictions, we find that the statistical
properties are invariant between 3-clique networks and original networks for
several observable real-world networks with the scale-free connectivity and the
hierarchical modularity. The result implies that structural properties are
identical between the 3-clique networks and the original networks.
Comment: 12 pages, 5 figures
Compressive Network Analysis
Modern data acquisition routinely produces massive amounts of network data.
Though many methods and models have been proposed to analyze such data, the
research on network data is largely disconnected from the classical theory of
statistical learning and signal processing. In this paper, we present a new
framework for modeling network data, which connects two seemingly different
areas: network data analysis and compressed sensing. From a nonparametric
perspective, we model an observed network using a large dictionary. In
particular, we consider the network clique detection problem and show
connections between our formulation and a new algebraic tool, namely Radon
basis pursuit in homogeneous spaces. Such a connection allows us to identify
rigorous recovery conditions for clique detection problems. Though this paper
is mainly conceptual, we also develop practical approximation algorithms for
solving empirical problems and demonstrate their usefulness on real-world
datasets.
Weighted network modules
The inclusion of link weights into the analysis of network properties allows
a deeper insight into the (often overlapping) modular structure of real-world
webs. We introduce a clustering algorithm (CPMw, Clique Percolation Method with
weights) for weighted networks based on the concept of percolating k-cliques
with high enough intensity. The algorithm allows overlaps between the modules.
First, we give detailed analytical and numerical results about the critical
point of weighted k-clique percolation on (weighted) Erdős-Rényi graphs. Then,
for a scientist collaboration web and a stock correlation graph we compute
three-link weight correlations and with the CPMw the weighted modules. After
reshuffling link weights in both networks and computing the same quantities for
the randomised control graphs as well, we show that groups of 3 or more strong
links prefer to cluster together in both original graphs.
Comment: 19 pages, 7 figures
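The idea of percolating k-cliques "with high enough intensity" from the abstract above can be illustrated with a small sketch. It is not the authors' implementation: it assumes that the intensity of a k-clique is the geometric mean of its link weights (the abstract does not define it), it enumerates cliques by brute force, and the function names are hypothetical. It is only practical for very small graphs.

```python
from itertools import combinations
from math import prod


def k_cliques(weights, k):
    """Yield every k-clique as a sorted tuple of nodes (brute force)."""
    nodes = sorted({n for edge in weights for n in edge})
    for cand in combinations(nodes, k):
        if all(frozenset(pair) in weights for pair in combinations(cand, 2)):
            yield cand


def intensity(clique, weights):
    """Assumed definition: geometric mean of the clique's link weights."""
    ws = [weights[frozenset(pair)] for pair in combinations(clique, 2)]
    return prod(ws) ** (1.0 / len(ws))


def weighted_clique_percolation(edges, k, threshold):
    """Return modules as node sets; modules may overlap.

    edges: iterable of (u, v, weight) for an undirected weighted graph.
    Only k-cliques whose intensity exceeds `threshold` are kept, and two
    kept cliques are joined when they share k - 1 nodes.
    """
    weights = {frozenset((u, v)): w for u, v, w in edges}
    kept = [c for c in k_cliques(weights, k) if intensity(c, weights) > threshold]

    # Union-find over the kept cliques.
    parent = list(range(len(kept)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(kept)), 2):
        if len(set(kept[i]) & set(kept[j])) == k - 1:
            parent[find(i)] = find(j)

    modules = {}
    for i, clique in enumerate(kept):
        modules.setdefault(find(i), set()).update(clique)
    return sorted(modules.values(), key=lambda m: sorted(m))
```

Lowering the threshold admits weaker cliques, so a node can end up in more than one module; this is how overlaps between modules arise in the sketch.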
On combinatorial optimisation in analysis of protein-protein interaction and protein folding networks
Protein-protein interaction networks and protein folding networks represent prominent research topics at the intersection of bioinformatics and network science. In this paper, we present a study of these networks from a combinatorial optimisation point of view. Using a combination of classical heuristics and stochastic optimisation techniques, we were able to identify several interesting combinatorial properties of biological networks of the COSIN project. We obtained optimal or near-optimal solutions to the maximum clique and chromatic number problems for these networks. We also explore patterns of both non-overlapping and overlapping cliques in these networks. Optimal or near-optimal solutions to the partitioning of these networks into non-overlapping cliques and to the maximum independent set problem were discovered. Maximal cliques are explored by enumerative techniques. Domination in these networks is briefly studied, too. Applications and extensions of our findings are discussed.
Modeling the clustering in citation networks
For the study of citation networks, a challenging problem is modeling the
high clustering. Existing studies indicate that the promising way to model the
high clustering is a copying strategy, i.e., a paper copies the references of
its neighbour as its own references. However, this line of models greatly
underestimates the abundant triangles observed in real citation networks
and thus cannot model the high clustering well. In this paper, we
point out that the failure of existing models lies in that they do not capture
the connecting patterns among existing papers. By leveraging the knowledge
indicated by such connecting patterns, we further propose a new model for the
high clustering in citation networks. Experiments on two real-world citation
networks, one from a specialized research area and one from a
multidisciplinary research area, demonstrate that our model can reproduce not
only the power-law degree distribution, as traditional models do, but also the
number of triangles, the high clustering coefficient and the size distribution
of co-citation clusters observed in these real networks.
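The baseline copying strategy mentioned in the abstract above (a new paper copies the references of a neighbour) can be sketched as follows. This is an illustrative toy version, not the model proposed in the paper: it assumes each new paper picks one existing paper uniformly at random, cites it, and copies each of its references independently with probability `copy_prob`. All names and parameters are hypothetical.

```python
import random
from itertools import combinations


def copying_model(n_papers, copy_prob, seed=0):
    """Grow a citation network; returns {paper: set of cited papers}."""
    rng = random.Random(seed)
    refs = {0: set()}  # paper 0 has nothing to cite
    for new in range(1, n_papers):
        target = rng.randrange(new)       # pick an existing paper
        cited = {target}                  # always cite it
        for r in refs[target]:            # copy its references with prob. copy_prob
            if rng.random() < copy_prob:
                cited.add(r)
        refs[new] = cited
    return refs


def count_triangles(refs):
    """Count triangles in the undirected version of the citation graph."""
    neigh = {p: set() for p in refs}
    for p, cited in refs.items():
        for q in cited:
            neigh[p].add(q)
            neigh[q].add(p)
    total = 0
    for u in neigh:
        # Count each triangle once, at its smallest node u.
        for v, w in combinations(sorted(neigh[u]), 2):
            if u < v and w in neigh[v]:
                total += 1
    return total


if __name__ == "__main__":
    for p in (0.0, 0.5, 1.0):
        print(p, count_triangles(copying_model(200, p)))
```

With `copy_prob = 0` every paper cites exactly one earlier paper, so the graph is a tree and contains no triangles; each copied reference closes a triangle between the new paper, its chosen neighbour and the copied reference.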