Variational Bayesian Inference and Complexity Control for Stochastic Block Models
It is now widely accepted that knowledge can be acquired from networks by
clustering their vertices according to connection profiles. Many methods have
been proposed and in this paper we concentrate on the Stochastic Block Model
(SBM). The clustering of vertices and the estimation of SBM model parameters
have been subject to previous work and numerous inference strategies such as
variational Expectation Maximization (EM) and classification EM have been
proposed. However, SBM still suffers from a lack of criteria to estimate the
number of components in the mixture. To our knowledge, only one model-based
criterion, ICL, has been derived for SBM in the literature. It relies on an
asymptotic approximation of the Integrated Complete-data Likelihood and recent
studies have shown that it tends to be too conservative in the case of small
networks. To tackle this issue, we propose a new criterion that we call ILvb,
based on a non-asymptotic approximation of the marginal likelihood. We describe
how the criterion can be computed through a variational Bayes EM algorithm
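As a rough illustration of the model this abstract refers to, the sketch below samples an undirected graph from a Stochastic Block Model. The function name `sample_sbm`, its parameter names, and the input format are assumptions chosen for illustration and are not taken from the paper; the sketch covers only the generative model, not the ILvb criterion or the variational Bayes EM algorithm.

```python
import random


def sample_sbm(block_sizes, connect_prob, seed=None):
    """Sample an undirected, loop-free graph from a Stochastic Block Model.

    block_sizes  -- number of vertices in each block (hypothetical input format)
    connect_prob -- symmetric matrix; connect_prob[q][l] is the probability of an
                    edge between a vertex of block q and a vertex of block l
    Returns (labels, edges): the block label of every vertex and the set of
    edges (i, j) with i < j.
    """
    rng = random.Random(seed)
    # Vertex i belongs to block labels[i]; blocks are laid out consecutively.
    labels = [q for q, size in enumerate(block_sizes) for _ in range(size)]
    n = len(labels)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            # Each pair is connected independently, given the two block labels.
            if rng.random() < connect_prob[labels[i]][labels[j]]:
                edges.add((i, j))
    return labels, edges
```

For example, `sample_sbm([5, 5], [[0.9, 0.1], [0.1, 0.9]], seed=0)` would draw a ten-vertex graph with two assortative blocks; the clustering task described in the abstract is to recover `labels` (and the number of blocks) from `edges` alone.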
The blessing of transitivity in sparse and stochastic networks
The interaction between transitivity and sparsity, two common features in
empirical networks, implies that there are local regions of large sparse
networks that are dense. We call this the blessing of transitivity and it has
consequences for both modeling and inference. Extant research suggests that
statistical inference for the Stochastic Blockmodel is more difficult when the
edges are sparse. However, this conclusion is confounded by the fact that the
asymptotic limit in all of the previous studies is not merely sparse, but also
non-transitive. To retain transitivity, the blocks cannot grow faster than the
expected degree. Thus, in sparse models, the blocks must remain asymptotically
small.
Previous algorithmic research demonstrates that small "local"
clusters are more amenable to computation, visualization, and interpretation
when compared to "global" graph partitions. This paper provides the first
statistical results that demonstrate how these small transitive clusters are
also more amenable to statistical estimation. Theorem 2 shows that a "local"
clustering algorithm can, with high probability, detect a transitive stochastic
block of a fixed size (e.g. 30 nodes) embedded in a large graph. The only
constraint on the ambient graph is that it is large and sparse--it could be
generated at random or by an adversary--suggesting a theoretical explanation
for the robust empirical performance of local clustering algorithms.
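To make the notion of transitivity concrete, the following minimal sketch computes the global transitivity ratio of an undirected graph: three times the number of triangles divided by the number of connected triples. The function name `transitivity` and the adjacency-dictionary input format are assumptions for illustration; this is a standard graph statistic, not the local clustering algorithm analyzed in the paper.

```python
from itertools import combinations


def transitivity(adj):
    """Global transitivity of an undirected graph.

    adj -- dict mapping each vertex to the set of its neighbours
           (symmetric, no self-loops; hypothetical input format)
    """
    closed = 0   # connected triples whose two end vertices are also adjacent
    triples = 0  # all connected triples, counted once per centre vertex
    for centre, nbrs in adj.items():
        for u, v in combinations(nbrs, 2):
            triples += 1
            if v in adj[u]:
                closed += 1
    # Each triangle is counted three times in `closed`, once per centre.
    return closed / triples if triples else 0.0
```

A triangle gives a ratio of 1.0 and a path gives 0.0; a sparse graph can still score high on this ratio when its edges concentrate in small dense regions, which is the situation the abstract describes.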
Structural and Functional Discovery in Dynamic Networks with Non-negative Matrix Factorization
Time series of graphs are increasingly prevalent in modern data and pose
unique challenges to visual exploration and pattern extraction. This paper
describes the development and application of matrix factorizations for
exploration and time-varying community detection in time-evolving graph
sequences. The matrix factorization model allows the user to home in on and
display interesting, underlying structure and its evolution over time. The
methods are scalable to weighted networks with a large number of time points or
nodes, and can accommodate sudden changes to graph topology. Our techniques are
demonstrated with several dynamic graph series from both synthetic and real
world data, including citation and trade networks. These examples illustrate
how users can steer the techniques and combine them with existing methods to
discover and display meaningful patterns in sizable graphs over many time
points.
Comment: 16 pages, 17 figures
Low-distortion Inference of Latent Similarities from a Multiplex Social Network
Much of social network analysis is, implicitly or explicitly, predicated on
the assumption that individuals tend to be more similar to their friends than
to strangers. Thus, an observed social network provides a noisy signal about
the latent underlying "social space:" the way in which individuals are similar
or dissimilar. Many research questions frequently addressed via social network
analysis are in reality questions about this social space, raising the question
of inverting the process: Given a social network, how accurately can we
reconstruct the social structure of similarities and dissimilarities?
We begin to address this problem formally. Observed social networks are
usually multiplex, in the sense that they reflect (dis)similarities in several
different "categories," such as geographical proximity, kinship, or similarity
of professions/hobbies. We assume that each such category is characterized by a
latent metric capturing (dis)similarities in this category. Each category gives
rise to a separate social network: a random graph parameterized by this metric.
For a concrete model, we consider Kleinberg's small world model and some
variations thereof. The observed social network is the unlabeled union of these
graphs, i.e., the presence or absence of edges can be observed, but not their
origins. Our main result is an algorithm which reconstructs each metric with
provably low distortion.
Comment: 51 pages. Compared to the previous version: many small changes to
improve presentation and clarity
Overlapping Community Detection via Local Spectral Clustering
Large graphs arise in a number of contexts and understanding their structure
and extracting information from them is an important research area. Early
algorithms for mining communities focused on global structure, and
often run in time that depends on the size of the entire graph. Nowadays, as we
often explore networks with billions of vertices and find communities with only
hundreds of members, it is crucial to shift our attention from macroscopic structure to
microscopic structure in large networks. A growing body of work has been
adopting local expansion methods in order to identify the community members
from a few exemplary seed members.
In this paper, we propose a novel approach for finding overlapping
communities called LEMON (Local Expansion via Minimum One Norm). The algorithm
finds the community by seeking a sparse vector in the span of the local spectra
such that the seeds are in its support. We show that LEMON can achieve the
highest detection accuracy among state-of-the-art proposals. The running time
depends on the size of the community rather than that of the entire graph. The
algorithm is easy to implement, and is highly parallelizable. We further
provide a theoretical analysis of the local spectral properties, bounding the
measure of tightness of the extracted community in terms of the eigenvalues of
the graph Laplacian.
Moreover, given that networks are not all similar in nature, a comprehensive
analysis on how the local expansion approach is suited for uncovering
communities in different networks is still lacking. We thoroughly evaluate our
approach using both synthetic and real-world datasets across different domains,
and analyze the empirical variations when applying our method to inherently
different networks in practice. In addition, we provide heuristics on how the
quality and quantity of the seed set affect performance.
Comment: Extended version to the conference proceeding in WWW'1
Partitioning Networks with Node Attributes by Compressing Information Flow
Real-world networks are often organized as modules or communities of similar
nodes that serve as functional units. These networks are also rich in content,
with nodes having distinguishing features or attributes. In order to discover a
network's modular structure, it is necessary to take into account not only its
links but also node attributes. We describe an information-theoretic method
that identifies modules by compressing descriptions of information flow on a
network. Our formulation introduces node content into the description of
information flow, which we then minimize to discover groups of nodes with
similar attributes that also tend to trap the flow of information. The method
has several advantages: it is conceptually simple and does not require ad-hoc
parameters to specify the number of modules or to control the relative
contribution of links and node attributes to network structure. We apply the
proposed method to partition real-world networks with known community
structure. We demonstrate that adding node attributes helps recover the
underlying community structure in content-rich networks more effectively than
using links alone. In addition, we show that our method is faster and more
accurate than alternative state-of-the-art algorithms.
Comment: 10 pages
Network Analysis of Particles and Grains
The arrangements of particles and forces in granular materials have a complex
organization on multiple spatial scales that ranges from local structures to
mesoscale and system-wide ones. This multiscale organization can affect how a
material responds or reconfigures when exposed to external perturbations or
loading. The theoretical study of particle-level, force-chain, domain, and bulk
properties requires the development and application of appropriate physical,
mathematical, statistical, and computational frameworks. Traditionally,
granular materials have been investigated using particulate or continuum
models, each of which tends to be implicitly agnostic to multiscale
organization. Recently, tools from network science have emerged as powerful
approaches for probing and characterizing heterogeneous architectures across
different scales in complex systems, and a diverse set of methods has yielded
fascinating insights into granular materials. In this paper, we review work on
network-based approaches to studying granular matter and explore the potential
of such frameworks to provide a useful description of these systems and to
enhance understanding of their underlying physics. We also outline a few open
questions and highlight particularly promising future directions in the
analysis and design of granular matter and other kinds of material networks.
Ensemble-Based Discovery of Disjoint, Overlapping and Fuzzy Community Structures in Networks
Though much work has been done on ensemble clustering in data mining, the
application of ensemble methods to community detection in networks is in its
infancy. In this paper, we propose two ensemble methods: ENDISCO and MEDOC.
ENDISCO performs disjoint community detection. In contrast, MEDOC performs
disjoint, overlapping, and fuzzy community detection and represents the first
ever ensemble method for fuzzy and overlapping community detection. We run
extensive experiments with both algorithms against both synthetic and several
real-world datasets for which community structures are known. We show that
ENDISCO and MEDOC both beat the best-known existing standalone community
detection algorithms (though we emphasize that they leverage them). In the case
of disjoint community detection, we show that both ENDISCO and MEDOC beat an
existing ensemble community detection algorithm both in terms of multiple
accuracy measures and run-time. We further show that our ensemble algorithms
can help explore core-periphery structure of network communities, identify
stable communities in dynamic networks and help solve the "degeneracy of
solutions" problem, generating robust results.
A Survey of Community Search Over Big Graphs
With the rapid development of information technologies, various big graphs
are prevalent in many real applications (e.g., social media and knowledge
bases). An important component of these graphs is the network community.
Essentially, a community is a group of vertices which are densely connected
internally. Community retrieval can be used in many real applications, such as
event organization, friend recommendation, and so on. Consequently, how to
efficiently find high-quality communities from big graphs is an important
research topic in the era of big data. Recently a large group of research
works, called community search, have been proposed. They aim to provide
efficient solutions for searching high-quality communities from large networks
in real-time. Nevertheless, these works focus on different types of graphs and
formulate communities in different manners, and thus it is desirable to have a
comprehensive review of these works.
In this survey, we conduct a thorough review of existing community search
works. Moreover, we analyze and compare the quality of communities under their
models, and the performance of different solutions. Furthermore, we point out
new research directions. This survey not only helps researchers better
understand existing community search solutions, but also helps practitioners
choose suitable solutions.
GrAMME: Semi-Supervised Learning using Multi-layered Graph Attention Models
Modern data analysis pipelines are becoming increasingly complex due to the
presence of multi-view information sources. While graphs are effective in
modeling complex relationships, in many scenarios a single graph is rarely
sufficient to succinctly represent all interactions, and hence multi-layered
graphs have become popular. Though this leads to richer representations,
extending solutions from the single-graph case is not straightforward.
Consequently, there is a strong need for novel solutions to solve classical
problems, such as node classification, in the multi-layered case. In this
paper, we consider the problem of semi-supervised learning with multi-layered
graphs. Though deep network embeddings, e.g. DeepWalk, are widely adopted for
community discovery, we argue that feature learning with random node
attributes, using graph neural networks, can be more effective. To this end, we
propose to use attention models for effective feature learning, and develop two
novel architectures, GrAMME-SG and GrAMME-Fusion, that exploit the inter-layer
dependencies for building multi-layered graph embeddings. Using empirical
studies on several benchmark datasets, we evaluate the proposed approaches and
demonstrate significant performance improvements in comparison to
state-of-the-art network embedding strategies. The results also show that using
simple random features is an effective choice, even in cases where explicit
node attributes are not available.