15 research outputs found
Semi-supervised stochastic blockmodel for structure analysis of signed networks
© 2020 Elsevier B.V. Finding hidden structural patterns is a critical problem for all types of networks, including signed networks. Among the methods for structural analysis of complex networks, the stochastic blockmodel (SBM) is an important research tool because it is flexible and can generate networks with many different types of structures. However, most existing SBM learning methods for signed networks are unsupervised, which leads to poor performance in finding hidden structural patterns, especially in noisy and sparse networks. Learning an SBM in a semi-supervised way is a promising avenue for overcoming this difficulty. In this type of model, a small number of labelled nodes and a large number of unlabelled nodes, coupled with their network structures, are used simultaneously to train the SBM. We propose a novel semi-supervised signed stochastic blockmodel and its learning algorithm based on variational Bayesian inference, with the goal of discovering both assortative (nodes connect more densely within clusters than between clusters) and disassortative (nodes connect more sparsely within clusters than between clusters) structures in signed networks. The proposed model is validated through a number of experiments in which it is compared with state-of-the-art methods on both synthetic and real-world data. The carefully designed tests, which cover a range of scenarios, show that our method outperforms the other approaches in this space. This is especially relevant for noisy and sparse networks, as they constitute the majority of real-world networks.
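The distinction between assortative and disassortative structure drawn in the abstract above can be illustrated with a small generative sketch. This is not the semi-supervised model or the variational algorithm of the paper; it only samples a signed network from a plain blockmodel. The parametrisation (`p_edge` for link presence, `p_pos` for link sign) and all function names are assumptions made for illustration.

```python
import numpy as np

def sample_signed_sbm(labels, p_edge, p_pos, rng):
    """Sample a symmetric signed adjacency matrix with entries in {-1, 0, +1}.

    labels : (n,) block index of each node
    p_edge : (K, K) symmetric matrix, probability that two nodes are linked
    p_pos  : (K, K) symmetric matrix, probability that a link is positive
    Both matrices are a hypothetical parametrisation, not the paper's.
    """
    labels = np.asarray(labels)
    n = labels.size
    pe = np.asarray(p_edge)[labels][:, labels]   # per-pair link probability
    pp = np.asarray(p_pos)[labels][:, labels]    # per-pair sign probability
    upper = np.triu(np.ones((n, n), dtype=bool), k=1)
    present = (rng.random((n, n)) < pe) & upper  # sample each pair once
    sign = np.where(rng.random((n, n)) < pp, 1, -1)
    a = np.where(present, sign, 0)
    return a + a.T                               # symmetrise, zero diagonal

def within_between_density(a, labels):
    """Fraction of linked pairs inside blocks and across blocks."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(labels.size, dtype=bool)
    linked = a != 0
    return linked[same & off_diag].mean(), linked[~same].mean()

rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 50)
p_pos = np.full((2, 2), 0.5)
assortative = np.array([[0.5, 0.05], [0.05, 0.5]])     # dense within blocks
disassortative = np.array([[0.05, 0.5], [0.5, 0.05]])  # dense between blocks

for name, p_edge in [("assortative", assortative),
                     ("disassortative", disassortative)]:
    a = sample_signed_sbm(labels, p_edge, p_pos, rng)
    within, between = within_between_density(a, labels)
    print(f"{name}: within={within:.3f} between={between:.3f}")
```

Swapping the diagonal and off-diagonal entries of `p_edge` is all that separates the two regimes, which is the flexibility the abstract attributes to the SBM.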
SSBM: A Signed Stochastic Block Model for Multiple Structure Discovery in Large-Scale Exploratory Signed Networks
Signed network structure discovery has received extensive attention and has
become a research focus in the field of network science. However, most of the
existing studies are focused on the networks with a single structure, e.g.,
community or bipartite, while ignoring multiple structures, e.g., the
coexistence of community and bipartite structures. Furthermore, existing
studies face challenges with large-scale signed networks due to
their high time complexity, especially when determining the number of clusters
in the observed network without any prior knowledge. In view of this, we
propose a mathematically principled method for signed network multiple
structure discovery named the Signed Stochastic Block Model (SSBM). The SSBM
can capture the multiple structures contained in signed networks, e.g.,
community, bipartite, and their coexistence, by adopting a probabilistic
model. Moreover, by integrating the minimum message length (MML) criterion and
component-wise EM (CEM) algorithm, a scalable learning algorithm capable of
model selection is proposed to handle large-scale signed networks.
In comparisons with state-of-the-art methods on synthetic and real-world signed
networks, extensive experimental results demonstrate the effectiveness and
efficiency of SSBM in discovering multiple structures in large-scale
exploratory signed networks.
EUSN 2021 Book of Abstracts, Fifth European Conference on Social Networks
Book of abstracts of the Fifth European Conference on Social Networks, EUSN 2021.
A Stochastic Block Model Approach for the Analysis of Multilevel Networks: an Application to the Sociology of Organizations
A multilevel network is defined as the junction of two interaction networks,
one level representing the interactions between individuals and the other the
interactions between organizations. The levels are linked by an affiliation
relationship, each individual belonging to a unique organization. A new
Stochastic Block Model is proposed as a unified probabilistic framework tailored
for multilevel networks. This model contains latent blocks accounting for
heterogeneity in the patterns of connection within each level and introducing
dependencies between the levels. The sought connection patterns are not
specified a priori which makes this approach flexible. Variational methods are
used for the model inference and an Integrated Classified Likelihood criterion
is developed for choosing the number of blocks and also for deciding whether
the two levels are dependent or not. A comprehensive simulation study exhibits
the benefit of considering this approach, illustrates the robustness of the
clustering and highlights the reliability of the criterion used for model
selection. This approach is applied to a sociological dataset collected during
a television program trade fair, the inter-organizational level being the
economic network between companies and the inter-individual level being the
informal network between their representatives. It provides a synthetic
representation of the two networks, unraveling their intertwined structure, and
confirms the coopetition at stake.
Empirical Bayes estimation for random dot product graph representation of the stochastic blockmodel
Network models are increasingly used to model datasets that involve interacting units, particularly
random graph models where the vertices represent individual entities and the edges represent
the presence or absence of a specified interaction between these entities. Finding inherent
communities in networks (i.e. partitioning vertices with a more similar interaction pattern into
groups) is considered to be a fundamental task in network analysis, which aids in understanding
the structural properties of real-world networks. Despite a large amount of research on this task
since the emergence of graphical representation of relational data, this still remains a challenge.
In particular, within the statistical community, the use of the stochastic blockmodel for this task
is currently of immense interest.
Recent theoretical developments have shown that adjacency spectral embedding of graphs yields
tractable distributional results. Specifically, a random dot product graph formulation of the
stochastic blockmodel provides a mixture of multivariate Gaussians for the asymptotic distribution
of the latent positions estimated by adjacency spectral embedding. The first part of this
thesis seeks to employ this new theory to provide an empirical Bayes model for estimating block
memberships of vertices in a stochastic blockmodel graph. Posterior inference is conducted using
a Metropolis-within-Gibbs algorithm. Performance of the model is illustrated through Monte
Carlo simulation studies and experimental results on a Wikipedia dataset. Results show performance
gains over the alternative models considered.
Instead of a complete classification of vertices via community detection, one may wish to discover
whether vertices possess an attribute of interest. Given that this attribute is observed for a few
vertices, the goal is to find other vertices that possess that same attribute. As an example, if a
few employees in a company are known to have committed fraud, how can we identify others who
may be complicit? This is a special case of community detection, known as vertex nomination,
which has recently grown rapidly as a research topic. The second part of this thesis extends
the empirical Bayes model for vertex nomination based on information contained in the graph
structure. This yields promising simulation results as well as real-data results from an Enron
email dataset.
Recent studies have shown that information pertinent to vertex nomination exists not only in
the graph structure but also in the edge attributes (Coppersmith and Priebe, 2012; Suwan et al.,
2015). This motivates the third part of this thesis by further extending the model to exploit
both graph structure and edge attributes for vertex nomination. Simulation studies confirm the
benefit of doing so. However, the same benefit is not observed when the model is applied to the
Enron email dataset; further investigations suggest that this is due to the data violating one of
the model assumptions.
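The adjacency spectral embedding that the thesis abstract above builds on can be sketched in a few lines. This is only the embedding step followed by a deliberately crude assignment rule; the empirical Bayes model and the Metropolis-within-Gibbs sampler of the thesis are not reproduced. The function names and the sign-based rule (which assumes two assortative blocks of roughly equal size) are illustrative assumptions.

```python
import numpy as np

def sample_sbm(labels, block_probs, rng):
    """Sample an undirected, unweighted SBM graph without self-loops."""
    labels = np.asarray(labels)
    n = labels.size
    p = np.asarray(block_probs)[labels][:, labels]
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(float)

def adjacency_spectral_embedding(a, d):
    """Estimate d-dimensional latent positions from adjacency matrix a.

    Takes the d eigenpairs of largest magnitude and scales each
    eigenvector by the square root of the absolute eigenvalue.
    """
    vals, vecs = np.linalg.eigh(a)
    top = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))

def two_block_labels(x):
    """Crude assignment rule for illustration only.

    Assumption: two assortative blocks of similar size, so the second
    embedding coordinate changes sign between the blocks.
    """
    return (x[:, 1] > 0).astype(int)

rng = np.random.default_rng(1)
truth = np.repeat([0, 1], 100)
a = sample_sbm(truth, [[0.5, 0.1], [0.1, 0.5]], rng)
x = adjacency_spectral_embedding(a, d=2)
est = two_block_labels(x)
# Block labels are only identified up to a swap, so score both labelings.
accuracy = max((est == truth).mean(), (est != truth).mean())
print(f"agreement with the planted blocks: {accuracy:.2f}")
```

In the setting the abstract describes, the embedded points would instead be modelled as a mixture of multivariate Gaussians, which is what motivates the empirical Bayes prior; the sign rule here merely stands in for that step.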
AN EDGE-CENTRIC PERSPECTIVE FOR BRAIN NETWORK COMMUNITIES
Thesis (Ph.D.) - Indiana University, Department of Psychological and Brain Sciences and Program in Neuroscience, 2021. The brain is a complex system organized on multiple scales and operating in both a local and distributed manner. Individual neurons and brain regions participate in specific functions, while at the same time existing in the context of a larger network, supporting a range of different functionalities. Building brain networks composed of distinct neural elements (nodes) and their interrelationships (edges) allows us to model the brain from both local and global perspectives, and to deploy a wide array of computational network tools. A popular network analysis approach is community detection, which aims to subdivide a network's nodes into clusters that can be used to represent and evaluate network organization. Prevailing community detection approaches applied to brain networks are designed to find densely interconnected sets of nodes, leading to the notion that the brain is organized in an exclusively modular manner. Furthermore, many brain network analyses tend to focus on the nodes, as evidenced by the search for modular groupings of neural elements that might serve a common function. In this thesis, we describe the application of community detection algorithms that are sensitive to alternative cluster configurations, enhancing our understanding of brain network organization. We apply a framework called the stochastic block model, which we use to uncover evidence of non-modular organization in human anatomical brain networks across the life span, and in the informatically-collated rat cerebral cortex. We also propose a framework to cluster functional brain network edges in human data, which naturally results in an overlapping organization at the level of nodes that bridges canonical functional systems. These alternative methods utilize the connection patterns of brain network edges in ways that prevailing approaches do not.
Thus, we motivate an alternative outlook which focuses on the importance of information provided by the brain's interconnections, or edges. We call this an edge-centric perspective. The edge-centric approaches developed here offer new ways to characterize distributed brain organization and contribute to a fundamental change in perspective in our thinking about the brain.
A Bayesian Analysis of Weighted Stochastic Block Models With Applications in Brain Functional Connectomics
The network paradigm has become a popular approach for modeling complex systems, with applications ranging from social sciences to genetics to neuroscience and beyond. Often the individual connections between network nodes are of less interest than network characteristics such as its community structure - the tendency in many real-data networks for nodes to be naturally organized in groups with dense connections between nodes in the same (unobserved) group but sparse connections between nodes in different groups. Characterizing the structure of networks is of particular interest in the study of brain function, especially in the context of diseases and disorders such as Alzheimer's disease and attention deficit hyperactivity disorder (ADHD), where disruption of functional brain networks has been observed. The stochastic block model (SBM) is a probabilistic formulation of the community detection problem that has been utilized to estimate latent communities in both binary and weighted networks, but as of yet not in brain networks. We build a flexible Bayesian hierarchical framework for the SBM to capture the community structure in weighted graphs, with a focus on the application in functional brain networks. First, we fit a version of the SBM to Gaussian-weighted networks via an efficient Gibbs sampling algorithm. We compare results from simulated networks to several existing estimation methods and then apply our approach to estimate the community structures in the functional resting brain networks of 185 subjects from the ADHD-200 sample. Next, we extend this probabilistic framework and our efficient estimation algorithm to capture the shared latent structure in groups of networks; we perform simulation studies and then apply this extended model to the same sample of brain networks from the ADHD-200 sample.
Finally, we adapt this model to allow for more complex latent structures and incorporate a regression component to test for differences in the latent functional brain structure between study groups. After examining the ability of this approach to capture the latent structures in simulated networks, we apply this method once again to the same set of functional brain networks to assess the differences between ADHD subtypes and healthy control subjects in latent functional brain structure. Doctor of Philosophy
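To make the idea of Gibbs sampling in a Gaussian-weighted SBM concrete, the sketch below updates node labels one at a time from their conditional distribution. It is a strong simplification and not the hierarchical model of the thesis: the block means `mu`, the common standard deviation `sigma`, and the label prior are all assumed known here, whereas a full sampler would also draw them. All names are hypothetical.

```python
import numpy as np

def sample_gaussian_weights(z, mu, sigma, rng):
    """Symmetric weight matrix with w[i, j] ~ Normal(mu[z_i, z_j], sigma^2)."""
    n = z.size
    w = mu[z][:, z] + sigma * rng.normal(size=(n, n))
    w = np.triu(w, k=1)          # keep one draw per pair, drop the diagonal
    return w + w.T

def label_conditional(w, z, i, mu, sigma, log_prior):
    """P(z_i = k | all other labels, weights) with known mu and sigma."""
    others = np.arange(w.shape[0]) != i
    w_i = w[i, others]           # weights between node i and every other node
    z_others = z[others]
    logp = np.array(log_prior, dtype=float)
    for k in range(mu.shape[0]):
        resid = w_i - mu[k, z_others]
        # Gaussian log-likelihood up to a constant shared by all k.
        logp[k] += -0.5 * np.sum(resid ** 2) / sigma ** 2
    logp -= logp.max()           # stabilise before exponentiating
    p = np.exp(logp)
    return p / p.sum()

def gibbs_sweep(w, z, mu, sigma, log_prior, rng):
    """One pass over the nodes, resampling each label in turn."""
    z = z.copy()
    for i in range(w.shape[0]):
        p = label_conditional(w, z, i, mu, sigma, log_prior)
        z[i] = rng.choice(len(p), p=p)
    return z

rng = np.random.default_rng(2)
z_true = np.repeat([0, 1], 20)
mu = np.array([[1.0, 0.0], [0.0, 1.0]])  # stronger weights within blocks
sigma = 0.5
w = sample_gaussian_weights(z_true, mu, sigma, rng)

z_start = z_true.copy()
z_start[[0, 1, 20, 21]] = 1 - z_start[[0, 1, 20, 21]]  # corrupt four labels
z_new = gibbs_sweep(w, z_start, mu, sigma, np.log([0.5, 0.5]), rng)
print("labels recovered:", bool((z_new == z_true).all()))
```

Because `sigma` is shared across blocks, the Gaussian normalising constant cancels in the conditional, which is why only the squared residuals appear.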
Community detection in graphs
The modern science of networks has brought significant advances to our
understanding of complex systems. One of the most relevant features of graphs
representing real systems is community structure, or clustering, i.e., the
organization of vertices in clusters, with many edges joining vertices of the
same cluster and comparatively few edges joining vertices of different
clusters. Such clusters, or communities, can be considered as fairly
independent compartments of a graph, playing a role similar to that of, e.g.,
the tissues or the organs in the human body. Detecting communities is of great
importance in sociology, biology and computer science, disciplines where
systems are often represented as graphs. This problem is very hard and not yet
satisfactorily solved, despite the huge effort of a large interdisciplinary
community of scientists working on it over the past few years. We will attempt
a thorough exposition of the topic, from the definition of the main elements of
the problem, to the presentation of most methods developed, with a special
focus on techniques designed by statistical physicists, from the discussion of
crucial issues like the significance of clustering and how methods should be
tested and compared against each other, to the description of applications to
real networks.
Comment: Review article. 103 pages, 42 figures, 2 tables. Two sections
expanded + minor modifications. Three figures + one table + references added.
Final version published in Physics Reports.
Neural function approximation on graphs: shape modelling, graph discrimination & compression
Graphs serve as a versatile mathematical abstraction of real-world phenomena in numerous scientific disciplines. This thesis is part of the Geometric Deep Learning subject area, a family of learning paradigms that capitalise on the increasing volume of non-Euclidean data so as to solve real-world tasks in a data-driven manner. In particular, we focus on the topic of graph function approximation using neural networks, which lies at the heart of many relevant methods. In the first part of the thesis, we contribute to the understanding and design of Graph Neural Networks (GNNs). Initially, we investigate the problem of learning on signals supported on a fixed graph. We show that treating graph signals as general graph spaces is restrictive and conventional GNNs have limited expressivity. Instead, we expose a more enlightening perspective by drawing parallels between graph signals and signals on Euclidean grids, such as images and audio. Accordingly, we propose a permutation-sensitive GNN based on an operator analogous to shifts in grids and instantiate it on 3D meshes for shape modelling (Spiral Convolutions). Next, we focus on learning on general graph spaces and in particular on functions that are invariant to graph isomorphism. We identify a fundamental trade-off between invariance, expressivity and computational complexity, which we address with a symmetry-breaking mechanism based on substructure encodings (Graph Substructure Networks). Substructures are shown to be a powerful tool that provably improves expressivity while controlling computational complexity, and a useful inductive bias in network science and chemistry. In the second part of the thesis, we discuss the problem of graph compression, where we analyse the information-theoretic principles and the connections with graph generative models. We show that another inevitable trade-off surfaces, now between computational complexity and compression quality, due to graph isomorphism.
We propose a substructure-based dictionary coder - Partition and Code (PnC) - with theoretical guarantees that can be adapted to different graph distributions by estimating its parameters from observations. Additionally, contrary to the majority of neural compressors, PnC is parameter and sample efficient and is therefore of wide practical relevance. Finally, within this framework, substructures are further illustrated as a decisive archetype for learning problems on graph spaces. Open Access