45 research outputs found
Supervised Blockmodelling
Collective classification models attempt to improve classification
performance by taking into account the class labels of related instances.
However, they tend not to learn patterns of interactions between classes and/or
make the assumption that instances of the same class link to each other
(assortativity assumption). Blockmodels provide a solution to these issues,
being capable of modelling assortative and disassortative interactions, and
learning the pattern of interactions in the form of a summary network. The
Supervised Blockmodel provides good classification performance using link
structure alone, whilst simultaneously providing an interpretable summary of
network interactions to allow a better understanding of the data. This work
explores three variants of supervised blockmodels of varying complexity and
tests them on four structurally different real-world networks.
Comment: Workshop on Collective Learning and Inference on Structured Data 201
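As a rough illustration of what a summary network of class interactions could look like, the sketch below counts directed edges between every ordered pair of classes when the labels are already known. It is not the Supervised Blockmodel itself, which infers the blocks; the function name `block_summary` and the toy data are hypothetical.

```python
from collections import Counter


def block_summary(edges, labels):
    """Count directed edges between each ordered pair of classes."""
    counts = Counter()
    for source, target in edges:
        counts[(labels[source], labels[target])] += 1
    return counts


# Hypothetical toy network in which every edge runs between the two classes,
# i.e. a disassortative pattern that an assortativity assumption would miss.
labels = {0: "a", 1: "a", 2: "b", 3: "b"}
edges = [(0, 2), (0, 3), (1, 2), (3, 1)]
summary = block_summary(edges, labels)
```

Pairs of classes with no edges between them simply report a count of zero, so the full class-by-class matrix can be read off the counter.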
Topological Feature Based Classification
There has been a lot of interest in developing algorithms to extract clusters
or communities from networks. This work proposes a method, based on
blockmodelling, for leveraging communities and other topological features for
use in a predictive classification task. Motivated by the issues faced by the
field of community detection and inspired by recent advances in Bayesian topic
modelling, the presented model automatically discovers topological features
relevant to a given classification task. In this way, rather than attempting to
identify some universal best set of clusters for an undefined goal, the aim is
to find the best set of clusters for a particular purpose.
Using this method, topological features can be validated and assessed within
a given context by their predictive performance.
The proposed model differs from other relational and semi-supervised learning
models as it identifies topological features to explain the classification
decision. In a demonstration on a number of real networks the predictive
capability of the topological features is shown to rival the performance of
content-based relational learners. Additionally, the model is shown to
outperform graph-based semi-supervised methods on directed and approximately
bipartite networks.
Comment: Awarded 3rd Best Student Paper at 14th International Conference on
Information Fusion 201
Active Discovery of Network Roles for Predicting the Classes of Network Nodes
Nodes in real world networks often have class labels, or underlying
attributes, that are related to the way in which they connect to other nodes.
Sometimes this relationship is simple, for instance nodes of the same class
may be more likely to be connected. In other cases, however, this is not true,
and the way that nodes link in a network exhibits a different, more complex
relationship to their attributes. Here, we consider networks in which we know
how the nodes are connected, but we do not know the class labels of the nodes
or how class labels relate to the network links. We wish to identify the best
subset of nodes to label in order to learn this relationship between node
attributes and network links. We can then use this discovered relationship to
accurately predict the class labels of the rest of the network nodes.
We present a model that identifies groups of nodes with similar link
patterns, which we call network roles, using a generative blockmodel. The model
then predicts labels by learning the mapping from network roles to class labels
using a maximum margin classifier. We choose a subset of nodes to label
according to an iterative margin-based active learning strategy. By integrating
the discovery of network roles with the classifier optimisation, the active
learning process can adapt the network roles to better represent the network
for node classification. We demonstrate the model by exploring a selection of
real world networks, including a marine food web and a network of English
words. We show that, in contrast to other network classifiers, this model
achieves good classification accuracy for a range of networks with different
relationships between class labels and network links.
Detecting change points in the large-scale structure of evolving networks
Interactions among people or objects are often dynamic in nature and can be
represented as a sequence of networks, each providing a snapshot of the
interactions over a brief period of time. An important task in analyzing such
evolving networks is change-point detection, in which we both identify the
times at which the large-scale pattern of interactions changes fundamentally
and quantify how large and what kind of change occurred. Here, we formalize for
the first time the network change-point detection problem within an online
probabilistic learning framework and introduce a method that can reliably solve
it. This method combines a generalized hierarchical random graph model with a
Bayesian hypothesis test to quantitatively determine if, when, and precisely
how a change point has occurred. We analyze the detectability of our method
using synthetic data with known change points of different types and
magnitudes, and show that this method is more accurate than several previously
used alternatives. Applied to two high-resolution evolving social networks,
this method identifies a sequence of change points that align with known
external "shocks" to these networks.
Multiscale mixing patterns in networks
Assortative mixing in networks is the tendency for nodes with the same
attributes, or metadata, to link to each other. It is a property often found in
social networks manifesting as a higher tendency of links occurring between
people with the same age, race, or political belief. Quantifying the level of
assortativity or disassortativity (the preference of linking to nodes with
different attributes) can shed light on the factors involved in the formation
of links and contagion processes in complex networks. It is common practice to
measure the level of assortativity according to the assortativity coefficient,
or modularity in the case of discrete-valued metadata. This global value is the
average level of assortativity across the network and may not be a
representative statistic when mixing patterns are heterogeneous. For example, a
social network spanning the globe may exhibit local differences in mixing
patterns as a consequence of differences in cultural norms. Here, we introduce
an approach to localise this global measure so that we can describe the
assortativity, across multiple scales, at the node level. Consequently we are
able to capture and qualitatively evaluate the distribution of mixing patterns
in the network. We find that for many real-world networks the distribution of
assortativity is skewed, overdispersed and multimodal. Our method provides a
clearer lens through which we can more closely examine mixing patterns in
networks.
Comment: 11 pages, 7 figures
Detectability thresholds and optimal algorithms for community structure in dynamic networks
We study the fundamental limits on learning latent community structure in
dynamic networks. Specifically, we study dynamic stochastic block models where
nodes change their community membership over time, but where edges are
generated independently at each time step. In this setting (which is a special
case of several existing models), we are able to derive the detectability
threshold exactly, as a function of the rate of change and the strength of the
communities. Below this threshold, we claim that no algorithm can identify the
communities better than chance. We then give two algorithms that are optimal in
the sense that they succeed all the way down to this limit. The first uses
belief propagation (BP), which gives asymptotically optimal accuracy, and the
second is a fast spectral clustering algorithm, based on linearizing the BP
equations. We verify our analytic and algorithmic results via numerical
simulation, and close with a brief discussion of extensions and open questions.
Comment: 9 pages, 3 figures
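The generative setting described in this abstract can be sketched as a small sampler: community labels evolve over time, and each snapshot's edges are drawn independently given the current labels. The parameterisation is an assumption made for illustration (a persistence probability `eta`, uniform redraws of labels, and two edge probabilities `p_in` and `p_out`); the function name is hypothetical, and the abstract does not specify these details.

```python
import random


def sample_dynamic_sbm(n, k, steps, p_in, p_out, eta, seed=0):
    """Sample (membership, edges) snapshots from a simple dynamic block model.

    Assumed dynamics: each node keeps its community with probability eta,
    otherwise it redraws a community uniformly at random.
    """
    rng = random.Random(seed)
    membership = [rng.randrange(k) for _ in range(n)]
    snapshots = []
    for _ in range(steps):
        # Edges are drawn independently at each time step, given the labels.
        edges = [
            (i, j)
            for i in range(n)
            for j in range(i + 1, n)
            if rng.random() < (p_in if membership[i] == membership[j] else p_out)
        ]
        snapshots.append((list(membership), edges))
        # Update the labels for the next time step.
        membership = [
            g if rng.random() < eta else rng.randrange(k) for g in membership
        ]
    return snapshots


# Degenerate settings make the behaviour easy to check: labels never change
# (eta = 1) and edges appear only inside communities (p_in = 1, p_out = 0).
snaps = sample_dynamic_sbm(n=6, k=2, steps=3, p_in=1.0, p_out=0.0, eta=1.0, seed=1)
```

Lowering `eta` or narrowing the gap between `p_in` and `p_out` makes the communities harder to recover, which is the regime the detectability threshold concerns.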
Hierarchical community structure in networks
Modular and hierarchical structures are pervasive in real-world complex
systems. A great deal of effort has gone into trying to detect and study these
structures. Important theoretical advances in the detection of modular, or
"community", structures have included identifying fundamental limits of
detectability by formally defining community structure using probabilistic
generative models. Detecting hierarchical community structure introduces
additional challenges alongside those inherited from community detection. Here
we present a theoretical study on hierarchical community structure in networks,
which has thus far not received the same rigorous attention. We address the
following questions: 1) How should we define a valid hierarchy of communities?
2) How should we determine if a hierarchical structure exists in a network? and
3) How can we detect hierarchical structure efficiently? We approach these
questions by introducing a definition of hierarchy based on the concept of
stochastic externally equitable partitions and their relation to probabilistic
models, such as the popular stochastic block model. We enumerate the challenges
involved in detecting hierarchies and, by studying the spectral properties of
hierarchical structure, present an efficient and principled method for
detecting them.
Comment: 22 pages, 12 figures
Network constraints on the mixing patterns of binary node metadata
We consider the network constraints on the bounds of the assortativity
coefficient, which measures the tendency of nodes with the same attribute
values to be interconnected. The assortativity coefficient is the Pearson
correlation coefficient of node attribute values across network edges and
ranges between -1 and 1. We focus here on the assortativity of binary node
attributes and show that properties of the network, such as the degree
distribution and the number of nodes with each attribute value, place
constraints upon the
attainable values of the assortativity coefficient. We explore the
assortativity in three different spaces, that is, ensembles of graph
configurations and node-attribute assignments that are valid for a given set of
network constraints. We provide means for obtaining bounds on the extremal
values of assortativity for each of these spaces. Finally, we demonstrate that
under certain conditions the network constraints severely limit the maximum and
minimum values of assortativity, which may present issues in how we interpret
the assortativity coefficient.
Comment: 18 pages, 7 figures
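The definition used in this abstract, a Pearson correlation of attribute values across edges, can be written out directly. The sketch below assumes an undirected graph with 0/1 node attributes and counts each edge once in each direction; the function name and the toy graphs are hypothetical.

```python
def assortativity(edges, attribute):
    """Pearson correlation of attribute values at the two ends of each edge."""
    xs, ys = [], []
    for u, v in edges:
        # Count every undirected edge in both directions.
        xs += [attribute[u], attribute[v]]
        ys += [attribute[v], attribute[u]]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys hold the same values, so one mean suffices
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        raise ValueError("undefined when all edge endpoints share one value")
    return cov / var


attribute = {0: 0, 1: 0, 2: 1, 3: 1}
r_same = assortativity([(0, 1), (2, 3)], attribute)  # edges only within groups
r_cross = assortativity([(0, 2), (1, 3)], attribute)  # edges only across groups
r_path = assortativity([(0, 1), (1, 2), (2, 3)], attribute)  # mixed
```

On these toy graphs the coefficient reaches both ends of its range, while the path graph falls in between; the abstract's point is that in many networks the degree sequence and group sizes make such extremes unattainable.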
The architecture of an empirical genotype-phenotype map
Recent advances in high-throughput technologies are bringing the study of empirical genotype-phenotype (GP) maps to the fore. Here, we use data from protein-binding microarrays to study an empirical GP map of transcription factor (TF)-binding preferences. In this map, each genotype is a DNA sequence. The phenotype of this DNA sequence is its ability to bind one or more TFs. We study this GP map using genotype networks, in which nodes represent genotypes with the same phenotype, and edges connect nodes if their genotypes differ by a single small mutation. We describe the structure and arrangement of genotype networks within the space of all possible binding sites for 525 TFs from three eukaryotic species encompassing three kingdoms of life (animal, plant, and fungi). We thus provide a high-resolution depiction of the architecture of an empirical GP map. Among a number of findings, we show that these genotype networks are "small-world" and assortative, and that they ubiquitously overlap and interface with one another. We also use polymorphism data from Arabidopsis thaliana to show how genotype network structure influences the evolution of TF-binding sites in vivo. We discuss our findings in the context of regulatory evolution.