Entrograms and coarse graining of dynamics on complex networks
Using an information theoretic point of view, we investigate how a dynamics
acting on a network can be coarse grained through the use of graph partitions.
Specifically, we are interested in how aggregating the state space of a Markov
process according to a partition affects the resulting lower-dimensional
dynamics. We highlight that for a dynamics on a particular graph there may be
multiple coarse grained descriptions that capture different, incomparable
features of the original process. For instance, a coarse graining induced by
one partition may be commensurate with a time-scale separation in the dynamics,
while another coarse graining may correspond to a different lower-dimensional
dynamics that preserves the Markov property of the original process. Taking
inspiration from the Computational Mechanics literature, we find that a
convenient tool to summarise and visualise such dynamical properties of a
coarse grained model (partition) is the entrogram. The entrogram gathers
certain information-theoretic measures, which quantify how information flows
across time steps. These information-theoretic quantities include the entropy
rate, as well as a measure of the memory contained in the process, i.e., how
well the dynamics can be approximated by a first-order Markov process. We use
the entrogram to investigate how specific macro-scale connection patterns in
the state-space transition graph of the original dynamics result in desirable
properties of coarse grained descriptions. We thereby provide a fresh
information-theoretic perspective on the interplay between structure and
dynamics in networks and on the process of partitioning. We focus on networks
that may be approximated by either a core-periphery or a clustered
organization, and highlight that each of these coarse grained descriptions can
capture different aspects of a Markov process acting on the network.
Comment: 17 pages, 6 figures
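The abstract does not define the entrogram in detail. As a hedged illustration of the kind of quantities it gathers, the sketch below coarse-grains a small Markov chain by a partition and computes the conditional block entropies h_L = H_L - H_{L-1} of the aggregated process, started from the stationary distribution. The function names and the example chain are hypothetical. h_L decreases towards the entropy rate, and a coarse-grained process that is itself first-order Markov has h_2 = h_3 = ..., so the gap between h_2 and h_L for larger L is one possible way to quantify memory.

```python
import itertools
import math


def stationary_distribution(P, iters=10_000, tol=1e-13):
    """Power iteration for the stationary distribution of a row-stochastic matrix P.
    Assumes the chain is irreducible and aperiodic so that the iteration converges."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def block_entropy(P, pi, labels, L):
    """Shannon entropy (bits) of length-L words of the coarse-grained process
    Y_t = labels[X_t], where X_0 is drawn from pi."""
    n = len(P)
    H = 0.0
    for word in itertools.product(sorted(set(labels)), repeat=L):
        # Forward vector: joint probability of emitting the word so far and being in micro-state j.
        alpha = [pi[j] if labels[j] == word[0] else 0.0 for j in range(n)]
        for y in word[1:]:
            alpha = [sum(alpha[i] * P[i][j] for i in range(n)) if labels[j] == y else 0.0
                     for j in range(n)]
        p = sum(alpha)
        if p > 0:
            H -= p * math.log2(p)
    return H


def conditional_entropies(P, labels, max_L=5):
    """h_L = H_L - H_{L-1}: uncertainty (bits) about the next macro-state given the previous L-1."""
    pi = stationary_distribution(P)
    H = [0.0] + [block_entropy(P, pi, labels, L) for L in range(1, max_L + 1)]
    return [H[L] - H[L - 1] for L in range(1, max_L + 1)]


if __name__ == "__main__":
    # Hypothetical 4-state chain; the partition {0,1} | {2,3} is lumpable because every state
    # in a cell sends the same total probability to each cell (0.8/0.2 and 0.1/0.9).
    P = [[0.5, 0.3, 0.1, 0.1],
         [0.2, 0.6, 0.15, 0.05],
         [0.05, 0.05, 0.7, 0.2],
         [0.1, 0.0, 0.3, 0.6]]
    print("lumpable     ", [round(h, 4) for h in conditional_entropies(P, [0, 0, 1, 1])])
    print("non-lumpable ", [round(h, 4) for h in conditional_entropies(P, [0, 1, 0, 1])])
```

For the lumpable partition the values from h_2 onward stay constant at the entropy rate of the lumped two-state chain, while for another partition of the same chain they can keep decreasing, which is the signature of memory in the coarse-grained description.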
A unified data representation theory for network visualization, ordering and coarse-graining
Representation of large data sets has become a key question in many scientific
disciplines over the last decade. Several approaches to network visualization,
data ordering and coarse-graining have addressed this goal. However, there has
been no underlying theoretical framework linking these problems. Here we show an
elegant, information-theoretic data representation approach as a unified
solution to network visualization, data ordering and coarse-graining. The
optimal representation is the one hardest to distinguish from the original data
matrix, as measured by the relative entropy. The representation of network nodes
as probability distributions provides an efficient visualization method and, in
one dimension, an ordering of network nodes and edges. Coarse-grained
representations of the input network enable both efficient data compression and
hierarchical visualization to achieve high-quality representations of larger
data sets. Our unified data representation theory will aid the analysis of
extensive data sets by revealing the large-scale structure of complex networks
in a comprehensible form.
Comment: 13 pages, 5 figures
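The abstract does not spell out the optimization. The following hedged sketch illustrates only the coarse-graining side of the idea, under the assumption that a coarse-grained representation is a block-constant matrix: the data matrix is normalized to a probability distribution, each (cell, cell) block is replaced by its mean, and the cost is the relative entropy D(P || Q) in nats. Function names and the toy graph are hypothetical. A short Lagrange-multiplier argument shows that, for a fixed partition, the block mean is the block-constant Q minimizing D(P || Q), so comparing this cost across partitions compares coarse-grainings.

```python
import math


def normalize(M):
    """Scale a nonnegative matrix so that its entries sum to 1."""
    total = sum(sum(row) for row in M)
    return [[x / total for x in row] for row in M]


def block_average(P, labels):
    """Replace each entry by the mean of its (cell, cell) block; block totals are preserved."""
    n = len(P)
    sums, counts = {}, {}
    for i in range(n):
        for j in range(n):
            key = (labels[i], labels[j])
            sums[key] = sums.get(key, 0.0) + P[i][j]
            counts[key] = counts.get(key, 0) + 1
    return [[sums[(labels[i], labels[j])] / counts[(labels[i], labels[j])] for j in range(n)]
            for i in range(n)]


def relative_entropy(P, Q):
    """D(P || Q) in nats; assumes Q[i][j] > 0 wherever P[i][j] > 0 (true for block averages)."""
    return sum(p * math.log(p / q)
               for prow, qrow in zip(P, Q) for p, q in zip(prow, qrow) if p > 0)


def coarse_graining_cost(A, labels):
    """Information lost when the data matrix A is represented block-constantly by a partition."""
    P = normalize(A)
    return relative_entropy(P, block_average(P, labels))


if __name__ == "__main__":
    # Hypothetical network: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
    A = [[0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0],
         [0, 0, 1, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0]]
    print("triangles:", coarse_graining_cost(A, [0, 0, 0, 1, 1, 1]))
    print("mixed:    ", coarse_graining_cost(A, [0, 1, 0, 1, 0, 1]))
```

The partition that follows the two triangles loses less information than an alternating partition, in line with the intuition that good coarse-grainings are hard to distinguish from the original data.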
Learning Interpretable Collective Variables of the Noisy Voter Model
We present a data-driven method to learn and understand collective variables
for noisy voter model dynamics on networks. A collective variable (CV) is a
projection of the high-dimensional system state into a low-dimensional space
that preserves the essential dynamical information. Thus, CVs can be used to
improve our understanding of complex emergent behaviors and to simplify
analysis and prediction. We demonstrate our method using three example
networks: the stochastic block model, a ring-shaped graph, and a scale-free
network generated by the Albert–Barabási model. Our method combines the
recent transition manifold approach with a linear regression step to produce
interpretable CVs that describe the role and importance of each network node.
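The transition manifold and regression steps are beyond a short example. As a hedged sketch of the setting only, the code below simulates one common discrete-time, two-opinion variant of the noisy voter model (a randomly chosen node adopts a uniformly random opinion with probability eps and otherwise copies a random neighbour) and evaluates a linear CV, i.e. a weighted fraction of nodes in state 1. The update rule, the function names and the degree weights are assumptions for illustration, not the weights learned in the paper. Degree weights are a natural baseline because, for this node-update rule without noise, the degree-weighted fraction of nodes in state 1 is a martingale.

```python
import random


def noisy_voter_step(state, neighbors, eps, rng):
    """One asynchronous update of a two-opinion noisy voter model (assumed variant)."""
    i = rng.randrange(len(state))
    if rng.random() < eps:
        state[i] = rng.randrange(2)                  # noise: uniformly random opinion
    else:
        state[i] = state[rng.choice(neighbors[i])]   # imitation: copy a random neighbour


def simulate(neighbors, eps, steps, seed=0):
    """Return the trajectory of states (as tuples) from a random initial condition."""
    rng = random.Random(seed)
    state = [rng.randrange(2) for _ in neighbors]
    trajectory = [tuple(state)]
    for _ in range(steps):
        noisy_voter_step(state, neighbors, eps, rng)
        trajectory.append(tuple(state))
    return trajectory


def linear_cv(x, weights):
    """Project a high-dimensional state x onto one number: a weighted fraction in [0, 1]."""
    return sum(w * xi for w, xi in zip(weights, x)) / sum(weights)


if __name__ == "__main__":
    # Hypothetical wheel graph: hub 0 joined to a ring 1-2-3-4-5-1.
    neighbors = [[1, 2, 3, 4, 5], [0, 2, 5], [0, 1, 3], [0, 2, 4], [0, 3, 5], [0, 4, 1]]
    degrees = [len(nb) for nb in neighbors]
    traj = simulate(neighbors, eps=0.05, steps=2000, seed=1)
    cv = [linear_cv(x, degrees) for x in traj]
    print(f"degree-weighted CV: start {cv[0]:.3f}, end {cv[-1]:.3f}")
```

In the data-driven approach described above, the weights would instead be fitted by regression, so that their magnitudes can be read as the importance of each node.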
Detection of vulnerable communities in East Africa via novel data streams and dynamic stochastic block models
In developing countries it is challenging to collect data on poverty and its associated community health characteristics. Data collection in this context is laborious and resource-intensive to the point of impracticality. Additionally, due to the sensitive nature of these themes, the data are often unreliable. There is therefore a need for alternative methods of detecting vulnerable communities. Promising opportunities arise via novel, rich data streams such as Call Data Records (CDRs), which stem from the ubiquitous use of mobile phones. Despite the growth of CDR data, there has been limited previous application to problems of poverty and development. This thesis makes three main contributions: (i) methods of collecting ground-truth data in developing areas; (ii) best practices for applying these data to detect vulnerable regions; (iii) new applications of statistical approaches to the problem via the stochastic block model. This work is focused on Dar es Salaam in Tanzania. More reliable and easily accessible ground truth on these vulnerabilities could have a high impact for policy makers and NGOs trying to reduce the devastating effects of poverty. This thesis produces comprehensive results that address current knowledge gaps, via a rigorous, fine-grained data collection process surveying the 452 subwards in Dar es Salaam in relation to poverty and social vulnerability.
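The thesis's dynamic stochastic block model is not specified in this abstract. As a hedged sketch of the basic building block only, the code below fits a static Bernoulli stochastic block model to an undirected call graph for a given node labelling: the maximum-likelihood probability for a pair of blocks is the number of observed edges between them divided by the number of possible node pairs, and the maximized log-likelihood can be used to compare labellings. The toy graph, the labels and the function names are hypothetical; no CDR data are used.

```python
import math
from collections import Counter


def block_counts(edges, labels):
    """Edge counts m[r, s] and possible pair counts N[r, s] for each unordered block pair r <= s."""
    size = Counter(labels)
    m = Counter(tuple(sorted((labels[u], labels[v]))) for u, v in edges)
    pairs = {}
    blocks = sorted(size)
    for a, r in enumerate(blocks):
        for s in blocks[a:]:
            pairs[(r, s)] = size[r] * (size[r] - 1) // 2 if r == s else size[r] * size[s]
    return m, pairs


def sbm_fit(edges, labels):
    """ML block probabilities p[r, s] = m[r, s] / N[r, s] and the maximized Bernoulli log-likelihood.
    Assumes a simple undirected graph (no self-loops or repeated edges)."""
    m, pairs = block_counts(edges, labels)
    p, loglik = {}, 0.0
    for key, N in pairs.items():
        if N == 0:
            continue
        k = m[key]
        p[key] = k / N
        if 0 < k < N:  # blocks with p in {0, 1} contribute 0 (using 0 * log 0 := 0)
            loglik += k * math.log(k / N) + (N - k) * math.log(1 - k / N)
    return p, loglik


if __name__ == "__main__":
    # Hypothetical call graph between six subscribers: two tight groups and one bridging call.
    edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
    for labels in ([0, 0, 0, 1, 1, 1], [0, 1, 0, 1, 0, 1]):
        probs, ll = sbm_fit(edges, labels)
        print(labels, {k: round(v, 3) for k, v in probs.items()}, round(ll, 3))
```

A dynamic variant would repeat such a fit over time windows of the call graph, possibly with labels allowed to change between windows; that extension is not sketched here.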
Hierarchical community structure in networks
Modular and hierarchical structures are pervasive in real-world complex
systems. A great deal of effort has gone into trying to detect and study these
structures. Important theoretical advances in the detection of modular, or
"community", structures have included identifying fundamental limits of
detectability by formally defining community structure using probabilistic
generative models. Detecting hierarchical community structure introduces
additional challenges alongside those inherited from community detection. Here
we present a theoretical study on hierarchical community structure in networks,
which has thus far not received the same rigorous attention. We address the
following questions: 1) How should we define a valid hierarchy of communities?
2) How should we determine whether a hierarchical structure exists in a network?
3) How can we detect hierarchical structure efficiently? We approach these
questions by introducing a definition of hierarchy based on the concept of
stochastic externally equitable partitions and their relation to probabilistic
models, such as the popular stochastic block model. We enumerate the challenges
involved in detecting hierarchies and, by studying the spectral properties of
hierarchical structure, present an efficient and principled method for
detecting them.
Comment: 22 pages, 12 figures
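The stochastic version of externally equitable partitions and the spectral detection method are not reproduced here. As a hedged sketch of the underlying deterministic notion, the code below checks whether a partition is externally equitable for a (possibly weighted) adjacency matrix: for every ordered pair of distinct cells, all nodes in the first cell must have the same total edge weight into the second, while internal degrees may vary. Applied to the expected adjacency matrix of a hypothetical two-level stochastic block model, both the fine partition and the coarser partition that merges blocks along the hierarchy pass the check, whereas a merge that cuts across the hierarchy fails. Function names and parameters are illustrative.

```python
def is_externally_equitable(A, labels, tol=1e-9):
    """True if, for every ordered pair of distinct cells (r, s), all nodes in cell r
    have the same (weighted) number of neighbours in cell s."""
    cells = {}
    for i, c in enumerate(labels):
        cells.setdefault(c, []).append(i)
    for r, members_r in cells.items():
        for s, members_s in cells.items():
            if r == s:
                continue  # internal degrees may vary in an *external* equitable partition
            counts = [sum(A[i][j] for j in members_s) for i in members_r]
            if max(counts) - min(counts) > tol:
                return False
    return True


def expected_sbm_adjacency(sizes, Omega):
    """Expected adjacency E[A]_ij = Omega[g_i][g_j] (no self-loops) and the block labels g."""
    groups = [g for g, size in enumerate(sizes) for _ in range(size)]
    n = len(groups)
    A = [[0.0 if i == j else Omega[groups[i]][groups[j]] for j in range(n)] for i in range(n)]
    return A, groups


if __name__ == "__main__":
    # Hypothetical two-level hierarchy: blocks {0,1} and {2,3} form two super-blocks.
    Omega = [[0.9, 0.5, 0.1, 0.1],
             [0.5, 0.9, 0.1, 0.1],
             [0.1, 0.1, 0.9, 0.5],
             [0.1, 0.1, 0.5, 0.9]]
    A, groups = expected_sbm_adjacency([2, 2, 2, 2], Omega)
    print("fine blocks:         ", is_externally_equitable(A, groups))
    print("merged along levels: ", is_externally_equitable(A, [g // 2 for g in groups]))
    print("merged across levels:", is_externally_equitable(A, [0 if g < 3 else 1 for g in groups]))
```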
Module Identification for Biological Networks
Advances in high-throughput techniques have enabled researchers to produce large-scale data on molecular interactions. Systematic analysis of these large-scale interactome datasets based on their graph representations has the potential to yield a better understanding of the functional organization of the corresponding biological systems. One way to chart the underlying cellular functional organization is to identify functional modules in these biological networks. However, module identification in biological networks poses several challenges. First, unlike in social and computer networks, molecules work together with different interaction patterns, and groups of molecules working together may have different sizes. Second, node degrees in biological networks follow a power-law distribution, meaning that there are many nodes with very low degree and few nodes with high degree. Third, molecular interaction data contain a large number of false positives and false negatives.
In this dissertation, we propose computational algorithms to overcome these challenges. To identify functional modules based on interaction patterns, we develop efficient algorithms based on the concept of block modeling. We propose a subgradient Frank-Wolfe algorithm with a path generation method to identify functional modules and recognize the functional organization of biological networks. Additionally, inspired by random walks on networks, we propose a novel two-hop random walk strategy to detect fine-grained functional modules based on interaction patterns. To overcome the degree heterogeneity problem, we propose an algorithm to identify functional modules whose topological structure is both densely connected and well separated from the rest of the network. To minimize the impact of noisy interactions in biological networks, we propose methods to detect conserved functional modules across multiple biological networks by integrating topological and orthology information. We compare each of our algorithms with state-of-the-art algorithms on several biological networks. Comparisons against known gold-standard biological function annotations show that our methods can improve the accuracy of predicting protein complexes and protein functions.
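The dissertation's algorithms are not described here in enough detail to reproduce. As a hedged sketch, two standard quantities often used to formalize "densely connected and well separated from the rest of the network" are the internal edge density of a candidate module and its conductance (cut edges divided by the smaller of the two volumes, where a volume is a sum of degrees). Treating these two measures as the criteria is an assumption; the toy graph and function names are hypothetical.

```python
def conductance(adj, S):
    """Cut edges between S and the rest, divided by min(vol(S), vol(rest)); lower is better separated."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol_in = sum(len(adj[u]) for u in S)
    vol_out = sum(len(adj[u]) for u in adj if u not in S)
    if min(vol_in, vol_out) == 0:
        raise ValueError("conductance is undefined when one side has zero volume")
    return cut / min(vol_in, vol_out)


def internal_density(adj, S):
    """Fraction of possible node pairs inside S that interact; higher is denser."""
    S = set(S)
    k = len(S)
    if k < 2:
        return 0.0
    internal = sum(1 for u in S for v in adj[u] if v in S) // 2  # each internal edge is seen twice
    return internal / (k * (k - 1) / 2)


if __name__ == "__main__":
    # Hypothetical interaction network: two triangles joined by the edge 2-3.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
    for module in ({0, 1, 2}, {0, 3}):
        print(sorted(module), round(conductance(adj, module), 3), internal_density(adj, module))
```

Conductance normalizes by volume, so a module containing high-degree hubs is not automatically penalized for having many boundary edges, which is one reason it is a common choice in degree-heterogeneous networks.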
- …