33 research outputs found
Understanding Community Structure for Large Networks
The general theme of this thesis is to improve our understanding of community structure for large networks. A scientific challenge across fields (e.g., neuroscience, genetics, and social science) is to understand what drives the interactions between nodes in a network. One of the fundamental concepts in this context is community structure: the tendency of nodes to connect based on similar characteristics. Network models where a single parameter per node governs the propensity of connection are popular in practice. They frequently arise as null models that indicate a lack of community structure, since they cannot readily describe networks whose aggregate links behave in a block-like manner. We generalize such a model called the degree-based model to a flexible, nonparametric class of network models, covering weighted, multi-edge, and power-law networks, and provide limit theorems that describe their asymptotic properties. We establish a theoretical foundation for modularity: a well-known measure for the strength of community structure and derive its asymptotic properties under the assumption of a lack of community structure (formalized by the class of degree-based models described above). This enables us to assess how informative covariates are for the network interactions. Modularity is intuitive and practically effective but until now has lacked a sound theoretical basis. We derive modularity from first principles, and give it a formal statistical interpretation. Moreover, by acknowledging that different community assignments may explain different aspects of a network’s observed structure, we extend the applicability of modularity beyond its typical use to find a single “best” community assignment. We develop from our theoretical results a methodology to quantify network community structure. After validating it using several benchmark examples, we investigate a multi-edge network of corporate email interactions. Here, we demonstrate that our method can identify those covariates that are informative and therefore improves our understanding of the network
Network models of stochastic processes in cancer
Complex systems which can be modelled as networks are ubiquitous. Well-known examples include social and economic networks, as well as many examples in cell biology such as gene regulatory and protein signalling networks. Many cell biological processes are inherently stochastic and non-stationary, and this is the perspective from which I have developed novel mathematical and computational statistical models, focusing particularly on network models. These models are primarily motivated by cell biological processes relating to DNA methylation and stem cell and cancer biology, but can be generalised to other systems and domains. I have used these and other models to identify and analyse novel DNA-based cancer biomarkers
Sampling designs and robustness for the analysis of network data
This manuscript addresses three new practical methodologies for topics on Bayesian analysis regarding sampling designs and robustness on network data: / In the first part of this thesis we propose a general approach for comparing sampling designs. The approach is based on the concept of data compression from information theory. The criterion for comparing sampling designs is formulated so that the results prove to be robust with respect to some of the most widely used loss functions for point estimation and prediction. The rationale behind the proposed approach is to find sampling designs such that preserve the largest amount of information possible from the original data generating mechanism. The approach is inspired by the same principle as the reference prior, with the difference that, for the proposed approach, the argument of the optimization is the sampling design rather than the prior. The information contained in the data generating mechanism can be encoded in a distribution defined either in parameter’s space (posterior distribution) or in the space of observables (predictive distribution). The results obtained in this part enable us to relate statements about a feature of an observed subgraph and a feature of a full graph. It is proven that such statements can not be connected by invoking conditional statements only; it is necessary to specify a joint distribution for the random graph model and the sampling design for all values of fully and partially observed random network features. We use this rationale to formulate statements at the level of the sampling graph that help to make non-trivial statements about the full network. The joint distribution of the underlying network and the sampling mechanism enable the statistician to relate both type of conditional statements. Thus, for random network partially and fully observed features joint distribution is considered and useful statements for practitioners are provided. / The second general theme of this thesis is robustness on networks. A method for robustness on exchangeable random networks is developed. The approach is inspired by the concept of graphon approximation through a stochastic block model. An exchangeable model is assumed to infer a feature of a random networks with the objective to see how the quality of that inference gets degraded if the model is slightly modified. Decision theory methods are considered under model misspecification by quantifying stability of optimal actions to perturbations to the approximating model within a well defined neighborhood of model space. The approach is inspired by all recent developments across the context of robustness in recent research in the robust control, macroeconomics and financial mathematics literature. / In all topics, simulation analysis is complemented with comprehensive experimental studies, which show the benefits of our modeling and estimation methods
Centrality measures for graphons: Accounting for uncertainty in networks
As relational datasets modeled as graphs keep increasing in size and their
data-acquisition is permeated by uncertainty, graph-based analysis techniques
can become computationally and conceptually challenging. In particular, node
centrality measures rely on the assumption that the graph is perfectly known --
a premise not necessarily fulfilled for large, uncertain networks. Accordingly,
centrality measures may fail to faithfully extract the importance of nodes in
the presence of uncertainty. To mitigate these problems, we suggest a
statistical approach based on graphon theory: we introduce formal definitions
of centrality measures for graphons and establish their connections to
classical graph centrality measures. A key advantage of this approach is that
centrality measures defined at the modeling level of graphons are inherently
robust to stochastic variations of specific graph realizations. Using the
theory of linear integral operators, we define degree, eigenvector, Katz and
PageRank centrality functions for graphons and establish concentration
inequalities demonstrating that graphon centrality functions arise naturally as
limits of their counterparts defined on sequences of graphs of increasing size.
The same concentration inequalities also provide high-probability bounds
between the graphon centrality functions and the centrality measures on any
sampled graph, thereby establishing a measure of uncertainty of the measured
centrality score. The same concentration inequalities also provide
high-probability bounds between the graphon centrality functions and the
centrality measures on any sampled graph, thereby establishing a measure of
uncertainty of the measured centrality score.Comment: Authors ordered alphabetically, all authors contributed equally. 21
pages, 7 figure
Community Recovery in the Geometric Block Model
To capture the inherent geometric features of many community detection
problems, we propose to use a new random graph model of communities that we
call a Geometric Block Model. The geometric block model builds on the random
geometric graphs (Gilbert, 1961), one of the basic models of random graphs for
spatial networks, in the same way that the well-studied stochastic block model
builds on the Erd\H{o}s-R\'{en}yi random graphs. It is also a natural extension
of random community models inspired by the recent theoretical and practical
advancements in community detection. To analyze the geometric block model, we
first provide new connectivity results for random annulus graphs which are
generalizations of random geometric graphs. The connectivity properties of
geometric graphs have been studied since their introduction, and analyzing them
has been more difficult than their Erd\H{o}s-R\'{en}yi counterparts due to
correlated edge formation.
We then use the connectivity results of random annulus graphs to provide
necessary and sufficient conditions for efficient recovery of communities for
the geometric block model. We show that a simple triangle-counting algorithm to
detect communities in the geometric block model is near-optimal. For this we
consider the following two regimes of graph density.
In the regime where the average degree of the graph grows logarithmically
with the number of vertices, we show that our algorithm performs extremely
well, both theoretically and practically. In contrast, the triangle-counting
algorithm is far from being optimum for the stochastic block model in the
logarithmic degree regime. We simulate our results on both real and synthetic
datasets to show superior performance of both the new model as well as our
algorithm.Comment: 53 pages, 18 figures. Accepted at the Journal of Machine Learning
Research (JMLR). Shorter versions accepted in AAAI 2018 (see
arXiv:1709.05510) and RANDOM 2019 (see arXiv:1804.05013). arXiv admin note:
text overlap with arXiv:1804.0501
AN EDGE-CENTRIC PERSPECTIVE FOR BRAIN NETWORK COMMUNITIES
Thesis (Ph.D.) - Indiana University, Department of Psychological and Brain Sciences and Program in Neuroscience, 2021The brain is a complex system organized on multiple scales and operating in both a local and distributed manner. Individual neurons and brain regions participate in specific functions, while at the same time existing in the context of a larger network, supporting a range of different functionalities. Building brain networks comprised of distinct neural elements (nodes) and their interrelationships (edges), allows us to model the brain from both local and global perspectives, and to deploy a wide array of computational network tools. A popular network analysis approach is community detection, which aims to subdivide a network’s nodes into clusters that can used to represent and evaluate network organization. Prevailing community detection approaches applied to brain networks are designed to find densely interconnected sets of nodes, leading to the notion that the brain is organized in an exclusively modular manner. Furthermore, many brain network analyses tend to focus on the nodes, evidenced by the search for modular groupings of neural elements that might serve a common function. In this thesis, we describe the application of community detection algorithms that are sensitive to alternative cluster configurations, enhancing our understanding of brain network organization. We apply a framework called the stochastic block model, which we use to uncover evidence of non-modular organization in human anatomical brain networks across the life span, and in the informatically-collated rat cerebral cortex. We also propose a framework to cluster functional brain network edges in human data, which naturally results in an overlapping organization at the level of nodes that bridges canonical functional systems. These alternative methods utilize the connection patterns of brain network edges in ways that prevailing approaches do not. Thus, we motivate an alternative outlook which focuses on the importance of information provided by the brain’s interconnections, or edges. We call this an edge-centric perspective. The edge-centric approaches developed here offer new ways to characterize distributed brain organization and contribute to a fundamental change in perspective in our thinking about the brain
Computation in Complex Networks
Complex networks are one of the most challenging research focuses of disciplines, including physics, mathematics, biology, medicine, engineering, and computer science, among others. The interest in complex networks is increasingly growing, due to their ability to model several daily life systems, such as technology networks, the Internet, and communication, chemical, neural, social, political and financial networks. The Special Issue “Computation in Complex Networks" of Entropy offers a multidisciplinary view on how some complex systems behave, providing a collection of original and high-quality papers within the research fields of: • Community detection • Complex network modelling • Complex network analysis • Node classification • Information spreading and control • Network robustness • Social networks • Network medicin
Recommended from our members
Bayesian Methods for Discovering Structure in Neural Spike Trains
Neuroscience is entering an exciting new age. Modern recording technologies enable simultaneous measurements of thousands of neurons in organisms performing complex behaviors. Such recordings offer an unprecedented opportunity to glean insight into the mechanistic underpinnings of intelligence, but they also present an extraordinary statistical and computational challenge: how do we make sense of these large scale recordings? This thesis develops a suite of tools that instantiate hypotheses about neural computation in the form of probabilistic models and a corresponding set of Bayesian inference algorithms that efficiently fit these models to neural spike trains. From the posterior distribution of model parameters and variables, we seek to advance our understanding of how the brain works.
Concretely, the challenge is to hypothesize latent structure in neural populations, encode that structure in a probabilistic model, and efficiently fit the model to neural spike trains. To surmount this challenge, we introduce a collection of structural motifs, the design patterns from which we construct interpretable models. In particular, we focus on random network models, which provide an intuitive bridge between latent types and features of neurons and the temporal dynamics of neural populations. In order to reconcile these models with the discrete nature of spike trains, we build on the Hawkes process — a multivariate generalization of the Poisson process — and its discrete time analogue, the linear autoregressive Poisson model. By leveraging the linear nature of these models and the Poisson superposition principle, we derive elegant auxiliary variable formulations and efficient inference algorithms. We then generalize these to nonlinear and nonstationary models of neural spike trains and take advantage of the Pólya-gamma augmentation to develop novel Markov chain Monte Carlo (MCMC) inference algorithms. In a variety of real neural recordings, we show how our methods reveal interpretable structure underlying neural spike trains.
In the latter chapters, we shift our focus from autoregressive models to latent state space models of neural activity. We perform an empirical study of Bayesian nonparametric methods for hidden Markov models of neural spike trains. Then, we develop an MCMC algorithm for switching linear dynamical systems with discrete observations and a novel algorithm for sampling PĂłlya-gamma random variables that enables efficient annealed importance sampling for model comparison.
Finally, we consider the “Bayesian brain” hypothesis — the hypothesis that neural circuits are themselves performing Bayesian inference. We show how one particular implementation of this hypothesis implies autoregressive dynamics of the form studied in earlier chapters, thereby providing a theoretical interpretation of our probabilistic models. This closes the loop, connecting top-down theory with bottom-up inferences, and suggests a path toward translating large scale recording capabilities into new insights about neural computation.Engineering and Applied Sciences - Computer Scienc