
33 research outputs found

Understanding Community Structure for Large Networks

Author: Franke B
Publication venue: UCL (University College London)
Publication date: 28/12/2016

The general theme of this thesis is to improve our understanding of community structure for large networks. A scientific challenge across fields (e.g., neuroscience, genetics, and social science) is to understand what drives the interactions between nodes in a network. One of the fundamental concepts in this context is community structure: the tendency of nodes to connect based on similar characteristics. Network models where a single parameter per node governs the propensity of connection are popular in practice. They frequently arise as null models that indicate a lack of community structure, since they cannot readily describe networks whose aggregate links behave in a block-like manner. We generalize one such model, the degree-based model, to a flexible, nonparametric class of network models covering weighted, multi-edge, and power-law networks, and provide limit theorems that describe their asymptotic properties. We establish a theoretical foundation for modularity, a well-known measure of the strength of community structure, and derive its asymptotic properties under the assumption of a lack of community structure (formalized by the class of degree-based models described above). This enables us to assess how informative covariates are for the network interactions. Modularity is intuitive and practically effective but until now has lacked a sound theoretical basis. We derive modularity from first principles and give it a formal statistical interpretation. Moreover, by acknowledging that different community assignments may explain different aspects of a network’s observed structure, we extend the applicability of modularity beyond its typical use to find a single “best” community assignment. We develop from our theoretical results a methodology to quantify network community structure. After validating it using several benchmark examples, we investigate a multi-edge network of corporate email interactions.
Here, we demonstrate that our method can identify those covariates that are informative and therefore improves our understanding of the network.
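
For orientation, the modularity score discussed in this abstract can be illustrated with the standard Newman-Girvan formula, Q = sum over communities c of (e_c / m) - (d_c / 2m)^2, where m is the number of edges, e_c the number of edges inside c, and d_c the total degree of c. This is a minimal sketch of the textbook definition, not the thesis's generalized derivation; the function name `modularity` and the edge-list input format are assumptions made for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Textbook Newman-Girvan modularity for an undirected (multi-)graph.

    edges: list of (u, v) pairs; a repeated pair counts as a multi-edge.
    community: dict mapping every node to a community label.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")

    degree = defaultdict(int)   # degree of each node
    within = defaultdict(int)   # number of edges inside each community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            within[community[u]] += 1

    degree_sum = defaultdict(int)  # total degree of each community
    for node, k in degree.items():
        degree_sum[community[node]] += k

    # Observed within-community edge fraction minus the fraction expected
    # under a degree-based null model, summed over communities.
    return sum(within[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)
```

Under this definition, placing every node in a single community always gives Q = 0, which matches the role of degree-based models as a "no community structure" baseline.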

UCL Discovery

Network models of stochastic processes in cancer

Author: Bartlett TE
Publication venue: UCL (University College London)
Publication date: 28/09/2015

Complex systems which can be modelled as networks are ubiquitous. Well-known examples include social and economic networks, as well as many examples in cell biology such as gene regulatory and protein signalling networks. Many cell biological processes are inherently stochastic and non-stationary, and this is the perspective from which I have developed novel mathematical and computational statistical models, focusing particularly on network models. These models are primarily motivated by cell biological processes relating to DNA methylation and stem cell and cancer biology, but can be generalised to other systems and domains. I have used these and other models to identify and analyse novel DNA-based cancer biomarkers.

UCL Discovery

Sampling designs and robustness for the analysis of network data

Author: Papamichalis Marios
Publication venue: UCL (University College London)
Publication date: 28/07/2019

This manuscript presents three new practical methodologies for Bayesian analysis of network data, concerning sampling designs and robustness.
In the first part of this thesis, we propose a general approach for comparing sampling designs. The approach is based on the concept of data compression from information theory. The criterion for comparing sampling designs is formulated so that the results are robust with respect to some of the most widely used loss functions for point estimation and prediction. The rationale behind the proposed approach is to find sampling designs that preserve as much information as possible from the original data-generating mechanism. The approach is inspired by the same principle as the reference prior, with the difference that the argument of the optimization is the sampling design rather than the prior. The information contained in the data-generating mechanism can be encoded in a distribution defined either in the parameter space (posterior distribution) or in the space of observables (predictive distribution). The results obtained in this part enable us to relate statements about a feature of an observed subgraph to statements about a feature of the full graph. It is proven that such statements cannot be connected by invoking conditional statements only; it is necessary to specify a joint distribution for the random graph model and the sampling design for all values of fully and partially observed random network features. We use this rationale to formulate statements at the level of the sampled graph that help to make non-trivial statements about the full network. The joint distribution of the underlying network and the sampling mechanism enables the statistician to relate both types of conditional statements. Thus, a joint distribution is considered for partially and fully observed random network features, and useful statements for practitioners are provided.
The second general theme of this thesis is robustness on networks. A method for robustness on exchangeable random networks is developed. The approach is inspired by the concept of graphon approximation through a stochastic block model. An exchangeable model is assumed in order to infer a feature of a random network, with the objective of seeing how the quality of that inference degrades if the model is slightly modified. Decision-theoretic methods are considered under model misspecification by quantifying the stability of optimal actions to perturbations of the approximating model within a well-defined neighborhood of model space. The approach is inspired by recent developments on robustness in the robust control, macroeconomics, and financial mathematics literature.
In all topics, simulation analysis is complemented with comprehensive experimental studies, which show the benefits of our modeling and estimation methods.

UCL Discovery

Centrality measures for graphons: Accounting for uncertainty in networks

Author: Avella-Medina Marco; Parise Francesca; Schaub Michael T.; Segarra Santiago
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2018

As relational datasets modeled as graphs keep increasing in size and their data acquisition is permeated by uncertainty, graph-based analysis techniques can become computationally and conceptually challenging. In particular, node centrality measures rely on the assumption that the graph is perfectly known, a premise not necessarily fulfilled for large, uncertain networks. Accordingly, centrality measures may fail to faithfully extract the importance of nodes in the presence of uncertainty. To mitigate these problems, we suggest a statistical approach based on graphon theory: we introduce formal definitions of centrality measures for graphons and establish their connections to classical graph centrality measures. A key advantage of this approach is that centrality measures defined at the modeling level of graphons are inherently robust to stochastic variations of specific graph realizations. Using the theory of linear integral operators, we define degree, eigenvector, Katz and PageRank centrality functions for graphons and establish concentration inequalities demonstrating that graphon centrality functions arise naturally as limits of their counterparts defined on sequences of graphs of increasing size. The same concentration inequalities also provide high-probability bounds between the graphon centrality functions and the centrality measures on any sampled graph, thereby establishing a measure of uncertainty of the measured centrality score.
Comment: Authors ordered alphabetically, all authors contributed equally. 21 pages, 7 figures
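
As a small illustration of the simplest of these centrality functions: for a graphon W on the unit square, the degree centrality function is the integral of W(x, y) over y in [0, 1]. The sketch below approximates that integral with a midpoint rule. The function name `graphon_degree_centrality`, the grid size, and the example graphons are assumptions chosen for illustration and are not taken from the paper.

```python
def graphon_degree_centrality(W, x, n_grid=1000):
    """Approximate the degree centrality function of a graphon at x.

    W: callable W(x, y) returning a value in [0, 1] (assumed symmetric).
    x: latent position in [0, 1].
    n_grid: number of midpoint-rule cells used for the integral over y.
    """
    h = 1.0 / n_grid
    # Midpoint rule: evaluate W(x, .) at the centre of each cell.
    return sum(W(x, (j + 0.5) * h) for j in range(n_grid)) * h


# Hypothetical example graphons.
def product_graphon(x, y):
    return x * y


def two_block_graphon(x, y):
    # Stochastic-block-model-like graphon with two equal-sized blocks.
    same_block = (x < 0.5) == (y < 0.5)
    return 0.8 if same_block else 0.2
```

For the product graphon W(x, y) = xy the exact value is x / 2, so the approximation can be checked directly against a closed form.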

arXiv.org e-Print Archive

Oxford University Research Archive

Community Recovery in the Geometric Block Model

Author: Galhotra Sainyam; Mazumdar Arya; Pal Soumyabrata; Saha Barna
Publication venue
Publication date: 17/11/2023

To capture the inherent geometric features of many community detection problems, we propose to use a new random graph model of communities that we call a Geometric Block Model. The geometric block model builds on the random geometric graphs (Gilbert, 1961), one of the basic models of random graphs for spatial networks, in the same way that the well-studied stochastic block model builds on the Erdős-Rényi random graphs. It is also a natural extension of random community models inspired by the recent theoretical and practical advancements in community detection. To analyze the geometric block model, we first provide new connectivity results for random annulus graphs, which are generalizations of random geometric graphs. The connectivity properties of geometric graphs have been studied since their introduction, and analyzing them has been more difficult than their Erdős-Rényi counterparts due to correlated edge formation. We then use the connectivity results of random annulus graphs to provide necessary and sufficient conditions for efficient recovery of communities for the geometric block model. We show that a simple triangle-counting algorithm to detect communities in the geometric block model is near-optimal. For this we consider the following two regimes of graph density. In the regime where the average degree of the graph grows logarithmically with the number of vertices, we show that our algorithm performs extremely well, both theoretically and practically. In contrast, the triangle-counting algorithm is far from optimal for the stochastic block model in the logarithmic degree regime. We simulate our results on both real and synthetic datasets to show superior performance of both the new model and our algorithm.
Comment: 53 pages, 18 figures. Accepted at the Journal of Machine Learning Research (JMLR). Shorter versions accepted in AAAI 2018 (see arXiv:1709.05510) and RANDOM 2019 (see arXiv:1804.05013).
arXiv admin note: text overlap with arXiv:1804.0501
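
The general idea behind triangle-counting community detection can be sketched as follows: count the common neighbours of the endpoints of each edge, keep only edges whose count clears a cutoff, and read communities off the connected components of what remains. This is a deliberately simplified illustration, not the authors' algorithm; the single `threshold` parameter and the function name `triangle_count_communities` are hypothetical, and the paper's actual decision rule may differ.

```python
from collections import defaultdict


def triangle_count_communities(edges, threshold):
    """Simplified triangle-counting community detection (illustrative only).

    edges: list of (u, v) pairs of a simple undirected graph.
    threshold: hypothetical cutoff; an edge is kept when its endpoints
               share at least this many common neighbours.
    Returns a list of sets of nodes, one set per detected community.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # Keep edges that participate in enough triangles.
    kept = defaultdict(set)
    for u, v in edges:
        if len(adj[u] & adj[v]) >= threshold:
            kept[u].add(v)
            kept[v].add(u)

    # Connected components of the filtered graph (iterative DFS).
    seen = set()
    communities = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in kept[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        communities.append(component)
    return communities
```

On two triangles joined by a single bridge edge, a threshold of 1 removes the bridge (its endpoints share no neighbours) and leaves the two triangles as separate communities.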

arXiv.org e-Print Archive

AN EDGE-CENTRIC PERSPECTIVE FOR BRAIN NETWORK COMMUNITIES

Author: Faskowitz Joshua
Publication venue: [Bloomington, Ind.] : Indiana University
Publication date: 01/05/2021

Thesis (Ph.D.) - Indiana University, Department of Psychological and Brain Sciences and Program in Neuroscience, 2021
The brain is a complex system organized on multiple scales and operating in both a local and distributed manner. Individual neurons and brain regions participate in specific functions, while at the same time existing in the context of a larger network, supporting a range of different functionalities. Building brain networks composed of distinct neural elements (nodes) and their interrelationships (edges) allows us to model the brain from both local and global perspectives, and to deploy a wide array of computational network tools. A popular network analysis approach is community detection, which aims to subdivide a network’s nodes into clusters that can be used to represent and evaluate network organization. Prevailing community detection approaches applied to brain networks are designed to find densely interconnected sets of nodes, leading to the notion that the brain is organized in an exclusively modular manner. Furthermore, many brain network analyses tend to focus on the nodes, as evidenced by the search for modular groupings of neural elements that might serve a common function. In this thesis, we describe the application of community detection algorithms that are sensitive to alternative cluster configurations, enhancing our understanding of brain network organization. We apply a framework called the stochastic block model, which we use to uncover evidence of non-modular organization in human anatomical brain networks across the life span, and in the informatically collated rat cerebral cortex. We also propose a framework to cluster functional brain network edges in human data, which naturally results in an overlapping organization at the level of nodes that bridges canonical functional systems. These alternative methods utilize the connection patterns of brain network edges in ways that prevailing approaches do not.
Thus, we motivate an alternative outlook which focuses on the importance of information provided by the brain’s interconnections, or edges. We call this an edge-centric perspective. The edge-centric approaches developed here offer new ways to characterize distributed brain organization and contribute to a fundamental change in perspective in our thinking about the brain.

IUScholarWorks (University of Indiana)

Computation in Complex Networks

Author
Publication venue: MDPI AG
Publication date: 11/01/2022

Complex networks are among the most challenging research topics across disciplines including physics, mathematics, biology, medicine, engineering, and computer science. Interest in complex networks is growing, due to their ability to model many everyday systems, such as technology networks, the Internet, and communication, chemical, neural, social, political and financial networks. The Special Issue “Computation in Complex Networks” of Entropy offers a multidisciplinary view on how some complex systems behave, providing a collection of original and high-quality papers within the research fields of:
• Community detection
• Complex network modelling
• Complex network analysis
• Node classification
• Information spreading and control
• Network robustness
• Social networks
• Network medicine

Directory of Open Access Books (DOAB)


Bayesian Methods for Discovering Structure in Neural Spike Trains

Author: Linderman Scott Warren
Publication venue: Harvard University Botany Libraries
Publication date: 25/07/2017

Neuroscience is entering an exciting new age. Modern recording technologies enable simultaneous measurements of thousands of neurons in organisms performing complex behaviors. Such recordings offer an unprecedented opportunity to glean insight into the mechanistic underpinnings of intelligence, but they also present an extraordinary statistical and computational challenge: how do we make sense of these large scale recordings? This thesis develops a suite of tools that instantiate hypotheses about neural computation in the form of probabilistic models and a corresponding set of Bayesian inference algorithms that efficiently fit these models to neural spike trains. From the posterior distribution of model parameters and variables, we seek to advance our understanding of how the brain works. Concretely, the challenge is to hypothesize latent structure in neural populations, encode that structure in a probabilistic model, and efficiently fit the model to neural spike trains. To surmount this challenge, we introduce a collection of structural motifs, the design patterns from which we construct interpretable models. In particular, we focus on random network models, which provide an intuitive bridge between latent types and features of neurons and the temporal dynamics of neural populations. In order to reconcile these models with the discrete nature of spike trains, we build on the Hawkes process — a multivariate generalization of the Poisson process — and its discrete time analogue, the linear autoregressive Poisson model. By leveraging the linear nature of these models and the Poisson superposition principle, we derive elegant auxiliary variable formulations and efficient inference algorithms. We then generalize these to nonlinear and nonstationary models of neural spike trains and take advantage of the Pólya-gamma augmentation to develop novel Markov chain Monte Carlo (MCMC) inference algorithms. 
In a variety of real neural recordings, we show how our methods reveal interpretable structure underlying neural spike trains. In the latter chapters, we shift our focus from autoregressive models to latent state space models of neural activity. We perform an empirical study of Bayesian nonparametric methods for hidden Markov models of neural spike trains. Then, we develop an MCMC algorithm for switching linear dynamical systems with discrete observations and a novel algorithm for sampling Pólya-gamma random variables that enables efficient annealed importance sampling for model comparison. Finally, we consider the “Bayesian brain” hypothesis: the hypothesis that neural circuits are themselves performing Bayesian inference. We show how one particular implementation of this hypothesis implies autoregressive dynamics of the form studied in earlier chapters, thereby providing a theoretical interpretation of our probabilistic models. This closes the loop, connecting top-down theory with bottom-up inferences, and suggests a path toward translating large-scale recording capabilities into new insights about neural computation.
Engineering and Applied Sciences - Computer Science

Harvard University - DASH