33 research outputs found

    Understanding Community Structure for Large Networks

    The general theme of this thesis is to improve our understanding of community structure for large networks. A scientific challenge across fields (e.g., neuroscience, genetics, and social science) is to understand what drives the interactions between nodes in a network. One of the fundamental concepts in this context is community structure: the tendency of nodes to connect based on similar characteristics. Network models where a single parameter per node governs the propensity of connection are popular in practice. They frequently arise as null models that indicate a lack of community structure, since they cannot readily describe networks whose aggregate links behave in a block-like manner. We generalize one such model, the degree-based model, to a flexible, nonparametric class of network models, covering weighted, multi-edge, and power-law networks, and provide limit theorems that describe their asymptotic properties. We establish a theoretical foundation for modularity, a well-known measure of the strength of community structure, and derive its asymptotic properties under the assumption of a lack of community structure (formalized by the class of degree-based models described above). This enables us to assess how informative covariates are for the network interactions. Modularity is intuitive and practically effective but until now has lacked a sound theoretical basis. We derive modularity from first principles and give it a formal statistical interpretation. Moreover, by acknowledging that different community assignments may explain different aspects of a network’s observed structure, we extend the applicability of modularity beyond its typical use to find a single “best” community assignment. From our theoretical results we develop a methodology to quantify network community structure. After validating it on several benchmark examples, we investigate a multi-edge network of corporate email interactions. Here, we demonstrate that our method can identify the covariates that are informative and thereby improve our understanding of the network.
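    For readers unfamiliar with modularity, the following is a minimal sketch of the classical (Newman-style) modularity score for a simple undirected graph, in which the degree-based null model appears as the expected-edges term. It is an illustration only and not the generalized, nonparametric formulation developed in the thesis; the function name `modularity` and its inputs are assumptions made for this example.

```python
from collections import defaultdict


def modularity(edges, community):
    """Classical modularity Q for a simple undirected graph.

    edges:     list of (u, v) pairs, no self-loops or repeated edges (assumed).
    community: dict mapping each node to a community label.

    Q = sum over communities c of  L_c / m - (d_c / (2 m))**2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    degree = defaultdict(int)
    intra_edges = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            intra_edges[community[u]] += 1

    # Total degree per community: this drives the degree-based null term.
    degree_sum = defaultdict(int)
    for node, k in degree.items():
        degree_sum[community[node]] += k

    return sum(
        intra_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}
```

    Scoring the same graph under `two_groups` and `one_group` shows how modularity rewards an assignment whose within-group edge count exceeds what the degrees alone would predict.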

    Network models of stochastic processes in cancer

    Complex systems that can be modelled as networks are ubiquitous. Well-known examples include social and economic networks, as well as many examples in cell biology such as gene regulatory and protein signalling networks. Many cell biological processes are inherently stochastic and non-stationary, and this is the perspective from which I have developed novel mathematical and computational statistical models, focusing particularly on network models. These models are primarily motivated by cell biological processes relating to DNA methylation and stem cell and cancer biology, but can be generalised to other systems and domains. I have used these and other models to identify and analyse novel DNA-based cancer biomarkers.

    Sampling designs and robustness for the analysis of network data

    This manuscript presents three new practical methodologies for Bayesian analysis of network data, concerning sampling designs and robustness.
    In the first part of this thesis we propose a general approach for comparing sampling designs. The approach is based on the concept of data compression from information theory. The criterion for comparing sampling designs is formulated so that the results are robust with respect to some of the most widely used loss functions for point estimation and prediction. The rationale behind the proposed approach is to find sampling designs that preserve as much information as possible from the original data generating mechanism. The approach is inspired by the same principle as the reference prior, with the difference that the argument of the optimization is the sampling design rather than the prior. The information contained in the data generating mechanism can be encoded in a distribution defined either on the parameter space (posterior distribution) or on the space of observables (predictive distribution). The results obtained in this part enable us to relate statements about a feature of an observed subgraph to statements about a feature of the full graph. It is proven that such statements cannot be connected by invoking conditional statements only; it is necessary to specify a joint distribution for the random graph model and the sampling design for all values of fully and partially observed random network features. We use this rationale to formulate statements at the level of the sampled graph that help to make non-trivial statements about the full network. The joint distribution of the underlying network and the sampling mechanism enables the statistician to relate both types of conditional statements. Thus, we consider the joint distribution of partially and fully observed random network features and provide statements that are useful for practitioners.
    The second general theme of this thesis is robustness on networks. A method for robustness on exchangeable random networks is developed. The approach is inspired by the concept of graphon approximation through a stochastic block model. An exchangeable model is assumed in order to infer a feature of a random network, with the objective of seeing how the quality of that inference degrades if the model is slightly modified. Decision theory methods are considered under model misspecification by quantifying the stability of optimal actions to perturbations of the approximating model within a well-defined neighborhood of model space. The approach is inspired by recent developments on robustness in the robust control, macroeconomics, and financial mathematics literature.
    In all topics, simulation analysis is complemented with comprehensive experimental studies, which show the benefits of our modeling and estimation methods.

    Centrality measures for graphons: Accounting for uncertainty in networks

    As relational datasets modeled as graphs keep increasing in size and their data-acquisition is permeated by uncertainty, graph-based analysis techniques can become computationally and conceptually challenging. In particular, node centrality measures rely on the assumption that the graph is perfectly known -- a premise not necessarily fulfilled for large, uncertain networks. Accordingly, centrality measures may fail to faithfully extract the importance of nodes in the presence of uncertainty. To mitigate these problems, we suggest a statistical approach based on graphon theory: we introduce formal definitions of centrality measures for graphons and establish their connections to classical graph centrality measures. A key advantage of this approach is that centrality measures defined at the modeling level of graphons are inherently robust to stochastic variations of specific graph realizations. Using the theory of linear integral operators, we define degree, eigenvector, Katz and PageRank centrality functions for graphons and establish concentration inequalities demonstrating that graphon centrality functions arise naturally as limits of their counterparts defined on sequences of graphs of increasing size. The same concentration inequalities also provide high-probability bounds between the graphon centrality functions and the centrality measures on any sampled graph, thereby establishing a measure of uncertainty of the measured centrality score.
    Comment: Authors ordered alphabetically, all authors contributed equally. 21 pages, 7 figures.
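    As a rough illustration of what a centrality function on a graphon looks like, the sketch below numerically approximates the graphon degree function d(x) = ∫₀¹ W(x, y) dy with a midpoint rule. The helper names `graphon_degree` and `product_graphon`, the grid size, and the example graphon W(x, y) = xy are assumptions chosen for this example; the paper's own definitions and normalization may differ.

```python
def graphon_degree(W, x, n=1000):
    """Approximate d(x) = integral over [0, 1] of W(x, y) dy.

    W: a symmetric function on [0, 1]^2 with values in [0, 1] (a graphon).
    x: latent position in [0, 1].
    n: number of midpoint-rule cells (an arbitrary choice for this sketch).
    """
    h = 1.0 / n
    # Evaluate W at the midpoint of each of the n cells along the y-axis.
    return sum(W(x, (j + 0.5) * h) for j in range(n)) * h


def product_graphon(x, y):
    """Example graphon W(x, y) = x * y, whose exact degree function is x / 2."""
    return x * y


approx = graphon_degree(product_graphon, 0.4)  # exact value is 0.4 / 2 = 0.2
```

    Because the integrand is linear in y here, the midpoint rule reproduces the exact value up to floating-point error; for a general graphon the approximation improves as `n` grows.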

    Community Recovery in the Geometric Block Model

    To capture the inherent geometric features of many community detection problems, we propose to use a new random graph model of communities that we call a Geometric Block Model. The geometric block model builds on the random geometric graphs (Gilbert, 1961), one of the basic models of random graphs for spatial networks, in the same way that the well-studied stochastic block model builds on the Erdős-Rényi random graphs. It is also a natural extension of random community models inspired by the recent theoretical and practical advancements in community detection. To analyze the geometric block model, we first provide new connectivity results for random annulus graphs, which are generalizations of random geometric graphs. The connectivity properties of geometric graphs have been studied since their introduction, and analyzing them has been more difficult than their Erdős-Rényi counterparts due to correlated edge formation. We then use the connectivity results of random annulus graphs to provide necessary and sufficient conditions for efficient recovery of communities for the geometric block model. We show that a simple triangle-counting algorithm to detect communities in the geometric block model is near-optimal. For this we consider the following two regimes of graph density. In the regime where the average degree of the graph grows logarithmically with the number of vertices, we show that our algorithm performs extremely well, both theoretically and practically. In contrast, the triangle-counting algorithm is far from optimal for the stochastic block model in the logarithmic degree regime. We simulate our results on both real and synthetic datasets to show the superior performance of both the new model and our algorithm.
    Comment: 53 pages, 18 figures. Accepted at the Journal of Machine Learning Research (JMLR). Shorter versions accepted in AAAI 2018 (see arXiv:1709.05510) and RANDOM 2019 (see arXiv:1804.05013).
arXiv admin note: text overlap with arXiv:1804.0501
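    The idea behind triangle counting can be sketched as follows: an edge whose endpoints share few common neighbours (that is, lies on few triangles) is treated as a likely between-community edge and dropped, and the connected components of what remains are reported as communities. This is a simplified illustration under stated assumptions, not the paper's algorithm; in particular the single cut-off `threshold` and the function name `triangle_filter_communities` are hypothetical, whereas the paper derives its own thresholds from the model parameters.

```python
def triangle_filter_communities(edges, threshold):
    """Group nodes by keeping only edges that lie on >= threshold triangles.

    edges:     list of (u, v) pairs of an undirected simple graph.
    threshold: minimum number of common neighbours for an edge to be kept
               (a hypothetical tuning parameter for this sketch).
    Returns a list of sets, one per connected component of the kept edges.
    """
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    # Keep an edge only if its endpoints share enough common neighbours.
    kept = {node: set() for node in adjacency}
    for u, v in edges:
        if len(adjacency[u] & adjacency[v]) >= threshold:
            kept[u].add(v)
            kept[v].add(u)

    # Connected components of the filtered graph via depth-first search.
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in kept[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


# Two 4-cliques joined by one bridge edge (3, 4), which lies on no triangle.
clique_a = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
clique_b = [(4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7)]
groups = triangle_filter_communities(clique_a + clique_b + [(3, 4)], threshold=1)
```

    In this toy graph every within-clique edge has two common neighbours while the bridge has none, so any threshold of 1 or 2 separates the two cliques.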

    AN EDGE-CENTRIC PERSPECTIVE FOR BRAIN NETWORK COMMUNITIES

    Thesis (Ph.D.) - Indiana University, Department of Psychological and Brain Sciences and Program in Neuroscience, 2021
    The brain is a complex system organized on multiple scales and operating in both a local and distributed manner. Individual neurons and brain regions participate in specific functions, while at the same time existing in the context of a larger network, supporting a range of different functionalities. Building brain networks composed of distinct neural elements (nodes) and their interrelationships (edges) allows us to model the brain from both local and global perspectives, and to deploy a wide array of computational network tools. A popular network analysis approach is community detection, which aims to subdivide a network’s nodes into clusters that can be used to represent and evaluate network organization. Prevailing community detection approaches applied to brain networks are designed to find densely interconnected sets of nodes, leading to the notion that the brain is organized in an exclusively modular manner. Furthermore, many brain network analyses tend to focus on the nodes, as evidenced by the search for modular groupings of neural elements that might serve a common function. In this thesis, we describe the application of community detection algorithms that are sensitive to alternative cluster configurations, enhancing our understanding of brain network organization. We apply a framework called the stochastic block model, which we use to uncover evidence of non-modular organization in human anatomical brain networks across the life span, and in the informatically-collated rat cerebral cortex. We also propose a framework to cluster functional brain network edges in human data, which naturally results in an overlapping organization at the level of nodes that bridges canonical functional systems. These alternative methods utilize the connection patterns of brain network edges in ways that prevailing approaches do not. Thus, we motivate an alternative outlook which focuses on the importance of information provided by the brain’s interconnections, or edges. We call this an edge-centric perspective. The edge-centric approaches developed here offer new ways to characterize distributed brain organization and contribute to a fundamental change in perspective in our thinking about the brain.

    Computation in Complex Networks

    Complex networks are among the most challenging research topics across disciplines, including physics, mathematics, biology, medicine, engineering, and computer science, among others. Interest in complex networks is growing steadily because they can model many everyday systems, such as technology networks, the Internet, and communication, chemical, neural, social, political, and financial networks. The Special Issue “Computation in Complex Networks” of Entropy offers a multidisciplinary view of how some complex systems behave, providing a collection of original and high-quality papers within the research fields of:
    • Community detection
    • Complex network modelling
    • Complex network analysis
    • Node classification
    • Information spreading and control
    • Network robustness
    • Social networks
    • Network medicine